Google DeepMind’s AI won a silver medal at this year’s International Mathematical Olympiad (IMO), the first time an AI has made it onto the podium.
The IMO is considered the world’s most prestigious competition for young mathematicians, and answering the exam questions correctly requires mathematical ability that AI systems typically lack.
In January, Google DeepMind showed off AlphaGeometry, an AI system that could answer IMO geometry problems as well as humans could, but it wasn’t in a real competition and couldn’t answer questions in other areas of math, such as number theory, algebra, or combinatorics, that are needed to win an IMO medal.
Google DeepMind has now released a new AI called AlphaProof that can solve a wider range of math problems, and an improved version of AlphaGeometry that can solve more geometry problems.
When the team tested the two systems together on this year’s IMO problems, they solved four of the six questions, scoring 28 of the 42 possible points – enough for a silver medal and just one point short of this year’s gold-medal threshold.
At the competition, held in Bath, England, last week, 58 contestants won gold medals and 123 won silver medals.
“We all know that AI will eventually be better than humans at solving most mathematical problems, but the rate at which AI is improving is astounding,” said Gregor Dolinar, president of the IMO. “It’s incredible to have missed out on gold at IMO 2024 by just a single point.”
At a press conference, Timothy Gowers, a University of Cambridge mathematician who helped grade AlphaProof’s solutions, said the AI’s performance was surprising and that it seemed to have found the “magic keys” to the problems in much the way a human would. “We thought that these magic keys would probably be a bit beyond the capabilities of an AI, so we were quite surprised in one or two cases where the program actually found them,” Gowers said.
AlphaProof works similarly to Google DeepMind’s previous AIs that can beat the best humans at chess and Go. All of these AIs rely on a trial-and-error approach called reinforcement learning, in which the system finds its own way of solving a problem by trying it again and again. However, this approach requires a large supply of problems written in a language the AI can both understand and verify, whereas most IMO-style problems are written in plain English.
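To give a sense of the trial-and-error idea, here is a minimal sketch – emphatically not DeepMind’s system, and with all names and parameter values chosen for illustration – of tabular Q-learning, a classic reinforcement-learning method. An agent repeatedly attempts to walk along a five-cell corridor, and from many failed and successful attempts it learns that stepping right leads to the reward at the far end.

```python
import random

# Toy illustration of trial-and-error (reinforcement) learning:
# a tabular Q-learning agent learns, by repeated attempts, to walk
# along a 5-cell corridor to the reward at the end.
random.seed(0)

N_STATES = 5          # cells 0..4; the reward sits at cell 4
ACTIONS = [-1, +1]    # step left or step right
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def greedy(s):
    # pick the highest-valued action, breaking ties at random
    best = max(q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(s, a)] == best])

def run_episode(epsilon=0.1, alpha=0.5, gamma=0.9):
    s, steps = 0, 0
    while s != N_STATES - 1 and steps < 50:
        # occasionally explore a random action; otherwise exploit
        a = random.choice(ACTIONS) if random.random() < epsilon else greedy(s)
        s_next = min(max(s + a, 0), N_STATES - 1)
        reward = 1.0 if s_next == N_STATES - 1 else 0.0
        best_next = max(q[(s_next, b)] for b in ACTIONS)
        # nudge the estimate toward reward plus discounted future value
        q[(s, a)] += alpha * (reward + gamma * best_next - q[(s, a)])
        s, steps = s_next, steps + 1
    return steps

for _ in range(200):          # the trial-and-error phase
    run_episode()

print(run_episode(epsilon=0.0))   # greedy run after training
```

The same feedback loop operates at vastly larger scale in systems like AlphaProof, where the “reward” is a proof that the verifier accepts.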
To get around this, Thomas Hubert and his colleagues at Google DeepMind used Google’s Gemini AI, a large language model like the one that powers ChatGPT, to translate these problems into a formal language called Lean, allowing the AI to learn how to solve them.
“You’ll start by solving maybe the simplest problems, and then you’ll be able to learn from solving those simple problems and then tackle the harder problems,” Hubert said at the press conference. Because AlphaProof’s answers are generated in Lean, they can be immediately checked for correctness.
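To give a flavour of what a Lean-formalised statement looks like, here is a toy example – not an IMO problem, and not AlphaProof’s output – written in Lean 4 and assuming only the core `Nat` lemmas: a short proof that the sum of two even natural numbers is even. Once a statement is expressed this way, the proof checker can verify it mechanically, with no human grading required.

```lean
-- Toy example of a machine-checkable statement in Lean 4.
-- If a = 2m and b = 2n, then a + b = 2(m + n), so a + b is even.
theorem even_add_even (a b : Nat)
    (ha : ∃ k, a = 2 * k) (hb : ∃ k, b = 2 * k) :
    ∃ k, a + b = 2 * k :=
  let ⟨m, hm⟩ := ha
  let ⟨n, hn⟩ := hb
  ⟨m + n, by rw [hm, hn, Nat.mul_add]⟩
```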
Despite AlphaProof’s impressive performance, it was slow, taking up to three days to find a solution, compared with the 4.5 hours contestants are given for each exam paper. It also failed to solve either of the two combinatorics problems – combinatorics being the study of counting and arranging objects. “We’re still working on figuring out why that is, and if we can do that, it will help us improve the system,” said Alex Davies at Google DeepMind.
It is also not clear how AlphaProof arrives at its answers, or whether it uses the same mathematical intuition as humans, Gowers said. But because the proofs are written in Lean, he said, checking whether they are correct is easy.
“The results are impressive and a significant milestone,” says Geordie Williamson at the University of Sydney, Australia. “There have been many attempts to apply reinforcement learning based on formal proofs, but none have been very successful.”
Systems like AlphaProof may help working mathematicians develop proofs, but they don’t help them identify which problems are worth tackling in the first place, which takes up the majority of researchers’ time, says Yang-Hui He at the London Institute for Mathematical Sciences.
Hubert said the team hopes that, by cutting down on incorrect responses, AlphaProof’s approach can help improve Google’s large language models such as Gemini.
Trading firm XTX Markets is offering a $5 million prize, dubbed the AI Mathematical Olympiad (AIMO) prize, to any AI that can win a gold medal at the IMO, but AlphaProof is ineligible because it is not publicly available. “We hope that DeepMind’s progress will encourage more teams to apply for the AIMO prize, and of course we would welcome a public submission from DeepMind itself,” said Alex Gerko of XTX Markets.
Source: www.newscientist.com