DeepMind and OpenAI Claim Gold-Medal Performance at the International Mathematical Olympiad

AI models are getting better at solving mathematics problems

Andresr/Getty Images

AI models developed by Google DeepMind and OpenAI have achieved exceptional performance at the International Mathematical Olympiad (IMO).

While the companies herald this as a significant advance for AIs that might one day tackle complex scientific or mathematical problems, mathematicians urge caution, because the details of the models and how they were used remain confidential.

The IMO is one of the most respected contests for young mathematicians, often viewed by AI researchers as a critical test of mathematical reasoning, an area where AI traditionally struggles.

Following last year’s competition in Bath, UK, Google revealed that its AI systems AlphaProof and AlphaGeometry had achieved silver-medal-level performance, though their answers were not marked by the official competition graders.

This year, several companies, including Google, Huawei and TikTok’s parent company ByteDance, approached the IMO organizers to request formal marking of their AI models’ answers during the contest, says Gregor Dolinar, president of the IMO. The IMO agreed, on the condition that the results be revealed only after the closing ceremony on July 28.

OpenAI also expressed interest in taking part in the competition, but did not respond or register after being told about the official procedure, according to Dolinar.

On July 19, OpenAI announced that a new AI it had developed achieved a gold-medal score on this year’s questions, as marked by three former IMO medalists, outside the official competition. OpenAI said the AI correctly answered five of the six questions within the same 4.5-hour time limit as human competitors.

Two days later, Google DeepMind revealed that its AI system, Gemini Deep Think, had also achieved gold-level performance within the same constraints. Dolinar confirmed that this result was validated by the official IMO judges.

Unlike Google’s AlphaProof and AlphaGeometry, which worked on questions translated into a programming language called Lean, the new systems from both Google and OpenAI tackled the questions in natural language.

Working in Lean meant answers could be verified for correctness quickly, but the output is hard for non-experts to interpret. Thang Luong at Google says a natural-language approach produces more comprehensible results while remaining applicable to broadly useful AI systems.
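For readers unfamiliar with Lean, here is a toy illustration of what a machine-checkable statement looks like; it is far simpler than any IMO problem and is only a sketch of the general idea, not anything the companies have published. If a proof is wrong, Lean’s kernel rejects it, which is what made Lean-based answers quick to verify.

```lean
-- Toy illustration only: two machine-checkable statements in Lean 4.
-- Lean's kernel accepts them only if the proofs are actually correct.

-- A concrete fact, checked by direct computation.
example : 2 + 3 = 3 + 2 := rfl

-- A general statement, proved by citing a lemma from Lean's core library.
theorem add_comm_example (m n : Nat) : m + n = n + m := Nat.add_comm m n
```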

Luong said that advances in reinforcement learning, a training technique that guides an AI through success and failure, have made it possible for large language models to produce solutions that can be validated efficiently, an approach that was central to Google’s earlier game-playing AIs such as AlphaZero.

Google’s model uses a technique known as parallel thinking, exploring multiple candidate solutions simultaneously before settling on an answer. Its training data included mathematical problems particularly relevant to the IMO.
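Google has not published how parallel thinking is implemented, so the following is only a rough sketch of the general idea, with hypothetical placeholder functions: sample several candidate solutions at once, then keep the one a critic rates best.

```python
# Illustrative sketch of "parallel thinking": sample several candidate
# solutions concurrently, then keep the highest-scoring one. The generator
# and the scorer are hypothetical placeholders, not Google's system.
from concurrent.futures import ThreadPoolExecutor

def generate_candidate(problem: str, seed: int) -> str:
    """Placeholder for one sampled reasoning trace from a language model."""
    return f"candidate solution {seed} for {problem!r}"

def score_candidate(candidate: str) -> float:
    """Placeholder critic that rates how convincing a candidate looks."""
    return (hash(candidate) % 100) / 100.0  # dummy score for the sketch

def parallel_think(problem: str, n_samples: int = 8) -> str:
    with ThreadPoolExecutor(max_workers=n_samples) as pool:
        candidates = list(pool.map(lambda s: generate_candidate(problem, s),
                                   range(n_samples)))
    return max(candidates, key=score_candidate)  # keep the best-rated candidate

if __name__ == "__main__":
    print(parallel_think("an IMO-style inequality"))
```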

OpenAI has disclosed few specifics about its system, saying only that it uses reinforcement learning and “experimental research methods”.

“While progress appears promising, it lacks rigorous scientific validation, making it difficult to assess at this point,” remarked Terence Tao from UCLA. “We anticipate that the participating companies will publish papers featuring more comprehensive data, allowing others to access the model and replicate its findings. However, for now, we must rely on the companies’ claims regarding their results.”

Geordie Williamson at the University of Sydney shared this sentiment, stating, “It’s remarkable to see advancements in this area, yet it’s frustrating how little in-depth information is available from inside these companies.”

Natural-language systems could be useful for people without a mathematical background, but they could also cause problems if models produce long proofs that are hard to verify, warned Joseph Myers, a co-organizer of this year’s IMO. “If AIs generate solutions to significant unsolved questions that seem plausible yet contain subtle, critical errors, we must be cautious before putting confidence in lengthy AI outputs.”

The companies plan to make these systems available to mathematicians for testing in the coming months before wider public releases. The models could eventually provide rapid solutions to hard problems in scientific research, said Junehyuk Jung at Google, who worked on Gemini Deep Think. “There are numerous unresolved challenges within reach,” he noted.


Source: www.newscientist.com

DeepMind AI achieves silver-medal score at International Mathematical Olympiad

DeepMind’s AlphaProof AI can tackle a wide range of math problems

Google DeepMind

Google DeepMind’s AI achieved a silver-medal-level score at this year’s International Mathematical Olympiad (IMO), the first time an AI system has performed at medal standard.

The IMO is considered the world’s most prestigious competition for young mathematicians, and answering the exam questions correctly requires mathematical ability that AI systems typically lack.

In January, Google DeepMind showed off AlphaGeometry, an AI system that could answer IMO geometry problems as well as humans could, but it wasn’t in a real competition and couldn’t answer questions in other areas of math, such as number theory, algebra, or combinatorics, that are needed to win an IMO medal.

Google DeepMind has now released a new AI called AlphaProof that can solve a wider range of math problems, and an improved version of AlphaGeometry that can solve more geometry problems.

When the team tested both systems together on this year’s IMO problems, they got four out of six questions right, earning them 28 points out of 42 possible points – good enough for a silver medal, just one point short of this year’s gold medal threshold.

At the competition, held in Bath, England, last week, 58 contestants won gold medals and 123 won silver medals.

“We all know that AI will eventually be better than humans at solving most mathematical problems, but the rate at which AI is improving is astounding,” said IMO president Gregor Dolinar. “It’s incredible to have missed out on gold at IMO 2024 by just one point, only a few days after the competition itself.”

At a press conference, Timothy Gowers, a University of Cambridge mathematician who helped grade AlphaProof’s solutions, said the AI’s performance was surprising, and that it seemed to have found the “magic keys” to the problems in much the way a human would. “We thought that these magic keys would probably be a bit beyond the capabilities of an AI, so we were quite surprised in one or two cases where the program actually found them,” Gowers said.

AlphaProof works similarly to Google DeepMind’s previous AIs that can beat the best humans at chess and Go. All of these systems rely on a trial-and-error approach called reinforcement learning, in which the system finds its own way of solving a problem by attempting it again and again. However, this method requires a large number of problems written in a language the AI can understand and automatically verify, and most math problems, including the IMO’s, are written only in plain English.

To get around this, Thomas Hubert and his colleagues at DeepMind used Google’s Gemini AI, a language model like the one that powers ChatGPT, to translate such problems into a programming language called Lean, giving AlphaProof problems it could learn to solve.

“You start by solving maybe the simplest problems, and then you learn from solving those simple problems and tackle the harder ones,” Hubert said at the press conference. Because the answers are generated in Lean, they can be immediately verified for correctness.
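A minimal sketch of the kind of loop Hubert describes, under loudly stated assumptions: propose_proof and lean_check below are hypothetical stand-ins for the model’s proof search and for a Lean-style verifier, not DeepMind’s actual code.

```python
# Minimal sketch of the curriculum-style reinforcement-learning loop described
# above. Both functions are hypothetical placeholders: `propose_proof` stands
# in for the model's proof search and `lean_check` for a formal verifier such
# as Lean's kernel. Neither is DeepMind's code.
import random

def propose_proof(problem: str, skill: float) -> str:
    """Placeholder: the model attempts a Lean proof; quality rises with skill."""
    return "proof" if random.random() < skill else "bogus"

def lean_check(proof: str) -> bool:
    """Placeholder for running the attempt through a Lean-style verifier."""
    return proof == "proof"

def train(problems_by_difficulty: list[list[str]]) -> float:
    skill = 0.1
    for tier in problems_by_difficulty:          # easiest tier first
        for problem in tier:
            attempt = propose_proof(problem, skill)
            if lean_check(attempt):              # reward only verified proofs
                skill = min(1.0, skill + 0.05)   # stand-in for a policy update
    return skill

if __name__ == "__main__":
    tiers = [[f"easy-{i}" for i in range(20)],
             [f"medium-{i}" for i in range(20)],
             [f"hard-{i}" for i in range(20)]]
    print(f"final skill estimate: {train(tiers):.2f}")
```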

Despite AlphaProof’s impressive performance, it was slow, taking up to three days to find some solutions, compared with the two 4.5-hour sessions given to contestants. It also failed to solve either of the two combinatorics problems, the branch of mathematics concerned with counting and arranging objects. “We’re still working on figuring out why that is, and if we can do that, it will help us improve the system,” said Alex Davies at Google DeepMind.

It’s also not clear how AlphaProof arrives at its answers, or whether it uses the same kind of mathematical intuition as humans, Gowers said. But he said that because the proofs are written in Lean and can be translated into English, it is easy to check whether they are correct.

“The results are impressive and a significant milestone,” says Geordie Williamson at the University of Sydney in Australia. “There have been many attempts to apply reinforcement learning based on formal proofs, but none have been very successful.”

Systems like AlphaProof may help working mathematicians develop proofs, but they don’t help them identify which problems are worth tackling in the first place, which takes up the majority of researchers’ time, says Yang-Hui He at the London Institute for Mathematical Sciences.

Hubert said the team hopes that, by reducing incorrect responses, AlphaProof’s approach can help improve Google’s large language models, such as Gemini.

Trading firm XTX Markets is offering a $5 million prize, through its AI Mathematical Olympiad (AIMO) challenge, to the first publicly shared AI that can reach gold-medal standard at the IMO, but AlphaProof is ineligible because it is not publicly available. “We hope that DeepMind’s progress will encourage more teams to apply for the AIMO prize, and of course we would welcome a public submission from DeepMind itself,” said Alex Gerko of XTX Markets.


Source: www.newscientist.com

DeepMind’s AI successfully tackles challenging geometry problems from the Mathematical Olympiad

Geometric problems involve proving facts about angles and lines in complex shapes

Google DeepMind

Google DeepMind’s AI can solve some International Mathematical Olympiad (IMO) geometry problems almost as well as the best human contestants.

“AlphaGeometry’s results are surprising and breathtaking,” says IMO president Gregor Dolinar. “It looks like an AI will win an IMO gold medal much sooner than was thought just a few months ago.”

The IMO is one of the world’s most difficult math competitions for high-school students. Answering its questions correctly requires mathematical creativity, something AI systems have long struggled with. For example, GPT-4, which has shown remarkable reasoning ability in other areas, scores 0% on IMO geometry problems, and even specialized AIs have a hard time answering them as well as an average contestant.

This is partly due to the difficulty of the problems, but also due to a lack of training data. The contest has been held annually since 1959 and each edition consists of only six questions, whereas the most successful AI systems often require millions or even billions of data points. Geometry problems, which account for one or two of the six questions and require proving facts about angles or lines in complex shapes, are especially difficult to convert into a computer-friendly format.

Thang Luong at Google DeepMind and his colleagues got around this problem by creating a tool that can generate hundreds of millions of machine-readable geometry proofs. They used this data to train an AI called AlphaGeometry, and when they tested it on 30 IMO geometry questions it answered 25 correctly, compared with an estimated score of 25.9 for an IMO gold medalist, based on medalists’ scores in the contest.
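The article does not describe the generation tool in detail, so the following is only a rough sketch of the general idea under assumed, simplified rules: sample random premises, run a forward deduction engine over them, and record each derived fact together with the steps that produced it as one training example.

```python
# Illustrative sketch only, not DeepMind's tool: mass-produce machine-readable
# geometry training examples by sampling random premises, forward-chaining over
# a few toy rules, and recording each derived fact with its supporting steps.
import random

RULES = {
    # hypothetical toy rules: (input facts) -> derived fact
    ("AB = CD", "CD = EF"): "AB = EF",                          # equal lengths are transitive
    ("angle A = angle B", "angle B = angle C"): "angle A = angle C",
}

def sample_premises() -> list[str]:
    pool = ["AB = CD", "CD = EF", "angle A = angle B", "angle B = angle C"]
    return random.sample(pool, k=3)

def deduce(premises: list[str]) -> list[tuple[str, tuple[str, str]]]:
    """Return (derived fact, supporting facts) pairs found by forward chaining."""
    derived = []
    for inputs, conclusion in RULES.items():
        if all(fact in premises for fact in inputs):
            derived.append((conclusion, inputs))
    return derived

def generate_examples(n: int) -> list[dict]:
    examples = []
    for _ in range(n):
        premises = sample_premises()
        for fact, support in deduce(premises):
            examples.append({"premises": premises, "statement": fact,
                             "proof": list(support)})
    return examples

if __name__ == "__main__":
    for example in generate_examples(5):
        print(example)
```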

“Our [current] AI systems still struggle with capabilities such as deep reasoning, where you have to plan many steps ahead and understand the big picture. That’s why mathematics is such an important benchmark and test set in our exploration towards artificial general intelligence,” Luong said at a press conference.

AlphaGeometry is made up of two parts, which Luong likens to the different thinking systems in the brain: one fast and intuitive, the other slower and more analytical. The intuitive part is a language model, similar to the technology behind ChatGPT, trained on the millions of generated proofs; it suggests which theorems and constructions to try next for a given problem. Once a step is proposed, a slower but more careful “symbolic reasoning” engine uses logical and mathematical rules to work out the full consequences of the suggestion. The two systems then alternate, handing the problem back and forth until it is solved.
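A rough sketch of that alternation, with hypothetical placeholder functions standing in for the language model and the symbolic engine; this is not DeepMind’s code.

```python
# Rough sketch, not DeepMind's code: an "intuitive" proposer suggests the next
# construction to try, and a deliberate symbolic engine derives what it can
# from the facts so far; the two alternate until the goal is reached.
def propose_step(known_facts: set[str]) -> str:
    """Placeholder for the language-model component suggesting a construction."""
    if "midpoint M" not in " ".join(known_facts):
        return "add midpoint M of AB"
    return "draw line MC"

def symbolic_deduce(known_facts: set[str], step: str) -> set[str]:
    """Placeholder for the rule-based engine: derive consequences of the step."""
    new_facts = set(known_facts)
    new_facts.add(f"fact from: {step}")
    if step == "add midpoint M of AB":
        new_facts.add("AM = MB")          # a rule a real engine would apply
    return new_facts

def solve(goal: str, premises: set[str], max_rounds: int = 10) -> bool:
    facts = set(premises)
    for _ in range(max_rounds):
        if goal in facts:
            return True                   # the goal has been derived
        step = propose_step(facts)        # intuition: what to try next
        facts = symbolic_deduce(facts, step)  # deliberation: what follows
    return goal in facts

if __name__ == "__main__":
    print(solve(goal="AM = MB", premises={"triangle ABC"}))
```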

While this method has been very successful at solving IMO geometry problems, Luong says the answers it constructs tend to be longer and less “pretty” than human proofs. However, it can also find things that humans overlook: for a question from the 2004 IMO, for example, it discovered a better, more general solution than the one listed in the official answer.

It is impressive that AlphaGeometry can solve IMO geometry problems in this way, says Yang-Hui He at the London Institute for Mathematical Sciences. However, because IMO problems must be solvable using theorems taught at undergraduate level and below, the system inherently limits the mathematics it can use. Expanding the amount of mathematical knowledge AlphaGeometry can access could improve the system and even help it make new mathematical discoveries, he says.

It would also be interesting to see how AlphaGeometry handles situations where it is not told what it needs to prove, he says, since mathematical insight often comes from exploring statements with no fixed goal. “If you don’t know what the endpoint is, can it search through all possible [mathematical] statements and find new and interesting theorems?”

Last year, algorithmic trading firm XTX Markets announced a total of $10 million in prize money for AI math models: the first publicly shared AI model to earn an IMO gold medal will receive a $5 million grand prize, with smaller progress prizes for major milestones along the way.

“Solving IMO geometry problems is one of the planned progress prizes supported by the $10 million AIMO challenge fund,” said Alex Gerko of XTX Markets. “Even before we announce all the details of this progress prize, we are excited to see progress towards this goal, which would include making models and data openly available and solving actual geometry problems during a live IMO contest.”

DeepMind declined to say whether it plans to use AlphaGeometry in live IMO contests or extend the system to solve other IMO problems that are not based on geometry. However, DeepMind previously entered a public protein folding prediction competition to test the AlphaFold system.


Source: www.newscientist.com