An AI from Google DeepMind can solve some International Mathematical Olympiad (IMO) geometry problems almost as well as the best human contestants.
“AlphaGeometry's results are surprising and breathtaking,” says IMO president Gregor Dolinar. “It looks like AI will win an IMO gold medal much sooner than we thought even a few months ago.”
The IMO is one of the world's most difficult mathematics competitions for pre-university students. Answering its questions correctly requires mathematical creativity, something AI systems have long struggled with. GPT-4, for example, which has shown remarkable reasoning ability in other areas, scores 0 per cent on IMO geometry problems, and even specialised AI systems have struggled to answer them as well as an average contestant.
This is partly due to the difficulty of the problems, but also to a lack of training data. The contest has been held annually since 1959 and each edition consists of only six questions, whereas the most successful AI systems typically require millions or even billions of data points. Geometry problems, which account for one or two of the six questions and require proving facts about angles or lines in complex figures, are especially hard to translate into a computer-friendly format.
Thang Luong at Google DeepMind and his colleagues got around this problem by creating a tool that can generate hundreds of millions of machine-readable geometric proofs. They used this data to train an AI called AlphaGeometry and tested it on 30 IMO geometry questions: it answered 25 correctly, just below the 25.9 that an IMO gold medallist would be expected to score on the same problems, based on their contest results.
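The article does not spell out how such a generator works; the following is a minimal Python sketch of the general idea only, under loose assumptions. The rule set, fact strings, and helper names are hypothetical stand-ins, not DeepMind's actual method or data format: sample random premises, run a deduction engine to a fixed point, and keep each derived fact as a machine-readable (premises, conclusion) training example.

```python
import random

# Toy deduction rules (hypothetical): a frozenset of required facts
# maps to the new fact it lets us derive.
RULES = {
    frozenset({"AB parallel CD", "CD parallel EF"}): "AB parallel EF",
    frozenset({"CD parallel EF", "EF parallel GH"}): "CD parallel GH",
}

FACT_POOL = ["AB parallel CD", "CD parallel EF", "EF parallel GH", "AB = CD"]

def deduce(premises: frozenset) -> set:
    """Apply the toy rules until no new fact appears (a deduction closure)."""
    known = set(premises)
    changed = True
    while changed:
        changed = False
        for needed, conclusion in RULES.items():
            if needed <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - set(premises)  # only the non-trivial derivations

def generate_examples(n: int) -> list:
    """Sample random premise sets until n derived-fact examples are collected."""
    examples = []
    while len(examples) < n:
        premises = frozenset(random.sample(FACT_POOL, k=random.randint(2, 3)))
        for fact in deduce(premises):
            examples.append((premises, fact))
    return examples

for premises, fact in generate_examples(3):
    print(sorted(premises), "=>", fact)
```

Run at the scale the article describes, a loop like this can churn out training pairs without any human-written proofs, which is what sidesteps the data shortage.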
“Our [current] AI systems still struggle with capabilities such as deep reasoning, where you have to plan many steps ahead and understand the big picture. That is why mathematics is such an important benchmark and test set in our exploration towards artificial general intelligence,” Luong said at a press conference.
AlphaGeometry is made up of two parts, which Luong likens to different thinking systems in the brain: one fast and intuitive, the other slower and more analytical. The intuitive part is a language model, similar to the technology behind ChatGPT, trained on the millions of generated proofs; it suggests which theorems and arguments to try next for a given problem. Once a next step is proposed, a slower but more careful “symbolic reasoning” engine uses logical and mathematical rules to fully construct the argument the language model has suggested. The two systems then work together, switching back and forth until the problem is solved.
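To make that division of labour concrete, here is a minimal Python sketch of the alternating loop, assuming toy stand-ins throughout: the “language model” and “symbolic engine” below are hypothetical stubs with made-up fact strings, not DeepMind's code, and only the control flow mirrors the description above.

```python
import random

# Hypothetical auxiliary constructions the "language model" can propose.
CONSTRUCTIONS = ["midpoint M of AB", "circle through A, B, C",
                 "parallel to AB through C"]

def language_model_propose(state: set) -> str:
    """Fast, intuitive step: suggest an auxiliary construction to try next."""
    remaining = [c for c in CONSTRUCTIONS if c not in state]
    if not remaining:
        raise RuntimeError("no constructions left to try")
    return random.choice(remaining)

def symbolic_engine_deduce(state: set) -> set:
    """Slow, careful step: apply deduction rules to everything known so far."""
    derived = set()
    # Toy rule: once a midpoint of AB exists, the two halves are equal.
    if any("midpoint" in fact for fact in state):
        derived.add("AM = MB")
    return derived

def solve(goal: str, premises: set, max_steps: int = 10) -> bool:
    state = set(premises)
    for _ in range(max_steps):
        state |= symbolic_engine_deduce(state)    # exhaust pure deduction
        if goal in state:
            return True                           # proof closed
        state.add(language_model_propose(state))  # otherwise add a construction
    return False

print(solve(goal="AM = MB", premises={"triangle ABC"}))
```

The design point is that the deduction engine alone stalls when a proof needs a new point or line it cannot invent, which is exactly where the language model's “intuition” is called on.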
While this method has been very successful at solving IMO geometry problems, Luong says the answers it constructs tend to be longer and less “pretty” than human proofs. But it can also find things humans overlook: for one question from the 2004 IMO, it discovered a better and more general solution than the one in the official answer.
It is impressive that the system can solve IMO geometry problems in this way, says Yang-Hui He. However, because IMO problems must be solvable using theorems taught at or below undergraduate level, the system is inherently limited in the mathematics it can use. Expanding the amount of mathematical knowledge AlphaGeometry can access could improve the system and might even help it make new mathematical discoveries, he says.
It would also be interesting to see how AlphaGeometry deals with situations where you don't know what you need to prove, he says, since mathematical insight often comes from exploring theorems without a fixed proof in mind. “If I don't know what the endpoint is, can I find, among all possible [mathematical] statements, new and interesting theorems?”
Last year, the algorithmic trading firm XTX Markets launched a $10 million prize fund for AI maths models. The first publicly shared AI model to earn an IMO gold medal will receive a $5 million grand prize, with smaller progress prizes for significant milestones.
“Solving IMO geometry problems is one of the planned progress prizes supported by the $10 million AIMO challenge fund,” says Alex Gerko at XTX Markets. “It is exciting to see progress towards this goal even before we have announced the full details of this progress prize, which would involve making models and data openly available and solving actual geometry problems during a live IMO contest.”
DeepMind declined to say whether it plans to enter AlphaGeometry in live IMO contests or to extend the system to solve IMO problems that are not based on geometry. However, the company has previously entered a public protein-folding prediction competition to test its AlphaFold system.
Source: www.newscientist.com