Google DeepMind claims to have made the first ever scientific discovery using an AI chatbot, by building a fact checker that filters out useless output and leaves behind only reliable solutions to mathematical or computing problems.
DeepMind’s previous achievements, such as using AI to predict the weather or the shape of proteins, relied on models created specifically for the task at hand and trained on accurate, specific data. Large language models (LLMs), such as GPT-4 and Google’s Gemini, are instead trained on vast amounts of disparate data, which gives them a wide range of capabilities. However, this approach also makes them susceptible to “hallucinations”, the term researchers use for models producing erroneous output.
Gemini, released earlier this month, has already shown a tendency to hallucinate, getting even simple facts such as this year’s Oscar winners wrong. Google’s earlier AI-powered search engine even made errors in its own launch advertising materials.
One common fix for this phenomenon is to add a layer on top of the AI that validates the accuracy of the output before passing it on to the user. However, given the wide range of topics that chatbots may be asked about, creating a comprehensive safety net is a very difficult task.
Alhussein Fawzi at Google DeepMind and his colleagues created FunSearch, a general-purpose LLM based on Google’s PaLM 2 model, with a fact-checking layer they call an “evaluator”. The model is constrained to producing computer code that solves problems in mathematics and computer science, but DeepMind says this is less limiting than it sounds, because new ideas and solutions in these fields are inherently quick to verify, which makes the fact-checking a far more manageable task.
The underlying AI can still hallucinate and produce inaccurate or misleading results, but the evaluator filters out the erroneous outputs, leaving only reliable and potentially useful concepts.
“We believe that perhaps 90 per cent of what the LLM outputs is useless,” says Fawzi. “Given a potential solution, it’s very easy to tell whether it is actually correct and to evaluate it, but it’s very difficult to actually come up with one. So mathematics and computer science fit particularly well.”
DeepMind claims the model can generate new scientific knowledge and ideas, something no LLM has ever done before.
FunSearch is first given a problem and a very basic solution in source code as input, then generates a database of new solutions whose accuracy is checked by the evaluator. The best of the reliable solutions are fed back into the LLM as inputs, with a prompt asking it to improve on the idea. DeepMind says the system produces millions of potential solutions and eventually converges on an efficient result, sometimes surpassing the best known solution.
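The loop described above, in which candidates are proposed, scored, filtered, and the best fed back in, can be sketched in miniature. In this illustrative Python sketch a random mutator stands in for the LLM and a toy scoring function stands in for the evaluator; all names and the objective are assumptions for illustration, not DeepMind’s actual implementation.

```python
import random

def evaluator(candidate):
    """Score a candidate solution; return None to reject invalid ones."""
    if not isinstance(candidate, float):
        return None  # analogous to filtering out a hallucinated program
    return -(candidate - 3.0) ** 2  # toy objective with its peak at 3.0

def propose(best, rng):
    """Stand-in for the LLM: perturb the best solution found so far."""
    return best + rng.gauss(0.0, 0.5)

def funsearch_loop(iterations=2000, seed=0):
    rng = random.Random(seed)
    best = 0.0
    best_score = evaluator(best)
    for _ in range(iterations):
        candidate = propose(best, rng)
        score = evaluator(candidate)
        if score is None:
            continue  # the evaluator discards unreliable output
        if score > best_score:
            best, best_score = candidate, score  # feed the best back in
    return best

print(funsearch_loop())
```

The key property the article highlights is that scoring a candidate (`evaluator`) is cheap even when generating a good one is hard, which is why the filter-and-iterate design works.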
For mathematical problems, the model writes a computer program that can find a solution, rather than trying to solve the problem directly.
Fawzi and his colleagues challenged FunSearch to tackle the cap set problem, which involves finding the largest pattern of points in which no three points form a straight line. The computational complexity of the problem grows rapidly as the number of points increases. The AI discovered a solution consisting of 512 points in eight dimensions, larger than any previously known.
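This is a good example of a solution that is hard to find but quick to verify: in the finite geometry used for the cap set problem, three distinct points are collinear exactly when their coordinate-wise sum is zero mod 3. The checker below illustrates that verification step; the 512-point, eight-dimensional construction itself is not reproduced here, and the small example sets are my own.

```python
from itertools import combinations

def is_cap_set(points):
    """Return True if no three distinct points in (Z_3)^n are collinear,
    i.e. no three points sum to zero mod 3 in every coordinate."""
    for a, b, c in combinations(points, 3):
        if all((x + y + z) % 3 == 0 for x, y, z in zip(a, b, c)):
            return False  # found three collinear points
    return True

# A small cap set in two dimensions, and a set spoiled by a collinear triple.
print(is_cap_set([(0, 0), (0, 1), (1, 0), (1, 1)]))  # True
print(is_cap_set([(0, 0), (1, 1), (2, 2)]))          # False
```

Checking all triples takes polynomial time in the size of the set, while the search space of candidate sets grows exponentially with the dimension, which is exactly the asymmetry FunSearch exploits.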
When set on the bin-packing problem, in which the goal is to place objects of different sizes into containers as efficiently as possible, FunSearch discovered solutions that outperform commonly used algorithms, a result with immediate applications for transport and logistics companies. DeepMind says FunSearch could lead to improvements in many more maths and computing problems.
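To make the problem concrete, below is the classic first-fit heuristic, one of the simple, widely used bin-packing baselines of the kind the article says FunSearch’s evolved programs beat (which specific baselines were used is my assumption; the evolved heuristics themselves are not reproduced here).

```python
def first_fit(items, capacity=1.0):
    """Pack each item into the first open bin with enough room,
    opening a new bin when none fits; return the number of bins used."""
    bins = []  # remaining capacity of each open bin
    for size in items:
        for i, remaining in enumerate(bins):
            if size <= remaining:
                bins[i] = remaining - size
                break
        else:
            bins.append(capacity - size)  # open a new bin
    return len(bins)

print(first_fit([0.5, 0.7, 0.5, 0.2, 0.4, 0.2, 0.5, 0.1]))  # 4
```

A packing heuristic like this is itself a short program, which is why bin packing suits FunSearch’s approach of evolving and scoring code rather than individual answers.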
Mark Lee at the University of Birmingham, UK, says the next breakthroughs in AI will come not from scaling up LLMs to ever-larger sizes, but from adding layers that ensure accuracy, as DeepMind has done with FunSearch.
“The strength of language models is their ability to imagine things, but the problem is hallucinations,” says Lee. “And this research reins that in and checks the facts. It’s a neat idea.”
Lee says the AI shouldn’t be criticised for producing large amounts of inaccurate or useless output, as this is similar to how human mathematicians and scientists work: brainstorming ideas, testing them, and pursuing the best while discarding the worst.
Source: www.newscientist.com