Researchers at Apple have identified “fundamental limitations” in state-of-the-art artificial intelligence models, raising doubts about the tech industry’s race to develop ever more powerful systems.
In a paper, Apple said that large reasoning models (LRMs), an advanced form of AI, suffered a “complete accuracy collapse” when faced with highly complex problems.
It found that standard AI models outperformed LRMs on low-complexity tasks, while both types of model suffered “complete collapse” on highly complex tasks. LRMs attempt to solve complex queries by generating detailed thinking processes that break the problem down into smaller steps.
The research, which tested the models’ ability to solve puzzles, found that LRMs began “reducing their reasoning effort” as they approached performance collapse, something the researchers described as “particularly concerning”.
Gary Marcus, a US academic and prominent voice of caution on AI capabilities, described the Apple paper as “pretty devastating” and said the findings raised important questions about the race towards artificial general intelligence (AGI), the theoretical stage at which an AI system can match a human at any intellectual task.
Referring to large language models (LLMs), Marcus wrote that the paper showed “anybody who thinks LLMs are a direct route to the sort [of] AGI that could fundamentally transform society for the good are kidding themselves”.
The paper also found that, early in their “thinking” process, LRMs wasted computing power by finding the correct solution to simpler problems and then continuing to explore incorrect alternatives. As problems became moderately more complex, models first explored incorrect solutions before arriving at the correct ones.
When confronted with highly complex problems, however, the models entered “collapse” and failed to generate any correct solutions. In one case, they failed even when provided with an algorithm that would solve the problem.
The findings showed that, as models approached a critical threshold closely corresponding to their accuracy collapse point, they “counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty”.
According to the Apple researchers, this points to a “fundamental scaling limitation” in the thinking capabilities of current reasoning models.
The study set the LRMs puzzle tasks such as the Tower of Hanoi and River Crossing puzzles. The researchers acknowledged that the focus on puzzles represented a limitation of the work.
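For readers unfamiliar with the benchmark, the Tower of Hanoi has a well-known recursive solution that takes 2^n - 1 moves for n disks. The sketch below shows that textbook algorithm in Python, purely as an illustration of the kind of step-by-step procedure a model could be handed; it is not the prompt format or code used in Apple’s experiments.

```python
# Illustrative sketch: the classic recursive Tower of Hanoi solution.
# This is the textbook algorithm, not the setup used in Apple's paper.

def hanoi(n, source, target, spare, moves):
    """Append the moves that transfer n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the way
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # restack on top of it

moves = []
hanoi(3, "A", "C", "B", moves)
print(moves)  # 2**3 - 1 = 7 moves for three disks
```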
The study concluded that current approaches to AI may have reached fundamental limitations. The models tested included OpenAI’s o3, Google’s Gemini Thinking, Anthropic’s Claude 3.7 Sonnet Thinking and DeepSeek-R1. Google and DeepSeek have been approached for comment, while OpenAI, the company behind ChatGPT, declined to comment.
Discussing “generalisable reasoning”, or an AI model’s ability to apply a narrow conclusion more broadly, the paper said: “These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalisable reasoning.”
Andrew Rogoyski, of the Institute for People-Centred AI at the University of Surrey, said the Apple paper showed the industry was still grappling with AGI and suggested that its current approach may have reached a “dead end”.
He added: “The finding that large reasoning models underperform on complex tasks, while faring well on simple- and medium-complexity tasks, implies that we may be approaching a profound impasse in current approaches.”
Source: www.theguardian.com