Mondo News
Science March 25, 2025

ARC-AGI-2: Leading AI Models Struggle on Latest Artificial General Intelligence Test


The ARC-AGI-2 benchmark is designed to be a difficult test for AI models

Just_Super/Getty Images

The most sophisticated AI models in existence today have scored poorly on a new benchmark designed to measure progress towards artificial general intelligence (AGI), and brute-force computing power will not be enough to improve those scores, because evaluators now also take into account the cost of running the model.

There are many competing definitions of AGI, but it is generally taken to refer to an AI capable of performing the cognitive tasks that humans can do. To measure this, the ARC Prize Foundation previously created a test of reasoning abilities called ARC-AGI-1. Last December, OpenAI announced that its o3 model had scored highly on that test, leading some to ask whether the company was close to achieving AGI.

But now a new test, ARC-AGI-2, has raised the bar. It is difficult enough that no AI system currently on the market can achieve more than a single-digit score out of 100, yet every question in the test has been solved by at least two humans in under two attempts.

In a blog post introducing ARC-AGI-2, ARC president Greg Kamradt said a new benchmark was needed to test skills that differ from those assessed by previous iterations. “To beat it, you must demonstrate both a high level of adaptability and high efficiency,” he writes.

The ARC-AGI-2 benchmark differs from other AI benchmark tests in that it focuses not on a model’s ability to match world-leading PhD-level performance, but on its ability to complete seemingly simple tasks, such as replicating changes to a new image based on past examples of symbolic interpretation. Current models are good at the “deep learning” that ARC-AGI-1 measured, but they are not as good at the seemingly simpler tasks in ARC-AGI-2, which require more challenging thinking and interaction. For example, OpenAI’s o3-low model scored 75.7% on ARC-AGI-1, but only 4% on ARC-AGI-2.
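The kind of task described here, inferring a transformation from a few example images and applying it to a new one, can be pictured with a small Python sketch. It assumes the layout used by the public ARC datasets (a few “train” input/output grid pairs plus a “test” input) and assumes that a prediction counts only if one of at most two attempts matches the expected grid exactly. The toy task, the fixed menu of candidate transformations and all function names are hypothetical illustrations; real ARC-AGI-2 tasks are far harder and cannot be solved by trying a handful of flips.

```python
from typing import Callable, Dict, List, Sequence

Grid = List[List[int]]  # a grid of colour indices (0-9), as in public ARC tasks

# Hypothetical toy task in an ARC-style layout: demonstration ("train")
# pairs plus a "test" input whose output the solver must infer.
TASK = {
    "train": [
        {"input": [[1, 0], [2, 0]], "output": [[0, 1], [0, 2]]},
        {"input": [[3, 3, 0]], "output": [[0, 3, 3]]},
    ],
    "test": [{"input": [[4, 0, 5]], "output": [[5, 0, 4]]}],
}

# Hypothetical menu of candidate transformations. Real ARC-AGI-2 tasks
# need far richer hypotheses than a few fixed flips.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def consistent_rules(train: List[dict]) -> List[str]:
    """Names of candidate rules that reproduce every demonstration pair."""
    return [
        name
        for name, rule in CANDIDATES.items()
        if all(rule(pair["input"]) == pair["output"] for pair in train)
    ]


def solve(task: dict, max_attempts: int = 2) -> List[List[Grid]]:
    """For each test input, return up to `max_attempts` candidate output grids."""
    rules = consistent_rules(task["train"])[:max_attempts]
    return [[CANDIDATES[name](t["input"]) for name in rules] for t in task["test"]]


def is_correct(attempts: Sequence[Grid], expected: Grid, max_attempts: int = 2) -> bool:
    """A test output counts as solved only if one of the first attempts matches exactly."""
    return any(attempt == expected for attempt in attempts[:max_attempts])


def task_score(task: dict) -> float:
    """Fraction of the task's test outputs solved within the attempt limit."""
    predictions = solve(task)
    hits = sum(
        is_correct(attempts, t["output"])
        for attempts, t in zip(predictions, task["test"])
    )
    return hits / len(task["test"])


if __name__ == "__main__":
    print(consistent_rules(TASK["train"]))  # ['flip_horizontal']
    print(task_score(TASK))  # 1.0
```

Taking the first rules consistent with the examples is a deliberate shortcut: the sketch only shows the shape of the problem and the strict exact-match, limited-attempt scoring, not a credible way to solve ARC-AGI-2.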

The benchmark also adds a new dimension to measuring AI capabilities by examining problem-solving efficiency, as measured by the cost required to complete a task. For example, ARC paid human testers $17 per task, while it estimates that o3-low would cost $200 for the same work.
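The efficiency dimension can be illustrated with simple arithmetic. The Python sketch below uses a hypothetical metric, cost per solved task (spend per task divided by the fraction of tasks solved); this is an assumption for illustration, not necessarily how ARC reports efficiency. It also assumes that the $200 estimate for o3-low is a per-task cost and that its 4% ARC-AGI-2 score can be read as a success rate.

```python
def cost_per_solved_task(cost_per_task_usd: float, success_rate: float) -> float:
    """Hypothetical efficiency metric: average spend per correctly solved task.

    success_rate is the fraction of tasks solved, in the interval (0, 1].
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in the interval (0, 1]")
    return cost_per_task_usd / success_rate


if __name__ == "__main__":
    # Article figures for o3-low on ARC-AGI-2, interpreted under the
    # assumptions stated above.
    o3_low = cost_per_solved_task(cost_per_task_usd=200.0, success_rate=0.04)
    print(f"o3-low: about ${o3_low:,.0f} per solved task")  # about $5,000
```

A comparable human figure would need a per-tester success rate, which the article does not give, so the sketch does not compute one.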

“I think the new iteration of ARC-AGI, now focusing on balancing performance with efficiency, is a big step towards a more realistic evaluation of AI models,” says Joseph Imperial at the University of Bath, UK. “This is a sign that we are moving away from one-dimensional evaluation tests that focus solely on performance, and are also considering lower computing power.”

Any model able to pass ARC-AGI-2 would need to be not only highly capable but also smaller and more lightweight, says Imperial, with model efficiency a key component of the new benchmark. This could help address concerns that AI models are becoming more energy-intensive, sometimes to the point of wastefulness, in pursuit of ever-better results.

However, not everyone is convinced that the new measure will be beneficial. “The whole framing of this as testing intelligence is not the right framing,” says Catherine Flick at Staffordshire University, UK. Instead, she says, these benchmarks simply assess an AI’s ability to complete a single task or a set of tasks correctly, and the results are then extrapolated to imply general capability across a range of tasks.

Performing well on these benchmarks should not be seen as a major milestone towards AGI, says Flick.

Another question is what will happen if, or when, ARC-AGI-2 is passed. Will we need yet another benchmark? “If they were to develop ARC-AGI-3, I’m guessing they would add another axis to the graph denoting [the] minimum number of humans, whether expert or not, it would take to solve the tasks, in addition to performance and efficiency,” says Imperial. In other words, the debate over AGI is unlikely to be settled soon.


Source: www.newscientist.com

© 2026 Mondo News.