Science March 25, 2025

ARC-AGI-2: Leading AI Models Falter in Latest Artificial General Intelligence Evaluation


The ARC-AGI-2 benchmark is designed to be a difficult test for AI models

Just_Super/Getty Images

Today’s most sophisticated AI models score poorly on a new benchmark designed to measure progress towards artificial general intelligence (AGI), and brute-force computing power will not be enough to improve those scores, because evaluators now take the cost of running a model into account.

There are many competing definitions of AGI, but it is generally taken to refer to AI capable of performing the cognitive tasks that humans can do. To measure this, the ARC Prize Foundation previously launched a test of reasoning ability called ARC-AGI-1. Last December, OpenAI announced that its o3 model had scored highly on the test, leading some to ask whether the company was close to achieving AGI.

But now a new test, ARC-AGI-2, has raised the bar. It is difficult enough that no current AI system on the market achieves more than a single-digit score out of 100, even though every question has been solved by at least two people in fewer than two attempts.

In a blog post introducing ARC-AGI-2, ARC president Greg Kamradt said a new benchmark was needed to test skills that differ from those in the previous iteration. “To beat it, you need to demonstrate both high levels of adaptability and high efficiency,” he writes.

The ARC-AGI-2 benchmark differs from other AI benchmark tests in that it focuses not on an AI’s ability to match world-leading PhD-level performance, but on its ability to complete simple tasks, such as replicating changes in a new image based on past examples of symbolic interpretation. Current models are good at the “deep learning” measured by ARC-AGI-1, but not so good at the seemingly simpler tasks in ARC-AGI-2, which require more challenging thinking and interaction. For example, OpenAI’s o3-low model scored 75.7% on ARC-AGI-1, but only 4% on ARC-AGI-2.

The benchmark also adds a new dimension to measuring AI capabilities by examining the efficiency of problem solving, measured as the cost required to complete a task. For example, ARC paid its human testers $17 per task, while it estimates that o3-low costs $200 for the same task.
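To put those two figures side by side, here is a minimal Python sketch of the arithmetic. It uses only the numbers quoted in this article ($17, $200 and the 4% ARC-AGI-2 score). The `Solver` class and the `cost_per_solved_task` helper are hypothetical names introduced for illustration, and the "cost per solved task" figure is a rough derived quantity that assumes every attempt is billed at the same rate. It is not ARC's own scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solver:
    """A solver and its per-task cost, using figures quoted in the article."""
    name: str
    cost_per_task_usd: float

    def cost_ratio(self, baseline: "Solver") -> float:
        # How many times more expensive this solver is than the baseline.
        return self.cost_per_task_usd / baseline.cost_per_task_usd


def cost_per_solved_task(cost_per_task_usd: float, accuracy: float) -> float:
    # Illustrative assumption: every attempted task is billed at the same
    # rate, so the expected spend per correct answer is cost / accuracy.
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in the interval (0, 1]")
    return cost_per_task_usd / accuracy


human = Solver("human tester", 17.0)
o3_low = Solver("o3-low", 200.0)

ratio = o3_low.cost_ratio(human)
per_solved = cost_per_solved_task(o3_low.cost_per_task_usd, accuracy=0.04)

print(f"{o3_low.name} costs roughly {ratio:.1f}x as much per task as a {human.name}")
print(f"Estimated spend per solved ARC-AGI-2 task for {o3_low.name}: ${per_solved:,.0f}")
```

On these numbers the model is more than ten times as expensive per task as a human tester, before accounting for how few tasks it solves.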

“I think ARC-AGI’s new iteration, which now focuses on balancing performance and efficiency, is a major step towards a more realistic evaluation of AI models,” says Joseph Imperial at the University of Bath, UK. “This is a sign that we are moving away from one-dimensional evaluation tests that focus only on performance, towards ones that also consider using less computing power.”

Models that can pass ARC-AGI-2 would need to be not only very capable but also smaller and lighter, Imperial says, because model efficiency is a key component of the new benchmark. This could help address concerns that AI models are becoming more energy-intensive, sometimes to the point of waste, to achieve ever better results.

However, not everyone is convinced that the new measure will be beneficial. “The whole framing of this as testing intelligence is not the correct framing,” says Catherine Flick at Staffordshire University, UK. Instead, she says, these benchmarks merely assess an AI’s ability to complete a single task or set of tasks well, and the results are then extrapolated to imply general capability across a range of tasks.

Performing well on these benchmarks should not be seen as a major moment for AGI, Flick says.

Another question is what will happen if, or when, ARC-AGI-2 is passed. Will yet another benchmark be needed? “If they develop ARC-AGI-3, I guess they’ll add another axis to the graph denoting [the] minimum number of humans, whether expert or not, it would take to solve the tasks, in addition to performance and efficiency,” says Imperial. In other words, the debate about AGI is unlikely to be resolved soon.

Source: www.newscientist.com
