OpenAI’s new o3 artificial intelligence model has achieved a breakthrough high score on a prestigious AI reasoning test called the ARC Challenge, prompting some AI enthusiasts to ask whether o3 has achieved artificial general intelligence (AGI). But while the ARC Challenge organisers described o3’s achievement as a major milestone, they also cautioned that it did not win the competition’s grand prize, and that it is only one step on the path towards AGI, a term generally used to describe AI that matches human-level intelligence across a wide range of tasks.
The o3 model is the latest in a series of AI releases that build on the large language models powering ChatGPT. “This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,” wrote François Chollet, an engineer at Google and the main creator of the ARC Challenge, in a blog post.
What did OpenAI’s o3 model actually do?
Designed by Chollet in 2019, the Abstraction and Reasoning Corpus (ARC) Challenge tests how well an AI can discover the correct pattern linking pairs of coloured grids. Such visual puzzles are intended to probe for general intelligence involving basic reasoning abilities. However, given enough computing power, a non-reasoning program could still solve them through brute force. To prevent this, the competition requires official score submissions to stay within certain computing cost limits.
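The grid-matching format can be sketched in a few lines of Python. The rule, grids and helper function below are purely illustrative and are not taken from the actual ARC dataset:

```python
# Hypothetical illustration of an ARC-style task (not a real
# competition puzzle): grids are small arrays of colour codes, and the
# solver must infer the rule linking each input grid to its output.

def apply_rule(grid):
    """The hidden rule in this toy task: mirror each row horizontally."""
    return [list(reversed(row)) for row in grid]

# Demonstration pairs (two solved examples) plus one held-out test input.
train_pairs = [
    ([[1, 0], [0, 2]], [[0, 1], [2, 0]]),
    ([[3, 3, 0], [0, 1, 2]], [[0, 3, 3], [2, 1, 0]]),
]
test_input = [[5, 0, 4], [1, 2, 0]]

# The hidden rule is consistent with every demonstration pair...
assert all(apply_rule(inp) == out for inp, out in train_pairs)

# ...and a solver is scored on producing the correct output grid.
print(apply_rule(test_input))  # [[4, 0, 5], [0, 2, 1]]
```

The real tasks use larger grids and far less obvious transformations, but the structure is the same: infer a rule from a handful of examples, then apply it to fresh input.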
OpenAI’s newly announced o3 model (scheduled for release in early 2025) achieved a breakthrough official score of 75.7 percent on the ARC Challenge’s “semi-private” test, which is used to rank competitors on public leaderboards. The computing cost of that run was roughly $20 per visual puzzle task, within the competition’s limit of less than $10,000 in total. However, the more difficult “private” test used to determine the grand prize winner has an even stricter computing power limit, equivalent to spending just 10 cents per task, which o3 did not meet.
The o3 model also achieved an unofficial score of 87.5 percent by applying roughly 172 times more computing power than it used for the official score. For comparison, the typical human score is 84 percent, and a score of 85 percent is enough to win the ARC Challenge’s $600,000 grand prize, provided the model’s computing costs also stay within the required limits.
However, achieving the unofficial score cost o3 thousands of dollars per task, and OpenAI asked the challenge organisers not to publish the exact computing costs.
Does o3’s achievement mean AGI has been reached?
No. The organisers of the ARC Challenge have clearly stated that they do not consider beating this competition benchmark an indicator that AGI has been achieved.
ARC Challenge organiser Mike Knoop of software company Zapier noted that, even with the very large amount of computing power OpenAI applied to produce the unofficial score, the o3 model was still unable to solve more than 100 of the visual puzzle tasks, he said in a post on the social media platform X.
In a social media post on Bluesky, Melanie Mitchell of the Santa Fe Institute in New Mexico said of o3’s progress on the ARC benchmark: “I think solving these tasks through brute-force computing defeats the purpose.”
“While the new model is very impressive and represents a major milestone towards AGI, I do not believe it is AGI; there are still quite a few very simple [ARC Challenge] tasks that o3 can’t solve,” Chollet said in another X post.
But Chollet described how we will know when some form of AGI has demonstrated human-level intelligence. “You’ll know AGI has arrived when the exercise of creating tasks that are easy for a normal human but difficult for an AI becomes simply impossible,” he said in a blog post.
Thomas Dietterich, a researcher at Oregon State University, suggests another way to recognise AGI: look for architectures “claimed to contain all of the functional components required for human cognition”, he says. “By that measure, commercial AI systems are missing episodic memory, planning, logical reasoning and, most importantly, metacognition.”
So what does a high score in o3 actually mean?
The o3 model’s high score comes as the tech industry and AI researchers have been reckoning with a slower pace of improvement in the latest AI models in 2024, compared with the initial explosion of development in 2023.
Although it did not win the ARC Challenge, o3’s high score suggests that AI models could beat the competition benchmark in the near future. Beyond the unofficial high score, Chollet said many of the official low-compute submissions have already scored above 81 percent on the unofficial evaluation test set.
Dietterich agrees: “This is a very impressive jump in performance.” However, he cautions that it is impossible to assess how impressive the high score really is without knowing more about how OpenAI’s o1 and o3 models work. For example, if o3 was able to practise on ARC problems beforehand, achieving the high score would have been easier. “We will need to await open-source replication to fully understand the significance of this,” says Dietterich.
The ARC Challenge organisers are already considering launching a second, more difficult benchmark test sometime in 2025. They also plan to continue the ARC Prize 2025 challenge until someone wins the grand prize and open-sources their solution.
Topics: artificial intelligence
Source: www.newscientist.com