GPT-5's Limited Gains Suggest a Slowdown in AI Advancement

GPT-5 is the latest version of OpenAI’s flagship language model

Cheng Xin/Getty Images

OpenAI has unveiled GPT-5, its latest AI model, marking another step in AI evolution rather than a dramatic breakthrough. Following the rollout of GPT-4, which significantly advanced ChatGPT's capabilities and influence, the improvements in GPT-5 appear marginal, suggesting that new strategies may be needed to achieve further advances in artificial intelligence.

OpenAI has described GPT-5 as a notable advance over its predecessor, citing improvements in programming, mathematics, writing, healthcare and visual comprehension. The company also claims a reduced incidence of "hallucinations", instances where the AI presents incorrect information as factual. According to OpenAI's internal metrics, GPT-5 matches or exceeds expert-level performance on complex and economically significant tasks across a range of professions.

Notably, however, GPT-5's results on public benchmarks are less competitive when compared with leading models from other companies, such as Anthropic's Claude and Google's Gemini. Although it improves on GPT-4, the gains are subtler than the leap between GPT-3 and GPT-4, and many users have complained that it struggles with straightforward queries, producing a chorus of disappointment on social media.

"Many were expecting a major breakthrough, but it seems more like an upgrade," remarked Mirella Lapata at the University of Edinburgh. "There's a sense of incremental progress."

OpenAI has disclosed limited details regarding the internal benchmarks for GPT-5’s performance, making it challenging to assess them scientifically, according to Anna Rogers from the University of Copenhagen.

In a pre-release press briefing, OpenAI CEO Sam Altman emphasized: "It feels like engaging with an expert on any topic, comparable to a PhD-level specialist." Yet Rogers pointed out that benchmarks do not substantiate such claims, and that the link between advanced degrees and intelligence is questionable. "Highly intelligent individuals do not always hold PhDs, nor does a PhD guarantee superior intelligence," she noted.

The modest advances in GPT-5 may reflect a broader challenge in AI development. The capabilities of large language models (LLMs), once believed to improve inexorably with scale, appear to be plateauing: recent results have not supported the assumption that ever more training data and computational power would yield significant gains. As Lapata noted, "Now that everyone has adopted similar approaches, it's evident that we're following a predictable recipe, utilizing vast amounts of pre-training data and refining it during the post-training phase."

However, whether LLMs are nearing a plateau remains uncertain, as technical design details of models like GPT-5 are not widely known, according to Nikolaos Aletras at the University of Sheffield. "It's premature to claim that large language models have reached their limits without concrete technical insights."

OpenAI is also exploring alternative methods to enhance their offerings, such as the new routing system in GPT-5. Unlike previous versions where users could select from various models, GPT-5 intelligently assesses requests and directs them to the appropriate model based on the required computational power.
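OpenAI has not published how this router actually works. As a minimal sketch of the general idea, a dispatcher might score an incoming request and pick a model tier accordingly; the model names and heuristics below are hypothetical, for illustration only:

```python
# Minimal sketch of a model router; OpenAI has not disclosed its actual design.
# Model names and heuristics here are hypothetical.

def route(prompt: str) -> str:
    """Send simple prompts to a fast, cheap model; hard ones to a reasoning model."""
    hard_markers = ("prove", "step by step", "debug", "derive", "optimize")
    looks_hard = len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers)
    return "reasoning-model" if looks_hard else "fast-model"

print(route("What is the capital of France?"))    # -> fast-model
print(route("Prove that sqrt(2) is irrational"))  # -> reasoning-model
```

A production router would more plausibly use a learned classifier than keyword rules, but the economics are the same: only requests that need it pay the cost of the slower reasoning model.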

This strategy could potentially be more widely adopted, as Lapata mentions, “The reasoning model demands significant computation, which is both time-consuming and costly.” Yet, this shift has frustrated some ChatGPT users, prompting Altman to indicate that efforts are underway to enhance the routing process.

Another OpenAI model has recently achieved remarkable scores in elite mathematics and coding contests, hinting at a promising future for AI. This accomplishment was beyond the capabilities of leading AI models just a year ago. Although details on its functioning remain scarce, OpenAI staff have stated that this success implies the model possesses improved general reasoning skills.

These competitions allow us to evaluate models on data not encountered during training, according to Aletras, but they still represent a narrow aspect of intelligence. Enhanced performance in one domain may detrimentally affect results in others, warns Lapata.

GPT-5 is also notably cheaper than competing models; Claude models, for example, cost roughly ten times as much to process an equivalent volume of requests. That pricing could create financial problems for OpenAI, however, if revenue fails to cover the high cost of building and running new data centers. "Pricing is extraordinary. It's so inexpensive; I'm uncertain how they can sustain it," remarked Lapata.

Competition among leading AI models is intense, and the first company to launch a superior model could secure a substantial market share. "All major companies are vying for dominance, which is a challenging endeavor," noted Lapata. "You've only held the crown for three months."


Source: www.newscientist.com

OpenAI Withholds GPT-5 Energy Consumption Details, Potentially Exceeding Previous Models

In mid-2023, asking OpenAI's ChatGPT for an artichoke pasta recipe, or for guidance on a ritual offering to Moloch, the ancient Canaanite deity, consumed roughly 2 watt-hours per response, about the same energy as running an incandescent bulb for two minutes.

On Thursday, OpenAI unveiled GPT-5, the model that now powers its widely used chatbot, ChatGPT. When asked for the same artichoke recipe, experts estimate, GPT-5 may consume several times more energy to generate the text, possibly up to 20 times as much.

The release of GPT-5 introduced new capabilities, with OpenAI saying the model can answer PhD-level scientific questions and reason its way through complex problems.

Nevertheless, specialists who have assessed energy and resource consumption of AI models over recent years indicate that these newer variants come with a cost. Responses from GPT-5 may require substantially more energy than those from earlier ChatGPT models.

Like many of its rivals, OpenAI has not published official data on its models' power consumption since GPT-3 was announced in 2020. In June, Altman discussed ChatGPT's resource usage on his blog, but the figures he gave, 0.34 watt-hours and 0.000085 gallons of water per query, named no specific model and came with no supporting documentation.

"More complex models like GPT-5 require greater power during both training and inference, leading to a significant increase in energy consumption compared to GPT-4," said Rakesh Kumar, a professor at the University of Illinois who studies computing's energy consumption.

On the day GPT-5 launched, researchers at the University of Rhode Island's AI lab found that the model could consume up to 40 watt-hours of electricity to generate a medium-length response of approximately 1,000 tokens.

A dashboard the group released on Friday indicates that GPT-5's average energy use for a medium-length response exceeds 18 watt-hours, more than any other model they track except OpenAI's o3 reasoning model, launched in April, and R1, developed by the Chinese AI firm DeepSeek.

According to Nidhal Jegham, a researcher in the group, this is "significantly more energy than OpenAI's prior model, GPT-4o."

To put that in perspective, 18 watt-hours is equivalent to running that incandescent bulb for 18 minutes. Recent reports indicate that ChatGPT handles 2.5 billion requests a day; at 18 watt-hours per response, GPT-5's daily energy consumption could match the electricity use of 1.5 million American households.
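Those figures can be sanity-checked with simple arithmetic, assuming a 60-watt bulb and an average US household drawing roughly 30 kWh per day (both assumptions are mine, not the researchers'):

```python
# Back-of-the-envelope check of the figures above.
# Assumptions: 60 W incandescent bulb; ~30 kWh/day average US household.
energy_per_query_wh = 18
bulb_watts = 60
minutes_of_bulb = energy_per_query_wh / bulb_watts * 60
print(f"{minutes_of_bulb:.0f} minutes of bulb time")  # -> 18 minutes

queries_per_day = 2.5e9
total_wh = energy_per_query_wh * queries_per_day      # 45 GWh per day
household_wh_per_day = 30_000
print(f"{total_wh / household_wh_per_day / 1e6:.1f} million households")  # -> 1.5
```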

Despite these figures, experts in the field say they are in line with expectations for GPT-5's energy consumption, given that it is believed to be significantly larger than OpenAI's earlier models. OpenAI has not disclosed the parameter count of any model since GPT-3, which contained 175 billion parameters.

This summer, the French AI company Mistral reported a "strong correlation" between a model's size and its energy use, based on research on its own systems.

"The resources consumed by a model of [GPT-5's] scale are noteworthy," observed Shaolei Ren, a professor at the University of California, Riverside. "We are facing a significant AI resource footprint."

AI Power Usage Benchmark

GPT-4 was widely reported to be about 10 times larger than GPT-3, and Jegham, Kumar and Ren believe GPT-5 is likely larger still.

Major AI companies such as OpenAI maintain that much larger models may be essential to achieving AGI, an AI system capable of performing most tasks a human can. Altman has pressed this view, stating in February: "It seems you can invest any amount and receive continuous, predictable returns," though he has acknowledged that GPT-5 does not surpass human intelligence.


Benchmarks from a study published in July of Mistral's Le Chat chatbot showed a direct correlation between a model's size and its consumption of power and water and its carbon emissions.

Jegham, Kumar and Ren noted that while GPT-5's scale matters most, other factors will also shape its resource consumption. GPT-5 runs on more efficient hardware than previous iterations, and it employs a mixture-of-experts architecture, in which only a subset of the model's parameters is active for any given response, which could help reduce energy use.
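Mixture-of-experts is a standard technique, though GPT-5's internals are unpublished. A toy version of the gating idea looks like this; the dimensions, expert count and random weights are arbitrary, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, k = 8, 4, 2                      # toy sizes, chosen arbitrarily

experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]  # expert weights
gate_w = rng.normal(size=(d, num_experts))                       # gating weights

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ gate_w                          # score each expert for this input
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                         # softmax over experts
    top = np.argsort(probs)[-k:]                 # only the top-k experts run...
    out = sum(probs[i] * (x @ experts[i]) for i in top)
    return out / probs[top].sum()                # ...so most parameters stay idle

y = moe_forward(rng.normal(size=d))
print(y.shape)                                   # -> (8,)
```

The energy argument is visible in the structure: per token, only k of the num_experts weight matrices are touched, so compute scales with the active subset rather than the full parameter count.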

On the other hand, because GPT-5 is a reasoning model that also processes images and video, Ren and Kumar expect it to have a larger energy footprint than purely text-based processing would.

"In reasoning mode, the resources spent to achieve identical outcomes can escalate by five to ten times," remarked Ren.

Hidden Information

To assess the resource consumption of AI models, the University of Rhode Island team multiplied the average time a model takes to answer a query, whether for a pasta recipe or an offering to Moloch, by the model's average power draw during operation.

Estimating a model's power draw took significant effort, said Abdeltawab Hendawi, a professor of data science at the University of Rhode Island. The team struggled to find information about how different models are deployed within data centers, and their final paper includes estimates of the chips used by specific models and of how queries are distributed among different chips in the data centers.
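In outline, the estimate is just energy = average power draw × response time. A minimal sketch with placeholder numbers (these are not the Rhode Island team's measurements; their chip-level figures are in the paper):

```python
# Energy per query = average power draw (W) x response time (s), converted to Wh.
# Both inputs below are placeholders, assumed for illustration.
power_draw_watts = 1500      # assumed average draw of the serving hardware
response_seconds = 43.2      # assumed time to generate a ~1,000-token answer

energy_wh = power_draw_watts * response_seconds / 3600
print(f"{energy_wh:.1f} Wh per query")   # -> 18.0 Wh with these placeholders
```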

The team's findings appear consistent with Altman's June blog post: the figure he gave for ChatGPT's energy consumption per query, 0.34 watt-hours, closely matches what the team found for GPT-4o.

Hendawi, Jegham and other members of the team emphasized the need for greater transparency from AI firms when they release new models.

"Addressing the true environmental costs of AI is more critical now than ever," said Marwan Abdelatti, a professor at the University of Rhode Island. "We urge OpenAI and other developers to commit to full transparency in disclosing the environmental impact of GPT-5."

Source: www.theguardian.com