GPT-5 is the latest version of OpenAI’s flagship language model
Cheng Xin/Getty Images
OpenAI has recently unveiled GPT-5, their latest AI model, marking another step in AI evolution rather than a dramatic breakthrough. Following the successful rollout of GPT-4, which significantly advanced ChatGPT’s capabilities and influence, the improvements found in GPT-5 seem marginal, indicating that innovative strategies may be needed to achieve further advancements in artificial intelligence.
OpenAI has described GPT-5 as a notable advancement over its predecessor, boasting enhancements in areas such as programming, mathematics, writing, healthcare, and visual comprehension. The company claims a reduction in the incidence of “hallucinations,” instances where AI presents incorrect information as fact. OpenAI also says that, according to its internal metrics, GPT-5 excels at complex and economically significant tasks across various professions, matching or exceeding expert-level performance.
Notably, however, GPT-5’s results on public benchmarks are less competitive when compared with leading models from other companies, such as Anthropic’s Claude and Google’s Gemini. Although it has improved from GPT-4, the enhancements are subtler than the leap observed between GPT-3 and GPT-4. Numerous users have expressed dissatisfaction with GPT-5’s performance, citing instances where it struggled with straightforward queries, leading to a chorus of disappointment on social media.
“Many were expecting a major breakthrough, but it seems more like an upgrade,” remarked Mirella Lapata from the University of Edinburgh. “There’s a sense of incremental progress.”
OpenAI has disclosed limited details regarding the internal benchmarks for GPT-5’s performance, making it challenging to assess them scientifically, according to Anna Rogers from the University of Copenhagen.
In a pre-release press briefing, OpenAI CEO Sam Altman said, “It feels like engaging with an expert on any topic, comparable to a PhD-level specialist.” Yet Rogers pointed out that the benchmarks do not substantiate such claims, and that the correlation between advanced degrees and intelligence is questionable. “Highly intelligent individuals do not always hold PhDs, nor does a PhD guarantee superior intelligence,” she noted.
The modest advancements in GPT-5 may reflect broader challenges within the AI development community. Progress in large language models (LLMs) was once believed to be inexorable, but their capabilities now seem to be plateauing: recent results have not supported the earlier assumption that more training data and computational power would keep delivering significant improvements. As Lapata noted, “Now that everyone has adopted similar approaches, it’s evident that we’re following a predictable recipe, utilizing vast amounts of pre-training data and refining it during the post-training phase.”
However, whether LLMs are nearing a plateau remains uncertain, as technical design specifics about models like GPT-5 are not widely known, according to Nikos Aletras from the University of Sheffield. “It’s premature to claim that large-scale language models have reached their limits without concrete technical insights.”
OpenAI is also exploring alternative methods to enhance its offerings, such as the new routing system in GPT-5. Unlike previous versions, where users could select from various models, GPT-5 assesses each request and directs it to an appropriate model based on how much computational power it requires.
This strategy could potentially be more widely adopted, as Lapata mentions, “The reasoning model demands significant computation, which is both time-consuming and costly.” Yet, this shift has frustrated some ChatGPT users, prompting Altman to indicate that efforts are underway to enhance the routing process.
Another OpenAI model has recently achieved remarkable scores in elite mathematics and coding contests, hinting at a promising future for AI. This accomplishment was beyond the capabilities of leading AI models just a year ago. Although details on its functioning remain scarce, OpenAI staff have stated that this success implies the model possesses improved general reasoning skills.
These competitions allow us to evaluate models on data not encountered during training, according to Aletras, but they still represent a narrow aspect of intelligence. Enhanced performance in one domain may detrimentally affect results in others, warns Lapata.
GPT-5 is notably better on price: it is significantly cheaper than competing models. Claude models, for example, are approximately ten times more expensive for an equal volume of requests. However, this could lead to financial issues for OpenAI if revenue is insufficient to sustain the high costs of developing and operating new data centers. “Pricing is extraordinary. It’s so inexpensive; I’m uncertain how they can sustain it,” remarked Lapata.
Competition among leading AI models is intense. The first company to launch a superior model could secure a substantial market share. “All major companies are vying for dominance, which is a challenging endeavor,” noted Lapata. “You only hold the crown for three months.”
Source: www.newscientist.com












