ChatGPT and other large language models (LLMs) consist of billions of parameters, are pre-trained on web-scale corpora, and are claimed to acquire certain capabilities without being specifically trained for them. These capabilities, known as emergent abilities, have fueled debates about the promise and peril of language models. In a new paper, University of Bath researcher Dr. Harish Tayyar Madabushi and his colleagues present a theory that explains emergent abilities while accounting for potential confounding factors, and they rigorously validate it through over 1,000 experiments. Their findings suggest that so-called emergent abilities are not in fact emergent, but rather result from a combination of in-context learning, model memory, and linguistic knowledge.
“The common perception that this type of AI is a threat to humanity is both preventing the widespread adoption and development of this technology and distracting from the real problems that need our attention,” said Dr. Tayyar Madabushi.
Dr. Tayyar Madabushi and his colleagues carried out experiments to test LLMs' ability to complete tasks that the models had not encountered before – so-called emergent abilities.
As an example, LLMs can answer questions about social situations without being explicitly trained or programmed to do so.
While previous research has suggested that this is a product of the models 'knowing' about social situations, the researchers show that it is in fact the result of a well-known ability of LLMs to complete a task based on a few examples presented to them – so-called 'in-context learning' (ICL).
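To make the mechanism concrete, here is a minimal sketch of what an in-context-learning prompt looks like. The classification task, the labels, and the examples below are invented for illustration and are not taken from the paper's experiments; the point is simply that the worked examples sit inside the prompt itself, and the model completes the pattern rather than drawing on task-specific training.

```python
# Illustrative sketch of an in-context-learning (few-shot) prompt.
# The task ("polite or impolite") and the examples are invented for this
# illustration; they are not taken from the paper's experiments.

few_shot_examples = [
    ("Could you please pass the salt?", "polite"),
    ("Give me the salt. Now.", "impolite"),
    ("Would you mind closing the window?", "polite"),
]

new_input = "Hand over the report by five, no excuses."

# The worked examples are written directly into the prompt; the model is
# expected to continue the pattern for the final, unlabeled input.
prompt_lines = ["Label each remark as polite or impolite.", ""]
for remark, label in few_shot_examples:
    prompt_lines.append(f"Remark: {remark}")
    prompt_lines.append(f"Label: {label}")
    prompt_lines.append("")
prompt_lines.append(f"Remark: {new_input}")
prompt_lines.append("Label:")

prompt = "\n".join(prompt_lines)
print(prompt)  # This string would be sent to an LLM, which completes the label.
```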
Across thousands of experiments, the researchers demonstrated that a combination of LLMs' ability to follow instructions, memory, and language abilities explains both the capabilities and limitations they exhibit.
“There is a concern that as models get larger and larger, they will be able to solve new problems that we currently cannot predict, and as a result these large models may gain dangerous capabilities such as reasoning and planning,” Dr. Tayyar Madabushi said.
“This has generated a lot of debate – for example we were asked to comment at last year's AI Safety Summit at Bletchley Park – but our research shows that fears that the models will go off and do something totally unexpected, innovative and potentially dangerous are unfounded.”
“Concerns about the existential threat posed by LLMs are not limited to non-specialists; they have been expressed by some of the leading AI researchers around the world.”
However, Dr. Tayyar Madabushi and his co-authors argue that this concern is unfounded, as their tests show that LLMs lack complex reasoning skills.
“While it is important to address existing potential misuse of AI, such as the creation of fake news and the increased risk of fraud, it would be premature to enact regulations based on perceived existential threats,” Dr. Tayyar Madabushi said.
“The point is, it is likely a mistake for end users to rely on LLMs to interpret and perform complex tasks that require complex reasoning without explicit instructions.”
“Instead, users are likely to benefit from explicitly specifying what they want the model to do and, where possible, from providing examples for all but the simplest tasks.”
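As a rough illustration of that advice, the sketch below contrasts a vague request with one that states the task explicitly and includes a worked example. The customer-service task, the prompts, and the model name are assumptions made for this illustration, and the OpenAI Python client is used only as one example of a chat-style API; the paper does not prescribe any particular tool.

```python
# Sketch of the prompting advice above: state the task explicitly and
# include an example. The customer-service task and the model name are
# assumptions made for illustration; any chat-style LLM API could be used.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Vague request: relies on the model to infer what "deal with" means.
vague = [
    {"role": "user",
     "content": "Deal with this customer email: 'My order never arrived.'"},
]

# Explicit request: an instruction plus one worked example (in-context learning).
explicit = [
    {"role": "system",
     "content": "You write short, apologetic customer-service replies that "
                "acknowledge the problem and state the next step."},
    {"role": "user", "content": "Email: 'I was charged twice.'"},
    {"role": "assistant",
     "content": "We're sorry about the duplicate charge. We've flagged it and "
                "will refund the extra payment within 3 business days."},
    {"role": "user", "content": "Email: 'My order never arrived.'"},
]

# Send both versions and compare the replies.
for label, messages in [("vague", vague), ("explicit", explicit)]:
    reply = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    print(f"--- {label} prompt ---")
    print(reply.choices[0].message.content)
```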
“Our findings do not mean that AI is not a threat at all,” said Professor Iryna Gurevych of the Technical University of Darmstadt.
“Rather, the purported emergence of complex thinking skills associated with specific threats is not supported by the evidence, and we show that the learning process of LLMs can ultimately be controlled quite well.”
“Future research should therefore focus on other risks posed by these models, such as the possibility that they could be used to generate fake news.”
_____
Sheng Lu et al. 2024. Are Emergent Abilities in Large Language Models just In-Context Learning? arXiv: 2309.01809
Source: www.sci.news