AI chatbots that have been compromised pose a risk because they can surface dangerous knowledge absorbed from illicit material encountered during their training, researchers have warned.
The warning comes amid a worrying trend of chatbots being “jailbroken” to circumvent their built-in safety controls. Those safeguards are designed to prevent the systems from delivering harmful, biased, or inappropriate responses to users’ questions.
Powerful chatbots built on large language models (LLMs), such as ChatGPT, Gemini, and Claude, are trained on vast amounts of content from the internet.
Even with attempts to filter out harmful content from their training datasets, LLMs can still learn about illegal activities—including hacking, money laundering, insider trading, and bomb-making. Security protocols are intended to prevent the use of such information in their answers.
In a report on the risks, the researchers found that it is surprisingly easy to trick many AI-powered chatbots into producing harmful and illegal content, emphasizing that the threat is “immediate, concrete, and alarming.”
The authors caution that “what was once limited to state actors and organized crime may now be accessible to anyone with a laptop or smartphone.”
The study, conducted by Professor Lior Rokach and Dr. Michael Fire of Ben Gurion University of the Negev in Israel, highlights an escalating threat from “dark LLMs”: models built without safety measures or altered through jailbreaks. Some are openly promoted as having “no ethical guardrails” and as willing to assist with illegal activities such as cybercrime and fraud.
Jailbreaking uses specially crafted prompts to manipulate chatbots into giving responses that are normally prohibited. It works by exploiting the tension between the chatbot’s primary goal of following the user’s instructions and its secondary goal of avoiding harmful, biased, unethical, or illegal outputs. The prompts typically construct scenarios in which the program prioritizes helpfulness over its safety constraints.
To illustrate the issue, researchers created a universal jailbreak that breached several prominent chatbots, enabling them to answer questions that should normally be denied. Once compromised, LLMs consistently produced responses to nearly all inquiries, according to the report.
“It was astonishing to see the extent of knowledge this system holds,” Fire noted, citing examples that included hacking computer networks and step-by-step guides for drug manufacturing and other criminal activities.
“What makes this threat distinct from previous technical challenges is an unparalleled combination of accessibility, scalability, and adaptability,” Rokach added.
The researchers reached out to leading LLM providers to inform them of the universal jailbreak, but reported that the response was “overwhelmingly inadequate.” Some companies did not reply, while others claimed that the jailbreak threat lay outside the parameters of their bounty programs, which encourage ethical hackers to report software vulnerabilities.
The report argues that technology companies should screen training data more rigorously, implement robust firewalls to block dangerous queries and responses, and develop “machine unlearning” techniques so chatbots can “forget” any illicit information they absorb. Dark LLMs should be regarded as a “serious security threat,” comparable to unlicensed weapons and explosives, with providers held accountable.
Dr. Ihsen Alouani, an AI security expert at Queen’s University Belfast, highlighted that jailbreak attacks on LLMs could lead to significant risks, ranging from detailed weapon-building instructions to sophisticated disinformation campaigns, social engineering, and automated fraud.
“A crucial part of the solution is for companies to not only rely on front-end safeguards but to also invest meaningfully in red teaming and enhancing model-level robustness. Clear standards and independent oversight are essential to adapt to the evolving threat landscape,” he stated.
Professor Peter Garraghan, an AI security authority at Lancaster University, emphasized, “Organizations need to treat LLMs as they would any other vital software component.”
“While jailbreaking is a concern, understanding the entire AI stack is vital for genuine accountability. The real security requirements involve responsible design and deployment, not merely responsible disclosure,” he added.
OpenAI, the developer behind ChatGPT, said its latest o1 model can reason about its safety policies, which improves its resistance to jailbreak attempts, and affirmed its ongoing research into making its models more robust.
Meta, Google, Microsoft, and Anthropic were also approached for comment. Microsoft responded with a link to a blog post detailing its work to safeguard against jailbreaks.
Source: www.theguardian.com