Lab Discovers Simple Method to Evade AI Safety Features in Multi-shot Jailbreak

A study shows that some of the most powerful AI tools meant to prevent cybercrime and terrorism can be bypassed simply by inundating them with fraudulent activities.

Researchers at Anthropic, the AI lab responsible for creating the large-scale language model (LLM) powering ChatGPT competitor Claude, detailed an attack called a “multi-shot jailbreak” in a recent paper. This attack was both simple and effective.

Claude, like many other commercial AI systems, contains safety features that block certain types of requests, such as generating violent content, hate speech, illegal instructions, deception, or discrimination. However, by providing enough examples of the “correct” responses to harmful questions like “How to create a bomb,” the system can be tricked into providing harmful responses despite being trained not to do so.

Anthropic stated, “By inputting large amounts of text in specific ways, this approach can lead the LLM to produce potentially harmful outputs even though it was trained to avoid doing so.” The company has shared its findings with industry peers and aims to address the issue promptly.

This jailbreak attack targets AI models with a large “context window” capable of processing lengthy queries. These advanced models are susceptible to such attacks as they can learn to circumvent their own safety measures faster.

Newer, more advanced AI systems are at greater risk of such attacks due to their ability to handle longer inputs and learn from examples quickly. Anthropic expressed concern over the effectiveness of this jailbreak attack on larger models.

Skip past newsletter promotions

Anthropic has identified various strategies to mitigate this issue. One approach involves adding a mandatory warning to remind the system not to provide harmful responses, which has shown promise in reducing the likelihood of a successful jailbreak. However, this method may impact the system’s performance on other tasks.

Source: www.theguardian.com

Discover more from Mondo News

Subscribe to get the latest posts sent to your email.

What's Hot

Physicists at Catlin determine the maximum weight of neutrinos

Dyson reveals the mystical properties of mushrooms in combating frizzy hair | Dyson Ltd

The Top 9 Most Unusual Robots Currently in Existence

Study Reveals Over Half of the Top 100 Mental Health Resources Spread Misinformation

Utah Lawyers Approved After Using ChatGPT in Court: An Overview

Priority Warns: Farage Could Frighten the City and Empower Truss 2 – He Might Be Correct | Cryptocurrency

Back Among Your Own: Nintendo Switch 2 Launch Revives Midnight Releases

Google Closes Due to Misunderstanding of German Autobahn

Working Groups Warning of Fever Deaths Establish Signs About Doge in National Parks

White House Appoints New NASA Administrator, Overturning Trump’s Selection

Radio Waves and X-ray Emitting Stars: A New Perspective from Our Galaxy

Fossils from 73 Million Years Ago Reveal the Earliest Evidence of Bird Nesting in Polar Regions

The Nearby Star System Might Be Guiding Our Journey

Sui and Atoma introduce AI capabilities to dApp developers – Blockchain Updates, Views, Videos, Opportunities

Bitcoin ETF issuer acquires 5% of BTC supply, $100 million invested in ETFSwap (ETFS) presale – Blockchain updates, insights, and career opportunities

Agora boosts Sui’s native stablecoin with addition of AUSD stablecoin to network

Meme Coin Memeinator Goes Viral, Raises $7.7 Million and Debuts on Exchanges- Latest in Blockchain News, Opinion, TV, and Job Listings

Changing the game of betting with Blockchain: New News, Opinions, TV, and Job Opportunities

Lab Discovers Simple Method to Evade AI Safety Features in Multi-shot Jailbreak

Study Reveals Over Half of the Top 100 Mental Health Resources Spread Misinformation

Utah Lawyers Approved After Using ChatGPT in Court: An Overview

Priority Warns: Farage Could Frighten the City and Empower Truss 2 – He Might Be Correct | Cryptocurrency

Back Among Your Own: Nintendo Switch 2 Launch Revives Midnight Releases

Google Closes Due to Misunderstanding of German Autobahn

Sure! How about: “I Need Help: Overwhelmed by Candy Crush!”

Nigel Farage Proposes Bitcoin Donations for UK Reform

New AI Tools Predict Which Men Will Respond to Prostate Cancer Treatments

Leave a ReplyCancel reply

Alphabet Embraces Rare AI Opportunity as Revenue Rises | Tech

Are you consuming the wrong amount of protein for your age? Here’s why.

DECam captures close-up of the Antria galaxy cluster

Franks secures more capital to enhance automation of wealth services in Europe

Newly Discovered Light Properties Unveiled by Centuries-Old Theorem

Snap collaborates with edtech firm Inspirit to introduce augmented reality technology in 50 American schools

What's Hot

Lab Discovers Simple Method to Evade AI Safety Features in Multi-shot Jailbreak

Related

Discover more from Mondo News

Related Posts

Leave a ReplyCancel reply