Mondo News
Exploring the Limitations of AI Safety Management Practices
Technology May 14, 2026


As organizations like Anthropic, Google, and OpenAI develop cutting-edge artificial intelligence systems, they are increasingly focused on implementing safeguards to prevent misuse—such as spreading disinformation, creating weapons, or hacking networks.

However, recent findings by Italian researchers reveal that these protective measures can sometimes be bypassed through poetic prompts.

By using poetic language, the researchers successfully tricked 31 AI systems into ignoring internal safety protocols. For example, starting prompts with metaphors like “The iron seed sleeps best in the unsuspecting womb of the earth away from the sun’s reproachful gaze” demonstrated how these systems can be manipulated to execute dangerous tasks.

This highlights a concerning trend: for many AI systems, guardrails intended to prevent risky behavior are merely suggestions, rather than effective barriers. Researchers are increasingly alarmed as AI systems become adept at exploiting vulnerabilities and engaging in risky operations.

Recently, Anthropic announced that it would limit the release of its latest AI system, Claude Mythos, to select organizations because of how quickly the system can detect vulnerabilities in software. OpenAI took a similar approach, choosing to share its technology with a limited group of trusted partners.

Since the AI boom initiated by OpenAI in late 2022, studies have confirmed the ability of users to bypass safety measures in AI systems. Closing one loophole often leads to the emergence of another.

“Everyone in the field acknowledges that establishing effective guardrails is challenging and will continue to be so for the foreseeable future,” stated Matt Fredrikson, a computer science professor at Carnegie Mellon University and CEO of Gray Swan AI, which specializes in securing AI technologies. “Determined individuals can evade these systems with relative ease.”

The repercussions of bypassing guardrails are significant. In an online environment already saturated with misinformation, AI systems are being used to spread conspiracy theories and false claims. Anthropic has also reported that its technology played a role in an international cyberattack, and biosecurity experts warn that such systems could teach people how to unleash fatal pathogens.

The poetic bypass is just one of many methods hackers use to circumvent protections in systems like Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT. Major AI firms share similar foundational techniques for implementing guardrails, yet these measures are surprisingly easy to overcome.

“Poetry is merely one way to reframe a prompt and breach guardrails,” explained Piercosma Bisconti, co-founder of AI firm Dexai and one of the study’s researchers.

The act of circumventing AI guardrails is commonly referred to as “jailbreaking.” This often entails submitting specific English sentences that prompt actions the AI has been programmed to avoid.

Jailbreaking techniques go by a variety of creative names, including stealth prompt injection, role-playing, token smuggling, polyglot Trojans, and greedy coordinate gradient attacks. Notable attack names include Crescendo, Deceptive Delight, and Echo Chamber.

Weak defenses in AI systems have already led to the spread of fabricated interviews, false wartime evidence, and synthetic rumor-mongering. Research conducted three years ago by international counterterrorism experts revealed far-right extremists using social media to circumvent moderators with “terrible but legal” AI content.

Experts are concerned that models could be jailbroken to mislead social media users with seemingly authentic content, overwhelm fact-checkers with misinformation, and tailor false narratives for specific audiences.

Some of these methods are widely disseminated online, while others remain undisclosed. Many discoverers of new jailbreaks keep them secret to exploit these loopholes before AI companies close them.

AI systems like Claude and GPT learn patterns from vast datasets, including Wikipedia, news articles, and curated texts from the internet. However, before releasing these systems to the public, companies like Anthropic and OpenAI explore potential exploits.

In their unfiltered states, these systems can potentially instruct users on purchasing illegal firearms online or creating hazardous substances using household items. Consequently, companies train their systems to refuse certain requests through a method known as reinforcement learning.

This often involves showing the system thousands of examples of prohibited requests. From these examples, the system learns to identify other dangerous requests. However, this method only partially succeeds.

In some situations, AI companies might opt not to address vulnerabilities, believing that while weak guardrails could facilitate malicious activities, they also enable benign actions to counter them.

Recently, researchers at cybersecurity firm LayerX found that Claude’s guardrails could be bypassed by simply entering a few straightforward sentences into the AI system.

When told it was “penetrating” a computer network for testing purposes, Claude could be directed to launch attacks on the network. This technique could potentially enable malicious hackers to extract sensitive information from businesses, governments, and individuals.

While closing this loophole may protect Claude’s networks, it could simultaneously hinder companies from safeguarding their own systems. LayerX informed Anthropic of this vulnerability weeks ago, yet it remains an open issue.

LayerX CEO Or Eshed warned that this strategy might backfire. “Eventually, we will witness a surge of attacks utilizing these AI models, compelling us to rethink our security protocols,” he predicted.

Last year, researchers from Cisco and the University of Pennsylvania used malicious prompts to push AI models into producing harmful output. They jailbroke chatbots from Meta and the Chinese AI company DeepSeek 100% of the time, and more than 80% of their attacks against Google and OpenAI models succeeded.

(The New York Times has filed a lawsuit against OpenAI and Microsoft, claiming copyright infringement related to their AI systems. Both companies deny these allegations.)

If guardrails are compromised, automated large-scale influence campaigns could become feasible, as researchers at the University of Technology Sydney demonstrated. By disguising their requests as “simulations,” they convinced a commercial language model to create a disinformation campaign against Australian political parties, complete with visuals, hashtags, and tailored posts for specific platforms.

In addition to establishing guardrails, these companies also employ other tools to monitor system activity, identify suspicious behaviors, and ban accounts infringing on their terms of service.

“Claude is built with robust, multi-layered protections designed to work in unison, including model training and layered guardrails,” stated Anthropic spokesperson Paruul Maheshwary. “Bypassing one layer doesn’t circumvent the others.”

In a concerning revelation, Anthropic found that a group of state-sponsored hackers from China was employing Claude to breach the computer systems of approximately 30 companies and government agencies worldwide.

Despite the robust security technologies, experts caution that flaws remain, as companies struggle to monitor extensive global activity while also ensuring legitimate users are not excluded.

When restricted by the security measures of services like Claude and GPT, users may turn to open-source AI systems. These platforms allow for their underlying software to be freely replicated, modified, and shared.

Such systems can be altered to eliminate guardrails. A new tool called Heretic lets users strip out a system’s guardrails with minimal effort, essentially undoing months of guardrail training through sophisticated algorithms.

“A year ago, this process was highly complex,” noted Noam Schwartz, CEO of AI security firm Alice. “Today, it can be controlled effortlessly via a mobile device.”

Source: www.nytimes.com
