Mondo News
Exploring the Limitations of AI Safety Management Practices
Technology May 14, 2026


As organizations like Anthropic, Google, and OpenAI develop cutting-edge artificial intelligence systems, they are increasingly focused on implementing safeguards to prevent misuse, such as spreading disinformation, creating weapons, or hacking networks.

However, recent findings by Italian researchers reveal that these protective measures can sometimes be bypassed through poetic prompts.

By using poetic language, the researchers tricked 31 AI systems into ignoring their internal safety protocols. For example, they showed that prompts opening with a metaphor such as “The iron seed sleeps best in the unsuspecting womb of the earth away from the sun’s reproachful gaze” could manipulate these systems into carrying out dangerous tasks.

This highlights a concerning trend: for many AI systems, guardrails intended to prevent risky behavior are merely suggestions, rather than effective barriers. Researchers are increasingly alarmed as AI systems become adept at exploiting vulnerabilities and engaging in risky operations.

Recently, Anthropic announced that it would restrict the release of its latest AI system, Claude Mythos, to select organizations because of how quickly the system can detect vulnerabilities in software. OpenAI took a similar approach, choosing to share its technology with a limited group of trusted partners.

Since OpenAI set off the AI boom in late 2022, studies have confirmed that users can bypass the safety measures in AI systems. Closing one loophole often opens another.

“Everyone in the field acknowledges that establishing effective guardrails is challenging and will continue to be so for the foreseeable future,” said Matt Fredrikson, a computer science professor at Carnegie Mellon University and CEO of Gray Swan AI, which specializes in securing AI technologies. “Determined individuals can evade these systems with relative ease.”

The repercussions of bypassing guardrails are significant. In an online environment already saturated with misinformation, AI systems are being used to spread conspiracy theories and false claims. Anthropic has also reported that its technology played a role in an international cyberattack, and AI systems have shown biosecurity experts how to unleash fatal pathogens.

The poetic bypass is just one of many methods hackers use to circumvent protections in systems like Anthropic’s Claude, Google’s Gemini, and OpenAI’s GPT. Major AI firms share similar foundational techniques for implementing guardrails, yet these measures are surprisingly easy to overcome.

“Poetry is merely one way to reframe a prompt and breach guardrails,” explained Piercosma Bisconti, co-founder of AI firm Dexai and one of the researchers behind the study.

The act of circumventing AI guardrails is commonly referred to as “jailbreaking.” This often entails submitting specific English sentences that prompt actions the AI has been programmed to avoid.

Jailbreaking techniques go by a variety of creative names, including stealth prompt injection, role-playing, token smuggling, polyglot Trojans, and greedy coordinate gradient attacks. Notable named attacks include Crescendo, Deceptive Delight, and Echo Chamber.

Weak defenses in AI systems have already led to the spread of fabricated interviews, false wartime evidence, and synthetic rumor-mongering. Research conducted three years ago by international counterterrorism experts revealed that far-right extremists were circumventing social media moderators with “terrible but legal” AI content.

Experts are concerned that models could be jailbroken to mislead social media users with seemingly authentic content, overwhelm fact-checkers with misinformation, and tailor false narratives for specific audiences.

Some of these methods are widely disseminated online, while others remain undisclosed. Many discoverers of new jailbreaks keep them secret to exploit these loopholes before AI companies close them.

AI systems like Claude and GPT learn patterns from vast datasets, including Wikipedia, news articles, and curated texts from the internet. Before releasing these systems to the public, companies like Anthropic and OpenAI probe them for potential exploits.

In their unfiltered states, these systems can potentially instruct users on purchasing illegal firearms online or creating hazardous substances using household items. Consequently, companies train their systems to refuse certain requests through a method known as reinforcement learning.

This often involves showing the system thousands of prohibited requests. By analyzing them, the system can learn to identify other dangerous requests. However, this method only partially succeeds.

In some situations, AI companies might opt not to address vulnerabilities, believing that while weak guardrails could facilitate malicious activities, they also enable the legitimate work that counters those activities.

Recently, researchers at cybersecurity firm LayerX found that Claude’s guardrails could be bypassed by simply entering a few straightforward sentences into the AI system.

When the researchers told Claude they were “penetrating” a computer network for testing purposes, the system could be directed to launch attacks on that network. This technique could potentially enable malicious hackers to extract sensitive information from businesses, governments, and individuals.

While closing this loophole might block such attacks, it could also hinder companies from using Claude to safeguard their own systems. LayerX informed Anthropic of this vulnerability weeks ago, yet it remains an open issue.

LayerX CEO Or Eshed warned that this strategy might backfire. “Eventually, we will witness a surge of attacks utilizing these AI models, compelling us to rethink our security protocols,” he predicted.

Last year, researchers from Cisco and the University of Pennsylvania reported striking results when they attacked AI models with malicious prompts designed to produce harmful output. Their attacks jailbroke chatbots from Meta and the Chinese AI company DeepSeek 100% of the time, and more than 80% of their attacks against Google and OpenAI models succeeded.

(The New York Times has filed a lawsuit against OpenAI and Microsoft, claiming copyright infringement related to their AI systems. Both companies deny the allegations.)

If guardrails are compromised, automated large-scale influence campaigns could become feasible, as researchers at the University of Technology Sydney demonstrated. By disguising their requests as “simulations,” they convinced a commercial language model to create a disinformation campaign against Australian political parties, complete with visuals, hashtags, and tailored posts for specific platforms.

In addition to establishing guardrails, these companies employ other tools to monitor system activity, identify suspicious behavior, and ban accounts that violate their terms of service.

“Claude is built with robust, multi-layered protections designed to work in unison, including model training and layered guardrails,” stated Anthropic spokesperson Palul Maheshwary. “Bypassing one layer doesn’t circumvent the others.”

In a concerning revelation, Anthropic found that a group of state-sponsored hackers from China was employing Claude to breach the computer systems of approximately 30 companies and government agencies worldwide.

Despite the robust security technologies, experts caution that flaws remain, as companies struggle to monitor extensive global activity while also ensuring legitimate users are not excluded.

When restricted by the security measures of services like Claude and GPT, users may turn to open-source AI systems, whose underlying software can be freely copied, modified, and shared.

Such systems can be altered to eliminate guardrails. A novel approach called Heretic enables users to remove a system’s guardrails with minimal effort, essentially using sophisticated algorithms to undo months of guardrail training.

“A year ago, this process was highly complex,” noted Noam Schwartz, CEO of AI security firm Alice. “Today, it can be controlled effortlessly via a mobile device.”

Source: www.nytimes.com
