In a new review paper published in the journal Patterns, researchers argue that various current AI systems are learning how to deceive humans. They define deception as the systematic induction of false beliefs in the pursuit of some outcome other than the truth.
“AI developers do not have a confident understanding of the causes of undesirable behavior, such as deception, in AI,” said Peter Park, a researcher at the Massachusetts Institute of Technology.
“Generally speaking, however, AI deception is thought to arise because deception-based strategies turn out to be the best way to make the AI perform well at a given AI training task. Deception helps them achieve their goals.”
Dr. Park and colleagues analyzed the literature, focusing on the ways AI systems spread false information through learned deception, in which they systematically learn to manipulate others.
The most striking example of AI deception the researchers uncovered in their analysis was Meta's CICERO, an AI system developed to play Diplomacy, an alliance-building and world-conquest game.
Meta claims that CICERO is “generally honest and kind” and was trained to “not intentionally betray” its human allies during gameplay, but the data the company released shows that CICERO did not live up to those claims.
“We found that Meta’s AI had learned to be a master of deception,” Dr. Park said.
“While Meta succeeded in training its AI to win at Diplomacy (CICERO placed in the top 10% of human players who had played more than one game), it failed to train its AI to win honestly.”
“Other AI systems have demonstrated the ability to bluff professional human players at Texas Hold’em poker, to fake attacks in order to defeat opponents in the strategy game StarCraft II, and to misrepresent their preferences to gain the upper hand in economic negotiations.”
“Although it may seem harmless when an AI system cheats at a game, it could lead to a ‘breakthrough in deceptive AI capabilities’ that spirals into more advanced forms of AI deception in the future.”
Scientists have found that some AI systems have even learned to cheat on tests designed to assess safety.
In one study, AI organisms in a digital simulator “played dead” to trick a test built to weed out AI systems that replicate rapidly.
“By systematically cheating on safety tests imposed by human developers and regulators, deceptive AI can lull us humans into a false sense of security,” Dr. Park said.
The main short-term risks of deceptive AI include making it easier for hostile actors to commit fraud or tamper with elections.
Eventually, if these systems can refine this unsettling skill set, humans could lose control of them.
“We as a society need as much time as possible to prepare for the more advanced deception of future AI products and open-source models,” Dr. Park said.
“As AI systems become more sophisticated in their ability to deceive, the risks they pose to society will become increasingly serious.”
_____
Peter S. Park et al. 2024. AI deception: A survey of examples, risks, and potential solutions. Patterns 5(5): 100988; doi: 10.1016/j.patter.2024.100988
Source: www.sci.news