Scientists say large language models and other AI systems are already capable of deceiving humans

In a new review paper published in the journal Patterns, researchers claim that a range of current AI systems have learned how to deceive humans. They define deception as the systematic inducement of false beliefs in the pursuit of some outcome other than the truth.

Through their training, large language models and other AI systems have already learned to deceive using techniques such as manipulation, sycophancy, and cheating on safety tests.

“AI developers do not have a confident understanding of the causes of undesirable behavior, such as deception, in AI,” said Peter Park, a researcher at the Massachusetts Institute of Technology.

“Generally speaking, however, AI deception is thought to arise because deception-based strategies turn out to be the best way to make the AI perform well at a given AI training task. Deception helps them achieve their goals.”

Dr. Park and colleagues analyzed the literature, focusing on ways in which AI systems spread false information through learned deception, in which they systematically learn to manipulate others.

The most notable example of AI deception the researchers uncovered in their analysis was Meta's CICERO, an AI system designed to play the game Diplomacy, an alliance-building, world-conquering game.

Meta claims that it trained CICERO to be “generally honest and kind” and to “not intentionally betray” its human allies during gameplay, but the data released by the company show that CICERO did not play that way.

“We found that Meta’s AI had learned to be a master of deception,” Dr. Park said.

“Meta succeeded in training its AI to win at the game of Diplomacy, with CICERO ranking in the top 10% of human players who had played more than one game, but it failed to train the AI to win honestly.”

Other AI systems have demonstrated the ability to bluff professional human players in a game of Texas Hold’em poker, to fake attacks to beat an opponent in the strategy game StarCraft II, and to misrepresent their preferences to gain an advantage in economic negotiations.

“Although it may seem harmless when an AI system cheats in a game, it could lead to a ‘breakthrough in deceptive AI capabilities’ and to more advanced forms of AI deception in the future.”

Scientists have found that some AI systems have even learned to cheat on tests designed to assess safety.

In one study, an AI creature in a digital simulator “played dead” to fool a test built to weed out rapidly replicating AI systems.

“By systematically cheating on safety tests imposed by human developers and regulators, deceptive AI can lull us humans into a false sense of security,” Dr. Park said.

The main short-term risks of deceptive AI include making it easier for hostile actors to commit fraud or tamper with elections.

Eventually, if these systems are able to refine this anxiety-inducing skill set, humans may lose control of them.

“We as a society need as much time as possible to prepare for more sophisticated deception in future AI products and open source models,” Dr. Park said.

“As AI systems become more sophisticated in their ability to deceive, the risks they pose to society will become increasingly serious.”

_____

Peter S. Park et al. 2024. AI deception: A survey of examples, risks, and potential solutions. Patterns 5 (5): 100988; doi: 10.1016/j.patter.2024.100988

Source: www.sci.news
