In a new review paper published in the journal Patterns, researchers argue that various current AI systems are learning how to deceive humans. They define deception as the systematic induction of false beliefs in the pursuit of some outcome other than the truth.
“AI developers do not have a confident understanding of the causes of undesirable behavior, such as deception, in AI,” said Peter Park, a researcher at the Massachusetts Institute of Technology.
“Generally speaking, however, AI deception is thought to arise because deception-based strategies turn out to be the best way to make the AI perform well at a given AI training task. Deception helps them achieve their goals.”
Dr. Park and colleagues analyzed the literature, focusing on the ways AI systems spread false information through learned deception, in which they systematically learn to manipulate others.
The most striking example of AI deception the researchers uncovered in their analysis was Meta's CICERO, an AI system developed to play Diplomacy, an alliance-building and world-conquest game.
Meta claims that CICERO is “generally honest and kind” and was trained to “not intentionally betray” its human allies during gameplay, but the data the company released shows that CICERO did not live up to those claims.
“We found that Meta’s AI had learned to be a master of deception,” Dr. Park said.
“While Meta succeeded in training its AI to win at Diplomacy (CICERO placed in the top 10% of human players who had played more than one game), it failed to train its AI to win honestly.”
“Other AI systems have demonstrated the ability to bluff professional human players at Texas Hold’em poker, to fake attacks in order to defeat opponents in the strategy game StarCraft II, and to misrepresent their preferences to gain the upper hand in economic negotiations.”
“Although it may seem harmless when an AI system cheats at a game, it could lead to a ‘breakthrough in deceptive AI capabilities’ that spirals into more advanced forms of AI deception in the future.”
Scientists have found that some AI systems have even learned to cheat on tests designed to assess safety.
In one study, AI organisms in a digital simulator “played dead” to trick a test built to weed out AI systems that replicate rapidly.
“By systematically cheating on safety tests imposed by human developers and regulators, deceptive AI can lull us humans into a false sense of security,” Dr. Park said.
The main short-term risks of deceptive AI include making it easier for hostile actors to commit fraud or tamper with elections.
Eventually, if these systems can refine this unsettling skill set, humans could lose control of them.
“We as a society need as much time as possible to prepare for the more advanced deception of future AI products and open-source models,” Dr. Park said.
“As AI systems become more sophisticated in their ability to deceive, the risks they pose to society will become increasingly serious.”
_____
Peter S. Park et al. 2024. AI deception: A survey of examples, risks, and potential solutions. Patterns 5(5): 100988; doi: 10.1016/j.patter.2024.100988
Source: www.sci.news