The Method We Use to Train AIs Increases Their Likelihood of Producing Nonsense

Certain AI training techniques may lead to dishonest models

Cravetiger/Getty Images

Researchers suggest that prevalent methods for training artificial intelligence models may increase their propensity to provide deceptive answers, aiming to establish “the first systematic assessment of mechanical bullshit.”

It is widely acknowledged that large-scale language models (LLMs) often produce misinformation or “hagaku.” According to Jaime Fernandez Fissac from Princeton University, his team defines “bullshit” as “discourse designed to manipulate an audience’s beliefs while disregarding the importance of actual truth.”

“Our analysis indicates that the problems related to bullshit in large-scale language models are quite severe and pervasive,” remarks FISAC.

The researchers categorized these instances into five types: “This red car combines style, charm, and adventure that captivates everyone,” Weasel Words—”Ambiguous statements like ‘research suggests that in some cases, uncertainties may enhance outcomes’; Essentialization—employing truthful statements to create a false impression; unverified claims; and sycophancy.

They evaluated three datasets composed of thousands of AI-generated responses to various prompts from models including GPT-4, Gemini, and Llama. One dataset included queries specifically designed to test the generation of bullshit when AIS was asked for guidance or recommendations, alongside others focused on online shopping and political topics.

FISAC and his colleagues first employed LLMs to determine if the responses aligned with one of the five categories and then verified that the AI’s classifications matched those made by humans.

The team found that the most critical truths posed challenges stemming from a training method called reinforcement learning from human feedback, aimed at enhancing the machine’s utility by offering immediate feedback on its responses.

However, FISAC cautions that this approach is problematic, as models “sometimes conflict with honesty,” prioritizing immediate human approval and perceived usefulness over truthfulness.

“Who wants to engage in the lengthy and subtle rebuttal of bad news or something that seems evidently true?” FISAC questions. “By attempting to adhere to our standards of good behavior, the model learns to undervalue the truth in favor of a confident, articulate response to secure our approval.”

This study revealed that reinforcement learning from human feedback notably heightened bullshit behavior, with inflated rhetoric increasing by nearly 40%, substantial enhancements in Weasel Words, and over half of unverified claims.

Heightened bullshitting is especially detrimental, as team member Kaique Liang points out, leading users to make poorer decisions. In cases where the model’s features were uncertain, deceptive claims surged from five percent to three-quarters following human training.

Another significant issue is that bullshit is prevalent in political discourse, as AI models “tend to employ vague and ambiguous language to avoid making definitive statements.”

AIS is more likely to behave this way when faced with conflicts of interest, as the system caters to multiple stakeholders including both the company and its clients, as the researchers discovered.

To address this issue, the researchers propose transitioning to a “hindcasting feedback” model. Instead of seeking immediate feedback post-output, the system should first generate a plausible simulation of potential outcomes based on user input, which is then presented to a human evaluator for assessment.

“Ultimately, we hope that by gaining a deeper understanding of the subtle but systematic ways AI may seek to mislead us, we can better inform future initiatives aimed at creating genuinely truthful AI systems,” concludes FISAC.

Daniel Tiggard of the University of San Diego, though not involved in the study, expresses skepticism regarding discussions of LLMs’ output under these circumstances. He argues that just because LLMs generate bullshit, it does not imply intentional deception, as AI systems currently stand. I left to deceive us, and I have no interest in doing so.

“The primary concern is that this framing seems to contradict sensible recommendations about how we should interact with such technology,” states Tiggard. “Labeling it as bullshit risks anthropomorphizing these systems.”

Topics:

Source: www.newscientist.com

Josef’s Split Fiction and Co-op Video Games Joy: Micro Transactions with No Nonsense

tBelow are some video game developers who are not as vocal as Joseph Fares of Hazelight. Fares is known for his viral rant at a live streamed awards show and is considered a refreshing and unpredictable voice in the industry. He believes in speaking his mind and finds it strange that people can’t express their thoughts freely in interviews.

Although Fares is seen as a passionate advocate for cooperative gameplay in the gaming community, in his native Sweden, he is best known as an award-winning film director. His films range from comedy to more introspective works like Zozo, which explores his experiences as a child during the Lebanese civil war.

With no formal training, Fares learned by trial and error, eventually leading him to the world of game development. His passion for storytelling and gaming culminated in the creation of Hazelight Studios, dedicated to producing story-driven cooperative games.

“There was a lot of trial and error. I just did it, did it, and did it until I got it right.”… Brothers: A story about two sons. Photo: 505 Games

Fares’s latest game, Split Fiction, continues his tradition of innovative storytelling and gameplay. He believes in pushing the boundaries of the medium and creating unique experiences for players. Despite the challenges of interactive storytelling, Fares is determined to explore new ways of narrative in gaming.

“New things in the industry were extremely challenging”… it takes two.

Fares remains critical of the gaming industry’s shift towards live service games and believes in balancing creativity with commercial success. He values the artistry of game development and aims to create memorable experiences for players.

Split Fiction will be released on PC, PS5, and Xbox on March 6th

Source: www.theguardian.com

Are you interested in sabotaging your colleagues in the scientific community by publishing “nonsense”?

play your cards right

As readers in the Northern Hemisphere face long, dark nights and cold weather for many weeks to come, what could be better than a fun card game? If you're too strapped for money to play poker and have exhausted the comical possibilities of poker, card against humanity (This state is typically reached after about 10 minutes of play.) If you're interested in scientific research, you may want to consider: Publish or perish.

Created by a social psychologist Max Hui Bye, Publish or perish Simulate the experience of building a career in scientific research. The game is to publish as many papers and collect citations as possible. Even if your paper is crap or you have to sabotage another player's publication. In Bai's words, “players interrupt each other, send vitriolic comments, and compete to publish useless nonsense.”

rear release Bai launched a beta version of the game for academics on Kickstarter in late 2024, and it quickly became profitable. 5,944 backers and $292,537 in funding. they are not Brandon Sanderson Four Secret Novel NumbersHowever, it still requires a large amount of capital.

To publish a paper, players collect cards representing key elements of their research, from ideas and data to references. To speed this up, you can use cards that represent positive actions, such as attending a workshop or forming a collaboration.

But the real fun happens when you play dirty. Some cards allow dangerous activities such as plagiarism and p-hacking (a statistical trick that repeatedly reanalyzes data in different ways until a significant result is found, then independently publishes the results). Masu. For example, you can sabotage someone's “research” by identifying minor citation errors or requesting an audit of their work.

The game includes cards representing papers that can be published, all of which include “Procrastination Patterns Among Academics: My Own Case Study'' (written by Anita Blake, Ph.D. in Psychology) and “Practical Fields''. Guide,” with headlines flanked by insane and honest feedback. Leads to unproductive meetings and wasted organizational time” (by Max Time-Squader, MBA, JD, MD, Ph.D.).

Feedback does not have a copy. However, now that this article has been published, I have a feeling it might just be a matter of time before Mrs Feedback or Feedback Jr receives feedback on our birthdays. However, as (very) former academic researchers, we were aware of the horror and pain of the research experience. I don't know what it would be like for a working researcher to play this game. While there may be catharsis, many buried traumas may also resurface. We recommend having a therapist on-site.

Feedback also leaves us wondering what the game's legacy will be. Famously, Exclusive The game was invented as a biting satire on landlord and renter capitalism, but after being acquired by Parker Brothers it was sold around the world as a fun game about how to get rich. Will there still be feedback 50 years from now? Publish or perish Marketed by the Trump Organization as a fun game about how to discover new knowledge.

A parade of bots

Just when you thought talking to actual loved ones on Facebook and Instagram (rather than advertisers or meme collectors) couldn't be any harder, parent company Meta has decided to make it even harder.

It all started with something article in financial timesIn it, Meta executive Connor Hayes reportedly said the company plans to add a large number of AI profiles to the site. or F.T. “Meta envisions social media filled with AI-generated users.”

Following this, many users realized that there were actually a large number of AI profiles already on the site. According to Jason Kabler (404 Media)these “meta-controlled AI-generated Instagram and Facebook profiles…have been on the platform for over a year.” However, most of them have been deleted, and the few that remained stopped posting in April 2024. This is because “users almost universally ignored it.”

It was a mistake for Meta to not be able to permanently delete the profile as users started experimenting. washington post columnist Karen Attia I chatted with An AI called Livwas introduced as a queer black woman. Attia made Liv say that none of the creators were black and only 1 out of 12 was female (though who knows if that was telling the truth or just a hallucination? I don't know either). Unfortunately, Liv has since been removed.

meanwhile, business insider 's katie notopoulos We pointed out that you can create your own AI chatbot on Facebook Messenger. showed off what she had made: “Ciao! I'm Luigi, your go-to person for all things healthcare disparities and reform… Participating in healthcare advocacy is my passion!”

Meta claims that the next generation AI profile is better. It's not difficult.

The real question is why the company thinks anyone would want this. The whole point of social media is that you can talk to people. That's why social media platforms have put so much effort into cracking down on bots and spammers that pollute the conversation.

Nevertheless, feedback remains optimistic. It's entirely possible that the AI ​​Profiles project will go exactly like Meta's attempt to drag us all into the Metaverse, but it failed because it couldn't create avatars with legs.

Or perhaps AI profiles can combat misinformation. Mark Zuckerberg decided to: Fire all fact checkers.

Have a story for feedback?

You can email your article to Feedback at feedback@newscientist.com. Please enter your home address. This week's and past feedback can be found on our website.

Source: www.newscientist.com