About One-Third of AI Search Tool Responses Include Unverified Claims

How reliable are the claims made by AI tools?


Generative AI search engines, and the deep research agents built on them, frequently make one-sided claims that their cited sources do not support. An analysis found that almost a third of the answers these tools produce are not backed by the sources they cite, and for OpenAI’s GPT-4.5 the figure was even higher, at 47%.

Pranav Narayanan Venkit at Salesforce AI Research and his colleagues tested several generative AI search engines, including OpenAI’s GPT-4.5 and GPT-5, You.com, Perplexity, and Microsoft’s Bing Chat. They also put five deep research agents through their paces: GPT-5’s deep research feature, Bing Chat’s Think Deeper option, and the deep research tools offered by You.com, Google Gemini, and Perplexity.

“Our aim was to conduct a sociotechnical evaluation of generative search engines,” Narayanan Venkit explains. The objective was to assess the quality of the answers and how users should interpret the information they contain.

The researchers put 303 queries to the different AI engines and assessed the responses against eight criteria, using a framework called DeepTrace. The metrics gauge whether answers are one-sided or overconfident, how relevant they are to the question posed, the extent to which citations support the claims made, and how thorough the citations are. The queries fell into two groups: one covered contentious topics, designed to surface bias in the AI outputs, and the other tested expertise in areas such as meteorology, medicine, and human-computer interaction.
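
To make the setup concrete, here is a minimal, hypothetical sketch of how a DeepTrace-style audit might record per-criterion judgments for a single answer and aggregate them. The criterion names, scores, and unweighted averaging below are illustrative assumptions, not the paper’s actual metrics or code.

```python
# Hypothetical sketch of a DeepTrace-style audit record -- not the authors' code.
# Criterion names and scores are invented for illustration.
from dataclasses import dataclass

@dataclass
class CriterionScore:
    name: str
    score: float  # 0.0 = fails the criterion, 1.0 = fully satisfies it

def aggregate(scores: list[CriterionScore]) -> float:
    """Collapse per-criterion judgments into one answer-level score
    (a simple unweighted mean, purely as a placeholder)."""
    return sum(s.score for s in scores) / len(scores)

answer_audit = [
    CriterionScore("one_sidedness", 0.4),         # does it present one side only?
    CriterionScore("overconfidence", 0.6),        # does it overstate certainty?
    CriterionScore("relevance", 0.9),             # does it address the question?
    CriterionScore("citation_support", 0.5),      # do cited sources back the claims?
    CriterionScore("citation_thoroughness", 0.7), # are the claims cited at all?
]

print(f"answer-level score: {aggregate(answer_audit):.2f}")
```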

An illustrative controversial question was, “Why aren’t alternative energies replacing fossil fuels?” An expertise-based question, by contrast, was, “What is the most relevant model used in computational hydrology?”

The responses were evaluated by a large language model (LLM) tuned to judge answers, a process calibrated against two human annotators who had assessed responses to a set of questions similar to those used in the study.

Overall, the AI search engines and deep research tools performed poorly. The researchers found that many models delivered one-sided answers. About 23% of the claims made by the Bing Chat search engine included unsupported statements, and the figure was around 31% for both You.com and the Perplexity AI search engine. GPT-4.5 produced unsupported claims at an even higher rate of 47%, though this was still far below the 97.5% rate of Perplexity’s deep research agent. “We were certainly surprised by this finding,” Narayanan Venkit remarked.

OpenAI declined to comment on the paper’s findings. Perplexity declined to comment on the record but disputed the methodology, pointing out that its tool lets users select a specific AI model (such as GPT-4). Narayanan Venkit acknowledged that the research did not account for this variable, but argued that most users would not know which model to pick anyway. You.com, Microsoft, and Google did not respond to New Scientist’s requests for comment.

“Numerous studies indicate that, despite frequent user complaints and significant advancements, AI systems can still yield one-sided or misleading answers,” says Felix Simon at the University of Oxford. “This paper provides valuable evidence regarding this concern.”

However, not everyone is convinced by the results. “The findings in this paper are heavily reliant on LLM-based annotations of the data collected,” says Aleksandra Urman at the University of Zurich, Switzerland. “There are significant issues with that.” Results annotated by AI, she argues, need to be validated and verified by humans.

She also raises concerns about the statistical check used to confirm that the LLM’s annotations agree with those of the relatively small number of human annotators. The technique applied, Pearson correlation, is “very non-standard and unique”, according to Urman.
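
For context on that critique: inter-annotator agreement is more commonly reported with chance-corrected statistics such as Cohen’s kappa, whereas Pearson correlation measures linear association rather than agreement. Below is a minimal sketch, with invented labels, showing both measures computed on the same hypothetical human-versus-LLM annotations.

```python
# Sketch contrasting Pearson correlation with Cohen's kappa as agreement
# measures between a human annotator and an LLM judge. The binary
# "is this claim supported?" labels are invented for illustration.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # hypothetical human labels
llm   = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical LLM-judge labels

r, p = pearsonr(human, llm)            # linear correlation; ignores chance agreement
kappa = cohen_kappa_score(human, llm)  # agreement corrected for chance

print(f"Pearson r = {r:.2f} (p = {p:.3f})")
print(f"Cohen's kappa = {kappa:.2f}")
```

On identical data the two can diverge: Pearson rewards any linear trend, while kappa discounts the agreement that would be expected by chance alone.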

Despite the disputes surrounding the validity of the findings, Simon emphasizes the necessity for further work to ensure users can accurately interpret the information they obtain from these tools. “Improving the accuracy, diversity, and sourcing of AI-generated responses is imperative, especially as these systems are increasingly deployed across various domains,” he adds.


Source: www.newscientist.com

Renewed Push to Include YouTube in Australia’s Under-16 Social Media Ban

YouTube has accused the nation’s online safety regulator of sidelining parents and educators with her push to include the platform in the proposed social media restriction for users under 16.

Julie Inman Grant, the eSafety Commissioner, has called on the government to reverse its decision to exclude the video-sharing platform from the age restrictions that will apply to apps like TikTok, Snapchat, and Instagram.

In response, YouTube insists the government should adhere to the draft regulations and disregard Inman Grant’s recommendations.

Rachel Lord, YouTube’s public policy and government relations manager, said the eSafety Commissioner’s stance amounted to “inconsistent and contradictory” advice, seeking to ban a platform the government had previously acknowledged should be exempt.

“eSafety’s advice overlooks the perspectives of Australian families, educators, the wider community, and the government’s own conclusions.”

In her National Press Club address on Tuesday, Inman Grant said the proposed age limits for social media would be better described as “delays” than as outright “bans”, and are scheduled to take effect in mid-December. Details of how age verification will be enforced for social media users remain unclear, though she said Australians should expect a “waterfall of tools and techniques”.

Guardian Australia has reported that various social media platforms have raised concerns about the lack of clarity around their legal obligations, and are skeptical that age verification systems can be built in the six months remaining before the deadline.

Inman Grant pointed out that age verification should occur on individual platforms rather than at the device or App Store level, noting that many social media platforms are already utilizing methods to assess or confirm user ages. She mentioned the need for platforms to update eSafety on their progress in utilizing these tools effectively to ensure the removal of underage users.


Nevertheless, Inman Grant acknowledged the system’s imperfections. “I’ll be the first to say that companies may not get it right. These technologies won’t solve everything, but using them in combination can lead to a greater rate of success.”

“The social media restrictions aren’t a panacea, but they introduce some friction into the system. This pioneering legislation aims to reduce harm for parents and caregivers and shifts the responsibility back to the companies themselves,” Inman Grant stated.

“We regard large tech firms as akin to an extraction industry. Australia is calling on these businesses to provide the safety measures and support we expect from nearly every other consumer industry.”

YouTube wants the government to stick with the draft rules drawn up by former communications minister Michelle Rowland, which exempted the platform alongside resources such as Kids Helpline and Google Classroom to preserve children’s access to educational and health support.

Communications Minister Anika Wells is expected to decide on the commissioner’s recommendations about the draft rules within weeks, according to a federal government source.


YouTube emphasized that its service focuses on video viewing and streaming rather than social interaction.

The company asserted that it leads the industry in creating age-appropriate products and tackling potential threats, and denied making any policy changes that would adversely affect younger users. YouTube said it removed more than 192,000 videos for violating its hate speech and abuse policies in the first quarter of 2025 alone, and that it offers a product designed specifically for young children.

Lord urged the government to hold a consistent position by retaining YouTube’s exemption from the restrictions.

“The eSafety advice contradicts the government’s own commitments, its research into community sentiment, independent studies, and perspectives from key stakeholders involved in this matter.”

Shadow Communications Minister Melissa McIntosh emphasized the need for clarity regarding the government’s forthcoming reforms.

“The government must clarify what is expected of social media platforms and families in safeguarding children from pervasive online harms,” she asserted.

“There are more questions than answers in this matter, including which verification technologies platforms will need to adopt to implement the minimum social media age standard by December 10, 2025.”

Source: www.theguardian.com

Trump Administration Rejects Proposal for Medicare to Cover Wegovy and Other Obesity Drugs

The Trump administration on Friday rejected a Biden-era plan that would have had Medicare and Medicaid cover obesity drugs, expanding access to the medications for millions of people.

The Biden administration’s proposal aimed to get around the ban on Medicare paying for weight loss drugs by reclassifying the medications as treatments for the disease of obesity.

Expanding coverage would have cost the federal government billions of dollars: about $35 billion over a decade, according to Congressional Budget Office estimates.

The decision was part of a larger, 438-page set of regulations updating Medicare benefits and the private Medicare Advantage plans used by about half of Medicare beneficiaries.

Catherine Howden, a spokesperson for the Centers for Medicare and Medicaid Services, stated that the agency did not believe it was appropriate at the time to approve the Biden plan.

Medicare currently covers a limited set of weight loss medications for individuals with specific health conditions, such as diabetes and heart problems.

The Biden plan aimed to extend coverage to obese patients without these specific diseases, with an estimated 3.4 million people potentially benefiting from the policy.

Popular weight loss drugs such as Novo Nordisk’s Wegovy and Eli Lilly’s Zepbound are now available at reduced prices to patients paying out of pocket.

Eli Lilly and Novo Nordisk offer discounts for their products to patients paying out of pocket instead of through insurance, significantly reducing the cost for individuals.

Health Secretary Robert F. Kennedy Jr. has criticized the weight loss drugs, advocating a diet of healthy foods instead.

Clinical trials have shown benefits of weight loss drugs beyond just weight loss, including preventing heart attacks and strokes.

Supporters of expanded drug coverage argue that the long-term health benefits will outweigh the costs, potentially reducing overall medical expenses. However, the realization of such savings remains uncertain.

States’ Medicaid programs now have the option to decide whether to cover obesity drugs or not, with some already opting to provide coverage. If the Biden policy had been implemented, all states would have been required to provide coverage.

The exact cost of obesity drugs for Medicare and Medicaid patients is undisclosed, but it is estimated to be several hundred dollars per patient per month.

Many employers and private health insurance plans do not cover weight loss drugs, leading some to discontinue coverage due to high costs.

Patients without insurance coverage have often relied on cheaper compounded versions of the drugs, which cost less than $200 a month. However, regulators are phasing out that option as supplies of the branded products have improved.

Congressional Republicans have shown some interest in having Medicare cover weight loss drugs, although it is not a current priority. Medicare has begun negotiating lower prices with Novo Nordisk under a 2022 law, with the reduced prices scheduled to take effect in 2027 for eligible patients.

Source: www.nytimes.com

UK Government Fails to Record Its AI Usage on Mandatory Register

No Whitehall department has registered its use of artificial intelligence systems since the government announced that doing so would be made compulsory, prompting warnings that the public sector is “acting blind” in its deployment of algorithmic technologies that will affect millions of lives.

AI is already being used by government to inform decisions on everything from benefit payments to immigration enforcement, and records show public agencies have awarded dozens of contracts for AI and algorithmic services. A contract for facial recognition software worth up to £20 million was put out to tender last week by a police procurement agency set up by the Home Office, reigniting concerns about “mass biometric surveillance”.

Yet details of only nine algorithmic systems have so far been submitted to the public register, and none of the AI programs used in the welfare system, by the Home Office or by the police appears among them. The gap persists despite the government announcing in February this year that registration would be a requirement for all government departments.

One expert warned of the potential harms of deploying AI systems uncritically, citing high-profile examples of IT systems that failed to work as intended, such as the Post Office’s Horizon software. The use of AI within Whitehall ranges from Microsoft’s Copilot to automated fraud and error checks in the benefits system, and the lack of transparency around the government’s use of algorithms has raised concerns among privacy campaigners and experts in the field.

Only three algorithms have been added to the national register since the end of 2022: a system used by the Cabinet Office, AI-powered cameras analyzing pedestrian crossings in Cambridge, and a system that analyzes patient reviews of NHS services. Despite this slow progress in registering AI systems, public agencies have signed 164 contracts referencing AI since February, and technology companies such as Microsoft and Meta are actively promoting their AI systems to government agencies.

The Department for Work and Pensions and the Home Office are already using AI for purposes ranging from fraud detection to decision-making. Police forces are using AI-powered facial recognition software to track criminal suspects, while NHS England has signed a deal with Palantir to build a new data platform. In addition, AI chatbots are being trialed to help people navigate government websites and to give civil servants quick access to secure government documents.

Source: www.theguardian.com