Researchers are creating digital self-assessment tools for diagnosing Alzheimer’s disease.

A recent proof-of-concept study by researchers at Lund University shows that a brief, self-administered digital cognitive assessment called BioCog effectively detects cognitive impairment. When used alongside blood tests, it can accurately identify clinical Alzheimer’s disease in primary care settings.



In primary care, the BioCog test achieved 85% accuracy in identifying cognitive impairments with a single cutoff, significantly outperforming primary care physicians, who had a 73% accuracy rate. Image credit: Miroslaw Miras.

Alzheimer’s disease stands as the leading cause of dementia, marked by amyloid beta accumulation, tau aggregation, and progressive neurodegeneration.

Clinical presentations of Alzheimer’s typically begin with subjective cognitive decline, where individuals report memory issues and other cognitive challenges, although formal cognitive tests may not yet indicate impairment.

This initial phase advances to mild cognitive impairment, which is characterized by objective cognitive symptoms, ultimately evolving into dementia marked by significant functional limitations in daily life.

Diagnosing Alzheimer’s in its early stages, especially in primary care, can be particularly challenging.

Misdiagnosis and missed diagnoses are common: when Alzheimer’s is not corroborated by biomarkers, 20-30% of cases are incorrectly diagnosed in specialist settings and about 40% in general practice.

“The BioCog digital assessment, designed to allow patients to perform with minimal healthcare worker involvement, will enhance primary care physicians’ ability to investigate potential Alzheimer’s pathology early through blood tests,” stated the researchers.

“Primary care typically lacks the resources, time, or expertise to explore Alzheimer’s disease with the same thoroughness as specialized memory clinics.”

“This is where digital cognitive assessments can play a vital role.”

In contrast to the traditional pen-and-paper tests utilized for evaluating cognitive impairment, digital assessments offer a more comprehensive analysis.

They easily integrate novel variables and additional factors that were not previously measured.

“Most individuals experiencing memory loss first seek help at their local health center,” remarked Pontus Tideman, a doctoral student at Lund University and psychologist at the memory clinic at Skåne University Hospital.

“Our new digital evaluations provide the initial objective insights needed, ensuring higher accuracy in identifying cognitive impairments related to Alzheimer’s disease.”

“This determines who should undergo a blood test that measures phosphorylated tau levels, which can reliably detect Alzheimer’s disease in the brain.”

Currently, these blood tests are available only at hospital specialist and memory clinics.

In time, they are expected to become available in primary care as well; however, not every patient with cognitive complaints should undergo blood testing.
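To make the two-step workflow described above concrete, here is a minimal, illustrative sketch of how such a triage might be wired together in code. All cutoffs, scores and function names are hypothetical assumptions for illustration, not values or methods from the BioCog study.

```python
# Hypothetical sketch of a two-step triage workflow: a self-administered digital
# cognitive test is scored first, and only patients who screen positive go on to
# a plasma p-tau blood test. Cutoffs and scores below are illustrative only.

from dataclasses import dataclass

@dataclass
class Patient:
    cognitive_score: float        # score from the digital cognitive assessment
    p_tau: float | None = None    # plasma phosphorylated-tau level, if measured

COGNITIVE_CUTOFF = 50.0   # assumed single cutoff for "possible impairment"
P_TAU_CUTOFF = 1.0        # assumed cutoff for a positive blood biomarker

def triage(patient: Patient) -> str:
    """Return a coarse triage decision for a patient with cognitive complaints."""
    if patient.cognitive_score >= COGNITIVE_CUTOFF:
        # Digital test does not indicate impairment: no blood test needed now.
        return "reassure / monitor"
    if patient.p_tau is None:
        # Impairment suspected: order the p-tau blood test next.
        return "order p-tau blood test"
    if patient.p_tau >= P_TAU_CUTOFF:
        # Both the cognitive test and the biomarker point toward Alzheimer's.
        return "refer to memory clinic (likely Alzheimer's pathology)"
    return "impairment without Alzheimer's biomarker: investigate other causes"

if __name__ == "__main__":
    print(triage(Patient(cognitive_score=42.0)))             # -> order p-tau blood test
    print(triage(Patient(cognitive_score=42.0, p_tau=1.8)))  # -> refer to memory clinic
```

The point of the sketch is simply the ordering: the cheap, self-administered digital test filters who gets the biomarker blood test, so that not every patient with cognitive complaints is tested.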

Researchers assert the immense value of digital solutions, given the challenges of diagnosing Alzheimer’s during a typical 15-20 minute patient consultation.

This is where objective digital tools for assessing cognitive skills can significantly alter the diagnostic landscape.

“A distinctive feature of our BioCog assessments is their validation within primary care settings, unlike many other digital evaluations. These assessments are aimed at patients seeking treatment due to cognitive concerns, such as memory problems,” the researchers noted.

“The combination of digital assessments with blood test results can greatly enhance the diagnostic accuracy of Alzheimer’s disease.”

“The goal of this test is to simplify the process for primary care physicians.”

The BioCog test is detailed in a study published in the journal Nature Medicine.

____

P. Tideman et al. Primary care detection of Alzheimer’s disease using self-administered digital cognitive tests and blood biomarkers. Nat Med. Published online on September 15th, 2025. doi:10.1038/s41591-025-03965-4

Source: www.sci.news

Microsoft Claims AI Systems Outperform Doctors in Diagnosing Complex Health Conditions

Microsoft is unveiling details about artificial intelligence systems that outperform human doctors in intricate health assessments, paving a “path to medical superintelligence.”

The company’s AI division, spearheaded by British tech entrepreneur Mustafa Suleyman, has created a system that emulates a panel of specialized physicians handling “diagnostically complex and intellectually demanding” cases.

When integrated with OpenAI’s advanced o3 AI model, Microsoft claims its method “solved” more than eight out of ten carefully selected diagnostic case studies. In contrast, practising physicians with no access to colleagues, textbooks, or chatbots achieved only two out of ten on the same case studies.

Microsoft also highlighted that this AI solution could be a more economical alternative to human doctors, as it streamlines the process of ordering tests.

While emphasizing potential cost reductions, Microsoft noted that it envisions AI as a complement to physician roles rather than a replacement.

“The clinical responsibilities of doctors extend beyond merely diagnosing; they must navigate uncertainty in ways that AI is not equipped to handle, and build trust with patients and their families,” the company explained in a blog post announcing the research intended for peer review.

Nevertheless, slogans like “The path to medical superintelligence” hint at the possibility of transformative changes in the healthcare sector. Artificial general intelligence (AGI) refers to systems that match human cognitive abilities across a wide range of tasks, while superintelligence is a theoretical concept referring to systems that surpass overall human intellectual capacity.

In discussing the rationale for the study, Microsoft pointed to benchmarks based on the United States Medical Licensing Examination, a crucial assessment for acquiring a medical license in the U.S. Its multiple-choice format relies heavily on memorization, which may “exaggerate” AI capabilities compared with in-depth understanding.

Microsoft is working on a system that mimics real-world clinicians by taking step-by-step actions to arrive at a final diagnosis, such as asking targeted questions or requesting diagnostic tests. For instance, patients exhibiting cough or fever symptoms may need blood tests and chest x-rays prior to receiving a pneumonia diagnosis.

This innovative approach by Microsoft employs intricate case studies sourced from the New England Journal of Medicine (NEJM).

Suleyman’s team transformed over 300 of these studies into “interactive case challenges” to evaluate their method. Microsoft’s approach incorporated existing AI models from ChatGPT maker OpenAI, Mark Zuckerberg’s Meta, Anthropic, Elon Musk’s Grok, and Google’s Gemini.

The company used tailored AI agents known as “diagnostic orchestrators” to decide which tests to order and which diagnoses to propose. These orchestrators effectively simulate a panel of doctors working together toward a diagnosis.
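As a rough illustration of what such a step-by-step diagnostic loop might look like, the sketch below shows a generic orchestrator that repeatedly chooses to ask a question, order a test, or commit to a diagnosis. This is not Microsoft’s implementation: the decision function is a stub standing in for the model panel, and the names and the pneumonia example are assumptions for illustration.

```python
# Illustrative sketch of a sequential "diagnostic orchestrator" loop: an agent
# repeatedly chooses its next action (ask a question, order a test, or commit
# to a diagnosis) until it reaches a conclusion. The model call is a stub.

from dataclasses import dataclass, field

@dataclass
class CaseState:
    presenting_complaint: str
    findings: list[str] = field(default_factory=list)  # answers and test results so far

def propose_next_action(state: CaseState) -> tuple[str, str]:
    """Stand-in for the orchestrator's model call.

    Returns (action_type, detail), where action_type is 'ask', 'test' or
    'diagnose'. A real system would prompt a panel of model instances and
    aggregate their suggestions here.
    """
    if not state.findings:
        return "ask", "How long have the cough and fever lasted?"
    if len(state.findings) == 1:
        return "test", "chest X-ray"
    return "diagnose", "community-acquired pneumonia"

def run_case(state: CaseState, max_steps: int = 10) -> str:
    for _ in range(max_steps):
        action, detail = propose_next_action(state)
        if action == "ask":
            state.findings.append(f"patient answer to: {detail}")
        elif action == "test":
            state.findings.append(f"result of: {detail}")
        else:  # 'diagnose' ends the episode
            return detail
    return "no diagnosis reached"

if __name__ == "__main__":
    case = CaseState("cough and fever for three days")
    print(run_case(case))
```

The cost argument in the article follows from this structure: because the orchestrator chooses each test explicitly, the number and price of ordered tests can be tracked and minimized alongside diagnostic accuracy.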

Microsoft reported that, in conjunction with OpenAI’s advanced o3 model, the system “solved” more than eight out of ten of the NEJM case studies.

Microsoft believes its approach could span multiple medical fields, offering a breadth and depth of expertise that no individual practitioner can match.

“Enhancing this level of reasoning could potentially reform healthcare. AI can autonomously manage patients with routine care and offer clinicians sophisticated support for complex cases.”

However, Microsoft acknowledges that the technology is not yet ready for clinical implementation, noting that further testing with an “Orchestrator” is necessary to evaluate performance in more prevalent symptoms.

Source: www.theguardian.com

AI chatbots are incapable of diagnosing patients solely through conversation

Don’t call your favorite AI “Doctor” yet


Advanced artificial intelligence models have scored highly on professional medical examinations, but they still struggle with one of a doctor’s most important tasks: talking to patients to gather relevant medical information and arrive at an accurate diagnosis.

“Large language models perform well on multiple-choice tests, but their accuracy drops significantly in dynamic conversations,” says Pranav Rajpurkar at Harvard University. “The models especially struggle with open-ended diagnostic reasoning.”

This became clear when researchers developed a way to assess the reasoning ability of clinical AI models through simulated doctor-patient conversations. The simulated patients were based on 2,000 medical cases drawn primarily from United States medical board specialty examinations.

“Simulating patient interactions allows the assessment of history-taking skills, an important element of clinical practice that cannot be evaluated using case descriptions,” says Shreya Johri, also at Harvard University. The new assessment benchmark, called CRAFT-MD, “also mirrors real-world scenarios, where patients may not know which details are important to share and may only disclose important information when prompted by specific questions,” she says.

The CRAFT-MD benchmark itself relies on AI. OpenAI’s GPT-4 model played the role of a “patient AI” that conversed with the “clinical AI” being tested, and GPT-4 also helped score the results by comparing the clinical AI’s diagnosis with the correct answer for each case. Human medical experts double-checked these assessments, reviewing the conversations to verify the accuracy of the patient AI and whether the clinical AI had gathered the relevant medical information.
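The sketch below outlines the general shape of such an evaluation loop as described above, with the patient AI, the clinical AI under test, and the grader all replaced by stubs. It is not the actual CRAFT-MD code; every function, case, and threshold here is a placeholder assumption.

```python
# Rough outline of a conversation-based diagnostic benchmark: a "patient AI"
# answers questions from a hidden case vignette, the "clinical AI" under test
# takes a history and proposes a diagnosis, and a grader compares that
# diagnosis with the reference answer. All three roles are stubbed here; in
# the benchmark described above they are LLM calls (e.g. GPT-4 for the patient
# and grader), with human experts double-checking the grading.

from dataclasses import dataclass

@dataclass
class Case:
    vignette: str          # full case description, hidden from the clinical AI
    true_diagnosis: str

def patient_ai(case: Case, question: str) -> str:
    """Stub: answer the clinician's question using only the hidden vignette."""
    return f"(answer drawn from vignette: {case.vignette[:40]}...)"

def clinical_ai(conversation: list[str]) -> tuple[str | None, str]:
    """Stub: either ask another question or return a final diagnosis."""
    if len(conversation) < 4:
        return None, "Can you describe your main symptom and when it started?"
    return "example diagnosis", ""

def grade(predicted: str, reference: str) -> bool:
    """Stub grader: a real harness would ask a model whether the two match."""
    return predicted.strip().lower() == reference.strip().lower()

def evaluate(cases: list[Case], max_turns: int = 10) -> float:
    correct = 0
    for case in cases:
        conversation: list[str] = []
        for _ in range(max_turns):
            diagnosis, question = clinical_ai(conversation)
            if diagnosis is not None:
                correct += grade(diagnosis, case.true_diagnosis)
                break
            conversation += [question, patient_ai(case, question)]
    return correct / len(cases)  # conversational diagnostic accuracy

if __name__ == "__main__":
    # Trivial demo with the stubs above; a real run would load exam-style
    # cases and call actual models for the patient, clinician and grader roles.
    demo = [Case("a 63-year-old with cough and fever for three days...", "example diagnosis")]
    print(f"conversational accuracy: {evaluate(demo):.0%}")
```

The key design point is that the clinical AI never sees the written vignette directly; it only learns what it manages to elicit through questions, which is what makes this harder than diagnosing from a case summary.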

Multiple experiments showed that four major large language models (OpenAI’s GPT-3.5 and GPT-4, Meta’s Llama-2-7b, and Mistral AI’s Mistral-v2-7b) performed significantly worse on the conversation-based benchmark than when making diagnoses from written case summaries. OpenAI, Meta, and Mistral AI did not respond to requests for comment.

For example, GPT-4’s diagnostic accuracy was an impressive 82 percent when it was presented with a structured case summary and could select the diagnosis from a list of multiple-choice answers, and lower when the multiple-choice options were removed. When it had to make a diagnosis from a simulated patient conversation, however, its accuracy dropped to just 26 percent.

GPT-4 performed best among the AI models tested in this study, with GPT-3.5 often coming second, the Mistral AI model sometimes coming second or third, and Meta’s Llama models generally scoring lowest.

The AI models also failed to collect complete medical histories a significant proportion of the time; the leading model, GPT-4, did so in only 71% of simulated patient conversations. And even when an AI model collected the relevant medical history, it did not always produce the correct diagnosis.

Such simulated patient conversations are a “much more useful” way to assess an AI’s clinical reasoning ability than medical exams, says Eric Topol at the Scripps Research Translational Institute in California.

Even if an AI model ultimately passes this benchmark and consistently makes accurate diagnoses based on conversations with simulated patients, it won’t necessarily be better than a human doctor, says Rajpurkar. He points out that real-world medical practice is “more troublesome” than simulations: it includes managing multiple patients, coordinating with medical teams, performing physical exams, and understanding the “complex social and systemic factors” of the local healthcare setting.

“While the strong performance in the benchmarks suggests that AI may be a powerful tool to support clinical practice, it does not necessarily replace the holistic judgment of experienced physicians,” says Rajpurkar.


Source: www.newscientist.com