Exercise caution when seeking AI advice on medical issues
Wondering whether to consult a doctor about your sore throat? The quality of AI recommendations can vary depending on how you frame your question. In experiments, AI models were more likely to advise patients to manage their condition at home rather than seek medical attention when their messages contained typos or uncertain language, or suggested the writer was a woman.
“Subtle biases can shape the nature and content of AI recommendations, significantly affecting the distribution of medical resources,” says Karandeep Singh at the University of California, San Diego, who was not involved in the research.
Avinisa Gravatina and her team at the Massachusetts Institute of Technology used AI to generate thousands of patient notes in varying formats. Some messages contained intentional typos and extra spaces to mimic the writing of people with limited English proficiency or less facility with typing, while others used uncertain language to reflect different emotional tones, such as health anxiety, or gendered expressions.
The researchers presented these notes to four widely used large language models (LLMs) that power many chatbot applications, asking whether each patient should manage their condition independently, visit a clinic, or undergo certain tests. The models were OpenAI’s GPT-4, Meta’s Llama-3-70b and Llama-3-8b, and Palmyra-Med, a model built specifically for healthcare.
The results showed that variations in format and style swayed the recommendations: the AI models became 7 to 9 per cent more likely to suggest that patients stay at home rather than seek a medical appointment. Female patients were also more likely to be told to stay home. The study further found that the models’ treatment suggestions were more susceptible to changes in the gender and language style of a query than those offered by human clinicians.
OpenAI and Meta did not respond to requests for comment. According to Zayed Yasin, who was involved in the research, these LLMs are not intended to provide health advice or clinical recommendations without human oversight.
Most AI tools currently deployed in electronic health records rely on OpenAI’s GPT-4o, which was not directly studied here. Singh emphasized the need for better methods to evaluate and monitor generative AI models in healthcare.
Source: www.newscientist.com
