A new Italian study has raised important questions about the reliability of AI-powered chatbots in healthcare. According to research published in Virchows Archiv (the European Journal of Pathology, official journal of the European Society of Pathology), artificial intelligence models gave answers containing errors in nearly 70% of diagnostic cases and cited inaccurate or fabricated references in roughly 30%.
Conducted by a team from Humanitas University and the Humanitas Research Hospital in Milan, the study evaluated ChatGPT’s diagnostic reasoning across 200 pathology-related questions spanning 10 subspecialties. Each scenario was validated by expert pathologists and aligned with clinical guidelines.
Key Findings
- Useful answers were provided in 62.2% of cases, but only 32.1% were entirely error-free.
- Citations: 70.1% were correct, 12.1% inaccurate, and 17.8% entirely fabricated.
- Errors included misdiagnosed cases of skin and breast cancer, and multiple non-existent scientific references that appeared convincing at first glance.
Lead researcher Dr Vincenzo Guastafierro, a specialist in Pathological Anatomy at Humanitas, explained:
“These tools must be used with extreme caution. AI can assist, but not replace, clinical expertise. The clinician’s eye remains irreplaceable.”
What Does This Mean for AI in Medicine?
The findings do not suggest that AI has no place in diagnostics, but rather that it cannot yet be trusted without human oversight. The study highlights that while large language models (LLMs) like ChatGPT can support education and brainstorming, their use in clinical decision-making poses serious risks when errors go unnoticed.
Experts agree that AI should be treated as an unverified assistant: capable of generating useful insights, but requiring professional validation. This is consistent with broader findings from other peer-reviewed studies:
- In oncology, chatbots scored 66–72% on multiple-choice exams (JAMA Network Open, 2024).
- Specialised medical models such as Google Med-PaLM 2 and AMIE (Nature Medicine, 2025) demonstrated improved reliability but remain at the research stage.
Why This Matters
As the EU AI Act (effective from August 2024) and FDA guidelines begin to shape stricter standards for AI in healthcare, studies like this remind us of the importance of transparency, verification, and continuous human supervision.
AI in medicine offers extraordinary promise, from faster diagnostics to more personalised treatment, but it also demands strong ethical frameworks and robust human-AI collaboration. As Dr Guastafierro’s team concluded:
“Artificial intelligence should be seen as a valuable support, but never a substitute for human judgment.”
The Bottom Line
Artificial intelligence is changing the face of healthcare, but not all AI is created equal. Chatbots may simulate medical reasoning, but they still lack the precision, accountability, and clinical insight of trained professionals. The message is clear: AI can assist, but only humans can heal.
Source: Virchows Archiv (European Journal of Pathology), ANSA, Wirtualne Media, JAMA Network Open, Nature Medicine, npj Digital Medicine, European Commission, FDA.

