A new Italian study has raised important questions about the reliability of AI-powered chatbots in healthcare. According to research published in Virchows Archiv (the European Journal of Pathology, official journal of the European Society of Pathology), artificial intelligence models gave answers containing errors in nearly 70% of diagnostic cases and cited inaccurate or fabricated references in roughly 30%.
Conducted by a team from Humanitas University and the Humanitas Research Hospital in Milan, the study evaluated ChatGPT’s diagnostic reasoning across 200 pathology-related questions spanning 10 subspecialties. Each scenario was validated by expert pathologists and aligned with clinical guidelines.
Key Findings
- Useful answers were provided in 62.2% of cases, but only 32.1% were entirely error-free.
- Citations: 70.1% were correct, 12.1% inaccurate, and 17.8% entirely fabricated.
- Errors included misdiagnosed cases of skin and breast cancer, and multiple non-existent scientific references that appeared convincing at first glance.
Lead researcher Dr Vincenzo Guastafierro, a specialist in Pathological Anatomy at Humanitas, explained:
“These tools must be used with extreme caution. AI can assist, but not replace, clinical expertise. The clinician’s eye remains irreplaceable.”
What Does This Mean for AI in Medicine?
The findings do not suggest that AI has no place in diagnostics, but rather that it cannot yet be trusted without human oversight. The study highlights that while large language models (LLMs) like ChatGPT can support education and brainstorming, their use in clinical decision-making poses serious risks when errors go unnoticed.
Experts agree that AI should be treated as an unverified assistant: capable of generating useful insights, but requiring professional validation. This is consistent with broader findings from other peer-reviewed studies:
- In oncology, chatbots scored 66–72% on multiple-choice exams (JAMA Network Open, 2024).
- Specialised medical models such as Google Med-PaLM 2 and AMIE (Nature Medicine, 2025) demonstrated improved reliability but remain at the research stage.
Why This Matters
As the EU AI Act (effective from August 2024) and FDA guidelines begin to shape stricter standards for AI in healthcare, studies like this remind us of the importance of transparency, verification, and continuous human supervision.
AI in medicine offers extraordinary promise, from faster diagnostics to more personalised treatment, but it also demands strong ethical frameworks and robust human-AI collaboration. As Dr Guastafierro’s team concluded:
“Artificial intelligence should be seen as a valuable support, but never a substitute for human judgment.”
The Bottom Line
Artificial intelligence is changing the face of healthcare, but not all AI is created equal. Chatbots may simulate medical reasoning, but they still lack the precision, accountability, and clinical insight of trained professionals. The message is clear: AI can assist, but only humans can heal.
Source: Virchows Archiv (European Journal of Pathology), ANSA, Wirtualne Media, JAMA Network Open, Nature Medicine, npj Digital Medicine, European Commission, FDA.

