ChatGPT aces medical exams, forcing a rethink on how we train tomorrow’s doctors


A Stanford study shows that ChatGPT outperforms medical students on complex case-based questions, prompting a rethink of medical education.

Researchers at Stanford have found that ChatGPT can outperform first- and second-year medical students in answering complex clinical care questions.

The study, published in JAMA Internal Medicine, highlights the growing influence of AI on medical education and practice, and suggests that teaching methods for future physicians may need to be adjusted.

“We don’t want doctors who relied so heavily on AI in school that they failed to learn how to reason through cases on their own,” says co-author Alicia DiGiammarino, education manager at the School of Medicine. “But I’m more scared of a world where doctors aren’t trained to use AI effectively and then find it prevalent in modern practice.”


AI beats medical students

Recent studies have demonstrated ChatGPT’s ability to handle multiple-choice questions on the United States Medical Licensing Examination (USMLE). But the Stanford authors wanted to examine the AI system’s ability to handle the more difficult, open-ended questions used to assess clinical reasoning skills.

The study found that, on average, the AI model scored more than four points higher than medical students on the case-report portion of the exam. This result suggests that AI tools like ChatGPT could disrupt the traditional teaching and testing of medical reasoning through written text. The researchers also noted a significant jump from GPT-3.5, which scored at only a “borderline passing” level on the same questions.

ChatGPT and other programs like it are changing how we teach and ultimately practice medicine.

Alicia DiGiammarino

Despite its impressive performance, ChatGPT is not without shortcomings. The biggest danger is invented facts, so-called hallucinations or confabulations. These have been significantly reduced in OpenAI’s latest model, GPT-4, which is available to paying customers and via API, but they are still very much present.

You can imagine how even sporadic errors can have dramatic consequences in medical contexts. Embedded in an overall curriculum with multiple sources of truth, however, this seems like a much smaller problem.

Stanford’s School of Medicine cuts off students’ access to ChatGPT in exams

Concerns about exam integrity and ChatGPT’s influence on curriculum design are already being felt at Stanford’s School of Medicine. Administrators have switched from open-book to closed-book exams to ensure that students develop clinical reasoning skills without relying on AI. At the same time, they have created an AI working group to explore integrating AI tools into medical education.

