A new study found that OpenAI's ChatGPT can nearly pass the United States Medical Licensing Exam (USMLE), scoring at or near the passing threshold of around 60 percent.
The researchers claim that the AI tool provided coherent responses with frequent insights, as per a press release.
ChatGPT's USMLE Performance
ChatGPT is relatively new, but it is already making the rounds, with some claiming it is the future of tech. What makes the AI tool so popular is its ability to generate human-like writing by accurately predicting word sequences, thanks to the large language model (LLM) that underpins it.
The research team headed by Tiffany Kung and Victor Tseng evaluated ChatGPT's performance on the USMLE through three tests (Steps 1, 2CK, and 3) that are highly structured and necessary for obtaining a medical license in the US.
Medical students take the USMLE to gauge their understanding of biochemistry, bioethics, and many other medical disciplines.
After removing image-based questions, the authors evaluated the AI tool on 350 of the 376 public questions from the June 2022 USMLE release.
ChatGPT's scores on the three tests ranged from 52.4% to 75%, close to the passing threshold, which hovers around 60% each year.
According to the research team, ChatGPT showed 94.6% concordance across all of its responses, and 88.9% of them contained at least one significant insight (something novel, original, and clinically valid).
Notably, ChatGPT outperformed PubMedGPT, a rival model trained exclusively on biomedical literature, which scored 50.8% on an older dataset of USMLE-style questions.
ChatGPT's Potential in the Medical Field
Although the relatively small input size limited the depth and breadth of analyses, the authors remark that their findings show how ChatGPT has the potential to improve clinical practice and medical education.
They also cited the use of ChatGPT by clinicians at AnsibleHealth to make jargon-heavy reports simpler for patients.
According to the researchers, passing this challenging expert exam without human reinforcement is a substantial step in clinical AI development.
Kung also noted that ChatGPT's role in the study went beyond being the research subject since it helped them write the manuscript. They treated the AI tool like a colleague and asked for its input now and then.
However, experts warn that the study's results do not indicate that the tool is on par with human knowledge.
"This does not remotely suggest that chatGPT has any comparable knowledge to a human, since the test might be a good predictor of performance only for those who have already a MD and done a residency, that is for a very pre-selected population. GPT would not be part of it," Nello Cristianini, Professor of Artificial Intelligence at the University of Bath, said in a statement with Science Media Centre.
Still, Cristianini notes that the approach could help scientists develop better ways to manage large amounts of literature, significantly speeding up the research process.
The results of the study were published in PLOS Digital Health.