A specialised medical artificial intelligence program has demonstrated the capacity to ask better questions in test consultations, rank higher on empathy, and make more accurate diagnoses than human doctors, its developers say.
Dubbed AMIE (Articulate Medical Intelligence Explorer), the Google-developed algorithm operates in the same way as other large language models such as ChatGPT, but is described as optimised for “diagnostic dialogue”.
This is apparently thanks to its training on a “diverse suite of real-world medical datasets”, including more than 11,000 past medical exam questions, dozens of electronic health record note summaries, and transcriptions of almost 100,000 recorded medical conversations.
Beyond that, the programmers fed the algorithm 64 expert-crafted long-form responses to questions from HealthSearchQA, LiveQA, and Medication QA in MultiMedBench.
The results have been published on the arXiv preprint server and, while neither peer-reviewed nor yet tested in real clinical conditions, offer strong reasons for optimism about the potential of the technology, according to the team.
They put the model to the test in a text-based Objective Structured Clinical Examination (OSCE) involving 149 case scenarios from clinical providers in Canada, the UK and India, comparing its results with 20 primary care physicians from the three countries.
The findings were that AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 measures according to the specialist physicians who marked the exam, as well as on 24 of 28 measures assessed by the patient actors.
Accuracy was superior in diagnosing respiratory, cardiovascular and other conditions, with the chatbot also managing to ask questions that elicited as much information as those posed by human doctors, the researchers said.