Study shows artificial intelligence can help physicians stay up to date on medical research by summarizing journal articles
Research led by KU Medical Center investigators found that ChatGPT produced accurate short summaries of journal abstracts that could help busy doctors quickly review the academic literature.
A physician’s knowledge starts in medical school, but it hardly ends there. Throughout their careers, doctors need to stay abreast of new research and practice guidelines to make the most accurate diagnoses and recommend the most effective, up-to-date treatments for patients.
But with new medical knowledge accumulating at an explosive pace, keeping up can be overwhelming for physicians. It is estimated that worldwide medical knowledge now doubles every 73 days.
“There are about a million new articles added to PubMed every year,” noted Daniel Parente, M.D., Ph.D., assistant professor of family medicine and community health at the University of Kansas Medical Center. PubMed is an online database maintained by the National Center for Biotechnology Information that contains abstracts for millions of biomedical and life sciences articles. “Even if you’re a physician restricting your focus to your field, it can still be many thousands of articles you might think about reading.”
How can clinicians be expected to keep up with so much new information — or even just sift through the mountains of literature to identify what new research is most pertinent to their field and practice? Even reviewing 300-word abstracts, rather than whole articles, takes time.
ChatGPT (Chat Generative Pre-trained Transformer), the artificial intelligence chatbot designed to simulate conversation and perform language-based tasks, can help, according to a study led by Parente and his colleagues in the Department of Family Medicine and Community Health. ChatGPT is a type of artificial intelligence known as a “large language model,” meaning it uses deep learning and large datasets to generate and process language.
The researchers evaluated the ability of ChatGPT, version 3.5, to summarize 10 peer-reviewed abstracts from each of 14 journals published in 2022. The journals, chosen by the researchers, covered a range of topics across medicine. Physicians then read the abstracts and compared them with the summaries produced by ChatGPT, rating the quality, accuracy and level of bias of those summaries.
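The study does not publish its prompting code, but a minimal sketch of this kind of summarization pipeline, written in Python against the OpenAI SDK, might look like the following. The prompt wording, the word-count target, the file name and the use of the “gpt-3.5-turbo” API model are illustrative assumptions, not the study’s actual protocol.

```python
# Minimal sketch of an abstract-summarization pipeline like the one the
# study describes. Assumes the OpenAI Python SDK (v1+) and an API key in
# the OPENAI_API_KEY environment variable; the prompt text and 125-word
# limit are illustrative assumptions, not the study's actual protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_abstract(abstract: str, max_words: int = 125) -> str:
    """Ask a GPT-3.5-class model for a short summary of one abstract."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You summarize medical journal abstracts for busy clinicians."},
            {"role": "user",
             "content": f"Summarize this abstract in at most {max_words} words:\n\n{abstract}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    abstract = open("abstract.txt").read()  # hypothetical input file
    summary = summarize_abstract(abstract)
    # The study reports summaries roughly 70% shorter than the originals;
    # a simple word-count ratio is one way to check that compression.
    reduction = 1 - len(summary.split()) / len(abstract.split())
    print(f"{reduction:.0%} shorter\n\n{summary}")
```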
The result? ChatGPT produced summaries that were 70% shorter than the abstracts, and overall, these summaries were found to be high in quality and accuracy and low in bias. Across the 140 summaries, there were just four instances in which ChatGPT “hallucinated,” the term used when large language models such as ChatGPT produce text that is not grounded in fact. There were also 20 minor inaccuracies, though these did not change the abstract’s overall meaning.
Where ChatGPT proved lacking was in rating the relevance of article abstracts to a particular field. “We asked the human (physician) raters to say, is this relevant to primary care or internal medicine or surgery? And then we compared to ChatGPT’s relevance ratings, and we found that at least the ChatGPT 3.5 model is not quite ready to do that yet,” Parente said. “It works well at identifying if a journal is relevant to primary care, but it’s not great for identifying if an article is relevant to primary care.”
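For the relevance comparison, a hypothetical version of the same pipeline might simply ask the model for a numeric rating per abstract. The 0-to-10 scale and the prompt wording below are assumptions for illustration, not the study’s published instrument.

```python
# Hypothetical sketch of the article-relevance task the study compared
# against physician judgment. The 0-10 scale and prompt wording are
# illustrative assumptions, not the study's published instrument.
from openai import OpenAI

client = OpenAI()

def rate_relevance(abstract: str, specialty: str) -> int:
    """Ask the model how relevant one abstract is to a given specialty."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (f"On a scale of 0 (irrelevant) to 10 (highly relevant), "
                        f"how relevant is this article to {specialty}? "
                        f"Reply with the number only.\n\n{abstract}"),
        }],
    )
    return int(response.choices[0].message.content.strip())

# Ratings like these could then be compared with physician ratings,
# which is where the study found GPT-3.5 fell short at the article level.
```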
The researchers also noted that while the study indicates that ChatGPT may serve as a useful screening tool for physicians, “critical medical decisions should — for obvious reasons — remain based on a full evaluation of the full text of articles in context with available evidence from meta-analyses and professional guidelines.”
Parente noted that future versions of ChatGPT are likely to improve, and he hopes they will get better at determining the relevance of specific articles. That ability is important not only to practicing physicians, but also to students and residents learning how to keep up with advances in their fields.
“This study shows us that these tools already have some ability to help us review the literature a little bit faster, as well as figure out where we need to focus our attention,” said Parente. “And it seems very likely that future versions of these technologies that are smarter and more capable will only enhance that.”