ChatGPT fails to give ‘satisfactory’ replies to nearly 75% of medication-related queries: study

By: Shannon T.

Popular AI tool ChatGPT flubbed nearly 75% of questions about prescription drugs – with some of its responses posing potential harm to users, according to a new study.

Pharmacists at Long Island University posed 39 drug-related questions to OpenAI’s free chatbot – with only 10 of the responses deemed “satisfactory,” per the study, which was first reported by CNBC.

For the other 29 questions, the responses either failed to directly address the question or were inaccurate or incomplete, according to LIU’s researchers, who presented the findings during the American Society of Health-System Pharmacists meeting in Anaheim, Calif., which began Dec. 3 and runs through Dec. 7.

“Healthcare professionals and patients should be cautious about using ChatGPT as an authoritative source for medication-related information,” said Sara Grossman, an associate professor of pharmacy practice at LIU and the leader of the study.

Grossman and her team pointed to a query about the relationship between the COVID-19 antiviral Paxlovid and the blood-pressure-lowering medication verapamil as an example.

When asked if there’s a drug interaction between Paxlovid and verapamil, ChatGPT responded that there were no reported interactions for this combination of drugs, per LIU.

“In reality, these medications have the potential to interact with one another, and combined use may result in excessive lowering of blood pressure,” Grossman said. “Without knowledge of this interaction, a patient may suffer from an unwanted and preventable side effect.”

LIU’s researchers asked ChatGPT to provide references with each of its responses for verification purposes.

Only eight of 39 replies included references.

All of the references were “non-existent,” LIU reported – further evidence that ChatGPT may not be a reliable go-to resource for medication-related questions.

OpenAI’s usage policies state that its technologies should not be used to “tell someone that they have or do not have a certain health condition, or providing instructions on how to cure or treat a health condition.”

The guidelines also warned: “OpenAI’s models are not fine-tuned to provide medical information. You should never use our models to provide diagnostic or treatment services for serious medical conditions.”

However, since its debut in November 2022, ChatGPT has proved groundbreaking both for the development of AI and for applications in other fields, including medicine.

In June, the chatbot outperformed human candidates in a mock obstetrics and gynecology exam – excelling even in areas like empathetic communication and displays of specialist knowledge.

ChatGPT scored an average of 77.2% on the ob-gyn specialist exam, while human candidates only eked out a 73.7% average, a study from the National University of Singapore revealed.

ChatGPT also took an average of just under three minutes to complete each station, well under the 10-minute time limit, the study noted.

That same month, a study published in the medical journal JAMA Internal Medicine suggested that ChatGPT comes across as more caring and empathetic than human doctors.

The study’s researchers randomly selected 195 exchanges on the Reddit forum r/AskDocs.

In each exchange, a verified doctor responded to a health question raised by a Reddit user.

Then, the same questions were posted to ChatGPT.

The results won’t make doctors too happy: evaluators preferred ChatGPT’s answers 78.6% of the time, the study found. Its responses were also lengthier and more comprehensive in most instances.

Perhaps most damningly, the chatbot’s responses were rated as empathetic nearly 10 times as often as the doctors’ were.

Doctors weren’t sweating for too long, though, as just two months later ChatGPT was tapped for yet another medical study – and spewed out cancer treatment regimens that contained a “potentially dangerous” mixture of correct and false information.

In August, researchers at Brigham and Women’s Hospital, a teaching affiliate of Harvard Medical School, prompted OpenAI’s popular chatbot to provide cancer treatment recommendations and measured the answers against guidelines established by the National Comprehensive Cancer Network.

While all of ChatGPT’s outputs “included at least 1 NCCN-concordant treatment,” about 34% also contained an incorrect treatment recommendation, the study found.