Will AI help doctors decide whether you live or die?

“Clinicians may also become de-skilled as over-reliance on the outputs of AI diminishes critical thinking,” Shegewi said. “Large-scale deployments will likely raise issues concerning patient data privacy and regulatory compliance. The risk for bias, inherent in any AI model, is also huge and might harm underrepresented populations.”

Additionally, AI’s increasing use by healthcare insurance companies doesn’t typically translate into what’s best for a patient. Doctors who face an onslaught of AI-generated patient care denials from insurance companies are fighting back — and they’re using the same technology to automate their appeals.

“One reason the AI outperformed humans is that it’s very good at thinking about why it might be wrong,” Rodman said. “So, it’s good at what doesn’t fit with the hypothesis, which is a skill humans aren’t very good at. We’re not good at disagreeing with ourselves. We have cognitive biases.”

Of course, AI has its own biases, Rodman noted. Sex and racial biases in LLMs have been well documented, but the models are probably less prone to bias than people are, he said.

Even so, bias in classical AI has been a longstanding problem, and genAI has the potential to exacerbate the problem, according to Gartner’s Walk. “I think one of the biggest risks is that the technology is outpacing the industry’s ability to train and prepare clinicians to detect, respond to, and report these biases,” she said.

GenAI models are inherently prone to bias due to their training on datasets that may disproportionately represent certain populations or scenarios. For example, models trained primarily on data from dominant demographic groups might perform poorly for underrepresented groups, said Mutaz Shegewi, a senior research director with IDC’s Worldwide Healthcare Provider Digital Strategies group.

“Prompt design can further amplify bias, as poorly crafted prompts may reinforce disparities,” he said. “Additionally, genAI’s focus on common patterns risks overlooking rare but important cases.”

For example, research literature that’s ingested by LLMs is often skewed toward white males, creating critical data gaps regarding other populations, Shegewi said. “Due to this, AI models might not recognize atypical disease presentations in different groups. Symptoms for certain diseases, for example, can have stark differences between groups, and a failure to acknowledge such differences could lead to delayed or misguided treatment,” he said.

With current regulatory structures, LLMs and their genAI interfaces can’t accept liability and responsibility the way a human clinician can. So, for “official purposes,” it’s likely a human will still be needed in the loop for liability, judgement, nuance, and the many other layers of evaluation and support patients need.

Chen said it wouldn’t surprise him if physicians were already using LLMs for low-stakes purposes, like explaining medical charts or generating treatment options for less-severe symptoms.

“Good or bad, ready or not, Pandora’s box has already been opened, and we need to figure out how to effectively use these tools and counsel patients and clinicians on appropriately safe and reliable ways to do so,” Chen said.
