What’s up, doc?

Your doctor’s AI notetaker may be making things up, Ontario audit finds

Made-up therapy referrals, incorrect prescriptions among the common mistakes.

Kyle Orland – May 14, 2026 1:28 pm

OK, my AI notes here say you were referred for a total heart removal. Let me just get that squared away for you... Credit: Getty Images
In recent years, many overworked doctors have turned to so-called AI medical scribes to help automatically summarize patient conversations, diagnoses, and care decisions into structured notes for health record logging. But a recent audit by the auditor general of Ontario found that AI scribes recommended by the provincial government regularly generated incorrect, incomplete, and hallucinated information that could “potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes.”

In a recent report on Use of Artificial Intelligence in the Ontario Government, the auditor general reviewed transcription tests of two simulated patient-doctor conversations performed across 20 AI scribe vendors that were approved and pre-qualified by the provincial government for purchase by healthcare providers. All 20 of those vendors showed some issue with accuracy or completeness in at least one of these simple tests, including nine that hallucinated patient information, 12 that recorded information incorrectly, and 17 that missed key details about discussed mental health issues.

In the report, the auditor general points out multiple concerning examples of mistakes in those summaries that could have a direct and negative impact on a patient’s subsequent care. That includes situations where an AI scribe hallucinated nonexistent referrals for blood tests or therapy, incorrectly transcribed the names of prescription medications, and/or missed “key details” of mental health issues discussed in the simulated conversations.

Across all approved vendors, the average tested AI scribe scored only 12 out of 20 on the “accuracy of medical notes generated” section of Supply Ontario’s evaluation rubric. But that seemingly key “accuracy” metric was only responsible for about 4 percent of a vendor’s overall score, making it easy to meet the minimum threshold for approval even if an AI scribe scored a “zero” on the accuracy metric (a separate metric measuring “domestic presence in Ontario” was worth 30 percent of the overall scoring).

All these factors contributed to the auditor general’s overall finding that these AI scribes “were not evaluated adequately.” In a display of restraint and understatement, the report notes that “it is important that AI scribe systems are tested to provide assurances as to the quality of their generated notes and to minimize inaccuracies.” It also recommends that IT departments using these scribes force doctors to “confirm their review of the notes produced” before committing them to patient logs.

Public sector health services in Ontario are not required to use these AI scribe systems in their work and may purchase scribes from non-approved vendors if they wish. Still, the fact that the Ontario government recommended AI summary systems with such obvious and potentially patient-harming flaws should give pause to any doctors (or their patients) making use of them.

Kyle Orland, Senior Gaming Editor

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.