First global consensus sets rules for AI in endoscopy

By Sunalie Silva

11 Dec 2025

The first international consensus on AI in gastrointestinal endoscopy warns that clinicians may face new legal and operational risks as computer-assisted detection and reporting tools move into routine care without clear standards to guide their use.

Published by the World Endoscopy Organization in Annals of Internal Medicine, the statement sets out 10 recommendations across medicolegal responsibility, data governance and equity. Lead author Dr Omer F Ahmad, an interventional endoscopist and researcher at University College London, said the statement was designed to give clinicians and regulators a baseline framework as AI systems expand into everyday endoscopic workflows.

Medicolegal responsibility

The panel identifies medicolegal exposure as the most urgent fault line in AI-assisted endoscopy.

“The integration of AI into gastrointestinal endoscopy introduces new medicolegal complexities,” the authors write, warning the technology can “blur conventional boundaries of medical accountability” when errors occur.

The central issue is who bears responsibility when clinicians follow an incorrect AI-generated interpretation – or conversely, when they override an alert that later proves accurate. The panel argues such scenarios fall into a grey zone spanning clinicians, institutions and technology companies.

“Once an AI system receives guideline endorsement, this can effectively redefine the legal standard of care,” the statement notes, cautioning that clinicians could be exposed “both for disregarding validated AI recommendations and for relying on them uncritically.”

Data governance and algorithm transparency

Agreement was strongest for data governance. The panel supports strict requirements for information security, de-identification and explicit institutional policies covering data ownership and sharing with commercial partners.

“Transparency in algorithm development is essential to trust and patient safety,” the authors write, noting that “even minor modifications can alter clinical performance or introduce bias” and must be logged through clear versioning and change histories accessible to clinicians and regulators.

To support traceability, the panel recommends establishing a central life-cycle registry – similar to a clinical-trial register – to catalogue model provenance, updates and validation across development and deployment.

AI-derived quality metrics

AI systems are beginning to move beyond polyp detection. A growing number now provide semi-automated endoscopy reports and generate new procedural indicators such as percentage of mucosa evaluated and effective inspection time.

The panel notes these innovations may help standardise documentation and reduce inter-operator variability but says their clinical relevance remains uncertain.

They cite recent American Gastroenterological Association guidance showing that incremental gains in adenoma detection rate may not improve outcomes such as colorectal cancer incidence or mortality for endoscopists who already have high baseline ADR – underscoring the need for caution.

Novel indicators “must be validated against meaningful clinical outcomes – like missed lesion rates or interval cancer incidence – to ensure their adoption genuinely enhances patient care,” the statement says.

The authors also warn that automated reporting introduces additional medicolegal exposure if errors in AI outputs are misinterpreted or incorporated into clinical or regulatory documentation. They call for multidisciplinary assessment to define the limits and accountability attached to these tools.

Professional bodies are urged to support a clinician-driven, staged evaluation pathway, including mapping new metrics to established indicators, defining acceptable surrogate outcomes, setting reporting thresholds and requiring ongoing post-implementation evaluation.

Equity and bias

Equity emerged as one of the more contested areas in the consensus. Only 35.7% of panellists strongly agreed that demographic diversity in training datasets is essential for GI endoscopy AI. The group suggests this reflects uncertainty over how much patient characteristics influence performance in narrow visual tasks.

The authors note that for “highly specific and biologically consistent entities, such as colon polyps,” demographic variation may have limited impact on AI accuracy. But they also point to evidence from broader health-care AI showing that even tasks that “seem neutral can yield biased outcomes” when models are trained on unrepresentative data and used across populations with different disease patterns or access to care.

“Despite broad agreement, the relatively low rate of strong agreement (28.6%) suggests that some may see the risk for exacerbating disparities as theoretical and less applicable in GI endoscopy. Yet, parallels from other fields of medicine highlight how AI can inadvertently widen gaps in care,” the group stresses.

A default assumption of irrelevance, they warn, “risks overlooking subtle but important sources of inequity,” and demographic diversity should be treated as a safeguard for generalisability.

Support was far stronger for transparency. Routine reporting of demographic characteristics – including race, ethnicity and gender – received 85.8% agreement (42.9% strongly agree). The authors say transparency is essential for identifying equity risks early. The recommendation aligns with emerging documentation frameworks such as model cards and equity audits.

The panel also flags that access could become an equity issue in its own right. AI-enabled endoscopy platforms are capital-intensive and likely to be adopted first in well-resourced centres, potentially widening gaps in diagnostic performance. The group calls for scrutiny not only of how models are built, but how – and where – they are deployed.
