Clinical decision-making in neonatal gastrointestinal surgical emergencies: comparison between ChatGPT and human clinicians

Kim, Seyoon; Choi, Dongho; Son, Joonhyuk

doi:10.1136/wjps-2025-001117

Clinical decision-making in neonatal gastrointestinal surgical emergencies: comparison between ChatGPT and human clinicians (open access)

Authors: Kim, Seyoon; Choi, Dongho; Son, Joonhyuk

Issue Date: Dec-2025

Publisher: BMJ PUBLISHING GROUP

Keywords: pediatrics; education; medical

Citation: WORLD JOURNAL OF PEDIATRIC SURGERY, v.8, no.6, pp. 1-7

Pages: 7

Indexed: SCOPUS; ESCI

Journal Title: WORLD JOURNAL OF PEDIATRIC SURGERY

Volume: 8

Number: 6

Start Page: 1

End Page: 7

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/211489

DOI: 10.1136/wjps-2025-001117

ISSN: 2096-6938; 2516-5410

Abstract:

Background: Neonatal gastrointestinal surgical emergencies (NGSEs) require rapid decisions to prevent morbidity and mortality. This study assessed the potential use of ChatGPT in supporting clinical decision-making for NGSEs.

Methods: The challenging NGSE cases (ileal atresia, midgut volvulus, Hirschsprung disease, meconium ileus, and pseudo-obstruction) were converted into structured short-answer questions including histories and radiologic images. Questions covered differential diagnosis, diagnostic plan, management plan, final diagnosis, and surgical plan. Each case was scored out of 10 (maximum 50). Scenarios were presented to 10 general surgery (GS) residents, 10 GS attendings, and 10 pediatric surgery (PS) attendings. GPT-4o was tested with 10 iterations per case. Group scores were compared using appropriate statistical tests.

Results: A total of five cases were involved. GPT-4o achieved a mean score of 44.95 (89.9%), higher than GS residents (27.05, p<0.001) and GS attendings (28.35, p<0.001), but lower than PS attendings (47.70, p=0.021). Subgroup analysis showed GPT-4o matched PS attendings in management, final diagnosis, and surgical planning, but scored lower in differential diagnosis (87.8% vs. 92.8%, p=0.0479) and diagnostic plan (75.0% vs. 93.8%, p<0.001). Compared with GS residents and attendings, GPT-4o performed significantly better across all categories except diagnostic plan.

Conclusions: GPT-4o demonstrated performance comparable to PS attendings in key management domains, while clearly surpassing GS residents and attendings overall. These findings suggest that GPT-4o may have potential as a supplementary decision-support tool for NGSEs, although clinical use requires further validation in real-world settings.
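As a reading aid, the percentage quoted for GPT-4o can be reproduced from the stated scoring scheme (five cases, each scored out of 10, for a maximum of 50). The sketch below is only an illustration of that arithmetic: the variable and function names are hypothetical, and the percentages for the three clinician groups are derived here from the reported means rather than quoted from the abstract.

```python
# Mean total scores per group as reported in the abstract (maximum 50 points).
MAX_TOTAL = 50  # five cases, each scored out of 10

mean_scores = {
    "GPT-4o": 44.95,
    "GS residents": 27.05,
    "GS attendings": 28.35,
    "PS attendings": 47.70,
}


def to_percent(score, max_score=MAX_TOTAL):
    """Express a raw score as a percentage of the maximum, rounded to one decimal."""
    return round(score / max_score * 100, 1)


# Only the GPT-4o percentage (89.9%) appears in the abstract; the others are derived.
percentages = {group: to_percent(score) for group, score in mean_scores.items()}

for group, pct in percentages.items():
    print(f"{group}: {pct}%")
```

The GPT-4o value computed this way agrees with the 89.9% given in the abstract; the subgroup percentages (for example 87.8% vs. 92.8%) cannot be checked the same way because the per-category maximum scores are not stated.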

Appears in Collections: Seoul College of Medicine > Seoul Department of Surgery > 1. Journal Articles

Related Researcher

Choi, Dongho: Seoul College of Medicine (Department of Surgery)
