Self-supervised multi-modal training from uncurated images and reports enables monitoring AI in radiology

Park, Sangjoon; Lee, Eun Sun; Shin, Kyung Sook; Lee, Jeong Eun; Ye, Jong Chul

doi:10.1016/j.media.2023.103021

Authors: Park, Sangjoon; Lee, Eun Sun; Shin, Kyung Sook; Lee, Jeong Eun; Ye, Jong Chul

Issue Date: Jan-2024

Publisher: Elsevier B.V.

Keywords: Error detection; Monitoring AI; Radiograph; Vision-language model

Citation: Medical Image Analysis, v.91

Journal Title: Medical Image Analysis

Volume: 91

URI: https://scholarworks.bwise.kr/cau/handle/2019.sw.cau/70503

DOI: 10.1016/j.media.2023.103021

ISSN: 1361-8415; 1361-8423

Abstract: The escalating demand for artificial intelligence (AI) systems that can monitor and supervise human errors and abnormalities in healthcare presents unique challenges. Recent advances in vision-language models reveal the challenges of monitoring AI by understanding both visual and textual concepts and their semantic correspondences. However, vision-language models have had limited success in the medical domain. Current vision-language models and learning strategies for photographic images and captions call for a web-scale corpus of image-text pairs, which is often not feasible in the medical domain. To address this, we present the medical cross-attention vision-language model (Medical X-VL), which combines key components tailored for the medical domain: self-supervised unimodal models in the medical domain with a fusion encoder to bridge them, momentum distillation, sentence-wise contrastive learning for medical reports, and sentence similarity-adjusted hard negative mining. We experimentally demonstrated that our model enables various zero-shot tasks for monitoring AI, ranging from zero-shot classification to zero-shot error correction. Our model outperformed current state-of-the-art models on two medical image datasets, suggesting a novel clinical application of our monitoring AI model to alleviate human errors. Our method demonstrates a more specialized capacity for fine-grained understanding, a distinct advantage that is particularly applicable to the medical domain. © 2023

Related Researcher

Lee, Eun Sun: College of Medicine (Department of Medicine (Clinical, Seoul))
