Detailed Information

Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model

Authors
Oh, Seo Hyun; Kang, Min; Lee, Young Ho
Issue Date
Jan-2022
Publisher
Korean Society of Medical Informatics
Keywords
Artificial Intelligence; Big Data; Medical Informatics; Data Anonymization; Deep Learning
Citation
Healthcare Informatics Research, v.28, no.1, pp.16-24
Journal Title
Healthcare Informatics Research
Volume
28
Number
1
Start Page
16
End Page
24
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/83803
DOI
10.4258/hir.2022.28.1.16
ISSN
2093-3681
Abstract
Objectives: De-identifying protected health information (PHI) in medical documents is important, and a prerequisite to de-identification is the identification of PHI entity names in clinical documents. This study aimed to compare the performance of three pre-training models that have recently attracted significant attention and to determine which model is more suitable for PHI recognition. Methods: We compared the PHI recognition performance of deep learning models using the i2b2 2014 dataset. Three pre-training models were used to detect PHI: bidirectional encoder representations from transformers (BERT), the robustly optimized BERT pre-training approach (RoBERTa), and XLNet (a model built on Transformer-XL). After the dataset was tokenized, it was labeled with an inside-outside-beginning (IOB) tagging scheme and WordPiece-tokenized before being fed into these models, and the PHI recognition performance of BERT, RoBERTa, and XLNet was evaluated. Results: Comparing the PHI recognition performance of the three models confirmed that XLNet achieved a superior F1-score of 96.29%. In the per-entity performance evaluation, RoBERTa and XLNet showed a 30% improvement in performance compared to BERT. Conclusions: Among the pre-training models used in this study, XLNet exhibited superior performance because its word embeddings were well constructed using the two-stream self-attention method. In addition, RoBERTa and XLNet outperformed BERT, indicating that they were more effective in grasping context.
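
The methods above correspond to a standard token-classification fine-tuning pipeline. Below is a minimal illustrative sketch (not the authors' released code) using the Hugging Face transformers library, assuming a toy sentence, a reduced PHI label set, and the "bert-base-cased" checkpoint: word-level IOB tags are aligned to WordPiece subword positions, with special tokens and continuation pieces masked out of the loss.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Hypothetical reduced IOB label set covering a few i2b2 2014 PHI categories.
labels = ["O", "B-NAME", "I-NAME", "B-DATE", "I-DATE"]
label2id = {l: i for i, l in enumerate(labels)}
id2label = {i: l for l, i in label2id.items()}

# Any of the three compared encoders could be substituted here,
# e.g. "roberta-base" or "xlnet-base-cased".
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels), id2label=id2label, label2id=label2id
)

# Toy pre-tokenized sentence with word-level IOB tags (illustrative only).
words = ["John", "Smith", "was", "admitted", "on", "March", "3"]
word_tags = ["B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "I-DATE"]

# WordPiece may split one word into several pieces, so word-level tags
# must be aligned to subword positions; positions labeled -100 are
# ignored by the cross-entropy loss.
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned, prev = [], None
for wid in enc.word_ids(batch_index=0):
    if wid is None:            # special tokens such as [CLS] and [SEP]
        aligned.append(-100)
    elif wid != prev:          # first piece of a word keeps its tag
        aligned.append(label2id[word_tags[wid]])
    else:                      # later pieces of the same word are masked
        aligned.append(-100)
    prev = wid

out = model(**enc, labels=torch.tensor([aligned]))
print(out.loss.item(), out.logits.shape)  # this loss drives fine-tuning

In a full experiment this loss would be minimized over the i2b2 2014 training split. Note that RoBERTa (byte-level BPE) and XLNet (SentencePiece) use different subword schemes than BERT's WordPiece, but the same word_ids-based alignment applies to all three.
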
Files in This Item
There are no files associated with this item.
Appears in
Collections
College of IT Convergence > Department of Computer Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Lee, Young Ho
College of IT Convergence, School of Computer Engineering (Computer Engineering Major)