Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 오서현 | - |
dc.contributor.author | 강민 | - |
dc.contributor.author | 이영호 | - |
dc.date.accessioned | 2022-03-27T06:40:09Z | - |
dc.date.available | 2022-03-27T06:40:09Z | - |
dc.date.created | 2022-02-08 | - |
dc.date.issued | 2022-01 | - |
dc.identifier.issn | 2093-3681 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/83803 | - |
dc.description.abstract | Objectives: De-identifying protected health information (PHI) in medical documents is important, and a prerequisite to de-identification is the identification of PHI entity names in clinical documents. This study aimed to compare the performance of three pre-training models that have recently attracted significant attention and to determine which model is more suitable for PHI recognition. Methods: We compared the PHI recognition performance of deep learning models using the i2b2 2014 dataset. We used three pre-training models, namely bidirectional encoder representations from transformers (BERT), robustly optimized BERT pre-training approach (RoBERTa), and XLNet (a model built on Transformer-XL), to detect PHI. After the dataset was tokenized, it was processed using an inside-outside-beginning tagging scheme and WordPiece-tokenized to place it into these models. Further, the PHI recognition performance was investigated using BERT, RoBERTa, and XLNet. Results: Comparing the PHI recognition performance of the three models, it was confirmed that XLNet had a superior F1-score of 96.29%. In addition, when checking PHI entity performance evaluation, RoBERTa and XLNet showed a 30% improvement in performance compared to BERT. Conclusions: Among the pre-training models used in this study, XLNet exhibited superior performance because word embedding was well constructed using the two-stream self-attention method. In addition, compared to BERT, RoBERTa and XLNet showed superior performance, indicating that they were more effective in grasping the context. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Korean Society of Medical Informatics (대한의료정보학회) | - |
dc.relation.isPartOf | Healthcare Informatics Research | - |
dc.title | Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000778831900003 | - |
dc.identifier.doi | 10.4258/hir.2022.28.1.16 | - |
dc.identifier.bibliographicCitation | Healthcare Informatics Research, v.28, no.1, pp.16 - 24 | - |
dc.identifier.kciid | ART002809850 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85126588497 | - |
dc.citation.endPage | 24 | - |
dc.citation.startPage | 16 | - |
dc.citation.title | Healthcare Informatics Research | - |
dc.citation.volume | 28 | - |
dc.citation.number | 1 | - |
dc.contributor.affiliatedAuthor | 오서현 | - |
dc.contributor.affiliatedAuthor | 강민 | - |
dc.contributor.affiliatedAuthor | 이영호 | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Artificial Intelligence | - |
dc.subject.keywordAuthor | Big Data | - |
dc.subject.keywordAuthor | Medical Informatics | - |
dc.subject.keywordAuthor | Data Anonymization | - |
dc.subject.keywordAuthor | Deep Learning | - |
dc.relation.journalResearchArea | Medical Informatics | - |
dc.relation.journalWebOfScienceCategory | Medical Informatics | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | kci | - |
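The abstract describes preparing the i2b2 2014 dataset with an inside-outside-beginning (IOB) tagging scheme before WordPiece tokenization and fine-tuning. The authors' preprocessing code is not part of this record; the following is a minimal, hypothetical sketch of IOB tagging over a tokenized clinical sentence, with invented example tokens and PHI spans for illustration only:

```python
def iob_tags(tokens, entities):
    """Assign inside-outside-beginning (IOB) tags to a token list.

    tokens:   list of word-level tokens.
    entities: list of (start, end_exclusive, label) spans over token indices,
              e.g. PHI categories such as NAME or DATE.
    """
    tags = ["O"] * len(tokens)  # outside by default
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # inside the entity
    return tags

# Hypothetical clinical-note fragment with two PHI spans (name and date)
tokens = ["Patient", "John", "Smith", "seen", "on", "2014-01-05", "."]
entities = [(1, 3, "NAME"), (5, 6, "DATE")]
print(iob_tags(tokens, entities))
# → ['O', 'B-NAME', 'I-NAME', 'O', 'O', 'B-DATE', 'O']
```

In a pipeline like the one the abstract outlines, these word-level tags would then be aligned to the subword pieces produced by each model's tokenizer (WordPiece for BERT) before fine-tuning for token classification.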