Detailed Information

Cited 4 times in Web of Science · Cited 4 times in Scopus

Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model

Full metadata record
DC Field | Value | Language
dc.contributor.author | 오서현 | -
dc.contributor.author | 강민 | -
dc.contributor.author | 이영호 | -
dc.date.accessioned | 2022-03-27T06:40:09Z | -
dc.date.available | 2022-03-27T06:40:09Z | -
dc.date.created | 2022-02-08 | -
dc.date.issued | 2022-01 | -
dc.identifier.issn | 2093-3681 | -
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/83803 | -
dc.description.abstract | Objectives: De-identifying protected health information (PHI) in medical documents is important, and a prerequisite to de-identification is the identification of PHI entity names in clinical documents. This study aimed to compare the performance of three pre-training models that have recently attracted significant attention and to determine which model is more suitable for PHI recognition. Methods: We compared the PHI recognition performance of deep learning models using the i2b2 2014 dataset. We used three pre-training models, namely bidirectional encoder representations from transformers (BERT), robustly optimized BERT pre-training approach (RoBERTa), and XLNet (a model built on Transformer-XL), to detect PHI. After the dataset was tokenized, it was processed using an inside-outside-beginning tagging scheme and WordPiece-tokenized to place it into these models. Further, the PHI recognition performance was investigated using BERT, RoBERTa, and XLNet. Results: Comparing the PHI recognition performance of the three models, it was confirmed that XLNet achieved a superior F1-score of 96.29%. In addition, in the per-entity performance evaluation, RoBERTa and XLNet showed a 30% improvement in performance compared to BERT. Conclusions: Among the pre-training models used in this study, XLNet exhibited superior performance because word embedding was well constructed using the two-stream self-attention method. In addition, compared to BERT, RoBERTa and XLNet showed superior performance, indicating that they were more effective in grasping the context. | -
dc.language | English | -
dc.language.iso | en | -
dc.publisher | 대한의료정보학회 (Korean Society of Medical Informatics) | -
dc.relation.isPartOf | Healthcare Informatics Research | -
dc.title | Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model | -
dc.type | Article | -
dc.type.rims | ART | -
dc.description.journalClass | 1 | -
dc.identifier.wosid | 000778831900003 | -
dc.identifier.doi | 10.4258/hir.2022.28.1.16 | -
dc.identifier.bibliographicCitation | Healthcare Informatics Research, v.28, no.1, pp.16 - 24 | -
dc.identifier.kciid | ART002809850 | -
dc.description.isOpenAccess | N | -
dc.identifier.scopusid | 2-s2.0-85126588497 | -
dc.citation.endPage | 24 | -
dc.citation.startPage | 16 | -
dc.citation.title | Healthcare Informatics Research | -
dc.citation.volume | 28 | -
dc.citation.number | 1 | -
dc.contributor.affiliatedAuthor | 오서현 | -
dc.contributor.affiliatedAuthor | 강민 | -
dc.contributor.affiliatedAuthor | 이영호 | -
dc.type.docType | Article | -
dc.subject.keywordAuthor | Artificial Intelligence | -
dc.subject.keywordAuthor | Big Data | -
dc.subject.keywordAuthor | Medical Informatics | -
dc.subject.keywordAuthor | Data Anonymization | -
dc.subject.keywordAuthor | Deep Learning | -
dc.relation.journalResearchArea | Medical Informatics | -
dc.relation.journalWebOfScienceCategory | Medical Informatics | -
dc.description.journalRegisteredClass | scopus | -
dc.description.journalRegisteredClass | kci | -
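
The abstract above outlines the fine-tuning pipeline: clinical text is labeled with an inside-outside-beginning (IOB) tagging scheme, split into subwords (WordPiece in the case of BERT), and passed to a pre-trained transformer for token classification. The snippet below is a minimal sketch of such a setup using the Hugging Face Transformers library; the model name, the reduced PHI label set, and the label-alignment example are illustrative assumptions, not the authors' actual configuration.

# Minimal sketch of IOB-based PHI token classification with a pre-trained transformer.
# Label set and example sentence are illustrative, not the i2b2 2014 schema in full.
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-NAME", "I-NAME", "B-DATE", "I-DATE", "B-LOCATION", "I-LOCATION"]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

model_name = "bert-base-cased"  # "roberta-base" or "xlnet-base-cased" would swap in the other two models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels), id2label=id2label, label2id=label2id
)

# Subword tokenization splits words into pieces; word_ids() maps each piece back to
# its source word so the word-level IOB labels can be aligned to subword positions.
words = ["Mr.", "Smith", "was", "admitted", "on", "01/02/2014"]
word_labels = ["O", "B-NAME", "O", "O", "O", "B-DATE"]
encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
aligned_labels = [
    -100 if word_id is None else label2id[word_labels[word_id]]
    for word_id in encoding.word_ids(batch_index=0)
]
# aligned_labels now holds one label id per subword (-100 marks special tokens so the
# loss ignores them); fine-tuning then proceeds with a standard training loop or Trainer.

Changing model_name reproduces the three-way comparison described in the abstract, with each model's tokenizer handling its own subword scheme (WordPiece for BERT, byte-level BPE for RoBERTa, SentencePiece for XLNet).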
Files in This Item
There are no files associated with this item.
Appears in Collections
College of IT Convergence > Department of Computer Engineering > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher


Lee, Young Ho
College of IT Convergence (School of Computer Engineering, Computer Engineering Major)