Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model
DC Field | Value | Language |
---|---|---|
dc.contributor.author | 오서현 | - |
dc.contributor.author | 강민 | - |
dc.contributor.author | 이영호 | - |
dc.date.accessioned | 2022-03-27T06:40:09Z | - |
dc.date.available | 2022-03-27T06:40:09Z | - |
dc.date.created | 2022-02-08 | - |
dc.date.issued | 2022-01 | - |
dc.identifier.issn | 2093-3681 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/83803 | - |
dc.description.abstract | Objectives: De-identifying protected health information (PHI) in medical documents is important, and a prerequisite to de-identification is the identification of PHI entity names in clinical documents. This study aimed to compare the performance of three pre-training models that have recently attracted significant attention and to determine which model is more suitable for PHI recognition. Methods: We compared the PHI recognition performance of deep learning models using the i2b2 2014 dataset. We used three pre-training models, namely bidirectional encoder representations from transformers (BERT), robustly optimized BERT pre-training approach (RoBERTa), and XLNet (a model built on Transformer-XL), to detect PHI. After the dataset was tokenized, it was processed using an inside-outside-beginning tagging scheme and WordPiece-tokenized to place it into these models. Further, the PHI recognition performance was investigated using BERT, RoBERTa, and XLNet. Results: Comparing the PHI recognition performance of the three models, it was confirmed that XLNet had a superior F1-score of 96.29%. In addition, when checking PHI entity performance evaluation, RoBERTa and XLNet showed a 30% improvement in performance compared to BERT. Conclusions: Among the pre-training models used in this study, XLNet exhibited superior performance because word embedding was well constructed using the two-stream self-attention method. In addition, compared to BERT, RoBERTa and XLNet showed superior performance, indicating that they were more effective in grasping the context. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | Korean Society of Medical Informatics (대한의료정보학회) | - |
dc.relation.isPartOf | Healthcare Informatics Research | - |
dc.title | Protected Health Information Recognition by Fine-Tuning a Pre-training Transformer Model | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000778831900003 | - |
dc.identifier.doi | 10.4258/hir.2022.28.1.16 | - |
dc.identifier.bibliographicCitation | Healthcare Informatics Research, v.28, no.1, pp.16 - 24 | - |
dc.identifier.kciid | ART002809850 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85126588497 | - |
dc.citation.endPage | 24 | - |
dc.citation.startPage | 16 | - |
dc.citation.title | Healthcare Informatics Research | - |
dc.citation.volume | 28 | - |
dc.citation.number | 1 | - |
dc.contributor.affiliatedAuthor | 오서현 | - |
dc.contributor.affiliatedAuthor | 강민 | - |
dc.contributor.affiliatedAuthor | 이영호 | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Artificial Intelligence | - |
dc.subject.keywordAuthor | Big Data | - |
dc.subject.keywordAuthor | Medical Informatics | - |
dc.subject.keywordAuthor | Data Anonymization | - |
dc.subject.keywordAuthor | Deep Learning | - |
dc.relation.journalResearchArea | Medical Informatics | - |
dc.relation.journalWebOfScienceCategory | Medical Informatics | - |
dc.description.journalRegisteredClass | scopus | - |
dc.description.journalRegisteredClass | kci | - |
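The abstract describes preparing the i2b2 2014 dataset with an inside-outside-beginning (IOB) tagging scheme before WordPiece tokenization and fine-tuning. The authors' preprocessing code is not part of this record; the following is a minimal, hypothetical sketch of IOB tagging over a tokenized clinical sentence, with invented example tokens and PHI spans for illustration only:

```python
def iob_tags(tokens, entities):
    """Assign inside-outside-beginning (IOB) tags to a token list.

    tokens:   list of word-level tokens.
    entities: list of (start, end_exclusive, label) spans over token indices,
              e.g. PHI categories such as NAME or DATE.
    """
    tags = ["O"] * len(tokens)  # outside by default
    for start, end, label in entities:
        tags[start] = f"B-{label}"          # beginning of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # inside the entity
    return tags

# Hypothetical clinical-note fragment with two PHI spans (name and date)
tokens = ["Patient", "John", "Smith", "seen", "on", "2014-01-05", "."]
entities = [(1, 3, "NAME"), (5, 6, "DATE")]
print(iob_tags(tokens, entities))
# → ['O', 'B-NAME', 'I-NAME', 'O', 'O', 'B-DATE', 'O']
```

In a pipeline like the one the abstract outlines, these word-level tags would then be aligned to the subword pieces produced by each model's tokenizer (WordPiece for BERT) before fine-tuning for token classification.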