An efficient framework for semantically-correlated term detection and sanitization in clinical documents
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Moqurrab, Syed Atif | - |
dc.contributor.author | Anjum, Adeel | - |
dc.contributor.author | Tariq, Noshina | - |
dc.contributor.author | Srivastava, Gautam | - |
dc.date.accessioned | 2023-07-12T08:40:16Z | - |
dc.date.available | 2023-07-12T08:40:16Z | - |
dc.date.created | 2023-07-12 | - |
dc.date.issued | 2022-05-01 | - |
dc.identifier.issn | 0045-7906 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/88491 | - |
dc.description.abstract | Before clinical documents can be shared or published, two main challenges must be addressed: privacy protection and confidentiality protection. Under the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), even a few terms can pose privacy threats. Confidentiality threats, by contrast, have not been fully explored, owing to the complexity and sheer number of clinical terms and phrases. Current approaches use information-theoretic techniques to detect and sanitize risky semantically-correlated terms. However, they suffer from language ambiguity and non-monotonic behavior, and they require pre-trained classifiers and human tagging. This paper offers a generic and adaptable method for protecting risky terms in clinical data, using word embeddings (Word2Vec and BERT) for risky-term detection and comparative analysis. The methodology uses the WordNet taxonomy to minimize a document's semantic and utility loss by substituting privacy-preserving generalizations for disclosive words, and it eliminates manual data tagging. The results show significant protection and utility preservation compared with information-theoretic approaches. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | PERGAMON-ELSEVIER SCIENCE LTD | - |
dc.relation.isPartOf | COMPUTERS & ELECTRICAL ENGINEERING | - |
dc.title | An efficient framework for semantically-correlated term detection and sanitization in clinical documents | - |
dc.type | Article | - |
dc.type.rims | ART | - |
dc.description.journalClass | 1 | - |
dc.identifier.wosid | 000798029100003 | - |
dc.identifier.doi | 10.1016/j.compeleceng.2022.107985 | - |
dc.identifier.bibliographicCitation | COMPUTERS & ELECTRICAL ENGINEERING, v.100 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85129306146 | - |
dc.citation.title | COMPUTERS & ELECTRICAL ENGINEERING | - |
dc.citation.volume | 100 | - |
dc.contributor.affiliatedAuthor | Moqurrab, Syed Atif | - |
dc.type.docType | Article | - |
dc.subject.keywordAuthor | Machine learning | - |
dc.subject.keywordAuthor | Data privacy | - |
dc.subject.keywordAuthor | Unsupervised learning | - |
dc.subject.keywordAuthor | Semantically-correlated terms | - |
dc.subject.keywordAuthor | Detection | - |
dc.subject.keywordAuthor | Sanitization | - |
dc.subject.keywordAuthor | Utility-preservation | - |
dc.subject.keywordAuthor | Clinical documents | - |
dc.subject.keywordAuthor | Clinical data privacy | - |
dc.subject.keywordAuthor | Word embedding | - |
dc.subject.keywordPlus | PRIVACY | - |
dc.subject.keywordPlus | PROTECTION | - |
dc.subject.keywordPlus | MODEL | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Hardware & Architecture | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Interdisciplinary Applications | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
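
The abstract describes detecting terms that are semantically correlated with a sensitive term through word embeddings, then replacing them with WordNet generalizations. The sketch below is only an illustration of that general idea and is not the authors' implementation. The toy vectors, the hypernym map, the `sanitize` function, and the 0.9 similarity threshold are all hypothetical stand-ins for real Word2Vec/BERT embeddings and a real WordNet lookup.

```python
import math

# Made-up 3-dimensional vectors standing in for Word2Vec/BERT embeddings.
TOY_EMBEDDINGS = {
    "hiv": [0.9, 0.1, 0.0],
    "antiretroviral": [0.8, 0.2, 0.1],
    "zidovudine": [0.85, 0.15, 0.05],
    "headache": [0.1, 0.9, 0.2],
    "patient": [0.2, 0.3, 0.9],
}

# Made-up generalizations standing in for a WordNet hypernym lookup.
TOY_HYPERNYMS = {
    "hiv": "infection",
    "antiretroviral": "medication",
    "zidovudine": "medication",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sanitize(tokens, sensitive_term, threshold=0.9):
    """Generalize the sensitive term and any token whose embedding is
    at least `threshold` cosine-similar to it (assumed cutoff)."""
    target = TOY_EMBEDDINGS[sensitive_term]
    result = []
    for token in tokens:
        key = token.lower()
        vector = TOY_EMBEDDINGS.get(key)
        risky = key == sensitive_term or (
            vector is not None and cosine(vector, target) >= threshold
        )
        if risky:
            # Fall back to a placeholder when no generalization is known.
            result.append(TOY_HYPERNYMS.get(key, "[REDACTED]"))
        else:
            result.append(token)
    return result


note = ["Patient", "takes", "zidovudine", "for", "HIV", "and", "reports", "headache"]
sanitized = sanitize(note, "hiv")
print(" ".join(sanitized))
```

With these toy values, only the sensitive term and the drug name closely correlated with it are generalized, while unrelated clinical terms such as "headache" are left intact. This is the utility-preservation trade-off the abstract refers to.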