An efficient framework for semantically-correlated term detection and sanitization in clinical documents
- Authors
- Moqurrab, Syed Atif; Anjum, Adeel; Tariq, Noshina; Srivastava, Gautam
- Issue Date
- 1-May-2022
- Publisher
- PERGAMON-ELSEVIER SCIENCE LTD
- Keywords
- Machine learning; Data privacy; Unsupervised learning; Semantically-correlated terms; Detection; Sanitization; Utility-preservation; Clinical documents; Clinical data privacy; Word embedding
- Citation
- COMPUTERS & ELECTRICAL ENGINEERING, v.100
- Journal Title
- COMPUTERS & ELECTRICAL ENGINEERING
- Volume
- 100
- URI
- https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/88491
- DOI
- 10.1016/j.compeleceng.2022.107985
- ISSN
- 0045-7906
- Abstract
- In clinical documents, privacy and confidentiality protection are the two main challenges before sharing or publishing data. According to the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), even a few terms can cause privacy threats. In contrast, confidentiality threats have not been fully explored because of the complex nature and massive number of clinical terms and phrases. Current approaches use information-theoretic techniques to detect and sanitize risky semantically-correlated terms. However, they suffer from language ambiguity and non-monotonic behavior, and they depend on pre-trained classifiers and human tagging to construct classifiers. This paper offers a generic and adaptable method for protecting risky terms in clinical data, using word embeddings (Word2Vec and BERT) for risky term detection and comparative analysis. Our methodology uses the WordNet taxonomy to minimize a document's semantic and utility loss by substituting privacy-preserving generalizations for disclosive words, and it eliminates manual data tagging. The results show significant protection and utility preservation compared to information-theoretic approaches.
- Appears in Collections
- ETC > 1. Journal Articles