Detailed Information

Cited 1 time in Web of Science; cited 3 times in Scopus

An efficient framework for semantically-correlated term detection and sanitization in clinical documents

Authors
Moqurrab, Syed Atif; Anjum, Adeel; Tariq, Noshina; Srivastava, Gautam
Issue Date
1-May-2022
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
Keywords
Machine learning; Data privacy; Unsupervised learning; Semantically-correlated terms; Detection; Sanitization; Utility-preservation; Clinical documents; Clinical data privacy; Word embedding
Citation
COMPUTERS & ELECTRICAL ENGINEERING, v.100
Journal Title
COMPUTERS & ELECTRICAL ENGINEERING
Volume
100
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/88491
DOI
10.1016/j.compeleceng.2022.107985
ISSN
0045-7906
Abstract
In clinical documents, privacy and confidentiality protection are the two main challenges before sharing or publishing data. According to the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), even a few terms can cause privacy threats. In retrospect, confidentiality threats have not been fully explored, owing to the complexity and sheer number of clinical terms and phrases. Current approaches use information-theoretic techniques to detect and sanitize risky semantically-correlated terms. However, these approaches suffer from language ambiguity and non-monotonic behavior, and they require pre-trained classifiers and human tagging to construct classifiers. This paper offers a generic and adaptable method for protecting risky terms in clinical data, using word embeddings (Word2Vec and BERT) for risky term detection and comparative analysis. Our methodology uses the WordNet taxonomy to minimize a document's semantic and utility loss by replacing disclosive words with privacy-preserving generalizations, and it eliminates manual data tagging. The results show significant protection and utility preservation compared to information-theoretic approaches.
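
The abstract gives no implementation details, so the following is only a rough sketch of the general idea it describes: flag terms whose embeddings lie close to a sensitive term, then replace them with a more general term. Everything in it is an assumption made for illustration: the toy vectors stand in for Word2Vec or BERT embeddings, the `TOY_HYPERNYMS` map stands in for a WordNet taxonomy lookup, and the function names and the 0.9 threshold are hypothetical rather than taken from the paper.

```python
import math

# Toy 3-dimensional "embeddings" with made-up values (assumption: a real
# system would load vectors from a trained Word2Vec or BERT model).
TOY_VECTORS = {
    "hiv": [0.9, 0.1, 0.0],
    "antiretroviral": [0.8, 0.2, 0.1],
    "zidovudine": [0.85, 0.15, 0.05],
    "aspirin": [0.1, 0.9, 0.2],
    "appointment": [0.0, 0.1, 0.9],
}

# Hypothetical hypernym map standing in for a WordNet-style taxonomy.
TOY_HYPERNYMS = {
    "hiv": "infection",
    "antiretroviral": "medication",
    "zidovudine": "medication",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def detect_correlated(sensitive, tokens, threshold=0.9):
    """Return the tokens whose vector is at least `threshold`-similar
    to the vector of the sensitive term."""
    target = TOY_VECTORS[sensitive]
    return {
        t for t in tokens
        if t in TOY_VECTORS and cosine(TOY_VECTORS[t], target) >= threshold
    }


def sanitize(text, sensitive, threshold=0.9):
    """Replace risky tokens with a generalization, or a placeholder
    when the toy taxonomy has no entry for them."""
    tokens = text.lower().split()
    risky = detect_correlated(sensitive, tokens, threshold)
    return " ".join(
        TOY_HYPERNYMS.get(t, "[REDACTED]") if t in risky else t
        for t in tokens
    )


if __name__ == "__main__":
    note = "Patient takes zidovudine and aspirin before appointment"
    print(sanitize(note, "hiv"))
```

In this toy setup only `zidovudine` is close enough to `hiv` to be generalized, while `aspirin` and `appointment` are left untouched, which mirrors the utility-preservation goal stated in the abstract.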
Appears in Collections
ETC > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Moqurrab, Syed Atif
College of IT Convergence (Department of Software)