An efficient framework for semantically-correlated term detection and sanitization in clinical documents

Moqurrab, Syed Atif; Anjum, Adeel; Tariq, Noshina; Srivastava, Gautam

Detailed Information

Cited 1 time in Web of Science

Cited 3 times in Scopus

Full metadata record

DC Field	Value	Language
dc.contributor.author	Moqurrab, Syed Atif	-
dc.contributor.author	Anjum, Adeel	-
dc.contributor.author	Tariq, Noshina	-
dc.contributor.author	Srivastava, Gautam	-
dc.date.accessioned	2023-07-12T08:40:16Z	-
dc.date.available	2023-07-12T08:40:16Z	-
dc.date.created	2023-07-12	-
dc.date.issued	2022-05-01	-
dc.identifier.issn	0045-7906	-
dc.identifier.uri	https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/88491	-
dc.description.abstract	In clinical documents, privacy and confidentiality protection are the two main challenges before sharing or publishing data. According to the Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR), even a few terms can cause privacy threats. In retrospect, confidentiality threats are not fully explored due to the complex nature as well as massive number of clinical terms and phrases. Current approaches use information theoretic-based techniques to detect and sanitize risky semantically-correlated terms. However, they have language ambiguity and non-monotonic behavior, coupled with the fact that pre-trained classifiers and human-tagging are required to construct classifiers. This paper offers a generic and adaptable method for protecting risky terms in clinical data using word embedding (Word2Vec and BERT) for risky term detection and comparative analysis. Our methodology uses WordNet taxonomy to minimize a document's semantic and utility loss by substituting privacy-preserving generalization for disclosive words and by eliminating manual data tagging. The results show significant protection and utility-preservation, compared to information-theoretic approaches.	-
dc.language	English	-
dc.language.iso	en	-
dc.publisher	PERGAMON-ELSEVIER SCIENCE LTD	-
dc.relation.isPartOf	COMPUTERS & ELECTRICAL ENGINEERING	-
dc.title	An efficient framework for semantically-correlated term detection and sanitization in clinical documents	-
dc.type	Article	-
dc.type.rims	ART	-
dc.description.journalClass	1	-
dc.identifier.wosid	000798029100003	-
dc.identifier.doi	10.1016/j.compeleceng.2022.107985	-
dc.identifier.bibliographicCitation	COMPUTERS & ELECTRICAL ENGINEERING, v.100	-
dc.description.isOpenAccess	N	-
dc.identifier.scopusid	2-s2.0-85129306146	-
dc.citation.title	COMPUTERS & ELECTRICAL ENGINEERING	-
dc.citation.volume	100	-
dc.contributor.affiliatedAuthor	Moqurrab, Syed Atif	-
dc.type.docType	Article	-
dc.subject.keywordAuthor	Machine learning	-
dc.subject.keywordAuthor	Data privacy	-
dc.subject.keywordAuthor	Unsupervised learning	-
dc.subject.keywordAuthor	Semantically-correlated terms	-
dc.subject.keywordAuthor	Detection	-
dc.subject.keywordAuthor	Sanitization	-
dc.subject.keywordAuthor	Utility-preservation	-
dc.subject.keywordAuthor	Clinical documents	-
dc.subject.keywordAuthor	Clinical data privacy	-
dc.subject.keywordAuthor	Word embedding	-
dc.subject.keywordPlus	PRIVACY	-
dc.subject.keywordPlus	PROTECTION	-
dc.subject.keywordPlus	MODEL	-
dc.relation.journalResearchArea	Computer Science	-
dc.relation.journalResearchArea	Engineering	-
dc.relation.journalWebOfScienceCategory	Computer Science, Hardware & Architecture	-
dc.relation.journalWebOfScienceCategory	Computer Science, Interdisciplinary Applications	-
dc.relation.journalWebOfScienceCategory	Engineering, Electrical & Electronic	-
dc.description.journalRegisteredClass	scie	-
dc.description.journalRegisteredClass	scopus	-
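
The abstract above describes detecting risky, semantically correlated terms through word embeddings and replacing them with more general terms from a taxonomy. The following is only a minimal sketch of that general idea, not the authors' implementation: the three-dimensional vectors, the generalization map, the 0.8 threshold, and the names `risky_terms` and `sanitize` are all hypothetical stand-ins for a trained Word2Vec or BERT model and a WordNet lookup.

```python
import math

# Toy vectors standing in for learned word embeddings (hypothetical values).
EMBEDDINGS = {
    "hiv": [0.9, 0.1, 0.0],
    "antiretroviral": [0.8, 0.2, 0.1],
    "aspirin": [0.1, 0.9, 0.1],
    "headache": [0.0, 0.8, 0.3],
}

# Toy map from a term to a more general term, standing in for a
# taxonomy lookup such as WordNet hypernyms (hypothetical entries).
GENERALIZATIONS = {
    "hiv": "infection",
    "antiretroviral": "medication",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def risky_terms(tokens, sensitive, threshold=0.8):
    """Return tokens whose embedding is close to the sensitive term's embedding."""
    target = EMBEDDINGS[sensitive]
    return {
        t for t in tokens
        if t in EMBEDDINGS and cosine(EMBEDDINGS[t], target) >= threshold
    }


def sanitize(tokens, sensitive, threshold=0.8):
    """Replace risky tokens with a generalization, or a placeholder if none is known."""
    risky = risky_terms(tokens, sensitive, threshold)
    return [
        GENERALIZATIONS.get(t, "[REDACTED]") if t in risky else t
        for t in tokens
    ]


tokens = "patient takes antiretroviral and aspirin".split()
sanitized = sanitize(tokens, "hiv")
print(" ".join(sanitized))
```

In this toy setup only "antiretroviral" lies close enough to the sensitive term "hiv" to be generalized; tokens without a vector are left untouched. The threshold is an assumption chosen for the example and would need tuning against real embeddings.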

Files in This Item: There are no files associated with this item.

Appears in Collections: ETC > 1. Journal Articles

Related Researcher

Moqurrab, Syed Atif: College of IT Convergence (Department of Software)
