Data cleansing mechanisms and approaches for big data analytics: a systematic study

Authors
Hosseinzadeh, M.; Azhir, E.; Ahmed, O.H.; Ghafour, M.Y.; Ahmed, S.H.; Rahmani, A.M.; Vo, B.
Issue Date
Jan-2023
Publisher
Springer Heidelberg
Keywords
Big data; Data cleansing; Data quality; Methods; Review
Citation
Journal of Ambient Intelligence and Humanized Computing, v.14, no.1, pp. 99-111
Pages
13
Journal Title
Journal of Ambient Intelligence and Humanized Computing
Volume
14
Number
1
Start Page
99
End Page
111
URI
https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/90532
DOI
10.1007/s12652-021-03590-2
ISSN
1868-5137 (print)
1868-5145 (online)
Abstract
With the evolution of new technologies, the production of digital data keeps growing, so data management strategies are needed to handle large-scale datasets. Data gathered from sources such as sensor networks, social media, and business transactions is inherently uncertain because of noise, missing values, inconsistencies, and other problems that degrade the quality of big data analytics. A key challenge in this context is detecting and repairing dirty data, i.e., data cleansing, and various techniques have been proposed to address it. However, to the best of our knowledge, there has been no comprehensive review of data cleansing techniques for big data analytics. This survey therefore presents a comprehensive, systematic study of state-of-the-art big data cleansing mechanisms, grouped into five categories: machine learning-based, sample-based, expert-based, rule-based, and framework-based. A number of articles are reviewed in each category. The paper also outlines the advantages and disadvantages of the selected data cleansing techniques, discusses their related parameters, and compares them in terms of scalability, efficiency, accuracy, and usability. Finally, it offers suggestions for future work to improve big data cleansing mechanisms. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
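To make the "detect and repair dirty data" step concrete for the rule-based category, here is a minimal sketch using a single functional-dependency rule (zip code determines city). The column names, the sample rows, the helper names (`group_counts`, `detect_violations`, `repair`), and the majority-vote repair strategy are hypothetical choices for illustration only; they are not taken from the surveyed mechanisms, and real big data cleansing systems distribute this work and use far richer rule sets.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical record shape: column name -> value, with None marking a missing value.
Record = Dict[str, Optional[str]]


def group_counts(rows: List[Record], lhs: str, rhs: str) -> Dict[str, Counter]:
    """Count the observed rhs values for each lhs value, skipping missing cells."""
    counts: Dict[str, Counter] = {}
    for row in rows:
        key, value = row.get(lhs), row.get(rhs)
        if key is None or value is None:
            continue
        counts.setdefault(key, Counter())[value] += 1
    return counts


def detect_violations(rows: List[Record], lhs: str, rhs: str) -> Dict[str, Counter]:
    """Detect: an lhs value mapped to more than one rhs value breaks the rule lhs -> rhs."""
    return {k: c for k, c in group_counts(rows, lhs, rhs).items() if len(c) > 1}


def repair(rows: List[Record], lhs: str, rhs: str) -> List[Record]:
    """Repair: overwrite rhs with the most frequent value seen for the same lhs.

    This also fills a missing rhs cell whenever another row with the same lhs
    supplies a value. Ties go to the value encountered first.
    """
    counts = group_counts(rows, lhs, rhs)
    repaired: List[Record] = []
    for row in rows:
        fixed = dict(row)  # copy so the input rows are left untouched
        key = fixed.get(lhs)
        if key in counts:
            fixed[rhs] = counts[key].most_common(1)[0][0]
        repaired.append(fixed)
    return repaired


# Hypothetical dirty sample: one typo (inconsistency) and one missing value.
rows: List[Record] = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Nwe York"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "94105", "city": None},
]

violations = detect_violations(rows, "zip", "city")
clean = repair(rows, "zip", "city")
print(violations)
for record in clean:
    print(record)
```

Only zip 10001 is flagged, because it maps to two distinct city spellings; the repair step then rewrites the typo and fills the missing city for 94105. Majority voting is just one simple repair policy; it can pick the wrong value when dirty entries outnumber clean ones.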

Related Researcher

Hosseinzadeh, Mehdi
College of IT Convergence (Department of Software)