Data cleansing mechanisms and approaches for big data analytics: a systematic study
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Hosseinzadeh, M. | - |
dc.contributor.author | Azhir, E. | - |
dc.contributor.author | Ahmed, O.H. | - |
dc.contributor.author | Ghafour, M.Y. | - |
dc.contributor.author | Ahmed, S.H. | - |
dc.contributor.author | Rahmani, A.M. | - |
dc.contributor.author | Vo, B. | - |
dc.date.accessioned | 2024-03-04T12:00:18Z | - |
dc.date.available | 2024-03-04T12:00:18Z | - |
dc.date.issued | 2023-01 | - |
dc.identifier.issn | 1868-5137 | - |
dc.identifier.issn | 1868-5145 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/gachon/handle/2020.sw.gachon/90532 | - |
dc.description.abstract | With the evolution of new technologies, the production of digital data is constantly growing. It is thus necessary to develop data management strategies to handle large-scale datasets. Data gathered from different sources, such as sensor networks, social media, and business transactions, is inherently uncertain due to noise, missing values, inconsistencies, and other problems that impact the quality of big data analytics. One of the key challenges in this context is detecting and repairing dirty data, i.e., data cleansing, and various techniques have been presented to address it. However, to the best of our knowledge, there has not been any comprehensive review of data cleansing techniques for big data analytics. This survey therefore presents a comprehensive and systematic study of state-of-the-art mechanisms for big data cleansing. The mechanisms are reviewed in five categories: machine learning-based, sample-based, expert-based, rule-based, and framework-based. A number of articles are reviewed in each category. Furthermore, this paper describes the advantages and disadvantages of the chosen data cleansing techniques and discusses the related parameters, comparing the techniques in terms of scalability, efficiency, accuracy, and usability. Finally, some suggestions for further work are provided to improve big data cleansing mechanisms in the future. © 2021, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature. | - |
dc.format.extent | 13 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | SPRINGER HEIDELBERG | - |
dc.title | Data cleansing mechanisms and approaches for big data analytics: a systematic study | - |
dc.type | Article | - |
dc.identifier.wosid | 000719732500001 | - |
dc.identifier.doi | 10.1007/s12652-021-03590-2 | - |
dc.identifier.bibliographicCitation | Journal of Ambient Intelligence and Humanized Computing, v.14, no.1, pp 99 - 111 | - |
dc.description.isOpenAccess | N | - |
dc.identifier.scopusid | 2-s2.0-85119189953 | - |
dc.citation.endPage | 111 | - |
dc.citation.startPage | 99 | - |
dc.citation.title | Journal of Ambient Intelligence and Humanized Computing | - |
dc.citation.volume | 14 | - |
dc.citation.number | 1 | - |
dc.type.docType | Article; Early Access | - |
dc.publisher.location | Germany | - |
dc.subject.keywordAuthor | Big data | - |
dc.subject.keywordAuthor | Data cleansing | - |
dc.subject.keywordAuthor | Data quality | - |
dc.subject.keywordAuthor | Methods | - |
dc.subject.keywordAuthor | Review | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Telecommunications | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Artificial Intelligence | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Information Systems | - |
dc.relation.journalWebOfScienceCategory | Telecommunications | - |
dc.description.journalRegisteredClass | scopus | - |