Large-scale incremental processing with MapReduce
- Authors
- Lee, Daewoo; Kim, Jin-Soo; Maeng, Seungryoul
- Issue Date
- Jul-2014
- Publisher
- ELSEVIER SCIENCE BV
- Keywords
- Big data processing; Incremental processing; MapReduce; Hadoop; Data deduplication
- Citation
- FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE, v.36, pp.66 - 79
- Indexed
- SCIE; SCOPUS
- Journal Title
- FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE
- Volume
- 36
- Start Page
- 66
- End Page
- 79
- URI
- https://scholarworks.bwise.kr/skku/handle/2021.sw.skku/52409
- DOI
- 10.1016/j.future.2013.09.010
- ISSN
- 0167-739X
- Abstract
- An important property of today's big data processing is that the same computation is often repeated on datasets evolving over time, such as web and social network data. While repeating full computation of the entire datasets is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Processing), a modified Hadoop architecture tailored to large-scale incremental processing with conventional MapReduce algorithms. Several approaches have been proposed to achieve a similar goal using task-level memoization. However, task-level memoization detects the change of datasets at a coarse-grained level, which often makes such approaches ineffective. Instead, HadUP detects and computes the change of datasets at a fine-grained level using a deduplication-based snapshot differential algorithm (D-SD) and update propagation. As a result, it provides high performance, especially in an environment where task-level memoization has no benefit. HadUP requires only a small amount of extra programming cost because it can reuse the code for the map and reduce functions of Hadoop. Therefore, the development of HadUP applications is quite easy. (C) 2013 Elsevier B.V. All rights reserved.
- Appears in Collections
- Software > Computer Science and Engineering > 1. Journal Articles
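
The abstract contrasts recomputing over a whole dataset with detecting fine-grained changes between snapshots and propagating only those changes. The sketch below illustrates that general idea on a word-count job. It is not the paper's D-SD algorithm or HadUP's implementation: the multiset-based diff is a simple stand-in for deduplication-based snapshot differencing, and every function name (`snapshot_diff`, `map_words`, `full_word_count`, `propagate_update`) is hypothetical and chosen for illustration only.

```python
from collections import Counter


def snapshot_diff(old_records, new_records):
    """Return (inserted, deleted) record multisets between two snapshots.

    Assumption: records are compared whole, as a multiset difference.
    This is a stand-in for a real snapshot differential algorithm.
    """
    old_counts = Counter(old_records)
    new_counts = Counter(new_records)
    inserted = new_counts - old_counts  # records that appear only in the new snapshot
    deleted = old_counts - new_counts   # records that appear only in the old snapshot
    return inserted, deleted


def map_words(record):
    """A conventional word-count map function: emit (word, 1) pairs."""
    for word in record.split():
        yield word, 1


def full_word_count(records):
    """Baseline: recompute the result from the entire dataset."""
    counts = Counter()
    for record in records:
        for word, n in map_words(record):
            counts[word] += n
    return counts


def propagate_update(previous_result, inserted, deleted):
    """Update a previous result using only the changed records.

    The same map function is reused; inserted records add to the counts
    and deleted records subtract from them.
    """
    result = Counter(previous_result)
    for record, times in inserted.items():
        for word, n in map_words(record):
            result[word] += n * times
    for record, times in deleted.items():
        for word, n in map_words(record):
            result[word] -= n * times
    # Drop words whose count fell to zero so the result matches a full recount.
    return Counter({word: c for word, c in result.items() if c > 0})
```

For example, if the old snapshot is `["a b", "b c", "c d"]` and the new one is `["a b", "b c", "d e"]`, only one inserted and one deleted record are processed, and the updated counts equal those of a full recount over the new snapshot. Reusing `map_words` unchanged mirrors the abstract's point that the existing map and reduce code can be kept.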