Detailed Information

Cited 27 times in Web of Science; cited 35 times in Scopus

Large-scale incremental processing with MapReduce

Authors
Lee, Daewoo; Kim, Jin-Soo; Maeng, Seungryoul
Issue Date
Jul-2014
Publisher
ELSEVIER SCIENCE BV
Keywords
Big data processing; Incremental processing; MapReduce; Hadoop; Data deduplication
Citation
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE, v. 36, pp. 66-79
Indexed
SCIE
SCOPUS
Journal Title
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF GRID COMPUTING AND ESCIENCE
Volume
36
Start Page
66
End Page
79
URI
https://scholarworks.bwise.kr/skku/handle/2021.sw.skku/52409
DOI
10.1016/j.future.2013.09.010
ISSN
0167-739X
Abstract
An important property of today's big data processing is that the same computation is often repeated on datasets that evolve over time, such as web and social network data. While repeating the full computation over entire datasets is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Processing), a modified Hadoop architecture tailored to large-scale incremental processing with conventional MapReduce algorithms. Several approaches have been proposed to achieve a similar goal using task-level memoization. However, task-level memoization detects changes in datasets at a coarse-grained level, which often makes such approaches ineffective. Instead, HadUP detects and computes changes in datasets at a fine-grained level using a deduplication-based snapshot differential algorithm (D-SD) and update propagation. As a result, it provides high performance, especially in environments where task-level memoization has no benefit. HadUP requires only a small amount of extra programming effort because it can reuse the code of Hadoop's map and reduce functions. Therefore, developing HadUP applications is quite easy. (C) 2013 Elsevier B.V. All rights reserved.
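The abstract names two mechanisms, a snapshot differential and update propagation, without describing their internals. The following is a minimal, hypothetical sketch of the general idea under simplifying assumptions; it is not HadUP's D-SD algorithm. It assumes each snapshot is a list of text records, compares records by a SHA-1 content fingerprint (the paper's deduplication operates on its own data units, which this sketch does not model), and uses word count as the example job. The names `snapshot_diff`, `word_count_map`, `full_word_count`, and `propagate` are illustrative only.

```python
import hashlib
from collections import Counter


def fingerprint(record: str) -> str:
    # Content fingerprint used as the deduplication key (hash choice is an assumption).
    return hashlib.sha1(record.encode("utf-8")).hexdigest()


def snapshot_diff(old: list[str], new: list[str]) -> tuple[list[str], list[str]]:
    """Return (inserted, deleted) records between two snapshots.

    Records whose fingerprints appear in both snapshots are treated as duplicates
    and skipped; a modified record shows up as one deletion plus one insertion.
    """
    old_fp = Counter(fingerprint(r) for r in old)
    new_fp = Counter(fingerprint(r) for r in new)
    by_fp = {fingerprint(r): r for r in old + new}
    # Counter subtraction keeps only positive counts, i.e. records present on one side only.
    inserted = [by_fp[fp] for fp, n in (new_fp - old_fp).items() for _ in range(n)]
    deleted = [by_fp[fp] for fp, n in (old_fp - new_fp).items() for _ in range(n)]
    return inserted, deleted


def word_count_map(record: str):
    # An ordinary, unmodified map function: emits (word, 1) pairs.
    for word in record.split():
        yield word, 1


def full_word_count(snapshot: list[str]) -> Counter:
    # Baseline: full recomputation over the whole snapshot.
    result = Counter()
    for record in snapshot:
        for key, value in word_count_map(record):
            result[key] += value
    return result


def propagate(result: Counter, inserted: list[str], deleted: list[str]) -> Counter:
    # Update propagation: run the map function only on the delta records.
    # Deleted records contribute negative counts, which works because summation is invertible.
    delta = Counter()
    for record in inserted:
        for key, value in word_count_map(record):
            delta[key] += value
    for record in deleted:
        for key, value in word_count_map(record):
            delta[key] -= value
    updated = Counter(result)
    updated.update(delta)  # Counter.update adds counts, including negative ones
    return +updated  # unary + drops keys whose count fell to zero or below


if __name__ == "__main__":
    v1 = ["a b", "b c", "c d"]
    v2 = ["a b", "b c", "c e"]
    inserted, deleted = snapshot_diff(v1, v2)
    print(inserted, deleted)
    print(propagate(full_word_count(v1), inserted, deleted) == full_word_count(v2))
```

In this toy run only the changed record is mapped again, and the propagated result matches a full recomputation over the new snapshot. The negative-count trick relies on the reduce step being an invertible aggregate such as a sum; reducers without an inverse would instead need the affected keys to be re-reduced, which this sketch does not cover.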

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.