Parallel Top-K Similarity Join Algorithms Using MapReduce

Authors
Kim, Younghoon; Shim, Kyuseok
Issue Date
Apr-2012
Publisher
IEEE
Citation
Proceedings - International Conference on Data Engineering (ICDE 2012), pp. 510-521
Pages
12
Indexed
SCIE
SCOPUS
Journal Title
Proceedings - International Conference on Data Engineering (ICDE 2012)
Start Page
510
End Page
521
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/36305
DOI
10.1109/ICDE.2012.87
ISSN
1063-6382
2375-026X
Abstract
Many applications require finding the top-k most similar pairs of records in a given database. Computing such top-k similarity joins is challenging, however, because applications increasingly need to handle vast amounts of data. For such data-intensive applications, parallel execution of programs on large clusters of commodity machines using the MapReduce paradigm has recently received a lot of attention. In this paper, we investigate how top-k similarity join algorithms can benefit from the popular MapReduce framework. We first develop divide-and-conquer and branch-and-bound algorithms. We next propose the all pair partitioning and essential pair partitioning methods to minimize the amount of data transferred between the map and reduce functions. Finally, we perform experiments with both synthetic and real-life data sets. Our performance study confirms the effectiveness and scalability of our MapReduce algorithms.
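
To make the problem statement concrete, the following is a minimal single-machine sketch of a top-k similarity join. It is not the paper's divide-and-conquer, branch-and-bound, or partitioning method; it is only the brute-force baseline that compares every pair. The similarity measure (Jaccard over token sets) and the names `jaccard` and `top_k_similar_pairs` are assumptions made for illustration, since the abstract does not specify them.

```python
import heapq
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 when both are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_k_similar_pairs(records, k):
    """Return the k most similar record pairs as (similarity, i, j) tuples.

    Hypothetical brute-force baseline: scores all n*(n-1)/2 pairs and keeps
    the k largest, highest similarity first. Ties are ordered by the index
    pair, because the tuples are compared element by element.
    """
    scored = (
        (jaccard(records[i], records[j]), i, j)
        for i, j in combinations(range(len(records)), 2)
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Illustrative records, each represented as a set of tokens.
    records = [
        {"a", "b", "c"},
        {"a", "b", "d"},
        {"x", "y"},
        {"a", "b", "c", "d"},
    ]
    for similarity, i, j in top_k_similar_pairs(records, 2):
        print(f"records {i} and {j}: similarity {similarity:.2f}")
```

The quadratic number of pair comparisons in this baseline is what makes the problem hard at scale, and it is the cost that a parallel formulation has to distribute across machines.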
Appears in Collections
COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

Kim, Young hoon
ERICA College of Computing (DEPARTMENT OF ARTIFICIAL INTELLIGENCE)