Parallel Top-K Similarity Join Algorithms Using MapReduce
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Younghoon | - |
dc.contributor.author | Shim, Kyuseok | - |
dc.date.accessioned | 2021-06-23T10:02:58Z | - |
dc.date.available | 2021-06-23T10:02:58Z | - |
dc.date.issued | 2012-04 | - |
dc.identifier.issn | 1063-6382 | - |
dc.identifier.issn | 2375-026X | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/36305 | - |
dc.description.abstract | A wide range of applications require finding the top-k most similar pairs of records in a given database. However, computing such top-k similarity joins is challenging, because a growing number of applications must handle vast amounts of data. For such data-intensive applications, parallel execution of programs on a large cluster of commodity machines using the MapReduce paradigm has recently received a lot of attention. In this paper, we investigate how top-k similarity join algorithms can benefit from the popular MapReduce framework. We first develop divide-and-conquer and branch-and-bound algorithms. We next propose the all pair partitioning and essential pair partitioning methods to minimize the amount of data transferred between the map and reduce functions. Finally, we run experiments on both synthetic and real-life data sets. Our performance study confirms the effectiveness and scalability of our MapReduce algorithms. | - |
dc.format.extent | 12 | - |
dc.language | English | - |
dc.language.iso | ENG | - |
dc.publisher | IEEE | - |
dc.title | Parallel Top-K Similarity Join Algorithms Using MapReduce | - |
dc.type | Article | - |
dc.publisher.location | United States | - |
dc.identifier.doi | 10.1109/ICDE.2012.87 | - |
dc.identifier.scopusid | 2-s2.0-84864248187 | - |
dc.identifier.wosid | 000309122100047 | - |
dc.identifier.bibliographicCitation | Proceedings - International Conference on Data Engineering (ICDE 2012), pp. 510-521 | - |
dc.citation.title | Proceedings - International Conference on Data Engineering (ICDE 2012) | - |
dc.citation.startPage | 510 | - |
dc.citation.endPage | 521 | - |
dc.type.docType | Proceedings Paper | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scie | - |
dc.description.journalRegisteredClass | scopus | - |
dc.relation.journalResearchArea | Computer Science | - |
dc.relation.journalResearchArea | Engineering | - |
dc.relation.journalWebOfScienceCategory | Computer Science, Theory & Methods | - |
dc.relation.journalWebOfScienceCategory | Engineering, Electrical & Electronic | - |
dc.identifier.url | https://ieeexplore.ieee.org/document/6228110 | - |