Parallel computation of k-nearest neighbor joins using MapReduce
- Authors
- Kim, Wooyeol; Kim, Younghoon; Shim, Kyuseok
- Issue Date
- Dec-2016
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Keywords
- Hadoop; kNN joins; MapReduce
- Citation
- Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, pp 696 - 705
- Pages
- 10
- Indexed
- SCIE; SCOPUS
- Journal Title
- Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
- Start Page
- 696
- End Page
- 705
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/15597
- DOI
- 10.1109/BigData.2016.7840662
- Abstract
- The k-nearest neighbor (kNN) join has recently attracted considerable attention due to its broad applications. However, processing kNN joins is very expensive due to the quadratic nature of the join operation. Furthermore, since there is an increasing trend of applications to deal with big data, computing kNN joins becomes more challenging. In order to process such big data, parallel and distributed computing using MapReduce recently have received a lot of attention. In this paper, we propose the efficient parallel algorithm KNN-MR to process the kNN joins using MapReduce. To reduce not only the computational cost of kNN joins but also the network cost of communicating across machines, we develop the novel vector projection pruning which enables us to identify non-kNN points that are guaranteed not to be included in the result of a kNN join. Our performance study confirms the effectiveness and scalability of the proposed algorithm. © 2016 IEEE.
- Appears in Collections
- COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles
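
The abstract attributes the cost of a kNN join to its quadratic nature. As a rough illustration only, the sketch below shows a single-machine brute-force baseline in Python. It is not KNN-MR and does not implement the vector projection pruning, neither of which this record describes in detail. The function name `knn_join`, the Euclidean distance, and the sample points are assumptions made for the example.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def knn_join(r: List[Point], s: List[Point], k: int) -> Dict[int, List[Tuple[float, int]]]:
    """Brute-force kNN join (hypothetical helper, not the paper's algorithm).

    For every point in r, return its k nearest points in s as
    (distance, index-in-s) pairs, ordered from nearest to farthest.
    """
    result: Dict[int, List[Tuple[float, int]]] = {}
    for i, p in enumerate(r):
        # One distance per (p, q) pair, so len(r) * len(s) evaluations overall.
        # This is the quadratic cost the abstract refers to.
        dists = ((math.dist(p, q), j) for j, q in enumerate(s))
        result[i] = heapq.nsmallest(k, dists)
    return result


# Made-up sample data for illustration.
r = [(0.0, 0.0), (5.0, 5.0)]
s = [(1.0, 0.0), (0.0, 2.0), (4.0, 5.0), (9.0, 9.0)]
neighbors = knn_join(r, s, k=2)

for i, pairs in neighbors.items():
    print(i, [j for _, j in pairs])
```

Every point of the first set is compared against every point of the second, which is the work that a pruning step such as the one named in the abstract would aim to avoid, both in computation and in data shipped between machines.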
