Integration of graphs from different data sources using crowdsourcing
- Authors
- Kim, Younghoon; Jung, Woohwan; Shim, Kyuseok
- Issue Date
- Apr-2017
- Publisher
- Elsevier BV
- Keywords
- Graph integration; Crowdsourcing; Entity resolution
- Citation
- Information Sciences, v.385, pp. 438-456
- Pages
- 19
- Indexed
- SCI; SCIE; SCOPUS
- Journal Title
- Information Sciences
- Volume
- 385
- Start Page
- 438
- End Page
- 456
- URI
- https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/10042
- DOI
- 10.1016/j.ins.2017.01.006
- ISSN
- 0020-0255; 1872-6291
- Abstract
- Data integration is the process of identifying pairs of records from different databases that refer to the same real-world entity. It has been studied extensively in the contexts of entity resolution, record linkage, duplicate detection, and network alignment. With the increasing use of crowdsourcing platforms as a means of answering queries manually at low cost, many studies have begun to consider ways to exploit crowdsourcing systems for efficient data integration. In this paper, we present an efficient algorithm to integrate two graphs collected from different sources using crowdsourcing systems. Given two graphs, we repeatedly select a query node from one graph and ask a human annotator to find its matching node in the other graph, that is, the node representing the same entity as the query node. The proposed method chooses the query node that would increase precision the most if it were labeled. Through experiments with both simulated answers and labels collected by real crowdsourcing, we show that our algorithm finds more accurate graph matches at a lower crowdsourcing cost than the baseline algorithms. (C) 2017 Elsevier Inc. All rights reserved.
- Appears in Collections
- COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles