Detailed Information

Efficient Top-k algorithms for approximate substring matching

Authors
Kim, Younghoon; Shim, Kyuseok
Issue Date
Jun-2013
Publisher
ACM
Keywords
Edit distance; Inverted q-gram index; Top-k approximate substring matching
Citation
Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.385 - 396
Indexed
SCOPUS
Journal Title
Proceedings of the ACM SIGMOD International Conference on Management of Data
Start Page
385
End Page
396
URI
https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/29263
DOI
10.1145/2463676.2465324
ISSN
0730-8078
Abstract
Many applications need to query a large text database for similar strings or substrings. Traditional approximate substring matching requires the user to specify a similarity threshold. Without top-k approximate substring matching, users must repeatedly try different maximum distance thresholds when the proper threshold is not known in advance. In our paper, we propose efficient algorithms for finding the top-k approximate substring matches of a given query string in a set of data strings. To reduce the number of expensive distance computations, the proposed algorithms use our novel filtering techniques, which take advantage of q-grams and the available inverted q-gram indexes. We conduct extensive experiments with real-life data sets. Our experimental results confirm the effectiveness and scalability of our proposed algorithms. Copyright © 2013 ACM.
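To make the problem statement in the abstract concrete, the following is a minimal sketch of top-k approximate substring matching under edit distance. It is not the authors' algorithm: it uses the standard dynamic program for substring edit distance and the well-known q-gram count filter (a query of length m within distance tau of a substring shares at least m - q + 1 - q*tau q-grams with it), with the threshold tightened to the current k-th best distance. The function names, the default q=2, and the sample strings are assumptions made for illustration, and no inverted index is built here.

```python
import heapq


def substring_edit_distance(query, text):
    """Minimum edit distance between query and any substring of text."""
    # Row 0 is all zeros: a match may start at any position in text.
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, start=1):
        curr = [i]  # query[:i] against an empty substring costs i edits
        for j, tc in enumerate(text, start=1):
            curr.append(min(prev[j] + 1,                # skip a query char
                            curr[j - 1] + 1,            # skip a text char
                            prev[j - 1] + (qc != tc)))  # match or substitute
        prev = curr
    return min(prev)  # a match may end at any position in text


def qgrams(s, q):
    """All overlapping substrings of length q, in order."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def top_k_substring_matches(query, data, k, q=2):
    """Return up to k (distance, index) pairs with the smallest distances.

    Hypothetical helper for illustration only; ties keep the earlier string.
    """
    if k <= 0:
        return []
    query_grams = qgrams(query, q)
    heap = []  # max-heap on distance, stored as (-distance, -index)
    for idx, text in enumerate(data):
        if len(heap) == k:
            # A new string is useful only if it beats the k-th best distance.
            tau = -heap[0][0] - 1
            text_grams = set(qgrams(text, q))
            shared = sum(g in text_grams for g in query_grams)
            # Count filter: too few shared q-grams means distance > tau.
            if shared < len(query_grams) - q * tau:
                continue
        d = substring_edit_distance(query, text)
        if len(heap) < k:
            heapq.heappush(heap, (-d, -idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, -idx))
    return sorted((-nd, -ni) for nd, ni in heap)


data = ["approximate matching", "substring search", "aproximate", "top-k query"]
result = top_k_substring_matches("approximate", data, k=2)
```

With the sample strings above, `result` is `[(0, 0), (1, 2)]`: the first string contains the query exactly, and the third is one deletion away. The last string shares no 2-grams with the query, so the filter skips its distance computation once two candidates are already held.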
Appears in Collections
COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles
