Efficient Top-k algorithms for approximate substring matching

Kim, Younghoon; Shim, Kyuseok

doi:10.1145/2463676.2465324

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kim, Younghoon	-
dc.contributor.author	Shim, Kyuseok	-
dc.date.accessioned	2021-06-23T04:23:38Z	-
dc.date.available	2021-06-23T04:23:38Z	-
dc.date.issued	2013-06	-
dc.identifier.issn	0730-8078	-
dc.identifier.uri	https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/29263	-
dc.description.abstract	There is a wide range of applications that require querying a large database of texts to search for similar strings or substrings. Traditional approximate substring matching requires the user to specify a similarity threshold. Without top-k approximate substring matching, users have to repeatedly try different maximum distance threshold values when the proper threshold is unknown in advance. In our paper, we first propose efficient algorithms for finding the top-k approximate substring matches with a given query string in a set of data strings. To reduce the number of expensive distance computations, the proposed algorithms utilize our novel filtering techniques, which take advantage of q-grams and the inverted q-gram indexes available. We conduct extensive experiments with real-life data sets. Our experimental results confirm the effectiveness and scalability of our proposed algorithms. Copyright © 2013 ACM.	-
dc.format.extent	12	-
dc.language	English	-
dc.language.iso	ENG	-
dc.publisher	ACM	-
dc.title	Efficient Top-k algorithms for approximate substring matching	-
dc.type	Article	-
dc.publisher.location	United States	-
dc.identifier.doi	10.1145/2463676.2465324	-
dc.identifier.scopusid	2-s2.0-84880546189	-
dc.identifier.bibliographicCitation	Proceedings of the ACM SIGMOD International Conference on Management of Data, pp 385 - 396	-
dc.citation.title	Proceedings of the ACM SIGMOD International Conference on Management of Data	-
dc.citation.startPage	385	-
dc.citation.endPage	396	-
dc.type.docType	Conference Paper	-
dc.description.isOpenAccess	N	-
dc.description.journalRegisteredClass	scopus	-
dc.subject.keywordPlus	Distance computation	-
dc.subject.keywordPlus	Edit distance	-
dc.subject.keywordPlus	Filtering technique	-
dc.subject.keywordPlus	Q-gram indices	-
dc.subject.keywordPlus	Real life datasets	-
dc.subject.keywordPlus	Similarity threshold	-
dc.subject.keywordPlus	Substring	-
dc.subject.keywordPlus	Substring matches	-
dc.subject.keywordPlus	Query processing	-
dc.subject.keywordPlus	Algorithms	-
dc.subject.keywordAuthor	Edit distance	-
dc.subject.keywordAuthor	Inverted q-gram index	-
dc.subject.keywordAuthor	Top-k approximate substring matching	-
dc.identifier.url	https://dl.acm.org/doi/abs/10.1145/2463676.2465324?	-
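
Illustrative sketch (not part of the record, and not the authors' algorithm): the abstract describes top-k approximate substring matching that uses q-grams and an inverted q-gram index to avoid expensive distance computations. The Python below is a minimal sketch of that general idea under stated assumptions. It ranks each data string by the smallest edit distance between the query and any of its substrings (Sellers' dynamic program), and it prunes candidates with the classic q-gram count bound (each edit can destroy at most q of the query's q-grams). All function names, the choice q=2, and the decision to rank whole data strings are hypothetical choices made for illustration; the paper's own filtering techniques are not reproduced here.

```python
import heapq
from collections import defaultdict


def substring_edit_distance(query, text):
    """Smallest edit distance between `query` and any substring of `text`."""
    # prev[j]: best distance between the query prefix processed so far and
    # some substring of `text` ending at position j. Row 0 is all zeros
    # because a match may start anywhere in `text`.
    prev = [0] * (len(text) + 1)
    for i, qc in enumerate(query, start=1):
        curr = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (qc != tc))
        prev = curr
    return min(prev)


def build_qgram_index(strings, q):
    """Inverted index: q-gram -> set of ids of the strings containing it."""
    index = defaultdict(set)
    for sid, s in enumerate(strings):
        for i in range(len(s) - q + 1):
            index[s[i:i + q]].add(sid)
    return index


def top_k_matches(query, strings, k, q=2):
    """Return up to k (distance, string) pairs with the smallest distances."""
    if k <= 0:
        return []
    index = build_qgram_index(strings, q)
    n_grams = max(len(query) - q + 1, 0)
    # shared[sid]: number of query q-gram positions whose q-gram occurs in string sid.
    shared = [0] * len(strings)
    for i in range(n_grams):
        for sid in index.get(query[i:i + q], ()):
            shared[sid] += 1
    # Visit the most promising strings first so the threshold tightens early.
    order = sorted(range(len(strings)), key=lambda sid: -shared[sid])
    heap = []  # max-heap on distance, stored as (-distance, sid)
    for sid in order:
        # Count bound: distance >= ceil((n_grams - shared) / q).
        lower_bound = max(0, -(-(n_grams - shared[sid]) // q))
        if len(heap) == k and lower_bound >= -heap[0][0]:
            break  # later strings share even fewer q-grams, so none can improve
        d = substring_edit_distance(query, strings[sid])
        if len(heap) < k:
            heapq.heappush(heap, (-d, sid))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, sid))
    return sorted((-neg_d, strings[sid]) for neg_d, sid in heap)


if __name__ == "__main__":
    data = ["approximate matching", "substring search", "top-k query",
            "aproximate string match", "database"]
    print(top_k_matches("approximate", data, k=2))
```

Because the k-th best distance found so far serves as the threshold, no similarity threshold has to be supplied by the caller, which is the usability point the abstract makes. Ties at the k-th distance are broken arbitrarily in this sketch.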

Appears in Collections: COLLEGE OF COMPUTING > DEPARTMENT OF ARTIFICIAL INTELLIGENCE > 1. Journal Articles

Related Researcher

Kim, Young hoon: ERICA College of Computing (DEPARTMENT OF ARTIFICIAL INTELLIGENCE)
