Efficient Top-k algorithms for approximate substring matching
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Kim, Younghoon | - |
dc.contributor.author | Shim, Kyuseok | - |
dc.date.accessioned | 2021-06-23T04:23:38Z | - |
dc.date.available | 2021-06-23T04:23:38Z | - |
dc.date.created | 2021-01-22 | - |
dc.date.issued | 2013-06 | - |
dc.identifier.issn | 0730-8078 | - |
dc.identifier.uri | https://scholarworks.bwise.kr/erica/handle/2021.sw.erica/29263 | - |
dc.description.abstract | A wide range of applications require querying a large database of texts for similar strings or substrings. Traditional approximate substring matching requires the user to specify a similarity threshold. Without top-k approximate substring matching, users have to repeatedly try different maximum distance thresholds when the proper threshold is not known in advance. In our paper, we first propose efficient algorithms for finding the top-k approximate substring matches of a given query string in a set of data strings. To reduce the number of expensive distance computations, the proposed algorithms use our novel filtering techniques, which take advantage of q-grams and available inverted q-gram indexes. We conduct extensive experiments with real-life data sets, and the results confirm the effectiveness and scalability of the proposed algorithms. Copyright © 2013 ACM. | - |
dc.language | English | - |
dc.language.iso | en | - |
dc.publisher | ACM | - |
dc.title | Efficient Top-k algorithms for approximate substring matching | - |
dc.type | Article | - |
dc.contributor.affiliatedAuthor | Kim, Younghoon | - |
dc.identifier.doi | 10.1145/2463676.2465324 | - |
dc.identifier.scopusid | 2-s2.0-84880546189 | - |
dc.identifier.bibliographicCitation | Proceedings of the ACM SIGMOD International Conference on Management of Data, pp.385 - 396 | - |
dc.relation.isPartOf | Proceedings of the ACM SIGMOD International Conference on Management of Data | - |
dc.citation.title | Proceedings of the ACM SIGMOD International Conference on Management of Data | - |
dc.citation.startPage | 385 | - |
dc.citation.endPage | 396 | - |
dc.type.rims | ART | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.description.isOpenAccess | N | - |
dc.description.journalRegisteredClass | scopus | - |
dc.subject.keywordPlus | Distance computation | - |
dc.subject.keywordPlus | Edit distance | - |
dc.subject.keywordPlus | Filtering technique | - |
dc.subject.keywordPlus | Q-gram indices | - |
dc.subject.keywordPlus | Real life datasets | - |
dc.subject.keywordPlus | Similarity threshold | - |
dc.subject.keywordPlus | Substring | - |
dc.subject.keywordPlus | Substring matches | - |
dc.subject.keywordPlus | Query processing | - |
dc.subject.keywordPlus | Algorithms | - |
dc.subject.keywordAuthor | Edit distance | - |
dc.subject.keywordAuthor | Inverted q-gram index | - |
dc.subject.keywordAuthor | Top-k approximate substring matching | - |
dc.identifier.url | https://dl.acm.org/doi/abs/10.1145/2463676.2465324? | - |
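The abstract describes the problem but not the algorithms themselves. The following is a minimal Python sketch of the problem setting only, under stated assumptions: unit-cost edit distance, one best-matching substring per data string, and the classic q-gram count lemma (if a substring is within edit distance d of a length-m query, they share at least (m - q + 1) - q·d q-grams) as a stand-in lower-bound filter over an inverted q-gram index. This is not the authors' filtering technique, and all function names are hypothetical.

```python
import heapq
from collections import Counter, defaultdict


def qgrams(s: str, q: int) -> Counter:
    """Multiset of the overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def best_substring_distance(query: str, text: str) -> int:
    """Minimum edit distance between query and any substring of text.

    Semi-global DP: row 0 is all zeros, so a match may start anywhere in
    text; taking the minimum of the last row lets it end anywhere.
    """
    prev = [0] * (len(text) + 1)              # D[0][j] = 0
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(text)           # D[i][0] = i
        for j in range(1, len(text) + 1):
            cost = 0 if query[i - 1] == text[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # query char unmatched
                         cur[j - 1] + 1,      # text char unmatched
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return min(prev)


def build_index(strings: list[str], q: int) -> dict:
    """Hypothetical inverted q-gram index: q-gram -> {string id: count}."""
    index = defaultdict(dict)
    for sid, s in enumerate(strings):
        for g, c in qgrams(s, q).items():
            index[g][sid] = c
    return index


def top_k_substring_matches(query: str, strings: list[str], index: dict,
                            q: int, k: int) -> list[tuple[int, int]]:
    """Return up to k (distance, string id) pairs, smallest distance first."""
    m = len(query)
    # Multiset overlap between the query's q-grams and each string's q-grams.
    common = defaultdict(int)
    for g, qc in qgrams(query, q).items():
        for sid, c in index.get(g, {}).items():
            common[sid] += min(qc, c)

    def lower_bound(sid: int) -> int:
        # q-gram lemma rearranged: d >= ceil((m - q + 1 - overlap) / q).
        return max(0, -(-(m - q + 1 - common[sid]) // q))

    heap = []  # max-heap of the current best k, stored as (-distance, id)
    for sid in sorted(range(len(strings)), key=lower_bound):
        if len(heap) == k and lower_bound(sid) >= -heap[0][0]:
            break  # bounds are sorted, so no remaining string can do better
        d = best_substring_distance(query, strings[sid])
        if len(heap) < k:
            heapq.heappush(heap, (-d, sid))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, sid))
    return sorted((-nd, sid) for nd, sid in heap)
```

Processing candidates in ascending lower-bound order lets the loop stop as soon as the bound reaches the current k-th best distance, so only a few exact DP computations are performed; this mirrors the abstract's stated goal of reducing expensive distance computations, though the paper's actual filters may differ.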