Suffix tree of alignment: An efficient index for similar data
- Authors
- Na, Joong Chae; Park, Heejin; Crochemore, Maxime; Holub, Jan; Iliopoulos, Costas S.; Mouchard, Laurent; Park, Kunsoo
- Issue Date
- Jul-2013
- Publisher
- Springer Verlag
- Keywords
- alignments; Indexes for similar data; suffix trees
- Citation
- Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v.8288 LNCS, pp.337 - 348
- Indexed
- SCOPUS
- Journal Title
- Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
- Volume
- 8288 LNCS
- Start Page
- 337
- End Page
- 348
- URI
- https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/162359
- DOI
- 10.1007/978-3-642-45278-9_29
- ISSN
- 0302-9743
- Abstract
- We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes in A and B. It has |A| + |B| leaves and can be constructed in O(|A| + |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of A and B. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of A and B has |A| + l_d + l_1 leaves where l_d is the sum of the lengths of all parts of B different from A and l_1 is the sum of the lengths of some common parts of A and B. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern P in O(|P| + occ) time where occ is the number of occurrences of P in A and B. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires O(|A| + l_d + l_1 + l_2) time where l_2 is the sum of the lengths of other common substrings of A and B. When the suffix tree of A is already given, it requires O(l_d + l_1 + l_2) time.
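To make the baseline in the abstract concrete, the following is a minimal sketch of an index over all suffixes of two strings A and B that answers a pattern query by walking down from the root. It is not the suffix tree of alignment proposed in the paper, and it is not even a compacted suffix tree: it is a naive, uncompacted trie that stores an occurrence list at every node, so it uses quadratic space and build time. The names `Node`, `build_naive_index`, and `search` are hypothetical and chosen only for illustration.

```python
class Node:
    """One trie node; `occ` lists every suffix that passes through it."""

    def __init__(self):
        self.children = {}
        self.occ = []  # (string_id, start_position) pairs


def build_naive_index(a, b):
    """Insert every suffix of a and b into one shared trie.

    Assumption: a plain uncompacted trie stands in for the generalized
    suffix tree, so this takes O(|A|^2 + |B|^2) time and space rather
    than the linear bounds quoted in the abstract.
    """
    root = Node()
    for sid, s in (("A", a), ("B", b)):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node.children.setdefault(ch, Node())
                # Record that the suffix s[start:] has this node's path as a prefix.
                node.occ.append((sid, start))
    return root


def search(root, pattern):
    """Return all (string_id, start) occurrences of a non-empty pattern."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # pattern occurs in neither string
    return list(node.occ)


if __name__ == "__main__":
    # Two similar strings that differ in a single position.
    index = build_naive_index("abcab", "abxab")
    print(search(index, "ab"))
    print(search(index, "xa"))
```

Because the two example strings share most of their content, most paths in this trie are stored once but carry occurrences from both A and B. That redundancy between similar strings is the property the abstract says an alignment-aware index can exploit to reduce the number of leaves.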