Performance bottleneck of subsequence matching in time-series databases: Observation, solution, and performance evaluation

Kim, Sang Wook; Jeong, Byeong Soo

doi:10.1016/j.ins.2007.06.032

Authors: Kim, Sang Wook; Jeong, Byeong Soo

Issue Date: Nov-2007

Publisher: ELSEVIER SCIENCE INC

Keywords: data mining; time-series databases; similar sequence matching; performance

Citation: INFORMATION SCIENCES, v.177, no.22, pp.4841 - 4858

Indexed: SCIE, SCOPUS

Journal Title: INFORMATION SCIENCES

Volume: 177

Number: 22

Start Page: 4841

End Page: 4858

URI: https://scholarworks.bwise.kr/hanyang/handle/2021.sw.hanyang/179410

DOI: 10.1016/j.ins.2007.06.032

ISSN: 0020-0255

Abstract: Subsequence matching is an operation that finds subsequences whose changing patterns are similar to a given query sequence from time-series databases. This paper identifies a performance bottleneck in subsequence matching, and then proposes an effective method that substantially improves the performance of entire subsequence matching by resolving the performance bottleneck. First, we analyze the disk access and CPU processing times required during the index searching and post-processing steps of subsequence matching through preliminary experiments. Based on these results, we show that the post-processing step is the main performance bottleneck in subsequence matching. Then, we argue that the optimization of the post-processing step is a crucial issue overlooked in previous approaches. In order to resolve the performance bottleneck, we propose a simple yet highly effective method for expediting the post-processing step. By rearranging the order of candidate subsequences to be compared with a query sequence, our method completely eliminates the redundancies of disk accesses and CPU processing that occur in the post-processing step. Our method is fairly efficient, and does not incur any false dismissal. We quantitatively demonstrate the superiority of our method through extensive experimentation. The results show that our method produces a significantly faster post-processing step: when using a data set of real-world stock sequences, our method was 43.36-96.75 times faster than previous methods, and when using data sets of large numbers of synthetic sequences, our method was 12.48-26.95 times faster than previous methods. Also, the results show that our method reduces the weight of the post-processing step over entire subsequence matching from more than 97% to less than 67%. This implies that our method successfully resolves the performance bottleneck in subsequence matching. As a result, our method provides excellent performance in entire subsequence matching. Compared with previous methods, our method is 16.17-32.64 times faster when using a data set of real-world stock sequences and 8.64-14.29 times faster when using data sets of large numbers of synthetic sequences.
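The abstract does not spell out how the candidates are rearranged, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes that each candidate is a (sequence id, offset) pair returned by the index search, that similarity is Euclidean distance within a tolerance, and that grouping candidates by sequence lets each data sequence be read once and each distinct candidate be compared once. The names post_process, read_sequence, and epsilon are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def post_process(candidates, query, epsilon, read_sequence):
    """Verify candidates against the query, one read per data sequence.

    candidates    -- iterable of (seq_id, offset) pairs (assumed format)
    read_sequence -- hypothetical callable that fetches a sequence by id,
                     standing in for a disk access
    """
    # Group offsets by sequence; the set drops duplicate candidates,
    # so no subsequence is compared with the query more than once.
    by_sequence = defaultdict(set)
    for seq_id, offset in candidates:
        by_sequence[seq_id].add(offset)

    n = len(query)
    matches = []
    for seq_id in sorted(by_sequence):
        data = read_sequence(seq_id)  # single read for all its candidates
        for offset in sorted(by_sequence[seq_id]):
            window = data[offset:offset + n]
            # Every candidate is still checked exactly, so the reordering
            # by itself cannot cause a false dismissal.
            if len(window) == n and euclidean(window, query) <= epsilon:
                matches.append((seq_id, offset))
    return matches
```

In this sketch the order of verification changes but the set of verified candidates does not, which is why the result is the same as checking the candidates in arrival order.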

Appears in Collections: College of Engineering (Seoul) > School of Computer Software (Seoul) > 1. Journal Articles

Related Researcher

Kim, Sang-Wook: COLLEGE OF ENGINEERING (SCHOOL OF COMPUTER SCIENCE)
