Detailed Information

CROMqs: An infinitesimal successive refinement lossy compressor for the quality scores

Authors
No, Albert; Hernaez, Mikel; Ochoa, Idoia
Issue Date
Dec-2020
Publisher
WORLD SCIENTIFIC PUBL CO PTE LTD
Keywords
Rateless compression; sequencing data compression; variant calling
Citation
JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, v.18, no.6
Journal Title
JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY
Volume
18
Number
6
URI
https://scholarworks.bwise.kr/hongik/handle/2020.sw.hongik/11552
DOI
10.1142/S0219720020500316
ISSN
0219-7200
Abstract
The amount of sequencing data is growing at a fast pace due to a rapid revolution in sequencing technologies. Quality scores, which indicate the reliability of each of the called nucleotides, take up a significant portion of the sequencing data. In addition, quality scores are more challenging to compress than nucleotides, and they are often noisy. Hence, a natural solution to further decrease the size of the sequencing data is to apply lossy compression to the quality scores. Lossy compression may result in a loss in precision; however, it has been shown that when operating at some specific rates, lossy compression can achieve performance on variant calling similar to that achieved with the losslessly compressed data (i.e. the original data). We propose Coding with Random Orthogonal Matrices for quality scores (CROMqs), the first lossy compressor designed for the quality scores with the "infinitesimal successive refinability" property. With this property, the encoder needs to compress the data only once, at a high rate, while the decoder can decompress it iteratively. The decoder can reconstruct the set of quality scores at each step, with reduced distortion each time. This characteristic is particularly useful in sequencing data compression, since the encoder does not generally know what the most appropriate rate of compression is, e.g. for not degrading variant calling accuracy. CROMqs avoids the need to compress the data at multiple rates, hence incurring time savings. In addition to this property, we show that CROMqs obtains a rate-distortion performance comparable to that of the state-of-the-art lossy compressors. Moreover, we also show that it achieves a performance on variant calling comparable to that of the losslessly compressed data while achieving more than 50% reduction in size.
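The successive refinability property described in the abstract can be illustrated with a small toy sketch. This is not the CROMqs algorithm: the record does not specify its details, so everything below is an assumption made for illustration. The sketch uses random Householder reflections as the orthogonal matrices, stores one (index, coefficient) symbol per step with the coefficient left unquantized, and shares a seed between encoder and decoder. The names `encode`, `decode`, `mse`, and the sample scores are hypothetical. It shows only the "encode once, decode any prefix" behavior, where each extra symbol cannot increase the distortion.

```python
import random


def householder_apply(v, r):
    """Apply the orthogonal reflection H = I - 2 v v^T / (v^T v) to r."""
    scale = 2.0 * sum(a * b for a, b in zip(v, r)) / sum(a * a for a in v)
    return [ri - scale * vi for ri, vi in zip(r, v)]


def random_direction(rng, n):
    """Draw the vector that defines one random reflection."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def encode(x, steps, seed=0):
    """Encode once: emit one (index, coefficient) symbol per refinement step."""
    rng = random.Random(seed)
    residual = [float(a) for a in x]
    symbols = []
    for _ in range(steps):
        v = random_direction(rng, len(x))
        y = householder_apply(v, residual)       # transform the residual
        m = max(range(len(y)), key=lambda i: abs(y[i]))  # largest entry
        symbols.append((m, y[m]))                # assumption: unquantized
        y[m] = 0.0                               # remove what was coded
        residual = householder_apply(v, y)       # H is its own inverse
    return symbols


def decode(symbols, n, seed=0):
    """Decode any prefix of the symbols; more symbols refine the estimate."""
    rng = random.Random(seed)
    x_hat = [0.0] * n
    for m, coeff in symbols:
        v = random_direction(rng, n)             # same reflection as encoder
        e = [0.0] * n
        e[m] = coeff
        atom = householder_apply(v, e)           # H applied to coeff * e_m
        x_hat = [a + b for a, b in zip(x_hat, atom)]
    return x_hat


def mse(x, x_hat):
    """Mean squared error, used here as the distortion measure."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    # Hypothetical Phred-like quality scores for one read.
    scores = [38, 40, 40, 37, 35, 40, 39, 30, 22, 40, 38, 36, 12, 40, 40, 33]
    symbols = encode(scores, steps=24)
    for k in (0, 4, 8, 16, 24):
        print(k, mse(scores, decode(symbols[:k], len(scores))))
```

Because each step zeroes one coordinate of an orthogonally transformed residual, the residual norm never grows, so a decoder that stops after any number of symbols obtains a valid reconstruction at the corresponding distortion. A practical compressor would additionally have to quantize or eliminate the stored coefficient, which this sketch does not attempt.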
Appears in Collections
College of Engineering > School of Electronic & Electrical Engineering > 1. Journal Articles

Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
