AQUa : an adaptive framework for compression of sequencing quality scores with random access functionality

Tom Paridaens (UGent) , Glenn Van Wallendael (UGent) , Wesley De Neve (UGent) and Peter Lambert (UGent)
(2018) BIOINFORMATICS. 34(3). p.425-433
Abstract
Motivation: The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores.

Results: This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa makes use of a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. When benchmarking AQUa against generic single-pass compressors, file sizes are reduced by up to 38.49% when comparing with GNU Gzip and by up to 6.48% when comparing with 7-Zip at the Ultra Setting, while still providing support for random access. When comparing AQUa with the purpose-built, single-pass, and state-of-the-art compressor SCALCE, which does not support random access, file sizes are reduced by up to 21.14%. When comparing AQUa with the purpose-built, dual-pass, and state-of-the-art compressor QVZ, which does not support random access, file sizes are larger by 6.42-33.47%. However, for one test file, the file size is 0.38% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee.
Keywords
LOSSY COMPRESSION, CABAC

Citation

Chicago
Paridaens, Tom, Glenn Van Wallendael, Wesley De Neve, and Peter Lambert. 2018. “AQUa : an Adaptive Framework for Compression of Sequencing Quality Scores with Random Access Functionality.” Bioinformatics 34 (3): 425–433.
APA
Paridaens, T., Van Wallendael, G., De Neve, W., & Lambert, P. (2018). AQUa : an adaptive framework for compression of sequencing quality scores with random access functionality. BIOINFORMATICS, 34(3), 425–433.
Vancouver
1. Paridaens T, Van Wallendael G, De Neve W, Lambert P. AQUa : an adaptive framework for compression of sequencing quality scores with random access functionality. BIOINFORMATICS. Oxford: Oxford Univ Press; 2018;34(3):425–33.
MLA
Paridaens, Tom, Glenn Van Wallendael, Wesley De Neve, et al. “AQUa : an Adaptive Framework for Compression of Sequencing Quality Scores with Random Access Functionality.” BIOINFORMATICS 34.3 (2018): 425–433. Print.
BibTeX
@article{8551438,
  abstract     = {Motivation: The past decade has seen the introduction of new technologies that significantly lowered the cost of genome sequencing. As a result, the amount of genomic data that must be stored and transmitted is increasing exponentially. To mitigate storage and transmission issues, we introduce a framework for lossless compression of quality scores. Results: This article proposes AQUa, an adaptive framework for lossless compression of quality scores. To compress these quality scores, AQUa makes use of a configurable set of coding tools, extended with a Context-Adaptive Binary Arithmetic Coding scheme. When benchmarking AQUa against generic single-pass compressors, file sizes are reduced by up to 38.49\% when comparing with GNU Gzip and by up to 6.48\% when comparing with 7-Zip at the Ultra Setting, while still providing support for random access. When comparing AQUa with the purpose-built, single-pass, and state-of-the-art compressor SCALCE, which does not support random access, file sizes are reduced by up to 21.14\%. When comparing AQUa with the purpose-built, dual-pass, and state-of-the-art compressor QVZ, which does not support random access, file sizes are larger by 6.42-33.47\%. However, for one test file, the file size is 0.38\% smaller, illustrating the strength of our single-pass compression framework. This work has been spurred by the current activity on genomic information representation (MPEG-G) within the ISO/IEC SC29/WG11 technical committee.},
  author       = {Paridaens, Tom and Van Wallendael, Glenn and De Neve, Wesley and Lambert, Peter},
  issn         = {1367-4803},
  journal      = {BIOINFORMATICS},
  language     = {eng},
  number       = {3},
  pages        = {425--433},
  publisher    = {Oxford Univ Press},
  title        = {AQUa : an adaptive framework for compression of sequencing quality scores with random access functionality},
  url          = {http://dx.doi.org/10.1093/bioinformatics/btx607},
  volume       = {34},
  year         = {2018},
}
