Broom: application for non-redundant storage of high throughput sequencing data

Levent Albayrak, Kamil Khanipov, George Golovko, Yuriy Fofanov

Research output: Contribution to journal, Article

Abstract

Motivation: The data generation capabilities of high throughput sequencing (HTS) instruments have exponentially increased over the last few years, while the cost of sequencing has dramatically decreased, allowing this technology to become widely used in biomedical studies. For small labs and individual researchers, however, storage and transfer of large amounts of HTS data present a significant challenge. The recent trends in increased sequencing quality and genome coverage can be used to reconsider HTS data storage strategies.

Results: We present Broom, a stand-alone application designed to select and store only high-quality sequencing reads at extremely high compression rates. Written in C++, the application accepts single and paired-end reads in FASTQ and FASTA formats and outputs decompressed data in FASTA format.

Availability and implementation: C++ code available at https://scsb.utmb.edu/labgroups/fofanov/broom.asp.

Supplementary information: Supplementary data are available at Bioinformatics online.

Original language: English (US)
Pages (from-to): 143-145
Number of pages: 3
Journal: Bioinformatics (Oxford, England)
Volume: 35
Issue number: 1
DOI: 10.1093/bioinformatics/bty580
State: Published - Jan 1 2019

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Albayrak, Levent; Khanipov, Kamil; Golovko, George; Fofanov, Yuriy. Broom: application for non-redundant storage of high throughput sequencing data. In: Bioinformatics (Oxford, England), Vol. 35, No. 1, 01.01.2019, p. 143-145.

@article{9225e1da59ad43c2a165689e0fbd90a0,
title = "Broom: application for non-redundant storage of high throughput sequencing data",
author = "Levent Albayrak and Kamil Khanipov and George Golovko and Yuriy Fofanov",
year = "2019",
month = "1",
day = "1",
doi = "10.1093/bioinformatics/bty580",
language = "English (US)",
volume = "35",
pages = "143--145",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "1",

}

TY - JOUR

T1 - Broom

T2 - application for non-redundant storage of high throughput sequencing data

AU - Albayrak, Levent

AU - Khanipov, Kamil

AU - Golovko, George

AU - Fofanov, Yuriy

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85058729749&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058729749&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty580

DO - 10.1093/bioinformatics/bty580

M3 - Article

VL - 35

SP - 143

EP - 145

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 1

ER -