IEEE Life Sci Lett. 2015 Aug;1(2):22-25. doi: 10.1109/LLS.2015.2465870. Epub 2015 Aug 28.

SNAPR: a bioinformatics pipeline for efficient and accurate RNA-seq alignment and analysis.

Author information

Institute for Systems Biology, Seattle, WA 98109.

Abstract

Converting raw RNA sequencing data into interpretable results can be circuitous and time-consuming, requiring multiple steps. We present an RNA-seq mapping algorithm that streamlines this process. Our algorithm uses a hash table approach to leverage the availability and power of high-memory machines. SNAPR, which can be run on a single library or on thousands of libraries, accepts compressed or uncompressed FASTQ and BAM files as input and, in a single step, can output a sorted BAM file, individual read counts, and gene fusions, and can identify exogenous RNA species. SNAPR also natively filters reads by Phred score and is well suited to future sequencing platforms that generate longer reads. Using SNAPR, we show how data from hundreds of TCGA samples can be analyzed in a matter of hours while gene fusions and viral events are identified at the same time. Because the reference genome and transcriptome undergo periodic updates, and because integrating multiple data sets requires uniform parameters, there is a great need for a streamlined process for RNA-seq analysis. We demonstrate that SNAPR meets this need efficiently and accurately, with the throughput required for high-volume analyses.
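The abstract names two general techniques, a hash table index and Phred score filtering, without giving implementation details. The Python sketch below illustrates the generic ideas only and is not SNAPR's actual code: the function names (`build_seed_index`, `mean_phred`, `candidate_positions`), the seed length, the Phred+33 quality encoding, and the use of a mean-quality score are all assumptions made for illustration.

```python
from collections import defaultdict


def build_seed_index(reference: str, seed_len: int) -> dict[str, list[int]]:
    """Map every seed_len-mer in the reference to its start positions.

    Holding the whole index in memory is what trades RAM for lookup speed.
    """
    index = defaultdict(list)
    for pos in range(len(reference) - seed_len + 1):
        index[reference[pos:pos + seed_len]].append(pos)
    return index


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def candidate_positions(read: str, index: dict[str, list[int]],
                        seed_len: int) -> dict[int, int]:
    """Count, for each implied read start position, how many seeds agree."""
    votes = defaultdict(int)
    for i in range(len(read) - seed_len + 1):
        # .get avoids inserting empty entries for seeds absent from the index
        for pos in index.get(read[i:i + seed_len], ()):
            votes[pos - i] += 1
    return dict(votes)


# Hypothetical toy data, not taken from the paper.
reference = "ACGTACGTTTGACCA"
index = build_seed_index(reference, seed_len=4)
read, quality = "TTGACC", "IIIIII"

if mean_phred(quality) >= 20:  # illustrative threshold only
    hits = candidate_positions(read, index, seed_len=4)
```

In this toy case all three seeds of the read agree on a single start position in the reference. A real aligner would then verify candidates with an edit-distance check and handle mismatches, splicing, and reverse-complement strands, none of which is shown here.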
