BMC Genomics. 2018 Feb 14;19(1):144. doi: 10.1186/s12864-018-4503-6.

RNA-QC-chain: comprehensive and fast quality control for RNA-Seq data.

Author information

1. Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture, and Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China.
2. Shandong Key Laboratory of Energy Genetics, CAS Key Laboratory of Biofuels, and Single Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, 266101, China.
3. University of Chinese Academy of Sciences, Beijing, 100049, China.
4. Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture, and Laboratory for Marine Fisheries Science and Food Production Processes, Qingdao National Laboratory for Marine Science and Technology, Qingdao, Shandong, 266071, China. chensl@ysfri.ac.cn.
5. Department of Bioinformatics and Systems Biology, Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular Imaging, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei, 430074, China. ningkang@hust.edu.cn.

Abstract

BACKGROUND:

RNA-Seq has become one of the most widely used applications of next-generation sequencing technology. However, raw RNA-Seq data may have quality issues that can significantly distort analytical results and lead to erroneous conclusions. The raw data must therefore undergo rigorous quality control (QC) before downstream analysis. Currently, accurate and complete QC of RNA-Seq data requires a suite of different QC tools run consecutively, which is inefficient in terms of usability, running time, file usage, and interpretability of the results.

RESULTS:

We developed RNA-QC-Chain, a comprehensive, fast and easy-to-use QC pipeline for RNA-Seq data that involves three steps: (1) sequencing-quality assessment and trimming; (2) filtering of internal (ribosomal RNA) and external (reads from foreign species) contamination; and (3) reporting of alignment statistics (such as read number, alignment coverage, sequencing depth and paired-end read mapping information). The package builds on QC-Chain, our previously reported tool for general QC of next-generation sequencing (NGS) data, with extensions designed specifically for RNA-Seq data. It offers several features not yet available in other QC tools for RNA-Seq data, such as RNA sequence trimming, automatic rRNA detection and automatic identification of contaminating species. The three QC steps can run either sequentially or independently, making RNA-QC-Chain a comprehensive package with high flexibility and usability. Moreover, parallel computing and other optimizations are built into most of the QC procedures, improving efficiency. The performance of RNA-QC-Chain was evaluated on different types of datasets, including an in-house sequencing dataset, a semi-simulated dataset, and two real datasets downloaded from public databases. Comparisons with other QC tools demonstrated its advantages in both functional versatility and processing speed.
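Step (1), sequencing-quality assessment and trimming, is commonly done by cutting low-quality bases from the 3' end of each read and discarding reads that become too short. The Python sketch below illustrates that generic technique only; it is not RNA-QC-Chain's implementation, and the function names (`phred_scores`, `trim_3prime`, `trim_fastq`) and default thresholds (Phred 20, minimum length 30, Phred+33 quality encoding) are hypothetical choices made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Cut bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def trim_fastq(lines: Iterable[str], min_q: int = 20, min_len: int = 30,
               offset: int = 33) -> Iterator[Tuple[str, str, str]]:
    """Trim each 4-line FASTQ record; yield (header, seq, quality) for reads still >= min_len."""
    rows = [ln.rstrip("\n") for ln in lines]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain complete 4-line records")
    for i in range(0, len(rows), 4):
        header, seq, _plus, quality = rows[i:i + 4]
        if len(seq) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in record {header}")
        trimmed_seq, trimmed_qual = trim_3prime(seq, quality, min_q, offset)
        # Reads trimmed below the minimum length are dropped entirely.
        if len(trimmed_seq) >= min_len:
            yield header, trimmed_seq, trimmed_qual
```

For example, with `min_q=20` and `min_len=5`, a read `ACGTACGTAC` with qualities `IIIIIII###` keeps its first seven bases, because `#` encodes Phred 2 under Phred+33; a read whose bases are all `#` is trimmed to nothing and dropped. Real tools often use a sliding-window average instead of this strict per-base 3' cut.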

CONCLUSIONS:

We present RNA-QC-Chain, a tool for comprehensive, effective and efficient quality control of RNA-Seq data.

KEYWORDS:

Alignment statistics; Contamination identification; Parallel computing; Quality control; RNA-Seq

PMID: 29444661
PMCID: PMC5813327
DOI: 10.1186/s12864-018-4503-6
[Indexed for MEDLINE]
Free PMC Article
