J Proteome Res. 2015 Sep 4;14(9):3729-37. doi: 10.1021/acs.jproteome.5b00490. Epub 2015 Aug 3.

PPLine: An Automated Pipeline for SNP, SAP, and Splice Variant Detection in the Context of Proteogenomics.

Author information

1 Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, 111991 Russia.
2 Orekhovich Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Moscow, 119121 Russia.
3 Mechnikov Research Institute of Vaccines and Sera, Moscow, 105064 Russia.
4 Herzen Moscow Cancer Research Institute, Ministry of Healthcare of the Russian Federation, Moscow, 125284 Russia.

Abstract

The fundamental mission of the Chromosome-Centric Human Proteome Project (C-HPP) is to study human proteome diversity, including rare variants. Liver tissue, HepG2 cells, and plasma were selected among the major objects of C-HPP studies. The proteogenomic approach, a recently introduced technique, is a powerful method for predicting and validating proteoforms that arise from alternative splicing, mutations, and transcript editing. We developed PPLine, a Python-based proteogenomic pipeline that automates the discovery of single-amino-acid polymorphisms (SAPs), indels, and alternatively spliced variants from raw transcriptome and exome sequence data, single-nucleotide polymorphism (SNP) annotation and filtration, and the prediction of proteotypic peptides (available at https://sourceforge.net/projects/ppline). In this work, we performed deep transcriptome sequencing of HepG2 cells and liver tissue using two platforms: Illumina HiSeq and Applied Biosystems SOLiD. Using PPLine, we revealed 7756 SAPs and indels for HepG2 cells and liver (including 659 variants not annotated in dbSNP). We found 17 indels in transcripts associated with the translation of alternate reading frames (ARFs) longer than 300 bp. The ARF products of two genes, SLMO1 and TMEM8A, show signatures of a caspase-binding domain and a Gcn5-related N-acetyltransferase. Alternative splicing analysis predicted novel proteoforms encoded by 203 (liver) and 475 (HepG2) genes according to both Illumina and SOLiD data. The results of the present work provide a basis for subsequent proteomic studies by the C-HPP consortium.
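The relationship between a coding SNP and a SAP mentioned above can be illustrated with a minimal Python sketch. This is not PPLine's actual code: the function name `snp_to_sap`, its arguments, and the toy coding sequence are hypothetical, and the sketch assumes a forward-strand coding sequence that starts in frame and the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_to_sap(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    cds: in-frame coding sequence (hypothetical input, forward strand).
    pos: 0-based position of the substituted base within cds.
    alt: the alternative base.
    Returns (ref_aa, residue_number, alt_aa) with a 1-based residue number,
    or None when the substitution is synonymous.
    """
    offset = pos % 3                 # position of the SNP inside its codon
    codon_start = pos - offset       # first base of the affected codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return None                  # synonymous SNP, no SAP
    return ref_aa, codon_start // 3 + 1, alt_aa


# Toy example: ATG GAA GTT encodes M-E-V.
print(snp_to_sap("ATGGAAGTT", 4, "T"))  # GAA -> GTA changes E to V at residue 2
print(snp_to_sap("ATGGAAGTT", 5, "G"))  # GAA -> GAG is synonymous
```

A real pipeline would also need to handle strand, splicing, indels, and stop-gain or stop-loss variants, none of which this sketch attempts.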

KEYWORDS:

C-HPP; RNA-seq; SAP; SNP; alternative reading frames; alternative splicing; indel; proteotypic peptides

PMID: 26147802
DOI: 10.1021/acs.jproteome.5b00490
[Indexed for MEDLINE]
