Genome Biol. 2015 Sep 2;16:184. doi: 10.1186/s13059-015-0729-7.

Exploiting single-molecule transcript sequencing for eukaryotic gene prediction.

Author information

Max Planck Institute for Molecular Genetics, Berlin, Germany.
Centre for Genomic Regulation (CRG), Barcelona, Spain.
Universitat Pompeu Fabra (UPF), Barcelona, Spain.
University of Natural Resources and Life Sciences (BOKU), Muthgasse 18, 1190, Vienna, Austria.
Department of Biology/Center for Biotechnology, Bielefeld University, 33615, Bielefeld, Germany.

Corresponding authors: bernd.weisshaar@uni-bielefeld.de (Bielefeld University); heinz.himmelbauer@boku.ac.at (listed with all four of Max Planck Institute for Molecular Genetics, CRG, UPF, and BOKU).

Abstract

We develop a method to predict and validate gene models using PacBio single-molecule, real-time (SMRT) cDNA reads. Ninety-eight percent of full-insert SMRT reads span complete open reading frames. Gene model validation using SMRT reads was developed as an automated process. Optimized training and prediction settings, together with mRNA-seq noise reduction of the assisting Illumina reads, result in increased gene prediction sensitivity and precision. Additionally, we present an improved gene set for sugar beet (Beta vulgaris) and the first genome-wide gene set for spinach (Spinacia oleracea). The workflow and guidelines are a valuable resource for obtaining comprehensive gene sets for newly sequenced genomes of non-model eukaryotes.
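To make "span complete open reading frames" concrete, the sketch below scans a transcript read for the longest stretch that starts with ATG and ends in an in-frame stop codon. It is an illustration only, not the authors' pipeline: the function name `longest_complete_orf` is hypothetical, only the forward strand is scanned, and the standard genetic code (stops TAA, TAG, TGA) is assumed.

```python
from typing import Optional, Tuple

# Assumption: standard nuclear genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (start, end) of the longest ATG...stop ORF
    on the forward strand of `seq`, with `end` exclusive and including the
    stop codon. Returns None if no complete ORF exists."""
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start: Optional[int] = None
        # Step through full codons only in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # First ATG after the last stop opens a candidate ORF.
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                # The stop closes the ORF; look for the next ATG.
                start = None
    return best


if __name__ == "__main__":
    read = "CCATGAAATTTTAGCC"
    span = longest_complete_orf(read)
    print(span, read[span[0]:span[1]] if span else None)
```

Because the first ATG after a stop opens the candidate, internal ATGs are skipped and the longest ORF in each stretch is kept. A real check on SMRT reads would also need the reverse complement, and an ORF that runs off the end of the read is treated here as incomplete.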

PMID: 26328666
PMCID: PMC4556409
DOI: 10.1186/s13059-015-0729-7
[Indexed for MEDLINE]