J Biotechnol. 2016 Oct 10;235:121-31. doi: 10.1016/j.jbiotec.2016.04.023. Epub 2016 Apr 12.

Refined Pichia pastoris reference genome sequence.

Author information

1. Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria.
2. BioGrammatics Inc., 2120 Las Palmas Drive, Carlsbad, CA 92011, United States.
3. Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria.
4. Department of Molecular Genetics and Cell Biology, University of Chicago, 920 East 58th St., Chicago, IL 60637, United States.
5. Synthetic Genomics, Inc., 11149 North Torrey Pines Rd., La Jolla, CA 92037, United States.
6. Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria; Institute of Pathology, Research Unit Functional Proteomics and Metabolic Pathways, Medical University of Graz, Stiftingtalstrasse 24, 8010 Graz, Austria; Omics Center Graz, BioTechMed-Graz, Stiftingtalstrasse 24, 8010 Graz, Austria.
7. BioGrammatics Inc., 2120 Las Palmas Drive, Carlsbad, CA 92011, United States; Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, United States.
8. Austrian Center of Industrial Biotechnology (ACIB), Petersgasse 14, 8010 Graz, Austria; Institute of Molecular Biotechnology, Graz University of Technology, Petersgasse 14, 8010 Graz, Austria; bisy e.U., Wetzawinkel 20, 8200 Hofstaetten/Raab, Austria. Electronic address: a.glieder@tugraz.at.

Abstract

Strains of the species Komagataella phaffii are the most frequently used "Pichia pastoris" strains for recombinant protein production and for studies of peroxisome biogenesis, autophagy and the secretory pathway. Genome sequencing of several different P. pastoris strains has provided the foundation for understanding these cellular functions in recent genomics, transcriptomics and proteomics experiments. This experimentation has identified mistakes, gaps and incorrectly annotated open reading frames (ORFs) in the previously published draft genome sequences. Here, a refined reference genome is presented, generated with genome and transcriptome sequencing data from multiple P. pastoris strains. Twelve major sequence gaps of 20 to 6000 base pairs were closed, and 5111 out of 5256 putative ORFs were manually curated and confirmed by RNA-seq and published LC-MS/MS data, including the addition of new ORFs and a reduction in the number of spliced genes from 797 to 571. One chromosomal fragment of 76 kbp between two previous gaps on chromosome 1 and another 134 kbp fragment at the end of chromosome 4, as well as several shorter fragments, needed re-orientation. In total, more than 500 positions in the genome have been corrected. This reference genome is presented with new chromosomal numbering, positioning ribosomal repeats at the distal ends of the four chromosomes, and includes predicted chromosomal centromeres as well as the sequence of two linear cytoplasmic plasmids of 13.1 and 9.5 kbp found in some strains of P. pastoris.

KEYWORDS:

Centromere; Genome; Killer plasmid; P. pastoris; RNA-seq; Splicing

PMID: 27084056
PMCID: PMC5089815
DOI: 10.1016/j.jbiotec.2016.04.023
[Indexed for MEDLINE]
Free PMC Article