Mol Cell Proteomics. 2014 Nov;13(11):3184-98. doi: 10.1074/mcp.M114.038299. Epub 2014 Jul 24.

Annotation of the zebrafish genome through an integrated transcriptomic and proteomic analysis.

Author information

1. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Amrita School of Biotechnology, Amrita University, Kollam 690 525, India.
2. Department of Surgery, Johns Hopkins University, Baltimore, Maryland 21205.
3. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205.
4. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Centre of Excellence in Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry 605014, India.
5. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Centre of Excellence in Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry 605014, India; Departments of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205.
6. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha 751024, India.
7. Department of Computer Science, University of California, San Diego, California 92093.
8. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205; Departments of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205.
9. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India.
10. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India.
11. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India.
12. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Department of Biotechnology, Kuvempu University, Shimoga 577 451, India.
13. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Department of Biochemistry and Molecular Biology, School of Life Sciences, Pondicherry University, Puducherry 605 014, India.
14. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; Amrita School of Biotechnology, Amrita University, Kollam 690 525, India; Centre of Excellence in Bioinformatics, School of Life Sciences, Pondicherry University, Puducherry 605014, India; Manipal University, Madhav Nagar, Manipal, Karnataka 576104, India.
15. The Center for Genomics and Division of Microbiology & Molecular Genetics, School of Medicine, Loma Linda University, Loma Linda, California 92350. Correspondence: pandey@jhmi.edu, sleach1@jhmi.edu, chwang@llu.edu.
16. Department of Surgery, Johns Hopkins University, Baltimore, Maryland 21205; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205. Correspondence: pandey@jhmi.edu, sleach1@jhmi.edu, chwang@llu.edu.
17. Institute of Bioinformatics, International Technology Park, Bangalore 560 066, India; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21205; Departments of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205; Sol Goldman Pancreatic Cancer Research Center, Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205; Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, Maryland 21205. Correspondence: pandey@jhmi.edu, sleach1@jhmi.edu, chwang@llu.edu.

Abstract

Accurate annotation of protein-coding genes is one of the primary tasks upon the completion of whole genome sequencing of any organism. In this study, we used an integrated transcriptomic and proteomic strategy to validate and improve the existing zebrafish genome annotation. We undertook high-resolution mass-spectrometry-based proteomic profiling of 10 adult organs, whole adult fish body, and two developmental stages of zebrafish (SAT line), in addition to transcriptomic profiling of six organs. More than 7,000 proteins were identified from proteomic analyses, and ~69,000 high-confidence transcripts were assembled from the RNA sequencing data. Approximately 15% of the transcripts mapped to intergenic regions, the majority of which are likely long non-coding RNAs. These high-quality transcriptomic and proteomic data were used to manually reannotate the zebrafish genome. We report the identification of 157 novel protein-coding genes. In addition, our data led to modification of existing gene structures including novel exons, changes in exon coordinates, changes in frame of translation, translation in annotated UTRs, and joining of genes. Finally, we discovered four instances of genome assembly errors that were supported by both proteomic and transcriptomic data. Our study shows how an integrative analysis of the transcriptome and the proteome can extend our understanding of even well-annotated genomes.

PMID: 25060758
PMCID: PMC4223501
DOI: 10.1074/mcp.M114.038299
[Indexed for MEDLINE]
Free PMC Article