The importance of identifying alternative splicing in vertebrate genome annotation

Database (Oxford). 2012 Mar 20:2012:bas014. doi: 10.1093/database/bas014. Print 2012.

Abstract

While alternative splicing (AS) can potentially expand the functional repertoire of vertebrate genomes, relatively few AS transcripts have been experimentally characterized. We describe our detailed manual annotation of vertebrate genomes, which is generating a publicly available geneset rich in AS. In order to achieve this we have adopted a highly sensitive approach to annotating gene models supported by correctly mapped, canonically spliced transcriptional evidence combined with a highly cautious approach to adding unsupported extensions to models and making decisions on their functional potential. We use information about the predicted functional potential and structural properties of every AS transcript annotated at a protein-coding or non-coding locus to place them into one of eleven subclasses. We describe the incorporation of new sequencing and proteomics technologies into our annotation pipelines, which are used to identify and validate AS. Combining all data sources has led to the production of a rich geneset containing an average of 6.3 AS transcripts for every human multi-exon protein-coding gene. The datasets produced have proved very useful in providing context to studies investigating the functional potential of genes and the effect of variation may have on gene structure and function. DATABASE URL: http://www.ensembl.org/index.html, http://vega.sanger.ac.uk/index.html.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing
  • Animals
  • Databases, Genetic*
  • Genome*
  • Humans
  • Mice
  • Models, Genetic
  • Molecular Sequence Annotation / methods*
  • Molecular Sequence Annotation / standards
  • Vertebrates / genetics*