Current status and new features of the Consensus Coding Sequence database

Catherine M Farrell; Nuala A O'Leary; Rachel A Harte; Jane E Loveland; Laurens G Wilming; Craig Wallin; Mark Diekhans; Daniel Barrell; Stephen M J Searle; Bronwen Aken; Susan M Hiatt; Adam Frankish; Marie-Marthe Suner; Bhanu Rajput; Charles A Steward; Garth R Brown; Ruth Bennett; Michael Murphy; Wendy Wu; Mike P Kay; Jennifer Hart; Jeena Rajan; Janet Weber; Catherine Snow; Lillian D Riddick; Toby Hunt; David Webb; Mark Thomas; Pamela Tamez; Sanjida H Rangwala; Kelly M McGarvey; Shashikant Pujar; Andrei Shkeda; Jonathan M Mudge; Jose M Gonzalez; James G R Gilbert; Stephen J Trevanion; Robert Baertsch; Jennifer L Harrow; Tim Hubbard; James M Ostell; David Haussler; Kim D Pruitt

doi:10.1093/nar/gkt1059

Nucleic Acids Res. 2014 Jan;42(Database issue):D865-72. doi: 10.1093/nar/gkt1059. Epub 2013 Nov 11.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA; Center for Biomolecular Science and Engineering, University of California Santa Cruz (UCSC), Santa Cruz, CA 95064, USA; Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; and Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

Abstract

The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Collaboration members from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth of the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which supporting evidence is incomplete. We also present a summary of recent and future curation targets.
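The central rule stated in the abstract (a CCDS ID is assigned only when the NCBI and Ensembl pipelines annotate an identical coding region and that annotation passes quality assurance) can be illustrated with a minimal Python sketch. It assumes each coding model reduces to chromosome, strand and an ordered tuple of 1-based, inclusive CDS intervals. The `CdsModel` class, the toy `passes_qa` check and the sequential ID numbering are hypothetical stand-ins, not the project's actual data model, QA tests or ID-assignment procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical comparison key: chromosome, strand and ordered CDS intervals.
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


@dataclass(frozen=True)
class CdsModel:
    source: str                              # e.g. "NCBI" or "Ensembl"
    transcript_id: str                       # pipeline-specific accession (illustrative)
    chrom: str
    strand: str
    cds_exons: Tuple[Tuple[int, int], ...]   # 1-based inclusive intervals, sorted by start

    def key(self) -> CdsKey:
        # Two models "match" only if every coding coordinate is identical.
        return (self.chrom, self.strand, self.cds_exons)


def passes_qa(model: CdsModel) -> bool:
    # Placeholder for the real QA tests (not described in this record):
    # this toy check only requires a coding length that is a multiple of 3.
    length = sum(end - start + 1 for start, end in model.cds_exons)
    return length % 3 == 0


def match_identical(ncbi: List[CdsModel], ensembl: List[CdsModel],
                    start_id: int = 1) -> Dict[CdsKey, str]:
    """Give an illustrative stable ID to each CDS annotated identically by both sources."""
    ensembl_keys = {m.key() for m in ensembl}
    assigned: Dict[CdsKey, str] = {}
    next_id = start_id
    # Sort for deterministic ID order; transcript accessions are ignored in matching.
    for model in sorted(ncbi, key=lambda m: m.key()):
        k = model.key()
        if k in ensembl_keys and k not in assigned and passes_qa(model):
            assigned[k] = f"CCDS{next_id}.1"   # hypothetical numbering scheme
            next_id += 1
    return assigned
```

Matching on coordinates rather than on transcript accessions reflects the idea that the two pipelines name their models independently; only the genomic coding structure has to agree.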

Publication types

Research Support, N.I.H., Extramural
Research Support, N.I.H., Intramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Databases, Genetic*
Exons
Genomics
Humans
Internet
Mice
Molecular Sequence Annotation
Proteins / genetics*
Sequence Analysis

Substances

Proteins
