The CCDS set is built by consensus among the collaborating members, including the European Bioinformatics Institute (EBI), the National Center for Biotechnology Information (NCBI), the Wellcome Trust Sanger Institute (WTSI), and the University of California, Santa Cruz (UCSC). Annotated genes that are included in the CCDS set are given a unique identifier and version number (e.g., CCDS1.1, CCDS234.1) akin to the GenBank "accession.version" system. If the CDS structure changes or if the underlying genome sequence changes, the version number is incremented. As annotation and sequence-based genome browser update cycles proceed, the CCDS set will be mapped forward and its identifiers maintained. All changes to existing CCDS genes are made by collaboration agreement.

The CCDS set is calculated on the basis of coordinated whole-genome annotation updates carried out by the NCBI and Ensembl. To be included in the CCDS set, coding regions must be annotated as full-length, with an initiating ATG and a valid stop codon; must be translated from the genome without frameshifts; and must use consensus splice sites. Annotations are made via a mixture of manual curation and automated computational processing.

Genome annotations resulting from the NCBI and Ensembl pipelines are first compared to identify annotated coding regions that have identical locations on the genome. Then, lower-quality CDSs are removed from this core set pending additional review among the collaboration groups. Quality tests include analyses to identify putative pseudogenes, retrotransposed genes, consensus splice sites, supporting transcripts, and protein homology.

As of March 2005, the initial CCDS dataset contains 14,795 coding sequences and 13,142 genes, which represents more than half of the human genes by the current gene count. Visit the CCDS Project Web site.
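The inclusion criteria described above (initiating ATG, valid stop codon, no frameshifts, consensus splice sites) can be illustrated with a minimal Python sketch. This is not the CCDS pipeline itself: the function name `check_cds`, the 0-based end-exclusive coordinate convention, the restriction to plus-strand features, and the treatment of only GT...AG introns as "consensus" are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(genome, exons):
    """Return a list of failed criteria for a plus-strand CDS (hypothetical helper).

    genome: genomic sequence as a string
    exons:  sorted coding intervals as (start, end), 0-based, end-exclusive
    """
    genome = genome.upper()
    # Splice the coding intervals together to obtain the CDS.
    cds = "".join(genome[start:end] for start, end in exons)
    problems = []

    if not cds.startswith("ATG"):
        problems.append("no initiating ATG")

    # A length that is not a multiple of 3 is used here as a simple
    # proxy for a frameshift in the annotated coding region.
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3")

    # Split into complete codons only.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no valid stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("internal stop codon")

    # Each intron lies between the end of one exon and the start of the next.
    # Assumption: only the canonical GT...AG pair counts as consensus.
    for (_, donor), (acceptor, _) in zip(exons, exons[1:]):
        intron = genome[donor:acceptor]
        if not (intron.startswith("GT") and intron.endswith("AG")):
            problems.append(f"non-consensus splice site at {donor}-{acceptor}")

    return problems


# Toy sequence: 2 flanking bases, exon 1, a GT...AG intron, exon 2, 2 flanking bases.
toy_genome = "CC" + "ATGGCC" + "GTAAAG" + "AAATAA" + "GG"
print(check_cds(toy_genome, [(2, 8), (14, 20)]))
```

An empty list means the toy annotation passes every check in this sketch; shifting an exon boundary by one base makes it fail the length and splice-site tests. The other quality tests mentioned in the text (pseudogene detection, supporting transcripts, protein homology) require external data and are not modeled here.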