CCDS
The Consensus CDS (CCDS) project is a collaborative effort to identify a core set of human and mouse protein coding regions that are consistently annotated and of high quality. The long term goal is to support convergence towards a standard set of gene annotations. Available information includes:

Announcements

History Report Updates: September 2, 2008

The History Report has been updated to show the disposition of each CCDS identifier in each NCBI build. This information can be used to track identifiers and their status over time. The disposition categories are:

- Made public: Indicates the NCBI build when the CCDS identifier was first publicly available.
- Withdrawn: Indicates when an identifier was withdrawn from CCDS.
- Retained: Indicates that a CCDS identifier and version were carried over from the prior NCBI build to the next build without a version increment, indicating the annotation remained stable.
- Updated: Indicates that a CCDS identifier was carried over from the prior NCBI build, and the CCDS version was incremented, indicating an annotation change.
- Partial match: Indicates that there was only a partial match between the NCBI and Ensembl annotation in this build.
- Withdrawn, inconsistent annotation: no match: Indicates that no match was found for this CDS between the NCBI and Ensembl annotation.
- Under review, withdrawal: no match: Indicates that no match was found for this CDS between the NCBI and Ensembl annotation. However, the CCDS was already under discussion to withdraw, indicating the loss reflects asynchronous annotation updates.
- Reinstated: Indicates when a CCDS identifier that was previously a partial match or no match has been reinstated as a full match. The reinstatement may also include a CCDS version change.
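
As a rough illustration of how the version-based categories relate to each other, the sketch below classifies one identifier across two consecutive builds. It is a simplification, not the project's actual implementation: the names `Disposition` and `classify` are hypothetical, and only four of the categories are covered, because the match-based ones (partial match, no match, reinstated) need annotation comparison data that a version number alone does not carry.

```python
from enum import Enum
from typing import Optional


class Disposition(Enum):
    """Subset of the History Report categories (hypothetical model)."""
    MADE_PUBLIC = "Made public"
    RETAINED = "Retained"
    UPDATED = "Updated"
    WITHDRAWN = "Withdrawn"


def classify(prev_version: Optional[int], curr_version: Optional[int]) -> Disposition:
    """Classify an identifier from its version in two consecutive builds.

    None means the identifier is absent from that build.
    """
    if prev_version is None and curr_version is None:
        raise ValueError("identifier is absent from both builds")
    if curr_version is None:
        return Disposition.WITHDRAWN
    if prev_version is None:
        return Disposition.MADE_PUBLIC
    if curr_version == prev_version:
        return Disposition.RETAINED   # carried over, no version increment
    if curr_version > prev_version:
        return Disposition.UPDATED    # carried over, version incremented
    raise ValueError("version numbers are not expected to decrease")
```

Under this model, an identifier at version 1 in both builds is classified as retained, and a move from version 1 to version 2 is classified as updated.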
Using CCDS 38 as an example, the history shows that CCDS 38.1 was made public in NCBI build 35.1, was retained in build 36.2, and was updated to CCDS 38.2 in build 36.3. For CCDS 714, the history shows that CCDS 714.1 was made public in NCBI build 35.1, was a partial match in build 36.2, and was reinstated as a full match in build 36.3.

CCDS Public Notes Available: July 9, 2008

CCDS Public Notes are now available for CCDS IDs that were recently updated or withdrawn, following consensus agreement from the NCBI, UCSC and WTSI/EBI groups. Public Notes will be added prospectively but will not be added comprehensively for previous changes. The purpose of these notes is to explain to users why a particular CCDS was either updated or withdrawn, and/or to explain representation choices for more complex cases. The Public Notes can be found in the Report, above the Sequence ID table, for a given CCDS ID that recently underwent a change in representation. For example, see CCDS ID 7726.2.

CCDS update released for human: May 1, 2008

The NCBI, Ensembl, and Sanger (Havana) annotation of the human reference genome (NCBI build 36.3) was analyzed to identify additional coding sequences (CDS) that are consistently annotated. Existing CCDS IDs were tracked based on re-identification of matching annotation (both identical and partial matches of CCDS collaboration updates that were asynchronous). CCDS data are available from the CCDS web site and FTP site and will become available in the collaborators' genome and/or gene browser web sites according to each browser's update cycle. This update adds 2,151 new CCDS IDs and 1,249 genes to the human CCDS set. Human build 36.3 includes a total of 20,159 CCDS IDs that correspond to 17,052 GeneIDs. See the statistics report for details.

Overview

Annotation of genes is provided by multiple public resources, using different methods, and resulting in information that is similar but not always identical. The human and mouse genome sequences are now sufficiently stable to start identifying those gene placements that are identical, and to make those data public and supported as a core set by the three major public genome browsers. The long term goal is to support convergence towards a standard set of gene annotations. Toward this end, the Consensus CDS (CCDS) project was established. The CCDS project is a collaborative effort to identify a core set of protein coding regions that are consistently annotated and of high quality.

Access and Availability

Initial results from the Consensus CDS project are now available from the participants' genome browser Web sites. In addition, CCDS identifiers are indicated on the relevant NCBI RefSeq and Entrez Gene records and in Map Viewer displays of RNA (RefSeq) and Gene annotations on the reference assembly. CCDS reports can be accessed by following provided links, or by directly querying the underlying database using the query interface provided at the top of this page. The CCDS dataset is also available by anonymous FTP.

Collaborators

The CCDS set is built by consensus among the collaborating members, which include:

- EBI
- NCBI
- UCSC
- WTSI

We envision that the CCDS set will become more complete as the independent curation groups agree on cases where they initially differ, as additional experimental validation of weakly supported genes occurs, and as automatic annotation methods continue to improve. Communication among the CCDS collaborating groups is an ongoing activity that will resolve differences and identify refinements between CCDS update cycles.

CCDS Identifiers and Tracking

Annotated genes that are included in the CCDS set are associated with a unique identifier number and version number (e.g., CCDS1.1, CCDS234.1).
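
To make the identifier format concrete, here is a minimal sketch that splits a string such as CCDS234.1 into its identifier number and version number. The names `CcdsId` and `parse_ccds_id` are hypothetical, and accepting an optional space after the prefix (as in "CCDS 38.1") is an assumption based on how identifiers are written on this page, not a documented format rule.

```python
import re
from typing import NamedTuple


class CcdsId(NamedTuple):
    number: int   # the unique identifier number
    version: int  # the version number after the dot


# Assumed shape: "CCDS", an optional space, digits, a dot, digits.
_CCDS_PATTERN = re.compile(r"^CCDS\s?(\d+)\.(\d+)$")


def parse_ccds_id(text: str) -> CcdsId:
    """Parse a versioned CCDS identifier such as 'CCDS234.1'."""
    match = _CCDS_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a versioned CCDS identifier: {text!r}")
    return CcdsId(number=int(match.group(1)), version=int(match.group(2)))
```

Keeping the two parts separate makes it easy to check whether two records refer to the same identifier at different versions.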
The version number will update if the CDS structure changes, or if the underlying genome sequence changes at that location. With annotation and sequence based genome browser update cycles, the CCDS set will be mapped forward, maintaining identifiers. All changes to existing CCDS genes are done by collaboration agreement; no single group will change the set unilaterally.

Process Flow and Quality Testing

The CCDS set is calculated following coordinated whole genome annotation updates carried out by NCBI, WTSI, and Ensembl. Annotation updates represent genes that are defined by a mixture of manual curation and automated computational processing. The main curation groups are the Havana team at the WTSI and the RefSeq annotation group at NCBI. In addition, the manually curated information on chr14 (Genoscope) and chr7 (WUSTL) has been brought in via the Vega resource. The automatic methods are provided by the Ensembl group and the NCBI genome annotation computational pipeline. Curated information is favored over automated information, and the information has to be consistent between the Hinxton (Vega/Ensembl) and NCBI groups and also pass stringent QC checks.

The general process flow for defining the CCDS gene set includes:

- compare genome annotation results
- identify annotated coding regions that have identical location coordinates on the genome
- quality evaluation
- remove lower quality CDSs from the core set pending additional review among the collaboration groups.
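
The steps above can be sketched as a set intersection followed by a quality filter. This is an illustrative toy under stated assumptions, not the collaboration's pipeline: the `CdsKey` layout, the function `build_core_set`, the `passes_qc` callback, and the sample coordinates are all hypothetical.

```python
from typing import Callable, Set, Tuple

# Hypothetical key: chromosome, strand, and the ordered exon (start, end) pairs.
CdsKey = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def build_core_set(
    ncbi: Set[CdsKey],
    hinxton: Set[CdsKey],
    passes_qc: Callable[[CdsKey], bool],
) -> Tuple[Set[CdsKey], Set[CdsKey]]:
    """Return (core set, CDSs held back pending review)."""
    # Compare annotations: keep only CDSs with identical genome coordinates.
    identical = ncbi & hinxton
    # Quality evaluation: lower quality CDSs are removed from the core set.
    core = {cds for cds in identical if passes_qc(cds)}
    pending_review = identical - core
    return core, pending_review


# Made-up example data: two annotation sets that agree on cds_a and cds_c.
cds_a = ("chr1", "+", ((100, 200), (300, 400)))
cds_b = ("chr1", "+", ((100, 200), (300, 450)))
cds_c = ("chr2", "-", ((50, 80),))

core, pending = build_core_set(
    {cds_a, cds_c},
    {cds_a, cds_b, cds_c},
    passes_qc=lambda cds: len(cds[2]) > 1,  # placeholder QC rule for the demo
)
```

Here `cds_b` drops out because only one group annotates it with those coordinates, while `cds_c` matches in both sets but fails the placeholder QC rule and is held back for review.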
The CCDS set includes coding regions that are annotated as full-length (with an initiating ATG and a valid stop codon), can be translated from the genome without frameshifts, and use consensus splice sites. The number and type of quality tests performed may be expanded in the future, but they currently include consistency in cross-species comparative analysis and analysis to identify putative pseudogenes, retrotransposed genes, consensus splice sites, supporting transcripts, and protein homology.
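
Two of these criteria are simple enough to sketch directly on sequence strings. The functions below are hypothetical helpers, not the project's QC code. The first approximates "translatable without frameshifts" as a length divisible by three with no internal stop codon, using the standard stop codons. The second assumes that "consensus" means only the canonical GT...AG intron boundaries, which may be narrower than what the project accepts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def is_full_length_orf(cds: str) -> bool:
    """Check for an initiating ATG, a terminal stop codon, and an intact frame."""
    cds = cds.upper()
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False  # too short, or length implies a broken reading frame
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOP_CODONS
    has_internal_stop = any(codon in STOP_CODONS for codon in codons[1:-1])
    return has_start and has_stop and not has_internal_stop


def has_canonical_splice_sites(intron: str) -> bool:
    """Check that an intron starts with GT and ends with AG (assumed consensus)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")
```

A CDS would need to pass checks of this kind for every exon boundary and for the spliced coding sequence as a whole; the remaining tests listed above rely on comparative and supporting evidence and are not modelled here.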