RefSeq Collaborators and data sources

The RefSeq project is ambitious in scope, and we actively welcome opportunities to work with other groups to provide this collection. We value the information our collaborators contribute, which ranges from completely annotated genomes to advice on improving the sequence or annotation of individual RefSeq records, information about official nomenclature, and information about function.

In addition to the significant information collected by collaboration, numerous NCBI staff are involved in database support, programmatic support, and curation.

We collaborate with many groups including:

Consensus CDS (CCDS) Project
consistent annotation of the human and mouse genomes is supported by a collaboration between NCBI, the Wellcome Trust Sanger Institute (WTSI) and the University of California, Santa Cruz (UCSC).
Cytochrome P450
Dr. Nelson curates gene content and representative sequences for this gene family.
FlyBase
FlyBase provides the Drosophila melanogaster RefSeq collection.
Human Gene Mutation Database
contributed to the initial set of human RefSeq records.
HUGO Gene Nomenclature Committee
provides official nomenclature for human genes and curates gene content and representative sequences.
IMGT
International Immunogenetics Information System
Microbial Genomes
Microbial genomes are submitted to GenBank by several groups; we would like to acknowledge that their efforts add significant value to the RefSeq collection as we mine for experimentally supported data. NCBI collaborates with some groups to improve our prokaryotic genome annotation pipeline, or to provide additional information for the genome, genes, or protein products.
miRBase - the microRNA database
this is the primary data source for vertebrate RefSeq and Gene records of this type of small RNA molecule.
Mouse Genome Informatics
MGI provides official nomenclature for mouse genes and curates gene content and representative sequences.
Pseudogene.org
one source of pseudogene content represented in RefSeq and Gene.
Rat Genome Database
RGD provides official nomenclature for rat genes and identifies genes and representative sequences.
SGD
Saccharomyces Genome Database provides the annotated RefSeq records.
SwissProt/UniProt
NCBI and UniProt collaborate to provide cross-linking between protein datasets.
The Arabidopsis Information Network
TAIR provides the Arabidopsis thaliana RefSeq collection.
VectorBase
the source of genome annotation data represented in RefSeq and Gene for some of the invertebrate organisms that are vectors of human disease.
Viral Genome Advisors
the viral RefSeq collection is curated via an international collaboration and panel of viral advisors.
WormBase
WormBase provides the Caenorhabditis elegans (nematode) RefSeq collection.
Zebrafish Model Organism Database (ZFIN)
provides official nomenclature for zebrafish genes and curates gene content and representative sequences.

In addition, numerous individuals have made valuable contributions by helping to curate data for specific genes, gene families, or organisms. While it is impossible to list them all here, their assistance is very much appreciated.

Last updated: 2018-03-21T21:17:41Z