- BLAST Database Descriptions
| Database name | Description |
|---|---|
| genome (all assemblies)* | This database represents the current public build of the genome. Its sequences have RefSeq accession numbers of type NT_?????? or NW_??????, which represent either contigs (from a clone-based assembly) or supercontigs (from a whole genome shotgun or composite assembly). The contigs in this database come from both the reference assembly and any alternate assemblies available for the genome. This database is generated at the time of a genome release. |
| genome (reference only) | This database represents the current public build of the genome. Its sequences have RefSeq accession numbers of type NT_?????? or NW_??????, which represent either contigs (from a clone-based assembly) or supercontigs (from a whole genome shotgun or composite assembly). The contigs in this database come from the reference assembly only. This database is generated at the time of a genome release. |
| genome (reference, previous build #) | This database represents the previous public build of the genome. Its sequences have RefSeq accession numbers of type NT_?????? or NW_??????, which represent either contigs (from a clone-based assembly) or supercontigs (from a whole genome shotgun or composite assembly). The contigs in this database come from the reference assembly only. This database is generated at the time of a genome release. |
| genome (all assemblies, previous build #) | This database represents the previous public build of the genome. Its sequences have RefSeq accession numbers of type NT_?????? or NW_??????, which represent either contigs (from a clone-based assembly) or supercontigs (from a whole genome shotgun or composite assembly). The contigs in this database come from both the reference assembly and any alternate assemblies available for the genome. This database is generated at the time of a genome release. |
| HTGS | This database is a collection of all sequences in GenBank that have an HTGS (High Throughput Genomic Sequences) keyword. It allows users to search htgs_phase3 sequences (normally found in NR) and htgs_phase0, 1 and 2 sequences (normally found in HTGS) at the same time. This database is updated daily from Entrez queries against the nucleotide database. |
| RefSeq RNA | Collection of reference mRNAs generated by the NCBI RefSeq project. This database is generated daily from an Entrez query against the nucleotide database. |
| RefSeq protein | Collection of reference proteins generated by the NCBI RefSeq project. This database is generated daily from an Entrez query against the protein database. |
| Non-RefSeq RNA | This database is a collection of RNA sequences found in either the non-redundant (NR) database or the Third Party Annotation (TPA) database. This database is generated daily from an Entrez query against the nucleotide database. |
| Non-RefSeq protein | This database is a collection of protein sequences found in either the non-redundant (NR) database or the Third Party Annotation (TPA) database. This database is generated daily from an Entrez query against the protein database. |
| Build RNA | Collection of reference mRNAs generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome release. |
| Build RNA (previous build #) | Collection of reference mRNAs for the previous build generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome release. |
| Build protein | Collection of reference proteins generated by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome release. |
| Build protein (previous build #) | Collection of reference proteins generated for the previous build by NCBI as part of the genome annotation pipeline. This database is generated at the time of a genome release. |
| Ab initio RNA | Collection of ab initio RNA predictions generated by NCBI as part of the genome annotation pipeline using Gnomon. This database is generated at the time of a genome release. |
| Ab initio RNA (previous build #) | Collection of ab initio RNA predictions generated for the previous build by NCBI as part of the genome annotation pipeline using Gnomon. This database is generated at the time of a genome release. |
| Ab initio protein | Collection of ab initio protein predictions generated by NCBI as part of the genome annotation pipeline using Gnomon. This database is generated at the time of a genome release. |
| Ab initio protein (previous build #) | Collection of ab initio protein predictions generated by NCBI for the previous build as part of the genome annotation pipeline using Gnomon. This database is generated at the time of a genome release. |
| ESTs | Single-pass sequence reads from cDNA libraries. This database is updated daily from an Entrez query against the nucleotide database. |
| Clone end sequences | The end sequences of large-insert clones, primarily BACs, PACs and fosmids. This database is generated daily from an Entrez query against the nucleotide database. |
| Traces-WGS | All raw WGS traces for the organism. This database is updated as needed. |
| Traces-ESTs | All raw EST traces for the organism. This database is updated as needed. |
| Traces-other | All raw traces for the organism that are neither WGS nor EST traces. This database is updated as needed. |
| WGS contigs | Available when an organism was assembled using a whole genome shotgun (WGS) strategy and the WGS assembly is in GenBank. This database is updated as needed from an Entrez query against the nucleotide database. If more than one WGS assembly is available for a given organism, the databases are separated by project and given unique names; these are documented below. |
| Gene Trap Clones (Mouse Only) | A collection of sequences generated from gene trap insertions. This database is updated weekly from an Entrez query against the nucleotide database. |
| SNPs | A collection of sequences that define variation in a given organism. These are primarily single nucleotide polymorphisms (SNPs) but can include other types of variation. These databases are generated by dbSNP. |
| Reference Dog Assembly (boxer) | The supercontigs from the Whole Genome Shotgun (WGS) assembly from a 7.6X coverage whole genome library. This assembly was performed at the Broad Institute using the Arachne assembler. |
| TIGR Dog Assembly (Poodle) | This database is a collection of the Whole Genome Shotgun (WGS) contigs assembled from a 1.5X coverage whole genome library. A description of this assembly can be found in Kirkness et al. (2003). |
| TIGR Dog Extra (Poodle) | This database is a collection of Whole Genome Shotgun (WGS) reads that were not assembled into contigs (the Celera Dog Assembly). A description of the assembly can be found in Kirkness et al. (2003). |
| Arachne Chimp WGS Contigs | This database is a collection of Whole Genome Shotgun (WGS) contigs assembled using the program Arachne. These contigs were assembled from a 4.5X coverage set of WGS reads. A publication describing this assembly should be forthcoming. |
| PCAP Chimp WGS Contigs | This database is a collection of Whole Genome Shotgun (WGS) contigs assembled using the program PCAP. These contigs were assembled from a 4.5X coverage set of WGS reads. A publication describing this assembly should be forthcoming. |
| Celera CSA | Celera January 2001 compartmental shotgun assembly (CSA) of the human genome. It was generated from the 27 million reads of Celera's 5.3X whole genome shotgun data and 16 million 'reads' of shredded GenBank data from other human genome projects (Science 2001 Feb 16;291(5507):1304-51). It was generated by the Celera Assembler applied to 3800 separate compartments of Celera and GenBank data associated by inferred sequence overlaps and Celera read pairs. It relied on Celera's paired reads and the BAC end reads for long-range order and orientation. See Istrail et al. (2004). |
| Celera cWGA | This is the November 2000 combined whole genome shotgun assembly (WGA) of the human genome. It was generated by the Celera Assembler applied to the 27 million reads of Celera's 5.3X whole genome shotgun data and 16 million 'reads' of shredded GenBank data from other human genome projects (Science 2001 Feb 16;291(5507):1304-51). It relied on Celera's paired reads and BAC end reads from GenBank for long-range order and orientation. See Istrail et al. (2004). |
| Celera WGA | This is the December 2001 whole genome shotgun assembly (WGSA) of the human genome. It was generated by the Celera Assembler applied to shotgun data only: the 27 million reads of Celera's 5.3X whole genome shotgun data and 104,000 BAC end sequence pairs from GenBank from other human genome projects (Nature 1996. 381:364-366; Genomics 2000. 63:321-332). It relied on Celera's paired reads and the BAC end reads for long-range order and orientation. See Istrail et al. (2004). |
| HSC_TCAG | The Hospital for Sick Children Center for Applied Genomics assembly of Human Chromosome 7. This is a combination of WGS sequence data generated at Celera and HTGS sequence generated by the Human Genome Sequencing Consortium. An analysis of this assembly was published by Scherer et al. (2003). |
| Venter Reads | A collection of reads generated as part of the Venter genome project. These are traditional Sanger-based reads and can be found in the Trace Archive. See Levy et al. (2007) for more information. |
| Watson Reads | A collection of sequence reads generated as part of the Watson genome project. These reads were generated using 454 technology and can be found in the Short Read Archive. See Wheeler et al. (2008) for more information. |
*in some cases this is labeled 'genome'
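The genome databases in the table contain sequences whose RefSeq accessions are of type NT_?????? or NW_??????. The following is a minimal sketch for screening identifiers in a local FASTA file against that form. It assumes an accession is the prefix, an underscore, a run of at least six digits and an optional version suffix (for example `NT_123456.1`); the digit count is an assumption based on the `??????` placeholder, and the function name `is_genome_db_accession` is hypothetical.

```python
import re

# Hypothetical pattern: NT_ or NW_ prefix, at least six digits, optional ".version".
# The six-digit minimum mirrors the "??????" placeholder; longer digit runs are accepted.
_GENOME_ACCESSION = re.compile(r"^(NT|NW)_\d{6,}(\.\d+)?$")


def is_genome_db_accession(accession: str) -> bool:
    """Return True if the accession has the NT_/NW_ form used by the genome databases."""
    return _GENOME_ACCESSION.match(accession.strip()) is not None


if __name__ == "__main__":
    for acc in ["NT_123456.1", "NW_001234567", "NM_000546.5", "NT_12"]:
        print(acc, is_genome_db_accession(acc))
```

This checks the accession format only; it does not tell a clone-based contig apart from a WGS or composite supercontig.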
May 20, 2008
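Several databases in the table (HTGS, RefSeq and Non-RefSeq RNA and protein, ESTs, clone end sequences, WGS contigs and gene trap clones) are rebuilt by running an Entrez query against the nucleotide or protein database. As a sketch, assuming the query is sent through NCBI's E-utilities `esearch` endpoint, the request URL could be built as below. The helper name `build_esearch_url` and the example query string are hypothetical placeholders, not the queries NCBI actually uses to generate these databases.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL that runs an Entrez query against `db`.

    Only "nucleotide" and "protein" are accepted, matching the two Entrez
    databases the table mentions.
    """
    if db not in ("nucleotide", "protein"):
        raise ValueError("db must be 'nucleotide' or 'protein'")
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and quotes and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: HTGS-keyword records for one organism.
    print(build_esearch_url("nucleotide", '"Canis lupus familiaris"[Organism] AND htgs[Keyword]'))
```

Fetching the URL returns an XML list of matching record IDs; keeping URL construction separate from the network call makes the query easy to inspect before it is run.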