High-Throughput Genomic Sequences

The HTG GenBank Division

The High Throughput Genomic (HTG) Sequences division was created to make unfinished genomic sequence data rapidly available to the scientific community. It was established as a coordinated effort among the International Nucleotide Sequence databases: DDBJ, EMBL, and GenBank. The HTG division contains unfinished DNA sequences generated by the high-throughput sequencing centers using traditional clone-based Sanger sequencing. Sequence data in this division are available for BLAST homology searches against either the "htgs" database or the "month" database, which includes all new submissions for the prior month. The HTG division of GenBank was described in a Genome Research article by Ouellette and Boguski.

Draft genomes sequenced using non-clone-based whole genome shotgun sequencing are not appropriate for HTG; these should be submitted as WGS submissions as described at https://www.ncbi.nlm.nih.gov/Genbank/wgs/.

Location of HTG Records

Unfinished HTG sequences containing contigs greater than 2 kb are assigned an accession number and deposited in the HTG division. A typical HTG record might consist of all the first-pass sequence data generated from a single cosmid, BAC, YAC, or P1 clone, which together make up more than 2 kb and contain one or more gaps. A single accession number is assigned to this collection of sequences, and each record includes a clear indication of the status (phase 1 or 2) plus a prominent warning that the sequence data are "unfinished" and may contain errors. The accession number does not change as sequence records are updated; only the most recent version of an HTG record remains in GenBank.

Records remain at phase 2 until the quality of the sequence reaches the finished level. Consequently, some records lack gaps but are still phase 2 because their quality scores are too low to promote them to phase 3. Finished HTG sequences (phase 3) retain the same accession number but are moved into the relevant primary GenBank division. An example of a submission (one accession number) that has progressed through phase 1, phase 2, and phase 3 is available in the examples.

A newer status for sequences is phase 0. Phase 0 sequences are usually one-to-few pass reads of a single clone and, therefore, are not usually contigs. A phase 0 sequence is used to check whether another center is already sequencing the same clone. If so, the clone may never be finished. If not, it will be sequenced through phase 1, phase 2, and phase 3.
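The size rule described at the start of this section (pieces that together make up more than 2 kb and contain one or more gaps) can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this document: gaps are taken to be represented as runs of N characters in the sequence, the minimum run length of 10 is an arbitrary illustrative value, and the function names are hypothetical. It is not the procedure GenBank itself uses.

```python
import re

# "More than 2 kb", taken from the text above.
MIN_TOTAL_BP = 2000


def split_contigs(sequence, min_gap=10):
    """Split a sequence into contigs at runs of N.

    Assumption: a gap is a run of at least `min_gap` N characters.
    The value 10 is illustrative only.
    """
    pieces = re.split(r"[Nn]{%d,}" % min_gap, sequence)
    # Drop empty pieces produced by leading or trailing N runs.
    return [piece for piece in pieces if piece]


def summarize(sequence, min_gap=10):
    """Report contig count, internal gap count, and total contig length."""
    contigs = split_contigs(sequence, min_gap)
    total_bp = sum(len(contig) for contig in contigs)
    return {
        "contigs": len(contigs),
        "gaps": max(len(contigs) - 1, 0),
        "total_bp": total_bp,
        "meets_size_threshold": total_bp > MIN_TOTAL_BP,
    }


if __name__ == "__main__":
    # Two contigs (1500 bp and 800 bp) separated by a 100-N gap.
    record = "A" * 1500 + "N" * 100 + "C" * 800
    print(summarize(record))
```

The sketch counts only the bases inside contigs toward the 2 kb threshold, which is one reading of "together make up more than 2 kb"; counting gap characters as well would be an equally defensible choice.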

Status    Location          Definition
Phase 0   HTG division      One-to-few pass reads of a single clone (not contigs).
Phase 1   HTG division      Unfinished; may be unordered, unoriented contigs, with gaps.
Phase 2   HTG division      Unfinished; ordered, oriented contigs, with or without gaps.
Phase 3   Primary division  Finished; no gaps (with or without annotations).

There is some flexibility built into the phase definitions. For example, although the majority of submissions represent a collection of unordered or ordered sequences derived from a single cosmid, BAC, or PAC clone, there have been cases where each individual sequence was submitted as phase 1, then updated to phase 2, and then, upon assembly, updated to phase 3. It is more convenient for NCBI to receive one submission containing a group of unordered or ordered sequences than to receive 200 submissions, each representing one unordered or ordered piece of the larger sequence.
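The phase definitions above can be written down as a small lookup table with a decision function, which may help when reasoning about why a gap-free record can still be phase 2. This is only a sketch: the class, the function, and its boolean parameters are hypothetical names introduced for illustration, and the real assignment of phases is made by the submitting center and NCBI, with the flexibility noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    number: int
    division: str
    definition: str


# Transcribed from the Status / Location / Definition table above.
PHASES = {
    0: Phase(0, "HTG", "One-to-few pass reads of a single clone (not contigs)"),
    1: Phase(1, "HTG", "Unfinished; may be unordered, unoriented contigs, with gaps"),
    2: Phase(2, "HTG", "Unfinished; ordered, oriented contigs, with or without gaps"),
    3: Phase(3, "Primary", "Finished; no gaps (with or without annotations)"),
}


def classify(assembled, ordered_and_oriented, has_gaps, finished_quality):
    """Return the phase implied by four yes/no properties of a record.

    All four parameters are hypothetical simplifications of the
    definitions in the table; they are not fields of a GenBank record.
    """
    if not assembled:
        return PHASES[0]
    if not ordered_and_oriented:
        return PHASES[1]
    # A record stays at phase 2 until it is both gap-free and of
    # finished quality, so a gap-free record can still be phase 2.
    if has_gaps or not finished_quality:
        return PHASES[2]
    return PHASES[3]


if __name__ == "__main__":
    phase = classify(assembled=True, ordered_and_oriented=True,
                     has_gaps=False, finished_quality=False)
    print(phase.number, phase.division, phase.definition)
```

Keeping the table as data and the rules as a separate function means the wording of a definition can change without touching the decision logic.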

Submitting HTG Records

The FTP-based HTG submission system has now been decommissioned.

The HTG submission system was created more than 20 years ago for fast processing of the BAC clones that were being sequenced for the Human Genome Project. As sequencing technologies have advanced over the years, the need for a specialized pipeline for depositing and updating BAC-based sequences has declined significantly. Furthermore, the center tracking that was done for HTG submissions is also no longer needed, since the likelihood of two groups sequencing the same clone has dropped substantially since the completion of clone-based genome sequencing efforts. In addition, NCBI has alternative submission systems for easy deposition of all types of genomic sequence data. Unfortunately, we are no longer able to maintain the older HTG submission system and it has been decommissioned.

To submit BAC-based, HTG-like sequences to GenBank, use one of the standard GenBank submission pathways: either BankIt for FASTA submissions (at https://www.ncbi.nlm.nih.gov/WebSub/) or email an ASN.1 file to gb-sub@ncbi.nlm.nih.gov. Similarly, records that were originally submitted through the old HTG submission system can be updated through the standard GenBank update route, as described at https://www.ncbi.nlm.nih.gov/genbank/update/.

