High-Throughput Genomic Sequences

The HTG GenBank Division

The High Throughput Genomic (HTG) Sequences division was created to accommodate a growing need to make unfinished genomic sequence data rapidly available to the scientific community. It was established through a coordinated effort among the International Nucleotide Sequence databases: DDBJ, EMBL, and GenBank. The HTG division contains unfinished DNA sequences generated by the high-throughput sequencing centers using traditional clone-based Sanger sequencing. Sequence data in this division are available for BLAST homology searches against either the "htgs" database or the "month" database, which includes all new submissions for the prior month. The HTG division of GenBank was described in a Genome Research article by Ouellette and Boguski.

Draft genomes sequenced using non-clone-based whole genome shotgun sequencing are not appropriate for HTG; these should be submitted as WGS submissions. NextGen sequences should not be submitted to HTG either; these should be submitted to the Sequence Read Archive.

Location of HTG Records

Unfinished HTG sequences containing contigs greater than 2 kb are assigned an accession number and deposited in the HTG division. A typical HTG record might consist of all the first-pass sequence data generated from a single cosmid, BAC, YAC, or P1 clone, which together make up more than 2 kb and contain one or more gaps. A single accession number is assigned to this collection of sequences, and each record includes a clear indication of the status (phase 1 or 2) plus a prominent warning that the sequence data are "unfinished" and may contain errors.

The accession number does not change as sequence records are updated; only the most recent version of an HTG record remains in GenBank. Records remain at phase 2 until the quality of the sequence reaches the finished level. As a result, some records lack gaps but are still phase 2 because their quality scores are too low to promote them to phase 3. Finished HTG sequences (phase 3) retain the same accession number but are moved into the relevant primary GenBank division. An example of a submission (one accession number) that has progressed through phase 1, phase 2, and phase 3 is available in the examples.

A newer status for sequences is phase 0. Phase 0 sequences are usually one-to-few pass reads of a single clone and, therefore, are not usually contigs. A phase 0 sequence is used to check whether another center is already sequencing the same clone. If so, the clone may never be finished. If not, it will be sequenced through phase 1, phase 2, and phase 3.
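The record lifecycle described above can be illustrated with a minimal Python sketch. It is not NCBI software: the class name HtgRecord, the method updated, the division labels, and the accession string "EXAMPLE0001" are all hypothetical and exist only to show that the accession stays fixed while the phase, and with it the division, changes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HtgRecord:
    """Hypothetical model of one submission (one accession number)."""

    accession: str  # placeholder identifier; never changes across updates
    phase: int      # 0, 1, 2, or 3

    @property
    def division(self) -> str:
        # Phases 0-2 stay in the HTG division; phase 3 (finished)
        # moves to the relevant primary GenBank division.
        return "primary" if self.phase == 3 else "HTG"

    def updated(self, new_phase: int) -> "HtgRecord":
        # An update replaces the previous version of the record
        # but keeps the same accession number.
        if new_phase not in (0, 1, 2, 3):
            raise ValueError(f"unknown phase: {new_phase}")
        return replace(self, phase=new_phase)


if __name__ == "__main__":
    record = HtgRecord("EXAMPLE0001", phase=1)
    for next_phase in (2, 3):
        record = record.updated(next_phase)
        print(record.accession, record.phase, record.division)
```

The record is modelled as immutable so that each update yields a fresh object, mirroring the statement that only the most recent version of a record remains in GenBank.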

Status    Location           Definition
Phase 0   HTG division       One-to-few pass reads of a single clone (not contigs).
Phase 1   HTG division       Unfinished, may be unordered, unoriented contigs, with gaps.
Phase 2   HTG division       Unfinished, ordered, oriented contigs, with or without gaps.
Phase 3   Primary division   Finished, no gaps (with or without annotations).

There is some flexibility built into the phase definitions. For example, although the majority of submissions represent a collection of unordered or ordered sequences derived from a single cosmid, BAC, or PAC clone, there have been cases where each individual sequence was submitted as phase 1, then updated to phase 2, and then, upon assembly, updated to phase 3. It is more convenient for NCBI to receive one submission containing a group of unordered or ordered sequences than to receive 200 submissions, each representing one unordered or ordered piece of the larger sequence.
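As a rough illustration of the phase definitions above, the following Python sketch maps a few properties of a submission to a phase. The function classify_phase and its boolean parameters are hypothetical simplifications introduced here; as noted, the real definitions are applied with some flexibility, so this is a reading aid rather than NCBI's actual decision procedure.

```python
from enum import IntEnum


class Phase(IntEnum):
    PHASE_0 = 0  # one-to-few pass reads of a single clone (not contigs)
    PHASE_1 = 1  # unfinished; may be unordered, unoriented contigs, with gaps
    PHASE_2 = 2  # unfinished; ordered, oriented contigs, with or without gaps
    PHASE_3 = 3  # finished; no gaps


def classify_phase(
    is_contigs: bool,
    ordered_and_oriented: bool,
    has_gaps: bool,
    finished_quality: bool,
) -> Phase:
    """Simplified, hypothetical reading of the phase table."""
    if not is_contigs:
        return Phase.PHASE_0
    if not ordered_and_oriented:
        return Phase.PHASE_1
    if finished_quality and not has_gaps:
        return Phase.PHASE_3
    # Ordered and oriented, but gaps remain or quality is below
    # the finished level: the record stays at phase 2.
    return Phase.PHASE_2


if __name__ == "__main__":
    # A gapless record whose quality scores are too low is still phase 2.
    print(classify_phase(True, True, False, False).name)
```

The last branch captures the case mentioned earlier: a record can lack gaps and still be phase 2 because its quality has not reached the finished level.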

Note that BAC ends should be submitted as GSS records.

Submitting HTG Records

Sequences are prepared for submission using NCBI's software tools Sequin or tbl2asn. Each center has an FTP directory into which new or updated sequence files are placed. Files appearing in the FTP directories are moved to a processing directory daily, and processing begins immediately (see Processing Submissions for more information). Sequencing centers interested in submitting HTG sequences should contact NCBI (htg-admin@ncbi.nlm.nih.gov) to set up an FTP account. Please see Submitting HTG Sequences for instructions on preparing HTG submissions.


The Frequently Asked Questions (FAQs) are an additional source of information about the HTG submission process, and the specific examples they outline may help speed your submission. If you need more information, please contact info@ncbi.nlm.nih.gov for additional help.

Last updated: 2017-11-09T23:39:24Z