GenBank Adds CON Division for Assembly Data

GenBank, EMBL, and DDBJ have established a special-purpose division, Contig (CON), for exchanging assembly instructions for data in the international DNA sequence databases. The CON division contains no sequence data, but rather instructions for the assembly of sequence data from multiple GenBank records.

DNA sequence records in GenBank (as well as EMBL and DDBJ) are currently limited to 350 kb for flexibility in data exchange, analysis, searching, and display. Sequences that exceed 350 kb are split into multiple smaller records with separate accession numbers. The CON division contains information on how to reassemble the full-length contig. It also includes instructions for constructing assemblies of non-contiguous sequence shown in Entrez as “segmented sets.”
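
As a rough illustration of the idea, the sketch below splits a long sequence into records of at most 350 kb and then rebuilds it from a list of assembly instructions. It is a minimal sketch under stated assumptions, not the actual CON record format or any GenBank tool: the function names, the "HYPO" record identifiers, and the (identifier, start, end) instruction tuples are all hypothetical.

```python
from typing import Dict, List, Tuple

MAX_RECORD_LENGTH = 350_000  # the 350 kb per-record limit described above

# Hypothetical instruction shape: (record identifier, 1-based start, inclusive end)
Instruction = Tuple[str, int, int]


def split_into_records(
    sequence: str, prefix: str = "HYPO"
) -> Tuple[Dict[str, str], List[Instruction]]:
    """Split a sequence into pieces no longer than MAX_RECORD_LENGTH.

    Returns the pieces keyed by a made-up identifier, plus the ordered
    instructions needed to put them back together.
    """
    records: Dict[str, str] = {}
    instructions: List[Instruction] = []
    offsets = range(0, len(sequence), MAX_RECORD_LENGTH)
    for index, offset in enumerate(offsets, start=1):
        piece = sequence[offset:offset + MAX_RECORD_LENGTH]
        record_id = f"{prefix}{index:05d}"  # hypothetical identifier, not a real accession
        records[record_id] = piece
        instructions.append((record_id, 1, len(piece)))
    return records, instructions


def assemble(records: Dict[str, str], instructions: List[Instruction]) -> str:
    """Rebuild the full-length sequence by following the instructions in order."""
    return "".join(
        records[record_id][start - 1:end]  # convert 1-based inclusive to a Python slice
        for record_id, start, end in instructions
    )


# Example: an 800,000-base sequence exceeds the limit, so it is split.
full_sequence = "ACGT" * 200_000
records, instructions = split_into_records(full_sequence)
rebuilt = assemble(records, instructions)
```

In this sketch the instruction list plays the role that a CON entry plays in the databases: it holds no sequence data itself, only the order and extent of the pieces held in other records.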

General users of GenBank and Entrez should notice no changes in the services. The assembly information has been present in the ASN.1 format, and used by Entrez, for some time. The CON division simply formalizes procedures that have been in development over the past few years and makes the assembly information available in a more familiar format to users who download the GenBank database.

The CON division will be available with the December 15, 1999, release of GenBank. The division applies only to sequence data as deposited in GenBank, EMBL, or DDBJ; it is not being used for other genome assembly projects.


