Clone DB Overview

  1. Introduction
  2. Genomic Libraries
  3. Gene Trap and Targeting Libraries
  4. Clone DB Management of Libraries and Clones
  5. More Info

I. Introduction

The NCBI Clone DB replaces the former NCBI Clone Registry, which was established during the era of the human and mouse genome projects, and expands upon it to represent a broader range of eukaryotic organisms and clone types. Clone DB integrates information about clones and libraries, including sequence data, map positions and distributor information. This consolidation provides a more comprehensive view of clone-associated data that will help users take greater advantage of clone resources.

Clones and clone libraries take many forms and play diverse roles in biological research. Broadly defined, a clone can be considered a self-replicating system containing a DNA fragment of interest. A clone library is a collection of clones that share user-defined properties, such as DNA source of origin or vector type. At this time, Clone DB contains records for genomic clones and libraries, the collection of MICER mouse gene targeting clones, and cell-based gene trap and gene targeting libraries from the International Knockout Mouse Consortium, Lexicon and the International Gene Trap Consortium. A planned expansion of Clone DB will add records for additional gene targeting and gene trap clones, as well as cDNA clones.

II. Genomic Libraries

A genomic library comprises a set of bacteria (or yeast), each carrying a different small fragment of an organism's DNA. The amount of DNA carried by each clone is determined by the type of cloning vector into which the fragment is inserted (Table 1). Although yeast artificial chromosomes (YACs) and cosmids were extensively used during the early days of the human genome project (HGP), they are no longer commonly employed because of their tendency toward genomic rearrangements that result in clones containing chimeric or deleted sequence. Today, bacterial and P1-derived artificial chromosomes (BACs and PACs) are the basis for most large insert libraries, while fosmids are frequently the choice when libraries with smaller insert sizes are needed.

Table 1: Insert sizes of genomic cloning vectors
Vector Type    Maximum Insert Size
YAC            0.2-2 Mb
PAC            100-300 kb
BAC            100-300 kb
fosmid         35-45 kb
cosmid         35-45 kb
phagemid       5-25 kb
plasmid        5-20 kb
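
As a rough illustration of how Table 1 might guide vector choice, the following Python sketch lists the vector types whose tabulated size range includes a desired insert size. The function name and the conversion of the YAC range to kilobases (1 Mb taken as 1000 kb) are assumptions made for this example; the sketch is not part of Clone DB.

```python
# Insert size ranges in kilobases, transcribed from Table 1.
# The YAC range (0.2-2 Mb) is converted assuming 1 Mb = 1000 kb.
INSERT_SIZE_KB = {
    "YAC": (200, 2000),
    "PAC": (100, 300),
    "BAC": (100, 300),
    "fosmid": (35, 45),
    "cosmid": (35, 45),
    "phagemid": (5, 25),
    "plasmid": (5, 20),
}


def vectors_for_insert(size_kb):
    """Return vector types whose Table 1 range includes size_kb (hypothetical helper)."""
    return [
        name
        for name, (low, high) in INSERT_SIZE_KB.items()
        if low <= size_kb <= high
    ]


if __name__ == "__main__":
    print(vectors_for_insert(150))  # ['PAC', 'BAC']
    print(vectors_for_insert(40))   # ['fosmid', 'cosmid']
```

In practice the choice also depends on factors the table does not capture, such as the clone stability issues noted above for YACs and cosmids.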

Genomic clones played critical roles in the production of the human and mouse reference assemblies. In a clone-based assembly approach, a minimal clone tiling path describing an organism's genome is created using mapping and fingerprinting techniques. Shotgun libraries are created for selected clones in the path, which are then sequenced. Individual clone sequences are locally reassembled, and the tiling path is used to guide sequence assembly at the higher level (Figure 1). Such clone-based assemblies are generally recognized to have the highest possible assembly quality. Although the cost and time associated with this methodology, coupled with the advent of next-generation sequencing technologies that do not require cloning steps, have reduced the number of genomes assembled entirely by this approach, genomic clones and libraries continue to play vital roles in research.
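
The idea of a minimal tiling path can be sketched as a greedy interval cover. The Python example below assumes that each clone has already been placed on a common coordinate system as a (name, start, end) tuple; real tiling paths are built from fingerprint and map data and normally require a minimum overlap between neighboring clones, which this simplified sketch does not enforce. The function name and the example coordinates are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick a smallest set of clones covering [region_start, region_end).

    clones: iterable of (name, start, end) tuples, end exclusive.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones that start at or before the covered point,
        # keep the one that extends furthest to the right.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


if __name__ == "__main__":
    # Made-up clone placements, for illustration only.
    mapped = [
        ("A", 0, 150),
        ("B", 50, 120),
        ("C", 100, 280),
        ("D", 140, 260),
        ("E", 250, 400),
    ]
    print(minimal_tiling_path(mapped, 0, 400))  # ['A', 'C', 'E']
```

Clones B and D are skipped because A, C and E already cover the region with fewer clones; only the clones on the path would then go on to shotgun sequencing.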

Figure 1. Clone-based assembly steps


Notably, large insert genomic libraries and a clone-based tiling path remain the tools of choice for the assembly of complex and repetitive genomes, such as those found in many plant species, where the short reads generated with current non-Sanger based sequencing technologies cannot be effectively assembled into longer contigs. Additionally, the mapping of sequenced clone ends to genomes assembled by other means is often used to check and improve assembly quality. More recently, paired-end mapping of genomic clones has emerged as a powerful technique for the identification of structural variation, including insertions, deletions, inversions and translocations. Genomic clones also continue to provide the basis for many of the gene targeting vectors that are used in the production of genetically engineered organisms. Thus, genomic clone resources remain a vital tool in research.
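
The logic behind paired-end mapping can be sketched as follows. The Python example assumes that both end sequences of a clone have been mapped to a reference assembly, and that the expected insert size range of the library is known (here the fosmid range from Table 1). The class, field names and thresholds are hypothetical, and the classification is deliberately simplified: real analyses require support from multiple clones and check end orientation in more detail.

```python
from dataclasses import dataclass


@dataclass
class EndPair:
    """Mapped positions of the two end sequences of one clone (hypothetical model)."""
    clone: str
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def classify(pair, min_insert, max_insert):
    """Label a clone as concordant or as a candidate structural variant."""
    if pair.chrom_a != pair.chrom_b:
        return "translocation"
    # Ends of an intact clone are expected to map to opposite strands.
    if pair.strand_a == pair.strand_b:
        return "inversion"
    span = abs(pair.pos_b - pair.pos_a)
    if span > max_insert:
        # Ends map too far apart: the cloned DNA lacks reference sequence.
        return "deletion"
    if span < min_insert:
        # Ends map too close: the cloned DNA carries extra sequence.
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    pair = EndPair("clone1", "chr1", 1_000, "+", "chr1", 91_000, "-")
    print(classify(pair, min_insert=35_000, max_insert=45_000))  # deletion
```

Each label is only a candidate call relative to the reference; a single discordant clone may equally reflect a chimeric insert or a mapping error.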

III. Gene Trap and Targeting Libraries

Gene targeting is a technique in which homologous recombination between a vector construct and the genome is used to alter the endogenous gene copy. This technique has proven effective in altering gene activities in several plant and animal species. A gene targeting library is a collection of clones that can be used for gene targeting experiments. Gene targeting libraries may be maintained in either non-integrated or integrated (cell-based) formats. In the non-integrated format, insert sequences are contained within specialized gene targeting vectors maintained within a bacterial host. These vectors contain features essential for further engineering of the insert DNA, gene targeting in a host, and selection of appropriately integrated clones. Integrated, or cell-based, libraries are collections of gene targeting clones that have already been recombined into the host genome of interest. In mice, such libraries are frequently maintained as collections of embryonic stem cells.

Gene trapping is a technique used to introduce insertional mutations across a genome. Unlike gene targeting, which relies upon homologous recombination, gene trapping occurs via the insertion of specially designed vectors into random chromosomal sites. Gene trap vectors, in their basic form, consist of a promoterless reporter and/or a selectable marker gene flanked by an upstream splice acceptor and a downstream polyadenylation sequence. When such a vector inserts into the intron of an expressed gene, the endogenous transcript is generally inactivated, while the reporter or marker gene reports the normal expression pattern of the disrupted transcript. The gene trap also provides a molecular tag for identification of the disrupted gene. In mice, gene trap libraries are also frequently maintained as collections of embryonic stem cells.

The Mutagenic Insertion and Chromosome Engineering Resource (MICER) is a collection of non-integrated mouse gene targeting clones. There are two MICER libraries with records in Clone DB, one for each of the vectors used in this resource. For additional information on this resource, please view the MICER library records in the Clone DB genomic library browser or see the MICER home page, maintained at the Wellcome Trust Sanger Institute. The collection of Clone DB records for cell-based (integrated) gene targeting and gene trap libraries can be accessed via the cell-based library browser.

IV. Clone DB Management of Libraries and Clones

All libraries in Clone DB have been externally defined, either by the library creator or distributor. Each genomic library in Clone DB is assigned a standard name and abbreviation that is unique for a given organism. Wherever possible, Clone DB adopts library names and abbreviations that have been provided to us by library creators or distributors as the standard names. In cases where a library is commonly known to a research community by more than one name, Clone DB assigns one of these names as the standard and stores all other known names and abbreviations as alternates. Alternate library names and abbreviations for a given library can be viewed on the corresponding library-specific record page.
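
The relationship between standard and alternate library names can be illustrated with a small lookup structure. The Python sketch below is not Clone DB's implementation; the class names, the example library and the case-insensitive matching are all assumptions made for illustration. It resolves any known name or abbreviation to a single library record and rejects a label that would be ambiguous within one organism.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryNames:
    """Names for one library (hypothetical model, not the Clone DB schema)."""
    organism: str
    standard_name: str
    standard_abbr: str
    alternates: set = field(default_factory=set)


class LibraryIndex:
    """Resolve any standard or alternate label to its library record."""

    def __init__(self):
        self._by_key = {}

    def add(self, lib):
        labels = {lib.standard_name, lib.standard_abbr, *lib.alternates}
        for label in labels:
            # Assumption: labels are compared case-insensitively per organism.
            key = (lib.organism.lower(), label.lower())
            existing = self._by_key.get(key)
            if existing is not None and existing is not lib:
                raise ValueError(f"{label!r} is already used for {lib.organism}")
            self._by_key[key] = lib

    def resolve(self, organism, label):
        return self._by_key[(organism.lower(), label.lower())]


if __name__ == "__main__":
    index = LibraryIndex()
    # Made-up library, for illustration only.
    index.add(LibraryNames("Mus musculus", "Example BAC Library", "EX1", {"ExBAC-1"}))
    print(index.resolve("Mus musculus", "exbac-1").standard_abbr)  # EX1
```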

Genomic libraries consist of one or more library segments. Segments, like the libraries themselves, are externally defined, and are used to distinguish subsets of clones that share common attributes within the larger context of the library. Segments may reflect distinct ligation or transformation events during library construction, library vectors, or even DNA sources. The particular feature that distinguishes any one segment from another in a given library can be viewed in the detailed record for that library.

A genomic clone's standardized name in Clone DB consists of its microtiter plate address (plate number, row and column), prefixed by the standard library abbreviation, which together provide a unique clone name. If a non-standardized name has been provided in a clone's insert or end sequence record in an INSDC database, the non-standardized name is stored in Clone DB as an alias of the standardized name. To facilitate integration of clone-based information from various databases, Clone DB strongly recommends that insert and end sequence submissions for genomic clones to INSDC databases utilize the Clone DB standard naming scheme.
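
As an illustration of the naming scheme, the Python sketch below builds and parses names of the form abbreviation, hyphen, plate number, row letter, column number. The hyphen separator, the single-letter row and the absence of zero-padding are assumptions made for this example, as are the function names; the library record in Clone DB remains the authority for any given library.

```python
import re

# Assumed layout: <library abbreviation>-<plate><row letter><column>
_CLONE_NAME = re.compile(
    r"^(?P<library>.+)-(?P<plate>\d+)(?P<row>[A-Za-z])(?P<column>\d+)$"
)


def standard_clone_name(library_abbr, plate, row, column):
    """Build a clone name from a library abbreviation and plate address."""
    return f"{library_abbr}-{plate}{row.upper()}{column}"


def parse_clone_name(name):
    """Split a clone name into (library, plate, row, column)."""
    match = _CLONE_NAME.match(name)
    if match is None:
        raise ValueError(f"not in the assumed standard form: {name!r}")
    return (
        match["library"],
        int(match["plate"]),
        match["row"].upper(),
        int(match["column"]),
    )


if __name__ == "__main__":
    name = standard_clone_name("RP11", 79, "b", 13)
    print(name)                    # RP11-79B13
    print(parse_clone_name(name))  # ('RP11', 79, 'B', 13)
```

Because the library part of the pattern is matched greedily, an abbreviation that itself contains a hyphen is still split at the last hyphen before the plate address.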


Libraries with records in Clone DB may be accessed via the genomic or cell-based library browsers. Filters on these pages can be used to restrict the display to libraries meeting specific criteria. Detailed records for each of the libraries are available by clicking on the library name or abbreviation in these browsers. The library-specific pages provide any details regarding library construction and statistics that are stored in Clone DB (Table 2), as well as links to related NCBI resources. Records for individual clones, which include mapped locations, associated sequences and other details, can be obtained via Entrez searches of Clone DB. For detailed descriptions of the browser, library and clone record displays in Clone DB, please see the Clone DB Help page. A per-organism report listing genomic clone insert sequences is available at the Clone DB FTP site.

Table 2: Library details for which information may be available in Clone DB.
Library Detail                       Description
% Chimeric                           % of clones in library with inserts containing non-contiguous DNA
% Mito                               % of clones in library contaminated by mitochondrial DNA
% Non-Recombinant                    % of clones in library lacking insert
% Organelle                          % of clones in library contaminated by unspecified organelle DNA
% Plastid                            % of clones in library contaminated by plastid DNA
Alternate library abbreviation(s)    Other abbreviations by which the library is known
Alternate library name(s)            Other names by which the library is known
Avg Insert (kb)                      Average size of insert, in kilobases
Cell line                            Cell line from which source DNA obtained
Cell type                            Cell type from which source DNA obtained
Chromosomes                          Chromosomes represented in chromosome-specific libraries
Coverage                             Genomic coverage
Developmental Stage                  Developmental stage of organism from which source DNA obtained
DNA Source                           Population, strain, breed, cultivar or accession of source DNA
DNA Source ID                        A library-creator supplied ID for the source DNA
Genome Project ID                    The NCBI Genome Project ID associated with a library
Health State                         Health state of organism from which source DNA obtained
Insert Cloning Site(s)               Restriction sites on insert DNA used for cloning
Insert Prep Method                   Method used to prepare genomic DNA in library production
Lab Host                             Host in which vector is maintained
Linker(s)                            Linker sequences used for cloning of DNA
Lower size limit (kb)                Smallest insert size in library, in kilobases
Organ                                Organ from which source DNA obtained
Plate ranges                         Starting and ending plate numbers for the library
Publications                         Publications describing library construction
Recombination                        Suitability of vector for homologous recombination
Seq Primers                          Sequencing primers reported for library
Sex                                  Sex of organism from which source DNA is derived
Source Distributor                   Distributor of the source DNA
Std dev (kb)                         Standard deviation of insert, in kilobases
Tissue                               Tissue from which source DNA obtained
Upper size limit (kb)                Largest insert size in library, in kilobases
Vector Cloning Site(s)               Restriction enzyme sites in vector into which DNA is cloned
Vector Delivery Method               Method used to put vector in laboratory host
Vector Name                          Name of cloning vector
Vector Selection                     Methods for selection of vector in host cells
Vector Subtype                       Specialized subtype for vector
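
Several of the details in Table 2 are related arithmetically. As a rough sketch, genomic coverage can be estimated from the number of clones, the average insert size and the genome size, optionally discounting clones that lack an insert. The Python function below is an illustrative approximation only, not necessarily how the Coverage value stored in Clone DB is derived; the function name, its parameters and the example figures are hypothetical.

```python
def estimated_coverage(num_clones, avg_insert_kb, genome_size_mb,
                       non_recombinant_pct=0.0):
    """Estimate fold coverage of a genome by a clone library.

    coverage = (clones with inserts * average insert size) / genome size
    Sizes are converted to kilobases, assuming 1 Mb = 1000 kb.
    """
    clones_with_insert = num_clones * (1 - non_recombinant_pct / 100)
    genome_size_kb = genome_size_mb * 1000
    return clones_with_insert * avg_insert_kb / genome_size_kb


if __name__ == "__main__":
    # Made-up figures: 100,000 clones, 150 kb average insert, 3,000 Mb genome.
    print(round(estimated_coverage(100_000, 150, 3_000), 2))  # 5.0
    # The same library if 10% of clones lack an insert.
    print(round(estimated_coverage(100_000, 150, 3_000, non_recombinant_pct=10), 2))  # 4.5
```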

Data for all mouse cell-based (gene trap and gene targeting) libraries with records in Clone DB have been provided courtesy of MGI. Each of these libraries represents a unique combination of creator, vector, parental cell line, parental cell line strain, and allele type. These details are available on the individual library record pages.

Clone DB is not a library or clone distributor. Distributor information for specific libraries is provided in the library record and library browser pages. Contact information for the complete list of suppliers of libraries and clones in Clone DB is available on our distributors page. Please note: NCBI endeavors to keep distributor information up-to-date. Please contact Clone DB if you find any incorrect information.

V. More Info

For more information about Clone DB, please see the associated Clone DB Help and/or FAQ pages.


Last updated: 2012-04-18T10:30:28-04:00