FAQ Clone DB Genomic Clone Placements

  1. Where can I find clone placements in Clone DB?
  2. How does Clone DB generate genomic clone placements?
  3. How are the average insert length and standard deviation for a specific genomic library defined?
  4. How is concordance defined?
  5. What is meant by the "confidence" assigned to the placement?
  6. How does Clone DB display clone placements?

  1. Where can I find clone placements in Clone DB?

    For all genomic clones with sequence-based placements (i.e. those in which placement coordinates refer to the sequence of a particular genome assembly), graphical views of placements are presented in the "Genome View" tab of the individual clone record, using NCBI's Sequence Viewer. If a clone has more than one placement, users may select a specific placement to view from the drop-down menu. In the Sequence Viewer, the selected clone placement is displayed in the context of assembly components, annotated genes and other placed clones from the library to which it belongs. The clone of record is highlighted and shown at the top of the placement track. Clicking on any placement opens a pop-up tooltip that contains placement details. Clone placement details can also be viewed in tabular format in the "Sequence-Based Placements" table of the "Clone Placements" tab.

    Information for non-sequence-based placements (i.e. those in which placement coordinates are independent of assembly sequence) can be found in the "Non-Sequence Based Placements" table of the "Clone Placements" tab. Currently, such data is only available for the subset of human genomic clones mapped by The BAC Resource Consortium and CCAP. The cytogenetic coordinates associated with these placements have been mapped to sequence coordinates on the current human reference assembly to allow users to compare their location with any additional sequence-based placements that have been generated for the clone.

  2. How does Clone DB generate genomic clone placements?

    All genomic clone placements currently produced by NCBI are based solely on end sequence alignments. Clone end sequences retrieved from dbGSS and the Trace Archives are screened by Clone DB to remove low-quality bases and vector contamination. Only end sequences with ≥ 50 cleaned bases are considered suitable for further analysis. The cleaned sequences are aligned against a window-masked copy of the assembly of interest using NG Aligner, an NCBI-developed, BLAST-derived alignment algorithm. End alignments are performed at the assembly scaffold level and then mapped to the top-level molecule to which the scaffold belongs (i.e. chromosome, in the case of placed scaffolds). End placements are ranked by weighted identity, a combined measure of the end alignment’s percent identity and coverage, on a per-assembly basis. Ties are allowed when an end has more than one placement with the same weighted identity in an assembly.
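
    The screening and ranking steps described above can be sketched as follows. This is only an illustration, not NCBI's implementation: the class and function names are hypothetical, the weighted identity is assumed to be supplied already computed (the text does not give its formula), and the use of dense ranking (tied placements share a rank, the next distinct score gets the next rank) is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_CLEANED_BASES = 50  # threshold stated in the text


@dataclass(frozen=True)
class EndPlacement:
    end_id: str               # hypothetical identifier of the end sequence
    assembly: str             # assembly the end was aligned to
    weighted_identity: float  # assumed precomputed; formula not given in the text


def is_usable(cleaned_length: int) -> bool:
    """Keep only end sequences with at least 50 cleaned bases."""
    return cleaned_length >= MIN_CLEANED_BASES


def rank_end_placements(placements):
    """Return (placement, rank) pairs, ranked per end and per assembly.

    Placements with equal weighted identity tie and share a rank
    (dense ranking is an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for p in placements:
        groups[(p.end_id, p.assembly)].append(p)

    ranked = []
    for group in groups.values():
        # Distinct scores, best first; rank 1 is the highest score.
        scores = sorted({p.weighted_identity for p in group}, reverse=True)
        rank_of = {score: i + 1 for i, score in enumerate(scores)}
        ranked.extend((p, rank_of[p.weighted_identity]) for p in group)
    return ranked
```

    With this sketch, two placements of the same end that share the best weighted identity in an assembly both receive rank 1, and the same end is ranked independently in every other assembly.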

    Clone placements are generated by the pairing of end placements representing the forward (F) and reverse (R) ends of a given clone. Any pair of ranked end placements located on the same top-level molecule may be considered for use in the generation of a clone placement. However, to avoid the creation of duplicate clone placements in instances where there are multiple sequences representing the same clone end, NCBI clusters overlapping ranked end placements and selects a single prototype for use in clone placements. An end cluster is defined as a set of similarly oriented, ranked end placements from one or more end sequences representing the same end (F/R) of a single clone that overlap in part or in whole on a given assembly scaffold. The cluster prototype is the end placement that holds the 5’-outermost position on the scaffold to which it aligns, and is the only placement from a cluster used for clone placements.
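
    A minimal sketch of end clustering and prototype selection, under stated assumptions, is shown below. The names are hypothetical, the input is assumed to hold the placements for one end (F or R) of one clone, coordinates are assumed to be 1-based and inclusive, and overlaps are chained transitively. "5’-outermost" is read here as the smallest start for plus-strand placements and the largest stop for minus-strand placements; this is an interpretation of the text, not a confirmed detail of NCBI's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPlacement:
    scaffold: str
    start: int   # 1-based, inclusive, start <= stop (assumed convention)
    stop: int
    strand: str  # "+" or "-"


def cluster_end_placements(placements):
    """Group overlapping, similarly oriented placements on the same scaffold."""
    by_key = {}
    for p in placements:
        by_key.setdefault((p.scaffold, p.strand), []).append(p)

    clusters = []
    for group in by_key.values():
        group.sort(key=lambda p: p.start)
        current, reach = [group[0]], group[0].stop
        for p in group[1:]:
            if p.start <= reach:           # overlaps the running cluster extent
                current.append(p)
                reach = max(reach, p.stop)
            else:                          # gap: close the cluster, start a new one
                clusters.append(current)
                current, reach = [p], p.stop
        clusters.append(current)
    return clusters


def pick_prototype(cluster):
    """Pick the placement whose 5' end lies furthest out on the scaffold."""
    if cluster[0].strand == "+":
        return min(cluster, key=lambda p: p.start)
    return max(cluster, key=lambda p: p.stop)
```

    Choosing the outermost member as the prototype means the resulting clone placement spans the full extent supported by the cluster, which matches the "longest possible clone placement" described for the figure.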

    Figure 1. Example clone placement and end cluster: A clone placement consists of single F and R end placements (red and green arrows connected by a line). Within an end cluster (3 green “R” arrows), the prototype end placement (outlined in black) holds the outermost position and will contribute to the longest possible clone placement.


  3. How are the average insert length and standard deviation for a specific genomic library defined?

    The average insert length and standard deviation for each genomic library are calculated from a subset of the clone placements generated on the reference assembly. Only the following clone placements are used for this calculation:

    • (1) both ends have a unique placement on the primary assembly unit
    • (2) both ends are placed on the same scaffold
    • (3) both ends are correctly oriented (i.e. face each other)
    • (4) for BACs/PACs, clone size is >50 kb and <500 kb
    • (5) for fosmids, clone size is >10 kb and <100 kb.

    Note: Because the library average insert and standard deviation are defined by clone placements, these values reflect the assembly to which the library was aligned, and may change with assembly updates.
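
    The calculation can be illustrated with the short sketch below. The field and function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether a sample or population formula is used.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Size windows from criteria (4) and (5), in base pairs (exclusive bounds).
SIZE_LIMITS = {
    "BAC": (50_000, 500_000),
    "PAC": (50_000, 500_000),
    "fosmid": (10_000, 100_000),
}


@dataclass
class ClonePlacement:
    length: int                 # placed clone size in bp
    ends_unique: bool           # (1) both ends uniquely placed on the primary assembly unit
    same_scaffold: bool         # (2) both ends on the same scaffold
    ends_face_each_other: bool  # (3) both ends correctly oriented


def qualifies(p: ClonePlacement, vector_type: str) -> bool:
    """Apply criteria (1)-(5) to a single clone placement."""
    low, high = SIZE_LIMITS[vector_type]
    return (p.ends_unique and p.same_scaffold
            and p.ends_face_each_other and low < p.length < high)


def library_insert_stats(placements, vector_type):
    """Return (average, s.d.) of qualifying lengths, or None if fewer than two qualify."""
    lengths = [p.length for p in placements if qualifies(p, vector_type)]
    if len(lengths) < 2:
        return None
    return mean(lengths), stdev(lengths)  # sample s.d. is an assumption
```

    Because the inputs are placements on a particular assembly, rerunning such a calculation after an assembly update can yield a different average and standard deviation for the same library.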

  4. How is concordance defined?

    All genomic clone placements created by NCBI are defined as either concordant or discordant. Concordant clone placements meet the following criteria:

    • End placements are in opposite orientations and face one another.
    • The distance spanned by the end placements is within 3 standard deviations (s.d.) of the library's average insert length.

    The concordance status of any genomic clone placement in Clone DB can be found in the "Sequence-Based Placements" table, under the "Clone Placements" tab of any individual genomic clone record. Note: The concordance of placements generated by external sources may be undefined.
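
    As a rough illustration, the two concordance criteria can be expressed as a single check. The function and parameter names are hypothetical; the sketch assumes that the "left" end is the one with the lower coordinate, that "facing one another" means the left end is on the plus strand and the right end on the minus strand, and that the 3 s.d. boundary is inclusive.

```python
def is_concordant(left_strand: str, right_strand: str,
                  clone_length: int, lib_mean: float, lib_sd: float) -> bool:
    """Return True if a clone placement meets both concordance criteria.

    left_strand / right_strand: "+" or "-" for the end placement at the
    lower and higher coordinate, respectively (assumed convention).
    """
    # Criterion 1: opposite orientations, pointing toward each other.
    facing = left_strand == "+" and right_strand == "-"
    # Criterion 2: placed length within 3 s.d. of the library average.
    within_range = abs(clone_length - lib_mean) <= 3 * lib_sd
    return facing and within_range
```

    For a library with an average insert of 160 kb and a standard deviation of 10 kb, a correctly oriented placement spanning 170 kb would pass this check, while one spanning 200 kb would be discordant.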

  5. What is meant by the “confidence” assigned to the placement?

    At present, all genomic clone placements produced by NCBI are based on pairs of end sequence alignments. Although only two end placements comprise each clone placement, there are sometimes additional end sequences and/or end placements associated with a clone. This additional evidence, which may or may not support the clone placement, is used by NCBI to define a score that reflects the confidence in the clone placement. Clone DB expresses confidence in placements using the five non-hierarchical categories below. These categories are intended to help users better assess clone placements as they relate to their particular experimental needs. The confidence score for all genomic clone placements is displayed in the "Sequence-Based Placements" table, found in the "Clone Placements" tab of any individual clone record.

    • Unique: There is a unique placement for the clone within an assembly unit. All end placements associated with the clone support this placement.
    • Unique-dissent: There is a unique placement for the clone within an assembly unit, but there are end placements that do not support the clone placement.
    • Multiple-conflict: There are multiple placements for the clone in an assembly unit; every clone placement is comprised of two uniquely placed ends and the multiple placements are due to non-overlapping end clusters.
    • Multiple-best: There are multiple placements for the clone in an assembly unit; this clone placement is comprised of two top-ranked end placements and the multiple placements are due to the existence of end sequences with multiple end placements.
    • Multiple-other: There are multiple placements for the clone in an assembly unit; this clone placement is comprised of lower-ranked end placements and the multiple placements are due to the existence of end sequences with multiple end placements.
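
    One way to read the five categories is as a small decision procedure, sketched below. This is a simplified interpretation, not Clone DB's actual logic: the function name and its boolean inputs are hypothetical summaries of the evidence described in each category.

```python
from enum import Enum


class Confidence(Enum):
    UNIQUE = "Unique"
    UNIQUE_DISSENT = "Unique-dissent"
    MULTIPLE_CONFLICT = "Multiple-conflict"
    MULTIPLE_BEST = "Multiple-best"
    MULTIPLE_OTHER = "Multiple-other"


def classify_confidence(n_placements: int,
                        has_dissenting_ends: bool,
                        all_ends_uniquely_placed: bool,
                        uses_top_ranked_ends: bool) -> Confidence:
    """Assign a confidence category to one clone placement.

    n_placements: clone placements for this clone in the assembly unit.
    has_dissenting_ends: some end placements do not support the placement.
    all_ends_uniquely_placed: every clone placement uses uniquely placed ends
        (multiplicity then stems from non-overlapping end clusters).
    uses_top_ranked_ends: this placement is built from top-ranked end placements.
    """
    if n_placements < 1:
        raise ValueError("a clone placement must exist to be classified")
    if n_placements == 1:
        return (Confidence.UNIQUE_DISSENT if has_dissenting_ends
                else Confidence.UNIQUE)
    if all_ends_uniquely_placed:
        return Confidence.MULTIPLE_CONFLICT
    return (Confidence.MULTIPLE_BEST if uses_top_ranked_ends
            else Confidence.MULTIPLE_OTHER)
```

    The branches are mutually exclusive rather than ordered by quality, which mirrors the description of the categories as non-hierarchical.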

    The presence of non-supporting end placements does not always indicate that a clone placement is incorrect. Such evidence may be due to the presence of some type of repetitive sequence in a clone end (resulting in multiple non-overlapping placements) or reflect the placements of unrelated end sequences that were incorrectly assigned a particular clone name during the sequencing process. It is recommended that users review all associated clone placement evidence when selecting a particular clone for their research.

  6. How does Clone DB display clone placements?

    Individual clone record pages contain graphical displays of clone placements. Table 1 describes the color and shading scheme used to depict various placement attributes.

    • Real vs. "virtual" ends: End-sequence derived clone placements are depicted by a line connecting two triangles whose orientations represent the orientations of the individual end placements. All concordant placements will have ends that face one another, but ends comprising discordant placements may be found in various orientation combinations. Most end-sequence derived clone placements represent a pair of real end placements. However, a clone placement may occasionally be derived from a single end placement (e.g. a clone that hangs into an assembly gap). In such cases, the end without a placement is referred to as a "virtual" end.
    • Representing other placement methods: Clone placements not derived from end placements (such as those derived from insert sequence placements) appear as rectangles with an arrow at one end to indicate orientation, if known.

    Table 1: Clone Placement Display Scheme

    Placement Count   Concordant    End Type   Depiction
    ---------------   -----------   --------   ---------------
    Unique            Yes           Real       uniq conc
    Multiple          Yes           Real       mult conc
    Not Defined       Yes           Real       undef conc
    Unique            No            Real       uniq disc
    Multiple          No            Real       mult disc
    Not Defined       No            Real       undef disc
    Unique            Not Defined   Real       uniq undef
    Multiple          Not Defined   Real       mult undef
    Not Defined       Not Defined   Real       undef undef
    Unique            Yes           Virtual    uniq conc virt
    Unique            Yes           None       uniq conc noends

Last updated: Mon, 2012-01-30 11:52