The VecScreen output lists all segments of the query sequence that closely match any of the sequences in the UniVec database. The origin of such segments should be questioned because they are likely to have been derived from foreign DNA. This document provides guidance for evaluating the significance of matches reported by VecScreen and also for decontaminating the query by removing the foreign sequences.
Strong and Moderate Matches to Vector
By definition, strong matches very rarely occur by chance, and moderate matches rarely occur by chance. Consequently, strong and moderate matches usually indicate that the segment originated from foreign DNA (vector, adapter, linker, or primer) that was attached to the source DNA/RNA during the cloning process. The occasional moderate match that occurs by chance will lack any corroborating evidence of contamination. Sometimes, however, there is a valid reason why part of the query sequence should match a sequence in the UniVec database (see Exceptions).
Weak Matches to Vector
Weak matches identify sequence segments that are potentially of foreign origin. Although weak matches often occur by chance, they indicate foreign sequence whenever there is corroborating evidence of contamination.
Internal Matches
Adapters, linkers, PCR primers, and vectors are all attached to the ends of the source DNA/RNA. Foreign sequences are therefore much more commonly found near the ends of a query. Occasionally, however, foreign DNA segments can be found in the middle of a query sequence. This can happen when a chimeric insert (an insert assembled from several separate pieces of DNA) is sequenced or when multiple contaminated sequences are assembled into a longer sequence. Internal matches should therefore be checked for the presence of cloning sites or other corroborating evidence that would support the conclusion that the segment is foreign. If no corroborating evidence is found, or if a large portion of the query sequence matches vector, or if the matching segment lies in an open reading frame or other critical region, see Exceptions for possible alternative explanations for the match.
mRNA 3' End Sequences
Because mRNAs end with a polyA tail, any sequence following a stretch of polyA in an mRNA (cDNA) sequence almost certainly originates from foreign DNA added during the cloning process.
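The match categories and the mRNA 3' end rule above can be expressed as small checks. This is a minimal sketch, not VecScreen's implementation. The following values are assumptions chosen for illustration and are not taken from this page: the 25-base window that defines a "terminal" match, the score thresholds for each category, and the 15-base minimum polyA run. The sketch reflects the point made under Internal Matches: foreign sequence is expected near the ends of a query, so a terminal match is given lower thresholds than an internal one.

```python
# Sketch of match categories plus the mRNA 3'-end rule.
# ASSUMPTIONS (not taken from this page): the 25-base terminal window, the
# score thresholds, and the 15-base minimum polyA run are illustrative values.
import re
from typing import Optional

TERMINAL_WINDOW = 25                # assumed: matches this close to an end are "terminal"
TERMINAL_THRESHOLDS = (24, 19, 16)  # assumed (strong, moderate, weak) minimum scores
INTERNAL_THRESHOLDS = (30, 25, 23)  # assumed; internal matches need higher scores


def is_terminal(start: int, end: int, query_len: int, window: int = TERMINAL_WINDOW) -> bool:
    """True if the 1-based inclusive match [start, end] touches the first or last `window` bases."""
    return start <= window or end > query_len - window


def classify_match(score: int, start: int, end: int, query_len: int) -> Optional[str]:
    """Return 'strong', 'moderate', 'weak', or None (not reportable)."""
    thresholds = TERMINAL_THRESHOLDS if is_terminal(start, end, query_len) else INTERNAL_THRESHOLDS
    # Thresholds are in decreasing order, so the first one met is the best category.
    for label, minimum in zip(("strong", "moderate", "weak"), thresholds):
        if score >= minimum:
            return label
    return None


def sequence_after_polya(mrna: str, min_run: int = 15) -> str:
    """Return whatever follows the 3'-most polyA run (likely foreign), or '' if none.

    Heuristic: a long A-run inside trailing vector would be mistaken for the tail.
    """
    runs = list(re.finditer(f"A{{{min_run},}}", mrna.upper()))
    return mrna[runs[-1].end():] if runs else ""
```

Under these assumed thresholds, `classify_match(26, 400, 450, 1000)` is "moderate" for an internal match, while the same score at the start of the query (`classify_match(26, 1, 40, 1000)`) is "strong". A weak result still needs corroborating evidence before the segment is treated as foreign.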
Corroborating Evidence of Contamination
Additional signs that a segment of DNA is foreign include:
When VecScreen May Underestimate the Extent or Significance of Contamination
If the origin of the foreign sequence is a vector, adapter, linker, or PCR primer that is not represented in the UniVec database, VecScreen may still report matches to similar sequences that are represented. In such cases, however, the reported matches will underestimate the full extent and significance of the contaminating sequences. The query sequence can be decontaminated without knowing the origin of the foreign segments, but if the identity of the foreign DNA is known, the boundaries of the contamination can be located more precisely.
The alignments shown in the VecScreen output identify the UniVec database entry that matches the query. However, the full extent of the match to any individual vector will not be apparent, because the sequence of most vectors in UniVec is not present as one contiguous piece (see UniVec description). These alignments, therefore, do not indicate which vector has the best overall match to the query sequence.
The best way to identify the most likely source(s) of foreign sequence is to review the cloning history of the sequenced DNA/RNA. If you obtained the clone, library, cDNA, etc. from another source, the full history will include all previous cloning, subcloning, and modification of the material. Note which cloning vectors, linkers, adapters, and PCR primers were used to clone the source DNA/RNA and to manipulate it before sequencing. The segments of foreign sequence identified by VecScreen can then usually be matched to one or more of the vectors and oligonucleotides used for cloning. Sometimes, however, the foreign sequence may come from an unexpected source, such as contamination with another DNA present in the laboratory.
If the cloning history is not known, it may be possible to identify the vector that has the best match to the foreign sequence segment by performing a BLAST search using a database that contains a contiguous sequence for each vector, such as NCBI's vector database.
The matches reported by VecScreen may not always locate the exact junction between the foreign sequence and the native sequence, for the following reasons: (a) the full extent of the foreign sequence may not be recognized because it originates from a variant MCS, adapter, linker, or primer that is not represented in the UniVec database; (b) sequencing errors may cause the alignment to be truncated before the true junction; and (c) chance similarity to vector sequence may extend a match a few bases into the native sequence.
The precise boundary between foreign and native sequence should be easy to locate if the foreign DNA can be identified and the full cloning history of the sequenced DNA is known. However, the expected sequence across the junction is not always observed because of trimming of the cloning sites by nuclease activity, insertion of multiple linkers, or other aberrant cloning events. If the cloning history is unknown, a restriction site analysis on the matching segment and the flanking sequence may locate cloning sites that are good candidates for the boundary of the foreign DNA. If the junction between the foreign sequence and the native sequence cannot be located accurately, the foreign sequence segment(s) identified by VecScreen and any intervening segments of suspect origin should be removed.
Removing Terminal Segments of Foreign Sequence
A segment of foreign sequence close to either end of the query sequence should be removed, along with any additional sequence between the foreign sequence and the end of the query.
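The removal rules can be sketched as interval arithmetic over the query. This sketch covers two cases: removing terminal foreign segments together with the stretch up to the end, and cutting out internal segments and splitting the query, as described for internal segments in the next section. The following are assumptions, not VecScreen behavior stated on this page: foreign segments arrive as 1-based inclusive `(start, end)` pairs, and any gap shorter than a hypothetical `suspect_gap` of 50 bases is treated as "suspect origin" and removed too. This applies to gaps between two segments and to gaps between a segment and an end. The sketch does not implement the polyA exception, so for an mRNA that rule must be applied separately.

```python
# Sketch of the removal rules for terminal and internal foreign segments.
# ASSUMPTIONS: segments are 1-based inclusive (start, end) pairs; gaps shorter
# than `suspect_gap` bases (between two segments, or between a segment and an
# end) are treated as suspect and removed; the default of 50 is illustrative.
# The polyA-tail exception is NOT handled here.
from typing import List, Tuple

Interval = Tuple[int, int]


def merge_suspect_gaps(segments: List[Interval], suspect_gap: int) -> List[Interval]:
    """Merge foreign segments separated by fewer than `suspect_gap` bases."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] - 1 < suspect_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def decontaminate(seq: str, segments: List[Interval], suspect_gap: int = 50) -> List[str]:
    """Return the native pieces left after removing foreign blocks.

    A block near an end is extended to that end (terminal trimming); an
    internal block is cut out, splitting the query into separate pieces.
    """
    n = len(seq)
    pieces: List[str] = []
    cursor = 1  # first 1-based position not yet emitted or discarded
    for start, end in merge_suspect_gaps(segments, suspect_gap):
        if start - 1 < suspect_gap:  # short stretch before it: trim to the 5' end
            start = 1
        if n - end < suspect_gap:    # short stretch after it: trim to the 3' end
            end = n
        if start > cursor:
            pieces.append(seq[cursor - 1 : start - 1])
        cursor = max(cursor, end + 1)
    if cursor <= n:
        pieces.append(seq[cursor - 1 :])
    return pieces
```

Returning a list of pieces, rather than a single string, makes the split at an internal segment explicit. Each piece can then be re-screened on its own.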
The one exception to the rule for terminal segments is that the polyA tail of an mRNA (cDNA) sequence should never be trimmed (even if it matches a UniVec sequence), because it provides a useful landmark. However, any sequence following the polyA tail should always be removed.
Removing Internal Segments of Foreign Sequence
A segment of foreign sequence in the middle of the query sequence usually indicates that two discontinuous pieces of native sequence have been joined, either at the cloning stage or during sequence assembly. In most cases, the foreign segment should therefore be removed and the query sequence split into two separate sequences.
Occasionally, an internal segment of foreign DNA originates from a transposon or insertion sequence that was inserted into the cloned source DNA while it was being propagated in the Escherichia coli or yeast host. If the sequence is intended to represent the content of a particular clone, e.g., a BAC clone from a genome sequencing project, the transposable element sequence should be preserved but clearly annotated to indicate the location and identity of the transposon or insertion sequence. The sequence from transposable elements should, however, be removed during the assembly of composite sequences that are intended to represent the genetic information of the biological source organism, such as complete chromosome or genome sequences.
Final Check
Re-run VecScreen on the revised query sequence to check that all the foreign sequence has been removed.
Exceptions
A positive VecScreen result may not always indicate vector contamination. Exceptions arise when there is a rational explanation for similarity between the query sequence and an element found in vectors. Such cases can usually be discerned by comparing the source and function of the query segment to the source and function of the vector element that it matches.
The most common reason for a strong match to vector sequences, other than contamination, is that the query is related to the source of an element that has been incorporated into a vector. Strong matches to vector may be expected if the query contains sequences related to any of the following:
The definition line for a segment in the UniVec database is composed of three parts.
The VecScreen output may include an alignment of the query sequence against a segment of vector contained in the UniVec database. To determine the position of the output alignment within the parent vector, use the following relationship:
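Assume that each UniVec segment identifier ends in `accession:start-end`, where `start`..`end` is the segment's 1-based range in the parent vector. This format is an assumption and should be checked against the UniVec documentation. Under that assumption, the conversion is a simple offset: position in parent vector = segment start + position in segment - 1. The sketch below applies it. The accession `EXAMPLE.1` used in the usage example is hypothetical.

```python
# Hypothetical helper. ASSUMPTION: a UniVec segment identifier ends in
# "<accession>:<start>-<end>", where <start>..<end> is the segment's 1-based
# range in the parent vector. Verify against the UniVec documentation.
import re
from typing import Tuple

SEGMENT_RE = re.compile(r"([^|\s]+):(\d+)-(\d+)$")


def parse_segment_id(seq_id: str) -> Tuple[str, int, int]:
    """Split e.g. 'gnl|uv|EXAMPLE.1:101-300' into ('EXAMPLE.1', 101, 300)."""
    m = SEGMENT_RE.search(seq_id.strip())
    if m is None:
        raise ValueError(f"unrecognized segment identifier: {seq_id!r}")
    return m.group(1), int(m.group(2)), int(m.group(3))


def to_parent_coordinate(seq_id: str, pos_in_segment: int) -> int:
    """Map a 1-based position within the segment to the parent vector."""
    _, seg_start, seg_end = parse_segment_id(seq_id)
    parent_pos = seg_start + pos_in_segment - 1
    if not seg_start <= parent_pos <= seg_end:
        raise ValueError("position falls outside the segment")
    return parent_pos
```

For the hypothetical identifier `gnl|uv|EXAMPLE.1:101-300`, the first base of the segment maps to position 101 of the parent vector, and base 200 of the segment maps to position 300.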