VecScreen is a system for quickly identifying segments of a nucleic acid sequence that may be of vector origin. NCBI developed VecScreen to minimize the incidence and impact of vector contamination in public sequence databases. GenBank Annotation Staff use VecScreen to verify that sequences submitted for inclusion in the database are free from contaminating vector sequence. Any sequence can be screened for vector contamination using the VecScreen Web site.
VecScreen searches a query for segments that match any sequence in a specialized non-redundant vector database (UniVec). The search uses BLAST with parameters preset for optimal detection of vector contamination. Those segments of the query that match vector sequences are categorized according to the strength of the match, and their locations are displayed (see an example of a positive result).
VecScreen is designed to quickly check a nucleic acid sequence for the presence of vector contamination and to show which segments within the sequence may be of vector origin. Although a VecScreen search against UniVec will not identify the vector that is the most likely source of the contamination (see UniVec Limitations), this can usually be deduced from the cloning history of the sequenced DNA (see Identifying the Foreign Sequence for more details).
Guidance on how to interpret positive VecScreen results and how to remove the foreign segment(s) from a contaminated sequence is available in Interpretation of VecScreen Results.
The sequence of any vector contamination should theoretically be identical to the known sequence of the vector. In practice, occasional differences are expected to arise from sequencing errors and, less frequently, from engineered variants or spontaneous mutations. The search parameters used for VecScreen have therefore been chosen to find sequence segments that are identical to known vector sequences or that deviate only slightly from them.
The blastn parameters used for VecScreen are significantly more stringent than the default blastn parameters. The principal differences are:
The VecScreen parameters are pre-set using blastn options:
Vector contamination usually occurs at the beginning or end of a sequence; therefore, different criteria are applied for terminal and internal matches. VecScreen considers a match to be terminal if it starts within 25 bases of the beginning of the query sequence or stops within 25 bases of the end of the sequence. Matches are categorized according to the expected frequency of an alignment with the same score occurring between random sequences.
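The terminal/internal distinction described above can be sketched in a few lines of Python. This is only an illustration, not VecScreen's actual implementation: the function name `is_terminal`, the use of 1-based inclusive coordinates, and the reading of "within 25 bases" as "the match touches the first 25 or the last 25 bases of the query" are all assumptions made for the example.

```python
TERMINAL_WINDOW = 25  # bases; the distance stated in the text above


def is_terminal(start: int, end: int, query_length: int,
                window: int = TERMINAL_WINDOW) -> bool:
    """Return True if a match counts as terminal, False if internal.

    Assumptions (not confirmed by the source): coordinates are 1-based
    and inclusive, and "within 25 bases" means the match starts in the
    first `window` bases or stops in the last `window` bases.
    """
    if not (1 <= start <= end <= query_length):
        raise ValueError("expected 1 <= start <= end <= query_length")
    starts_near_beginning = start <= window
    stops_near_end = end > query_length - window
    return starts_near_beginning or stops_near_end


# Hypothetical matches on a 1000-base query.
print(is_terminal(1, 40, 1000))     # starts at the first base
print(is_terminal(300, 360, 1000))  # far from both ends
print(is_terminal(940, 990, 1000))  # stops 10 bases before the end
```

Under these assumptions, the first and third matches are treated as terminal and the second as internal. The expected-frequency categorization would then be applied separately to each class.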