Enter sequence
Paste your sequence as is. You may optionally include a definition line starting with ">" at the top, in accordance with FASTA format. You can also load sequences from a local file (make sure it is a plain text file). If the sequence is already in GenBank, you can simply enter its accession or gi number.
Multiple query sequences may be submitted (not available for the nr database at this time). This option is limited to 200 sequences per search and requires that each sequence have a unique identifier. We suggest that you do not use whitespace in the identifier, as any characters after the first whitespace will be excluded.
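The constraints on multiple queries can be checked before submission. The following is a minimal sketch, not part of the service itself; the function names are hypothetical, and it assumes the identifier is the definition-line text up to the first whitespace, as described above.

```python
MAX_QUERIES = 200  # per-search limit stated in the text above


def parse_fasta_ids(text):
    """Return one identifier per record: defline text up to the first whitespace."""
    ids = []
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
    return ids


def check_queries(text):
    """Return a list of human-readable problems; an empty list means no problem found."""
    ids = parse_fasta_ids(text)
    problems = []
    if len(ids) > MAX_QUERIES:
        problems.append(f"{len(ids)} sequences exceeds the limit of {MAX_QUERIES}")
    seen = set()
    for ident in ids:
        if not ident:
            problems.append("empty identifier")
        elif ident in seen:
            problems.append(f"duplicate identifier: {ident}")
        seen.add(ident)
    return problems
```

Note that two definition lines such as ">a 1" and ">a 2" reduce to the same identifier "a" once the text after the whitespace is dropped, which is why whitespace in identifiers is discouraged.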
Use your own germline V gene
Paste your own germline V gene sequence. This is useful if you know your query Ig sequence originates from a germline V gene sequence that is not in our germline V gene database. Your germline sequence will be displayed as the top germline hit.
Show amino acid translation
This will translate your query as well as the top germline sequence and align each amino acid with the second base of its codon. The mismatched amino acids in the germline sequence will be colored red.
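The layout described above, with each amino acid placed under the second base of its codon, can be illustrated with a short sketch. This is only an illustration under assumptions: the function name and the frame parameter are hypothetical, the standard genetic code is used, and the actual program determines the reading frame itself.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translation_line(nt, frame=0):
    """Return a string as long as nt with each amino acid under its codon's second base."""
    out = [" "] * len(nt)
    for i in range(frame, len(nt) - 2, 3):
        codon = nt[i:i + 3].upper()
        # Codons containing ambiguous bases are shown as X
        out[i + 1] = CODON_TABLE.get(codon, "X")
    return "".join(out)


if __name__ == "__main__":
    seq = "GAGGTGCAG"
    print(seq)
    print(translation_line(seq))
```

Printing the nucleotide line and the translation line one above the other gives the aligned display; a trailing incomplete codon is left untranslated.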
Focus on the V gene segment
This allows you to find, with a single search, the best matches for the V gene segment of your query sequence in the nr, Ig V gene sequences, or pdb database. This option has no effect on searches against the Ig germline V gene database (see explanation below).
A typical rearranged Ig query sequence includes a leader and the V, D, and J gene segments (sometimes the C region is also included). When the sequence is submitted for a BLAST search, similarity matching is performed over the entire query sequence. Unlike the Ig germline V gene database, which contains only V gene segment sequences, other databases such as nr, Ig V gene sequences, or pdb contain many rearranged Ig sequences that also include a leader and the V, D, and J gene segments. As a result, the best hit from these databases does not necessarily have the best match to the query V gene segment; rather, it has the best match over the entire query sequence (for example, it may have very high similarity to the leader, D, or J segments in your query sequence but only a low match to the V gene segment). This is not a problem if that is what you are looking for (i.e., the best overall match to your query sequence). However, if you are interested only in finding the best matches to the V gene portion of your query sequence, you would otherwise have to manually isolate the V gene segment from your query sequence and then perform the search.
The "Focus on the V gene segment" option addresses this issue. With this option on, the V gene segment of your query sequence is automatically isolated (based on comparison to the Ig germline V gene database) and then used for the search against the nr, Ig V gene sequences, or pdb database.
Ig domain system
The V gene domain can be classified using either the IMGT numbering system (Lefranc et al 2003) or the Kabat numbering system (Kabat et al, 1991, Sequences of Proteins of Immunological Interest, National Institutes of Health Publication No. 91-3242, 5th ed., United States Department of Health and Human Services, Bethesda, MD). Domain annotation of the query sequence is based on pre-annotated domain information for the best matched germline hit.
nr database
All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences). No longer "non-redundant".
Ig germline V genes
Sequences in our collections of Igh, Ig kappa and Ig lambda germline V genes.
Ig germline V genes (old)
This is the previous version of our human Ig germline V gene database, from before the addition of the human germline sequences from the IMGT database. All sequences in this database are already contained in the current version, although they may have different names.
Ig V gene sequences
The Ig V gene sequences database is a subset of the nr and patent databases. It includes Ig V gene sequences that show significant similarity to any of the germline V genes from human or mouse. The similarity threshold is 50% identity over at least 1/3 of the germline V gene length (i.e., 96 for nucleotide sequences and 32 for protein sequences).
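The similarity threshold above can be written as a simple predicate. This is a sketch under assumptions: the page does not say how identity is calculated or whether the bounds are inclusive, so the function below (a hypothetical name) takes identity as identical positions divided by alignment length and treats both bounds as inclusive.

```python
MIN_LENGTH_NT = 96       # 1/3 of germline V gene length, nucleotide (from the text)
MIN_LENGTH_PROTEIN = 32  # 1/3 of germline V gene length, protein (from the text)
MIN_IDENTITY = 0.5       # 50% identity


def passes_threshold(identical, aligned_length, is_protein=False):
    """Return True if an alignment to a germline V gene meets the stated threshold."""
    min_length = MIN_LENGTH_PROTEIN if is_protein else MIN_LENGTH_NT
    if aligned_length < min_length:
        return False
    # Assumption: identity = identical positions / alignment length
    return identical / aligned_length >= MIN_IDENTITY
```

For example, a nucleotide alignment of 96 positions with 48 identities would just qualify, while a 95-position alignment would not, however high its identity.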
The following sequences are excluded from this database.
1. Nucleotide sequences longer than 4,000,000 bp.
2. Sequences obtained from automatic genome annotation (i.e., sequences having accession numbers starting with XM or XP).
This database is intended to include Ig V genes only. Non-Ig V gene or Ig V-like gene sequences (e.g., T cell receptors, VpreB, etc.) are excluded even when they are at least 50% identical to Ig V germline genes. The only exception to this rule is when they are located inside or near an Ig V gene locus and are therefore part of the sequences (usually large genomic sequences) that contain Ig V genes.
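The two numbered exclusion rules can be sketched as a filter. The function name is hypothetical and the accessions in the usage note are made up for illustration; the rule that removes non-Ig or Ig V-like genes depends on annotation and is not modeled here.

```python
MAX_NT_LENGTH = 4_000_000           # rule 1: nucleotide sequences longer than this are excluded
EXCLUDED_PREFIXES = ("XM", "XP")    # rule 2: automatic genome annotation accessions


def is_excluded(accession, length, is_protein=False):
    """Return True if a sequence falls under either numbered exclusion rule."""
    if accession.upper().startswith(EXCLUDED_PREFIXES):
        return True
    # The length rule is stated for nucleotide sequences only
    if not is_protein and length > MAX_NT_LENGTH:
        return True
    return False
```

With this sketch, is_excluded("XM_000001", 1200) and is_excluded("AB000001", 5_000_000) both return True, while is_excluded("AB000001", 500) returns False.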
The database update is synchronized with the nr database. Using this database instead of the nr database is recommended if you are only interested in human or mouse Ig V gene sequences because the search speed is much faster.
The Ig V gene sequence databases (igSeqNt and igSeqProt) are available on the BLAST FTP site.
Germline V gene function category
Option to restrict the search to germline V genes of a certain category.
Origin of the query sequence
Specify the organism from which the query sequence originates. This allows the program to choose the corresponding Ig germline gene database for annotating the domains and reporting the germline genes correctly.
Organism
Choose an organism to limit your search. Note that this option has no effect on the Ig germline V gene database. When the germline V gene database is chosen, the organism is automatically set to the one specified in "Origin of the query sequence".
Maximal number of alignments to show
Limits the database sequence hits to the number specified. Note that this option has no effect on the automatic germline gene reporting function (i.e., the top three V, D and/or J genes).
Expect
The statistical significance threshold for reporting matches against database sequences. Lower EXPECT thresholds are more stringent and report only high similarity matches. Choose a higher EXPECT value (for example, 1 or more) if you expect low identity between your query sequence and the targets.
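The effect of the threshold can be pictured as a filter on hits by E-value. This sketch is for illustration only: the hit tuples and their values are made up, and the function name is hypothetical.

```python
def filter_by_expect(hits, expect):
    """Keep (identifier, e_value) hits whose E-value does not exceed the threshold."""
    return [hit for hit in hits if hit[1] <= expect]


if __name__ == "__main__":
    # Made-up hits: (identifier, E-value)
    hits = [("hitA", 1e-50), ("hitB", 0.002), ("hitC", 0.8)]
    print(filter_by_expect(hits, 0.001))  # stringent: only the strongest match remains
    print(filter_by_expect(hits, 1))      # permissive: weaker matches are kept too
```

Raising the threshold never removes a hit that a lower threshold reported; it only admits additional, less significant ones.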
Penalty for a mismatch
A higher penalty (-3 is the highest) tends to find higher similarity matches. However, if less similar sequences are desired, a lower penalty (such as -1) can be chosen. For example, if your sequence is severely mutated (including insertions and deletions), choosing a lower penalty can give you longer matches.
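Why a lower penalty yields longer matches can be seen in a simplified scoring model. The sketch below finds the highest-scoring ungapped segment between two equal-length sequences. It is not the program's actual algorithm: the match reward of 1, the absence of gaps, and the function name are all assumptions made for illustration.

```python
def best_ungapped_segment(query, subject, reward=1, penalty=-3):
    """Return (score, start, end) of the highest-scoring ungapped segment.

    Positions are compared pairwise; end is exclusive. Ties keep the earliest segment.
    """
    best = (0, 0, 0)
    score = 0
    start = 0
    for i, (q, s) in enumerate(zip(query, subject)):
        score += reward if q == s else penalty
        if score <= 0:
            # Running score dropped to zero or below: start a new segment after i
            score = 0
            start = i + 1
        elif score > best[0]:
            best = (score, start, i + 1)
    return best


if __name__ == "__main__":
    query = "AAAACCGGGG"
    subject = "AAAATTGGGG"  # two mismatches in the middle
    print(best_ungapped_segment(query, subject, penalty=-3))
    print(best_ungapped_segment(query, subject, penalty=-1))
```

With a penalty of -3 the two central mismatches cost more than the four matches before them earn, so the best segment stops at the first four bases; with -1 the segment extends across the mismatches and covers all ten positions.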
Number of retrieved sequences to search for germline genes
In addition to finding germline genes for the query sequence, this program can also match the returned hits (when searching a database other than the germline V gene database) to the closest germline V genes. Users can specify the number of sequences for which they want this to be done. This makes it easier to identify related hits.
Last modified: Mon Sep 10 11:03:05 EDT 2007