Bioinformation. 2008; 3(4): 147–149.
Published online Dec 6, 2008.
PMCID: PMC2637961

SNP-Flankplus: SNP ID-centric retrieval for SNP flanking sequences

Abstract

The flanking sequences provided by NCBI dbSNP are usually short and of fixed length, with no option to extend them, which makes it difficult to design appropriate PCR primers. Here, we introduce a tool named “SNP-Flankplus” that provides a web environment for retrieving SNP flanking sequences from both the dbSNP and Nucleotide databases of NCBI. Two SNP ID types, rs# and ss#, can be used to query SNP flanking sequences of adjustable length for at least sixteen organisms.

Availability

This software is freely available at http://bio.kuas.edu.tw/snp-flankplus/

Keywords: PCR, SNP, primer design, flanking sequences

Background

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variant. Many primer design tools, such as Primer3 [1], provide suitable polymerase chain reaction (PCR) primers for PCR-based SNP genotyping methods. A longer template sequence makes optimal primer design easier; however, the SNP flanking sequences provided in NCBI dbSNP [2] are not always long enough for routine primer design.

FESD [3] provided a “SNPflank” function that identified flanking sequences for SNP IDs with customizable length, but it accepted only rs# input, covered only human SNPs, and is no longer accessible. To offer longer template sequences for SNPs of interest in genotyping experiments such as TaqMan real-time PCR [4], PCR-RFLP [5], and PCR-CTPP [6], we introduce SNP-Flankplus for online retrieval of the flanking sequences of target SNPs across the genomes of sixteen organisms.

Methodology

The system design, algorithm and database of the program are described below.

Algorithm

The program uses the sequence of the contig accession corresponding to each SNP, together with the SNP's contig position, to extract a flanking sequence of the requested length. To save memory when reading an accession sequence, the system uses a “block location” approach that splits the sequence into multiple blocks. Only the block containing the required position, the hit block identified by Algorithm 1 (see supplementary material), is loaded into memory and searched for the required sequence.
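The block lookup can be sketched as follows. Algorithm 1 itself appears only in the supplementary material, so this is a minimal illustration under stated assumptions: the accession sequence is stored as one unbroken run of ASCII bases (no FASTA header or newlines), blocks have a fixed size, and `BLOCK_SIZE`, `hit_block`, and `load_block` are hypothetical names.

```python
import io
from typing import BinaryIO

# Hypothetical block length in bases; the paper does not state the real value.
BLOCK_SIZE = 1000


def hit_block(contig_pos: int, block_size: int = BLOCK_SIZE) -> int:
    """Return the 0-based index of the block holding a 1-based contig position."""
    if contig_pos < 1:
        raise ValueError("contig positions are 1-based")
    return (contig_pos - 1) // block_size


def load_block(handle: BinaryIO, index: int, block_size: int = BLOCK_SIZE) -> str:
    """Seek to one block and read only that block into memory.

    Assumes raw ASCII bases with no header or newlines, so byte offset == base offset.
    """
    handle.seek(index * block_size)
    return handle.read(block_size).decode("ascii")


if __name__ == "__main__":
    # Toy 4,000-base sequence standing in for a contig accession.
    sequence = io.BytesIO(b"ACGT" * 1000)
    pos = 2501
    i = hit_block(pos)
    block = load_block(sequence, i)
    print(i, block[(pos - 1) % BLOCK_SIZE])  # prints "2 A"
```

Seeking directly to the block offset is what keeps memory use proportional to the block size rather than to the length of the whole accession.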

When the flanking length exceeds one block, nearby blocks are also used, i.e., (block hit - d) or (block hit + d), where d, the number of extending blocks, is calculated by Algorithm 2 (see supplementary material).
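Extending the search to neighbouring blocks can be sketched in the same way. Algorithm 2 is not reproduced in the text, so `extend_count` below is a hypothetical stand-in that takes d = ceil(flank / block_size), which is always enough blocks on each side to cover the requested flank. The storage assumptions are the same as for a single block (raw ASCII bases, no newlines), and all names are illustrative.

```python
import io
import math
from typing import BinaryIO, Tuple


def extend_count(flank: int, block_size: int) -> int:
    """Hypothetical stand-in for Algorithm 2: extra blocks needed on each side."""
    return math.ceil(flank / block_size)


def flanking_sequence(handle: BinaryIO, seq_len: int, contig_pos: int,
                      flank: int, block_size: int) -> Tuple[str, str, str]:
    """Return (left flank, SNP base, right flank) for a 1-based contig position.

    Only blocks (hit - d) .. (hit + d), clipped to the sequence ends, are read.
    """
    hit = (contig_pos - 1) // block_size
    d = extend_count(flank, block_size)
    first = max(0, hit - d)
    last = min((seq_len - 1) // block_size, hit + d)

    # Read the contiguous window of blocks first..last in a single call.
    handle.seek(first * block_size)
    window = handle.read((last - first + 1) * block_size).decode("ascii")

    snp = contig_pos - 1 - first * block_size  # 0-based SNP offset inside the window
    left = window[max(0, snp - flank):snp]     # shorter near the start of the contig
    right = window[snp + 1:snp + 1 + flank]    # shorter near the end of the contig
    return left, window[snp], right


if __name__ == "__main__":
    data = b"ACGTTGCAAC" * 300  # toy 3,000-base sequence
    left, base, right = flanking_sequence(io.BytesIO(data), len(data), 1500, 250, 100)
    print(len(left), base, len(right))
```

Near either end of the contig the flanks simply come back shorter, which mirrors the paper's statement that the maximum flanking length depends on the contig accession.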

Database

The source data are retrieved online from the NCBI dbSNP and Nucleotide databases [4], so they are continually kept up to date.
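One way to pull a region of a Nucleotide record on demand is NCBI's E-utilities `efetch` service, which accepts the `db`, `id`, `rettype`, `retmode`, `seq_start`, and `seq_stop` parameters. The paper does not say which interface SNP-Flankplus uses, so this is only an assumed approach; the helper `efetch_region_url` and the accession in the usage line are hypothetical.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_region_url(accession: str, start: int, stop: int) -> str:
    """Build an efetch URL for a 1-based, inclusive region of a Nucleotide record as FASTA."""
    if not 1 <= start <= stop:
        raise ValueError("require 1 <= start <= stop")
    params = {
        "db": "nuccore",      # NCBI Nucleotide database
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,   # first base of the region (1-based)
        "seq_stop": stop,     # last base of the region (inclusive)
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical accession; substitute the contig accession reported for the SNP.
    print(efetch_region_url("NT_EXAMPLE.1", 1201, 1801))
```

The URL can then be fetched with `urllib.request.urlopen`; NCBI asks E-utilities clients to keep request rates low, so batch queries should be throttled.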

Result

Input

SNP-Flankplus provides four main input interfaces: (1) a single Reference SNP cluster ID (rs#); (2) a single NCBI Assay ID (ss#); (3) multiple rs# and ss# IDs pasted as text; and (4) multiple rs# and ss# IDs uploaded as a file (Figure 1a). Users can enter one or more SNP IDs (rs# or ss#) for any of sixteen organisms. For ss# input, the system first looks up the corresponding rs# and then retrieves the SNP information for that rs#. This information includes allele information, the submitted SNPs, and other data for the RefSNP cluster. Users can then set the flanking length needed to design feasible primer sets. Two flanking length options are available: default lengths from 300 to 1,000 bp, or the maximum length permitted by the corresponding contig accession (Figure 1b).
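The multiple-ID inputs and the flanking-length options can be illustrated with a short sketch. Everything here is an assumption for illustration: the regular expression used to recognise rs#/ss# tokens, the `ss_to_rs` dictionary standing in for the dbSNP lookup the system performs, and the rule that the maximum flank on each side is bounded by the distance from the SNP to the end of its contig. The function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches tokens such as "rs123" or "SS456" separated by commas, spaces, or newlines.
SNP_ID = re.compile(r"\b(rs|ss)(\d+)\b", re.IGNORECASE)


def parse_snp_ids(text: str) -> List[Tuple[str, int]]:
    """Extract (kind, number) pairs, e.g. ("rs", 123), in input order."""
    return [(kind.lower(), int(num)) for kind, num in SNP_ID.findall(text)]


def resolve_to_rs(ids: List[Tuple[str, int]], ss_to_rs: Dict[int, int]) -> List[int]:
    """Map ss# entries to their rs# via a hypothetical lookup table; unmapped ss# are skipped."""
    resolved = []
    for kind, num in ids:
        if kind == "rs":
            resolved.append(num)
        elif num in ss_to_rs:
            resolved.append(ss_to_rs[num])
    return resolved


def clamp_flank(requested: int, contig_pos: int, contig_len: int) -> Tuple[int, int]:
    """Limit (left, right) flank lengths to what the contig actually contains."""
    return min(requested, contig_pos - 1), min(requested, contig_len - contig_pos)


if __name__ == "__main__":
    ids = parse_snp_ids("rs123, ss456\nRS789")
    print(ids)                              # [('rs', 123), ('ss', 456), ('rs', 789)]
    print(resolve_to_rs(ids, {456: 1000}))  # [123, 1000, 789]
    print(clamp_flank(1000, 300, 5000))     # (299, 1000)
```

The same parser works for both pasted text and the contents of an uploaded file, since it only looks for ID tokens and ignores the separators between them.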

Figure 1
A web snapshot. (a) Four input interfaces. (b) SNP information and adjustable flanking length. (c) File or text output.

Output

Flanking sequences are output in FASTA format, both displayed online and provided as a file and/or text. Each output record contains the SNP ID (rs#), allele name, chromosomal position of the SNP, contig position of the SNP, source organism, contig accession and corresponding sequence positions, SNP type, sequence type, and case, with these fields separated by the “|” symbol. The maximum flanking length is limited by the corresponding contig accession. Three properties of the flanking sequence can be adjusted in real time: (1) SNP type: general nucleotide, allele, or IUPAC format; (2) sequence type: original, reverse, complementary, or antisense sequence; and (3) case: upper or lower case (Figure 1c).
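The output options can be illustrated by a sketch that writes one FASTA record with a “|”-separated header and applies the SNP-type, sequence-type, and case settings. The header values are placeholders and the function and option names are hypothetical. Two further assumptions: “antisense” is taken to mean the reverse complement, and the “general nucleotide” SNP type shows the first listed allele. The strand transformation is applied to the flanks and alleles before the SNP marker is inserted, so a marker such as `[A/G]` is never itself reversed.

```python
from typing import List, Tuple

# Complement table for upper- and lower-case DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Standard IUPAC codes for two-allele SNPs.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def complement(s: str) -> str:
    return s.translate(COMPLEMENT)


def orient(left: str, right: str, alleles: List[str], seq_type: str) -> Tuple[str, str, List[str]]:
    """Apply the sequence type to both flanks and to the alleles."""
    if seq_type == "original":
        return left, right, alleles
    if seq_type == "reverse":
        return right[::-1], left[::-1], alleles
    if seq_type == "complementary":
        return complement(left), complement(right), [complement(a) for a in alleles]
    if seq_type == "antisense":  # assumed to be the reverse complement
        return complement(right[::-1]), complement(left[::-1]), [complement(a) for a in alleles]
    raise ValueError(f"unknown sequence type: {seq_type}")


def snp_marker(alleles: List[str], snp_type: str) -> str:
    """Render the SNP site as a single base, an allele list, or an IUPAC code."""
    if snp_type == "nucleotide":
        return alleles[0]  # assumption: show the first listed allele
    if snp_type == "allele":
        return "[" + "/".join(alleles) + "]"
    if snp_type == "iupac":
        return IUPAC.get(frozenset(a.upper() for a in alleles), "N")
    raise ValueError(f"unknown SNP type: {snp_type}")


def to_fasta(fields: List[str], left: str, right: str, alleles: List[str],
             snp_type: str = "allele", seq_type: str = "original", upper: bool = True) -> str:
    """Return a FASTA record: a '|'-separated header line, then the sequence line."""
    l, r, a = orient(left, right, alleles, seq_type)
    seq = l + snp_marker(a, snp_type) + r
    seq = seq.upper() if upper else seq.lower()
    case = "upper" if upper else "lower"
    header = ">" + "|".join(fields + [snp_type, seq_type, case])
    return header + "\n" + seq


if __name__ == "__main__":
    # Placeholder header fields; a real record would carry the values listed above.
    print(to_fasta(["rs0000", "A/G"], "AAC", "GTT", ["A", "G"],
                   snp_type="iupac", seq_type="antisense"))
```

With the placeholder input above, the antisense IUPAC record ends in the sequence `AACYGTT`: reverse-complementing the flanks and alleles turns the A/G site (R) into T/C, which is coded Y.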

Conclusion

SNP-Flankplus employs a real-time update mechanism and accepts two SNP ID types (rs# and ss#) for sixteen organisms, returning the latest SNP information and sequence. The maximum flanking length that can be retrieved is determined by the corresponding contig accession.

Supplementary material

Data 1:

Acknowledgments

This work was partly supported by the National Science Council in Taiwan under grants NSC97-2622-E-151-008-CC2, NSC96-2221-E-214-050-MY3 and KMU-EM-97-1.1b.

References

1. Rozen S, et al. Methods Mol Biol. 2000;132:365.
2. Sherry ST, et al. Nucleic Acids Res. 2001;29:308.
3. Kang HJ, et al. Nucleic Acids Res. 2005;33:D518.
4. De la Vega FM, et al. Mutat Res. 2005;573:111.
5. Chang HW, et al. BMC Genomics. 2006;7:30.
6. Hamajima N, et al. J Mol Diagn. 2002;4:103.

Articles from Bioinformation are provided here courtesy of Biomedical Informatics Publishing Group