Nucleic Acids Res. Jan 2007; 35(Database issue): D707–D710.
Published online Nov 29, 2006. doi:  10.1093/nar/gkl969
PMCID: PMC1751554

Snap: an integrated SNP annotation platform

Abstract

Snap (Single Nucleotide Polymorphism Annotation Platform) is a server designed to comprehensively analyze single genes and relationships between genes based on SNPs in the human genome. The aim of the platform is to facilitate SNP discovery and analysis within the framework of medical research. Using a user-friendly web interface, genes can be searched by name, description, position, SNP ID or clone name. Several public databases are integrated, including gene information from Ensembl and protein features from UniProt/SWISS-PROT, Pfam and DAS-CBS. Gene relationships are fetched from BIND, MINT and KEGG and are integrated with ortholog data from TreeFam to extend the current interaction networks. Integrated tools for primer design and mis-splicing analysis have been developed to facilitate experimental analysis of individual genes with a focus on their variation. Snap is available at http://snap.humgen.au.dk/ and at http://snap.genomics.org.cn/.

INTRODUCTION

The large amount of ‘omics’ data coming from the complete map of the human genome and from downstream work such as transcriptomics, proteomics and variation analyses opens new avenues for decoding sequence data. A long-term strategy of our data management system is to integrate large-scale ‘omics’ data with a biomedical focus into a practical setting that supports genetic research in complex human disease. The SNP Annotation Platform (Snap) server was developed to this end and establishes the foundation of an analytic system for single genes and relationships between genes, with a focus on effects produced by SNPs. Two individuals are 99.9% identical at the DNA level; however, the remaining 0.1% has high medical importance. These differences define the traits that make us unique and underlie our susceptibility to disease and changes in drug response.

Information from public resources [Ensembl v38 (1), UniProt 8.0 (2), Pfam (3), CBS-DAS (4), MINT (5), BIND (6), KEGG 0.6.1 (7)] has been combined in our database, with ongoing work to keep the content current and relevant. Moreover, we have integrated our animal model platforms CVDB (8) and PigGIS (9), our own comparative genomics platform TreeFam (10) and a protein interaction analysis system currently under construction.

For each gene in Snap, a SeqView entry describes basic genomic information and SNP information. Mapping of protein features to the DNA level, primer design for resequencing and RT–PCR, and comparative mis-splicing analysis of both known and user-requested SNPs are available in the SeqView. In addition, a RelationView can be selected for a visual organization of gene networks centered on the selected gene. By integrating evolutionary connections from TreeFam, current interaction networks can be dramatically extended.

The purpose of Snap is to organize and integrate data of medical importance in a user-friendly manner and add a number of convenient tools to aid further analysis of genes and variations within them.

DATA SOURCE AND METHODS

Mapping protein features

The complete human gene and SNP sets from Ensembl (v38), annotated protein features from Swiss-Prot (r132) (11) and predicted features from the CBS-DAS protein annotation viewer were downloaded to a local server. Protein features were mapped to the human genome and SNPs were added by aligning Ensembl proteins and UniProt proteins using the BLAST program (12), with the assistance of the cross-references provided by Ensembl.
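To illustrate what mapping a protein feature to the DNA level involves, the following is a minimal sketch rather than the Snap implementation. It assumes a forward-strand transcript whose coding exons are given as 1-based inclusive genomic intervals; the function names and the example coordinates are hypothetical.

```python
def protein_to_genome(aa_start, aa_end, cds_exons):
    """Map a protein feature (1-based, inclusive residues) to genomic intervals.

    cds_exons: list of (start, end) coding intervals, 1-based inclusive,
    listed in transcription order on the forward strand (an assumption).
    """
    # Feature extent within the coding sequence (0-based, half-open).
    cds_from = 3 * (aa_start - 1)
    cds_to = 3 * aa_end
    pieces = []
    consumed = 0  # coding bases covered by the exons visited so far
    for start, end in cds_exons:
        length = end - start + 1
        lo = max(cds_from, consumed)
        hi = min(cds_to, consumed + length)
        if lo < hi:  # the feature overlaps this exon
            pieces.append((start + lo - consumed, start + hi - consumed - 1))
        consumed += length
    return pieces


def snp_in_feature(snp_pos, pieces):
    """True if a genomic SNP position falls inside any mapped interval."""
    return any(s <= snp_pos <= e for s, e in pieces)


# Hypothetical transcript: two coding exons of 9 and 12 bases.
example_exons = [(101, 109), (201, 212)]
example_pieces = protein_to_genome(3, 5, example_exons)
```

A feature that spans an exon boundary is returned as several genomic intervals, which is why a SNP has to be tested against each piece.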

Predicting protein interactions

We describe gene–gene relationships using protein–protein interactions. Information from the experimentally verified database MINT and the computational assistance-based database BIND is combined, and data from KEGG are integrated to rank relationships between genes. Furthermore, orthologs defined by TreeFam are employed to transfer protein relationships from other species to the human system. An interaction between two human genes is established when any one of the following conditions is satisfied: (1) gene A interacts with gene B in MINT or BIND; (2) genes A and B are both involved in the same metabolic pathway provided by KEGG; (3) an ortholog of gene A interacts with an ortholog of gene B in another species.
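The three conditions can be expressed as a small decision function. The sketch below is only illustrative: the data structures, the function names and the toy gene identifiers are assumptions, not the schema used by Snap.

```python
from itertools import product


def pair(a, b):
    """Unordered gene pair, so that (A, B) and (B, A) compare equal."""
    return frozenset((a, b))


def supporting_rules(gene_a, gene_b, direct, pathways, orthologs, ortholog_interactions):
    """Return which of the conditions (1), (2), (3) support a relationship.

    direct: set of pairs found in MINT or BIND
    pathways: dict mapping a KEGG pathway id to the set of its genes
    orthologs: dict mapping a human gene to its orthologs in other species
    ortholog_interactions: set of pairs that interact in other species
    """
    support = []
    if pair(gene_a, gene_b) in direct:
        support.append(1)
    if any(gene_a in genes and gene_b in genes for genes in pathways.values()):
        support.append(2)
    candidates = product(orthologs.get(gene_a, ()), orthologs.get(gene_b, ()))
    if any(pair(oa, ob) in ortholog_interactions for oa, ob in candidates):
        support.append(3)
    return support


# Toy data with hypothetical identifiers.
direct = {pair("A", "B")}
pathways = {"path1": {"A", "C"}}
orthologs = {"A": ["mA"], "D": ["mD"]}
ortholog_interactions = {pair("mA", "mD")}
```

An empty list means that no relationship is established; a longer list means that more sources support the same relationship.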

DATABASE CONTENT AND ORGANIZATION

SNP index

The current version (v3) of Snap contains 23 710 human genes representing 48 218 individual transcripts and 3 480 292 SNPs from Ensembl v38. Of these SNPs, 123 884 lie in coding regions and 68 072 of the coding SNPs are nonsynonymous.

SNP annotation at transcript level: prediction of protein features

The 2 115 643 protein features from Swiss-Prot and Pfam are mapped to the DNA sequence to assign further biological meaning to each gene. We classify protein features according to the category index of Swiss-Prot and use Pfam and DAS-CBS data to complement them. Five protein feature types with 39 sub-types from Swiss-Prot and two types of features with 11 sub-types from DAS-CBS are imported into Snap, covering ‘protein sorting’, ‘post-translational modifications’ and ‘protein structure and function’. The features organized and provided in SeqView are ‘amino-acid modifications’, ‘change indicators’, ‘regions’, ‘secondary structures’ and ‘others’ (including several subunits separately). In total, 25 562 nonsynonymous SNPs (nsSNPs) are positionally co-localized in the sequence with protein features, comprising 37% of the nsSNPs. See supplementary data Table 1 for a list of all features.

Gene network—gene–gene relationship

Figuratively speaking, genes are spots and relations between them are roads that connect them. Snap presents 197 467 predicted interactions between human genes, of which 67 270 are contained in BIND, 47 826 in MINT, 2120 in KEGG and 80 251 are transferred from orthologous relationships in other species. To generate and show connections between genes, we have produced RelationView to describe networks centering on selected genes. Three formats are provided in Snap to show gene connections: RelationMap, RelationTree and RelationList.

RelationMap is generated by GraphViz (13) (Graph Visualization Software), and the different genes and connections are distinguished graphically by different types of borders and lines. Four levels of genes are shown: the root gene is level zero, which connects with level-one genes; level-one genes connect with level-two genes, and so forth. All extensions for level zero are always shown. To simplify the picture, extensions from each level-one and level-two gene are shown only if three or fewer exist. Additionally, the interaction network can be re-centered around any gene by clicking it. The RelationTree supports the RelationMap and presents all relations in a graphical tree detailing their data sources and levels of relationship. Both formats give hierarchical descriptions of gene connections. We also adopt a simple heuristic method that accounts for the number and quality of the supporting sources. All relations were rated and assigned a score reflecting their reliability. In the RelationList, the score was calculated for every pair of genes in a given map using an internal method based on their source quality. A higher ‘score’ reflects a shorter ‘distance’ between the genes. See supplementary data Table 2 for detailed algorithms.
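The level-wise display rule for the RelationMap can be sketched as a breadth-first expansion with pruning. This is an illustration under stated assumptions and not the Snap source code: the function name and the toy network are hypothetical, and an ‘extension’ is assumed to mean a related gene that has not yet been drawn.

```python
def relation_map_levels(root, neighbors, max_level=3, max_extensions=3):
    """Return {gene: level} for the genes drawn around `root`.

    neighbors: dict mapping a gene to the set of genes related to it.
    Level 0 is the root; levels 1 to 3 give four levels in total.
    """
    level = {root: 0}
    frontier = [root]
    for depth in range(1, max_level + 1):
        next_frontier = []
        for gene in frontier:
            new = sorted(n for n in neighbors.get(gene, ()) if n not in level)
            # Extensions of the root are always drawn; extensions of a
            # level-one or level-two gene only if three or fewer exist.
            if depth > 1 and len(new) > max_extensions:
                continue
            for n in new:
                level[n] = depth
                next_frontier.append(n)
        frontier = next_frontier
    return level


# Toy network: gene A has four extensions and is therefore not expanded.
toy_network = {
    "R": {"A", "B"},
    "A": {"R", "A1", "A2", "A3", "A4"},
    "B": {"R", "B1"},
    "B1": {"B", "C1"},
}
toy_levels = relation_map_levels("R", toy_network)
```

Re-centering the map on another gene corresponds to calling the function again with that gene as `root`.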

Web service

(1) Primer design

An online service for PCR primer-design is provided to design primers for any region in the SeqView. Primers can be designed for resequencing individual SNPs, individual introns with flanking regions, individual exons with flanking regions, specific regions of interest or the entire sequence with or without introns. The service runs Primer3 (14) and individual primer pairs are checked for uniqueness using the UCSC In Silico PCR tool (15).

(2) Mis-splicing prediction

In recent years it has become clear that pre-translational regulation is complex and has been shown to be vulnerable to sequence variation, not only within the splice site consensus regions but also in a number of intronic and exonic cis-elements important for correct splice-site identification (16). The ‘Mis-splicing’ service estimates the degree of splicing defects resulting from a given nucleotide variation, either a SNP from the database or any base selected by the user. Six different tools are integrated in Snap [NNSplice (17), SpliceView (18), NetGene2 (19), ESEfinder (20), Rescue-ESE (21) and FAS-ESS (22)] to calculate various splicing parameters, and the results from both the reference and the SNP-containing sequences are listed (see supplementary data Table 3 for details).
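The comparison of reference and SNP-containing sequences can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the flank length is an arbitrary choice, and the toy scorer (a count of GT dinucleotides) merely stands in for a real predictor and does not reproduce any of the six tools listed above.

```python
def variant_windows(sequence, pos, alt, flank=20):
    """Return (reference, variant) sequence windows around a substitution.

    pos is 1-based; alt is the substituted base. The window is clipped
    at the ends of the sequence.
    """
    lo = max(0, pos - 1 - flank)
    hi = min(len(sequence), pos + flank)
    ref_window = sequence[lo:hi]
    alt_window = sequence[lo:pos - 1] + alt + sequence[pos:hi]
    return ref_window, alt_window


def score_change(scorer, ref_window, alt_window):
    """Difference between the variant and reference scores of one predictor."""
    return scorer(alt_window) - scorer(ref_window)


def toy_scorer(seq):
    """Placeholder scorer (assumption): counts GT dinucleotides."""
    return seq.count("GT")


ref_win, alt_win = variant_windows("AAGGTAAGT", 4, "C", flank=3)
toy_delta = score_change(toy_scorer, ref_win, alt_win)
```

In practice, each integrated predictor would be run on both windows and the two results listed side by side.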

INTERFACE AND ACCESS

Snap is developed and maintained in a non-profit academic setting and can be directly accessed publicly (see Figure 1).

Figure 1
Main contents of Snap. From one central gene, SeqView and RelationView cover disease-relevant aspects within the gene and within the gene's interaction network.

A search is done simply by entering a gene ID, synonym name, SNP identifier or accession code from Ensembl or UniProt/SWISS-PROT. In SeqView, the primary result window, basic information about the gene is shown, including description, position, transcripts, related diseases and polymorphism statistics, with direct links to the information sources. The main part of SeqView is a gene map highlighting SNPs in combination with protein features from SWISS-PROT, Pfam and DAS-CBS selected by the user. The list of protein features can be reorganized by ‘use’, ‘class’ or ‘source’. If a protein feature name is marked in red (e.g. Pfam), clicking the feature bar will highlight SNPs contained in regions positive for the feature. By clicking the ‘redraw’ button, the feature is shaded in yellow in the sequence map. The overall interface focuses on annotating features from the transcript level to the DNA level and, most importantly, to the SNP level.

There are three links in the SeqView interface that lead to the RelationView, the part of Snap created from the perspective of gene–gene interactions. In the RelationMap, a web interface developed to visualize the network in a user-friendly way, three border patterns represent the number of outside links from the gene: (1) no outside links, (2) an appropriate number of outside links and (3) too many outside links (threshold: 250). Relations between human genes are joined by bold lines, whereas those based on ortholog data are joined by dashed lines. Genes can be graphically grouped according to their ortholog data to clarify dependence on information from other species, or all ortholog data can be disabled so that only human gene relations are shown. Five types of layout, three figure formats and three levels of relationship are available to meet users' requirements. Gene relations from BIND and orthologs are displayed by default. In the RelationTree, a plus sign ‘+’ indicates that more relationships exist, while a backslash ‘\’ indicates that there are no further connections. The detailed underlying data can also be seen in the RelationList view.

The primer-design and mis-splicing services are accessible by clicking on the individual SNPs in ‘Polymorphisms Statistics’ or in the gene map in the SeqView. Furthermore, any base can be selected from the sequence and submitted from the panel to the right. For each design, two pairs of primers are given that satisfy the parameters of primer size, Tm value and product size; all parameters can be changed freely. A list of primer pairs for resequencing can be calculated for an entire mRNA or genomic sequence, with successive amplicons overlapping one another and covering the whole region of interest. The overlap is 100 bp by default. For RT–PCR use, primers are designed on mRNA and span introns specified by the user. Each primer pair must be at least 700 bp apart at the mRNA level and 1500 bp apart at the genome level. In addition, primers surrounding specified introns, including flanking regions, can also be requested.
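The way overlapping amplicons cover a region of interest can be sketched as a simple tiling. Only the 100 bp default overlap comes from the text; the function name and the 600 bp amplicon length are assumptions chosen for illustration, and the actual primers are designed by Primer3, which this sketch does not call.

```python
def tile_amplicons(region_start, region_end, amplicon_len=600, overlap=100):
    """Return target windows (1-based, inclusive) that cover a region.

    Successive windows share `overlap` bases; the last window is clipped
    to the end of the region.
    """
    if region_start > region_end:
        raise ValueError("region_start must not exceed region_end")
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be smaller than amplicon_len")
    step = amplicon_len - overlap
    windows = []
    start = region_start
    while True:
        end = min(start + amplicon_len - 1, region_end)
        windows.append((start, end))
        if end == region_end:
            break
        start += step
    return windows


# A hypothetical 1500 bp region is covered by three overlapping windows.
example_windows = tile_amplicons(1, 1500)
```

Each window would then be passed to the primer-design step as the target for one primer pair.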

CONCLUDING REMARKS

Complex diseases involve complex interactions of many factors, such as genetically coded variations, epigenetic modification and environmental influences. We need to define heterogeneous patterns of gene variations and their genetic modifiers to fully describe the genetic background of a common disease. The biggest challenge in achieving this aim is to organize the individual genes into one immense network. The three types of RelationView presented in Snap are a still-ongoing attempt in this field.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

Acknowledgments

The authors wish to thank Pia P. Madsen and Brage S. Andresen for discussions on splicing. This project was supported by the Chinese Academy of Sciences (KSCX2-YW-N-023; GJHZ0518), the Ministry of Science and Technology under the high-tech program 863, the Ministry of Education (XXBKYHT2006001), the National Natural Science Foundation of China (90608010; 90208019; 90403130; 30221004; 90612019; 30392130), and China National Grid. The work was further supported by the Danish Platform for Integrative Biology, the Ole Rømer grant from the Danish Natural Science Research Council and the Danish Medical Research Council. Funding to pay the Open Access publication charges for this article was provided by the Danish National Research Foundation.

Conflict of interest statement. None declared.

REFERENCES

1. Birney E., Andrews D., Caccamo M., Chen Y., Clarke L., Coates G., Cox T., Cunningham F., Curwen V., Cutts T., et al. Ensembl 2006. Nucleic Acids Res. 2006;34:D556–D561.
2. Wu C.H., Apweiler R., Bairoch A., Natale D.A., Barker W.C., Boeckmann B., Ferro S., Gasteiger E., Huang H., Lopez R., et al. The Universal Protein Resource (UniProt): an expanding universe of protein information. Nucleic Acids Res. 2006;34:D187–D191.
3. Finn R.D., Mistry J., Schuster-Bockler B., Griffiths-Jones S., Hollich V., Lassmann T., Moxon S., Marshall M., Khanna A., Durbin R., et al. Pfam: clans, web tools and services. Nucleic Acids Res. 2006;34:D247–D251.
4. Olason P.I. Integrating protein annotation resources through the Distributed Annotation System. Nucleic Acids Res. 2005;33:W468–W470.
5. Zanzoni A., Montecchi-Palazzi L., Quondam M., Ausiello G., Helmer-Citterich M., Cesareni G. MINT: a Molecular INTeraction database. FEBS Lett. 2002;513:135–140.
6. Alfarano C., Andrade C.E., Anthony K., Bahroos N., Bajec M., Bantoft K., Betel D., Bobechko B., Boutilier K., Burgess E., et al. The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res. 2005;33:D418–D424.
7. Kanehisa M., Goto S., Hattori M., Aoki-Kinoshita K.F., Itoh M., Kawashima S., Katayama T., Araki M., Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006;34:D354–D357.
8. Wang J., He X., Ruan J., Dai M., Chen J., Zhang Y., Hu Y., Ye C., Li S., Cong L., et al. ChickVD: a sequence variation database for the chicken genome. Nucleic Acids Res. 2005;33:D438–D441.
9. Ruan J., Guo Y., Li H., Hu Y., Wang J., Bolund L. PigGIS: Pig Genomic Informatics System. Nucleic Acids Res. in press.
10. Li H., Coghlan A., Ruan J., Coin L.J., Heriche J.K., Osmotherly L., Li R., Liu T., Zhang Z., Bolund L., et al. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 2006;34:D572–D580.
11. Boeckmann B., Bairoch A., Apweiler R., Blatter M.C., Estreicher A., Gasteiger E., Martin M.J., Michoud K., O'Donovan C., Phan I., et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003;31:365–370.
12. Altschul S.F., Madden T.L., Schaffer A.A., Zhang J., Zhang Z., Miller W., Lipman D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
13. Gansner E.R., North S.C. An open graph visualization system and its applications to software engineering. Software Practice and Experience. 1999;30:1209–1233.
14. Rozen S., Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 2000;132:365–386.
15. Hinrichs A.S., Karolchik D., Baertsch R., Barber G.P., Bejerano G., Clawson H., Diekhans M., Furey T.S., Harte R.A., Hsu F., et al. The UCSC Genome Browser Database: update 2006. Nucleic Acids Res. 2006;34:D590–D598.
16. Cartegni L., Chew S.L., Krainer A.R. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Rev. Genet. 2002;3:285–298.
17. Reese M.G., Eeckman F.H., Kulp D., Haussler D. Improved splice site detection in Genie. J. Comput. Biol. 1997;4:311–323.
18. Rogozin I.B., Milanesi L. Analysis of donor splice sites in different eukaryotic organisms. J. Mol. Evol. 1997;45:50–59.
19. Hebsgaard S.M., Korning P.G., Tolstrup N., Engelbrecht J., Rouze P., Brunak S. Splice site prediction in Arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res. 1996;24:3439–3452.
20. Cartegni L., Wang J., Zhu Z., Zhang M.Q., Krainer A.R. ESEfinder: A web resource to identify exonic splicing enhancers. Nucleic Acids Res. 2003;31:3568–3571.
21. Fairbrother W.G., Yeh R.F., Sharp P.A., Burge C.B. Predictive identification of exonic splicing enhancers in human genes. Science. 2002;297:1007–1013.
22. Wang Z., Rolish M.E., Yeo G., Tung V., Mawson M., Burge C.B. Systematic identification and analysis of exonic splicing silencers. Cell. 2004;119:831–845.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press