• We are sorry, but NCBI web applications do not support your browser and may not function properly. More information
Logo of bioinfoLink to Publisher's site
Bioinformatics. Oct 1, 2010; 26(19): 2474–2476.
Published online Aug 10, 2010. doi:  10.1093/bioinformatics/btq452
PMCID: PMC2944204

Genevar: a database and Java application for the analysis and visualization of SNP-gene associations in eQTL studies

Abstract

Summary: Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols.

Availability: http://www.sanger.ac.uk/resources/software/genevar

Contact: hc.eginu@sikaztimred.liuonamme

1 INTRODUCTION

Expression quantitative trait loci (eQTL) mapping, where gene expression profiling is treated as a phenotypic trait in genome-wide association studies (GWAS), has successfully been employed to uncover genetic variants that influence expression variation in recent studies (Dixon et al., 2007; Stranger et al., 2007a). Single-nucleotide polymorphism (SNP)–gene associations from eQTL analysis can be investigated in populations (Stranger et al., 2007b) or among tissue types (Dimas et al., 2009; Heinzen et al., 2008). In addition to genome-wide eQTL identification, combinations of eQTLs and lead SNPs identified by GWAS have been provided to interrogate the mechanisms underlying disease susceptibility at specific loci (Grundberg et al., 2009; Nica et al., 2010; Zeller et al., 2010). However, an analytical and visualization tool, together with a structured repository for multiple datasets, is still needed to facilitate the investigation of loci of interest and to share data publicly and among collaborators.

Here, we present Genevar, a database and Java tool designed to provide: (i) data warehousing; (ii) real-time computation of correlation significance; (iii) visualization of mapping results in a user-friendly interface; and (iv) an added web services platform that is implemented as a bridge between the server and multiple users. Genevar allows published data to be visually accessible in a secure fashion, without the need for users to download raw data. Through interactive analysis pipelines, researchers are able to rapidly investigate, for instance, cis-acting eQTLs at the locus of interest.

Complementing already available standalone tools (Chen et al., 2009; Ge et al., 2008), a database-centric architecture enables Genevar to perform complex queries on-the-fly and does not have a high memory requirement for prior reading in large-scale datasets. Furthermore, exploiting the convenience of web-based (Wang et al., 2003; Zou et al., 2007) and web-launch (Mueller et al., 2005) tools, a Java interface was developed that connects to both database and web services. The main advantage of this system design is that users can switch between public services and local data on the same interface. Default services at the Sanger Institute currently contain gene expression profiling and genotypic data from the following two datasets: lymphoblastoid cell lines from eight HapMap3 populations (824 individuals, unpublished data); and three cell types derived from umbilical cords of 75 Geneva GenCord individuals (Dimas et al., 2009).

2 FEATURES

Genevar has two main functionalities in cis-eQTL analysis: (i) identifying eQTLs in genes of interest, and (ii) observing SNP–gene associations surrounding SNPs of interest (Fig. 1). Additional features include SNP–probe association plots and external links to three major genome browsers. Either cis- or trans-eQTLs can be plotted in the SNP–probe association plot module. Mapping results are listed in tree nodes in a structural manner, and information can be saved as PNG diagrams or exported as tab-delimited lists for further use in presentations or publications.

Fig. 1.
Results of Genevar: a scatter plot represents observed eQTLs in a 2 Mb window centering the GBP3 locus in HapMap3 CHB (A), and a line chart illustrates observed SNP–gene associations in a 2 Mb region surrounding rs13277113 SNP in eight HapMap3 ...

Genevar is compatible with PLINK (Purcell et al., 2007) genotype data formats and any tab-delimited expression/genotyping file in our format. After uploading datasets onto the database, Genevar presents expression profiling data and individual genotypes in two cataloged management panels. Once a group of datasets is selected in the follow-up analysis pipelines, the software automatically prompts available expression–genotype pairs for the user to choose from.

Spearman's rank correlation coefficient is performed to estimate the strength of relationship between alleles and gene expression intensities, linear regression is also used to model the relationship between the two variables. To test the significance of the relationship, a t-statistic is employed with n − 2 degrees of freedom for both correlation and regression analysis (Stranger et al., 2007b). The software allows the user to adjust the window size centering on the gene/SNP of interest (e.g. 2 Mb) and user-defined P-value threshold (e.g. P < 0.001) for the featured cis-eQTL analysis. Alternatively, non-parametric permutation P-values are also provided in the subsequent association plot module to further evaluate the significance of nominal P-values. In order to construct a distribution of the test statistic, under the null hypothesis of no SNP–probe associations, expression intensities are randomly re-assigned to individuals' genotypes, then correlation coefficient and statistical significance are re-computed for the relabeled traits, and this procedure is repeated 10 000 times (Stranger et al., 2005).

We recommend users to launch Genevar via Java Web Start from our homepage for the most up-to-date version. After launching, Genevar is initially in web services mode connecting to the Sanger Institute. The user can then make another services connection to affiliated institutes, or switch to database mode connecting directly to user's local database. Genevar can be run completely offline in database mode as there is no communication between the Java interface and Sanger server.

Future work will include modified visualization for displaying next-generation sequence data, e.g. RNA-Seq (Montgomery et al., 2010); and implementation of methylation modules to interrogate epigenomic data.

3 IMPLEMENTATION

This approach to relational database design is an attempt to systematically decompose traditional flat files, which are one record per line and have no structural relationships between the records, into grouped dimension tables and to reduce data redundancy. A normalized and structured repository is suitable to warehouse all kinds of data format regardless of the file size and field numbers. Most importantly, the advantage of using database indexing on expression and genotype fact tables highly stabilize retrieval performance with the subsequent but reasonable cost of slower uploads and increased disk space. The only limitation when the datasets grew would be the storage space as this is a trade-off for query speed.

To maximize the potential of Genevar as a platform shared among affiliations, Genevar has been extended to interact with web services protocols to enhance data security; the database schema will be deployed behind and protected by the firewall, whereas only a secure frontend webpage acting as a middle layer will be accessible to the user over the Internet.

Genevar uses Hibernate library (http://www.hibernate.org) to map object-oriented models onto MySQL relational database tables (http://www.mysql.com) in the back-end, and acquires Apache CXF framework (http://cxf.apache.org) to wrap up database queries and business logics into middle-layer services. Finally, a Tomcat server (http://tomcat.apache.org) is used to provide services in the front-end. For a standalone database-mode Genevar, only a MySQL database is required to be installed on user's local machine. Association results are visualized in genomic views by JFreeChart library (http://www.jfree.org/jfreechart/). A gene-centered scatter plot represents observed SNP–gene associations around genes of interest, and a SNP-centered line chart illustrates observed eQTLs surrounding SNPs of interest (Fig. 1).

Tested on a 1.6 GHz Pentium Centrino laptop with 1 GB of RAM, Genevar was able to upload a 75 × 23k expression dataset onto the database and built up indexes in 1 min; another 23 min were required for the 75 × 400k genotype file. Once it is uploaded, Genevar can fetch per SNP–probe pairs from these 75 individuals in <0.0257 s from the database, and calculates Spearman's rhos and nominal P-values for 486 SNP–probe pairs in 3 s.

ACKNOWLEDGEMENTS

We thank Guillaume Smits and Johan Rung (EMBL-EBI) for their suggestions on improving the functionalities. We also thank Richard Jeffs, James Smith, Paul Bevan (Sanger Webteam) and Andrew Bryant (Database Team) for helpful support on this project.

Funding: Wellcome Trust and Louis-Jeantet Foundation.

Conflict of Interest: none declared.

REFERENCES

  • Chen W, et al. GWAS GUI: graphical browser for the results of whole-genome association studies with high-dimensional phenotypes. Bioinformatics. 2009;25:284–285. [PMC free article] [PubMed]
  • Dimas AS, et al. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science. 2009;325:1246–1250. [PMC free article] [PubMed]
  • Dixon AL, et al. A genome-wide association study of global gene expression. Nat. Genet. 2007;39:1202–1207. [PubMed]
  • Ge D, et al. WGAViewer: software for genomic annotation of whole genome association studies. Genome Res. 2008;18:640–643. [PMC free article] [PubMed]
  • Grundberg E, et al. Population genomics in a disease targeted primary cell model. Genome Res. 2009;19:1942–1952. [PMC free article] [PubMed]
  • Heinzen EL, et al. Tissue-specific genetic control of splicing: implications for the study of complex traits. PLoS Biol. 2008;6:2869–2879. [PMC free article] [PubMed]
  • Montgomery SB, et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010;464:773–777. [PMC free article] [PubMed]
  • Mueller M, et al. eQTL Explorer: integrated mining of combined genetic linkage and expression experiments. Bioinformatics. 2005;22:509–511. [PubMed]
  • Nica AC, et al. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 2010;6:e1000895. [PMC free article] [PubMed]
  • Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. [PMC free article] [PubMed]
  • Stranger BE, et al. Genome-wide associations of gene expression variation in humans. PLoS Genet. 2005;1:e78. [PMC free article] [PubMed]
  • Stranger BE, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007a;315:848–853. [PMC free article] [PubMed]
  • Stranger BE, et al. Population genomics of human gene expression. Nat. Genet. 2007b;38:1217–1224. [PMC free article] [PubMed]
  • Wang J, et al. WebQTL: web-based complex trait analysis. Neuroinformatics. 2003;1:299–308. [PubMed]
  • Zeller T, et al. Genetics and beyond–the transcriptome of human monocytes and disease susceptibility. PLoS One. 2010;5:e10693. [PMC free article] [PubMed]
  • Zou W, et al. eQTL Viewer: visualizing how sequence variation affects genome-wide transcription. BMC Bioinformatics. 2007;8:7–11. [PMC free article] [PubMed]

Articles from Bioinformatics are provided here courtesy of Oxford University Press
PubReader format: click here to try

Formats:

Related citations in PubMed

See reviews...See all...

Cited by other articles in PMC

See all...

Links

Recent Activity

Your browsing activity is empty.

Activity recording is turned off.

Turn recording back on

See more...