PLoS Genet. 2011 Jan 13;7(1):e1001273. doi: 10.1371/journal.pgen.1001273.
Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology.
Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T, Lees CW, Balschun T, Lee J, Roberts R, Anderson CA, Bis JC, Bumpstead S, Ellinghaus D, Festen EM, Georges M, Haritunians T, Jostins L, Latiano A, Mathew CG, Montgomery GW, Prescott NJ, Rotter JI, Schumm P, Sharma Y, Simms LA, Taylor KD, Whiteman D, Wijmenga C, Baldassano RN, Barclay M, Bayless TM, Brand S, Buning C, Cohen A, Colombel JF, Cottone M, Stronati L, Denson T, De Vos M, D'Inca R, Dubinsky M, Edwards C, Florin T, Franchimont D, Gearry R, Glas J, Van Gossum A, Guthery SL, Halfvarson J, Hommes D, Hugot JP, Laukens D, Lawrance I, Lemann M, Levine A, Libioulle C, Louis E, Mowat C, Newman W, Panés J, Phillips A, Proctor DD, Regueiro M, Rutgeerts P, Sanderson J, Sans M, Seibold F, Steinhart AH, Stokkers PC, Torkvist L, Kullak-Ublick G, Walters T, Targan SR, Brant SR, Rioux JD, D'Amato M, Weersma R, Kugathasan S, Griffiths AM, Mansfield JC, Vermeire S, Duerr RH, Silverberg MS, Satsangi J, Schreiber S, Cho JH, Annese V, Hakonarson H, Daly MJ, Parkes M.
Source
Center for Human Genetics Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America.
Abstract
Genome-wide association studies (GWAS) have defined over 150 genomic regions unequivocally containing variation predisposing to immune-mediated disease. Inferring disease biology from these observations, however, hinges on our ability to discover the molecular processes being perturbed by these risk variants. It has previously been observed that different genes harboring causal mutations for the same Mendelian disease often physically interact. We sought to evaluate the degree to which this is true of genes within strongly associated loci in complex disease. Using sets of loci defined in rheumatoid arthritis (RA) and Crohn's disease (CD) GWAS, we build protein-protein interaction (PPI) networks for genes within associated loci and find abundant physical interactions between protein products of associated genes. We apply multiple permutation approaches to show that these networks are more densely connected than chance expectation. To confirm biological relevance, we show that the components of the networks tend to be expressed in similar tissues relevant to the phenotypes in question, suggesting the network indicates common underlying processes perturbed by risk loci. Furthermore, we show that the RA and CD networks have predictive power by demonstrating that proteins in these networks, not encoded in the confirmed list of disease associated loci, are significantly enriched for association to the phenotypes in question in extended GWAS analysis. Finally, we test our method in 3 non-immune traits to assess its applicability to complex traits in general. We find that genes in loci associated to height and lipid levels assemble into significantly connected networks but did not detect excess connectivity among Type 2 Diabetes (T2D) loci beyond chance. Taken together, our results constitute evidence that, for many of the complex diseases studied here, common genetic associations implicate regions encoding proteins that physically interact in a preferential manner, in line with observations in Mendelian disease.
- PMID:
- 21249183
- [PubMed - indexed for MEDLINE]
- PMCID:
- PMC3020935
Free PMC ArticleFigure 1Pictorial outline of methodology.
A. Genes overlapping the wingspan of associated SNPs are defined, and these genes code for associated proteins. B. Associated proteins are used to recover direct and indirect networks. Direct networks (left) are built from direct interactions between associated proteins according to the InWeb database (colored proteins). Connections between proteins within the same locus are not considered. Indirect networks (right) are built by allowing connections between associated proteins through a protein elsewhere in the genome (grey). Various network parameters to quantify connectivity, defined in the text, are assigned. C. Random networks are built from a within-degree node-label permutation method described in Text S1. An empirical distribution is constructed for each network parameter and used to evaluate the significance of networks. D. Using the same permutation method to score individual proteins, a subset of proteins per locus is nominated as candidates for harboring causal variants (red circles). Scores used to nominate candidates, described in Text S1, are Bonferroni corrected for the number of possible candidates within each locus. E. Candidate genes from D (nominal p-values used) are tested for co-expression.
PLoS Genet. 2011 January;7(1):e1001273.
Figure 2RA and CD direct networks are significantly interconnected.
The direct network connectivity, the number of edges in the direct network, was enumerated for the disease networks and 50,000 random networks. A histogram was plotted to represent random expectation, and the disease network is shown by an arrow for (A) RA and (B) CD. See Figure S6 for remaining parameters and for parameters of height, lipids and T2D.
PLoS Genet. 2011 January;7(1):e1001273.
Figure 3Candidate RA and CD genes are preferentially expressed in immune tissues.
We obtained tissue expression data for 126 different cell types from a publicly available database, which was grouped into immune, gastrointestinal (GI), neuronal and ‘other’ [46]. For each tissue, we compared the expression of RA (A) and CD (B) candidate genes to the rest of the genes in the genome using a one-tailed rank-sum test, resulting in a p-value for each tissue (-log(p) is plotted on the y-axis). A significant difference for a given tissue indicated that the candidate genes were enriched for expression in that tissue compared to all genes in the genome. To test whether our network prioritization identified genes that were co-enriched in specific tissues beyond what was expected from all genes in associated regions, we calculated the same p-values for the rest of the genes in RA and CD associated loci (i.e., the genes that weren't prioritized via our network permutations). In this figure, we plot the tissue enrichment scores for each tissue for the candidate genes (purple) and the non-prioritized genes in the remaining regions of association (black). We indicate the category of tissue on the bottom: immune (red), GI (yellow), neuronal (green) and other (blue). We ordered the tissues by decreasing enrichment score of the candidate genes.
PLoS Genet. 2011 January;7(1):e1001273.
Figure 4Final disease networks.
Resultant networks built from candidate genes are depicted for RA and CD (A and B, respectively). Using only the candidate genes, we plotted the direct network as well as any other proteins connected to the direct network after filtering them on expression in any one of the tissues found to be specific to the core network. 610 such proteins connect to the RA network and 293 such proteins connect to the CD network. Large circles represent disease proteins, and small circles represent the connected proteins. Small red circles indicate proteins connected to the core network that were newly identified associated regions (10 proteins in CD and 1 protein in RA). The large circles are colored by locus.
PLoS Genet. 2011 January;7(1):e1001273.
Publication Types
MeSH Terms
Grant Support
Full Text Sources
Other Literature Sources
Medical