Nat Commun. 2019 May 20;10(1):2236. doi: 10.1038/s41467-019-09773-y.
A systems biology approach uncovers cell-specific gene regulatory effects of genetic associations in multiple sclerosis.
Madireddy L, Patsopoulos NA, Cotsapas C, Bos SD, Beecham A, McCauley J, Kim K, Jia X, Santaniello A, Caillier SJ, Andlauer TFM, Barcellos LF, Berge T, Bernardinelli L, Martinelli-Boneschi F, Booth DR, Briggs F, Celius EG, Comabella M, Comi G, Cree BAC, D'Alfonso S, Dedham K, Duquette P, Dardiotis E, Esposito F, Fontaine B, Gasperi C, Goris A, Dubois B, Gourraud PA, Hadjigeorgiou G, Haines J, Hawkins C, Hemmer B, Hintzen R, Horakova D, Isobe N, Kalra S, Kira JI, Khalil M, Kockum I, Lill CM, Lincoln MR, Luessi F, Martin R, Oturai A, Palotie A, Pericak-Vance MA, Henry R, Saarela J, Ivinson A, Olsson T, Taylor BV, Stewart GJ, Harbo HF, Compston A, Hauser SL, Hafler DA, Zipp F, De Jager P, Sawcer S, Oksenberg JR, Baranzini SE.
Abstract
Genome-wide association studies (GWAS) have identified more than 50,000 unique associations with common human traits. While this represents a substantial step forward, establishing the biology underlying these associations has proven extremely difficult. Even determining which cell types and which particular gene(s) are relevant continues to be a challenge. Here, we conduct a cell-specific pathway analysis of the latest GWAS in multiple sclerosis (MS), which had analyzed a total of 47,351 cases and 68,284 healthy controls and found more than 200 non-MHC genome-wide associations. Our analysis identifies pan immune cell as well as cell-specific susceptibility genes in T cells, B cells and monocytes. Finally, genotype-level data from 2,370 patients and 412 controls is used to compute intra-individual and cell-specific susceptibility pathways that offer a biological interpretation of the individual genetic risk to MS. This approach could be adopted in any other complex trait for which genome-wide data is available.
Fig. 1
Overall strategy and computation of the predicted regulatory effect (PRE) in MS-associated loci. a GWAS signals were integrated with cell-specific regulatory information to compute PRE at both population and individual level. In a second stage, genes with high PRE at each of the cell types analyzed were identified in a human protein interactome (PPI) and sub-networks of enriched genes (proteins) were extracted. b Each MS-associated SNP and those in LD were used as a query in RegulomeDB. For each SNP, all regulatory features were annotated and classified according to type and cell of origin. A graph connecting every queried SNP (crosses), the regulatory feature (diamonds), and the target gene (circles) was created and the number of experiments supporting a particular regulatory feature was used as weight (numbers next to SNP). Finally, a PRE score was computed for each gene by summing up weights from all incoming regulatory signals for each of the cell types analyzed. c Heatmap represents the PRE of all genes under GW MS-associated loci for cells of interest. Rows represent genes, and columns denote cell types. Colors indicate positive (red), neutral (white) and negative (blue) PRE values. Two representative regions are highlighted. Region 10 (associated SNP: rs6670198, green box) highlights immune-specific (B, T, and M) regulation of FAM213B and TNFRSF14. In contrast, region 21 (associated SNP: rs6032662, blue box) shows high PRE only for CD40 in B cells. C: CNS; L: lung; T: T cells; M: monocytes; B: B cells. This analysis represents all SNPs with an r2 > 0.5 of the main GW effect
Nat Commun. 2019;10:2236.
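The per-gene summation described in Fig. 1b can be illustrated with a minimal Python sketch. It is not the authors' code: the flat tuple layout, the function name compute_pre, the feature labels and the toy weights are all assumptions made for illustration, and the sketch omits any normalization that would yield the signed values shown in panel c.

```python
from collections import defaultdict


def compute_pre(edges):
    """Sum supporting-experiment weights per (gene, cell type).

    `edges` is a hypothetical flattening of the SNP -> regulatory feature
    -> target gene graph: (snp, feature, gene, cell_type, weight) tuples.
    """
    pre = defaultdict(float)
    for _snp, _feature, gene, cell_type, weight in edges:
        # Every incoming regulatory signal adds its weight to the gene's
        # score in the cell type where the feature was observed.
        pre[(gene, cell_type)] += weight
    return dict(pre)


# Toy input: feature labels and weights are invented, not RegulomeDB output.
toy_edges = [
    ("rs6032662", "TF_binding", "CD40", "B", 3),
    ("rs6032662", "DNase_peak", "CD40", "B", 2),
    ("rs6670198", "TF_binding", "TNFRSF14", "T", 1),
    ("rs6670198", "TF_binding", "TNFRSF14", "B", 4),
]
pre_scores = compute_pre(toy_edges)
```

Keying the result by (gene, cell type) keeps the cell types separate, which is what allows a gene to score highly in one cell type and not in another.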
Fig. 2
Network connectivity analysis. The PRE values of genes were loaded as attributes in a protein interactome. In the central panel, genes with a PRE above the 95th percentile of their respective cell-specific distributions are visualized (M: monocytes, green; T: T cells, red; B: B cells, blue; C: CNS, yellow). For each cell type, the number of edges in the sub-network composed of interacting proteins with PRE above the threshold was analyzed. In this example, the CNS sub-network is composed of 109 nodes and 71 edges. Ten thousand random networks with the same number of nodes (i.e. 109) were generated and the distribution of their edge counts was plotted along with the number of edges of the relevant sub-network (i.e. 71). A p-value was computed to evaluate the probability that this number of edges was seen by chance
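The edge-count comparison in Fig. 2 can be sketched as a simple resampling test. The sketch below is written under stated assumptions rather than taken from the paper: random networks are formed by drawing the same number of nodes uniformly from the interactome, the p-value is one-sided with an add-one correction, and the function names and toy interactome are hypothetical.

```python
import random


def induced_edge_count(nodes, edges):
    """Count interactome edges whose two endpoints are both in `nodes`."""
    node_set = set(nodes)
    return sum(1 for u, v in edges if u in node_set and v in node_set)


def edge_permutation_test(all_nodes, edges, selected, n_random=10_000, seed=0):
    """Compare the observed edge count with same-size random node sets."""
    rng = random.Random(seed)
    observed = induced_edge_count(selected, edges)
    k = len(selected)
    null_counts = [
        induced_edge_count(rng.sample(all_nodes, k), edges)
        for _ in range(n_random)
    ]
    # Assumed convention: one-sided empirical p-value with add-one correction.
    n_extreme = sum(1 for c in null_counts if c >= observed)
    p_value = (n_extreme + 1) / (n_random + 1)
    return observed, null_counts, p_value


# Toy interactome: a 50-node ring plus a dense 6-node module (nodes 0 to 5).
toy_nodes = list(range(50))
toy_edges = {(i, (i + 1) % 50) for i in range(50)}
toy_edges |= {(i, j) for i in range(6) for j in range(i + 1, 6)}

observed, null_counts, p_value = edge_permutation_test(
    toy_nodes, toy_edges, selected=[0, 1, 2, 3, 4, 5], n_random=1000
)
```

In the toy case the selected module is fully connected, so random node sets of the same size almost never reach its edge count and the empirical p-value is small.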
Fig. 3
Cell-specific gene sub-networks of GW associated regions (r2 > 0.5). Graphs correspond to the largest connected component in each cell/tissue bucket. Nodes represent proteins and edges represent interactions. For each cell type the PRE is proportional to the color intensity (dark: high; light: low). Genes/proteins are organized according to their cellular distribution. The histogram next to each sub-network shows the distribution of the number of edges of 10,000 randomly generated networks. The red arrows denote the number of edges observed in the corresponding sub-network and the p-value, the probability of observing a more extreme number of edges in a randomly generated network. a B cells; b T cells; c monocytes. An asterisk is placed next to genes/proteins exclusively observed in that cell type. d shows an aggregate (common) module present in all three cell types. A pie chart describes the GO: molecular functions assigned to these genes and a table describes the nine PANTHER pathways that were significantly enriched
Fig. 4
Individualized PRE computations for three representative associated regions. Each row represents an individual (out of 2370 cases and 412 controls), and each column represents a gene within the associated region. Region 9 (a) contains the gene EOMES (green boxes), region 21 (b) includes CD40 (pink boxes), and region 53 (c) contains the FC receptor-like cluster (yellow boxes). The leftmost column denotes subject status (red: cases; green: controls)
Fig. 5
Select case-control intra-individual MS-risk networks. a Number of edges in the largest connected component (LCC) of the network generated among proteins (genes) with high PRE (>25th percentile) in 2370 patients and 412 healthy controls (GW_r2 > 0.5). Each row represents a subject, each column represents a cell type (B: B cells; T: T cells; M: monocytes; C: CNS). The leftmost column indicates subject status (red: cases; green: controls). b Representative sub-networks from subjects at the extremes of the distribution for E-LCC for each cell type. For each network, the number of nodes (N), edges (E), and percentile relative to all subjects (P) is indicated. The intensity of node color is proportional to the PRE of each gene in the corresponding cell type
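The node count (N) and edge count (E) of the largest connected component reported per subject in Fig. 5 can be computed as in the following Python sketch. It is illustrative only: the function name, the gene labels and the toy interactions are hypothetical, and "largest" is assumed to mean most nodes, with ties broken by edge count.

```python
from collections import defaultdict, deque


def lcc_size(nodes, edges):
    """Return (N, E) for the largest connected component of the sub-network
    induced by `nodes` (for example, one subject's high-PRE genes)."""
    node_set = set(nodes)
    adj = defaultdict(set)
    for u, v in edges:
        # Keep only interactions between two selected genes.
        if u in node_set and v in node_set and u != v:
            adj[u].add(v)
            adj[v].add(u)

    seen = set()
    best = (0, 0)
    for start in node_set:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        # Each undirected edge appears in two adjacency sets.
        n_edges = sum(len(adj[n]) for n in component) // 2
        best = max(best, (len(component), n_edges))
    return best


# Toy data: gene labels and interactions are invented for illustration.
high_pre_genes = ["G1", "G2", "G3", "G4", "G5"]
toy_interactions = [
    ("G1", "G2"), ("G2", "G3"), ("G1", "G3"),  # triangle among selected genes
    ("G4", "G5"),                              # separate pair
    ("G3", "G9"),                              # G9 is not selected, so ignored
]
n_nodes, n_edges = lcc_size(high_pre_genes, toy_interactions)
```

Running this once per subject and per cell type would produce a table of E values that could then be ranked into percentiles across subjects.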
Fig. 6
Heterogeneity in intraindividual MS-risk networks. Intraindividual cell-specific networks of four representative MS subjects showing heterogeneity of risk across all cell types. a Cell-specific risk networks for subject_id: 201327986. b Cell-specific risk networks for subject_id: 201101471. c Cell-specific risk networks for subject_id: 201102205. d Cell-specific risk networks for subject_id: 201101897. For each subject, the most connected risk network (number of edges in the highest percentile across all subjects) is highlighted within a colored box. For each network, the number of nodes (N), edges (E), and percentile relative to all subjects (P) is indicated. The intensity of node color is proportional to the PRE of each gene in the corresponding cell type. M: monocytes, green; T: T cells, red; B: B cells, blue; C: CNS, yellow