- PMC5558435

Intersection of diverse neuronal genomes and neuropsychiatric disease: The Brain Somatic Mosaicism Network
Michael J. McConnell
1Department of Biochemistry and Molecular Genetics, Department of Neuroscience, Center for Brain Immunology and Glia, Children’s Health Research Center, and Center for Public Health Genomics, University of Virginia School of Medicine, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA
John V. Moran
2Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
3Department of Internal Medicine, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI 48109, USA
Alexej Abyzov
4Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, 200 1st Street S.W., Rochester, MN 55905, USA
Schahram Akbarian
5Department of Psychiatry, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai Hess Center for Science and Medicine, 1470 Madison Avenue, New York, NY 10029, USA
Taejeong Bae
4Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, 200 1st Street S.W., Rochester, MN 55905, USA
Isidro Cortes-Ciriano
6Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, MA 02115, USA
Jennifer A. Erwin
7The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
Liana Fasching
8Child Study Center, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
Diane A. Flasch
2Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
Donald Freed
9Department of Neurology, Kennedy Krieger Institute, 707 North Broadway, Baltimore, MD 21205
10Program in Biochemistry, Cellular and Molecular Biology, Johns Hopkins School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
Javier Ganz
11Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children’s Hospital, 3 Blackfan Circle, Boston, MA 02115, USA
12Departments of Neurology and Pediatrics, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA. Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
Andrew E. Jaffe
13Lieber Institute for Brain Development, 855 North Wolfe Street, Baltimore, MD 21205, USA
Kenneth Y. Kwan
2Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
14Molecular and Behavioral Neuroscience Institute, University of Michigan Medical School, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
Minseok Kwon
6Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, MA 02115, USA
Michael A. Lodato
11Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children’s Hospital, 3 Blackfan Circle, Boston, MA 02115, USA
12Departments of Neurology and Pediatrics, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA. Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
Ryan E. Mills
2Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
15Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
Apua C. M. Paquola
7The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
Rachel E. Rodin
11Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children’s Hospital, 3 Blackfan Circle, Boston, MA 02115, USA
12Departments of Neurology and Pediatrics, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA. Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
Chaggai Rosenbluh
16Department of Cell, Developmental and Regenerative Biology, Department of Genetics and Genomic Sciences, Department of Neuroscience, Friedman Brain Institute, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
Nenad Sestan
17Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06510, USA
Maxwell A. Sherman
6Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, MA 02115, USA
Joo Heon Shin
13Lieber Institute for Brain Development, 855 North Wolfe Street, Baltimore, MD 21205, USA
Saera Song
18Howard Hughes Medical Institute, Laboratory of Pediatric Brain Disease, The Rockefeller University, 1230 York Avenue, New York, NY 10065
19Rady Institute of Genomic Medicine, University of California, 9500 Gilman Drive, San Diego, La Jolla, CA 92093, USA
Richard E. Straub
13Lieber Institute for Brain Development, 855 North Wolfe Street, Baltimore, MD 21205, USA
Jeremy Thorpe
9Department of Neurology, Kennedy Krieger Institute, 707 North Broadway, Baltimore, MD 21205
10Program in Biochemistry, Cellular and Molecular Biology, Johns Hopkins School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
Daniel R. Weinberger
13Lieber Institute for Brain Development, 855 North Wolfe Street, Baltimore, MD 21205, USA
20Departments of Psychiatry and Behavioral Sciences and Neuroscience, 600 North Wolfe Street, Baltimore, MD 21287, USA
21McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, 733 North Broadway, Baltimore, MD 21230, USA
Alexander E. Urban
22Department of Psychiatry and Behavioral Sciences and Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Palo Alto, CA, 94304, USA
Bo Zhou
22Department of Psychiatry and Behavioral Sciences and Department of Genetics, Stanford University School of Medicine, 3165 Porter Drive, Palo Alto, CA, 94304, USA
Fred H. Gage
7The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA
Thomas Lehner
23Office of Genomics Research Coordination, National Institute of Mental Health, National Institutes of Health, 6001 Executive Boulevard, Rockville, MD 20852, USA
Geetha Senthil
23Office of Genomics Research Coordination, National Institute of Mental Health, National Institutes of Health, 6001 Executive Boulevard, Rockville, MD 20852, USA
Christopher A. Walsh
11Division of Genetics and Genomics, Manton Center for Orphan Disease, and Howard Hughes Medical Institute, Boston Children’s Hospital, 3 Blackfan Circle, Boston, MA 02115, USA
12Departments of Neurology and Pediatrics, Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, USA. Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
Andrew Chess
16Department of Cell, Developmental and Regenerative Biology, Department of Genetics and Genomic Sciences, Department of Neuroscience, Friedman Brain Institute, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
Eric Courchesne
24Autism Center of Excellence, Department of Neuroscience, School of Medicine, University of California San Diego, 8110 La Jolla Shores Drive, La Jolla, CA 92037, USA
Joseph G. Gleeson
18Howard Hughes Medical Institute, Laboratory of Pediatric Brain Disease, The Rockefeller University, 1230 York Avenue, New York, NY 10065
19Rady Institute of Genomic Medicine, University of California, 9500 Gilman Drive, San Diego, La Jolla, CA 92093, USA
Jeffrey M. Kidd
2Department of Human Genetics, University of Michigan Medical School, 1241 East Catherine Street, Ann Arbor, MI 48109, USA
15Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, 100 Washtenaw Avenue, Ann Arbor, MI 48109, USA
Peter J. Park
6Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Boston, MA 02115, USA
Jonathan Pevsner
9Department of Neurology, Kennedy Krieger Institute, 707 North Broadway, Baltimore, MD 21205
10Program in Biochemistry, Cellular and Molecular Biology, Johns Hopkins School of Medicine, 725 North Wolfe Street, Baltimore, MD 21205, USA
Flora M. Vaccarino
8Child Study Center, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
25Department of Neuroscience, Yale School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
Abstract
Neuropsychiatric disorders have a complex genetic architecture. Human genetic population-based studies have identified numerous heritable sequence and structural genomic variants associated with susceptibility to neuropsychiatric disease. However, these germline variants do not fully account for disease risk. During brain development, progenitor cells undergo billions of cell divisions to generate the ~80 billion neurons in the brain. The failure to accurately repair DNA damage arising during replication, transcription, and cellular metabolism amid this dramatic cellular expansion can lead to somatic mutations. Somatic mutations that alter subsets of neuronal transcriptomes and proteomes can, in turn, affect cell proliferation and survival and lead to neurodevelopmental disorders. The long life span of individual neurons and the direct relationship between neural circuits and behavior suggest that somatic mutations in small populations of neurons can significantly affect individual neurodevelopment. The Brain Somatic Mosaicism Network has been founded to study somatic mosaicism both in neurotypical human brains and in the context of complex neuropsychiatric disorders.
Graphical Abstract
Collectively, somatic SNVs, indels, structural variants (e.g., CNVs), and MEIs (e.g., L1 retrotransposition events) shape the genomic landscape of individual neurons. The Brain Somatic Mosaicism Network aims to systematically generate pioneering data on the types and frequencies of brain somatic mutations in both neurotypical individuals and those with neuropsychiatric disease. The resulting data will be shared as a large community resource.

The human body reaches a steady-state level of approximately 10^14 cells in adulthood. Because DNA replication and DNA repair are imperfect processes (estimated at ~0.27 to 0.99 errors in ~10^9 nucleotides per cell division) (1), somatic cells within an individual must differ in the presence of single-nucleotide variants (SNVs) and/or small insertion/deletion (indel) mutations (2–4). In addition to SNVs and indels (5), subsets of neurons also harbor structural variants [which include large (>1 Mb) copy number variants (CNVs), inversions, translocations, and whole-chromosome gains or losses (6–10)] and smaller mobile genetic element insertions (MEIs) (11–16). Here, we define somatic mosaicism as the existence of different genomes within the cells of a monozygotic individual. Well-known examples of somatic mosaicism include ichthyosis with confetti and lines of Blaschko (4).
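As a rough illustration of the scale implied by this error rate, the arithmetic can be sketched in Python. The diploid genome size of about 6 × 10^9 nucleotides is an assumption introduced here for illustration and is not stated in the text; the function name is likewise hypothetical.

```python
# Back-of-the-envelope estimate of new errors per cell division.
# ASSUMPTION: a diploid human genome of ~6e9 nucleotides (not given in the text).
DIPLOID_GENOME_NT = 6e9


def expected_errors_per_division(errors_per_1e9_nt: float,
                                 genome_nt: float = DIPLOID_GENOME_NT) -> float:
    """Scale an error rate quoted per 10^9 nucleotides to the whole genome."""
    return errors_per_1e9_nt * genome_nt / 1e9


if __name__ == "__main__":
    for rate in (0.27, 0.99):  # range quoted in the text (1)
        n = expected_errors_per_division(rate)
        print(f"{rate} errors per 1e9 nt -> ~{n:.2f} errors per division")
```

Under that assumption, each division would introduce roughly 1.6 to 5.9 new errors per cell, which is consistent with the statement that somatic cells within an individual must differ from one another.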
Healthy neuronal development requires that neural stem cells and progenitor cells (NPCs) undergo tens of billions of cell divisions, both before birth and during the first years of life, to generate the ~80 billion neurons in the fully developed human brain (17). Because neurons are among the longest-lived cells in the body, the accumulation of somatic mutations (i.e., SNVs, indels, structural variants, and MEIs) within NPCs, or perhaps postmitotic neurons (18), could influence neuronal development, complexity, and function (19, 20). Indeed, mounting evidence indicates that somatic mutations in small populations of neurons contribute to various neurodevelopmental disorders (Table 1).
Table 1
Disease abbreviations: CLOVES, congenital lipomatous overgrowth, vascular malformations, and epidermal nevi; FCD, focal cortical dysplasia; GPCR, G protein–coupled receptor; HME, hemimegalencephaly; MCAP, megalencephaly-capillary malformation-polymicrogyria syndrome; MPPH2, megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome-2; NF, neurofibromatosis; RALD, Ras-associated autoimmune leukoproliferative disorder; TSC, tuberous sclerosis complex. Mosaicism abbreviations: G, germline; S, somatic; OS, obligatory somatic; MS, milder somatic; SHS, second-hit somatic.
| Gene(s) | Signaling pathway(s) | Disease(s) | Cellular function(s) | Cancer(s) | Cancer role | Mosaicism |
|---|---|---|---|---|---|---|
| PIK3CA (100–104) | PI3K-AKT-mTOR | HME, mosaic overgrowth syndrome, type 2 segmental, CLOVES, MCAP | PI3K subunit, serine/threonine kinase | Cervical, various neoplasms, colorectal | Oncogene | OS |
| AKT1 (105) | PI3K-AKT-mTOR | Proteus syndrome | Serine/threonine kinase | Breast, ovarian, colorectal | Oncogene | OS |
| AKT2 (106) | PI3K-AKT-mTOR | Diabetes mellitus | Serine/threonine kinase | Ovarian, pancreatic, breast, colorectal, lung cancer | Oncogene | G/S |
| AKT3 (101, 103, 13, 107) | PI3K-AKT-mTOR | HME, MCAP, MPPH2 | Serine/threonine kinase | Melanoma, glioma, ovarian cancer | Oncogene | OS |
| MTOR (108) | PI3K-AKT-mTOR | FCD type II | Serine/threonine kinase | Carcinoma, glioblastoma, melanoma | Oncogene | OS |
| DEPDC5 (109, 110) | PI3K-AKT-mTOR | Epilepsy with FCD | mTORC1 repressor | Glioblastoma and ovarian tumors | Tumor suppressor | G/S |
| TSC1 (111, 112) | PI3K-AKT-mTOR | TSC | Negative regulator of mTORC1 | Renal angiomyolipomas | Tumor suppressor | SHS |
| TSC2 (111, 112) | PI3K-AKT-mTOR | TSC | Negative regulator of mTORC1 | Renal angiomyolipomas | Tumor suppressor | SHS |
| NRAS, BRAF, FGFR3, PIK3CA (113–118) | RAS, PI3K-AKT-mTOR | Congenital melanocytic, other nevi; seborrheic keratosis | Cell cycle regulation | (FGFR3) bladder, cervical, urothelial | Oncogene | G/S |
| NF2 (119) | RAS, PI3K-AKT-mTOR | NF type 2 | Negative regulator of Ras, mTOR pathways | Neurofibromas | Tumor suppressor | G/MS |
| NF1 (120–124) | RAS | NF type 1, Watson syndrome | Negative regulator of Ras pathway | Neurofibromas, leukemia | Tumor suppressor | SHS |
| BRAF, NRAS, KRAS (125) | RAS | Pyogenic granuloma | Cell cycle regulation | (KRAS) breast, colorectal, other; (NRAS) thyroid, melanoma, other; (BRAF) melanoma, colorectal | Oncogene | OS |
| HRAS, KRAS (126) | RAS | Schimmelpenning-Feuerstein-Mims syndrome | Cell cycle regulation | (KRAS) bladder, breast, colorectal, pancreatic, other; (HRAS) Colorectal, bladder, kidney, other | Oncogene | OS |
| KRAS (127, 128) | RAS | RALD | Cell cycle regulation | Breast, bladder, other | Oncogene | OS |
| GNAQ (129) | GPCR, MAPK | Sturge-Weber syndrome | G protein alpha subunit | Melanoma | Oncogene | OS |
| GNAQ, GNA11 (130) | GPCR, MAPK | Dermal melanocytosis and phakomatosis pigmentovascularis | G protein alpha subunit | Melanoma | Oncogene | OS |
| MAP3K3 (131) | MAPK | Verrucous venous malformation | Cell cycle regulation | Breast, colon, rectal cancers | Oncogene | OS |
| GNAS (132, 133) | GPCR | McCune-Albright syndrome | G protein alpha subunit | Adenomas, carcinomas, ovarian neoplasms | Oncogene | OS |
| JAK2 (134, 135) | JAK-STAT | Myelofibrosis, polycythemia vera, and essential thrombocythemia | Cell cycle regulation | Leukemia | Oncogene | SHS |
| SCN1A (136) | Sodium channel | Dravet syndrome | Neural excitation | – | – | G/MS |
| NLRP3 (137) | Caspase/inflammasome | CINCA syndrome | Inflammasome subunit | – | – | G/MS |
| PORCN (138) | Wnt | Focal dermal hypoplasia | O-acyltransferase | – | – | G/MS |
| PIGA (139) | Hematopoiesis | Paroxysmal nocturnal hemoglobinuria | ER protein processing | Leukemia | – | OS |
Genomic studies implicitly assume that every cell within an individual has the same genome. Family-based genetic studies, genome-wide association studies (GWAS), and exome sequencing analyses have identified numerous common, rare, and de novo germline SNVs and CNVs associated with an increased risk of autism spectrum disorder (ASD), schizophrenia, and bipolar disorder, but each variant only represents a minor component of population-level disease risk (21–24). In general, these approaches sequence the DNA from available clinical samples (e.g., peripheral blood) to interrogate an individual’s germline genome; they do not account for any additional disease risk brought about by somatic mutations that occur during brain development. To address this knowledge gap, the National Institute of Mental Health (NIMH) supported the formation of the Brain Somatic Mosaicism Network (BSMN). Notably, several outstanding reviews have recently discussed how somatic mutations within the brain may contribute to neurological disease [e.g., (2, 25, 26)]. Here, we build on these discussions and highlight how somatic mutations within the brain may contribute to neuronal diversity. We also evaluate emerging genomic approaches to measure and validate somatic mosaicism and summarize BSMN efforts to generate a large publicly available resource to evaluate the contribution of somatic mosaicism to neuropsychiatric disease (Fig. 1).
Fig. 1
The general approach of the BSMN is to identify mosaic variants in primary human brain tissue from large cohorts of neurotypical individuals and neuropsychiatric disease patients. The methods include bulk sequencing of tissues or sorted neurons (top), sequencing of single cells after whole-genome amplification (middle), or clonal expansion from single cells followed by bulk sequencing (bottom). Each method offers a trade-off between sensitivity and specificity.
Mechanisms of somatic mosaicism
DNA damage occurs constantly in every cell in our bodies, and many components of the DNA damage response are essential for neurodevelopment. Single-strand and double-strand DNA breaks, as well as base mutations, arise as a consequence of DNA replication, transcription, epigenetic modification, cellular respiration, and environmental stressors. If the resultant damage is not accurately repaired, DNA mutations can occur that can lead to somatic variation among neurons and other cell types.
The nonhomologous end-joining (NHEJ) pathway of DNA repair is required for neurodevelopment. Mice deficient in NHEJ proteins exhibit extensive NPC apoptosis and often die prenatally (27). Intriguingly, the embryonic lethality and NPC apoptosis phenotypes are rescued in a p53-null mouse background, suggesting that genotoxic stress contributes to lethality (28). Consistent with these data, compound heterozygous mutations in DNA damage response genes [e.g., ataxia telangiectasia mutated (ATM), ataxia telangiectasia and Rad3-related (ATR), and ATR-interacting protein (ATRIP)] can lead to increased mutational loads, neurodevelopmental brain defects, and neuronal degeneration (29–31). More broadly, deficits in other DNA repair pathways, such as transcription-coupled repair, homologous recombination, and nucleotide excision repair, can also lead to human neurodevelopmental phenotypes (32, 33).
Defects in different DNA repair pathways are associated with distinct somatic mutation profiles. For example, SNVs and indels can arise from errors during base excision repair, nucleotide excision repair, and transcription-coupled repair (33). Moreover, the action of the apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like-3 (APOBEC3) family of cytosine deaminase proteins can lead to cytidine-to-uridine transition mutations on single-strand DNA that, upon replication, lead to guanosine-to-adenosine mutations on the opposing DNA strand (34). Errors made during DNA mismatch repair also can lead to either interspersed SNVs or indels within microsatellite repeat sequences, whereas errors made during double-strand break repair by homologous recombination, NHEJ, or alternative-NHEJ can lead to CNVs (35, 36).
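The strand relationship described for APOBEC3-mediated deamination can be made concrete with a short Python sketch. It assumes that the uracil is read as thymine after replication, so the change is recorded as C-to-T on the deaminated strand; the helper name is hypothetical.

```python
from typing import Tuple

# Express a single-base substitution as it appears on the opposing DNA strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def on_opposing_strand(ref: str, alt: str) -> Tuple[str, str]:
    """Return the (ref, alt) pair seen on the complementary strand."""
    return COMPLEMENT[ref], COMPLEMENT[alt]


if __name__ == "__main__":
    # Deamination converts C to U; after replication the U is read as T.
    ref, alt = on_opposing_strand("C", "T")
    print(f"C>T on one strand is {ref}>{alt} on the opposing strand")
```

This is why C-to-T and G-to-A calls are usually treated as the same mutation class when substitution profiles are tallied without regard to strand.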
Errors incurred during DNA replication or transcription can also lead to the formation of CNVs. Large, actively transcribed genes that undergo replication during late S-phase correspond to chromosomal fragile sites and are hot spots for the generation of genomic variants and translocations (37, 38). Because neuronal genes are overrepresented among the longest genes in the human genome, transcription may predispose these genes to somatic CNVs (39). Indeed, intragenic deletions within large, neuronally expressed genes (e.g., AUTS2, IMMP2L, NRXN1, and CNTNAP2) are associated with ASD, intellectual disability, and other neurodevelopmental disorders (40, 41). Thus, if individuals harbor somatic CNVs at these loci in many neurons or in neurons within specific functional brain regions, they may be susceptible to neurological disease.
Long interspersed element-1s (LINE-1s or L1s) can mobilize (i.e., retrotranspose) within the brain, leading to another form of somatic variation (42). Active L1s encode two proteins, ORF1p and ORF2p, which are required for retrotransposition. ORF2p contains endonuclease and reverse transcriptase activities that are needed to “copy-and-paste” L1 sequences into a new genomic location by a mechanism termed target-site primed reverse transcription (TPRT) (42, 43). In addition to canonical TPRT, L1s occasionally can integrate into endogenous DNA lesions (44). Moreover, recombination events that arise either during (15, 45–47) or after L1 retrotransposition (48) can lead to the formation of structural variants.
Somatic mutations in human disease
Mosaicism and structural brain abnormalities
One of the most common causes of medically refractory pediatric epilepsy is focal dysplasia of the cerebral cortex. Until recently, the basis of this disorder remained a medical mystery. Genetic studies of the most severe form of focal dysplasia, hemimegalencephaly, in which one entire cerebral hemisphere is enlarged in size, led to the identification of gain-of-function somatic mutations in the phosphatidylinositol-3-kinase (PI3K)–protein kinase B (Akt) and mammalian target of rapamycin (mTOR) signaling pathways (Table 1, Fig. 2). We now know that mutations in mTOR are the single largest contributor to focal dysplasia in pediatric epilepsy (49–51). Similarly, germline mutations in one allele of the TSC1 or TSC2 gene confer susceptibility to tuberous sclerosis, a disease characterized by facial and skin lesions, seizures, intellectual disability, cardiac and renal tumors, and cortical tubers (52). Because the Tsc1 and Tsc2 proteins are negative regulators of the mTOR-signaling pathway, a second somatically acquired mutation is required for disease onset. Somatic mutations that mildly activate the mTOR-signaling pathway also cause symmetrical overgrowth syndromes such as megalencephaly-capillary malformation syndrome, megalencephaly, and certain forms of polymicrogyria (49–51). Common to all of these phenotypes is the presence of hypertrophic neural-like “balloon” cells, which carry the somatic mutation yet fail to transform to a malignant cell type (52).
Fig. 2
(A) Axial brain magnetic resonance imaging (MRI) of focal overgrowth of one hemisphere (arrows) from a 2-month-old child with intractable epilepsy and intellectual disability. MRI showed poor differentiation between the gray and white matter with dysplasia of the cortical gyri and sulci (arrows). (B) Brain mapping using high-resolution MRI or functional imaging such as positron emission tomography (PET), together with electrocorticography to fine-map specific epileptic foci, is followed by surgical resection of diseased brain tissue. (C) Histological analysis with hematoxylin/eosin showing characteristic balloon cells (arrows) consisting of large nuclei, distinct nucleoli, and glassy eosinophilic cytoplasm. (D) Section immunostained for phospho-S6 (green) as evidence of increased mTOR pathway activation. Arrows highlight a large dysplastic cell showing the strongest immunosignal. Scale bar, 50 μm. Bulk tissue sequencing showed a somatic activating mutation in the MTOR gene, c.6644C>T, leading to p.S2215F in 15% of brain cells from the diseased hemisphere. After surgery, the patient showed clinical improvement.
Somatic mutations that inappropriately activate Ras signaling or related signaling pathways can likewise confer proliferation and survival phenotypes to subsets of cells and cause neurological disease. For example, a gain-of-function somatic mutation in GNAQ, encoding G protein subunit alpha q, can lead to Sturge-Weber syndrome, a disease characterized by vascular anomalies in the brain, glaucoma, seizures, stroke, and intellectual disability (53). The same GNAQ mutation, occurring in a different somatic cell type later in development, can cause uveal melanoma (54). Because mutations in certain neurodevelopmental disorders (e.g., neurofibromatosis, tuberous sclerosis, Proteus syndrome, and other neurocutaneous disorders) either activate proto-oncogenes or inactivate tumor suppressor genes, it is not surprising that similar mutations in non-neuronal cell types manifest as cancers. Intriguingly, postmitotic neurons are rarely the source of brain tumors, suggesting that postmitotic neurons may have safeguards against dedifferentiation and further proliferation.
Relative to germline mutations, somatic mutations can lead to milder cases of heritable neurodevelopmental disorders. For example, somatic mutations in genes involved in neuronal migration are estimated to represent 5 to 10% of de novo mutations and are detected more frequently in patients with unexplained brain malformations when studied with sensitive high-throughput sequencing methods (55). Moreover, somatic mutations within the LIS1 or DCX genes can lead to gross disruptions of neuronal migration, whereas germline mutations in LIS1 or DCX result in lissencephaly (56, 57). Results from several experiments also suggest that somatic mutations that lead to a reduction of gene copy number in migrating neurons can lead to cell-autonomous defects in neuronal migration, with severe epilepsy and intellectual disability as a consequence (56, 57).
ASD and other common neuropsychiatric diseases
Genetic approaches have not yet fully explained the etiology of ASD, bipolar disorder, schizophrenia, or Tourette syndrome. Although gene-by-gene and gene-by-environment interactions could, in principle, account for additional disease risk, somatic mosaicism is another potential mechanism that warrants exploration as a contributor to neuropsychiatric diseases (58).
De novo SNVs and CNVs, particularly loss-of-function mutations, are significant contributors to ASD risk (21, 59–62). In addition to de novo germline mutations, a substantial number of de novo somatic mutations (i.e., ~5.4% of de novo events) are detected in the blood of ASD patients and are enriched in ASD probands (22). Somatic mosaic mutations also have been identified throughout postmortem ASD brains or, in some instances, in more localized areas in ASD brains (59). Evidence of continuous, widespread cortical mismigration, as seen in some mutant mice, has not been reported in the postmortem ASD brain (63, 64). However, NPCs from a subset of ASD patients with enlarged brain volumes are inherently more proliferative and display abnormal neurogenesis when compared to controls (65, 66). Other ASD patients have focal cortical abnormalities, including disorganized neurons and lamina, polymicrogyria, and other local surface malformations (67). Thus, in addition to specific mutations, additional cell cycles may further affect somatic mutational loads in patients.
Prenatal challenges to the immune system in animals (i.e., maternal immune activation) (68) can also lead to many features like those present in ASD brains. Maternal immune activation leads to increased cellular proliferation, brain size, and ASD-like behaviors in animal models (69–72). Intriguingly, an elevated prevalence of MEIs was observed in a primate model of maternal immune activation (73). Elevated MEI levels likewise are observed in schizophrenia (73) and Rett syndrome patients (74), suggesting that somatic MEI burden may play a role in the etiology of some neurodevelopmental and neuropsychiatric diseases.
Methods to detect somatic mutations
The difficulty in detecting a somatic mutation depends on its frequency within a cell population. Whereas mutations affecting a large fraction (e.g., 50%) of cells are readily detected in bulk tissue sequencing experiments and generally result in high-confidence calls, mutations affecting one or a few cells are unlikely to be detected with bulk tissue sequencing approaches. The identification and validation of rare somatic mutations require sequencing DNA derived from small pools of cells, single cells, or clonally reprogrammed cells, followed by robust computational data analyses (Fig. 1).
Bulk tissue approaches
Whole-genome sequencing (WGS) or whole-exome sequencing (WES) of DNA derived from bulk brain tissue offers a straightforward approach to discovering somatic mosaicism (26). WGS and WES minimize sequencing artifacts that can confound downstream analyses and, in the case of WGS, provide an opportunity for identifying a wide range of structural rearrangements, including inversions and translocations. However, WGS and WES using standard sequencing depths have reduced statistical power to detect mutations that occur at low frequencies (i.e., <10% of cells in a population at 30× to 100× coverage). Although increasing sequence coverage allows detection of somatic variants at lower frequencies, it quickly becomes cost prohibitive. Moreover, WGS and WES do not provide information on how somatic variants are distributed across individual cell lineages within a bulk tissue sample.
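The loss of power at low mosaic frequencies can be illustrated with a simple binomial model, sketched here in Python. The model assumes a heterozygous mutation in a diploid region (so the expected variant allele fraction is half the fraction of mutant cells), treats coverage as fixed, ignores sequencing error, and calls a variant when at least three reads support it. All of these are illustrative assumptions, not BSMN calling criteria.

```python
from math import comb


def detection_power(depth: int, mutant_cell_fraction: float,
                    min_alt_reads: int = 3) -> float:
    """P(at least min_alt_reads variant reads) under a binomial read model.

    ASSUMPTIONS: heterozygous diploid site, so the variant allele fraction
    is mutant_cell_fraction / 2; no sequencing error; fixed depth.
    """
    vaf = mutant_cell_fraction / 2.0
    # Probability of seeing fewer than min_alt_reads variant reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    for depth in (30, 100, 500):
        for fraction in (0.50, 0.10, 0.02):
            power = detection_power(depth, fraction)
            print(f"depth={depth:>3}x  mutant cells={fraction:.0%}  power={power:.3f}")
```

Real callers must also distinguish true variant reads from sequencing errors, so sensitivity at low fractions is likely to be lower than this idealized estimate suggests.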
Sorted-pools approaches
Fluorescence-activated cell or nuclei sorting (FACS/FANS) can be used to isolate specific neural populations (e.g., NeuN+ neurons versus NeuN− cells or cortical inhibitory interneurons versus excitatory principal neurons). Analysis of sorted nuclei populations (e.g., 5000 or 500,000 cells) from specific brain regions increases the power to detect somatic mosaicism that arises in one lineage, because these genomes are no longer diluted by genomes derived from other lineages. Independent pools of sorted nuclei can then be subjected to RNA sequencing (RNA-seq) and quantitative reverse transcription polymerase chain reaction (qRT-PCR) to confirm cell type–specific gene expression profiles (75). In addition to increasing the power for detecting a somatic mutation, cell sorting before DNA extraction could yield information about the embryological origin and developmental trajectory of somatic variation across the brain. Large pools of sorted cells can yield enough DNA for the direct examination of somatic variants by WGS or WES. However, smaller pool sizes will only generate small amounts of DNA; thus, they are best suited for generating PCR amplicon libraries (e.g., as used in MEI detection and other targeted sequencing) or for subsequent whole-genome amplification (WGA).
Single-cell approaches
WGA can be used to analyze the genomes of single neurons (26). The spectrum of mutations identified from the genomes of single neurons can then be compared to germline variants in bulk tissue data derived from a non-neuronal control (e.g., brain dural fibroblasts or heart) to identify candidate somatic mutations (5). WGA approaches already are used in pre-implantation genetic screening of embryos (76, 77) and include (i) degenerate-oligonucleotide-primed PCR (DOP-PCR), (ii) multiple displacement amplification (MDA), and (iii) multiple annealing and looping-based amplification (MALBAC). Each method has its advantages and drawbacks. In general, DOP-PCR provides coverage evenly across the genome, which facilitates the detection of large CNVs and chromosomal aneuploidies. However, DOP-PCR has a higher read duplication rate, lower mapping rate, and lower recovery rate when compared with MDA and MALBAC (78) and is cost prohibitive for SNV, indel, and MEI detection. By comparison, MDA yields a high rate of artificial chimeric DNA molecules that can lead to false-positive calls in downstream analyses (79), whereas MALBAC exhibits reduced coverage of certain genomic regions (14, 16, 80), especially those rich in repetitive sequences (78). Considerable advances have recently been made in detecting SNVs (81, 82), CNVs (83), and MEIs (16) in WGA samples; however, best practices necessitate evaluating each WGA approach for the detection of specific types of somatic mosaicism.
Clonal expansion of single cells using human induced pluripotent stem cell (hiPSC) technology or somatic cell nuclear transfer (SCNT) provides a biological alternative to WGA (80, 84). Any variant uniformly identified in the clonal line, but not in controls, represents a candidate somatic mutation that requires confirmation in the tissue of origin. In contrast, mutations introduced during cell culture will be present in a smaller fraction of cells within a clonal cell line and can be discriminated from bona fide somatic mutations in downstream computational analysis. Although the clonal isolation and expansion of primary human neural stem and progenitor cells is possible, the analysis of human neuronal genomes using clonal reprogramming has several limitations. Foremost among these is the availability of live human neurons. Moreover, neither clonal reprogramming nor SCNT has been reported using human neurons; SCNT is further limited by the expense and availability of human oocytes. Finally, reprogramming approaches currently are only successful in ~10% of cells; thus, any neurons harboring highly aberrant genomes may be refractory to reprogramming. Despite these caveats, clonal reprogramming of human neurons is theoretically possible. In addition, it is noteworthy that mouse neurons reprogrammed by SCNT contain genomic rearrangements (e.g., kataegis and chromothripsis) that would be very challenging to validate using current WGA approaches (84).
Computational methods for mutation detection
WGS and WES have been used successfully to detect somatic SNVs in family-based studies of Mendelian disease and large-scale sequencing studies of human patient cohorts (2). To identify SNVs, most computational approaches compare call sets generated from an affected sample to those generated from a matched healthy/unaffected sample and/or a control population. These comparisons allow the identification and subsequent exclusion of germline polymorphisms from downstream analyses; however, care must be taken to ensure that any candidate somatic mutations are not germline variants that were missed in the matched control. In general, variant callers initially developed to detect mutations in cancer offer higher sensitivity for detecting mosaic SNVs when compared with standard approaches used to detect germline variants (85, 86).
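The comparison logic described above can be sketched as a set operation with one safeguard against germline variants that were missed in the control. This is a minimal illustration, not the procedure of any specific variant caller: the function name, the data layout, and the control-depth threshold of 20 reads are all hypothetical.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)
Site = Tuple[str, int]               # (chromosome, position)


def candidate_somatic(
    sample_calls: Set[Variant],
    control_calls: Set[Variant],
    population_calls: Set[Variant],
    control_depth: Dict[Site, int],
    min_control_depth: int = 20,  # hypothetical threshold
) -> Set[Variant]:
    """Keep calls absent from the matched control and the population set.

    A call is also dropped when the control has too few reads at the site,
    because absence from a poorly covered control does not rule out a
    germline variant.
    """
    candidates = set()
    for variant in sample_calls:
        if variant in control_calls or variant in population_calls:
            continue  # germline or common polymorphism
        if control_depth.get(variant[:2], 0) < min_control_depth:
            continue  # control coverage too low to exclude a germline origin
        candidates.add(variant)
    return candidates


# Toy data for illustration only.
sample = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"),
          ("chr2", 300, "G", "A"), ("chr3", 400, "T", "C")}
control = {("chr1", 100, "A", "G")}
population = {("chr1", 200, "C", "T")}
depth = {("chr2", 300): 35, ("chr3", 400): 4}

result = candidate_somatic(sample, control, population, depth)
print(sorted(result))
```

In this toy example only the chr2 call survives: the chr1 calls are explained by the control or the population set, and the chr3 call is set aside because the control is too shallow at that site.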
Somatic CNVs can be detected by identifying deviations either from the expected depth of sequence or in the expected distances between paired-end sequencing reads. Similarly, inversions can be identified through differences in the orientations of paired-end sequencing reads. Numerous approaches have been developed to identify CNVs from WGS (7, 87–89), and most can be applied directly to identify somatic mutations. For example, recent studies using WGA in conjunction with WGS have identified megabase-scale de novo CNVs in human and mouse neurons based on differences in read-depth across genomic bins (6–9). CNVs are more difficult to identify using WES due to the biases encountered during the capture of target exons (90).
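As a rough illustration of read-depth analysis across genomic bins, the sketch below normalizes per-bin read counts to the sample median and reports runs of consecutive bins whose log2 ratio departs from zero. It is a simplified stand-in for the cited methods, which additionally correct for GC content and amplification bias and use statistical segmentation. The function name, the 0.4 log2 threshold, and the three-bin minimum are hypothetical choices.

```python
import math
from statistics import median


def call_cnv_segments(bin_counts, threshold=0.4, min_bins=3):
    """Return (start_bin, end_bin_exclusive, 'gain' | 'loss') for runs of bins
    whose log2(count / median count) exceeds +/- threshold."""
    med = median(bin_counts)
    # A small pseudocount keeps empty bins from producing log2(0).
    ratios = [math.log2((c + 0.5) / (med + 0.5)) for c in bin_counts]

    segments = []
    state, start = 0, None  # state: +1 gain, -1 loss, 0 neutral
    for i, ratio in enumerate(ratios + [0.0]):  # trailing sentinel closes the last run
        current = 1 if ratio >= threshold else -1 if ratio <= -threshold else 0
        if current != state:
            if state != 0 and i - start >= min_bins:
                segments.append((start, i, "gain" if state > 0 else "loss"))
            state, start = current, i
    return segments


# Toy profile: 24 bins at roughly two copies, with a four-bin single-copy loss.
counts = [100] * 10 + [50] * 4 + [100] * 10
segments = call_cnv_segments(counts)
print(segments)
```

A heterozygous loss in a diploid genome is expected near a log2 ratio of -1 and a single-copy gain near +0.58, so a symmetric threshold of 0.4 separates both from neutral bins in noise-free data; single-cell WGA data are far noisier and would need a more careful model.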
Somatic MEIs can be detected from bulk tissue, PCR amplicons generated from sorted-cell fractions, or single-cell WGA DNA using split-read and paired-end information (e.g., one paired-end read may map to the reference genome, whereas another may map to a MEI) (91, 92). Detecting low-frequency MEIs with fewer supporting reads requires careful bioinformatic analyses that can distinguish signal from noise, followed by experimental validation with orthogonal methods (14, 93). The analysis of single-cell data remains challenging due to the presence of chimeras generated during WGA (14, 16, 94); thus, care must be taken in calling MEIs.
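The paired-end signal described above can be illustrated by clustering "anchor" reads, that is, reads that map to the reference genome while their mates map to a mobile element. The sketch below is a deliberately simplified example: the function name, the 500-bp window, and the three-read support threshold are hypothetical, and it does not model split reads, insertion orientation, or WGA chimeras.

```python
from collections import defaultdict


def cluster_mei_candidates(anchors, window=500, min_support=3):
    """Group anchor reads into candidate insertion sites.

    anchors: iterable of (chromosome, position) for reads mapped to the
             reference whose mates mapped to a mobile-element sequence.
    Returns a sorted list of (chromosome, first_pos, last_pos, n_reads).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in anchors:
        by_chrom[chrom].append(pos)

    candidates = []
    for chrom, positions in by_chrom.items():
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[-1] <= window:
                cluster.append(pos)  # close enough to extend the current cluster
            else:
                if len(cluster) >= min_support:
                    candidates.append((chrom, cluster[0], cluster[-1], len(cluster)))
                cluster = [pos]
        if len(cluster) >= min_support:
            candidates.append((chrom, cluster[0], cluster[-1], len(cluster)))
    return sorted(candidates)


# Toy anchors: three supporting reads near chr1:1000, plus two isolated reads.
toy_anchors = [("chr1", 1000), ("chr1", 1100), ("chr1", 1250),
               ("chr1", 90000), ("chr2", 500)]
mei_candidates = cluster_mei_candidates(toy_anchors)
print(mei_candidates)
```

Lowering `min_support` would recover lower-frequency insertions at the cost of more noise, which is the trade-off that makes orthogonal experimental validation necessary.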
Validation of somatic mutations
It is essential to validate all candidate somatic mutations. False-positive calls can arise from DNA sequencing errors, contamination with germline variants, chimeric molecules generated during single-cell WGA, PCR-induced nucleotide substitutions, and the failure to amplify certain genomic regions. False-negative calls are dependent on the allele frequency of the somatic mutation within the sample, the type of mutation, and the method of detection. Orthogonal experimental methods are required to eliminate false-positives and to calibrate the confidence of detection for different types of somatic mutations. Validation experiments can then be performed on either the tissue of origin or amplified material used to discover the variant. The first approach represents a biological validation, which establishes the presence of a variant call in unamplified DNA from the source sample. The second approach represents a technical validation, which establishes the presence/absence of variant calls in the DNA source material used for discovery.
Biological/primary validation in the tissue of origin
Validation on unamplified DNA from the tissue of origin provides confirmation that a candidate call is a genuine somatic variant and rules out the possibility that it corresponds to a DNA amplification artifact or a mutation that occurred during clonal expansion. Biological validation requires a variant to be present in multiple cells in the tissue of origin at a frequency above experimental detection limits. As such, the failure to validate a variant in the tissue of origin does not necessarily represent a false call. For example, only ~50% of CNVs manifested in hiPSC clones could be directly confirmed in the primary fibroblast cells used to derive hiPSCs (80).
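Why a genuine low-frequency variant can fail biological validation follows from sampling statistics. The sketch below computes the probability of observing at least a minimum number of variant-supporting reads at a given depth, under an idealized binomial model that assumes independent reads and no sequencing error. The function name and the three-read requirement are hypothetical.

```python
from math import comb


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least `min_alt_reads` variant reads) when each of `depth` reads
    independently carries the variant with probability `vaf`."""
    p_too_few = sum(
        comb(depth, k) * vaf ** k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few


# Hypothetical variant at an allele fraction of 0.5% (about 1% of diploid cells).
for depth in (100, 1000, 10000):
    print(depth, round(detection_probability(depth, 0.005), 3))
```

Under this model the same variant is usually missed at 100x but almost always seen at 10,000x, so a failed validation is informative only relative to the depth and the read threshold that were used.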
Somatic variants can be confirmed in unamplified cell source material by (i) targeted DNA capture followed by high-coverage (>100x) DNA resequencing, (ii) high-coverage sequencing of multiplexed PCR amplicons, and (iii) droplet digital PCR (ddPCR). These approaches vary in throughput and sensitivity. Targeted DNA capture and resequencing can require the creation of several thousand custom oligonucleotides designed to capture the genomic DNA either including or surrounding the putative variants. The captured DNA then is subjected to high-coverage paired-end DNA sequencing, which typically detects variants present in more than 1% of cells. Amplicon sequencing involves PCR amplification of candidate loci followed by high-coverage paired-end DNA sequencing, which typically detects variants present in more than 0.1% of cells. Finally, ddPCR involves partitioning a DNA sample into large numbers of individual droplets that generally contain one copy of template DNA. PCR takes place within these droplets, leading to the production of a fluorescent readout, either through the use of an intercalating dye or a fluorescent oligomer probe, to indicate the presence or absence of the PCR target of interest. Subsequent quantification of the fluorescent droplets allows a determination of the number of copies of the target locus present in the sample, and typically detects variants present in more than 0.001% of cells (95). Although extremely sensitive, ddPCR requires the optimization of primers, probes, and amplification conditions, which is time-consuming and limits throughput.
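The quantification step in ddPCR is commonly performed with a Poisson correction, because some droplets receive no template and others receive more than one copy. The sketch below shows that standard calculation; it is not taken from the cited study, and the function names and droplet counts are hypothetical.

```python
import math


def copies_per_droplet(positive, total):
    """Mean template copies per droplet from the fraction of positive droplets,
    assuming templates are Poisson-distributed across droplets."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def mutant_fraction(mutant_positive, wildtype_positive, total):
    """Fractional abundance of the mutant allele among all target copies."""
    lam_mut = copies_per_droplet(mutant_positive, total)
    lam_wt = copies_per_droplet(wildtype_positive, total)
    return lam_mut / (lam_mut + lam_wt)


# Hypothetical run: 20,000 droplets, 10 mutant-positive, 10,000 wild-type-positive.
fraction = mutant_fraction(10, 10_000, 20_000)
print(f"mutant allele fraction = {fraction:.5f}")
```

The correction matters most for the abundant wild-type target: with half of the droplets positive, the mean is about 0.69 copies per droplet rather than 0.5, so uncorrected counts would overestimate the mutant fraction.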
The goal when employing biological validation procedures is to detect putative somatic variants and to assess, as precisely as possible, the frequency of each variant in that tissue of origin. Biological validation can (i) determine whether certain individuals in the population are more prone to somatic variation than others, (ii) investigate whether different areas of the brain and/or specific brain cell types have varying amounts and types of particular forms of somatic variation, (iii) assess whether developmental timing contributes to somatic variation, and (iv) reveal whether somatic variations increase as a function of the number of cell divisions and/or a function of age in postmitotic neurons.
Technical validation on source/amplified material
If a somatic variant is only present in a single cell, it will be impossible to validate in bulk tissue. Likewise, a variant present in very few cells may be difficult to validate in the tissue of origin. Thus, technical validation in the source DNA used to discover a putative variant can be used to determine whether a call is true or false. Technical validation typically employs PCR, qPCR, and Sanger sequencing of the locus in the DNA source material (e.g., WGA DNA or DNA from a clonal cell population). Multiple true/false verdicts form the basis for estimating false-discovery and false-negative rates in the resultant call sets.
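The rate estimates mentioned above reduce to simple proportions once validation verdicts are tabulated. The sketch below is one hypothetical way to organize that calculation: the function names and counts are illustrative, and the false-negative estimate assumes that an independently confirmed set of variants is available for comparison with the call set.

```python
def false_discovery_rate(call_verdicts):
    """Fraction of tested calls that failed validation.

    call_verdicts: one boolean per tested call, True if the call validated.
    """
    if not call_verdicts:
        raise ValueError("no calls were tested")
    failed = sum(1 for verdict in call_verdicts if not verdict)
    return failed / len(call_verdicts)


def false_negative_rate(truth_recovered):
    """Fraction of independently confirmed variants missing from the call set.

    truth_recovered: one boolean per confirmed variant, True if it was called.
    """
    if not truth_recovered:
        raise ValueError("no confirmed variants were provided")
    missed = sum(1 for found in truth_recovered if not found)
    return missed / len(truth_recovered)


# Hypothetical verdicts: 8 of 10 tested calls validated; 15 of 20 confirmed
# variants were present in the call set.
fdr = false_discovery_rate([True] * 8 + [False] * 2)
fnr = false_negative_rate([True] * 15 + [False] * 5)
print(f"FDR = {fdr:.2f}, FNR = {fnr:.2f}")
```

Because only a subset of calls is usually tested, these proportions are estimates whose uncertainty shrinks as more verdicts accumulate.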
Present understanding of the prevalence of somatic mutation in neurotypical individuals
Recent studies revealed that mosaic neuronal genomes are the rule, rather than the exception; every neuron probably has a different genome than the neurons with which it forms synapses. Not unexpectedly, SNVs are the most prevalent somatic mutations. A “triple calling” strategy was used to identify and validate clonal SNVs in MDA-amplified DNA from single neurons isolated from a neurotypical brain, leading to estimates of ~1000 to 1500 SNVs per neuronal genome (5). By comparison to human cortical neurons, a SCNT experiment in reprogrammed mouse olfactory neurons detected hundreds of SNVs per neuron and a lower proportion of C-to-T transition mutations (84). Although the divergent SNV rates between these two studies may arise from technical differences (as discussed above), both approaches establish that SNVs represent an important form of somatic mutation in both human and mouse neurons.
Brain somatic CNVs initially were identified by comparing the sequences of bulk DNA derived from multicellular samples of different brain regions to the sequences of DNA derived from somatic tissues (96, 97). The first single-cell study of neuronal CNVs analyzed 110 human frontal cortex neurons and found that 13 to 41% of the neurons contained at least one megabase-scale de novo CNV (6). Additional studies, which analyzed fewer neuronal genomes, confirmed that de novo CNVs occur in at least 10% of neurons (7, 8). CNVs can be shared by multiple neurons and inherited in a clonal manner (8). Furthermore, megabase-scale CNVs typically alter the copy number of 10 or more genes in individual neurons. In addition to expression-level differences that can accompany gene copy number changes, mosaic neuronal CNVs also are expected to reveal or abate pernicious alleles on a neuron-by-neuron basis in every individual.
L1 retrotransposon insertions alter the transcriptional regulation of genes in myriad ways (42). Initial studies used engineered L1s containing a retrotransposition indicator cassette to discover MEI activity in mouse brain (98) and in human NPCs in vitro (99). Studies of MDA-amplified NeuN-positive nuclei isolated from a neurotypical human brain, followed by L1-transposon profiling (13) or WGS (15, 16), have since suggested that 0.2 to 1 L1 insertions occur per neuronal genome. Another report, which employed MALBAC WGA in conjunction with L1 capture technology (RC-seq), reported an average of 13 L1 insertions in every neuronal genome (11), although a subsequent study suggested a high false-positive rate in these data (14). By comparison, SCNT experiments in mouse olfactory neurons reported ≤1.3 MEIs per neuronal genome (84). An extrapolation of these data indicates that potentially billions of neurons in the neurotypical brain contain de novo MEIs. Additional studies are required to determine whether L1s retrotranspose at varying rates in different brain regions, in different individuals, or preferentially insert into expressed genes, and whether other mobile elements [e.g., Alu retrotransposons (42)] also contribute to intra-individual neuronal genetic diversity.
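The extrapolation to "billions of neurons" can be reproduced with back-of-the-envelope arithmetic. The sketch below rests on two assumptions that are not stated in this article: insertions are treated as Poisson-distributed across neurons, and the human brain is taken to contain roughly 86 billion neurons, a commonly cited estimate. The function and constant names are hypothetical.

```python
import math

ASSUMED_NEURONS = 86e9  # assumption: commonly cited estimate, not from this article


def fraction_with_insertion(mean_per_neuron):
    """Fraction of neurons with at least one insertion under a Poisson model."""
    return 1.0 - math.exp(-mean_per_neuron)


# Range of 0.2 to 1 L1 insertions per neuronal genome quoted above.
for mean in (0.2, 1.0):
    frac = fraction_with_insertion(mean)
    print(f"mean {mean}: {frac:.1%} of neurons, "
          f"about {frac * ASSUMED_NEURONS / 1e9:.0f} billion neurons")
```

Even at the lower rate, about 18% of neurons would carry at least one insertion under these assumptions, which corresponds to well over ten billion cells.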
Generation of a community resource
The BSMN will generate comprehensive maps of somatic genomic variation in neurotypical and diseased human brains, including a prioritized call set of confirmed somatic variants (Box 1) that may contribute to neuropsychiatric disease and epilepsy. Functional validation experiments will be performed using CRISPR/Cas9-mediated genome engineering, hiPSC-based neurogenesis, and mosaic mouse models generated by in utero electroporation (Fig. 3). The BSMN is initially determining concordance among disparate sequencing and bioinformatic approaches by performing a “common experiment” in which pulverized tissue from one neurotypical individual in the Lieber brain repository has been distributed to all of the working groups for independent assessment of mosaicism.
Box 1
Criteria used to prioritize somatic variants for functional characterization
Absence from the germ line
We will focus on variants with a definitive somatic origin.
Recurrence and frequency of somatic variation at the locus of interest
We will prioritize loci at which somatic variations, across all types, recur in multiple disease samples but not in control samples.
Mutation severity
Highly deleterious variations will be prioritized for likely functional importance.
Intersection with known disease loci and biochemical pathways
Taking advantage of data on germline variations in brain disorders, we will prioritize loci that have been previously implicated in disease.
Intersection with brain expression and epigenomic data
Taking advantage of large, publicly funded consortia of human brain spatiotemporal expression data (e.g., BrainSpan) and epigenomic data (e.g., PsychENCODE and Roadmap Epigenomics), we will select genes that are expressed in brain regions associated with brain disorders and noncoding loci with potential regulatory function.
In utero electroporation (IUE) transfects a subpopulation of cortical neurons within a local area and will be combined with genome editing to generate mosaic mouse models for functional analysis. For example, a red fluorescent construct (CAG-TdTom) labels a transfected subset of neurons in a coronal brain section in which nuclei are stained blue with 4′,6-diamidino-2-phenylindole (DAPI). Scale bar, 500 μm.
The BSMN will generate an estimated 10,000 sequencing data sets that comprise >600 terabytes of data and facilitate data-sharing through the BSMN Knowledge Portal (www.synapse.org/bsmn) and the NIMH Data Archive (https://data-archive.nimh.nih.gov). Coordinated analyses with data derived from some of the same brain samples by the CommonMind (www.synapse.org/cmc) and PsychENCODE (www.synapse.org/pec) initiatives may elucidate the effect of somatic mosaicism on tissue-wide gene expression. Data generated through the BSMN initiative will be released to the broader research community on an ongoing basis through a controlled-access mechanism that follows NIH policies and regulatory requirements.
Acknowledgments
We thank T. Insel for initiating this project, L. Bingaman for ongoing administrative assistance, and N. Leff and M. L. Gage for copyediting assistance. J.M.K. acknowledges support provided by the Pew Biomedical Scholars Award. J.V.M. is an inventor on patent application 6150160, held by the Johns Hopkins University and the Trustees of the University of Pennsylvania, which covers the compositions and methods of use of mammalian retrotransposons. We also acknowledge the support of NIH R01 MH100914, Genomic mosaicism in developing human brain (F.M.V.). Some figures use images from the Servier Medical Art PowerPoint Image Bank. All the work was supported by U01MH106883, U01MH106874, U01MH106893, U01MH106892, U01MH106882, U01MH106876, U01MH1068898, U01MH106891, and U01MH106884. We regret that space constraints limit the number of references and apologize to many colleagues whose very valuable contributions to this field are not cited.
Footnotes
www.sciencemag.org/content/356/6336/eaal1641/suppl/DC1
Brain Somatic Mosaicism Network Listing


