Epistasis between mutator alleles contributes to germline mutation spectra variability in laboratory mice

Maintaining germline genome integrity is essential and enormously complex. Although many proteins are involved in DNA replication, proofreading, and repair [1], mutator alleles have largely eluded detection in mammals. DNA replication and repair proteins often recognize sequence motifs or excise lesions at specific nucleotides. Thus, we might expect that the spectrum of de novo mutations — the frequencies of C>T, A>G, etc. — will differ between genomes that harbor either a mutator or wild-type allele. Previously, we used quantitative trait locus mapping to discover candidate mutator alleles in the DNA repair gene Mutyh that increased the C>A germline mutation rate in a family of inbred mice known as the BXDs [2,3]. In this study we developed a new method to detect alleles associated with mutation spectrum variation and applied it to mutation data from the BXDs. We discovered an additional C>A mutator locus on chromosome 6 that overlaps Ogg1, a DNA glycosylase involved in the same base-excision repair network as Mutyh [4]. Its effect depended on the presence of a mutator allele near Mutyh, and BXDs with mutator alleles at both loci had greater numbers of C>A mutations than those with mutator alleles at either locus alone. Our new methods for analyzing mutation spectra reveal evidence of epistasis between germline mutator alleles and may be applicable to mutation data from humans and other model organisms.

Taken together, these results suggest that both the SET and transposase domains of primate SETMAR are important for SETMAR-mediated DNA repair. The p.Leu103Phe missense mutation that differentiates C57BL/6J and DBA/2J (Table 1) resides within the Setmar pre-SET domain and occurs at an amino acid residue that is predicted to be deleterious by SIFT [47]. However, since the mouse Setmar ortholog lacks the Mariner-derived domain, we believe that the p.Leu103Phe or p.Ser273Arg missense mutations are unlikely to affect C>A mutation rates in the BXDs. Moreover, we believe that the documented mutator phenotypes associated with Ogg1, as well as that gene's known role in base-excision repair, make it a more likely candidate to underlie the epistatic interaction with Mutyh that we observed in this study.

Supplementary Figures
Figure 1-figure supplement 1: Simulations to assess the power of the aggregate mutation spectrum distance method. In each of 50 trials, we simulated genotypes at 1,000 biallelic loci on a toy population of either 50 or 100 haplotypes as follows. At every locus on every haplotype, we drew a single floating point value from a uniform distribution [0,1). If that value was less than or equal to 0.5, we set the allele to be "A"; otherwise, we set the allele to be "B". In each trial, we also simulated de novo germline mutations on the population of haplotypes, such that at a single locus, we augmented the mutation rate of a particular k-mer by the specified effect size (an effect size of 1.5 indicates a 50% increase in the mutation rate) on haplotypes carrying "A" alleles. We then applied the aggregate mutation spectrum distance method to these simulated data and asked if the adjusted cosine distance at that locus was greater than expected by chance. Given a specific combination of parameters, the y-axis denotes the fraction of 50 trials in which the simulated mutator allele could be detected at a significance threshold of p = 0.05. Shaded areas indicate the 95% bootstrap confidence interval surrounding that estimate.
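The simulation described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: mutation counts are drawn from a Poisson distribution, each of six mutation types has a hypothetical mean of 50 mutations per haplotype, the cosine distance is left unadjusted, and significance is assessed by permuting alleles at the single simulated locus rather than across all markers. All function names are hypothetical.

```python
import math
import random


def simulate_genotypes(n_loci, n_haplotypes, rng):
    """One uniform draw in [0, 1) per locus per haplotype; <= 0.5 gives "A", else "B"."""
    return [["A" if rng.random() <= 0.5 else "B" for _ in range(n_haplotypes)]
            for _ in range(n_loci)]


def poisson(lam, rng):
    """Knuth's multiplication method (assumed sampler; fine for small means)."""
    threshold, k, product = math.exp(-lam), 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_spectra(alleles, base_rates, target_type, effect_size, rng):
    """Per-haplotype mutation counts; the target type's rate is multiplied
    by effect_size on haplotypes carrying "A" at the mutator locus."""
    spectra = []
    for allele in alleles:
        rates = list(base_rates)
        if allele == "A":
            rates[target_type] *= effect_size
        spectra.append([poisson(rate, rng) for rate in rates])
    return spectra


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def aggregate_distance(spectra, alleles):
    """Sum counts within each allele group, then compare the two aggregates."""
    totals = {"A": [0] * len(spectra[0]), "B": [0] * len(spectra[0])}
    for counts, allele in zip(spectra, alleles):
        totals[allele] = [t + c for t, c in zip(totals[allele], counts)]
    return cosine_distance(totals["A"], totals["B"])


def permutation_pvalue(spectra, alleles, n_permutations, rng):
    """Fraction of allele shufflings with a distance at least as large as observed."""
    observed = aggregate_distance(spectra, alleles)
    shuffled = list(alleles)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if aggregate_distance(spectra, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)


# One trial: 1,000 loci, 100 haplotypes, mutator at locus 0, effect size 1.5.
rng = random.Random(0)
genotypes = simulate_genotypes(1000, 100, rng)
spectra = simulate_spectra(genotypes[0], [50.0] * 6, 0, 1.5, rng)
p_value = permutation_pvalue(spectra, genotypes[0], 200, rng)
```

Repeating such a trial 50 times and recording the fraction with `p_value` below 0.05 would give one point on a power curve of the kind plotted in the figure.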
Figure 1-figure supplement 2: Comparing power between the aggregate mutation spectrum distance method and QTL mapping. In each of 50 trials, we simulated genotypes at 1,000 biallelic loci on a toy population of 100 haplotypes as follows. At every locus on every haplotype, we drew a single floating point value from a uniform distribution [0,1). If that value was less than or equal to 0.5, we set the allele to be "A"; otherwise, we set the allele to be "B". In each trial, we also simulated de novo germline mutations on the population of haplotypes, such that at a single locus, we augmented the rate of the specified mutation type by the specified effect size (an effect size of 1.5 indicates a 50% increase in the mutation rate) on haplotypes carrying "A" alleles. We then applied the aggregate mutation spectrum distance method to these simulated data and asked if the adjusted cosine distance at that locus was greater than expected by chance. Similarly, in each trial, we used R/qtl2 to perform a genome scan for QTL and asked if the log-odds score at that locus was greater than expected by chance. Given a specific combination of parameters, the y-axis denotes the fraction of 50 trials in which the simulated mutator allele could be detected at a significance threshold of p = 0.05 (for AMSD) or at an alpha of [...] (for QTL mapping). Shaded areas indicate the 95% bootstrap confidence interval surrounding that estimate.
Figure 1-figure supplement 3: Comparing power between the aggregate mutation spectrum distance method and QTL mapping with variable counts of simulated mutations. In each of 50 trials, we simulated genotypes at 1,000 biallelic loci on a toy population of 50 or 100 haplotypes as follows. At every locus on every haplotype, we drew a single floating point value from a uniform distribution [0,1). If that value was less than or equal to 0.5, we set the allele to be "A"; otherwise, we set the allele to be "B". In each trial, we also simulated de novo germline mutations on the population of haplotypes, such that at a single locus, we augmented the rate of the specified mutation type by the specified effect size (an effect size of 1.5 indicates a 50% increase in the mutation rate) on haplotypes carrying "A" alleles. To more closely approximate the BXD RILs, the mean number of simulated mutations on each haplotype was allowed to vary by a factor of 20 (see Materials and Methods for more details). We then applied the aggregate mutation spectrum distance method to these simulated data and asked if the adjusted cosine distance at that locus was greater than expected by chance. Similarly, in each trial, we used R/qtl2 to perform a genome scan for QTL and asked if the log-odds score at that locus was greater than expected by chance. Given a specific combination of parameters, the y-axis denotes the fraction of 50 trials in which the simulated mutator allele could be detected at a significance threshold of p = 0.05 (for AMSD) or at an alpha of [...] (for QTL mapping). Shaded areas indicate the 95% bootstrap confidence interval surrounding that estimate.
Figure 1-figure supplement 4: Comparing power between the aggregate mutation spectrum distance method and QTL mapping with variable mutator allele frequencies. In each of 50 trials, we simulated genotypes at 1,000 biallelic loci on a toy population of 100 haplotypes as follows. At every locus on every haplotype (other than the simulated mutator locus), we drew a single floating point value from a uniform distribution [0,1). If that value was less than or equal to 0.5, we set the allele to be "A"; otherwise, we set the allele to be "B". To model the effects of mutator allele frequencies on AMSD and QTL power, we allowed the expected frequency of "A" alleles at the mutator allele marker to be either 0.1, 0.25, or 0.5 in these simulations. In each trial, we also simulated de novo germline mutations on the population of haplotypes, such that at a single locus, we augmented the rate of the specified mutation type by the specified effect size (an effect size of 1.5 indicates a 50% increase in the mutation rate) on haplotypes carrying "A" alleles. We then applied the aggregate mutation spectrum distance method to these simulated data and asked if the adjusted cosine distance at that locus was greater than expected by chance. Similarly, in each trial, we used R/qtl2 to perform a genome scan for QTL and asked if the log-odds score at that locus was greater than expected by chance. Given a specific combination of parameters, the y-axis denotes the fraction of 50 trials in which the simulated mutator allele could be detected at a significance threshold of p = 0.05 (for AMSD) or at an alpha of [...] (for QTL mapping). Shaded areas indicate the 95% bootstrap confidence interval surrounding that estimate.
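The allele-frequency manipulation described for Figure 1-figure supplement 4 can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the authors' code: the function names are hypothetical, and the mutator locus is assumed to be sampled with the same uniform-draw rule as the other loci, with the 0.5 cutoff replaced by the desired expected frequency of "A" alleles.

```python
import random


def simulate_locus(n_haplotypes, a_frequency, rng):
    """Assign "A" when a uniform draw in [0, 1) is <= a_frequency, else "B"."""
    return ["A" if rng.random() <= a_frequency else "B"
            for _ in range(n_haplotypes)]


def simulate_genotypes_with_mutator(n_loci, n_haplotypes, mutator_locus,
                                    mutator_a_frequency, rng):
    """All loci use an expected "A" frequency of 0.5 except the mutator locus,
    which uses mutator_a_frequency (0.1, 0.25, or 0.5 in the caption)."""
    return [
        simulate_locus(
            n_haplotypes,
            mutator_a_frequency if locus == mutator_locus else 0.5,
            rng,
        )
        for locus in range(n_loci)
    ]


rng = random.Random(1)
genotypes = simulate_genotypes_with_mutator(1000, 100, 0, 0.1, rng)
mutator_a_fraction = genotypes[0].count("A") / len(genotypes[0])
```

With only 100 haplotypes, the realized fraction of "A" alleles at the mutator locus will scatter around the expected value, which is one reason power is expected to drop as the mutator allele becomes rarer.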

Figure 2-figure supplement 1: Quantitative trait locus scans for mutation spectrum phenotypes. Using the BXDs with D genotypes at rs27509845 (the marker with the highest cosine distance on chromosome 4; n = 66 BXDs, 42,171 total mutations), we used R/qtl2 to perform QTL scans for the fractions of each 1-mer mutation type. QTL scans also included a kinship matrix (containing the pairwise genetic similarity between each pair of BXDs, calculated using the leave-one-chromosome-out method) as a random effect term, supplied via the kinship keyword argument of the scan1 function. Plots show the log-odds (LOD) score at every genotyped marker in blue; the dotted black line represents the genome-wide LOD significance threshold (established using 1,000 permutations at an alpha of 0.05/7 to account for

Figure 3-figure supplement 2: Mutation spectra comparison in Sanger Mouse Genomes Project strains. Fractions of de novo germline mutations in Sanger MGP strains with either D or B haplotypes at the chromosome 4 and chromosome 6 mutator loci, stratified by mutation type.

Figure 3-figure supplement 3: Frequency of nonsynonymous DNA repair mutations in wild mice. Alternate allele frequencies of each nonsynonymous DNA repair mutation overlapping the chromosome 6 mutator locus were calculated in populations of wild-derived mice from Harr et al. [32]. Numbers of mice in each subpopulation are shown in parentheses. Abbreviations: Mmc (Mus musculus castaneus), Mmd (Mus musculus domesticus), Mmm (Mus musculus musculus), and Ms (Mus spretus). The Mbd4 p.Asp129Asn mutation was not observed in any wild populations.