DNA Replication-Transcription Conflicts Do Not Significantly Contribute to Spontaneous Mutations Due to Replication Errors in Escherichia coli

ABSTRACT Encounters between DNA replication and transcription can cause genomic disruption, particularly when the two meet head-on. Whether these conflicts produce point mutations is debated. This paper presents detailed analyses of a large collection of mutations generated during mutation accumulation experiments with mismatch repair (MMR)-defective Escherichia coli. With MMR absent, mutations are primarily due to DNA replication errors. Overall, there were no differences in the frequencies of base pair substitutions or small indels (i.e., insertion and deletions of ≤4 bp) in the coding sequences or promoters of genes oriented codirectionally versus head-on to replication. Among a subset of highly expressed genes, there was a 2- to 3-fold bias for indels in genes oriented head-on to replication, but this difference was almost entirely due to the asymmetrical genomic locations of tRNA genes containing mononucleotide runs, which are hot spots for indels. No additional orientation bias in mutation frequencies occurred when MMR− strains were also defective for transcription-coupled repair (TCR). However, in contrast to other reports, loss of TCR slightly increased the overall mutation rate, meaning that TCR is antimutagenic. There was no orientation bias in mutation frequencies among the stress response genes that are regulated by RpoS or induced by DNA damage. Thus, biases in the locations of mutational targets can account for most, if not all, apparent biases in mutation frequencies between genes oriented head-on versus codirectional to replication. In addition, the data revealed a strong correlation of the frequency of base pair substitutions with gene length but no correlation with gene expression levels.

T he fact that DNA replication and transcription occur simultaneously and on the same template sets up inevitable molecular conflicts. In Escherichia coli replication proceeds at an average of 650 bp/s (1), whereas transcription, at its fastest, proceeds at about a tenth of this speed (2). Also, since there are thousands of RNA polymerase molecules active at one time (3), collisions between the replication fork and the transcription complex must be frequent. The effects of these collisions on genomic integrity have been a subject of active research for years. In particular, the mutagenic and evolutionary consequences of collisions that occur when replication and transcription meet head-on (HO) versus codirectionally (CD) is the subject of recent debate in the literature (4,5).
HO collisions occur when the transcribed DNA strand is the template for laggingstrand replication, whereas CD collisions occur when the transcribed DNA strand is the template for leading-strand replication. HO, but not CD, collisions slow replication in bacteria (6)(7)(8) and yeast (9). In addition, experiments using experimentally inverted highly expressed genes show that HO collisions can arrest replication forks and produce DNA double-strand breaks, requiring recombination functions to restore replication (10). The potential of HO replication-transcription conflicts to produce genomic disruption is a powerful selective force, one indication of which is that ribosomal operons, in which genes are highly expressed, are in the CD orientation in almost all bacteria (11). Many bacteria also have a large majority of all their genes in the CD orientation, although E. coli is an exception to this tendency, with only 55% of its genes so oriented (12).
While clearly a threat to overall genomic maintenance and stability, it is not as clear that replication-transcription conflicts normally contribute to spontaneous point mutation rates. Two types of evidence suggest that they do. First, comparative genomics of diverged strains of Bacillus subtilis showed that HO-oriented genes have a 42% higher rate of nonsynonymous (amino acid-changing) substitutions than do CD-oriented genes, although the rate of synonymous substitutions is nearly the same (13). Second, when a gene is experimentally inverted, its mutation frequency can be greater in the HO orientation than in the CD orientation (8,(13)(14)(15). However, these observations have alternative explanations (5), which are considered further in Discussion.
Several reports have emphasized that the mutations induced by replication-transcription conflicts have signatures (reviewed in reference 5). In both E. coli (16) and B. subtilis (15), inverting a gene revealed orientation-specific hot spots for both base pair substitutions (BPSs) and large duplications and deletions. Interestingly, in both studies the HO conflict promoted a specific BPS in the 210 element of the gene's promoter (15,16).
In a previous publication (17), we reported that there was no difference in overall BPS rates of HO-oriented versus CD-oriented genes in E. coli. This conclusion was based on mutation accumulation (MA) experiments followed by whole-genome sequencing (WGS) that yielded 233 BPS mutations in a wild-type strain and 1,625 BPS mutations in a mismatch repair (MMR)-defective strain (18). Since our original publication, we have conducted nine additional MA-WGS experiments with MMR-defective strains, yielding a total of 30,061 BPSs and 5,324 small insertions or deletions (indels) (19). MMR corrects mismatches in newly replicated DNA (reviewed in reference 20); thus, these mutations can be considered to be primarily errors made during normal replication. For this study, we used this extensive database to investigate in greater detail whether replicationtranscription conflicts induce the types of mutations detected in other experiments or, possibly, other characteristic mutations. Our conclusion is that in E. coli growing and replicating in rich medium and free of exogenous stress, replication-transcription conflicts are not a major source of spontaneous point mutations.

RESULTS
Distributions of mutation frequencies per gene and correlations with other parameters. Combining the results of 10 MA-WGS experiments with MMR 2 strains gave a database of 30,061 BPSs and 5,324 indels (#4 bp) across the E. coli genome (19). Of these, 27,164 BPSs and 3,841 indels were in gene coding sequences (CDSs) (see Table S1 and Table S2 for the primary data discussed below). Before applying standard statistical methods to these data, we analyzed the distributions of the mutation frequencies. If the numbers of mutations per CDS were random, the distribution would be Poisson, but the BPSs/CDS numbers fell into a normal (Gaussian) distribution (Fig. 1A). As was found for B. subtilis (21), there is a strong linear correlation between the number of BPSs in a CDS and the length of the CDS in nucleotides (nts) (Fig. 1B  and 1C and Table 1). Figure 1D shows that CDS length is normally distributed except for the rRNA and tRNA genes (which are further discussed below). These two factors explain the normal distribution of BPSs per CDS.
In contrast to the BPSs, the numbers of indels/CDS fell into neither a Poisson nor a normal distribution. Because indel mutation rate is dependent on the length of homopolymeric runs (18,22), and because the occurrence of such runs in genes appears to be idiosyncratic, the distribution of indels/CDS is also idiosyncratic.
We tested for correlations between mutations per CDS and mutations per CDS per nucleotide (nt) and other parameters. There were no significant differences between the frequencies of mutations in CDSs located in the right or left replichores, or between CDSs in the forward or reverse orientation relative to the reference strand. The sole exception was a small (;20%) increase in the value of BPSs/CDS/nt among highly expressed genes (minus rRNA and tRNA genes) in the left replichore versus  ), which gives a correlation coefficient, R, of 0.81 (P , 0.0003). (C) As for panel B, but showing that the correlation does not depend on outliers, CDSs longer that 3,000 nt and the ribosomal and tRNA genes have been eliminated (see the text for the explanation for eliminating these genes). The equation shows the least-squares fit (red line), which gives an R of 0.78 (P , 0.0003). (D) Distribution of CDS lengths. The number of CDSs (y axis) of the indicated length (x axis) is shown (blue circles). For this analysis, the CDS lengths were binned into 10-nt bins. A Gaussian curve (red line) was fitted to the data disregarding ribosomal and tRNA genes (see the text for the explanation for eliminating these genes). The Gaussian equation shown gives a fit with R 2 equal to 0.89. those in the right replichore. There was also no significant correlation between mutation frequencies and distance of a CDS from the origin, but the symmetrical wave pattern of mutation rates across the genome, previously reported (23,24), would not have been detected as a monotonic correlation. Thus, the correlation between BPSs per CDS and length of the CDS remained the only significant correlation overall. (Data are not shown but are available in the IUScholarWorks Repository [http://hdl.handle.net/2022/ 26797].) It is generally agreed that transcription is mutagenic, and consequently, highly expressed genes have higher mutation rates than poorly expressed genes (reviewed in reference 25). However, the opposite has also been reported (26)(27)(28). We used our previously published transcriptome sequencing (RNA-seq) values from lag-, log-, and stationary-phase cells (24) to determine if mutation frequencies were correlated with gene expression levels. Surprisingly, we found no overall correlation (Table 2). An exception was the subset of genes encoding the rRNAs and ribosomal proteins, which showed weak correlations between BPSs/CDS and gene expression levels. Overall, neither BPS nor indel frequencies are biased by gene orientation relative to DNA replication. Relative to the reference DNA strand, genes oriented in the forward direction on the right replichore and the reverse direction on the left replichore are transcribed codirectionally (CD) with replication, whereas genes oppositely oriented are transcribed head-on (HO) to replication. For the purposes of analysis, we set the division between the replichores halfway around the genome from the origin close to the terC site, the most frequent site of replication fork termination (29). This resulted in 2,467 CDoriented genes and 2,044 HO-oriented genes. Table 3 shows there is no statistically significant difference in frequencies of BPSs or indels between CDS in the CD versus the HO orientation whether or not the frequencies are corrected for CDS length.
We note that large insertions, deletions, and rearrangements have been found to be induced at sites of strong replication-transcription conflicts (15,16), but our sequencing techniques could not detect indels greater than 4 bp. E. coli has 76 genes encoding rRNAs and ribosomal proteins; all but 4 of these are in the CD orientation. Of the 86 genes encoding tRNAs, 52 are in the CD orientation , and stationary phases of growth. r is Pearson's correlation coefficient, except that Spearman's rank correlation coefficient is given for the ribosomal genes because it gave slightly stronger correlations. P is the probability that the correlation occurred by chance calculated from Student's twotailed t distribution (57) and adjusted for multiple comparisons by the Benjamini-Hochberg procedure (58). Note that weak correlations can be statistically significant. CDS, coding sequence; TMP, transcripts per kilobase million; BPSs, base pair substitutions; indels, insertions and deletions of #4 bp; CD, codirectional with replication; HO, headon to replication; NA, not applicable (no mutations). b All but four of the 76 ribosomal genes are in the CD orientation. Included are all genes for rRNAs and ribosomal proteins; excluded are all tRNA genes whether or not they are in rrn operons. While Pearson's coefficient values were significant, Spearman's rank coefficient values, although still low, were 10% to 20% larger for this group of genes.
( Table S1). Most of these genes are highly expressed; in addition, the tRNA genes are small. Thus, these genes are outliers in expression, size, and orientation, and their inclusion in the analysis could bias the results. Therefore, we repeated our analyses with these genes excluded. As shown in Table 3, there were still no significant differences in the frequencies of BPSs or indels between CD oriented and HO-oriented CDSs. (See below for analyses of the excluded genes.) Analyzing the results of mutation accumulation experiments with B. subtilis, Schroeder et al. (21) found that the slope of the regression line of BPSs/CDS versus a DHO, the percent increase of the mean of the HO genes over the mean of the CD genes. P, the probability that the indicated values for CD oriented and HO-oriented genes are equal calculated from Student's two-tailed t distribution (57) and adjusted for multiple comparisons using the Benjamini-Hochberg procedure (58). BPSs, base pair substitutions; indels, insertions and deletions of #4 bp; CDS, coding sequence; SD, standard deviation; CD, codirectional with replication; HO, head-on to replication; NA, not applicable (no mutations). b Only four of the 76 ribosomal genes are in the HO orientation; their CDSs accumulated 4 BPSs and no indels.
CDS length was ;30% greater for HO-oriented genes than for CD oriented genes. However, this difference disappeared when the few extreme outliers for length were eliminated. Our results with E. coli did not show a difference in the slopes of the regression lines overall (Table S3). However, within the subset of highly expressed genes minus the tRNA genes (discussed below), the slope of BPSs/CDS versus CDS length was about 10% greater for HO-oriented genes than for CD oriented genes (the unadjusted P value for the difference was 0.04 but increased to 0.08 when adjusted for multiple comparisons.) Within a subset of highly expressed genes, the frequencies of indels, but not of BPSs, are biased toward genes oriented HO to replication. Using our RNA-seq data, we identified a set of highly expressed genes by using the criterion that the log 10 (TPM) (transcripts per kilobase per million) for a gene during at least one of the three growth phases (lag, log, and stationary) be equal to or greater than one standard deviation above the mean log 10 (TPM) for that phase. The result was 770 genes, 483 in the CD orientation and 287 in the HO orientation. As shown in Table 2, among these genes there was no correlation between mutation frequency and expression level. There was also no statistically significant difference in BPSs/CDS between the CD and HO orientations (Table 3). However, among the highly expressed genes, the frequency of indels per CDS of HO-oriented genes was nearly twice that of CD oriented genes, a difference that was significant at the 5% level when unadjusted but fell to 10% when adjusted for multiple comparisons (Table 3). Dividing the indels/CDS by the length of the CDS increased the bias to nearly 3fold but decreased the significance level to 7% (13% adjusted).
Eliminating the ribosomal and tRNA genes from the subset of highly expressed genes did not change the BPSs/CDS results but reduced 5-to 10-fold the difference in indels/ CDS between CD oriented and HO-oriented genes (Table 3). Thus, the bias for indels to be in HO-oriented genes among highly expressed genes is mostly due to the genes that were removed, especially the tRNA genes. We discuss the analysis of tRNA genes at more length below.
We also identified and analyzed 352 genes that were highly expressed in all three conditions and obtained the same results. (Data are not shown but are available in the IUScholarWorks Repository http://hdl.handle.net/2022/26797.) Mutation frequencies in essential genes are not significantly biased by orientation relative to replication. The EcoCyc database (30) lists 358 essential genes for MG1655 E. coli strains growing in LB medium. Of these, 252 are oriented CD with replication and 106 are oriented HO to replication. The frequency of BPSs in the CDSs of CD oriented essential genes was slightly higher than in the CDSs of HO-oriented essential genes, but not significantly so (Table 3). Not surprisingly, since indels tend to be knockout mutations, there were only 31 indels recovered in the CDSs of essential genes, and these were distributed 15 in the CD genes and 16 in the HO genes, giving frequencies that appear to be different but that are not statistically significantly different (Table 3).
Mutations in gene promoters are not biased by orientation relative to replication. The Regulon DB (31) database lists 8,568 promoters for MG1655 E. coli strains, 4,956 of which are oriented CD with replication and 3,612 of which are oriented HO to replication. We identified the mutations recovered in our experiments in a region from 60 bp upstream to the transcriptional start sites (TSSs) of these promoters. Neither the frequencies of BPSs nor indels in these promoters differed between the two orientations (Table S4).
To test if a gene's expression levels affected the frequencies of mutations in its promoter, we analyzed the promoters of 3,816 genes to which we could assign our RNAseq values; 2,022 of these are oriented CD and 1,794 are oriented HO. Among these, there was still no difference between the frequencies of BPSs or indels in promoters in the CD versus HO orientation (Table S4). The promoters of the highly expressed genes described above also showed no significant difference between the two orientations for either BPSs or indels (Table S4).
The mutation frequencies in the promoters of essential genes were predictably low, particularly for indels (Table S4). However, there was a surprising anomality. No indels were recovered in the promoters of HO-oriented essential genes, but 15 were recovered in the promoters of CD oriented essential genes. However, 7 of these indels were at a single site, a run of 9 AÁT base pairs within the first promoter of the glnS gene (positions 235 to 227); an additional 3 indels occurred at a run of 5 GÁC base pairs within the second promoter of the hemA gene (positions 247 to 243). Thus, this apparent bias in mutation frequency is largely due, instead, to a bias in mutational targets, which is further discussed below.
As mentioned above, two studies found that a specific BPS, a transition at the 39 T of the 210 element (T -7 ), was induced in promoters in the HO orientation (15,16). In our collection of 946 AÁT transitions in promoters, only 22 were within 20 nt upstream of a TSS (Fig. S1). Of these, only 7 were a transition at T -7 of a possible 210 element; two of these were in the CD orientation, and 5 were in the HO orientation. While these small numbers would not support the hypothesis that replication-transcription conflict specifically induces a transition at T -7 , it is possible that our experiments would not detect this mutation. This particular BPS is not elevated in B. subtilis MMR 2 strains, most likely because it is not produced by simple replication errors (15). Thus, it would be swamped out by the uncorrected replication errors in our MMR 2 strains. In addition, T -7 is an important base for promoter recognition and the initiation of transcription (32), and, in B. subtilis, this mutation prevents the transcription of many housekeeping genes (J. D. Wang, personal communication). If this is true in E. coli as well, transitions of T -7 would be selected against in our MA experiments.
Mutation rates verify the results obtained with mutation frequencies. For the analyses described above, we used all the mutations recovered in 10 mutation accumulation experiments involving 334 independent mutation accumulation lines. In these experiments, the lines were propagated for a minimum of 400 generations and a maximum of 3,000 generations, giving a total of nearly 250,000 generations (19). These data allow very good estimates of the rate at which mutations accumulated per generation per nucleotide across the whole genome. Figure 2 shows that these mutation rates are congruent with the results obtained using mutation frequencies presented above. Mutation biases in tRNA genes. For their length, the mutation frequencies among tRNA genes were unusually high, particularly for indels; per nucleotide, tRNAs accumulated twice as many BPSs and nearly 20 times as many indels as did other genes (Table 4). However, examination of the distribution of these mutations reveals that a few of the tRNA genes were heavily mutated, while the majority accumulated no or few mutations. Of the 117 indels in tRNA genes, 100 were in just 4 genes, the homologues leuP, leuQ, leuV, and leuT. Ninety-five of these occurred in the 8-GÁC-bp run and 3 occurred in the 5-GÁC-bp run that are in each gene (in the tRNAs, 5 of the 8 Cs present in the 8-bp run base pair with the 5 Gs encoded by the other run to form the stem of the TWC loop [33]).
Of the 17 indels that occurred in the other tRNA genes, all but 3 occurred in mononucleotide runs of $5 nt. Because of the strong mutagenic consequence of such runs, we checked to see if tRNA genes have an overabundance of them. As shown in Fig. S2, the number and extent of runs in most tRNA genes do not differ from those in the genome as a whole. However, the 8-GÁC-bp runs in leuP, leuQ, leuV, and leuT are outliers. Possibly also outliers are 7-GÁC-bp runs in serW and serX (also forming the stem of the TWC loop), which accounted for 5 and 2 indels, respectively. Of these 6 genes, only leuT is oriented CD with replication.
The four homologous tRNA-Leu genes provide an interesting test of whether HOoriented genes have higher mutation rates than CD-oriented genes simply because of their orientation relative to replication. leuP, leuQ, and leuV are in an operon oriented HO to replication, and leuT is in a different operon, oriented CD with replication. All four genes have similar high expression levels in all three growth stages (Table S2). The numbers of indels recovered in these genes were 23, 29, 25, and 23, respectively, which obviously do not differ statistically. Thus, in this natural experiment in which an identical sequence appears in both CD and HO orientation, the accumulated number of indels/gene was also identical. However, because three of the genes are oriented HO and only one is oriented CD, it mistakenly appears that the HO genes are more highly mutated because of their orientation relative to replication. The four homologous tRNA-Leu genes also accounted for 18 of the 74 BPSs in tRNA genes. In contrast to the indels, the frequencies of these were biased: leuP, leuQ, and leuV accumulated 6, 4, and 7 BPSs, but leuT accumulated only 1 BPS. In Text S1 and Fig. S3, we explain how this bias may also be due to mutation target placement.
BPSs are not biased in stress response genes. Merrikh (34) suggested that the placement of stress response genes in the HO orientation is evolutionarily advantageous. She hypothesized that the accumulation of BPSs in these genes would increase the chance of adaptive mutations, allowing the cell to successfully meet new environmental challenges. Although during our mutation accumulation experiments the cells are not exposed to exogenous stresses, they are exposed to endogenous stresses. For example, in the absence of repair pathways, reactive oxygen species are particularly mutagenic during mutation accumulation experiments (35). To test whether the BPSs that occurred during our mutation accumulation experiments were biased toward HOoriented stress response genes, we compared the frequencies of BPSs in the CDSs of CD-oriented and HO-oriented genes that are part of two global stress response networks, the RpoS-mediated general stress response and the SOS response to DNA damage.
RpoS is a sigma factor that is induced as cells enter stationary phase and by a variety of other stresses (reviewed in reference 36). Directly or indirectly, it regulates over 600 genes, many of which encode or regulate DNA repair activities. We analyzed the accumulated mutations in the CDSs of 605 genes that are upregulated by RpoS and in a subset of 98 of these that are particularly responsive to RpoS (37). There were no significant differences in BPSs/CDS in genes oriented CD versus HO to replication (Table 5).
DNA damage induces the expression of approximately 80 genes, 57 of which are known to be repressed by the SOS repressor LexA; an additional 44 genes have predicted LexA-binding sequences but have not been experimentally shown to be part of the SOS response (reviewed in reference 38). We analyzed the BPSs in the CDSs of all three of these subsets of genes and found no significant differences in mutation frequencies between CDSs oriented CD versus HO to replication ( Table 5).
As reported for B. subtilis (5), the placement of these stress response genes in E. coli is biased toward the CD orientation (Table 5). Thus, these results provide no support for the hypothesis that stress response genes are oriented HO to replication so that adaptive mutations will accumulate.
Transcription-coupled repair does not cause or prevent HO versus CD mutational bias. When RNA polymerase encounters a DNA lesion and stalls, it triggers a rescueand-repair pathway called transcription-coupled repair (TCR) (39). In E. coli and other bacteria, the Mfd (mutation frequency decline) protein initiates this repair by dislodging the transcription complex and recruiting the nucleotide excision repair pathway to remove the DNA lesion (40). Although the activity of Mfd might be expected to resolve replication-transcription conflicts, several papers have reported that in doing so Mfd creates mutations, at least at certain DNA sites (14,41,42).
We previously reported that, as determined by MA-WGS experiments, loss of Mfd had little effect on the mutation rates of MMR-defective strains (19). Deletion of the mfd gene in MMR 2 strains increased the genomic rates of BPSs by about 12% (2.75 Â 10 28 versus 2.45 Â 10 28 BPSs/generation/nt) and of indels by 28% (5.54 Â 10 29 versus 4.33 Â 10 29 indels/generation/nt) (Fig. 3). Thus, in contrast to the reports cited above, we found that TCR is somewhat antimutagenic. This finding is reminiscent of the original phenotype of Mfd 1 , which is the decline in mutations when E. coli strains are briefly incubated without protein synthesis after UV exposure (43).  (Table S1). As shown in Table 6, overall, there were no significant differences in mutation frequencies between genes in the two orientations. There were large apparent increases in the frequencies of both BPSs and indels in HO-oriented versus CD-oriented tRNA genes, but these results are based on only 18 BPSs in 14 genes and 29 indels in 7 genes (Table S1). Also, as with the Mfd 1 MMR 2 strains, in the Mfd 2 MMR 2 strains the HO orientation of leuP, leuQ, leuV, serW, and serX genes accounted for most of the bias: 23 of the 24 indels and 4 of the 11 BPSs in the HO-oriented genes were in these genes, and 19 indels and 3 BPSs were in just leuP, leuQ, and leuV.
Gene orientation does not influence the types of BPSs that occur. Million-Weaver et al. (14) reported that induction of a high level of transcription of a Bacillus subtilis mutant hisC allele increased its reversion rate about 7-fold when the gene was in the HO orientation but only about 4-fold when the gene was in the CD orientation. About 60% of the induced mutations in the HO-oriented gene were due to DNA PolY1, a member of the Y family of error-prone polymerases, and about 50% were T-to-C transitions. T-to-C mutations are frequent when the fidelities of Y-family polymerase are assayed using single-stranded DNA templates (44,45). However, the hisC target was a CDSs, coding sequence; BPSs, base pair substitutions; SD, standard deviation; CD, codirectional with replication; HO, head-on to replication. b DHO, the percent increase of the mean of the HO genes over the mean of the CD genes. P, the probability that the indicated values for CD oriented and HO-oriented genes are equal calculated from Student's two-tailed t distribution (57) and adjusted for multiple comparisons by the Benjamini-Hochberg procedure (58). c The numbers of genes differ from those reported by Simmons et al. (38) because our strains do not have three damage-induced genes, two of which are on the prophage e14.
double-stranded DNA, so the reported T-to-C transition could have been due to either a G inserted opposite the T or a C inserted opposite the A on the other strand. Since Million-Weaver et al. (14) were assaying on-site reversion of a TAG stop codon, we deduce that the T was on the nontranscribed strand and the A was on the transcribed stand.
To test if a similar mutational signature occurred in our experiments, we compared the frequencies of all six types of BPSs in CDSs in the CD and HO orientations (Table 7). We included all six BPSs because (i) GÁC-to-AÁT transitions might be induced by replication-transcription conflicts but could not have been scored in the assay used by Million-Weaver et al. (14) because a GÁC-to-AÁT change would create a TAA stop codon and (ii) E. coli's Y-family polymerases, Pol IV and Pol V, frequently create transversions (44)(45)(46)(47)(48)(49). However, the frequencies of transversions are low, so we grouped them into AÁT and GÁC transversions for each gene.
As shown in Table 7, the frequencies of none of the types of BPSs were enhanced in the HO-oriented versus the CD-oriented CDSs. Indeed, the only statistically significant difference was a higher mutation frequency among AÁT transversions in the CD orientation. There also were no statistically significant orientation biases among the BPSs in the subset of highly expressed genes. Table 8 shows that the frequencies per CDS of AÁT and GÁC transitions were not correlated with gene expression levels. (There were too few transversion mutations to make a similar analysis meaningful. ) We also checked to see if it mattered which base was on the transcribed strand. As shown in Fig. 4, AÁT transitions were almost twice as frequent in HO-oriented versus CD-oriented CDSs when A was on the transcribed strand, and GÁC transitions were about 60% more frequent in HO-oriented versus CD-oriented CDSs when C was on the transcribed strand. However, this orientation bias may have nothing to do with transcription but, instead, may reflect which DNA strand the base is located on. The transcribed strand in HO-oriented CDSs is the lagging-strand template (LGST) and in CD-oriented CDSs is the leading-strand template (LDST). As we previously reported (18,50), across the chromosome, AÁT and GÁC transitions occur about twice as frequently when the A and the C are on the LGST than when these bases are on the LDST. As shown in Fig. 4, this strand bias is sufficient to explain the apparent differences between CD oriented and HO-oriented CDSs.
Comparison of results with Bacillus subtilis to those with E. coli. While only about half of the genes in E. coli are oriented CD with replication, three quarters of the Bacillus subtilis genes are so oriented (12). In MS-WGS experiments with MMR-defective strains of B. subtilis, Schroeder et al. (21) found that HO-oriented genes did have higher rates of BPSs than CD-oriented genes, but this bias was entirely due to bias in the number of the triplet nucleotide sequences that are transition hot spots. Schroeder et al.  a BPSs, base pair substitutions; indels, insertions and deletions of #4 bp; CDS, coding sequence; SD, standard deviation; CD, codirectional with replication; HO, head-on to replication; NA, not applicable (no mutations). b DHO, the percent increase of the mean of the HO genes over the mean of the CD genes. P, the probability that the indicated values for CD-oriented and HO-oriented genes are equal calculated from Student's two-tailed t distribution (57) and adjusted for multiple comparisons using the Benjamini-Hochberg procedure (58). By the nonparametric Mann-Whitney test, the difference between indels in CDSs oriented HO versus CD is statistically significant at (adjusted) P values of 0.04 (indels/CDS) and 0.05 (indels/CDS/nt) for highly expressed genes and (adjusted) P values of 0.01 for both indels/tRNA gene and indels/tRNA gene/nt. See the text for further discussion. c Includes ssrA, which encodes tmRNA.  DHO, the percent increase of the mean of the HO genes over the mean of the CD genes. P, the probability that the indicated values for CD-oriented and HO-oriented genes are equal calculated from Student's two-tailed t distribution (57) and adjusted for multiple comparisons using the Benjamini-Hochberg procedure (58).
Foster et al.
their competence phenotypes but not in their mutagenic phenotypes (24). Combining the results yielded 9,587 BPSs and 2,100 indels, which we reanalyzed and compared to the results reported in Schroeder et al. (21) (see Table S5 for the data discussed below).
As with E. coli, the numbers of BPSs/CDS in our B. subtilis strains were significantly correlated with CDS length. While BPSs/CDS did not differ significantly between genes in the two orientations, the frequency of BPSs/CDS/nt was 6% higher in genes oriented HO to replication than in those oriented CD to replication. We confirmed the finding of Schroeder et al. (21) that the slope of the regression line of BPSs/CDS versus CDS length of HO-oriented genes was greater than that of CD-oriented genes, but in our case the slopes differed by only 10% (Table S5B), whereas the difference found by Schroeder et al. (21) appears to be about 30% (but was 0 when outliers for length were eliminated). Comparing mutation rates, i.e., BPSs/generation/nt across the whole genome (Fig. 5A), the rate in HO-oriented genes was 9% greater than in CDoriented genes (P = 0.04), a difference that was due to higher rates of GÁC-to-AÁT transitions in HO-oriented genes, as previously reported (21).
In our strains, the frequency of indels/CDS was 18% higher in HO-oriented genes than in CD oriented genes; this difference increased to 61% when normalized for CDS length. Comparing indels/generation/nt across the whole genome (Fig. 5B), genes oriented HO to replication had a 28% higher rate than did genes oriented CD with replication (P , 0.0003). This difference was mostly due to an increase in insertions of AÁT base pairs in mononucleotide runs of AÁT base pairs. In contrast to the results for E. coli described above, all but two tRNA genes in B. subtilis are oriented CD and so yield little information about mutational bias.

DISCUSSION
The results presented here provide little evidence that conflicts between DNA replication and transcription are mutagenic in E. coli growing under stress-free conditions. Whether all genes or various subsets of them were considered, the frequencies of BPSs and indels were not significantly greater in the CDSs or promoters of genes transcribed HO to replication than in genes transcribed CD with replication. The single exception was a greater frequency of indels in HO-oriented genes in a subset of highly expressed genes dominated by tRNA genes. However, the apparent greater bias in tRNA genes was almost entirely due to four homologous genes carrying strong hot spots for indels, three of which are oriented HO and one of which is oriented CD. Overall, this and other studies with a variety of bacterial species support the hypothesis that, in the absence of exogenous stress, the major determinant of spontaneous mutation at a given site is not a conflict between replication and transcription but rather the local sequence context (18,19,21,35,51,52). The hypothesis that conflicts between replication and transcription are mutagenic derives largely from comparisons of mutation rates in reporter genes that were inserted into the B. subtilis chromosome in forward and reverse orientations. The mutations scored were reversions of point mutations in genes for amino acid synthesis (13,14) and inactivation of a gene that makes the bacteria sensitive to trimethoprim (15). These experiments showed that when the gene was oriented HO to replication, it had about twice the mutation rate as when it was oriented CD with replication, but only when transcription was artificially induced. A similar result was seen when one-fourth of the B. subtilis chromosome was inverted, which flipped the normally highly Because of the large data variances, the results are presented in both parametric (means) and nonparametric (medians and distributions) forms. The X's in the boxes indicate the means, the horizontal lines in the boxes indicate the medians, the bottoms and tops of the boxes indicate the first (Q1) and third (Q3) quartiles, and the whiskers indicate the minimum and maximum values that are not outliers, with an outlier defined as a value equal to or greater than 1.5 times the interquartile range (Q3 minus Q1) below or above Q1 and Q3 (59). The points above the whiskers are the outliers; there are no outliers below the whiskers. For the GÁC-to-AÁT results, the medians of the blue and green boxes are zero. The horizontal lines with asterisks indicate that the differences between the indicated means (by Student's t tests) or the distributions about the indicated medians (by Mann-Whitney tests) are highly significant (,,0.0003). No other differences are significant. BPSs, base pair substitutions; Nt, nucleotide; CD, codirectional with replication; HO, head-on to replication; LDST, leading strand template; LGST, lagging strand template. Foster et al. expressed rpoB gene to the HO orientation: as a result, the rate of BPSs in rpoB conferring resistance to rifampin increased about 3-fold relative to the normal strain (8).
These artificial constructs maximize replication-transcription conflicts by positioning highly transcribed genes HO to replication. The conflicts then lead to mutations by one mechanism or another. However, the mutations that result, such as BPSs and indels that inactivated genes (15), are detrimental and would not be adaptive when not under selection. Unperturbed genomes have evolved to minimize conflict that would be deleterious (because most mutations are deleterious), and so replication-transcription conflicts normally do not contribute to spontaneous mutation rates (J. D. Wang, personal communication).
In addition, there are other mechanisms that can account for an apparent orientation bias in mutation rates. Flipping the orientation of a gene also changes whether its CDS is replicated as the leading or the lagging DNA strand, and the two processes appear to have different error rates. Using reversion of four alleles carrying point mutations in the lacZ gene in E. coli, Fijalkowska et al. (53) found 2-to 4-fold differences in mutation frequencies in one orientation versus the other. Importantly, the higher mutation frequencies were not coordinated with the direction of transcription. When mutations across the entire lacI gene were scored in the two orientations, the mutation frequencies were the same and thus not influenced by transcription, but the spectra of mutations were different (54). As we previously reported with a smaller data set (18) and shown here in Fig. 4, in the spectrum of BPSs across the E. coli chromosome is strand specific: AÁT-to-GÁC transitions are twice as likely to occur with A templating the lagging strand and T templating the leading strand than in the opposite orientation, whereas GÁC-to-AÁT transitions are twice as likely to occur with C templating the lagging strand and G templating the leading strand than in the opposite orientation. This is the same bias found in the experiments with lacI (54). Thus, as previously noted (5), when a gene is inverted, the DNA context met by the replication machinery for each strand is different, and this fact could result in the mutational biases reported. These biases would be particularly strong when only one or a few specific mutations are scored, as in the reversion assays referenced above.
Given that replication-transcription conflicts are detrimental to genomic stability (6)(7)(8)10), it has been argued that the persistence of HO-oriented genes in genomes means that mutation of these genes must convey an evolutionary advantage (34). Merrikh and colleagues (13,34,55) have argued that stress response genes are oriented HO to replication to increase the probability that these genes will acquire adaptive mutations. However, the data presented here for E. coli (Table 5) and previously for B. subtilis (5) show no evidence that stress-induced genes are preferentially in the HO orientation. In addition, we found that HO-oriented stress response genes also are not preferentially mutated (Table 5). Further, Schroeder et al. (5) have convincingly argued that the occurrence of HO-oriented genes in genomes is the result of a balance between inversions that create them and purifying selection that removes them. They further note that, at least in B. subtilis, genes oriented HO tend to be nonessential and thus under relaxed selection, which would further account for their persistence as well as for their higher rate of nonsynonymous BPSs. Supporting this hypothesis, more than twice as many essential genes in E. coli are oriented CD rather than HO to replication (Table S1), and there is no difference in the mutation frequencies between the genes in the two orientations ( Table 3).
As mentioned above, the most important determinant of mutation rate at a given site is the local sequence context. For BPSs, the identities of the adjacent bases can influence the mutation rate of a base as much as 400-fold (21). The mutation rates of indels are an exponential function of the length of the homopolymeric run in which they appear (18). Thus, as previously noted (21), apparent differences in mutation rates of genes oriented CD versus HO to replication can be explained to a great extent by differences in the sequence contexts in the two orientations. In addition, our finding that the orientation bias of mutation frequencies in tRNA genes was simply due to gene placement demonstrates that the distribution of specific mutational targets across the genome strongly influences mutation rates. These factors should be evaluated before concluding that a given bias in mutation rates is due to replication-transcription conflicts.

MATERIALS AND METHODS
Strains and procedures. All the E. coli strains reported in this paper are derivatives of PFM2, a prototrophic rpoS 1 derivative of MG1655 (35). The B. subtilis strains are derivatives of the nondomesticated strain NCIB 3610 (56) and were kindly supplied by D. Kearns and M. Konkol. The methods of strain construction, the conditions of growth, and the MA procedures have been published (19,24).
RNA sequencing. The RNA-seq procedures are published (24). However, there was an inadvertent error in the description of the samples. Due to a problem in sequencing, one replicate of the samples taken during the lag and stationary phases of growth was lost, so the means of only two replicates for those samples were used. These values are given in Table S2.
Mutation annotation. For our original mutation calls we used the E. coli reference sequence NC_000913.2 (version 2) (18). The current reference sequence, NC_000913.3 (version 3), differs in both the length of the genome (due to insertion sequences) and gene annotation (30). Twenty places where we called mutations in noncoding regions have been updated to genes in Table S2. Our original mutation calls were in protein-coding genes only, but mutations in known RNA genes have also been updated in Table S2. Version 3 was used for our RNA-seq analysis, but we obtained no expression data for 16 genes. Also 146 pseudogenes have been identified in version 3 for which we do not have mutational data. These minor discrepancies do not change any of the conclusions of this paper.
Statistical analyses. Data were fitted to Gaussian curves using the MathWorks MatLab Curve Fitting app. Statistical tests were computed using either Microsoft Excel or MatLab; the underlying methods are found in reference 57. While in almost all cases the probabilities from parametric tests (Pearson's linear correlation, Student's t tests, and F tests) are presented, nonparametric tests were always also used (Spearman's rank correlation and Mann-Whitney tests). If these probabilities differed, the difference is noted in the text, the tables, or the figure legends.
There were over 400 comparisons statistically evaluated for these studies. To compensate for multiple comparisons, we applied the Benjamini-Hochberg correction (58) to the original calculated values. To allow the uncorrected probabilities to be extracted, the relationship between the two probability values is shown in Fig. S4.
For the mutational analysis presented in this paper, the results of 10 MA experiments with MMR 2 strains, comprising 334 independent MA lines, were treated as one data set. The reasoning behind this treatment is given in reference 19. Briefly, there were no significant differences in the mutation rates or spectra among the experiments, as is clear from the small error bars in Fig. 2A and B. The data are dominated by one experiment with a DmutH strain that ran for nearly 100,000 generations (due to a miscommunication), whereas the rest of the experiments averaged about 20,000 generations. However, the results from this experiment are the same as those from the other experiments. Indeed, when the numbers of mutations from each experiment are simply normalized by the generations for that experiment, the means 6 standard deviations (SDs) are 0.115 6 0.017 for BPSs and 0.022 6 0.0014 for indels. Data availability. The sequences and SNPs reported in this paper have been deposited with the National Center for Biotechnology Information Sequence Read Archive (https://trace.ncbi.nlm.nih.gov/ Traces/sra/) (accession no. SRP013707) and in the IUScholarWorks Repository (http://hdl.handle.net/ 2022/25294).

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. TEXT S1, DOCX file, 0.01 MB.