The Nature of Random 7-Mer Hits to Bigfoot Gene αCNSs.

**(A)** Plot of the number of occurrences of each 7-mer motif in CNSs (*y* axis) versus the number of hits in control noncoding DNA (*x* axis), where the expected hit ratio is 1:1 (primary data; see Supplemental Table 3 online). Each point is a particular 7-mer. The slope of the correlation line, 0.46 rather than 1.0, and the volume of the points that define it (large oval) imply that many overrepresented 7-mers are removed (purified) from αCNSs (the rightmost oval marks the most purified). Some 7-mers are most enriched in αCNSs (circle at left). The 14 most significantly enriched 7-mers, in descending order of significance, are as follows (all P < 10^{−5}): 5′-ACACGTG, CACGTGT, CACGTGA, TCACGTG, CACGTGC, GCACGTG, CACGTGG, CCACGTG, ACGTGGC, GCCACGT, **CATG**TGA, TCA**CATG** (the MYCATERD1 box at PLACE: dehydration stress), GGACCAC, and GTGGTCC (not in PLACE).
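The counts plotted in (A) can be illustrated with a minimal sketch, assuming CNS and control sequences are available as plain strings. The function names `count_kmers` and `paired_counts` are hypothetical, and this is not the pipeline used to produce the figure; it only shows how each 7-mer could be given an (*x*, *y*) pair of control and CNS counts.

```python
from collections import Counter

def count_kmers(seqs, k=7):
    """Count every overlapping k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts

def paired_counts(cns_seqs, control_seqs, k=7):
    """Return {kmer: (control_count, cns_count)}, i.e. (x, y) for each point."""
    cns = count_kmers(cns_seqs, k)
    ctl = count_kmers(control_seqs, k)
    return {kmer: (ctl[kmer], cns[kmer]) for kmer in set(cns) | set(ctl)}
```

Under this sketch, a 7-mer whose CNS count falls well below its control count sits under the 1:1 line (purified), and one whose CNS count is well above sits over it (enriched). Whether the control set is the same total length as the CNS set, and how significance is assigned, are not specified here and would have to follow the original methods.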

**(B)** and **(C)** These graphs illustrate the nucleotide content of purified 7-mers. The *x* axis shows all purified 7-mers ranked from most significantly purified (lowest P value) on the left to least significant on the right, with arrows denoting the boundaries of the three nominal P value groups below P = 0.05 (all 7-mers to the left of the rightmost arrow are purified at 95% confidence).

**(B)** Plots the GC percentage of each 7-mer (*y* axis) versus significance of purification.
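As a small illustration of the *y*-axis value in (B), the GC percentage of a single 7-mer could be computed as below. The helper name `gc_percent` is hypothetical and the sketch assumes an unambiguous A/C/G/T string.

```python
def gc_percent(kmer):
    """Percentage of G and C bases in a k-mer (0 to 100)."""
    kmer = kmer.upper()
    return 100.0 * sum(base in "GC" for base in kmer) / len(kmer)
```

For a 7-mer the value can take only eight levels (0/7 through 7/7), so the points in such a plot fall on discrete horizontal bands.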

**(C)** Plots “yes or no” to the question, “Is there a run of four identical nucleotides in this 7-mer?” versus significance of purification, where a vertical line denotes “yes.” The three arrows denote, from left to right, nominal P values for significance of purification: P = 10^{−5}, 10^{−3}, and 0.05. Note that 7-mer purification is elevated for 7-mers with a high percentage of AT and for those containing runs of the same nucleotide.
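The yes/no value in (C) can be sketched as a simple test on each 7-mer. This assumes that a "run of four" means four or more consecutive copies of the same nucleotide, which is an interpretation rather than something the legend states outright; `has_run` is a hypothetical helper name.

```python
import re

def has_run(kmer, run_length=4):
    """True if the k-mer contains run_length or more identical bases in a row."""
    # One base followed by at least (run_length - 1) repeats of that same base.
    pattern = r"([ACGT])\1{%d,}" % (run_length - 1)
    return re.search(pattern, kmer.upper()) is not None
```

With this definition, `has_run("CAAAAGT")` is true, while `has_run("AAACAAA")` is false because neither stretch of A reaches four bases.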
