(*A*) Calculation of the probability that an n-mer sequence appears within a protein-coding region in the real genetic code. The 5-mer sequence S = UGACA can appear in one of the three reading frames. For each reading frame, the probabilities of all three codon combinations that contain S are summed. Codon combinations with an in-frame stop (such as UGA) do not contribute to the n-mer probability, since they cannot appear in a coding region. Vertical lines separate consecutive codons; stop codons are in red; P_{0}, P_{−1}, P_{+1} denote the probabilities of encountering S in the 0/−1/+1 frame. (*B,C,D*) Three examples of “difficult” n-mers in the real code and in alternative codes. (*B*) The 5-mer UGACA, which includes the stop codon UGA, can appear in a protein-coding sequence with the real genetic code in only two of the three possible reading frames (the +1 and −1 frames). (*C*) In the alternative code shown in , whose stop codon AAA overlaps with itself, the 5-mer AAAAA cannot appear in a protein-coding sequence in any of the three reading frames. (*D*) In an alternative code with the overlapping stop codons CCG and CGG, the 5-mer CCGGU can appear in only one reading frame. The 5-mers are in bold text; stop codons are in red; N denotes any DNA letter; a green v denotes a frame in which the n-mer can appear; a red x denotes a frame in which the n-mer cannot appear. (*E*) Distribution of the probabilities of all 6-mers in the real code (bold black line) and in the alternative codes (light blue lines). The *x*-axis is the probability of obtaining a 6-mer within protein-coding sequences; the *y*-axis is the number of 6-mers with this probability. The real code has significantly fewer “difficult” 6-mers (those with low probabilities) than the alternative codes. (*F*) The fraction of n-mers that have a higher probability in the real code than in alternative codes increases with n-mer size. The *y*-axis shows the fraction of n-mers for which the average probability of appearing in the real genetic code is significantly higher than in the alternative codes.
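
The per-frame summation described in panel *A* can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that every sense codon is equally likely (a real analysis would presumably use measured codon frequencies), and the function names and the `offset` convention (how many bases precede the n-mer inside its first codon) are hypothetical. Offset 0 places the first three letters of the n-mer in frame; how offsets 1 and 2 map onto the −1/+1 labels of the figure is not asserted here.

```python
from itertools import product

BASES = "UCAG"
STOPS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code


def uniform_codon_probs(stops=STOPS):
    """Assumed codon model: every sense codon is equally likely."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    sense = [c for c in codons if c not in stops]
    return {c: 1.0 / len(sense) for c in sense}


def frame_probability(nmer, offset, codon_probs):
    """Sum the probabilities of all stop-free codon combinations that contain
    `nmer`, with `offset` (0, 1 or 2) free bases before it in the first codon."""
    total_len = -(-(offset + len(nmer)) // 3) * 3  # round up to whole codons
    n_free = total_len - len(nmer)                 # unconstrained "N" positions
    prob = 0.0
    for fill in product(BASES, repeat=n_free):
        seq = "".join(fill[:offset]) + nmer + "".join(fill[offset:])
        p = 1.0
        for i in range(0, total_len, 3):
            # A stop codon is absent from codon_probs, so it contributes 0.
            p *= codon_probs.get(seq[i:i + 3], 0.0)
        prob += p
    return prob


def nmer_frame_probabilities(nmer, stops=STOPS):
    """Return the probability of `nmer` at offsets 0, 1 and 2."""
    probs = uniform_codon_probs(stops)
    return [frame_probability(nmer, off, probs) for off in range(3)]


if __name__ == "__main__":
    print(nmer_frame_probabilities("UGACA"))                  # real code
    print(nmer_frame_probabilities("AAAAA", stops={"AAA"}))   # cf. panel C
    print(nmer_frame_probabilities("CCGGU", stops={"CCG", "CGG"}))  # cf. panel D
```

Under these assumptions, UGACA has zero probability at offset 0 (the in-frame UGA) and non-zero probability at the other two offsets, AAAAA is excluded at every offset when AAA is the stop codon, and CCGGU survives at a single offset when CCG and CGG are stops, which matches the qualitative pattern in panels *B* to *D*.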
