Changes made to the Araneus spider silk DNA sequences. (A) The codon frequency measures how often each codon is used in the E. coli genome relative to the other codons for the same amino acid. The average frequency is the mean of the codon frequencies across the entire sequence of the silk gene, and is shown for the spider (gray) and synthetic (black) genes. Very rare codons (<10 per gene, defined as frequencies <0.13) were entirely eliminated from the synthetic sequences. (
B) The DNA sequence entropy of the repetitive units is shown for the wild-type spider (gray) and synthetic (black) genes. The repeat units for each silk monomer were manually aligned (Supplementary information) and the sequence entropy was calculated as

$$S = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j \in \{A,T,G,C\}} p_i(j)\,\log_{10} p_i(j),$$

where $N$ is the length of the repeat unit and $p_i(j)$ is the probability that base $j$ (A, T, G, C) occurs at position $i$. The maximum of this function (when all four bases are equally represented at each position) is $\log_{10} 4 \approx 0.6$. A lower sequence entropy indicates a higher degree of sequence identity between the repeat units. ADF-3 has extremely repetitive DNA sequences, and this repetitiveness is effectively eliminated upon optimization. (
C) The amino-acid sequences of the synthetic spider silk genes are shown. Each silk sequence is labelled with a name and the gland in which it is produced. The repetitive regions used in the sequence entropy calculations are in red and green.
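
The two quantities plotted in panels A and B can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis code. The function names are hypothetical. The entropy is assumed to use a base-10 logarithm averaged over positions, a choice that reproduces the stated maximum of about 0.6. The repeat units are assumed to be gap-free and of equal length after alignment. The codon frequency table in the example holds placeholder values, not real E. coli codon usage.

```python
import math
from collections import Counter


def split_codons(dna):
    """Split an in-frame coding sequence into consecutive codons."""
    dna = dna.upper()
    if len(dna) == 0 or len(dna) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def average_codon_frequency(dna, codon_freq):
    """Mean host codon frequency over every codon in the gene (panel A)."""
    freqs = [codon_freq[codon] for codon in split_codons(dna)]
    return sum(freqs) / len(freqs)


def very_rare_codons(dna, codon_freq, threshold=0.13):
    """Codons whose host frequency is below the caption's 0.13 cutoff."""
    return [c for c in split_codons(dna) if codon_freq[c] < threshold]


def sequence_entropy(aligned_repeats):
    """Per-position base-10 Shannon entropy, averaged over N (panel B).

    Assumes gap-free repeat units of equal length N.
    """
    if not aligned_repeats:
        raise ValueError("need at least one repeat unit")
    n = len(aligned_repeats[0])
    if n == 0 or any(len(r) != n for r in aligned_repeats):
        raise ValueError("repeat units must be non-empty and of equal length")
    total = 0.0
    for i in range(n):
        # Count how often each base occurs at alignment position i.
        counts = Counter(r[i].upper() for r in aligned_repeats)
        depth = sum(counts.values())
        for count in counts.values():
            p = count / depth  # p_i(j) for each base j seen at position i
            total -= p * math.log10(p)
    return total / n


# Placeholder frequencies for illustration only; NOT real E. coli values.
toy_freq = {"GGT": 0.4, "GGA": 0.1, "GCG": 0.3, "GCA": 0.2}
print(average_codon_frequency("GGTGGAGCG", toy_freq))
print(very_rare_codons("GGTGGAGCG", toy_freq))            # ['GGA']
print(round(sequence_entropy(["A", "T", "G", "C"]), 3))   # 0.602
print(sequence_entropy(["GGTCAA"] * 3))                   # 0.0
```

Identical repeat units give an entropy of zero, while a position where all four bases are equally likely contributes log10(4), so the two extremes of panel B fall out directly. In practice the frequency table would come from a published E. coli codon usage table rather than the toy dictionary used here.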