The Darwinian Basis of Life
The previous presentations in this session described a mix of top-down and bottom-up approaches to synthetic biology. This presentation is intended to take us all the way down to the very bottom as we discuss synthetic biology “from scratch,” that is, life from scratch. Life is something we all know when we see it, right? Consider the giant bacterium Titanospirillum velox, which we all would agree is alive. Yet in a recent publication by Richard Hoover that appeared in the online Journal of Cosmology (Hoover, 2011), he describes what he believes is an extraterrestrial analog of Titanospirillum, found within a certain class of carbonaceous meteorites. Much of the popular media fell for it, abetted by a Fox News report showing a micrograph of Titanospirillum, rather than the mineralic artifacts within the meteorite that most experts agree have nothing to do with life.
But surely the experts must know life when they see it, right? And therefore the experts must know how to define life. Indeed some scientists have advanced a definition, or at least a working definition, of life. Some, including me, have pointed to the so-called NASA working definition of life, which is a self-sustained chemical system that is capable of undergoing Darwinian evolution (Joyce, 1994). In several of the presentations at this workshop, evolution has been referred to as the distinguishing, if not defining, feature of life. The Darwinian paradigm is the only one we know that explains how biological complexity can sustain itself against the vagaries of a changing environment. Darwinian evolution has a rigorous scientific foundation, but the word “life” is not a scientific term. As Andrew Ellington would say, it is a term for poets and philosophers, and scientists, let alone a government agency such as NASA, should not be in the business of trying to define it.
Without becoming bogged down in definitions, we can agree that life is all about Darwinian evolution, and that scientists understand what Darwinian evolution is all about. The key principles of Darwinian evolution are, first, heritable variation of form and function among a population of individuals; second, competition for finite resources by those individuals; and third, preferential reproduction of variants that operate most effectively in the competitive environment. In The Origin of Species, Charles Darwin made the observation: “Owing to this struggle for life, any variation … if it be in any degree profitable to an individual of any species, in its infinitely complex relations to other organic beings and to external nature … will generally be inherited by its offspring” (Darwin, 1859; italics added). If life has a slogan, if not a formal definition, it would be “inherit profitable variation.” I, for one, would be comfortable abiding by this slogan.
In considering synthetic biology from scratch, the focus is on the evolution of functional molecules, rather than organisms. The principles of directed molecular evolution are the same as those for the Darwinian evolution of organisms, again captured by the slogan “inherit profitable variation.” In chemical terms, Darwinian evolution involves three processes: (1) reproduction of information-carrying molecules (inherit), (2) selection of molecules that meet some fitness criteria (profitable), and (3) maintenance of chemical diversity among the population of molecules (variation).
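The three processes can be made concrete with a toy simulation. The following is a minimal sketch, not a model of any laboratory system: the fitness function (number of positions matching a hypothetical target sequence), the population size, the fraction kept at each round, and the mutation rate are all assumptions chosen only for illustration.

```python
import random

ALPHABET = "ACGU"

def fitness(seq, target):
    """Toy fitness (an assumption): positions that match a hypothetical target."""
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq, rate, rng):
    """Copy a sequence, replacing each position with a random base at the given rate."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base for base in seq)

def evolve(target, pop_size=200, rounds=30, keep=0.2, rate=0.02, seed=0):
    """Run repeated rounds of selection and error-prone reproduction."""
    rng = random.Random(seed)
    # Start from a random-sequence library with the same length as the target.
    population = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)]
    best_per_round = []
    for _ in range(rounds):
        # (2) Selection: keep the fraction of molecules that score best ("profitable").
        ranked = sorted(population, key=lambda s: fitness(s, target), reverse=True)
        survivors = ranked[: max(1, int(keep * pop_size))]
        best_per_round.append(fitness(survivors[0], target))
        # (1) Reproduction ("inherit") with (3) fresh mutations ("variation").
        population = [mutate(survivors[i % len(survivors)], rate, rng) for i in range(pop_size)]
    return population, best_per_round

if __name__ == "__main__":
    _, best = evolve("GGAUCCAUGGCAUCGAUCGA")
    print(best)
```

Because the whole surviving population is copied and mutated at every round, subdominant survivors continue to contribute descendants, which distinguishes this loop from picking a single winner and re-randomizing it.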
Over the past two decades, the technology of directed molecular evolution has become very powerful, but also routine. There are many methods for introducing molecular variation, both for generating initial combinatorial libraries and for maintaining variation in a population. Individuals that meet some fitness test can be separated based on a high-throughput screen or a selection procedure. Through screening, one or more highly advantageous variants can be identified and subsequently mutagenized to provide a second-generation combinatorial library. This process of iterative high-throughput screening can be a powerful discovery tool, although it does not fully capture the power of Darwinian evolution, which requires that the population as a whole be subject to repeated rounds of selection and randomization. Maintaining a diverse population is key to exploring the fitness landscape because it allows both dominant and subdominant individuals to give rise to novel variants. The subdominant individuals, following the acquisition of a few beneficial mutations, may yield descendants that are more advantageous than the previously dominant individuals. Ideally, the loss of variation due to selection should be compensated by the introduction of novel variation throughout the course of evolution.
There are many methods for selecting profitable molecules. For example, molecules can be selected based on their specific chromatographic mobility, their ability to withstand exposure to some physical condition, their ability to bind to a target ligand (aptamers), or their ability to catalyze a particular chemical transformation (enzymes). More complex selection criteria can be imposed, such as a combination of both positive and negative selection, conditional selection, and the selection for multiple attributes.
Finally, there are various methods for reproducing the profitable molecules in order to bring about the inheritance of selectively advantageous traits. If the selected molecules are DNA or RNA, then it is straightforward to achieve their amplification by using the appropriate polymerase enzyme(s), resulting in large numbers of progeny. If the selected molecules are proteins, which cannot be amplified directly, then one must amplify nucleic acid molecules that encode and are physically linked to the corresponding proteins. Techniques such as phage display, ribosome display, and compartmentalized self-replication make it possible to amplify an ensemble of genes that encode a corresponding set of proteins. The same principle of genetic encoding can be used to amplify other informational macromolecules, such as peptide analogs, polysaccharides, and even multicomponent organic molecules.
None of the amplification methods discussed above is part of a self-sustained evolving system, because they all rely on informational macromolecules that are not themselves subject to evolution within the system. Protein polymerases, phage particles, and ribosomes are all products of Darwinian evolution in biology. They are employed as operators in laboratory evolution systems, but their information content is not subject to evolution within those systems.
The experimental pursuit of life from scratch began in 1953 with the work of Stanley Miller, then a graduate student at the University of Chicago, who sought to cook up a prebiotic soup from entirely abiotic ingredients (Miller, 1953). None of those ingredients had information content derived from Darwinian processes. It has been almost 60 years since Miller’s classic experiments, and life has yet to be produced starting from a prebiotic soup. However, considerable progress has been made toward synthesizing life using other methods, and it now appears that the production of life from scratch will be achieved in the near future. Some research efforts along these lines have attempted, as Miller did, to recapitulate the historical origins of life on Earth. Other efforts take inspiration from the origins of life on Earth, but aim for a second origin that would occur under decidedly artificial laboratory conditions. Considerable attention has been directed toward the critical role that RNA is thought to have played in the early history of life on Earth, during an era referred to as the “RNA world” (Atkins et al., 2011). RNA continues to play a central role in contemporary biology, which further motivates the goal of constructing RNA-based life from scratch.
Self-Sustained Darwinian Evolution
An explicit aim of my own research program is to construct a system of RNA molecules that undergo self-sustained Darwinian evolution. In fact this goal was recently achieved, although the system still lacks the complexity and inventiveness of what one might regard as life. The self-sustained evolving system employs populations of RNA enzymes that catalyze the RNA-templated joining of RNA substrates. The enzymes contain ~55 essential nucleotides and can be made to join pairs of RNA substrates of almost any sequence (Rogers and Joyce, 2001). If the substrates, once joined, form additional copies of the enzymes, then self-replication can be achieved. The newly formed enzymes behave similarly, resulting in exponential growth (Paul and Joyce, 2002). However, the process cannot be sustained indefinitely and is informationally restricted by the requirement that the original and newly formed enzymes must have the same sequence.
An improved version of the replication system employs two different RNA enzymes that catalyze each other’s synthesis, enabling their cross-replication and sustained exponential growth (Kim and Joyce, 2004; Lincoln and Joyce, 2009). Each enzyme of the cross-replicating pair contains two substrate-binding domains that recognize corresponding oligonucleotide substrates through Watson-Crick pairing. During cross-replication, the “Watson” enzyme joins two pieces of RNA to form the “Crick” enzyme, while the “Crick” enzyme joins two pieces of RNA to form the “Watson” enzyme. Information is passed back and forth between these two enzymes in the form of particular sequences within the two substrate-binding domains.
Following optimization of the cross-replication system, it now is possible to achieve 100-fold amplification in just a few hours at a constant temperature and in the absence of any biological materials (Lincoln and Joyce, 2009). The only informational macromolecules in the system are the enzymes and their components, which themselves are subject to Darwinian evolution within the system. The only other components are MgCl2, a buffer to maintain pH, and H2O. Evolution can occur because there are many potential variants of the cross-replicating enzymes that must compete for a finite supply of substrates and can undergo mutation through recombination of the two substrate-binding domains.
Beginning with a small seed of the cross-replicating enzymes, amplification occurs with exponential growth, limited only by the amount of substrates that are available. The amplification profile follows the logistic growth equation $[\mathrm{enzyme}]_t = a / (1 + b e^{-ct})$, where $a$ is the maximum extent of amplification, $b$ is the degree of sigmoidicity, and $c$ is the exponential growth rate. This equation also describes population growth for biological organisms constrained by the carrying capacity of their local environment.
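The logistic growth equation can be evaluated directly. The sketch below uses hypothetical parameter values (a = 1.0, b = 99, c = 1.0 per hour) that are not taken from the experiments; they are chosen so that the starting concentration is 1% of the plateau.

```python
import math

def enzyme_concentration(t, a, b, c):
    """Logistic growth: [enzyme]_t = a / (1 + b * exp(-c * t))."""
    return a / (1.0 + b * math.exp(-c * t))

if __name__ == "__main__":
    a, b, c = 1.0, 99.0, 1.0  # hypothetical values, for illustration only
    for hours in (0, 2, 4, 6, 8, 10):
        print(hours, round(enzyme_concentration(hours, a, b, c), 4))
```

With these parameters the curve starts at a / (1 + b), passes through a / 2 at t = ln(b) / c, and approaches a as the substrates are exhausted.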
Cross-replication of the RNA enzymes can be sustained indefinitely by continuing to supply the necessary substrates. This is most conveniently achieved through a serial transfer procedure, whereby a small aliquot is taken from a spent reaction mixture and transferred to a new reaction vessel that contains a fresh supply of substrates. The new reaction mixture contains only those enzymes that were carried over in the aliquot, and these enzymes immediately resume exponential amplification in the new mixture. Within a period of 24 hours, an overall amplification factor of >10⁹ can be achieved (Lincoln and Joyce, 2009).
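The serial transfer procedure can be sketched as repeated logistic growth followed by dilution, with the overall amplification factor being the product of the per-round amplification factors. The dilution factor, round length, growth rate, and plateau concentration below are hypothetical values chosen for illustration; they are not the conditions used in the experiments.

```python
import math

def grow(start, hours, plateau, rate):
    """Logistic growth from a starting concentration toward the plateau."""
    b = plateau / start - 1.0
    return plateau / (1.0 + b * math.exp(-rate * hours))

def serial_transfer(seed, rounds, dilution, hours, plateau=1.0, rate=1.5):
    """Return the overall fold amplification across all rounds of transfer."""
    conc = seed
    total_fold = 1.0
    for _ in range(rounds):
        end = grow(conc, hours, plateau, rate)
        total_fold *= end / conc
        # Carry a small aliquot into a fresh vessel with new substrates.
        conc = end / dilution
    return total_fold

if __name__ == "__main__":
    # Hypothetical schedule: five rounds of 4.8 hours each, 100-fold dilution.
    print(f"{serial_transfer(seed=0.01, rounds=5, dilution=100.0, hours=4.8):.3g}")
```

Because each round restarts growth from a diluted seed, the per-round amplification settles close to the dilution factor, and the overall factor grows geometrically with the number of transfers.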
Self-sustained exponential amplification provides the growth engine for Darwinian evolution, but it is the diversity of enzymes in the population, their differential reproductive fitness, and their capacity for mutation that provides the opportunity to evolve macromolecular information. Profitable variation in the system emerges through particular combinations of substrates that form cross-replicators with high reproductive fitness. The genetic basis for this variation is represented by the two “loci”—the two substrate-binding domains—that can exist as any of a large number of possible “alleles.” Each allele can be made to encode a different corresponding phenotypic trait, embodied by the functional domain that is physically linked to that allele. If there are n potential variants of the first allele and m potential variants of the second allele, then the combinatorial complexity of the system is n × m.
As an example, a population of cross-replicating enzymes was constructed with 12 different alleles at each of the two loci, providing a combinatorial complexity of 12 × 12 = 144. Each variant allele was linked to a different form of the catalytic center of the enzyme, which resulted in differential reproductive fitness for various combinations of alleles at the two loci (Lincoln and Joyce, 2009). The evolution process was seeded with 12 different cross-replicating pairs that, due to mutation, could give rise to any of the other 132 combinations. Evolution was allowed to proceed in a self-sustained manner for 100 hours, with an overall amplification factor of 10²⁵. During this time the starting 12 replicators decreased in abundance as novel variants arose and came to dominate the population. Three of these novel variants together accounted for about half of the population members after 100 hours. The basis for their selective advantage was shown to be their relatively fast amplification rate and their propensity to cross-mutate to form additional copies of each other.
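The counting in this example can be checked by enumerating the combinations. In the sketch below the allele labels (A1 to A12, B1 to B12) and the choice of which 12 pairs seed the population are hypothetical; only the counts come from the text.

```python
from itertools import product

def genotype_space(n_first, n_second):
    """All combinations of alleles at the two loci (n x m in total)."""
    first = [f"A{i}" for i in range(1, n_first + 1)]
    second = [f"B{j}" for j in range(1, n_second + 1)]
    return first, second, set(product(first, second))

if __name__ == "__main__":
    first, second, combos = genotype_space(12, 12)
    # Hypothetical seeding: allele k at the first locus paired with allele k at the second.
    seed_pairs = set(zip(first, second))
    novel = combos - seed_pairs  # reachable only by recombination of the two domains
    print(len(combos), len(seed_pairs), len(novel))
```

The same function gives the sizes of the larger populations discussed later in the text, for example `len(genotype_space(64, 64)[2])` for the 64 × 64 library.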
Toward Inventive Evolution
The capacity for the invention of novel function within the context of Darwinian evolution depends on both the genetic complexity of the system and the functional richness of the corresponding phenotypes. A population of 144 replicators is represented by only ~7 bits of genetic information. This is much less than the genetic complexity of even the simplest biological systems, which have an information content of 2 bits per base pair of genetic material. In principle, the synthetic RNA-based evolving system could have an information content of 30 bits, considering the 15 total base pairs within the two genetic loci. This would provide a molecular diversity of 4¹⁵ ≈ 10⁹. However, it would not be possible to manage such high diversity because of the vast number of substrate molecules that would need to be present in the reaction mixture. These would slow replication as each enzyme must find its cognate substrates from among the complex mixture.
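The arithmetic behind these figures is short enough to verify directly: a population of N distinct variants corresponds to log2(N) bits, and each base pair contributes 2 bits because there are four possibilities per position. The function names below are illustrative only.

```python
import math

def bits_for_variants(n_variants):
    """Information needed to specify one variant out of n_variants."""
    return math.log2(n_variants)

def diversity_for_base_pairs(n_base_pairs):
    """Number of distinct sequences for a given number of base pairs (4 per position)."""
    return 4 ** n_base_pairs

if __name__ == "__main__":
    print(round(bits_for_variants(144), 2))             # 12 x 12 population
    print(2 * 15)                                       # bits in 15 base pairs
    print(diversity_for_base_pairs(15))                 # 4**15
    print(bits_for_variants(64 * 64), bits_for_variants(256 * 256))
```

By the same measure, the 64 × 64 and 256 × 256 populations described in the text correspond to 12 and 16 bits, respectively, still well short of the 30-bit ceiling.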
Current research efforts in our laboratory aim to maximize the genetic complexity of the self-sustained evolving system within the practical limits of both generating and harvesting molecular diversity. We constructed a population of cross-replicating enzymes with 64 different alleles for each of the two genetic loci, providing a combinatorial complexity of 64 × 64 = 4,096. In this test population, each variant allele was linked to the same functional sequence. This was done to assess the extent to which differences in genotype alone would result in differential fitness, something that should be minimized to allow the broadest exploration for novel phenotypes. Starting with this library, 10⁶-fold selective amplification was carried out; then individuals were cloned from the population and sequenced. Indeed two sources of genotype-related bias were identified, and these biases were eliminated from subsequent populations that were constructed.
Although the replicative function is the most important aspect of phenotype, replication can be made contingent on other functions so that fitness reflects the ability to execute those other functions. There is a stem-loop region of the RNA enzyme that supports the structure of the catalytic center and is generic in sequence, so long as it forms a stable secondary structure. This stem loop can be replaced by a ligand-binding (aptamer) domain, configured so that in the absence of the ligand the domain is unstructured, whereas in the presence of the ligand the domain adopts a folded state that supports the active structure of the enzyme. In this way replication can be made contingent upon recognition of the target ligand (Lam and Joyce, 2009).
As two examples, the supporting stem loop of the enzyme was replaced by an aptamer that recognizes either theophylline or FMN (Lam and Joyce, 2009). In the absence of the ligand there is no replication, but in the presence of the ligand there is sustained exponential growth. Furthermore, the exponential growth rate depends on the concentration of the ligand relative to the Kd (equilibrium dissociation constant) of the ligand-binding domain. This method for quantitative, ligand-dependent exponential amplification may have applications in biosensing and molecular diagnostics. It is analogous to quantitative PCR for the measurement of nucleic acid targets, but it operates at a constant temperature and can be generalized to non–nucleic acid targets, including proteins, drugs, and metabolites that can be recognized by an aptamer (Lam and Joyce, 2011).
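One simple way to picture this dependence is to assume that the growth rate scales with the fraction of aptamer domains occupied by ligand, using the standard single-site binding expression [L] / ([L] + Kd). That proportionality is an assumption made here for illustration; the text states only that the rate depends on ligand concentration relative to Kd. The function names and the maximum rate are hypothetical.

```python
def fraction_bound(ligand, kd):
    """Single-site binding occupancy: [L] / ([L] + Kd), same units for both."""
    if ligand < 0 or kd <= 0:
        raise ValueError("ligand must be non-negative and kd must be positive")
    return ligand / (ligand + kd)

def growth_rate(ligand, kd, max_rate):
    """Assumed model: exponential growth rate proportional to occupancy."""
    return max_rate * fraction_bound(ligand, kd)

if __name__ == "__main__":
    kd = 1.0  # hypothetical Kd in arbitrary concentration units
    for ligand in (0.0, 0.1, 1.0, 10.0, 100.0):
        print(ligand, round(growth_rate(ligand, kd, max_rate=1.0), 3))
```

Under this assumption the rate is zero without ligand, half-maximal when the ligand concentration equals Kd, and saturates at high ligand concentration, which is the kind of response that would allow the ligand concentration to be read out from the observed growth rate.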
The self-sustained evolution system could, in principle, be used to discover novel aptamers. A genetically encoded random-sequence domain could be placed adjacent to the catalytic center such that ligand recognition would result in selective amplification of the functional molecules. This would require sufficient genetic complexity to encode a population containing enough variants to include molecules with the desired function. Our most recently constructed populations of cross-replicating enzymes have 256 different alleles for each of the two loci, providing a combinatorial complexity of 256 × 256 = 65,536. This still is likely to be insufficient to derive novel aptamers within the context of self-sustained evolution, unless the target ligands are compounds that have a strong propensity to bind to RNA.
The populations with a combinatorial complexity of 256 × 256 were constructed by two different methods. The first involved serial production of all 256 + 256 = 512 allelic variants, synthesizing individual DNA templates to link each genotype-phenotype combination, then transcribing the templates to generate the corresponding RNAs. The advantage of this approach is that each variant is stored in a separate location, allowing one to prepare custom sublibraries. The disadvantage is that, at a synthesis cost of ~$10 per variant, the method cannot be extended to much more complex populations. The second approach for constructing the populations involved a novel split-and-pool method that makes it possible to synthesize the entire population in parallel. The parallel method enables the construction of populations with as many as 10⁹ different members, although, as discussed above, it would be difficult to manage such high-diversity populations throughout the course of a self-sustained evolution experiment.
With increasing population complexity it becomes important to consider the nature of the code that relates particular genetic sequences to their corresponding functional sequences. This code can be chosen arbitrarily, but evolutionary optimization likely will benefit if more closely related genetic sequences correspond to more closely related functional sequences. The code need not have a collinear 3:1 relationship, as is the case for the genetic code in biology, which relates trinucleotide codons within mRNA to individual amino acids within proteins. Furthermore, the same code need not apply for all genotype positions, although this simplification clearly has great selective advantage for natural biology. In synthetic biology, the experimenter must decide what genetic code to employ based on theoretical and practical considerations.
In preparing the populations of 256 × 256 combinatorial complexity, two different “sparse” codes were implemented, whereby each nucleotide within the genotype region encodes one or more nucleotides within the phenotype region. For the serially constructed population, each of four genetic nucleotides encodes three noncontiguous nucleotides in the functional region, with a different codon relationship for each genetic position. For the parallel library, each of four genetic nucleotides encodes either one or two contiguous nucleotides in the functional region, again with a different codon relationship for each genetic position. Experimental studies are under way to assess the operational characteristics of these different codes.
When practicing synthetic biology from scratch the experimenter makes the rules and allows Darwinian evolution to play the game. The capacity of the system to invent novel function is the most important measure of its robustness. Inventiveness should be as broad as possible so that the system can adapt to unanticipated changes in its environment. Life on Earth, although vulnerable to extreme changes of environmental conditions, has demonstrated extraordinary resiliency and inventiveness in adapting to highly disparate niches. Perhaps the most significant invention of life is a genetic system that has an extensible capacity for inventiveness, something that likely will not be achieved soon for synthetic biological systems. However, once informational macromolecules are given the opportunity to inherit profitable variation through self-sustained Darwinian evolution, they just may take on a life of their own.
Jeff Rogers developed the RNA enzyme that provides the basis for replication, Natasha Paul provided the first demonstration of a self-replicating RNA enzyme, Dong-Eun Kim converted the system to a cross-replication format, Tracey Lincoln first achieved self-sustained Darwinian evolution of RNA enzymes, Bianca Lam made replication contingent upon recognition of a target ligand, and Michael Robertson and Jonathan Sczepanski constructed increasingly complex populations of the cross-replicating RNA enzymes. This work was supported by grants from the National Aeronautics and Space Administration (NNX10AQ91G), the National Institutes of Health (GM065130), the National Science Foundation (MCB-0948161), and the Defense Advanced Research Projects Agency (DSO BAA-09-63).
- Atkins JF, Gesteland RF, Cech TR, editors. RNA Worlds: From Life’s Origins to Diversity in Gene Regulation. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press; 2011.
- Darwin CR. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London: John Murray; 1859. p. 61.
- Hoover RB. Fossils of cyanobacteria in CI1 carbonaceous meteorites. Journal of Cosmology. 2011;13. Published online March 2011.
- Joyce GF. Foreword. In: Deamer DW, Fleischacker GR, editors. Origins of Life: The Central Concepts. Boston: Jones and Bartlett; 1994. p. xi–xii.
- Kim DE, Joyce GF. Cross-catalytic replication of an RNA ligase ribozyme. Chemistry and Biology. 2004;11:1505–1512. [PubMed: 15556001]
- Miller SL. A production of amino acids under possible primitive Earth conditions. Science. 1953;117:528–529. [PubMed: 13056598]
The Scripps Research Institute.
Gerald F. Joyce.
National Academies Press (US), Washington (DC)
Joyce GF. SYNTHETIC BIOLOGY “FROM SCRATCH”. In: Institute of Medicine (US) Forum on Microbial Threats. The Science and Applications of Synthetic and Systems Biology: Workshop Summary. Washington (DC): National Academies Press (US); 2011. A9.