Bioinformatics and Computational Tools – Etienne de Villiers, International Livestock Research Institute, Kenya
Dr. Etienne de Villiers opened the session with a discussion on the potential of bioinformatics and computational tools. He began by noting that he had never considered the use of his research for offensive weapons development, but rather regarded bioinformatics as a useful tool for assisting with areas such as vaccine development. Indeed, bioinformatics is useful in enhancing understanding of genome structures and in enabling the identification and manipulation of genes to elucidate their functions. To accomplish this, bioinformaticists use an interdisciplinary approach that combines biology with statistics, mathematics, algorithms, databases and text mining.
Dr. de Villiers pointed out that a genomics revolution over the last several years had produced next generation sequencing, which exploits the latest technology to generate millions of sequences an hour. This technology is advancing rapidly: whereas the Human Genome Project cost hundreds of millions of dollars or more and took a great deal of time, it is now possible to sequence a genome in a week for thousands of dollars, resulting in an explosion of genome data.6 By way of example, he pointed to the international 1000 Genomes Project, which examines 2,500 human genomes for genetic variation, and the 1,000 plant and animal reference genomes project conducted by BGI China.7
Dr. de Villiers went on to outline how an explosion in computing power, in combination with lowered costs for computers and greater availability of this technology across the globe, is enhancing bioinformatics. These changes have resulted in the emergence of the concept of cloud computing, whereby individuals can gain legitimate access to high performance computers through the internet to conduct research. Under this model, computational power is a service that can be rented by users to the extent needed to achieve research goals. An alternative approach, termed distributed computing, draws from a network of smaller computers to create a supercomputer-like environment that enables analysis of complex problems. One example is the “Folding@Home” project which has a goal “to understand protein folding, misfolding, and related diseases.”8 Individual participants can donate a portion of their idle computer processing power to the analysis of protein structures. Dr. de Villiers noted that in 2009 40,000 central processing units (CPUs) supported this project, making it in essence the largest computer in the world.9
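The work-unit model behind projects like Folding@Home can be illustrated with a toy sketch: a coordinator splits independent tasks across worker processes and gathers the results. Everything here is a hypothetical stand-in (the `score_conformation` function in particular is an invented placeholder for the expensive molecular simulations such projects actually distribute), not a description of the Folding@Home software itself.

```python
from multiprocessing import Pool

def score_conformation(seed):
    """Toy stand-in for one work unit: 'evaluate' a candidate
    protein conformation identified by an integer seed.
    A real work unit would run an expensive physical simulation."""
    return (seed * 2654435761 % 1000) / 1000.0

def distribute_work(seeds, workers=4):
    """Farm independent work units out to a pool of worker
    processes and collect the results in order, as a
    coordinating server would."""
    with Pool(workers) as pool:
        return pool.map(score_conformation, seeds)

if __name__ == "__main__":
    results = distribute_work(range(100))
    best = min(range(100), key=lambda i: results[i])
    print(f"lowest-scoring candidate: work unit {best}")
```

Because each work unit is independent, adding more donated processors scales the search almost linearly, which is what made the volunteer model so effective.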
Advances in computational power and performance have given rise to ‘metagenomics’, an example of which is the Sargasso Sea community survey (Venter et al., 2004). Dr. de Villiers defined metagenomics as “the sequencing and analysis of DNA of organisms recovered from an environment, without the need for culturing them, using next generation sequencing technologies” and suggested that metagenomics could play an important role in global disease tracking, as it enables researchers to identify and track what exists and where pathogen reservoirs are located. Over time this could facilitate the development of diagnostic sequencing capabilities and understanding of disease trends, or even the production of vaccines and drugs. To that end, Dr. de Villiers and colleagues at the International Livestock Research Institute had begun building a biobank to manage samples, which he suggested could be a unique resource for other researchers.
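At its simplest, the identification step in a metagenomic survey can be sketched as matching the k-mers of each sequencing read against k-mer sets built from reference genomes. The two miniature “organisms” below are invented for illustration; real pipelines use full genomes and indexed databases, but the principle is the same.

```python
def kmers(seq, k=8):
    """The set of all overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_read(read, references, k=8):
    """Assign a read to the reference sharing the most k-mers,
    or None if nothing matches at all."""
    read_kmers = kmers(read, k)
    best, best_hits = None, 0
    for name, ref_kmers in references.items():
        hits = len(read_kmers & ref_kmers)
        if hits > best_hits:
            best, best_hits = name, hits
    return best

# Hypothetical miniature 'reference database' of two organisms.
refs = {
    "organism_A": kmers("ATGGCGTACGTTAGCATCGATCGGATCCTAG"),
    "organism_B": kmers("TTGACCGGTTAACCGGTTAACCGGTTAACCA"),
}
print(classify_read("GCGTACGTTAGCATCG", refs))  # matches organism_A
```

Applied to millions of reads from an environmental sample, counts of reads per reference give a rough census of which organisms, including pathogens, are present.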
Systems Biology – Andrew Pitt, University of Glasgow, UK
The second speaker in the session was Dr. Andrew Pitt of the University of Glasgow, who discussed systems biology. Dr. Pitt began by pointing out that systems biology had become a buzzword for the process of using biological knowledge to produce a mathematical model that describes a system at different levels, from the molecular to the ecosystem. Such a model can not only describe the system but also enable the researcher to make predictions about it. In this regard, systems biology occupies the middle ground between informatics and synthetic biology and provides the resources to convert biological information into something meaningful from which to generate new biological systems.
Dr. Pitt noted that this is achieved by taking a rational engineering approach that draws from systems engineering and mathematics, along with increasingly diverse approaches to understanding systems, including mobile phone networks and traffic management. Using these approaches, researchers have sought to take biological information and convert it into a format from which to build a discrete model that can be described mathematically. Such a model can be used both to describe and to predict. The latter is particularly important and requires the development of specific models to test predictions, something which becomes more complex as the focus shifts from cells to tissues, organisms, and ecosystems. Indeed, developing models that account for the relationships between these different levels and scales of complexity is particularly challenging, and as a unified science, systems biology has a long way to go.
Nonetheless, current research holds out the potential for systems medicine in the future, and a number of enabling technologies, such as genomics and proteomics, have facilitated progress. Coupled with these developments, advances in mass spectrometry and high throughput screening provide the depth of data required to populate models, while the growing expertise of bioinformaticists enables such data to be captured and converted more effectively. Though some of the underlying networks are relatively straightforward and there are key nodes where we can intervene, the complexity of connections nonetheless renders systems biology a challenge. This is evident in an example from Japan articulating the epidermal growth factor receptor (EGFR) signaling cascade, which illustrates the relationship between proteins and other elements within a systems pathway (Oda et al., 2005).
Dr. Pitt went on to elaborate on why a key challenge for systems biology is solving the mathematics: the EGFR pathway model alone requires addressing some 211 reactions among 322 components, forming 7 RNAs based on 202 proteins, and even these calculations cover only a small portion of the mapping process. To understand how the complete system works, a massive number of interactions must be examined: at the genetic level this is likely to require some 4,000 calculations, with a further 5,000 calculations at the RNA level and 50,000 interactions in one mathematical equation at the level of proteins. In this regard, Dr. Pitt suggested that the number of numerical parameters is one of the main limits to advances in systems biology, particularly given that biological data reduction is slow and expensive and computational power is limited at this scale.
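The kind of mathematics involved can be seen in a deliberately tiny fragment: a single reversible binding reaction (ligand binds receptor to form a complex) modeled with mass-action kinetics and integrated numerically. This is a sketch of the general technique, not any part of the actual EGFR map; the species and rate constants are invented, and a realistic model couples hundreds of such equations.

```python
def simulate(y0, rates, dt=0.001, steps=20000):
    """Forward-Euler integration of a two-reaction mass-action
    fragment: ligand L binds receptor R to form complex C (rate kf),
    and C dissociates back (rate kr).
        dR/dt = dL/dt = -kf*R*L + kr*C
        dC/dt =  kf*R*L - kr*C
    """
    R, L, C = y0
    kf, kr = rates
    for _ in range(steps):
        v = kf * R * L - kr * C        # net binding flux
        R, L, C = R - v * dt, L - v * dt, C + v * dt
    return R, L, C

R, L, C = simulate(y0=(1.0, 1.0, 0.0), rates=(1.0, 0.1))
print(f"complex concentration near steady state: {C:.3f}")
```

Even this toy already has two free parameters to estimate from data; the hundreds of reactions in a pathway-scale model multiply that burden, which is exactly the parameter limit Dr. Pitt described.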
A further challenge identified by Dr. Pitt was the difficulty in overcoming the stochastic nature of some of the interactions, which is difficult to model using mathematics. He then elaborated on the potential for an approach which focused on building platforms to generate data more quickly. He noted, however, that even this approach still requires advances in computational power, and that while this is improving, there is still some way to go to achieve the improvements required to build substantial models and intervene at the cellular level.
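Stochastic interactions of this kind are commonly handled with Gillespie's stochastic simulation algorithm rather than deterministic equations: each reaction event is drawn at random according to its propensity. The sketch below applies it to a simple birth-death process with invented rate constants, where copy numbers fluctuate around the value a deterministic model would predict.

```python
import random

def gillespie(x0, k_prod, k_deg, t_end, rng):
    """Gillespie SSA for a birth-death process: molecules are
    produced at constant rate k_prod, and each molecule degrades
    at rate k_deg (total propensity k_deg * x).
    Returns the copy number at time t_end."""
    t, x = 0.0, x0
    while True:
        a1, a2 = k_prod, k_deg * x     # reaction propensities
        a0 = a1 + a2
        t += rng.expovariate(a0)       # time to next reaction event
        if t >= t_end:
            return x
        if rng.random() * a0 < a1:     # pick which reaction fired
            x += 1                     # production
        else:
            x -= 1                     # degradation

rng = random.Random(42)
samples = [gillespie(0, k_prod=10.0, k_deg=1.0, t_end=20.0, rng=rng)
           for _ in range(200)]
print(f"mean copy number over 200 runs: {sum(samples) / len(samples):.1f}")
```

The deterministic steady state here is k_prod/k_deg = 10 molecules, but individual runs scatter around it, illustrating why exact trajectories of low-copy-number systems cannot be captured by a single differential equation.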
Emerging Trends in Synthetic Biology – Pawan K. Dhar, University of Kerala, India
Dr. Pawan Dhar of the University of Kerala discussed emerging trends in synthetic biology in the third presentation of the session. Dr. Dhar began by pointing out that there is enormous potential for creating useful applications using a rational design approach in biology. Accordingly, he suggested there was a need to build standards and rules of composition for engineering novel biological applications. Dr. Dhar said that, in contrast to the traditional top-down approach, the “bit by bit” engineering strategy focuses on the component sections of a system to create useful devices and networks.
He observed that scientists learnt biology by making ‘junk’ (mutations, knockdowns) and ‘garbage’ (knockouts) out of genes. He asked if one could do the opposite, i.e., make genes out of ‘junk’ DNA. His work has led to the effective conversion of “junk sequences into genes”, the ‘junk’ being non-protein-coding sequences within a genome. Dr. Dhar presented an example of E. coli research to illustrate this approach, in which six intergenic sequences with no history of transcription were artificially activated to code for proteins (Dhar et al., 2009). His research indicated that although most of the gene activations had little or no effect on cell growth, activating one of the intergenic sequences resulted in cell death. Subsequent deactivation of this gene restored cells to normal growth, although why this happened was unclear at the molecular level. Further, given that a DNA sequence could now be artificially expressed in several frames, he proposed the emergence of combinatorial genomics as a new way of doing biology.
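The idea of expressing a DNA sequence in several frames can be made concrete: any double-stranded sequence has six possible reading frames (three offsets on each strand), each yielding a different protein product. The sketch below enumerates them using the standard genetic code; the short input sequence is invented for illustration.

```python
# Standard genetic code, built compactly: amino acids listed in
# codon order TTT, TTC, TTA, TTG, CTT, ... (bases cycling T,C,A,G).
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
BASES = "TCAG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate a DNA string codon by codon ('*' marks a stop)."""
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))

def six_frames(dna):
    """All six reading frames: three offsets on the given strand,
    three on the reverse complement."""
    rc = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, rc) for f in (0, 1, 2)]

for frame in six_frames("ATGGCATTGTAA"):
    print(frame)
```

Each frame partitions the same nucleotides into different codons, which is why one stretch of ‘junk’ DNA can, in principle, encode several distinct peptides depending on how it is activated.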
Dr. Dhar then described some of the outreach work being conducted at the Centre for Systems and Synthetic Biology, in India. The Centre recently organized Biodesign India, the first synthetic biology event in the country. The aim was to try and understand what bio-design related activities were underway in India and address a degree of sensitivity to the surrounding ethical questions of this type of research. To provide an open access platform for synthetic biology research in India, the Centre has set up a synthetic biology wiki (a webpage that enables users to create and edit content) for information sharing among labs.
On the global stage, Dr. Dhar suggested that in the future we were going to see the arrival of faster and cheaper DNA synthesis technologies, citing the work of Robert Carlson, who recently predicted a rapid fall in the cost of long DNA synthesis (Carlson, 2009), and Craig Venter, whose group experimentally developed a “synthetic” microbial cell (Gibson et al., 2010).10 He pointed out that Dr. Venter’s approach was prohibitively expensive and time consuming, and thus not likely “scalable” in its present form. However, Dr. Dhar predicted that rapid and cheaper organism construction strategies, whole genome cloning, synthetic chromosomes, and application-oriented minimal synthetic cells would emerge in the future. He suggested that ideas like non-natural genetic codes, RNA structural engineering, and computer aided design of pathways may be more accessible, and he envisions the emergence of several major non-biobrick initiatives in the near future.11 He also predicted a trend toward a greater number of graduate programs (Masters and Ph.D.) in synthetic biology worldwide. He observed that, owing to a lack of experience in constructing organisms, it is difficult to accurately predict the potential adverse effects of this approach, and noted that it makes him nervous to visualize the scenario of someone firing microbes as bullets, given that such bullets think. He concluded by noting that absolute control over developments in the biological sciences such as synthetic biology is not possible, and that “big challenges, [an] unclear roadmap, [and a] fear of the unknown” remain.
The session moved to an open discussion during which a number of key themes emerged. Many participants noted that our knowledge and understanding remain limited in terms of the ability to predict the outcomes and functions of biology resulting from genetic modification or modulation. Accordingly, developing appropriate risk assessment strategies to accommodate unpredictable systems is difficult. Equally, participants suggested that it would be difficult to regulate the things “we don’t know that we don’t know”, although they pointed to the value of bringing together the science and policy communities to discuss these issues. Finally, participants noted the continuing worldwide expansion of biological sciences research capacity, such as the active synthetic biology community in India and the ability of research institutes in Kenya to draw on advanced computational resources.
The Human Genome Project cost several billion dollars (The Human Genome Project Completion: Frequently Asked Questions, available at http://www.genome.gov/11006943), which included more than technology and sequencing expenses. Several companies currently offer human whole genome sequencing services and the prices continue to drop. Illumina, Inc., for example, offers genome sequencing for $19,500, with sequencing in “medically indicated cases” costing $9,500 (www.everygenome.com; accessed 3/14/2011). Similarly, whole genome sequencing by Complete Genomics reportedly costs approximately $10,000 (http://www.completegenomics.com/) and the company has reported its sequencing consumables costs to be approximately $4,400 (Drmanac et al., 2010). Companies continue to pursue advances in technology and to decrease sequencing costs as they race toward a “one thousand dollar genome”, a milestone below which demand for genome data is expected to explode still further (Wolinsky 2007; Venter 2010).
Information on Folding@Home is available at http://folding
The central processing units of a computer are involved in carrying out the instructions in a computer program; depending on the nature of the problem to be solved, the number of CPUs may provide an indication of computing power.
As described in the cited article, Dr. Venter’s group chemically synthesized the genome of the bacterium Mycoplasma mycoides based on the genetic sequence of the naturally occurring organism with the addition of certain distinguishing chemical “watermarks” and inserted this synthesized genome into a recipient cell of the related bacterium Mycoplasma capricolum, from which the natural genetic material had been removed. The synthesized M. mycoides genetic material was successfully able to instruct the resulting cell to grow and self replicate (Gibson et al., 2010).
The bio-brick model seeks to create standardized DNA “parts” with defined functions for combination into new systems. More information is available through the BioBricks Foundation at http://openwetware.org/wiki/The_BioBricks_Foundation and the Registry of Standard Biological Parts, available at http://partsregistry.org/wiki/index.php/Main_Page.
National Research Council (US). Trends in Science and Technology Relevant to the Biological and Toxin Weapons Convention: Summary of an International Workshop. Washington (DC): National Academies Press (US); 2011. DEVELOPMENTS IN DESIGN, FABRICATION, AND PRODUCTION.