NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on Scientific Milestones for the Development of a Gene Sequence-Based Classification System for the Oversight of Select Agents. Sequence-Based Classification of Select Agents: A Brighter Line. Washington (DC): National Academies Press (US); 2010.


Sequence-Based Classification of Select Agents: A Brighter Line.


4 Committee Findings and Conclusions

This chapter summarizes key findings and major conclusions. As discussed below, the committee finds that it is not feasible to develop an accurate oversight system based on prediction. However, a gene sequence based classification system for Select Agents and a “yellow flag” biosafety system for “sequences of concern” could be developed with current technologies. The classification system discussed in Chapter 3 (see also Appendix L) could provide much needed clarification regarding application of the Select Agent Regulations. The “yellow flag” system could provide a means of guidance and oversight for “sequences of concern.” The “yellow flag” system would function as an extension of biosafety; however, because it is not regulatory in nature, it could provide information relevant to biosecurity in a more dynamic and timely fashion than the Select Agent Regulations.

The committee has identified crucial components that would enable such systems. Although the individual near-term milestones, as described, may be beneficial to scientific progress and would probably improve the current biosafety and biosecurity system, careful consideration should also be given to the limitations and challenges of developing and implementing these or similar systems.


(1.) Purpose of the Select Agent Program: The Select Agent Program is intended to restrict access to known agents that pose a threat to biosecurity.

(a.) The Select Agent Program is intended to focus on biosecurity, rather than biosafety.1 As discussed in Chapter 1, biosafety and biosecurity are related and complementary, but there are important distinctions. Biosafety in Microbiological and Biomedical Laboratories (CDC/NIH 2007) defines biosafety programs as those which “reduce or eliminate exposure of individuals and the environment to potentially hazardous biological agents,” whereas the objective of biosecurity is to “prevent loss, theft or misuse of microorganisms, biological materials, and research-related information” (CDC/NIH 2007). Biosafety means reducing the risk that pathogens or toxins will escape containment and cause illness in researchers, clinicians, or the general public. Biosecurity means minimizing the possibility that such pathogens will be used for malevolent purposes.2 The BMBL sets standards for how U.S. laboratories conduct research with biological agents and toxins; the Recombinant DNA Advisory Committee (RAC) and Institutional Biosafety Committees (IBCs) provide guidance and oversight focused on biosafety.3 This is a robust and effective system. In fact, if the purpose of the Select Agent Regulations were solely biosafety, it would largely be an unnecessary duplication. However, the primary role of the Select Agent Regulations is to restrict access to agents that may be used as biological weapons by someone with nefarious intent. There is a good deal of overlap between biosafety and biosecurity threats in that the pathogens deemed to pose the greatest biosafety risk (BSL-4 agents) are all Select Agents; however, not all Select Agents pose substantial risk to individual public health (for example, BSL-2 agents may be Select Agents). An agent may pose a security risk because of its potential for weaponization or adverse economic consequences, rather than direct effect on human health.

Handling of Select Agents requires controlled access to facilities, physical security, inventory control, and site-specific risk assessments. Everyone who has access to Select Agents must be cleared through the Federal Bureau of Investigation’s Criminal Justice Information Services Division with a background check. Failure to meet the requirements may result in criminal penalties of fines and up to 10 years of imprisonment. Thus, the Select Agent Regulations can be reasonably viewed as an instrument of law enforcement to facilitate attribution4 and prosecution in the event of domestic use, or of deliberate or inadvertent possession, of potential biological weapons.

(b.) The Select Agent Program necessarily focuses on the known. The Select Agent Program is intended to limit access to agents that there is reason to believe could be used as weapons—essentially to remove the “low-hanging fruit” and make it more difficult for persons with nefarious intent to obtain or create a bioweapon. The Select Agent Regulations work primarily and most effectively in the context of possession and transfer of known stocks—providing a “chain of custody” 5—in which names and Select Agent status are propagated in a well-defined manner from registered sender to registered recipient of Select Agent cultures.

As discussed throughout this report, Select Agents are defined according to a taxonomic list of known bacteria, viruses, toxins, and fungi. Novel agents, whether natural or synthetic, are not covered by the Select Agent Regulations. When a novel agent emerges, it is named, and research is initiated to study its mechanism of action, the potential threat that it presents, and its susceptibility to countermeasures. After knowledge is obtained, an agent may be considered for inclusion on the Select Agent list. It is a deliberate process. The Select Agent Regulations are appropriately backwards-looking and based on a list of known agents. They are intended to protect the nation by restricting the availability of agents that are known from actual experience to be dangerous, that can be usefully controlled by “chain of custody” measures, and that have a high potential for biowarfare or bioterror. A list of named agents is in fact a reasonable model for the Select Agent Regulations despite the serious problems and ambiguities inherent in assigning discrete taxonomic identities to a continuum of biological organisms.

(2.) “Select Agent-ness” has biological and non-biological components. The Select Agent designation depends on a variety of considerations. Some of these are biological (such as virulence, transmissibility, dissemination, and ability to be weaponized); but others are not (such as public perception, economic impact, intelligence data, availability of countermeasures, and natural prevalence). Because the security threat posed by an agent is not determined by biological criteria alone, Select Agent status can never be predicted from sequence alone. “Select Agent” is not a scientific description; it is a policy designation.

(3.) Biology is not binary. Microorganisms are not either “potential weapons of mass destruction” or “of no concern.” No single characteristic makes a microorganism a pathogen, and no clear-cut boundaries separate a pathogen from a non-pathogen. Pathogenic microorganisms are not defined by taxonomy; it is common for a microbial species to have pathogenic and non-pathogenic members. An agent has multiple biological attributes, and the degree to which they are expressed falls along a spectrum for each biological characteristic;6 consequently, agents pose varying degrees of risk.7

Moreover, the genes and sequences that could potentially be used to create a bioweapon come from all of biology. For instance, a human sequence, such as interleukin-4, could be appropriated to trigger a severe immunological response and cause illness or death. Likewise, a toxin gene from a plant could, in theory, be incorporated into a bioweapon. Microorganisms are by no means the only source of sequences of concern. Biology is diverse and dynamic and has many unclear boundaries. No single criterion or absolute threshold can be applied to identify biological threats. The biosafety framework uses several levels of containment to address the various degrees of risk posed by a microorganism or experiment. Because of the complexity of biology, a microorganism or an experiment is best evaluated and best overseen case by case.

(4.) It is not feasible to predict pathogenicity from sequence now, and it will not be in the foreseeable future. As discussed in Chapter 2, sequence prediction in biology is subject to a hierarchy of difficulty that reflects the complexity of the system under analysis. The simplest of such predictions would probably be that of a single protein. Next in order of predictive difficulty would be a genetic pathway (a group of co-regulated multiple proteins that act in concert). The third simplest set of sequences to evaluate as a means of forecasting function are those of whole organisms alone in a controlled environment (multiple pathways act in concert). The most difficult predictive situation would be one in which two or more organisms interact in their natural environment.8 It is that last level of complexity that may give rise to the key biological attributes of pathogenicity and transmissibility, which contribute to the criteria that form the basis of inclusion of an organism on the Select Agents list.

Predicting pathogenicity, transmissibility, or environmental stability of a microorganism requires a detailed understanding of multiple attributes of the pathogen, its host, and its environment. It is a prediction problem of the greatest complexity. By the time we have a general ability to predict host-pathogen interactions on the basis of pathogen genome sequence alone, we will probably have solved a number of other major problems in biology (such as developing a vaccine for HIV, curing the common cold, and achieving personalized medicine). It might never be possible to predict pathogenicity from sequence at a level of certainty that would be required for legal statutes, such as the Select Agent Regulations, that require definitive accuracy as opposed to probabilistic risk assessment. Reliable prediction of the hazardous properties of pathogens from their genome sequence alone will require an extraordinarily detailed understanding of host, pathogen, and environment interactions integrated at the systems, organism, population, and ecosystem levels. For the foreseeable future, the only reliable predictor of the hazard posed by a biological agent is actual experience with it. High-level phenotypes like pathogenicity and transmissibility cannot now plausibly be predicted with the degree of certainty required for regulatory purposes, and it will probably not be possible in the foreseeable future.9

(5.) Prediction and design are linked. Design and prediction go hand in hand; our lack of predictive ability in biology also means that we cannot design genomes de novo. If we lack the ability to predict an organism’s phenotype from its genome sequence, we necessarily lack the ability to design a novel genome sequence with a desired phenotype. Designing a self-replicating organism that has only to interact with simple molecules in a test tube is difficult; designing a pathogen that has to interact with a complicated host, evade the host’s immune system, and be transmissible in the natural environment adds daunting layers of biological complexity. There are very few cases in which a single protein sequence has been designed to fold in a particular novel way. The first few modest successes in de novo design of single proteins constitute the current state of the art. Synthetic biology cannot be used to design and create an entirely novel pathogen, for exactly the same reasons that we cannot predict whether a genome sequence will be that of a pathogen. Without predictive ability, designers cannot know whether their designed sequences will work. The “entirely novel synthetic bioweapon” scenario is not plausible. However, as discussed in Chapter 3, it is possible, and even routine, to modify known organisms and to construct chimeras.

(6.) Synthetic genomics poses three threat scenarios that would allow a “bad actor” to obtain a pathogenic organism with Select Agent properties; one of them (modified Select Agents) is of most immediate concern. Chapter 3 described three scenarios in order of increasing technical difficulty, and therefore decreasing likelihood: modified pathogens; chimeric pathogens; and designed pathogens. More likely scenarios should be addressed before there is inordinate worry about less likely ones. The Select Agent Regulations are intended to control facile access to the most dangerous known pathogens. Synthetic genomics is beginning to make it possible to obtain pathogens by synthesis without the need for access to a live culture of an agent. A high degree of technical sophistication and great expense are necessary to synthesize and “boot” a known Select Agent genome, and an even higher degree of sophistication is required to produce a non-trivially modified Select Agent genome (a synthetic genome derived from a Select Agent with a small number of additions, deletions, and modifications of genes) that would be likely to function; nonetheless, these are the most plausible (if unlikely) “garage laboratory” scenarios. Non-trivial chimeric constructions (more wholesale rearrangement and “assembly” of parts from different organisms into a novel whole) are extraordinarily challenging and would almost certainly require large laboratory resources and iterative optimization in an experimental testing program in susceptible hosts, contravening the Biological Weapons Convention (The committee sees the realm of chimeric genomes as beyond the regulatory scope of the Select Agent Regulations). De novo design remains essentially infeasible. 
Thus, the committee believes that the most pressing issues raised in connection with the Select Agent Regulations by synthetic genomics and synthetic biology involve the synthesis or modification of known Select Agent genomes.

(7.) There is an important distinction between sequence-based prediction and sequence-based classification.

Prediction of complex biological properties is not currently feasible, just as design of an entirely novel pathogen de novo is not possible. For the foreseeable future, synthetic genomics and synthetic biology will be done by modification and rearrangement of genes that already exist in nature. If we assume the most plausible threat to come from modifications and rearrangements of genes from known Select Agent genomes, we can anticipate the most likely “space” of possible modifications and most obviously worrisome chimeras that might create a genome that encodes Select Agent properties. Because we can use sequence analysis to recognize genes and genomes and classify them into known families, we can use sequence analysis to designate particular genome sequences unambiguously as equivalent to “complete, infectious” Select Agents and to identify “sequences of concern.”

Sequence-based classification is strictly operational: a set of tools for drawing decision boundaries around known sequences that do and do not belong to a desired classification. The tools are used now for robust and automatic classification of gene sequences into usefully annotated sequence families. For an operational definition of a complete Select Agent genome, we can define a parts list of genes that are thought to be necessary, although not sufficient, to make up a biologically functional Select Agent genome. We might even choose to simplify a classification system deliberately by defining an operationally “complete” genome as having a necessary subset of parts rather than a complete set. We should be able to establish a reasonable operational definition of the sequence space circumscribing complete agent genomes, as distinct from incomplete genomes or complete genomes of related non-Select Agents, thus establishing a “brighter line,” an unambiguous procedure for deciding when a genome sequence is assigned one of the taxonomic names on the Select Agent list.
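
The parts-list idea above can be illustrated with a minimal sketch. It assumes that some upstream step (not shown, and not specified in this report) has already assigned each gene in a genome to a named sequence family. The names `AgentDefinition`, `classify`, `required_parts`, and `min_parts` are hypothetical and chosen only for illustration; the data are placeholders, not real gene families.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefinition:
    """Hypothetical operational definition of one listed agent."""
    name: str
    required_parts: frozenset  # gene families thought necessary (not sufficient)
    min_parts: int             # size of the "necessary subset" that counts as complete


def classify(found_families, definitions):
    """Return names of agents whose operational definition the genome meets.

    found_families: family names assigned to the genome's genes by an
    upstream classifier (assumed to exist; not implemented here).
    """
    found = set(found_families)
    hits = []
    for definition in definitions:
        present = definition.required_parts & found
        if len(present) >= definition.min_parts:
            hits.append(definition.name)
    return hits


# Placeholder definition: three required parts, all three must be present.
definitions = [
    AgentDefinition("agent_X", frozenset({"part_1", "part_2", "part_3"}), 3),
]

complete = classify({"part_1", "part_2", "part_3", "unrelated"}, definitions)
incomplete = classify({"part_1", "part_2"}, definitions)
```

Lowering `min_parts` below the size of `required_parts` corresponds to the deliberate simplification mentioned above, in which an operationally "complete" genome needs only a necessary subset of parts. The decision is a set comparison against known parts, not a prediction of biological function.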

Determining whether a sequence really does encode a viable, functional, “infectious form” of a Select Agent is an empirical experimental question and will long remain beyond any foreseeable predictive ability in biology. However, for the purposes of sequence-based classification, we do not need to have complete knowledge. Partial knowledge, reflecting the current state of the art, suffices for an operational definition that partitions sequence space in a way that avoids the misclassification of non-Select Agent genomes (such as those of vaccine strains or related non-pathogenic species) while trying to “deny” the spaces encompassing the modifications of Select Agent genomes that could most plausibly still encode a Select Agent pathogen.

(8.) Sequence-based classification could be used to address an immediate challenge raised by synthetic genomics.

Synthetic genomics is increasingly making it possible to obtain Select Agents by synthesis rather than by access to a live laboratory culture and to create modifications that blur taxonomic classification boundaries yet still might be expected to function as a Select Agent. Because the Select Agent Regulations cover creation, transfer, and possession of complete synthetic genomes, not just those of viable Select Agents, gene and genome synthesis companies, for example, need to know unambiguously whether a customer’s order is for a synthetic Select Agent genome or not. A sequence-based classification system could provide a high degree of clarity—for investigators, biohobbyists, synthesis companies, and law-enforcement officials—about what DNA sequences are subject to the Select Agent Regulations and which ones are not. The current boundaries are unclear, and this does not seem appropriate for high-consequence regulations like the Select Agent Regulations.

(9.) Sequence-based classification could also be used to define sequences of concern that are not themselves Select Agents, but that may nonetheless constitute a threat.

One might argue that a disadvantage of bright line classification of Select Agent genomes is that the “bad actor” knows where the line is, too, and so can try to skirt it. What happens if the bad actor orders a Select Agent genome in pieces from different companies, or introduces just enough changes into a synthetic genome to evade Select Agent classification, or creates an entirely unexpected chimera from non-Select Agent parts?

One answer to that concern is that the Select Agent Regulations can make acquisition of Select Agents only more difficult, not impossible. It is already the case that Select Agents can be collected from the wild rather than obtained from a registered laboratory. A classification system could be designed to recognize the most plausible modified genomes and even the most obvious chimeric genomes that, according to the current state of the art, would (a) have some possibility of encoding an agent with Select Agent properties and (b) have little possibility of encoding an agent that should not be considered a Select Agent. The Select Agent list and the associated classification system would be updated as the state of the art advanced. Of course, a person with nefarious intent might be able to do better than the current state of the art in the scientific community, but this ought to be unlikely.

A second answer to the concern is that it is not and should not be the purpose of the Select Agent Regulations to regulate novel agents, any more than it is the purpose of the Select Agent Regulations to regulate access to novel emerging diseases in nature. To prohibit possession and transfer of de novo agents at the point of their synthesis would require the kind of forward-looking predictive system that we find infeasible. Rather, the Select Agent Regulations implement a necessarily backward-looking system based on a taxonomy of known Select Agents—already known from experience to be extraordinarily dangerous. If a new agent is found to be extraordinarily dangerous, it can be added to the Select Agent list, whether it is a naturally emerging pathogen or a synthetic. Initially, that may sound like closing the barn door after the horse is gone if we imagine a sophisticated bioterrorist engaged in designing novel agents; but it seems far more plausible that a novel agent would be discovered first as a newly emerging disease in nature or by accident in the course of legitimate biotechnology research.

The committee has a third answer. It is useful to identify suspicious sequences of concern that might be parts of a Select Agent or a bioweapon threat, even if they do not make up a complete genome subject to the Select Agent Regulations. As long as the response to a sequence of concern is flexible and does not immediately trigger regulatory or law-enforcement intervention, this can be a gray area, not a bright line. A sequence-based classification system would inherently organize and condense the current state of knowledge about the genomic composition of dangerous pathogens. The same system could be used to identify partial genomes and suspicious parts in the gray area, triggering common-sense follow-up. For example, a DNA-synthesis company might contact a customer to be sure that the customer is legitimate and that the customer knows that what is being ordered might be considered dangerous. A sequence-based classification system could help to make the identification of sequences of concern more systematic and consistent in the synthetic genomics community. Our committee referred to this as a yellow flag system (Figure 4.1, right side).

FIGURE 4.1. Concept of sequence-based classification and yellow flag systems, including differences and interactions between biosecurity and biosafety components (see also Appendix L). Black lines indicate information flow; yellow lines represent decision making.

(10.) As predictive ability develops in biology, it will be more appropriate to use it in the context of probabilistic risk assessment (such as the yellow flag system), not in rigid classification of Select Agent properties.

The ability to predict biological properties from genome sequence will come gradually in a long series of steps of refinement and slowly increasing accuracy. For all the reasons described in Chapter 2, it is not reasonable to expect predictive technology to reach the accuracy necessary for defining Select Agents. However, we found it natural to think of the slowly increasing accuracy of predictive methods in the context of probabilistic risk assessment, in which the uncertainty of a prediction can be weighted appropriately. Advances in predictive technology might gradually become a counterpart of a “yellow flag” warning and biosafety framework that was initially based only on sequence-based classification. (As noted throughout this report, the classification and “yellow flag” system are presented as proposals for consideration; they should not be read as recommendations.)

The Yellow Flag System

The yellow flag system would have two primary goals: (1) to make it harder for bad actors to obtain pathogens as weapons or as tools for bioterror without detection and (2) to avoid the accidental, inadvertent, or ill-advised production of hazardous constructs by well-meaning investigators.

The yellow flag system would comprise four main elements: a centralized biosafety sequence database, annotation of the sequences as empirical evidence of the function of the genes encoded by the sequences is acquired, a process for review and assessment of the evidence to determine the disposition of the sequence of concern, and a yellow flag on the sequences that are deemed “of concern” (see Figure 4.1, right side).

There are many avenues by which a sequence might be deposited in the database and given a yellow flag, including but not restricted to the following: a researcher may observe that the gene product increases pathogenicity, the sequence may be derived from a known Select Agent and is in a region known to be critical for causing disease, or the disease-causing characteristic is eliminated when the sequence is deleted from a known pathogen. Movement of a sequence into the database can be dynamic because the system is not regulatory and a yellow flag does not restrict access to the sequence. This database system is intended to serve as a resource for information sharing.

Once a sequence is deposited in the biosafety database, it serves as a reference for anyone carrying out relevant investigations and for gene synthesis companies that would be able to compare their orders with entries in the database, screening for yellow flags. If a match occurs, the company would have a basis for notifying the purchaser of the possible concern and would request that any research results that support or refute the cause for concern be contributed to the annotations associated with the sequence in question. Similarly, other researchers carrying out experiments involving analysis of the function of yellow flag sequences would also be encouraged to provide follow-up information or references.
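
As a rough illustration of how a synthesis company might compare an order with database entries, the sketch below flags an order when it shares a large fraction of short subsequences (k-mers) with a yellow-flagged entry. This is a deliberately simple stand-in: the report does not specify a comparison method, and a real screen would more likely use alignment or profile-based tools. The function names, the database layout, and the values of `k` and `threshold` are all assumptions, and the sequences are arbitrary placeholder strings.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def screen_order(order_seq, database, k=8, threshold=0.5):
    """Return ids of yellow-flagged entries that the order substantially contains.

    database: dict mapping entry id -> {"sequence": str, "yellow_flag": bool}
    (a hypothetical layout). An entry matches when at least `threshold` of
    its k-mers also occur in the order.
    """
    order_kmers = kmers(order_seq, k)
    matches = []
    for entry_id, entry in database.items():
        if not entry["yellow_flag"]:
            continue  # cleared or unflagged entries do not trigger a notice
        entry_kmers = kmers(entry["sequence"], k)
        if not entry_kmers:
            continue
        shared = len(entry_kmers & order_kmers) / len(entry_kmers)
        if shared >= threshold:
            matches.append(entry_id)
    return sorted(matches)


# Placeholder data: arbitrary strings, not real genes.
database = {
    "entry_1": {"sequence": "ACGTACGTTGCAAGCT", "yellow_flag": True},
    "entry_2": {"sequence": "AAAAAAAAAAAAAAAA", "yellow_flag": False},
}

flagged = screen_order("TTTT" + "ACGTACGTTGCAAGCT" + "GGGG", database)
not_flagged = screen_order("AAAAAAAAAAAAAAAA", database)
```

A match here would only give the company a basis for contacting the purchaser; consistent with the non-regulatory design described above, it does not block the order.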

Scientific workgroups would be charged to analyze the annotations and make determinations as to whether the degree of concern is sufficient to merit consideration as a Select Agent, needs further study, or should be cleared of the yellow flag. A sequence may be removed from the database system entirely, although it is reasonable to retain the information in the database and indicate that the sequence has been examined and cleared. The database system would probably grow to include a variety of biosafety information, and only a subset of the sequences in the database would have yellow flags. It is important that, like the Select Agent list, the yellow flag system be fluid; sequences should be examined and yellow flags removed when they are unwarranted. The authority and resources necessary to make the process work should be provided centrally as a function supporting both biosafety and biosecurity.
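
One way to picture the review process is as a record whose flag changes with each workgroup determination while the entry itself stays in the database. The sketch below is only an illustration under that assumption; the class and field names are hypothetical, and the three dispositions simply mirror the outcomes named in the paragraph above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Disposition(Enum):
    CONSIDER_AS_SELECT_AGENT = "merits consideration as a Select Agent"
    NEEDS_FURTHER_STUDY = "needs further study"
    CLEARED = "examined and cleared"


@dataclass
class DatabaseEntry:
    """Hypothetical biosafety database record for one sequence."""
    entry_id: str
    yellow_flag: bool = True
    annotations: list = field(default_factory=list)  # evidence contributed by researchers
    history: list = field(default_factory=list)      # (disposition, note) per review

    def review(self, disposition, note):
        """Record a workgroup determination and update the flag.

        A cleared entry keeps its history, so the database still shows
        that the sequence was examined.
        """
        self.history.append((disposition, note))
        self.yellow_flag = disposition is not Disposition.CLEARED


entry = DatabaseEntry("entry_1")
entry.annotations.append("placeholder annotation")
entry.review(Disposition.NEEDS_FURTHER_STUDY, "evidence inconclusive")
flag_after_first_review = entry.yellow_flag
entry.review(Disposition.CLEARED, "follow-up data did not support concern")
flag_after_second_review = entry.yellow_flag
```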

We envision actions taken in response to a yellow flag as informal, prudent best practices, in that they fall outside the strict regulatory boundaries of the Select Agent Regulations. However, it would also be possible to use a yellow flag system in more formal ways. For example, an IBC or funding sponsor could ask that yellow-flagged synthetic constructs trigger special notification for purposes of oversight to track what laboratories were in possession of yellow-flagged constructs. Similarly, DNA synthesis companies might be asked to maintain records of yellow-flagged constructs that they provide to customers to facilitate forensic investigation in the event of criminal construction of a complete Select Agent from synthetic parts. Finally, a centralized system for reporting orders of yellow flag sequences could be developed to allow detection of the simplest scheme for avoiding the Select Agent Regulations—splitting the order for a viral genome or a toxin between two different gene synthesis companies.
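
The split-order scenario lends itself to a small sketch. Assume, hypothetically, that each company reports which region of a yellow-flagged entry an order covered, as half-open coordinates `[start, end)` on that entry. A centralized system could then check whether one customer's orders from two or more companies jointly cover the whole entry. The report format and all names below are assumptions made for illustration.

```python
from collections import defaultdict


def find_split_orders(reports, entry_lengths):
    """Return (customer, entry_id) pairs whose multi-company orders cover a whole entry.

    reports: list of dicts with keys "customer", "company", "entry_id",
    "start", "end" (a hypothetical reporting format).
    entry_lengths: dict mapping entry id -> length of the flagged sequence.
    """
    grouped = defaultdict(list)
    for report in reports:
        grouped[(report["customer"], report["entry_id"])].append(report)

    alerts = []
    for (customer, entry_id), items in grouped.items():
        companies = {item["company"] for item in items}
        if len(companies) < 2:
            continue  # single-company orders are left to that company's own screen
        covered_to = 0
        for start, end in sorted((item["start"], item["end"]) for item in items):
            if start > covered_to:
                break  # a gap remains, so the pieces do not tile the entry
            covered_to = max(covered_to, end)
        if covered_to >= entry_lengths[entry_id]:
            alerts.append((customer, entry_id))
    return sorted(alerts)


# Placeholder reports for one flagged entry of length 1000.
reports = [
    {"customer": "c1", "company": "A", "entry_id": "e1", "start": 0, "end": 600},
    {"customer": "c1", "company": "B", "entry_id": "e1", "start": 550, "end": 1000},
    {"customer": "c2", "company": "A", "entry_id": "e1", "start": 0, "end": 400},
    {"customer": "c2", "company": "B", "entry_id": "e1", "start": 600, "end": 1000},
    {"customer": "c3", "company": "A", "entry_id": "e1", "start": 0, "end": 1000},
]

alerts = find_split_orders(reports, {"e1": 1000})
```

In this toy data only customer `c1` is reported: `c2` leaves a gap between positions 400 and 600, and `c3` ordered from a single company.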

A yellow flag biosafety system as described here would complement the Select Agent Regulations by providing oversight that is broad and flexible. It would identify sequences that potentially pose a risk without diverting attention from recognized threats or imposing restrictions and adding burden to the scientific community.


The committee’s analysis of sequence-based classification in Chapter 3 stems from a broad interpretation of its charge. However, it is the only positive and constructive response that the committee identified to address the challenges that synthetic genomics and synthetic biology pose to the Select Agent Regulations. The primary direction we were asked to consider, prediction of biological properties from sequences, is not feasible now and probably will not be in the foreseeable future. The sequence-based classification discussed in Chapter 3 is technologically feasible and may improve the current system. However, such a system has limitations and potential adverse consequences.11 Therefore, we do not specifically recommend that it be implemented. Rather, we offer the following two recommendations:

  • The sequence space around each discrete taxonomic name on the Select Agent list should be clearly defined, so that Select Agent status can be unambiguously determined from a genome sequence (for example, by a DNA synthesis company).
    The sequence space should be broad enough to include the plausible modifications and chimeras that experts reasonably believe will probably also act as Select Agents, without encompassing existing non-Select Agents.
  • A sequence-based classification system could address this problem, and should be considered and weighed against the cost and complexity of implementing this technological augmentation to the current Select Agent Regulations.

Specific milestones or research areas that would aid in implementing a sequence-based classification system are presented below. (Appendix L presents additional near-term milestones for consideration.)

  1. A sequence database with a Select Agent focus: The computational sequence analysis technologies used for sequence-based classification define sequence spaces that circumscribe the known variation of sequences that are considered to belong to a useful name while excluding the known variation of sequences that are considered to be attached to different names (see Figure 3.1). A necessary precondition is to have a number of representative sequences that belong to the desired classification and a number of the most closely related sequences that do not belong. It is not sufficient to know a single representative genome sequence of each Select Agent. The more sequences that are known, the better the expected genetic variation will be understood. To provide a sound foundation for sequence-based classification of existing Select Agents, a comprehensive sequence database should be created that thoroughly covers naturally occurring genetic variation based on geographic distribution, ecological or laboratory adaptations, and those associated with clinical severity or attenuation. The database should include not only Select Agent sequences, but also a representative set of near-neighbors for each Select Agent.
  2. An expanding sequence database of all biology: There are massive gaps in our knowledge of the genetic characteristics of much of the biological world. Genome and metagenome sequencing is rapidly closing some of the gaps in some groups of organisms but not others. For example, it would be useful to know much more about viral and microbial biodiversity in nature. Many new emerging pathogens (such as Nipah, SARS-CoV, and H5N1) were animal pathogens that suddenly jumped the species barrier; more sequence coverage of the viral and bacterial phylogenetic landscapes encoded in animal reservoirs would help in anticipating, monitoring, and responding quickly to future threats. Such a sequence database could be used to help to identify sequences of concern that may be appropriate to monitor in the yellow flag system in the interests of biosafety and biosecurity.
  3. Define the Criteria for Select Agent Designation: The criteria for designating a pathogen as a Select Agent should be reviewed and clearly defined to allow unambiguous implementation of the Select Agent Regulations. The Select Agent Regulations are based in law and backed up by serious penalties. However, the criteria for designation of a pathogen as a Select Agent are not well established and include characteristics that are independent of biological or genomic characteristics. It is not always evident to the regulated community why particular agents are included on the list. Each agent that is designated as a Select Agent should have a readily justifiable reason for such designation. The criteria for Select Agent designation should be made clear and should focus on biosecurity concerns. Agents that do not meet the criteria (whether biological or non-biological) should not be added to the list. The committee recognizes that the reason for placement on the Select Agent list may involve classified information. However, even such non-biological considerations should be based on clear criteria and informed by scientific data. For instance, in some cases, it appears that past experimentation with an agent for purposes of warfare or terrorism has resulted in de facto inclusion on the Select Agents list. If experiments led to the conclusion that the agent is unstable, difficult to make, or poorly transmissible, then the agent might not pose a threat worthy of Select Agent designation. Furthermore, because the level of threat posed by a microorganism or toxin may change over time (for example, countermeasures may become available or the agent may be endemic), each Select Agent and Toxin should be reevaluated regularly to ensure that it meets the criteria for Select Agent designation and is not diverting attention from more important threats.
    The committee concurs with other groups that the current system would be improved if each agent were assessed on the basis of clear criteria. Moreover, it will be difficult to create any clear and effective sequence based system, whether classification based or prediction based, if the criteria and purpose of the Select Agent list remain unclear.
  4. Stratification of the Select Agent list: The existing list of Select Agents and toxins should be reviewed on the basis of clear criteria with the goal of prioritizing the Select Agent list on the basis of risk. Mechanisms for timely inclusion and removal of an agent or toxin from the Select Agent list are necessary for a robust oversight system. Several recent advisory panels have recommended stratification or reduction of the Select Agent list, and we are in agreement with their recommendations.12,13 As stated in the 2009 National Research Council report, “a list of more than 80 agents of varying risks dilutes attention from those that pose the greatest degree of concern, which may, in the process, render the nation less secure. It would be more effective to focus the highest scrutiny on those agents that are, indeed, of greatest concern . . . (NRC 2009a)” A gene sequence based classification system is certainly an example of this situation.14 Classifying the current 82 Select Agents would require 82 parts lists and several thousand profiles for the parts, and, as mentioned, each Select Agent classification would need to be carefully tested and maintained. A classification system would require a small team of full-time staff to develop and maintain. Sequence curation would require substantial work. Prioritizing the Select Agent list on the basis of risk would make any sequence-based approach to oversight more feasible.


The use of the term milestones may be somewhat misleading here, inasmuch as the research described is ongoing and will evolve in a continuous and interrelated way. A robust oversight system will have to be able to evolve as well, with continuing integration of scientific advancements. The milestones toward developing the knowledge and capabilities needed to enable a predictive oversight system (or to enhance a classification system) are shared among all fields of biology. It is a major goal of all biology to understand how DNA sequence determines the properties of biological systems, ranging upwards in complexity from single macromolecules to pathways, organisms, populations, and ecosystems. We are far from that goal. Successes in prediction and design at each level of complexity in biology as a whole are the relevant achievements to watch for, before we will be able to predict confidently from genome sequence analysis how a designed organism would replicate, interact with a host, evade a host immune system, and spread in a population to cause disease.

The goal of a predictive oversight system is so far out in front of current biological understanding that it would be unwise to attempt to address it in detail. Instead, we offer the following general milestones:

  • Ability to predict accurately the function of individual proteins from genome sequence, including what ligands or macromolecules they bind to, what reactions they catalyze, where they localize, and what the kinetic rate constants for these processes are.
  • Ability to predict accurately from genome sequence the output of biochemical, regulatory, and genetic pathways (modules) of several proteins acting together.
  • Ability to predict accurately the behavior of a whole organism from its genome sequence.
  • Ability to predict accurately from their genome sequences the interactions of organisms in their natural environment, such as microbe-host symbioses or host-pathogen interactions.

Those very general goals are already shared by all the biomedical sciences for advancing understanding of all biological systems. They are not peculiar to Select Agents, or even to infectious disease.

Although specific milestones for prediction of Select Agents are far beyond current scientific insight, the committee is able to identify promising research areas and technologies that would improve the ability to predict gene function, enhance understanding of infectious disease, and consequently strengthen biosecurity.15 What follows is not intended to be an exhaustive list; research findings in fields not described could well provide important advances in our understanding of genotype-to-phenotype prediction.

The committee recommends supporting these research efforts and technological developments, with the understanding that predicting function from sequence is a major biological goal. Progress in these efforts could be applied to strengthen a gene sequence-based oversight system as it evolves, but the value of the research extends far beyond its potential contribution to biosecurity.

  1. Protein structure and function: There are important gaps in our understanding of the relationships between nucleic acid sequence and protein structure, and between protein structure and gene function. Developing a better understanding of the relationships between nucleic acid sequence, protein structure, and gene function will be critical for improving our knowledge base.
  2. Gene expression and regulation: Gene function may be multi-factorial—based on interactions with other genes, physiological conditions, and other regulatory events. Developing a better understanding of factors that regulate gene functions is needed. If an organism has specialized gene products for its virulence, it must be able to use them when they are needed but not squander its metabolic energy in producing them aimlessly or risk having them detected by host defenses and prematurely neutralized. Consequently, regulating the expression of virulence factors is an additional, essential complication of a pathogenic microorganism’s life. The number of well-characterized virulence regulatory systems is increasing rapidly, in part because of the development of rapid methods for screening gene expression on a genome-wide basis (for example, with DNA microarrays). At the same time, relatively little is known about both the specific environmental signals to which the systems respond and the exact role of the responses in the course of human infection.
  3. Pathogenic mechanisms: The molecular basis of the pathogenic characteristics of currently designated Select Agents is, in general, poorly understood, may be multigenic, may in some cases be greatly influenced by one or a few single-nucleotide polymorphisms, and may be regulated by mechanisms that are not well defined. The molecular basis of novel pathogens or human-made organisms with pathogenic potential is also not established. To inform a gene sequence-based classification system, improve our biodefense capabilities, and, most important, combat infectious disease and improve public health, a better understanding of the molecular basis of virulence should be developed. Pathogenesis due to an existing Select Agent or a novel pathogen is often host-specific, but there is little information to explain the contrast between pathogenesis in a receptive host species and the absence of pathogenesis in a species (or individual) not affected by the pathogen. Any determination of the molecular basis of the pathogenic characteristics of a microorganism must include consideration of its host and the host response. Developing a comprehensive understanding of the pathogen-host interactions that result in the creation of a disease state would be an important achievement.
  4. Animal models of disease: For many Select Agents, there are no surrogate experimental hosts for characterizing virulence; the only suitable host for a human pathogen may be humans. An adequate understanding of the function of a gene or cluster of genes cannot be obtained through computational modeling alone; to ensure confidence in results, the determination of virulence characteristics should include experimental validation of function in an appropriate model system. Further development of genetically characterized animal models of various species, including non-human primates, is an important objective. For instance, current efforts to create the “Collaborative Cross” and related genetically well-defined and well-characterized mice will provide a valuable new tool to assist in the understanding of host-pathogen interactions. Novel model systems that more closely replicate human disease processes (such as humanized mice, in vitro models of human organ systems, and complete in silico models that recapitulate human physiological processes at a molecular level) are needed.
  5. Data and information management for Systems Biology: A dynamic, sequence-based program will require creation of massive new and well-integrated databases to manage greatly expanded sequence information on orders and families of organisms yet to be examined; enumeration of protein-fold families; host pathways; protein structural determinations, including posttranslational modifications; the genetic basis of virulence and immune response from the perspective of the host and the pathogen at both the pathway interaction and more detailed 3-D structural interaction levels; and vastly improved software capabilities to use the databases to predict 3-D structural effects of nucleic acid variations and host interactions accurately, especially in relation to pathogenic effects.
  6. Synthetic Biology: Synthetic biology approaches biology from an engineering perspective; it is aimed at solving a problem, creating tools, and designing or improving a system. All existing and reasonably foreseeable uses of synthetic biology involve modification or rearrangement of existing biological components. For instance, a precursor of the antimalarial compound artemisinin is being produced in E. coli, and other microorganisms are being designed to address biofuel production. The design of such pathways and chimeras is no easy task, and the entirely de novo design of genomes and organisms remains science fiction. That is due largely to the difficulties in predicting function from sequences, as described in Chapter 2; biological context is key to gene or protein function. As discussed in the recent Nature News feature “Five Hard Truths for Synthetic Biology,” the developing field of synthetic biology faces several important challenges.16 They are centered around translating biological complexity into simple tools and standardized parts that behave in predictable ways. The committee is in agreement with the National Science Advisory Board for Biosecurity, which has stated that “synthetic biology is a rapidly evolving field, and, given its potential benefits to public health and national and economic security, research in these disciplines should be encouraged and maintained.”
  7. Metagenomics (phylogenomics): Environmental metagenomic sequencing of soils, seawater, and other complex samples consistently yields a high percentage of proteins of unknown function. It is clear that many natural offensive and defensive mechanisms that may have relevance to furthering our understanding of human pathogenesis await discovery. The advent of short-read sequencing technologies is making deep studies of complex environmental samples possible. The flood of data resulting from such studies is illustrating the need for better computational tools and infrastructure to manage, analyze, and correlate staggering amounts of information. Such efforts should be strongly supported inasmuch as unexpected discoveries from unknown organisms may prove to yield more advances than incremental hypotheses related to known organisms.
  8. Microbiome: Although possibly germ-free (gnotobiotic) before birth, humans develop a resident microbiota shortly after birth. The human microbiome is the subject of intensive study, including the major international Human Microbiome Project (HMP). Because of advances in DNA sequencing technologies and improvements in bioinformatics, it has become possible to characterize the great diversity in the human microbiota. In 2007, the National Institutes of Health launched the HMP as one of its major roadmap initiatives. This major scientific endeavor has the following aims:
    • Determining whether individuals share a core human microbiome.
    • Understanding whether changes in the human microbiome can be correlated with changes in human health.
    • Developing the technological tools to support these goals.
    • Addressing the ethical, legal, and social complications raised by human microbiome research.

The Human Microbiome Project will add an enormous amount of additional microbial sequence to the already burgeoning database. That will be invaluable as we continue to sort out the sequences that have real predictive value instead of being merely suggestive because of some degree of relative homology with a putative virulence factor of a pathogen and especially of a Select Agent.


The milestones and focus areas listed above aim either to expand the general frontiers of biological knowledge, or to apply existing knowledge to the Select Agent Regulations. Our committee was deeply uncomfortable with research programs that would seek to expand knowledge solely for the purposes of improving the Select Agent Regulations.

Developing the ability to predict Select Agent pathogenicity from genome sequence raises serious dual-use concerns because prediction and design go hand in hand. Accurate computational prediction of Select Agents from genome sequences enables computational design and optimization of bioweapon genome sequences. Two major goals of biology are to predict phenotype from genotype and to improve public health by understanding pathogenicity. It does not seem wise to make special plans for an effort in predicting the characteristics of Select Agents, in advance of other important frontiers of biological knowledge.

It is more prudent to base the Select Agent Regulations on the current state of biological knowledge, as an applied problem, not a basic research problem. Predictive successes in the general biology research community should be passively monitored. Once biology in general approached the goal of determining pathogenicity from sequence, it would be appropriate to consider a predictive oversight system to identify Select Agent properties accurately from a novel genome sequence. That time may not come for decades, and it may be more than a century away.

And in the meantime? The technology and knowledge base for sequence-based classification exist now, as we described in Chapter 3. Even a classification system can present dual-use issues, in that for the system to be usefully implemented, the information must be shared. Listing the parts of a Select Agent and identifying other sequences of concern entirely on the basis of their potential to be dangerous when incorporated into a synthetic construct disseminates knowledge that theoretically could facilitate the design of a synthetic pathogen by a bad actor. However, inasmuch as this knowledge would be based on the current published state of the art (and pathogen sequences that are already widely available in GenBank), any additional dual-use concerns are not nearly as grave.

The Select Agent Regulations strive to balance a need for regulating access to the most dangerous pathogens with minimizing regulatory burdens on basic biological research aimed at monitoring, understanding, treating, and preventing disease. If the Select Agent Regulations are too burdensome, they may diminish long-term safety. Our report stops short of recommending the implementation of any specific sequence-based system for defining Select Agents; it was not our charge, and we were not properly constituted to estimate the costs, benefits, or risks associated with any specific implementation. We do find that the sequence-based classification system and yellow flag system of Chapter 3 are technologically feasible, but we have not carefully examined their costs or their effects on basic research or national security (see Appendix L). We have made no argument that the favorable aspects of using such systems to clarify a sequence-based definition of the discrete taxonomic names on the Select Agent list would outweigh any adverse aspects of creating additional layers of complexity in the regulatory framework. Rather, our principal finding is that sequence-based prediction of Select Agent properties is not feasible and is unlikely to be feasible in the foreseeable future; any research effort dedicated solely to this purpose is likely to have only adverse consequences.



This section draws on discussion in the 2009 National Research Council report “Responsible Research with Biological Select Agents and Toxins.”


The 2009 National Research Council report states: “[i]t should be noted that the use of the term “biosecurity” presents a number of difficulties. At its most basic, the term does not exist in some languages, or is identical with “biosafety”; French, German, Russian, and Chinese are all examples of this immediate practical problem. Even more serious, the term is already used to refer to several other major international issues. For example, to many “biosecurity” refers to the obligations undertaken by states adhering to the Convention on Biodiversity and particularly the Cartagena Protocol on Biosafety, which is intended to protect biological diversity from the potential risks posed by living modified organisms resulting from modern biotechnology. (Further information on the Convention may be found at <http://www> and on the Protocol at <http://www>.) “Biosecurity” has also been narrowly applied to efforts to increase the security of dangerous pathogens, either in the laboratory or in dedicated collections; guidelines from both the World Health Organization (WHO 2004) and the Organization for Economic Cooperation and Development (OECD 2007) use this more restricted meaning of the term. In an agricultural context, the term refers to efforts to exclude the introduction of plant or animal pathogens. (See NRC 2009a:8–9 for a discussion of this and other issues related to terminology.) Earlier NRC reports (2004ab, 2006ab, 2009a) confine the use of “biosecurity” to policies and practices to reduce the risk that the knowledge, tools, and techniques resulting from research would be used for malevolent purposes.”


The Recombinant DNA Advisory Committee (RAC) assisted the NIH in the development of the NIH Guidelines for Research Involving Recombinant DNA Molecules, which has become the standard of safe scientific practice in the use of recombinant DNA. Institutional Biosafety Committees (IBCs), which are mandated by the NIH Guidelines, are charged with reviewing research involving recombinant DNA, although many IBCs have chosen to review other forms of research that involve potential biohazards, including research involving Biological Select Agents and Toxins (BSATs). Institutions are required to register their IBCs with NIH’s Office of Biotechnology Activities.


Microbial forensics plays an important role in attribution efforts. Microbial forensics, also called bioforensics, is a relatively new scientific discipline that draws from other disciplines including genomics, microbiology, and plant pathology. Microbial forensics is dedicated to analyzing microbial activity as evidence for attribution purposes and backtracking. Microbial forensics procedures support ‘decision taking’ at biosecurity levels, follow a strict chain of custody of specimens, and demand rigorous (accredited) and unbiased performance. Therefore, microbial forensics includes the complete range of forensic evidence analysis from microorganisms to associated evidence materials found at the site of a suspected outbreak or crime scene.


The term chain of custody can be used on several different scales of resolution. On the grossest scale, it is sometimes used to describe the provenance of a physical sample of an organism. For example, after the 2001 anthrax letter attacks, a large effort was expended to attempt to trace the provenance of all the Ames strains at laboratories in the United States. Lack of comprehensive recordkeeping before the Select Agent rules made that a difficult and imprecise process at best. On a medium scale, chain of custody as applied to a single laboratory now is used to mean the strict recordkeeping that allows knowledge of which staff had access to the locked laboratories and locked freezers or cabinets where the Select Agent organisms are stored and knowledge of all dispositions of materials that have taken place within the laboratory. On the finest scale of resolution, chain of custody has a specific meaning related to evidence handling for a potential court case: written records of each person and each procedure followed and sealed evidence bags documenting all the physical containers that held the Select Agent material in the different process steps (tubes, plates, and so on). In this report we refer mainly to the two larger-scale resolutions of chain of custody.


For example, one microorganism may be highly virulent, but poorly transmissible from person to person, whereas another may spread easily but produce only mild illness.


As described in Chapter 1, there are many ways to categorize microorganisms according to risk: Biosafety in Microbiological and Biomedical Laboratories biosafety levels 1–4; National Institutes of Health guidelines risk groups 1–4; Centers for Disease Control bioterrorism agents categories A, B, and C; and the Department of Homeland Security Bioterrorism Risk Threat Assessment and the Select Agents list, which currently is not stratified according to risk.


Consider the enormous number of gene sequences that are at play and must be choreographed as a microorganism leaves the salivary gland of a biting insect and is injected into human tissues.


It is important to note that identifying hazardous pathogens or experiments is not the same as distinguishing experiments that are legitimate from those that are illegitimate. Legitimate research aimed at understanding pathogenicity and treating infectious disease often requires work with biological hazards.


Near-term is used here to indicate that the milestones are not dependent upon future technological advances. The technical capabilities and biological knowledge needed to achieve them are available now. Several of these milestones would improve and evolve, but they could be started now, and substantial progress could be made within 5 years.


Including dual-use concerns, as discussed below. (See also Appendix L.)


“The list should be either reduced or stratified so that biosecurity measures can be more easily applied by the registered entities according to the level of risk” and “Perform a risk assessment for each select agent and toxin on the BSAT list and develop a stratification scheme that includes biodefense and biosecurity criteria, as well as risk to public health, so that security measures may be implemented based upon risk” (NSABB 2009b). Report of the Working Group on Strengthening the Biosecurity of the United States.


“RECOMMENDATION 3: The list of select agents and toxins should be stratified in risk groups according to the potential use of the agent as a biothreat agent, with regulatory requirements and procedures calibrated against such stratification. Importantly, mechanisms for timely inclusion or removal of an agent or toxin from the list are necessary and should be developed (NRC 2009b).”


The committee wants to be clear that implementation of a classification system is not a reason to subtract specific agents from or add specific agents to the Select Agent list. Rather, implementation of a sequence based system is a benefit of reducing the list.


This is consistent with the National Strategy for Countering Biological Threats: “The objectives of our Strategy [include] . . . Promote global health security: Activities that should be taken to increase the availability of and access to knowledge and products of the life sciences that can help reduce impacts of outbreaks of infectious disease whether of natural, accidental, or deliberate origin.” NSC (2009). National strategy for countering biological threats. Washington, D.C., National Security Council.


“Many of the parts are undefined”; “The circuitry is unpredictable”; “The complexity is unwieldy”; “Many parts are incompatible”; and “Variability crashes the system” (Kwok 2010).

Copyright © 2010, National Academy of Sciences.
Bookshelf ID: NBK50870

