Sampath R, Ecker DJ.

We describe a novel approach for surveillance of emerging infectious diseases that can be used for rapid and broad identification of the causative agents of infectious disease. The premise of our technology is that we can provide rapid, sensitive, and cost-effective detection of a broad range of “normal” pathogenic organisms and simultaneously diagnose disease caused by a biological weapon or an unexpected emerging infectious organism. This broad-function technology may be the only practical way to rapidly diagnose diseases caused by a bioterrorist attack or emerging infectious diseases that otherwise might be missed or mistaken for a more common infection.

According to a recent review (Taylor et al., 2001), more than 1,400 organisms are infectious to humans. This number does not include the numerous strain variants of each organism, bioengineered versions, or pathogens that infect plants or animals. Paradoxically, most of the new technology being developed for detection of infectious agents incorporates a version of quantitative PCR, which relies on highly specific primers and probes designed to selectively detect specific pathogenic organisms. This approach requires assumptions about the type and strain of bacterium or virus being sought. Experience has shown that it is very difficult to anticipate where the next emerging infectious agent might come from, as the outbreak of SARS early in 2003 demonstrated. An alternative to single-agent tests is broad-range consensus priming of a gene target conserved across groups of organisms (Kroes et al., 1999; Oberste et al., 2000, 2001, 2003). The drawback of this approach for unknown-agent detection and epidemiology is that broad primers amplify a mixture of products from all the organisms present, so analysis of the PCR products requires cloning and sequencing hundreds to thousands of colonies per sample, which is impractical to do rapidly or on a large number of samples. New approaches to the parallel detection of multiple infectious agents include multiplexed PCR methods (Brito et al., 2003; Fout et al., 2003) and microarray strategies (Wang et al., 2002, 2003; Wilson et al., 2002). Microarray strategies are promising because previously undiscovered organisms might be detected by hybridization to probes on the array that were designed to bind conserved regions of known families of bacteria and viruses.

Here we present an alternative: a universal pathogen-sensing approach for high-throughput detection of infectious organisms that is capable of identifying previously undiscovered organisms (see Figure 4-1).

FIGURE 4-1. Overview of the universal pathogen sensor for the detection of a diverse mixture of microbial organisms present in a sample. Genomic DNA, or cDNA obtained by batch reverse transcription of RNA, from each sample is amplified using broad-range PCR primers.

Our strategy is based on the principle that, despite the enormous diversity of microbes, all forms of life on earth share sets of essential common features in the biomolecules encoded in their genomes. Bacteria, for example, have highly conserved sequences at a variety of locations in their genomes. Most notable are the universally conserved regions of the ribosomal RNAs, but there are also conserved elements in other noncoding RNAs, including RNase P and the signal recognition particle, among others. There are also conserved motifs in essential protein-encoding genes, in viruses as well as bacteria. Using such broad-range priming targets for PCR across the broadest possible grouping of organisms, followed by electrospray ionization mass spectrometry for accurate mass measurement, enables us to determine the base composition (the numbers of A, G, C, and T nucleotides) of each PCR amplicon. The measured base compositions from strategically selected locations in the genome are used as a signature to identify and distinguish the organisms present in the original sample. An important feature of our primer design strategy is the placement of propynylated nucleotides (5-propynyl deoxycytidine and deoxythymidine) at highly conserved sequence positions, which enables priming of short consensus regions and significantly broadens the range of organisms that can be amplified (Barnes and Turner, 2001a,b; Wagner et al., 1993). Furthermore, we use multiple target sites spread across different parts of the genome to add resolution and lower the risk of missed detections.
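
To illustrate how an accurate mass constrains base composition, the Python sketch below enumerates every (A, G, C, T) composition whose calculated mass falls within a tolerance of a measured value. It is a toy under stated assumptions, not the procedure used in our system: it assumes the spectrum has already been deconvolved to the neutral monoisotopic mass of a single strand with 5'-OH and 3'-OH ends, uses standard monoisotopic nucleotide residue masses, and treats the length window and the 10 ppm tolerance as illustrative parameters. The function names are hypothetical.

```python
# Monoisotopic masses (Da) of DNA nucleotide residues (dNMP minus H2O) as they
# occur within a strand; standard values, assumed here for illustration.
RESIDUE_MASS = {"A": 313.05761, "G": 329.05252, "C": 289.04637, "T": 304.04604}
# A sum of n residues counts one phosphate too many for a strand with 5'-OH and
# 3'-OH ends: subtract HPO3 (79.96633) and add back H2O (18.01056).
END_CORRECTION = -79.96633 + 18.01056
BASES = "AGCT"

def base_composition(seq):
    """Return the (A, G, C, T) counts of a DNA sequence."""
    seq = seq.upper()
    return tuple(seq.count(b) for b in BASES)

def composition_mass(comp):
    """Neutral monoisotopic mass of one strand with composition (A, G, C, T)."""
    return sum(n * RESIDUE_MASS[b] for n, b in zip(comp, BASES)) + END_CORRECTION

def candidate_compositions(measured_mass, lengths, tol_ppm=10.0):
    """Brute-force every (A, G, C, T) of the given lengths whose calculated
    mass lies within tol_ppm parts per million of measured_mass."""
    tol = measured_mass * tol_ppm * 1e-6
    hits = []
    for n in lengths:
        for a in range(n + 1):
            for g in range(n + 1 - a):
                for c in range(n + 1 - a - g):
                    comp = (a, g, c, n - a - g - c)
                    if abs(composition_mass(comp) - measured_mass) <= tol:
                        hits.append(comp)
    return hits

# Usage with a made-up 50-base amplicon standing in for a measured strand.
amplicon = "GACTTAGCCATGGACTTCAGGTACCGATTGCAAGCTTGGCATCGATGCAA"
measured = composition_mass(base_composition(amplicon))
window = range(len(amplicon) - 2, len(amplicon) + 3)
print(base_composition(amplicon), round(measured, 4))
print(candidate_compositions(measured, window))
```

Because more than one composition can fall inside a given tolerance, measuring the complementary strand, whose composition is the original with the A and T counts swapped and the G and C counts swapped, would add a second independent constraint.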

A key to developing a practical broad-priming technology is the ability to characterize signals produced by infectious organisms against a background that may contain an excess of harmless organisms. Cloning and exhaustively sequencing many colonies can solve this problem, but not in a rapid diagnostic device. Our strategic breakthrough was the use of mass spectrometry to analyze the products of broad-range PCR. Mass spectrometry is remarkably sensitive and can measure the mass, and hence determine the base composition, of small quantities of nucleic acids in a complex mixture, with a throughput of about one sample per minute. The ability to detect and determine the base compositions of many PCR amplicons in a mixed sample enables analysis and identification of broad-range PCR products essentially in real time. In contrast to cloning and sequencing, the information product of the mass spectrometer is base composition. Although the base composition of a gene fragment is not as information rich as its sequence, a base-composition signature can be thought of as a unique index of a specific gene in a specific organism. Our detection algorithm searches a database that links the sequence of each target region in each organism to a composition signature, so that the presence of an organism can be inferred from the presence of its signature.
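
The database search can be pictured as an index from (primer pair, base composition) to the organisms that produce that signature. The Python below is a minimal illustration, not our detection algorithm: the organism names, primer-pair labels, and 8-base "amplicons" are hypothetical toy data, and the score is simply the fraction of an organism's primer pairs whose expected signature was observed. Signatures that match nothing in the index are returned separately rather than discarded.

```python
from collections import defaultdict

def base_composition(seq):
    """Return the (A, G, C, T) counts of a DNA sequence."""
    seq = seq.upper()
    return tuple(seq.count(b) for b in "AGCT")

def build_signature_index(reference):
    """Map (primer_pair, composition) -> set of organisms with that signature.

    reference: {organism: {primer_pair: amplicon_sequence}} (hypothetical format).
    """
    index = defaultdict(set)
    for organism, amplicons in reference.items():
        for primer_pair, seq in amplicons.items():
            index[(primer_pair, base_composition(seq))].add(organism)
    return index

def infer_organisms(index, reference, observed):
    """Score organisms from the signatures observed in one sample.

    Returns ({organism: fraction of its primer pairs whose signature was seen},
             [observed signatures with no database match]).
    """
    seen = defaultdict(set)
    unmatched = []
    for primer_pair, comp in observed:
        organisms = index.get((primer_pair, comp))
        if not organisms:
            unmatched.append((primer_pair, comp))
            continue
        for organism in organisms:
            seen[organism].add(primer_pair)
    scores = {org: len(pps) / len(reference[org]) for org, pps in seen.items()}
    return scores, unmatched

# Hypothetical toy reference database: two primer pairs, three organisms.
REFERENCE = {
    "organism_X": {"PP1": "AAGGCCTT", "PP2": "AGAGCTCT"},
    "organism_Y": {"PP1": "ATGCATGC", "PP2": "GGGGCCTA"},
    "organism_Z": {"PP1": "AAGGCCTA", "PP2": "AGAGCTCA"},
}

# Signatures "measured" in a mixed sample; the last one is not in the database.
SAMPLE = [("PP1", (2, 2, 2, 2)), ("PP2", (2, 2, 2, 2)),
          ("PP1", (3, 2, 2, 1)), ("PP2", (3, 2, 2, 1)),
          ("PP1", (5, 1, 1, 1))]

index = build_signature_index(REFERENCE)
scores, unmatched = infer_organisms(index, REFERENCE, SAMPLE)
print(scores, unmatched)
```

In this toy sample organism_Y receives a partial score only because its PP1 signature happens to coincide with organism_X's, which shows why signatures from several target sites are combined before an organism is called.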

During the SARS outbreak in early 2003, we demonstrated that this paradigm of identifying microbial nucleic acid signatures by mass spectrometry could be adapted to identify the SARS virus. Because no SARS genome sequence was available at the onset of the epidemic, we tested clinical isolates obtained from the Centers for Disease Control and Prevention (CDC) with primer pairs designed to broadly target all other known coronaviruses. We showed that the SARS virus could potentially be identified directly from a patient sample, obviating the need for time-consuming viral culture. We further showed that this method could distinguish SARS from other known coronaviruses, including the human coronaviruses 229E and OC43. Although direct comparisons of sensitivity between this and other SARS detection methods have yet to be conducted using actual patient samples, we did show, using titered SARS virus spiked into human serum, that we could obtain PCR sensitivities of <1 PFU, which is consistent with our previous experience. The details of this study will be published elsewhere (Sampath et al., in preparation).

One limitation of our approach is that base compositions, like sequences, vary slightly from isolate to isolate within a species. We have shown that it is possible to manage this diversity by building probability “clouds” around the composition constraints for each species. This permits identification of organisms in a fashion similar to sequence analysis, albeit with somewhat lower resolution. It may seem counterintuitive that base composition has enough resolving power to distinguish organisms (one might expect sequences from different organisms to degenerate to similar, overlapping compositions). A rigorous mathematical analysis has shown, however, that base composition retains more than sufficient information to solve the problem, provided the target sequences are strategically selected. It is important to note that, in contrast to probe-based techniques, determining base composition by mass spectrometry does not require prior knowledge of the composition to make the measurement, only to interpret the results. In this regard our strategy resembles DNA sequencing and phylogenetic analysis, but at lower resolution. Nevertheless, this resolution is more than sufficient for most rapid diagnostic applications, such as identifying any organism or classifying organisms into known phylogenetic groupings (Sampath et al., in preparation).
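
Because the construction of the probability clouds is not specified here, the Python sketch below substitutes a simpler, deterministic stand-in to convey the idea: each species is represented by the compositions of its known isolates at one primer pair, a measurement is assigned to the nearest species if it lies within a fixed number of base changes of any known variant, and otherwise it is reported as unknown. The distance function, the radius of 1, and the example compositions are all assumptions made for illustration.

```python
def composition_distance(c1, c2):
    """Minimum number of single-base substitutions, insertions, or deletions
    needed to turn composition c1 into c2 (both (A, G, C, T) tuples).

    A substitution moves one count between bases; any remaining imbalance
    must come from insertions or deletions, so the total is max(gained, lost).
    """
    diffs = [b - a for a, b in zip(c1, c2)]
    gained = sum(d for d in diffs if d > 0)
    lost = -sum(d for d in diffs if d < 0)
    return max(gained, lost)

def classify(measured, clouds, radius=1):
    """Assign a measured composition to the nearest species cloud.

    clouds: {species: [compositions of known isolates]} for one primer pair.
    Returns (species, distance), or ("unknown", distance) when the measurement
    lies more than `radius` base changes from every known variant.
    """
    best_species, best_dist = None, None
    for species, variants in clouds.items():
        d = min(composition_distance(v, measured) for v in variants)
        if best_dist is None or d < best_dist:
            best_species, best_dist = species, d
    if best_dist is None or best_dist > radius:
        return "unknown", best_dist
    return best_species, best_dist

# Hypothetical clouds for a single primer pair.
CLOUDS = {
    "species_1": [(30, 25, 20, 25), (31, 25, 20, 24)],
    "species_2": [(22, 31, 28, 19)],
}

print(classify((30, 25, 21, 24), CLOUDS))  # one substitution from species_1
print(classify((26, 28, 24, 22), CLOUDS))  # far from every known variant
```

In a scheme like this, an "unknown" call is the case of most interest for surveillance: the amplicon was produced by primers targeting a known group, yet its composition matches no known member of that group.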

We envision applications in which human clinical samples are analyzed simultaneously for diagnostically relevant levels of disease-causing agents and biological weapons. We envision the technology being used in reference labs, hospitals, and the Laboratory Response Network (LRN) laboratories of the public health system in a coordinated fashion, with results reported via a computer network to a common data-monitoring center in real time. Clonal propagation of specific infectious agents, as occurs in an epidemic outbreak, can be tracked with base-composition signatures, analogous to the pulsed-field gel electrophoresis fingerprinting patterns used to track the spread of specific foodborne pathogens in the CDC PulseNet system (Swaminathan et al., 2001). Effectively, our technology provides a digital barcode in the form of a series of base-composition signatures, the combination of which is unique for each organism. This capability enables real-time infectious disease monitoring across broad geographic areas, which may be essential in the event of simultaneous outbreaks or attacks in different cities.
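
The digital barcode idea can be illustrated by concatenating the compositions from a fixed panel of primer pairs into a tuple and grouping reports from different sites by identical tuples. The Python below is a hypothetical sketch: the panel, site names, sample identifiers, and compositions are invented, and it uses exact matching, whereas real isolate-to-isolate variation would call for tolerating small composition differences.

```python
from collections import defaultdict

# Hypothetical fixed panel: every site reports the same primer pairs in order.
PRIMER_PAIRS = ("PP1", "PP2", "PP3")

def barcode(signatures):
    """Concatenate per-primer-pair compositions into one hashable barcode.

    signatures: {primer_pair: (A, G, C, T)}; a missing primer pair becomes None.
    """
    return tuple(signatures.get(pp) for pp in PRIMER_PAIRS)

def shared_barcodes(reports, min_sites=2):
    """Return {barcode: sorted site names} for barcodes reported by at least
    min_sites distinct sites. reports: iterable of (site, sample_id, signatures)."""
    sites_by_code = defaultdict(set)
    for site, _sample_id, signatures in reports:
        sites_by_code[barcode(signatures)].add(site)
    return {code: sorted(sites) for code, sites in sites_by_code.items()
            if len(sites) >= min_sites}

# Invented reports sent to a central monitoring node.
REPORTS = [
    ("city_A", "s1", {"PP1": (30, 25, 20, 25), "PP2": (22, 31, 28, 19), "PP3": (18, 20, 21, 16)}),
    ("city_B", "s7", {"PP1": (30, 25, 20, 25), "PP2": (22, 31, 28, 19), "PP3": (18, 20, 21, 16)}),
    ("city_B", "s8", {"PP1": (30, 25, 20, 25), "PP2": (22, 31, 28, 19), "PP3": (18, 21, 20, 16)}),
    ("city_C", "s3", {"PP1": (29, 26, 20, 25), "PP2": (22, 31, 28, 19), "PP3": (18, 20, 21, 16)}),
]

print(shared_barcodes(REPORTS))
```

Here only the barcode reported by both city_A and city_B is flagged; the other samples differ at one primer pair and are therefore treated as distinct clones.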


The methodology described here is being developed jointly by Ibis and Science Applications International Corporation (SAIC) under a Defense Advanced Research Projects Agency (DARPA)-sponsored program known as TIGER. A detailed description of the technology will be published separately. More than 25 key participants who contributed significantly to the development and implementation of various aspects of the technology are not listed individually by name.