J Clin Microbiol. Nov 2001; 39(11): 4190–4192.

Determining Confidence Intervals When Measuring Genetic Diversity and the Discriminatory Abilities of Typing Methods for Microorganisms


We describe here a method for determining confidence intervals for a commonly used index of diversity. This approach facilitates the comparison of the genetic population structure of microorganisms isolated from different environments and improves the objective assessment of the discriminatory power of typing techniques.

The discrimination of organisms on the basis of variable phenotypic or genetic markers is still the mainstay of quantitative microbial ecology and descriptive epidemiology. To determine the diversity of microorganisms in defined environments (ecosystems) or to identify the reproductive success of disease-causing organisms, i.e., the spread of particular strains between hosts, genetic typing techniques are deployed that can distinguish diverse organisms of the same species. Importantly, when one is comparing the diversity of a single species between different ecosystems or comparing the various typing methods used to resolve such differences, a robust statistical approach is required that allows an objective assessment. To this end, indices of diversity have been defined mathematically that are based on the frequency with which organisms of a particular type occur in a population or can be discriminated by a given typing tool (3, 4, 5). Individuals of a population will belong to one of Z types and will occur with frequencies of π1 … πZ such that Σπj = 1. For microorganisms, which usually have a very large population size, the genetic diversity (λ) can be described as λ = 1 − Σπj², which is the probability that two individuals chosen at random will be of different types.
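As a small illustration of the definition above, the following Python sketch computes λ from a set of known type frequencies. The function name `true_diversity` and the example frequencies are assumptions made for illustration only; they do not come from the study.

```python
def true_diversity(frequencies):
    """Return lambda = 1 - sum(pi_j^2) for population type frequencies.

    `frequencies` is a hypothetical list of the proportions pi_1 ... pi_Z,
    which must sum to 1.
    """
    if abs(sum(frequencies) - 1.0) > 1e-9:
        raise ValueError("type frequencies must sum to 1")
    return 1.0 - sum(p * p for p in frequencies)


# Illustrative population with three types (made-up frequencies).
example_lambda = true_diversity([0.5, 0.3, 0.2])
print(round(example_lambda, 2))  # 0.62
```

With these made-up frequencies, two individuals drawn at random would be of different types about 62% of the time.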

Inferences on the diversity of the population involve a sampling process. The index of diversity D, as defined by Simpson (5) and more recently used to assess the discriminatory power of typing techniques (2, 6), is an unbiased estimate of the true diversity λ of a population based on a sample of n individuals. Simply by chance, different samples will give different results because of sampling variation, and drawing repeated samples improves the precision of the mean estimate for D. If repeated samples of a fixed size n are drawn from the sample population, the values of D will be distributed about λ with the variance σ² (5):

$$\sigma^2 = \frac{4}{n}\left[\sum_{j} \pi_j^3 - \left(\sum_{j} \pi_j^2\right)^2\right]$$

where πj is the frequency nj/n, nj is the number of strains belonging to the jth type, and n is the total number of strains in the sample population. An estimate of the standard deviation of D is given by the square root of σ², and we propose the following as an approximate 95% confidence interval (CI):

$$\mathrm{CI} = \left[D - 2\sqrt{\sigma^2},\; D + 2\sqrt{\sigma^2}\right]$$
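The calculation can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' own implementation: D is computed in the finite-sample form 1 − Σnj(nj − 1)/[n(n − 1)] associated with Simpson (5) and Hunter and Gaston (2), the variance uses the sample frequencies πj = nj/n, and the interval is D plus or minus two standard deviations. The function name `simpson_diversity_ci` and the example counts are hypothetical.

```python
from math import sqrt


def simpson_diversity_ci(type_counts):
    """Return (D, variance, (lower, upper)) for a list of per-type counts.

    Assumption: D is Simpson's index in its unbiased finite-sample form,
    and the approximate 95% CI is D +/- 2 * sqrt(variance).
    """
    n = sum(type_counts)
    if n < 2:
        raise ValueError("at least two isolates are required")

    # Simpson's index of diversity (finite-sample form).
    d = 1.0 - sum(c * (c - 1) for c in type_counts) / (n * (n - 1))

    # Variance: (4 / n) * [sum(pi_j^3) - (sum(pi_j^2))^2], with pi_j = n_j / n.
    freqs = [c / n for c in type_counts]
    sum_sq = sum(p ** 2 for p in freqs)
    sum_cu = sum(p ** 3 for p in freqs)
    # Guard against a tiny negative value caused by floating-point rounding.
    variance = max((4.0 / n) * (sum_cu - sum_sq ** 2), 0.0)

    half_width = 2.0 * sqrt(variance)
    return d, variance, (d - half_width, d + half_width)


# Made-up sample: 10 isolates falling into three types.
d, var, (lower, upper) = simpson_diversity_ci([5, 3, 2])
print(f"D = {d:.3f}, CI = {lower:.3f} to {upper:.3f}")
```

The upper bound is not truncated at 1 in this sketch; for very diverse samples one may wish to report it capped at 100%.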

We have applied these equations to determine confidence intervals (i) when assessing the genetic diversity of Staphylococcus aureus isolated from healthy carriers in the community as opposed to hospitalized patients and (ii) when comparing the discriminatory power of macrorestriction analysis by using SmaI restriction patterns with that of RAPD [random(ly) amplified polymorphic DNA] typing.

By using the same sampling frame, healthy individuals in the community and inpatients of the same age group who had stayed at the University Hospital in Nottingham for more than 3 weeks were sampled by use of swabs taken from the anterior nares. Carriage strains of S. aureus obtained from the community were genotyped by SmaI macrorestriction analysis, as well as by RAPD typing, by using two different primers. Carriage strains from hospitalized patients were typed by macrorestriction alone. Macrorestriction and RAPD analyses were performed by standard published protocols (7; Harmony [http://www.phls.co.uk/International/Harmony/microtyping.htm]), and genotypes were classified according to conventional criteria, with more than two band differences defining separate genotypes or a cutoff value of 70% for Pearson correlation coefficients discriminating genotypes in the RAPD approach (1, 8).

Among the 117 carriage strains from the community, 57 types were distinguished by macrorestriction analysis. Of 117 carriage strains obtained from the hospital population, 55 types were distinguished. The distribution and type frequencies are presented in Tables 1 and 2. The genetic diversity (D) of carriage strains in the community was 97.6%, with a CI of 96.8 to 98.5%, and for carriage strains from hospital patients (D) it equalled 89.5%, with a CI of 84.4 to 94.7% (the CI values did not overlap).

Table 1. Frequency of PFGE types among community carriage strains of S. aureus

Table 2. Frequency of PFGE types among hospital carriage strains of S. aureus

We also compared the discriminatory ability of macrorestriction analysis with RAPD typing for the sample of community carriage isolates. While macrorestriction analysis identified 57 SmaI restriction patterns, 26 types could be discriminated when RAPD results generated by different primers were combined (Table 3). The index of diversity based on the combined RAPD grouping was 89.9%, and the CI was 86.5 to 93.3%.

Table 3. Frequency of RAPD types among community carriage strains of S. aureus
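The comparisons reported above rest on whether two confidence intervals overlap. A minimal Python sketch of that check, using the CI bounds quoted in the text (in percent), is shown below. The function name `intervals_overlap` and the variable names are hypothetical, and the check is the simple comparison described here rather than a formal hypothesis test.

```python
def intervals_overlap(a, b):
    """Return True if closed intervals a = (lo, hi) and b = (lo, hi) overlap."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi


# CI bounds (percent) as reported in the text.
community_macrorestriction = (96.8, 98.5)
hospital_macrorestriction = (84.4, 94.7)
community_rapd = (86.5, 93.3)

# Community versus hospital carriage strains, both typed by macrorestriction.
print(intervals_overlap(community_macrorestriction, hospital_macrorestriction))  # False

# Macrorestriction versus RAPD typing of the same community isolates.
print(intervals_overlap(community_macrorestriction, community_rapd))  # False
```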

For nominal scale data, such as phenotypic or genetic markers, there is no mean or median that could serve as a reference for a central tendency or a measure of variability. Instead, we can invoke the concept of diversity to describe the distribution of observations among categories. We suggest use of the standard deviation for Simpson's index of diversity as a measure of dispersion around the true diversity, λ. Two standard deviations on either side of the measured value should include roughly 95% of the expected distribution of the sample estimate and thus provide an approximate measure of the confidence with which a diversity index has been estimated. For highly diverse populations or typing techniques with a very high ability to discriminate genotypes, the number of classes (genotypes) will increase with increasing sample size. Thus, the chance that two randomly sampled isolates differ will increase, i.e., the index of diversity becomes dependent on the sample size. Therefore, it is important to compare samples of roughly the same size as long as the number of classes depends on the sample size.

Using this algorithm, we concluded that the observed difference in genetic diversity, with nonoverlapping CIs, between community and hospital carriage strains of S. aureus reflects two truly distinct population structures shaped by differential ecological constraints. Likewise, we have measured the discriminatory capacity of RAPD and macrorestriction analysis with sufficient precision and can be confident that the inability of RAPD typing to achieve the same degree of discrimination as macrorestriction analysis is an inherent property of these methods. We believe that an estimation of confidence intervals when calculating the index of diversity greatly aids the comparison of genetic diversity in different environments, as well as the ability to objectively address the discriminatory potential of diverse typing systems.


1. Grundmann H J, Towner K J, Dijkshoorn L, Gerner-Smidt P, Maher M, Seifert H, Vaneechoutte M. Multicenter study using standardized protocols and reagents for evaluation of reproducibility of PCR-based fingerprinting of Acinetobacter spp. J Clin Microbiol. 1997;35:3071–3077.
2. Hunter P R, Gaston M A. Numerical index of the discriminatory ability of typing systems: an application of Simpson's index of diversity. J Clin Microbiol. 1988;26:2465–2466.
3. Li W H, Graur D. Genetic polymorphisms. In: Li W H, Graur D, editors. Fundamentals of molecular evolution. Sunderland, Mass: Sinauer Associates; 1991. pp. 35–40.
4. Shannon C E. A mathematical theory of communication. Bell Syst Tech J. 1948;27:376–423, 623–656.
5. Simpson E H. Measurement of diversity. Nature. 1949;163:688.
6. Struelens M J, the European Study Group on Epidemiological Markers of the European Society for Clinical Microbiology and Infectious Diseases. Consensus guidelines for appropriate use and evaluation of microbial epidemiologic typing systems. Clin Microbiol Infect. 1996;2:2–11.
7. Tambic A, Power E G, Anthony R M, French G L. Analysis of an outbreak of non-phage-typeable methicillin-resistant Staphylococcus aureus by using a randomly amplified polymorphic DNA assay. J Clin Microbiol. 1997;35:3092–3097.
8. Tenover F C, Arbeit R D, Goehring R V, Mickelsen P A, Murray B E, Persing D H, Swaminathan B. Interpreting chromosomal DNA restriction patterns produced by pulsed-field gel electrophoresis: criteria for bacterial strain typing. J Clin Microbiol. 1995;33:2233–2239.

Articles from Journal of Clinical Microbiology are provided here courtesy of American Society for Microbiology (ASM)