Environ Sci Technol. 2003 Oct 15;37(20):4554-60.

Statistical evaluation of bacterial source tracking data obtained by rep-PCR DNA fingerprinting of Escherichia coli.


Environmental Science & Engineering Division and Department of Mathematical & Computer Sciences, Colorado School of Mines, Golden, Colorado 80401, USA.


Pattern recognition has been applied to environmental systems for identification of numerous pollution sources including aerosolized lead and petroleum hydrocarbons. In recent years, DNA fingerprinting has gained widespread application as a means to characterize genetic variations for such purposes as microbial source tracking. This approach, however, is strongly dependent on the statistical and image analyses applied. Several statistical analyses of rep-PCR DNA fingerprints were assessed as a means to differentiate between potential sources of fecal contamination. GelCompar II and methods based on penalized discriminant analysis (PDA) and k-nearest neighbors (KNN) classification procedures were used to differentiate between 10 source groups within a library containing DNA fingerprints of 548 Escherichia coli isolates from known human and nonhuman sources. KNN performed significantly better than PDA in a jackknife analysis, though the library was not large enough to detect significant differences between GelCompar II and the other two methods. GelCompar II and KNN both attained ≥90% correct classification in a holdout procedure. In addition, interpoint distance analyses indicate coherency within source groups, while library randomization demonstrated that KNN does not create artificial groupings. This investigation stresses the need to understand limitations of statistical analyses used in pattern recognition of DNA fingerprints.
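To make the evaluation procedures in the abstract more concrete, here is a minimal sketch of KNN source classification with jackknife (leave-one-out) and holdout scoring, plus a label-randomization check. It is not the authors' procedure and does not cover GelCompar II or PDA. It assumes, hypothetically, that each rep-PCR fingerprint has been reduced to a binary band-presence vector and that isolates are compared with Jaccard distance. The function names and toy vectors are illustrative only and are not drawn from the 548-isolate library.

```python
import random
from collections import Counter
from typing import List, Sequence

# Hypothetical representation: 1 = band present at a gel position, 0 = absent.
Fingerprint = Sequence[int]


def jaccard_distance(a: Fingerprint, b: Fingerprint) -> float:
    """1 - |shared bands| / |bands in either|; 0.0 when both have no bands."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def knn_predict(library: List[Fingerprint], labels: List[str],
                query: Fingerprint, k: int = 3) -> str:
    """Assign the source group held by most of the k nearest library isolates."""
    if not library or k < 1:
        raise ValueError("need a non-empty library and k >= 1")
    ranked = sorted(range(len(library)),
                    key=lambda i: jaccard_distance(library[i], query))
    nearest = ranked[:k]
    votes = Counter(labels[i] for i in nearest)
    top = max(votes.values())
    # Tie-break: among tied groups, pick the one whose member is closest.
    return next(labels[i] for i in nearest if votes[labels[i]] == top)


def jackknife_accuracy(library: List[Fingerprint], labels: List[str],
                       k: int = 3) -> float:
    """Leave-one-out: classify each isolate against all the others."""
    correct = 0
    for i in range(len(library)):
        rest = library[:i] + library[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        correct += knn_predict(rest, rest_labels, library[i], k) == labels[i]
    return correct / len(library)


def holdout_accuracy(train: List[Fingerprint], train_labels: List[str],
                     test: List[Fingerprint], test_labels: List[str],
                     k: int = 3) -> float:
    """Classify a held-out set of isolates against the remaining library."""
    hits = sum(knn_predict(train, train_labels, q, k) == y
               for q, y in zip(test, test_labels))
    return hits / len(test)


def randomized_jackknife(library: List[Fingerprint], labels: List[str],
                         k: int = 3, seed: int = 0) -> float:
    """Jackknife accuracy after shuffling source labels. If the classifier is
    not inventing groupings, this is expected to drop toward chance."""
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return jackknife_accuracy(library, shuffled, k)


if __name__ == "__main__":
    # Synthetic toy fingerprints for two hypothetical source groups.
    human = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0],
             [1, 1, 0, 1, 0, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 0]]
    cattle = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 0],
              [0, 0, 0, 0, 1, 1, 0, 1], [0, 0, 0, 0, 0, 1, 1, 1]]
    library = human + cattle
    labels = ["human"] * 4 + ["cattle"] * 4
    print(jackknife_accuracy(library, labels, k=3))
    print(randomized_jackknife(library, labels, k=3, seed=1))
```

With well-separated toy groups the jackknife accuracy is 1.0, while the shuffled-label run shows how the randomization check is set up; real fingerprint libraries with 10 overlapping source groups would behave quite differently.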

[Indexed for MEDLINE]
