Send to

Choose Destination
Biol Direct. 2018 May 22;13(1):10. doi: 10.1186/s13062-018-0215-8.

Unraveling bacterial fingerprints of city subways from microbiome 16S gene profiles.

Author information

Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, USA.
Department of Biostatistics, University of Florida, 2004 Mowry Rd, Gainesville, FL, 32610, USA.



Microbial communities can be location specific, and the abundance of species within locations can influence our ability to determine whether a sample belongs to one city or another. As part of the 2017 CAMDA MetaSUB Inter-City Challenge, next generation sequencing (NGS) data was generated from swipe samples collected from subway stations in Boston, New York City hereafter New York, and Sacramento. DNA was extracted and Illumina sequenced. Sequencing data was provided for all cities as part of 2017 CAMDA contest challenge dataset.


Principal component analysis (PCA) showed clear clustering of the samples for the three cities, with a substantial proportion of the variance explained by the first three components. We ran two different classifiers and results were robust for error rate (< 6%) and accuracy (> 95%). The analysis of variance (ANOVA) demonstrated that overall, bacterial composition across the three cities is significantly different. A similar conclusion was reached using a novel bootstrap based test using diversity indices. Last but not least, a co-abundance association network analyses for the taxonomic levels "order", "family", and "genus" found different patterns of bacterial networks for the three cities.


Bacterial fingerprint can be useful to predict sample provenance. In this work prediction of provenance reported with over 95% accuracy. Association based network analysis, emphasized similarities between the closest cities sharing common bacterial composition. ANOVA showed different patterns of bacterial amongst cities, and these findings strongly suggest that bacterial signature across multiple cities are different. This work advocates a data analysis pipeline which could be followed in order to get biological insight from this data. However, the biological conclusions from this analysis is just an early indication out of a pilot microbiome data provided to us through CAMDA 2017 challenge and will be subject to change as we get more complete data sets in the near future. This microbiome data can have potential applications in forensics, ecology, and other sciences.


This article was reviewed by Klas Udekwu, Alexandra Graf, and Rafal Mostowy.


Bacterial 16S gene; Classifier; Machine learning; Microbiome; Network analysis; PCA

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for BioMed Central Icon for PubMed Central
Loading ...
Support Center