Microbiota-based classification of infertility due to endometriosis. (a, b, c, d, e) Distribution of the cross-validation error over five trials of 10-fold cross-validation in random forest classification of fertile versus infertile samples as the number of OTUs increases (a, CL; b, CU; c, CV; d, ET; e, PF). The model was trained on the relative abundances of the OTUs present in at least 10% of the samples (n = 16 without endometriosis (fertile), 32 with endometriosis (infertile)). Subjects with endometriosis who had given birth, and subjects who were infertile for reasons other than endometriosis, e.g. salpingemphraxis, were not analysed (Supplementary Data ). The pink lines show the five individual trials and the red curve shows their average. The grey line marks the number of OTUs in the optimal set. (f, g, h, i, j) Receiver operating characteristic (ROC) curves for the cross-validated sample sets (f, CL; g, CU; h, CV; i, ET; j, PF). The areas under the ROC curve (AUC) are 0.8272, 0.5919, 0.8493, 0.8304 and 0.8613, respectively. The 95% confidence intervals (CI) are shown as shaded areas. The diagonal lines mark an AUC of 0.5 (i.e. random classification).
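As a minimal illustration of two quantities named in this caption, the sketch below shows one way to apply a 10% prevalence filter to OTUs and to compute an AUC from classifier scores. It is not the authors' pipeline: the function names `prevalence_filter` and `roc_auc`, the label coding (1 = infertile, 0 = fertile), the rule that "present" means a relative abundance above zero, and the toy numbers are all assumptions made for illustration. The random forest itself is left out.

```python
def prevalence_filter(abundance, min_fraction=0.10):
    """Keep OTUs detected in at least `min_fraction` of the samples.

    abundance: dict mapping an OTU name to a list of relative abundances,
    one value per sample. "Detected" is assumed to mean abundance > 0.
    """
    kept = {}
    for otu, values in abundance.items():
        n_present = sum(1 for v in values if v > 0)
        if n_present / len(values) >= min_fraction:
            kept[otu] = values
    return kept


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    This is the rank-based (Mann-Whitney) form of the AUC; tied pairs
    count as one half. Labels are assumed to be 1 (infertile) or 0 (fertile).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 20 samples, two OTUs.
toy = {
    "otu_common": [0.02, 0.01, 0.03] + [0.0] * 17,  # detected in 3 of 20 samples
    "otu_rare": [0.01] + [0.0] * 19,                # detected in 1 of 20 samples
}
filtered = prevalence_filter(toy)  # only "otu_common" passes the 10% threshold

# Toy classifier scores: 5 of the 6 positive-negative pairs are ranked correctly.
auc = roc_auc([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.1])

# Uninformative scores tie every pair, giving the diagonal value of 0.5.
auc_random = roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
```

The pairwise form makes the meaning of the diagonal explicit: a score that carries no information about the class orders a positive above a negative only half the time, so the AUC is 0.5.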