Recovery of clusters is influenced by a measure's robustness to outlying basal lineages. The quantitative Bray-Curtis (**a**–**c**), qualitative Soergel (**d**–**f**), and qualitative Pearson dissimilarity (**g**–**i**) measures were applied to the human data set. (**a**, **d**, **g**) All three measures revealed three clusters: a stool cluster, an oral cluster, and a mixed navel and hair cluster. Adding an outlying basal lineage to half of the samples did not substantially affect the Bray-Curtis (**b**: 5% of sequences assigned to the outlying lineage) or uSoergel (**e**) measures, but obscured the underlying biological clusters for the uPearson dissimilarity (**h**) measure. For qualitative measures, a single sequence is sufficient to include the outlying lineage, and adding further sequences does not change the result. Each data point in the scatter plots (**c**, **f**, **i**) indicates the dissimilarity measured between a pair of samples before (*x*-axis) and after (*y*-axis) adding sequences to the outlying lineage. For all measures, adding the outlying lineage caused pairs of samples in which both contained the outlying lineage to become more similar (outlier–outlier) and pairs in which only one sample contained it to become less similar (outlier–original). Pairs of samples in which neither contained the outlying lineage were unaffected (original–original). However, the degree to which the outlier–outlier and outlier–original pairs were affected depended on the measure used. Pearson's correlation coefficient, *r*, between the dissimilarity values measured before and after addition of the outlying lineage is given in the upper-left corner of each scatter plot.
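
The before-and-after comparison behind the scatter plots can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the ordinary (non-phylogenetic) Bray-Curtis dissimilarity on made-up relative abundances, not the measures or the human data set from the figure. The sample names, the taxon proportions, and the helper `add_outlier` are hypothetical. The 5% outlier fraction mirrors panel **b**.

```python
from itertools import combinations
from math import sqrt


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def add_outlier(profile, fraction):
    """Append an outlying-lineage column holding `fraction` of the sequences.

    The existing proportions are scaled down so the profile still sums to 1.
    A fraction of 0.0 leaves the original proportions unchanged.
    """
    return [x * (1.0 - fraction) for x in profile] + [fraction]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical relative abundances over three taxa (illustration only).
samples = {
    "stool1": [0.7, 0.2, 0.1],
    "stool2": [0.6, 0.3, 0.1],
    "oral1": [0.1, 0.2, 0.7],
    "oral2": [0.2, 0.1, 0.7],
}
with_outlier = {"stool1", "oral1"}  # half of the samples receive the outlier
OUTLIER_FRACTION = 0.05             # 5% of sequences, as in panel b

after_profiles = {
    name: add_outlier(p, OUTLIER_FRACTION if name in with_outlier else 0.0)
    for name, p in samples.items()
}

pairs = []
for s, t in combinations(samples, 2):
    n_outliers = (s in with_outlier) + (t in with_outlier)
    kind = ("original-original", "outlier-original", "outlier-outlier")[n_outliers]
    before = bray_curtis(samples[s], samples[t])
    after = bray_curtis(after_profiles[s], after_profiles[t])
    pairs.append((s, t, kind, before, after))
    print(f"{s}-{t:<7} {kind:<18} before={before:.3f} after={after:.3f}")

# Correlation between the x-axis (before) and y-axis (after) values.
r = pearson_r([p[3] for p in pairs], [p[4] for p in pairs])
print(f"Pearson r = {r:.3f}")
```

On this toy input, each `(before, after)` pair plays the role of one point in a scatter plot, and `r` plays the role of the value printed in the plot's upper-left corner. A qualitative measure would first reduce each profile to presence or absence, which is why a single outlier sequence would already have the full effect there.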
