UniFrac analysis of short clipped sequences simulating 454 reads, using data from all three sequence sets (human, mouse, Guerrero Negro). (a) Diagram showing clipped reads of 100, 150, 200 and 250 bases starting at each of the forward and reverse primers. Note that F1099 + 250 is not available because it extends beyond the end of the near-full-length sequences used for the analysis. (b) Correlation in UniFrac distances between jackknifed data sets and full data sets (ranging from 0, no correlation, to 1, perfect correlation). The size of each bubble reflects the average strength of correlation. Note that the y-axis on this plot ranges from 0.88 to 1, so all the correlations are very strong. The x-axis shows the fraction of sequences retained in the jackknifing. Box plots show quartiles, medians, 95% quantiles and outliers for n = 100 jackknife replicates. (c) Cluster recovery using the same jackknifed data as in (b). Note that cluster recovery is always much lower and more variable than distance recovery, indicating that many of the details of the clustering are not supported by jackknifing. (d) Cluster recovery from each primer for each read length. The best primer at each read length is shown in green; the worst is shown in red. The number inside each bubble indicates the cluster recovery (the size of each bubble is also proportional to cluster recovery, on the same scale as in (b)). (e) UniFrac PCoA clustering of the full-length sequences (legend key: hmn = human, where A, B and C are three separate individuals (12); mus = mouse, where M1, M2 and M3 are the three different mothers and their offspring; GN = Guerrero Negro, where the 10 samples are 10 different sediment layers from shallowest to deepest). (f) UniFrac PCoA clustering of an example of good cluster recovery, F517 with 200-base reads. Note that the clustering is almost identical to that of the full-length sequences, apart from a slight rotation of the coordinate axes, and that the relative ordering of points within each cluster is preserved.
(g) UniFrac PCoA clustering of an example of poor cluster recovery, R1114 with 200-base reads. The human samples are apparently split into two separate groups, suggesting the wrong biological conclusion.
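The read clipping described in (a) can be illustrated with a minimal sketch. This is not the authors' code: the helper name `clip_read`, the use of plain 0-based string offsets for primer positions, and the toy sequence are all assumptions made for illustration. It shows only why a combination such as F1099 + 250 is unavailable when the read would extend beyond the end of the sequence.

```python
from typing import Optional

# Read lengths taken from panel (a) of the caption.
READ_LENGTHS = (100, 150, 200, 250)


def clip_read(seq: str, primer_pos: int, length: int,
              forward: bool = True) -> Optional[str]:
    """Return the region a simulated read would cover, or None if it does not fit.

    Hypothetical helper: primer_pos is treated as a 0-based offset into seq,
    which is an assumption of this sketch. For a reverse primer the region
    upstream of the primer is returned as-is; a real reverse read would also
    be reverse-complemented, which is omitted here.
    """
    if forward:
        start, end = primer_pos, primer_pos + length
    else:
        start, end = primer_pos - length, primer_pos
    # A read that runs off either end of the sequence is not available.
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


# Toy 1200-base sequence standing in for a near-full-length sequence.
toy_seq = "ACGT" * 300
available = {n: clip_read(toy_seq, 1099, n) is not None for n in READ_LENGTHS}
```

With this toy sequence, a forward read of 100 bases from offset 1099 fits, whereas a 250-base read from the same offset does not and is reported as unavailable.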