Example of 5S rRNA illustrating the flowchart of . The 12 methods were evaluated against each other on groups of five 5S rRNA sequences. Starting from 20 groups (100 sequences), one additional group at a time is benchmarked and statistically tested until the *P* value is smaller than *α*, the power is greater than 1 − *β*, or all 239 groups of sequences have been used. (**A**) The final conclusions. Red: null hypothesis rejected; green: not rejected. (**B**) The final power of each pairwise comparison between methods, expressed as a rounded percentage. If the null hypothesis is not rejected but the power is larger than 0.8, the null hypothesis can be accepted; otherwise, the comparison is inconclusive. (**C**) The number of groups needed to test the hypothesis or, for inconclusive comparisons, the total number of available groups (a maximum of 239 groups of five sequences). In each panel, the upper triangle applies to PPV and the lower triangle to sensitivity. For reference, the average sensitivity and PPV of each program are given in Supplementary Table S2.
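The sequential stopping rule described in this caption can be sketched as follows for one pair of methods and one metric (PPV or sensitivity). This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not name the test, so a normal approximation to a paired test on per-group metric differences is used, and power is computed for a hypothetical minimum meaningful difference `min_effect` (default 0.02). The names `paired_test`, `sequential_compare`, and `min_effect` are hypothetical.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple

STD_NORMAL = NormalDist()


def paired_test(diffs: Sequence[float], alpha: float, min_effect: float) -> Tuple[float, float]:
    """Return (two-sided P value, power) for H0: mean paired difference = 0.

    Assumption: normal approximation to a paired test. Power is the chance of
    detecting a true mean difference of `min_effect` (hypothetical parameter)
    given the observed spread of the differences.
    """
    n = len(diffs)
    sd = stdev(diffs)  # requires at least two groups
    if sd == 0:
        sd = 1e-12  # avoid division by zero when every difference is identical
    se = sd / n ** 0.5
    z_obs = mean(diffs) / se
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z_obs)))
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = min_effect / se
    power = STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)
    return p_value, power


def sequential_compare(
    group_diffs: Sequence[float],
    alpha: float = 0.05,
    beta: float = 0.2,
    min_effect: float = 0.02,  # hypothetical minimum difference worth detecting
    start: int = 20,
) -> Tuple[str, float, int]:
    """Add one group at a time until a stopping rule fires or groups run out.

    group_diffs[i] is the difference in a metric (e.g. mean PPV) between two
    methods on group i of five sequences.
    Returns (conclusion, final power, number of groups used).
    """
    total = len(group_diffs)
    n = min(start, total)
    while True:
        p_value, power = paired_test(group_diffs[:n], alpha, min_effect)
        if p_value < alpha:
            return "rejected", power, n  # significant difference between methods
        if power > 1 - beta:
            return "accepted", power, n  # not rejected, and the test had enough power
        if n == total:
            return "inconclusive", power, n  # all groups used without a decision
        n += 1
```

For example, `sequential_compare([0.01, -0.01] * 10)` (synthetic differences, not data from the study) stops after the initial 20 groups with the conclusion `"accepted"`, because the mean difference is zero while the small spread gives high power. Running the function over every pair of the 12 methods, once for PPV and once for sensitivity, would fill the upper and lower triangles of panels A to C.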

