The reference sequences (Ref) and the query sequence (Que) are pooled. First, sites are resampled horizontally to generate sequence matrices (R1 to Rn), such as R1. Second, each matrix R1 to Rn is permuted vertically by randomly designating one sequence as the query and the remaining sequences as references; this vertical permutation is repeated 100 times per R matrix. In each random permutation, the mean genetic distance between the randomly selected query sequence and the remaining reference sequences is calculated. Third, the null hypothesis that the resampled reference sequences and query sequence belong to the same species is tested: it is accepted if the observed genetic distance (GD) falls within the 95% acceptance region of the simulated datasets, and rejected otherwise. Fourth, the number of cases in which the null hypothesis is accepted across all 100 horizontal resampling replicates is counted; this count is the TDR measure defined in this study (see text for details).
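The four steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names (`p_distance`, `resample_sites`, `mean_distance`, `tdr_count`) are hypothetical, the genetic distance is assumed to be a simple proportion of differing sites, the observed GD is assumed to be computed on each resampled matrix, and the 95% acceptance region is assumed to be the central interval of the permutation distribution.

```python
import random


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (assumed metric)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def resample_sites(seqs, rng):
    """Horizontal step: draw alignment columns with replacement."""
    n_sites = len(seqs[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(s[c] for c in cols) for s in seqs]


def mean_distance(query, refs):
    """Mean distance between one query and a list of reference sequences."""
    return sum(p_distance(query, r) for r in refs) / len(refs)


def tdr_count(refs, query, n_boot=100, n_perm=100, level=0.95, seed=0):
    """Count the resampled matrices for which the null hypothesis is accepted."""
    rng = random.Random(seed)
    pool = list(refs) + [query]  # pooled sequences; the true query is last
    tail = (1 - level) / 2
    accepted = 0
    for _ in range(n_boot):
        matrix = resample_sites(pool, rng)
        # Assumption: the observed GD is taken on the resampled matrix.
        observed = mean_distance(matrix[-1], matrix[:-1])
        null = []
        for _ in range(n_perm):
            # Vertical step: a random sequence plays the query role.
            i = rng.randrange(len(matrix))
            rest = matrix[:i] + matrix[i + 1:]
            null.append(mean_distance(matrix[i], rest))
        null.sort()
        # Assumption: central acceptance region from simple empirical quantiles.
        lo = null[int(tail * n_perm)]
        hi = null[min(n_perm - 1, int((1 - tail) * n_perm))]
        if lo <= observed <= hi:
            accepted += 1
    return accepted
```

Calling `tdr_count(refs, query)` on equal-length aligned sequences returns the number of accepted replicates out of `n_boot`. With small reference sets the true query is drawn often enough in the permutations that even a divergent query tends to be accepted, so the sketch is only meaningful for reasonably large reference sets.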
