
## Results: 4

1.

Figure 2. Simulation Results. From: Decision-Making in Research Tasks with Sequential Testing.

(A) Evolution of knowledge. The odds for the true hypothesis increase at the slowest rate for random test choice (SIM-R), at an intermediate rate for the scenario where the most informative test is chosen and published in each round (SIM-1), and at the fastest rate for the scenario where two tests are chosen in each round and the most informative test result is published (SIM-2). This illustrates that informative test choice leads to better performance than random test choice (SIM-1>SIM-R), and that there is an advantage to performing two tests even if only one test can be published (SIM-2>SIM-1). (B) Fraction of false among the positive results. For random test choice, the fraction of false positives stays constant at a level of 0.26. For both scenarios with informative test choice (SIM-1 and SIM-2), the fraction of false among the positives declines over the rounds. (C) Fraction of false among the negative results. For random test choice, the fraction of false among the negative results remains constant at a level of 0.15. For SIM-1 the fraction of false negatives tends to increase over the rounds, while for SIM-2 the fraction fluctuates around the level for random test choice. (D) Frequency of tests that support the true hypothesis. For random test choice, the chance of picking a test that is expected to support the true hypothesis (i.e. AB and BC for sequence ABC) is 1/3, because each hypothesis is supported by two of the six tests. Over the rounds, tests that support the true hypothesis tend to be chosen preferentially in the scenarios with informative test choice. This leads to a decrease in the fraction of false among the positive findings. For scenario SIM-1, where all tests are published, this implies that there is an increase in the fraction of false negatives, as shown in panel C. For SIM-2, where results can be selected for publication, accumulating knowledge can be used to avoid the publication of false findings. The grey line shows the probability for a false finding to be published in SIM-2. The chance for a false finding to be published declines over the rounds.
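The counting argument in panel D, and the reason the false fractions stay constant under random test choice, can be sketched in a few lines. This is an illustration only. It assumes that a test XY supports a sequence when X is directly followed by Y in it, which matches the caption's examples. The caption does not state the error rates; the values `alpha=0.12` and `beta=0.30` are hypothetical, picked because they give fractions close to the 0.26 and 0.15 quoted above. The function names are likewise invented for this sketch.

```python
from itertools import permutations

LETTERS = "ABC"
HYPOTHESES = ["".join(p) for p in permutations(LETTERS)]      # 6 sequences
TESTS = [a + b for a in LETTERS for b in LETTERS if a != b]   # 6 ordered pairs


def supports(test: str, sequence: str) -> bool:
    """Assumed rule: test XY supports a sequence if X is directly followed by Y."""
    return test in sequence


def support_probability(true_sequence: str) -> float:
    """Chance that a uniformly random test supports the true sequence."""
    hits = sum(supports(t, true_sequence) for t in TESTS)
    return hits / len(TESTS)


def false_fractions(alpha: float, beta: float, q: float) -> tuple[float, float]:
    """Return (false among positives, false among negatives).

    alpha: hypothetical false-positive rate
    beta:  hypothetical false-negative rate
    q:     probability that the chosen test supports the true sequence
    """
    false_pos = alpha * (1 - q)
    true_pos = (1 - beta) * q
    false_neg = beta * q
    true_neg = (1 - alpha) * (1 - q)
    return false_pos / (false_pos + true_pos), false_neg / (false_neg + true_neg)


q = support_probability("ABC")
false_among_pos, false_among_neg = false_fractions(alpha=0.12, beta=0.30, q=q)
print(f"q = {q:.3f}, false among positives = {false_among_pos:.2f}, "
      f"false among negatives = {false_among_neg:.2f}")
```

Under random choice `q` never changes, so both fractions are fixed. When informative test choice pushes `q` above 1/3, the same formulas give fewer false results among the positives and more among the negatives.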

2.

(A) Performance (mean log odds for the true hypothesis after the last round, and standard error of the mean). Performance falls in between the performance for random test choice (SIM-R) and the simulated scenario with informative test choice (SIM-1). This indicates that informative tests tend to be chosen preferentially but not always. The performance in EXP-1S seems to be worse than in EXP-1S* and EXP-1G. This implies that solving a task in the more complex group setting EXP-1G has no negative impact on performance. Moreover, knowing the error rates does not seem to be an advantage for problem solving in the experiment. (B) Fraction of false among the positive results. Data from all three simple settings are pooled for panels B–D. The dynamics of false positives follow the pattern expected from simulation SIM-1. Yet the pattern is less pronounced because the participants sometimes fail to select the most informative test. (C) Fraction of false among the negative results. The pattern is as expected from simulation SIM-1. However, it is less pronounced because the participants sometimes fail to select the most informative test. (D) Frequency of those among the chosen tests that support the true hypothesis. Over the rounds, participants more often select those tests that correspond to the correct sequence. Thus false positives decrease while false negatives increase.
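The performance measure in panel A can be illustrated with a short sketch. It assumes natural-log odds, ln(p / (1 - p)), and the usual standard error of the mean (sample standard deviation divided by the square root of n); the caption does not specify the log base. The probabilities in `final_probs` are made-up values, not data from the study.

```python
import math
import statistics


def log_odds(p: float) -> float:
    """Natural-log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Hypothetical final-round probabilities for the true sequence, one per run.
final_probs = [0.62, 0.35, 0.80, 0.48, 0.71]
mean_lo, sem_lo = mean_and_sem([log_odds(p) for p in final_probs])
print(f"mean log odds = {mean_lo:.2f} +/- {sem_lo:.2f}")
```

On this scale a probability of 0.5 gives log odds of 0, and the uniform prior of 1/6 gives ln(0.2), about -1.61.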

3.

Figure 1. Sample Simulations. From: Decision-Making in Research Tasks with Sequential Testing.

(A) Example simulation for the simple scenario with informative test choice (SIM-1). The correct sequence is ABC. In the first round, all hypotheses have the same prior probability of 1/6, and all tests have the same informativity. One test, BA, is chosen randomly and yields a positive result. Since BA is not part of sequence ABC, this is a false positive. The probabilities for ABC and three other sequences decline, while the probabilities for the two sequences that contain BA (BAC and CBA) increase. In the next round, the tests AC and CB are the most informative ones. They distinguish between the two most likely hypotheses, BAC and CBA. AC is chosen and yields a negative result (true negative). This weakens hypothesis BAC and supports CBA. In the third round, CB is the most informative test. A negative result is obtained, and CBA and BAC are on par again. Further tests are performed and yield correct answers, which establishes the correct sequence ABC as the most likely one. However, in the last round AB is tested and yields a false negative. The probability for ABC declines and finishes on par with BCA. (B) Example simulation for the complex scenario where, in each round, two tests are performed but only one test result can be published. Again, ABC is the correct sequence. In the first round, where all tests have the same informativity, two tests are chosen randomly. Both tests, BA and AC, yield a negative result and turn out to be equally valuable. The negative result on AC is randomly chosen to be published. This decreases the probabilities for ACB and BAC, and increases the probabilities for the four other sequences. In the second round, AB and BC are tested. Both tests yield false negatives, one of which (BC) is published. This leads to a decline in the probabilities of ABC and BCA. After a few rounds of testing, CAB is leading while the correct hypothesis ABC is second best. However, a true negative on CA brings CAB and ABC on par, and further tests establish ABC as the most likely sequence. In both panels, italic type indicates false positives and negatives.
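The first three rounds of panel A can be replayed with a small Bayesian-update sketch. Several things here are assumptions rather than details from the caption: the error rates `ALPHA` and `BETA` are hypothetical, a test XY is taken to support a sequence when X is directly followed by Y, and "informativity" is modelled as expected entropy reduction, which may differ from the measure used in the paper.

```python
import math
from itertools import permutations

LETTERS = "ABC"
HYPOTHESES = ["".join(p) for p in permutations(LETTERS)]
TESTS = [a + b for a in LETTERS for b in LETTERS if a != b]
ALPHA, BETA = 0.2, 0.2   # hypothetical false-positive / false-negative rates


def p_positive(test: str, sequence: str) -> float:
    """Probability of a positive result for `test` if `sequence` were true."""
    return 1 - BETA if test in sequence else ALPHA


def update(belief: dict[str, float], test: str, positive: bool) -> dict[str, float]:
    """Bayes' rule: reweight each hypothesis by the likelihood of the result."""
    weights = {}
    for h, p in belief.items():
        like = p_positive(test, h)
        weights[h] = p * (like if positive else 1 - like)
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def entropy(belief: dict[str, float]) -> float:
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def expected_gain(belief: dict[str, float], test: str) -> float:
    """Expected entropy reduction from running `test` (assumed informativity)."""
    p_pos = sum(p * p_positive(test, h) for h, p in belief.items())
    after = (p_pos * entropy(update(belief, test, True))
             + (1 - p_pos) * entropy(update(belief, test, False)))
    return entropy(belief) - after


belief = {h: 1 / 6 for h in HYPOTHESES}            # uniform prior
belief = update(belief, "BA", positive=True)       # round 1: false positive on BA
gains = {t: expected_gain(belief, t) for t in TESTS}
best = max(gains.values())
most_informative = sorted(t for t, g in gains.items() if math.isclose(g, best))
belief = update(belief, "AC", positive=False)      # round 2: true negative on AC
belief = update(belief, "CB", positive=False)      # round 3: negative on CB
print(most_informative, {h: round(p, 3) for h, p in belief.items()})
```

With these assumptions, AC and CB come out as the most informative tests after the first round, and BAC and CBA end the third round with equal probability, in line with the caption's narrative.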

4.

(A) Performance (mean log odds for the true hypothesis after the last round, and standard error of the mean). Performance for the settings with coordinated test choice (EXP-2G and EXP-2G*) falls in between the performance of simulations with random test choice (SIM-R) and informative test choice (SIM-2). Performance in the setting with independent test choice (EXP-2E) is worst, and not better than random. This suggests that there are substantial negative effects arising from independent testing. Analogous to EXP-1S and EXP-1S*, there seems to be a disadvantage in knowing the error rates: participants in EXP-2G* outperform those in EXP-2G. (B) Fraction of false among the positive results. Data from all three complex settings are pooled for panels B–D. The dynamics of false positives follow the pattern expected from simulation SIM-2. (C) Fraction of false among the negative results. The fraction of false negatives in the experiments (EXP-2G, EXP-2G* and EXP-2E) is larger than expected from simulation SIM-2. (D) Frequency of those among the chosen tests that support the true hypothesis. Over the rounds, participants frequently select those tests that correspond to the correct sequence. This leads to a decline of false positives. In contrast to the simulations (SIM-2; dashed grey line), selection against false findings does not work efficiently in the experiments (solid grey line). While false findings are increasingly selected against in the simulations, there is no substantial improvement in avoiding publication of false findings in the experiments. This suggests that human subjects did not efficiently use background knowledge to avoid the publication of false findings.