Precision–recall evaluation of phenotype data on GO biological processes. The predictive power of phenotype profile correlations was evaluated against a gold standard based on six biological processes as defined by the GO: DNA repair, amino-acid biosynthesis, cell cycle checkpoint, response to osmotic stress, aerobic respiration, and galactose metabolism (A). Precision (the fraction of predicted gene pairs that are known to be functionally related) at a range of thresholds is plotted against recall (the percentage of known gene relationships recovered). The characteristics of other high-throughput experimental data, namely affinity precipitation (▪), yeast two-hybrid, synthetic lethality, transcription factor binding site data, microarray correlation, and functional data derived from Hughes et al (2000), are shown for comparison. Two supervised feature selection methods were used to select the relevant features from the diverse collection of microarray data, one selecting single data set features independently and the other including or excluding entire data sets. The phenotype data are both more sensitive and more precise than the other high-throughput data on this set of processes. The phenotype profiles were also evaluated against a more general set of GO terms for comparison against existing data, both including (B) and excluding (C) the ribosome biogenesis GO term (GO:0007046), which tends to dominate gene pairs implicated by coexpression. The phenotype profiles implicate gene relationships over a broad range of biological processes.
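
As a rough illustration of how such a curve can be computed, the sketch below scores gene pairs, sweeps a threshold, and reports precision and recall at each step. It is a minimal sketch under stated assumptions, not the procedure used for the figure: the function name `precision_recall_curve`, the gene names, the scores, and the gold-standard pairs are all hypothetical, and the caption does not specify the correlation measure or how the thresholds were chosen. Recall is returned as a fraction; multiply by 100 for the percentage used in the plot.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = FrozenSet[str]  # an unordered gene pair


def precision_recall_curve(
    scores: Dict[Pair, float],
    gold: Set[Pair],
    thresholds: Iterable[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each threshold.

    A pair is predicted as functionally related when its score is at or
    above the threshold. Thresholds that yield no predictions are skipped
    because precision is undefined there.
    """
    if not gold:
        raise ValueError("gold standard must contain at least one pair")
    curve = []
    for t in thresholds:
        predicted = {pair for pair, s in scores.items() if s >= t}
        if not predicted:
            continue
        true_pos = len(predicted & gold)
        precision = true_pos / len(predicted)  # known related / total predicted
        recall = true_pos / len(gold)          # known related recovered / all known
        curve.append((t, precision, recall))
    return curve


# Hypothetical toy data: profile correlations for four gene pairs.
toy_scores = {
    frozenset({"g1", "g2"}): 0.9,
    frozenset({"g1", "g3"}): 0.8,
    frozenset({"g2", "g4"}): 0.4,
    frozenset({"g3", "g4"}): 0.2,
}
# Hypothetical gold standard: pairs known to share a biological process.
toy_gold = {
    frozenset({"g1", "g2"}),
    frozenset({"g2", "g4"}),
    frozenset({"g1", "g4"}),
}

toy_curve = precision_recall_curve(toy_scores, toy_gold, [0.85, 0.5, 0.3])
for threshold, precision, recall in toy_curve:
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold admits more pairs, so recall can only stay the same or rise, while precision may move in either direction. This trade-off is what the curves in panels A to C trace.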