## Results: 5

1.

A) MCAM begins by clustering a biological dataset under every combination of a set of clustering parameters, followed by biological enrichment testing in various categories of information. The enrichment results are then used to prune parameters that contribute little biological information. B) Depiction of an MCA, which contains M sets; each set is produced by a particular combination of clustering parameters and contains some number k of clusters. Biological enrichment is corrected for multiple hypothesis testing using the False Discovery Rate procedure, applied across a set and within a category of biological information. Mutual information can be used to compare the clustering solutions of any two sets.
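The set-to-set comparison described in panel B can be illustrated with a minimal sketch. It assumes plain (unnormalized) mutual information in bits, computed from the joint distribution of cluster labels; the caption does not say whether a normalized variant was used, and the function name and toy labels are hypothetical.

```python
import math
from collections import Counter

def mutual_information(labels_a, labels_b):
    """Mutual information (in bits) between two cluster assignments of the same items.

    Assumption: plain, unnormalized mutual information.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both sets must label the same items")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * math.log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi

# Toy cluster labels for four items in three hypothetical sets.
set_1 = [0, 0, 1, 1]
set_2 = [1, 1, 0, 0]  # same partition as set_1, different label names
set_3 = [0, 1, 0, 1]  # partition independent of set_1

mi_same = mutual_information(set_1, set_2)
mi_independent = mutual_information(set_1, set_3)
```

Two sets that partition the items identically share maximal information regardless of how the clusters are named, while independent partitions share none.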

2.

For these plots, all subsets were normalized to their own 5-minute time point to ease comparison across treatments. Peptide-centric clusters were created by finding all peptides that co-cluster with a given peptide at least 50% of the time in . A) The SHC1 Y349-centric and SHC1 Y427-centric clusters are the same for Parental HMEC cells treated with EGF. B) EGF treatment of 24H cells changes the SHC1 Y427-centric cluster somewhat, but its dynamics remain relatively similar to those under Parental treatment. C) The SHC1 Y349-centric cluster changes drastically relative to Parental EGF treatment because Y349 phosphorylation responds very differently.
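As an illustration of the peptide-centric construction, the sketch below collects every peptide that shares a cluster with a target peptide in at least half of the sets. The data layout (one dictionary of cluster labels per set), the function name, and the peptide names are hypothetical.

```python
def peptide_centric_cluster(target, sets, min_fraction=0.5):
    """Peptides that share a cluster with `target` in at least `min_fraction` of sets.

    `sets` is a list of clustering solutions, each a dict {peptide: cluster_label}.
    """
    counts = {}
    for assignment in sets:
        target_label = assignment[target]
        for peptide, label in assignment.items():
            if peptide != target and label == target_label:
                counts[peptide] = counts.get(peptide, 0) + 1
    threshold = min_fraction * len(sets)
    return sorted(p for p, c in counts.items() if c >= threshold)

# Four toy sets over four placeholder peptides.
toy_sets = [
    {"pepA": 0, "pepB": 0, "pepC": 1, "pepD": 1},
    {"pepA": 1, "pepB": 1, "pepC": 1, "pepD": 0},
    {"pepA": 0, "pepB": 1, "pepC": 0, "pepD": 1},
    {"pepA": 2, "pepB": 2, "pepC": 0, "pepD": 1},
]

centric = peptide_centric_cluster("pepA", toy_sets)
```

Here pepB co-clusters with pepA in three of four sets and pepC in two of four, so both meet the 50% threshold; pepD never does.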

3.

A) An example histogram of PFAM enrichment in the MCA, with sets plotted in descending order of the total number of terms enriched per set. Green lines mark the top 25% and red lines the bottom 25% of sets by total number of labels enriched. B) Example enrichment resulting from a random control for PFAM enrichment. C) The rate of null hypothesis rejection, per biological category, for the MCA and ten random controls. The random control distribution is shown as a box-and-whisker plot, and blue circles represent MCA results. Null hypothesis rejection in a random control is equivalent to a false positive, which is controlled for using the False Discovery Rate procedure with a cutoff of 0.05. D) Average enrichment found, per category, per cluster, in the MCA (blue dots) and ten random controls (box plots).
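The rejection rate in panel C depends on how the False Discovery Rate cutoff is applied. The sketch below assumes the Benjamini-Hochberg step-up procedure, which the caption does not name explicitly; the function name and p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Assumption: Benjamini-Hochberg step-up procedure at level `alpha`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below max_rank.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected

# Illustrative p-values for five enrichment tests within one set and category.
p_values = [0.028, 0.001, 0.5, 0.025, 0.2]
rejected = benjamini_hochberg(p_values)
rejection_rate = sum(rejected) / len(rejected)
```

Note the step-up behavior: 0.025 exceeds its own rank threshold of 0.02, but it is still rejected because 0.028 passes at rank three.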

4.

A) Pairwise comparison of the overlap in the best and worst 25% of sets based on each metric in . We performed 1000 random selections of two sets of the same size to generate a normal distribution whose mean represents the expected overlap between any two sets drawn from a background of that size. We then evaluated whether pairwise overlap was significantly higher (‘Pos. Sig’) or lower (‘Neg. Sig.’) than expected by chance. The significance cutoff was set at an FDR-corrected alpha value of 0.05. The top right shows the pairwise comparison of the best-performing 25% of sets in each category, and the bottom left shows the comparison of the worst-performing 25%. B) Hierarchical clustering of pairwise mutual information between every set in the MCA. Self-MI is highest along the diagonal. Highlighted groups indicate dendrogram cutoffs for which the full group is composed of the denoted parameter. The label log10/pow denotes the normMax_log10, log10, and pow transformations; the label pareto/zscore denotes the zscore and pareto transformations. The topmost zscore/pareto group contains one outlier (out of the group of 41) created using the pow transformation.
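The random-overlap test in panel A can be sketched as follows. This is one plausible reading of the caption, not the authors' code: it draws two random selections 1000 times, fits a normal distribution to the resulting overlaps, and reads one-sided p-values from it. The function name, the seed, and the toy background of 100 sets are assumptions, and the FDR correction across pairs is not shown.

```python
import random
import statistics

def overlap_vs_random(selection_a, selection_b, background, n_draws=1000, seed=0):
    """Compare an observed overlap with the overlap expected from random selections.

    Returns (observed, expected, p_higher, p_lower), where the p-values come from
    a normal distribution fitted to the random overlaps (an assumption).
    """
    rng = random.Random(seed)
    observed = len(set(selection_a) & set(selection_b))
    null_overlaps = []
    for _ in range(n_draws):
        draw_a = set(rng.sample(background, len(selection_a)))
        draw_b = set(rng.sample(background, len(selection_b)))
        null_overlaps.append(len(draw_a & draw_b))
    fitted = statistics.NormalDist(
        statistics.mean(null_overlaps), statistics.stdev(null_overlaps)
    )
    p_lower = fitted.cdf(observed)   # chance of an overlap this small or smaller
    p_higher = 1.0 - p_lower         # chance of an overlap this large or larger
    return observed, fitted.mean, p_higher, p_lower

# Toy background of 100 set identifiers; each "best 25%" selection has 25 members.
background = list(range(100))
best_by_metric_1 = list(range(0, 25))
best_by_metric_2 = list(range(10, 35))

observed, expected, p_higher, p_lower = overlap_vs_random(
    best_by_metric_1, best_by_metric_2, background
)
```

With two selections of 25 from a background of 100, the expected random overlap is about 25 × 25 / 100 = 6.25, so an observed overlap of 15 falls far in the upper tail.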

5.

A) The group of phosphopeptides that participate at least 50% of the time in a cluster enriched for the GO Biological Process term “MAPKKK Cascade”; proteins annotated with the term are starred. This new group is enriched for the GO BP term “positive regulation of DNA replication”. B) These three phosphopeptides always appear when the GO Cellular Component term “lamellipodium” is enriched; CTTN and PXN are the proteins annotated as localized to the lamellipodium. This new group is also enriched for two sequence motifs. C) The co-occurrence matrix, clustered hierarchically. Co-occurrence between any two phosphopeptides is the number of times those two peptides are clustered together in . For the heat map, the log base 10 of the normalized values was taken; zero values were set to 0.5/331 prior to log transformation. D) The average values, +/− two standard deviations, are shaded for the three groups highlighted in panel C.
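The co-occurrence counts and the heat map transformation in panel C can be sketched as below. The caption does not state how values were normalized; this sketch assumes division by the number of sets, so that the zero replacement 0.5/331 corresponds to half a co-occurrence out of 331 sets. Function names and peptide names are hypothetical.

```python
import math
from itertools import combinations

def co_occurrence(sets):
    """Count, for each peptide pair, the number of sets in which both share a cluster.

    `sets` is a list of clustering solutions, each a dict {peptide: cluster_label}.
    Pairs that never co-cluster are absent from the result.
    """
    counts = {}
    for assignment in sets:
        for p, q in combinations(sorted(assignment), 2):
            if assignment[p] == assignment[q]:
                counts[(p, q)] = counts.get((p, q), 0) + 1
    return counts

def log_normalized(count, n_sets):
    """log10 of count / n_sets, with zero counts replaced by 0.5 / n_sets.

    Assumption: normalization divides by the number of sets.
    """
    value = count / n_sets if count > 0 else 0.5 / n_sets
    return math.log10(value)

# Four toy sets over four placeholder peptides.
toy_sets = [
    {"pepA": 0, "pepB": 0, "pepC": 1, "pepD": 1},
    {"pepA": 1, "pepB": 1, "pepC": 1, "pepD": 0},
    {"pepA": 0, "pepB": 1, "pepC": 0, "pepD": 1},
    {"pepA": 2, "pepB": 2, "pepC": 0, "pepD": 1},
]

counts = co_occurrence(toy_sets)
heat_ab = log_normalized(counts.get(("pepA", "pepB"), 0), len(toy_sets))
heat_ad = log_normalized(counts.get(("pepA", "pepD"), 0), len(toy_sets))
```

Replacing zeros with half of the smallest possible nonzero value keeps never-co-clustered pairs finite on the log scale while placing them below every observed pair.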