Evaluating Factors Affecting the Efficacy of Deciphering Mutational Signatures with Simulated Data

(A) Evaluating the effect of deciphering similar mutational signatures from mutational catalogs containing different numbers of cancer genomes. Signatures III and IV were simulated with a cosine similarity between 0.9 and 1.0 (i.e., with extremely similar shapes), whereas the remaining two signatures were very different from any of the other signatures.

(B) Evaluating the effect of deciphering mutational signatures with different similarities between them from mutational catalogs of 20 cancer genomes.

(C) Evaluating the effect of deciphering different numbers of mutational signatures from sets of mutational catalogs derived from 10, 20, 30, 50, 70, 100, and 200 cancer genomes.

(D) Evaluating the effect of deciphering different numbers of mutational signatures from sets of mutational catalogs derived from 50 cancer genomes. The catalogs were simulated with different average numbers of mutations per cancer genome.

(E) Evaluating the effect of deciphering two, three, five, or seven mutational signatures from large sets of mutational catalogs with a small average number of mutations per cancer genome. The line colors correspond to those in the legend of (D).

(F) Evaluating the effect of deciphering mutational signatures with different contributions across sets of 50 mutational catalogs. Signature I was set to contribute a fixed percentage of all mutations either in the whole set of mutational catalogs, i.e., the overall contribution is fixed but individual genomes can have different contributions of Signature I (blue bars), or in each individual cancer genome, i.e., the contribution of Signature I is the same in every mutational catalog (red bars).
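The two ways of fixing Signature I's contribution in (F) can be illustrated with a minimal sketch. The function names, the uniform random weights, and the example mutation totals are assumptions made for illustration only; the legend does not specify how the simulated contributions were actually drawn.

```python
import numpy as np

def fixed_per_genome(totals, fraction):
    """Signature I contributes the same fraction of mutations in every genome."""
    return np.asarray(totals, dtype=float) * fraction

def fixed_overall(totals, fraction, rng):
    """Signature I contributes a fixed fraction of all mutations in the set,
    while the share in each individual genome varies.

    Assumption: per-genome shares are drawn from uniform random weights and
    then rescaled; no cap at a genome's own mutation total is enforced.
    """
    totals = np.asarray(totals, dtype=float)
    raw = rng.random(totals.size) * totals
    # Rescale so the summed contribution equals fraction * total mutations.
    return raw * (fraction * totals.sum() / raw.sum())

rng = np.random.default_rng(0)
totals = np.array([100.0, 200.0, 300.0])   # hypothetical mutations per genome
per_genome = fixed_per_genome(totals, 0.2)
overall = fixed_overall(totals, 0.2, rng)
```

For large fractions, the rescaling in `fixed_overall` can assign a genome more Signature I mutations than it has in total, so a realistic simulation would need an additional constraint.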

(G) Comparison, across all performed simulations, between the accuracy of deciphering mutational signatures and the error in identifying the contributions of these signatures. The Frobenius reconstruction error was calculated and averaged for each contribution and normalized by the number of mutations in the respective mutational catalog. In all panels, deciphering accuracy is reported as cosine similarity, where an accuracy of 1.00 corresponds to extracting exactly the same process used to simulate the data.
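The accuracy and error measures mentioned in (G) can be sketched as follows. The function names, the array layout (signatures in rows, genomes in columns), and the exact normalization are assumptions for illustration; the legend states only that the Frobenius error was normalized by the number of mutations in each catalog.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between a simulated and a deciphered signature."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_frobenius_error(true_contrib, est_contrib, mutation_counts):
    """Frobenius norm of the contribution error after dividing each genome's
    column by that genome's mutation count (assumed normalization)."""
    true_contrib = np.asarray(true_contrib, dtype=float)
    est_contrib = np.asarray(est_contrib, dtype=float)
    counts = np.asarray(mutation_counts, dtype=float)
    # Broadcasting divides every column (genome) by its mutation count.
    diff = (true_contrib - est_contrib) / counts
    return float(np.linalg.norm(diff, "fro"))

same = cosine_similarity([0.2, 0.3, 0.5], [0.2, 0.3, 0.5])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
no_error = normalized_frobenius_error([[10, 20], [30, 40]],
                                      [[10, 20], [30, 40]],
                                      [40, 60])
```

Under this sketch, a deciphered signature identical to the simulated one yields a similarity of 1, and perfectly recovered contributions yield an error of 0.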

The error bars represent the SD of the deciphering accuracies after performing each simulation scenario 100 times.

See also .
