On the identification of differentially-active transcription factors from ATAC-seq data

ATAC-seq has emerged as a rich epigenome profiling technique, and is commonly used to identify Transcription Factors (TFs) underlying given phenomena. A number of methods can be used to identify differentially-active TFs through the accessibility of their DNA-binding motif, however little is known on the best approaches for doing so. Here we benchmark several such methods using a combination of curated datasets with various forms of short-term perturbations on known TFs, as well as semi-simulations. We include both methods specifically designed for this type of data as well as some that can be repurposed for it. We also investigate variations to these methods, and identify three particularly promising approaches (a chromVAR-limma workflow with critical adjustments, monaLisa and a combination of GC smooth quantile normalization and multivariate modeling). We further investigate the specific use of nucleosome-free fragments, the combination of top methods, and the impact of technical variation. Finally, we illustrate the use of the top methods on a novel dataset to characterize the impact on DNA accessibility of TRAnscription Factor TArgeting Chimeras (TRAFTAC), which can deplete TFs – in our case NFkB – at the protein level.

Dataset Proportion of ChIP peaks containing the motif

C
Supplementary Figure 1: Overview of the benchmark datasets and TFs.A: Volcano plot of the peak-level differential accessibility analyses, illustrating the extent and significance of changes upon treatment in each dataset.The dashed line represents a 0.05 FDR threshold, and indicated are the number of ATAC peaks passing this threshold, as well as the proportion of peaks with an uncorrected p-value lower than 0.05.B: Enrichment for experimental binding sites of the factor (main TF after which the datasets are named) in ATAC peaks containing matches for the respective motifs, versus all ATAC peaks.C: Proportion of ChIPseq peaks (overlapping ATAC peaks) that contain matches for the respective motif.

D
Supplementary Figure 2 A: Reproducibility of chromVAR-based differential accessibility analysis.Each color represents a dataset.The line and shaded area respectively represent the mean and standard error of the pairwise correlations across 7 independent runs of the chromVAR(z-score)>limma pipeline for each dataset and setting, of the ranks of the (union of) top 20 motifs.B: Illustration of the motifspecific position weights for the insertionModel method.C: Illustration of the computation of the network score.The motifs are ranked for each method, and for each k=1:100 the proportion of top k motifs that are part of the known network of the true TF is computed.The area under the curve is then reported, relative to the AUC of the best possible motif ordering for that factor (gray area).The example plotted here is from the GATA1 dataset.D: Agreement between the two rank-based metrics.Each point represents the results of a method (colour) in a dataset (shape).The network score is not entirely comparable across datasets.Beside this effect, and except at very low (1-3) or high (>100) ranks (i.e.where the ranks stop being discriminatory), there is a good agreement between the two metrics.Capping ranks to 100, the overall correlation between between ranks and network score is -0.61, and the median correlation across datasets is -0.75.When peaks are not resized to all have the same width, as recommended, the distributions are often globally shifted (A, left).Even when peaks are resized (A, right), differences in the width of the z-scores distributions which are not related to experimental groups can persist in some datasets.This is not specific to the z-scores, but also present in the bias-corrected deviations (B).C-D: Effect on replacing the internal library size normalization performed by chromVAR with a TMM normalization.TMM normalization does not reduce the differences in z-score distributions (C), nor does it improve the rank of the true TF (D).

B
Supplementary Figure 6: Impact of stringency in motif matching.Rank of the true TF (A) and Precision-Recall (B) of selected methods, when using a stringent motif matching vs the default.

B
Supplementary Figure 7: Impact of using motif archetypes.Rank of the true motif/archetype (A) and Precision-Recall (B) of selected methods, when using motifs archetypes instead of the full collection of motifs.Given the difference in the size of the two sets of motifs, ranks were capped at for the purpose of creating the heatmap.Supplementary Figure 8: Impact of using nucleosome-free fragments.Comparison of the rank of the true motif obtained by the top method of each family of approaches, when using alternative peaks or overlap counts based on nucleosome-free fragments.'All frags on NF peaks' stands for counting overlaps with fragments of any size, but on peaks called on nucleosome-free (NF) fragments only.'NF frags on NF peaks' stands for using only NF-fragments, i.e. counting overlaps of NF fragments on peaks called only on NF fragments.Finally, 'NF frags on full peaks' stands for counting overlaps of only NF fragments, but on the standard peaks (called using all fragments).

B
Supplementary Figure 9: Simple aggregation of top methods' results does not improve inferences. A. The results of the top method from each of the two families of approach (on the left) are compared to those of simple aggregation methods (on the right).Beside the established Fisher's, Stouffer's and Simes' methods, a rank-based permutation approach was tested, establishing the probability of having a sum of ranks across methods lower or equal to the observed one.We also tested using limma on merged persample activity scores from chromVAR and GCsmooth+MLM.B. Same comparison, in terms of Precision and Recall at adjusted p-value <= 0.05.
no overlap overlaps motif Supplementary Figure 13: MA plots of the simulated effects.For each factor, simulated paradigm and perturbation strength, the distribution of peaks log2FC of raw fragment counts between conditions as a function of mean accessibility is plotted as 2d kernel density, separating either (A) ATAC-seq peaks overlapping a ChIP-seq peak or not, or (B) ATAC-seq peaks containing a motif or not for the TF.For visibility, the peaks were subsampled to a more comparable size across the semi-simulated datasets (1.2$×$10^5 peaks per dataset).Colored Points indicate the means of the respective distributions.

Supplementary
Example of a subset of ATAC samples from Caradonna, Paul and Marrocco (2022) (GSE200670), where different samples show different shapes of z-scores distributions across motifs/TFs.The colors indicate different experimental conditions.
motif (standard: using all fragments on peaks called on all fragments) Rank of true motif (alternative) Change (in y relative to x): worsened same improved Specifically using nucleosome−free fragments does not generally improve results datasets in which the true motif is significant) Precision (Proportion of network motifs among significant) 0.25 0.50 0.75 1.000.000.25 0.50 0.75 1.000.000.25 0.50 0.75 1.000.000.25 0.50 0.75 1of the simulated TFs.Data shown correspond to the semisimulated datasets with a pertubation strength of 1 in the activation paradigm.A: Number of peaks, proportion of ChIP-seq and/or ATAC-seq peaks with a motif, and fold-enrichment of ATAC-peaks overlapping ChIP-seq peaks (i.e.bound peaks) for the motif across the four TFs.B-C: Enrichment of all tested motifs among the ATAC-seq peaks overlapping ChIP-seq peaks (relative to all ATAC-seq peaks), with the true motif labeled in black.The dots are colored either by co-occurence with the true motif (B) or by GC content, i.e. the summed probabilities of G and C across positions in the position probability matrix divided by the total sum (C).D: GC content distributions of the ChIP-seq peaks of each TF.With dotted lines the GC content of the true motif is indicated, with dashed lines the median GC content of the motifs showing stronger enrichment in the ChIP-seq peaks than the true motif.

Supplementary Figure 14: Positive control of the semi-simulations.
Shown are ranks and significance assigned by each method to the true motif across simulation parameters (TF, paradigm and perturbation strength).In light gray are the normal simulations (as shown in main Fig4), while dark line and colored boxes correspond to the positive controls where a difference between the groups was only introduced in the peaks overlapping a motif of the true TF.