Reporter gene assays and chromatin-level assays define substantially non-overlapping sets of enhancer sequences

Background Transcriptional enhancers are essential for gene regulation, but how these regulatory elements are best defined remains a significant unresolved question. Traditional definitions rely on activity-based criteria such as reporter gene assays, while more recently, biochemical assays based on chromatin-level phenomena such as chromatin accessibility, histone modifications, and localized RNA transcription have gained prominence. Results We examine here whether these two types of definitions, activity-based and chromatin-based, effectively identify the same sets of sequences. We find that, concerningly, the overlap between the two groups is strikingly limited. Few of the data sets we compared displayed statistically significant overlap, and even for those, the degree of overlap was typically small (below 40% of sequences). Moreover, a substantial batch effect was observed in which experiment set rather than experimental method was a primary driver of whether or not chromatin-defined enhancers showed a strong overlap with reporter gene-defined enhancers. Conclusions Our results raise important questions as to the appropriateness of both old and new enhancer definitions, and suggest that new approaches are required to reconcile the poor agreement among existing methods for defining enhancers. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-023-09123-9.


Background
Transcriptional enhancers play an essential role in gene regulation and are primary mediators of development, homeostasis, disease, and evolution [1][2][3]. Among many unanswered questions about enhancer biology, one stands out as fundamental: how should an enhancer be defined? In the genomic era, the functional definition that for a quarter-century described enhancers as sequences with the ability to drive expression of a reporter gene from a minimal promoter [4,5] has entered into an uneasy co-existence with transcription and chromatinbased definitions such as ability to bind specific sets of transcription factors or coactivators, presence of certain histone modifications, location in nucleosome-depleted regions, or transcription of enhancer RNA (eRNA) (e.g. [6][7][8][9][10][11][12][13][14][15]) (Fig. 1). It has become increasingly clear that these chromatin-level enhancer definitions identify sets of sequences with strikingly low levels of overlap, with concerning implications for regulatory genome annotation, ongoing studies of enhancer biology, and interpreting results from genome-wide association studies, expression quantitative trait locus studies, and the like [16][17][18]. However, it is unknown which of these various assays are in the best agreement with reporter gene data, which remains the "gold standard" for enhancer activity, and to what extent.
Here, we perform a comprehensive comparison to investigate whether one or a collection of chromatin-based assays are able to identify the majority of enhancers from an extensive reporter-gene defined set. We show that not only do the chromatin-level assays show poor agreement among themselves, but also that they fail to discover a significant fraction of reporter-gene defined enhancers, often performing no better than random expectation. Our results raise questions as to whether any common current assays sufficiently interrogate the enhancer landscape, and about the accuracy of current regulatory genome annotations.

Results and discussion
In order to test for congruence between enhancers defined by reporter gene assays, and those defined by chromatin-based assays, we compared Drosophila enhancers obtained from two sources: REDfly [19] and EnhancerAtlas2.0 [20]. The REDfly database contains over 38,000 empirically-defined enhancers manually curated from the literature, split roughly evenly between in vivo reporter gene assays and cell-culture based reporter gene assays. The EnhancerAtlas2.0 database, by contrast, is composed of 294,158 enhancers predicted by means of an unsupervised learning approach that combines data from ChIP-seq, ATAC-seq, FAIRE-seq, and other chromatin-level assays [20]. This comparison is uniquely possible for Drosophila: as the overwhelming majority of REDfly's enhancers are based on evidence developed without consideration of chromatin-level features, they are therefore identified fully independently from those in EnhancerAtlas, using different evidence.
For an initial comparison, we took all REDfly enhancers 2 kb or shorter (11,549 total) and determined how many of these sequences overlapped an EnhancerAtlas enhancer, without regard for the annotated tissue-specificity of the REDfly enhancers ( Fig. 2A; see Methods). Of the 21 tissue-specific Drosophila EnhancerAtlas datasets, 17 had statistically significant overlap (adjusted P < 0.01, two-tailed z test), three had no significant overlap, and, surprisingly, one had significantly less overlap than expected by chance ( Fig. 3A "STARR-seq included", Table S1a). Examination of these data revealed that more than three-quarters of the REDfly enhancers we were using (8779/11549, 76%) were identified using a single method, STARR-seq [21], a cell-culture based, episomal reporter gene assay; these STARR-seq enhancers made up 80% of the REDfly-EnhancerAtlas overlapping enhancers (6027/7527). When we eliminated the STARRseq enhancers from the REDfly test set and repeated the analysis, the results were strikingly different: only five of the EnhancerAtlas datasets (24%) now showed significant overlap (with one additional dataset just below our significance threshold), whereas ten datasets (48%) had significantly less overlap than expected by chance ( Fig. 3A "STARR-seq not included", Table S1b [Additional File 1]).
STARR-seq-defined enhancers thus have a profound effect on how well sequences from the two databases compare. Only four EnhancerAtlas datasets make use of STARR-seq data, indicating that a simple confounding of the source data cannot explain the results. In the absence of STARR-seq enhancers, almost half of the REDfly datasets had less-than-expected overlap with EnhancerAtlas, suggesting that the majority of REDfly enhancers lack activity in most of the tissues covered by EnhancerAtlas. Conversely, these results suggest that STARR-seq, performed in a cultured cell line and using an episomal rather than an integrated reporter, may be identifying many sequences indiscriminately with respect to their tissue specific activity.
To examine more directly how REDfly enhancers active in specific tissues compare with the EnhancerAtlas datasets, we constructed tissue-specific REDfly enhancer sets using only in-vivo reporter gene tested enhancers < 1000 bp in length ( Fig. 2B; Table S2 [Additional File 2]). Of the 21 EnhancerAtlas sets, 11 had one or more corresponding REDfly sets. We then determined how many enhancers from each paired set overlapped. Our expectation was that there should be significant overlap, with the EnhancerAtlas sets (based on undirected genomewide assays) encompassing the great majority of REDfly enhancers (drawn from individual ad hoc experiments) (Fig. 2C). Surprisingly, only four (36%) of the eleven EnhancerAtlas sets showed significant overlap with their corresponding REDfly set (P < 0.01, Fig. 3A "all tissuespecific datasets", Table S1c [Additional File 1]). Moreover, even for the sets with significant overlap, the number of in-common enhancers was strikingly limited (median 26%, range 5%-65%; Fig. 3B, Table S1c [Additional File 1]).
Since the EnhancerAtlas definitions integrate the data from multiple assays, we reasoned that the integration algorithm might be filtering out some of the true enhancers. To test this, we took the underlying data sets used by EnhancerAtlas (Table S1e [Additional File 1], referred to as EnhancerAtlas "subsets"; Fig. 2B) and compared them individually to the matched REDfly enhancer sets. Indeed, we saw a higher number of significantly overlapping enhancer sets (66%, Fig. 3A, Table S1d,e [Additional File 1]), but again, the number of in-common enhancers within each matched set was low (median 39%, Fig. 3B, Table S1d,e [Additional File 1]).
The low degree of in-common enhancers could represent a small number of REDfly enhancers that consistently match EnhancerAtlas enhancers, or different subsets of REDfly enhancers for each EnhancerAtlas (See figure on next page.) Fig. 2 Schematic of REDfly-EnhancerAtlas comparisons performed in this study. See text and Methods for details. A REDfly enhancers under 2000 bp in length were selected and tested for overlap against each of the 21 EnhancerAtlas Drosophila enhancer sets. Similar comparisons were made subsequently after filtering out all enhancers defined by STARR-seq assays. B REDfly enhancers were placed into tissue-specific sets after discarding all sequences longer than 1000 bp (600 bp for some sets). Eleven such sets were tested for overlap with eleven tissue-matched sets from EnhancerAtlas. Each EnhancerAtlas set was then split into multiple individual sets with the data from just a single component experiment ("subsets"), and each subset was tested for overlap against the corresponding REDfly tissue-matched set. C EnhancerAtlas data are based on genome-wide assays, while REDfly enhancers are drawn from individual reporter gene experiments. Therefore, we expect the majority of REDfly enhancers for a given tissue to comprise a subset of the EnhancerAtlas enhancers for the matched tissue Lindhorst and Halfon BMC Genomics subset. To distinguish between these possibilities, we looked at the correlation between the sets of REDfly enhancers present in individual EnhancerAtlas subsets. The sets of enhancers found in different experiments were not overall well-correlated, suggesting that distinct REDfly enhancer sets are being identified (Fig. 4, Fig. S1 [Additional File 3]). However, a clear correlation structure was evident in which assays from a particular laboratory and method tended to cluster together. For example, the REDfly enhancers from one set of experiments (series GSE102839) are highly correlated (Fig. 4, white box; mean r = 0.74 ± 0.11), and those from a different set of experiments (series GSE101827) are highly correlated (Fig. 4, yellow box; mean r = 0.67 ± 0.18). However, even though both sets contain ATAC-seq experiments, they find a disjoint group of REDfly enhancers (mean r = 0.12 ± 0.08). Although assay-specific correlation is occasionally observed (Fig. 4, yellow asterisks; r = 0.77), in other cases similar assays are only weakly correlated (e.g., Fig. 4 ). However, the fact that batch effects appear to dominate the correlation structure makes it difficult to draw conclusions as to the most effective enhancer identification methods. For instance, we note that over two-thirds of the ATAC-seq results are from a single experiment series, GSE101287, and the number of subsets with significant REDfly overlap for the ATAC-seq experiments in series GSE102441 and series GSE102839 was a more modest 60% and 50%, respectively. These observations suggest two important conclusions: (1) identification of putative enhancers is highly dependent on not just the type of assay performed, but on the precise conditions under which it is performed; and (2) enhancer identification is reasonably robust given a particular set of assays and replicates. The fact that the sets of REDfly enhancers are stable within groups of replicate assays suggests that these enhancers are being specifically (i.e., non-randomly) found, despite the fact that, as shown above, the number of REDfly enhancers identified through chromatin-level assays is frequently indistinguishable from random expectation. Thus, these assays do appear to be able to identify reporter-gene-defined enhancers, but with low efficiency and potentially high false-positive rates.
Recently, Gao et al. released "scEnhancer", an EnhancerAtlas-like database based on single-cell ATAC-seq data drawn from different embryonic time points [22]. Unlike the results using EnhancerAtlas, the majority of scEnhancer datasets had a significant degree of overlap with their REDfly set (45/56, 80%; Fig. 1A, Table S1f [Additional File 1]), although similar to what we observed with EnhancerAtlas, the number of overlapping enhancers was low (median 27%, range 10%-65%; Table S1f [Additional File 1]). scATACseq may thus represent a more promising method for enhancer detection, although a proper assessment is difficult as all of the Drosophila data currently in scEnhancer are from a single source.

Conclusions
Reporter gene assays have long been considered the gold standard for defining enhancers. On these grounds, our results would seem to suggest that not only do chromatin-level assays frequently fail to identify common sets of enhancer sequences [16], but neither are they particularly effective at covering the majority of the enhancer landscape. However, we would caution against automatically accepting this conclusion, as there are well-known deficiencies that could lead to a substantial number of both false-positive and false-negative results from reporter gene assays. These include enhancer-promoter incompatibility, effects due to episomal expression or chromosomal integration, cell-type specificity, and ectopic expression due to missing repressor binding sites, as well as recent findings that enhancers can double as silencers or functionally overlap other regulatory features (see discussions in [23][24][25][26][27]). It is also possible that the "right" set of chromatin assays has yet to be applied. For instance, Koenecke et al. [28] suggest that a key parameter for enhancer identification is the relative, Fig. 3 Comparisons between REDfly and EnhancerAtlas enhancer sets. REDfly and EnhancerAtlas datasets were compared and tested for significance as described in the text and TableS1 [Additional File 1]. Box plots show medians and the first and third quartiles. Data points shown in red are significant at a Bonferonni-adjusted P-value < 0.01. Dataset names correspond to the names in Table S1 [Additional File 1]. For the "tissue-specific comparisons" (Table S1c [Additional File 1]), when multiple REDfly sets corresponded to the same EnhancerAtlas set, the one with the highest degree of overlap was selected for analysis. For the subset comparisons (Table S1d, rather than absolute, level of histone 3, lysine 27 acetylation (H3K27ac) flanking an enhancer, and many lessfrequently assayed histone modifications appear to be present in various combinations at at least some subsets of enhancers (e.g., [29][30][31]). Without true "known" enhancer data, therefore, it is impossible to say whether our results reflect significantly poor sensitivity in chromatin-level assays, or a much larger than heretofore recognized false positive rate in reporter gene assays. What is indisputable, however, is a clear need for approaches that can reconcile the poor agreement between and among the various reporter gene and chromatin-level enhancer identification methods. In this regard, the increasing tractability of genome-engineering approaches, e.g., via CRISPR/Cas9 sequence deletion and replacement, holds out an encouraging potential to interrogate the enhancer capability of sequences within their native genomic contexts.

Data sources
For comparisons using all REDfly data (Table S1a, S1b), sequences were obtained from REDfly v7.1.1 (Aug. 14 2020) by downloading all "CRM" entries in bed format, eliminating sequences > 2000 bp, and then removing overlapping sequences using the script "SelectSmall-estFeature.py" (Kazemian and Halfon 2019). All REDfly data used in this study are based on experimental reporter gene assays curated from the primary literature Fig. 4 Correlations between REDfly enhancers overlapping enhancers from each EnhancerAtlas subset used for the "L3_wing_disc" EnhancerAtlas set. The correlation structure demonstrates that experimental batch effects predominate over assay-type effects. High correlations are seen between sets from the same experiment group (see white box, yellow box), even when assay types differ (dotted yellow box). Although assay-specific correlations are sometimes present (yellow asterisks), identical assays can also be poorly correlated when from different experiment groups (white asterisks). Individual experiments on the y-axis ("GSM" identifiers) are colored according to common experimental series (i.e., performed by the same laboratory as a specific set of experiments) as provided in GEO ("GSE" identifiers, see Table S1d [Additional File 1]). Sets labeled in black are unrelated. Label colors on the x-axis indicate assay types as follows: red, H3K4me1 ChIP-seq; green, H3K27ac ChIP-seq; orange, Grh ChIP-seq; cyan, Pol2 ChIP-seq; dark blue, ATAC-seq; black, ChIP-seq against various transcription factors by the professional REDfly biocuration team. The length cutoff of 2000 bp (and below, 600-1000 bp) was selected so as to reduce potential spurious results from the inclusion of non-regulatory sequences, or multiple independent enhancers, in our comparisons. Because REDfly data are based on reporter gene assays, the length of the tested sequences can be arbitrarily large, but data from numerous deletion studies suggests that typical enhancers are on the order of hundreds, not thousands, of base pairs.
For comparisons using specific REDfly subsets (Table S1c-S1f ), "CRM" sequences were downloaded from REDfly v5.6.1 (Dec. 3 2019) in bed format with "cell-culture only" sequences excluded and a sequence length cutoff of either 1000 or 600 bp. Following removal of overlapping sequences using "SelectSmallestFeature.py" (Kazemian and Halfon 2019), the expression pattern annotations associated with each remaining sequence were used to place the sequences into one or more of 30 different tissue-specific groupings. Details of the composition of each set can be found in Supplementary Table S2. For compatibility with EnhancerAtlas, genome coordinates were converted from dm6 to dm3 using LiftOver [32] with minMatch = 0.25.
EnhancerAtlas sequences were downloaded from EnhancerAtlas 2.0 (http:// enhan cerat las. org, Nov. 19 2019). EnhancerAtlas sequences are compiled using the EnhancerAtlas unsupervised learning algorithm based on data from one or more of 12 types of chromatin-level assays [20]. In brief, data from each individual experiment ("track") are normalized and merged based on Jaccard overlap across the genome. Importantly, a peak is only merged into the consensus profile if it is supported by at least 50% of the included tracks.
Identities of the component datasets for each of the 21 tissue-specific EnhancerAtlas sets were obtained from the metadata files available in the "data source" section. The provided GEO accession codes were then used to obtain the sequence-level data from NCBI. Data processing was performed as described in the EnhancerAtlas paper [20] to ensure consistent results. Wig files were converted to bigWig format through the script wig2Big-Wig, downloaded from the UCSC genome browser at genome.ucsc.edu/goldenpath/help/bigwig.html. BigWig files were converted to bedgraph format through the script bigWigtoBedGraph. Bedgraph files were converted to bed format through peak calling using MACS2 [33] with a cutoff enrichment of 2. Data sets where we were unable to replicate the exact EnhancerAtlas processing pipeline (primarily, raw sequencing data) were omitted from further analysis. scEnhancer sequences were downloaded from scEnhancer (http:// enhan cerat las. net/ scenh ancer/, Feb. 28. 2022).

Comparison of data sets
Bed files were compared using BEDTools intersect [34] and the -wa and -u flags. Note that with these parameters, even a single intersecting basepair will cause the two sequences to be scored as an intersection, making our tests highly sensitive to any degree of sequence overlap.

Significance of comparisons
Significance of dataset overlap was determined by permuting the coordinates of each REDfly dataset 500 times using BEDTools shuffle [34] and repeating the tests for intersection. The mean and standard deviation of the permuted results were then used to calculate a z-score. A Bonferroni-corrected P value equivalent to P < 0.01 was determined for each set of comparisons.

Correlation analysis
For each comparison, each potential REDfly enhancer was scored as 1 (found) or 0 (not found). Correlations between all pairs of result vectors were determined using the R cor function and visualized as heat maps using ggplot.