PMC Full-Text Search Results

Items: 5

1.
Figure 2. From: Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism.

Identifying TF regulators of developmental expression domains. (A) Method for discovery of TF–domain associations. The association tests are performed between 195 gene sets defined by BDGP expression annotations and gene sets formed from motif scans of 325 transcription factor motifs filtered by chromatin accessibility from four developmental stages with three different regulatory region definitions. The best regulatory region definition is chosen and the associations are evaluated by the expression of the transcription factors in the expression domains. Additional details of the procedure are found in the text and Supplementary Methods (SM5–SM8). (B) Comparison of association methods by area under the receiver operating characteristic curve (AUROC). The best method for calculating TF–domain associations, ‘MultiSpec + Acc + BestReg’, uses multi-species motif scans instead of single-species scans, an accessibility filter instead of none, and merges the best results across three different regulatory region definitions (‘p5K’, ‘p1K’, ‘IG’). The three ROC curves are calculated using domain-specific expression of the TF as the ground truth, and the region of low false positive rate is plotted. The AUROC is reported in the legend and is 0.674 for our best method. The gray dotted line shows the expected ROC. (C) Comparison of the best association method, using multi-species scores filtered by accessibility, to the equivalent ChIP-based method. This analysis is the same as above, but is restricted to the 35 TFs/motifs for which we have ChIP data.

Charles Blatti, et al. Nucleic Acids Res. 2015 Apr 30;43(8):3998-4012.
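
The AUROC comparison described in Figure 2(B) can be illustrated with a minimal sketch. It assumes each TF–domain pair has an association score and a binary ground-truth label (whether the TF is expressed in the domain); the function name and toy data are hypothetical and not taken from the paper. It computes the full AUROC as the probability that a randomly chosen positive outscores a randomly chosen negative, whereas the figure plots only the low false positive rate region.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: association scores, higher means stronger association.
    labels: truthy if the TF is expressed in the domain (ground truth).
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): three supported pairs, three unsupported.
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auroc(toy_scores, toy_labels))
```

The quadratic pairwise loop is chosen for clarity; a rank-based formulation gives the same value and scales better to large sets of pairs.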
2.
Figure 1. From: Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism.

Motifs and DNA accessibility together accurately predict TF–DNA binding. (A) Scoring profiles around a typical Drosophila gene locus. The positions of the genes (light blue) are shown in this 15 kb browser view. The scoring profiles depicted are, from top to bottom, the chromatin accessibility from DNaseI-seq of the stage 11 embryo, the multi-species motif scores of BIN within accessible regions (top 10% of accessible windows), and finally the DNA binding of BIN from a ChIP-seq experiment in the stage 11 embryo. (B) Inverse cumulative frequency distributions for four evaluations. Each line plots, for a given correlation value (x-axis), the percentage of the 69 ChIP sets (y-axis) with a correlation greater than that value. The evaluations using multi-species (single-species) scores are solid blue (dotted red) lines. The darker lines represent evaluations between ChIP scores and ‘motif + accessibility’ scores, while the lighter lines represent evaluations comparing ChIP scores to ‘motif only’ scores in only accessible regions. (C) Pairwise correlation between ChIP scores and motif scores within accessible genomic regions. The columns of the heatmap represent the 69 ChIP data sets, named for the assayed TF, laboratory source, and developmental stage. The rows represent the experimentally determined motifs of the 40 corresponding TFs. Each cell is colored by the Pearson correlation across 2000 windows selected to have 1000 non-coding, accessible ChIP profile peaks and 1000 non-coding, accessible random regions. In a cell where the motif and ChIP profile represent the same TF, the rank (or a star if rank > 3) of that motif by correlation among the 40 TFs is shown. (D) Correlation of accessibility scores with motif-only scores from different motifs. Similar to (C), except that instead of using scores of ChIP profiles we used the four DNaseI-seq chromatin accessibility profiles, named for their developmental stage. The Spearman correlation is calculated on 2000 windows selected to have 1000 non-coding accessibility peaks and 1000 non-coding random regions.

Charles Blatti, et al. Nucleic Acids Res. 2015 Apr 30;43(8):3998-4012.
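
The evaluation in Figure 1(B) and (C) can be sketched as two small steps: a Pearson correlation between per-window ChIP scores and motif scores, and an inverse cumulative frequency distribution over the resulting correlations. This is only an illustration under assumed inputs; the function names, thresholds, and values below are hypothetical and do not reproduce the paper's data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def inverse_cumulative(values, thresholds):
    """Percentage of values strictly greater than each threshold."""
    n = len(values)
    return [100.0 * sum(v > t for v in values) / n for t in thresholds]


# Made-up per-data-set correlations (one value per ChIP data set).
toy_correlations = [0.1, 0.4, 0.6, 0.8]
print(inverse_cumulative(toy_correlations, [0.0, 0.5, 0.9]))
```

Each curve in such a plot corresponds to one call to `inverse_cumulative` with the correlations from one evaluation, so a curve that sits higher indicates more data sets exceeding a given correlation.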
3.
Figure 3. From: Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism.

Applications of TF–domain association discovery. (A) Clypeolabrum network example. Four expression gene sets from BDGP related to clypeolabrum development in the early embryo are shown as blue nodes ordered counterclockwise from the top left. Grey nodes indicate TFs. Edges are drawn when the corresponding TF–domain association is significant (<1E−7). TF nodes are colored from light to dark by the number of association edges they have. Edges are colored by the type of expression support indicated in the legend and have been filtered to remove TFs with similar motifs (SM9). TFs are clustered by the set of clypeolabrum expression domains they regulate. Below the network are in situ images of three different TFs at different stages whose clypeolabrum associations are supported by consistent expression. The clypeolabrum (black circle) emerges from the procephalon (red circles). (B) Distribution across binding domain families for TFs with the greatest regulatory region biases. Each bar represents a different TF, colored by its DBD family, with height indicating the statistical strength of the bias between the proximal ‘p1K’ regulatory region and the more distal, insulator-defined ‘IG’ regulatory region. The starred transcription factor is shown in detail in the inset plot, with the P-values of the two methods for all TF–domain pairs in blue and for the 195 DISCO–domain pairs in red. Only points outside of the green lines are considered to be significantly biased. (C) Distribution across stages for expression domains with the greatest regulatory bias. Same as (B), with the inset plot showing 325 TF associations with embryonic dorsal epidermis (stage 13–16).

Charles Blatti, et al. Nucleic Acids Res. 2015 Apr 30;43(8):3998-4012.
4.
Figure 4. From: Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism.

Modeling expression domain enhancers. (A) ROCs for methods of detecting 684 REDfly enhancers from 684 negative sequences. Using the chromatin accessibility score from embryonic DNaseI-seq data (AUROC 0.789) is the best method. It is more discriminative than using scores for the presence of chromatin mark H3K4Me3, the binding of transcriptional co-activator CBP, or the maximum of 325 multi-species motif scores. (B) Comparisons of four different models. For each type of model, we calculated the AUROC using the RFVO test set on each of the 40 expression domains (SM13). The distribution of these forty values is visualized with the x-axis showing a particular value of the AUROC and the y-axis indicating the percentage of the domains with a stronger AUROC. Of the four models compared, the best model, ‘Motif * Express + Access’, combines 325 motif-based features with four accessibility-based features in a linear model (see panel C). The ‘Motif * Express’ and ‘Access’ models use a subset of features from the best model, and the ‘ChIP * Express’ model (SM14) uses one feature from each of the 69 downloaded ChIP data sets. (C) Training domain-specific models of enhancer expression. Our linear model combines each putative enhancer's accessibility features with TF features that are the product of the motif score, the importance from our compendium of the TF in regulating the domain, and the TF's expression from in situ annotations and from RNA-seq data. ‘Good’ models (RFVO AUROC > 0.7 or Test AUROC > 0.6) are applied to every accessible window in the genome. The top 5% of windows that predict the given domain expression and that are within the regulatory region of a gene expressed in that domain are predicted as domain-specific enhancers. We evaluate these predictions by their agreement with REDfly enhancers.

Charles Blatti, et al. Nucleic Acids Res. 2015 Apr 30;43(8):3998-4012.
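
The feature construction described in Figure 4(C) can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the caption states only that each TF feature is the product of a motif score, an importance value, and the TF's expression, and that these are combined with accessibility features in a linear model. The function names, the weights, the bias term, and all numeric values below are assumptions made for the example.

```python
def tf_features(motif_scores, importances, expressions):
    """One feature per TF: motif score * importance * expression."""
    return [m * i * e for m, i, e in zip(motif_scores, importances, expressions)]


def linear_score(tf_feats, access_feats, tf_weights, access_weights, bias=0.0):
    """Linear combination of TF features and accessibility features.

    The weights and bias are hypothetical; in practice they would be
    fitted per expression domain on training enhancers.
    """
    tf_part = sum(w * f for w, f in zip(tf_weights, tf_feats))
    access_part = sum(w * f for w, f in zip(access_weights, access_feats))
    return bias + tf_part + access_part


# Made-up window with two TFs and one accessibility feature.
feats = tf_features([2.0, 1.0], [0.5, 1.0], [1.0, 0.0])
print(linear_score(feats, [0.5], [2.0, 3.0], [4.0]))
```

Multiplying by expression means a TF that is not expressed in the domain contributes nothing regardless of how strong its motif match is, which is the design choice the ‘Motif * Express’ name suggests.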
5.
Figure 5. From: Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism.

Enhancer model comparison to ChIP and genome-wide predictions. (A) Comparison of the RFVO AUROCs. One point is plotted for each of the 40 expression domains, with the color indicating its developmental stage. The x-axis (y-axis) is the AUROC of the ‘Motif * Express + Access’ (‘ChIP * Express’) model. Off-diagonal points (labeled) are expression domains that find better models using one set of features instead of the other. Motif and accessibility features show the greatest advantage over ChIP-based ones for stage 13–16 expression domains. (B) Evaluation of 406 open regions that overlap REDfly enhancers. Each open region (x-axis) is near genes annotated with a number of possible expression domains (lower plot, blue dots). We order the possible expression domains by the predictions of our ‘Motif * Express + Access’ models and identify the rank of the ‘true’ expression domain annotated for the enhancer in REDfly. We plot a statistic (upper plot, red line) that achieves a maximum possible value of 1 when the REDfly domain is the best of all possible expression domains of that open region. (C) Genome browser view of enhancer predictions near the stg gene. The position and structure of genes are shown at the top. At the bottom, the chromatin accessibility from DNaseI-seq of four developmental time points is shown as colored profiles. Each possible expression domain of stg is shown (‘Gene Expression Domains’) and color-coded. The ‘REDfly enhancers’ are shown with the fill and border color matching their annotated gene expression domains. Finally, the ‘Open Region Assignments’ show which expression domains are likely driven by each 500 bp open region. The color and size of the open region box indicate the driven expression domain and the significance of the prediction. Five different open regions are circled where the most significant expression domain prediction is consistent with the annotation of an overlapping REDfly enhancer.

Charles Blatti, et al. Nucleic Acids Res. 2015 Apr 30;43(8):3998-4012.
