| Leila Taher | at 11:00 |
Function conservation in diverged noncoding elements
Transcriptional regulatory elements are highly modular and consist of a variable number of degenerate binding sites for several transcription factors. Mutations that fall outside active binding sites, or that do not abolish transcription factor binding, are likely to have little or no impact on the function of these elements. Because the functional impact of such mutations is limited, regulatory sequences often diverge extensively while retaining their ancestral functions. Therefore, cross-species sequence comparison, the primary method for identifying regulatory elements in metazoan genomes, is limited in its ability to detect functionally conserved elements that lack sufficient sequence similarity.
This work explores the structure of the regulatory information encoded in the distribution of transcription factor binding sites (TFBSs) in the absence of sequence similarity. We modeled noncoding sequences as arrangements of TFBSs and compared noncoding regions by aligning sequences of TFBSs rather than nucleotides. Our alignment model also accounts for evolutionary events in regulatory sequences, e.g., matches, mismatches, and duplications of TFBSs. To train the model, we developed a strategy that reconstructs the TFBS structure of diverged sequences from the phylogenetic relationships among groups of species. In this way, we constructed a set of 2,500 noncoding elements that have diverged between human and zebrafish but are likely to share the same ancestry. Our method correctly detects ancestral identity for over 50% of these elements when they are embedded in 50 kb stretches of background DNA. We assessed the significance of the alignment scores by comparing true orthologs with sequences of similar GC content, and required a near-zero false-positive rate for predictions. Applying the method to a selected set of 3,000 human loci, we predicted approximately 400 pairs of sequences that are very likely to share a common ancestor and to have preserved their function, despite lacking detectable sequence conservation between human and zebrafish.
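The idea of aligning TFBS arrangements instead of nucleotides can be illustrated with a minimal sketch. It is not the model described in this talk: it is a plain Needleman-Wunsch style global alignment over lists of binding-site labels, with one extension in which an unaligned site that repeats the site immediately before it is treated as a duplication and penalized less than an ordinary gap. The function name `align_tfbs`, the labels `TF_A`, `TF_B`, `TF_C`, and all score values are assumptions chosen for illustration only.

```python
from typing import Sequence


def align_tfbs(a: Sequence[str], b: Sequence[str],
               match: float = 2.0, mismatch: float = -1.0,
               gap: float = -2.0, dup: float = -0.5) -> float:
    """Global alignment score between two sequences of TFBS labels.

    All score values are illustrative assumptions, not trained parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]

    def unaligned_cost(seq: Sequence[str], k: int) -> float:
        # Cost of leaving the k-th site (1-based) of seq unaligned.
        # If it repeats the site directly before it, treat it as a
        # duplication and apply the milder penalty.
        if k >= 2 and seq[k - 1] == seq[k - 2]:
            return dup
        return gap

    # First column and row: every site of one sequence is unaligned.
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + unaligned_cost(a, i)
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + unaligned_cost(b, j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,                 # match or mismatch
                score[i - 1][j] + unaligned_cost(a, i),    # site of a unaligned
                score[i][j - 1] + unaligned_cost(b, j),    # site of b unaligned
            )
    return score[n][m]


# Hypothetical TFBS arrangements: the first carries a duplicated TF_B site.
human = ["TF_A", "TF_B", "TF_B", "TF_C"]
zebrafish = ["TF_A", "TF_B", "TF_C"]
print(align_tfbs(human, zebrafish))
```

With these assumed scores, the duplicated `TF_B` site costs 0.5 rather than 2, so the two arrangements score 5.5 out of a maximum of 6 for three matched sites. In practice, such scores would then be compared with those obtained for background sequences of similar GC content, as described above.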
We validated predicted zebrafish elements with reporter-gene assays in transgenic zebrafish and observed in vivo enhancer activity for 6 of the 7 elements tested (86%). Moreover, the putative human and zebrafish functional orthologs directed highly overlapping tissue-specific expression patterns.
|