Nat Genet. 2019 Jun;51(6):981-989. doi: 10.1038/s41588-019-0411-1. Epub 2019 May 27.

Similarity regression predicts evolution of transcription factor sequence specificity.

Author information

1 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada.
2 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada.
3 Institute of Integrative Biology, University of Liverpool, Liverpool, UK.
4 Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
5 Canadian Institutes For Advanced Research (CIFAR) Artificial Intelligence Chair, Vector Institute, Toronto, Ontario, Canada.
6 Ontario Institute of Cancer Research, Toronto, Ontario, Canada.
7 Divisions of Biomedical Informatics and Developmental Biology, Center for Autoimmune Genomics and Etiology (CAGE), Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
8 Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA.
9 Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. t.hughes@utoronto.ca.
10 Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada. t.hughes@utoronto.ca.
11 CIFAR, Toronto, Ontario, Canada. t.hughes@utoronto.ca.

Abstract

Transcription factor (TF) binding specificities (motifs) are essential for the analysis of gene regulation. Accurate prediction of TF motifs is critical, because it is infeasible to assay all TFs in all sequenced eukaryotic genomes. There is ongoing controversy regarding the degree of motif diversification among related species that is, in part, because of uncertainty in motif prediction methods. Here we describe similarity regression, a significantly improved method for predicting motifs, which we use to update and expand the Cis-BP database. Similarity regression inherently quantifies TF motif evolution, and shows that previous claims of near-complete conservation of motifs between human and Drosophila are inflated, with nearly half of the motifs in each species absent from the other, largely due to extensive divergence in C2H2 zinc finger proteins. We conclude that diversification in DNA-binding motifs is pervasive, and present a new tool and updated resource to study TF diversity and gene regulation across eukaryotes.

PMID: 31133749
DOI: 10.1038/s41588-019-0411-1
