Cell. 2014 Sep 11;158(6):1431-1443. doi: 10.1016/j.cell.2014.08.009.

Determination and inference of eukaryotic transcription factor sequence specificity.

Author information

1. Center for Autoimmune Genomics and Etiology (CAGE) and Divisions of Biomedical Informatics and Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto ON M5S 3E1, Canada.
2. Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto ON M5S 3E1, Canada.
3. Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago 8331150, Chile.
4. Computational Biology Center, Sloan-Kettering Institute, New York, NY 10065, USA.
5. Department of Molecular Genetics, University of Toronto, Toronto ON M5S 1A8, Canada.
6. Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto ON M5S 3E1, Canada; Icahn Institute for Genomics and Multiscale Biology, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA.
7. Sorbonne Universités, UPMC Univ Paris 06, CNRS UMR 7621, CNRS, Laboratoire d'Océanographie Microbienne, Observatoire Océanologique, F-66650 Banyuls/mer, France.
8. Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
9. Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
10. Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
11. Department of Electronic and Computing Systems, University of Cincinnati, Cincinnati, OH 45221, USA.
12. Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA 01655, USA.
13. DNA2.0 Inc., Menlo Park, CA 94025, USA.
14. Genomic Analysis Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Plant Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA; Howard Hughes Medical Institute, The Salk Institute for Biological Studies, La Jolla, CA 92037, USA.
15. Banting and Best Department of Medical Research and Donnelly Centre, University of Toronto, Toronto ON M5S 3E1, Canada; Department of Molecular Genetics, University of Toronto, Toronto ON M5S 1A8, Canada. Electronic address: t.hughes@utoronto.ca.

Abstract

Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ∼1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ∼34% of the ∼170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.

PMID: 25215497
PMCID: PMC4163041
DOI: 10.1016/j.cell.2014.08.009
[Indexed for MEDLINE]
Free PMC Article
