Nucleic Acids Res. 2016 Jan 4;44(D1):D116-25. doi: 10.1093/nar/gkv1249. Epub 2015 Nov 19.

HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models.

Author information

1. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia. ivan.kulakovskiy@gmail.com
2. Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia.
3. Design Technological Institute of Digital Techniques, Siberian Branch of the Russian Academy of Sciences, 630090, Academician Rzhanov 6, Novosibirsk, Russia; Institute of Systems Biology Ltd, 630112, office 901, Krasina 54, Novosibirsk, Russia.
4. Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia.
5. King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Thuwal 23955-6900, Saudi Arabia.
6. Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Center for Bioengineering, Russian Academy of Sciences, 117312, 60-letiya Oktyabrya 7/2, Moscow, Russia.
7. Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991, GSP-1, Vavilova 32, Moscow, Russia; Vavilov Institute of General Genetics, Russian Academy of Sciences, 119991, GSP-1, Gubkina 3, Moscow, Russia; Moscow Institute of Physics and Technology, 141700, Institutskiy per. 9, Dolgoprudny, Moscow Region, Russia. vsevolod.makeev@gmail.com

Abstract

Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns under different conditions, it is pragmatic to have a single generic model for each TF as a baseline for practical applications. Here we present an expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest collection to date of dinucleotide PWM (diPWM) models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, and the resulting models are validated on in vivo data. To facilitate practical applications, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and for motif finding in nucleotide sequences.
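To make the terms above concrete, here is a minimal Python sketch of how a PWM and a diPWM score DNA words, and how a score threshold can be derived from a target P-value. It is not HOCOMOCO's own tooling and does not read HOCOMOCO model files. The matrices `TOY_PWM` and `TOY_DIPWM` and the functions `pwm_score`, `dipwm_score`, `scan` and `threshold_for_pvalue` are hypothetical illustrations. The sketch assumes additive log-odds-like weights, scans the forward strand only, and computes the threshold under a uniform, independent background with weights rounded to a fixed grid, so the threshold is approximate.

```python
"""Toy PWM / diPWM scanning and P-value thresholds (hypothetical sketch)."""
from itertools import product

NUC = "ACGT"
DINUC = ["".join(p) for p in product(NUC, repeat=2)]  # "AA", "AC", ..., "TT"

# Hypothetical toy models (NOT HOCOMOCO matrices): both prefer the word "ACG".
TOY_PWM = [{n: (1.0 if n == c else -1.0) for n in NUC} for c in "ACG"]
TOY_DIPWM = [{d: (2.0 if d == t else 0.0) for d in DINUC} for t in ("AC", "CG")]


def pwm_score(pwm, word):
    """Sum of per-position weights; pwm[i] maps a nucleotide to its weight."""
    return sum(col[n] for col, n in zip(pwm, word))


def dipwm_score(dipwm, word):
    """diPWM: dipwm[i] weights the dinucleotide at positions i and i+1,
    so a motif of length L has L-1 columns."""
    return sum(col[word[i:i + 2]] for i, col in enumerate(dipwm))


def scan(score_fn, motif_len, seq, threshold):
    """Slide a window along the forward strand; report (start, word, score)
    for windows without ambiguous letters that score >= threshold."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - motif_len + 1):
        word = seq[start:start + motif_len]
        if set(word) <= set(NUC):
            score = score_fn(word)
            if score >= threshold:
                hits.append((start, word, score))
    return hits


def threshold_for_pvalue(pwm, pvalue, scale=100):
    """Smallest threshold t with P(score >= t) <= pvalue for a mononucleotide
    PWM under a uniform i.i.d. background. The score distribution is built by
    dynamic programming over weights rounded to integers (weight * scale),
    so the result is approximate. Returns None if even the best score is
    more frequent than pvalue."""
    dist = {0: 1.0}  # rounded partial score -> probability
    for col in pwm:
        new = {}
        for score, prob in dist.items():
            for n in NUC:
                key = score + round(col[n] * scale)
                new[key] = new.get(key, 0.0) + prob * 0.25
        dist = new
    tail, best = 0.0, None
    for score in sorted(dist, reverse=True):
        tail += dist[score]  # tail == P(rounded score >= score)
        if tail > pvalue:
            break
        best = score
    return None if best is None else best / scale


if __name__ == "__main__":
    cutoff = threshold_for_pvalue(TOY_PWM, 0.02)
    print(cutoff)
    print(scan(lambda w: pwm_score(TOY_PWM, w), 3, "ttacgtacga", cutoff))
    print(scan(lambda w: dipwm_score(TOY_DIPWM, w), 3, "ttacgtacga", 4.0))
```

A practical scan would also score the reverse complement and use background nucleotide (or, for diPWMs, dinucleotide) frequencies appropriate to the genome. The dynamic-programming step is one common way to turn a target P-value into a score cutoff; it is shown here only for mononucleotide PWMs.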

PMID: 26586801
PMCID: PMC4702883
DOI: 10.1093/nar/gkv1249
[Indexed for MEDLINE]
