Use of ChIP-Seq data for the design of a multiple promoter-alignment method

Ionas Erb; Juan R González-Vallinas; Giovanni Bussotti; Enrique Blanco; Eduardo Eyras; Cédric Notredame

doi:10.1093/nar/gkr1292

Nucleic Acids Res. 2012 Apr;40(7):e52. doi: 10.1093/nar/gkr1292. Epub 2012 Jan 9.

Authors

Ionas Erb¹, Juan R González-Vallinas, Giovanni Bussotti, Enrique Blanco, Eduardo Eyras, Cédric Notredame

Affiliation

¹ Bioinformatics and Genomics program, Centre for Genomic Regulation and UPF, 08003 Barcelona, Spain.

Abstract

We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate each method's accuracy in predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure rises to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. This correlation has interesting biological consequences, and it also allows us to conclude that training any method on the ortholog dataset will result in functionally more informative alignments.
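
The "16.5% above the default average" figure can be checked from the site counts quoted in the abstract. The snippet below is only an illustrative sketch of that arithmetic: the function name `relative_gain` and the variable names are assumptions made here, not part of Pro-Coffee or the paper, and the comparison against trained methods is derived from the same quoted counts rather than stated in the abstract.

```python
def relative_gain(value: float, baseline: float) -> float:
    """Return the percentage by which `value` exceeds `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Average numbers of correctly aligned binding sites, as quoted in the abstract.
default_sites = 284    # methods run with default settings
trained_sites = 316    # methods trained on the ortholog dataset
procoffee_sites = 331  # Pro-Coffee

# Gain over default methods (the figure the abstract reports).
gain_vs_default = relative_gain(procoffee_sites, default_sites)
# Gain over trained methods (derived here, not reported in the abstract).
gain_vs_trained = relative_gain(procoffee_sites, trained_sites)

print(f"Pro-Coffee vs default methods: {gain_vs_default:.1f}% more sites")
print(f"Pro-Coffee vs trained methods: {gain_vs_trained:.1f}% more sites")
```

The same helper applies to the ortholog-classification accuracies (73.5%, 77.6%, 80.4%), with the caveat that those are already percentages, so a difference in percentage points is often the more natural comparison.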

Publication types

Research Support, Non-U.S. Gov't
Validation Study

MeSH terms

Animals
Binding Sites
Cattle
Chromatin Immunoprecipitation*
Dogs
Evolution, Molecular
Humans
Mice
Promoter Regions, Genetic*
Sequence Alignment / methods*
Sequence Analysis, DNA*
Software
Transcription Factors / metabolism

Substances

Transcription Factors