J Comput Biol. 2011 Feb;18(2):155-68. doi: 10.1089/cmb.2010.0220.

Alignment constrained sampling.

Author information

1 Department of Computer Science, Cornell University, Ithaca, New York, USA.

Abstract

We present the ALICO (ALIgnment COnstrained) null set generator: a framework for generating randomized versions of an input multiple sequence alignment that preserve some of its crucial features, including its dependence structure. In particular, we show that, on average, ALICO samples approximately preserve the PIDs (percent identities) between every pair of input sequences. At the same time, our examples demonstrate that the average k-mer composition of each sampled sequence closely resembles the k-mer composition of our genomic training data. Of note, ALICO requires only pairwise alignment training data rather than multiple alignment training data. We demonstrate the utility of ALICO in predicting the correct results returned by the "homology-aware" finders PhyloCon, MEME with conservation prior and PRIORITY-C, as well as by our "naive" finder GibbsMarkov, applied to the MacIsaac orthologous yeast data. Finally, we show that combining results from multiple finders using p-values derived from ALICO sampling often outperforms the best individual finder. Supplementary Material is available at www.liebertonline.com/cmb.
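To make the PID constraint concrete, the following minimal Python sketch computes the percent identity between every pair of rows in a small alignment. It is an illustration only and not part of ALICO: the function names, the toy alignment, and the convention of skipping columns where either sequence has a gap are all assumptions, and the paper may define PID differently.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned rows.

    Assumed convention: only columns where neither row has a gap are counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both rows carry a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def pairwise_pids(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """PID for every unordered pair of rows in the alignment."""
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p, q in combinations(alignment, 2)
    }


# Hypothetical toy alignment, for illustration only.
toy = {"s1": "ACGT-A", "s2": "ACGTTA", "s3": "AC-TTC"}
pids = pairwise_pids(toy)
for pair, pid in pids.items():
    print(pair, pid)
```

Under this convention, a randomized alignment would approximately preserve the input's PIDs if the same table, averaged over many samples, stayed close to the table computed from the input alignment.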

PMID: 21314455
DOI: 10.1089/cmb.2010.0220
[Indexed for MEDLINE]
