|Susanta Tewari|| at 11:00
Computing Probability under Infinite-sites Model in Population Genetics
Maximum likelihood estimation of the mutation parameter theta uses the full information in the data, unlike methods based on summary statistics. Computing the probability of a sample under the popular infinite-sites model, however, has been a long-standing problem: the number of genealogies involved becomes extremely large under this model, so the exact probability is difficult to compute. Earlier efforts enumerated the exact number of ancestral configurations and genealogies involved for a given sample. A later improvement used a "forward algorithm" to traverse the recursion graph, expanding the range of sample data that can be handled exactly. The difficulty of exact computation has also spurred many approximate approaches to computing the probability, based on importance sampling.
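To give a rough sense of how quickly genealogies multiply, the sketch below counts the labeled coalescent histories of n sequences, that is, the number of distinct orders in which pairs of lineages can merge back to a single ancestor, which is the product of C(k, 2) for k = 2, ..., n. This is only an illustration of the growth: it ignores the data entirely, so it is not the count of ancestral configurations or data-compatible genealogies discussed in this talk. The function name `labeled_histories` is chosen here for illustration.

```python
from math import comb


def labeled_histories(n: int) -> int:
    """Count labeled coalescent histories for n sampled sequences.

    Going back in time, each coalescence merges one of C(k, 2) pairs
    among the k remaining lineages, for k = n, n-1, ..., 2.
    Illustrative assumption: no mutation or data constraints are applied.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    count = 1
    for k in range(2, n + 1):
        count *= comb(k, 2)
    return count


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, labeled_histories(n))
```

Even at n = 10 the count is already in the billions, which is why exact methods work with recursions over ancestral configurations rather than enumerating genealogies one by one.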
In our ongoing work, we have redefined the recursion equation under the infinite-sites model so as to gain in both exact and approximate calculations. We call this method "accelerated recursion (AR)". It exploits the fact that, under this model, coalescences do not disturb the sample configurations in the way that mutations do. This enables us to compact the coalescence events and always achieve an improvement, which is dramatic for certain configurations. On a smaller note, the encoding of information in gene trees for computing the probability has so far not been optimal. We have come up with an optimal encoding using symmetric paths in gene trees that always results in fewer configurations in the ancestry of the sample data; again, the reduction is especially significant for certain data patterns. We will show some numbers to make the above claims precise.