
We demonstrate an
important connection between network motifs in certain biological networks and
the validity of evolutionary trees constructed using parsimony methods.
Parsimony methods assume that taxa are described by a set of characters and
infer phylogenetic trees by minimizing the number of character changes required
to explain observed character states.
From the perspective of applicability of parsimony methods, it is
important to assess whether the characters used to infer phylogeny are
likely to provide a correct tree. We introduce a graph theoretical characterization
that helps to select correct characters. Given a set of characters and a
set of taxa, we construct a network called the character overlap graph. We show
that the character overlap graph for characters that are appropriate to use
in parsimony methods is characterized by significant under-representation
of subnetworks known as holes, and provide a mathematical validation for
this observation. This characterization explains the success in constructing
evolutionary trees using parsimony methods for some characters (e.g. protein
domains) and lack of such success for other characters (e.g. introns). In
the latter case, understanding the mathematical obstacles to applying
parsimony methods directly has led us to a new approach for
dealing with inconsistent and/or noisy data. Namely, we introduce the
concept of persistent characters, which is similar to, but less restrictive than,
the well-known concept of pairwise compatible characters. Application of
this approach to introns produces an evolutionary tree consistent with
the Coelomata hypothesis. In contrast, the direct application of a parsimony
method, using introns as characters, produces a tree that is inconsistent
with both of the two competing evolutionary hypotheses. Similarly, replacing
persistency with pairwise compatibility does not lead to a correct tree.
This indicates that the concept of persistency provides an important
addition to the parsimony toolbox.