- Journal List
- PLoS ONE
- v.3(7); 2008
- PMC2447175

# Universal Scaling in the Branching of the Tree of Life

^{1}Claudio J. Tessone,

^{1,}

^{2}Konstantin Klemm,

^{3}Víctor M. Eguíluz,

^{1,}

^{*}Emilio Hernández-García,

^{1}and Carlos M. Duarte

^{4}

^{}

^{1}IFISC, Instituto de Física Interdisciplinar y Sistemas Complejos (CSIC-UIB), Palma de Mallorca, Spain

^{2}ETH Zürich, Zürich, Switzerland

^{3}Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany

^{4}IMEDEA, Instituto Mediterráneo de Estudios Avanzados (CSIC-UIB), Esporles, Spain

Conceived and designed the experiments: VME. Analyzed the data: EAH CJT KK VME. Contributed reagents/materials/analysis tools: VME. Wrote the paper: EAH VME EHG CMD. Discussed and interpreted the results: CD EH KK CT EH.

## Abstract

Understanding the patterns and processes of diversification of life in the planet is a key challenge of science. The Tree of Life represents such diversification processes through the evolutionary relationships among the different taxa, and can be extended down to intra-specific relationships. Here we examine the topological properties of a large set of interspecific and intraspecific phylogenies and show that the branching patterns follow allometric rules conserved across the different levels in the Tree of Life, all significantly departing from those expected from the standard null models. The finding of non-random universal patterns of phylogenetic differentiation suggests that similar evolutionary forces drive diversification across the broad range of scales, from macro-evolutionary to micro-evolutionary processes, shaping the diversity of life on the planet.

## Introduction

The Tree of Life is a synoptic depiction of the pathways of evolutionary differentiation between Earth life forms [1], and contains valuable clues on the key issue of understanding the diversification of life in the planet [2]. The branching pattern of the Tree of Life, which is being captured at increasing resolution by the advent of molecular tools [3], can be examined to investigate fundamental questions, such as whether it follows universal rules, and at what extent random differentiation mechanisms explain the shape of phylogenetic trees. The examination of the structure of the Tree of Life can also help to infer whether evolution acts at intraspecific scales in a way different from the action of evolution at the interspecific scale. Here we address these fundamental questions on the basis of a comprehensive comparative analysis of phylogenetic trees representing different fractions and domains of the Tree of Life, from interspecific to intraspecific scales. We draw from previous analyses of the geometry of the Tree of Life [4], the characterization of other branching systems [5], [6], and using tools derived from modern network theory [7]–[10] to examine the scaling of the branching in the Tree of Life [11], [12]. Our analysis is based on a thorough data set of more than 5000 interspecific phylogenies and a sample of 67 intraspecific phylogenies (see Text S1), thereby testing the universality of the results derived across scales.

A phylogenetic tree is a set of nodes, each node representing a diversification event, connected by branches (links). For each node *i*, a subtree *S _{i}* is made up of a root at node

*i*and all the descendant nodes stemming from this root. The subtree size

*A*gives the number of subtaxa that diversify from node

_{i}*i*(including itself). Beyond this measure of the diversity degree, the characterization of how the diversity is arranged through the phylogenies can be achieved through the cumulative branch size,

*C*, a measure of the subtree shape. It is defined [13] as the sum of the branch sizes associated to all the nodes in the subtree

_{i}*S*,

_{i}*C*Σ

_{i}=*A*. For the same tree size, and restricting to binary branching events, the smallest value of the cumulative branch size is obtained for a completely symmetric, balanced tree, whereas the most asymmetric, the pectinate or comb-like tree in which all branches split successively from a single one, yields the largest

_{j}*C*value [13]. To be clearer, we show in Figure 1 the analysis of

_{i}*A*and

_{i}*C*for a completely balanced tree (Figure 1A) and for a completely imbalanced tree (Figure 1B). A portion of a real phylogenetic tree is also shown (Figure 1C). How the shape of the tree (i.e., the distribution of the biological diversification) does change with tree size (i.e., with the number of taxa it contains) is given by the scaling of the subtree shape

_{i}*C*vs. the subtree size

*A*, as described by the allometric scaling relation

*C∼A*. We quantitatively characterize the shape of each tree in our data set by calculating the functions

^{η}*F*(

*A*) and

*F*(

*C*), which are the complementary cumulative distribution functions (CCDF) of

*A*and

_{i}*C*values in the tree, respectively, and the value of the allometric scaling exponent,

_{i}*η*. We compare the results derived from the analyses of inter- and intra-specific phylogenetic trees among them, to test for the preservation of branching patterns across evolutionary scales, and against those derived from the analyses of randomly-generated trees to test whether the allometric scaling derived can be modeled using simple, random branching rules.

## Results

The branch-size CCDF displays power-law tails of the form for large branch size *A* (Figure 2A). The power-law exponents *τ _{A}* are remarkably similar for the data sets analyzed:

*τ*= 1.76±0.03, and 1.74±0.02 for intra- and interspecific phylogenies, respectively. Similarly, the cumulative-branch-size CCDF also displays a power-law tail of the form at large

_{A}*C*, with a similar agreement between the exponents of the intra- and interspecific data sets:

*τ*= 1.53±0.02 and 1.53±0.02, respectively (Figure 2B). The discrepancy observed between the two data sets at the tail of the distributions can be explained by the different sizes of the typical trees on them: each tree contributes a natural cutoff to the overall distribution, and since the intraspecific trees are smaller in average, their cutoff appears at smaller tree sizes.

_{C}The allometric exponent, *η*, that characterizes the scaling of tree shape with tree size (Figure 3A), is also remarkably similar for the intraspecific (*η* = 1.43±0.01) and the interspecific (*η* = 1.44±0.01) phylogenies. This constancy of the exponents is still more remarkable when realizing (inset of Figure 3A) that it does not only apply to average properties of sets of intraspecific and interspecific trees, but also to individual phylogenies of groups of organisms pertaining to different kingdoms and living across widely contrasting environments, as it is reflected by the very narrow range of *η* obtained from different phylogenies (〈*η*〉 = 1.47, *σ* = 0.03, Figure 3A). The scaling exponents for our large interspecific data set are also matched almost perfectly (Figure S1) by those derived from a set of 67 interspecific phylogenies randomly drawn from the published literature thereby validating the uniformity of the scaling rules of the broad interspecific phylogenies and the smaller set of intraspecific ones used here. The later was also derived from a similar random sample taken from the published literature (see Text S1).

The allometric scaling of *C∼A ^{1.44}* derived from our analysis falls somehow in between those obtained by simulated phylogenies derived from two extreme topologies: The symmetric tree gives

*C∼A*ln

*A*, which corresponds to

*η*= 1 with a logarithmic correction, while the pectinate tree has

*η*= 2. The natural null model for tree construction, the Equal-Rates Markov (ERM) model [14], [15], yields a scaling

*C∼A*ln

*A*similar to the symmetric tree with

*η*= 1 but different from the scaling displayed by empirical inter- and intraspecific phylogenies, particularly for large ones (Figure 3B). Therefore some topological aspects of phylogenetic trees are not adequately reproduced by the ERM model. Our results imply that successful lineages diversify more profusely than expected under random branching, generating the large imbalances that characterize emerging depictions of the Tree of Life [4]. Alternative models introducing correlations, such as the proportional-to-distinguishable-arrangements (PDA) model [4], [16] or the beta splitting model [17], could generate more realistic phylogenies. Guided by previous biological allometric scaling analysis, we have assumed a power-law scaling of the form

*C*∼

*A*. However, other ansatz could also fit the data. The important point, however, is that these modeling approaches should give similar scaling properties for intra- as for interspecific branching.

^{η}## Discussion

Traditionally, microevolutionary and macroevolutionary processes have been studied independently by population geneticists and evolutionary biologists, respectively [18]. The divide between these two levels of generation of biological diversity is an old one, rooted in the controversy between Darwinian gradualism and the saltationism proposed by others, prominently paleontologists, to explain macroevolutionary processes [19]. The debate as to whether macroevolution is more than the accumulation of microevolutionary events remains active [18], [20], [21], although refined paleontological evidence supports the continuum between micro- and macroevolution for some lineages [22]. The results presented here show that the branching and scaling patterns in intraspecific and interspecific phylogenies do not differ significantly for the topological properties we have calculated. Thus, shall saltation processes be a factor at the macroevolutive level, this is not reflected in the topology of phylogenetic branching as examined here. Evidence for possible differences in phylogenetic topologies between the inter- and intraspecific levels may require a detailed analysis of branching times, which we have not attempted.

Processes leading to scaling laws in size distributions in natural systems have been formulated as growth models [23], [24]. Many of the findings carry over to scaling properties found in networks [25] and their description in terms of branching processes [26]. But most of these models predict branching topologies similar to the ERM model. An alternative approach to understand the observed exponent would be to trace analogies with scaling laws in different branching systems [5], [6], [27] which have been explained by invoking a natural optimization criterion based in the fact that the observed trees contain the largest possible number of apices within the smallest number of branching levels. For binary trees of size *A*, where nodes are restricted to occupy uniformly a *D* dimensional Euclidean space, the minimum value of *C* scales as *A ^{η}*, with

*η*=

*(D+1)/D*. This scaling also describes the

*D*-dimensional tree with the maximum size for a given depth (the average distance between root and leaves). The value of

*η*obtained in our phylogeny analysis,

*η*≅1.44, is achieved only for optimal trees restricted to spaces of

*D*≅2.27 dimensions. Given the apparently unlimited number of variables that may yield differences among taxa, restricting their representation to a space with such a small number of dimensions seems unreasonable. This interpretation suggests that the evolutionary process yielding the observed phylogenies is not the most parsimonious one, which could potentially yield a similar biodiversity with fewer branching levels. In fact, the natural choice

*D*= ∞ gives an optimal exponent

*η*= 1, which correspond to the ERM value and departs from observed scaling. Optimal traffic networks [28] also led to the exponent

*τ*= 2 which departs from the empirical scaling exponent reported here for phylogenetic trees.

_{A}In summary, the remarkably similar allometric exponents reported here to characterize universally the scaling properties of intra- and inter-specific phylogenies across kingdoms, reproductive systems and environments, strongly suggests the conservation of branching rules, and hence of the evolutionary processes that drive biological diversification, across the entire history of life. Although at short branch sizes the topology of observed phylogenies cannot differ much from that expected under random and symmetric trees, due to the restriction of binary bifurcations in phylogenetic tree reconstruction, significant departures become universally evident as trees become larger, where the null ERM model and real phylogenies differ (Figure 2B). These deviations suggest (a) that the evolution of life leads to less biodiversity than an optimal tree can possibly generate; and (b) the operation of a mechanism generating a correlated branching, where some memory of past evolutionary events is maintained along each branch. This correlated branching pattern implies that entities that diversify faster than average lead to new biological forms that diversify more than average themselves. Invariance across the broad scales considered here indicates that relatively simple rules govern the phylogenetic branching and the unfolding of biodiversity. Their deviation from random models indicates that evolutionary success is a correlated trait within lineages, yielding present asymmetries in the structure of the Tree of Life.

## Materials and Methods

### Phylogenies databases

On June 30th 2007 we downloaded the 5,212 phylogenetic trees available at that time in the database TreeBASE (http://www.treebase.org). TreeBASE constitutes a large database of interspecific phylogenies, which were collected from previously published research papers. The size of trees oscillates from 10 to 600 tips. Most of the bifurcations in these trees are binary, as confirmed by the fact that the ratio between the number of tips and the total number of nodes gives 0.52 when averaged over all the trees (for perfect binary trees, the ratio is 0.50).

As a comprehensive database comparable to TreeBASE does not exist for intraspecific phylogenies, we constructed an intraspecific data set by manually compiling 67 intraspecific phylogenies from several published phylogenetic analysis [S1–S45]. We compiled this data set in such a way that it contains: 1) Organisms from the main different environments (terrestrial, marine and fresh water), climatic regions (from polar to desert), and branches of life (Table S1). 2) Phylogenetic trees reconstructed with the main phylogenetic tree estimation methods, i.e., neighbor-joining, maximum parsimony and maximum likelihood methods.

In order to test whether the results derived from the examination of the relatively small (67 phylogenies) intraspecific data base can be compared with the much larger (5212) set of interspecific phylogenies extracted from TreeBASE, we sampled the literature to construct a dataset of 67 interspecific phylogenies drawn from the literature [S46–S85] using the same criteria as those to derived the intraspecific phylogeny data base (Table S1), obtaining full agreement (Figure S1). The intra- and interspecific phylogenies derived from the literature ranged between 30 and 170 tips, and they contained mainly binary branching events. An example for each kind of phylogenies is shown in Figures S2A and S3A.

### Branch size and cumulative branch size distributions

We associate to each node *i* of a phylogenetic tree two quantities, the size *A _{i}* (number of nodes) of the subtree

*S*made up of node

_{i}*i*and all the descendant nodes below it, that is, the subtree which does not contain the global root of the original tree, and the cumulative branch size,

*C*, defined as the sum of the branch sizes associated to all the nodes in the subtree

_{i}*S*,

_{i}*C*= Σ

_{i}*A*. To characterize the probability distributions of the

_{j}*A*and

_{i}*C*values on a particular phylogenetic tree we compute the respective complementary cumulative distribution functions (CCDF):

_{i}*F*(

*A*)

*=*probability(

*A*), and

_{i}>A*F*(

*C*)

*=*probability(

*C*). We observe that these quantities scale, for large values of

_{i}>C*A*and

*C*, as power laws: and . The exponents

*τ*and

_{A}*τ*, thus, characterize the probabilities of {

_{C}*A*} and {

_{i}*C*}: and, respectively.

_{i}### Allometric scaling relationship

We observe that a functional relationship among the values of *C* and *A*, i.e. among shape and size, exists and also follows a power law, *C∼A ^{η}*, characterized by an exponent

*η*. Since this relationship encodes the variation of a system property as size is varied, we can call this an

*allometric scaling relationship*, to stress its connections with other functional relationships relating function and size [11], [13], [27]. We note that introduction of the change of variables

*C∼A*into leads to, from which

^{η}*η*= (1−

*τ*)/(1−

_{A}*τ*). Thus, only two out of the three exponents are independent. As simple examples for which the above exponents can be computed by direct counting, we mention the pectinate or fully unbalanced tree, i.e. a tree in which all branching occurs successively along a single branch, characterized by the exponents

_{C.}*τ*= 0,

_{A}*τ*= 1/2,

_{C}*η*= 2, or the fully symmetric or Cayley tree, characterized by

*τ*= 2, and

_{A}*C∼A*ln

*A*, which except for the weak logarithmic correction corresponds to

*η*= 1 and

*τ*= 2. Figures S2B and S3B show, in contrast, the allometric scaling relationship for the particular examples of intra- and inter-specific phylogenies displayed in Figures S2A and S3A.

_{C}In order to investigate whether observations differ from random expectations, we have compared the allometric scaling found here with the prediction of a null model [29], the Equal-rates Markov (ERM) model. The ERM model was attributed to Harding [30], and to Cavalli-Sforza and Edwards [31], although it is based on models of the diversification process that date back at least to Yule [23]. The main assumption of the ERM model is that the phylogeny is the product of random branching. This is the result when the “effective speciation rate” (the difference between extinction and speciation rate) is equal for all species. The effective speciation rate may change chronologically, provided that it is the same for all lineages at a given time [23]. For this model we obtain *C∼A* ln *A*, or *η* = 1, and also *τ _{A}* =

*τ*= 2. The random asymmetries introduced by the ERM are not strong enough to change the scaling behavior from the symmetric tree result.

_{C}The quantity *C _{i}/A_{i}* can be thought as a measure of the average

*depth*or

*distance*of the phylogenetic tree leaves to the node

*i*. This can be seen taking into account that C

*= Σ(*

_{i}*d*+1), where

_{ij}*d*corresponds to the distance of each of the nodes

_{ij}*j*of the subtree

*S*to the root

_{i}*i*. Thus, the relationship between

*C*and

*A*can be written as

*C*=

_{i}*A*+〈

_{i}*d*〉

*, where 〈*

_{i}A_{i}*d*〉

*is the average depth of the nodes in the subtree*

_{i}*S*. The relationship between

_{i}*C*and the depth is obtained: C

_{i}/A_{i}*/A*

_{i}*= 〈*

_{i}*d*〉

*+1. This quantity is closely related to the Sackin's index defined as the distance of the leaves to the root:*

_{i}*S*= Σ

_{l}_{∈leaves}

*d*

_{l}_{,root}[32], [33]. It can be shown that for binary trees

*C = 2S+1*, where

*C*= Σ

_{∀i}

*d*

_{i}_{,root}. Since the scaling law relating the increase of the depth or Sackin's index with three size is known to be the same as the scaling of the Colless' index, measuring the symmetry or balance of a phylogenetic tree [34], our results for

*η*can be put in the context of the numerous studies available on the unbalance of phylogenetic trees [4], [17], [35]. Thus, connections between several methodologies previously used to analyze the topology of trees, such as size distributions [10], [23], unbalance and depth [4], [8], [32]–[35], and transport efficiency [7], [13], [27], [28], are revealed within the framework presented here.

## Supporting Information

#### Text S1

Scaling of branch size and cumulative branch size: TreeBASE vs. manually selected data sets. We provide the list of references corresponding to the selected intraspecific and interspecific phylogenetic trees; the statistics of all data sets with two specific examples; and a summary table of taxa in the data sets.

(0.06 MB DOC)

^{(63K, doc)}

#### Table S1

Break-down of the number of analyzed inter- and intra-species trees with respect to taxa.

(0.03 MB DOC)

^{(29K, doc)}

#### Figure S1

Cumulative complementary distribution functions (CCDFs) for branch size (*F*(*A*), panel A) and cumulative branch size (*F*(*C*), panel B), and the allometric scaling relation (*C* {similar, tilde operator } *A*^{η}, panel B) averaged and logarithmically binned over all phylogenetic trees. Empty squares are for the interspecific TreeBASE data set, solid circles are for the manually compiled intraspecific data set, and triangles are for the new manually compiled interspecific data set of reduced size. Solid lines are power laws fitted to the TreeBASE behavior, as in Figs. 2 and and33 of the main text.

(1.22 MB TIF)

^{(1.1M, tif)}

#### Figure S2

A: An example of an intraspecific phylogenetic tree: different strains of the bacteria Vibrio vulnificus [S19]. Most of the branchings are binary, but there are some 3rd order branchings. B: The allometric scaling plot showing the relationship of cumulative branch size (*C*) to branch size (*A*) from each node of that tree. The solid line corresponds to the fitting *C* {similar, tilde operator } *A*^{1.43} to this intraspecific dataset.

(2.66 MB TIF)

^{(2.5M, tif)}

#### Figure S3

A: An example of an interspecific phylogenetic tree: the catfish species (order Siluriformes) [S80]. Most of the branchings are binary, but there are some 3rd order branchings. B: The allometric scaling plot showing the relationship of cumulative branch size (*C*) to branch size (*A*) from each node of that tree. The solid line corresponds to the fitting *C* {similar, tilde operator } *A*^{1.44} to this intraspecific dataset.

(2.58 MB TIF)

^{(2.4M, tif)}

## Footnotes

**Competing Interests: **The authors have declared that no competing interests exist.

**Funding: **We acknowledge financial support from MEC (Spain) and FEDER, project FISICOS, from CSIC (Spain) project PIE 200750I016, from SBF (Switzerland) through project C05.0148 (Physics of Risk), and from the European Commission through the NEST-Complexity project EDEN. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

## References

**Public Library of Science**

## Formats:

- Article |
- PubReader |
- ePub (beta) |
- PDF (269K) |
- Citation

- The need for the incorporation of phylogeny in the measurement of biological diversity, with special reference to ecosystem functioning research.[Bioessays. 2009]
*King I.**Bioessays. 2009 Jan; 31(1):107-16.* - Neutral biodiversity theory can explain the imbalance of phylogenetic trees but not the tempo of their diversification.[Evolution. 2011]
*Davies TJ, Allen AP, Borda-de-Água L, Regetz J, Melián CJ.**Evolution. 2011 Jul; 65(7):1841-50. Epub 2011 Mar 19.* - Universal artifacts affect the branching of phylogenetic trees, not universal scaling laws.[PLoS One. 2009]
*Altaba CR.**PLoS One. 2009; 4(2):e4611. Epub 2009 Feb 26.* - Scaling properties of protein family phylogenies.[BMC Evol Biol. 2011]
*Herrada A, Eguíluz VM, Hernández-García E, Duarte CM.**BMC Evol Biol. 2011 Jun 6; 11:155. Epub 2011 Jun 6.* - The role of evolutionary processes in producing biodiversity patterns, and the interrelationships between taxonomic, functional and phylogenetic biodiversity.[Am J Bot. 2011]
*Swenson NG.**Am J Bot. 2011 Mar; 98(3):472-80. Epub 2011 Feb 17.*

- Hierarchicality of Trade Flow Networks Reveals Complexity of Products[PLoS ONE. ]
*Shi P, Zhang J, Yang B, Luo J.**PLoS ONE. 9(6)e98247* - Phylogenetic Diversity Theory Sheds Light on the Structure of Microbial Communities[PLoS Computational Biology. 2012]
*O'Dwyer JP, Kembel SW, Green JL.**PLoS Computational Biology. 2012 Dec; 8(12)e1002832* - Phylogenetic Properties of RNA Viruses[PLoS ONE. ]
*Pompei S, Loreto V, Tria F.**PLoS ONE. 7(9)e44849* - Scaling properties of protein family phylogenies[BMC Evolutionary Biology. ]
*Herrada A, Eguíluz VM, Hernández-García E, Duarte CM.**BMC Evolutionary Biology. 11155* - From the scala naturae to the symbiogenetic and dynamic tree of life[Biology Direct. ]
*Kutschera U.**Biology Direct. 633*

- PubMedPubMedPubMed citations for these articles
- TaxonomyTaxonomyTaxonomy records associated with the current articles through taxonomic information on related molecular database records (Nucleotide, Protein, Gene, SNP, Structure).
- Taxonomy TreeTaxonomy Tree

- Universal Scaling in the Branching of the Tree of LifeUniversal Scaling in the Branching of the Tree of LifePLoS ONE. 2008; 3(7)

Your browsing activity is empty.

Activity recording is turned off.

See more...