Bioinformatics. Nov 15, 2008; 24(22): 2579–2585.
Published online Sep 26, 2008. doi:  10.1093/bioinformatics/btn503
PMCID: PMC2579716

Phylogenetic distances are encoded in networks of interacting pathways

Abstract

Motivation: Although metabolic reactions are unquestionably shaped by evolutionary processes, the degree to which the overall structure and complexity of their interconnections are linked to the phylogeny of species has not been evaluated in depth. Here, we apply an original metabolome representation, termed Network of Interacting Pathways or NIP, with a combination of graph theoretical and machine learning strategies, to address this question. NIPs compress the information of the metabolic network exhibited by a species into much smaller networks of overlapping metabolic pathways, where nodes are pathways and links are the metabolites they exchange.

Results: Our analysis shows that a small set of descriptors of the structure and complexity of the NIPs, combined into regression models, very accurately reproduces reference phylogenetic distances derived from 16S rRNA sequences (10-fold cross-validation correlation coefficient higher than 0.9). Our method also scored better than previous work on metabolism-based phylogenetic reconstruction, as assessed by branch distance, topological similarity and second cousins scores. Thus, our representation of the metabolome as a network of overlapping metabolic pathways captures sufficient information about the evolutionary events underlying the formation of metabolic networks and species phylogeny. Notably, precise knowledge of all of the reactions in these pathways is not required for these reconstructions. These observations underscore the potential of abstract, modular representations of metabolic reactions as tools for studying the evolution of species.

Contact: aurelien.mazurie@pasteur.fr

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

Phylogenetic relationships between species are traditionally inferred from genomic data, based on observed mutations in the sequence of orthologous genes found in all studied species—a typical example being the SSU rRNA (16S rDNA) gene sequence (Olsen et al., 1994). Results obtained are potentially biased, however, by the highly variable rates of evolution observed across species (Huynen and Bork, 1998). Moreover, identification of orthologs and paralogs in the genomes is complicated by gene duplication and loss, horizontal gene transfer and functional replacement events, resulting in misannotations.

Recently, higher level functional components have been considered as replacements for or complements of gene-based phylogenies. The annotations of the metabolic reactions are the most promising source of information due to the abstraction of the cellular functions they provide and their availability in numerous species (Kanehisa et al., 2006). One approach to exploit this information is to calculate a distance between species based on the enzymatic genes found in their genome, or on the network of reactions they define by exchanging metabolites, or both; links between these two aspects have been demonstrated (Liu et al., 2007). Examples include phylogenies inferred from the presence or absence of enzymes in the genomes, either alone (Ma and Zeng, 2004) or in combination with the metabolic network structure (Forst et al., 2006; Oh et al., 2006; Zhang et al., 2006), from the similarity of enzyme sequences or functional annotation in combination with the comparison of their direct neighbors in the reactions network (Clemente et al., 2007; Forst and Schulten, 2001; Heymans and Singh, 2003), from the presence or absence of pathways across species (Liao et al., 2002) and from the completeness of pathways across species (Hong et al., 2004).

In these studies, metabolic reactions are represented as directed or undirected graphs. Nodes either represent metabolites linked by the enzymes that process them, or enzymes linked by the metabolites they exchange. Besides requiring a large amount of information across all species for meaningful comparisons, these representations are potential sources of bias whose impact on phylogeny reconstruction has not been evaluated. Several issues arise. First, incorporation of the so-called ubiquitous metabolites, e.g. water, connects functionally distant metabolites without any mechanistic biological meaning, producing an unrealistically small degree of separation between nodes (Ma and Zeng, 2003). The criteria by which metabolites should be included or excluded in this context are unclear. Second, the structure of these networks is highly sensitive to annotation errors because, especially in newly sequenced genomes, the presence of orthologous enzymes in a species is initially assessed by sequence similarity. In addition to the risk of false positives or negatives, the exact set of reactions in which the putative enzyme is involved may not match those in the reference species from which the annotations are transferred. This is even more critical when the transferred annotation is a generic enzyme name, such as an EC number, which, due to its abstract nature, can be associated with several distinct physical reactions.

Here, we describe a new representation in which metabolic reactions are represented as an undirected, weighted network of interacting pathways (NIP). Nodes in NIPs are metabolic pathways, i.e. non-exclusive, consensus sets of metabolic reactions as defined by the reference source KEGG (Kanehisa et al., 2006). Edges link overlapping pathways sharing at least one metabolite, i.e. at least one enzyme in each of the two pathways uses this metabolite as a substrate or product. This representation is designed to be less sensitive to common biases from annotation errors and other sources, since false positives and negatives are less likely to occur at the level of a pathway than for an enzyme. Still, NIPs depend on the definition of metabolic pathways proposed by reference databases. The wide use of KEGG as a reference pathway source shows, however, that these definitions are in practice employed as standard representations of metabolism by the biochemistry community and are unlikely to be greatly modified in the future. Other algorithm-based representations of metabolic networks built from genome-scale data (e.g. gene expression, topology of the reaction network) have recently been proposed in the literature; see Aittokallio and Schwikowski (2006) for a review. The relationship between these novel representations and the phylogeny of species is currently under investigation.

We anticipated that this higher hierarchical level of organization of metabolic networks would reveal patterns of their evolution by being more focused on the notion of modularity, an emergent property of networks that has been studied extensively (Hartwell et al., 1999; Papin et al., 2004; Ravasz et al., 2002; Spirin et al., 2006) but which cannot be easily extracted from the genome sequence alone. This new representation is expected to better capture phylogenetic relationships among species than previous approaches, by focusing less on the components (enzymes and metabolites) of metabolic pathways and more on how they interact in a modular manner.

2 METHODS

The general approach used to measure the correlation between phylogenetic distances and the structure of metabolic networks is summarized in Figures 1 and 2 and described below.

Fig. 1.
Extraction of NIPs from metabolic networks. A list of all metabolites processed is compiled for each pathway known to exist in a given species; example is given here of fictive metabolites A, B and C processed in three metabolic pathways. Pathways that ...
Fig. 2.
General approach. (A) For any given species, metabolic networks are extracted and descriptors of their structure and complexity are calculated. Network-based distances between all pairs of species are derived from the NIP and NIM descriptors. In this ...

2.1 Extraction of metabolic networks

Metabolic reactions were retrieved from two public sources: the December 2006 release of the KEGG database (Kanehisa et al., 2006) and the November 2006 release of the Ma dataset (Ma and Zeng, 2003). The latter is a manually curated version of the former for 107 of the 289 species available from KEGG. For each species, we reconstructed two networks: a network of interacting pathways (NIP) and a network of interacting metabolites (NIM). NIPs were built by linking overlapping metabolic pathways sharing at least one metabolite (Fig. 1). For comparison, NIMs were built by linking metabolites converted in a reaction occurring in at least one metabolic pathway. Edges in these two undirected graphs are weighted by the number of metabolites shared (in NIPs) or by the number of pathways in which the metabolites are converted (in NIMs). The weight of a node is the sum of the weights of its incident edges. Note that NIPs contain no information about the underlying metabolic reactions or the enzymes that catalyze them; they only record which metabolic pathways are present in the species and how they overlap.
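The two construction rules can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each pathway is given as a set of metabolite identifiers (for the NIP) or as a list of (substrate, product) pairs (for the NIM, a simplified reaction format chosen here for illustration), and the function names `build_nip` and `build_nim` are hypothetical.

```python
from itertools import combinations


def build_nip(pathways):
    """Network of interacting pathways.

    pathways: dict mapping a pathway name to the set of metabolites it processes.
    Returns (edges, node_weight): edges maps a sorted pathway pair to the number
    of shared metabolites; node_weight is the sum of incident edge weights.
    """
    edges = {}
    for p, q in combinations(sorted(pathways), 2):
        shared = pathways[p] & pathways[q]
        if shared:  # link pathways that overlap by at least one metabolite
            edges[(p, q)] = len(shared)
    node_weight = {p: 0 for p in pathways}
    for (p, q), w in edges.items():
        node_weight[p] += w
        node_weight[q] += w
    return edges, node_weight


def build_nim(reactions_by_pathway):
    """Network of interacting metabolites.

    reactions_by_pathway: dict mapping a pathway name to a list of
    (substrate, product) pairs (hypothetical, simplified reaction format).
    Each undirected metabolite pair is weighted by the number of distinct
    pathways in which the conversion occurs.
    """
    pathways_per_edge = {}
    for pathway, reactions in reactions_by_pathway.items():
        for substrate, product in reactions:
            if substrate == product:
                continue
            key = tuple(sorted((substrate, product)))
            pathways_per_edge.setdefault(key, set()).add(pathway)
    return {pair: len(names) for pair, names in pathways_per_edge.items()}


# Fictive metabolites A, B and C in three pathways (memberships invented)
nip_edges, nip_weights = build_nip({"P1": {"A", "B"}, "P2": {"B", "C"}, "P3": {"C"}})
print(nip_edges)    # {('P1', 'P2'): 1, ('P2', 'P3'): 1}
print(nip_weights)  # {'P1': 1, 'P2': 2, 'P3': 1}
```

P1 and P3 share no metabolite, so they stay unlinked; P2 gets the largest node weight because it overlaps with both other pathways.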

To account for the potential bias represented by ubiquitous metabolites, two variations of the NIPs and NIMs datasets were considered. The first, termed ‘filtered’, excludes all metabolites considered ubiquitous by the authors or the respective source. The second, termed ‘unfiltered’, only excludes water. To compare results obtained with the KEGG and Ma sources, the same 107 species were considered in both. A description of the metabolic pathways used for the construction of the NIPs is provided in Supplementary Table 1.
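The filtered and unfiltered variants differ only in which metabolites are dropped before pathway overlaps are computed. The sketch below applies the NIP overlap rule of Section 2.1 to toy, truncated pathway contents and to a hypothetical list of ubiquitous metabolites (the actual list depends on the source), to show how removing shared cofactors prunes edges. The helper names are illustrative only.

```python
from itertools import combinations

# Hypothetical ubiquitous-metabolite list; the real one depends on KEGG or Ma.
UBIQUITOUS = {"H2O", "ATP", "ADP", "NAD+", "NADH", "CO2", "Pi"}


def remove_metabolites(pathways, excluded):
    """Copy of pathway -> metabolites with the excluded metabolites dropped."""
    return {name: set(mets) - excluded for name, mets in pathways.items()}


def nip_edge_count(pathways):
    """Number of pathway pairs sharing at least one metabolite."""
    return sum(1 for p, q in combinations(pathways, 2) if pathways[p] & pathways[q])


# Toy, truncated metabolite sets for illustration only
pathways = {
    "glycolysis": {"glucose", "pyruvate", "ATP", "H2O"},
    "TCA cycle": {"acetyl-CoA", "citrate", "NADH", "H2O"},
    "fatty acid synthesis": {"acetyl-CoA", "malonyl-CoA", "ATP"},
}
unfiltered = remove_metabolites(pathways, {"H2O"})  # only water removed
filtered = remove_metabolites(pathways, UBIQUITOUS)
print(nip_edge_count(pathways), nip_edge_count(unfiltered), nip_edge_count(filtered))  # 3 2 1
```

Keeping water links every pathway here; the unfiltered variant still links through ATP and acetyl-CoA, while the filtered one keeps only the acetyl-CoA overlap.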

2.2 Reference phylogenetic distances

The phylogenetic distance matrix used as a reference was derived from a multiple alignment of the gene sequences for the small subunit of the ribosomal RNA of each of the 107 species by employing a DNA sequence evolution model. The sequences were retrieved from the European ribosomal RNA database (Wuyts et al., 2004) and the GenBank database (Benson et al., 2006), and aligned using CLUSTALW (Chenna et al., 2003). The DNA evolution model used, GTR+I+G, was the one best fitting the alignment data, as determined by MODELTEST (Posada and Crandall, 1998) using hierarchical likelihood ratio tests involving 56 different models available in PAUP* (Swofford, 2003). We excluded 9 of the 107 species due to uncertain identifier matching in the database. The 98 remaining species were grouped into 80 taxa to include strains of the same species, resulting in 12 Archaea, 60 Bacteria and 8 Eukarya representing 15%, 75% and 10% of the total, respectively. The list of these 80 taxa with their main taxonomic ranks (domain, kingdom and class) and the KEGG identifiers of the associated species and strains are presented in Supplementary Table 2. Phylogenetic trees were inferred from the resulting distance matrices using the neighbor-joining algorithm implemented by the NEIGHBOR program of the PHYLIP toolbox (Felsenstein, 1989).
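Neighbor-joining builds an unrooted tree by repeatedly joining the pair of nodes that minimises the Q criterion. The study used the NEIGHBOR program of PHYLIP; the following is an independent textbook sketch of the algorithm, not PHYLIP's implementation, and the function names are hypothetical. It takes taxon labels and a symmetric distance matrix and returns the tree as weighted edges, with generated labels for internal nodes.

```python
from itertools import combinations, count


def neighbor_joining(names, matrix):
    """Unrooted neighbor-joining tree as a list of (node, node, branch_length).

    names: taxon labels (assumed not to clash with "node0", "node1", ...);
    matrix: symmetric distance matrix (list of rows) with a zero diagonal.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    active = list(names)
    edges = []
    labels = count()
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[i][k] for k in active) for i in active}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(active, 2),
                   key=lambda pr: (n - 2) * d[pr[0]][pr[1]] - r[pr[0]] - r[pr[1]])
        u = f"node{next(labels)}"
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):  # distance from the new node to the others
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[a][b]))
    return edges


def patristic_distance(edges, a, b):
    """Sum of branch lengths on the path between nodes a and b."""
    adj = {}
    for x, y, w in edges:
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    stack = [(a, None, 0.0)]
    while stack:
        node, parent, total = stack.pop()
        if node == b:
            return total
        for nb, w in adj[node]:
            if nb != parent:
                stack.append((nb, node, total + w))
    raise ValueError("nodes are not connected")


names = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
tree = neighbor_joining(names, matrix)
print(patristic_distance(tree, "a", "c"))  # 9.0
```

This matrix is additive, so the path lengths in the reconstructed tree reproduce every input distance; with real 16S distances the fit is only approximate.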

2.3 Description of metabolic networks

Networks can be characterized both qualitatively and quantitatively using graph theory (Harary, 1969) and information theory (Weaver and Shannon, 1949), by applying a variety of topological, compositional and information-theoretic descriptors (Bonchev and Buck, 2005), i.e. quantities that are uniquely associated with specific aspects of network structure and complexity. Four categories of descriptors (degree-, centrality-, distance- and clique-related) were considered, with a total of 35 unique descriptors, some of them devised specifically for this study. Weighted and unweighted variants were considered for descriptors related to node and edge counts, and three different versions of their information content were used for descriptors related to value distributions. An expanded set of 69 descriptors (35 unique plus 34 derivatives) was thus constructed (Supplementary Table 3 and associated references). Compositional descriptors (i.e. lists of nodes) reporting only parts of metabolic networks (largest cliques, nodes at the center) were selected to be highly sensitive to the whole network structure, thus lowering the risk of collision (similar values for significantly different networks). The values of these descriptors were calculated for each NIP and NIM using the NETWORKX library.
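To make the descriptor categories concrete, the sketch below computes a handful of degree-, distance- and centrality-related quantities on a small weighted graph using only the Python standard library. It is an illustration, not the study's descriptor set (the study used NetworkX and the 69 descriptors of Supplementary Table 3). The function names and the choice of Shannon entropy as the information-content measure are assumptions, distances are unweighted shortest-path lengths, and the graph is assumed to be connected.

```python
import math
from collections import Counter, deque


def adjacency(edges):
    """node -> {neighbour: weight} from a dict {(u, v): weight}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def bfs_lengths(adj, source):
    """Unweighted shortest-path lengths from source (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def shannon_entropy(values):
    """Information content (bits) of the distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def describe(edges):
    adj = adjacency(edges)
    degree = {n: len(nbs) for n, nbs in adj.items()}
    strength = {n: sum(nbs.values()) for n, nbs in adj.items()}  # weighted degree
    clustering = {}
    for n, nbs in adj.items():
        k = len(nbs)
        # links among the neighbours of n, each pair counted once
        links = sum(1 for a in nbs for b in nbs if a < b and b in adj[a])
        clustering[n] = 2 * links / (k * (k - 1)) if k > 1 else 0.0
    lengths = {n: bfs_lengths(adj, n) for n in adj}
    eccentricity = {n: max(d.values()) for n, d in lengths.items()}
    radius = min(eccentricity.values())
    return {
        "degree_distribution": sorted(degree.values()),
        "weighted_degree_distribution": sorted(strength.values()),
        "degree_entropy": shannon_entropy(degree.values()),
        "average_clustering": sum(clustering.values()) / len(clustering),
        "distance_distribution": sorted(
            d for n, ds in lengths.items() for m, d in ds.items() if n < m),
        "center": {n for n, e in eccentricity.items() if e == radius},
    }


toy_nip = {("P1", "P2"): 1, ("P1", "P3"): 2, ("P2", "P3"): 1, ("P3", "P4"): 3}
for name, value in describe(toy_nip).items():
    print(name, value)
```

Set-valued outputs such as `center` are the compositional descriptors; they are compared with set distances rather than numeric differences.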

Table 3.
Descriptors best predicting phylogenetic distance (abridged)

2.4 Network-based distances between taxa

Based on the above expanded set of 69 network descriptors, we computed a pairwise distance vector between each pair of the 80 taxa (Fig. 2A). The distance between the values of each descriptor was calculated according to its type. For numeric descriptors, this distance was the absolute value of the difference. When the descriptor was a vector of numeric values (e.g. node degree distribution), we used three different distance functions: the sum of the absolute values of the difference between each element, the Manhattan distance and the Euclidean distance. When the descriptor was a set (e.g. a list of network nodes), we used the Jaccard distance, i.e. one minus the ratio between the cardinality of the intersection and the cardinality of the union of the two sets. When taxa were represented by several strains or individuals, the distance between their descriptor values was taken as the mean of the pairwise distances calculated between the strains. Because some descriptors were compared with several distance functions (see Supplementary Table 3), each pair of taxa was described by a vector of 79 distance values. Such a dataset was constructed separately for the Archaea, the Bacteria and the Eukarya, and for all 80 taxa together.
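The per-descriptor distances can be sketched as follows. The Manhattan and Euclidean distances and the Jaccard distance (one minus intersection size over union size) are standard; zero-padding vectors of unequal length, such as degree histograms of networks with different maximum degree, is an assumption made here, and all function names are hypothetical.

```python
import math
from itertools import product
from statistics import mean


def absolute_difference(x, y):
    """Distance between two numeric descriptor values."""
    return abs(x - y)


def pad(u, v):
    """Zero-pad two numeric vectors to the same length (assumption)."""
    n = max(len(u), len(v))
    return list(u) + [0] * (n - len(u)), list(v) + [0] * (n - len(v))


def manhattan(u, v):
    u, v = pad(u, v)
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u, v):
    u, v = pad(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def jaccard_distance(s, t):
    """One minus intersection size over union size; two empty sets are identical."""
    union = set(s) | set(t)
    if not union:
        return 0.0
    return 1.0 - len(set(s) & set(t)) / len(union)


def taxon_distance(strains_a, strains_b, metric):
    """Mean pairwise distance between the descriptor values of two taxa's strains."""
    return mean(metric(x, y) for x, y in product(strains_a, strains_b))


print(manhattan([1, 2, 3], [2, 2, 5]))           # 3
print(euclidean([0, 0], [3, 4]))                 # 5.0
print(jaccard_distance({"a", "b"}, {"b", "c"}))  # about 0.667
print(taxon_distance([1.0, 3.0], [2.0], absolute_difference))  # 1.0
```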

2.5 Correlation estimation

The correlation between network-based distances and reference phylogenetic distances of taxa was assessed by training regression models to predict the latter from any combination of the former (Fig. 2B). Training sets were constructed to report, for each pair of taxa and for each metabolic network dataset, the two types of distances. These training sets are available as Supplementary Table 4. Supervised learning algorithms implemented in the WEKA toolbox (Witten and Frank, 1999; Supplementary Table 5) were applied to the training sets to reproduce, i.e. predict, the phylogenetic distance from any combination of network distances. Pearson's correlation coefficients between known and predicted phylogenetic distances were calculated both in 10-fold cross-validation and on the whole training set (referred to as q² and R², respectively). For a given training set, the correlation between network-based and phylogenetic distances was then taken as the highest q² obtained among all regression models. A high score would mean that phylogenetic distances are encoded in, i.e. can be calculated from, the structure and organization of metabolic networks. To detect any overfitting, 10 randomized versions of each training set were also evaluated, in which reference phylogenetic distances were shuffled using the Fisher–Yates algorithm (Fisher and Yates, 1938).
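The evaluation protocol can be sketched with ordinary least-squares regression standing in for the WEKA models, which are not reproduced here. The sketch computes the Pearson correlation between reference values and predictions obtained in k-fold cross-validation (the quantity called q² in the text) and implements the Fisher–Yates shuffle used for the randomized controls. All function names are hypothetical, and the small ridge term in the normal equations is an assumption for numerical stability.

```python
import random
from statistics import mean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [v] for row, v in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(X, y, ridge=1e-8):
    """Least-squares coefficients [intercept, w1, ..., wp] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(A, b)


def predict(coef, x):
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))


def cv_correlation(X, y, folds=10, seed=0):
    """Pearson r between y and its k-fold cross-validated predictions."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    predicted = [0.0] * len(y)
    for f in range(folds):
        held_out = order[f::folds]
        excluded = set(held_out)
        train = [i for i in order if i not in excluded]
        coef = fit_linear([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            predicted[i] = predict(coef, X[i])
    return pearson(y, predicted)


def fisher_yates(values, rng):
    """Uniformly shuffled copy of values (Durstenfeld's variant of Fisher-Yates)."""
    out = list(values)
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)  # 0 <= j <= i
        out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(100)]
y = [2 * a - b + 0.01 * rng.random() for a, b in X]
print(cv_correlation(X, y))                                  # close to 1
print(cv_correlation(X, fisher_yates(y, random.Random(2))))  # much lower
```

Shuffling the targets destroys any real relationship with the predictors, so a model that still scores well on shuffled data would signal overfitting.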

Finally, we identified the smallest subset of network descriptors that still performs as well as the complete set. This was done using feature selection algorithms (Guyon and Elisseeff, 2003; Hall and Holmes, 2003) and a heuristic evaluation of subsets of descriptors on the regression models identified earlier as the best ones. A tool, METACLASSIFY, was developed to automate the training of the regression models and to retrieve the results.
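One generic way to search for a small descriptor subset is greedy forward selection, a wrapper method of the kind surveyed by Guyon and Elisseeff (2003). The sketch below is a stand-in chosen for illustration, not the study's exact procedure: `score` is any callable rating a list of descriptors (for example a cross-validated correlation), and the descriptor names and toy score are hypothetical.

```python
def forward_selection(features, score, tolerance=0.01):
    """Greedy forward selection.

    Adds, one at a time, the feature that improves score(subset) the most,
    until the subset scores within `tolerance` of the full feature set or no
    remaining feature improves it.
    """
    target = score(list(features))
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining), key=lambda t: t[1]
        )
        if candidate_score <= best:
            break  # no remaining feature improves the subset
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
        if best >= target - tolerance:
            break  # performs (nearly) as well as the complete set
    return selected, best


# Toy score: each hypothetical descriptor contributes a fixed amount
weights = {"degree_entropy": 0.6, "clique_sizes": 0.3, "center": 0.08, "diameter": 0.02}
subset, value = forward_selection(
    list(weights), lambda fs: sum(weights[f] for f in fs), tolerance=0.05
)
print(subset)  # ['degree_entropy', 'clique_sizes', 'center']
```

Greedy selection evaluates far fewer subsets than an exhaustive search but can miss combinations whose members are only useful together.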

3 RESULTS AND DISCUSSION

3.1 Networks of interacting pathways

NIPs were constructed to represent the metabolism of species, as outlined in Section 2 and Figure 1, from two metabolic reaction datasets, KEGG and Ma, for the same 107 species, with ubiquitous metabolites either removed (filtered dataset) or kept (unfiltered).

Depending on the species, a NIP contains 37% to 97% of all metabolic pathways known across the 107 species; an example is shown in Figure 3. Using NIPs instead of the entire network of metabolic reactions (NIMs) represents an 8- to 11-fold compression of the network size, from an average of 507 metabolites down to an average of 63 pathways. NIPs are also denser, with an average node degree of 9.0 ± 3.6 to 48.0 ± 15.6 (filtered and unfiltered versions, respectively), compared with 2.4 ± 0.1 to 5.1 ± 0.3 for NIMs. As shown below, this substantial compression nevertheless retains the information needed to accurately reconstruct the phylogeny of species.

Fig. 3.
Example of NIP. NIPs extracted from the filtered KEGG metabolic dataset for Saccharomyces cerevisiae. Node shade is proportional to the number of metabolic pathways overlapping with the represented one. Edge shade is proportional to the number of metabolites ...

3.2 Prediction of the phylogenetic distance

We assessed the correlation between metabolic network-based distances and phylogenetic distances by training regression models for all pairs of species considered (Fig. 2 and Section 2). These models were trained to predict phylogenetic distance from any combination of network-based distances. The correlation coefficients between predicted and reference phylogenetic distances calculated from the 16S rRNA sequences, evaluated using 10-fold cross-validation (q²) and on the whole training set (R²), are given in Table 1. Their analysis led to the following observations.

Table 1.
Accuracy of the inferred phylogenetic distances

First, the accuracy of the phylogenetic distance prediction from our set of 79 descriptors of metabolic network structure and complexity is high for both NIPs and NIMs (q² of 0.92 ± 0.02 and 0.93 ± 0.04, respectively). This observation demonstrates the utility of metabolic network organization for phylogeny reconstruction and compares very favorably with similar work (see below). The average relative error in phylogenetic distance prediction is highest for small distances (~18% for distances below 0.2) and decreases exponentially for larger distances (from ~4% to ~0.75% for distances above 0.2; data not shown).
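The relative-error breakdown described above can be computed as follows. This is a sketch with hypothetical names and toy numbers: the 0.2 threshold comes from the text, while skipping zero reference distances (for which relative error is undefined) is an assumption.

```python
def _mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")


def relative_errors_by_range(reference, predicted, threshold=0.2):
    """Mean relative error |predicted - reference| / reference for reference
    distances below, and at or above, a threshold."""
    below, above = [], []
    for ref, pred in zip(reference, predicted):
        if ref <= 0:
            continue  # relative error is undefined for a zero distance
        err = abs(pred - ref) / ref
        (below if ref < threshold else above).append(err)
    return _mean(below), _mean(above)


print(relative_errors_by_range([0.1, 0.15, 0.5, 1.0], [0.12, 0.15, 0.51, 1.0]))
# approximately (0.1, 0.01)
```

The same absolute error weighs more on a small distance, which is one reason relative errors are largest for closely related taxa.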

Second, both types of metabolic network representations perform comparably well, although NIPs are better than NIMs at reconstructing the phylogeny of Eukarya (q² of 0.79 ± 0.12 and 0.70 ± 0.3, respectively). This shows that the amount of information required to build NIMs (i.e. the full set of metabolic reactions) is not necessary for good reconstructions and can advantageously be replaced by NIPs (i.e. knowledge of which pathways are present and which metabolites they exchange). This observation is particularly important in the context of missing or erroneous genome annotations, a common problem with newly sequenced genomes.

Third, unfiltered datasets perform slightly better than filtered datasets: the additional structural information provided by ubiquitous metabolites slightly improves phylogeny reconstruction. The effect is of similar magnitude in NIMs (q² of 0.94 ± 0.04 and 0.92 ± 0.04 for unfiltered and filtered datasets, respectively) and in NIPs (0.93 ± 0.01 and 0.91 ± 0.03, respectively).

When the domains Archaea, Bacteria and Eukarya are considered independently, performance remains good, with an average q² of 0.61 ± 0.15, 0.82 ± 0.03 and 0.74 ± 0.21, respectively. However, large differences between q² and R² in the Archaea and Eukarya indicate some overfitting, which may be due to the small size of these domains (15% and 10% of the datasets, respectively).

No such overfitting could be detected when reconstructing the phylogeny of all species, as shown by the small difference between q² and R² and by the low scores obtained with randomized training sets in which known 16S phylogenetic distances were shuffled (see Section 2). The highest q² achieved by regression models on these randomized sets was 0.07 and 0.08 for NIPs and NIMs, respectively. These results show that our approach is robust against overfitting: after being trained on deliberately incorrect datasets in which the relationship between metabolic network structure and the phylogeny of species was destroyed, regression models do not report an artifactual relationship.

3.3 Prediction of the phylogenetic tree

The performance of the phylogeny reconstruction from metabolic network descriptors was also evaluated by comparing the trees inferred from the predicted phylogenetic distances with the reference 16S tree. An example of tree obtained is shown in Figure 4, with discrepancies highlighted.

Fig. 4.
Example of predicted tree. Example of phylogenetic tree predicted from descriptors of NIPs from the unfiltered version of the Ma dataset, for all 80 taxa considered in this study (right). For comparison, the tree resulting from the 16S sequences is also ...

Using the same reference tree, subsets of taxa and scores, we directly compared the performance of our approach with those of Heymans and Singh (2003), Forst et al. (2006), Zhang et al. (2006) and Clemente et al. (2007), where phylogeny reconstruction from metabolic data was also considered (Table 2). These studies were shown to outperform previous similar approaches from Forst and Schulten (2001) and Liao et al. (2002). For the same sets of 16 and 8 taxa used in Heymans and Singh (2003) and Clemente et al. (2007), respectively, our approach achieved better second cousins scores of 0.3 to 0.737 and 0.625 to 1, compared with the reported scores of 0.27 and 0.571. For the same set of 27 taxa used in Forst et al. (2006), our approach achieved better branch distance scores of 0.005 to 0.021 (except for three of our eight metabolic datasets), compared with the reported score of 0.023. Finally, for the same set of 47 taxa used in Zhang et al. (2006), our approach achieved better Penny and Hendy topological similarity scores of 0.7 to 0.95, compared with the reported score of 0.386.
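Topological comparisons of this kind are typically based on the splits (bipartitions of the leaf set) induced by each tree edge. The sketch below implements the split, or Robinson–Foulds, distance, one of the tree comparison metrics discussed by Penny and Hendy (1985). How Table 2's similarity scores were normalised is not stated here, so the `split_similarity` normalisation is an assumption, and trees are written as nested tuples purely for convenience.

```python
def splits(tree):
    """Non-trivial splits of a tree written as nested tuples of leaf names.

    The nesting only encodes the topology; the tree is treated as unrooted.
    Each split is stored as the side that does not contain a fixed anchor leaf.
    """
    clades = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        clades.append(below)
        return below

    everything = walk(tree)
    anchor = min(everything)
    result = set()
    for side in clades:
        if anchor in side:
            side = everything - side
        if 1 < len(side) < len(everything) - 1:  # drop trivial splits
            result.add(side)
    return result


def split_distance(t1, t2):
    """Number of splits found in one tree but not the other (Robinson-Foulds)."""
    return len(splits(t1) ^ splits(t2))


def split_similarity(t1, t2):
    """Hypothetical normalisation to [0, 1]: 1 - distance / total split count."""
    s1, s2 = splits(t1), splits(t2)
    total = len(s1) + len(s2)
    return 1.0 if total == 0 else 1.0 - len(s1 ^ s2) / total


reference = ((("a", "b"), "c"), ("d", "e"))
predicted = ((("a", "c"), "b"), ("d", "e"))
print(split_distance(reference, predicted))    # 2
print(split_similarity(reference, predicted))  # 0.5
```

Swapping b and c changes one internal split in each tree, so the trees share the {d, e} split and differ by one split on each side.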

Table 2.
Accuracy of the inferred phylogenetic trees

3.4 Best predictors of the phylogenetic distance

Descriptors of NIP structure and complexity do not contribute equally to phylogeny reconstruction. For the filtered NIP datasets from KEGG and Ma, we were able to substantially reduce their number from 79 to 22 descriptors in both datasets (Supplementary Table 6, abridged in Table 3 into 16 and 14 non-redundant descriptors for the KEGG and Ma datasets, respectively) while still predicting the phylogenetic distances among the taxa nearly as well. Our study is the first to identify the precise aspects of metabolic network structure and complexity that best encode the phylogeny of species.

Analysis of these lists shows an interesting combination of descriptors related to degree distribution, distance distribution, clique composition and clique-size distribution. The importance of degree and distance distributions in describing NIPs supports the hypothesis of a link between the scale-freeness (Barabasi and Albert, 1999) and small-worldness (Watts and Strogatz, 1998) of biological networks and the phylogeny of species. A surprising result of our analysis is the apparently important role of NIP cliques, i.e. groups of completely interconnected pathways. Large cliques are found in NIPs (up to 20 pathways), while NIMs typically have small cliques (3 to 5 metabolites). The metabolism of species is organized around a core of highly overlapping pathways whose structure and composition are important for distinguishing species. In terms of the KEGG nomenclature, this core is dominated by carbohydrate and amino acid metabolic pathways that preferentially exchange either pyruvate or acetyl-CoA (Supplementary Table 7).
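Clique descriptors require enumerating the maximal cliques of a NIP. A standard way to do this is the Bron–Kerbosch algorithm with pivoting, sketched below in plain Python; NetworkX provides the same functionality as `find_cliques`, but which routine produced the study's clique descriptors is not specified here. The example graph and function names are hypothetical.

```python
from collections import Counter


def adjacency_sets(edges):
    """Undirected adjacency (node -> set of neighbours) from node pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def maximal_cliques(adj):
    """All maximal cliques, via Bron-Kerbosch with pivoting."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


# Hypothetical NIP: four mutually overlapping pathways plus one peripheral one
adj = adjacency_sets([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
                      ("B", "D"), ("C", "D"), ("D", "E")])
cliques = maximal_cliques(adj)
print(max(len(c) for c in cliques))      # 4
print(Counter(len(c) for c in cliques))  # clique-size distribution
```

The largest clique plays the role of the densely overlapping core, and the clique-size counts give the distribution used as a descriptor.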

Finally, the considerable contribution of weighted descriptors emphasizes the importance of quantifying pathway cross-talk. Descriptors that take into account the strength of the connections between pathways are more predictive of the phylogenetic distance than their unweighted versions (where the number of metabolites shared by pathways is ignored). This could explain the advantage of keeping ubiquitous metabolites, which add information about the number of metabolites that pathways exchange.

4 CONCLUSIONS

To address the relationship between metabolic and phylogenetic information, we developed and used an abstract representation of metabolic reactions called Network of Interacting Pathways or NIP, together with an extensive set of descriptors of the structure and complexity of networks. We demonstrated that networks of metabolic reactions, as well as their simplified pathway-based representation, contain enough information to accurately predict phylogenetic distances among species. The full knowledge of all metabolic reactions involved is not required, and can advantageously be replaced by the knowledge of which pathways are present and which pathways overlap. Ubiquitous metabolites, usually ignored, are shown to slightly improve the reconstructions.

The success of our approach reveals that the organization of metabolic networks reflects, i.e. encodes, the phylogeny of the corresponding species. Evolution leaves its footprint not only on gene and protein sequences but also on the fine wiring of functional modules, here metabolic pathways. However, as shown by the few discrepancies observed between the reference phylogeny and the phylogeny reconstructed from metabolic networks, not all of the mutations leading to or following speciation modify the structure and complexity of metabolic networks.

Using machine learning approaches we have been able, for the first time, to identify the most important features of pathway organization that best encode the phylogeny of species: scale-freeness, small-worldness, high average clustering coefficient and the presence of a core of densely overlapping pathways. Our results suggest that the efficient functioning of the living cells depends very strongly on fine details of the cross-talk among functional modules, which might be considered as an organizational principle of complex networks. While most approaches to identify functional modules in metabolic networks are based on the hypothesis that metabolic reactions are significantly denser within modules than across modules (Guimera and Nunes Amaral, 2005; Holme et al., 2003; Kreimer et al., 2008), our results suggest that connections between modules are very dense themselves, and of subtle complexity.

By compressing the information contained in metabolic networks up to 11-fold, NIPs represent a higher hierarchical level of the metabolic system that appears to encode essential evolutionary information and permits highly accurate quantitative predictions. Among the possible applications of the NIP representation, we are evaluating its use as a standard to assess network modularization approaches and to explore the major differences in the organization of metabolic networks between major taxonomic groups.

Funding

National Institutes of Health (grants R01AI050196, R01AI055347 and U54AI057168 to G.A.B., PI); European Union (grant LSHG-CT-2006-037469 to B.S., PI).

Conflict of Interest: none declared.

Supplementary Material

[Supplementary Data]

ACKNOWLEDGEMENTS

We are grateful to Dr C. Turbeville, Dr J. Alves and Dr M. Rivera (VCU, Richmond) for useful discussions concerning phylogeny and phylogenetic tree reconstruction. We also thank other members of the Buck laboratory and the Center for the Study of Biological Complexity for useful support and discussions. Finally, we thank the anonymous referees for their constructive comments, which contributed to the better presentation of our study.

REFERENCES

  • Aittokallio T, Schwikowski B. Graph-based methods for analysing networks in cell biology. Brief. Bioinform. 2006;7:243–255.
  • Barabasi A, Albert R. Emergence of scaling in random networks. Science. 1999;286:509–512.
  • Benson DA, et al. GenBank. Nucleic Acids Res. 2006;34(Database issue):D16–D20.
  • Bonchev D, Buck G. Quantitative measures of network complexity. In: Bonchev D, Rouvray D, editors. Complexity in Chemistry, Biology, and Ecology. New York: Springer; 2005. pp. 191–235.
  • Chenna R, et al. Multiple sequence alignment with the clustal series of programs. Nucleic Acids Res. 2003;31:3497–3500.
  • Clemente JC, et al. Phylogenetic reconstruction from non-genomic data. Bioinformatics. 2007;23:e110–e115.
  • Felsenstein J. PHYLIP - phylogeny inference package (version 3.2). Cladistics. 1989;5:164–166.
  • Fisher R, Yates F. Statistical Tables for Biological, Agricultural and Medical Research. 3rd edn. London: Oliver and Boyd; 1938. pp. 26–27.
  • Forst CV, Schulten K. Phylogenetic analysis of metabolic pathways. J. Mol. Evol. 2001;52:471–489.
  • Forst CV, et al. Algebraic comparison of metabolic networks, phylogenetic inference, and metabolic innovation. BMC Bioinformatics. 2006;7:67.
  • Guimera R, Nunes Amaral LA. Functional cartography of complex metabolic networks. Nature. 2005;433:895–900.
  • Guyon I, Elisseeff A. An introduction to variable and feature selection. J. Mach. Learn. Res. 2003;3:1157–1182.
  • Hall M, Holmes G. Benchmarking attribute selection techniques for discrete class data mining. IEEE Trans. Knowl. Data Eng. 2003;15:1437–1447.
  • Harary F. Graph Theory. Reading, MA: Addison-Wesley; 1969.
  • Hartwell LH, et al. From molecular to modular cell biology. Nature. 1999;402(Suppl. 6761):C47–C52.
  • Heymans M, Singh AK. Deriving phylogenetic trees from the similarity analysis of metabolic pathways. Bioinformatics. 2003;19(Suppl. 1):i138–i146.
  • Holme P, et al. Subnetwork hierarchies of biochemical pathways. Bioinformatics. 2003;19:532–538.
  • Hong SH, et al. Phylogenetic analysis based on genome-scale metabolic pathway reaction content. Appl. Microbiol. Biotechnol. 2004;65:203–210.
  • Huynen MA, Bork P. Measuring genome evolution. Proc. Natl Acad. Sci. USA. 1998;95:5849–5856.
  • Kanehisa M, et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006;34(Database issue):D354–D357.
  • Kreimer A, et al. The evolution of modularity in bacterial metabolic networks. Proc. Natl Acad. Sci. USA. 2008;105:6976–6981.
  • Liao L, et al. Proceedings of the 6th International Conference on Knowledge-Based Intelligent Information and Engineering Systems. Crema, Italy: 2002. Genome comparisons based on profiles of metabolic pathways.
  • Liu W, et al. A network perspective on the topological importance of enzymes and their phylogenetic conservation. BMC Bioinformatics. 2007;8:121.
  • Ma H.-W, Zeng A.-P. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms. Bioinformatics. 2003;19:270–277.
  • Ma H.-W, Zeng A.-P. Phylogenetic comparison of metabolic capacities of organisms at genome level. Mol. Phylogenet. Evol. 2004;31:204–213.
  • Oh S, et al. Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks. BMC Bioinformatics. 2006;7:284.
  • Olsen GJ, et al. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 1994;176:1–6.
  • Papin JA, et al. Hierarchical thinking in network biology: the unbiased modularization of biochemical networks. Trends Biochem. Sci. 2004;29:641–647.
  • Paradis E, et al. APE: analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289–290.
  • Penny D, Hendy M. The use of tree comparison metrics. Syst. Zool. 1985;34:75–82.
  • Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998;14:817–818.
  • Ravasz E, et al. Hierarchical organization of modularity in metabolic networks. Science. 2002;297:1551–1555.
  • Shasha D, et al. ICDE '04: Proceedings of the 20th International Conference on Data Engineering. Washington, DC, USA: IEEE Computer Society; 2004. Unordered tree mining with applications to phylogeny; p. 708.
  • Spirin V, et al. A metabolic network in the evolutionary context: multiscale structure and modularity. Proc. Natl Acad. Sci. USA. 2006;103:8774–8779.
  • Swofford D. PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods), version 4.0b10. Sunderland, MA: Sinauer Associates Inc.; 2003.
  • Watts DJ, Strogatz SH. Collective dynamics of ‘small-world’ networks. Nature. 1998;393:440–442.
  • Weaver W, Shannon C. The Mathematical Theory of Communication. Urbana, Illinois: University of Illinois Press; 1949.
  • Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann; 1999.
  • Wuyts J, et al. The European ribosomal RNA database. Nucleic Acids Res. 2004;32(Database issue):D101–D103.
  • Zhang Y, et al. Phylophenetic properties of metabolic pathway topologies as revealed by global analysis. BMC Bioinformatics. 2006;7:252.

Articles from Bioinformatics are provided here courtesy of Oxford University Press