Eukaryot Cell. Feb 2006; 5(2): 272–276.
PMCID: PMC1405893

Retention and Loss of Amino Acid Biosynthetic Pathways Based on Analysis of Whole-Genome Sequences


Plants and fungi can synthesize each of the 20 amino acids by using biosynthetic pathways inherited from their bacterial ancestors. However, the ability to synthesize nine amino acids (Phe, Trp, Ile, Leu, Val, Lys, His, Thr, and Met) was lost in a wide variety of eukaryotes that evolved the ability to feed on other organisms. Since the biosynthetic pathways and their respective enzymes are well characterized, orthologs can be recognized in whole genomes to understand when in evolution pathways were lost. The pattern of pathway loss and retention was analyzed in the complete genomes of three early-diverging protist parasites, the amoeba Dictyostelium, and six animals. The nine pathways were lost independently in animals, Dictyostelium, Leishmania, Plasmodium, and Cryptosporidium. Seven additional pathways appear to have been lost in one or another parasite, demonstrating that they are dispensable in a nutrition-rich environment. Our predictions of pathways retained and pathways lost based on computational analyses of whole genomes are validated by minimal-medium studies with mammals, fish, worms, and Dictyostelium. The apparent selective advantages of retaining biosynthetic capabilities for amino acids available in the diet are considered.

Before the genomic era, minimal-medium studies offered essential information about the metabolic potential of an organism. Comparative genomics can now discover the same information about an organism even when the life cycle or environment is so complex as to preclude the defining of a minimal medium. The amino acid requirements of a variety of organisms were the subject of considerable interest in the last century. The first successful synthetic diet using purified amino acids was reported in 1935 for rats from the laboratory of W. C. Rose (13). Subsequent work showed that rats, mice, or salmon fed on diets lacking any one of nine amino acids (Phe, Trp, Ile, Leu, Val, Lys, His, Thr, or Met) would waste away and die (7, 17). These are known as the essential amino acids. The other 11 amino acids found in proteins could be omitted from the diet with no deleterious effects and so were considered nonessential. Yeasts such as Saccharomyces cerevisiae, as well as plants such as Arabidopsis thaliana, are able to grow in media devoid of amino acids, demonstrating that they can synthesize all of the amino acids from sugars and fats in the media or generated photosynthetically. Clearly, the common progenitor of plants, fungi, and animals carried genes for all of the enzymes in the 20 amino acid biosynthetic pathways, but almost half of them have been lost in consumers. Now that the sequences of a considerable number of eukaryotic genomes have been completed, we can inspect them to determine when in evolution the pertinent genes were lost.

Since the biosynthetic pathways and their respective enzymes are well characterized in mammals and fungi, orthologs can be recognized in whole genomes. When key enzymes in a pathway are missing, it can be concluded that the respective amino acid is not synthesized. We analyzed the genomes of two alveolates, Cryptosporidium hominis and Plasmodium falciparum; one euglenozoid, Leishmania major; and six animals, Homo sapiens, Tetraodon nigroviridis, Ciona intestinalis, Drosophila melanogaster, Anopheles gambiae, and Caenorhabditis elegans. Previously, we used this approach to predict the metabolic capabilities of a free-living soil amoeba, Dictyostelium discoideum, for which a defined medium had been developed (6). We confirmed that the pathways leading to the five amino acids not added to the medium were intact but also found genes for the pathways leading to four other amino acids that were included in the medium (16). We verified that Dictyostelium cells could synthesize these four amino acids by successfully growing cells in media lacking all of them. Omission of any of the 11 remaining amino acids in the new minimal medium precluded growth of the cells, confirming the bioinformatic analyses which showed that genes for the pathways to these 11 amino acids were missing in the genome (16). This approach seems sufficiently robust to apply to other organisms with fully sequenced genomes so as to follow the pattern of gene retention and loss during evolution.



Complete genome sequences for humans (H. sapiens), the fly D. melanogaster, the mosquito A. gambiae, the worm C. elegans, the yeast S. cerevisiae, and the plant A. thaliana were downloaded from the National Center for Biotechnology Information. The genome for the fish T. nigroviridis was downloaded from Genoscope (genoscope.cns.fr), and that for the chordate C. intestinalis was downloaded from the Department of Energy Joint Genome Institute. The genomes for D. discoideum, P. falciparum, L. major, and C. hominis were recovered from their respective websites: dictybase.org, plasmodb.org, sanger.ac.uk/Projects/L_major/, and hominis.mic.vcu.edu.


The amino acid sequences of enzymes in the amino acid biosynthetic pathways of S. cerevisiae and H. sapiens were downloaded from the KEGG website (www.kegg.com; references 11 and 12). Pfam domains in these enzymes were used to collect potential orthologs from other organisms (http://hmmer.wustl.edu/; reference 2). The yeast and human enzymes were also compared to gene products in the other complete genomes by using the BlastP program. Genes with the pertinent Pfam domain(s) and a BLAST score of e-80 or better were considered functional orthologs. Genes below this BLAST threshold that still had the pertinent Pfam domains were checked by mutual best BLAST hit. A BlastP cutoff of e-20 was used for the early-diverging eukaryotes. Genes were considered missing if there were no hits at better than e-1. The smallest of the enzymes used to BLAST was 221 amino acids and so should be easily recognized. In addition to the yeast and human enzymes, bacterial enzymes from KEGG were also used to query the genomes of the early-diverging eukaryotes. No additional putative amino acid biosynthetic genes were found.
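The decision rules in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration of the thresholds stated above, not the scripts actually used in the analysis: the `Hit` record, its field names, and the "ambiguous" label are hypothetical, and the E values in the usage lines are made-up inputs.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text; everything else in this sketch is illustrative.
STRICT_EVALUE = 1e-80     # cutoff applied to most genomes
DIVERGENT_EVALUE = 1e-20  # relaxed cutoff for the early-diverging eukaryotes
MISSING_EVALUE = 1e-1     # no hit better than this: gene considered missing


@dataclass
class Hit:
    """Best BlastP hit of one query enzyme in one target genome (hypothetical record)."""
    evalue: Optional[float]   # None when BLAST returned no hit at all
    has_pfam_domains: bool    # target carries the query's Pfam domain(s)
    reciprocal_best: bool     # mutual best BLAST hit with the query


def classify_hit(hit: Hit, early_diverging: bool = False) -> str:
    """Return 'ortholog', 'missing', or 'ambiguous' for one enzyme in one genome."""
    if hit.evalue is None or hit.evalue >= MISSING_EVALUE:
        return "missing"
    cutoff = DIVERGENT_EVALUE if early_diverging else STRICT_EVALUE
    if hit.has_pfam_domains and hit.evalue <= cutoff:
        return "ortholog"
    # Below the BLAST threshold but with the right domains: fall back on
    # the mutual-best-hit check described in the text.
    if hit.has_pfam_domains and hit.reciprocal_best:
        return "ortholog"
    return "ambiguous"  # left for manual inspection in this sketch


# Hypothetical inputs
print(classify_hit(Hit(1e-95, True, False)))                        # ortholog
print(classify_hit(Hit(1e-30, True, False)))                        # ambiguous
print(classify_hit(Hit(1e-30, True, False), early_diverging=True))  # ortholog
print(classify_hit(Hit(None, False, False)))                        # missing
```

Keeping the three cutoffs as named constants makes it easy to see how a call changes when the relaxed threshold for early-diverging eukaryotes is applied.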


After orthologous genes were collected, the biosynthetic pathway to each amino acid in each organism was analyzed. If every enzyme in the pathway had an ortholog in a genome, the pathway was considered functional in that organism. If one or more enzymes in a pathway were missing in a genome, the pathway was considered nonfunctional in that organism. When a biosynthetic pathway appeared to be nonfunctional in an organism that was phylogenetically close to organisms with an intact pathway, we manually inspected the genome.
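As a minimal sketch of this all-or-nothing rule, the function below marks a pathway functional only when every enzyme has an ortholog. The function name, the abbreviated enzyme labels, and the extra `partial_loss` flag (which singles out pathways missing only some enzymes, the cases most worth a manual look) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def call_pathways(pathways: Dict[str, List[str]],
                  orthologs_found: Set[str]) -> Dict[str, dict]:
    """Call each pathway functional only if every one of its enzymes has an ortholog."""
    calls = {}
    for amino_acid, enzymes in pathways.items():
        missing = [e for e in enzymes if e not in orthologs_found]
        calls[amino_acid] = {
            "functional": not missing,
            "missing": missing,
            # Some, but not all, enzymes absent: candidate for manual inspection.
            "partial_loss": 0 < len(missing) < len(enzymes),
        }
    return calls


# Illustrative input: a three-step serine pathway with hypothetical short labels
# (PGDH, PSAT, PSP) and a toy two-step pathway "X" with placeholder enzymes.
pathways = {"Ser": ["PGDH", "PSAT", "PSP"], "X": ["enzA", "enzB"]}
found = {"PGDH", "PSAT"}  # orthologs recovered in a hypothetical genome

for amino_acid, call in call_pathways(pathways, found).items():
    print(amino_acid, call)
```

With these inputs, "Ser" is called nonfunctional with a single missing enzyme and `partial_loss` set, while "X" is nonfunctional with every enzyme missing.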


By searching the genomes of the 10 eukaryotic organisms for protein sequences and Pfam domain structures, we were able to find probable functional orthologs to amino acid metabolic enzymes. These orthologs were organized into the biosynthesis pathways, which were then predicted to be either functional or nonfunctional. Almost always, nonfunctional pathways lacked pertinent genes even when the BLAST threshold was set at e-1. When a biosynthetic gene was present, the BLAST hit score was always better than e-60 (the great majority better than e-80). This resulted in a very clear distinction between present and missing genes. There were a few cases where genes with intermediate BLAST scores were found but were clearly not functional orthologs. For example, yeast isopropylmalate dehydrogenase, which is encoded by a leucine biosynthesis gene, recognized a human gene (IDH3A) with a BLAST score of e-17. This gene, however, encodes isocitrate dehydrogenase (National Center for Biotechnology Information gi,18314368).

Multistep biosynthetic pathways in which an essential gene is missing were almost always found to have lost all of the genes dedicated to that pathway (Table 1). In only two cases was a single gene missing from the pathway. The Dictyostelium serine pathway lacks a functional phosphoserine phosphatase. As expected, Dictyostelium requires serine for growth (16). The Leishmania methionine pathway lacks cystathionine gamma-synthase. Although it is possible that a gene unrelated to the traditional cystathionine gamma-synthase could have acquired this function and remained unrecognized, such cases are rare. Alternative pathways not including the missing genes would have been considered in Dictyostelium, worms, fish, or mammals when their minimal media were defined. However, all of the amino acids expected to be required were directly observed to be essential.

TABLE 1. Loss of pathway-dedicated enzymes

The parasites Leishmania, Cryptosporidium, and Plasmodium grow within mammalian cells, an environment rich in amino acids. They diverged before the separation of plants and animals and must have inherited genes for the biosynthesis of all amino acids (Fig. 1). However, they subsequently lost the ability to make most amino acids (Tables 2 and 3). Cryptosporidium, whose metabolic capabilities are sparse (23), has retained the biosynthetic pathways for only four amino acids: asparagine, glutamine, glycine, and proline. Plasmodium has retained the ability to synthesize these four plus aspartate and glutamate. The euglenozoid Leishmania diverged separately from Cryptosporidium and Plasmodium. Although they are all intracellular parasites, Leishmania has a more expansive set of amino acid biosynthetic pathways: alanine, asparagine, aspartate, cysteine, glutamate, glutamine, glycine, proline, and tyrosine.

FIG. 1.
Evolutionary tree depicting the branching order of the organisms studied (adapted from reference 15). The tree is rooted on seven archaebacterial genomes. Abbreviations: Ch, C. hominis; Pf, P. falciparum; At, A. thaliana; Dd, D. discoideum; Sc, S. cerevisiae ...
TABLE 2. Pathways for nonessential amino acids
TABLE 3. Pathways for the essential amino acids

Dictyostelium, a phagocytic amoeba, possesses amino acid biosynthetic capabilities very similar to those of metazoans (Tables 2 and 3). In addition to losing all of the genes in pathways for the human-essential amino acids, it has lost genes required for serine and arginine biosynthesis. These bioinformatic predictions have been experimentally verified by testing the ability of Dictyostelium to grow in media lacking these amino acids (16). Since the yeast S. cerevisiae diverged from the line leading to vertebrates after Dictyostelium and has retained the ability to synthesize the 20 amino acids, loss of these pathways in the amoebae must have occurred independently of the subsequent loss in the animal branch (1).

All of the enzymes for the biosynthesis of the 11 amino acids known to be nonessential for rodents can be easily recognized in the human and fish genomes (Table 2). Ten of the 11 pathways are complete in the genomes of all metazoans, but enzymes for the synthesis of arginine are missing in C. intestinalis, D. melanogaster, A. gambiae, and C. elegans. Arginine has previously been shown to be an essential amino acid for C. elegans (19). The loss of the arginine biosynthetic enzymes appears to have occurred independently in these organisms, after each diverged from the line leading to vertebrates, which can still make arginine. All animals lack the biosynthetic pathways for isoleucine, leucine, valine, phenylalanine, tryptophan, lysine, methionine, threonine, and histidine (Table 3). For genes dedicated to these pathways, no putative homolog could be found in any animal genome.

Only four pathways are universally conserved in the 12 eukaryotes examined: those leading to asparagine, glutamine, glycine, and proline. The enzymes for glycine and glutamine synthesis, serine hydroxymethyltransferase and glutamine synthetase, are highly conserved in all organisms. Although the ability to synthesize asparagine and proline is universally conserved, analysis of the enzymes in the pathways suggests that the means for their synthesis is not. Asparagine synthase (glutamine hydrolyzing) amidates aspartate by using glutamine as the amide donor and is present in animals, Dictyostelium, and Plasmodium. Alternatively, aspartate can be amidated with free ammonia by the aspartate ammonia-ligase present in Cryptosporidium and Leishmania. Likewise, proline is synthesized in Dictyostelium and Plasmodium from arginine via ornithine and glutamate semialdehyde, while animals, Leishmania, and Cryptosporidium encode enzymes to convert glutamate to proline via phosphoglutamate and glutamate semialdehyde. The final step in both pathways to proline is catalyzed by pyrroline-5-carboxylate reductase.
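The universally conserved set can be recovered as a simple set intersection over the pathways retained by each lineage. The sketch below uses only the retained-pathway lists given in the text for the three parasites and the ten pathways shared by all animals; yeast and Arabidopsis retain all 20 pathways and so do not change the result, and Dictyostelium is left out only because its retained set is not listed explicitly. The function and variable names are hypothetical.

```python
from typing import Dict, Set

# Retained pathways as listed in the text (three-letter amino acid codes).
RETAINED: Dict[str, Set[str]] = {
    "Cryptosporidium": {"Asn", "Gln", "Gly", "Pro"},
    "Plasmodium": {"Asn", "Gln", "Gly", "Pro", "Asp", "Glu"},
    "Leishmania": {"Ala", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "Pro", "Tyr"},
    "animals (shared)": {"Ala", "Asp", "Asn", "Gly", "Ser", "Cys", "Tyr",
                         "Pro", "Glu", "Gln"},
}


def universally_conserved(retained: Dict[str, Set[str]]) -> Set[str]:
    """Pathways present in every lineage: the intersection of all retained sets."""
    return set.intersection(*retained.values())


print(sorted(universally_conserved(RETAINED)))  # ['Asn', 'Gln', 'Gly', 'Pro']
```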


When an organism becomes a consumer by eating other organisms, all of the amino acids are available in the diet and no longer need to be synthesized. Unless amino acid biosynthetic pathways serve other essential functions besides providing an amino acid, they are unnecessary and dispensable. Genes in the dispensable pathways accumulate deleterious mutations, lose the ability to encode functional enzymes, and are eventually deleted from the genome. Deletion was the fate for almost all of the genes specific to the pathways that were lost (Table 1). It is important to note that this common set of pathways was lost in at least four independent evolutionary instances (Fig. 1). This process of complete purging of genes in a pathway is not without precedent (9). Our predictions of pathways retained and pathways lost based on computational analyses of whole genomes are validated by the observed minimal requirements of mammals, fish, worms, and Dictyostelium (7, 16, 19). Therefore, our predictions for insects, Ciona, and the parasites are likely to be substantiated when minimal media are defined for these organisms.

Selective conservation of a pathway over time indicates that it is indispensable for the metabolic needs of the organism. An example of this is the arginine synthesis pathway, which is part of the urea cycle used in mammals and embryonic fish to remove excess nitrogen (20, 22). Dictyostelium and invertebrates utilize alternate nitrogen excretion metabolites (4, 16, 21). Therefore, the urea cycle and arginine synthesis are not essential and were lost in these organisms.

Only four pathways (Asn, Gln, Gly, and Pro) are universally conserved. The importance of these pathways is emphasized by their retention in Cryptosporidium. Its tiny genome lacks genes for the tricarboxylic acid cycle and the biosynthetic pathways to sugars and nucleotides (23), yet it has retained these four amino acid biosynthetic pathways. The synthesis of glycine also produces 5,10-methylenetetrahydrofolate. This is the major source of one-carbon units used in dTMP and purine ring biosynthesis, making this pathway indispensable. Glutamine synthesis is essential for nitrogen assimilation, detoxification, and general nitrogen metabolism (e.g., transamination), and the same may be true for asparagine. The utility of the pathway to proline is not immediately obvious. However, yeast strains in which the gene encoding pyrroline-5-carboxylate reductase is deleted have been found to grow slowly even in rich media with a plentiful supply of proline (Barbara Dunn, personal communication).

Dictyostelium feeds on bacteria and yeast. It has lost the pathways to all of the amino acids essential for humans. Surprisingly, it has retained all of the biosynthetic pathways found in metazoans, except that for serine. Loss of the ability to synthesize serine appears to result from the fairly recent inactivation of phosphoserine phosphatase, since a pseudogene can be recognized in the genome. Our bioinformatically predicted pathway loss and retention are validated by minimal-medium studies (16).

Ten pathways (Ala, Asp, Asn, Gly, Ser, Cys, Tyr, Pro, Glu, and Gln) were uniformly conserved in the animal lineage. Discerning the selective advantage they might provide is aided in some cases by symptoms of human metabolic disorders. The tyrosine synthesis pathway is also part of the phenylalanine catabolic pathway. Mutations in phenylalanine hydroxylase, which makes tyrosine, are the cause of phenylketonuria. The resulting buildup of phenylalanine and relative depletion of tyrosine cause clinical symptoms of seizures and mental retardation (10). Likewise, loss of cystathionine beta-synthase, a component of the cysteine pathway, causes a buildup of homocysteine, which contributes to ocular, skeletal, nervous system, and vascular problems (18). Defects in enzymes of the serine biosynthesis pathway lead to congenital microcephaly, severe psychomotor retardation, and intractable seizures (5). When any of these three pathways fails, the resulting metabolic imbalance has severe consequences.

The importance of the glycine, asparagine, and glutamine pathways has been discussed above. The alanine pathway, like that for proline, does not have an obvious reason for conservation. In the same yeast experiment, knocking out alanine transaminase was detrimental to cells, even in rich media with a plentiful supply of alanine (Barbara Dunn, personal communication). The importance of aspartate and glutamate is likely to result from their role in nitrogen handling. It is likely that free-living organisms require a more responsive nitrogen-handling capability than parasites and therefore require the ability to synthesize glutamate and aspartate, so as not to rely only on the simple transaminase reactions involving glutamine and asparagine.

The metabolic capabilities of parasites living within other cells offer unique insights into the loss and retention of pathways. Parasites lack several pathways that are conserved in all of the free-living organisms we studied. The ability to thrive without these pathways may be confined to the obligate intracellular parasites that rely on their host for additional metabolic functions. For instance, the lack of phenylalanine hydroxylase in Cryptosporidium may not result in a phenylalanine-tyrosine imbalance if these amino acids are rapidly exchanged with the host where excess phenylalanine can be metabolized. Thus, by identifying conspicuous voids in metabolic capabilities, we can learn what critical functions the host provides for its parasite with the potential of intervention.


This work was supported by an NSF Biocomplexity Grant and NIH grant GM62350 to W.F.L. S.H.P. is supported by an NIH training grant (GM08806).

We thank Rolf Olsen and Christophe Anjard for useful discussions. Barbara Dunn, Stanford University, Stanford, CA, kindly shared unpublished results.

We have no financial conflict of interest.


1. Bapteste, E., H. Brinkmann, J. A. Lee, D. V. Moore, C. W. Sensen, P. Gordon, L. Durufle, T. Gaasterland, P. Lopez, M. Muller, and H. Philippe. 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99:1414-1419.
2. Bateman, A., L. Coin, R. Durbin, R. D. Finn, V. Hollich, S. Griffiths-Jones, A. Khanna, M. Marshall, S. Moxon, E. L. Sonnhammer, D. J. Studholme, C. Yeats, and S. R. Eddy. 2004. The Pfam protein families database. Nucleic Acids Res. 32:D138-D141.
3. Reference deleted.
4. Craig, R. 1960. The physiology of excretion in the insect. Annu. Rev. Entomol. 5:53-68.
5. de Koning, T. J., and L. W. Klomp. 2004. Serine-deficiency syndromes. Curr. Opin. Neurol. 17:197-204.
6. Franke, J., and R. Kessin. 1977. A defined minimal medium for axenic strains of Dictyostelium discoideum. Proc. Natl. Acad. Sci. USA 74:2157-2161.
7. Greenstein, J. P., and M. Winitz. 1961. Chemistry of the amino acids. John Wiley & Sons, Inc., New York, N.Y.
8. Reference deleted.
9. Hittinger, C. T., A. Rokas, and S. B. Carroll. 2004. Parallel inactivation of multiple GAL pathway genes and ecological diversification in yeasts. Proc. Natl. Acad. Sci. USA 101:14144-14149.
10. Kahler, S. G., and M. C. Fahey. 2003. Metabolic disorders and mental retardation. Am. J. Med. Genet. C Semin. Med. Genet. 117:31-41.
11. Kanehisa, M. 1997. A database for post-genome analysis. Trends Genet. 13:375-376.
12. Kanehisa, M., and S. Goto. 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28:27-30.
13. McCoy, R. H., C. D. Meyer, and W. C. Rose. 1935. Feeding experiments with highly purified amino acids. VIII. Isolation and identification of a new essential amino acid. J. Biol. Chem. 112:283.
14. Reference deleted.
15. Olsen, R. M. 2005. How many protein encoding genes does Dictyostelium discoideum have? p. 265-278. In W. F. Loomis and A. Kuspa (ed.), Dictyostelium genomics. Horizon Press, Far Hills, N.J.
16. Payne, S. H. 2005. Metabolic pathways, p. 41-57. In W. F. Loomis and A. Kuspa (ed.), Dictyostelium genomics. Horizon Press, Far Hills, N.J.
17. Rose, W. C., M. J. Oesterling, and M. Womack. 1948. Comparative growth on diets containing ten and nineteen amino acids, with further observations upon the role of glutamic and aspartic acids. J. Biol. Chem. 176:753-762.
18. Townsend, D. M., K. D. Tew, and H. Tapiero. 2004. Sulfur containing amino acids and human disease. Biomed. Pharmacother. 58:47-55.
19. Vanfleteren, J. R. 1980. Nematodes as nutritional models, p. 47-79. In B. M. Zuckerman (ed.), Nematodes as biological models, vol. 2. Academic Press, Inc., New York, N.Y.
20. Walsh, P. J. 1998. Nitrogen excretion and metabolism, p. 199-214. In D. H. Evans (ed.), The physiology of fishes, 2nd ed. CRC Press, Inc., New York, N.Y.
21. Wright, D. J. 1998. Respiratory physiology, nitrogen excretion and osmotic and ionic regulation, p. 103-131. In R. N. Perry and D. J. Wright (ed.), The physiology and biochemistry of free-living and plant-parasitic nematodes. CABI Publishing, New York, N.Y.
22. Wright, P., A. Felskie, and P. Anderson. 1995. Induction of ornithine-urea cycle enzymes and nitrogen metabolism and excretion in rainbow trout (Oncorhynchus mykiss) during early life stages. J. Exp. Biol. 198:127-135.
23. Xu, P., G. Widmer, Y. Wang, L. S. Ozaki, J. M. Alves, M. G. Serrano, D. Puiu, P. Manque, D. Akiyoshi, A. J. Mackey, W. R. Pearson, P. H. Dear, A. T. Bankier, D. L. Peterson, M. S. Abrahamsen, V. Kapur, S. Tzipori, and G. A. Buck. 2004. The genome of Cryptosporidium hominis. Nature 431:1107-1112.

Articles from Eukaryotic Cell are provided here courtesy of American Society for Microbiology (ASM)