Copyright Hsiao and Vitkup. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Role of Duplicate Genes in Robustness against Deleterious Human Mutations

1 Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
2 Department of Biomedical Informatics, Columbia University, New York, New York, United States of America

Wayne N. Frankel, Academic Editor, The Jackson Laboratory, United States of America
* E-mail: dv2121/at/columbia.edu

Conceived and designed the experiments: DV. Performed the experiments: TLH DV. Analyzed the data: TLH DV. Wrote the paper: TLH DV.

Received June 14, 2007; Accepted January 30, 2008.

Abstract
It is now widely recognized that robustness is an inherent property of biological systems [1],[2],[3]. The contribution of close sequence homologs to genetic robustness against null mutations has been previously demonstrated in simple organisms [4],[5]. In this paper we investigate in detail the contribution of gene duplicates to backup against deleterious human mutations. Our analysis demonstrates that functional compensation by close homologs may play an important role in human genetic disease. Genes with a homolog of 90% sequence identity are about three times less likely to harbor known disease mutations than genes with only remote homologs. Moreover, close duplicates affect the phenotypic consequences of deleterious mutations by making a decrease in life expectancy significantly less likely. We also demonstrate that similarity of expression profiles across tissues significantly increases the likelihood of functional compensation by homologs.

Author Summary
Genetic robustness is the ability of an organism to buffer deleterious genetic mutations. 
It has been previously demonstrated that functional compensation by duplicates plays an important role in protection against gene deletions in model organisms. Close duplicates often share similar functions, and the loss of one paralog may be buffered by others. In the present work we specifically investigate the contribution of gene duplicates to backup against deleterious human mutations. We find that genes with close homologs are significantly less likely to harbor known disease mutations than genes with remote homologs. In addition, close duplicates affect the phenotypic consequences of deleterious mutations by making a decrease in life expectancy less likely. Similarity of expression profiles across tissues increases the likelihood of functional compensation by homologs. Taken together, our analysis demonstrates that functional compensation by close duplicates plays an important role in human genetic disease.

Introduction
The ability of an organism to survive in various environmental conditions indicates robustness to external perturbations. Relative insensitivity to harmful genetic mutations, on the other hand, represents genetic robustness. Several large-scale gene deletion studies have demonstrated that organisms exhibit a significant degree of genetic robustness against null mutations [6]. Although these studies carry an important caveat, namely that genes without a detectable phenotype may be essential under different growth conditions [7],[8], it is clear that genetic robustness is widespread in biological systems [3].

Two distinct mechanisms of genetic robustness have been extensively discussed. Alternative signaling or parallel metabolic pathways illustrate network contributions to genetic robustness [9]. In contrast, partial functional overlap between sequence paralogs represents the contribution of gene duplicates. The study by Gu et al. [4] demonstrated a significant contribution to functional compensation by duplicate yeast genes. 
A similar pattern of functional compensation was also observed in C. elegans [5]. The mechanism of genetic robustness by duplicates was recently investigated by Kafri et al. [10], who showed that null deletions in yeast are often compensated by over-expression of sequence homologs. The role and magnitude of the paralog contribution to robustness against deleterious human mutations are not currently well understood. While the study by Lopez-Bigas and Ouzounis [11] suggested a contribution by highly conserved paralogs, Yue and Moult [12] recently showed that disease genes and all genes have an equal fraction of paralogs. In the present work, we demonstrate the importance of considering the sequence similarity between paralogs for understanding the likelihood and magnitude of functional compensation. We also explore the effects of mRNA co-expression between duplicates on the observed functional backup. Understanding the mechanisms of genetic robustness will be important for the identification and prioritization of medically important human mutations.

Results/Discussion

Disease and all gene sets
We investigated functional compensation by duplicates using three curated collections of human disease genes. Although the total number of disease genes is currently unknown, more than a thousand genes with known mutations affecting human health have been identified [13]. First, we used the collection of 1003 Swiss-Prot [14] human genes with non-synonymous disease mutations annotated in the OMIM database [13]. Second, we investigated the collection of 1609 human genes from the OMIM Morbid Map annotated as involved in disease, but not as susceptibility or non-disease. Our third disease gene set, obtained from the study by Jimenez-Sanchez et al. [15], included a curated collection of 881 human genes and the associated disease phenotypes, such as the age of onset and the reduction in life expectancy. The considered disease gene sets significantly overlap, i.e.
636 genes are present in all three sets (see Figure S1, Supporting Information). Because no collection of human genes known with certainty to be non-disease genes is available, we used several large collections of all human genes (all-gene sets). We primarily used the comprehensive collection of 20,262 human genes from Ensembl build 35 [16]. As a representative set of well-characterized human genes, we also considered the collection of 12,211 human genes from the Swiss-Prot database [14].

The effects of duplicate sequence homology
To understand the role of gene duplicates in robustness against deleterious human mutations, we searched for homologs of the disease genes and of all human genes using protein BLASTP [17] (see Methods). Briefly, for each query sequence, its closest human paralog was identified as the non-self hit that can be aligned over more than 80% of the length of both sequences. Hits with an E-value larger than 0.001 were not considered (results are qualitatively insensitive to the gene set used or to the cutoffs and parameters applied in the similarity searches; see Tables S1–S3, Supporting Information). For the human genes with identified paralogs (475 in the disease gene set and 8257 in the all-gene set), the distributions of amino acid sequence identities of the closest homologs differ significantly between the disease and all-gene sets (see Figure S2, Supporting Information). The average identity of the closest homolog is 52.9% for disease genes and 58.3% for all genes (non-parametric Wilcoxon test, P = 1.6×10⁻⁷). The observed difference cannot be explained simply by the existence of several large protein families with a small number of known disease genes; after removing sequences with more than one paralog in the human genome, the average identity of the closest homolog is 50.0% for disease genes and 54.3% for all genes (P = 2×10⁻²). 
Nor can the difference arise from difficulties in disambiguating allelic variants from close sequence differences in copy-number-variable genes [18],[19]. After excluding genes with highly similar paralogs (sequence identity greater than 90%), the average identity of the closest paralog is 51.4% for disease genes and 54.4% for all genes (P = 7×10⁻⁴).

In Figure 1 […] = −0.025, P = 0.6) or gene density of disease mutations (rS = −0.036, P = 0.4) and the sequence identity of the closest homolog. This suggests that the number of disease mutations identified in a gene may be determined primarily by experimental, mutational, or gene history biases [20], rather than by the possibility of functional compensation. Similarly, no correlation between deleterious variability and evolutionary distance to murine orthologs was observed in the study by Sunyaev et al. [21].
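The homology filter described above, and the idea of estimating how often genes harbor disease mutations as a function of the identity of their closest paralog, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record is a hypothetical, simplified stand-in for BLASTP tabular output, the "more than 80% of both sequences" rule is approximated by comparing alignment length with each sequence length, and the 10-point identity bins are an arbitrary illustrative choice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical, simplified BLASTP hit record (not a real BLAST output format)."""
    query: str
    subject: str
    identity: float   # percent amino acid identity of the alignment
    aln_len: int      # alignment length in residues
    query_len: int
    subject_len: int
    evalue: float


def closest_paralog_identity(hits, max_evalue=1e-3, min_coverage=0.8):
    """Map each query gene to the identity of its closest paralog.

    Keeps only non-self hits with E-value <= max_evalue whose alignment covers
    more than min_coverage of both the query and the subject sequence.
    """
    best = {}
    for h in hits:
        if h.query == h.subject or h.evalue > max_evalue:
            continue
        if h.aln_len <= min_coverage * h.query_len or h.aln_len <= min_coverage * h.subject_len:
            continue
        if h.identity > best.get(h.query, -1.0):
            best[h.query] = h.identity
    return best


def disease_fraction_by_bin(best_identity, disease_genes, bin_width=10):
    """Fraction of genes in each identity bin (keyed by lower edge) that are disease genes."""
    totals = defaultdict(int)
    diseased = defaultdict(int)
    for gene, ident in best_identity.items():
        # Place 100% identity into the top bin instead of a bin of its own.
        lower = min(int(ident // bin_width) * bin_width, 100 - bin_width)
        totals[lower] += 1
        if gene in disease_genes:
            diseased[lower] += 1
    return {b: diseased[b] / totals[b] for b in sorted(totals)}
```

Genes without any qualifying hit are simply absent from the returned dictionary; in the terminology of the Methods they would be treated as singletons.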
If close sequence homologs provide functional backup against medically damaging mutations, they are likely also to contribute to relaxed constraints against deleterious human polymorphisms. As demonstrated by Lynch and Conery [22], most duplicated genes experience a brief period of relaxed selection after duplication. The functional constraint on human genes can be estimated through the normalized ratio of non-synonymous to synonymous single nucleotide polymorphisms (SNPs) per site (Ka/Ks) [17],[23]. A small Ka/Ks ratio suggests a stronger constraint on a gene, i.e. a smaller fraction of observed non-synonymous polymorphisms. Figure 2
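The per-site normalization behind the SNP-based Ka/Ks ratio can be made concrete with a short sketch. It assumes, as an illustration only, that SNP counts are pooled within a bin of closest-homolog identity and that the non-synonymous site fraction is the simulated value of 0.717 reported in Methods; the function names are hypothetical.

```python
def snp_ka_ks(n_nonsyn, n_syn, nonsyn_site_fraction=0.717):
    """Non-synonymous SNPs per non-synonymous site divided by synonymous SNPs per synonymous site.

    The total number of coding sites cancels out of the ratio, so only the
    fraction of sites that are non-synonymous is needed.
    """
    if n_syn == 0:
        raise ValueError("need at least one synonymous SNP")
    if not 0.0 < nonsyn_site_fraction < 1.0:
        raise ValueError("site fraction must be strictly between 0 and 1")
    per_nonsyn_site = n_nonsyn / nonsyn_site_fraction
    per_syn_site = n_syn / (1.0 - nonsyn_site_fraction)
    return per_nonsyn_site / per_syn_site


def ka_ks_by_bin(snp_counts):
    """snp_counts maps an identity bin to (n_nonsyn, n_syn) pooled over the genes in that bin."""
    return {b: snp_ka_ks(n, s) for b, (n, s) in snp_counts.items()}
```

Under this normalization, equal raw counts of non-synonymous and synonymous SNPs give a ratio of about 0.39 (0.283/0.717), because roughly 72% of coding sites are non-synonymous.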
While there are many examples of homologous iso-enzymes providing functional compensation [7],[25], this mechanism is less established for other functional classes. To understand the significance of duplicate compensation among various functional categories, we applied the approach described in the previous section and compared the sequence identities of the closest paralogs for disease genes and all human genes in the 53 “GO slim” functional classes. Using a false discovery rate of 5%, we found that, in addition to metabolism, the functional category “response to stimulus” showed evidence of statistically significant compensation by duplicates (see Table 1 and Table S4, Supporting Information); the “response to stimulus” category contains cytokines, receptors, protein kinases, and other proteins involved in signal transduction. Consequently, functional compensation by duplicates is not limited to metabolism but is also significant in other important functional classes.
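The 5% false discovery rate across the 53 GO slim categories was controlled with the Benjamini-Hochberg procedure (see Methods). A minimal sketch of the standard step-up rule follows; it is a generic implementation, not the authors' code, and it takes one p-value per category and returns the indices of the categories that remain significant.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k such that the k-th smallest p-value is <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank
    # Reject every hypothesis whose p-value is among the k_max smallest.
    return sorted(order[:k_max])
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking passes; for example, with p-values 0.02, 0.021 and 0.9 both of the first two are rejected even though 0.02 exceeds 0.05/3.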
The observed paucity of close homologs for known disease genes could be a consequence of their faster evolution compared with all human genes. To investigate this possibility, we analyzed Ka and Ka/Ks values calculated using PAML [26] for all 13,055 one-to-one human-mouse orthologous pairs from the Ensembl database [27]. Both Ka and Ka/Ks for known disease genes are significantly lower than for the all-gene set (mean/median Ka: disease 0.0729/0.0833, all 0.0851/0.0971, P = 4×10⁻²; mean/median Ka/Ks: disease 0.119/0.105, all 0.137/0.113, P = 1×10⁻²). These findings agree with the study by Kondrashov et al. [28], who considered 1273 disease genes and 16,580 other human genes. Although the earlier study by Smith and Eyre-Walker [29] reported the opposite pattern (a higher Ka/Ks ratio for disease genes), their results were based on significantly smaller gene sets (387 disease and 2024 non-disease genes). Consequently, it is unlikely that the elevated sequence similarity between paralogs of non-disease genes is related to a slower rate of evolution.

Recently, He and Zhang demonstrated a lower duplicability of “important” yeast genes (essential genes and genes with knockout phenotypes) [30]. To explore the possibility that a lower duplicability of disease genes affects our results, we followed the approach of He and Zhang [30]. Based on the Ensembl database [27], we identified singleton human genes (genes without duplicates in the human genome; see Methods) with mouse, chicken, and zebrafish orthologs. We then examined whether the orthologs of singleton human genes have duplicated in the mouse, chicken, and zebrafish genomes (see Text S1, Supporting Information). The analysis showed that singleton disease genes are as likely to have duplicate orthologs as all human singleton genes (9.2% of 338 disease singletons and 8.5% of 5657 human singletons, χ²-test P = 0.5; see Figure S3, Supporting Information). 
Therefore, human disease genes are as likely to retain duplicates in evolution as all human genes.

Phenotypic consequences of mutations
The sequence identity between duplicates influences the phenotypic consequences of gene deletions in yeast [4]: as the sequence identity decreases, null mutations with weak growth phenotypes become less likely and mutations with strong growth phenotypes become more likely. Inspired by this analysis, we investigated whether duplicates also affect the phenotypic consequences of human disease mutations. For that purpose we used the collection of human disease genes with manually curated phenotypes [15]. While we did not detect a significant correlation between the presence of close duplicates and the age of onset, the population frequency, or the mode of inheritance, we found a significant correlation between the sequence identity to the closest duplicate and the reduction in life expectancy (Spearman's rank correlation rS = −0.21, P = 2×10⁻⁶; χ²-test, P = 2×10⁻⁴; see Figure 3).
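Because reduction in life expectancy is an ordinal variable with only four levels (none, mild, moderate, severe; see Methods), ties are pervasive and Spearman's rank correlation must average tied ranks. The sketch below computes rS as the Pearson correlation of tie-averaged ranks; it does not compute a P-value, and the integer coding in `SEVERITY` is an illustrative assumption rather than the study's encoding.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values assigned the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Assumed ordinal coding of the curated life-expectancy categories.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}
```

For example, toy closest-homolog identities of 95, 80, 60 and 40 paired with severities none, none, moderate and severe (made-up values, not data from the study) give rS of about −0.95: the negative sign means that lower identity to the closest duplicate goes with a larger reduction in life expectancy.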
Several known examples illustrate this result. Mutations in the red-sensitive opsin gene cause partial color blindness (MIM #303900), yet life expectancy is not seriously affected, owing to the presence of the green-sensitive opsin gene, a close homolog of the red-sensitive gene. Another example involves the homologous iso-enzymes of human glycogen phosphorylase; the three iso-enzymes are primarily active in muscle, liver, and brain. Although defects in the muscle and liver forms cause glycogen storage disease V (MIM #232600) and VI (MIM #232700), respectively, neither defect reduces life expectancy.

The effect of expression profile similarities
Because gene duplicates often have different patterns of expression [25],[31],[32], functional compensation is likely to depend not only on sequence similarity but also on the similarity of the duplicates' expression profiles across human tissues. We tested this hypothesis using the comprehensive expression dataset of Su et al. [33], which includes the expression of 44,775 human transcripts in 79 tissues. Initially, we used the absolute values of gene expression in different tissues to calculate the relative expression difference between every gene and its closest sequence homolog, defined as (Exp(Gene) − Exp(Paralog)) / [½ (Exp(Gene) + Exp(Paralog))]. Using this measure, we did not find any significant difference between disease and all genes (P = 0.1). A likely explanation is that each gene is expressed primarily in a small number of tissues, so simple averaging of expression values across all tissues is not informative. Therefore, to better reflect the observed expression patterns, we considered a gene to be expressed in a tissue if at least one of its transcripts was found to be significantly expressed (“present call”) in that tissue by Su et al. [33]. 
We defined the Similarity of Tissue Expression (STE) of a gene pair as the ratio of the number of tissues where both genes are expressed to the number of tissues where at least one of the genes is expressed; STE is essentially Jaccard's coefficient of similarity for binary expression patterns. An STE value of one indicates complete overlap between expression profiles, while values close to zero indicate poor overlap. Since the expression profile similarity and sequence similarity of duplicates tend to be correlated [25],[31],[32], we demonstrated (see Figure 4 […] = 4.0, P < 0.05; see Methods).
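The two steps behind STE, collapsing per-transcript “present” calls into a binary per-tissue profile and then taking Jaccard's coefficient of two such profiles, can be sketched as follows. The input structure (one dictionary of tissue-to-call flags per transcript) is a hypothetical simplification of the Su et al. data, and returning `None` when neither gene is expressed anywhere is an assumption, since the ratio is undefined in that case.

```python
def presence_profile(calls_by_transcript, tissues):
    """Binary profile: tissue -> True if any transcript of the gene has a 'present' call there.

    calls_by_transcript is a hypothetical structure: a list of {tissue: bool}
    dicts, one per transcript (probe set) of the gene; missing tissues count as absent.
    """
    return {t: any(calls.get(t, False) for calls in calls_by_transcript) for t in tissues}


def ste(profile_a, profile_b):
    """Similarity of Tissue Expression: Jaccard coefficient of two binary tissue profiles."""
    expressed_a = {t for t, on in profile_a.items() if on}
    expressed_b = {t for t, on in profile_b.items() if on}
    union = expressed_a | expressed_b
    if not union:
        return None  # neither gene is expressed in any tissue, so STE is undefined
    return len(expressed_a & expressed_b) / len(union)
```

A gene expressed in liver and muscle and a paralog expressed in muscle and brain share one of three tissues in which either is expressed, giving STE = 1/3.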
Conclusions
Our analysis clearly demonstrates that gene duplicates affect the phenotypic consequences of deleterious human mutations. Several studies have suggested possible mechanisms of functional backup by duplicates [4],[9],[10],[34], and it is likely that similar mechanisms also play a role in human genetic diseases. In some cases duplicates might actively compensate for the mutated homolog, for example by partially carrying the metabolic flux of the mutated gene [25]. In other cases, genes with close duplicates may have smaller functional loads than singletons, i.e. genes with duplicates may be essential in fewer environmental conditions [7]. As a result, a disease phenotype is less likely to be observed. We take the view that both cases represent functional compensation, which may be called active compensation in the first case and passive compensation in the second. In our view, the probabilistic approach used in this paper to investigate the likelihood of disease mutations given the sequence identity of the closest homolog can be applied to the identification and prioritization of medically relevant mutations. Such prioritization approaches are necessary because large collections of human genetic variation, such as mutations associated with various cancers [35],[36] and common human polymorphisms [37], are being generated at an accelerating rate. A probabilistic scheme similar to the one used in our paper can be applied directly as a prior in the search for causative mutations; information about homolog expression profiles can also be considered. The development of such probabilistic prioritization schemes is beyond the scope of this paper. 
Nevertheless, the fact that genes with homologs of 70–100% sequence identity are about 2–3 times less likely to harbor disease mutations, together with the substantial fraction of such genes in the human genome, suggests that duplicate homology information may be important for the prioritization of medically relevant mutations. The collections of disease genes used in our work are incomplete and significantly biased towards Mendelian diseases [15]. When large and reliable datasets of genes responsible for complex diseases become available, it will be interesting to investigate whether fundamental differences exist between functional compensation for Mendelian and multifactorial diseases. In future studies, it will also be important to investigate robustness to deleterious human mutations achieved through various network effects [3],[9]. Such studies will bring the important biological concept of robustness into the realm of human genetics.

Methods
Three sets of human disease genes were used in our study. We obtained a list of 1003 human genes (1006 Swiss-Prot entries) with non-synonymous disease mutations from the Swiss-Prot database [14] (July 2005; http://expasy.org/cgi-bin/listshumsavar.txt). The list of 881 human disease genes (923 OMIM entries) with annotated phenotypes was taken from the study by Jimenez-Sanchez et al. [15]. We also considered another disease set consisting of genes annotated as “disease”, but neither as “susceptibility” nor as “non-disease”, in the OMIM Morbid Map [13]. This set included 1609 genes (2239 MIM entries).

Two sets of all human genes were used, based on the Ensembl [16] and Swiss-Prot databases. The longest protein isoform of every human gene was obtained from Ensembl human genome build 35. We retained only genes annotated as “pep:known” or “pep:CCDS” (representing genes mapped to human-specific entries of Swiss-Prot, RefSeq, SPTrEMBL, or CCDS). In total, 20,262 genes were included. 
The other all-gene set consisted of 12,211 human protein sequences from the Swiss-Prot database.

All-against-all BLASTP searches were performed using standard parameters [17]. Sequence homologs were identified as non-self hits with E-value ≤ 0.001 that could be aligned over more than 80% of both the query length and the length of the identified sequence. Throughout the manuscript, the term “singleton human genes” describes genes without any sequence homologs identifiable by the BLASTP searches.

We obtained H. sapiens to D. rerio, H. sapiens to G. gallus, and H. sapiens to M. musculus orthology information, as well as paralogous relationships within D. rerio, G. gallus, and M. musculus, from the Ensembl database [27]. Ka and Ka/Ks values of all one-to-one human-mouse orthologous pairs were calculated using the PAML package and obtained directly from the Ensembl database [27].

The sets of synonymous and non-synonymous human SNPs were obtained from the dbSNP database [24]. These included 87,920 SNPs corresponding to 14,825 human genes. The Ka/Ks ratio was calculated for each bin of homolog sequence identity. The proportion of non-synonymous sites (0.717) was calculated by simulation: for each nucleotide in the protein-coding region, a random transition or transversion mutation was introduced at a ratio of 0.6/0.4, according to published estimates for mammals [38],[39],[40],[41].

We used manually curated phenotypes from the study by Jimenez-Sanchez et al. [15] to calculate Spearman's rank correlation between the reduction in life expectancy (ordinal data: none, mild, moderate, and severe) and the sequence identity to the closest homolog. The functional categories of human genes used in our study were based on the GOA annotation [42]; the 53 GO slim categories for GOA (http://www.geneontology.org/GO_slims/goslim_goa.obo) were considered, and the Benjamini-Hochberg procedure was applied for multiple hypothesis correction.
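The site-proportion simulation described above can be sketched in Python. The sketch rests on several assumptions not stated in the text: the standard genetic code (NCBI translation table 1), an in-frame coding sequence containing only A, C, G and T, stop codons in the input skipped, stop-gain mutations counted as non-synonymous, and the two possible transversions at a site treated as equally likely. Alongside the Monte Carlo version, it gives the exact expectation that the simulation converges to; the value for any real gene depends on its codon composition, so this code is not claimed to reproduce the 0.717 figure.

```python
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def _sense_codons(cds):
    """Split an in-frame coding sequence into codons, dropping stop codons."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [c for c in codons if CODON_TABLE[c] != "*"]


def _is_nonsynonymous(codon, pos, new_base):
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    return CODON_TABLE[mutant] != CODON_TABLE[codon]  # stop-gain counts as non-synonymous


def expected_nonsyn_fraction(cds, ts_prob=0.6):
    """Exact expected fraction of random point mutations that change the amino acid."""
    total, nonsyn = 0, 0.0
    for codon in _sense_codons(cds):
        for pos, base in enumerate(codon):
            tv_a, tv_b = TRANSVERSIONS[base]
            nonsyn += ts_prob * _is_nonsynonymous(codon, pos, TRANSITION[base])
            nonsyn += (1 - ts_prob) / 2 * (_is_nonsynonymous(codon, pos, tv_a)
                                            + _is_nonsynonymous(codon, pos, tv_b))
            total += 1
    if total == 0:
        raise ValueError("no sense codons in input")
    return nonsyn / total


def simulate_nonsyn_fraction(cds, ts_prob=0.6, rounds=100, seed=0):
    """Monte Carlo version: mutate every site `rounds` times and count amino acid changes."""
    rng = random.Random(seed)
    total = nonsyn = 0
    for _ in range(rounds):
        for codon in _sense_codons(cds):
            for pos, base in enumerate(codon):
                if rng.random() < ts_prob:
                    new_base = TRANSITION[base]
                else:
                    new_base = rng.choice(TRANSVERSIONS[base])
                nonsyn += _is_nonsynonymous(codon, pos, new_base)
                total += 1
    if total == 0:
        raise ValueError("no sense codons in input")
    return nonsyn / total
```

As a check on the logic, every point mutation in the codon GCT (alanine) at the first two positions changes the amino acid, while none at the third position does, so both functions return 2/3 for that codon; for TTT (phenylalanine) the third-position transition is synonymous but both transversions yield leucine, giving an expected fraction of 0.8.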
The gene expression profiles in 79 human tissues were obtained from the study by Su et al. [33]. We eliminated probe sets with cross-hybridization effects (as identified by Su et al.). In total, we considered expression profiles for 15,097 human genes. The expression value of gene G in tissue T was set to 1 if at least one of gene G's transcripts was detected as a “Present call” in tissue T by the Affymetrix detection algorithm (provided by Su et al. [33]), and to 0 otherwise. The Similarity of Tissue Expression (STE) of a gene pair was defined as Jaccard's coefficient of the binary expression profiles of the two genes, that is, the ratio of the number of tissues where both genes are expressed to the number of tissues where at least one of the genes is expressed. We performed a likelihood ratio test to investigate whether similarity in tissue expression influences the probability of being a disease gene independently of the sequence identity to the closest homolog. Logistic regression was used to model the probability of being a disease gene as a function of the expression and sequence similarities. Under the null hypothesis, the disease gene probability is determined only by the sequence identity of the closest homolog; under the alternative hypothesis, it is determined by both the sequence identity and the tissue expression similarity of the closest homolog. The probabilities shown in Figure 1
[…] = 0.2 in Figure 1.

Supporting Information
Figure S1 Venn diagram showing the overlap of the three disease gene sets used in the analysis. Blue: SwissProt; green: OMIM; red: Jimenez-Sanchez G et al. (0.03 MB DOC)
Figure S2 The distribution of the closest homolog sequence identities for the disease and all-gene sets. (0.04 MB DOC)
Figure S3 Human singleton disease genes are as likely to have duplicate orthologs in the mouse, chicken, and zebrafish genomes as all human singleton genes. (0.02 MB DOC)
Table S1 Comparison of the sequence identity of the closest homolog for the disease and all-gene sets using different BLASTP E-value cutoffs. (0.03 MB DOC)
Table S2 Comparison of the sequence identity of the closest homolog for the disease and all-gene sets using different cutoffs for the minimal alignable region between two sequences. (0.03 MB DOC)
Table S3 Comparison of the sequence identity of the closest homolog using different combinations of the disease and all-gene collections. (0.03 MB DOC)
Table S4 Comparison of the sequence identity of the closest homolog for the disease and all genes in different GO slim categories. (0.12 MB DOC)
Table S5 Comparison of the Similarity of Tissue Expression (STE) between the disease and all-gene sets for sequences with various sequence identities of the closest homolog. (0.03 MB DOC)
Text S1 Investigating the duplicability of human genes. (0.03 MB DOC)

Footnotes
The authors have declared that no competing interests exist. This work was funded by a Columbia start-up package for the investigator.

References
1. Hartwell LH, Hopfield JJ, Leibler S, Murray AW. From molecular to modular cell biology. Nature. 1999;402:C47–52.
2. Stelling J, Sauer U, Szallasi Z, Doyle FJ 3rd, Doyle J. Robustness of cellular functions. Cell. 2004;118:675–685.
3. Wagner A. Robustness and Evolvability in Living Systems. Princeton University Press; 2005.
4. Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, et al. Role of duplicate genes in genetic robustness against null mutations. Nature. 2003;421:63–66.
5. Conant GC, Wagner A. Duplicate genes and robustness to transient gene knock-downs in Caenorhabditis elegans. Proc Biol Sci. 2004;271:89–96.
6. Steinmetz LM, Scharfe C, Deutschbauer AM, Mokranjac D, Herman ZS, et al. Systematic screen for human disease genes in yeast. Nat Genet. 2002;31:400–404.
7. Papp B, Pal C, Hurst LD. Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature. 2004;429:661–664.
8. Dudley AM, Janse DM, Tanay A, Shamir R, Church GM. A global view of pleiotropy and phenotypically derived gene function in yeast. Mol Syst Biol. 2005;1:2005.0001.
9. Wagner A. Robustness against mutations in genetic networks of yeast. Nat Genet. 2000;24:355–361.
10. Kafri R, Bar-Even A, Pilpel Y. Transcription control reprogramming in genetic backup circuits. Nat Genet. 2005;37:295–299.
11. Lopez-Bigas N, Ouzounis CA. Genome-wide identification of genes likely to be involved in human genetic disease. Nucleic Acids Res. 2004;32:3108–3114.
12. Yue P, Moult J. Identification and analysis of deleterious human SNPs. J Mol Biol. 2006;356:1263–1274.
13. McKusick V. Mendelian inheritance in man. A catalog of human genes and genetic disorders. The Johns Hopkins University Press; 1998.
14. Bairoch A, Apweiler R. The SWISS-PROT protein sequence data bank and its new supplement TREMBL. Nucleic Acids Res. 1996;24:21–25.
15. Jimenez-Sanchez G, Childs B, Valle D. Human disease genes. Nature. 2001;409:853–855.
16. Hubbard T, Andrews D, Caccamo M, Cameron G, Chen Y, et al. Ensembl 2005. Nucleic Acids Res. 2005;33:D447–453.
17. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
18. Hurles M. Are 100,000 “SNPs” useless? Science. 2002;298:1509; author reply 1509.
19. Nguyen DQ, Webber C, Ponting CP. Bias of selection on human copy-number variants. PLoS Genet. 2006;2:e20.
20. Reich DE, Schaffner SF, Daly MJ, McVean G, Mullikin JC, et al. Human genome sequence variation and the influence of gene history, mutation and recombination. Nat Genet. 2002;32:135–142.
21. Sunyaev S, Kondrashov FA, Bork P, Ramensky V. Impact of selection, mutation rate and genetic drift on human genetic variation. Hum Mol Genet. 2003;12:3325–3330.
22. Lynch M, Conery JS. The evolutionary fate and consequences of duplicate genes. Science. 2000;290:1151–1155.
23. Graur D, Li WH. Fundamentals of molecular evolution. Sinauer Associates; 2000.
24. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311.
25. Blank LM, Kuepfer L, Sauer U. Large-scale 13C-flux analysis reveals mechanistic principles of metabolic network robustness to null mutations in yeast. Genome Biol. 2005;6:R49.
26. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997;13:555–556.
27. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, et al. Ensembl 2007. Nucleic Acids Res. 2007;35:D610–617.
28. Kondrashov FA, Ogurtsov AY, Kondrashov AS. Bioinformatical assay of human gene morbidity. Nucleic Acids Res. 2004;32:1731–1737.
29. Smith NG, Eyre-Walker A. Human disease genes: patterns and predictions. Gene. 2003;318:169–175.
30. He X, Zhang J. Higher duplicability of less important genes in yeast genomes. Mol Biol Evol. 2006;23:144–151.
31. Gu Z, Nicolae D, Lu HH, Li WH. Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 2002;18:609–613.
32. Makova KD, Li WH. Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 2003;13:1638–1645.
33. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004;101:6062–6067.
34. Kafri R, Levy M, Pilpel Y. The regulatory utilization of genetic redundancy through responsive backup circuits. Proc Natl Acad Sci U S A. 2006;103:11653–11658.
35. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, et al. The consensus coding sequences of human breast and colorectal cancers. Science. 2006;314:268–274.
36. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, et al. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446:153–158.
37. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, et al. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861.
38. Li WH, Wu CI, Luo CC. Nonrandomness of point mutation as reflected in nucleotide substitutions in pseudogenes and its evolutionary implications. J Mol Evol. 1984;21:58–71.
39. Maeda N, Wu CI, Bliska J, Reneke J. Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol. 1988;5:1–20.
40. Fay JC, Wyckoff GJ, Wu CI. Positive and negative selection on the human genome. Genetics. 2001;158:1227–1234.
41. Wyckoff GJ, Wang W, Wu CI. Rapid evolution of male reproductive genes in the descent of man. Nature. 2000;403:304–309.
42. Camon E, Magrane M, Barrell D, Lee V, Dimmer E, et al. The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res. 2004;32:D262–266.