Proc Natl Acad Sci U S A. Feb 20, 2007; 104(8): 2779–2784.
Published online Feb 14, 2007. doi:  10.1073/pnas.0610797104
PMCID: PMC1815258
Evolution

Tissue-driven hypothesis of genomic evolution and sequence-expression correlations

Xun Gu* and Zhixi Su*

Abstract

To maintain normal physiological functions, different tissues may have different developmental constraints on expressed genes. Consequently, the evolutionary tolerance for genomic evolution varies among tissues. Here, we formulate this argument as a “tissue-driven hypothesis” based on the stabilizing selection model. Moreover, several predicted genomic correlations are tested with human–mouse microarray data. Our results are as follows. First, between the human and mouse, we have elaborated the among-tissue covariation between tissue expression distance (Eti) and tissue sequence distance (Dti). This highly significant Eti–Dti correlation emerges when the expression divergence and protein sequence divergence are under the same tissue constraints. Second, the tissue-driven hypothesis further explains the observed significant correlation between the tissue expression distance (between the human and mouse) and the duplicate tissue distance (Tdup) between human (or mouse) paralogous genes. In other words, between-duplicate and interspecies expression divergences covary among tissues. Third, for genes with the same expression broadness, we found that genes expressed in more stringent tissues (e.g., neurorelated) generally tend to evolve more slowly than those in more relaxed tissues (e.g., hormone-related). We conclude that tissue factors should be considered as an important component in shaping the pattern of genomic evolution and correlations.

Keywords: expression divergence, tissue expression, mammalian genomics, gene duplications

Understanding the underlying evolutionary mechanism is fundamental for investigating the emergence of genome complexity (1, 2). It remains highly controversial what factors determine the evolutionary rates of expression and sequence divergence (3–15). An important issue is the role of tissue-specific factors in genomic evolution. Several studies have suggested that tissue-specific constraints may generate among-tissue variation of expression divergence between human and chimpanzee (3, 4, 6, 8), or between human and mouse (16). Moreover, it has been found (17, 18) that the rate of expression divergence may be negatively associated with the broadness of tissue expression of the gene. Interestingly, Rifkin et al. (19) reported that, relative to the prediction of the strict neutral model (1), the natural expression variation in the Drosophila population was constrained. However, there is much debate among authors about measures for expression divergence and tissue specificity, biological/statistical issues for expression-sequence correlation, and methods for multitissue analysis (see refs. 14 and 15 for recent reviews). As a working hypothesis, it seems generally accepted that correlated genomic evolution is mainly driven by various stabilizing (or purifying) selections, including at the tissue level (6, 7, 19, 20). This view does not exclude the role of natural selection, which may occur in some lineages for some genes that perform specified functions. A good example is from the analyses of refs. 3 and 4, implying adaptive expression shifts of some genes in the human brain. However, the opposite view remains (9, 21), arguing that expression divergence was mainly driven by natural selection.

We have recognized that, without an explicit evolutionary model that provides a common ground for prediction and testing by coherent data analyses, it is difficult to reach a comprehensive understanding of these issues. In this article, we develop a stochastic model for genomic evolution under the principle of stabilizing selection and formulate the tissue-driven hypothesis by postulating that stabilizing selections for both expression and sequence divergences may be affected simultaneously by the common factors of the tissues in which the genes are expressed. Facilitated by substantial multispecies microarrays (16), we test several predicted genomic correlations from the tissue-driven hypothesis. Finally, we discuss the evolutionary scenario of genomic correlations, demonstrating that accumulated tissue constraints may shape the correlated pattern of sequence and expression evolution.

The Model

Expression Divergence Under Stabilizing Model.

We modified the stabilizing selection model (22) of quantitative characters to describe the tissue-specific constraint on expression divergence. For a gene expressed in a certain tissue (ti), the stabilizing selection on the expression level x follows a Gaussian fitness function

[Eq. 1]

where θe is the optimal value of expression level and wti is the coefficient for stabilizing selection on gene expression in tissue ti; a large wti means a strong selection pressure, and vice versa (Fig. 1). Under the stabilizing model of Eq. 1, we have shown that the expression divergence follows an Ornstein–Uhlenbeck (OU) process (23). The stochastic OU process is characterized by the infinitesimal mean −β0(x − θe) and variance ε²/2Ne, where ε² is the mutational variance, Ne is the effective population size, and β0 = wtiε² measures the direct force against the deviation from the optimum. Given the initial expression value x0, the OU model implies that x(t) follows a normal distribution with the mean and variance given by

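\[ E[x \mid x_0] = \theta_e + (x_0 - \theta_e)\,e^{-\beta t}, \qquad V(x \mid x_0) = \frac{\epsilon^2 \left(1 - e^{-2\beta t}\right)}{2\beta} \tag{2} \]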

respectively, where β = 2Neβ0 is the decay rate of expression divergence.
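As an illustrative sketch (not the authors' code), the conditional mean and variance of the OU model can be evaluated numerically. The function name `ou_moments` and all parameter values are hypothetical. The formulas assumed are the standard OU moments, with the variance matching the term ε²(1 − e^(−2βt))/2β used in this article's derivation of the expression distance.

```python
import math

def ou_moments(x0, theta_e, beta, eps2, t):
    """Conditional mean and variance of x(t) under an OU process.

    Assumed (standard) forms, with beta the decay rate and eps2 the
    mutational variance:
        mean = theta_e + (x0 - theta_e) * exp(-beta * t)
        var  = eps2 * (1 - exp(-2 * beta * t)) / (2 * beta)
    """
    mean = theta_e + (x0 - theta_e) * math.exp(-beta * t)
    var = eps2 * (1.0 - math.exp(-2.0 * beta * t)) / (2.0 * beta)
    return mean, var

# Hypothetical parameter values, chosen only for illustration.
for t in (0.0, 1.0, 10.0, 100.0):
    m, v = ou_moments(x0=2.0, theta_e=0.0, beta=0.5, eps2=1.0, t=t)
    print(f"t={t:6.1f}  mean={m:.4f}  var={v:.4f}")
```

With increasing t the mean relaxes toward the optimum θe and the variance saturates at ε²/2β, which is the sense in which stabilizing selection bounds expression divergence.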

Fig. 1.
Stabilizing model for tissue expression divergence. (A) Fitness function plotting against the expression level under the stabilizing selection; θe is the optimal expression level. (B) Scheme of interspecies expression divergence between two orthologous ...

For two genomes, say, human and mouse, that diverged t time units ago (Fig. 1), the expression distance can be derived similarly to Gu (12). Let x1 and x2 be the expression levels of two orthologous genes 1 and 2, respectively. Assuming the initial value is at the optimum (x0 = θe), from Eq. 2 we have E[x1 | x0] = E[x2 | x0] = θe. If gene expression diverged along each lineage independently, we have E[x1x2 | x0] = E[x1 | x0]E[x2 | x0] = θe², and E[x1x2] = E[θe²], resulting in Cov(x1, x2) = Var(θe). In the same manner, one can show V(x1) = V(x2) = ε²(1 − e^(−2βt))/2β + Var(θe). Therefore, the expression distance for any gene pair g in tissue ti, Eti,g = E[(x1 − x2)²], is given by

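\[ E_{t_i,g} = \frac{\epsilon_g^2}{\beta_g}\left(1 - e^{-2\beta_g t}\right) = \frac{1 - e^{-2\beta_g t}}{W_{t_i,g}} \tag{3} \]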

where εg² is the mutational variance, βg is the decay rate of expression divergence of gene pair g, and Wti,g = βg/εg² is the strength of stabilizing selection on expression divergence. Thus, Eti,g is inversely related to Wti,g. When t → ∞, Eti,g = 1/Wti,g.
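The step from the variances and covariance to the expression distance can be written out explicitly. The following is a worked restatement of the derivation in the text, under the same assumptions (x0 = θe and independent divergence along the two lineages):

```latex
\begin{aligned}
E_{t_i,g} &= E[(x_1 - x_2)^2] \\
 &= V(x_1) + V(x_2) - 2\,\mathrm{Cov}(x_1, x_2)
    \qquad (\text{because } E[x_1] = E[x_2]) \\
 &= 2\left[\frac{\epsilon_g^2\,(1 - e^{-2\beta_g t})}{2\beta_g}
    + \mathrm{Var}(\theta_e)\right] - 2\,\mathrm{Var}(\theta_e) \\
 &= \frac{\epsilon_g^2}{\beta_g}\,(1 - e^{-2\beta_g t})
  = \frac{1 - e^{-2\beta_g t}}{W_{t_i,g}},
    \qquad W_{t_i,g} = \beta_g / \epsilon_g^2 .
\end{aligned}
```

The Var(θe) terms cancel, so variation of the optimum among genes does not enter the distance, and the limit t → ∞ gives 1/W as stated.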

Tissue-Dependent Evolutionary Rate of Protein Sequence.

Gu (24) studied the evolutionary rate of a protein sequence, based on the principle that stabilizing selection on protein function generates sequence conservation. In the case of a single protein function y (such as enzyme activity or DNA-binding affinity, also called a molecular phenotype), the stabilizing selection on y follows a simple Gaussian form (Fig. 2)

\[ f(y) = \exp\left[-\frac{(y - \theta_g)^2}{2\sigma_g^2}\right] \tag{4} \]

Thus, the coefficient of selection on y is given by s(y) = 1 − f(y) ≈ (y − θg)²/2σg². On the other hand, random (nonsynonymous) mutations in the coding region affect the molecular phenotype y according to a distribution with the mean θg and variance σm² (Fig. 2). Consequently, the mean selection coefficient is given by s = E[(y − θg)²]/2σg² = σm²/2σg², and the selection intensity is Sg = 4Nes = 2Neσm²/σg². In the general case of multiple (K) molecular phenotypes of protein function, Gu (24) has shown

[Eq. 5]

where the subscript i assigns σm² and σg² specific to the ith molecular phenotype.
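The single-phenotype approximation s ≈ σm²/2σg² can be checked numerically. The sketch below is only an illustration with hypothetical parameter values: it draws mutational effects y from a normal distribution (an assumption; the text specifies only the mean and variance) and averages 1 − f(y) under a Gaussian fitness function.

```python
import math
import random

def mean_selection_coefficient(theta_g, sigma_g, sigma_m, n=200_000, seed=1):
    """Monte Carlo estimate of E[1 - f(y)] for y ~ Normal(theta_g, sigma_m^2),
    with Gaussian fitness f(y) = exp(-(y - theta_g)^2 / (2 * sigma_g^2))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.gauss(theta_g, sigma_m)
        total += 1.0 - math.exp(-((y - theta_g) ** 2) / (2.0 * sigma_g ** 2))
    return total / n

sigma_g, sigma_m = 1.0, 0.1   # hypothetical values with sigma_m << sigma_g
simulated = mean_selection_coefficient(0.0, sigma_g, sigma_m)
approx = sigma_m ** 2 / (2.0 * sigma_g ** 2)   # approximation quoted in the text
print(simulated, approx)
```

The two numbers agree closely when σm is small relative to σg; the approximation degrades as mutational effects become comparable to the width of the fitness function.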

Fig. 2.
The stabilizing model. (A) Fitness function plotting against the molecular phenotype (y) of protein function under the stabilizing selection. (B) Distribution of random mutations that affect the molecular phenotype y.

Stabilizing selection of molecular phenotypes (measured by a set of σg,i²) may be tissue-dependent, which can be modeled as σg,i² = ag,i²/Zg, i = 1,…, K. Whereas each ag,i² is a tissue-independent constant, the tissue factor Zg measures the accumulated tissue effect on fitness; a larger Zg means a greater tissue effect. For gene g expressed in Lg different tissues, we implement an additive-effect model Zg = Σ_{j=1}^{Lg} Zj, in which Zj is the contribution from tissue j. The mean selection intensity in Eq. 5 can be rewritten in terms of tissue dependency

\[ S_g = S_{g,0}\, Z_g \tag{6} \]

where Sg,0 = −2Ne Σ_{i=1}^{K} σm,i²/ag,i² is the tissue-independent component. Hence, given the mutation rate v, the evolutionary rate of gene g is given by

[Eq. 7]

Eq. 7 links tissue effects to the evolutionary rate of the protein sequence. The evolutionary rate decreases when the accumulated tissue effect Zg is strong, and vice versa.

Tissue-Driven Hypothesis and Predictions.

The tissue-driven hypothesis of genomic evolution postulates that the tissue factor plays an important role of functional constraint on the rate of genomic evolution, because genes influence phenotypic characters by expression in specific tissues. The phenotypic consequences of genetic variations in regulatory and coding sequences are both affected by the common microenvironment of tissues. Below, we discuss several predicted genomic correlations that can be tested by the genomic data.

Tissue Expression Distance (Eti).

To measure the expression difference of a tissue between two species, we define Eti as the mean expression distance over N orthologous genes in tissue ti, that is, Eti = Σ_{g=1}^{N} Eti,g/N, where Eti,g is given by Eq. 3. Under some moderate conditions, Eti can be approximated by

\[ E_{t_i} \approx \frac{1 - e^{-2\bar{\beta} t}}{W_{t_i}} \tag{8} \]

[see supporting information (SI) Appendix], where the mean tissue factor Wti is the (harmonic) average of the Wti,g values, β̄ is the mean decay rate of expression divergence, and t is the time of speciation. Eq. 8 indicates that the tissue expression distance increases with time t and decreases with the mean tissue factor Wti. When β̄ is close to 0 (very weak stabilizing selection) or t is short (closely related species), Eq. 8 can be reduced to Eti ≈ 2ε²t, i.e., the Brownian model (12, 13), where ε² is the mean mutational variance over genes. In the case of distantly related species when the expression divergence approaches the steady state, the time-dependent term in Eq. 8 vanishes, resulting in Eti ≈ 1/Wti.
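The two limiting cases can be illustrated numerically. This sketch assumes that Eq. 8 has the form Eti ≈ (1 − e^(−2β̄t))/Wti, which is the form implied by the limits quoted in the text and by the modified expression given in the Discussion. The function name and all parameter values are hypothetical.

```python
import math

def tissue_expression_distance(w_ti, beta_bar, t):
    """E_ti ~ (1 - exp(-2 * beta_bar * t)) / W_ti (assumed form of Eq. 8)."""
    return (1.0 - math.exp(-2.0 * beta_bar * t)) / w_ti

eps2 = 0.5               # hypothetical mean mutational variance
beta_bar = 0.01          # hypothetical mean decay rate
w_ti = beta_bar / eps2   # tissue factor, using W = beta / eps^2

short_t = 0.1
brownian = 2.0 * eps2 * short_t          # Brownian limit, 2 * eps^2 * t
steady = 1.0 / w_ti                      # steady-state limit, 1 / W_ti
print(tissue_expression_distance(w_ti, beta_bar, short_t), brownian)
print(tissue_expression_distance(w_ti, beta_bar, 1e4), steady)
```

For short divergence times the distance grows linearly with t regardless of the tissue factor, whereas at steady state it depends only on Wti, which is why distantly related species are informative about tissue constraints.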

Tissue Expression and Sequence Distances: The Eti–Dti Correlation.

Let dg be the evolutionary distance between an orthologous gene pair (g). For a set (Nti) of genes that are expressed in tissue ti, the mean evolutionary distance is given by Dti = Σ_{g=1}^{Nti} dg/Nti. Because dg = 2λgt, where λg is given by Eq. 7, we have shown that the mean selection intensity of tissue (ti)-expressed protein sequences can be written as Sti ≈ S0(Zti + α) (see SI Appendix); Zti is the mean of accumulated tissue-(ti) factors over expressed genes, S0 is the mean of tissue-independent components, and α is a constant. Thus, we have

[Eq. 9]

where Dυ = 2υt.

According to the tissue-driven hypothesis, two mean tissue factors Wti and Zti should be positively correlated, because they represent the effects of common microenvironment of tissue ti on expression divergence and protein sequence conservation, respectively. This argument predicts a positive correlation between tissue expression distance (Eti) and tissue sequence distance (Dti). In the special case when Zti = wti and Eti ≈ 1/wti (steady-state expression divergence), we obtain the following form

[Eq. 10]

where a = 1 − S/2 and b = S0/2.

Interspecies and Interduplicate Tissue Expression Divergence: The Eti–Tdup Correlation.

The tissue-driven hypothesis also predicts that tissue factors may affect the expression divergence between duplicate genes. Consider a pair of duplicate genes that have diverged for τ evolutionary time units. Under a similar stabilizing selection model, one can obtain the expression distance between duplicated genes, Tdup,ti,g, which is virtually the same as Eq. 3. To be clear, we use Qti,g for the tissue factor of expression divergence between duplicate pair g. For a set (Ndup) of duplicate genes, let Tdup = Σ_{g=1}^{Ndup} Tdup,ti,g/Ndup be the tissue (ti) duplicate distance. Similar to Eqs. 5 and 6, we have

\[ T_{dup} \approx \frac{1 - e^{-2\bar{\gamma}\tau}}{Q_{t_i}} \tag{11} \]

where Qti is the mean tissue factor for the interduplicate expression divergence in tissue ti, γ̄ is the mean decay rate of expression divergence, and τ is the mean evolutionary time of the duplicate gene set. Hence, positively correlated wti and Qti under the tissue-driven hypothesis lead to a testable prediction of a positive correlation between Eti and Tdup. In particular, a linear Eti–Tdup relationship is expected when wti = Qti.
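The linear relationship in the special case of equal tissue factors can be seen with a small numerical sketch. It assumes that both distances share the form (1 − e^(−2·rate·time))/factor; the tissue factors, decay rates, and times below are invented for illustration and are not estimates from any data set.

```python
import math

def distance(factor, decay, time):
    """Shared form (1 - exp(-2 * decay * time)) / factor assumed for E_ti and T_dup."""
    return (1.0 - math.exp(-2.0 * decay * time)) / factor

# Hypothetical tissue factors for five tissues (no real data).
tissue_factors = [2.0, 3.5, 5.0, 8.0, 12.0]

# Interspecies distances, and duplicate distances with Q_ti set equal to W_ti.
e_ti = [distance(w, decay=0.02, time=90.0) for w in tissue_factors]
t_dup = [distance(q, decay=0.01, time=300.0) for q in tissue_factors]

ratios = [td / e for td, e in zip(t_dup, e_ti)]
print(ratios)   # one constant across tissues, so T_dup is proportional to E_ti
```

When the two sets of tissue factors are only positively correlated rather than equal, the relationship is no longer exactly proportional, but a positive association between the two distances is still expected.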

Tissue Broadness and Preference.

One can rewrite the accumulated tissue effect on gene g in Eq. 6 as Zg = Lg × Z̄g, where Lg is the number of tissues in which gene g is expressed, and Z̄g = Σ_{j=1}^{Lg} Zj/Lg is the average tissue factor for gene g. In fact, Z̄g measures the effect of tissue preference, or tissue types, on the expression divergence. In short, the accumulated tissue effect can be decomposed into two factors: tissue broadness (Lg) and tissue preference (Z̄g). The protein sequence becomes more conserved if the gene is expressed in more tissues or in tissues with more stringent constraints.

Although many studies have shown the effect of tissue broadness (9, 17, 25), the effect of tissue preference has not been well investigated. We address this issue by grouping genes with the same tissue broadness (Lg). When Lg is the same, the larger the Z̄g value, the greater the selection intensity Sg and so the lower the evolutionary rate λg. This prediction can be tested under the tissue-driven hypothesis that claims a positive correlation between Wti,g and Zti,g (see below).

Results

In this section, we use the human and mouse genomic data to test these predicted genomic correlations derived from the tissue-driven hypothesis. We focused on 29 orthologous (adult) tissues in which the human and mouse microarrays are available; see Materials and Methods. For each tissue, we estimate the tissue expression distance (Eti) between the human and mouse as well as the tissue protein distance (Dti) and the tissue duplicate distance (Tdup) for expression divergence.

Tissue Expression Divergence Between Human and Mouse.

Based on 8,936 human–mouse orthologs, we estimated the tissue expression distance Eti for each of 29 tissues. Fig. 3 shows a substantial variation of Eti among tissues. Indeed, there is a 2.4-fold difference from the lowest Eln = 0.085 (lymph node, ln) to the highest Epc = 0.206 (pancreas, pc).

Fig. 3.
Variation of human–mouse tissue expression distances (Eti) among 29 tissues. Abbreviations for these tissues are shown in parentheses.

Previous studies (7, 8, 10) observed that brain may have more expression conservation than other tissues, but the small sample size (approximately five tissues) has raised some doubts. We addressed this issue by examining more (i.e., 29) tissues. We found an overall expression conservation in some neurorelated tissues, e.g., pituitary (pi), amygdala (ad), hypothalamus (hp), and cerebellum (cb) (Fig. 3). In contrast, testis (ts) may have a rapid interspecies expression divergence. Although it remains open to question, one possibility is that the overall relaxed developmental constraint in the testis may facilitate the operation of sexual selection after speciation. Moreover, we noticed that some hormone-related tissues, such as pancreas, may have more developmental plasticity to allow rapid expression divergence, possibly through interactions with environmental cues during evolution. In short, substantial variation of Eti among tissues implies a role of tissue-specific factors in mammalian genomic evolution.

Correlation (Eti–Dti) Between Tissue Expression and Sequence Divergence.

For each tissue ti, we calculated the tissue protein distance (Dti) between the human and mouse. Similar to Eti, the observed variation of Dti among tissues may indicate the tissue's role in protein evolution. Moreover, the tissue-driven hypothesis expects covariation between Eti and Dti, because it postulates the same tissue-specific developmental constraint that may affect both tissue expression divergence and sequence divergence of expressed proteins. We indeed found a highly significant correlation between Eti and Dti based on 29 human–mouse tissues (Fig. 4). In the case of high expression (Fig. 4A), the (Pearson) coefficient of correlation is R = 0.55 (P < 0.001), whereas R = 0.66 (P < 0.001) in the case of normal expression (Fig. 4B). Use of the Spearman rank correlation results in very similar P values (<0.001). Hence, the significance of the Eti–Dti correlation provides statistical evidence to support the tissue-driven hypothesis. In addition to the two cutoffs presented in Fig. 4 A and B, we have examined several other criteria for gene expression and found that the Eti–Dti correlation is robust against the choice of cutoff (data not shown).
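For readers who wish to repeat this kind of among-tissue test, the two coefficients can be computed as in the following minimal sketch. It is not the authors' code: the function names are hypothetical and the per-tissue values are made up solely to exercise the functions.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up per-tissue values for illustration only (not the paper's data).
e_ti = [0.09, 0.11, 0.12, 0.15, 0.17, 0.20]
d_ti = [0.10, 0.13, 0.12, 0.14, 0.16, 0.18]
print(pearson(e_ti, d_ti), spearman(e_ti, d_ti))
```

Reporting both coefficients, as the text does, guards against a Pearson result that is driven by one or two extreme tissues.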

Fig. 4.
Correlations between tissue expression distance (Eti) and tissue protein distance (Dti) for highly expressed proteins (A) and for normally expressed proteins (B). See Fig. 3 for the description of abbreviations of tissue names. In each case, the correlation ...

Tissue Correlation (Eti–Tdup) Between Interspecies and Duplicate Expression Divergence.

A positive Eti–Tdup correlation implies that when a tissue allows more interspecies expression divergence, it should also tolerate more extensive expression divergence between duplicated genes. Based on 1,312 duplicate pairs that were duplicated before the human–mouse split, we estimated the duplicate tissue distance (Tdup) in each tissue ti. Fig. 5 shows a highly significant correlation between tissue expression distance (Eti) and Tdup (P < 0.001 for either Pearson or Spearman rank correlation). This result supports the tissue-driven hypothesis that duplicated genes tend to have more expression divergence in a tissue with relaxed developmental constraint and vice versa.

Fig. 5.
The correlation between tissue expression distance (Eti) and tissue duplicate distances (Tdup). Here, Tdup is the average of human and mouse duplicates. The correlation is statistically highly significant (P < 0.001).

Evolutionary Rate of Protein Sequence Under Multiple Tissue Constraints.

Let Lg be the number of tissues in which gene g is expressed, or the tissue broadness. For gene g that is expressed in Lg different tissues, we propose an index tg that can be used to measure the effect of tissue preference approximately. We thus calculated the effect of tissue preference (tg) for all 8,936 genes in both human and mouse. We further classified these genes into groups according to the number (Lg) of tissues in which they are expressed, i.e., Lg = 1, 2,… 28, excluding Lg = 29 because tg is identical in this case. Noticeably, for each group, a negative correlation between the protein distance (dg) and tg is observed (Fig. 6A). Twenty-five cases are statistically significant (P < 0.05), whereas the cases of Lg = 11, 14, and 23 are not (0.05 < P < 0.1), largely because of the small sample size. In particular, 15 cases are highly statistically significant (P < 0.0001). For instance, Fig. 6B shows the dg vs. tg correlation in the case of Lg = 5. Here, we used AD = 200 as the cut-off for gene expression. We also increased the cut-off up to AD = 800 to examine the so-called transcription leakage effect. At any rate, all these gave virtually the same results.
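The grouping procedure can be sketched as follows. This is a hypothetical illustration rather than the authors' pipeline: the record layout (Lg, tg, dg), the function names, the use of the Pearson coefficient, and the toy values are all assumptions.

```python
import math
from collections import defaultdict

def pearson(xs, ys):
    """Pearson correlation coefficient (inputs must not be constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlations_by_broadness(genes, min_genes=3):
    """genes: iterable of (L_g, t_g, d_g) records.
    Returns {L_g: correlation between t_g and d_g within that group}."""
    groups = defaultdict(list)
    for l_g, t_g, d_g in genes:
        groups[l_g].append((t_g, d_g))
    result = {}
    for l_g, pairs in sorted(groups.items()):
        if len(pairs) < min_genes:
            continue   # too few genes for a meaningful coefficient
        ts, ds = zip(*pairs)
        result[l_g] = pearson(ts, ds)
    return result

# Toy records (L_g, t_g, d_g); values are invented for illustration.
toy = [
    (2, 6.0, 0.20), (2, 7.5, 0.15), (2, 9.0, 0.11), (2, 10.0, 0.08),
    (5, 5.5, 0.18), (5, 6.5, 0.16), (5, 8.0, 0.09),
    (7, 6.0, 0.10),
]
print(correlations_by_broadness(toy))
```

Holding Lg fixed within each group is what separates the effect of tissue preference from that of tissue broadness.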

Fig. 6.
Evolution under multiple tissue constraints. (A) Negative coefficients of tg–dg correlations for gene groups with the same tissue broadness (Lg). (B) The tg–dg plot in the case of Lg = 5.

Given the same tissue broadness, the overwhelmingly negative dg–tg correlation indicates that genes that are expressed in more strongly constrained tissues (e.g., neurorelated) tend to evolve more slowly at the sequence level than those expressed in more weakly constrained tissues (e.g., hormone-related), as predicted by the tissue-driven hypothesis. Apparently, if a protein is expressed in several different tissues, the evolution of the protein sequence may be under multi-tissue-specific constraints. Hence, broadly expressed genes generally tend to evolve slowly at the sequence level. Indeed, we found a highly significant negative correlation between dg and Lg, confirming previous studies (e.g., refs. 8 and 25) (data not shown).

Discussion

Under the stabilizing selection model, we have formulated the tissue-driven hypothesis and elaborated several predicted genomic correlations, taking advantage of multitissue human–mouse microarrays. In summary, we found highly significant correlations between tissue expression distance (Eti) and tissue sequence distance (Dti) and between Eti and the duplicate tissue distance (Tdup), supporting the hypothesis that the evolution of expression pattern and protein sequence may be under the same constraint of tissue factors. Moreover, for genes with the same expression broadness, we found that genes expressed in more stringent tissues tend to evolve more slowly than those in more relaxed tissues. Our findings provide some insights into how the rate of genomic evolution can be shaped by the higher-level physiological-developmental structure of the organism.

Functional Constraint vs. Positive Selection.

A basic assumption of the tissue-driven hypothesis is that the genome evolves largely under functional constraints maintained by stabilizing selections at levels from cell physiology to development. In some evolutionary lineages, episodic adaptive selection may happen either in expression pattern or in protein function (9, 21, 26–29). For instance, hundreds of genes (≈2% of human genes) showed dramatic brain-specific expression shifts in the human lineage (3, 4, 26, 27). When the tissue-driven hypothesis is extended to include adaptive selection, we found that the predictions for both the Eti–Dti and Eti–Tdup correlations hold. We have examined the rapid-shift (S) model of expression divergence (12). In this case, one can show that the tissue expression distance in Eq. 8 can be modified as Eti = Shm + (1 − e^(−2βt))/wti, and the duplicate tissue distance in Eq. 11 as Tdup = Sdup + (1 − e^(−2βt))/Qti, where Shm and Sdup are the rapid-shift components between human–mouse genes and between duplicate genes, respectively. Except for extreme cases, Shm and Sdup apparently do not affect the predicted genomic correlations.

Effect of Expression Level on Protein Sequence Evolution.

It has been claimed (9, 17, 18) that highly expressed genes tend to evolve slowly. We have examined this confounding effect and found that our main results are robust. For instance, the high significance of the Eti–Dti correlation (Fig. 4) holds at various cutoffs, from normal to highly expressed genes. On the other hand, our model can be extended to take the effect of expression level into account, e.g., by assuming that the tissue factor Zg is expression level-dependent.

Tissue Expression Pattern in Primates and Mammals.

The Eti–Dti correlation between the human and chimpanzee has been investigated by Khaitovich et al. (8), based on five tissues (brain, liver, heart, kidney, and testis). However, our reanalysis of the same data sets leads to a nonsignificant result (Spearman rank test P > 0.2), as opposed to the original claim (8) (the Pearson correlation P < 0.05). It is known that the Pearson correlation could be too liberal with a small sample size. Because the current study (29 human–mouse tissues) includes these five tissues, we did observe a roughly consistent ranking in Eti or Dti, i.e., the lowest values in the cerebellum/brain and the highest values in the testis. Hence, one may speculate that the Eti–Dti correlation holds in both primates and mammals, although more primate microarray data are needed.
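To see how little room five tissues leave for a rank-based test, one can enumerate all 5! = 120 rank orderings and compute the exact two-sided P value of a Spearman coefficient. The sketch below is a generic illustration (hypothetical function names, no ties assumed) and does not reproduce the reanalysis described above.

```python
from itertools import permutations

def spearman_no_ties(perm):
    """Spearman's rho between ranks 1..n and a permutation of them (no ties)."""
    n = len(perm)
    d2 = sum((i + 1 - r) ** 2 for i, r in enumerate(perm))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def exact_two_sided_p(observed_rho, n):
    """Fraction of all n! rank permutations with |rho| >= |observed_rho|."""
    perms = list(permutations(range(1, n + 1)))
    threshold = abs(observed_rho) - 1e-12   # tolerance for floating-point ties
    hits = sum(1 for p in perms if abs(spearman_no_ties(p)) >= threshold)
    return hits / len(perms)

for rho in (1.0, 0.9, 0.7):
    print(rho, exact_two_sided_p(rho, n=5))
```

With n = 5, only a perfectly monotonic ordering reaches P = 2/120, and a coefficient of 0.9 already corresponds to P = 10/120, so a rank test on five tissues can reject the null hypothesis only in the most extreme configurations.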

Some Technical Issues.

We have examined several technical issues that may affect our interpretations. First, our analysis is robust against the noise of microarrays, because the expression variation among biological replicates of microarrays is much smaller than the average expression difference between the human and mouse (16). Nevertheless, using the corrected expression distance (13), a conserved measure for interspecies expression divergence, we obtained virtually the same results (data not shown). Second, the exclusion of young duplicates (5) (after the human-mouse split) has almost no effect on our results. Third, we have used several alternative options to determine the status of expression level in a tissue. In all cases, highly significant genomic correlations are always observed.

Because of expression leakage or fluctuation, observed similar gene expression profiles do not necessarily mean a similar tissue function. The extent of these nonfunctional expressions is subject to debate (30). It seems that expression leakage may be more frequent in those tissues with relatively weak developmental constraints. Besides, evolution of expression can be affected by many factors such as transregulatory elements (31) or alternative splicing isoforms (32). Indeed, more questions are raised than we can solve in evolutionary genomics (30–33).

Materials and Methods

Genome Data Sets.

Homology information of human and mouse genes was obtained from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/HomoloGene). After extracting the reciprocally unique hit pairs with IDs starting with the prefix “NM_,” we identified a total of 17,462 high-quality human–mouse orthologous genes for further study. Meanwhile, human (HG-U133A and GNF1H) and mouse (GNF1M) Affymetrix microarray data (Affymetrix, Santa Clara, CA) were retrieved from http://symatlas.gnf.org (16). We focused on the following 29 orthologous (adult) tissues in which the human and mouse microarrays are available: adipose tissue (at), adrenal gland (ag), amygdala (ad), bone marrow (bm), cerebellum (cb), CD4+ T cells (T4), CD8+ T cells (T8), dorsal root ganglion (dr), heart (ht), hypothalamus (hp), kidney (kn), liver (li), lung (lu), lymph node (ln), olfactory bulb (oc), ovary (ov), pancreas (pc), pituitary (pi), placenta (pl), prostate (pt), salivary gland (sg), skeletal muscle (sm), testis (ts), thymus (tm), thyroid (tr), tongue (to), trachea (tc), trigeminal (tg), and uterus (ur). As suggested by the authors (16), we mainly used the normalized (log2-based) ratio value (AffyRatio) of the median expression value among biological replicates. Using the annotation tables at http://symatlas.gnf.org, we matched the human–mouse orthologous genes to the human and mouse Affymetrix tags, respectively. The final data set included 8,936 human–mouse orthologous genes. Note that in ≈20% of cases, multiple tags in the microarray targeted a single gene. We solved this problem by assigning either the averaged or the highest expression value to each of these genes (16). These two treatments provided virtually the same results.
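The two treatments of genes with multiple tags can be sketched as a small helper. This is an assumed, simplified data layout (plain dictionaries for one tissue); the function name and the probe and gene identifiers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene, how="mean"):
    """Combine probe-level expression values into one value per gene.

    probe_values:  {probe_id: expression value in one tissue}
    probe_to_gene: {probe_id: gene_id}
    how:           "mean" (average of probes) or "max" (highest probe)
    """
    if how not in ("mean", "max"):
        raise ValueError("how must be 'mean' or 'max'")
    per_gene = defaultdict(list)
    for probe, value in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is not None:            # skip probes without a gene annotation
            per_gene[gene].append(value)
    combine = mean if how == "mean" else max
    return {gene: combine(vals) for gene, vals in per_gene.items()}

# Hypothetical probe and gene identifiers.
values = {"p1": 1.2, "p2": 0.8, "p3": 2.0}
mapping = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}
print(collapse_probes(values, mapping, "mean"))
print(collapse_probes(values, mapping, "max"))
```

Running the downstream analysis under both options is a simple way to confirm that conclusions do not depend on how multi-tag genes are summarized.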

Estimation of Tissue Expression Distance (Eti).

Consider a set (N) of orthologous genes between species 1 (human) and species 2 (mouse). Let xg1,ti and xg2,ti be the (log2-transformed) expression levels of the gth orthologous genes in tissue ti, respectively. Under the OU model, one can easily show that the tissue (ti) expression distance defined in Eq. 8 can be estimated as follows

[Eq. 12]
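Because the estimator itself is not reproduced here, the following sketch should be read as an assumption rather than as Eq. 12: it uses the plain moment estimator suggested by the definition Eti,g = E[(x1 − x2)²], namely the mean squared difference of log2 expression across orthologous pairs. The published estimator may include additional normalization. The function name and the values are hypothetical.

```python
def estimate_tissue_expression_distance(human, mouse):
    """Mean squared difference of log2 expression across orthologous gene pairs.

    human, mouse: equal-length sequences where position g holds the
    log2-transformed expression of the g-th ortholog in one tissue.
    This moment estimator is an assumption based on E_ti,g = E[(x1 - x2)^2].
    """
    if len(human) != len(mouse) or not human:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((h - m) ** 2 for h, m in zip(human, mouse)) / len(human)

# Invented log2 expression values for four orthologs in one tissue.
human = [0.5, -1.0, 2.0, 0.0]
mouse = [0.0, -1.5, 2.5, 0.5]
print(estimate_tissue_expression_distance(human, mouse))
```

Applying the same function tissue by tissue yields one distance per tissue, which is the quantity compared across the 29 tissues.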

Estimation of Tissue Protein Distance (Dti).

We calculated Dti as the mean of evolutionary distances of proteins that are expressed in tissue ti. For each gene, the evolutionary distance was estimated by the Poisson correction; other methods gave virtually the same results (data not shown). For each tissue ti, we inferred the status of gene expression as follows: (i) High expression: the AffyRatio of the gene is above the median expression among 79 human tissues (16). (ii) Normal expression: calculate the percentages of AD counts (adjusted by the background AD = 200) of the gene in all 29 tissues and then, in descending order, select the expressed tissues of the gene until the accumulated AD percentage reaches 97.5%. This approach may avoid some spurious high AD counts.
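One possible reading of the normal-expression rule is sketched below. The details are assumptions: the background is subtracted and floored at zero, percentages are taken relative to the gene's total adjusted AD, and selection stops as soon as the running total reaches the cutoff. The function name and AD values are hypothetical.

```python
def expressed_tissues(ad_counts, background=200.0, cutoff=0.975):
    """Tissues called 'expressed' for one gene under the cumulative-AD rule.

    ad_counts: {tissue: raw AD value}. Values are background-adjusted
    (floored at zero), converted to fractions of the gene's total, and
    tissues are taken in descending order until the running total
    reaches `cutoff`.
    """
    adjusted = {t: max(v - background, 0.0) for t, v in ad_counts.items()}
    total = sum(adjusted.values())
    if total == 0.0:
        return []                       # gene not detected above background
    selected, running = [], 0.0
    ordered = sorted(adjusted.items(), key=lambda kv: kv[1], reverse=True)
    for tissue, value in ordered:
        if value == 0.0:
            break                       # remaining tissues are at background
        selected.append(tissue)
        running += value / total
        if running >= cutoff:
            break
    return selected

# Invented AD values for one gene in four tissues.
calls = expressed_tissues({"li": 5200.0, "kn": 1200.0, "ht": 230.0, "ts": 150.0})
print(calls, len(calls))   # the list length is the tissue broadness L_g
```

Under this reading, tissues contributing only a marginal share of a gene's total signal are not counted toward its tissue broadness.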

Estimation of Tissue Duplicate Distance (Tdup) for Expression Divergence.

Consider a set (Ndup) of duplicate gene pairs. For the jth duplicate pair, the expression levels (AffyRatio) of two duplicate genes in a given tissue (ti) are denoted by xj and yj, respectively. Then, similar to the calculation of Eti in Eq. 12, we estimate Tdup by the formula

[Eq. 13]

A large Tdup value reflects the plasticity of tissue-specific developmental constraint that allows more expression divergence between duplicate genes.

Estimation of Tissue Broadness and Preference.

The number (Lg) of tissues in which gene g is expressed, or the tissue broadness, can be inferred as described above. For gene g that is expressed in Lg different tissues, let Ej (j = 1,…, Lg) be the jth tissue expression distance between the human and mouse. Because a large Ej means less tissue constraint on expression divergence, we propose an index that can be used to measure the effect of tissue preference as follows

\[ t_g = \frac{1}{L_g}\sum_{j=1}^{L_g}\frac{1}{E_j} \tag{14} \]

where the tissue expression distance Ej is estimated by Eq. 12. In particular, when the expression divergence is close to the steady state, we have Ej ≈ 1/Wj, so that tg is an estimate of the mean tissue factor Σ_{j=1}^{Lg} Wj/Lg. This is a proxy for the tissue preference Z̄g = Σ_{j=1}^{Lg} Zj/Lg under the tissue-driven hypothesis, which predicts a positive relationship between Wj and Zj and therefore a negative correlation between tg and the evolutionary distance of protein sequence (dg).
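A minimal sketch of the index follows, assuming that tg is the average of 1/Ej over the tissues in which the gene is expressed (the form suggested by the steady-state argument Ej ≈ 1/Wj). The function name and the Ej values are invented for illustration.

```python
def tissue_preference_index(tissue_distances, expressed_in):
    """t_g as the mean of 1 / E_j over the tissues where the gene is expressed.

    tissue_distances: {tissue: E_j}, the tissue expression distances
    expressed_in:     tissues in which the gene is expressed (length L_g)
    """
    if not expressed_in:
        raise ValueError("gene must be expressed in at least one tissue")
    return sum(1.0 / tissue_distances[t] for t in expressed_in) / len(expressed_in)

# Invented E_j values for three tissues.
e_j = {"cb": 0.10, "li": 0.16, "ts": 0.20}
constrained = tissue_preference_index(e_j, ["cb", "li"])   # more constrained tissues
relaxed = tissue_preference_index(e_j, ["li", "ts"])       # more relaxed tissues
print(constrained, relaxed)
```

Two genes with the same broadness (here Lg = 2) receive different index values depending on which tissues they are expressed in, which is the contrast the within-group correlation with dg exploits.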

Acknowledgments

We thank Dongping Xu for computer assistance. This work was supported by a National Institutes of Health grant and the National Science Foundation of China Overseas Outstanding Young Investigator Award (to X.G.).

Abbreviation

OU, Ornstein–Uhlenbeck.

Footnotes

The authors declare no conflict of interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0610797104/DC1.

References

1. Kimura M. The Neutral Theory of Molecular Evolution. Cambridge, UK: Cambridge Univ Press; 1983.
2. Wilson AC, Carlson SS, White TJ. Annu Rev Biochem. 1977;46:573–639.
3. Enard W, Khaitovich P, Klose J, Zollner S, Heissig F, Giavalisco P, Nieselt-Struwe K, Muchmore E, Varki A, Ravid R, et al. Science. 2002;296:340–343.
4. Gu J, Gu X. Trends Genet. 2003;19:63–65.
5. Huminiecki L, Wolfe KH. Genome Res. 2004;14:1870–1879.
6. Khaitovich P, Weiss G, Lachmann M, Hellmann I, Enard W, Muetzel B, Wirkner U, Ansorge W, Paabo S. PLoS Biol. 2004;2:E132.
7. Yanai I, Graur D, Ophir R. Omics. 2004;8:15–24.
8. Khaitovich P, Hellmann I, Enard W, Nowick K, Leinweber M, Franz H, Weiss G, Lachmann M, Paabo S. Science. 2005;309:1850–1854.
9. Liao BY, Zhang J. Mol Biol Evol. 2006;23:530–540.
10. Gu Z, Nicolae D, Lu HH, Li WH. Trends Genet. 2002;18:609–613.
11. Makova KD, Li WH. Genome Res. 2003;13:1638–1645.
12. Gu X. Genetics. 2004;167:531–542.
13. Gu X, Zhang Z, Huang W. Proc Natl Acad Sci USA. 2005;102:707–712.
14. Li WH, Yang J, Gu X. Trends Genet. 2005;21:602–607.
15. Gilad Y, Oshlack A, Rifkin SA. Trends Genet. 2006;22:456–461.
16. Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al. Proc Natl Acad Sci USA. 2002;99:4465–4470.
17. Duret L, Mouchiroud D. Mol Biol Evol. 2000;17:68–74.
18. Yang J, Su AI, Li WH. Mol Biol Evol. 2005;22:2113–2118.
19. Rifkin SA, Houle D, Kim J, White KP. Nature. 2005;438:220–223.
20. Denver DR, Morris K, Streelman JT, Kim SK, Lynch M, Thomas WK. Nat Genet. 2005;37:544–548.
21. Jordan IK, Marino-Ramirez L, Koonin EV. Gene. 2005;345:119–126.
22. Lande R. Evolution (Lawrence, Kans) 1979;33:234–251.
23. Hansen TF, Martins EP. Evolution (Lawrence, Kans) 1996;50:1404–1417.
24. Gu X. Genetica. 2006 Nov 1; doi: 10.1007/s10709-006-0022-5.
25. Zhang L, Li WH. Mol Biol Evol. 2004;21:236–239.
26. King MC, Wilson AC. Science. 1975;188:107–116.
27. Caceres M, Lachuer J, Zapala MA, Redmond JC, Kudo L, Geschwind DH, Lockhart DJ, Preuss TM, Barlow C. Proc Natl Acad Sci USA. 2003;100:13030–13035.
28. Piganeau G, Eyre-Walker A. Proc Natl Acad Sci USA. 2003;100:10335–10340.
29. Sella G, Hirsh AE. Proc Natl Acad Sci USA. 2005;102:9541–9546.
30. Yanai I, Korbel JO, Boue S, McWeeney SK, Bork P, Lercher MJ. Trends Genet. 2006;22:132–138.
31. Zhang Z, Gu J, Gu X. Trends Genet. 2004;20:403–407.
32. Su Z, Wang J, Yu J, Huang X, Gu X. Genome Res. 2006;16:182–189.
33. Eisen JA, Fraser CM. Science. 2003;300:1706–1707.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences