Proc Natl Acad Sci U S A. Sep 28, 2004; 101(39): 14300–14304.
Published online Sep 7, 2004. doi:  10.1073/pnas.0405949101
PMCID: PMC521103
From the Cover

Higher offspring survival among Tibetan women with high oxygen saturation genotypes residing at 4,000 m

Abstract

Here we test the hypothesis that high-altitude native resident Tibetan women with genotypes for high oxygen saturation of hemoglobin, and thus less physiological hypoxic stress, have higher Darwinian fitness than women with low oxygen saturation genotypes. Oxygen saturation and genealogical data were collected from residents of 905 households in 14 villages at altitudes of 3,800–4,200 m in the Tibet Autonomous Region along with fertility histories from 1,749 women. Segregation analysis confirmed a major gene locus with an autosomal dominant mode of inheritance for high oxygen saturation levels, associated with a 10% higher mean. Oxygen saturation genotypic probability estimators were then used to calculate the effect of the inferred oxygen saturation locus on measures of fertility, in a subsample of 691 women (20–59 years of age and still married to their first husbands, those with the highest exposure to the risk of pregnancy). The genotypic probability estimators were not significantly associated with the number of pregnancies or live births. The high oxygen saturation genotypic mean offspring mortality was significantly lower, at 0.48 deaths compared with 2.53 for the low oxygen saturation homozygote, because of lower infant mortality. Tibetan women with a high likelihood of possessing one to two alleles for high oxygen saturation had more surviving children. These findings suggest that high-altitude hypoxia is acting as an agent of natural selection on the locus for oxygen saturation of hemoglobin by the mechanism of higher infant survival of Tibetan women with high oxygen saturation genotypes.

High-altitude native populations are exposed to lifelong ambient hypoxia that stresses the oxygen delivery system and elicits adaptations. The genetic bases and thus the evolutionary interpretation of the adaptive traits of high-altitude populations are generally unknown, with the exception of oxygen saturation of hemoglobin among high-altitude native Tibetans. Tibetans at a given high altitude vary widely in percent oxygen saturation of hemoglobin despite uniform ambient hypoxic stress. A putative major gene (an inferred locus) with a recognizable quantitative effect having an autosomal dominant mode of inheritance that is associated with ≈6% higher oxygen saturation has been detected in two areas of the Tibet Autonomous Region (1, 2). The high oxygen saturation genotypes may have greater Darwinian fitness because they are less physiologically stressed, in the sense of having higher arterial oxygen content and less departure from the internal milieu that evolved at sea level. Here, we test the hypotheses that Tibetan women with high oxygen saturation genotypes have higher fertility or lower offspring mortality than women with low oxygen saturation genotypes. Because the oxygen saturation locus is unknown, the approach is to assign to each person genotypic probability estimators for oxygen saturation and from these to calculate genotypic mean values of demographic traits.

Materials and Methods

Population and Sample. Residents of 14 villages at 3,800–4,200 m in the Tibet Autonomous Region of China participated in data collection, which took place from November 1997 through August 2000. Villages were selected from three counties in two major subcultural areas of central Tibet Autonomous Region to avoid the possibility that chance local factors might influence the results (see ref. 3 for details of data collection in 13 villages; for the present study, one more village was sampled the same way to provide a larger sample). Tibetan research assistants visited 905 households (every household willing to participate in each village, an estimated 95% of all households). Genealogy, oxygen saturation, and female fertility data were collected as follows. An adult household head provided a genealogy for each household in as much detail as he or she could recall. Genealogies provided by relatives in other households allowed cross-checks of the information, and discrepancies were reconciled by reinterviewing. Everyone in each household was invited to participate in a survey of oxygen saturation of hemoglobin that also involved providing information on factors known to influence oxygen saturation, including age, sex, pregnancy status, self-reported health status, self-reported symptoms of chronic obstructive lung disease, medications, and smoking behavior (see ref. 4 for details). A total of 95% of those encountered (≈63% of the total number of residents) agreed to participate, and a total of 3,812 people provided pulse oximetry. All females 18 years of age and older in each household were asked to respond to a detailed fertility survey questionnaire. That information was provided by 1,749 women from a total of 1,864 women in that age range reported to be members of those households. Virtually everyone agreed to respond; the remaining women were not in continuous residence. 
Participants provided informed consent according to a protocol approved by the Case Western Reserve University Institutional Review Board.

Reasoning that any effect of genotype is most likely to be detectable among women with the highest exposure to the risk of pregnancy, we (after reporting the results of a segregation analysis on a larger set of individuals) present analyses of data from a subgroup of married women 20–59 years of age who were still married to their first husbands. They are 67% of the women in that age range. The lower age boundary was selected because only eight (2%) of the women under 20 years of age were married; the average age at marriage of women in the subgroup analyzed was 23 years. The upper age boundary was selected because women 60 years of age and older may have forgotten or not reported some fertility events. Evidence that this may have happened is the finding that the reported number of pregnancies increased steadily every decade of age from the 20s through the 50s, then decreased into the 60s and further decreased into the 70s. The average age of the women in the subgroup analyzed was 38 ± 10.3 years (±SD) (n = 691), and they had had an average of 4.5 ± 2.7 births with a range from 0 to 15 (n = 687) and an average of 4.1 ± 2.3 living children with a range from 0 to 13 (n = 666).

Data Analyses. Data analyses entailed (i) pedigree segregation analysis of oxygen saturation data to obtain oxygen saturation genotypic probability estimators (GPEs) for each person in the pedigrees, and (ii) use of the method of Hasstedt and Moll (5) to determine from these genotypic probability estimators the effect of the inferred oxygen saturation locus on measures of female fertility and offspring survival.

Data Preparation. A total of 52 families provided information on 9,589 individuals, 2,883 of whom had oxygen saturation measurements. To analyze the data with existing software, loops caused by one man having children by two or more sisters or by one woman having children by two or more brothers were broken, and individuals who were second degree relatives or less related were assumed to be unrelated when few of the individuals through whom they were related had phenotypic information. As a result, two large families consisting of 6,108 and 2,906 individuals, respectively, were broken into 278 families. Overall, with <5% removal or duplication of individuals, this resulted in 328 pedigrees containing 2,961 individuals with oxygen saturation information, as indicated in Table 1.

Table 1.
Description of pedigree data from 14 villages in the Tibet Autonomous Region

Data Adjustment. Before performing segregation analyses, oxygen saturation levels were examined for the effects of age, sex, smoking status, village altitude, and the reported symptoms of respiratory disease, as well as their first-order interactions, using a stepwise multiple linear regression procedure. Symptoms of respiratory disease were evaluated by the answers to six questions about the presence or absence of cough or phlegm in the morning, throughout the day, and throughout the year. Missing data for altitude on 12 people were replaced by the average of the variable (3,945 m). Missing data for the six pulmonary disease symptoms were replaced by 0 (no reported symptoms). A total of 475 (16%) did not respond to the first question about cough, and 589 (20%) did not respond to the first question about phlegm, yet they responded to the subsequent questions. This apparently results from the procedure of collecting data from entire households. Individuals would have heard the questions asked of other household members; when it was their turn they responded with assertions of good health without symptoms or they reported symptoms of a specific duration because they already knew the options. Seven statistically significant terms (at the 5% level) were retained (F = 27.61, 7 df, P < 0.0001, R² = 0.062): sex, smoker, altitude, sex × cough, smoker × cough, age × phlegm production, and altitude × phlegm production. This model includes interactions between main effects that are not in the model (age, cough, and phlegm production) and these main effects were added into the regression model. In addition, a new binary variable (no reported symptoms of respiratory disease or reported symptoms) was created to determine whether the two groups (individuals without symptoms of respiratory disease or individuals with symptoms) were different, and this variable was also included in the regression model (F = 17.95, 11 df, P < 0.0001, R² = 0.063). 
The group effect was not significant. We also compared the empirical cumulative distributions of saturation levels in the two groups, with and without symptoms, after adjusting for the other variables listed above and found them to be virtually identical. The regression equation in Table 2 was used to adjust the oxygen saturation levels for all subsequent analyses.
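The two missing-data rules described above (mean substitution for altitude, zero substitution for symptom answers) can be sketched in a few lines. This is only an illustration: the helper names and the toy records are hypothetical and are not the study data or the authors' code.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_zero(values):
    """Replace None entries with 0, coded here as 'no reported symptom'."""
    return [0 if v is None else v for v in values]


# Hypothetical toy records (assumption, not the study data).
altitude_m = [3800, 4200, None, 4000]
morning_cough = [1, None, 0, None]

altitude_filled = impute_mean(altitude_m)
cough_filled = impute_zero(morning_cough)
```

In this toy input the missing altitude is filled with 4,000 m, the mean of the three observed values, and the two unanswered cough items become 0.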

Table 2.
Regression coefficients of the final model used to adjust oxygen saturation levels (n = 2,961)

Segregation Analysis. To investigate the evidence for a major gene influencing phenotypic variation in oxygen saturation levels, pedigree segregation analysis (6) of adjusted oxygen saturation levels was performed on the entire data set by using a class D regressive model (7). The major gene effects are assumed to result from segregation at a single locus having two alleles, A and B. In this analysis, the A allele is associated with lower saturation levels. The parameters in the most general model include a gene frequency (q_A), a transmission probability for each genotype transmitting A (τ_AA, τ_AB, τ_BB), three genotype means (μ_AA, μ_AB, μ_BB), a common variance σ², residual familial correlations for father–mother pairs (ρ_FM), parent–offspring pairs (ρ_PO) and sibling pairs (ρ_SS), and a power transformation (λ_1). For the single locus models that we test, the following constraints were considered: (i) high values recessive (μ_AA = μ_AB), (ii) high values dominant (μ_AB = μ_BB), (iii) additive [μ_AB = (μ_AA + μ_BB)/2], and (iv) codominant (μ_AA, μ_AB, and μ_BB arbitrary). A model with no transmission of a major gene fixes the transmission parameters equal to the A allele frequency (τ_AA = τ_AB = τ_BB = q_A). A mixed multifactorial/Mendelian model with residual familial correlations fixes the transmission parameters to Mendelian expectations (τ_AA = 1, τ_AB = 0.5, τ_BB = 0). For this segregation analysis, the SEGREG (version 4.1) program of the Statistical Analysis for Genetic Epidemiology (S.A.G.E.) package (8) was used.
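The role of the transmission probabilities can be illustrated with a short sketch. It computes the offspring genotype distribution for one mating, given the probability that each parental genotype transmits allele A, and contrasts the Mendelian constraint with the no-transmission constraint. The function names and the value of the A allele frequency are hypothetical choices for illustration; this is not the SEGREG implementation.

```python
def offspring_distribution(father, mother, tau):
    """Return P(child genotype) for AA, AB, BB given parental genotypes.

    tau[g] is the probability that a parent of genotype g transmits allele A.
    """
    a_f, a_m = tau[father], tau[mother]
    return {
        "AA": a_f * a_m,
        "AB": a_f * (1 - a_m) + (1 - a_f) * a_m,
        "BB": (1 - a_f) * (1 - a_m),
    }


# Mendelian expectations for transmitting A.
MENDELIAN = {"AA": 1.0, "AB": 0.5, "BB": 0.0}


def no_transmission(q_a):
    """Every genotype 'transmits' A with probability equal to its frequency."""
    return {"AA": q_a, "AB": q_a, "BB": q_a}


q_a = 0.3  # hypothetical A allele frequency, for illustration only

mendel_child = offspring_distribution("AB", "AB", MENDELIAN)
env_child_1 = offspring_distribution("AA", "AA", no_transmission(q_a))
env_child_2 = offspring_distribution("BB", "BB", no_transmission(q_a))
```

Under the Mendelian constraint an AB × AB mating gives the familiar 1/4, 1/2, 1/4 split. Under the no-transmission constraint the child's distribution is the same whatever the parents' genotypes, which is why that model serves as the environmental alternative.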

Likelihood-ratio tests were used to select the most parsimonious model. The test statistic is minus twice the difference in the log likelihood (lnL) for two models, one nested in the other; under certain regularity conditions, this statistic is, in large samples, distributed as χ², with the number of degrees of freedom (df) being equal to the difference in the number of parameters estimated between the two models.

Table 3 gives the results for (i) a general model with arbitrary transmission probabilities; (ii) an environmental model with no major gene transmission; and (iii) a mixed Mendelian model. The environmental model was rejected, and the departure from the mixed multifactorial/Mendelian model (B dominant in Table 3) was not significant (χ² = 7.4, 3 df, P = 0.060) when compared with the general model. Finally, we determined whether a codominant mode of inheritance fits better than a dominant mode of inheritance (Table 4). The dominant model for high values was not rejected (χ² = 0.9, 1 df, P = 0.343). Therefore, the most parsimonious model for oxygen saturation level after adjusting for covariates is Mendelian dominant for high values with an allele frequency of 0.78, together with residual familial correlations. As a check, all of the covariates used to adjust the data were introduced into the final model to verify that none were still significant.
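The two P values quoted above can be checked from the reported test statistics and degrees of freedom. The sketch below uses only the Python standard library and a closed-form expression for the upper tail of a chi-square distribution with integer degrees of freedom. The function names are illustrative, and the log likelihoods themselves are not reproduced here because they appear only in Tables 3 and 4.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms.
        terms = (half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * sum(terms)
    # Odd df: complementary error function plus a finite half-integer sum.
    tail = math.erfc(math.sqrt(half))
    extra = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5)
        for k in range(1, (df - 1) // 2 + 1)
    )
    return tail + math.exp(-half) * extra


def lrt_statistic(lnl_nested, lnl_general):
    """Minus twice the log-likelihood difference (nested minus general)."""
    return -2.0 * (lnl_nested - lnl_general)


# Test statistics and df as reported in the text.
p_mendelian = chi2_sf(7.4, 3)   # Mendelian vs. general model
p_dominant = chi2_sf(0.9, 1)    # dominant vs. codominant model
```

Rounded to three decimals these evaluate to 0.060 and 0.343, in agreement with the reported values.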

Table 3.
Parameter estimates from segregation analysis of oxygen saturation levels
Table 4.
Major locus mode of inheritance of oxygen saturation levels

Table 5 compares the major locus segregation results of this analysis with those of previously published analyses using a model that assumed all multifactorial contributions could be accounted for by polygenic inheritance. All three analyses detected a major locus with an autosomal dominant mode of inheritance for high oxygen saturation levels. The high genotypic mean of the present study is about the same as that for a different group of villages at a similar altitude (some of the villages in the 1997 study were included in the present study). The oxygen saturation difference between high and low genotypic means was nearly 10% in the present study, as compared to ≈6% in the two earlier ones. The reason for this may be that, in addition to the difference in the samples analyzed, the present model included residual familial correlations without assuming that they are due to polygenic inheritance and also allowed for simultaneous estimation of a Box–Cox power transformation parameter (9) in each analysis to better approximate normality conditional on genotype.
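For readers unfamiliar with the Box–Cox power transformation (9), a minimal sketch of its standard one-parameter form follows. The exact parameterization used inside the segregation software (for example any shift or scaling term) is not given in the text, so the form below is an assumption based on the original Box and Cox definition.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y.

    Returns log(y) when lam is 0 and (y**lam - 1) / lam otherwise.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires a positive value")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical saturation value (percent) under three power parameters.
identity_like = box_cox(90.0, 1.0)   # linear shift only
log_like = box_cox(90.0, 0.0)        # natural logarithm
near_log = box_cox(90.0, 1e-6)       # approaches the log as lam -> 0
```

Estimating the power parameter together with the genetic parameters lets the data choose the scale on which the within-genotype distributions are closest to normal.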

Table 5.
Comparison of maximum likelihood estimates of the major gene parameters for oxygen saturation obtained in the present study with those previously published

Calculation of Individual Genotypic Probability Estimates. In calculating the likelihood for all of the pedigree data, there is summation over the three possible genotypes (AA, AB, and BB) for each individual in the entire data set. The probabilities that a particular individual has each of these genotypes, the genotypic probability estimates, are obtained by dividing each of the three terms in the likelihood corresponding to that individual by the total likelihood (5). To do this, all unknown parameters in the likelihood are replaced by their maximum likelihood estimates as given in the penultimate columns of Tables 3 and 4.
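The idea of dividing each genotype-specific likelihood term by the total can be shown for the simplest possible case, a single individual with no relatives. This is a deliberate simplification: the study computes these probabilities from the full pedigree likelihood, which also uses the phenotypes of relatives and the residual familial correlations. The Hardy–Weinberg prior, the normal phenotype model, and every numeric value below are hypothetical assumptions chosen for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def genotype_probabilities(y, q_a, means, sigma):
    """Posterior P(genotype | phenotype y) for one unrelated individual."""
    prior = {
        "AA": q_a ** 2,
        "AB": 2 * q_a * (1 - q_a),
        "BB": (1 - q_a) ** 2,
    }
    # One likelihood term per genotype, then divide by their sum.
    terms = {g: prior[g] * normal_pdf(y, means[g], sigma) for g in prior}
    total = sum(terms.values())
    return {g: t / total for g, t in terms.items()}


# Hypothetical parameters (not the estimates in Tables 3 and 4).
Q_A = 0.5
MEANS = {"AA": 80.0, "AB": 88.0, "BB": 88.0}  # B dominant for high values
SIGMA = 4.0

low = genotype_probabilities(78.0, Q_A, MEANS, SIGMA)
high = genotype_probabilities(92.0, Q_A, MEANS, SIGMA)
```

With these toy values a low reading is assigned mostly to AA and a high reading mostly to AB or BB, but neither is assigned with certainty, which is why probabilities rather than hard genotype calls are carried into the fertility analysis.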

Calculation of Genotypic Mean Estimates of Fertility Traits and Tests of Significance Using Multiple Regression Analyses and Permutation Tests. First, multiple linear regression analyses were used to test the effects of a basic set of covariates, unrelated to genotype, that plausibly contribute to variation in each of the fertility and related traits for the subgroup of women 20–59 years of age still married to their first husbands. These covariates were age, household economic status, marital type (e.g., monogamy, polygyny), age at first and most recent pregnancy, age at first and most recent birth, altitude of village of residence, and current use of family planning. For each trait, the covariates significant at the 5% level were retained in a second multiple regression analysis, to which the oxygen saturation genotype probabilities were added as further covariates, testing whether the corresponding putative locus significantly contributes to variation in that trait. Because the three genotype probability estimates for an individual sum to one, this regression analysis was performed without a separate intercept term in the regression model (5). All of the other covariates were centered, with the result that, assuming that the regression model is correct, the coefficients of the three probabilities are the minimum variance unbiased genotypic mean estimators, for each of the oxygen saturation genotypes, of the fertility trait being analyzed. Because of nonnormality of the residuals in the multiple regression analyses, permutation tests were also performed to test for a significant difference between these means. For this purpose, the genotype probabilities (AA versus AB/BB, because there was never a significant difference between AB and BB) were shuffled across persons 2,000 times; for each such permutation replicate, the corresponding t statistic was calculated. 
The permutation P value is then given by the proportion of these 2,000 (absolute value) statistics that was larger than that for the observed data. If the true P is 0.05, then the confidence interval (95% CI) obtained on the basis of 2,000 permutations is less than ±0.01.
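The permutation procedure and the stated precision of 2,000 replicates can be sketched as follows. The sketch shuffles a single genotype probability across persons and uses the absolute t statistic of a simple regression slope. The covariates retained in the study's regressions are omitted, and the toy data, seeds, and function names are hypothetical.

```python
import math
import random


def abs_t_statistic(x, y):
    """Absolute t statistic for the slope of y on x (simple regression)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    if 1.0 - r * r <= 0.0:
        return math.inf
    return abs(r) * math.sqrt((n - 2) / (1.0 - r * r))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffled |t| values that exceed the observed |t|."""
    rng = random.Random(seed)
    observed = abs_t_statistic(x, y)
    shuffled = list(x)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs_t_statistic(shuffled, y) > observed:
            larger += 1
    return larger / n_perm


def monte_carlo_half_width(p, n_perm, z=1.96):
    """Half-width of the 95% CI for a permutation P value near p."""
    return z * math.sqrt(p * (1.0 - p) / n_perm)


# Hypothetical toy data: P(AA) per woman and a trait that rises with it.
data_rng = random.Random(1)
p_aa = [data_rng.random() for _ in range(60)]
deaths = [2.0 * p + data_rng.gauss(0.0, 0.5) for p in p_aa]

p_value = permutation_p_value(p_aa, deaths)
half_width = monte_carlo_half_width(0.05, 2000)
```

The half-width evaluates to about 0.0096, consistent with the statement that the interval is less than ±0.01 when the true P is 0.05.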

Results

The likelihood of becoming pregnant and delivering a live birth was similar for all three genotypes (Table 6). The number of pregnancies and live births did not differ among the genotypes and varied by just 0.31 pregnancies or live births (Table 6). In contrast, the number of children now alive varied widely from the high-saturation genotypic mean of 3.79 to the low-saturation genotypic mean of just 1.64, implying that more offspring of low-saturation genotype women had died. The number of children who had died varied from a genotypic mean of 0.48 for the high-saturation genotypes to 2.53 for the low-saturation genotype. The excess mortality occurred during infancy. The high-saturation genotypic mean of just 0.32 infant deaths was markedly lower than the low-saturation genotypic mean of 1.69. There were no significant differences in child (1–15 years of age) mortality. Key to the evolutionary perspective is offspring mortality before reproductive age, conventionally defined as 15 years of age among humans. The genotypic mean prereproductive mortality was 8% of live births for the high-saturation genotypes compared with 43% of live births for the low-saturation homozygotes. The net result was more living children for the high-saturation genotypes (3.79 compared with 1.64 offspring).

Table 6.
Estimates of fertility measures for three oxygen saturation genotypes

Discussion

The present study in a large sample from three counties in the Tibet Autonomous Region confirmed previous findings of a major gene for oxygen saturation (1, 2). It extended those observations by addressing consequences of the genetic variation on measures of Darwinian fitness.

The main finding of this study of Tibetan women residing at a mean altitude of ≈4,000 m was significantly lower offspring, primarily infant, mortality and more living children among those with a higher probability of having one or two alleles for high oxygen saturation. There was no evidence of a compensatory increase in fertility of the low oxygen saturation genotype that might offset the higher mortality and result in the same number of surviving offspring. If this pattern were to continue over time, then the frequency of the allele for high oxygen saturation would increase. These data do not distinguish whether the higher infant survival of the high oxygen saturation genotype mothers results from characteristics of the newborn, such as birth weight, that are substantially influenced by maternal characteristics (10) or the infant's ability to withstand postnatal stress that might be more influenced by his or her own genotype.
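The expectation that the high-saturation allele would increase in frequency can be illustrated with the textbook one-locus selection recursion. This is a toy model, not an analysis performed in the study. It assumes random mating, discrete generations, and complete dominance of the B allele, and it treats the ratio of genotypic mean living children (1.64 to 3.79) as the relative fitness of the AA genotype. The starting frequency of 0.5 is arbitrary.

```python
def next_frequency(p_b, w_aa, w_ab=1.0, w_bb=1.0):
    """Frequency of allele B after one generation of selection."""
    q_a = 1.0 - p_b
    mean_w = p_b ** 2 * w_bb + 2 * p_b * q_a * w_ab + q_a ** 2 * w_aa
    return p_b * (p_b * w_bb + q_a * w_ab) / mean_w


def trajectory(p0, w_aa, generations):
    """Allele B frequencies from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], w_aa))
    return freqs


# Illustrative relative fitness of AA (assumption): ratio of the
# genotypic mean numbers of living children quoted in the Results.
W_AA = 1.64 / 3.79

path = trajectory(0.5, W_AA, 10)
```

Under these assumptions the B allele frequency rises every generation, quickly at first and more slowly as the AA homozygote becomes rare.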

A limitation to interpreting the results is that oxygen saturation genotypes were not directly observed because the locus is unknown. Oxygen saturation phenotypes exhibited a pattern of segregation consistent with an autosomal dominant mode of inheritance for high values. The mean genotypic difference in oxygen saturation was ≈10%. However, because the standard deviation was also ≈10%, it was not possible to unequivocally assign individuals to genotypic categories. We used data on oxygen saturation phenotype and family relationships to estimate the probability that an individual had each of the three possible genotypes. Next, we used an individual's genotypic probability estimators as a factor influencing fertility. According to Hasstedt and Moll, “Intuitively, the genotypic probability estimators (GPEs) partially assign an individual to a given genotype. Thus, an individual with genotypic probabilities of 0.85, 0.14 and 0.01 for genotypes dd, Dd and DD would contribute 85% of an observation to genotype dd, 14% to genotype Dd, and 1% to genotype DD” (ref. 5, page 321). That is, the reported genotypic means are composites of contributions from all of the women in the sample.
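The partial-assignment idea can be made concrete with a small least-squares sketch. Regressing a trait on the three genotype probabilities without an intercept yields one coefficient per genotype, and those coefficients are the genotypic mean estimates. The centered covariates used in the study are omitted here, the probabilities and the assumed genotypic means are hypothetical, and the linear solver is a plain Gaussian elimination written for self-containment.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def genotypic_means(gpes, trait):
    """No-intercept least squares of trait on genotype probabilities."""
    k = len(gpes[0])
    xtx = [[sum(row[i] * row[j] for row in gpes) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(gpes, trait)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical genotype probabilities (each row sums to one).
gpes = [
    (0.85, 0.14, 0.01),
    (0.10, 0.60, 0.30),
    (0.05, 0.25, 0.70),
    (0.40, 0.40, 0.20),
    (0.02, 0.18, 0.80),
]
assumed_means = (2.5, 0.5, 0.4)  # hypothetical genotypic means of a trait
trait = [sum(p * mu for p, mu in zip(row, assumed_means)) for row in gpes]

estimates = genotypic_means(gpes, trait)
```

Because the toy trait values are exact mixtures of the assumed means, the regression recovers those means. With real data each coefficient is an estimate to which every woman contributes in proportion to her genotype probabilities.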

The effects of some potential confounding factors that might bias the findings in favor of detecting genotypic mean differences and inferring natural selection can be excluded. There was no independent means of confirming paternity as reported in the pedigrees; however, misclassification would probably bias against detecting a major gene. Although the oxygen saturation locus is unknown, there is strong indirect evidence that oxygen saturation level is heritable. This study and two other reports of samples across a broad geographic and altitude range of the Tibet Autonomous Region, collected in different years, yield consistent estimates of heritability and genotypic mean differences in oxygen saturation. It is very unlikely that the selection of study participants introduced a bias into the results because virtually all women in 14 villages in three geographically separated counties of the Tibet Autonomous Region responded to the fertility survey and nearly all households contributed phenotypic and pedigree data. Selective survivorship of women with a high likelihood of being low saturation homozygotes is possible. Those women survived a risky infancy and may be a biased subsample of that genotype. However, the survivors would probably be biased toward better function than their deceased counterparts, and that bias would act against the hypothesis of genotypic differences in fertility. The possibility of developmental effects on oxygen saturation or fertility was taken into account by including women across the broad age range of 20–59 years and by the multiple regression approach that included age as a covariate.

In conclusion, the present study found no evidence of genotypic mean differences in fecundity as measured by reported numbers of pregnancies and live births. It found evidence that high saturation genotypes had lower offspring mortality and more surviving offspring. Differential survival of offspring among genotypes is a defining feature of natural selection. The results link variation in a heritable trait, oxygen saturation of hemoglobin, with differences in offspring mortality in a large sample of high-altitude native resident Tibetan women. Differential offspring survivorship among maternal genotypes will result in the transmission of relatively more high saturation alleles to the next generation. These findings suggest that high-altitude hypoxia is acting as an agent of natural selection on the heritable trait of oxygen saturation of hemoglobin by the mechanism of higher infant survival of Tibetan women with higher oxygen saturation genotypes.

Acknowledgments

The fieldwork was conducted in affiliation with the Tibet Academy of Social Sciences, Lhasa. We thank Ms. Beimatsho, Mr. Kesang Yishi and Dr. Ben Jiao of the Tibet Academy of Social Sciences, who accomplished the enormous task of collecting the measurements of oxygen saturation and interviews in the 905 households, for their hard work. We thank Heather Lindstrom for overseeing the painstaking work of linking and coding the genealogies. We thank the officials and the study participants for their generous cooperation and hospitality. This work was supported by the Henry R. Luce Foundation (to M.C.G.), National Science Foundation Award SBR 9706980 (to C.M.B.), U.S. Public Health Service Resource Grant RR03655 (to R.C.E.), the National Center for Research Resources (to R.C.E.), and National Institute of General Medical Sciences Grant GM28356 (to R.C.E.).

References

1. Beall, C. M., Blangero, J., Williams-Blangero, S. & Goldstein, M. C. (1994) Am. J. Phys. Anthropol. 95, 271–276.
2. Beall, C. M., Strohl, K. P., Blangero, J., Williams-Blangero, S., Brittenham, G. M. & Goldstein, M. C. (1997) Hum. Biol. 69, 597–604.
3. Goldstein, M. C., Jiao, B., Beall, C. M. & Tsering, P. (2002) China J. 47, 19–39.
4. Beall, C. M. (2000) High Altitude Med. Biol. 1, 25–32.
5. Hasstedt, S. J. & Moll, P. P. (1989) Genet. Epidemiol. 6, 319–332.
6. Elston, R. C. & Stewart, J. (1971) Hum. Heredity 21, 523–542.
7. Bonney, G. E. (1984) Am. J. Med. Genet. 18, 731–749.
8. Statistical Solutions (2003) S.A.G.E.: Statistical Analysis for Genetic Epidemiology (Statistical Solutions, Cork, Ireland).
9. Box, G. E. P. & Cox, D. R. (1964) J. Royal Stat. Soc. B 26, 211–252.
10. Moore, L. G., Zamudio, S., Zhuang, J., Sun, S. & Droma, T. (2001) Am. J. Phys. Anthropol. 114, 42–53.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences