
National Research Council (US) Committee on Population; Wachter KW, Finch CE, editors. Between Zeus and the Salmon: The Biodemography of Longevity. Washington (DC): National Academies Press (US); 1997.

12 The Potential of Population Surveys for Genetic Studies

Robert B. Wallace


Knowledge of the genetic causes of health conditions and age-related physiologic changes is growing rapidly. Much of the lore of genetics and health, in addition to basic genetic science and molecular biology, comes from the study of informative families and patient groups and, to some extent, from specifically designed population studies. Many populations have been surveyed in recent years to address general health issues, and many more are being surveyed for other important reasons, such as for testing social, economic, or political hypotheses. Among the sponsors of extensive population studies of many types, emphasizing older persons and their health and social needs, is the U.S. National Institute on Aging (NIA), part of the National Institutes of Health. With the emerging technology for conducting genetic studies, it is time to ask whether ways can be found to exploit these major population surveys to better understand the genetics of conditions important to public health. The methodologic challenges for marrying large population surveys to genetic hypotheses are complex and not easily solved, in part because each survey was thoroughly rationalized, scrutinized, and funded to address a set of important nongenetic scientific questions relating to general health, social behavior, and economics. However, given the substantial costs of these population surveys and the restricted availability of research funds, it is essential to at least explore possible intersections of genetic inquiry with existing and planned field studies. The purpose of this paper is to (1) catalog many of the important geographic surveys being supported and/or archived by NIA, (2) describe selected, potential applications of these surveys for genetic study, (3) address the various modes of specimen collection applicable in population surveys, and (4) suggest a research agenda to realize these potential methodologic enhancements.

Summary Of Recent NIA Population-Based Surveys

Many population surveys are conducted in the United States and elsewhere; there is no clear way to identify all of them. Many of the NIA-sponsored surveys are extensive in scope and themes, and several are conducted outside of the United States. Information on a selection of these surveys, the basic characteristics of which are described in Table 12-1, was taken from the NIA document "Databases on Aging," a summary of surveys relevant to the demography, economics, and epidemiology of aging, published in February 1996.¹ The surveys noted in Table 12-1 are not an exhaustive list of those available, nor does the table cite many of the survey data sets available in archive form for analysis. In some instances, the tabular information is simplified because of the complex, multiple sampling frames and the varied target populations and different survey intervals. On occasion, survey design and operational information were incomplete.

TABLE 12-1. Summary of Selected NIA-sponsored Population Surveys.

In summary, the survey study designs reveal the following: (1) the surveys vary dramatically in health-related content; many were intended largely to study behavioral, social, and economic issues; (2) most of the surveys are recent but inactive, and it is unclear whether participants could be located or recontacted to obtain additional information; (3) many of the surveys contain information on at least some family members, but sometimes this is limited to spouse pairs and the extent of documenting either nuclear or extended pedigrees is often limited or uncertain; (4) collection of bodily specimens—either blood or other tissues or fluids—is rare. In the few instances where specimens were collected, this was limited mostly to U.S. national samples and subsamples conducted by the U.S. National Center for Health Statistics; (5) follow-up rates for the longitudinal panels were generally quite good, including mortality follow-up when part of the protocol; and (6) the original investigators would almost always need to be contacted to explore further participant contact and any possibility of specimen collection, including the determination of ethical and administrative procedures. In general, this suggests that retrospective use of these surveys, particularly the inactive ones, would require additional resources and energy to suit them for genetic study, but nonetheless, a reasonable potential remains for exploitation of, at least, ongoing or planned surveys.

Potential Applications Of Population Surveys For Genetic Study

It is beyond the scope of this report to review advances in basic and clinical genetics and the relation of genetic structure and function to disease occurrence and outcome. For those involved in social surveys who are not schooled in genetics, a very brief discussion emphasizing the complexity of the situation may be of value. The human genome is carried on 23 pairs of chromosomes in the nucleus of each human cell; the pairs are matched except for the one pair of sex chromosomes. It contains about 100,000 genes, discrete functional and structural sites that interact with the internal cellular and external environment to direct basic cell growth, activity, and death and to transfer this information to the next generation. Each matched gene may vary somewhat from its mate and from the respective genes at the same site in other individuals. These structural variants, called alleles, may function somewhat differently from each other. The specific genetic makeup of an individual is called the genotype. The process of change in the structure of a gene, often accompanied by changes in function, is called mutation; mutation may occur spontaneously or be accelerated by external environmental forces. Mutations may be harmful or helpful to an organism or be biologically neutral. Not all human genes have yet been identified as to structure and function, but work is progressing rapidly.

Determination of the structure and function of genes and the relation of altered gene structure to disease occurrence is made more complex by several recent observations: (1) some genes are not necessarily in one physical location on a chromosome; (2) to the extent that important chronic illnesses are gene-related, there are probably multiple genes involved; (3) the mechanisms of genetic regulation and how environmental factors alter that function are incompletely understood; and (4) some genetic material (DNA) is located outside the nuclear chromosomes in the cytoplasm of the cell and is probably of maternal origin only. Thus, the search for gene-disease associations is clearly complex and difficult, although extremely important. However, as noted below, there are other potential genetically related applications of population studies.

To find potential applications of existing population studies for addressing genetic hypotheses, it is instructive to indicate some general categories of applications, temporarily leaving aside study methods and logistical issues. These categories are possible through the rapidly expanding ability to identify and characterize many genes within individuals and large population samples. However, as in all other fields of measurement, quality control in the laboratory determination of various alleles is essential, as substantial error can occur in laboratory procedures.

Given the emerging capacity to determine alleles in population samples, the following is a selected list of general genetic research applications in population surveys, recognizing that specific studies have many scientific and methodologic contingencies:

Determination of Genotype Frequencies in Well-Defined Populations

A general survey application is to determine the distribution of various genes and alleles in defined populations. While it is an empirical question whether well-constructed and executed population samples will reveal estimates of genotype (allele) frequencies markedly different from more customary sources, such as volunteer populations, clinical populations, blood donors, and newborn screening samples, this use is probably one of the best applications for preexisting and planned general population samples. In addition to their specific sample representativeness, the NIA populations may be attractive because of their broad national coverage, multinational representation, and in some instances access to special populations such as the institutionalized elderly or the oldest old. This access would be particularly valuable as genes are discovered that are associated with late-onset diseases, given the age distribution of many NIA-sponsored survey participants. Several potential specific applications are presented, with examples from the recent scientific literature:

  1. Identification of the age-specific prevalence rates for various alleles to explore hypotheses that these alleles are associated with longevity, at least in cross-sectional designs.
  2. More precise estimation of rare gene/allele population prevalence, because many populations have thousands of participants. This would generally increase the ability to study the biologic behavior of putative, but rarely occurring, gene-disease associations. For an example in population genetic modeling, see Joyce and Tavare (1995). Another example is the population prevalence of the genetic variants of phenylketonuria, a condition that has a frequency of about 1:10,000 (Eisensmith and Woo, 1994).
  3. Determination of allele frequency in multiple ethnic and national groups; this allows assessment of the genetic relatedness of such groups, as well as a comparison of gene frequencies with disease-occurrence rates (Gill and Evett, 1995).
  4. Calculation of population inbreeding coefficients in selected populations to explore the emergence of recessive traits. If genetic determinations are done on available families ascertained from population surveys, it is possible to quantify the degree of population inbreeding, which positively correlates with the emergence rate of hidden (recessive) genetic conditions in that population. This value is sometimes a very important datum in understanding population disease rates (Gill and Evett, 1995).
  5. Assessment of genetic relatedness of migrant and native ethnic, tribal, or national populations. Genetic tools may be useful for tracking the origins of population migration in prehistoric and early historic times (Kalnin et al., 1995).
  6. Provision of high-quality population-referent genetic-marker data for forensic applications. The use of geographically defined populations may offer greater precision in estimating genetic-marker prevalence, which has many medico-legal uses (van Oorschot et al., 1994).
  7. Geographic searches for original or "founding" populations for various genetic diseases. Sometimes it is useful in understanding the population distribution of important genetic diseases to determine the historical source of the original mutations. This can often be difficult, but variation in allele frequencies among different ethnic groups in a population can offer useful clues, as in familial hypercholesterolemia, which is caused by a major gene (Rubinsztein et al., 1994).
  8. Exploration for certain genes or alleles that may explain geographic differences in individual response to certain medications. For example, a particular allele that alters metabolism of a common class of antihypertensive drugs may be much more common in Chinese than in Caucasian populations (Lee, 1994). In an analogous manner, gene-directed alterations in the metabolism of toxic environmental chemicals may be used to explain population differences in disease risk associated with those exposures. Another example is the possibility of determining population differences in the risk of adverse reactions to blood transfusion based on the genetic characteristics of donor and recipient populations (Shivdasani and Anderson, 1994).
  9. Determination of the risk of specific diseases in individuals. A major hope has been to use population studies to explore the role of various genetic determinations in disease risk for individuals. Earlier attempts at using measurable phenotypes (physical manifestations of gene function such as blood type or eye color) as risk factors for disease occurrence were only modestly successful. Even now, with a host of gene-measurement techniques, determining gene-disease associations in prospective population studies is probably inefficient, although perhaps useful for selected investigations. Most gene-disease associations are sought by other means, such as twin studies, segregation analysis of pedigrees, and case-control studies. Defined populations could also be used to verify gene-disease associations, discovered elsewhere, in a population context and to determine the sensitivity and specificity of particular alleles for predicting disease occurrence. One example is the recent association of the apolipoprotein E alleles with variation in the risk of dementia or cognitive change, which was seen in a cohort study (Hyman et al., in press). Other examples of genetic applications for determining disease risk in populations are the suggestion that there may be genes determining susceptibility to tuberculosis infection (Skamene, 1994) and the emerging demonstration of genetic forces in cardiovascular-disease risk factors, such as obesity and hypertension (Schork et al., 1994). Knowledge of population gene frequency for known disease-causing alleles is also quite useful for planning and executing population-based genetic screening programs, which are becoming more common as disease genes are discovered (Shickle and Harvey, 1993). However, regardless of whether genetic markers predict disease onset, they have other emerging uses, such as predicting the natural history and outcome of diseases. For example, certain alleles predict whether young diabetic patients will acquire a certain severe form of retinal disease (Cruickshanks et al., 1992).
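
Several of the applications listed above (items 1, 2, and 4) reduce to simple counting once genotypes are in hand. The following Python sketch is illustrative only and is not drawn from any of the surveys discussed: the function names and the data layout (each genotype as a pair of allele labels) are assumptions, and the inbreeding estimate uses the common single-locus heterozygote-deficit form F = 1 - H_obs / H_exp, which presumes Hardy-Weinberg proportions as the reference.

```python
def allele_frequencies(genotype_counts):
    """Estimate allele frequencies at one locus by gene counting.

    genotype_counts maps (allele1, allele2) tuples to the number of
    participants with that genotype (an assumed, illustrative layout).
    """
    allele_counts = {}
    for (a1, a2), n in genotype_counts.items():
        allele_counts[a1] = allele_counts.get(a1, 0) + n
        allele_counts[a2] = allele_counts.get(a2, 0) + n
    total = sum(allele_counts.values())  # two alleles per participant
    return {allele: count / total for allele, count in allele_counts.items()}


def inbreeding_coefficient(genotype_counts):
    """Single-locus estimate F = 1 - H_obs / H_exp.

    H_obs is the observed share of heterozygotes; H_exp is the share
    expected under Hardy-Weinberg proportions (1 minus the sum of p^2).
    """
    n = sum(genotype_counts.values())
    freqs = allele_frequencies(genotype_counts)
    h_obs = sum(c for (a1, a2), c in genotype_counts.items() if a1 != a2) / n
    h_exp = 1.0 - sum(p * p for p in freqs.values())
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    return 1.0 - h_obs / h_exp


def age_specific_frequency(records, allele, band_width=10):
    """Frequency of one allele within age bands (cross-sectional data).

    records is an iterable of (age_in_years, (allele1, allele2)) pairs;
    the result is keyed by the lower bound of each age band.
    """
    counts = {}
    for age, genotype in records:
        band = (age // band_width) * band_width
        carried, total = counts.get(band, (0, 0))
        counts[band] = (carried + genotype.count(allele), total + 2)
    return {band: carried / total
            for band, (carried, total) in sorted(counts.items())}
```

A positive F indicates fewer heterozygotes than expected, which is consistent with inbreeding but can also reflect population substructure or genotyping error, so the value should be read alongside the laboratory quality-control concerns raised earlier.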

Many other applications exist, based on the population determination of genetic markers, but exploiting these opportunities requires dialogue and interdisciplinary cooperation to identify and answer important scientific questions.
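
Item 9 above mentions determining the sensitivity and specificity of particular alleles for predicting disease occurrence. As a minimal sketch of that calculation, assuming a simple two-by-two classification of survey participants by carrier status and disease status (the function name and argument names are hypothetical), the quantities could be computed as follows:

```python
def allele_screening_performance(carrier_diseased, carrier_healthy,
                                 noncarrier_diseased, noncarrier_healthy):
    """Treat carrying an allele as a 'test' for disease in a 2x2 table.

    All four arguments are participant counts from a defined population.
    """
    diseased = carrier_diseased + noncarrier_diseased
    healthy = carrier_healthy + noncarrier_healthy
    carriers = carrier_diseased + carrier_healthy
    return {
        # share of diseased participants who carry the allele
        "sensitivity": carrier_diseased / diseased,
        # share of healthy participants who do not carry the allele
        "specificity": noncarrier_healthy / healthy,
        # share of carriers who have the disease (positive predictive value)
        "ppv": carrier_diseased / carriers,
    }
```

Because the positive predictive value depends on how common the disease is in the population sampled, estimates from a representative survey can differ markedly from those obtained in clinical or volunteer series, which is one argument for verifying associations in a population context.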

Unbiased Sampling of Families or Pedigrees to Ascertain Gene-Disease Associations

Currently, most gene-disease associations are explored in families. The basic logic is to ascertain whether certain genes (alleles) occur in the same members of genetically related families as does a medical condition of interest, called segregation analysis. Other general methods for studying gene-disease associations involve selected parts of family units such as siblings and cousins, or identical and fraternal twins. However, selecting these families (pedigrees) for study from clinical or volunteer populations may obscure some potential associations because of chance clustering of nongenetically related common diseases in these families, possibly leading to spurious negative findings. Sampling families from existing, defined general population surveys, particularly those with information on health and disease history, might be an effective way of unbiased pedigree sampling. The main obstacle may be that occurrence rates for most medical conditions are relatively low, even for population samples numbering in the thousands, and thus not all survey samples may be fully useful for identifying representative families with the multiple occurrence of various conditions. Panel (cohort) studies, as opposed to prevalence (cross-sectional) studies may be somewhat more valuable in this regard because over time additional cases of the study disease will occur and be monitored.
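
To make the scale of this obstacle concrete, the following back-of-the-envelope calculation uses the phenylketonuria frequency of about 1:10,000 cited earlier. The survey size of 5,000, independent sampling of participants, and Hardy-Weinberg proportions for a simple autosomal recessive trait are illustrative assumptions, not figures from any of the surveys discussed. Here p is the disease frequency, q the frequency of the disease allele, and n the number of participants.

```latex
\begin{align*}
E[\text{cases}] &= n\,p = 5000 \times 10^{-4} = 0.5 \\
P(\text{at least one case}) &= 1 - (1-p)^{n} = 1 - (0.9999)^{5000}
  \approx 1 - e^{-0.5} \approx 0.39 \\
q &= \sqrt{p} = 0.01, \qquad 2q(1-q) = 2(0.01)(0.99) = 0.0198 \\
E[\text{carriers}] &= n \cdot 2q(1-q) = 5000 \times 0.0198 = 99
\end{align*}
```

Under these assumptions, a survey of this size would more often than not contain no affected participant at all, yet it would be expected to include roughly 99 carriers. This is consistent with the point that such samples are better suited to estimating allele frequencies than to identifying families with multiple occurrences of a rare condition.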

It is also possible, although somewhat inefficient, to ascertain certain family structures, such as twins, multiple siblings, or multiple cousins, from population surveys for further study.

Using Population Surveys for Estimates of Phenotypic Expression

Health-related population surveys have been determining disease and risk-factor occurrence in populations for many years. This fact is restated to emphasize that accurate data on the occurrence of nonfatal diseases and physiologic measures are surprisingly difficult to acquire in many national and regional populations, particularly when inferences from mortality statistics are unavailable. Potentially, social and economic surveys can collect basic health data in certain situations and these data will often contribute to knowledge on population health, as well as help assess the promise of that population for genetic study.

Collecting Genetic Information In Population Surveys

Genetic data can be divided roughly into two categories. The first category is the historical information obtained at interview, including family pedigrees with their biologic relationships, and the disease experience of those families both within and across generations. Standardized techniques exist for ascertaining and recording pedigree information (Bennett et al., 1995). The second category is the bodily specimens on which the genetic studies can be run. There are several general ways to acquire such specimens:

Venopuncture. The most effective way is to obtain blood at the time of survey, if resources allow. This procedure requires specific training of interviewers and use of equipment to store and transmit blood specimens. An alternative approach is to have a smaller number of trained venopuncturists visit the survey respondents later. This protocol would be particularly helpful when a willing primary respondent can gather available family members. Another approach that has been successful is to supply the study participant with blood vials, a prepaid mailing container, and a voucher to pay a local physician or clinic to obtain, process, and transmit the specimen. A less efficient but ancillary approach is to obtain a blood specimen that was stored for some other reason.

Hair follicles. In this technique participants are asked to supply a hair specimen that includes the follicular roots. We have less experience in obtaining such samples in the survey setting, but it may be worth pursuing.

Cheek swabs. This technique is noninvasive and offers promise where venopuncture is impossible. Processing specimens is more cumbersome and expensive, and a problem exists with contamination of the specimen by oral bacteria, food particles, etc.

Surgical specimens. A common technique in molecular epidemiology, particularly in the study of cancer occurrence and prognosis, is to acquire stored tissue specimens obtained at surgery, on which many genetic markers can be determined. These specimens will, of course, vary in availability, depending in part on the time interval since the operation, but they still may be an important source of markers—one that can be accessed by mail if participant and pathologist consents are obtained. A corollary approach is to obtain stored blood or tissue specimens that were obtained at autopsy. Unfortunately, autopsy rates are low and decreasing; this option will often be unavailable.

Ethical Considerations In The Acquisition And Storage Of Genetic Specimens

Although it is beyond the scope of this report to comprehensively review the ethical considerations of obtaining genetic markers, there are growing concerns and evolving regulations about the acquisition and disposition of these markers.

It is necessary for each investigator to ensure that appropriate consent procedures are followed for specimen acquisition, banking, and future applications. Particularly sensitive issues include: (1) accessibility of personal genetic information by other parties, such as family members or insurance companies; (2) accessibility of the specimens to other scientific laboratories; (3) ownership of potential commercial uses of biologic findings from collected specimens; (4) application of specimens for scientific determinations unplanned at the time of collection; (5) fear of discovering previously unreported paternity; and (6) disclosure of high levels of disease risk discovered after the main study has ended, including investigator obligation to maintain contact with participants after completion of the study. These and related issues have been recently discussed in depth, although not always with full resolution (Clayton et al., 1995; Wagener, 1995).

Potential Enhancements To NIA-Sponsored Surveys To Improve Applications To Genetic Studies

Given the apparent potential for amalgamating population surveys with genetic study, several possible enhancements to these studies might lead to improved applicability. These suggestions are intended for further discussion and research planning:

  1. Increase the number of studies in which biologic specimens with genetic material are routinely obtained. This is, of course, both a logistic and a resource challenge. Surveys done over the telephone or over large geographic areas present substantial logistical difficulties, such as interviewer training and personal protection (the U.S. Occupational Safety and Health Administration has strict rules for persons handling blood and other biologic materials), specimen collection, and shipping. In addition, the prospect and burden of specimen collection may potentially decrease participation rates, possibly subverting survey success. However, as discussed, several alternative approaches exist to acquire specimens. Clearly, it is most efficient to consider appending genetic protocols to population surveys when the survey is being planned.
  2. Obtain informed consent, where possible, for future genetic studies in the event that they are executed. As noted above, the ethical issues surrounding genetic determinations are complex and evolving. In recognition of investigator, institutional, and participant constraints, permission for future genetic determinations could be obtained as scientific issues emerge, perhaps offering with the consent procedures a set of guidelines about the type and disposition of studies possible in the future. Although this is being done presently, there are still numerous pitfalls, such as deciding what action to take after identifying a genotype associated with high risk of severe but preventable disease. An additional important element of informed consent is to document in advance who is to be contacted when a personally important genetic finding is discovered.
  3. Assemble an expert panel to develop a recommended, standard format for collecting pedigree information from participants in general health and social surveys, irrespective of any immediate need for the information. This procedure would allow later reference to the pedigrees, both for sampling among them and for specific ascertainment if segregation analyses within a pedigree are being considered. Family structure and household rosters are sometimes collected as part of current social, economic, and demographic surveys, but a standardized format, easily displayed and in computerized form, would be very helpful. At a minimum, it would be essential to identify all first-degree relatives, their vital status, and location at the time of survey.
  4. Create a set of standard survey items that ascertain major conditions within families as part of routine survey procedures. This would allow a standardized, general approach to cross-national and cross-cultural studies, where uniformity of morbidity and family data is a problem. There could also be sets of items that emphasize known or potential genetic and congenital diseases, allowing later restudy of the families or populations as candidate genetic markers emerge.


From substantive, methodologic, and ethical perspectives alike, study of the genetic causes of disease and dysfunction is advancing rapidly. Clearly, large-scale population surveys can be extremely informative on genetic issues if appropriate forethought accompanies the inception of these surveys. There is a need for multidisciplinary approaches, at the very least among social scientists, geneticists, epidemiologists, and survey researchers, to fully use the population-survey opportunity.


  • Bennett RL, Steinhaus KA, Uhrich SB, O'Sullivan CK, Resta RG, Lochner-Doyle D, Markel DS, Vincent V, Hamanishi J. Recommendations for standardized human pedigree nomenclature. American Journal of Human Genetics. 1995;56:745–752. [PMC free article: PMC1801187] [PubMed: 7887430]
  • Clayton EW, Steinberg KK, Khoury MJ, Thompson E, Andrews L, Ellis Kahn MJ, Kopelman M, Weiss JO. Informed consent for genetic research on stored tissue samples. Journal of the American Medical Association. 1995;274:1786–1792. [PubMed: 7500511]
  • Cruickshanks KJ, Vadheim CM, Moss SE, Roth MP, Riley WJ, Maclaren NK, Langfield D, Sparkes RS, Klein R, Rotter JI. Genetic markers associated with proliferative retinopathy in persons diagnosed with diabetes before 30 yr of age. Diabetes. 1992;41:879–885. [PubMed: 1612203]
  • Eisensmith RC, Woo SL. Population genetics of phenylketonuria. Acta Paediatrica. Supplement. 1994;407:19–26. [PubMed: 7766949]
  • Hyman BT, Gomez-Isla T, Briggs M, Chung H, Nichols S, Kohout F, Wallace R. A population-based longitudinal study of the influence of apolipoprotein E genotype on the risk of cognitive impairment in the elderly. Annals of Neurology. In Press.
  • Gill P, Evett I. Population genetics of short tandem repeat (STR) loci. Genetica. 1995;96:69–87. [PubMed: 7607461]
  • Joyce P, Tavare S. The distribution of rare alleles. Journal of Mathematical Biology. 1995;33:602–618. [PubMed: 7608640]
  • Kalnin VV, Kalnina OV, Prosniak MI, Khidiatova IM, Khusnutdinova EK, Raphicov KS, Limborska SA. Use of DNA fingerprinting for human population genetic studies. Molecular and General Genetics. 1995;247:488–493. [PubMed: 7770057]
  • Lee EJ. Population genetics of the angiotensin-converting enzyme in Chinese. British Journal of Clinical Pharmacology. 1994;37:212–214. [PMC free article: PMC1364601] [PubMed: 8186067]
  • Rubinsztein DC, van der Westhuyzen R, Coetzee GA. Monogenic primary hypercholesterolemia in South Africa. South African Medical Journal. 1994;84:339–344. [PubMed: 7740380]
  • Schork NJ, Weder AB, Trevisan M, Laurenzi M. The contribution of pleiotropy to blood pressure and body-mass index variation: The Gubbio Study. American Journal of Human Genetics. 1994;54:361–373. [PMC free article: PMC1918169] [PubMed: 8304351]
  • Shivdasani RA, Anderson KC. HLA homozygosity and shared HLA haplotypes in the development of transfusion-associated graft-versus-host disease. Leukemia and Lymphoma. 1994;15:227–234. [PubMed: 7866271]
  • Shickle D, Harvey I. "Inside out," back-to-front: A model for clinical genetic population screening. Journal of Medical Genetics. 1993;30:580–582. [PMC free article: PMC1016458] [PubMed: 8411031]
  • Skamene E. The Bcg gene story. Immunobiology. 1994;191:451–460. [PubMed: 7713559]
  • van Oorschot RA, Gutowski SJ, Robinson SL. HUMTH01: amplification, species specificity, population genetics and forensic applications. International Journal of Legal Medicine. 1994;107:121–126. [PubMed: 7893608]
  • Wagener DK. Ethical considerations in the design and execution of the National and Hispanic Health and Nutrition Examination Survey (HANES). Environmental Health Perspectives. 1995;103 (Supplement 3):75–80. [PMC free article: PMC1519014] [PubMed: 7635116]



¹ Copies of this document may be obtained from Richard Suzman, Ph.D., National Institute on Aging, Gateway Building, National Institutes of Health, Bethesda, MD 20892, USA.

Copyright © 1997, National Academy of Sciences.
Bookshelf ID: NBK100410

