NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Committee on Human Genome Diversity. Evaluating Human Genetic Diversity. Washington (DC): National Academies Press (US); 1997.

Executive Summary

At the request of the National Science Foundation and the National Institutes of Health, the Committee on Human Genome Diversity was organized in 1996 in the Board of Biology of the National Research Council's Commission on Life Sciences. The committee's charge, as defined in the agreement between the Research Council and the sponsors, was to evaluate the consensus proposal to establish a Human Genome Diversity Project (HGDP).

As the committee's fact-finding progressed, it became apparent that the precise nature of the proposed HGDP was elusive; different participants in the formulation of the "consensus" document had quite different perceptions of the intent of the project, and even of its organizational structure. Accordingly, because there was no sharply defined proposal that the committee could evaluate, it chose to examine the scientific merits and value of research on human genetic variation and the organizational, policy, and ethical issues that such research poses in a more-general context.

The committee, which comprised representatives of all the relevant disciplines, met on 4 occasions to respond to its charge. At these meetings, spokespersons for the scientific community and the public were invited to share their views with the committee. To provide an opportunity for those unable to present their opinions in person, the committee circulated a questionnaire to numerous individuals and organizations. It received hundreds of responses to the questionnaire, for which the committee is most appreciative. The report that follows takes those oral and written presentations into account.

Briefly, the committee is persuaded that a global assessment of the extent of human genetic variability has substantial scientific merit and warrants support, largely because of the insight that the data collected could provide into the origin and evolution of the human species. A comprehensive survey of human genetic variability both between and within populations could map such variability and place it in social and environmental context. Careful variability sampling in conjunction with the Human Genome Project could contribute fundamentally to a new era of modern molecular medicine and transform scientific understanding of human evolution and the course of human prehistory.

However, the committee foresees numerous ethical, legal, and human-rights challenges in the prosecution of a global effort and offers possible guidelines to the resolution of some, albeit not all, of the challenges that the committee identifies. The committee recognizes that these ethical, legal, and human-rights concerns will not be the only problems. A global survey will also pose numerous technical and organizational difficulties, of which some can be foreseen now and others cannot. Those difficulties, although complex, appear more tractable than do some of the ethical, legal, and moral ones, primarily because solutions to the latter must embrace a wide array of different value systems and cultures, each with its own sense of the rights and obligations of individuals and of the group to which they belong. The committee offers suggestions and recommendations on sampling, specimen collection and storage, and data management that in its view would mitigate some of the technical and organizational difficulties inherent in a multinational, multicultural study.

Sampling Issues

The committee considered 5 possible sampling strategies (see table 1) to determine which one would be most appropriate for a coordinated global effort.

TABLE 1. Sampling Strategies.


Strategy I is the simplest sampling scheme because its sole requirement is that the sample be representative of the human species. To achieve this end the sample should not be derived from one restricted group of human beings. This scheme yields a sample that cannot be linked to specific individuals, geographic areas or populations. Each sample is identified simply as being from a human being, and no other information is obtained. This is the least-expensive type of sample to acquire, and its collection minimizes many ethical issues at both the personal and population levels (see chapter 5).

Strategy II differs from Strategy I in that it records the geographic location of each sampling point but the sample cannot be linked to particular persons or populations. The geographic points to be sampled could be chosen either by using a grid method or by sampling geographic areas in proportion to the density of populations in them. All the hypotheses related to genome evolution and patterns of variation that were testable with strategy I are also testable with strategy II, but in addition it is possible to test hypotheses related to the patterns of spatial variation and some hypotheses about the geographic subdivision of humans and patterns of gene flow or migration.
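The second option mentioned above, sampling geographic areas in proportion to population density, can be illustrated with a short calculation. The following is a minimal sketch, not a procedure taken from the report: the function name, the area labels, the population figures, and the largest-remainder rounding rule are all assumptions made for the example.

```python
from math import floor


def allocate_sampling_points(population_by_area, total_points):
    """Split total_points across areas in proportion to population.

    Hypothetical helper: uses largest-remainder rounding so that the
    integer allocations always add up to total_points.
    """
    total_population = sum(population_by_area.values())
    if total_population <= 0:
        raise ValueError("total population must be positive")

    # Exact (fractional) share of sampling points for each area.
    quotas = {
        area: total_points * population / total_population
        for area, population in population_by_area.items()
    }
    # Start from the whole-number part of each share.
    allocation = {area: floor(quota) for area, quota in quotas.items()}

    # Hand the leftover points to the areas with the largest fractional parts.
    leftover = total_points - sum(allocation.values())
    by_remainder = sorted(
        quotas, key=lambda area: quotas[area] - allocation[area], reverse=True
    )
    for area in by_remainder[:leftover]:
        allocation[area] += 1
    return allocation


# Made-up populations for three hypothetical areas.
example = allocate_sampling_points(
    {"north": 500, "coast": 300, "highlands": 200}, total_points=10
)
print(example)  # {'north': 5, 'coast': 3, 'highlands': 2}
```

A grid method, by contrast, would place points at regular spatial intervals regardless of how many people live near each point.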

Strategy III is the first of the 3 population-based sampling designs given in table 1. It records not only the geographic location of a sample, but also information provided about self-reported ethnicity, primary language, sex, age, and parental birthplaces. However, no personal identifiers would be obtained and thus neither the information nor the samples could be linked to specific persons. All the hypotheses that are testable with strategies I and II can be tested with strategy III, but this strategy broadens the universe of testable hypotheses to those related to population-level relationships and differences measured primarily with data on the frequencies of alleles (alternative forms of genes at the same locus) or haplotypes (particular states of a region of DNA; if the DNA region is a coding region, haplotypes correspond to alleles), and associated and derived statistics.
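To make the notion of allele-frequency data concrete, the sketch below estimates allele frequencies at a single locus by direct counting in a population sample. It is offered only as an illustration under stated assumptions: the function name, the representation of a genotype as a pair of allele labels, and the example genotypes are hypothetical and do not come from the report.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by counting alleles.

    `genotypes` is a list of (allele, allele) pairs, one pair per sampled
    person (a hypothetical input format chosen for this example).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one genotype is required")
    return {allele: n / total for allele, n in counts.items()}


# Four made-up diploid genotypes: 5 copies of "A" and 3 copies of "a".
sample = [("A", "A"), ("A", "a"), ("A", "A"), ("a", "a")]
print(allele_frequencies(sample))  # {'A': 0.625, 'a': 0.375}
```

Comparing such frequencies between population samples is one basis for the population-level statistics the paragraph above refers to.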

Strategy IV, the second of the population-based strategies given in table 1, includes biomedically relevant information on individually identifiable phenotypes, particularly disease phenotypes. All the hypotheses mentioned in connection with strategy III could be tested with this scheme, but in addition one could look for genotype-disease associations instead of the much-weaker population-disease associations possible with strategy III. However, even such an enhanced data set would still be limited to disease-association studies and could not address disease causation directly. Hidden or unknown heterogeneity in the populations sampled could easily lead to false conclusions, and additional sampling (often the gathering of pedigree data) would be needed to confirm the results obtained with this strategy.

Those limitations can be avoided by going to a third level of population sampling, strategy V, the sampling of families or pedigreed persons in a population instead of persons of unknown relationship. When pedigree data are gathered with population and phenotypic data, more-definitive phenotypic studies are possible and they have enhanced power to detect markers close enough to disease loci to produce a within-family association. Moreover, when many closely linked marker loci exhibit heterozygosity, family data often allow the construction of haplotypes with more certainty. Therefore, this form of sampling would greatly increase the biomedical utility of a human genome sample collection.

Of the various sampling strategies discussed and summarized in table 1, population-based sampling strategy III, in which only basic group-identification data are gathered, is recommended over the other strategies since the data and specimens cannot be linked to specific individuals. Strategy I does not provide a rationale for global sampling, and strategy II has many of the same ethical complications as strategy III but with a substantial restriction in breadth of testable hypotheses. Strategies IV and V could greatly increase the cost, complicate sampling logistics, raise serious ethical and security concerns, and benefit only a few investigators (although the investigations that would be so benefited have the most-direct biomedical relevance). Strategy III offers the best balance of breadth of testable hypotheses, expense, and ethical complications.

A coordinated global sampling effort to develop a common resource for research on human genome variation should use a population-based sampling design in which the geographic location of the sample and self-reported ethnicity, primary language, sex, age, and parental birthplaces are recorded. The committee notes that the inclusion of parental birthplaces along with the other information identified above could, in some instances, inadvertently identify specific individuals.
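The recommended record, and the committee's caution about inadvertent identification, can be sketched as a small data structure plus a uniqueness check. This is a minimal sketch under stated assumptions: the class name, the field names, the coded example values, and the idea of flagging records whose combination of attributes occurs only once in the collection are all hypothetical illustrations, not part of the committee's recommendation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical strategy III record: group-level data, no personal identifiers."""
    location: str
    ethnicity: str          # self-reported
    primary_language: str
    sex: str
    age: int
    mother_birthplace: str
    father_birthplace: str


def singled_out(records):
    """Return records whose full combination of attributes is unique.

    A unique combination does not prove that a person can be identified,
    but it marks where the recorded attributes alone might point to one
    individual.
    """
    counts = Counter(records)  # frozen dataclasses are hashable
    return [record for record in records if counts[record] == 1]


# Made-up records with coded values.
r1 = SampleRecord("region-1", "group-x", "language-p", "F", 34, "region-1", "region-1")
r2 = SampleRecord("region-1", "group-x", "language-p", "F", 34, "region-1", "region-1")
r3 = SampleRecord("region-1", "group-x", "language-p", "M", 51, "region-1", "region-9")

flagged = singled_out([r1, r2, r3])
print(len(flagged))  # 1
```

In this made-up collection only the third record is flagged, because its combination of sex, age, and parental birthplaces occurs once.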

For any given population, samples of a few hundred to several hundred persons, or even more, should be obtained whenever possible. In larger populations where the investigator(s) deems stratified sampling to be necessary, larger overall samples would be desirable.

Sample Collection And Data Management

The committee believes that at a stage when genotyping technology is evolving rapidly, it would be scientifically inappropriate and premature to designate a common core set of markers that is to be genotyped in all samples. Given advances in technology, a natural outcome will be that individual investigators will perform large-scale surveys of a large number of markers to generate balanced data sets. In spite of differences among individual investigators in sampling designs due to the different hypotheses being tested, many will use common technologies that can provide uniformity in the types and numbers of markers analyzed. To encourage and to facilitate as widespread testing as possible, blood samples collected from human populations should be converted primarily into purified DNA.

With currently available laboratory and information technologies, the material-management and data-management aspects of a coordinated human genome variation research effort do not appear to constitute a serious barrier to implementation of the project. There are multiple feasible models for specimen and data management and numerous instances of international cooperation in the creation of shared repositories of biologic tissue and data. The specimens and data to be captured, analyzed, and disseminated by the project have unique aspects, which will require attention and resources, but none of them is intractable. The most-important decision about project design will be whether it will acquire specimens and data that can be linked to specific individuals and thereby need to meet a "clinical" standard for specimen and data security and access control.

The committee was not charged with providing detailed guidance on data management; therefore our remarks are directed toward general issues. However, there are numerous questions that need to be addressed in a more specific manner than the committee believed itself charged to do. These details must be resolved before embarking on a major data-collecting enterprise, and the committee recommends that a special panel be convened to do so. Among these questions are the following:

  • Is the data system to be a static (add-only) archive, an originator-modifiable archive, or a cooperative work system? Will the system store results of analyses of the primary data, or third-party annotations, or comments on the basic data?
  • How will release to public access be handled? Will there be multiple levels of access?
  • How will data-sample links be assured?
  • If the database is to be accessible to members of the participating populations, how will the multi-language interfaces that such would require be developed?
  • How will maintenance be done and funded over the long term?
  • How will the databases be structured to meet the conflicting ends of archiving and use of data?

Other important considerations are the following:

  • Establishment of a resource-allocation mechanism to monitor and adjudicate requests for both renewable and nonrenewable research materials.
  • A review mechanism for determining the scientific and ethical merit of requests for specimens (analogous to an institutional review board).
  • A mechanism to detect and respond to unauthorized reuse of specimens for research not agreed to by subject populations.
  • A mechanism, needed if individually identifiable specimens are collected (a procedure the committee does not advocate), for recontact with and reconsent of participating groups and persons if currently unforeseen uses of specimens arise that are beyond the scope of the original informed consent.
  • Enforcement of ethical protocols, especially the right of groups and individual persons to withdraw their samples from the collection if the samples are personally identifiable.

Human Rights Considerations

Collecting biologic samples from specific individuals and families to extrapolate information about the social groups to which they belong is not a new scientific practice. The confluence of several sets of ethical considerations gives that practice greater risks that human genetic variation researchers must recognize. Continued use of outmoded social categories to structure biomedical research, emerging possibilities for commercializing biomedical knowledge, and heightened awareness of the stigmatizing potential of genetic information all increase public concern about human genetic variation research. To the extent that such research must continue to rely on socially defined human groups, the process of managing any coordinated effort to survey human diversity will be increasingly complex. For each socially identified set of samples, protocols for group involvement and concurrence (including in the design of the research protocol) will have to be negotiated and balanced against the researchers' fundamental ethical obligations to protect the freedom, privacy, and welfare of the individuals involved, including the right not to participate in a study.

It is crucial to have a complete research protocol for review before the actual consent form and process for obtaining consent can be designed and evaluated. For any specific goal-oriented protocol, it should be possible to anticipate the risks and benefits to the subjects and pursue informed consent accordingly. For projects that are not able to specify goals in sufficient detail to quantify risks and benefits reasonably, the worst-case scenario should be assumed: the benefits will be at the lowest anticipated level, and the risks at the highest. That means that the burden of proof for any DNA-sampling project that does not have a well-defined hypothesis will be high. It also underlines the most basic starting point for all ethical analyses of genetic-variation research, regardless of which model is pursued: defining a hypothesis and determining the benefit of knowing whether it is true.

Accurate identification of population units for sampling purposes requires extensive knowledge of the social, political, and linguistic composition of the region to be sampled. Published ethnographic studies can provide some of this knowledge, as can anthropologists who work with the peoples. If this information is not available, researchers should study the local situation in consultation with local leaders, experts, and other researchers before designing the sampling strategy.

In locations where women's rights to self-determination are not recognized (and thus their informed consent not possible), "women should not normally be involved in the research" (commentary on guideline 11 of the International Ethical Guidelines for Biomedical Research Involving Human Subjects), because it is likely that they will not have the freedom and power to choose whether to participate. While it is obviously wrong to exclude women from participation in a study that could lead to results from which they could benefit, it is equally important to insist on informed consent that is freely given.

We think that it is too extreme a position always to require both group and individual consent to DNA collection for genetic-variation research. Nonetheless, researchers will have to make sure that their participants understand both the objections of their community and the rationale for them as part of the informed-consent process and, when doing research that is opposed by a specific community, will also have to take into account the possible impact of doing such research on the likelihood that other communities will cooperate with other genetic-variation researchers in the future.

Should the population itself be able to withdraw from the project? The answer might be that "community withdrawal" is not possible; if that is the case, it should be spelled out in both the protocol and the individual consent processes, as well as in the discussion of the protocol with community representatives. In general, consent and withdrawal are rights of individual research subjects and should not depend on the approval or disapproval of government authorities, however defined.

Studies that collect DNA specimens that can be linked to specific, identifiable persons must institute measures that will prevent unauthorized access to this information, so as to protect individual research participants from stigmatization and discrimination, and must include mechanisms for follow-up about the results of the studies conducted on collected samples. It is not ethically or legally acceptable to ask research participants to "consent" to future but yet-unknown uses of their identifiable DNA samples. Consent in such a case is a waiver of rights, and such waivers are explicitly prohibited by federal research regulations.

Arrangements regarding financial interests in the products or outcomes of the research should be negotiated as part of the original project review and informed-consent process. In addition, a monitoring and enforcement mechanism, with representation of the affected groups, should be in place. One of the major lessons from the Rio de Janeiro Biodiversity Summit is the importance of economic and political considerations in negotiating research participation with identified human groups. That should not be surprising, inasmuch as social groups are usually created and sustained as a means of pursuing their members' economic and political interests. However, this adds a dimension to informed-consent negotiations that is foreign to most social and biomedical scientists: negotiating over what the participating group receives in return for participation.

Organization And Management

We recognize that neither the National Science Foundation nor the National Institutes of Health is prepared or even able to fund a global survey such as that contemplated and that they seek advice on the role they should play. Accordingly, the committee offers the following guidance: these agencies should focus their financial support, at least initially, on projects originating in the United States and expand their support to the international scene only after the US activities are successfully launched. The establishment of an international effort will require defining the roles of interested investigators, on the one hand, and national and international agencies, on the other. Without such defined roles, any global survey would be correctly criticized for substituting a self-appointed set of administrators without official standing in any country for the recognized national and international agencies of governance, and would be unlikely to succeed. The funding agencies, specifically the National Science Foundation and the National Institutes of Health, should initiate such discussions through their international offices. These discussions will take time to bring to fruition; until a consensus is achieved, the US effort would be generating information of substantial moment relevant to the feasibility and urgency of an international study and identifying administrative barriers that would have to be surmounted.

The committee found its deliberations on the value, design, and implementation of tissue repositories, whether centrally or regionally located, constantly thwarted by the absence of information on what repositories are actually available now and the specimens that might be accessible to other investigators. Such information would be of substantial use to many in the scientific community. The committee recommends that NIH or NSF identify all such repositories and the availability of their specimens to the scientific community, both in the United States and elsewhere.

Finally, the recommendations of this committee with regard to sampling strategy, sample size, and the collection of specimens and data should be taken into account when considering the scientific merit of an individual request for support.

Copyright © 1997, National Academy of Sciences.
Bookshelf ID: NBK100432

