NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Panel on Collecting, Storing, Accessing, and Protecting Biological Specimens and Biodata in Social Surveys; Hauser RM, Weinstein M, Pool R, et al., editors. Conducting Biosocial Surveys: Collecting, Storing, Accessing, and Protecting Biospecimens and Biodata. Washington (DC): National Academies Press (US); 2010.


Conducting Biosocial Surveys: Collecting, Storing, Accessing, and Protecting Biospecimens and Biodata.


5 Findings, Conclusions, and Recommendations

As the preceding chapters have made clear, incorporating biological specimens into social science surveys holds great scientific potential, but also adds a variety of complications to the tasks of both individual researchers and institutions. These complications arise in a number of areas, including collecting, storing, using, and distributing biospecimens; sharing data while protecting privacy; obtaining informed consent from participants; and engaging with Institutional Review Boards (IRBs). Any effort to make such research easier and more effective will need to address the issues in these areas.

In considering its recommendations, the panel found it useful to think of two categories: (1) recommendations that apply to individual investigators, and (2) recommendations that are addressed to the National Institute on Aging (NIA) or other institutions, particularly funding agencies. Researchers who wish to collect biological specimens with social science data will need to develop new skills in a variety of areas, such as the logistics of specimen storage and management, the development of more diverse informed consent forms, and ways of dealing with the disclosure risks associated with sharing biogenetic data. At the same time, NIA and other funding agencies must provide researchers the tools they need to succeed. These tools include such things as biorepositories for maintaining and distributing specimens, better guidance on informed consent policies, and better ways to share data without risking confidentiality.


Although working with biological specimens will be new and unfamiliar to many social scientists, it is an area in which biomedical researchers have a great deal of expertise and experience. Many existing documents describe recommended procedures and laboratory practices for the handling of biospecimens. These documents provide an excellent starting point for any social scientist who is interested in adding biospecimens to survey research.

Recommendation 1: Social scientists who are planning to add biological specimens to their survey research should familiarize themselves with existing best practices for the collection, storage, use, and distribution of biospecimens. First and foremost, the design of the protocol for collection must ensure the safety of both participants and survey staff (data and specimen collectors and handlers).

Although existing best-practice documents were not developed with social science surveys in mind, their guidelines have been field-tested and approved by numerous IRBs and ethical oversight committees. The most useful best-practice documents are updated frequently to reflect growing knowledge and changing opinions about the best ways to collect, store, use, and distribute biological specimens. At the same time, however, many issues arising from the inclusion of biospecimens in social science surveys are not fully addressed in the best-practice documents intended for biomedical researchers. For guidance on these issues, it will be necessary to seek out information aimed more specifically at researchers at the intersection of social science and biomedicine.


As described in Chapter 2, the collection, storage, use, and distribution of biospecimens and biodata are tasks that are likely to be unfamiliar to many social scientists and that raise a number of issues with which even specialists are still grappling. For example, which biospecimens in a repository should be shared, given that in most cases the amount of each specimen is limited? And given that the available technology for cost-efficient analysis of biospecimens, particularly genetic analysis, is rapidly improving, how much of any specimen should be used for immediate research and analysis, and how much should be stored for analysis at a later date? Collecting, storing, using, and distributing biological specimens also present significant practical and financial challenges for social scientists. Many of the questions they must address, such as exactly what should be held, where it should be held, and what should be shared or distributed, have not yet been resolved.

Developing Data Sharing Plans

An important decision concerns who has access to any leftover biospecimens. This is a problem more for biospecimens than for biodata because in most cases, biospecimens can be exhausted. Should access be determined according to the principle of first funded, first served? Should there be a formal application process for reviewing the scientific merits of a particular investigation? For studies that involve international collaboration, should foreign investigators have access? And how exactly should these decisions be made? Recognizing that some proposed analyses may lie beyond the competence of the original investigators, as well as the possibility that principal investigators may have a conflict of interest in deciding how to use any remaining biospecimens, one option is for a principal investigator to assemble a small scientific committee to judge the merits of each application, including the relevance of the proposed study to the parent study and the capacities of the investigators. Such committees should publish their review criteria to help prospective applicants. A potential problem with such an approach, however, is that many projects may not have adequate funding to carry out such tasks.

Recommendation 2: Early in the planning process, principal investigators who will be collecting biospecimens as part of a social science survey should develop a complete data sharing plan.

This plan should spell out the criteria for allowing other researchers to use (and therefore deplete) the available stock of biospecimens, as well as to gain access to any data derived therefrom. To avoid any appearance of self-interest, a project might empower an external advisory board to make decisions about access to its data. The data sharing plan should also include provisions for the storage and retrieval of biospecimens and clarify how the succession of responsibility for and control of the biospecimens will be handled at the conclusion of the project.

Recommendation 3: NIA (or preferably the National Institutes of Health [NIH]) should publish guidelines for principal investigators containing a list of points that need to be considered for an acceptable data sharing plan. In addition to staff review, Scientific Review Panels should read and comment on all proposed data sharing plans. In much the same way as an unacceptable human subjects plan, an inadequate data sharing plan should hold up an otherwise acceptable proposal.

Supporting Social Scientists in the Storage of Biospecimens

The panel believes that many social scientists who decide to add the collection of biospecimens to their surveys may be ill equipped to provide for the storage and distribution of the specimens.

Conclusion: The issues related to the storage and distribution of biospecimens are too complex and involve too many hidden costs to assume that social scientists without suitable knowledge, experience, and resources can handle them without assistance.

Investigators should therefore have the option of delegating the storage and distribution of biospecimens collected as part of social science surveys to a centralized biorepository. Depending on the circumstances, a project might choose to utilize such a facility for immediate use, long-term or archival storage, or not at all.

Recommendation 4: NIA and other relevant funding agencies should support at least one central facility for the storage and distribution of biospecimens collected as part of the research they support.


Several different types of data must be kept confidential: survey data, data derived from biospecimens, and all administrative and operational data. In the discussion of protecting confidentiality and privacy, this report has focused on biodata, but the panel believes it is important to protect all the data collected from survey participants. For many participants, for example, data on wealth, earnings, or sexual behavior can be as sensitive as genetic data, or more so.

Conclusion: Although biodata tend to receive more attention in discussions of privacy and confidentiality, social science and operational data can be sensitive in their own right and deserve similar attention in such discussions.

Protecting the participants in a social science survey that collects biospecimens requires securing the data, but data are most valuable when they are made available to researchers as widely as possible. Thus there is an inherent tension between the desire to protect the privacy of the participants and the desire to derive as much scientific value from the data as possible, particularly since the costs of data collection and analysis are so high. The following recommendations regarding confidentiality are made in the spirit of balancing these equally important needs.

Genomic data present a particular challenge. Several researchers have demonstrated that it is possible to identify individuals with even modest amounts of such data. When combined with social science data, genomic data may pose an even greater risk to confidentiality. It is difficult to know how much or which genomic data, when combined with social science data, could become critical identifiers in the future. Although the problem is most significant with genomic data, similar challenges can arise with other kinds of data derived from biospecimens.

Conclusion: Unrestricted distribution of genetic and other biodata risks violating promises of confidentiality made to research participants.

There are two basic approaches to protecting confidentiality: restricting data and restricting access. Restricting data (for example, by stripping individual and spatial identifiers and modifying the data so that it is difficult or impossible to trace records back to their source) usually makes it possible to release social science data widely. In the case of biodata, however, there is no established answer to how little data suffices to make a participant uniquely identifiable. Consequently, any release of biodata must be carefully managed to protect confidentiality.
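As a minimal sketch of what restricting data can involve for conventional survey variables, the Python function below drops direct identifiers and generalizes two quasi-identifiers. The field names (`name`, `address`, `phone`, `age`, `income`), the 5-year age bands, and the income top-code are illustrative assumptions, not values taken from any particular study. Consistent with the point above, nothing comparable is offered for genomic data, where no degree of coarsening is known to be sufficient.

```python
# Hypothetical field names; a real project would define its own list of direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def restrict_record(record, age_band=5, income_cap=250_000):
    """Return a copy of `record` with direct identifiers removed and
    age/income generalized (illustrative parameters only)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in out:
        # Replace exact age with a band such as "65-69".
        low = (out["age"] // age_band) * age_band
        out["age"] = f"{low}-{low + age_band - 1}"
    if "income" in out:
        # Top-code: incomes above the cap are reported as the cap.
        out["income"] = min(out["income"], income_cap)
    return out


example = {"name": "Jane Doe", "address": "1 Main St", "age": 67,
           "sex": "F", "income": 400_000}
print(restrict_record(example))
# {'age': '65-69', 'sex': 'F', 'income': 250000}
```

Generalization of this kind reduces, but does not eliminate, the chance that the remaining variables single out a participant.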

Recommendation 5: No individual-level data containing uniquely identifying variables, such as genomic data, should be publicly released without explicit informed consent.

Recommendation 6: Genomic data and other individual-level data containing uniquely identifying variables that are stored or in active use by investigators on their institutional or personal computers should be encrypted at all times.

Even if specific identifying variables, such as names and addresses, are stripped from data, it is still often possible to identify the individuals associated with the data by other means, such as using the variables that remain (age, sex, marital status, family income, etc.) to zero in on possible candidates. In the case of biodata that do not uniquely identify individuals and can change with time, such as blood pressure and physical measurements, it may be possible to share the data with no more protection than stripping identifying variables. Even these data, however, if known to intruders, can increase identification disclosure risk when combined with enough other data. With sufficient characteristics to match, intruders can uniquely identify individuals in shared data if given access to another data source that contains the same information plus identifiers.
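The matching attack described above can be illustrated with a short Python sketch. The records, field names, and the intruder's external file are all invented for illustration. The sketch counts how many shared records have each combination of quasi-identifiers, treats records that are unique on that combination as linkable, and shows that adding a single biodata variable known to an intruder (here, systolic blood pressure, `sbp`) can make every record unique.

```python
from collections import Counter

# Invented toy records with direct identifiers already stripped.
shared = [
    {"age": 67, "sex": "F", "marital": "married", "sbp": 142},
    {"age": 67, "sex": "F", "marital": "married", "sbp": 128},
    {"age": 71, "sex": "M", "marital": "widowed", "sbp": 155},
]

# Invented external source held by an intruder: same variables plus a name.
external = [
    {"name": "Person A", "age": 71, "sex": "M", "marital": "widowed"},
    {"name": "Person B", "age": 67, "sex": "F", "marital": "married"},
]

QUASI_IDS = ("age", "sex", "marital")


def key(record, fields):
    return tuple(record[f] for f in fields)


def unique_records(records, fields=QUASI_IDS):
    """Records whose combination of `fields` occurs exactly once."""
    counts = Counter(key(r, fields) for r in records)
    return [r for r in records if counts[key(r, fields)] == 1]


def link(shared, external, fields=QUASI_IDS):
    """Attach names from `external` to shared records matched uniquely on `fields`."""
    unique = {key(r, fields): r for r in unique_records(shared, fields)}
    return [(ext["name"], unique[key(ext, fields)])
            for ext in external if key(ext, fields) in unique]


print(link(shared, external))
# [('Person A', {'age': 71, 'sex': 'M', 'marital': 'widowed', 'sbp': 155})]
print(len(unique_records(shared, QUASI_IDS + ("sbp",))))
# 3
```

In this toy example, Person B is protected only because two shared records have the same age, sex, and marital status; once blood pressure is added to the matching variables, no record retains that protection.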

Conclusion: Even nonunique biodata, if combined with social science data, may pose a serious risk of reidentification.

In the case of high-dimensional genomic data, standard disclosure limitation techniques, such as data perturbation, do not preserve the utility of the data: the alterations they require are so extreme that they would severely distort analyses aimed at determining gene–gene and gene–environment interactions. Standard disclosure limitation methods could be used to generate public-use data sets that would enable low-dimensional analyses involving genes, for example, one gene at a time. However, if several such data sets are released, an intruder may be able to link them through a key match on shared variables and thereby construct a data set with higher-dimensional genomic data.
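The key-match risk can be sketched in Python under simplifying assumptions: two hypothetical public-use releases each carry the same unperturbed social variables (age, sex, years of education) plus a single genetic variant coded as 0, 1, or 2 copies of an allele. The variant names `snp_1` and `snp_2` and all values are invented. Joining records whose key is unique in both releases yields a two-variant data set that neither release contained on its own.

```python
from collections import Counter

KEYS = ("age", "sex", "educ")

# Two invented public-use releases, each enabling one-gene-at-a-time analysis.
release_a = [
    {"age": 67, "sex": "F", "educ": 12, "snp_1": 0},
    {"age": 67, "sex": "F", "educ": 12, "snp_1": 1},
    {"age": 71, "sex": "M", "educ": 16, "snp_1": 2},
    {"age": 71, "sex": "M", "educ": 12, "snp_1": 1},
]
release_b = [
    {"age": 67, "sex": "F", "educ": 12, "snp_2": 1},
    {"age": 71, "sex": "M", "educ": 16, "snp_2": 0},
    {"age": 71, "sex": "M", "educ": 12, "snp_2": 2},
]


def key_match(a, b, keys=KEYS):
    """Merge records whose key combination occurs exactly once in each release."""
    def k(r):
        return tuple(r[f] for f in keys)

    count_a = Counter(k(r) for r in a)
    count_b = Counter(k(r) for r in b)
    index_b = {k(r): r for r in b if count_b[k(r)] == 1}
    return [{**r, **index_b[k(r)]} for r in a
            if count_a[k(r)] == 1 and k(r) in index_b]


for row in key_match(release_a, release_b):
    print(row)
# {'age': 71, 'sex': 'M', 'educ': 16, 'snp_1': 2, 'snp_2': 0}
# {'age': 71, 'sex': 'M', 'educ': 12, 'snp_1': 1, 'snp_2': 2}
```

The two 67-year-old women with 12 years of education are not merged because their key is ambiguous in the first release. Perturbing the key variables would make such matching harder, at the cost of the distortion the text describes.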

Conclusion: At present, no data restriction strategy has been demonstrated to protect confidentiality while preserving the usefulness of the data for drawing inferences involving high-dimensional interactions among genomic and social science variables, which are increasingly the target of research. Providing public-use genomic data requires such intense data masking to protect confidentiality that it would distort the high-dimensional analyses that could result in ground-breaking research progress.

Recommendation 7: Both rich genomic data acquired for research and sensitive and potentially identifiable social science data that do not change (or change very little) with time should be shared only under restricted circumstances, such as licensing and (actual or virtual) data enclaves.

As discussed in Chapter 3, the four basic ways to restrict access to data are licensing, remote execution centers, data enclaves, and virtual data enclaves. Each has its advantages and disadvantages.1 Licensing, for example, gives researchers the least restricted access to the data, but the licensing process itself can be lengthy and burdensome. Thus it would be useful to streamline the licensing process.

Recommendation 8: NIA (or preferably NIH) should develop new standards and procedures for licensing confidential data in ways that will maximize timely access while maintaining security and that can be used by data repositories and by projects that distribute data.

Ways to improve the other approaches to restricted access are needed as well. For example, improving the convenience and availability of virtual data enclaves could increase the use of combined social science and biodata without a significant increase in risk to confidentiality. The panel notes that much of the discussion of the confidentiality risk posed by the various approaches is theoretical; no one has a clear idea of just what disclosure risks are associated with the various ways of sharing data. It is important to learn more about these disclosure risks for a variety of reasons—determining how to minimize the risks, for instance, or knowing which approaches to sharing data pose the least risk. It would also be useful to be able to describe disclosure risks more accurately to survey participants.

Recommendation 9: NIA and other funding agencies should assess the strength of confidentiality protections through periodic expert audits of confidentiality and computer security. Willingness to participate in such audits should be a condition for receipt of NIA support. Beyond enforcement, the purpose of such audits would be to identify challenges and solutions.

Evaluating risks and applying protection methods, whether they involve restricted access or restricted data, is a complex process requiring expertise in disclosure protection methods that exceeds what individual principal investigators and their institutions usually possess. Currently, not enough is known to be able to represent these risks either fully or accurately. The NIH requirement for data sharing necessitates a large investment of resources to anticipate which variables are potentially available to intruders and to alter data in ways that reduce disclosure risks while maintaining the utility of the data. Such resources are better spent by principal investigators on collecting and analyzing the data.

Recommendation 10: NIH should consider funding Centers of Excellence to explore new ways of protecting digital representations of data and to assist principal investigators wishing to share data with others. NIH should also support research on disclosure risks and limitations.

Principal investigators could send digital data to these centers, which would organize and manage any restricted access or restricted data policies or provide advisory services to investigators. NIH would maintain the authority to penalize those who violated any confidentiality agreements, for example, by denying them or their home institution NIH funding. Models for these centers include the Inter-university Consortium for Political and Social Research (ICPSR), with its projects supported by NIH and by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), and the UK data sharing archive. The centers would relieve principal investigators of the data sharing burden mandated by NIH and place it in expert hands. However, excellence in the design of data access and control systems is likely to require intimate knowledge of each specific data resource, so data producers should be involved in the systems’ development.


As described in Chapter 4, informed consent is a complex subject involving many issues that are still being debated; the growing power of genetic analysis techniques and bioinformatics has only added to this complexity. Given the rapid pace of advances in scientific knowledge and in the technology used to analyze biological materials, it is impossible to predict what information might be gleaned from biological specimens just a few years hence; accordingly, it is impossible, even in theory, to talk about perfectly informed consent. The best one can hope for is relatively well-informed consent from a study’s participants, but knowing precisely what that means is difficult. Determining the scope of informed consent adds another layer of complexity. Will new analyses be covered under the existing consent, for example? There are no clear guidelines on such questions, yet specific details on the scope of consent will likely affect an IRB’s reaction to a study proposal.

What Individual Researchers Need to Know and Do Regarding Informed Consent

To be sure, there is a wide range of views about the practicality of providing adequate protection to participants while proceeding with the scientific enterprise, from assertions that it is simply not possible to provide adequate protection to offers of numerous procedural safeguards but no iron-clad guarantees. This report takes the latter position—that investigators should do their best to communicate adequately and accurately with participants, to provide procedural safeguards to the extent possible, and not to promise what is not possible.2 Social science researchers need to know that adding the collection of biospecimens to social science surveys changes the nature of informed consent. Informed consent for a traditional social science survey may entail little more than reading a short script over the phone and asking whether the participant is willing to continue; obtaining informed consent for the collection and use of biospecimens and biodata is generally a much more involved process.

Conclusion: Social scientists should be made aware that the process of obtaining informed consent for the use of biospecimens and biodata typically differs from social science norms.

If participants are to provide truly informed consent to taking part in any study, they must be given a certain minimum amount of information. They should be told, for example, what the purpose of the study is, how it is to be carried out, and what participants’ roles are. In addition, because of the unique risks associated with providing biospecimens, participants in a social science survey that involves the collection of such specimens should be provided with other types of information as well. In particular, they should be given details on the storage and use of the specimens that bear on those risks and that can help them decide whether to take part in the study.

Recommendation 11: In designing a consent form for the collection of biospecimens, in addition to those elements that are common to social science and biomedical research, investigators should ensure that certain other information is provided to participants:

  • how long researchers intend to retain their biospecimens and the genomic and other biodata that may be derived from them;
  • both the risks associated with genomic data and the limits of what they can reveal;
  • which other researchers will have access to their specimens, to the data derived therefrom, and to information collected in a survey questionnaire;
  • the limits on researchers’ ability to maintain confidentiality;
  • any potential limits on participants’ ability to withdraw their specimens or data from the research;
  • the penalties3 that may be imposed on researchers for various types of breaches of confidentiality; and
  • what plans have been put in place to return to them any medically relevant findings.

Researchers who fail to properly plan for and handle all of these issues before proceeding with a study are, in essence, compromising the assurances given under informed consent. The literature on informed consent emphasizes the importance of ensuring that participants understand reasonably well what they are consenting to. This understanding cannot be taken for granted, particularly as it pertains to the use of biological specimens and the data derived therefrom. While it is not possible to guarantee that participants have a complete understanding of the scientific uses of their specimens or all the possible risks of their participation, they should be able to make a relatively well-informed decision about whether to take part in the study. Thus the ability of various participants to understand the research and the informed consent process must be considered. Individuals with impairments may still be able to participate in research, if necessary through proxy consent, provided that their interests are protected.4

Recommendation 12: NIA should locate and publicize positive examples of the documentation of consent processes for the collection of biospecimens. In particular, these examples should take into account the special needs of certain individuals, such as those with sensory problems and the cognitively impaired.

Participants in a biosocial survey are likely to have different levels of comfort concerning how their biospecimens and data will be used. Some may be willing to provide only answers to questions, for example, while others may both answer questions and provide specimens. Among those who provide specimens, some may be willing for the specimens to be used only for the current study, while others may consent to their use in future studies. One effective way to deal with these different comfort levels is to offer a tiered approach to consent that allows participants to determine just how their specimens and data will be used. Tiers might include participating in the survey, providing specimens for genetic and/or nongenetic analysis in a particular study, and allowing the specimens and data to be stored for future uses (genetic and/or nongenetic). For those participants who are willing to have their specimens and data used in future studies, researchers should tell them what sort of approval will be obtained for such use. For example, an IRB may demand reconsent, in which case participants may have to be contacted again before their specimens and data can be used. Ideally, researchers should design their consent forms to avoid the possibility that an IRB will demand a costly or infeasible reconsent process.

Recommendation 13: Researchers should consider adopting a tiered approach to obtaining consent. Participants who are willing to have their specimens and data used in future studies should be informed about the process that will be used to obtain approval for such uses.
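One way to make a tiered consent design operational is to record each participant's choices explicitly and check them before any specimen or data use. The Python sketch below is a hypothetical illustration: the tiers follow the examples in the text, while the class and function names, and the rule that future uses also require a completed approval step (for example, IRB review), are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Use(Enum):
    SURVEY = "survey"
    CURRENT_NONGENETIC = "current study, nongenetic analysis"
    CURRENT_GENETIC = "current study, genetic analysis"
    FUTURE_NONGENETIC = "storage for future nongenetic use"
    FUTURE_GENETIC = "storage for future genetic use"


FUTURE_USES = {Use.FUTURE_NONGENETIC, Use.FUTURE_GENETIC}


@dataclass(frozen=True)
class ConsentRecord:
    participant_id: str
    tiers: frozenset  # the Use values this participant agreed to


def can_use(consent, use, future_approval_obtained=False):
    """True only if the participant agreed to `use`; future uses also need
    the approval process described to the participant to have been completed."""
    if use not in consent.tiers:
        return False
    if use in FUTURE_USES:
        return future_approval_obtained
    return True


p = ConsentRecord("P001", frozenset({Use.SURVEY, Use.CURRENT_NONGENETIC,
                                     Use.FUTURE_NONGENETIC}))
print(can_use(p, Use.CURRENT_GENETIC))          # False: tier not chosen
print(can_use(p, Use.FUTURE_NONGENETIC))        # False: approval pending
print(can_use(p, Use.FUTURE_NONGENETIC, True))  # True
```

Keeping the tiers as explicit data, rather than as free text on a signed form, makes it easier to show an IRB exactly which uses each participant authorized.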

What Institutions Should Do Regarding Informed Consent

Because the details of informed consent vary from study to study, individual investigators must bear ultimate responsibility for determining the details of informed consent for any particular study. Thus researchers must understand the various issues and concerns surrounding informed consent and be prepared to make decisions about the appropriate approach for their research in consultation with staff of survey organizations. These decisions should be addressed in the training of survey interviewers. As noted above, however, the issues surrounding informed consent are complex and not completely resolved, and researchers have few options for learning about informed consent as it applies to social science studies that collect biospecimens. Thus it makes sense for agencies funding this research, the Office for Human Research Protections (OHRP), or other appropriate organizations (for example, Public Responsibility in Medicine and Research [PRIM&R]) to provide opportunities for such learning, taking into account the fact that the issues arising in biosocial research do not arise in the standard informed consent situations encountered in social science research. It should also be made clear that the researchers’ institution is usually deemed (e.g., in the courts) to bear much of the responsibility for informed consent.

Recommendation 14: NIA, OHRP, and other appropriate organizations should sponsor training programs, create training modules, and hold informational workshops on informed consent for investigators; staff of survey organizations, including field staff; administrators; and members of IRBs that oversee surveys that collect social science data and biospecimens.

The Return of Medically Relevant Information

An issue related to informed consent is how much information to provide to survey participants once their biological specimens have been analyzed and, in particular, how to deal with medically relevant information that may arise from the analysis. What, for example, should a researcher do if a survey participant is found to have a genetic disease that does not appear until later in life? Should the participant be notified? Should participants be asked as part of the initial interview whether they wish to be notified about such a discovery? At this time, there are no generally agreed-upon answers to such questions, but researchers should expect to have to deal with these issues as they analyze the data derived from biological specimens.

Recommendation 15: NIH should direct investigators to formulate a plan in advance concerning the return of any medically relevant findings to survey participants and to implement that plan in the design and conduct of their informed consent procedures.


Investigators seeking IRB approval for biosocial research face a number of challenges. Few IRBs are familiar with both social and biological science; thus, investigators may find themselves trying to justify standard social science protocols to a biologically oriented IRB or explaining standard biological protocols to an IRB that is used to dealing with social science—or sometimes both. Researchers can expect these obstacles, which arise from the interdisciplinary nature of their work, to be exacerbated by a number of other factors that are characteristic of IRBs in general (see Chapter 4).

Recommendation 16: In institutions that have separate biomedical and social science IRBs, mechanisms should be created for sharing expertise during the review of biosocial protocols.5

What Individual Researchers Need to Do Regarding IRBs

Because the collection of biospecimens as part of social science surveys is still relatively unfamiliar to many IRBs, researchers planning such a study can expect their interactions with the IRB overseeing the research to involve a certain learning curve. The IRB may need extra time to become familiar and comfortable with the proposed practices of the survey, and conversely, the researchers will need time to learn what the IRB will require. Thus it will be advantageous if researchers conducting such studies plan from the beginning to devote additional time to working with their IRBs.

Recommendation 17: Investigators considering collecting biospecimens as part of a social science survey should consult with their IRBs early and often.

What Research Agencies Should Do Regarding IRBs

One way to improve the IRB process would be to give members of IRBs an opportunity to learn more about biosocial research and the risks it entails. This could be done by individual institutions, but it would be more effective if a national funding agency took the lead (see Recommendation 14).


It is the panel’s hope that its recommendations will support the incorporation of social science and biological data into empirical models, allowing researchers to better document the linkages among social, behavioral, and biological processes that affect health and other measures of well-being while avoiding or minimizing many of the challenges that may arise. Implementing these recommendations will require the combined efforts of both individual investigators and the agencies that support them.

1. See the discussion on “Choosing a Data Sharing Strategy” in Chapter 3.

2. In a few cases, it may be necessary to deceive participants about the purposes of a study (for example, in field tests of labor market discrimination), but these situations are unlikely to occur in biosocial studies. However, the Common Rule (45 CFR 46: 46.116.c.2, 46.116.d.3) explicitly permits such exceptions when they are scientifically necessary.

3. Penalties might include NIH eliminating researchers’ eligibility for funding and institutions eliminating research privileges of faculty.

4. Note that this report does not address the issue of obtaining informed consent from children.

5. Sharing expertise between biomedical and social science IRBs does not require a return to the days when there was only one IRB at each institution, a situation that still exists at many small institutions. For example, the Social and Behavioral Science IRB at the University of Wisconsin, Madison, has asked a geneticist to serve as an ex officio member of the IRB when it considers protocols that use genetic data.




Copyright © 2010, National Academy of Sciences.
Bookshelf ID: NBK50736