NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Research Council (US) Panel on Collecting, Storing, Accessing, and Protecting Biological Specimens and Biodata in Social Surveys; Hauser RM, Weinstein M, Pool R, et al., editors. Conducting Biosocial Surveys: Collecting, Storing, Accessing, and Protecting Biospecimens and Biodata. Washington (DC): National Academies Press (US); 2010.

Conducting Biosocial Surveys: Collecting, Storing, Accessing, and Protecting Biospecimens and Biodata.


Rapidly developing technology has made it increasingly feasible and attractive for researchers to collect blood and other biological specimens in nonclinical settings. As a result, those who conduct multipurpose household surveys have become increasingly interested in collecting various types of biospecimens along with responses to the more familiar social and behavioral questions (see, for example, National Research Council, 2008). Doing so enables researchers to extend their standard analyses of social and behavioral measures by integrating various biomarkers into their theoretical frameworks and empirical models. This practice of collecting biological specimens along with the traditional social and behavioral data promises a variety of benefits with respect to the sorts of questions that can be answered and the types of connections that can be explored, but it also adds a great deal of complexity—and cost—to the investigator’s task. Although social scientists have long had to be concerned about such things as informed consent, privacy, collection and storage issues, and data sharing, the addition of biospecimens to their studies creates new issues and casts old issues in a new light.

Social science researchers wishing to collect biospecimens must address a wide variety of additional legal, ethical, and social issues, as well as a number of practical issues related to the storage, retrieval, and sharing of data. For example, deriving biological data from biospecimens and linking them to social science databases adds considerable effort and costs associated with developing a biorepository, establishing data sharing policies, implementing an increasingly complex informed consent process, establishing an additional process for reviewing how the biodata are going to be shared and used for secondary analysis, executing material transfer agreements, dealing with intellectual property issues, and navigating a more complex process for obtaining Institutional Review Board (IRB) approval that encompasses both human subjects protection and biosafety compliance. (Box 1-1 presents the panel’s definitions of some key terms used in this report that need to be clearly distinguished in the context of this study.) Researchers also must consider what steps are necessary to protect the confidentiality of participants, especially when data obtained from biospecimens are uniquely identifying. Finally, a number of questions must be answered about what happens to the biospecimens beyond the life of the particular investigation: Will the biospecimens be stored? If so, who will be allowed to use them? What permissions will be necessary? Who owns the biospecimens? Who can discard them? How long will they be retained? Can subjects demand their destruction? Does this include destruction of any biodata derived from them? Can the specimens and data be used for purposes other than those specified in the subjects’ original consent? Will the investigator report back to subjects on findings with health implications for them or their family? Will the investigator contact other family members? What are the limits on the investigator’s ability to maintain confidentiality? Are there circumstances in which biological specimens may be acquired for research purposes without consent?

BOX 1-1

A Note on Terminology Used in This Report. The terms “biospecimens,” “biomarkers,” and “biodata” are sometimes used interchangeably, and researchers should be aware that these terms can have different meanings (more...)


The ability to collect biospecimens along with social survey data opens up a wide range of research opportunities. It becomes possible, for example, to estimate the distribution of a particular genetic variant within a representative sample of the general population and to correlate genetic variations with differences in human phenotypes. It also becomes possible to use the biodata derived from biospecimens to verify certain responses to survey questions, such as influenza exposure or infection with a sexually transmitted disease. But the potentially most far-reaching applications result from combining genetic and other biological data with data on social and environmental factors. The collection of biological specimens in population surveys that also collect data on socioeconomic, demographic, behavioral, physical health, and psychosocial factors opens up new avenues for research and may allow researchers to build integrated biosocial models of various biological and social phenomena.

For example, by combining biological and social survey data, it may be possible to document the linkages among social, behavioral, and biological processes that affect health and various other measures of well-being. To the extent that biomarkers reflect health, one can examine the effects of social factors on health or look at how health affects social status and social inequality. The ability to examine genetic data in conjunction with environmental and phenotypic data offers an important opportunity to study gene–environment interactions. It is now widely recognized that phenotypes are generally the product of an interplay between genetic and environmental factors; the availability of individual-level genetic and environmental information should make it possible to study this interplay in much greater detail than has previously been possible. Researchers could use such surveys to study the genetic determinants of longevity, for example, or to examine the association between genetically determined low monoamine oxidase levels and violent behavior and to learn whether that association is affected by whether the subjects were abused as children (Huizinga et al., 2006; Widom and Brzustowicz, 2006). Researchers could also examine the relationship between measures of life stress and the length of telomeres at the ends of chromosomes that serve as a biomarker of a cell’s biological (versus chronological) age (Epel et al., 2004).1

At the same time, not every social science survey will benefit from collecting biospecimens, and the significant costs involved must be weighed against the benefits in deciding whether to do so. The collection of biospecimens should be integral to the study design and the hypotheses being tested, rather than being tacked on to the study just because it can be done. Indeed, the collection of biospecimens may even detract from the principal mission of a survey. It might, for example, be so expensive and time-consuming that it would lessen the survey’s effectiveness. Certainly, as noted above, the collection of biological samples will necessitate the expenditure of resources for storage, data sharing, and other purposes. Furthermore, the collection of biospecimens imposes a burden on participants as well as investigators and could conceivably affect contemporaneous and subsequent response rates. It is important to recognize, moreover, that in many cases, the potential benefits of biodata, particularly genetic data, are not altogether clear. This point is illustrated by genome-wide association studies (GWASs) focused on the linkages between single nucleotide polymorphisms (SNPs)2 and common diseases such as diabetes and cancer. The high level of enthusiasm for such studies has been tempered by the finding that SNPs account for only a small percentage of the genetic risk for these diseases (Dickson et al., 2010; Wade, 2010). On the other hand, biobanking (the collection, storage, processing, and distribution of biological specimens) makes it more likely that biospecimens collected as part of a survey will have a valuable payoff, even if it is one that cannot be predicted when the specimens are collected. And while the specimens themselves are depletable, the biodata derived from them have potentially limitless uses. The trade-offs remain complex here as well, however, since the use of biospecimens and biodata for purposes other than the original research raises issues related to informed consent (see Chapter 4).


Today many surveys sponsored by the National Institute on Aging (NIA) and other federal agencies either collect biological specimens such as blood, saliva, urine, and buccal swabs or plan to do so in the near future. As discussed above, these specimens yield population-representative data from nonclinical samples that can be used for a variety of purposes, including calibrating self-reports of health and exploring new pathways and causal linkages between biological and social variables. Some of these data are also being banked with the intention of using them in the future for as yet unspecified purposes. Surveys that have collected or that currently collect biospecimens include the Dynamics of Health, Aging, and Body Composition Study (blood and saliva), the Framingham Heart Study (blood), the Health and Retirement Study (blood specimens and buccal swabs), the National Health and Nutrition Examination Survey (NHANES) (blood, urine, hair, and buccal swabs, although there is some variation year to year), the National Longitudinal Study of Adolescent Health (blood, saliva, and urine specimens), the National Long Term Care Survey (blood and buccal specimens), the NIA Alzheimer’s Initiative (blood specimens and autopsy tissues), the Study of Women’s Health Across the Nation (SWAN) (blood and urine specimens), and the Wisconsin Longitudinal Study (DNA). DNA amplification has been carried out on the biological specimens collected in some of these surveys, such as NHANES and SWAN, and could be performed on specimens collected in some of the other surveys as well. As a result of these efforts, a great deal of work has been done to develop policies and guidelines for the acquisition, collection, storage, and use of biological specimens, and this report draws on that experience. At the same time, these are complex subjects, and many questions remain unanswered. Even when the challenges are similar to those familiar to biomedical researchers, they will be new to most social scientists engaging in biosocial research.


Public Attitudes Toward the Collection of Biospecimens

Because the collection of biospecimens as part of social surveys depends on the willingness of individuals to contribute them, public attitudes and perceptions play an important role in the success of such efforts, and researchers must take these attitudes into account. Some authors claim that the public perceives biospecimens and the data derived therefrom to be significantly different from the traditional demographic, social, and economic data collected in surveys (see, for example, Greely, 2009). The former can sometimes be seen as more “objective” or “real” and thus potentially more powerful. They can also be perceived as being more hidden or secret because they can reveal things that cannot be known in any other way—things that those contributing the specimens may themselves not know, such as whether they possess a biomarker that is related to the probability of developing a certain disease. Thus they can be seen as more worthy of being protected or kept secret, in line with the strong tradition of keeping health information private (Greely, 2009).

Whether any of these public perceptions are well grounded in reality is another matter. A great deal depends on the types of biospecimens being collected. For example, survey participants may be more sensitive about sharing their earnings history than about providing a saliva sample. It is clear, however, that survey participants have less control over what is revealed through biospecimens than through traditional survey responses. A participant can refuse to answer (or lie about) inquiries concerning, say, sexual history or income, but cannot prevent a blood sample from revealing the presence of a sexually transmitted disease or a DNA sample from indicating the existence of a genetic condition or predisposition.

In discussing public attitudes and perceptions toward the collection of biospecimens, it may be useful to talk about a gradient of sensitivity with respect to confidentiality. Some biological measures derived from biospecimens vary sufficiently across time that they do not raise the risk of reidentification. Some biological measures derived from biospecimens, such as cholesterol level, pose no more (or less) of a problem for confidentiality protection than many socioeconomic measures, while others, such as indications of illicit drug use or HIV or other disease status or genetic measures (such as a DNA sequence), may raise far more difficult issues of confidentiality and privacy protection. The potential harms from a confidentiality breach are significant because such measures, once associated with a specific person, may not only stigmatize the individual but also be used against him or her with regard to employment or in some other way. Moreover, genetic measures on one individual in a family may reveal characteristics of other family members. Additionally, the development of databases of genetic specimens that are stored for long periods increases not only the research potential but also the potential risks as new knowledge is discovered about genetic associations with health and behavior. Further increasing both the potential for innovative research and the potential for breach of confidentiality is the growing practice of linking survey records with administrative records, such as Social Security benefits and Medicare claim files.

It is also worth noting that people’s attitudes and behaviors with regard to biospecimens are often inconsistent. People frequently report, for example, that they worry about the release of their genetic information because insurance companies might use it to discriminate, even though this has been expressly prohibited by law since the passage of the 2008 Genetic Information Nondiscrimination Act (GINA). At the same time, some measures that people do not consider sensitive and share readily, such as cholesterol levels, are at least as determinative of future serious disease as are genes, given the present state of knowledge.

Findings from Opinion Surveys

Americans differ in their attitudes toward the collection of biospecimens and their willingness to participate in surveys that collect them (Westin, 2008). Greater knowledge generally leads to more favorable attitudes, and people are more likely to participate when they understand the importance of the research. Westin (2008) argues that the general public can be categorized as falling into one of three groups—(1) “privacy intense,” (2) “privacy unconcerned,” or (3) “privacy pragmatists.” He argues that approximately 25 to 35 percent of the population can be characterized as “privacy intense”: they are skeptical about the motives and interests of government and business, they consider privacy to be extraordinarily important, they believe that the risks of their information being disclosed are very high, and they tend to be skeptical of the benefits.

In surveys, Americans consistently express the view that medical and health information is the most sensitive personal information. Furthermore, although Americans tend to trust doctors and health care providers with this information, they worry about third parties, such as insurance companies or employers, obtaining it. Not surprisingly, people with health problems are the most sensitive in this regard.

A large majority of the public—78 percent in a survey Westin performed for the Institute of Medicine (Westin, 2007)—say they are interested in health research, and three-quarters of the public believe health research is very important for society. As part of the survey, respondents were asked about participating in research that would require access to their medical records and other health information: How willing would they be to participate in such a study, and would they demand full disclosure of the study before they gave consent? Thirteen percent of respondents said they would not want to be contacted under any circumstances, and they would not even want to talk with somebody about participating. One percent said that they would always be willing to take part in any such study and that they did not even have to be asked for their consent. Eight percent said they would be willing to give general consent in advance if they were asked by the institution that held their medical records or health information. Another 19 percent said they would agree if they were given assurance that their identity would not be revealed and that an IRB would administer the study. And 38 percent—the largest single group—said they would have to have the research described to them each time so they could decide whether to participate. Thus a total of 57 percent would agree to having their information used if certain privacy-oriented conditions were met.
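The arithmetic behind the 57 percent figure can be checked with a short tally. The sketch below is illustrative only: the dictionary keys are shorthand labels invented here, and it assumes that the 57 percent total combines the 19 percent and 38 percent groups, which the passage implies but does not state outright.

```python
# Response categories as quoted in the passage (percent of respondents).
# The key names are informal labels chosen for this sketch.
responses = {
    "no contact under any circumstances": 13,
    "always willing, no consent needed": 1,
    "general consent in advance": 8,
    "agree with anonymity and IRB oversight": 19,
    "decide study by study": 38,
}

# Assumption: the "privacy-oriented conditions" total is the sum of the
# last two groups.
conditional = (
    responses["agree with anonymity and IRB oversight"]
    + responses["decide study by study"]
)

accounted = sum(responses.values())   # share covered by the five categories
unreported = 100 - accounted          # share not broken out in the passage

print(conditional, accounted, unreported)
```

Under that assumption the five reported categories sum to 79 percent, so about 21 percent of respondents are not broken out in the figures quoted above.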

Observed Participation Rates in Social Surveys Collecting Biospecimens

Evidence from a number of different social surveys provides a sounder basis than opinion surveys for assessing people’s willingness to participate in social science research that includes the collection of biospecimens. Most social surveys report higher rates of willingness to participate in such research than are perhaps suggested by opinion surveys, although these rates vary depending on the method of data collection and the health, age, and social characteristics of the subjects. Marmot and Steptoe (2008), for example, report on the experience of the Whitehall II study and the English Longitudinal Study of Ageing (ELSA) in the United Kingdom. Data collection in these surveys involved both face-to-face contact with participants and the assessment of physical measures, including blood sampling, and the authors were concerned that participants might find the test sessions too long or burdensome. On the other hand, participants could benefit from the periodic medical screening sessions, which might reveal health problems that would otherwise have gone undetected. The authors report only a 16 percent loss of participants between the baseline survey and the first clinical follow-up. Involving participants in more intensive investigations did not deter them from taking part or lead to widespread sample attrition. Rather, participants in the more intensively studied group were more likely to remain involved in the study. The authors hypothesize that many participants found these more intensive studies to be intrinsically interesting and that they derived from the study detailed clinical information that would help them monitor their health status (Marmot and Steptoe, 2008).

Lindau and colleagues (2009) report participation rates in a nationally representative probability survey of 1,550 community-residing women aged 57–85 conducted in 2005 and 2006. All 1,550 female respondents in the study were asked to provide a self-administered vaginal swab specimen midway through the interview; 1,028 agreed to do so. Hauser and Weir (in press) report a similar response rate (approximately 65 percent) for saliva collection by mail for DNA analysis in the Wisconsin Longitudinal Study.
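As a quick check on how the two participation rates compare, the swab figures can be converted to a percentage. This is a minimal sketch using only the counts quoted above; the function name is hypothetical.

```python
def participation_rate(agreed: int, asked: int) -> float:
    """Return the share of asked respondents who agreed, as a percentage."""
    if asked <= 0:
        raise ValueError("asked must be positive")
    return 100.0 * agreed / asked


# Counts reported by Lindau and colleagues (2009) for the vaginal swab request.
swab_rate = participation_rate(agreed=1028, asked=1550)
print(f"{swab_rate:.1f} percent")
```

The result, roughly 66 percent, is close to the approximately 65 percent reported for saliva collection in the Wisconsin Longitudinal Study.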


Survey researchers intending to collect biospecimens must grapple with a number of issues, many of which will be unfamiliar to them. The majority of these issues can be grouped into three broad areas: (1) the collection and storage of biospecimens, (2) sharing of biospecimens and the data collected therefrom, and (3) informed consent. In each of these areas there are concerns that must be addressed, questions that must be answered, and policies that must be devised if the benefits of collecting the biospecimens are to be fully realized while the interests of research participants are protected.

Concerning the collection and storage of biospecimens, for example, what precautions should be taken in collecting the specimens from survey participants? What considerations should factor into an investigator’s choice of a storage facility in which to maintain the specimens from a survey? What should the policies be for sharing those specimens?

With respect to the sharing of biospecimens, what are the risks to confidentiality in sharing specimens or the data derived from them? How can those risks be minimized while the usefulness of the data is maximized? What are the advantages and disadvantages of restricting access to the data versus restricting the data themselves?

Concerning informed consent, there are a great many uncertainties and controversies: How does one arrange for informed consent for specimens and data to be used in some future unspecified research project? How should researchers handle situations in which an analysis of data has revealed significant health information about a participant—information that may not be known to the participant? What happens when a study participant withdraws his or her consent? What should be included in an informed consent form for a social science survey that will include the collection of biospecimens? How should one deal with IRBs when submitting a proposal to conduct this sort of study?

The following chapters address each of these issues in detail.


To address the issues outlined above, in 2008 NIA’s Behavioral and Social Research (BSR) Program asked the National Academies to convene an ad hoc panel of experts for the purpose of identifying best practices with respect to collecting, storing, protecting, and accessing biospecimens collected in social science surveys and the biodata derived therefrom. It is worth stating at the outset that these issues are not new: the research community is familiar with the challenge of reconciling the benefits of providing wider access to research data with the resulting increased risk of a breach of confidentiality. Several previous National Research Council (NRC) reports have addressed aspects of the subject (see, for example, National Research Council, 1993, 2005). However, these issues have not been sufficiently examined in the context of biosocial surveys that collect both biospecimens and typical social science data, a discussion that is becoming increasingly salient. BSR is continuing to develop a portfolio of new research directions linking social and behavioral research with data on genetics and genomics. Several large longitudinal data collection efforts funded by BSR (e.g., the Health and Retirement Study and the Wisconsin Longitudinal Study) are now collecting various types of biospecimens. In other cases, plans for collecting new biospecimens are currently under way. For many BSR-supported researchers, the procedures and protocols surrounding the collection, storage, and sharing of biospecimens are new. Furthermore, ongoing advances in bioinformatics (see, for example, Homer et al., 2008) have raised issues of confidentiality and security that have prompted BSR to review its procedures with respect to data sharing.

The 10-member panel that conducted this study was appointed under the auspices of the Committee on National Statistics and the Committee on Population of the National Academies. Its members included leading experts in social, behavioral, genetic, ethics, and genomic studies who were familiar with the wide range of issues involved. The panel was charged with preparing a report that would address these issues and provide recommendations for best practices, procedures, and guidance for funding agencies, IRBs, and researchers.

To accomplish its task, the panel organized a public workshop as a means of interacting with other leading scientists engaged in (or considering) the collection of biospecimens. (The workshop agenda is presented in Appendix A, while the participants are listed in Appendix B.) The workshop discussions were designed to explore issues related to informed consent, data collection, confidentiality protection, data archiving, and data access for multipurpose population surveys that collect biological specimens and measures in addition to socioeconomic/demographic, behavioral/lifestyle, and physical and mental health measures. Specifically, the panel was tasked to review the following issues, with particular reference to surveys sponsored by NIA:

  • information that should be provided to survey respondents for informed consent and how the language of consent forms affects people’s willingness to participate in surveys;
  • methods for collecting and processing genetic and biological specimens and measures to minimize the burden on respondents, maximize research potential, and protect confidentiality and privacy;
  • relevant laws, regulations, and policies, including the Common Rule for Protection of Human Subjects, the Confidential Information Protection and Statistical Efficiency Act of 2002, the 2002 regulations issued under the Health Insurance Portability and Accountability Act of 1996, and relevant National Institutes of Health policies on data sharing, certificates of confidentiality, and related topics, including the repository for genomewide association studies;
  • factors for IRBs to consider in reviewing requests for the collection of biological specimens and measures in surveys;
  • the risks of and evidence for actual misuse of biological specimens and measures in surveys;
  • whether and which statistical techniques can be used to make genetic and other biological measures anonymous in microdata files while preserving their utility for research;
  • the costs and benefits of alternative systems for archiving genetic and other biological specimens and measures derived from population surveys to permit later research use while protecting confidentiality; and
  • the costs and benefits of alternative forms of access to microdata containing genetic and biological measures, such as secure research data centers and licensing.

A word about this statement of task and the scope of this study is in order. Although the statement of task mentions both biological specimens and measures, the panel chose to focus this report on the former, for two reasons. First, as will be clear from the discussion in the following chapters, unique issues of collection, storage, and sharing and of informed consent and confidentiality are associated with biospecimens that do not arise with respect to biological measures such as the taking of height and weight or blood pressure. Second, whereas the data derived from measures are well defined and finite in scope, a wide-ranging and potentially limitless set of data can be derived from biospecimens, further complicating the issues that must be addressed.

The panel also wishes to emphasize that the starting point for this study is a decision that the benefits of collecting biospecimens as part of a social science survey outweigh the costs noted earlier. This study does not address the calculus that factors into the decision about whether to collect biospecimens as part of a social science survey. The panel emphasizes, however, that the trade-offs involved are complex, not least because, as noted above, the potential benefits are as yet not fully understood.

To carry out its charge, the full panel met four times. In March 2008 the panel met to discuss its statement of task with the sponsor; to review prior work; and to identify critical themes and speakers for the public workshop, which was held in November 2008. Following the workshop, the panel met three more times (November 2008, June 2009, and August 2009) to discuss the presentations and the rich interactive discussions that had occurred at the workshop, to deliberate, and to outline this report. The report is based on the deliberations of the panel as informed by the workshop but is the product of the panel, not merely an account of the workshop.


The remainder of this report presents the panel’s findings, conclusions, and recommendations. Chapter 2 deals with issues concerning the collection, storage, use, and distribution of biological data, including issues of custodianship and ownership. Chapter 3 reviews issues related to confidentiality and data sharing, including deidentification and other approaches to preserving the privacy of participants. Chapter 4 addresses issues related to informed consent, including biobanking, the use of blanket consent, and the role of IRBs. Finally, Chapter 5 offers the panel’s recommendations for practices and procedures that can best facilitate research and protect participants as the collection of biospecimens in social science surveys moves forward over the next 5 to 10 years.



1. Telomeres are repetitive DNA-protein complexes at the ends of chromosomes that protect the chromosomes from deterioration. Recent research points to the crucial role of telomeres in cellular aging. See Aubert and Lansdorp (2008) for a recent review.


2. A single nucleotide polymorphism (SNP) is a single-base variation in the genetic code, the most common form of polymorphism.

Copyright © 2010, National Academy of Sciences.
Bookshelf ID: NBK50727

