
Institute of Medicine (US); Grossmann C, Powers B, McGinnis JM, editors. Digital Infrastructure for the Learning Health System: The Foundation for Continuous Improvement in Health and Health Care: Workshop Series Summary. Washington (DC): National Academies Press (US); 2011.




5. Weaving a Strong Trust Fabric


Building trust among all stakeholders of the digital infrastructure—in particular the patient population—is vital to progress and constitutes the focus of this chapter. Included are considerations of the most effective ways to engage stakeholders through demonstration of the value of health information exchange in improving outcomes and efficiency, building confidence in security and privacy safeguards, and examining the learning health system–specific challenges posed in these areas. Examinations range from a focus on the sociotechnical components of privacy and the risk–benefit calculation in health information exchange to technical approaches to ensuring data privacy and security.

Edward Shortliffe of the American Medical Informatics Association addresses the need to build a strong fabric of trust among stakeholders by communicating and demonstrating value. Dr. Shortliffe states that for health information technology (HIT) to meet its full potential, patients and providers must feel secure in their participation. This sense of security depends on an appreciation of the value of the HIT in use, as well as on the creation and maintenance of proper privacy and security safeguards. Sharing a personal anecdote about a provider who admitted that only patient demand would motivate him to adopt an electronic health record (EHR) system, Dr. Shortliffe observes that sufficient patient demand could even obviate the need for federal incentives. Using electronic banking as an example, he suggests that educational programs are necessary to inform stakeholders about the risks and benefits of EHRs, and predicts that once an environment of trust is established, the value of increased convenience and quality offered by EHRs and data sharing will overcome concerns about privacy. Currently, however, the risks of adopting an EHR system are better understood and more widely communicated than the benefits, so the focus of stakeholder engagement going forward should be on communicating those benefits—most importantly, better care and lower costs.

The implementation of fair information practices to ensure privacy and security is the focus of the Center for Democracy and Technology’s Deven McGraw. Citing surveys showing that while individuals desire electronic access to their health information, they have significant privacy concerns, she suggests that providing individuals with meaningful choices around privacy is an important approach to addressing these concerns. Ms. McGraw points to a comprehensive approach to patient privacy and data security based on the Markle Common Framework for Secure and Private Health Information Exchange. Key elements of the framework include an open and transparent process, specification of purpose, individual participation and control, and accountability and oversight. Closing with a warning that overreliance on consent leads to weak protection—shifting the burden of privacy protection to the individual—and that existing regulations are insufficient to cover the emerging issues of a learning health system, she notes the need for a trust fabric based on fair information practices.

Since its passage in 1996, and through recent modifications, the Health Insurance Portability and Accountability Act (HIPAA) has served as the legal and policy framework for health information privacy. Bradley Malin of Vanderbilt University describes the current state of play around health data de-identification and highlights some of the relevant learning health system–related issues posed by HIPAA. Included among these are identity resolution while maintaining privacy, and concern that de-identification could cause modifications to patient information that influence the meaning of clinical evidence. He asserts, however, that most of these challenges are not insurmountable, and that efforts to quantify risk are an important first step to mitigation. Dr. Malin suggests that use cases that better define health information uses, and progress in the area of distributed query-based research, will be important in progressing toward a privacy-assured learning health system.

Ian Foster of Argonne National Laboratory addresses the technical components surrounding trust in the digital infrastructure for the learning health system. Dr. Foster lays out a number of challenges facing the establishment of a secure digital platform. He points to the fact that a learning health system requires data sharing on an unprecedented scale, and that the purpose of this sharing must extend beyond individual patient care to include research and population health. Identifying the challenge as one of a highly complex system with an unclear definition of security, Dr. Foster suggests some basic principles and technology solutions that can form a basis for progress: auditability (information can be mapped to an individual and data can be mapped to its origin); scalability; and transparency in terms of data usage, policies, and enforcement. Methods to achieve these principles include attribute-based authorization, distributed attribute management, and end-to-end (scalable) security.



Edward Shortliffe, M.D., Ph.D.


American Medical Informatics Association

There is a widely acknowledged need for individuals to trust the use of EHRs in the management of their health and health care. People must believe that their personal data are being protected, and used consistently in their best interest. Formal studies in scientific journals that document the positive influences of electronic records on quality, safety, and efficiency—typically poorly communicated to the lay public—will not counter a deep concern that individual privacy can be compromised or that personal data will be used for nefarious purposes. Thus all the laudable goals we seek with the use of health information technology (HIT) that are under discussion at this workshop are dependent on a “fabric of trust”—the willingness of individuals and, by extension, society to contribute personal data and clinical experiences to the development of a learning healthcare system.

Individuals in the healthcare community bring a deep understanding of the health policy, financing, and quality issues that can be enhanced by the empowering use and effective implementation of HIT. We see strong advantages to society in the use of electronic health records (EHRs) and their adaptation to support a learning health system. Yet the individuals in our communities—and I fear this includes many members of the media—have a limited understanding of such issues and would find most of our work difficult to follow. What they can easily understand, however, are news stories that emphasize the way in which EHRs may threaten their privacy, the confidentiality of personal data, and general security issues (such as lost or stolen laptop computers containing private medical data regarding thousands of patients). We need to understand that the public’s support for EHRs depends on their sense that their care is improved or their life is simplified when their provider uses the technology. The public needs to believe that all prudent measures are being taken to ensure that their personal data are protected from loss or inappropriate access.

Anecdotal Evidence of the Current Challenges

Like everyone else attending this workshop, I am a patient as well as a health professional. Long ago I made the personal decision, based on my understanding of the trade-offs, that I would greatly prefer to be cared for by a health system and by individual clinicians who had embraced the use of EHRs. When I recently moved to a new city and had to identify a primary care provider, I decided to rule out any physician or provider organization that lacked the infrastructure or philosophy that would allow me to communicate through e-mail with my physician and his office staff. Frustrated by my recent experience in another city, I swore that I would never again subject myself to a healthcare environment or physician who had not adopted modern electronic means of communication, data management, and information dissemination. I wanted to be sure it would be simple for me to book appointments online, to request prescription refills, to check lab results, and to review other aspects of my personal record. I also wanted to have reasonable faith in the authentication and authorization procedures that were in place before I or others could access my information online. I recognize that I am an early adopter of new information technologies by nature, but as I looked at the plethora of smart phones, Facebook pages, and laptops in airport security lines that surround me every day, I suspected that I was not alone in using such “digital literacy” criteria to guide my choice of physician and healthcare system. I have subsequently been pleased to find a suitably rigorous, electronically sophisticated physician and healthcare environment in my new city and realize that I personally associate such capabilities with quality of care, safety, and cost containment. Furthermore, I have minimal fear that my personal data are being indiscriminately accessed by others or being handled in ways that would make it easy for them to be lost or stolen.

It is natural to ask whether I am typical of patients with regard to my search for a physician who chooses to use EHRs. One indication that I am atypical was the conversation that I had with my previous physician when I asked him whether he had any plans to automate the practice in which he worked. He was surprised that any patient cared about such an esoteric topic. He told me that I was the first patient who had ever queried him on the matter, asserting that there was no demand from patients for him to use an EHR. Additionally, he was personally uninterested in the expense or the retraining that would be required. He noted that he would be retiring in 6–8 years and asked why he should go through this kind of transformation at the very end of his career. He had no interest in using an EHR and did not care what incentives were being offered by the government.

He did acknowledge that if all his patients were telling him that they really cared about automating the office, accepting e-mail, and providing EHR access for patients, then he might feel differently about the topic. One wonders whether federal incentives and the meaningful use criteria would have even been necessary had the average citizen been enamored of EHRs and warned their doctors that they would change providers if the practice did not implement electronic records. Under the current circumstances, however, he viewed the CMS incentives as a Washington conspiracy to force unproven technology upon him and his patients.

Public Use of HIT

Conversations with others have convinced me that my former physician is not atypical but that I, as a patient requesting that my providers use an EHR, am quite unusual. Seeking to better understand the public’s attitudes toward EHRs, I was fascinated to come across a recent book that provides extensive survey data about the public and their access to and use of electronically available health information. Written by researchers at the Brookings Institution and Brown University, Digital Medicine summarizes and interprets the results of many national e-health public opinion surveys. The emphasis is not on the technology per se but on current trends in adoption, acceptance, and pursuit of e-health solutions. Documenting relatively low use of information technology for health purposes by certain segments of society, the authors make a motivating argument that “in order to achieve the promise of health information technology, digital medicine must overcome the barriers created by political divisions, fragmented jurisdiction, the digital divide, the cost of technology, ethical conflicts, and privacy concerns” (West and Miller, 2009). I have described this volume in more detail elsewhere, noting that education—both of the public and of current and future health professionals—is viewed as a key element in any solution. There is evidence that this issue has been too often overlooked when others have assessed approaches to making better use of information technology in health care (Shortliffe, 2010). Given the economic determinants of e-health use and the digital divide, low-cost technologies and improved access through publicly available means continue to be key requirements.

Yet public familiarity with technology, and personal use of information resources in managing one’s own health care, is not the same as having a society that understands and supports the use of EHRs by physicians and other health professionals. If we need educational programs to enhance the public’s capabilities in the use of the electronic media for accessing health information, we also need to help them understand the risks and benefits of EHR use.

The Value Proposition: Convenience vs. Risk

I believe that convenience, quality, and perceived value of EHRs will trump concerns about privacy or other risks—but only if there is a climate of trust. The financial system has helped to demonstrate this social phenomenon to us. Consider, for example, the use of one ubiquitous financial technology, the automated teller machine (ATM). When ATMs were introduced, it rapidly became obvious to the public that there were huge advantages in using these machines rather than relying on the traditional interaction with a bank teller or the use of travelers’ checks. We all know there are risks associated with electronic banking and ATMs—fraud, stolen PINs, lost cards, and the like—but convenience and universal access to one’s funds have clearly outweighed those concerns. In fact, individuals are even willing to pay for the convenience of an ATM, given the surcharges that are typically absorbed by the user. We perceive the value to be high, and the risks to be low—and most banks have explicit assurances about maximum losses in the case of documented fraud or theft. There is a climate of trust that, on balance, our funds are protected by the system with which we choose to interact.

But the acceptance of such trade-offs in the use of electronic banking clearly requires that the public appreciate the positive value of the innovation offered to them. The value proposition for EHR use is much less well understood by the public, and what they do know has tended to focus more on potential negatives (loss of privacy, government intrusion, etc.) rather than the benefits. Stories about threats to the safety and confidentiality of online health data have tended to dominate in the press; even when most organizations are taking measures to protect against the described threats, the public largely focuses on the negatives.

Engaging the Public

In educating the public about the ways in which the use of EHRs can be positive, the emphasis needs to be on aspects of their implementation that create a sense of value for individual patients or their families. The greater good—for public health, research, or a learning health system—must be viewed as secondary. Since we know that patients tend to trust their own doctors, one crucial source of trust in the health system is the individual’s own physician. Thus, there is an important potential interaction between physicians and their patients that can help to inform the public about the clinical value of EHRs, and to assist in the creation of a climate of trust. That outcome, of course, requires that physicians themselves perceive the value of EHRs and believe that it outweighs the costs associated with adoption.

We know that the public appeal of EHRs will grow when they are viewed as convenient for patients, empowering them as partners in their own management, and providing a way to deal with the opacity of traditional healthcare interactions. Their consent for data use—and the subsequent steps toward a learning health system—will follow if there is a strong trust in the data stewardship that occurs when EHR data are shared, anonymized, pooled, and reused.



Deven McGraw, J.D.


Center for Democracy and Technology

Health information technology (HIT) and electronic health information exchange are engines of health reform and have tremendous potential to improve health, reduce costs, and empower patients. While some progress has been made on resolving the privacy and security issues raised by e-health, significant gaps remain and implementation challenges loom.

Many surveys show that people want to have electronic access to their health information, but these same surveys also demonstrate that people have significant privacy concerns about how their data will be used and protected. For example, a 2005 study by the California HealthCare Foundation revealed that a majority of the respondents (67%) have significant concerns about the privacy of their medical records (CHCF, 2005). More recent surveys by the Agency for Healthcare Research and Quality confirm these findings (AHRQ, 2009).

While most people acknowledge the importance of ensuring patient privacy in health information systems, many assume that providing a simple “opt-in” or “opt-out” option fully addresses the issue. Providing individuals with some meaningful choices is an integral part of any privacy system, but relying solely on a check box or blanket consent will not allay consumer fears or, more importantly, provide adequate safeguards against misuse of patient data.

The consequences of not ensuring privacy adequately can include failing to collect complete or adequate patient data. Without privacy protections, people may engage in “privacy-protective behaviors” to avoid having their information used inappropriately. A 2007 Harris Interactive survey revealed that one in six adults withhold information from providers due to privacy concerns (Harris Interactive, 2007). The frequency increases among people with poor health and among racial and ethnic minorities who report higher levels of concern and are more likely to engage in privacy-protective behaviors (CHCF, 2005).

A Comprehensive Strategy for Fair Information Practices

To counter these tendencies and to facilitate the collection of the most complete patient data possible, a comprehensive approach to patient privacy and data security is needed. It is important to note that privacy and security protections are not themselves obstacles to achieving these goals. Rather, enhanced privacy and security can enable higher levels of patient participation in health data collection and facilitate HIT and health information exchange.

The core elements of such a comprehensive strategy include commonly used fair information practices, such as those articulated in the Markle Common Framework for Secure and Private Health Information Exchange (Markle Foundation, 2006). The principles outlined seem so straightforward that, based on common sense, it would seem that everyone employs them. Unfortunately, this is often not the case. A serious application of these practices should therefore serve as the linchpin of building a trusted information-sharing infrastructure.

Some of the key elements of fair information practices include: openness and transparency, purpose specification and minimization, collection and data use limitation, individual participation and control, data integrity and quality, security safeguards and controls, accountability and oversight, and remedies. Perhaps the most important element of a comprehensive approach is to develop an open and transparent process. Taking the time to educate patients about the purpose, uses, and goals of collecting their health information can go a long way toward building public trust. Such openness and transparency can reap higher rewards than simply presenting a consent form with little or no explanation and a vague guarantee of security and privacy.

Some elements of this framework are reflected in the Health Insurance Portability and Accountability Act (HIPAA) privacy and security rules, which provide important baseline protections for patient information. The recent rules added by the Health Information Technology for Economic and Clinical Health Act offer improvements, but existing regulations remain insufficient to cover all of the emerging issues in this new and rapidly evolving environment. For instance, there are now many entities involved in the health information infrastructure that are not covered by HIPAA and other federal regulations. There is also still some ambiguity about the roles, rights, and responsibilities of the various entities involved. For example, a prominent finding in the IOM study on HIPAA and medical research indicates that lack of clarity in the rules and their inconsistent interpretation often pose as much of an obstacle to research as the rules themselves (IOM, 2009).

Limitation of the Informed Consent Model

In this approach, consent is still important but, as noted, is only one element of a comprehensive strategy. Indeed, it may not even be the most important component, since overreliance on consent provides weak privacy protection in practice (CDT, 2009): it shifts the burden of privacy protection to the individual rather than requiring data holders to be good stewards of the patient information they use and maintain. The evidence is clear that individuals pay little attention to consent forms and too often do not understand the full implications of what they have agreed to.

To ensure the highest level of privacy and security, we need fair information best practices to govern the digital infrastructure for a learning health system. Individual participation and control (consent) should play a role, but other principles (transparency; data minimization; collection, use, and disclosure limitations; accountability; and oversight) are equally important in building trust.



Bradley Malin, Ph.D.


Vanderbilt University

In order to function efficiently and effectively, a learning health system requires reliable access to several critical pieces of information. First, it needs to be informed through knowledge that is derived from the healthcare system. This information must flow continually, so that the system can be updated through current patient experiences. The importance of this information is greater than simply ensuring the accuracy of a patient’s EHR. Rather, the provision of this information enables the evolution toward a system that is flexible and able to continually evolve. Second, a learning health system needs to access and analyze health information on large populations to inform decision support models that allow for personalized approaches to care.

HIPAA and Data De-Identification

The Health Insurance Portability and Accountability Act (HIPAA) defines protected health information as information that is explicitly linked to a particular individual or could reasonably be expected to allow individual identification. The HIPAA Privacy Rule permits health information to be shared without patient consent for “secondary” purposes in two ways.

First, HIPAA permits data to be shared without oversight or contractual use agreements provided the data are “de-identified”—which is not the same as “anonymous.” Rather, the regulation is designed to mitigate risk while facilitating the sharing of health information. De-identification can be achieved in two different ways: safe harbor and expert determination. Safe Harbor is satisfied when the data are stripped of 18 enumerated features. These include explicit identifiers (such as the individual’s name and Social Security number), as well as potential quasi-identifiers (such as the date of birth, gender, and zip code). In contrast, expert determination (sometimes referred to as the statistical standard) states that health information is de-identified if an expert uses generally acceptable scientific principles and methods to certify that the risk of identifying an individual is sufficiently small. In doing so, the expert must document the methods and the results of any analysis used to justify this determination. Additionally, the covered entity is prohibited from revealing any mechanisms generated in the process that would allow an individual to be re-identified.
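To make the safe harbor approach concrete, the following is a minimal sketch of what suppressing explicit identifiers and coarsening quasi-identifiers might look like. The field names and the record structure are hypothetical, and the real standard enumerates 18 identifier categories with more nuance (e.g., conditions on ZIP code truncation and on ages over 89) than shown here.

```python
# Illustrative safe-harbor-style suppression. Field names are hypothetical;
# the actual HIPAA rule enumerates 18 identifier categories.

EXPLICIT_IDENTIFIERS = {"name", "ssn", "mrn", "email", "phone"}

def safe_harbor(record: dict) -> dict:
    out = {}
    for field, value in record.items():
        if field in EXPLICIT_IDENTIFIERS:
            continue  # drop explicit identifiers entirely
        if field == "date_of_birth":
            out["birth_year"] = value[:4]  # dates reduced to year only
        elif field == "zip":
            out["zip3"] = value[:3]  # ZIP truncated to first three digits
        else:
            out[field] = value  # clinical content passes through
    return out
```

Note that the output is not anonymous: the residual year of birth, gender, and truncated ZIP remain quasi-identifiers, which is exactly the risk discussed below.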

If a covered entity believes that de-identification would hamper the ability to support a learning system, then it could opt for an alternative: the HIPAA limited dataset. Under this model, the covered entity continues to be prohibited from sharing explicit patient identifiers, but can provide dates and geographic information. The caveat, however, is that the recipient of such information must enter into a data use agreement that states the recipient cannot use the information in a way that would harm, or attempt to identify, the corresponding individuals.

De-Identified Data in a Learning Health System

What is easy? One thing that is relatively easy to do is to build automated approaches to find and suppress patients’ identifiers in structured health information. There are currently no standards for representing identifiers, but there are various terminologies and message-based standards used to represent medical information. It would be fruitful to extend such languages to define types of identifiers.

What is not so easy? When repurposing an electronic medical record system, such as for the clinical phenotyping of patients, we use natural language text. As a result, it is more challenging to guarantee the de-identification of this information. There exists software to automatically detect and suppress identifiers within natural language, but none is guaranteed to find all of the identifiers, all of the time. Even if the software were completely effective, there is still no guarantee that the residual information would protect the corresponding individual from re-identification.
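The simplest form of such software is pattern-based scrubbing, sketched below. The patterns are illustrative only; production de-identification tools combine dictionaries, statistical models, and context, and even then cannot guarantee complete recall, which is the limitation noted above.

```python
import re

# Toy free-text scrubber. These three patterns are illustrative, not
# exhaustive -- names, addresses, and irregular formats would slip through.
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),
]

def scrub(text: str) -> str:
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)  # replace each match with a tag
    return text
```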

There are, however, alternatives to simply handing health information over to any interested recipient. For instance, we could construct an environment in which the clinical text is housed in a secure setting where an application programming interface allows users to submit programs to the system and retrieve aggregate statistics. This model has already been adopted by various statistical agencies around the world for providing access to sensitive governmental information.
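A skeletal version of that query-response model is sketched below: the records never leave the service, callers receive only aggregate counts, and small counts are withheld, a common safeguard at statistical agencies. The class, the predicate interface, and the threshold of 5 are all illustrative assumptions, not a specific deployed system.

```python
# Sketch of an aggregate-only query interface: raw records are held
# privately and never returned; callers get counts, with small cells
# suppressed (the threshold of 5 is an illustrative choice).

MIN_CELL_SIZE = 5

class AggregateQueryService:
    def __init__(self, records):
        self._records = records  # housed in the secure environment

    def count(self, predicate):
        n = sum(1 for r in self._records if predicate(r))
        return n if n >= MIN_CELL_SIZE else None  # withhold small cells
```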

What is hard? De-identification, and even aggregation, is not devoid of risks. The HIPAA safe harbor standard, for instance, leaves a certain portion of the population unique with respect to the residual demographics. Latanya Sweeney provided an example in her testimony before the National Committee on Vital and Health Statistics several years ago, where she reported that 0.04% of the U.S. population is expected to be unique on residual demographics (NCVHS, 2007). The concern here is that such demographics have been linked to public resources that contain explicit identifiers to accomplish “re-identification.” Moreover, when considering the expert determination approach to de-identification, there is no clear designation of what the statistical threshold should be or who qualifies as an expert. It would help greatly if there were a certification process, something similar to the Certified Information Systems Security Professional (CISSP) program. Furthermore, and perhaps most challenging, is the fact that de-identification tools could suppress potentially useful clinical information. This is a great concern if it influences the meaning of clinical evidence. For example, if the evidence is changed from “no evidence of myocardial infarction” to “evidence of myocardial infarction,” the statistics upon which the learning system is built could be subject to noise.

Common Challenges and Next Steps

Let us return to HIPAA from the perspective of challenges. At the present time, HIPAA does not make it easy to support longitudinal studies. If a patient’s records were distributed across multiple covered entities, it would be difficult to resolve the patient’s presence without access to identifiers. In the healthcare domain, we can execute some record linkage techniques without revealing patient identifiers through certain cryptographic mechanisms, but the interpretation of HIPAA is such that we are not allowed to apply those encryption technologies even though the keys never get revealed. This is somewhat strange, because it could be guaranteed with very strong evidence that a recipient of such information could not determine who the corresponding patient is.
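One simple cryptographic mechanism of the kind described is keyed hashing: institutions that share a secret key can derive matching tokens from a patient's identifiers and link records by comparing tokens, while a recipient without the key cannot invert a token. This sketch is illustrative only; practical privacy-preserving record linkage protocols also handle typographical variation and guard against dictionary attacks on common names.

```python
import hmac
import hashlib

def linkage_token(key: bytes, name: str, dob: str) -> str:
    """Keyed hash of normalized identifiers. Two covered entities
    holding the same secret key produce matching tokens for the same
    patient; the identifiers themselves are never shared."""
    msg = f"{name.strip().lower()}|{dob}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()
```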

One notion that I wish to make clear is that the challenges I have alluded to are not necessarily insurmountable. In particular, many of the risks that various studies have promoted (such as the risk of re-identification) may be less of a concern than initially anticipated. We can, and have, quantified risks prior to disclosing health information. Once such measurements are in hand, we can mitigate the risks. These are things we should do. Additionally, we must recognize that not every dataset of health information is susceptible to re-identification in the same way. In a study conducted by Latanya Sweeney, it was shown that one could use publicly available voter registration lists, for instance, to re-identify patients in a de-identified dataset because they shared common demographics (Sweeney, 2002). However, in 2008 we went back and surveyed all the state electoral commissions to see what you would actually get if you purchased or found their voter registration lists. In our investigation we found that the cost of obtaining such a list differs dramatically across the states. For instance, in Wisconsin it costs almost $13,000 to purchase such a list, whereas in the state of Minnesota it only costs $46. But it is equally, if not more, important to recognize that the information available in such resources varies. Date of birth is provided in voter lists in the states of Tennessee, Washington, and Illinois, but not in the list published by the state of Wisconsin. Additionally, in the state of Minnesota, only the year of birth is shown. There are always ways of intelligently suppressing, generalizing, or perturbing information such that you preserve the aggregate statistics, or the statistics that a learning health system requires.
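Generalization, the approach Minnesota's voter list illustrates, can be sketched in a few lines: each full date of birth is coarsened to a year, which removes a linkable quasi-identifier while leaving year-level aggregate statistics exactly intact. The record structure here is a hypothetical illustration.

```python
from collections import Counter

def generalize_to_year(records):
    # Coarsen each full date of birth ("YYYY-MM-DD") to its year.
    return [dict(r, dob=r["dob"][:4]) for r in records]

def births_per_year(records):
    # A year-level aggregate that survives the generalization unchanged.
    return Counter(r["dob"][:4] for r in records)
```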


I will conclude with three parting statements on HIPAA, privacy, and the learning health system. First, as a society we must recognize that privacy risks are context dependent. There is no silver bullet ensuring that if a covered entity de-identifies data according to a particular recipe it is sufficiently protected. Second, the healthcare community must define use cases for the health information to be utilized. If there are no use cases, technologists will not know how the learning system should look, and will be unable to design protections for health information that support a learning system. We probably will not be able to develop methods that support all possible needs in healthcare within the next several years, but we may be able to orient technologies to address some of the bigger challenges first. Moreover, when providing such use cases, it needs to be made clear who needs access to the data. Is it the public? Is it the employees of covered entities? The amount of trust we have in the anticipated recipient influences the amount of health information that can be reported and the way in which it is reported. Finally, we need to determine whether the system can learn from the health data remotely. Do we really need to share all of the data with all of the recipients? Or can we enable an environment that is built upon query-response systems? The more control we have over where health information goes and when, the better chance we have of ensuring that it is appropriately secured.



, Ph.D.


Argonne National Laboratory

A learning health system is “designed to: generate and apply the best evidence for the collaborative healthcare choices of each patient and provider; drive the process of discovery as a natural outgrowth of patient care; and ensure innovation, quality, safety, and value in health care” (IOM, 2007). The security challenge is to ensure that the wrong people do not learn the wrong things!

A learning health system requires data sharing on a far larger scale than today. This sharing must occur within a highly fragmented environment: most of the ~6,000 hospitals in the United States have restrictive and idiosyncratic data policies and practices, focused on avoiding risk rather than enabling learning. In this context, secure data sharing is as much a political as a technological challenge, and will require political as well as technological solutions. These comments are restricted to technology issues, and speak to the following questions: What can technology do and not do? What can we learn from other large-scale distributed systems in which sensitive data are shared on a large scale? What principles can guide us as we work to create systems that are sufficiently flexible to encompass not only today’s applications but those of the future; scalable to a large number of participants; and robust to various threats, including not only malicious acts but also human error and the challenges of complexity?

Defining the Problem

Often the hardest step in building a secure system is characterizing what the system is and what we mean by security. In the case of the U.S. healthcare system, we are dealing with thousands of hospitals, millions of patients, and tens of millions of visits. Participants differ in their institutional structures, cost structures, incentives, capabilities, and regulatory environments. Information technology is often deployed and operated with a view to risk mitigation or avoidance rather than to enable a learning health system. Data sharing is needed not only for individual patients, but also for population health and research studies. Additionally, sharing needs evolve over time, as, for example, an individual patient moves from one caregiver to another or a research project is established linking different organizations. The overall situation is one of complexity, diversity, and constant change.

Further complicating the problem is the fact that the security needs of this system are not well defined. Policy statements tend to speak in generalities, stating, for example, that we should ensure security and privacy, offer patients options, maintain appropriate levels of privacy and security, and build in security and privacy from the outset (IOM, 2007). None of these prescriptions is precise. HIPAA regulations try to be specific, but are open to interpretation and can depend on statistical tests (Jajosky and Groseclose, 2004). We also have political and social considerations, such as objections to universal identifiers and different views on opt in vs. opt out.

Principles for Building Secure Systems

Overall, we have a system that is highly complex and a definition of security that is far from clear. Designing technical solutions to achieve security in this context is a challenging and, perhaps in some sense, impossible task. Nevertheless, there are basic principles that, if followed, can help improve the quality of security solutions.

Auditability means that all actions are mapped to individuals and the origin of all data is unambiguous. Any healthcare security and privacy solution must inevitably combine technical protections with appropriate regulatory frameworks (including penalties for release of data). Thus, we need to build in auditing at a foundational level so that any action performed on healthcare information can be mapped to the individual who performed that action. Equally important, both for research purposes and to protect from other sorts of attacks—for example, delivery of incorrect data—is to ensure that all data can be mapped unambiguously to their origin. This latter requirement becomes increasingly important as patients become more mobile.
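One minimal way to realize both halves of this requirement (a sketch under illustrative assumptions, not a production design) is an append-only, hash-chained log in which every entry names the actor who performed an action and the origin of the data acted upon, so that later tampering with the history is detectable.

```python
import hashlib
import json

class AuditLog:
    """Append-only audit log: each entry names the actor, the action, and the
    data origin, and is hash-chained to the previous entry so that any
    modification of recorded history breaks verification."""

    def __init__(self):
        self.entries = []

    def record(self, actor, action, data_origin):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        entry = {"actor": actor, "action": action,
                 "origin": data_origin, "prev": prev_hash}
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.entries.append(entry)

    def verify(self):
        """Recompute the chain; False if any entry was altered."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.record("dr_x", "read", "hospital-A")    # hypothetical actor and origin
log.record("dr_y", "update", "hospital-B")
print(log.verify())  # True
```

Changing any recorded actor, action, or origin after the fact invalidates the chain, which is what makes such a log usable as evidence within a regulatory framework.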

Scalability means that the cost of adding participants—whether new institutions or new individuals—is small. Without this property, technological obstacles too easily impede the new connections required to support patient mobility and research studies.

Transparency is important from two perspectives. First, we require transparency with respect to what is done with data and where they are stored. Second, we need transparency with respect to the policies that are being enforced and the consequences of those policies. If multiple policies are being applied, it should be easy to work out what that actually means for an individual’s data.

These principles may appear obvious, but it is striking how often systems deployed in healthcare settings ignore them. For example, we frequently see hospitals using virtual private networks (VPNs) to enable secure remote access. VPN technology is effective in protecting against snooping of messages transmitted between two points. However, it does not provide for scalability (every new participant requires an additional point-to-point VPN), auditability (there is no immediate control over who sees data when they are received from the remote location), or transparency (the policies that are enforced in this way are unclear, and the risks of information leakage hard to quantify). If, as is often the case, scaling is handled by adding more VPNs in an ad hoc manner, the result can easily become a complex system in which both usability and security are compromised.

Technology Success Stories

There are, fortunately, simple and well-understood methods that we can apply to help achieve auditability, scalability, and transparency. I describe three such methods here: attribute-based authorization, distributed attribute management, and end-to-end security. Each has been deployed and used on a large scale—for example, within grid systems such as the cancer Biomedical Informatics Grid (caBIG®), Biomedical Informatics Research Network, TeraGrid, and Open Science Grid—albeit for sharing either scientific data or clinical data for research purposes (Oster et al., 2008; Pordes et al., 2007). Many of these systems use technologies implemented within the Globus Toolkit (Foster, 2006).

Attribute-based authorization addresses the frequent (and fundamental) requirement in healthcare security to be able to control who can access a piece of data, software program, or other resource. This problem is often solved by associating an access control list—a list of authorized individuals—with each resource. However, the cost of change is then high. If Dr. X joins the team, Dr. X must be added to all relevant access control lists: a potentially complex and error-prone process.

Using attribute-based authorization, we express access control policies in terms of the properties that an individual must have in order to be allowed access. Properties can include the individual’s identity, but more commonly will be properties such as “has Institutional Review Board (IRB) approval for participating in study 123” or “is a faculty member in the department of surgery.” Attribute-based authorization provides scalability, because a single rule can govern any number of people who satisfy it. In addition, we end up with greater transparency. Instead of having to work out what Alice, Bob, and Chris have in common, we can read the access control rule to determine what condition applies. An important technology here is the eXtensible Access Control Markup Language (XACML), frequently used to express access control policies.
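The contrast with access control lists can be sketched as follows. The attribute names and users are hypothetical; a real deployment would express the rule in XACML rather than Python, but the logic is the same: the policy names required properties, not people.

```python
def make_policy(*required_attributes):
    """An access rule expressed over attributes, not named individuals."""
    def policy(user_attributes):
        return set(required_attributes) <= set(user_attributes)
    return policy

# Hypothetical rule: IRB approval for study 123 AND clinical-faculty role.
study_123_policy = make_policy("irb:study-123", "role:clinical-faculty")

alice = {"irb:study-123", "role:clinical-faculty"}
bob = {"role:clinical-faculty"}
print(study_123_policy(alice))  # True: attributes satisfy the rule
print(study_123_policy(bob))    # False: no IRB approval for study 123
```

When Dr. X joins the study, nothing in the policy changes; Dr. X simply acquires the required attributes, which is what makes the approach scale.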

Distributed attribute management is an important adjunct to attribute-based authorization. The idea is that we rely on authoritative sources for all attributes. For example, an institution is likely the authoritative source for attributes concerning employment status and qualifications; the IRB for attributes concerning IRB approvals; and the National Institutes of Health for membership in study sections. Then, when an individual attempts to access a resource, the security system reaches out to each required authoritative source, and each source takes responsibility for issuing its attributes correctly. With the attributes in hand, the security system can then enforce appropriately the policies that apply at the individual resource. An important technology here is the Security Assertion Markup Language (SAML), which defines protocols and representations for requesting and communicating attribute assertions.
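The division of responsibility can be sketched as below. The source registries and attribute strings are hypothetical stand-ins for SAML attribute authorities; the point is that no single system stores everyone's attributes, and each source vouches only for what it governs.

```python
# Hypothetical authoritative sources, each answering only for its own domain.
hr_directory = {"alice": {"role:clinical-faculty"}}   # employer: roles
irb_registry = {"alice": {"irb:study-123"}}           # IRB: study approvals

def gather_attributes(user, sources):
    """Collect a user's attributes from each authoritative source at access
    time, rather than maintaining a central copy that can go stale."""
    attrs = set()
    for source in sources:
        attrs |= source.get(user, set())
    return attrs

print(sorted(gather_attributes("alice", [hr_directory, irb_registry])))
# ['irb:study-123', 'role:clinical-faculty']
```

If the IRB revokes an approval, the next access attempt simply fails to gather that attribute; no access control lists need to be hunted down and edited.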

End-to-end security is a scalable, more capable alternative to VPNs. As we extract data from databases and move them to remote locations, there will typically be a set of things that we want to ensure happen: that the data are anonymized, that their provenance is documented, that they are not modified en route, and that privacy is preserved. We can achieve many of these things by wrapping the data in a cryptographic envelope that can then be processed appropriately as data move from one location to another. By thus packaging data in a manner that maintains key properties independent of context, we enhance our ability to achieve auditability, scalability, and transparency.
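A minimal version of such an envelope can be sketched with a keyed integrity tag. The payload, origin name, and shared key below are illustrative; a real deployment would use public-key signatures so that any recipient, not just a key holder, could verify provenance.

```python
import hashlib
import hmac
import json

def seal(payload, origin, key):
    """Wrap data in a minimal 'envelope': provenance plus an HMAC tag that
    travels with the data, so integrity can be checked at any hop."""
    body = json.dumps({"origin": origin, "payload": payload}, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}

def open_envelope(envelope, key):
    """Verify the tag before trusting the contents; raise on tampering."""
    expected = hmac.new(key, envelope["body"].encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope["tag"]):
        raise ValueError("envelope failed integrity check")
    return json.loads(envelope["body"])

key = b"shared-demo-key"  # illustrative; not how keys would be managed
env = seal({"hb_a1c": 6.1}, origin="hospital-A", key=key)
print(open_envelope(env, key)["origin"])  # hospital-A
```

Because the envelope's properties hold independent of which network carried it, the same protection applies whether the data cross one hop or ten, in contrast to the point-to-point guarantees of a VPN.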


Security is a systems problem. Without clarity on the nature of the system we are securing, and what we mean by security, we will likely fail to create secure systems. We need to spend more time studying these issues within the context of a learning health system. Auditability, scalability, and transparency are all properties that we should seek to realize as we design a secure learning health system. In architecting security solutions, we can leverage attribute-based authorization, distributed attribute management, and end-to-end security—three methods that have been proven to scale and that tend to support these desirable properties.


  • AHRQ (Agency for Healthcare Research and Quality). Consumer engagement in developing electronic health information systems. 2009. [accessed January 31, 2011].
  • CDT (Center for Democracy and Technology). Rethinking the role of consent in protecting health information privacy. 2009. [accessed January 31, 2011].
  • CHCF (California HealthCare Foundation). National Consumer Health Privacy Survey 2005. 2005. [accessed January 31, 2011].
  • Foster I. Globus Toolkit version 4: Software for service-oriented systems. Journal of Computer Science and Technology. 2006;21(4):523–530.
  • Harris Interactive. Many US adults are satisfied with use of their personal health information. 2007. [accessed January 31, 2011].
  • IOM (Institute of Medicine). The learning healthcare system: Workshop summary. Washington, DC: The National Academies Press; 2007.
  • IOM (Institute of Medicine). Beyond the HIPAA privacy rule: Enhancing privacy, improving health through research. Washington, DC: The National Academies Press; 2009. [PubMed: 20662116]
  • Jajosky R, Groseclose S. Evaluation of reporting timeliness of public health surveillance systems for infectious diseases. BMC Public Health. 2004;4(1):29. [PMC free article: PMC509250] [PubMed: 15274746]
  • Markle Foundation. The common framework: Overview and principles. 2006. [accessed February 25, 2011].
  • NCVHS (National Committee on Vital and Health Statistics). Enhanced protections for uses of health data: A stewardship framework for “secondary uses” of electronically collected and transmitted health data. 2007. [accessed February 25, 2011].
  • Oster S, Langella S, Hastings S, Ervin D, Madduri R, Phillips J, Kurc T, Siebenlist F, Covitz P, Shanbhag K, Foster I, Saltz J. caGrid 1.0: An enterprise grid infrastructure for biomedical research. Journal of the American Medical Informatics Association. 2008;15(2):138–149. [PMC free article: PMC2274794] [PubMed: 18096909]
  • Pordes R, Petravick D, Kramer B, Olson D, Livny M, Roy A, Avery P, Blackburn K, Wenaus T, Würthwein F, Foster I, Gardner R, Wilde M, Blatecky A, McGee J, Quick R. The Open Science Grid; Paper presented at Scientific Discovery Through Advanced Computing (SciDAC) Conference. 2007.
  • Shortliffe E. Tracking e-health. Issues in Science and Technology. 2010;(Spring):92–95.
  • Sweeney L. k-Anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems. 2002;10(5):557–570.
  • West DM, Miller EA. Digital medicine: Health care in the Internet era. Washington, DC: Brookings Institution Press; 2009.
Copyright © 2011, National Academy of Sciences.
Bookshelf ID: NBK83558

