Our overall methodological approach, as described in this chapter, was guided by the Agency for Healthcare Research and Quality's (AHRQ's) Methods Guide for Effectiveness and Comparative Effectiveness Reviews (hereafter referred to as the Methods Guide)43 and by the methods used in the original Closing the Quality Gap series, drawing particularly on Volume 1, Series Overview and Methodology,34 and Volume 7, Care Coordination.17 Consistent with these earlier works, we adopted the framework developed by the Cochrane Effective Practice and Organisation of Care Review Group (EPOC) for relevant study designs, as follows: patient or cluster randomized controlled trials (RCTs; Key Questions [KQs] 1–4), nonrandomized cluster controlled trials (KQs 1–4), controlled before-and-after studies (KQs 1–4), and uncontrolled studies that include pre- and post-intervention assessments (KQs 2–3 only). These designs can yield valid evidence about quality improvement interventions. Other key methodological decisions carried over from the series include a focus on outpatient care and the inclusion of studies in which the intervention seeks to improve outcomes for a broad and relatively unselected group of patients.

The main sections in this chapter reflect the elements of the protocol established for this evidence report, and certain methods map to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist.44

Topic Refinement and Review Protocol

Topics for the Closing the Quality Gap: Revisiting the State of the Science series were solicited from the portfolio leads at AHRQ. Nominations included a brief background and context; the importance and/or rationale for the topic; the focus or population of interest; relevant outcomes; and references to recent or ongoing work. The following factors were considered in making final decisions about which of the nominated topics would be included in the series: the ability to focus and clarify the topic area appropriately; relevance to quality improvement and a systems approach; applicability to the Evidence-based Practice Center (EPC) program/amenability to systematic review; potential for duplication and/or overlap with other known or ongoing work; relevance and potential impact in improving care; and fit of the topics as a whole in reflecting the AHRQ portfolios.

The EPC refined the KQs via discussions with the EPC coordinating the Closing the Quality Gap: Revisiting the State of the Science series and with AHRQ. A Technical Expert Panel (TEP), with experts knowledgeable about the PCMH as a primary care model, provided input during the protocol development process.

Literature Search Strategy

Search Strategy

KQs 1–3

For KQs 1–3, we searched PubMed®, the Cumulative Index to Nursing & Allied Health Literature (CINAHL®), and the Cochrane Database of Systematic Reviews (CDSR). Our search strategy used the National Library of Medicine's medical subject headings (MeSH) keyword nomenclature and text words for the medical home and related concepts, and for eligible study designs. Where possible, we used validated search filters (such as the Clinical Queries Filters in PubMed) and drew on other groups' experience in searching for quality improvement studies (e.g., EPOC). We included studies published in English and indexed from database inception through December 6, 2011 (PubMed), or March 30, 2011 (CINAHL and CDSR). The exact search strings used are given in Appendix A. All searches were designed and conducted in collaboration with an experienced search librarian.

We supplemented these electronic searches with a manual search of citations from a set of key primary and review articles.45-52

All citations were imported into an electronic bibliographic database (EndNote® Version X4; Thomson Reuters, Philadelphia, PA).

KQ 4

For KQ 4, we sought to identify ongoing or recently completed studies by searching the following databases using the search term “medical home”:

  • Clinical trials databases (e.g., ClinicalTrials.gov; searched 5/10/11);
  • Web sites of non-Federal PCMH funders (e.g., Commonwealth Fund, searched 7/20/11; Robert Wood Johnson Foundation, searched 6/6/11); and
  • Databases of Federally funded studies, searched using the enGrant Scientific interface (www.engrant.com): AHRQ, Centers for Disease Control and Prevention [CDC], Health Services Research Projects in Progress [HSRProj], National Institutes of Health [NIH] Reporter (NIH Research Portfolio Online), Health Resources and Services Administration [HRSA], United States Department of Veterans Affairs [VA], and the Department of Defense; search dates 4/5 to 4/11/11. This search was updated on 1/18/12 for the final report.

Several Web-based sources (American College of Physicians [ACP], Centers for Medicare and Medicaid Services [CMS], National Academy for State Health Policy [NASHP], Patient-Centered Primary Care Collaborative [PCPCC]) did not have searchable databases. For these sites, we conducted manual searches, exploring all Web links that showed promise for relevant information:

  • Databases of PCMH demonstration programs (e.g., the Patient-Centered Primary Care Collaborative [www.pcpcc.net]); 4/11/11
  • Primary care professional societies sponsoring PCMH demonstration projects (e.g., ACP, at www.acponline.org/running_practice/pcmh/); 4/11/11
  • Databases of state-sponsored PCMH studies (e.g., NASHP); 4/11/11
  • CMS; 4/11/11

In addition, we sent letters to 10 contacts involved in state-level projects funded by CMS (identified from documents available on the CMS Web site) and a letter to the director of the VA's PCMH demonstration labs (the PCMH is designated Patient Aligned Care Teams [PACT] within the VA), requesting information about any ongoing or recently completed studies.

Finally, we identified a published horizon scan that included interviews with key informants designed to collect detailed information about the participants, design, and implementation of ongoing PCMH programs.46 We used information from this horizon scan to verify and augment data obtained from the above-mentioned databases/study registries.

Inclusion and Exclusion Criteria

The criteria used to screen articles for inclusion/exclusion at both the title-and-abstract and full-text screening stages are detailed in Table 1 (see PICOTS section of Introduction for further details).

Table 1. Inclusion/exclusion criteria.


Study Selection

Using the criteria described in Table 1, two investigators independently reviewed each title and abstract for potential relevance to the KQs; articles included by either investigator underwent full-text screening. At the full-text screening stage, two investigators independently reviewed the full text of each article and indicated a decision to “include” or “exclude” the article for data abstraction. When the paired reviewers arrived at different decisions about whether to include or exclude an article, or about the reason for exclusion, the investigators reached final agreement through review and discussion. Articles meeting the eligibility criteria were included for data abstraction. For KQ 4, these procedures were modified such that a single screener initially reviewed all citations; final eligibility for data abstraction was determined by duplicate review. All screening decisions were made and tracked in a DistillerSR database (Evidence Partners Inc., Manotick, ON, Canada).

Data Extraction

The investigative team created forms for abstracting the data elements for the KQs. Pairs of researchers were assigned to abstract data from the eligible articles based on their clinical and methodological expertise. One researcher abstracted the data, and the second over-read the article and the accompanying abstraction form to check for accuracy and completeness. Disagreements were resolved by consensus or, if the first two investigators could not reach consensus, by obtaining a third reviewer's opinion.

To aid in both reproducibility and standardization of data collection, researchers received data abstraction instructions directly on each form created specifically for this project within the DistillerSR data synthesis software program (Evidence Partners Inc., Manotick, ON, Canada). The abstraction form templates were pilot-tested with a sample of included articles to ensure that all relevant data elements were captured and that there was consistency and reproducibility across abstractors. Data abstraction forms for KQs 1–3 included: descriptions of the study design, study population, interventions and comparators, financial models, implementation methods, study outcomes, and study quality. Outcomes of interest included patient experiences, staff experiences, process of care, clinical outcomes, and economic outcomes. Appendix B provides a detailed listing of the data elements abstracted for KQs 1–3.

For KQ 4, we developed a less detailed data abstraction form, based on the expectation (which turned out to be correct) that descriptions of ongoing studies would not provide the necessary information for more detailed abstraction. Abstracted data were: basic study design; geographic location; study setting, including health care system; number of practices/physicians; payment reform/financial model; major components of the intervention/PCMH model; the comparator; types of outcomes being assessed; study dates; and source of funding. Appendix C provides a detailed listing of the data elements abstracted for KQ 4.

Quality (Risk of Bias) Assessment of Individual Studies

We assessed the quality/risk of bias of studies included for KQ 1 based on their reporting of relevant data. We evaluated the quality of individual studies using the approach described in AHRQ's General Methods Guide.43 To assess quality, we (1) classified the study design, (2) applied predefined criteria for quality and critical appraisal, and (3) arrived at a summary judgment of the study's quality (see Appendix D for details). To evaluate methodological quality, we applied criteria for each study type derived from core elements described in the Methods Guide. To indicate the summary judgment of the quality of the individual studies, we used the summary ratings of good, fair, and poor, based on the studies' adherence to well-accepted standard methodologies and the adequacy of the reporting (Table 2). For each study, one investigator assigned quality ratings, which were then over-read by a second investigator; disagreements were resolved by consensus or by a third investigator if agreement could not be reached.

Table 2. Definitions of overall quality ratings.


For RCTs, we used the key criteria described in AHRQ's Methods Guide,43 adapted for this specific topic. These criteria include adequacy of randomization and allocation concealment; the comparability of groups at baseline; blinding; the completeness of followup and differential loss to followup; whether incomplete data were addressed appropriately; the validity of outcome measures; and conflict of interest. After considering each individual quality element, we assigned the study a global quality rating of good, fair, or poor, using definitions from the Methods Guide.

We anticipated that this review would identify and include nonrandomized clinical trials (see Table 1 for eligible study designs). Because of the complexity of PCMH-based interventions, studies may have included an observational control group that was not randomized. Per the AHRQ Methods Guide,43,54 threats to the internal validity of systematic review conclusions based on observational studies were identified through assessment of the body of observational literature as a whole, along with an examination of the characteristics of individual studies. Study-specific issues considered included: potential for selection bias (i.e., degree of similarity between intervention and control patients); performance bias (i.e., differences in care provided to intervention and control patients not related to the study intervention); attrition and detection bias (i.e., whether outcomes were differentially detected between intervention and control groups); and magnitude of reported intervention effects (see the section on “Selecting Observational Studies for Comparing Medical Interventions” in AHRQ's Methods Guide).43

Data Synthesis

We summarized key features of the included studies by KQ. For published studies, we created the following summary tables: an overview table of basic study characteristics, an intervention table giving details of the intervention, and a summary table of implementation strategies. Studies were categorized into those that explicitly tested the PCMH model and those that met our functional definition of the PCMH but did not use the terms “PCMH” or “medical home”; the latter are referred to as “functional PCMH” studies in this report. Studies were evaluated first in aggregate, and then by PCMH versus functional PCMH studies and by adult versus pediatric studies.

For KQ 1, we used a random-effects model to compute summary estimates of effect for hospitalizations and emergency department visits for the subset of studies using RCT designs. Summary estimates were calculated using Comprehensive Meta-Analysis and are reported as summary risk ratios.55 For other outcomes, the study populations, designs, and outcomes were too variable for quantitative analysis, and results were accordingly synthesized qualitatively. Because the continuous measures used for most reported outcomes varied greatly across studies, we computed effect sizes, expressed as the standardized mean difference (SMD), to aid interpretation. The SMD is useful when studies assess the same outcome but with different measures or scales; standardizing the results to a uniform scale facilitates comparisons across studies. We calculated the SMD for each study, using Hedges' g, by subtracting the average post-test score of the control group from the average post-test score of the experimental group and dividing the result by the pooled standard deviation (SD) of the experimental and control groups. To aid interpretation, we standardized presentation such that beneficial effects of the medical home are presented as positive effect sizes.
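The two quantitative steps described above can be sketched in Python. This is an illustrative sketch, not the report's actual analysis code: the function names are hypothetical, the DerSimonian-Laird tau-squared estimator is an assumption (the report says only that a random-effects model was used in Comprehensive Meta-Analysis, without naming the estimator), and the small-sample correction factor shown is part of the standard definition of Hedges' g even though the text does not spell it out.

```python
import math

def hedges_g(mean_exp, mean_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """SMD at post-test: (experimental mean - control mean) / pooled SD,
    with Hedges' small-sample correction applied."""
    df = n_exp + n_ctrl - 2
    sd_pooled = math.sqrt(
        ((n_exp - 1) * sd_exp**2 + (n_ctrl - 1) * sd_ctrl**2) / df
    )
    d = (mean_exp - mean_ctrl) / sd_pooled
    return d * (1 - 3 / (4 * df - 1))  # Hedges' correction factor J

def summary_risk_ratio(log_rrs, variances):
    """Random-effects pooling of study-level log risk ratios.
    Uses the DerSimonian-Laird tau^2 (an assumption; the report does
    not name the estimator)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rrs))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(log_rrs) - 1)) / c)  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, log_rrs)) / sum(w_re)
    return math.exp(pooled)  # back-transform to a summary risk ratio
```

Note the sign convention stated in the text: effect sizes are oriented so that benefit for the medical home is positive, so for outcomes where a lower score is better the sign of g would be flipped before presentation.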

We planned to use cross-case analyses to evaluate the association between independent variables (e.g., specific components of comprehensive PCMH) and study effect, using methods based on Miles and Huberman.56 However, there were too few studies and too little variability in outcomes to complete this exploratory analysis.

Strength of the Body of Evidence

We assessed the strength of evidence for the highest priority outcomes in KQ 1 using the approach described in AHRQ's Methods Guide.43,57 In brief, the Methods Guide recommends assessment of four domains: risk of bias, consistency, directness, and precision. Additional domains are to be used when appropriate: coherence, dose-response association, impact of plausible residual confounders, strength of association (magnitude of effect), and publication bias. Two reviewers considered these domains qualitatively and, after discussion, assigned a summary rating of “high,” “moderate,” or “low” strength of evidence. In some cases, a high, moderate, or low rating was impossible or imprudent to assign; for example, when no evidence was available or when the evidence was too weak, sparse, or inconsistent to permit any conclusion. In these situations, a grade of “insufficient” was assigned. The four-level rating scale is defined as follows:

  • High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
  • Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
  • Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
  • Insufficient: Evidence either is unavailable or does not permit estimation of an effect.

We did not rate the strength of evidence for KQs 2–4 because these questions were purely descriptive.


Applicability

Systematic evidence reviews are conducted to summarize knowledge and to support clinicians, patients, and policymakers in making informed decisions. “Does this information apply?” is the core question for decisionmakers weighing the usefulness and value of a specific intervention or choosing among interventions. Interventions that work well in one context may not work well in another. The primary aim of assessing applicability is to determine whether results obtained under research conditions are likely to reflect those expected in broader populations under “real-world” conditions. In this review, we focused on applicability to primary care populations.

We assessed applicability using methods described in the Methods Guide.58 In brief, this method uses the PICOTS (Populations, Interventions, Comparators, Outcomes, Timing, Settings) framework as a way to organize information relevant to applicability. We evaluated the applicability to clinical practice, paying special attention to study eligibility criteria, demographic features of the enrolled population (such as age, ethnicity, and sex), organizational context, and clinical relevance and timing of the outcome measures. We summarized issues of applicability qualitatively.

Peer Review and Public Commentary

The peer review process is our principal external quality-monitoring device. Nominations for peer reviewers were solicited from several sources, including the TEP and interested Federal agencies. Experts in PCMH as a primary care model and individuals representing stakeholder and user communities were invited to provide external peer review of the draft report; AHRQ and an associate editor also provided comments. The draft report was posted on AHRQ's Web site for public comment for 4 weeks, from December 6, 2011, to January 3, 2012. We have addressed all reviewer comments, revising the text as appropriate, and have documented everything in a disposition of comments report that will be made available 3 months after the Agency posts the final report on AHRQ's Web site. A list of peer reviewers submitting comments on the draft report is provided in the front matter of this report.