Chapter 119: Perinatal Depression: Prevalence, Screening Accuracy, and Screening Outcomes

Prepared for:

Agency for Healthcare Research and Quality

U.S. Department of Health and Human Services

540 Gaither Road

Rockville, MD 20850

www.ahrq.gov

Contract No. 290-02-0016

Prepared by:

RTI-University of North Carolina Evidence-based Practice Center

Research Triangle Park, North Carolina

Investigators

Bradley N. Gaynes, MD, MPH

Norma Gavin, PhD

Samantha Meltzer-Brody, MD, MPH

Kathleen N. Lohr, PhD

Tammeka Swinson, BA

Gerald Gartlehner, MD, MPH

Seth Brody, MD, MPH

William C. Miller, MD, PhD

AHRQ Publication No. 05-E006-2

February 2005

This document is in the public domain and may be used and reprinted without permission except those copyrighted materials noted for which further reproduction is prohibited without the specific permission of copyright holders.

This report may be used, in whole or in part, as the basis for development of clinical practice guidelines and other quality enhancement tools, or a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

AHRQ is the lead Federal agency charged with supporting research designed to improve the quality of health care, reduce its cost, address patient safety and medical errors, and broaden access to essential services. AHRQ sponsors and conducts research that provides evidence-based information on health care outcomes; quality; and cost, use, and access. The information helps health care decisionmakers—patients and clinicians, health system leaders, and policymakers—make more informed decisions and improve the quality of health care services.

Suggested Citation:

Gaynes BN, Gavin N, Meltzer-Brody S, Lohr KN, Swinson T, Gartlehner G, Brody S, Miller WC. Perinatal Depression: Prevalence, Screening Accuracy, and Screening Outcomes. Evidence Report/Technology Assessment No. 119. (Prepared by the RTI-University of North Carolina Evidence-based Practice Center, under Contract No. 290-02-0016.) AHRQ Publication No. 05-E006-2. Rockville, MD: Agency for Healthcare Research and Quality. February 2005.

Preface

The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.

This report on perinatal depression was requested and funded by the Safe Motherhood Group (SMG). The SMG consists of representatives from several agencies within the U.S. Department of Health and Human Services (DHHS): the DHHS Office on Women's Health; Centers for Disease Control and Prevention; Health Resources and Services Administration; Maternal and Child Health Bureau; National Institutes of Health, National Institute of Mental Health, National Institute of Child Health and Human Development, National Institute on Drug Abuse, and the Office of Research on Women's Health; Food and Drug Administration; Substance Abuse and Mental Health Services Administration; and Agency for Healthcare Research and Quality.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.

We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to epc@ahrq.gov.

Carolyn M. Clancy, M.D.

Director

Agency for Healthcare Research and Quality

Jean Slutsky, P.A., M.S.P.H.

Director, Center for Outcomes and Evidence

Agency for Healthcare Research and Quality

Kenneth S. Fink, M.D., M.G.A., M.P.H.

Director, EPC Program

Agency for Healthcare Research and Quality

Marian D. James, M.A., Ph.D.

EPC Program Task Order Officer

Agency for Healthcare Research and Quality

The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.

Structured Abstract

Context. Depression during pregnancy or the first year postpartum is impressively common and can have devastating consequences for the woman, her children, and other family members.

Objectives. We systematically review the evidence on (1) the prevalence and incidence of perinatal depression, (2) the accuracy of different screening instruments, and (3) the effectiveness of interventions for women screened as high risk for perinatal depression.

Data Sources. MEDLINE, CINAHL, PsycINFO, Sociofile, and the Cochrane Library (1980 through March 2004); bibliographic hand searches; and experts.

Study Selection. The English-language studies assessed women for major depression alone or for major or minor depression. Studies of the prevalence and incidence of depression and the accuracy of screening tools had to include diagnostic confirmation by a reference standard. Studies involving interventions required a comparison group. Two reviewers independently evaluated each abstract to determine inclusion by consensus.

Data Extraction. A primary reviewer abstracted data on key variables from the articles directly into detailed evidence tables; a second reviewer confirmed accuracy.

Data Synthesis. We conducted a meta-analysis of the prevalence and incidence estimates to compute combined estimates for particular periods and points in time. We also conducted meta-analyses of the sensitivity and specificity of different screening instruments. For screening outcome studies, we were only able to synthesize qualitatively.

Results. We identified 30 studies of prevalence. For major depression alone, point prevalence estimates ranged from 3.1 percent to 4.9 percent at different times during pregnancy and 1.0 percent to 5.9 percent at different times during the first postpartum year. For major and minor depression, estimates of the point prevalence ranged from 8.5 percent to 11.0 percent during pregnancy and 6.5 percent to 12.9 percent during the first year postpartum. However, these prevalence estimates were not significantly different from those of similarly aged nonchildbearing women. Data on incidence were more limited.

We identified 10 studies of screening accuracy. One small study reported on accuracy during pregnancy. For postpartum depression, screeners appeared feasible, but the small number of depressed patients involved precluded identifying an optimal screener or threshold for screening. The screening instruments studied were generally good at identifying major depression alone, with accuracy consistent with reports from primary care settings, but they performed more poorly for the major or minor depression category.

We found no studies directly testing whether screening improved outcomes. However, we identified 15 studies that used some sort of screening to identify women at risk of depression and for whom a subsequent intervention was provided. The results of four small studies of various psychosocial interventions during pregnancy did not demonstrate consistently superior outcomes. Results were also mixed for postpartum interventions. Six of nine studies of various psychosocial interventions reported significant improvement in depression for the experimental group. Two studies with pharmacologic interventions provided conflicting results.

Conclusions. Although limited, the available research suggests that depression is one of the most common perinatal complications and that fairly accurate and feasible screening measures are available. Studies with larger sample sizes and a greater racial and ethnic mix are needed. Researchers also need to determine whether screening itself leads to better access to proven treatment and improved outcome relative to usual care.

Chapter 1. Introduction

Depressive disorders are ubiquitous and remarkably impairing; they occur throughout the lifespan. Lifetime prevalence rates of depression from community-based surveys range from 4.9 percent to 17.1 percent.1–3 Gender plays an important role in the prevalence rates of depression; women report a history of major depression at nearly twice the rate of men.4 In particular, women of childbearing age are at high risk for major depression.2, 3, 5 Pregnancy and new motherhood may increase the risk of depressive episodes.

Depression is the leading cause of disease-related disability among women in the world.6 It can have devastating consequences, not only for the women experiencing it but also for the women's children and family.7–9 For example, Stein and colleagues found that the mother-child interactions of depressed mothers and their children were of lower quality than those of nondepressed mothers,10 and Flynn et al. found that maternal depression was related to both missed pediatric appointments and greater use of emergency department services.11 A review of other research in this area points out that parental depression has been linked to raised levels of psychiatric disturbances among children and to greater child insecurity in attachment relationships.7, 8

The importance of detecting and treating perinatal depression has only recently been recognized. Perinatal depression encompasses major and minor depressive episodes that occur either during pregnancy or within the first 12 months following delivery. Major depression is a distinct clinical syndrome for which treatment is clearly indicated,12 whereas the definition and management of minor depression are less clear. Minor depression is an impairing yet less severe constellation of depressive symptoms13 for which controlled trials have not consistently indicated whether particular interventions are more effective than placebo.14, 15 In this report, we address major depressive episodes alone, which we refer to as major depression, as well as a broader grouping of major or minor depression, which we refer to as such or by the more general terms “depression” or “depressive illness.” We necessarily rely on the specific definitions of minor depression used by the different authors of the reviewed studies.

Another mental disorder that can occur in the perinatal period is postpartum psychosis. Unlike postpartum depression, postpartum psychosis is a relatively rare event with an estimated incidence of 1.1 to 4.0 cases per 1,000 deliveries.16 The onset of postpartum psychosis is usually acute, within the first 2 weeks of delivery, and appears to be more common in women with a strong family history of bipolar or schizoaffective disorder.17 Postpartum psychosis is an important disorder in its own right, but it is not addressed specifically in this report.

Perinatal depression, major or minor, often goes unrecognized because many of the discomforts of pregnancy and the puerperium are similar to symptoms of depression.18, 19 The onset of major depression is believed to be impressively common in the postpartum period; researchers have found a 3-fold increase in the onset of major or minor depression in the first 5 weeks postpartum compared to women of similar age, marital status, and parity at nonchildbearing times.20 However, the precise levels of the prevalence and incidence of perinatal depression are uncertain. Published estimates of the rate of major or minor depression in the postpartum period range widely—from 5 percent to more than 25 percent of new mothers—depending on the assessment method, the timing of the assessment, and population characteristics.21–23

Although many screening instruments have been developed or modified to detect major or minor depression in pregnant and newly delivered women, the evidence on their screening accuracy relative to a reference standard has yet to be systematically reviewed and assessed.24 Evidence on the effectiveness of screening all pregnant women and providing a preventive intervention to those scoring at high risk has also not been systematically investigated and evaluated.24

Table 1. Key questions for the evidence report on perinatal depression

Key Question
1. What is the incidence and prevalence of depression (major or minor) during pregnancy and during the postpartum period? Is it increased during pregnancy and the postpartum period compared to nonchildbearing periods?
2. What is the accuracy of different screening tools for detecting depression during pregnancy and the postpartum period?
3. Does prenatal or early postnatal screening for depressive symptoms with subsequent intervention lead to improved outcomes?

To address these gaps, the Agency for Healthcare Research and Quality (AHRQ) in collaboration with the Safe Motherhood Group (SMG) commissioned the RTI International-University of North Carolina (RTI-UNC) Evidence-based Practice Center (EPC) to conduct a systematic evidence review on three questions related to perinatal depression. These questions (provided in Table 1) address the prevalence and incidence of perinatal depression, the accuracy of screening instruments for perinatal depression, and the effectiveness of interventions for women who are found to be at high risk for developing perinatal depression.

Figure 1. Causal pathway for the screening and treatment of perinatal depression

We show a simple schematic of the causal pathway for the screening and treatment of perinatal depression and the links addressed by the three study questions in Figure 1. For all three questions, we begin with a general population of pregnant or postpartum women. The first key question addresses the percentage of this population diagnosed with depression at various points and periods of time throughout pregnancy and the first postpartum year—that is, the prevalence and incidence of the disorder. Prevalence and incidence can be measured in different ways and may vary by population characteristics. We synthesize available evidence on prevalence and incidence measured in a similar manner at or over the same general period of time and analyze the impact of selected population and study characteristics. Studies with a comparison group of women of similar age during nonchildbearing periods are also reviewed to determine whether the prevalence or incidence of depression increases during pregnancy and the first postpartum year.
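To make the distinction between measurement at a point in time and over a period concrete, the following minimal Python sketch computes point prevalence, period prevalence, and cumulative incidence for a small cohort. The cohort records, the function names, and the simplifying assumptions (one episode per woman, complete follow-up) are hypothetical and are not taken from any study reviewed in this report.

```python
# Hypothetical cohort: weeks postpartum at which a depressive episode began
# and ended (None = no episode during follow-up). Illustrative data only.
cohort = [
    {"onset": None, "end": None},
    {"onset": 2, "end": 10},
    {"onset": 6, "end": 20},
    {"onset": None, "end": None},
    {"onset": 14, "end": 30},
]


def point_prevalence(cohort, week):
    """Share of all women who are in an episode at a single time point."""
    cases = sum(
        1 for w in cohort
        if w["onset"] is not None and w["onset"] <= week <= w["end"]
    )
    return cases / len(cohort)


def period_prevalence(cohort, start, stop):
    """Share of all women in an episode at any time during [start, stop]."""
    cases = sum(
        1 for w in cohort
        if w["onset"] is not None and w["onset"] <= stop and w["end"] >= start
    )
    return cases / len(cohort)


def cumulative_incidence(cohort, start, stop):
    """Share of women not yet depressed at `start` whose episode begins by `stop`."""
    at_risk = [w for w in cohort if w["onset"] is None or w["onset"] > start]
    new_cases = sum(
        1 for w in at_risk if w["onset"] is not None and w["onset"] <= stop
    )
    return new_cases / len(at_risk)


print(point_prevalence(cohort, week=8))                # 0.4
print(period_prevalence(cohort, start=0, stop=26))     # 0.6
print(cumulative_incidence(cohort, start=4, stop=26))  # 0.5
```

In this toy cohort the three measures differ even though they describe the same women, which is why estimates are only combined when they were measured in a similar manner over the same general period.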

The second key question addresses the accuracy of different screening instruments for perinatal depression, that is, how well different instruments detect pregnant or postpartum women who have depression (sensitivity) and pregnant or postpartum women who do not have depression (specificity). We identify and abstract English-language and non-English-language studies of various cutoff scores for a variety of commonly used instruments but review only the English-language studies.
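As an illustration of these two accuracy measures, the following minimal Python sketch computes sensitivity and specificity from a 2x2 table that compares screening results with a reference standard. The counts and the function name are hypothetical and are not drawn from any study in this report.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 counts.

    tp: depressed by the reference standard and screen-positive
    fn: depressed by the reference standard but screen-negative
    tn: not depressed and screen-negative
    fp: not depressed but screen-positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one depressed and one nondepressed woman")
    sensitivity = tp / (tp + fn)  # share of depressed women the screen detects
    specificity = tn / (tn + fp)  # share of nondepressed women the screen clears
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=160, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# prints: sensitivity = 0.90, specificity = 0.89
```

With only 20 depressed women in this made-up sample, one additional missed case would move sensitivity by 5 percentage points, which shows how small numbers of depressed patients limit the precision of such estimates.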

Finally, we review studies that provide evidence on whether interventions can reduce the prevalence and incidence of perinatal depression for women who are screened and found to be at high risk for the disorder. We also summarize evidence in these studies on the effect of screening with subsequent intervention on other health outcomes for the woman and her infant. This third question addresses whether the screening process itself ultimately leads to improved outcomes for perinatal depression. Studies had to use some form of screening to identify women for testing interventions involving a technique to address psychological status in the woman and had to have an outcome measured related to depression severity.

In this report, we provide the results of our systematic search and review of the published literature for evidence addressing these questions. In conducting this study, our intent was to answer the questions using the most reliable evidence available, obtain a sense of the strength of the available evidence, and identify gaps in the knowledge base that require further research. We follow a discussion of our general approach and methods in Chapter 2 with discussions of each of the question-specific methods and findings (Chapters 3, 4, and 5). In Chapter 6, we discuss our main conclusions, comment on the state of the evidence, and offer an agenda for future research studies. Appendix A presents the exact search strings for the electronic database searches. Appendix B contains copies of our quality rating forms. Appendix C presents the evidence tables, Appendix D provides a list of excluded articles, and Appendix E provides acknowledgments.

Chapter 2. Methods

In conducting this systematic review, we followed standardized procedures developed by the Agency for Healthcare Research and Quality (AHRQ) in collaboration with all its Evidence-based Practice Centers (EPCs) for such reviews. This chapter documents how we implemented those procedures to answer the three key questions on perinatal depression. We first discuss the role of the Technical Expert Advisory Group (TEAG). We then describe our inclusion/exclusion criteria, our strategy for identifying articles relevant for addressing the key questions, and our process for abstracting relevant information from the eligible articles and generating evidence tables. We also discuss our criteria for grading the quality of individual articles and the strength of the evidence as a whole. Finally, we explain the peer review process.

Role of the Technical Expert Advisory Group

Throughout the project, we enlisted the assistance of a TEAG to react to work in progress and advise us on substantive issues or possibly overlooked areas of research. The TEAG included four individuals with collective expertise in obstetrics, psychiatry, psychology, and research methods and both clinical and research experience in perinatal depression (see Appendix E, Acknowledgments). As in all such systematic reviews, the TEAG contributed to AHRQ's broader goals of (1) creating and maintaining science partnerships as well as public-private partnerships and (2) meeting the needs of an array of potential customers and users of its products. Thus, the TEAG was both an additional resource and a sounding board during the project.

To ensure robust, scientifically relevant work, we called on the TEAG to participate in conference calls and discussions through e-mail to

  • refine the analytic framework and key questions at the beginning of the project;

  • discuss the preliminary assessment of the literature, including inclusion/exclusion criteria;

  • identify relevant literature not revealed through our literature searches;

  • provide input on the information and categories included in evidence tables;

  • review proposed methods for data synthesis; and

  • help interpret preliminary findings.

Because of their extensive knowledge of this topic, we also asked TEAG members to participate in the external peer review of the draft report.

Literature Search Strategy

To ensure a comprehensive and reproducible literature search and appraisal, we identified relevant research studies using an explicit search strategy and uniformly applied a set of inclusion and exclusion criteria to the identified studies. We describe our criteria and approach in this section.

Inclusion and Exclusion Criteria

Table 2. Inclusion/exclusion criteria by key question

Category | Inclusion | Exclusion
All Key Questions
Publication date | 1980 through March 2004 |
Setting | Developed countries only; any clinical setting or homes | Less-developed countries
Populations | Humans only; depressive illness assessed during pregnancy or first postpartum year | Animal studies; trials addressing exclusively bipolar disorder, a primary psychotic disorder, or maternity blues
Study design | Original data | Case reports, case series, letters, editorials, and non-systematic reviews that have no original data
Prevalence and Incidence (Key Question 1)
Study design | Prevalence or incidence study; epidemiologic cohort or weighted to be representative |
Study population | Diagnosis of major depressive episode or postpartum depressive episode using criterion standard (see text) | Depressive disorder identified only by screen
Screening Accuracy (Key Question 2)
Study design | Must have criterion standard (see text); studies must be prospective | Case-control studies
Outcomes of interest | Sensitivity and specificity |
Study population | Patients who are screened for depression during pregnancy or during 12 months postpartum | Patients with known current depressive episode
Screening Interventions Criteria (Key Question 3)
Study design | Randomized controlled trial or prospective cohort study | Case-control studies
Outcomes of interest | Clinical status and functioning |
Study population | Patients identified by a screen during pregnancy or during 12 months postpartum as being at high risk of having depression | Patients with known current depressive episode

To identify relevant studies, we generated a list of inclusion and exclusion criteria for each key question. We made the criteria fairly restrictive to ensure that our conclusions would be based on the highest quality data available with the lowest risk of bias. Some criteria were common across the three key questions; others were specific to the question. Table 2 summarizes the criteria.

For all key questions, studies had to report on original data, be in English, and be published from January 1980 through March 2004. This time frame ensured that the applied reference standards were consistent with the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III), or later criteria for the diagnosis of depression. The study could be conducted in any clinical setting or home but had to be from a developed country to increase the likelihood of being generalizable to the US population. In our original criteria submitted in the research proposal, we proposed including only studies done in the United States, the United Kingdom and other Commonwealth/English-speaking countries, Europe, and Scandinavia. However, we determined after abstract review that such limitations would leave out a large number of relevant studies. Therefore, we modified our inclusion criteria to accept any study conducted in developed countries where the population could be generalized to pregnant and postpartum women in the United States, regardless of the language spoken. We excluded studies published before 1980 or in a language other than English and those on women in less developed countries. We also excluded studies of women with major or minor depression in which the outcomes of interest were not distinguishable from those for women with bipolar disorder, primary psychotic disorders, or maternity blues.

In addition, studies for all key questions had to assess women for major depression either alone or together with minor depression during pregnancy or the first year postpartum by means of a clinical assessment or structured clinical interview. For Key Question (KQ) 1, we excluded studies of the prevalence and incidence of perinatal depression that relied solely on self-report screens to identify depression. For KQs 2 and 3, we excluded studies that included women with known depressive disorders at the outset. In KQ 2, study investigators used the clinical assessment or structured clinical interview as the criterion or gold standard with which to assess the properties of the screening instrument. In many KQ 3 studies, investigators used the clinical assessment to measure the depression outcomes from screening with subsequent intervention among women found to be at elevated risk of depression. Studies that measured women's mood using self-report measures only were also included in KQ 3.

For KQ 1, we included both prospective and retrospective studies of the prevalence and incidence of perinatal depression and studies that were conducted for purposes other than determining the prevalence and incidence of perinatal depression but nevertheless included a population-based estimate meeting the other inclusion criteria (e.g., studies of the properties of screening instruments). Furthermore, to answer the second part of KQ 1, we included both clinical trials and case-control studies comparing the incidence or prevalence of depression among pregnant women and newly delivered mothers to prevalence among women of similar age during other nonchildbearing periods of their lives. We included only prospective studies in those reviewed for KQs 2 and 3.

Literature Search and Retrieval Process

We used three strategies to identify studies providing evidence related to the key questions: systematic searches of electronic databases using both search terms and author names, hand searches of reference lists of included articles, and consultation with the TEAG. First, we generated a list of Medical Subject Heading (MeSH) search terms for each key question in the feasibility study. We used these terms to search standard electronic databases: MEDLINE, Cumulative Index to Nursing & Allied Health Literature (CINAHL), PsycINFO, Sociofile, and the Cochrane Library.

We conducted the electronic database searches twice. We initially did them in April 2003 for the feasibility study.24 That study included three additional key questions on natural history, risk factors, and treatment effectiveness for perinatal depression. We found relevant articles for the three key questions of the current study under the natural history and treatment effectiveness searches. We therefore conducted these and the incidence or prevalence and mass screening searches again in March 2004 to capture any studies published and posted in the interim.

Table 3. Literature search strategies and yield

Key Question | Search Terms | Yield
All | MEDLINE and CINAHL: (‘Puerperal Disorders’ and (Depression or ‘Depressive Disorder’)) or ‘Depression, Postpartum/ or perinatal depression.mp’; PsycINFO: “Depression, Postpartum”; Sociofile: “Postpartum Depression” |
KQ 1 | … and “Natural History” or “Cohort Studies” or “Longitudinal Studies” or … and Incidence or Prevalence | MEDLINE = 165; CINAHL = 42; PsycINFO = 88; Sociofile = 21; Total unduplicated = 256
KQ 2 | … and “Mass Screening” | MEDLINE = 67; CINAHL = 25; PsycINFO = 28; Sociofile = 1; Total unduplicated = 96
KQ 3 | … and treatment.mp or Therapeutics or “treatment failure” or “treatment outcomes” or “treatment duration” or “treatment errors” or “treatment delay” or “treatment complications” | MEDLINE = 513; CINAHL = 90; PsycINFO = 91; Sociofile = 5; Total unduplicated = 485

The subject headings used and the total yield from each source are shown in Table 3 by key question. We found a total of 837 unduplicated citations in the electronic searches and picked up an additional 9 citations through the hand searches and discussion with the TEAG, for a total of 846 citations. We also searched the Cochrane Collaboration database for prior systematic reviews using the keywords “perinatal” and “depression.” This search yielded 38 reviews.
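The arithmetic behind these totals can be checked with a short Python sketch. The per-database yields and unduplicated totals are copied from Table 3; the dictionary layout and variable names are illustrative assumptions, and the sketch only tallies the reported figures rather than repeating the deduplication itself.

```python
# Yields copied from Table 3: hits per database and the unduplicated total
# reported for each key question (KQ).
table3 = {
    "KQ 1": {
        "hits": {"MEDLINE": 165, "CINAHL": 42, "PsycINFO": 88, "Sociofile": 21},
        "unduplicated": 256,
    },
    "KQ 2": {
        "hits": {"MEDLINE": 67, "CINAHL": 25, "PsycINFO": 28, "Sociofile": 1},
        "unduplicated": 96,
    },
    "KQ 3": {
        "hits": {"MEDLINE": 513, "CINAHL": 90, "PsycINFO": 91, "Sociofile": 5},
        "unduplicated": 485,
    },
}

for kq, row in table3.items():
    raw = sum(row["hits"].values())
    duplicates = raw - row["unduplicated"]
    print(f"{kq}: {raw} raw hits, {row['unduplicated']} unduplicated, "
          f"{duplicates} duplicates removed")

# Sum of the per-question unduplicated totals, plus the 9 citations found
# through hand searches and discussion with the TEAG.
electronic_total = sum(row["unduplicated"] for row in table3.values())
hand_and_teag = 9
total_citations = electronic_total + hand_and_teag
print(electronic_total, total_citations)  # 837 846
```

The per-question unduplicated totals sum to the 837 electronic citations reported in the text, and adding the 9 hand-search and TEAG citations gives 846.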

Three senior reviewers with clinical expertise in perinatal depression reviewed the abstracts of articles identified during the literature search. Two clinicians evaluated each abstract against the inclusion criteria and resolved any differences in inclusion by consensus. In several instances, the abstracts did not provide enough information to make an inclusion decision; we pulled full articles to review for those studies. Of the 846 articles identified, 729 did not meet the inclusion criteria for any of the key questions and were therefore excluded, 8 studies were pulled for background only, and the remaining 109 articles were pulled for a full review.

Among the 109 studies pulled for full review, 50 did not meet our inclusion/exclusion criteria for any of the three key questions. The most common reason for exclusion was the absence of a gold standard (i.e., either a clinical assessment or structured clinical interview) for assessing depression, which eliminated 26 studies. Ten of the studies pulled for the evaluation of the properties of screening instruments were excluded because they did not report sensitivity and specificity or data from which these statistics could be computed. Other reasons for exclusion were depression assessed after the first year postpartum, no depression outcome measure, a retrospective study design, and restriction of the study sample to specific population subgroups (e.g., teens, patients of psychiatric hospitals). We based the last exclusion on two lines of reasoning. First, although groups such as adolescents are a key subgroup, our charge was to ensure that our results were generalizable to the broader US population. Second, these specific subpopulations are different enough from the remainder of the population that they warrant separate consideration. We excluded only one study because it was limited to an adolescent population.

   Figure 2. Perinatal depression article disposition

We included the remaining 59 studies in our review, and some met the inclusion criteria for more than one key question. We abstracted 30 studies for KQ 1, 23 for KQ 2, and 15 for KQ 3. We provide a graphical presentation of the disposition of the citations in Figure 2.

Data Collection and Assessment

The data collection process involved abstracting relevant information from the eligible articles and generating evidence tables that present the key details of the study design and the major findings from the articles. A trained member of the study team read and abstracted each article; a second member checked the table entries for accuracy against the original article.

Appendix C contains the final evidence tables in their entirety. They provide the study design details and major findings. The dimensions of each study design abstracted vary by key question, but they contain some common elements, such as author, year of publication, study location (e.g., country, state), population description, and sample size. We also collected information on the clinical interview instrument and diagnostic criteria used to diagnose depression and the age and racial and ethnic distribution of study subjects in each study.

The study results are recorded in the form reported in the article. However, for assessing consistency of results across the studies and for combining study results in a meta-analysis (see below), we also transformed the study results when necessary into consistent outcome measures using the appropriate statistical formulas. These computed data elements are shown in bold in the evidence tables (Appendix C).

We conducted data abstraction electronically in a word processing program and in such a way that study identifiers and results were easily transferred from the forms to electronic files for input into programs for meta-analysis.

Meta-analysis

We conducted a meta-analysis of the different prevalence and incidence estimates from studies abstracted for KQ 1 to arrive at single prevalence and incidence estimates for particular periods and points in time. We elaborate on these methods in Chapter 3. We also conducted meta-analyses of the different estimates of the receiver operating characteristics (ROC) curves for screening instruments evaluated for KQ 2, as described in Chapter 4. Because of the diversity of screening instruments and prevention interventions in the studies found for KQ 3, we did not conduct a meta-analysis for this key question.

Quality of Individual Articles

At the same time that we abstracted information on the study designs and findings in the included articles, we rated the quality of the studies. We developed a quality rating form for the screening accuracy (KQ 2) articles from criteria identified by the Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests.25 For studies addressing KQ 1 and KQ 3, we modified the quality rating forms developed by Downs and Black for RCTs and observational studies.26 These forms are provided in Appendix B.

The quality rating forms rated the reporting completeness and clarity, external validity, internal validity, and the power or precision of each study for the relevant key questions. Hence, the ratings refer to the usefulness or quality of the article for our purposes and not necessarily for the original purpose of the research or article. Studies that were included in more than one key question were rated separately for each key question. The specific quality items rated are described in more detail in Chapters 3, 4, and 5 for KQs 1, 2, and 3, respectively.

The senior abstractor completed the quality rating form for each article; another project team member then reviewed the completed form for accuracy and completeness. The overall quality scores of these articles are recorded in the evidence tables (Appendix C); scores on each of the domains are provided in Chapters 3, 4, and 5. All graded studies were included in the analysis regardless of their quality score. However, evidence from studies graded as poor was given less weight in the qualitative and quantitative syntheses and discussion.

Strength of Overall Evidence

In addition to the individual studies, we also rated the strength of the collective evidence on each key question. We applied four separate criteria: (1) number of studies, (2) aggregate sample sizes over the studies, (3) quality of the individual studies, and (4) representativeness of the study populations in the studies.

External Peer Review

As is customary for all evidence reports and systematic reviews done for AHRQ, the RTI-UNC EPC requested review of this report from a wide array of outside experts in the field and from relevant professional societies and public organizations. AHRQ has also requested review from its own staff and appropriate federal agencies. We provide a list of the external peer reviewers in Appendix E. This report reflects substantive and editorial comments from this external peer review.

Chapter 3. Prevalence and Incidence of Perinatal Depression

Introduction

Perinatal depression is generally recognized to be a common affliction among women during pregnancy and the first postpartum year. However, estimates of the prevalence and incidence of the condition vary widely—from 5 percent to more than 25 percent of pregnant women and new mothers—depending on the assessment method, the timing of the assessment, and population characteristics.21, 22, 27 To estimate disease burden more accurately and thereby better target and prioritize health care expenditures, we need more precise estimates of the prevalence and incidence of perinatal depression.

Two prior systematic reviews of the prevalence of perinatal depression—one for the early postpartum months and the other for pregnancy—are notable. O'Hara and Swain conducted the first meta-analysis of the prevalence of postpartum depression and investigated sources of variability in the prevalence estimates across studies.21 The authors combined estimates from 59 studies in which depression had been assessed at least 2 weeks postpartum using either a clinical interview or a validated self-report measure with an established cutoff (i.e., Beck Depression Inventory [BDI] ≥ 10; Edinburgh Postnatal Depression Scale [EPDS] ≥ 13; Zung Depression Scale ≥ 48; Center for Epidemiological Studies—Depression [CES-D] scale ≥ 16). Based on a total sample of 12,810 postpartum women, they estimated the average prevalence of postpartum depression to be 13.0 percent, with a 95% confidence interval (CI) of 12.3 percent to 13.4 percent. They found that self-report measures yielded significantly higher estimates of postpartum depression than interview-based methods and that longer evaluation periods resulted in higher estimates. The number of days postpartum when the depression assessment was made and the country in which the study was conducted did not significantly affect the prevalence estimates in their analysis.

More recently, Bennett et al. conducted a meta-analysis of prevalence estimates for depression during pregnancy.27 The authors combined estimates from 21 studies meeting predetermined inclusion criteria, including the assessment of depression by a structured clinical interview, the BDI, or the EPDS. Based on a total sample of 19,284 pregnant women, they estimated the prevalence of depression to be 7.4 percent (95% CI, 2.2 percent to 12.6 percent) during the first trimester, 12.8 percent (95% CI, 10.7 percent to 14.8 percent) during the second trimester, and 12.0 percent (95% CI, 7.4 percent to 16.7 percent) during the third trimester. The 95% CIs of these estimates overlap substantially, indicating that, given available evidence, the prevalence of depression during pregnancy cannot be said to differ significantly by trimester. The authors also found that, compared with structured clinical interviews, the self-report BDI produced significantly higher prevalence estimates, whereas the self-report EPDS produced statistically equivalent estimates.

Several factors point to the need for a reassessment of the prevalence of depression during pregnancy and the postpartum period at this time. First, the clinical definition of major depression has changed over time, becoming more precise. Definitions of major depression prior to the 1987 revision of the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III-R), were broader than subsequent definitions and likely included some minor depression and dysthymia. Minor depression is a proposed diagnosis for further study for which the 1994 DSM, fourth edition (DSM-IV), has defined research criteria;28 however, it has not yet been added to the DSM-IV. Furthermore, DSM-IV has an even more precise definition of major depression, requiring a minimum number of depressive symptoms and functional impairment, whereas DSM-III-R required only counts of depressive symptoms. Most of the literature reviewed in the O'Hara and Swain study21 (published in 1996) was published before 1994. Determining whether more recent studies affect the combined prevalence estimates and CIs is crucial to improving understanding of this disorder.

Most of the studies in Bennett et al.27 (done in 2004) were published after 1994. Whether the combined prevalence estimates refer to major and minor depression together or major depression alone is not clear. The text of the article discusses major depression, but the tables clearly indicate the inclusion of minor depression.

Second, neither review distinguished between measures of the point prevalence, the percentage of the population with depression at a given point in time (e.g., at 24 weeks gestational age or 9 weeks postpartum), and measures of period prevalence, the percentage of the population with depression over a period of time (e.g., during pregnancy or from delivery to the end of the first 3 months postpartum). Both types of estimates are used for the single combined prevalence estimates, although O'Hara and Swain did test the effect of differing time points and durations for the depression assessment in a meta-regression.

Third, neither of the reviews presented evidence of the incidence of perinatal depression—the percentage of the population with depressive episodes that begin within a given period of time.

Fourth, overall prevalence estimates from both reviews are confounded by false positives because they included prevalence estimates from studies that assessed depression with self-report instruments. As mentioned above, both systematic reviews found that self-report instruments produce significantly higher prevalence estimates than do clinical interviews.

Finally, although both systematic reviews discussed prevalence estimates for women who were not pregnant and had not recently delivered a child, neither study rigorously reviewed the evidence that compares depression rates for women during pregnancy and the first postpartum year to the rates for women of a similar age during nonchildbearing times.

This chapter reviews the literature addressing Key Question (KQ) 1: What is the prevalence and incidence of depression (major and minor) during pregnancy and during the first year postpartum? Is the prevalence or incidence increased during pregnancy and the first postpartum year compared to nonchildbearing periods?

Methods

We abstracted study features and all estimates of the prevalence and incidence of major and minor depression together and of major depression alone from the 30 included studies found through our literature searches described in Chapter 2. During the abstraction process, we graded the quality of the study based on selected study features. We then analyzed the estimates using a variety of meta-analytic methods described in this section.

Evaluation of the Quality and Strength of the Evidence

Appendix B presents the quality rating form used for articles considered for KQ 1. The total possible score for these studies was 20 for studies without a comparison group and 25 for studies with a comparison group. For both types of studies, we considered those articles with a score of 16 or greater to be good, those with scores between 10 and 15 to be fair, and those with scores of 9 and below to be poor. The domains and maximum points possible for each domain are as follows:

  • Reporting (domain score of 9): Eight items covering study aims, measures, patient populations, findings, and statistical presentation; each scored yes or no (1 or 0), except for an item concerning principal confounders that was scored yes, partially, or no (2, 1, or 0, respectively).

  • External validity (domain score of 3): Three items relating to the representativeness of populations from which people were recruited and of settings and clinicians that treat such patients; each scored yes, no, or unable to determine (1, 0, or 0, respectively).

  • Internal validity—bias (domain score of 3): Three items relating to issues such as validation of the depression diagnosis through clinical interview, follow-up periods, and appropriate statistical tests; each scored yes, no, or unable to determine (1, 0, or 0, respectively).

  • Internal validity—confounding (domain score of 2 for studies without a comparison group and 4 for studies with a comparison group): Two items relating to sources of comparison groups, one for the adequacy of adjustments for confounding, and one for the handling of loss to follow-up; each scored yes, no, or unable to determine (1, 0, or 0, respectively).

  • Precision (domain score of 3 for studies without a comparison group and 6 with a comparison group): One item relating to the number of pregnant or postpartum women assessed for depression, with scores of 3 for more than 1,000 women, 2 for 250 to 1,000 women, 1 for 30 to 250 women, and 0 for fewer than 30 women. For studies with a comparison group, a second item gave points based on the size of the smallest comparison group: a score of 3 for more than 2,000 women, 2 for 1,000 to 2,000 women, 1 for 500 to 1,000 women, and 0 for fewer than 500 women.
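The rating bands described above amount to a simple threshold rule on the total score. The short Python sketch below is illustrative only; the function name classify_quality is hypothetical and is not part of the report's methods.

```python
def classify_quality(total_score):
    """Map a KQ 1 total quality score to the rating bands stated in the text."""
    if total_score >= 16:
        return "good"   # 16 or greater
    if total_score >= 10:
        return "fair"   # between 10 and 15
    return "poor"       # 9 and below


# Illustrative totals only
ratings = {score: classify_quality(score) for score in (16, 12, 8)}
print(ratings)
```

The same cutoffs apply to studies with and without a comparison group, even though the maximum possible totals differ (25 versus 20).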

Best Estimates of Prevalence and Incidence

We abstracted all estimates of the prevalence and incidence of major and minor depression together and major depression alone. We distinguished prevalence estimates by whether they were point or period estimates and both prevalence and incidence estimates by the time period covered. Time periods for point prevalence estimates were defined as trimesters during pregnancy and months during the first postpartum year. Estimates taken at different weeks of gestation but within the same trimester of pregnancy were considered as being conducted in the same time period (e.g., estimates taken week 14 through week 27 of gestation were considered the second trimester). Similarly, estimates taken at different weeks postpartum but within the same month postpartum were considered within the same time period (e.g., estimates taken during week 1 through week 4 postpartum would be considered month 1; week 5 through week 9 postpartum, month 2). Where we found two or more estimates within the same trimester of pregnancy or month postpartum, we used meta-analysis to obtain a combined estimate for that trimester or month. We then graphed the resulting estimates to determine how they changed throughout pregnancy and the first postpartum year.

We conducted similar procedures for period prevalence and incidence estimates. The relevant time periods were either single trimesters and months or multiple trimesters and months. Because we found fewer estimates of these types, however, we graphed period prevalence and incidence estimates for only the first 3 months postpartum.

We combined all estimates with the same diagnosis, estimate type, and time period using the meta command in Stata. This procedure uses the inverse-variance weighting method to calculate random effects summary estimates. It also produces (1) Q tests of the homogeneity of the estimates and (2) forest plots of the individual study and combined estimates and their CIs. To satisfy the normality assumptions of these methods, we first transformed the prevalence estimates into log odds estimates.
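As a rough illustration of this pooling step, the sketch below transforms each study's prevalence to log odds, pools the values with inverse-variance weights, computes a Q statistic, and back-transforms the summary estimate. It is written in Python rather than Stata and is not the code used for this report. It assumes the DerSimonian-Laird estimator of between-study variance, which the text does not name, and all study counts shown are hypothetical.

```python
import math


def logit_with_variance(cases, n):
    """Return the log odds of a proportion and its approximate variance."""
    log_odds = math.log(cases / (n - cases))
    variance = 1.0 / cases + 1.0 / (n - cases)
    return log_odds, variance


def inverse_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_prevalence(studies):
    """Pool (cases, n) pairs on the log odds scale with random effects.

    Assumption: DerSimonian-Laird between-study variance estimator.
    """
    pairs = [logit_with_variance(cases, n) for cases, n in studies]
    y = [p[0] for p in pairs]
    v = [p[1] for p in pairs]

    # Fixed-effect inverse-variance weights and Cochran's Q statistic
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(studies) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {
        "prevalence": inverse_logit(mu),
        "ci_low": inverse_logit(mu - 1.96 * se),
        "ci_high": inverse_logit(mu + 1.96 * se),
        "Q": q,
        "tau2": tau2,
    }


# Hypothetical (cases, sample size) pairs, not data from the report
example = pool_prevalence([(12, 120), (30, 250), (9, 99)])
print(example)
```

Working on the log odds scale keeps the back-transformed confidence limits inside the 0 to 1 range, which a direct average of proportions does not guarantee.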

We reviewed the forest plots of the studies in each summary estimate to determine whether we could identify the source of any heterogeneity between studies. We then reran the meta-analyses excluding studies that were obvious outliers and for which we could identify the source of the bias. The new summary estimates are considered our best estimates of the prevalence and incidence of perinatal depression for the general female population in the United States.

Analysis of Confounders

To analyze associations between the prevalence of depression and study characteristics, we conducted cumulative meta-analysis and a series of meta-regressions on the point prevalence estimates for major and minor depression together and major depression alone. In the cumulative meta-analysis, we added studies one by one, based on publication year, to produce a new combined estimate with the cumulative evidence for each year. This procedure allowed us to see trends in the estimate over time. We conducted cumulative meta-analysis on the 2-month point prevalence estimates using the metacum command in Stata.
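The cumulative procedure can be sketched as follows: studies are sorted by publication year and the pooled estimate is recomputed each time a study is added. This is a simplified Python illustration, not the Stata metacum run used for the report. It uses fixed-effect inverse-variance weights on the log odds scale for brevity (the text does not say which model was used for the cumulative analysis), and the input values are hypothetical.

```python
import math


def cumulative_pooled_prevalence(studies):
    """Return [(year, pooled prevalence)] after adding each study in year order.

    studies: iterable of (publication_year, cases, sample_size).
    Assumption: fixed-effect inverse-variance pooling of log odds.
    """
    results = []
    sum_w = 0.0
    sum_wy = 0.0
    for year, cases, n in sorted(studies):
        log_odds = math.log(cases / (n - cases))
        weight = 1.0 / (1.0 / cases + 1.0 / (n - cases))
        sum_w += weight
        sum_wy += weight * log_odds
        pooled_log_odds = sum_wy / sum_w
        results.append((year, 1.0 / (1.0 + math.exp(-pooled_log_odds))))
    return results


# Hypothetical studies, listed out of order on purpose
trend = cumulative_pooled_prevalence([(1990, 10, 100), (1984, 20, 100), (2001, 45, 300)])
for year, prevalence in trend:
    print(year, round(prevalence, 3))
```

Reading the output from top to bottom shows how the combined estimate drifts as later studies are added.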

We then used the Stata metareg command to estimate several different meta-regression models. For all models, we used the log odds as the dependent variable and included the time point at which depression was assessed and indicators for whether the study enrolled only low-risk women and only women of low socioeconomic status (SES) as explanatory variables. The time point was represented by a categorical variable with values for the first, second, and third trimesters and the first, second, and third months postpartum. The reference category for this variable was 4 to 12 months postpartum.

We estimated seven different models. Each had a different set of additional explanatory variables:

  1. No additional explanatory variables;

  2. Publication year;

  3. Study country, categorized as the United States (the reference category), other western countries, and Asian countries;

  4. Interview type, categorized as the Schedule for Affective Disorders and Schizophrenia (SADS) (the reference category), the Structured Clinical Interview for DSM Diagnoses (SCID), and other interview types;

  5. Diagnostic criteria, categorized as Research Diagnostic Criteria (RDC) (the reference category), DSM III-R, DSM IV, and other criteria;

  6. Whether depression was assessed only for women who were designated as at risk based on a screening instrument; and

  7. The quality rating score.

Comparison with Other Women

To answer the second part of KQ 1, whether the prevalence and incidence of depression are higher during pregnancy and the first year postpartum compared to nonchildbearing periods, we computed odds ratios for studies with a comparison group of women of similar age during nonchildbearing times. Because the types and timing of prevalence and incidence estimates did not overlap in these studies, except for one time point, we did not conduct meta-analyses of the log odds ratios.
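For readers unfamiliar with the calculation, an odds ratio for a study with a comparison group can be computed from the four cell counts as sketched below. The confidence interval uses the standard log-based (Woolf) approximation, which is an assumption added here for illustration; the text does not state how intervals were derived. The counts are hypothetical.

```python
import math


def odds_ratio_with_ci(cases_depressed, cases_total,
                       controls_depressed, controls_total, z=1.96):
    """Odds ratio of depression (childbearing vs. comparison women) with a Woolf CI."""
    a = cases_depressed
    b = cases_total - cases_depressed
    c = controls_depressed
    d = controls_total - controls_depressed
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts: 20 of 200 childbearing women vs. 10 of 200 comparison women
estimate, ci_low, ci_high = odds_ratio_with_ci(20, 200, 10, 200)
print(estimate, ci_low, ci_high)
```

An interval that includes 1 would indicate that the data cannot distinguish the two groups at the chosen confidence level.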

Results

We found 28 prospective studies and two retrospective studies that met our inclusion criteria. Only three of the prospective studies included a comparison group of nonpregnant women of similar age.a In this section, we first describe the study characteristics and then present our analysis of the study results.

Study Characteristics

Table 4. Major characteristics of studies of prevalence and incidence of perinatal depression
Author, Year | Ref. | Country | Sample Size | Who Interviewed | When Interviewed | Interview Type | Diagnostic Criteria
Prospective Cohort Studies without Comparison Groups
Affonso et al., 1990 | 29 | US | 202 | All | Pregnancy & PP | SADS-PPG | RDC
Areias et al., 1996 | 30 | Portugal | 54 | All | Pregnancy & PP | SADS | RDC
Berle et al., 2003 | 31 | Norway | 411 | All EPDS ≥ 8 & some < 8 | PP | MINI-V4.4/MADRS | DSM-IV
Campbell and Cohn, 1991 | 32 | US | 1,033 | All | PP | SADS | RDC
Cooper et al., 1996 | 33 | England | 4,964 | EPDS ≥ 8 | PP | SCID | DSM-III-R
Cox et al., 1982 | 34 | Scotland | 105 | All | PP | SPI | Pitt's
Garcia-Esteve et al., 2003 | 35 | Spain | 1,123 | All EPDS ≥ 9 & some < 9 | PP | SCID-NP | DSM-IV
Gotlib et al., 1989 | 36 | Canada | 295 | All BDI ≥ 10 & some < 10 | Pregnancy & PP | SADS | RDC
Hobfoll et al., 1995 | 37 | US | 192 | All | Pregnancy & PP | SADS | RDC
Kent et al., 1999 | 38 | Australia | 710 | GHQ28 > 4 | PP | CIDI-A | DSM-III-R
Kitamura et al., 1993 | 39 | Japan | 120 | All | Pregnancy | SADS/SADS-C | RDC
Kitamura et al., 1999 | 40 | Japan | 111 | All | Pregnancy & PP | SADS | RDC
Kumar and Robson, 1984 | 41 | England | 196 | All | Pregnancy & PP | SPI | RDC
Lee et al., 2001 | 42 | Hong Kong | 781 | All GHQ > 4 & some ≤ 4 | PP | Modified SCID | Modified DSM-III-R
Lee et al., 2001 | 43 | Hong Kong | 145 | All | PP | Modified SCID | Modified DSM-III-R
Lucas et al., 2001 | 44 | Spain | 641 | BDI > 21 | PP | Not specified | DSM-III-R
Matthey et al., 2003 | 45 | Australia | 408 | All | PP | DIS | DSM-IV
Murray and Cox, 1990 | 46 | England | 100 | All | Pregnancy | SPI | RDC
O'Hara et al., 1984 | 19 | US | 99 | All | Pregnancy & PP | SADS | RDC
Pop et al., 1993 | 47 | Netherlands | 293 | All | Pregnancy & PP | Not specified | RDC
Watson et al., 1984 | 48 | England | 128 | All | Pregnancy & PP | SPI | ICD-9
Whiffen, 1988 | 49 | Canada | 115 | All | PP | SADS | RDC
Yamashita et al., 2000 | 50 | Japan | 88 | All | PP | SADS | RDC
Yonkers et al., 2001 | 23 | US | 802 | All IDS ≥ 18 or EPDS ≥ 12 & some < 12 | PP | SCID | DSM-IV
Yoshida et al., 1997 | 51 | England | 98 | All | PP | SADS | RDC
Prospective Studies with Comparison Groups
Cooper et al., 1988 | 52 | England | 483 cases, 313 controls | All GHQ ≥ 12 & some < 12 | PP | PSE/MADRS | PSE ID/Catego Class
Cox et al., 1993 | 20 | England | 232 cases, 232 controls | All EPDS ≥ 9 & some < 9 | PP | SPI | RDC
O'Hara et al., 1990 | 53 | US | 182 cases, 179 controls | All | Pregnancy & PP | SADS | RDC
Retrospective Studies
Bryan et al., 1999 | 54 | US | 403 |  | PP | Medical records | Diagnosis of 2 or more symptoms
Georgiopoulos et al., 2001 | 55 | US | 342 |  | PP | Medical records | Diagnosis

BDI, Beck Depression Inventory; CIDI-A, Composite International Diagnostic Interview; DIS, Diagnostic Interview Schedule; DSM-III-R, Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; EPDS, Edinburgh Postnatal Depression Scale; GHQ, General Health Questionnaire; ICD-9, International Classification of Diseases, Ninth Edition; MADRS, Montgomery-Asberg Depression Rating Scale; MINI-V4.4, Mini International Neuropsychiatric Interview, Version 4.4; PP, postpartum; PSE, Present State Examination; PSE ID, PSE Index of Definition; RDC, Research Diagnostic Criteria; SADS, Schedule for Affective Disorders and Schizophrenia; SADS-C, SADS Change Version; SADS-PPG, SADS-Pregnancy and Postpartum Guidelines; SCID, Structured Clinical Interview for DSM-III-R; SCID-NP, Structured Clinical Interview for DSM-III-R, Non-Patient Version; SPI, Standardized Psychiatric Interview.

The major characteristics of the 30 studies are summarized in Table 4 by study type and alphabetically within type. The 25 prospective studies without a comparison group are shown first,19, 23, 29–51 followed by the three prospective studies with a comparison group,20, 52, 53 and, finally, the two retrospective studies.54, 55 Important study characteristics include the precision or size of the studies, the representativeness of the study populations, the methods and timing used to assess the mother's mood, and the quality rating of study design. Each of these characteristics is addressed in turn below.

Precision. The study sample sizes ranged from 54 to 4,964 women; the median sample size was 202 women. Although all the studies had an adequate sample size to provide a prevalence estimate of 10 percent with 80 percent power at a 95% confidence level, most were not large enough to allow subgroup analyses.

The three studies with comparison groups included 313, 232, and 179 women in the comparison groups.20, 52, 53 These sample sizes are inadequate to detect a difference as large as 5 percentage points in incidence or prevalence at 80 percent power and a 95% confidence level; a minimum sample size of more than 500 per group is required.
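The sample size requirement can be checked with the usual normal-approximation formula for comparing two proportions. The Python sketch below is a hedged illustration rather than the calculation performed for the report: it assumes a two-sided test, and the baseline rates of 10 percent versus 15 percent are assumed values chosen to represent a 5 percentage point difference, since the text does not state the baseline used.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Assumed rates: 10 percent in the comparison group, 15 percent in the perinatal group
required = n_per_group(0.10, 0.15)
comparison_group_sizes = (313, 232, 179)  # from the three studies cited above
print(required, [size >= required for size in comparison_group_sizes])
```

Under these assumptions the required size per group comes out well above 500, so none of the three comparison groups would be large enough.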

Representativeness. Included studies represented a wide array of developed nations, but the study subjects were not a good representation of the racial and ethnic mix of the US population (Table 4). Seven of the prospective studies were located in England; six in the United States; three in Japan; two each in Canada, Australia, Hong Kong, and Spain; and one each in the Netherlands, Norway, Portugal, and Scotland. The two retrospective studies investigated depression diagnoses documented in the Olmsted County, Minnesota, population-based databases during two different 12-month periods: the 12 months following deliveries occurring in 1993 and the 12 months following deliveries among women visiting the Olmsted County and Mayo clinics in 1997 and 1998.54, 55

None of the studies was designed to compare rates of depression among women of different racial and ethnic groups. Sixteen of the 30 studies did not even specify the racial and ethnic composition of the study subjects. Among the other 14 studies, five included only white non-Hispanic women;19, 32, 38, 44, 47 two studies included only Chinese women;42, 43 and two others included only Japanese women.50, 51 The remaining five studies noted a racially mixed population, but all had a predominant race or ethnicity. In four of these studies, 73 percent to 90 percent of the women were white non-Hispanic,29, 36, 37, 48 and, in the fifth, 75 percent were Hispanic.23

Depression Assessment. Our inclusion criteria required that the study use a clinical interview or assessment to validate depression diagnoses. The prospective studies differed in who received a clinical interview, the interview instrument, the diagnostic criteria used to identify a depressive episode from the interview responses, and when the interview was conducted. These differences can affect the resulting estimates of prevalence and incidence.

Eighteen of the 28 prospective studies conducted a clinical interview on all study women. The remaining 10 studies first had study subjects complete a self-report depression screening instrument, such as the EPDS, the BDI, or the General Health Questionnaire (GHQ), a broader measure designed to assess the presence of psychiatric distress related to general medical illness. These studies then administered a clinical interview to women scoring over a predetermined cutoff on the screening instrument. Seven of the 10 studies also interviewed a small sample (e.g., 10 percent) of the women scoring below the cutoff, but few of the studies used the results from these interviews to adjust the final prevalence estimates for false negatives. Most studies used low enough cutoff scores that the resulting downward bias in the estimates was minimal. The one exception was the Lucas et al. study, which used a high cutoff of 21 on the BDI and did not interview any women scoring below the cutoff or adjust the resulting prevalence rates in any way, thereby introducing a significant, uncorrected downward bias.44
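The size of the bias from interviewing only screen-positive women can be illustrated with a small calculation. The Python sketch below weights the interview-confirmed case rate in each screening stratum by the size of that stratum, and compares the result with an estimate that treats every woman below the cutoff as not depressed. The function names and all counts are hypothetical; none of the included studies is being reproduced here.

```python
def adjusted_prevalence(n_above, interviewed_above, cases_above,
                        n_below, interviewed_below, cases_below):
    """Prevalence that corrects for false negatives using a sample of screen-negatives."""
    rate_above = cases_above / interviewed_above
    rate_below = cases_below / interviewed_below
    expected_cases = n_above * rate_above + n_below * rate_below
    return expected_cases / (n_above + n_below)


def unadjusted_prevalence(n_above, interviewed_above, cases_above, n_below):
    """Prevalence that assumes no depression among women below the cutoff."""
    expected_cases = n_above * (cases_above / interviewed_above)
    return expected_cases / (n_above + n_below)


# Hypothetical study: 1,000 women screened, 200 above the cutoff (all interviewed,
# 80 cases) and 800 below it (a 10 percent sample of 80 interviewed, 2 cases)
with_correction = adjusted_prevalence(200, 200, 80, 800, 80, 2)
without_correction = unadjusted_prevalence(200, 200, 80, 800)
print(with_correction, without_correction)
```

The gap between the two figures grows as the cutoff rises, because more depressed women fall below it and go uncounted.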

Different interview instruments have been developed for identifying depression diagnoses. These different instruments use different criteria for diagnosing depression. Little is known about how these different instruments and diagnostic criteria affect the prevalence and incidence estimates.

The most frequently used instrument among our studies was the SADS. This semistructured interview is widely used in clinical research and has well-established reliability and validity.56 O'Hara et al. adapted the SADS for use with pregnant and postpartum women.19 Twelve of the 28 prospective studies used this interview instrument.

Five of the studies used the section of the SCID that covers depressive disorders.57, 58 The SCID allows the interviewer to use additional questions to inquire about idioms of distress that are specific to the local context. Lee et al. used this feature of the SCID to incorporate questions about traditional Chinese customs used during the puerperium that may affect the clinical presentation of postpartum depression.42, 43 They also modified the instrument to identify cases of minor depression.

Five other studies used the Standardized Psychiatric Interview (SPI) of Goldberg et al.59 The SPI includes 10 five-point scales that rate the severity of neurotic symptoms in the 7 days preceding the interview and a rating of 12 abnormalities observed during the interview.

Other interview instruments used include the Composite International Diagnostic Interview (CIDI-A),60 the Diagnostic Interview Schedule,61 the Mini International Neuropsychiatric Interview (MINI-V4.4),62 the Present State Examination (PSE),63 and the Montgomery and Asberg Depression Rating Scale (MADRS).64

All studies that used the SADS and three of the studies that used the SPI based depression diagnosis on the RDC.65 To be diagnosed with depression, women had to have reported that they felt sad, tearful, or blue for at least 2 weeks. The 2-week criterion serves to rule out women who were experiencing postpartum blues only. In addition, for a diagnosis of major depression, the women had to have reported at least three or four additional symptoms, such as sleeping disturbances, loss of appetite, fatigue, loss of interest in usual activities or the ability to concentrate, psychomotor retardation, and suicidal thoughts. Women with only two to four of these symptoms were classified as having minor depression. The RDC attempts to differentiate between normal physical effects of pregnancy and the puerperium and actual symptoms of depression.

Five of the prospective studies based diagnoses of depression on DSM-III-R criteria and four based diagnoses on DSM-IV criteria. A diagnosis of major depression based on the DSM-III-R criteria is comparable with the RDC for definite major depression.66 However, the RDC includes criteria for minor depression, which, as mentioned above, received its first DSM mention in the fourth edition (DSM-IV)28 as a proposed category for further study. Other criteria used for diagnoses of depression included Pitt's criteria;67 the International Classification of Diseases, Ninth Edition (ICD-9); and PSE Index of Definition (PSE ID) and Catego Class.63

   Figure 3. Estimates of point prevalence of major and minor depression by time of assessment

   Figure 8. Estimates of incidence of major depression by time period of assessment

Finally, because the prevalence and incidence of depression may not be constant throughout pregnancy and the first postpartum year, the timing of the clinical interview is also important. Most of the studies we reviewed administered the clinical interview at multiple points in time throughout pregnancy and the first postpartum year, allowing for multiple estimates of prevalence and incidence. The 28 prospective studies provided 80 estimates of the prevalence and incidence of major and minor depression and 70 estimates of the prevalence and incidence of major depression alone. Graphical presentations of the timing of each of the estimates by diagnosis and estimate type are shown in Figures 3 through 8.

The numbers in parentheses in these figures are the number of estimates found in the 28 studies for that point or period of time.

For the two retrospective studies, the investigators had abstracted information on symptoms and diagnoses of depression from medical records beginning at delivery and extending to 1 year postpartum. Both studies provided only estimates of 1-year period prevalence. Bryan et al.54 provided estimates of the prevalence for both major and minor depression and major depression alone, whereas Georgiopoulos et al.55 provided only the prevalence of major depression alone. Bryan et al. identified a woman as having postpartum depression if any of the following criteria were found in her medical records:54 (1) two notations at least 2 weeks apart of symptoms of depression; (2) a documented diagnosis of depression by a physician, psychologist, nurse practitioner, or midwife; (3) a new prescription for an antidepressant with no evidence that it was for chronic pain or for any indication other than depression; and (4) documentation of symptoms sufficient to meet the DSM-IV criteria of major depression. Georgiopoulos et al.55 based their prevalence estimate solely on a documented diagnosis of postpartum depression.

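Table 5. Quality rating of studies of the prevalence and incidence of perinatal depression

| Author, Year [Reference] | Reporting (9) | External Validity (3) | Internal Validity-Bias (3) | Internal Validity-Confounding (2) | Precision (3) | Total Score (20) |
|---|---|---|---|---|---|---|
| Prospective Cohort Studies without Comparison Groups | | | | | | |
| Affonso et al., 1990 [29] | 4 | 0 | 3 | 0 | 1 | 8 |
| Areias et al., 1996 [30] | 8 | 0 | 2 | 1 | 1 | 12 |
| Berle et al., 2003 [31] | 5 | 0 | 2 | 0 | 2 | 9 |
| Campbell and Cohn, 1991 [32] | 6 | 0 | 3 | 0 | 3 | 12 |
| Cooper et al., 1996 [33] | 7 | 0 | 2 | 0 | 3 | 12 |
| Cox et al., 1982 [34] | 5 | 1 | 3 | 1 | 1 | 11 |
| Garcia-Esteve et al., 2003 [35] | 7 | 0 | 2 | 1 | 3 | 13 |
| Gotlib et al., 1989 [36] | 5 | 2 | 1 | 1 | 2 | 11 |
| Hobfoll et al., 1995 [37] | 6 | 3 | 2 | 0 | 1 | 12 |
| Kent et al., 1999 [38] | 7 | 1 | 2 | 0 | 2 | 12 |
| Kitamura et al., 1993 [39] | 8 | 0 | 3 | 1 | 1 | 13 |
| Kitamura et al., 1999 [40] | 4 | 1 | 3 | 1 | 1 | 10 |
| Kumar and Robson, 1984 [41] | 7 | 0 | 3 | 0 | 1 | 11 |
| Lee et al., 2001 [42] | 6 | 2 | 2 | 1 | 1 | 12 |
| Lee et al., 2001 [43] | 5 | 2 | 0 | 0 | 1 | 8 |
| Lucas et al., 2001 [44] | 5 | 0 | 2 | 0 | 2 | 9 |
| Matthey et al., 2003 [45] | 6 | 0 | 3 | 0 | 2 | 11 |
| Murray and Cox, 1990 [46] | 6 | 0 | 3 | 0 | 1 | 10 |
| O'Hara et al., 1984 [19] | 6 | 0 | 3 | 0 | 1 | 10 |
| Pop et al., 1993 [47] | 7 | 1 | 3 | 0 | 2 | 13 |
| Watson et al., 1984 [48] | 7 | 2 | 3 | 0 | 1 | 13 |
| Whiffen, 1988 [49] | 6 | 0 | 3 | 0 | 1 | 10 |
| Yamashita et al., 2000 [50] | 6 | 0 | 3 | 0 | 1 | 10 |
| Yonkers et al., 2001 [23] | 8 | 0 | 2 | 2 | 2 | 14 |
| Yoshida et al., 1997 [51] | 7 | 0 | 3 | 0 | 1 | 11 |
| Average | 6.0 | 0.6 | 2.4 | 0.4 | 1.5 | 11.1 |
| Retrospective Studies | | | | | | |
| Bryan et al., 1999 [54] | 8 | 3 | 2 | 1 | 2 | 16 |
| Georgiopoulos et al., 2001 [55] | 2 | 2 | 1 | 1 | 2 | 8 |
| Average | 5.0 | 2.5 | 1.5 | 1.0 | 2.0 | 12.0 |
| Prospective Studies with Comparison Groups | | | | | | |
| Cooper et al., 1988 [52] | 6 | 0 | 2 | 0 | 2 | 10 |
| Cox et al., 1993 [20] | 5 | 1 | 2 | 3 | 1 | 12 |
| O'Hara et al., 1990 [53] | 7 | 0 | 3 | 2 | 1 | 13 |
| Average | 6.0 | 0.3 | 2.3 | 1.7 | 1.3 | 11.7 |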

Note: Numbers in parentheses are total possible points.

Quality Rating. We show the results of the quality rating of the included articles in Table 5 by study type. Studies were rated on reporting completeness, external validity, internal validity, and precision. The average overall quality rating score, out of a possible 20 points, was 11.1 for prospective studies without comparison groups and 12.0 for the retrospective studies. For prospective studies with comparison groups, which had 25 possible points, the average overall quality score was 11.7. Thus, we would rate the overall body of evidence for the prevalence and incidence of perinatal depression as fair at best.
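The group averages quoted above can be reproduced directly from the total scores in Table 5. The following is a minimal sketch; the group labels and the helper name `mean_score` are illustrative choices, not part of the report.

```python
# Total quality scores transcribed from Table 5, grouped by study type.
totals = {
    "prospective, no comparison group": [
        8, 12, 9, 12, 12, 11, 13, 11, 12, 12, 13, 10, 11,
        12, 8, 9, 11, 10, 10, 13, 13, 10, 10, 14, 11,
    ],
    "retrospective": [16, 8],
    "prospective, comparison group": [10, 12, 13],
}


def mean_score(scores):
    """Arithmetic mean rounded to one decimal place, as reported in Table 5."""
    return round(sum(scores) / len(scores), 1)


averages = {group: mean_score(scores) for group, scores in totals.items()}
for group, avg in averages.items():
    print(f"{group}: {avg}")
```

Because each group average is an unweighted mean across studies, small studies count as much as large ones in these summary figures.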

In general, studies rated good on reporting. The 28 prospective studies, both those with and those without comparison groups, scored an average of 6.0 out of 9 possible points for reporting. The retrospective studies scored 5.0 on average. Most studies clearly described the purpose of the study, the method of assessing depression, the characteristics of the patients in the study, and the study findings. Most studies also provided adequate information to estimate the random variability in the estimates and reported actual probability values for the statistical significance of the main outcomes. Fewer studies provided the distribution of the principal confounders or described the characteristics of patients lost to follow-up. In particular, studies often did not discuss whether the women had prior depressive episodes or obstetrical complications and frequently did not report the women's socioeconomic status or race and ethnicity. Most studies also did not specifically exclude cases of bipolar disorder or psychosis.

Virtually all prospective studies rated poor on external validity. Prospective studies without a comparison group averaged 0.6 points out of 3 possible points (Table 5); those with a comparison group averaged 0.3 points. These studies seldom supplied adequate information to determine whether study subjects were representative of the patient population of the facilities from which they were recruited and whether the recruitment facilities were representative of the facilities frequented by the general population in the geographic area. In contrast, the two retrospective studies, which were conducted using the Olmsted County Health Department and Mayo Clinic databases, included the majority of all newly delivered women in the county and therefore scored an average of 2.5 points on external validity.

We separated scores for internal validity into two sets of study design characteristics: those that may bias the prevalence estimates and those that reflect possible confounding factors, which relate to the comparability of the comparison groups and whether losses of patients to follow-up were taken into consideration. The prospective studies scored high on the first measure of internal validity; the studies without a comparison group averaged 2.4 of 3 points and the studies with a comparison group averaged 2.3 points. Virtually all prospective studies assessed the mood of study women within 2 weeks of designated times during pregnancy and postpartum and applied appropriate statistical tests for measuring incidence or prevalence. However, as noted above, 10 studies introduced potential bias by not administering the clinical interview to all study women.

The retrospective studies averaged a lower 1.5 points. Diagnoses were not validated through clinical interview for all women, and Georgiopoulos et al. did not provide adequate information to determine whether they used appropriate statistical techniques to compute the prevalence estimate.55

Studies with comparison groups could get 4 possible points for the internal validity confounding score. We awarded 2 additional points if the cases and controls were recruited from the same population and over the same period of time. Only two of the three prospective studies with comparison groups met these criteria. The comparison group in the Cooper et al. study comprised women interviewed by another researcher over a different time period in a different city. Study women were recruited from the appointments diary of the prenatal clinic and the delivery booking diary of the general practitioner unit of the John Radcliffe Hospital in Oxford; the comparison group was derived from a community sample of Edinburgh women of similar age but who were not pregnant and had not delivered in the previous 12 months.52

By contrast, in the Cox et al. study, both cases and controls resided in the North Staffordshire Health District.20 Cases were recruited from the prenatal clinic lists of the North Staffordshire Maternity Hospital; controls matching cases on marital status, number of children, and age (within 5 years) were recruited from four general practice registers. The O'Hara et al. study recruited cases from a public obstetrics and gynecology clinic and two private practices at the University of Iowa Hospitals and Clinics.53 Each subject was asked to provide the names of five acquaintances similar in age, marital status, work status, and number of children. The acquaintance most similar to the subject was selected as a control.

We also gave points for the internal validity confounding measure if the investigators made adjustments or discussed the possible direction and magnitude of any biases from confounding factors and if they took the loss of patients to follow-up into account in their prevalence or incidence estimate. A minority of studies met either of these criteria, resulting in an average score on this measure of 0.4 out of 2 possible points for prospective studies without comparison groups, 1.7 out of 4 possible points for prospective studies with comparison groups, and 1.0 out of 2 possible points for the retrospective studies.

Finally, we gave 17 studies with 30 to 250 pregnant or recently delivered women a precision score of 1, 10 studies with 250 to 1,000 women a precision score of 2, and 3 studies with more than 1,000 women a precision score of 3. None of the studies had a comparison group of at least 500 women; therefore, we awarded no additional points for precision. The average precision score was 1.5 for prospective studies without comparison groups, 1.3 for prospective studies with comparison groups, and 2.0 for the retrospective studies.
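The sample-size rule for the precision score can be sketched as a small function. The text gives the ranges as 30 to 250, 250 to 1,000, and more than 1,000, so the handling of exactly 250 women is an assumption here (it is assigned to the middle band), as is rejecting samples below 30. The function name `precision_score` is hypothetical, and the additional points for a large comparison group are not modeled because the report does not state how many were available.

```python
def precision_score(n_women: int) -> int:
    """Precision points by number of pregnant or recently delivered women.

    Assumed boundaries: 30-249 -> 1, 250-1,000 -> 2, more than 1,000 -> 3.
    """
    if n_women < 30:
        # The report describes no band below 30 women; treat as out of range.
        raise ValueError("sample sizes below 30 are outside the described ranges")
    if n_women < 250:
        return 1
    if n_women <= 1000:
        return 2
    return 3


for n in (100, 250, 1000, 1001):
    print(n, precision_score(n))
```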

In summary, the included studies generally were rated as good on reporting and internal validity for bias, poor on external validity and internal validity for confounding, and only fair on precision.

Results from Prospective Studies

Table 6. Original estimates of prevalence and incidence of major and minor depression

| Start Date | End Date | Studies | Estimate | 95% Confidence Interval | P-Value for Test of Homogeneity |
|---|---|---|---|---|---|
| Point Prevalence | | | | | |
| 1st trimester | | 29,40,41 | 6.4% | 2.3%–16.2% | 0.002 |
| 2nd trimester | | 19,36,37,41,53 | 11.0% | 5.7%–20.4% | 0.000 |
| 3rd trimester | | 29,36,37,40,41,46,47 | 8.7% | 4.9%–15.0% | 0.000 |
| 1 week PP | | 40 | 5.5% | 1.8%–12.4% | |
| 1 month PP | | 23,29,36,40,42,47,50 | 8.8% | 6.4%–11.9% | 0.002 |
| 2 months PP | | 31,32,35,37,43,49,53 | 11.3% | 7.7%–16.2% | 0.000 |
| 3 months PP | | 41,42,47,50 | 12.9% | 10.6%–15.8% | 0.707 |
| 4 months PP | | 29,47 | 4.3% | 0.6%–25.4% | 0.001 |
| 5 months PP | | 47 | 10.6% | 7.3%–14.7% | |
| 6 months PP | | 20 | 9.9% | 6.4%–14.5% | |
| 7 months PP | | 41,47 | 10.6% | 7.1%–15.6% | 0.180 |
| 8 months PP | | 47 | 6.5% | 4.0%–9.9% | |
| 12 months PP | | 41 | 6.5% | 2.7%–12.9% | |
| Period Prevalence | | | | | |
| Conception | 2nd trimester | 30 | 9.3% | 3.1%–20.3% | |
| Conception | Birth | 30,39,41 | 18.4% | 14.3%–23.3% | 0.931 |
| 2nd trimester | 3rd trimester | 36 | 10.2% | 7.0%–14.2% | |
| Birth | 1 month PP | 50 | 13.6% | 7.3%–22.6% | |
| Birth | 2 months PP | 19,32,45 | 8.9% | 6.8%–11.7% | 0.135 |
| Birth | 3 months PP | 30,50,51 | 19.2% | 10.7%–31.9% | 0.016 |
| Birth | 5 months PP | 34 | 29.1% | 20.6%–38.9% | |
| Birth | 6 months PP | 20 | 13.8% | 9.6%–18.9% | |
| Birth | 8 months PP | 47 | 20.8% | 16.3%–25.9% | |
| Birth | 12 months PP | 30 | 53.7% | 39.6%–67.4% | |
| Incidence | | | | | |
| Conception | 1st trimester | 39,41 | 11.3% | 7.8%–16.3% | 0.757 |
| Conception | 2nd trimester | 30 | 5.8% | 1.2%–16.0% | |
| Conception | Birth | 30,39 | 14.5% | 8.1%–24.4% | 0.192 |
| 1st trimester | 2nd trimester | 41 | 2.7% | 0.6%–7.6% | |
| 2nd trimester | 3rd trimester | 36,41 | 2.2% | 1.1%–4.1% | 0.627 |
| 2nd trimester | 2 months PP | 37 | 12.5% | 7.9%–18.5% | |
| Birth | 1 month PP | 36,42,50 | 7.8% | 3.6%–16.1% | 0.003 |
| Birth | 2 months PP | 19 | 10.3% | 5.1%–18.1% | |
| Birth | 3 months PP | 30,41,42,50,51 | 14.5% | 10.9%–19.2% | 0.142 |
| Birth | 6 months PP | 20 | 11.1% | 7.3%–16.0% | |
| Birth | 12 months PP | 30 | 49.0% | 34.4%–63.7% | |

Table 7. Original estimates of prevalence and incidence of major depression

| Start Date | End Date | Studies | Estimate | 95% Confidence Interval | P-Value for Test of Homogeneity |
|---|---|---|---|---|---|
| Point Prevalence | | | | | |
| 1st trimester | | 29,40,41 | 2.4% | 0.7%–8.2% | 0.032 |
| 2nd trimester | | 19,37,48,53 | 6.4% | 3.7%–11.0% | 0.029 |
| 3rd trimester | | 29,37,40,46,47 | 3.4% | 1.8%–6.4% | 0.116 |
| 1 week PP | | 40 | 0.0% | 0.0%–3.2% | |
| 1 month PP | | 23,42,44,50 | 2.8% | 1.5%–5.5% | 0.000 |
| 2 months PP | | 31,33,35,37,42,48,49 | 6.8% | 3.8%–11.9% | 0.000 |
| 3 months PP | | 42,44,50 | 3.8% | 2.4%–6.1% | 0.010 |
| 4 months PP | | 29,47 | 2.3% | 1.1%–4.9% | 0.435 |
| 5 months PP | | 47 | 2.1% | 0.8%–4.4% | |
| 6 months PP | | 20,38,44,52 | 4.2% | 2.1%–8.7% | 0.000 |
| 7 months PP | | 47 | 3.1% | 1.4%–5.8% | |
| 8 months PP | | 47 | 1.0% | 0.2%–3.0% | |
| 9 months PP | | 44 | 0.0% | 0.0%–0.7% | |
| 12 months PP | | 44,52 | 1.3% | 0.0%–56.6% | 0.206 |
| Period Prevalence | | | | | |
| Conception | Birth | 39 | 12.7% | 7.1%–20.4% | |
| 1st trimester | Birth | 48 | 9.4% | 4.9%–15.8% | |
| Birth | 1 month PP | 50 | 5.7% | 1.9%–12.8% | |
| Birth | 2 months PP | 19,32 | 6.5% | 5.2%–8.2% | 0.516 |
| Birth | 3 months PP | 50,51 | 7.1% | 4.1%–11.7% | 0.626 |
| Birth | 5 months PP | 34 | 12.6% | 6.9%–20.6% | |
| Birth | 6 months PP | 20 | 6.5% | 3.7%–10.4% | |
| Birth | 8 months PP | 47 | 6.8% | 4.2%–10.4% | |
| Birth | 12 months PP | 44,48 | 6.6% | 0.5%–51.7% | 0.000 |
| Incidence | | | | | |
| Conception | Birth | 30,39,48 | 7.5% | 3.8%–14.2% | 0.116 |
| 2nd trimester | 2 months PP | 37 | 3.0% | 1.0%–6.8% | |
| Birth | 1 month PP | 23,42,50 | 3.9% | 2.9%–5.4% | 0.429 |
| Birth | 2 months PP | 48 | 8.1% | 4.0%–14.4% | |
| Birth | 3 months PP | 42,50,51 | 6.5% | 4.2%–9.6% | 0.767 |
| Birth | 12 months PP | 30 | 30.6% | 18.3%–45.4% | |

Our original estimates of point prevalence, period prevalence, and incidence rates computed from all of the estimates in the included studies are shown by time period in Table 6 for major and minor depression together and in Table 7 for major depression alone. For time periods for which we had more than one estimate, we show the combined estimate from the meta-analysis and the P-value for the Q test of homogeneity. This statistic tests the null hypothesis that the estimates come from the same distribution—that is, whether or not the studies appear statistically to measure the same phenomenon. A P-value < 0.05 suggests that they do not.
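To make the Q test concrete, the sketch below pools proportions on the logit scale with inverse-variance weights and computes Cochran's Q with its chi-square P-value. This is an illustration under stated assumptions, not the report's own procedure: the pooling model used by the investigators is not described in this section, the study counts are hypothetical, and the function names (`chi2_sf`, `cochran_q`) are invented for the example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)            # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # closed form for df = 1
    while a < df / 2.0:                      # recurrence adds 2 df per step
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def cochran_q(studies):
    """studies: list of (depressed, total). Returns pooled prevalence, Q, df, P."""
    logits = [math.log(e / (n - e)) for e, n in studies]
    weights = [1.0 / (1.0 / e + 1.0 / (n - e)) for e, n in studies]
    pooled = sum(w * y for w, y in zip(weights, logits)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logits))
    df = len(studies) - 1
    prevalence = 1.0 / (1.0 + math.exp(-pooled))
    return prevalence, q, df, chi2_sf(q, df)


# Hypothetical counts (depressed women, women assessed); not taken from the report.
example = [(5, 100), (40, 200), (22, 180)]
prevalence, q, df, p = cochran_q(example)
print(f"pooled prevalence={prevalence:.3f}  Q={q:.2f}  df={df}  P={p:.4f}")
```

A small P-value here would be read the same way as in Tables 6 and 7: the studies do not appear to be estimating a single common prevalence.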

The results of these tests indicate that considerable heterogeneity exists across the studies included in many of the pooled estimates, particularly among the point prevalence estimates. Therefore, we first discuss the results of our analysis of outliers and then discuss the results of the revised meta-analyses. We finish this section by presenting the findings from the studies with comparison groups of nonchildbearing women.

Outliers. In a review of the forest plots of the meta-analyses of the prevalence and incidence estimates, we found estimates from several studies consistently to be outliers for all time periods at which they assessed the women's mood. Two studies included only women at low risk of depression.29, 32 Affonso et al.29 included only primigravida women with a viable fetus who were married or living with the infant's father and who had no recent depression episodes. Campbell and Cohn32 included only primiparous women who delivered full-term, single infants without major complications and who were Caucasian, married, over 17 years of age, and had at least a high school education. The estimates from these studies were consistently lower than the estimates from the other studies.

Two additional studies included only women of lower socioeconomic status.23, 37 These studies generally provided higher estimates of depression prevalence and incidence than the other studies.

The Lucas et al. study included only women who screened positive for depression on the BDI.44 The cutoff used (> 21) was so high that the bias from false negatives produced consistently lower prevalence estimates compared to the other studies.

Finally, because of its size, the Cooper et al. study dominated the combined 2-month point prevalence estimate for major depression alone.33 However, the 15.3 percent estimated point prevalence from this study is outside the 95% CI of the combined estimate for major and minor depression. The purpose of the study was not to produce a prevalence estimate but rather to develop a predictive index for postpartum depression. Furthermore, many of the clinical interviews were conducted by telephone and the article did not state whether a clinician or lay person conducted the interview. Thus, the procedures for assessing depression in this study may have introduced significant bias in the prevalence estimate.

Table 8. Best estimates of prevalence and incidence of major and minor depression

| Start Date | End Date | Studies | Estimate | 95% Confidence Interval | P-Value for Test of Homogeneity |
|---|---|---|---|---|---|
| Point Prevalence | | | | | |
| 1st trimester | | 40,41 | 11.0% | 7.6%–15.8% | 0.383 |
| 2nd trimester | | 19,36,41,53 | 8.5% | 6.6%–10.9% | 0.921 |
| 3rd trimester | | 36,40,41,46,47 | 8.5% | 6.5%–11.0% | 0.235 |
| 1 week PP | | 40 | 5.5% | 1.8%–12.4% | |
| 1 month PP | | 23,36,40,42,47,50 | 9.7% | 7.7%–12.3% | 0.060 |
| 2 months PP | | 31,35,43,49,53 | 10.6% | 8.7%–13.0% | 0.121 |
| 3 months PP | | 41,42,47,50 | 12.9% | 10.6%–15.8% | 0.707 |
| 4 months PP | | 47 | 10.6% | 7.3%–14.7% | |
| 5 months PP | | 47 | 10.6% | 7.3%–14.7% | |
| 6 months PP | | 20 | 9.9% | 6.4%–14.5% | |
| 7 months PP | | 41,47 | 10.6% | 7.1%–15.6% | 0.180 |
| 8 months PP | | 47 | 6.5% | 4.0%–9.9% | |
| 12 months PP | | 41 | 6.5% | 2.7%–12.9% | |
| Period Prevalence | | | | | |
| Conception | 2nd trimester | 30 | 9.3% | 3.1%–20.3% | |
| Conception | Birth | 30,39,41 | 18.4% | 14.3%–23.3% | 0.931 |
| 2nd trimester | 3rd trimester | 36 | 10.2% | 7.0%–14.2% | |
| Birth | 1 month PP | 50 | 13.6% | 7.3%–22.6% | |
| Birth | 2 months PP | 19,45 | 9.6% | 8.0%–11.4% | 0.362 |
| Birth | 3 months PP | 30,50,51 | 19.2% | 10.7%–31.9% | 0.016 |
| Birth | 5 months PP | 34 | 29.1% | 20.6%–38.9% | |
| Birth | 6 months PP | 20 | 13.8% | 9.6%–18.9% | |
| Birth | 8 months PP | 47 | 20.8% | 16.3%–25.9% | |
| Birth | 12 months PP | 30 | 53.7% | 39.6%–67.4% | |
| Incidence | | | | | |
| Conception | 1st trimester | 39,41 | 11.3% | 7.8%–16.3% | 0.757 |
| Conception | 2nd trimester | 30 | 5.8% | 1.2%–16.0% | |
| Conception | Birth | 30,39 | 14.5% | 8.1%–24.4% | 0.192 |
| 1st trimester | 2nd trimester | 41 | 2.7% | 0.6%–7.6% | |
| 2nd trimester | 3rd trimester | 36,41 | 2.2% | 1.1%–4.1% | 0.627 |
| Birth | 1 month PP | 36,42,50 | 7.8% | 3.6%–16.1% | 0.003 |
| Birth | 2 months PP | 19 | 10.3% | 5.1%–18.1% | |
| Birth | 3 months PP | 30,41,42,50,51 | 14.5% | 10.9%–19.2% | 0.142 |
| Birth | 6 months PP | 20 | 11.1% | 7.3%–16.0% | |
| Birth | 12 months PP | 30 | 49.0% | 34.4%–63.7% | |

NOTE: Best estimates reflect the single or combined estimate at each point or period of time remaining after estimates with obvious, identifiable biases have been dropped.

PP, postpartum.

Table 9. Best estimates of prevalence and incidence of major depression

| Start Date | End Date | Studies | Estimate | 95% Confidence Interval | P-Value for Test of Homogeneity |
|---|---|---|---|---|---|
| Point Prevalence | | | | | |
| 1st trimester | | 40,41 | 3.8% | 1.0%–12.6% | 0.092 |
| 2nd trimester | | 19,48,53 | 4.9% | 3.1%–7.4% | 0.752 |
| 3rd trimester | | 40,46,47 | 3.1% | 1.1%–8.1% | 0.038 |
| 1 week PP | | 40 | 0.0% | 0.0%–3.2% | |
| 1 month PP | | 40,42,47,50 | 3.8% | 2.2%–6.4% | 0.204 |
| 2 months PP | | 31,35,43,48,49,53 | 5.7% | 3.8%–8.7% | 0.001 |
| 3 months PP | | 41,42,47,50,52 | 4.7% | 3.6%–6.1% | 0.658 |
| 4 months PP | | 47 | 2.4% | 1.0%–4.9% | |
| 5 months PP | | 47 | 2.1% | 0.8%–4.4% | |
| 6 months PP | | 20,52 | 5.6% | 2.4%–12.1% | 0.028 |
| 7 months PP | | 47 | 3.1% | 1.4%–5.8% | |
| 8 months PP | | 47 | 1.0% | 0.2%–3.0% | |
| 12 months PP | | 52 | 3.9% | 2.3%–6.1% | |
| Period Prevalence | | | | | |
| Conception | Birth | 39 | 12.7% | 7.1%–20.4% | |
| 1st trimester | Birth | 48 | 9.4% | 4.9%–15.8% | |
| Birth | 1 month PP | 50 | 5.7% | 1.9%–12.8% | |
| Birth | 2 months PP | 19 | 8.1% | 3.6%–15.3% | |
| Birth | 3 months PP | 50,51 | 7.1% | 4.1%–11.7% | 0.626 |
| Birth | 5 months PP | 34 | 12.6% | 6.9%–20.6% | |
| Birth | 6 months PP | 20 | 6.5% | 3.7%–10.4% | |
| Birth | 8 months PP | 47 | 6.8% | 4.2%–10.4% | |
| Birth | 12 months PP | 48 | 21.9% | 15.1%–30.0% | |
| Incidence | | | | | |
| Conception | Birth | 30,39,48 | 7.5% | 3.8%–14.2% | 0.116 |
| Birth | 1 month PP | 42,50 | 5.2% | 3.1%–8.9% | 0.819 |
| Birth | 2 months PP | 48 | 8.1% | 4.0%–14.4% | |
| Birth | 3 months PP | 42,50,51 | 6.5% | 4.2%–9.6% | 0.767 |
| Birth | 12 months PP | 30 | 30.6% | 18.3%–45.4% | |

NOTE: Best estimates reflect the single or combined estimate at each point or period of time remaining after estimates with obvious, identifiable biases have been dropped.

PP, postpartum.

We reran the meta-analyses excluding these six studies to produce “best estimates” of the prevalence of perinatal depression. The final best estimates are shown in Table 8 for major and minor depression together and in Table 9 for major depression alone.

   Figure 9. Best estimates of point prevalence of major and minor depression

Point Prevalence. We show the best estimates for the point prevalence of major and minor depression graphically in Figure 9. This figure graphs the mean estimate and corresponding 95% CI for each trimester of pregnancy and month postpartum in the first year following delivery. The number of studies that we used to compute the estimate and the P-value for the Q test of homogeneity among the studies are shown above each estimate. For points in time for which no numbers are shown, we found only a single estimate.

As shown in Figure 9, prevalence in the first trimester is 11.0 percent but drops to 8.5 percent in the second and third trimesters. Following delivery, prevalence of major and minor depression begins to rise and is highest in the third month at 12.9 percent. In the fourth through seventh month postpartum, prevalence declines slightly, staying in the range of 9.9 percent to 10.6 percent, after which it declines to 6.5 percent. However, all of these estimates have broad 95% CIs, suggesting that a considerable amount of uncertainty remains in the precise values of the estimates and that the differences in the estimates over time may be attributed to chance or to uncontrolled factors. We cannot say with certainty from these data that perinatal depression is higher at any particular trimester during pregnancy or month in the first postpartum year.

   Figure 10. Best estimates of point prevalence of major depression

The best estimates for the point prevalence of major depression alone (Figure 10) are more variable and no more precise than those for major and minor depression together. Episodes of major depression comprise less than half of all cases of depression in the perinatal period, except during three seemingly peak times. As shown in Figure 10, the prevalence of major depression is highest in the second trimester (4.9 percent), 2 months postpartum (5.7 percent), and 6 months postpartum (5.6 percent). However, the 95% CIs for these estimates are very wide and overlap those at other times. Thus, we cannot say with certainty that major depression peaks at these points in time. Furthermore, the tests for homogeneity show that considerable heterogeneity persists among studies in the combined estimates.

Period Prevalence. Because far fewer estimates of period prevalence were available, we can say little about the period prevalence of major and minor depression. As shown in Tables 8 and 9, the best estimates suggest that as many as 18.4 percent of pregnant women are depressed during their pregnancy (i.e., from conception to birth), with as many as 12.7 percent having an episode of major depression. Furthermore, as many as 19.2 percent of new mothers may have major or minor depression in the first 3 months following delivery (Table 8), with as many as 7.1 percent having major depression (Table 9).

   Figure 11. Best estimates of period prevalence of depression

However, all estimates have wide 95% CIs. Moreover, as shown in Figure 11, the best estimates of different durations are not consistent over longer periods of time. We would expect the period prevalence for major and minor depression from birth to 2 months postpartum to be higher than the period prevalence from birth to 1 month postpartum and the period prevalence for major depression from birth to 3 months postpartum to be higher than the period prevalence from birth to 2 months postpartum, but we do not see these patterns.
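The expectation described above, that period prevalence within one cohort cannot fall as the window lengthens, can be checked mechanically. The sketch below applies that check to the birth-anchored period prevalence estimates for major and minor depression transcribed from Table 8; the function name `monotonicity_violations` is an illustrative choice. Because the estimates come from different studies, a violation signals between-study differences rather than an impossible result.

```python
# (months postpartum at end of window, best estimate in percent), from Table 8.
period_prevalence = [
    (1, 13.6), (2, 9.6), (3, 19.2), (5, 29.1), (6, 13.8), (8, 20.8), (12, 53.7),
]


def monotonicity_violations(estimates):
    """Return consecutive pairs where the longer window has the lower estimate."""
    ordered = sorted(estimates)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[1] < a[1]]


violations = monotonicity_violations(period_prevalence)
for shorter, longer in violations:
    print(f"birth to {longer[0]} months ({longer[1]}%) is below "
          f"birth to {shorter[0]} months ({shorter[1]}%)")
```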

   Figure 12. Best estimates of incidence of major and minor depression

Incidence. We also found few estimates of the incidence of depression—the percentage of women with depressive episodes that begin during pregnancy or the first year postpartum. The studies we found suggest that as many as 14.5 percent of pregnant women have a new episode of major or minor depression during pregnancy, and 14.5 percent have a new episode during the first 3 months postpartum (Table 8). Considering major depression alone, 7.5 percent of women may have a new episode during pregnancy and 6.5 percent during the first 3 months after delivery (Table 9). Figure 12 shows that, although the incidence estimates for major and minor depression in the first 3 months postpartum follow the expected upward trend, the incidence estimates of major depression alone do not.

Analysis of Confounders

   Figure 13. Cumulative meta-analysis for point prevalence of depression at 2 months postpartum

The results of the cumulative meta-analysis are graphed in Figure 13. They clearly show the impact of the more precise diagnostic criteria in more recent studies. For both major and minor depression together (left panel) and major depression alone (right panel), the cumulative combined 2-month point prevalence estimate drifts downward as more recent studies are added. Thus, the more precise criteria in the more recent studies identify fewer women as depressed. However, we did not find a statistically significant effect of the year of publication in our meta-regression.
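A cumulative meta-analysis of this kind re-pools the estimate each time the next study, in order of publication year, is added. The sketch below illustrates the mechanics only: the study counts are hypothetical, the fixed-effect inverse-variance pooling on the logit scale is an assumption (the report's pooling model is not described in this section), and the function names are invented for the example.

```python
import math


def pooled_proportion(studies):
    """Inverse-variance pooled proportion on the logit scale (fixed effect)."""
    num = den = 0.0
    for events, n in studies:
        logit = math.log(events / (n - events))
        weight = 1.0 / (1.0 / events + 1.0 / (n - events))
        num += weight * logit
        den += weight
    return 1.0 / (1.0 + math.exp(-num / den))


def cumulative_meta_analysis(studies_by_year):
    """studies_by_year: list of (year, depressed, total); pool in year order."""
    ordered = sorted(studies_by_year)
    results = []
    for i in range(len(ordered)):
        included = [(e, n) for _, e, n in ordered[: i + 1]]
        results.append((ordered[i][0], pooled_proportion(included)))
    return results


# Hypothetical studies (year, depressed women, women assessed); not from the report.
hypothetical = [(1984, 18, 120), (1990, 30, 250), (1996, 40, 480), (2003, 60, 900)]
cumulative = cumulative_meta_analysis(hypothetical)
for year, estimate in cumulative:
    print(f"through {year}: {estimate:.3f}")
```

In this made-up example the later studies report lower prevalence, so the cumulative estimate drifts downward, which is the pattern the text describes for Figure 13.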

Table 10. Meta-regression results for log odds of a diagnosis of major and minor depression

| Explanatory Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Constant | -2.291 (0.149); P = 0.000 | -2.159 (0.174); P = 0.000 | -2.564 (0.339); P = 0.000 | -2.189 (0.218); P = 0.000 | -2.273 (0.098); P = 0.000 | -2.276 (0.151); P = 0.000 | -2.125 (0.626); P = 0.001 |
| 1st trimester vs. 4 to 12 months PP | 0.065 (0.322); P = 0.840 | 0.068 (0.316); P = 0.830 | 0.064 (0.337); P = 0.850 | 0.032 (0.310); P = 0.917 | 0.064 (0.238); P = 0.788 | 0.056 (0.324); P = 0.863 | 0.042 (0.336); P = 0.901 |
| 2nd trimester vs. 4 to 12 months PP | 0.080 (0.244); P = 0.744 | 0.029 (0.242); P = 0.903 | 0.182 (0.274); P = 0.508 | -0.004 (0.260); P = 0.989 | 0.012 (0.166); P = 0.943 | 0.091 (0.246); P = 0.711 | 0.065 (0.252); P = 0.798 |
| 3rd trimester vs. 4 to 12 months PP | -0.014 (0.229); P = 0.953 | -0.011 (0.224); P = 0.960 | -0.009 (0.235); P = 0.971 | -0.065 (0.226); P = 0.775 | -0.075 (0.156); P = 0.630 | -0.007 (0.230); P = 0.976 | -0.028 (0.237); P = 0.904 |
| 1 month PP vs. 4 to 12 months PP | -0.115 (0.222); P = 0.606 | -0.033 (0.226); P = 0.883 | -0.109 (0.242); P = 0.652 | -0.029 (0.237); P = 0.902 | 0.147 (0.160); P = 0.357 | -0.054 (0.240); P = 0.822 | -0.120 (0.226); P = 0.594 |
| 2 months PP vs. 4 to 12 months PP | 0.336 (0.211); P = 0.110 | 0.426 (0.216); P = 0.049 | 0.379 (0.223); P = 0.089 | 0.404 (0.226); P = 0.073 | 0.377 (0.167); P = 0.024 | 0.361 (0.214); P = 0.092 | 0.323 (0.219); P = 0.139 |
| 3 months PP vs. 4 to 12 months PP | 0.346 (0.255); P = 0.175 | 0.400 (0.252); P = 0.113 | 0.339 (0.273); P = 0.214 | 0.425 (0.245); P = 0.082 | 0.377 (0.175); P = 0.031 | 0.354 (0.256); P = 0.167 | 0.342 (0.258); P = 0.185 |
| Low risk | -1.436 (0.271); P = 0.000 | -1.494 (0.271); P = 0.000 | -1.195 (0.389); P = 0.002 | -1.529 (0.269); P = 0.000 | -1.230 (0.195); P = 0.000 | -1.474 (0.277); P = 0.000 | -1.457 (0.278); P = 0.000 |
| Low SES | 0.753 (0.204); P = 0.000 | 0.818 (0.204); P = 0.000 | 0.988 (0.331); P = 0.003 | 0.772 (0.192); P = 0.000 | 1.083 (0.149); P = 0.000 | 0.737 (0.206); P = 0.000 | 0.774 (0.219); P = 0.000 |
| Publication year | | -0.018 (0.013); P = 0.170 | | | | | |
| Other western countries vs. US | | | 0.273 (0.305); P = 0.371 | | | | |
| Asian countries vs. US | | | 0.285 (0.349); P = 0.414 | | | | |
| SCID vs. SADS | | | | -0.489 (0.202); P = 0.015 | | | |
| Other interview type vs. SADS | | | | -0.098 (0.174); P = 0.574 | | | |
| DSM III-R vs. RDC | | | | | -0.113 (0.188); P = 0.548 | | |
| DSM IV vs. RDC | | | | | -0.381 (0.190); P = 0.045 | | |
| Other diagnostic criteria vs. RDC | | | | | -1.487 (0.268); P = 0.000 | | |
| Interviewed women with positive screens only vs. all | | | | | | -0.096 (0.140); P = 0.490 | |
| Quality score | | | | | | | -0.014 (0.051); P = 0.783 |

Notes: Estimated coefficients are shown along with their standard errors in parentheses and the P-value for a test of statistically significant differences from zero. P-values shown in bold type are significant at the < 0.05 level.

DSM III-R, Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised; DSM IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; PP, postpartum; RDC, Research Diagnostic Criteria; SADS, Schedule for Affective Disorders and Schizophrenia; SCID, Structured Clinical Interview for DSM-IV; SES, socioeconomic status.

Table 11. Meta-regression results for log odds of a diagnosis of major depression

| Explanatory Variable | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 |
|---|---|---|---|---|---|---|---|
| Constant | -3.206 (0.209); P = 0.000 | -3.299 (0.272); P = 0.000 | -3.447 (0.515); P = 0.000 | -3.010 (0.373); P = 0.000 | -3.419 (0.191); P = 0.000 | -3.454 (0.239); P = 0.000 | -1.677 (0.871); P = 0.054 |
| 1st trimester vs. 4 to 12 months PP | 0.052 (0.516); P = 0.920 | 0.033 (0.523); P = 0.950 | -0.180 (0.558); P = 0.747 | -0.086 (0.558); P = 0.877 | 0.271 (0.447); P = 0.545 | 0.278 (0.512); P = 0.587 | -0.085 (0.514); P = 0.868 |
| 2nd trimester vs. 4 to 12 months PP | 0.375 (0.384); P = 0.329 | 0.424 (0.399); P = 0.288 | 0.471 (0.466); P = 0.312 | 0.245 (0.444); P = 0.582 | 0.517 (0.315); P = 0.101 | 0.634 (0.392); P = 0.105 | 0.339 (0.376); P = 0.368 |
| 3rd trimester vs. 4 to 12 months PP | -0.272 (0.410); P = 0.507 | -0.277 (0.415); P = 0.503 | -0.372 (0.426); P = 0.382 | -0.351 (0.436); P = 0.421 | -0.052 (0.354); P = 0.883 | -0.012 (0.417); P = 0.976 | -0.379 (0.406); P = 0.350 |
| 1 month PP vs. 4 to 12 months PP | 0.021 (0.356); P = 0.954 | -0.033 (0.377); P = 0.929 | -0.185 (0.409); P = 0.651 | -0.168 (0.415); P = 0.686 | -0.073 (0.290); P = 0.800 | 0.060 (0.342); P = 0.861 | 0.035 (0.349); P = 0.920 |
| 2 months PP vs. 4 to 12 months PP | 0.557 (0.291); P = 0.056 | 0.533 (0.300); P = 0.076 | 0.538 (0.304); P = 0.077 | 0.377 (0.356); P = 0.290 | 0.573 (0.257); P = 0.026 | 0.613 (0.279); P = 0.028 | 0.465 (0.289); P = 0.107 |
| 3 months PP vs. 4 to 12 months PP | 0.231 (0.342); P = 0.499 | 0.228 (0.347); P = 0.510 | 0.097 (0.364); P = 0.791 | 0.139 (0.364); P = 0.703 | 0.159 (0.273); P = 0.561 | 0.279 (0.328); P = 0.395 | 0.180 (0.336); P = 0.592 |
| Low risk | -1.501 (0.671); P = 0.025 | -1.436 (0.687); P = 0.036 | -1.045 (0.822); P = 0.204 | -1.537 (0.695); P = 0.027 | -1.340 (0.609); P = 0.028 | -1.384 (0.658); P = 0.036 | -1.915 (0.702); P = 0.006 |
| Low SES | 0.459 (0.323); P = 0.155 | 0.428 (0.333); P = 0.199 | 0.759 (0.497); P = 0.126 | 0.379 (0.345); P = 0.273 | 0.498 (0.262); P = 0.057 | 0.432 (0.308); P = 0.161 | 0.636 (0.331); P = 0.054 |
| Publication year | | 0.010 (0.020); P = 0.604 | | | | | |
| Other western countries vs. US | | | 0.238 (0.470); P = 0.612 | | | | |
| Asian countries vs. US | | | 0.602 (0.533); P = 0.258 | | | | |
| SCID vs. SADS | | | | 0.112 (0.332); P = 0.736 | | | |
| Other interview type vs. SADS | | | | -0.201 (0.306); P = 0.511 | | | |
| DSM III-R vs. RDC | | | | | 0.815 (0.241); P = 0.001 | | |
| DSM IV vs. RDC | | | | | -0.198 (0.356); P = 0.578 | | |
| Other diagnostic criteria vs. RDC | | | | | 0.414 (0.218); P = 0.058 | | |
| Interviewed women with positive screens only vs. all | | | | | | 0.441 (0.222); P = 0.047 | |
| Quality score | | | | | | | -0.132 (0.073); P = 0.072 |

Notes: P-values shown in bold type are significant at the < 0.05 level.

DSM III-R, Diagnostic and Statistical Manual of Mental Disorders, Third Edition, Revised; DSM IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; PP, postpartum; RDC, Research Diagnostic Criteria; SADS, Schedule for Affective Disorders and Schizophrenia; SCID, Structured Clinical Interview for DSM-IV; SES, socioeconomic status.

We provide the estimated coefficients, their standard errors, and P-values from the different meta-regression models in Table 10 for major and minor depression together and in Table 11 for major depression alone. We have bolded coefficients significantly different from zero at the α = 0.05 level. The results for major and minor depression show large, positive coefficients for the 2-month postpartum and 3-month postpartum time periods compared to the 4- to 12-month postpartum period (Table 10). These findings suggest a higher prevalence of depression during these 2 months. However, the coefficients are both significant only in the equation that includes diagnostic criteria (Model 5, Table 10). The 2-month postpartum time period is also large and positive for major depression alone, but significant only in the equations including diagnostic criteria (Model 5, Table 11) and whether only women who screened positive for depression were interviewed (Model 6, Table 11). None of the coefficients for the trimesters of pregnancy is statistically significant, suggesting that the prevalence of depression during pregnancy is similar to that during the last three quarters of the first postpartum year.
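Because the coefficients are on the log-odds scale, it can help to convert them back to prevalence. The sketch below applies the inverse logit to the Model 1 coefficients for major and minor depression (Table 10). It assumes the constant represents the reference category (4 to 12 months postpartum, with the low-risk and low-SES indicators equal to zero); the function and variable names are illustrative.

```python
import math


def inv_logit(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Model 1 coefficients for major and minor depression, from Table 10.
constant = -2.291        # assumed reference: 4 to 12 months PP, not low risk, not low SES
two_months_pp = 0.336    # 2 months PP vs. 4 to 12 months PP
low_risk = -1.436        # low-risk study population indicator

reference = inv_logit(constant)
at_two_months = inv_logit(constant + two_months_pp)
low_risk_reference = inv_logit(constant + low_risk)

print(f"reference period:           {reference:.3f}")
print(f"2 months postpartum:        {at_two_months:.3f}")
print(f"low-risk, reference period: {low_risk_reference:.3f}")
```

Under these assumptions the model implies a prevalence of roughly 9 percent in the reference period and roughly 12 percent at 2 months postpartum, which is broadly in line with the pooled point prevalence estimates in Table 8.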

The low-risk indicator has a statistically significant, negative coefficient for both sets of diagnoses, as expected (Tables 10 and 11). Low SES has a statistically significant, positive coefficient only for major and minor depression together (Table 10). The latter result suggests that the prevalence of major depression is similar among SES groups but that minor depression may be more prevalent among lower SES groups.

The meta-regression results also suggest that prevalence can vary by the clinical instrument and diagnostic criteria used to assess depression. The SCID instrument identified fewer women with major and minor depression than did the SADS interview (Table 10), but the coefficient for this variable is not significant in the equation for major depression alone (Table 11), suggesting that the difference lies in the identification of women with minor depression. DSM-IV and other diagnostic criteria (e.g., Pitt, ICD-9) defined fewer women as depressed than did the RDC in the equation for major and minor depression (Table 10). In the equation for major depression alone (Table 11), DSM-III-R defined significantly more women as suffering from major depression than did the RDC, and other criteria defined marginally more (P = 0.058).

Finally, studies with higher quality rating scores have lower log odds, suggesting lower prevalence of depression, but the coefficient of this variable is only marginally significant (P = 0.072) in the equation for major depression alone (Model 7, Table 11) and is not significant in the equation for major and minor depression together (Model 7, Table 10). No statistically significant results were found for study country or whether the study interviewed only women who screened positive for depression, although the signs of the coefficients for these variables are as predicted.

Comparison with Other Women

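Table 12. Odds ratios for studies with comparison groups of women during nonchildbearing periods

| Diagnosis | Estimate Type | Author, Year [Reference] | Time Period | Odds Ratio | 95% Confidence Interval |
|---|---|---|---|---|---|
| Major and Minor Depression | Point | O'Hara et al., 1990 [53] | 2nd trimester | 1.41 | 0.61–3.26 |
| Major and Minor Depression | Point | O'Hara et al., 1990 [53] | 9 weeks PP | 1.37 | 0.67–2.83 |
| Major and Minor Depression | Point | Cox et al., 1993 [20] | 6 months PP | 1.00 | 0.54–1.84 |
| Major and Minor Depression | Period | Cox et al., 1993 [20] | Birth to 6 months PP | 1.04 | 0.61–1.76 |
| Major and Minor Depression | Incidence | Cox et al., 1993 [20] | Birth to 5 weeks PP | 3.26* | 1.17–9.06 |
| Major and Minor Depression | Incidence | Cox et al., 1993 [20] | Birth to 6 months PP | 1.48 | 0.77–2.82 |
| Major Depression | Point | O'Hara et al., 1990 [53] | 2nd trimester | 1.28 | 0.47–3.51 |
| Major Depression | Point | O'Hara et al., 1990 [53] | 9 weeks PP | 1.33 | 0.45–3.90 |
| Major Depression | Point | Cooper et al., 1988 [52] | 3 months PP | 0.85 | 0.33–2.17 |
| Major Depression | Point | Cox et al., 1993 [20] | 6 months PP | 1.00 | 0.37–2.71 |
| Major Depression | Point | Cooper et al., 1996 [33] | 6 months PP | 1.53 | 0.65–3.58 |
| Major Depression | Point | Cooper et al., 1996 [33] | 12 months PP | 0.50 | 0.17–1.46 |
| Major Depression | Period | Cox et al., 1993 [20] | 6 months PP | 1.16 | 0.54–2.51 |

* Statistically significant at P < 0.05.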

PP, postpartum.

The three prospective studies with comparison groups of women of similar age in nonchildbearing periods had adequate data to compute 13 estimates of the relative prevalence and incidence of depression. The estimated odds ratios and corresponding 95% CIs are shown in Table 12.
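As a rough illustration of how an odds ratio and its 95% CI can be obtained from a 2 × 2 table, the sketch below uses the standard log (Woolf) method. The counts are hypothetical and are not taken from the studies in Table 12, the function name is ours, and the report does not state which interval method was used.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald-type (Woolf) confidence interval.

    a = depressed, childbearing group    b = not depressed, childbearing group
    c = depressed, comparison group      d = not depressed, comparison group
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=180, c=15, d=185)
# An interval that excludes 1 corresponds to significance at P < 0.05
significant = not (lower <= 1.0 <= upper)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```

An interval that spans 1.0, as most intervals in Table 12 do, indicates no statistically significant difference between the groups.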

None of the odds ratios for prevalence, which covered different time periods in the first postpartum year, indicated a statistically significant difference. In addition, the National Comorbidity Survey fielded in 1990-1992 found a 5.9 percent current 30-day prevalence of major depressive episodes among women ages 15 to 54 years using the CIDI instrument and DSM-III-R criteria.68 This finding is approximately equivalent to our best 1-month postpartum period prevalence of 5.7 percent and to the point prevalence at 2 months and 6 months postpartum (5.7 percent and 5.6 percent, respectively) shown in Table 9. Thus, the evidence indicates no difference in the prevalence of depression between pregnant or newly delivered women and women at other times in their childbearing years.

The single estimate of the incidence of major and minor depression (Table 12) shows a significant 3-fold difference in the odds of having a new episode of major or minor depression among women in their first 5 weeks postpartum compared to women who were not pregnant and had not recently given birth.20 However, by 6 months postpartum, the difference in the incidence had narrowed and was no longer significant (Table 12).

Results from Retrospective Studies

The prevalence estimates from the retrospective studies measure something different from those of the prospective studies. In the prospective studies, all study women recruited from prenatal clinics or maternity wards were screened and interviewed for depression. Thus, all (or nearly all) women with depression in the populations so defined were identified. In the retrospective studies, only those women whose depression was detected through the course of medical contacts during the year were identified.

The two retrospective studies that met our inclusion criteria provided estimates of the period prevalence of major depression in the first postpartum year. The first study conducted in 1993 found that 1.2 percent of postpartum women in Olmsted County, Minnesota, had a major depressive episode during their first postpartum year and that 2.5 percent had a major or minor episode.54 These rates are significantly below the 3-month period prevalence of 7.1 percent for major depression alone and the 19.2 percent for major and minor depression reported in Tables 8 and 9.

In 1997-1998, universal screening for depression with the EPDS at the 6-week postpartum visit was implemented in Olmsted County. As a result, the prevalence of a diagnosis of major depression among postpartum women rose to 10.7 percent, suggesting that the screening score posted in medical charts led clinicians to become more aware of their patients' mental state.55

Discussion

We found 30 studies providing estimates of the prevalence of perinatal depression but only 13 providing estimates of the incidence of the disorder. The studies were generally of moderate size—too small for reliable subgroup analyses. Furthermore, the study populations were typically restricted to a local community or geographic region served by one provider or a small number of providers of obstetrical services and were not representative of the racial and ethnic mix of the countries in which the studies were conducted. Other confounders included the risk status of women at study entry, their socioeconomic status, the interview methods, and the diagnostic criteria used to identify cases.

Combining point prevalence estimates of depression assessed at the same point in time and distinguishing whether they included minor depression, we found that the best estimates of the point prevalence of major and minor depression ranged from 8.5 percent to 11.0 percent at different times during pregnancy and from 6.5 percent to 12.9 percent at different times during the first year postpartum. Including only major depression, the best point prevalence estimates ranged from 3.1 percent to 4.9 percent at different times during pregnancy and from 1.0 percent to 5.9 percent at different times during the first postpartum year.

Period prevalence estimates show that as many as 19.2 percent of women have a depressive episode during the first 3 months postpartum, with as many as 7.1 percent having a major depressive episode during this time. Most of these episodes began following delivery. Incidence estimates show that, during the same 3-month period, 14.5 percent of women had a new depressive episode with as many as 6.5 percent having a major depressive episode. However, all of these estimates have wide 95% CIs, indicating that the amount of uncertainty in their precise values is considerable.

Our best estimates of prevalence and incidence were somewhat lower than those found in prior systematic reviews because we excluded studies that assessed depression based on self-report screens alone, which tend to overestimate prevalence. In addition, we separated estimates of major and minor depression from estimates of major depression alone, and estimates of point prevalence from estimates of period prevalence. Finally, we included more recent studies that used more precise criteria to identify major depression.

We found that the available evidence does not support the hypothesis that the prevalence of depression is higher during pregnancy or in the first year postpartum compared to nonchildbearing times. A single study suggested that the incidence of new depressive episodes (major and minor) is greater in the first 5 weeks postpartum than at other times.20

Nevertheless, pregnancy and the early postpartum period provide opportunities to screen for depression through regular prenatal and postpartum physician contacts. Because the poor outcomes of depression during the perinatal period can be far-reaching, affecting not only the woman but also her newborn child and other family members, it behooves us to investigate the efficacy of screening and treatment programs for these women.

Chapter 4. Screening Accuracy

Introduction

Screening for perinatal depression is an important first step in identifying women who are at risk of having perinatal depression. It is only an initial step—after a positive screen, a depressive illness must be confirmed by a follow-up diagnostic examination and determination by a clinician.

To be useful screening tools, instruments must be able to identify accurately and reliably the illness in the population of interest; they also need to rule out, accurately, persons in the population who do not have the illness. Assessment of a screening test's accuracy depends on knowing whether a disease is truly present, i.e., comparison to a reference standard. This section addresses the second Key Question (KQ) from the Safe Motherhood Group (SMG) and the Agency for Healthcare Research and Quality (AHRQ): “What is the accuracy of different screening tools for detecting depression during pregnancy and during the postpartum period?”

The two most commonly used measures of accuracy are sensitivity and specificity. Sensitivity refers to the proportion of patients with a disease who test positive (“true positives”) using a screening tool. A sensitive test is one that is usually positive in the presence of disease. In general, a highly sensitive test should be selected when the consequence of missing a disease would be a clearly bad outcome. Screens with high sensitivity are most useful to clinicians when the result is negative; negative results can help rule out a disease.

Specificity refers to the proportion of patients without a disease who test negative (“true negatives”) using the screening tool. A specific test is one that is usually negative in the absence of disease. A highly specific test, then, should be selected when false-positive results can substantially harm the patient in some way. Screens with high specificity are most useful to the clinician when the result is positive; the positive result can rule in the disease.
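The two definitions above can be written directly in terms of the four cells of a screen-versus-reference-standard table. The following is a minimal sketch; the counts are hypothetical and the function name is ours.

```python
def screening_accuracy(tp, fn, tn, fp):
    """Sensitivity and specificity from counts against a reference standard.

    tp = diseased, screen positive      fn = diseased, screen negative
    tn = not diseased, screen negative  fp = not diseased, screen positive
    """
    sensitivity = tp / (tp + fn)   # proportion of diseased who test positive
    specificity = tn / (tn + fp)   # proportion of nondiseased who test negative
    return sensitivity, specificity

# Hypothetical sample: 100 women, 10 of whom have depression by the reference standard
sens, spec = screening_accuracy(tp=8, fn=2, tn=81, fp=9)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```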

Screening tools have varying sensitivities and specificities as a function of which cutoff point, or threshold, clinicians and others use. The optimal cutoff depends on prevalence of disease (as explored in Chapter 3), benefits and harm of therapy, and risks and costs of administering the screening test.
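The trade-off between cutoff, sensitivity, and specificity, and the role of prevalence, can be sketched as follows. The scores and diagnoses are invented for illustration, the 10 percent prevalence is an assumed value, and the positive predictive value is computed with Bayes' rule; none of these figures come from the included studies.

```python
def sens_spec_at_cutoff(scores, has_disease, cutoff):
    """Treat a score >= cutoff as a positive screen and tally the 2x2 table."""
    pairs = list(zip(scores, has_disease))
    tp = sum(s >= cutoff and d for s, d in pairs)
    fn = sum(s < cutoff and d for s, d in pairs)
    tn = sum(s < cutoff and not d for s, d in pairs)
    fp = sum(s >= cutoff and not d for s, d in pairs)
    return tp / (tp + fn), tn / (tn + fp)

def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive screen (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical screening scores and reference-standard diagnoses
scores = [3, 5, 7, 9, 10, 11, 12, 12, 13, 14, 15, 16, 18, 20]
has_disease = [False, False, False, False, False, True, False,
               False, True, False, True, True, True, True]

for cutoff in (10, 12, 14):
    sens, spec = sens_spec_at_cutoff(scores, has_disease, cutoff)
    ppv = positive_predictive_value(sens, spec, prevalence=0.10)  # assumed prevalence
    print(f"cutoff >= {cutoff}: sens {sens:.2f}, spec {spec:.2f}, PPV {ppv:.2f}")
```

Raising the cutoff lowers sensitivity and raises specificity in this toy sample; at a fixed sensitivity and specificity, a lower prevalence yields a lower positive predictive value.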

Methods

Chapter 2 provides the detailed methods we used to search and review the literature on screening instrument accuracy. In this discussion, we elaborate on some of these methods.

Inclusion and Exclusion Criteria

Studies to be retained had to report directly, or to provide data allowing us to calculate, our primary outcomes of interest: sensitivity and specificity. We required that the screening instrument be compared to a reference standard for a diagnosis of depression. Reference standards could be one of two types. The first includes a clinical assessment by a mental health professional based on criteria from the Diagnostic and Statistical Manual of Mental Disorders (DSM), the Research Diagnostic Criteria (RDC), the Bedford College Checklist,69 or the International Classification of Diseases (ICD). The second involves a research-based diagnosis obtained by structured or semistructured clinical interview, such as the Structured Clinical Interview for DSM (SCID), the Diagnostic Interview Schedule (DIS), the Schedule for Affective Disorders and Schizophrenia (SADS), or Goldberg's Standardized Psychiatric Interview (SPI); each of these confirms a diagnosis based on one of the above systems of criteria.

Depressive illness can be either a major depressive disorder or a minor depression. The latter is understood to be an impairing, episodic depression with clear symptoms exceeding a normal state but without severity reaching the diagnostic criteria for major depressive disorder. For this chapter, we are concerned with the ability of screening tools to detect either major depression or minor depression in a given individual (because an individual can have only one or the other of these diagnoses), so the terminology intentionally differs from that used in Chapter 3.

We excluded studies that included patients with a known current depressive illness (for whom a screen would not provide new information). Furthermore, we excluded studies on women with bipolar disorder or a primary psychotic disorder and studies in which women with diagnosed depression could not be distinguished from women with maternity blues, a transient, subthreshold cluster of depressive symptoms commonly described in up to 50 percent of postpartum women.

Data Analysis

Our main outcomes of interest were sensitivity and specificity of the screening approaches or instruments as described in the selected articles. When calculating outcomes ourselves or doing other analyses, we used Stata, version 8. For each reported instrument and associated cutoff, we calculated sensitivity and specificity from the published data. We constructed 95% confidence intervals (CIs) using exact methods. For instruments with three or more outcome values reported, we created plots of the sensitivity or specificity with associated 95% CIs to provide a graphic description of the degree of consistency of results. In addition, where possible we estimated pooled sensitivity and specificity values using meta-analytic methods for fixed effects. We evaluated heterogeneity using the Q statistic test for homogeneity. In several circumstances, pooled estimates were impossible to calculate because of perfect estimates of sensitivity (i.e., 100 percent) with associated variance estimates equal to 0.
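The analyses described above were run in Stata; the Python sketch below is only an illustration of the same ideas, not the code used for this report. It computes an exact (Clopper-Pearson) binomial interval by bisection and a fixed-effect pooled proportion with the Q statistic. Pooling on the raw proportion scale with inverse-variance weights is an assumption on our part, since the scale is not stated here; all function names are ours.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def _solve_increasing(g, target, iters=100):
    """Bisection on [0, 1] for a function g that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials."""
    lower = 0.0 if x == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    upper = 1.0 if x == n else _solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

def pool_fixed(results):
    """Inverse-variance fixed-effect pooling of proportions with Cochran's Q.

    results: list of (x, n) pairs, one per study (assumed raw-proportion scale).
    """
    props = [x / n for x, n in results]
    variances = [p * (1 - p) / n for p, (_, n) in zip(props, results)]
    if any(v == 0 for v in variances):
        # A 0% or 100% estimate has zero variance, so its weight is undefined
        raise ValueError("cannot pool: a study has an estimate of 0 or 1")
    weights = [1 / v for v in variances]
    pooled = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, props))
    return pooled, q, len(results) - 1   # Q is compared with chi-square on k - 1 df

# 6 of 6 cases detected: the lower limit is about 0.54 and the upper limit is 1.0
print(exact_ci(6, 6))
```

The `ValueError` branch mirrors the situation noted above: a study reporting 100 percent sensitivity contributes a variance of 0, so an inverse-variance weight cannot be formed.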

Evaluation of Quality and Strength of Evidence

We developed a quality rating form for these articles on screening accuracy from criteria identified by the Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests.25 The quality rating forms, provided in Appendix B, rated reporting, external validity, and internal validity. The senior abstractor completed the quality rating form for each article; another project team member reviewed a sample of the completed forms for accuracy and completeness.

We rated retained studies on three separate categories of quality then summed the individual category scores for a total score. The domains and maximum points possible for each domain are as follows:

  • Reporting (domain score of 10): Nine items covering study aims, description of depression assessment, potential confounders described, and instrument procedures described, each scored yes or no (1 or 0), except for an item concerning principal confounders that was scored yes, partially, or no (2, 1, or 0). We considered 0 to 3 as poor, 4 to 7 as fair, and 8 to 10 as good.

  • External validity (domain score of 3): Three items relating to representativeness of populations from which people were recruited and of settings and clinicians that treat such patients, each scored yes or no (1 or 0). We considered 0 or 1 as poor, 2 as fair, and 3 as good.

  • Internal validity (domain score of 8): Six items relating to both bias and confounding in the use of the screen and reference standard, each scored yes or no (1 or 0), except for an item assessing whether all screens were done independently on each person, all tests done on each person but not independently, or different tests done on different persons and not randomly allocated (2, 1, or 0, respectively). We considered 0 to 2 as poor, 3 to 5 as fair, and 6 to 8 as good.

The maximum total quality score was 21. We considered 0 to 7 as poor, 8 to 14 as fair, and 15 to 21 as good.
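The banding rules above can be expressed compactly. The sketch below encodes the thresholds stated in the text and applies them to one study's domain scores; the function and dictionary names are ours.

```python
def rate(score, fair_min, good_min):
    """Map a numeric score to the poor/fair/good bands."""
    if score >= good_min:
        return "good"
    if score >= fair_min:
        return "fair"
    return "poor"

# (lowest "fair" score, lowest "good" score) for each domain and for the total
BANDS = {
    "reporting": (4, 8),           # 0-3 poor, 4-7 fair, 8-10 good
    "external_validity": (2, 3),   # 0-1 poor, 2 fair, 3 good
    "internal_validity": (3, 6),   # 0-2 poor, 3-5 fair, 6-8 good
    "total": (8, 15),              # 0-7 poor, 8-14 fair, 15-21 good
}

def rate_study(reporting, external_validity, internal_validity):
    scores = {
        "reporting": reporting,
        "external_validity": external_validity,
        "internal_validity": internal_validity,
    }
    scores["total"] = sum(scores.values())
    return {name: (value, rate(value, *BANDS[name])) for name, value in scores.items()}

# Domain scores of 5, 3, and 8 (the values listed for Murray and Cox in Table 15)
ratings = rate_study(reporting=5, external_validity=3, internal_validity=8)
print(ratings)
```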

Results

Study Characteristics

Our literature review of screening tools for detecting depression during pregnancy and the postpartum period identified no relevant systematic reviews. We did find 23 studies meeting our inclusion criteria. Of these, 10 were studies involving screening instruments in English;32, 46, 70–77 13 involved non-English screening instruments.31, 35, 42, 43, 50, 51, 78–84

Table 13. Major characteristics of studies of screening for perinatal depression
Author, Year | Place/Sample Size | Depression Type and Prevalence | Screening Method(s) and Cutoffs Used | Timing of Screenings | Criterion Standard
Prenatal Period
Murray and Cox, 1990 [46] | UK 100 | Major depression: 6%; major or minor depression: 14% | EPDS: cutoffs vary from ≥ 11 to ≥ 15 | 28 to 34 weeks GA | SPI to obtain RDC diagnosis
Postpartum Period
Ballard et al., 1994 [70] | UK 200 | Major depression alone: 12% | EPDS: cutoff 13 | 6 months PP | PAS to obtain RDC diagnosis
Beck and Gable, 2001 [71] | US 150 | Major depression alone: 12%; major or minor depression: 19% | PDSS ≥ 81; EPDS ≥ 13; BDI-II ≥ 21 | Between 2nd and 12th week PP | SCID-DSM-IV for DSM-IV diagnosis
Boyce et al., 1993 [72] | Australia 103 | Major depression alone: 9% | EPDS ≥ 13; GHQ: NR; Pitt Scales: NR | ≤ 6 months PP | DIS to obtain DSM-III-R diagnosis
Campbell and Cohn, 1991 [32] | US 1,007 | Major or minor depression: 9% | CES-D | 6 to 8 weeks after delivery | Modified SADS to obtain RDC diagnosis
Cox et al., 1996 [73] | UK 128 | Major depression alone: 6%; major or minor depression: 16% | EPDS ≥ 13 (primarily) but also ≥ 10, ≥ 11, ≥ 12, ≥ 14, ≥ 15 | Not reported in relationship to time of birth | SPI to obtain RDC diagnosis
Harris et al., 1989 [74] | Wales 147 | Major depression alone: 15% | BDI: ≥ 11; EPDS: ≥ 13 | 6 to 8 weeks PP | Clinical examination for DSM-III criterion
Leverton and Elliott, 2000 [75] | England 199 | Major or minor depression: Catego: 5%; Bedford: 8% | EPDS ≥ 13 | 3 months PP | PSE with 2 standards used: Bedford College and Catego diagnosis
Murray and Carothers, 1990 [76] | England 646 | Not provided, but data suggest major depression alone: 6%; major or minor depression: 15% | EPDS ≥ 13 | 6 weeks PP | SPI to obtain RDC diagnosis
Whiffen, 1988 [77] | Canada 120 | Major or minor depression: 18% | BDI ≥ 10 | 6 to 8 weeks PP | SADS to obtain RDC diagnosis

BDI, Beck Depression Inventory; CES-D, Center for Epidemiological Studies-Depression scale; DIS, Diagnostic Interview Schedule; DSM-III-R, Diagnostic and Statistical Manual of Mental Disorders, third edition, revised; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, fourth edition; EPDS, Edinburgh Postnatal Depression Scale; GA, gestational age; GHQ, General Health Questionnaire; NR, not reported; PAS, Psychiatric Assessment Schedule; PDSS, Postpartum Depression Screening Scale; PP, postpartum; PSE, Present State Examination; RDC, Research Diagnostic Criteria; SADS, Schedule for Affective Disorders and Schizophrenia; SCID, Structured Clinical Interview for DSM-III-R; SPI, Standardized Psychiatric Interview.

The major characteristics of these studies are summarized in Table 13 and detailed in Evidence Table 3 (Appendix C). The studies represent a wide variety of countries. Of the 10 studies using an English-language screening instrument, two were conducted in US populations,32, 71 six were performed in the United Kingdom,46, 70, 73–76 and one each was conducted in Canada77 and Australia.72 Of the 13 studies using a non-English screening instrument, four were conducted in Chinese,42, 43, 81, 82 three were in Japanese,50, 51, 80 and one each was in German (Austria),83 Swedish,84 French,78 Spanish (Spain),35 Norwegian,31 and English/Afrikaans.79 We will focus on the screening instruments used in the English language, given their greater relevance to our population of interest.

Unfortunately, the racial and ethnic mix of the study populations for the studies using English language screening instruments was poorly representative of the US population (our target of interest). Of the 10 studies, only the two studies conducted in the United States reported race and ethnicity.32, 71 These populations were overwhelmingly Caucasian; in by far the largest study,32 100 percent of the 1,007 women enrolled were white, and, in the other, 87 percent of the women were white.71

When reported, the mean age of women in these studies ranged from approximately 24 to 31 years. Of these 10 studies, only one was conducted during pregnancy.46 The remaining nine studies were conducted postpartum between 2 weeks and 6 months after delivery, with most occurring between weeks 8 and 12. Individual study sizes ranged from 100 to 1,007, with an aggregate sample size of 2,800.

Studies might use one or more screening tools; the selected articles evaluated four different screening tools.

Screening Instruments Used

Table 14. Key features of screening instruments for perinatal depression
Screening Tool | Method of Administration | Number of Items | Score Ranges | Time to Complete | Time Frame Covered
EPDS | Self-administered | 10-item*; 13-item | 0–30; 0–39 | < 5 minutes | In the past 7 days
BDI | Interviewer- or self-administered | 21-item | 0–63 | 5–10 minutes | Last week including today
BDI-II | Interviewer- or self-administered | 21-item | 0–63 | 5–10 minutes | During the past 2 weeks
PDSS | Self-administered | 35-item | 35–175 | 5–10 minutes | Over the past 2 weeks
CES-D | Self-administered | 20-item | 0–80 | 1–2 minutes | Past 7 days

* The 10-item EPDS is more commonly administered than the 13-item version.

BDI and BDI-II were originally designed to be administered by an interviewer but are most often self-reported.

The key features of the four different types of screening instruments used are summarized in Table 14. Eight studies assessed the Edinburgh Postnatal Depression Scale (EPDS); seven of these used the 10-item version,46, 71–76 and one used a 13-item version.70 The EPDS had been developed specifically for assessing postpartum depression and relies much less than standard depression screens on somatic, or physical, questions. In its most common form, it is a 10-item self-report screening scale for postpartum depression that is specifically aimed at exploring mood symptoms in the postpartum period.85 Questions on the EPDS scale are framed within the “past seven days” and the response format is frequency-based. Each item is scored on a 4-point scale (0 to 3); the minimum and maximum scores are 0 and 30, respectively. It takes less than 5 minutes to administer. The responses to the 10 items are summed to obtain a score.
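The scoring rule for the 10-item EPDS described above amounts to a sum and a threshold comparison. The sketch below assumes that each response has already been converted to its 0 to 3 item score; the responses are hypothetical, the function names are ours, and the default cutoff of 13 is simply the threshold most often examined in the studies in Table 13, not a recommendation.

```python
def epds_total(item_scores):
    """Sum the item scores (each 0 to 3) of the 10-item EPDS."""
    if len(item_scores) != 10:
        raise ValueError("the 10-item EPDS needs exactly 10 item scores")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item score must be 0, 1, 2, or 3")
    return sum(item_scores)   # possible range: 0 to 30

def screens_positive(total, cutoff=13):
    """A screen is positive when the total meets or exceeds the cutoff."""
    return total >= cutoff

# Hypothetical item scores for one respondent
responses = [1, 2, 1, 0, 2, 1, 3, 1, 2, 1]
total = epds_total(responses)
print(total, screens_positive(total))
```

A positive screen is only a first step; as noted in the Introduction, the diagnosis must still be confirmed by a clinician.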

Three studies assessed the Beck Depression Inventory (BDI).71, 74, 77 The BDI is a list of 21 symptoms and attitudes that are each rated on intensity.86 Versions include the BDI, which uses “last week, including today” as the time frame for symptoms;86 the BDI-II, which uses 2 weeks as the time frame for symptoms;87 and the BDI-PC, which also has a 2-week time frame.88 The versions used most often (BDI or BDI-II) are scored by summing the ratings that respondents give to the 21 items. Although originally designed to be administered by trained interviewers, it is most often self-administered and takes 5 to 10 minutes to complete. This instrument has been used to measure severity of depression in depressed samples and also to assess depression in general population samples. Because of its reliance on somatic symptoms, some experts worry that it may produce higher scores and more false-positive results in pregnant women than in other respondents.

One study used the Postpartum Depression Screening Scale (PDSS).71 The PDSS is a 35-item Likert-type self-report instrument created specifically for new mothers that can be administered in 5 to 10 minutes. Written at a third-grade reading level, PDSS items are brief and easy to understand. Mothers respond using a 5-point scale ranging from “strongly disagree” to “strongly agree.” The test yields an overall severity score falling into one of three ranges: normal adjustment, significant symptoms of postpartum depression, and positive screen for major postpartum depression. The PDSS also provides scores for seven symptom areas: Sleeping/Eating Disturbances, Anxiety/Insecurity, Emotional Lability, Mental Confusion, Loss of Self, Guilt/Shame, and Suicidal Thoughts.

Another study used the Center for Epidemiological Studies Depression Scale (CES-D).32 The CES-D was designed to measure current level of depressive symptomatology and especially depressive affect.89 The 20 items were chosen from five previously used depression scales to represent all major components of depressive symptomatology, and it was designed to apply to a general population. Each item is rated on 4-point scales indicating the degree of its occurrence during the past week. The scales range from “rarely or none of the time” to “most all of the time.” The scale can distinguish between clinical groups and general community groups. It takes approximately 5 to 10 minutes to complete; scoring takes about 1 to 2 minutes. Although it is usually scored continuously, various cutoff scores for clinical depression have reasonable associations with a clinical diagnosis. A cutoff score of 16 or higher has been suggested as a positive screen for depression.89

Reference Standards Used

Investigators used a variety of strategies to confirm the diagnosis of depression. Six studies used the RDC65 for depressive illness as the reference standard but employed different instruments to identify patients meeting this standard. Three studies used the Standardized Psychiatric Interview,46, 73, 76 two studies used a version of the Schedule for Affective Disorders and Schizophrenia,32, 77 and one study used the Psychiatric Assessment Schedule (an adaptation of the Present State Examination).70

Other reference standards were also employed. Beck and Gable used the Structured Clinical Interview for DSM-IV to confirm the diagnosis of depressive illness per DSM-IV criteria;71 Boyce et al. used the Diagnostic Interview Schedule, based on DSM-III-R criteria, as the reference standard to confirm depressive illness;72 Harris et al. used a clinical assessment of whether a patient's presentation met DSM-III criteria for depressive illness;74 and Leverton and Elliott used the Present State Examination to identify whether patients met depressive illness criteria by either the Bedford College criteria or the Catego criteria (based on ICD-8 criteria).75

Classifications of Depressive Illness

Investigators classified depressive illness into one of two categories that reflected how perinatal depression is described in the scientific literature: major depression alone or major or minor depression. Patients classified as having major depression alone met criteria for an episode of severe depressive illness according to the standardized criteria. In this report, we refer to major depressive episodes as major depression. For major depressive disorders, clearly effective interventions have been identified in clinical trials. Seven studies provided this classification.46, 70–74, 76

The point prevalence for major depression alone was 6 percent in the single prenatal study,46 somewhat higher than the 3.1 percent “best estimate” that we discussed in Chapter 3. For the postpartum studies, the point prevalence for the six studies reporting on major depression alone ranged from 6 percent to 15.5 percent;70–74, 76 this frequency is somewhat higher than the postpartum results from KQ 1 showing a best estimate prevalence between 1 and 3 months postpartum of 3.8 percent and 4.7 percent, respectively.

The major or minor depression category of depressive illness requires that patients meet diagnostic criteria for either a major depressive episode or a minor depressive episode. Minor depression is an impairing yet less severe constellation of depressive symptoms13 for which controlled trials have not consistently indicated that particular interventions are more effective than placebo.14, 15 In this report, we refer to this grouping as major or minor depression, or by the more general terms of “depression” or “depressive illness.” Seven studies classified depression in this way.32, 46, 71, 73, 75–77

In the single prenatal screening study, the point prevalence of major or minor depression in the third trimester (14 percent) was greater than our best estimate from KQ 1 for this time period (8.5 percent).46 For the postpartum studies, prevalence rates ranged from 5 percent to 19 percent; these figures are somewhat higher than our best estimate range for point prevalence of 9.7 percent to 12.9 percent in the first 3 months postpartum. Because the case definition used (major depression alone versus major or minor depression) substantially affects screening accuracy at a particular cutoff, we sort the results below by these two case definitions.

Quality Rating

Table 15. Quality rating of studies of screening for perinatal depression
Author, Year | Reporting (10) | External Validity (3) | Internal Validity (8) | Total Score (21)
Studies with Screener in English
Prenatal Period
Murray and Cox, 1990 [46] | 5 | 3 | 8 | 16
Postpartum Period
Ballard et al., 1994 [70] | 9 | 1 | 8 | 18
Beck and Gable, 2001 [71] | 6 | 1 | 8 | 15
Boyce et al., 1993 [72] | 8 | 3 | 5 | 16
Campbell and Cohn, 1991 [32] | 8 | 3 | 8 | 19
Cox et al., 1996 [73] | 5 | 0 | 8 | 13
Harris et al., 1989 [74] | 5 | 2 | 6 | 13
Leverton and Elliott, 2000 [75] | 5 | 0 | 7 | 12
Murray and Carothers, 1990 [76] | 4 | 3 | 8 | 15
Whiffen, 1988 [77] | 6 | 0 | 4 | 10
Average | 6.1 | 1.6 | 7.0 | 14.7

Note: Maximum possible score is shown in parentheses.

Table 15 documents the details of our grading of individual studies. For reporting completeness, we rated studies as fair on average; they averaged 6.1 of a possible 10 points. Three studies scored in the good range (8 or above).32, 70, 72 For external validity, studies ranged from poor to good, averaging a poor-to-fair rating of 1.6 overall (of a possible 3 points), suggesting that at best they were a fair representation of each individual study's target population. Given that only two studies were conducted on US populations (Campbell and Cohn,32 with an external validity score of 3, and Beck and Gable,71 with an external validity score of 1), the generalizability of these results to our target population appears limited. For internal validity, studies scored better, ranging from 4 to 8, with an overall average in the good range (7.0). Total scores for the three categories ranged from fair to good, with an overall average of 14.7 of a possible 21 points; of these 10 studies, six scored in the good overall quality range (15 or higher).

Prenatal Screening Results

   Figure 14a. Sensitivity of screening by Edinburgh Postnatal Depression Scale: prenatal period, major depression alone

   Figure 14b. Specificity of Edinburgh Postnatal Depression Scale: prenatal period, major depression alone

One English-language study of 100 subjects used the 10-item EPDS to screen women in their third trimester of pregnancy (Table 13).46 For major depressive disorder alone (n = 6 depressed patients), sensitivity and specificity point estimates at all cutoff points (12, 13, 14, 15) were quite good, although the sensitivity estimates were imprecise (as demonstrated by the wide CIs). At all cutoff points used, sensitivity was 100 percent, and each cutoff had a wide CI from 0.54 to 1.0 (Figure 14a). Specificity varied among the different cutoff points, with point estimates ranging from 0.79 (at a cutoff of 12) up to 0.96 (at a cutoff of 15), and the CIs were narrower, reflecting the larger number of subjects (n = 94) without major depression (Figure 14b). At the traditional postpartum cutoff of ≥ 13, sensitivity was 100 percent and specificity was 87 percent.

   Figure 15a. Sensitivity of screening by Edinburgh Postnatal Depression Scale: prenatal period, major or minor depression

   Figure 15b. Specificity of screening by Edinburgh Postnatal Depression Scale: prenatal period, major or minor depression

As a screening instrument for major or minor depression (n = 14 depressed patients), overall test performance was worse. Sensitivity was much lower, ranging from 0.71 (at a cutoff of 11 or greater) to 0.57 (at a cutoff of 14 or greater), and CIs remained wide (Figure 15a). Specificity remained relatively good, varying from 0.72 (cutoff of 11 or greater) to 0.95 (cutoff of 14 or greater), with reasonable precision (Figure 15b). At the same ≥ 13 cutoff, sensitivity was 64 percent and specificity was 90 percent.

In summary, the one prenatal screening study is of good quality. However, the inclusion of only six women with major depression substantially limits conclusions about the accuracy of prenatal depression screens. Indeed, the sensitivity results at 100 percent for each cutoff dramatically underscore the small number of depressed patients involved.
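The width of the interval follows directly from the small number of cases. When all n cases screen positive, the lower limit of a two-sided 95% exact binomial (Clopper-Pearson) interval has a closed form. The worked step below assumes that this is the exact method behind the tabulated interval; the reported 0.54 to 1.0 is consistent with it.

```latex
p_L = \left(\frac{\alpha}{2}\right)^{1/n} = 0.025^{1/6} \approx 0.54,
\qquad p_U = 1
```

With only six cases, then, even a perfect observed sensitivity cannot exclude a true sensitivity as low as 54 percent.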

Results for major or minor depression from this one study are similarly limited. Only 14 depression cases are involved. Sensitivity estimates were lower than those for major depression alone (0.57 to 0.71 versus 1.0), but again CIs are wide; specificity estimates were similar at matching cutoffs (Table 16).

Postpartum Screening Results

Table 16. Sensitivity and specificity of perinatal depression screens
Author, Year | Cutoff (≥) | Sensitivity: Point Estimate (95% CI) | Specificity: Point Estimate (95% CI)
Prenatal Period
EPDS, Major depression
Murray and Cox, 1990 [46] | 15 | 1.0 (0.54–1.0) | 0.96 (0.89–0.99)
Murray and Cox, 1990 [46] | 14 | 1.0 (0.54–1.0) | 0.94 (0.87–0.98)
Murray and Cox, 1990 [46] | 13 | 1.0 (0.54–1.0) | 0.87 (0.79–0.93)
Murray and Cox, 1990 [46] | 12 | 1.0 (0.54–1.0) | 0.79 (0.69–0.86)
EPDS, Major or minor depression
Murray and Cox, 1990 [46] | 14 | 0.57 (0.29–0.82) | 0.95 (0.89–0.99)
Murray and Cox, 1990 [46] | 13 | 0.64 (0.35–0.87) | 0.90 (0.81–0.95)
Murray and Cox, 1990 [46] | 12 | 0.64 (0.35–0.87) | 0.80 (0.70–0.88)
Murray and Cox, 1990 [46] | 11 | 0.71 (0.42–0.92) | 0.72 (0.61–0.81)
Postpartum Period
EPDS, Major depression
Ballard et al., 1994 [70] (13-item version) | 13 | 0.96 (0.78–1.0) | 0.70 (0.51–0.85)
Harris et al., 1989 [74] | 13 | 0.95 (0.77–1.0) | 0.93 (0.87–0.97)
Harris et al., 1989 [74] | 10 | 1.0 (0.85–1.0) | 0.82 (0.73–0.89)
Beck and Gable, 2001 [71] | 13 | 0.78 (0.52–0.94) | 0.99 (0.96–1.0)
Boyce et al., 1993 [72] | 13 | 1.0 (0.67–1.0) | 0.96 (0.89–0.99)
Boyce et al., 1993 [72] | 10 | 1.0 (0.66–1.0) | 0.89 (0.81–0.95)
Cox et al., 1996 [73] | 13 | 0.75 (0.35–0.97) | 0.84 (0.76–0.90)
Cox et al., 1996 [73] | 12 | 0.88 (0.47–1.0) | 0.76 (0.67–0.83)
Cox et al., 1996 [73] | 10 | 0.88 (0.47–1.0) | 0.71 (0.62–0.79)
EPDS, Major or minor depression
Cox et al., 1996 [73] | 13 | 0.62 (0.38–0.82) | 0.89 (0.81–0.94)
Cox et al., 1996 [73] | 12 | 0.76 (0.53–0.92) | 0.81 (0.73–0.88)
Cox et al., 1996 [73] | 10 | 0.81 (0.58–0.95) | 0.77 (0.67–0.84)
Beck and Gable, 2001 [71] | 10 | 0.59 (0.43–0.73) | 0.86 (0.78–0.92)
Leverton and Elliott, 2000 [75] (Bedford Criteria) | 13 | 0.44 (0.38–0.82) | 0.92 (0.87–0.95)
Leverton and Elliott, 2000 [75] | 10 | 0.69 (0.41–0.89) | 0.85 (0.79–0.90)
BDI, Major depression
Beck and Gable, 2001 [71] (BDI-II) | 21 | 0.56 (0.31–0.78) | 1.0 (0.97–1.0)
Harris et al., 1989 [74] (BDI) | 21 | 0.32 (0.13–0.57) | 0.99 (0.95–1.0)
Harris et al., 1989 [74] (BDI) | 13 | 0.63 (0.38–0.84) | 0.92 (0.85–0.96)
Harris et al., 1989 [74] (BDI) | 11 | 0.68 (0.43–0.87) | 0.88 (0.82–0.94)
BDI, Major or minor depression
Beck and Gable, 2001 [71] (BDI-II) | 15 | 0.57 (0.41–0.71) | 0.97 (0.92–1.0)
Whiffen, 1988 [77] (BDI) | 10 | 0.48 (0.26–0.70) | 0.86 (0.78–0.92)
PDSS, Major depression
Beck and Gable, 2001 [71] | 81 | 0.94 (0.73–1.0) | 0.98 (0.94–1.0)
PDSS, Major or minor depression
Beck and Gable, 2001 [71] | 61 | 0.91 (0.79–0.98) | 0.72 (0.62–0.80)
CES-D, Major or minor depression
Campbell and Cohn, 1991 [32] | 16 | 0.60 (0.50–0.70) | 0.92 (0.90–0.93)
Campbell and Cohn, 1991 [32] | 21 | 0.43 (0.33–0.54) | 0.97 (0.95–0.98)

BDI, Beck Depression Inventory; CES-D, Center for Epidemiological Studies - Depression Scale; CI, confidence interval; EPDS, Edinburgh Postnatal Depression Scale; PDSS, Postpartum Depression Screening Scale.

Nine studies provided sensitivity and specificity estimates in the postpartum period (Table 16). These studies used one of four screening instruments—EPDS, BDI, PDSS, and CES-D—at a variety of cutoff points. We review the results separately for each scale for major depression alone and for major or minor depression.

Edinburgh Postnatal Depression Scale. The EPDS was the most common tool reported, involving 1,573 patients from eight studies.46, 70–76 Of these patients, 80 had major depression alone and 83 had major or minor depression. Murray and Carothers reported test characteristics for both major depression alone and major and minor depression together (not in Table 16), but they did not give information allowing us to calculate CIs for the results;76 we address their work separately.

   Figure 16a. Sensitivity of screening by Edinburgh Postnatal Depression Scale: postpartum period, major depression alone

   Figure 16b. Specificity of screening by Edinburgh Postnatal Depression Scale: postpartum period, major depression alone

Major Depression Alone. Figures 16a and 16b present the sensitivity and specificity estimates for the five studies reporting on major depression alone (a total of 927 patients).70–74 We show data according to the version and cutoff point used. The sensitivity graphs show the number of depressed patients; the specificity graphs use the number of nondepressed patients.

For the Ballard et al. study employing the 13-item version (n = 23 depressed women),70 we used only the cutoff of ≥ 13. Mean sensitivity was 0.96 and mean specificity was 0.70, with relatively wide CIs for both point estimates.

Results for the remaining four major depression alone studies are listed below the solid line in Figures 16a and 16b.71–74 All used a cutoff point of ≥ 13. Sensitivities in these studies ranged from 0.75 to 1.0, with very wide CIs.

Specificities ranged from 0.84 to 0.99 and appeared to be more precise than sensitivities, as indicated by the much narrower CIs. Of note, results at this threshold from these individual studies of the 10-item screen indicated that sensitivities were similar to the value reported in the one 13-item screen study, but specificities were higher with the 10-item version.

We attempted to conduct a meta-analysis of the sensitivity results from the four studies using the cutoff point of 13 or greater. The Boyce et al. study72 reported a sensitivity point estimate of 1.0, so we were unable to generate a meaningful standard error; consequently, we could not include this result in the sensitivity meta-analysis. Leaving this study out, our meta-analysis produced a sensitivity point estimate of 0.91 (95% CI, 0.84 to 0.99); the test for heterogeneity was not significant (P = 0.141). We were able to include all four studies in our meta-analysis of specificity, but heterogeneity was significant (P < 0.001), precluding a pooled specificity estimate.
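To show why a sensitivity of exactly 1.0 cannot enter this kind of pooling, the following is a minimal sketch of one common approach: fixed-effect, inverse-variance pooling of proportions with a normal-approximation variance. This is an assumption on our part; the section does not describe the authors' exact procedure. The per-study counts (21 of 22, 14 of 18, 6 of 8, and 9 of 9) are also assumptions: they are not given in this section and were chosen only to be consistent with the reported point estimates.

```python
import math


def pool_fixed_effect(studies):
    """Pool proportions with inverse-variance (fixed-effect) weights.

    studies: list of (screen_positive_cases, depressed_patients) pairs.
    Raises ValueError if any proportion is exactly 0 or 1, because its
    estimated variance is zero and its weight would be undefined.
    """
    proportions, weights = [], []
    for positives, cases in studies:
        p = positives / cases
        variance = p * (1 - p) / cases  # normal-approximation variance
        if variance == 0:
            raise ValueError("a proportion of 0 or 1 has zero estimated variance")
        proportions.append(p)
        weights.append(1 / variance)

    total_weight = sum(weights)
    pooled = sum(w * p for w, p in zip(weights, proportions)) / total_weight
    se = math.sqrt(1 / total_weight)
    # Cochran's Q statistic for heterogeneity.
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, proportions))
    df = len(studies) - 1
    # The chi-square upper tail has the simple closed form exp(-q/2) only for df == 2.
    p_het = math.exp(-q / 2) if df == 2 else None
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se,
        "ci_high": pooled + 1.96 * se,
        "q": q,
        "p_het": p_het,
    }


# ASSUMED counts (hypothetical, not taken from the report).
ASSUMED_COUNTS = [(21, 22), (14, 18), (6, 8)]
print(pool_fixed_effect(ASSUMED_COUNTS))

try:
    pool_fixed_effect(ASSUMED_COUNTS + [(9, 9)])  # a study with sensitivity 1.0
except ValueError as err:
    print("cannot pool:", err)
```

With these assumed counts the sketch gives a pooled sensitivity near 0.91 and a heterogeneity P value near 0.14, in line with the figures reported above. That agreement does not confirm the authors' method or the underlying counts.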

One study assessed a cutoff point of ≥ 12.73 It reported a sensitivity of 0.88 (with a wide CI) and a specificity of 0.76 (with a narrow CI).

Three studies reported a cutoff of ≥ 10, all producing estimates with imprecise sensitivities yet relatively precise specificities.72–74 Point estimates for sensitivity ranged from 0.88 (Cox et al.73) to 1.0 (Boyce et al.72 and Harris et al.74). Because two studies reported a perfect sensitivity of 1.0, we could not determine a pooled sensitivity estimate. Specificity ranged from 0.71 to 0.89, but heterogeneity was significant (P = 0.002), precluding a pooled estimate.

   Figure 17a. Sensitivity of screening by Edinburgh Postnatal Depression Scale: postpartum period, major or minor depression

   Figure 17b. Specificity of screening by Edinburgh Postnatal Depression Scale: postpartum period, major or minor depression

Major or Minor Depression. For major or minor depression (1,343 patients), four studies71, 73, 75, 76 reported test characteristics for the 10-item EPDS (Table 16). All but Murray and Carothers76 allowed a calculation of confidence intervals and are presented in Figures 17a and 17b.

Two studies reported a cutoff score of ≥ 13.73, 75 Sensitivities were low (0.62 in Cox et al.73 and 0.44 in Leverton and Elliott75) and imprecise (wide CIs). Specificities were high (0.89 and 0.92, respectively) and quite precise. A meta-analysis at this cutoff produced a pooled sensitivity estimate of 0.54 (95% CI, 0.39 to 0.70) without significant heterogeneity (P = 0.266) and a pooled specificity estimate of 0.91 (95% CI, 0.88 to 0.94) without significant heterogeneity (P = 0.410).

One study reported a cutoff score of 12 or greater.73 Relative to a threshold of 13 or more, this score appeared to improve sensitivity and decrease specificity, with the precision remaining unchanged.

Three studies reported results with a cutoff score of ≥ 10.71, 73, 75 Reported sensitivities ranged from 0.59 to 0.81, and specificities ranged from 0.77 to 0.88. Again, sensitivity estimates were quite imprecise, whereas specificity estimates were quite precise. A meta-analysis of these results produced a pooled sensitivity estimate of 0.68 (95% CI, 0.58 to 0.78) without significant heterogeneity (P = 0.140). Specificities could not be pooled because of significant heterogeneity (P = 0.068).

Murray and Carothers reported sensitivities and specificities at various cutoff points as estimated by logistic regression analyses on results from 646 subjects;76 they addressed both major depression alone and major or minor depression. Because we could not calculate CIs from their reported results, we do not show them in Table 16; their reported test characteristics are listed in Evidence Table 4 (Appendix C). For major depression alone, their sensitivity and specificity results mirrored the other studies' point estimates in Figures 16a and 16b. For major or minor depression, although specificities were similar to those of other studies, sensitivities were slightly higher than those reported in Figures 17a and 17b. For example, at a cutoff of ≥ 10, sensitivity was reported as 0.89; at a cutoff of ≥ 13, sensitivity was 0.68.

   Figure 18a. Sensitivity of screening by BDI: postpartum period, major depression

   Figure 18b. Specificity of screening by BDI: postpartum period, major depression alone

Beck Depression Inventory. Three studies involving the BDI (417 patients in all) reported lower sensitivity and slightly higher specificity than did the EPDS studies.71, 74, 77 Two studies71, 74 reported results for major depression only using the BDI-II and BDI, respectively (Figures 18a and 18b). Using a cutoff of ≥ 21, Beck and Gable reported a sensitivity of 0.56 and a specificity of 1.0;71 Harris et al. reported a sensitivity of 0.32 and a specificity of 0.99 at this cutoff.74 Harris et al. also reported test characteristics for cutoff points of ≥ 13 and ≥ 11;74 these cutoff scores each produced higher sensitivity and slightly lower specificity than did a cutoff of 21. Relative to the EPDS for major depression alone, the Beck and Gable results showed sensitivities that remained substantially lower and specificities that appeared to be slightly higher, although wide CIs preclude a confident comparison.

For major or minor depression, two articles reported BDI test characteristics using different thresholds.71, 77 Beck and Gable,71 using a cutoff of ≥ 15 on the BDI-II, reported a sensitivity of 0.57 and a specificity of 0.97. The BDI study by Whiffen employed a cutoff of ≥ 10 and reported a sensitivity of 0.48 and a specificity of 0.86.77

Postpartum Depression Screening Scale. One study of the PDSS (150 patients) reported high sensitivity (0.94) and high specificity (0.98) for major depressive disorder alone at a cutoff of ≥ 80.71 The investigators also reported lower sensitivity (0.91) and lower specificity (0.72) for major or minor depression using a cutoff of ≥ 60.

Center for Epidemiological Studies - Depression Scale. One study of the CES-D (1,007 patients) used two cutoff points (≥ 16 and ≥ 21).32 It reported low sensitivity (0.60 and 0.43, respectively) and high specificity (0.92 and 0.97, respectively) for major or minor depression.

Summary of Results of Screening Instrument Review

The available evidence for both major depression alone and major or minor depression together is characterized by studies including markedly low numbers of depressed patients, a narrow racial and ethnic mix, varying cutoff points, and varying reference standards. These factors combine to preclude definitive conclusions or recommendations about screening instruments or thresholds.

Screening Instruments. For major depression alone, all screening instruments investigated (EPDS, BDI, PDSS) provided similarly high degrees of specificity at various cutoffs. Because of wide CIs, however, conclusions about sensitivity are more restricted. Heterogeneity among the studies limited our ability to synthesize these results quantitatively. In most instances, we could not obtain a more precise estimate. For an EPDS cutoff of ≥ 13 for patients with major depression alone, sensitivity estimates were combined in a meta-analysis to produce a point estimate of 0.91; however, heterogeneity precluded a meta-analysis for a specificity point estimate.

The EPDS and PDSS (with point estimates ranging from 0.75 to 1.0 at various cutoffs) appeared to be more sensitive than the BDI instruments (0.32 to 0.68 at various cutoffs), but the wide CIs overlapped nearly completely. A recent meta-analysis of prevalence estimates found that, compared with structured clinical interviews, the EPDS produced statistically equivalent prevalence estimates whereas the BDI produced significantly higher estimates.27 Together, these findings suggest that a positive screen with EPDS may be more clinically useful than screens with the other instruments.

For major or minor depression, sensitivity point estimates for each tool at each cutoff were consistently lower than those for major depression alone, although specificities were quite similar to those for major depression alone. We were able to synthesize EPDS results quantitatively at a cutoff of ≥ 13, producing a sensitivity point estimate of 0.54 with a wide CI (95% CI, 0.39 to 0.70) and a specificity estimate of 0.91 with a narrow CI (95% CI, 0.88 to 0.94). At an EPDS cutoff of ≥ 10, we were able to produce a pooled sensitivity estimate of 0.68 (95% CI, 0.58 to 0.78), but heterogeneity precluded a pooled analysis for specificity.

In short, estimates of specificity are relatively precise, but estimates of sensitivity are imprecise. This pattern of results prevents any substantive conclusions about the accuracy of these tools for identifying true positives. This imprecision can be attributed to the consistently low number of patients with a depression diagnosis, a fact reflected by a number of studies reporting 100 percent sensitivity, and it is a major limitation of the currently available data. Because of this imprecision, we cannot meaningfully compare sensitivities of screening instruments.

Cutoff Points. For an individual screening instrument, we cannot draw any substantive conclusions about the use of a particular cutoff point. As noted above, the wide CIs for sensitivity prevent one from confidently distinguishing one sensitivity result from another. Moreover, two further guides that bear directly on the choice of a threshold need to be considered before a particular threshold could be suggested.

First, the relative cost, or value, of errors in screening tests (false-negative compared to false-positive results) needs to be clarified. False-negative results (missing true depression) can lead to bad outcomes such as continued morbidity, costs of unnecessary tests, and similar effects. By contrast, false-positive results (identifying depression when it is not there) can lead to unnecessary time, effort, and financial cost for diagnostic workup as well as potential side effects of a treatment that is not indicated. If false-negative and false-positive results are equally bad, then a screening test should try to minimize both equally to identify the most effective cutoff.

If missing depression in a patient is worse than falsely identifying depression in a patient (i.e., a false-negative classification is worse than a false-positive one), then one would want a test that maximizes sensitivity and has the highest negative predictive value. Said another way, the preferred test would be one in which the greatest proportion of those screening negative do not have the disease. By contrast, if falsely identifying a patient as having depression (a false positive) is worse, then one would want a test that maximizes specificity and has the highest positive predictive value. Clinical intuition suggests that missing a diagnosis is worse than making an incorrect diagnosis. We could find no literature addressing the trade-off of false-positive versus false-negative diagnoses in this clinical situation.

A second important guide in choosing a cutoff is the prevalence of a disease in a particular population. Regardless of test characteristics, in populations in which the prevalence of depression is relatively high, the number of false-negative results is higher; in populations in which the prevalence is relatively low, the number of false-positive results is higher. Therefore, the choice of a test and cutoff may differ depending on whether the population has a higher prevalence of depression (e.g., a high-risk postpartum clinic) or a lower prevalence (e.g., a healthy baby clinic). As a result, three variables must be clarified before clinicians or researchers can choose a specific test and related cutoff: sensitivity and specificity, the relative cost of screening errors (false positives versus false negatives), and the prevalence of the disease.
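The dependence of error counts on prevalence can be sketched in a few lines. In the example below the sensitivity and specificity of 0.90 and the prevalence values are hypothetical inputs chosen only for illustration; they are not estimates from this report.

```python
def expected_errors(sensitivity, specificity, prevalence, n_screened=1000):
    """Expected false negatives and false positives among n_screened women."""
    cases = prevalence * n_screened
    non_cases = n_screened - cases
    false_negatives = (1 - sensitivity) * cases
    false_positives = (1 - specificity) * non_cases
    return false_negatives, false_positives


# Hypothetical test characteristics held fixed while prevalence varies.
for prevalence in (0.05, 0.10, 0.20, 0.30):
    fn, fp = expected_errors(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.2f}: FN per 1,000 = {fn:.0f}, FP per 1,000 = {fp:.0f}")
```

With the test held fixed, false negatives rise and false positives fall as prevalence increases, which is why the same cutoff can be a reasonable choice in one clinic population and a poor one in another.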

The above limitations notwithstanding, the tools we have reviewed above appear to be able to identify depressive illness in pregnant and postpartum women with a degree of accuracy similar to that for depression screen results in other nonpsychiatric settings. Screening results in primary care for a combined major or minor depression group are not available, but the results in primary care settings for major depression alone are similar to those reported for perinatal depression. For example, in a synthesis of depression case-identifying instruments in primary care settings using selection criteria similar to ours, Williams et al. reported a median sensitivity for major depression of 85 percent (range, 50 percent to 97 percent), and a median specificity of 74 percent (range, 51 percent to 98 percent).90 This review included both women and men, which might explain the lower measures of accuracy; female gender appears to improve the accuracy of depression screens in primary care settings.91

Interpretation of Results

The small number of relevant articles limits our interpretation of the results. Given that most of the articles address the EPDS, we will use this instrument as an example. Because of the reports of 100 percent sensitivity in the prenatal tests of the EPDS (underscoring the very small number of prenatal depressed patients involved), we consider application of our results only to the postpartum population, and we draw on the prevalence data reported in Chapter 3 for KQ 1. We caution that, given the low numbers of depressed patients in the postpartum studies, the sensitivity estimates are likely to be inaccurate. Also, the majority of postpartum screens were performed 6 to 8 weeks after delivery, so the examples below apply only to that time period.

For major depression alone, the estimated point prevalence for the 6- to 8-week postpartum period is 6.8 percent, although the confidence interval around this estimate is wide. EPDS screens using the most commonly cited cutoff of 13 have a sensitivity of 91 percent and a specificity of approximately 95 percent. To illustrate this scenario, consider using this tool and cutoff for 1,000 patients. This EPDS screen would produce 62 true-positive cases and 6 false-negative cases, and 47 false-positive cases and 885 true-negative cases. The positive predictive value is 57 percent, meaning that the probability that a woman with a positive screen truly has major depressive disorder is slightly more than half. The negative predictive value (i.e., the probability that a woman with a negative screen would not have depressive illness) is 99 percent.

For major or minor depression, the estimated point prevalence from KQ 1 is 11.3 percent. EPDS screens tested for this population most commonly reported a cutoff of 10. This threshold at 6 to 8 weeks postpartum has a sensitivity of 68 percent and a specificity of approximately 80 percent. For 1,000 patients, the screen would produce 77 true-positive cases and 36 false-negative results, and 177 false-positive cases and 710 true-negative cases. The positive predictive value is 30 percent, and the negative predictive value is 95 percent.
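The arithmetic behind these two scenarios can be reproduced with a short sketch. The function name and the convention of rounding to whole patients are our assumptions; the prevalence, sensitivity, and specificity values are those stated in the two paragraphs above.

```python
def screen_cohort(n, prevalence, sensitivity, specificity):
    """Build a 2x2 table for n screened women and derive predictive values.

    Counts are rounded to whole patients (an assumed convention).
    """
    cases = round(n * prevalence)
    non_cases = n - cases
    tp = round(cases * sensitivity)           # depressed, screen positive
    fn = cases - tp                           # depressed, screen negative
    fp = round(non_cases * (1 - specificity)) # not depressed, screen positive
    tn = non_cases - fp                       # not depressed, screen negative
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Major depression alone: prevalence 6.8%, EPDS cutoff 13.
print(screen_cohort(1000, 0.068, 0.91, 0.95))
# Major or minor depression: prevalence 11.3%, EPDS cutoff 10.
print(screen_cohort(1000, 0.113, 0.68, 0.80))
```

Under these inputs the sketch returns the same counts as the text (62, 6, 47, and 885 for major depression alone; 77, 36, 177, and 710 for major or minor depression), with predictive values that round to the percentages quoted.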

Discussion

Conclusions

Very little is known about the accuracy of depression screening tests in pregnant and postpartum women. The available evidence is limited in several ways. It has a very narrow racial and ethnic mix. Study samples have prevalence rates of depression that are, by design, somewhat higher than our best estimate prevalence rates from KQ 1 (which would produce a higher positive predictive value). Most important, the available data involve small numbers of depressed patients. We could not address the limits of the small numbers of depressed patients using meta-analytic procedures. Case definitions, reference standards, screening tools, and screening thresholds all varied across the studies, and the heterogeneity of study methods constrained our ability to synthesize the data and obtain pooled estimates.

Despite these limitations, the available evidence does indicate that depression screens are feasible to administer in perinatal settings. It also suggests that the estimates of sensitivity and specificity, although limited, appear equivalent to those that have been reported in primary care settings. In particular, specificity is relatively good, suggesting a relatively good positive predictive value.

Future Research

Further studies in this area need to standardize the parameters examined in this chapter (instruments and, in particular, cutoff points), involve a more representative mix of racial and ethnic groups, test the screening tools in populations with a frequency of depression more reflective of the actual prevalence, and include a larger number of depressed patients to clarify the accuracy of depression screening tools and make them more relevant to the population of interest. Given the currently available evidence, we offer six future research recommendations.

First, subsequent studies on the test characteristics of screeners must be designed with sample size estimates that take into account prevalence and that project a reasonable width of sensitivity confidence intervals for the particular illness. For example, studies would need to screen 1,000 women to identify 34 with major depression or 110 with major or minor depression. This sample size might be enough for precise estimates for women with major or minor depression as a group, but it may not be enough for precise estimates for major depression alone.
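As a rough illustration of this planning step, the sketch below relates the number of depressed women in a study to the expected half-width of a 95 percent CI for sensitivity. It assumes a normal approximation to the binomial, which is crude for small samples and for proportions near 1.0. The anticipated sensitivities (0.91 and 0.68) are borrowed from the pooled EPDS estimates earlier in this chapter purely as planning inputs, and the target half-width of 0.05 is hypothetical.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def ci_half_width(anticipated_sensitivity, n_cases):
    """Approximate 95% CI half-width for sensitivity with n_cases depressed women."""
    p = anticipated_sensitivity
    return Z_95 * math.sqrt(p * (1 - p) / n_cases)


def cases_needed(anticipated_sensitivity, target_half_width):
    """Depressed women needed so the approximate half-width meets the target."""
    p = anticipated_sensitivity
    return math.ceil(Z_95 ** 2 * p * (1 - p) / target_half_width ** 2)


def women_to_screen(n_cases, cases_per_woman_screened):
    """Women to screen to expect n_cases, given the expected yield per woman."""
    return math.ceil(n_cases / cases_per_woman_screened)


# Yields of 34 and 110 cases per 1,000 women screened, as cited in the text.
print(round(ci_half_width(0.91, 34), 3))   # major depression alone
print(round(ci_half_width(0.68, 110), 3))  # major or minor depression

# Hypothetical target: half-width of 0.05 around an anticipated sensitivity of 0.91.
needed = cases_needed(0.91, 0.05)
print(needed, women_to_screen(needed, 34 / 1000))
```

A study designer would substitute an exact interval method and locally appropriate prevalence figures; the point of the sketch is only that the number of depressed women, not the total number screened, drives the width of the sensitivity interval.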

Second, the sample should represent the target population. Specifically, subsequent studies need to provide a more representative racial and ethnic mix. In addition, studies should incorporate a range of other demographic variables that could influence screening performance, such as socioeconomic status measures, and assess the screening tools in these subpopulations.

Third, as in the Beck and Gable study,71 subsequent studies should assess and directly compare multiple screening instruments. This design provides a head-to-head comparison that allows researchers and clinicians to understand which screening instruments are more accurate than others in different settings.

Fourth, studies evaluating both the risks and benefits of screening, specifically assessing the relative cost of false-negative and false-positive results, will provide insights on how to consider target sensitivity and specificity when attempting to maximize cost-effectiveness.

Fifth, subsequent depression screening studies should carefully consider whether to target major depression alone, for which beneficial treatments clearly exist, or the traditional combined category of major or minor depression, a heterogeneous group for which treatment benefit is unclear. Our results suggest that the sensitivity of screening instruments is generally greater for the major depression alone group.

Sixth, the bulk of the screening studies we reviewed were conducted in the first 3 months postpartum. Subsequent studies should examine screening not just in the first 3 months postpartum but also at 6 weeks, 6 months, and 12 months postpartum. If peak prevalence and incidence occur within the first 6 weeks, the obstetrics clinic is a prime place to target resources for such a program. If, however, peaks occur after this time, most postpartum women will have completed follow-up care with an obstetrician, so programs in an obstetrics clinic may be less helpful. In this case, it is possible that programs targeting new mothers in family medicine, internal medicine, or pediatric clinics might be more effective.

Use in Clinical Settings

In the interim, what is a clinician to do? The best available evidence supports the conclusion that screening instruments with reasonable test characteristics appear feasible to use in a perinatal population with a depression prevalence between 5 percent and 10 percent. Given that use of the tools likely carries low risk, and that they all have reasonable specificity (and, thus, a reasonable positive predictive value), the selection of a tool would be guided by an interest in maximizing sensitivity. For the category of major or minor depression, sensitivity estimates were quite similar for all instruments. However, for major depression alone, sensitivity estimates for the EPDS and PDSS appear to be higher than those for the BDI. The standard cutoffs of ≥ 13 for the EPDS and ≥ 81 for the PDSS appear to be reasonable thresholds.

Having an instrument that can accurately identify women at risk of having perinatal depression is an important and necessary link in improving the clinical outcomes of women with perinatal depression: women who may benefit from a depression intervention first need to be recognized. Nonetheless, it remains merely an initial step. A more important question is whether screening pregnant or postpartum women to identify those at risk of having depression, and subsequently providing an intervention, ultimately leads to improved outcome. We address this key question in our next chapter.

Chapter 5. Impact of Depression Screening and Interventions on Patient Outcomes

Introduction

   Figure 19. Causal Pathway for Key Question 3 on Screening and Treatment Outcomes

In agreement with the Safe Motherhood Group and the Agency for Healthcare Research and Quality (AHRQ), we directed part of our work to Key Question (KQ) 3: Does prenatal or early postnatal screening for depressive symptoms with subsequent intervention lead to improved outcomes? That is, does screening for depression during pregnancy or the postpartum period and implementing an intervention improve outcomes related to maternal depressive symptoms? To address KQ 3, we developed an analytic framework (Figure 19), which begins (left side) by identifying a cohort of women with unknown mood state, continues through implementation of a formal screening of the cohort either during pregnancy or in the postpartum period, and (right side) ends with studies of an intervention to assess how it may affect outcome measures of postpartum depression.

As described in this chapter, screening can be done in various settings and with various instruments (as discussed for KQ 2). Interventions are both nonpharmacologic (e.g., counseling and behavioral intervention programs aimed at mothers or, in some cases, both parents or mother-infant dyads) and pharmacologic (e.g., antidepressants). These interventions can be implemented in various outpatient settings (e.g., clinics, homes) and delivered by various types of health professionals, and they may be group efforts or one-on-one activities.

Methods

Chapter 2 documents the methods we used to conduct literature searches and title and abstract or full article reviews. We did not identify any studies that specifically examined the cascade of screening-treatment-outcomes. Thus, we do not have any direct evidence pertaining to KQ 3.

All the trials included for KQ 3 are treatment studies that had a screening component (either a formal depression screening instrument or other type of screen that identified women at risk of a depressive illness). We included studies conducted worldwide in developed countries where the population could be generalized to pregnant and postpartum women in the United States, regardless of the language spoken. We also included both randomized controlled trials (RCTs) and prospective cohort studies. Additionally, for inclusion in KQ 3, patients were identified by a screen done either during pregnancy or during 12 months postpartum and considered to be “at risk” of having a depressive illness.

We excluded all case-control studies and studies in which patients had had a documented current depressive episode before the initial screening. Furthermore, we excluded two studies that had originally been reviewed for the feasibility study, one because it did not use any screening92 and one because it had no depression severity outcome.93

We attempted to synthesize the results of the included studies quantitatively, but the study methods (screening instruments, type of intervention, intensity of intervention, outcomes measured) were so heterogeneous that a combined result would have little meaning. We also attempted to compare effect sizes in an exploratory analysis of the various studies, but the data necessary to compute these were not available.

Appendix B presents the quality rating form used for articles considered for KQ 3. The total possible score for these studies was 29. We characterized studies with scores of 20 or greater as good, those with scores between 15 and 19 as fair, and those with scores of 14 and below as poor. The domains and maximum points possible for each domain are as follows:

  • Reporting (domain score of 11): 10 items covering study aims, measures, patient populations, findings, and statistical presentation; each scored yes or no (1 or 0), except for an item concerning principal confounders that was scored yes, partially, or no (2, 1, or 0, respectively).

  • External validity (domain score of 3): Three items relating to representativeness of populations from which people were recruited and of settings and clinicians that treat such patients; each scored yes or no (1 or 0).

  • Internal validity-bias (domain score of 7): Seven items relating to issues such as blinding subjects and outcomes assessors, follow-up periods, appropriate statistical tests, and use of reliable and valid outcome measures; each scored yes, no, or unable to determine (1, 0, or 0, respectively).

  • Internal validity-confounding (domain score of 6): Six items relating to sources of intervention and control groups, randomization of study subjects and concealment of allocation, adequacy of adjustments for confounding, and loss to follow-up; each scored yes, no, or unable to determine (1, 0, or 0, respectively).

  • Power (domain score of 2): One item about use of power analysis to determine sample size; scored no, yes for one measure, or yes for two or more measures (0, 1, and 2, respectively).
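The scoring scheme above can be expressed compactly. The following is a minimal sketch; the dictionary keys and function name are ours, while the domain maxima and the good, fair, and poor thresholds are those stated in the text. The two example score sets are the domain scores listed for Dennis (2003) and Brugha et al. (2000) in Table 18.

```python
# Maximum points per domain, as listed above (they sum to 29).
DOMAIN_MAX = {
    "reporting": 11,
    "external_validity": 3,
    "internal_validity_bias": 7,
    "internal_validity_confounding": 6,
    "power": 2,
}


def rate_study(scores):
    """Return (total, rating) for a dict of domain scores."""
    for domain, maximum in DOMAIN_MAX.items():
        value = scores[domain]
        if not 0 <= value <= maximum:
            raise ValueError(f"{domain} score {value} outside 0..{maximum}")
    total = sum(scores[domain] for domain in DOMAIN_MAX)
    if total >= 20:
        rating = "good"
    elif total >= 15:
        rating = "fair"
    else:
        rating = "poor"
    return total, rating


dennis_2003 = dict(zip(DOMAIN_MAX, (9, 2, 5, 6, 0)))
brugha_2000 = dict(zip(DOMAIN_MAX, (7, 0, 3, 6, 1)))
print(rate_study(dennis_2003))
print(rate_study(brugha_2000))
```

Applied to those two rows, the sketch yields totals of 22 (good) and 17 (fair), matching the totals shown in Table 18.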

Results

Table 17. Major characteristics of studies of screening with interventions for prenatal or postpartum depression
Author, Year | Ref. | Country | Study Design | Sample Size | Setting | Type of Screening | Type of Intervention
Screening during Pregnancy
Brugha et al., 2000 | 95 | UK | RCT | 209 | Prenatal clinic | Modified GHQ-D | Structured group prenatal preparation classes
Elliott et al., 2000 | 96 | UK | Nonrandomized controlled trial | 98 | Prenatal clinic | Leverton Questionnaire; Crown Crisp | Structured group prenatal preparation classes
Stamp et al., 1995 | 94 | Australia | RCT | 129 | Prenatal clinic | Modified prenatal questionnaire | Perinatal support group
Zlotnick et al., 2001 | 97 | US | RCT | 37 | Prenatal clinic | Screening survey | Four prenatal therapy/skills groups
Screening during Postpartum Period
Armstrong et al., 1999 | 98 | Australia | RCT | 181 | PP hospital ward | Adverse family risk factors from Brisbane Evaluation of Needs Questionnaire | Regular home visits by child-health nurses
Chabrol et al., 2002 | 99 | France | RCT | 859 screened; 258 randomized | PP hospital ward | EPDS | One CBT prevention group during the delivery hospital stay followed by an at-home CBT-based program in women with major depression
Chen et al., 2000 | 100 | Taiwan | RCT | 414 screened; 115 randomized | PP hospital ward | Taiwanese BDI; Measures of support | Four weekly PP support group sessions
Dennis, 2003 | 101 | Canada | RCT | 501 screened; 44 randomized | Child immunization clinics | EPDS | Telephone-based peer support
Fleming et al., 1992 | 102 | Canada | Nonrandomized controlled trial | 781 screened; 152 enrolled | PP hospital ward | EPDS; CES; MAACL | PP social support group
Hiscock and Wake, 2002 | 103 | Australia | RCT | 155 screened; 99 randomized | Child-health center | EPDS | Infant sleep intervention group
Honey et al., 2002 | 104 | UK | RCT | 45 randomized | Mother/baby clinic | EPDS | Psycho-educational group
Horowitz et al., 2001 | 105 | US | RCT | 1,215 screened; 122 randomized | Community sample of PP women | EPDS | Coached behavioral intervention to promote maternal-baby interaction
Onozawa et al., 2001 | 106 | UK | RCT | 59 | PP hospital ward | EPDS | Infant massage plus support group
Wisner and Wheeler, 1994 | 107 | US | Open trial | 23 | PP hospital ward | Prior history of PPD | Antidepressant medication
Wisner et al., 2001 | 108 | US | RCT | 581 screened; 56 randomized | PP hospital ward | Prior history of PPD | Antidepressant medication

BDI, Beck Depression Inventory; CBT, cognitive behavioral therapy; CES, Current Experience Scale; EPDS, Edinburgh Postnatal Depression Scale; GHQ-D, General Health Questionnaire Depression Score; MAACL, Multiple Affect Adjective Checklist; PP, postpartum; PPD, postpartum depression; RCT, randomized controlled trial.

Table 18. Quality rating of studies of screening with interventions for prenatal or postpartum depression
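| Author, Year [Ref.] | Reporting (11) | External Validity (3) | Internal Validity-Bias (7) | Internal Validity-Confounding (6) | Power (2) | Total Score (29) |
|---|---|---|---|---|---|---|
| Screening during Pregnancy | | | | | | |
| Brugha et al., 2000 [95] | 7 | 0 | 3 | 6 | 1 | 17 |
| Elliott et al., 2000 [96] | 5 | 0 | 2 | 4 | 0 | 11 |
| Stamp et al., 1995 [94] | 6 | 0 | 1 | 5 | 1 | 13 |
| Zlotnick et al., 2001 [97] | 5 | 0 | 4 | 3 | 0 | 12 |
| Screening during Postpartum Period | | | | | | |
| Armstrong et al., 1999 [98] | 9 | 1 | 5 | 3 | 1 | 19 |
| Brisco et al., 1989 [93] | 6 | 1 | 4 | 3 | 0 | 14 |
| Chabrol et al., 2002 [99] | 8 | 0 | 2 | 5 | 0 | 15 |
| Chen et al., 2000 [100] | 8 | 0 | 3 | 3 | 0 | 14 |
| Dennis, 2003 [101] | 9 | 2 | 5 | 6 | 0 | 22 |
| Fleming et al., 1992 [102] | 6 | 0 | 3 | 2 | 0 | 11 |
| Hiscock and Wake, 2002 [103] | 7 | 1 | 5 | 5 | 1 | 19 |
| Honey et al., 2002 [104] | 9 | 0 | 3 | 4 | 0 | 16 |
| Horowitz et al., 2001 [105] | 7 | 0 | 4 | 3 | 1 | 15 |
| Onozawa et al., 2001 [106] | 10 | 0 | 3 | 3 | 0 | 16 |
| Wisner and Wheeler, 1994 [107] | 8 | 0 | 3 | 1 | 0 | 12 |
| Wisner et al., 2001 [108] | 7 | 0 | 6 | 4 | 1 | 18 |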

Note: Maximum possible score in parentheses.
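
As a quick consistency check on Table 18, the total score for each study should equal the sum of its five domain scores, and no domain score should exceed its stated maximum. The following is a minimal sketch of that check, not part of the original review; the dictionary and function names are illustrative, and only three rows are transcribed here as examples.

```python
# Domain maxima as shown in parentheses in the Table 18 header.
DOMAIN_MAX = {
    "reporting": 11,
    "external_validity": 3,
    "bias": 7,
    "confounding": 6,
    "power": 2,
}

# Example rows transcribed from Table 18:
# (reporting, external validity, bias, confounding, power, total).
ROWS = {
    "Brugha et al., 2000": (7, 0, 3, 6, 1, 17),
    "Dennis, 2003": (9, 2, 5, 6, 0, 22),
    "Wisner et al., 2001": (7, 0, 6, 4, 1, 18),
}


def check_row(scores):
    """Return True if every domain score is within range and the total matches."""
    *domains, total = scores
    within_range = all(
        0 <= score <= maximum
        for score, maximum in zip(domains, DOMAIN_MAX.values())
    )
    return within_range and sum(domains) == total


for study, scores in ROWS.items():
    print(study, "consistent" if check_row(scores) else "inconsistent")
```

The same check can be extended to every row of the table by adding the remaining studies to `ROWS`.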

Table 19. Major outcomes of studies of screening and interventions for perinatal depression
| Author, Year [Ref.] | Type of Intervention | Outcome Measures | Significant Differences between Intervention and Control Group |
|---|---|---|---|
| Screening during Pregnancy | | | |
| Brugha et al., 2000 [95] | Structured group prenatal preparation classes | GHQ-D; EPDS; SCAN | No significant differences on any measure |
| Elliott et al., 2000 [96] | Structured group prenatal preparation classes | EPDS; PSE | Intervention group had significantly lower EPDS scores in first-time mothers; no significant difference on PSE for diagnosis of major depression |
| Stamp et al., 1995 [94] | Perinatal support group | EPDS | No significant differences on this measure |
| Zlotnick et al., 2001 [97] | Four prenatal therapy/skills groups | BDI; SCID | Intervention group had a significantly greater change over time; at follow-up, intervention group had a significantly lower level of maternal depression |
| Screening during Postpartum Period | | | |
| Armstrong et al., 1999 [98] | Regular home visits by child-health nurses | EPDS; PSI; Child health; HOME | For secondary outcomes, intervention group had significantly lower depression scores and a positive effect on parent-infant interaction |
| Chabrol et al., 2002 [99] | One CBT-based prevention group during the PP hospitalization, followed by an at-home CBT-based program in women with major depression | EPDS; HAM-D; BDI | Intervention group had significant reductions in frequency of depressive symptoms |
| Chen et al., 2000 [100] | Four weekly PP support group sessions | BDI; PSS; ISEL | Intervention group had significantly lower rates of depression and rates of perceived stress and more interpersonal support |
| Dennis, 2003 [101] | Telephone-based peer support | EPDS | Intervention group had significantly lower EPDS scores |
| Fleming et al., 1992 [102] | PP social support group | EPDS; CES | No significant differences on any measure |
| Hiscock and Wake, 2002 [103] | Infant sleep intervention (controlled crying) group | EPDS; Maternal and infant sleep quality; Maternal stress | Intervention group members with higher depression scores at baseline had significantly greater improvement in EPDS scores and reported improvements in sleep quality |
| Honey et al., 2002 [104] | Psycho-educational group | EPDS | Intervention group had significant reductions in depressive symptoms |
| Horowitz et al., 2001 [105] | Coached behavioral intervention to promote maternal-baby responsiveness | BDI-II; DMC | No significant differences for maternal depression; intervention group showed significantly better mother-infant responsiveness |
| Onozawa et al., 2001 [106] | Infant massage classes plus support group | EPDS; Videotape of mother-infant interaction | No significant differences for maternal depression; intervention group showed significant improvements in mother-infant interaction |
| Wisner and Wheeler, 1994 [107] | Antidepressant (nortriptyline) | Clinical interview; IDD | Intervention group had significantly lower proportion of new episodes of major depression |
| Wisner et al., 2001 [108] | Antidepressant (nortriptyline) | RDC; HAM-D | No significant differences in the rate of recurrence |

BDI, Beck Depression Inventory; CBT, cognitive behavioral therapy; CES, Current Experience Scale; DMC, Dyadic Mutuality Code; EPDS, Edinburgh Postnatal Depression Scale; GHQ-D, General Health Questionnaire-Depression Subscale; HAM-D, Hamilton Depression Rating Scale; HOME, Home Observation for Measurement of the Environment; IDD, Inventory to Diagnose Depression; ISEL, Interpersonal Support Evaluation List; PSI, Parenting Stress Index; PSE, Present State Examination; PSS, Perceived Stress Scale; RDC, Research Diagnostic Criteria; SCAN, Schedule for Clinical Assessment in Neuropsychiatry; SCID, Structured Clinical Interview for Diagnosis.

We reviewed a total of 60 titles and abstracts or full articles for KQ 3 drawn from several searches done for the feasibility study and later for this update. Ultimately, we retained 15 studies that met our inclusion criteria. Table 17 summarizes the major characteristics of the 15 included studies, Table 18 presents the results of our quality ratings, and Table 19 shows the results of the various depression assessments made among cases and controls.

The types and frequency of screening measures and the types of interventions applied varied appreciably among the studies we reviewed. Of the 15 studies retained for the full study, 4 examined intervention efforts for which screening had been done in the prenatal period; and 11 studies examined screening and interventions in the postpartum period. The remainder of this section reports on the studies in these two main categories.

Prenatal Studies

Of the four studies examining screening, interventions, and outcomes in the prenatal period,94–97 three were RCTs94, 95, 97 and one was a nonrandomized controlled trial.96 All four studies (published between 1995 and 2001) were set in prenatal clinics. Sample sizes for screening ranged from 37 to 209, for a total population of 473 women. The types of screening instruments used to identify patients with depressive symptoms differed among these studies; similarly, the outcome measures differed, although three studies used the Edinburgh Postnatal Depression Scale (EPDS) as one measure. All four studies implemented some type of psychological intervention, generally characterized as group classes or sessions relating to prenatal preparation, skills, and perinatal support. One study was considered fair; the other three were poor.

Brugha et al. screened 209 women with a modified General Health Questionnaire Depression Score (GHQ-D) to study the effect of six weekly prenatal group therapy classes called “Preparing for Parenthood” compared to routine prenatal care.95 In this study, which we graded as fair, the program aimed to increase social support and problem-solving skills. Outcome measures that assessed maternal mood and depressive symptoms at 3 months postpartum included the GHQ-D (cutoff ≥ 2), the EPDS (cutoff score of > 11), and the Schedule for Clinical Assessment in Neuropsychiatry (SCAN, related to the International Classification of Diseases [ICD], version 10). Assignment to the intervention group did not significantly reduce postpartum depression. On the GHQ-D, 26 percent of the intervention group and 22 percent of the control group scored at or above 2, with an adjusted odds ratio of 1.19 (95% confidence interval [CI], 0.59 to 2.37). On the EPDS, 16 percent of the intervention group and 19 percent of the control group scored above 11, with an adjusted odds ratio of 0.83 (95% CI, 0.39 to 1.79).
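
To see how the odds ratios relate to the percentages just quoted, an unadjusted odds ratio can be computed directly from the two proportions. The sketch below is illustrative only: it uses the rounded percentages reported above and makes no covariate adjustment, so its results are expected to differ somewhat from the adjusted values of 1.19 and 0.83. The function name is hypothetical.

```python
def odds_ratio(p_intervention, p_control):
    """Unadjusted odds ratio from two proportions (each strictly between 0 and 1)."""
    odds_intervention = p_intervention / (1 - p_intervention)
    odds_control = p_control / (1 - p_control)
    return odds_intervention / odds_control


# Rounded percentages from the text: intervention versus control.
ghq_or = odds_ratio(0.26, 0.22)   # GHQ-D at or above 2
epds_or = odds_ratio(0.16, 0.19)  # EPDS above 11

print(f"GHQ-D unadjusted OR: {ghq_or:.2f}")
print(f"EPDS unadjusted OR:  {epds_or:.2f}")
```

Both unadjusted values fall on the same side of 1 as the adjusted estimates and lie well inside the reported confidence intervals, which is consistent with the finding of no significant difference.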

In the earliest study we included in this group (1995), Stamp et al. used a study-specific, modified prenatal questionnaire as a screening instrument and assigned 129 patients to either two prenatal group classes plus one postpartum class (at 6 weeks) or routine care.94 Outcome measures were the EPDS at 6 weeks and 6 months postpartum, using a cutoff point of > 12 for major depression and > 9 for major or minor depression. The intervention did not significantly reduce rates of postpartum depression on either measure. For example, at 6 weeks, 13 percent of the intervention group and 17 percent of the control group had EPDS scores greater than 12; at 6 months, the figures were 15 percent and 10 percent, respectively.

Elliott et al. screened 98 women with the Leverton Questionnaire and the depression, anxiety, and somatic subscales of the Crown Crisp Experiential Index.96 The authors studied a preventive group psychosocial intervention versus routine care; they also looked at differences between first- and second-time mothers. The structured group intervention was conducted once per month for 5 months during the prenatal period (starting at 24 weeks) and for 6 months postpartum. Outcome measures included the EPDS and the Present State Examination (PSE), as well as a self-rating questionnaire, at 3 and 12 months postpartum. For first-time mothers, the median EPDS score was significantly lower in the intervention group (Mann-Whitney one-tailed test, P = 0.005); for second-time mothers, the median EPDS did not differ significantly between the two groups. The PSE served as a formal diagnosis of depression, and the investigators reported no significant differences in diagnosis of major depression. When the authors included cases of borderline depression or “minor depression” in the analysis, first-time mothers in the intervention group were significantly less likely to have a diagnosis of depression than controls (19 percent and 39 percent, respectively, Chi-square = 2.64, one-tailed test, P < 0.05). PSE scores did not differ significantly between groups in second-time mothers.

In the only study in this category done in the United States, Zlotnick et al. used the Beck Depression Inventory (BDI) and a Diagnostic and Statistical Manual (DSM-IV) Structured Clinical Interview for Depression (SCID) as a positive screen among women of low socioeconomic status (SES).97 They excluded patients who met criteria for a current episode of major depression based on the SCID. A total of 37 patients with a positive screen were assigned to either a four-session Interpersonal-Therapy-Oriented Group (given weekly) or to a usual-care group. Outcome measures included the BDI before and after the intervention and the SCID at 3 months postpartum. Women in the intervention group had a significantly greater change in their BDI scores from baseline than did those in the control group (“pre” versus “post” intervention Beck scores were 13.0 and 8.4, respectively, for the treatment group). In contrast, the control group “pre” versus “post” intervention scores were 9.2 and 11.3, respectively, suggesting that they got worse over time. The change between the intervention and the control group was significant (t-test = 3.50; df = 33; P = 0.001). In addition, women in the intervention group had a significantly decreased rate of major depression during the postpartum period as measured by the SCID at 3 months postpartum; no women in the intervention group and 33 percent of women in the usual-care group developed postpartum depression (P > 0.02).

These four small studies of programs for women identified by screening prenatally did not, collectively, produce many positive results from the various psychosocial interventions as compared with usual care. All of these studies scored poorly on external validity (0 of 3 points), and two of the four had 0 of 2 points for power. The four studies did, at best, only a fair job of reporting data (from 5 to 7 of 11 points). For bias, the study scores ranged from 1 to 4 of 7 possible points; for confounding, they ranged from 3 to 6 of 6 points (Table 18). Given the heterogeneity in populations, the screening instruments and cutoff points for defining “at-risk” individuals, the interventions themselves, and the outcome measures used, we cannot draw any overall conclusions about the utility of such programs.

Postpartum Studies

Of the 11 studies examining screening and intervention outcomes only in the postpartum period,98–108 eight were RCTs published between 1992 and 2003,98, 100, 101, 103–106, 108 and three were controlled trials published between 1992 and 2002.99, 102, 107 Sample sizes ranged from 23 to 1,215, for a total population of 4,289 women.

As with the prenatal screening studies, the screening instruments used to identify patients with depressive symptoms differed among the postnatal studies, although the EPDS was used in the majority of studies and two studies by the same investigator team used “prior history of postpartum depression.” The treatment interventions also differed considerably. Nine of these studies involved various behavioral and psycho-educational programs or other innovative activities (e.g., infant massage or infant sleep interventions); two involved tests of antidepressants. Unlike the prenatal studies, the settings varied from postpartum hospital wards to child-health and immunization clinics. Finally, the outcome measures also varied across these studies, but the EPDS was most commonly used (in seven studies). We graded one study good, seven fair, and three poor.

Behavioral and Psychosocial Interventions. Of the nine studies in this subgroup, one was conducted in the United States; the remainder were in Australia (two studies), Canada (two studies), the United Kingdom (not otherwise specified, two studies), and France and Taiwan (one study each). Using “number randomized or enrolled” as the metric, the sample sizes ranged from 45 to 859. We describe the studies below according to quality grade and sample size.

In a recent study rated good that randomized participants for the intervention (not screening), Dennis et al. screened 501 women recruited from child immunization clinics between 8 and 12 weeks postpartum.101 Inclusion criteria included the mother's being at least 18 years of age, having a singleton birth, and delivering a full-term infant. Women were screened using the EPDS (cutoff score > 9). The 44 women with a positive screen were randomized to a “mother-to-mother” peer support telephone intervention or to routine care. The outcome measures were the EPDS at 4 and 8 weeks after randomization. The women in the intervention group had significantly lower EPDS scores than those in the control group: 15 percent of the intervention group and 52.4 percent of the control group had an EPDS > 12 at 8 weeks (P = 0.02).

In the largest of the seven studies rated fair, Chabrol et al. screened 859 women and identified 258 who were at risk based on an EPDS > 9 on day 2 or 3 postpartum.99 They assigned these 258 women randomly to receive a cognitive behavioral therapy (CBT) (n = 130) intervention or to routine care (n = 128) during the postpartum hospitalization. CBT is a form of psychotherapy that actively examines how cognitions influence emotions or affect and involves active exploration, clarification, and testing of the patient's perceptions and beliefs.109 Outcome measures for the prevention intervention included the EPDS (cutoff ≥ 11) taken at 4 to 6 weeks postpartum.

Women in the CBT group who continued to have positive screens on the EPDS (defined as EPDS score ≥ 11) at 4 to 6 weeks were assessed for major depression in a clinical interview using the Mini-Neuropsychiatric Interview (MINI) and DSM-IV criteria. Those with major depression were offered an at-home CBT program for five to eight additional sessions. These women were then compared with women with probable major depression in the control group at 10 to 12 weeks using the EPDS, Hamilton Depression Rating Scale (HAM-D), and the BDI. Women in the control group received one initial home visit assessment but then received only weekly telephone checks.

The study results demonstrated that women in the prevention intervention group had significant reductions in the frequency of depressive symptoms. At 4 to 6 weeks postpartum, 30.2 percent of those in the CBT group versus 48.2 percent in the control group (P = 0.0067) were still depressed (based on an EPDS score of ≥ 11). Additionally, the intensity of depressive symptoms measured by the mean score on the EPDS was significantly lower in the prevention group than in the control group: mean EPDS scores, respectively, of 8.5 (standard deviation [SD] 4) and 10.3 (SD 4.4) (t-test = 3.06, df = 209, P = 0.0024); the analyses indicate a medium effect size (ES, 0.42). At 10 to 12 weeks postpartum after completion of the home-based CBT intervention, women in the intervention group had significantly lower scores on all measures of depressive symptoms (HAM-D, BDI, EPDS) than did those in the control group. Specifically, the intervention and the control group mean scores were as follows: HAM-D, 5.7 versus 16.2 (t-test = 8.4, P < 0.0001); BDI, 4.7 versus 15.7 (t-test = 9, P < 0.0001); and EPDS, 5.9 versus 13.7 (t-test = 7.7; P < 0.0001).
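
The reported effect size for the 4- to 6-week comparison can be approximated from the means and standard deviations given above. The following sketch computes a standardized mean difference (Cohen's d) under the assumption of equal group sizes, because the per-group numbers at follow-up are not stated here; the original authors may have used a different formula or weighting, so a small discrepancy from the reported 0.42 is expected. The function name is illustrative.

```python
import math


def cohens_d(mean_control, sd_control, mean_intervention, sd_intervention):
    """Standardized mean difference using a pooled SD.

    Assumption: equal group sizes, so the pooled variance is the simple
    average of the two variances.
    """
    pooled_sd = math.sqrt((sd_control ** 2 + sd_intervention ** 2) / 2)
    return (mean_control - mean_intervention) / pooled_sd


# Mean EPDS scores at 4 to 6 weeks: control 10.3 (SD 4.4), prevention group 8.5 (SD 4).
d = cohens_d(10.3, 4.4, 8.5, 4.0)
print(f"Approximate effect size: {d:.2f}")
```

Under this assumption the result is close to the reported value and falls in the range conventionally described as a medium effect.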

Armstrong et al. screened 181 women with good literacy skills in the immediate postpartum period by asking about a history of trauma or abuse or a positive screen for adverse family characteristics on the Brisbane Evaluation of Needs Questionnaire.98 Women were randomized to receive 6 months of home visits by a child-health nurse or routine primary care. Primary outcome measures involved measures of child health, parental and family functioning (measured by the EPDS [cutoff > 12] and the Parenting Stress Index [PSI]), quality of the home environment (HOME assessment), and satisfaction with community services. All assessments were administered immediately postpartum.

If we focus primarily on scores of maternal depression and functioning at 6 weeks postpartum, women in the intervention group had significantly lower depression scores than the control group: 5.8 percent in the intervention group and 20.7 percent in the control group (P = 0.003) with EPDS > 12. Additionally, women in the intervention group had significantly lower (better) PSI scores at 6 weeks than controls (15.3 versus 38.4, P < 0.001). The investigators also reported that the total HOME score differed significantly between groups: 28.34 for the intervention group versus 25.51 for the control group (P < 0.001), providing evidence for the positive effect the intervention had on influencing parent-infant interaction and the home environment for the child.

Hiscock and Wake recruited 155 women from a child-health center at 7 to 8 months postpartum and screened with the EPDS (cutoff > 12).103 Other inclusion criteria included reported child sleep problems. Of these women, 99 were considered depressed (baseline EPDS ≥ 10) and randomly assigned to either an infant sleep intervention group or a usual-care group. The infant sleep intervention comprised three private sessions (one session every 2 weeks) held at the local child-health center where sleep management plans were discussed, including an emphasis on controlled crying (where parents responded to their infants' crying at increasing time intervals, allowing the infant to fall asleep unaided). At the 10- to 12-month follow-up assessment, outcome measures included the EPDS, measures of sleep quality, and measures of maternal stress and coping.

The results of this study were mixed. Women who began with higher (worse) scores of depression at baseline had a significantly greater improvement in their EPDS scores than did those in the control group. At the 10-month follow-up, women in the intervention group had a 6.0 point decrease (95% CI, 7.5 to 4.0) in EPDS score compared to a 3.7 point decrease (95% CI, 4.9 to 2.6) in the control group (P = 0.01). At the 12-month follow-up visit, the intervention group had a 6.5 point decrease (95% CI, 7.9 to 5.1) in EPDS score compared to a 4.2 point decrease (95% CI, 5.9 to 2.5) in the control group (P = 0.04). Also, at the 10-month follow-up, women in the intervention group reported improvements in their own sleep quality, including being more likely than control mothers to rate their own sleep quality as “very good” and less likely to rate it as “very bad” (Chi square = 9.93; P = 0.02). They also reported having “enough sleep” and were less likely to have “not enough” sleep (Chi square = 8.11, P = 0.04).

Horowitz et al. screened 1,215 women at 2 to 4 weeks postpartum with the EPDS (cutoff > 10). Women with positive screens (n = 122) were randomly assigned to either an interactive coaching intervention or a control group. The coached behavioral intervention was designed to promote maternal-infant responsiveness. All women in the study (both intervention and control groups) received three home visits when their infants were 4 to 8 weeks, 10 to 14 weeks, and 14 to 18 weeks of age; the women in the intervention group practiced the coaching intervention during these visits. Outcome measures included the BDI-II for maternal depression and, secondarily, the Dyadic Mutuality Code (DMC), a measure of the level of responsiveness in the maternal-infant relationship. Responsiveness was defined as “the mother's ability to accommodate to her infant's behavior and to give it meaning through regulation of her own behavioral responses” (p. 326). The intervention and control groups did not differ significantly in terms of maternal depression scores (BDI-II) at any time period. The DMC showed a significantly better outcome for mother-infant responsiveness for the treatment group (P = 0.06).

Onozawa et al. screened 581 primiparous women with the EPDS (cutoff ≥ 13) at 4 weeks postpartum.106 Of 91 women who had a positive screen, 59 agreed to participate in the study. Participants were randomized to either a 5-week infant massage class with a support group or the support group only. The 1-hour infant massage class (approved by the International Association of Infant Massage) taught parents the techniques of infant massage by encouraging parents to observe and respond to their infants' body language and cues and to adjust their touch accordingly. Outcome measures included maternal depression on the EPDS (cutoff ≥ 13) at 4 weeks and 2 months postpartum and a videotaped mother-infant interaction that assessed the mother's attitude toward the infant, the infant's response to the mother, and the overall quality of the interaction. At 14 weeks postpartum, EPDS scores had fallen for both groups (reported as a change in median EPDS score from baseline to final visit), but the intervention group demonstrated a significantly greater change in scores than did the control group (intervention group baseline of 15.0 and final visit score of 5.0, versus the control group score of 16.0 at baseline to 10.0 at the final visit; P = 0.03). Additionally, significant improvements in all aspects of mother-infant interaction as measured by the videotape were seen only in the massage group (P = 0.0004).

Honey et al. used the EPDS (cutoff > 12) to screen postpartum women recruited through mother-baby clinics but assessed at home.104 The 45 women with a positive screen on the EPDS were randomly assigned to either an 8-week psycho-educational group (PEG) or to a routine care group. Outcome measures included the EPDS (cutoff > 12) after completion of the PEG and at a 6-month follow-up. At the end of the 8-week assessment interval, the women in the PEG did not differ significantly from those in the routine-care group. By contrast, at the 6-month follow-up assessment, the percentage of women scoring below the EPDS cutoff for a probable major depressive episode was significantly higher in the PEG group than in the routine-care group (65 percent versus 36 percent, Chi square = 3.75; P ≤ 0.05). An additional analysis demonstrated that the use of antidepressant medication during the study had no impact on the improvement in mood observed at the 6-month follow-up assessment.

Fleming et al. screened 781 primiparous women with full-term deliveries and no psychiatric history during their first 2 weeks postpartum using the EPDS (cutoff ≥ 13), the Current Experience Scale (CES, cutoff ≥ 35), and the Multiple Affect Adjective Checklist (MAACL, cutoff ≥ 21).102 Women with a positive screen (n = 142) were assigned (not randomly) to either a postpartum social support group that included both depressed and nondepressed women or a usual-care group.

Outcome measures included the EPDS and the CES at the same cutoff scores used for screening. At the 6-week and 5-month follow-up assessments, the groups did not differ significantly with respect to rates of maternal depression, and the support groups had no apparent effect on the mothers' general affective mood. However, women in the social support group had a statistically significant increase in the number of maternal-infant interactions and noted decreased infant crying compared to women in the routine-care group.

Chen et al. screened 414 women at 3 weeks postpartum using the Taiwanese BDI (cutoff ≥ 10).100 Of these, 115 women with positive screens were randomized to weekly support groups or to a routine-care group; 60 patients were available for analysis. Outcome measures included the BDI (cutoff ≥ 10), Perceived Stress Scale (PSS), and measures of interpersonal support. At the 15-week follow-up assessment, women in the intervention group had significantly lower rates of depression: 33.3 percent of the intervention group and 60.0 percent of the control group had BDI values equal to or greater than 10 (P < 0.05). The rate of perceived stress was also significantly lower in the intervention group than the control group (t-test = 3.75, P < 0.01). Finally, women receiving the intervention reported significantly more interpersonal support as measured by the Interpersonal Support Evaluation List than those in the control group (t-test = 2.81, P < 0.01).

Pharmacologic Studies. Two of the studies were psychotropic medication trials to prevent the occurrence of postpartum depression. The women were not directly screened with any instrument, but rather were included if they had a previous history of postpartum depression. The same research team conducted both of these pharmacologic trials. In the first trial, Wisner and Wheeler studied the efficacy of antidepressant treatment in women with a previous history of postpartum depression (i.e., at high risk of maternal depression but no history of psychosis or bipolar disorder).107 At-risk postpartum women (n = 23) who had had at least one episode of postpartum depression were treated in an open clinical trial with the tricyclic antidepressant nortriptyline and postpartum monitoring or with postpartum monitoring only. Outcome measures included a clinical assessment of major depression and the Inventory to Diagnose Depression scale. After 3 months, study results demonstrated a significantly greater proportion of new-episode major depression in those patients who received monitoring alone than in those in the medication group (62.5 percent of those in the monitoring group; 6.7 percent in the medication group; P = 0.0086).
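
The percentages reported for the open trial are consistent with 5 of 8 women in the monitoring-only group and 1 of 15 in the medication group developing a new episode (5/8 = 62.5 percent, 1/15 = 6.7 percent, 8 + 15 = 23). These counts are inferred from the percentages and the total sample size; they are not stated directly in the text. Under that assumption, the sketch below computes a one-sided Fisher's exact test using only the Python standard library. Whether the original investigators used this particular test is also an assumption.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Returns the probability of observing a count of at least `a` in the
    top-left cell, given fixed row and column totals.
    """
    row1 = a + b          # size of the first group
    col1 = a + c          # total number of events
    n = a + b + c + d     # overall sample size
    upper = min(row1, col1)
    favorable = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return favorable / comb(n, col1)


# Inferred counts: monitoring only (5 depressed, 3 not); medication (1 depressed, 14 not).
p_value = fisher_one_sided(5, 3, 1, 14)
print(f"One-sided Fisher's exact P = {p_value:.4f}")
```

With these inferred counts the result rounds to the reported P value, which supports the reconstruction of the counts but does not confirm the method used in the original trial.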

In a later Wisner et al. RCT, 56 women with a prior history of postpartum depression within the past 5 years but no depressive episode upon enrollment, as diagnosed by standardized research diagnostic criteria, were randomized to either a nortriptyline group or a placebo group immediately postpartum.108 Outcome measures of recurrence of perinatal depression included the HAM-D and Research Diagnostic Criteria (RDC). In contrast to the earlier open-label trial, the investigators reported no difference in the rate of recurrence of depression (one-fourth) between women treated with nortriptyline and those receiving placebo (23 percent versus 24 percent, respectively).

None of the studies used treatment interventions that are recognized as the gold standard treatment for major depressive illness according to current American Psychiatric Association guidelines. These guidelines specify that the gold standard include antidepressant medication plus psychotherapy.12

The overall quality of these 11 postpartum studies was fair; one study was rated good; three, poor. Of a possible quality score of 29, one study scored 22, seven scored between 15 and 20, and three at or below 14 (Table 18). External validity was generally poor; the majority of studies scored 0 of 3 points. The bias measure of internal validity ranged between scores of 1 and 6 (of a possible 7); 6 of 11 studies scored 4 or better. For confounding, scores ranged from 1 to 6 (of a possible 6); 6 of 11 studies scored 4 or better. Power was generally poor, reflecting the small sample sizes; 9 of 11 studies scored 0 (of a possible 2).

Only three studies had quality scores of 18 or higher: Wisner et al.,108 Armstrong et al.,98 and Dennis et al.101 All three enrolled women in the postpartum period and had a fairly intensive treatment approach consisting of weekly interventions (Wisner et al. for 20 weeks, Dennis et al. for 8 weeks, and Armstrong et al. for 6 weeks). The Wisner et al. study had a pharmacologic intervention with weekly assessments of efficacy but no psychotherapeutic intervention; by contrast, the Dennis et al. and Armstrong et al. studies had weekly psychotherapeutic interventions but no pharmacologic intervention. Interestingly, although Wisner et al. treated patients for 20 weeks postpartum with antidepressant medication, their study did not have a significant result. This finding may suggest that psychosocial support and psychotherapeutic intervention are both critical as part of a treatment plan for women with postpartum depression.

Discussion

Conclusions

The 15 studies examined a variety of screening and treatment interventions for women identified as being at risk (sometimes at high risk) for postpartum depression. The majority of these studies focused on intervention strategies in the postpartum period; all but two dealt with a wide array of psychosocial, education, skill-building, and other mother-child behavioral activities. Generally, the more successful efforts occurred in the studies in which screening and interventions were carried out in the postpartum, not the prenatal, period. Once again, none of the studies had a treatment intervention with both psychotherapeutic and pharmacologic components that would be considered “gold standard therapy” for the treatment of major depression.

Overall, many of the studies suggest that providing the mother with some form of psychosocial program to increase maternal support or improve maternal-child interaction may decrease the rate of postpartum depression. Across the nine nonpharmacologic studies, about 20 outcomes were assessed; of these, 12 showed significant effects for the intervention group. Taking only the outcomes dealing specifically with depression, nine significant effects were reported. The two small pharmacologic trials from the same group yielded conflicting results about the impact of nortriptyline in reducing recurrence of maternal depression.

Only one study97 specifically studied low SES women—a matter of some interest to the Safe Motherhood Group. Low SES women with at least one risk factor for postpartum depression who participated in weekly prenatal survival skills classes were less likely to develop postpartum depression compared to controls. This small study suggested that increasing support and parenting skills may help to decrease postpartum depression in this particular population.

Study Limitations

This set of studies, however, has several limitations, and it can be regarded as offering, at best, only fair evidence about the utility of screening plus prevention or treatment programs or even interventions alone. Although a variety of interventions may be helpful in treating women with or at risk of perinatal depression, the available evidence does not directly address whether screening with subsequent intervention improves outcomes. Screening, in the classic sense, implies “examination of a group of usually asymptomatic individuals to detect those with a high probability of having a given disease” (italics added; http://dictionary.reference.com/search?q=screening); this meaning can be extended to using appropriate screening or diagnostic tools within populations with known risk factors. These studies provide little guidance in answering the practical question of whether clinicians should screen all women in the perinatal period (i.e., essentially an asymptomatic population with respect to depression) for risk factors or latent depression, or whether they should screen only women who have known prior histories or risk factors for depression.

The studies are generally small, with poor generalizability (especially to the heterogeneous childbearing population of the United States). We contemplated and rejected the idea of any quantitative analyses: populations, settings, and screening and outcome measures—let alone interventions—were simply too disparate for anything but qualitative synthesis.

Future Research

To overcome some of these problems in understanding the impact of programs designed to prevent the problems of perinatal depression or to mitigate the considerable deleterious effects of this disorder on mothers, infants, and families, considerably more and better research needs to be conducted. Possibly the most important issue is for future studies to enroll adequate samples of women and, if screening is the question, to screen quite large numbers of women to produce sample sizes with adequate power to detect relevant differences between treatment and control groups in later phases of these studies. Virtually all studies appeared to be underpowered to start with, and some lost participants along the way. This deficiency hampers investigators and policymakers in making sense of, or decisions based on, much of this work.
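As a rough illustration of why small trials are so often underpowered, the following sketch applies the standard normal-approximation formula for comparing two proportions. The function name and the event rates in the example are hypothetical and are not drawn from any of the reviewed studies.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control: float, p_treated: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of women needed per arm to compare two proportions.

    Normal-approximation formula for a two-sided test; a planning sketch only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    pooled_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    separate_term = z_beta * sqrt(p_control * (1 - p_control)
                                  + p_treated * (1 - p_treated))
    return ceil((pooled_term + separate_term) ** 2
                / (p_control - p_treated) ** 2)


# Hypothetical scenario: 30 percent of at-risk controls become depressed,
# and the intervention is hoped to halve that rate to 15 percent.
print(n_per_group(0.30, 0.15))
# A smaller hoped-for effect (30 percent versus 25 percent) needs far more women.
print(n_per_group(0.30, 0.25))
```

Under these assumed rates, each arm needs more than 100 women who have already screened positive, before any allowance for attrition; smaller expected effects push the requirement much higher.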

Moreover, a greater effort should be made, at least in US-based studies, to focus on ethnic and disadvantaged populations, such as low-income women. Even if the incidence and prevalence of perinatal depression were “evenly” spread over population groups in this country, the underfunding of health care for many (e.g., lack of insurance, poor coverage of mental health benefits in insurance plans, unavailability of publicly funded services) and the more precarious economic resources and family support for some populations means that additional attention needs to be paid to them. For example, programs may need to be designed to take lack of transportation, child care, or telephone access into account.

In addition, researchers might direct attention to several other variables that appear to be important. They include first-time versus second-time mothers, maternal comorbidities and lifestyle behaviors, family structure (make-up) and available support, and status of infant at birth (e.g., full term or not, healthy or not). Another gap may be programs intended to assist the mother-father dyad or, indeed, to assist fathers in providing the emotional or physical support needed to forestall depression in new mothers.

Ideally, researchers would employ similar screening measures with similar cutoff points so that some elements of separate studies could be compared more readily. Not all of the screening instruments used appear to be sufficiently well-targeted to perinatal depression (i.e., even if they are reliable, their validity for this purpose may be called into question). Moreover, some instruments may be relatively infeasible for use in certain populations (e.g., immigrants) or in cases in which patient self-report is important and literacy may be low. For these situations, some work to calibrate well-known instruments that have been specifically designed for this disorder and that have acceptable test properties against each other might be useful. Calibrating less well-known or well-proven instruments against some agreed-upon reference (“gold”) standard instrument in this area might also be valuable. Testing these in different settings, trying to use shorter instruments, attempting to take literacy levels into account, and in other ways improving the screening armamentarium are also important steps. In that way, investigators and clinicians can have a better selection of proven screening tools for future research or clinical practice applications.

Another element warranting more clarity is the purpose of the screening-cum-intervention effort. All appear to relate to populations of women at risk of perinatal depression (particularly postpartum depression that goes beyond “maternal blues”), but the severity level of being at risk differed in these studies. Moreover, women could have had no prior history of depression (or perinatal depression) and be at risk; alternatively, they could have had some history, especially of postpartum depression, and be, essentially, at “high” risk. These distinctions did not seem to be well or consistently described across these studies. They also have implications for the goals of the interventions themselves: for example, preventing any “first episode” of depression, mitigating the effects of a first episode that is not wholly prevented, or preventing a recurrence.

Interventions tested in the future would, ideally, be those shown to have some promise so far (e.g., as reflected in some of the studies reported here). The components of the programs should be of appropriate length and intensity, and published articles should describe them thoroughly. In addition, interventions should be consistent with current evidence-based practice standards for the treatment of major depression. Multiple studies of the same interventions, perhaps at different time periods or different settings and populations, might be helpful in completing the picture of the impact of screening and interventions on occurrence or reoccurrence of perinatal depression. Finally, outcome measures should be appropriate to the research questions and preferably selected from among the more reliable, valid, and widely used instruments. These steps might help fill the gaps in this knowledge base and permit those performing systematic reviews to compare and synthesize studies more readily.

Chapter 6. Conclusions and Recommendations

In an effort to identify the evidence base addressing important questions on the epidemiology, screening and diagnosis, and management of perinatal depression, the Safe Motherhood Group (SMG) and the Agency for Healthcare Research and Quality (AHRQ) initially requested a feasibility study to determine whether enough high-quality evidence existed on six separate issues to support a full evidence report. After reviewing our feasibility study,24 SMG and AHRQ requested an evidence report focusing on the three key questions (KQs) covered in this review.

We applied rigorous selection criteria and assessed the quality of each study, bringing a public health perspective to an area of research that traditionally has not had this focus. Our report was limited to depressive illness without psychotic symptoms, the latter complication being much less common and much more challenging to identify and manage. We made a distinction between results involving major depression alone, a discrete clinical syndrome for which treatment is clearly indicated, and results referring to patients with either major or minor depression, for which management is less clear.

This evidence report comprises a comprehensive review of all the available research. In this final chapter, we first review the major findings pertaining to each question and the strength of overall evidence about these issues; we then present some observations and recommendations about future research.

Conclusions

Key Question 1: Prevalence and Incidence of Perinatal Depression

For KQ 1, we identified 30 studies of generally moderate size that provide estimates of the prevalence of perinatal depression; 13 of these inquiries provide estimates of incidence. Studies were generally of good quality for reporting completeness and internal validity for bias; by contrast, they were of fair quality for precision and only poor quality for external validity and internal validity for confounding. In particular, the study populations were not representative of the racial and ethnic mix of the countries in which the studies were performed and especially not of the United States.

Our final best estimates of prevalence and incidence were somewhat lower than those reported in prior systematic reviews because we excluded studies that assessed depression based on self-report screens alone, which have been found to overestimate prevalence. Also, we separated out estimates of major and minor depression from estimates of major depression alone. Finally, we included more recent studies that use more precise criteria to identify major depression.

For major depression alone, our final combined point prevalence estimates ranged from 3.1 percent to 4.9 percent at different times during pregnancy and from 1.0 percent to 5.9 percent at different times during the first postpartum year. For major and minor depression, our final combined estimates of point prevalence ranged from 8.5 percent to 11.0 percent at different times during pregnancy and from 6.5 percent to 12.9 percent at different times during the first year postpartum. This nearly 2-fold higher rate suggests that, at any given time, approximately half of the depressed women are experiencing a major depressive episode and half a minor depressive episode. Confidence intervals surrounding all these estimates remain wide, suggesting that a fair amount of uncertainty remains in the combined estimates.

Fewer estimates were available for the incidence of depression. These limited data suggest that as many as 14.5 percent of pregnant women have a new episode of either major or minor depression during pregnancy, and 14.5 percent have a new episode during the first 3 months postpartum. Considering only major depression, 7.5 percent may have a new episode during pregnancy, with 6.5 percent having a new episode in the first 3 months postpartum.

Are the prevalence and incidence of depression during the perinatal period higher than the rates during nonchildbearing periods? We found three studies that measured the prevalence of major or minor depression and major depression alone for women at different times during these two periods. None of these estimates shows a statistically significant difference. Only one study20 directly compared the incidence (new onset) of perinatal depression to that of nonchildbearing women of similar age; women at 5 weeks postpartum were more than three times as likely as the comparison group to have a new episode of major or minor depression. By 6 months postpartum, this difference had disappeared. An incidence for major depression alone was not reported.

That these estimates did not appear significantly different from those of nonchildbearing women of the same age does not reduce the dramatic burden experienced by women postpartum. Indeed, these estimates, based on the best available evidence, suggest that perinatal depression, whether major or minor depression, is a very common complication of pregnancy. Furthermore, and arguably more important, after labor and delivery this dramatically common complication, rather than primarily affecting one individual, now directly affects two: mother and child.

Key Question 2: Accuracy of Perinatal Depression Screening Tools

For our analysis of the accuracy of screening tools (KQ 2), we identified 10 studies reporting test characteristics for English-language screeners. In general, studies were of fair to good quality, although external validity was only poor to fair. In particular, the study populations were nearly entirely white, so the accuracy of these screeners in nonwhite perinatal populations is not clear. A major limitation in the available evidence is the very small number of depressed patients involved, a fact that results in substantial imprecision in the point estimate of sensitivity and prevents one from reasonably determining an ideal cutoff point.

For depression during pregnancy, we found only one study reporting on screening accuracy in a population with 6 patients with major depression and 14 patients with either major or minor depression. For major depression, sensitivities for the Edinburgh Postnatal Depression Scale (EPDS) at all evaluated thresholds (12, 13, 14, 15) were 1.0, underscoring the markedly small number of depressed patients involved; specificities ranged from 0.79 (at EPDS ≥ 12) to 0.96 (at EPDS ≥ 15). For major or minor depression, sensitivity was much poorer (0.57 to 0.71); specificity remained fairly high (0.72 to 0.95).
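To make concrete how little a sensitivity of 1.0 based on six cases can tell us, the sketch below computes a Wilson score interval for a binomial proportion. The choice of the Wilson method and the function name are our assumptions for illustration; the source study may have reported intervals computed differently.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion (illustrative choice)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot
    return max(0.0, center - half), min(1.0, center + half)


# Six women with major depression, all six screening positive
low, high = wilson_interval(6, 6)
print(f"observed sensitivity 1.00, 95% interval {low:.2f} to {high:.2f}")
```

With six of six cases detected, the lower bound of this interval falls near 0.61, so a true sensitivity well below 1.0 remains compatible with the data.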

For postpartum depression also, the small number of depressed patients involved in the studies precluded identifying an optimal screener or an optimal threshold for screening. Our ability to conduct a meta-analysis of the results of different studies was limited by the use of multiple cutoffs and other differences across studies that precluded a meaningful interpretation of the results. Where we were able to combine the results, the pooled estimates did not add to what one could conclude from individual studies.

For women with major depression alone, specificity for all screeners (the Beck Depression Inventory [BDI], the Postpartum Depression Screening Scale [PDSS], and the EPDS) was relatively high. This finding suggests that a positive screen was accurate in ruling major depression in; that is, the risk that a screen with one of these instruments would be falsely positive was low. By contrast, sensitivities varied much more. The EPDS and the PDSS appeared to be more sensitive (with estimates ranging from 0.75 to 1.0 at different thresholds) than the BDI instruments (with estimates from 0.32 to 0.68), but the wide confidence intervals (CIs) overlapped nearly completely. This means that we could not say with confidence that the sensitivity estimates using the different tools were different.

The point estimates are consistent with what is reported for depression screeners in primary care settings.90 Still, the imprecision is important to clarify. If falsely missing depression (a false negative) is worse than falsely identifying it, as may be the case with this disorder, clinicians must be able to feel confident that the screen is usually positive if the disease is there and that a negative result can help rule out the illness.
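The link between sensitivity and the reassurance offered by a negative screen can be sketched with the usual predictive-value arithmetic. In the example below, the two sensitivities are the lowest BDI and lowest EPDS/PDSS point estimates quoted above; the specificity of 0.90 and the prevalence of 5 percent are assumed values chosen for illustration only.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float):
    """Return (PPV, NPV) for a screen applied to a population."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(depressed | positive screen)
    npv = true_neg / (true_neg + false_neg)   # P(not depressed | negative screen)
    return ppv, npv


# Assumed inputs: specificity 0.90 and prevalence 0.05 are illustrative only
for sens in (0.32, 0.75):
    ppv, npv = predictive_values(sens, specificity=0.90, prevalence=0.05)
    print(f"sensitivity {sens:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Under these assumptions, the more sensitive screen leaves fewer depressed women among those screening negative, which is the property that matters most if a missed case is the costlier error.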

For patients with major or minor depression, results were reported for EPDS, BDI, PDSS, and the Center for Epidemiologic Studies-Depression (CES-D). Specificity estimates remained relatively high, but sensitivity results were much lower (ranging from 0.43 to 0.71) than for major depression alone. This means that the ability of the screening instrument to score women as positive for this condition when the disease is present was poorer than for major depression alone. Again, neither any particular cutoff nor any particular screening instrument performed differently from the others. No available comparators were found for primary care populations.

Our results suggest that various screening instruments can identify perinatal depression, most accurately major depression, but clinicians need to know more about the precision of individual instruments. If one assumes that the risk of a false-negative depression screen is worse than the risk of a false-positive screen, perinatal depression is a condition in which sensitivity is likely to be more important than specificity. Whether as a screen for major depression alone or for major or minor depression, specificities appear high and relatively precise. By contrast, sensitivity for identifying either category is imprecise and differs by diagnostic category. For major depression alone, point estimates are equivalent to those in primary care medical settings. For major or minor depression, however, sensitivity is quite low. At this time, these screens do not appear to be useful for identifying patients in this latter category of illness.

Key Question 3: Screening and Treatment Outcomes

KQ 3 concerned issues of whether screening ultimately leads to improved patient outcomes. Although it is the most vital question from the public health perspective, it is the one with the most limited evidence. Indeed, the studies that we identified were not designed to test whether screening for depression (versus not screening) improved patient outcomes. Such a design would randomize patients to be screened or not to be screened and then compare subsequent outcomes. We found no studies designed in this way.

Instead, we made use of studies in which women were screened by formal depression screening or the presence of a risk factor associated with perinatal depression to identify those at risk of having a depressive illness; then, for those screening positive, the investigators compared the outcomes of women receiving a treatment intervention to those in a control group. This design tests whether, among women identified as at risk of depression by a screen, an intervention improves outcomes compared to the outcomes in a control group. This is an important intermediary step, but it does not directly test whether screening itself improves outcome compared to not screening. All the trials included are treatment studies that had a screening component (either a formal depression screening instrument, or other type of screen that identified women at risk of a depressive illness) but did not have diagnostic confirmation of depression.

We attempted to synthesize the results of the included studies quantitatively, but the study methods (screening instruments, type of intervention, intensity of intervention, outcomes measured) were so heterogeneous that a meta-analytic synthesis would not be meaningful. We also tried to compare effect sizes across the various studies as an exploratory analysis, but the data necessary to compute them were not available.

For patients whose screening results identified them as at risk of perinatal depression and for whom a subsequent intervention was provided, we identified 15 studies. Four small prenatal studies involved various psychosocial interventions. Quality was poor for three of these studies and fair for one. Overall, the effects of the interventions in these studies were not consistently superior to those in the control groups.

The 11 postpartum studies were of overall fair quality and had larger sample sizes than the prenatal trials. Study populations reflected only a limited racial and ethnic mix, and both external validity and the power to demonstrate statistically significant differences were generally poor. Again, screening tools and interventions varied considerably; the latter involved both psychosocial and pharmaceutical interventions.

Results were mixed. Of the nine trials that employed a psychosocial intervention, six studies98–101, 103, 104 reported significant benefit for depression outcomes in the experimental group compared to those in the control group. The one RCT involving pharmacologic intervention did not show benefit relative to the control group.108 Overall, the evidence available is not sufficient to draw conclusions about this key question. These results, although limited, do suggest that providing some form of psychosocial support to pregnant women at risk of having a depressive illness may decrease depressive symptoms.

Recommendations for Future Research

The available research suggests that depression is one of the more common complications of the prenatal and postpartum periods and that fairly accurate and feasible screening measures are available. The prenatal or postpartum periods are clearly not times for nonpsychiatric clinicians to ignore depression screening, which is routinely recommended for patients seen in primary care settings.110, 111

The specifics of the course of a depressive illness with onset during the perinatal period, including the severe physiologic and psychological challenges unique to this period that complicate the identification and management of perinatal depression, would seem to call for a substantial body of high-quality research on this topic. We were surprised by the paucity of such evidence. If one assumes that perinatal depression is a significant mental health and public health problem, then larger scale studies are needed in each of these domains. The small number and small size of relevant studies are not adequate to guide national policy.

Reflecting on the three key questions addressed in this report, we have concluded generally that the level of research warrants both improvement and expansion. The three results chapters discuss the limitations and gaps in these areas in more detail. We summarize here our suggestions for additional research efforts for the future.

For KQ 1, prevalence studies need to account better for the racial and ethnic mix of perinatal depression in the US population. We do not have good evidence about whether and, if so, how perinatal depression rates differ among various ethnic groups. The absence of information on nonwhite populations was dramatic. Better understanding any racial and ethnic variations could help clinicians know where to target screening programs and researchers know where to target studies on screening tools, and it could help researchers clarify the need for more nationally representative perinatal depression samples. Furthermore, researchers need to clarify whether the incidence of perinatal depression is greater than the incidence of depression in nonchildbearing women of similar ages.

For KQ 2, the quality grades point to several areas in which improvements in study design and conduct are needed. In particular, future studies on the test characteristics of screeners must be designed with sample size estimates that take prevalence into account and that project a reasonably precise estimate of sensitivity for the particular illness. Moreover, samples should more closely mirror the target population; specifically, subsequent studies need to provide a more representative racial and ethnic mix. In addition, studies should incorporate a range of other demographic variables that could influence screening performance, such as socioeconomic status measures, and assess the screening tools in these subpopulations.
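One common way to plan such a study is to fix the desired half-width of the confidence interval around sensitivity, work out how many depressed women that requires, and then divide by the expected prevalence to obtain the number of women who must be screened. The sketch below uses the simple normal-approximation formula; the function name and all inputs in the example are hypothetical planning values, not figures from this review.

```python
from math import ceil
from statistics import NormalDist


def screened_sample_size(expected_sensitivity: float, half_width: float,
                         prevalence: float, level: float = 0.95):
    """Return (cases_needed, total_to_screen) for a target precision.

    cases_needed: depressed women required so that the confidence interval
    around sensitivity is about +/- half_width (normal approximation).
    total_to_screen: women to enroll so that, at the given prevalence,
    that many cases are expected.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    cases_needed = ceil(z**2 * expected_sensitivity
                        * (1 - expected_sensitivity) / half_width**2)
    total_to_screen = ceil(cases_needed / prevalence)
    return cases_needed, total_to_screen


# Hypothetical planning values: sensitivity 0.85, interval of +/- 0.10,
# prevalence of major depression 5 percent
cases, total = screened_sample_size(0.85, 0.10, 0.05)
print(f"depressed women needed: {cases}; women to screen: {total}")
```

Under these assumed values, the study needs roughly 50 depressed women and on the order of 1,000 women screened, far more than the handful of cases available in the studies reviewed here.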

Furthermore, as Beck and Gable did,71 future research should continue to assess and directly compare multiple screening instruments. This design would provide a head-to-head comparison to allow an evaluation of which screening instrument is more accurate in the setting in which the investigations are carried out. Moreover, studies evaluating the cost-effectiveness of screening, specifically assessing the relative costs of false-negative and false-positive designation, the degree of provider burden, and patient acceptability, are needed to provide insights on how to consider target sensitivity and specificity when attempting to maximize cost-effectiveness.
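The trade-off between false-negative and false-positive results can be sketched as an expected misclassification cost per woman screened. Everything in the example below is hypothetical: the two cutoffs, their sensitivity and specificity pairs, the prevalence, and the cost ratios are invented to show the mechanics and are not estimates from the reviewed studies.

```python
def expected_cost_per_woman(sensitivity: float, specificity: float,
                            prevalence: float,
                            cost_fn: float, cost_fp: float) -> float:
    """Expected misclassification cost for one woman screened."""
    missed = prevalence * (1 - sensitivity)             # false negatives
    false_alarm = (1 - prevalence) * (1 - specificity)  # false positives
    return missed * cost_fn + false_alarm * cost_fp


# Hypothetical cutoffs as (sensitivity, specificity) pairs
CANDIDATES = {
    "lower cutoff": (0.90, 0.80),
    "higher cutoff": (0.75, 0.95),
}


def cheapest(fn_to_fp_ratio: float, prevalence: float = 0.05) -> str:
    """Name of the cutoff with the lowest expected cost, with cost_fp = 1."""
    return min(
        CANDIDATES,
        key=lambda name: expected_cost_per_woman(
            *CANDIDATES[name], prevalence,
            cost_fn=fn_to_fp_ratio, cost_fp=1.0),
    )


for ratio in (1, 10, 50):
    print(f"false negative costs {ratio}x a false positive: {cheapest(ratio)}")
```

In this invented scenario the higher cutoff is preferred until a missed case is judged much more costly than a false alarm, at which point the more sensitive lower cutoff becomes the better choice. Real cost-effectiveness studies would also need to account for provider burden and patient acceptability, which this sketch ignores.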

Diagnosis is another area of concern. Subsequent studies should carefully consider whether to target major depression alone, for which beneficial treatments clearly exist, or the combined category that includes minor depression, a heterogeneous group for which treatment benefit is unclear. Given that the results suggest that available screening tools identify major depression alone more accurately, and noting that the general benefit of interventions is more apparent for major depression alone, we believe that an evidence-based public health perspective recommends targeting major depression alone.

Timing is another factor of future studies deserving more thought. The issue here involves both the need for more epidemiology to confirm prevalence rates at different times as well as the need to confirm what time point(s) would identify the greatest number of depressed women. The bulk of the few screening studies we identified had been conducted in the first 3 months postpartum. Our best estimates of prevalence suggest that depression may remain high for several more months.

More studies are needed to better delineate periods of peak prevalence and incidence, covering not just 3 months but also 6 weeks, 6 months, and 12 months postpartum. Subsequent screening studies need to consider testing the properties of screening at these later time periods. The very small number of adequate studies currently available hampers plans for screening and intervention programs because the best time for screening, and hence the best clinic location, is not clear. If peak prevalence and incidence occur within the first 6 weeks, the obstetrics clinic is a prime place to target resources for such a program. If, however, peaks occur after this time, most postpartum women will have completed follow-up care with an obstetrician, so programs in an obstetrics clinic may be less helpful. In this case, programs targeting new mothers in family medicine, internal medicine, or pediatric clinics might be more effective.

For KQ 3, several similar or related issues emerged as well. First, studies addressing the relationship between screening and outcome need to recruit and retain sample sizes that are large enough to yield adequate power to detect relevant differences. Second, screening and outcome studies must include populations with a racial and ethnic mix that is more representative of the US populations than the work we have seen to date. Third, interventions involved should be more consistent with what we know to be evidence-based treatments for depression,12 i.e., antidepressant medications112 and/or psychotherapies such as cognitive behavioral therapy113 or interpersonal psychotherapy.114

The type of screening measure used henceforth is another major issue. Of the three KQ 3 studies rated as good, 98, 101, 108 only Dennis and colleagues used a depression screener (EPDS).101 Researchers should consider developing and using standardized screening measures, and similar cutoff points, so that some elements of separate studies could be compared more readily. Screening tools with the best supporting evidence would seem to be the best candidates. While the evidence base remains quite limited and any conclusions preliminary, at this time those instruments would appear to be either the EPDS or the PDSS. For major depression alone, an EPDS cutoff of ≥ 13 or a PDSS cutoff of ≥ 81 is reasonably supported by the evidence. For major or minor depression, we found the results too inconclusive to make even a preliminary recommendation.

Finally, studies should be designed to address whether the screening process itself leads to better access to proven treatment and improved outcome relative to usual care. We support additional research on interventions per se, but we conclude that important questions remain about the impact of the screening element. Reviewing studies that used screening as a means of identifying women potentially at high risk and enrolling them in interventional studies is not a sufficient approach to answering issues about the effectiveness of screening.

Glossary

Bipolar disorder - a type of mood disorder characterized by both (1) one or more major depressive episodes and (2) either one or more manic or mixed episodes (Bipolar I) or hypomanic episodes (Bipolar II). The disorder may or may not be accompanied by psychotic symptoms. In community samples, the prevalence of bipolar disorder (approximately 1 percent) is lower than the prevalence of major depressive disorder (at least 6 percent). Given that management of bipolar disorder is notably different from that of major depressive disorder, making such a diagnostic distinction is critical.

External validity - the extent to which a study's conclusions can be applied to populations and settings outside those of the study itself.

Incidence - the percentage of the population with an illness episode that begins within a given period of time (e.g., during pregnancy or within the first 3 months following delivery).

Internal validity - the extent to which a study is appropriately designed and conducted to measure what it is intended to measure.

Major depressive disorder - a type of mood disorder characterized by one or more major depressive episodes. The Diagnostic and Statistical Manual, version III (DSM-III) defines a major depressive episode as a period of at least 2 weeks during which an individual experiences daily disturbance in mood (intense feelings of sadness or loss of interest in activities that are usually pleasurable) and at least four of eight symptoms: (1) too much or too little sleep, (2) appetite or weight disturbance, (3) psychomotor agitation or retardation, (4) loss of energy, (5) feelings of worthlessness or excessive guilt, (6) problems with concentration or indecisiveness, (7) loss of interest in sex, and (8) recurrent suicidal thoughts or attempts. DSM-IV changed these criteria to the following: (1) symptoms must be present most of the day and nearly every day during the episode, (2) clinically significant distress or impairment in functioning must be present, (3) the syndrome must not be the result of the direct physiologic effects of a substance or a general medical condition, (4) major depressive disorder is still diagnosed after an acute grief reaction if the syndrome lasts for more than 2 months.

Major depressive disorder is not diagnosed if the syndrome is attributable to an acute grief reaction or a nonaffective psychotic condition such as schizophrenia. In addition, major depressive disorder is not diagnosed if there is a history of a manic, hypomanic, or mixed episode.

Maternity blues - a subthreshold cluster of depressive symptoms commonly described in up to 50 percent of postpartum women. This transient condition does not require an intervention.

Meta-analysis - a quantitative approach for systematically combining evidence from multiple previous research studies on a particular parameter or association to arrive at a conclusion about the body of research on that parameter or association.

Meta-regression - a statistical analysis of the association between one or more study characteristics and the observed magnitude of effect.

Minor depressive disorder (also known as minor depression) - a subthreshold diagnosis with a variety of definitions, but in general seen as one or more episodes of depression lasting 2 weeks or more but with fewer symptoms than required for a diagnosis of major depressive disorder.

Period prevalence - the percentage of the population with depression over a period of time (e.g., during pregnancy or from delivery to the end of the first 3 months postpartum).

Perinatal depression - a condition encompassing major and minor depressive episodes that occur during pregnancy (prenatal) or within the first 12 months following delivery (postpartum).

Point prevalence - the percentage of the population with a condition at a given point in time (e.g., at 24 weeks gestation or 9 weeks postpartum).

Postpartum - for the purposes of this review, the period from parturition to 12 months after delivery.

Postpartum depression - according to DSM-IV, a specific type of major depressive disorder with onset of a major depressive episode within 4 weeks postpartum.

Postpartum psychosis - also known as puerperal psychosis, this condition is a severe and rare postpartum disorder, affecting 1 to 2 per 1,000 births. Women with postpartum psychosis present with new onset of delusions or prominent hallucinations. More than half of these episodes meet the criteria for major depressive disorder, and many women ultimately prove to have bipolar illness. Management of postpartum psychosis substantially differs from the much more common presentation of major depressive disorder with postpartum onset.

Power (statistical power) - the probability of detecting as “statistically significant” a postulated level of effect.

Precision - a measure of how close an estimator is expected to be to the true value of a parameter. Precision is related to the standard error of the estimator; less precision is reflected by a larger standard error.

Prenatal - the period of pregnancy from conception to parturition.

Puerperium - the 6-week period following delivery.

Reference standard (also known as gold standard) - the diagnostic assessment against which the screening test is compared to gauge the accuracy of the screening test. The reference standard determines the actual presence of disease. For psychiatric illness, the reference standard is often a clinical assessment by a mental health professional or a structured or semi-structured diagnostic interview.

Screen (also screening) - the use of a measure or test, often a formal instrument or tool, to classify an individual with respect to her likelihood of having a particular disorder. A screen itself does not diagnose the illness; those who screen positive require a subsequent diagnostic assessment to confirm the presence of the disease.

Sensitivity - the ability of a test to identify correctly those who have a condition, computed as the percentage of individuals with the condition who test positive (true positives divided by the sum of true positives and false negatives). A sensitive test yields few false-negative results.

Specificity - the ability of a test to identify correctly those who do not have a condition, computed as the percentage of individuals without the condition who test negative (true negatives divided by the sum of true negatives and false positives). A specific test yields few false-positive results.
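
As a minimal sketch of the sensitivity and specificity definitions, the following Python computes both from a 2x2 table that cross-classifies screening results against the reference standard. The counts are hypothetical illustration values, not data from this report, and the function names are assumptions introduced for this example.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Percentage of individuals with the condition whom the screen flags as positive."""
    return 100.0 * true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Percentage of individuals without the condition whom the screen flags as negative."""
    return 100.0 * true_neg / (true_neg + false_pos)


# Hypothetical counts: 20 women have depression by the reference standard, 180 do not.
tp, fn = 16, 4     # screen-positive and screen-negative among the 20 with depression
tn, fp = 162, 18   # screen-negative and screen-positive among the 180 without depression

print(f"Sensitivity: {sensitivity(tp, fn):.1f}%")  # 80.0%
print(f"Specificity: {specificity(tn, fp):.1f}%")  # 90.0%
```

In this hypothetical example, the 4 false negatives lower sensitivity and the 18 false positives lower specificity; neither measure depends on how common the condition is in the screened group.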

Appendix A. Exact Search Strings

Database: MEDLINE <1966 to March Week 3 2004>

Search Strategy:

1 exp Puerperal Disorders/ (16527)

2 exp Depression/ (32747)

3 exp Depressive Disorder/ (42005)

4 2 or 3 (73267)

5 1 and 4 (1452)

6 exp Depression, Postpartum/ or perinatal depression.mp. (753)

7 5 or 6 (1467)

13 limit 7 to (human and english language) (1299)
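
The "or" and "and" steps in this strategy combine result sets, so the reported counts also imply how many records the combined sets share. A small sketch, assuming only the counts printed in the strategy and the standard inclusion-exclusion identity (the helper name `overlap` is hypothetical):

```python
def overlap(size_a: int, size_b: int, size_union: int) -> int:
    """Records present in both sets: |A and B| = |A| + |B| - |A or B|."""
    return size_a + size_b - size_union


# Counts reported in the MEDLINE strategy for sets 2, 3, and 4.
depression, depressive_disorder, either = 32747, 42005, 73267
# Counts reported for sets 5, 6, and 7.
puerperal_and_depression, postpartum_terms, combined = 1452, 753, 1467

print(overlap(depression, depressive_disorder, either))               # 1485
print(overlap(puerperal_and_depression, postpartum_terms, combined))  # 738
```

Under these counts, most records retrieved by set 6 were already captured by set 5, which is why combining them adds only 15 records.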

CINAHL used these terms as well.

PsycINFO has “Depression, Postpartum” as a Major Descriptor that yields 379.

Sociofile indexes 105 records to “Postpartum Depression”.

For Key Question 1, the following terms were used:

20 exp Natural History/ (8432)

21 8 and 20 (0)

When “Natural History” yielded no results, the following terms were used:

22 exp Cohort Studies/ (466831)

23 8 and 22 (112)

24 exp Longitudinal Studies/ (438062)

25 8 and 24 (101)

26 23 or 25 (134)

CINAHL (using similar terms) = 35

PsycINFO (natural history, cohort, longitudinal) = 65

Sociofile (natural history, cohort, longitudinal) = 20

Total from all databases for Key Question 1 = 254

After duplicates, book chapters, foreign language articles and dissertations were removed, the total unduplicated count for KQ1 = 210.

For Key Question 2, Incidence, the following terms were used:

MEDLINE

19 exp INCIDENCE/ (76679)

20 8 and 19 (31)

CINAHL (Incidence) = 7

PsycINFO (Incidence) = 23

Sociofile (Incidence) = 1

Total file = 62, minus duplications, dissertations, etc = 46

For Key Question 3, Risk, the following terms were used:

16 exp Risk Factors/ (221767)

17 8 and 16 (153)

CINAHL (Risk Factors) = 32

PsycINFO (risk) = 59

Sociofile (risk) = 11

Total from all databases for Key Question 3 = 255

After duplicates, book chapters, foreign language articles and dissertations were removed, the total unduplicated count for KQ3 = 204.

For Key Question 4, Therapies, the following terms were used:

MEDLINE

12 treatment.mp. or exp Therapeutics/ (2537613)

14 8 and 12 (513)

CINAHL (Treatment) = 90

PsycINFO (Treatment) = 91

Sociofile (Treatment) = 5

Total file = 699, minus duplications, dissertations, etc = 485

For Key Questions 5 and 6, Screening Accuracy and Screening Barriers, searches focused on “screening”; the total pool was given to investigators for finer sorting between questions.

MEDLINE

9 exp mass screening/ (62902)

10 8 and 9 (67)

CINAHL (screening) = 25

PsycINFO (screening) = 28

Sociofile (screening) = 1

Total from all databases for Key Questions 5 & 6 = 121

After duplicates, book chapters, foreign language articles and dissertations were removed, the total unduplicated count for KQ 5&6 = 96.
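
The per-database yields reported in this appendix can be tallied against the stated totals. The sketch below uses only the counts given in this appendix (in the order MEDLINE, CINAHL, PsycINFO, Sociofile); the variable names are assumptions for illustration.

```python
# Per-database yields as reported in Appendix A (MEDLINE, CINAHL, PsycINFO, Sociofile).
yields = {
    "KQ1": [134, 35, 65, 20],
    "KQ2": [31, 7, 23, 1],
    "KQ3": [153, 32, 59, 11],
    "KQ4": [513, 90, 91, 5],
    "KQ5&6": [67, 25, 28, 1],
}
reported_total = {"KQ1": 254, "KQ2": 62, "KQ3": 255, "KQ4": 699, "KQ5&6": 121}
unduplicated = {"KQ1": 210, "KQ2": 46, "KQ3": 204, "KQ4": 485, "KQ5&6": 96}

for kq, counts in yields.items():
    total = sum(counts)
    removed = total - unduplicated[kq]  # duplicates, book chapters, dissertations, etc.
    status = "matches" if total == reported_total[kq] else "differs from"
    print(f"{kq}: sum = {total} ({status} reported {reported_total[kq]}); "
          f"{removed} records removed")
```

Each computed sum agrees with the total stated in the appendix, and the difference between the total and the unduplicated count gives the number of records removed for each key question.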

Appendix B. Quality Rating Forms

Quality Checklist for RCTs and Observational Studies of Prevalence and Incidence Studies

[Checklist form graphics not reproduced]

Quality Checklist for Studies of Screening Instruments/Procedures

[Checklist form graphics not reproduced]

Quality Checklist for RCTs and Observational Studies of Treatment Studies

[Checklist form graphics not reproduced]

Appendix C. Evidence Tables

Perinatal Depression List of Acronyms

Adj - adjusted
B - Bedford
BDI - Beck Depression Inventory
BDI-II - Beck Depression Inventory - II
C - Catego
CCEI - Crown-Crisp Experiential Index
CES - Current Experience Scale
CES-D - Center for Epidemiological Studies - Depression Scale
CI - confidence interval
CIDI-A - Composite International Diagnostic Interview - Auto
Dept - department
DIS - Diagnostic Interview Schedule
DMC - Dyadic Mutuality Code
DSM-III - Diagnostic and Statistical Manual for Mental Disorders, Third Edition
DSM-III-R - Diagnostic and Statistical Manual for Mental Disorders, Third Edition - Revised
DSM-IV - Diagnostic and Statistical Manual for Mental Disorders, Fourth Edition
dx - diagnosis
EPDS - Edinburgh Postnatal Depression Scale
GA - gestational age
GHQ - General Health Questionnaire
GHQ-D - General Health Questionnaire - Depression
GP - general practitioner
GP/psych - general practitioner/psychiatrist
HDRS - Hamilton Depression Rating Scale
HMO - health maintenance organization
HOME - Home Observation for Measurement of Environment
hr(s) - hour(s)
HS - high school
ICD-9 - International Classification of Diseases, Ninth Edition
ICD-10 - International Classification of Diseases, Tenth Edition
IDS - Inventory of Depressive Symptomatology
LQ - Leverton Questionnaire
MAACL - Multiple Affect Adjective Check List
MADRS - Montgomery and Asberg Depression Rating Scale
MDE - major depressive episode
MINI - Mini International Neuropsychiatric Interview
MINI-V4.4 - Mini International Neuropsychiatric Interview, Version 4.4
mo(s) - month(s)
NA - not applicable
NICU - Neonatal Intensive Care Unit
No. - number
NPV - negative predictive value
NR - not reported
NS - not significant
Ob-Gyn - obstetrics and gynecology
OR - odds ratio
PAS - Psychiatric Assessment Schedule
PDSS - Postpartum Depression Screening Scale
PEG - Psycho Educational Group
PP - Postpartum
PPG - Postpartum Guidelines
PSE - Present State Examination
PSE-ID - Present State Examination - Index of Definition
RCT - randomized controlled trials
RDC - Research Diagnostic Criteria
SADS - Schedule for Affective Disorders and Schizophrenia
SADS-C - Schedule for Affective Disorders and Schizophrenia - Change version
SADS-L - Schedule for Affective Disorders and Schizophrenia - Long
SCAN - Schedules for Clinical Assessment in Neuropsychiatry
SCID - Structured Clinical Interview for DSM-IV
SCID-German - Structured Clinical Interview for DSM-IV - German
SCIP-NP - Structured Clinical Interview for DSM-III-R - non-patient
SCLR-90 - Symptom checklist Revised - 1990
SD - standard deviation
Sensi - sensitivity
Speci - specificity
SIDS - Sudden Infant Death Syndrome
SPI - Standardized Psychiatric Interview
SRQ - self-reported questionnaire
TSH - thyroid stimulating hormone
UK - United Kingdom
Univ. - University
USA - United States of America
vs. - versus
wk(s) - week(s)
yr(s) - year(s)

References

Affonso D D, Lovett S, Paul S M. et al. A standardized interview that differentiates pregnancy and postpartum symptoms from perinatal clinical depression. Birth. 1990; 17(3): 12130. [PubMed]
Areias M E, Kumar R, Barros H. et al. Comparative incidence of depression in women and men, during pregnancy and after childbirth. Validation of the Edinburgh Postnatal Depression Scale in Portuguese mothers. Br J Psychiatry. 1996; 169(1): 305. [PubMed]
Armstrong K L, Fraser J A, Dadds M R. et al. A randomized, controlled trial of nurse home visiting to vulnerable families with newborns. J Paediatr Child Health. 1999; 35(3): 23744. [PubMed]
Ballard C G, Davis R, Cullen P C. et al. Prevalence of postnatal psychiatric morbidity in mothers and fathers. Br J Psychiatry. 1994; 164(6): 7828. [PubMed]
Beck C T, Gable R K. Comparative analysis of the performance of the Postpartum Depression Screening Scale with two other depression instruments. Nurs Res. 2001; 50(4): 24250. [PubMed]
Berle J, Aarre T, Mykletun A. et al. Screening for postnatal depression. Validation of the Norwegian version of the Edinburgh Postnatal Depression Scale, and assessment of risk factors for postnatal depression. J Affect Disord. 2003; 76(13): 1516. [PubMed]
Boyce P, Stubbs J, Todd A. The Edinburgh Postnatal Depression Scale: Validation for an Australian sample. Aust N Z J Psychiatry. 1993; 27(3): 4726. [PubMed]
Brugha T S, Wheatley S, Taub N A. et al. Pragmatic randomized trial of antenatal intervention to prevent post-natal depression by reducing psychosocial risk factors. Psychol Med. 2000; 30(6): 127381. [PubMed]
Bryan T L, Georgiopoulos A M, Harms R W. et al. Incidence of postpartum depression in Olmsted County, Minnesota. A population-based, retrospective study. J Reprod Med. 1999; 44(4): 3518. [PubMed]
Campbell S B, Cohn J F. Prevalence and correlates of postpartum depression in first-time mothers. J Abnorm Psychol. 1991; 100(4): 5949. [PubMed]
Chabrol H, Teissedre F, Saint-Jean M. et al. Prevention and treatment of post-partum depression: A controlled randomized study on women at risk. Psychol Med. 2002; 32(6): 103947. [PubMed]
Chen C H, Tseng Y F, Chou F H. et al. Effects of support group intervention in postnatally distressed women. A controlled study in Taiwan. J Psychosom Res. 2000; 49(6): 3959. [PubMed]
Cooper P J, Campbell E A, Day A. et al. Non-psychotic psychiatric disorder after childbirth. A prospective study of prevalence, incidence, course and nature. Br J Psychiatry. 1988; 152: 799806. [PubMed]
Cooper P J, Murray L, Hooper R. et al. The development and validation of a predictive index for postpartum depression. Psychol Med. 1996; 26(3): 62734. [PubMed]
Cox J L, Connor Y, Kendell R E. Prospective study of the psychiatric disorders of childbirth. Br J Psychiatry. 1982; 140: 1117. [PubMed]
Cox J L, Murray D, Chapman G. A controlled study of the onset, duration and prevalence of postnatal depression. Br J Psychiatry. 1993; 163: 2731. [PubMed]
Cox J, Chapman G, Murray D. et al. Validation of the Edinburgh Postnatal Depression Scale (EPDS) in non-postnatal women. J Affect Disord. 1996; 39(3): 1859. [PubMed]
Dennis C L. The effect of peer support on postpartum depression: A pilot randomized controlled trial. Can J Psychiatry. 2003; 48(2): 11524. [PubMed]
Elliott S A, Leverton T J, Sanjack M. et al. Promoting mental health after childbirth: A controlled trial of primary prevention of postnatal depression. Br J Clin Psychol. 2000; 39(Pt 3): 22341. [PubMed]
Fleming A S, Klein E, Corter C. The effects of a social support group on depression, maternal attitudes and behavior in new mothers. J Child Psychol Psychiatry. 1992; 33(4): 68598. [PubMed]
Garcia-Esteve L, Ascaso C, Ojuel J. et al. Validation of the Edinburgh Postnatal Depression Scale (EPDS) in Spanish mothers. J Affect Disord. 2003; 75(1): 716. [PubMed]
Georgiopoulos A M, Bryan T L, Wollan P. et al. Routine screening for postpartum depression. J Fam Pract. 2001; 50(2): 11722. [PubMed]
Gotlib I H, Whiffen V E, Mount J H. et al. Prevalence rates and demographic characteristics associated with depression in pregnancy and the postpartum. J Consult Clin Psychol. 1989; 57(2): 26974. [PubMed]
Guedeney N, Fermanian J. Validation study of the French version of the Edinburgh Postnatal Depression Scale (EPDS): New results about use and psychometric properties. Eur Psychiatry. 1998; 13: 839. [PubMed]
Harris B, Huckle P, Thomas R. et al. The use of rating scales to identify post-natal depression. Br J Psychiatry. 1989; 154: 8137. [PubMed]
Hiscock H, Wake M. Randomised controlled trial of behavioural infant sleep intervention to improve infant sleep and maternal mood. Br Med J. 2002; 324(7345): 10625. [PubMed] [Free Full Text in PMC icon.Free Full text in PMC]
Hobfoll S E, Ritter C, Lavin J. et al. Depression prevalence and incidence among inner-city pregnant and postpartum women. J Consult Clin Psychol. 1995; 63(3): 44553. [PubMed]
Honey K L, Bennett P, Morgan M. A brief psycho-educational group intervention for postnatal depression. Br J Clin Psychol. 2002; 41(Pt 4): 4059. [PubMed]
Horowitz J A, Bell M, Trybulski J. et al. Promoting responsiveness between mothers with depressive symptoms and their infants. J Nurs Scholarsh. 2001; 33(4): 3239. [PubMed]
Kent G N, Stuckey B G, Allen J R. et al. Postpartum thyroid dysfunction: Clinical assessment and relationship to psychiatric affective morbidity. Clin Endocrinol. 1999; 51(4): 42938.
Kitamura T, Shima S, Sugawara M. et al. Psychological and social correlates of the onset of affective disorders among pregnant women. Psychol Med. 1993; 23: 96775. [PubMed]
Kitamura T, Shima S, Sugawara M. et al. Temporal variation of validity of self-rating questionnaires: Repeated use of the General Health Questionnaire and Zung's Self-rating Depression Scale among women during antenatal and postnatal periods. Acta Psychiatr Scand. 1994; 90(6): 44650. [PubMed]
Kitamura T, Sugawara M, Shima S. et al. Temporal variation of validity of self-rating questionnaires: Improved validity of repeated use of Zung's Self-Rating Depression Scale among women during the perinatal period. J Psychosom Obstet Gynecol. 1999; 20(2): 1127.
Kumar R, Robson K M. A prospective study of emotional disorders in childbearing women. Br J Psychiatry. 1984; 144: 3547. [PubMed]
Lawrie T, Hofmeyr G, de Jager M. et al. Validation of the Edinburgh Postnatal Depression Scale on a cohort of South African women. S Afr Med J. 1998; 88(10): 13404. [PubMed]
Lee D, Yip A, Chiu H. et al. A psychiatric epidemiological study of postpartum Chinese women. Am J Psychiatry. 2001; 158(2): 2206. [PubMed]
Lee D, Yip A, Chiu H. et al. Screening for postnatal depression using the double-test strategy. Psychosom Med. 2000; 62(2): 25863. [PubMed]
Lee D, Yip A, Chiu H. et al. Screening for postnatal depression: Are specific instruments mandatory? J Affect Disord. 2001; 63(13): 2338. [PubMed]
Lee D, Yip S, Chiu H. et al. Detecting postnatal depression in Chinese women. Validation of the Chinese version of the Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1998; 172: 4337. [PubMed]
Leverton T J, Elliott S A. Is the EPDS a magic wand? 1. A comparison of the Edinburgh Postnatal Depression Scale and health visitor report as predictors of diagnosis on the Present State Examination. J Reprod Infant Psychol. 2000; 18(4): 27996.
Lucas A, Pizarro E, Granada M L. et al. Postpartum thyroid dysfunction and postpartum depression: Are they two linked disorders? Clin Endocrinol. 2001; 55(6): 80914.
Matthey S, Barnett B, Howie P. et al. Diagnosing postpartum depression in mothers and fathers: Whatever happened to anxiety? J Affect Disord. 2003; 74(2): 13947. [PubMed]
Murray D, Cox J L. Screening for depression during pregnancy with the Edinburgh Depression Scale (EPDS). J Reprod Infant Psychol. 1990; 8(2): 99107.
Murray L, Carothers A. The validation of the Edinburgh Post-natal Depression Scale on a community sample. Br J Psychiatry. 1990; 157: 28890. [PubMed]
Muzik M, Klier C, Rosenblum K. et al. Are commonly used self-report inventories suitable for screening postpartum depression and anxiety disorders? Acta Psychiatr Scand. 2000; 102(1): 713. [PubMed]
O'Hara M W, Neunaber D J, Zekoski E M. Prospective study of postpartum depression: Prevalence, course, and predictive factors. J Abnorm Psychol. 1984; 93(2): 15871. [PubMed]
O'Hara M W, Zekoski E M, Philipps L H. et al. Controlled prospective study of postpartum mood disorders: Comparison of childbearing and nonchildbearing women. J Abnorm Psychol. 1990; 99(1): 315. [PubMed]
Onozawa K, Glover V, Adams D. et al. Infant massage improves mother-infant interaction for mothers with postnatal depression. J Affect Disord. 2001; 63(13): 2017. [PubMed]
Pop V J, Essed G G, de Geus C A. et al. Prevalence of post partum depression--or is it post-puerperium depression? Acta Obstet Gynecol Scand. 1993; 72(5): 3548. [PubMed]
Stamp G E, Williams A S, Crowther C A. Evaluation of antenatal and postnatal support to overcome postnatal depression: A randomized, controlled trial. Birth. 1995; 22(3): 13843. [PubMed]
Watson J P, Elliott S A, Rugg A J. et al. Psychiatric disorder in pregnancy and the first postnatal year. Br J Psychiatry. 1984; 144: 45362. [PubMed]
Whiffen V E. Screening for postpartum depression: A methodological note. J Clin Psychol. 1988; 44(3): 36771. [PubMed]
Whiffen V. Vulnerability of postpartum depression: A prospective multivariate study. J Abnorm Psychol. 1988; 97(4): 46774. [PubMed]
Wickberg B, Hwang C. The Edinburgh Postnatal Depression Scale: Validation on a Swedish community sample. Acta Psychiatr Scand. 1996; 94(3): 1814. [PubMed]
Wisner K L, Perel J M, Peindl K S. et al. Prevention of recurrent postpartum depression: A randomized clinical trial. J Clin Psychiatry. 2001; 62(2): 826. [PubMed]
Wisner K L, Wheeler S B. Prevention of recurrent postpartum major depression. Hosp Community Psychiatry. 1994; 45(12): 11916. [PubMed]
Yamashita H, Yoshida K, Nakano H. et al. Postnatal depression in Japanese women. Detecting the early onset of postnatal depression by closely monitoring the postpartum mood. J Affect Disord. 2000; 58(2): 14554. [PubMed]
Yonkers K A, Ramin S M, Rush A J. et al. Onset and persistence of postpartum depression in an inner-city maternal health clinic system. Am J Psychiatry. 2001; 158(11): 185663. [PubMed]
Yoshida K, Marks M, Kibe N. et al. Postnatal depression in Japanese women who have given birth in England. J Affect Disord. 1997; 43(1): 6977. [PubMed]
Zlotnick C, Johnson S L, Miller I W. et al. Postpartum depression in women receiving public assistance: Pilot study of an interpersonal-therapy-oriented group intervention. Am J Psychiatry. 2001; 158(4): 63840. [PubMed]

Appendix E. Acknowledgments

Acknowledgments

This study was supported by Contract 290-02-0016 from the Agency for Healthcare Research and Quality (AHRQ), Task Order No. 4. We acknowledge the continuing support of Kenneth Fink, MD, MGA, MPH, Director of the AHRQ Evidence-Based Practice Center (EPC) Program, and Marian James, PhD, the AHRQ Task Order Officer for this project.

The investigators deeply appreciate the considerable support, commitment, and contributions of the EPC team staff at RTI International and the University of North Carolina (UNC). From UNC, we thank EPC Co-Director, Timothy S. Carey, MD, MPH; EPC Literature Search Specialist, B. Lynn Whitener, PhD; and Research Assistant Leah Randolph, MA. We also express our gratitude to Loraine Monroe, EPC word processing specialist, Debra Bost, EPC editor, and Kathleen Mohar, Manager, Publications Specialist Group at RTI International.

Technical Expert Advisory Group

We extend our appreciation to the members of our Technical Expert Advisory Group (TEAG), who provided advice and input during our research process. The RTI-UNC EPC team solicited the views of TEAG members from the beginning of the project. TEAG members also provided insights into and reactions to work in progress and advice on substantive issues and overlooked areas of research. TEAG members participated in refining the analytic framework and key questions and discussing the preliminary assessment of the literature, including inclusion/exclusion criteria and methods for data synthesis. The TEAG was both a substantive resource and a “sounding board” throughout the study. It was also the body from which expertise was formally sought at several junctures. TEAG members are listed below:

  • Jeffrey Kuller, MD
    Associate Professor
    Division of Maternal-Fetal Medicine
    Duke University Medical Center

  • Michael W. O'Hara, PhD
    Professor
    University of Iowa

  • Susan F. Meikle, MD, MSPH
    Center for Outcomes and Evidence
    Agency for Healthcare Research and Quality

  • Katherine L. Wisner, MD, MS
    Director
    Women's Behavioral HealthCARE
    Professor of Psychiatry, Obstetrics and Gynecology and Reproductive Sciences and Epidemiology
    University of Pittsburgh Medical Center

Peer Reviewers

We gratefully acknowledge the following individuals who reviewed the initial draft of this report and provided us with constructive feedback. External reviewers comprised clinicians, researchers, representatives of professional societies, and potential users of the report. We would also like to extend our appreciation to David Atkins, MD, MPH from AHRQ for contributing peer review comments. Our peer review panel also includes all members of the TEAG. Peer review was a separate duty for these individuals and not part of their commitment as TEAG members. All are active professionals in the field. The peer reviewers were asked to provide comments on the content, structure, and format of the evidence report and to complete a checklist. The peer reviewers' comments and suggestions formed the basis of our revisions to the evidence report. Acknowledgments are made with the explicit statement that this does not constitute endorsement of the report.

Individuals

  • Cheryl Beck, DNSc, CNM, FAAN
    Professor
    School of Nursing
    University of Connecticut

  • Diana Dell, MD, FACOG
    Women's Behavioral Health Program
    Department of Obstetrics and Gynecology
    Duke University Medical Center

  • Judith Lumley, MD
    Centre for Mother's and Children's Health
    La Trobe University
    Carlton, Victoria, Australia

Organizations

  • Shoshana Bennett, PhD
    Postpartum Support International

  • Janet Chapin, RN, MPH
    American College of Obstetricians and Gynecologists

  • Ruth Johnson, CNM
    American College of Nurse-Midwives

  • Marlene Freeman, MD
    American Psychiatric Association

  • Gwen Gjerdingen, MD, MS
    American Association of Family Physicians

  • Sheila M. Marcus, MD
    American Psychiatric Association

  • Laura J. Miller, MD
    American Psychiatric Association

  • Darrel A. Regier, MD, MPH
    American Psychiatric Association

  • Kimberly A. Yonkers, MD
    American Psychiatric Association

References and Included Studies
1.
Kessler R C, McGonagle K A, Zhao S. et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States. Results from the National Comorbidity Survey. Arch Gen Psychiatry. 1994; 51(1): 819. [PubMed]
2.
Robins L, Regier D. Psychiatric Disorders in America. New York: Free Press, 1991.
3.
Depression Guideline Panel. Depression in Primary Care: Volume 1. Detection and Diagnosis. Clinical Practice Guideline, Number 5. Rockville, Md: Agency for Health Care Policy and Research, 1993; AHCPR No. 93-0550.
4.
Shaw J, Kennedy SH, Joffe RT. Gender differences in mood disorders: A clinical focus. In. Gender Psychopathol. Washington, DC: American Psychiatric Press, 1996: 89–111.
5.
Burke K C, Burke J D Jr, Rae D S. et al. Comparing age at onset of major depression and other psychiatric disorders by birth cohorts in five US community populations. Arch Gen Psychiatry. 1991; 48(9): 78995. [PubMed]
6.
Kessler R C. Epidemiology of women and depression. J Affect Disord. 2003; 74(1): 513. [PubMed]
7.
Murray L, Stein A. The effects of postnatal depression on the infant. Baillieres Clin Obstet Gynaecol. 1989; 3(4): 92133. [PubMed]
8.
Marmorstein N R, Malone S M, Iacono W G. Psychiatric disorders among offspring of depressed mothers: associations with paternal psychopathology. Am J Psychiatry. 2004; 161(9): 158894. [PubMed]
9.
Burke L. The impact of maternal depression on familial relationships. Int Rev Psychiatry. 2003; 15(3): 24355. [PubMed]
10.
Stein A, Gath D H, Bucher J. et al. The relationship between post-natal depression and mother-child interaction. Br J Psychiatry. 1991; 158: 4652. [PubMed]
11.
Flynn H A, Davis M, Marcus S M. et al. Rates of maternal depression in pediatric emergency department and relationship to child service utilization. Gen Hosp Psychiatry. 2004; 26(4): 31622. [PubMed]
12.
American Psychiatric Association (APA). Practice guideline for the treatment of patients with major depression (revision). Am J Psychiatry 2000; 157(4).
13.
Wagner H R, Burns B J, Broadhead W E. et al. Minor depression in family practice: Functional morbidity, comorbidity, service utilization, and outcomes. Psychol Med. 2000; 30(2): 137790. [PubMed]
14.
Oxman T E, Sengupta A. Treatment of minor depression. Am J Geriatr Psychiatry. 2002; 10(3): 25664. [PubMed]
15.
Judd L L, Rapaport M H, Yonkers K A. et al. Randomized, placebo-controlled trial of fluoxetine for acute treatment of minor depressive disorder. Am J Psychiatry. 2004; 161(10): 186471. [PubMed]
16.
Bloch M, Daly R C, Rubinow D R. Endocrine factors in the etiology of postpartum depression. Compr Psychiatry. 2003; 44(3): 23446. [PubMed]
17.
Jones I, Craddock N. Familiality of the puerperal trigger in bipolar disorder: results of a family study. Am J Psychiatry. 2001; 158(6): 9137. [PubMed]
18.
Klein M, Essex M J. Pregnant or depressed? The effect of overlap between symptoms of depression and somatic complaints of pregnancy on rates of major depression during the second trimester. Depression. 1994; 2: 19945.
19.
O'Hara M W, Neunaber D J, Zekoski E M. Prospective study of postpartum depression: Prevalence, course, and predictive factors. J Abnorm Psychol. 1984; 93(2): 15871. [PubMed]
20.
Cox J L, Murray D, Chapman G. A controlled study of the onset, duration and prevalence of postnatal depression. Br J Psychiatry. 1993; 163: 2731. [PubMed]
21.
O'Hara M W, Swain A M. Rates and risk of postpartum depression -- A meta-analysis. Int Rev Psychiatry. 1996; 8: 3754.
22.
Llewellyn A M, Stowe Z N, Nemeroff C B. Depression during pregnancy and the puerperium. J Clin Psychiatry. 1997; 58(Suppl 15): 2632. [PubMed]
23.
Yonkers K A, Ramin S M, Rush A J. et al. Onset and persistence of postpartum depression in an inner-city maternal health clinic system. Am J Psychiatry. 2001; 158(11): 185663. [PubMed]
24.
Gaynes B, Gavin N, Meltzer-Brody S, Sleath B, Sutton S. Perinatal Depression: Feasibility Study. Rockville, Md.: Agency for Healthcare Quality and Research (AHRQ), 2003.
25.
Cochrane Methods Working Group. Based on the Cochrane Methods Working Group on Systematic Review of Screening and Diagnostic Tests: Recommended Methods, updated June 6, 1996. 1996.
26.
Downs S H, Black N. The feasibility of creating a checklist for the assessment of the methodological quality both of randomised and non-randomised studies of health care interventions. J Epidemiol Community Health. 1998; 52(6): 37784. [PubMed]
27.
Bennett H A, Einarson A, Taddio A. et al. Prevalence of depression during pregnancy: Systematic review. Obstet Gynecol. 2004; 103(4): 698709. [PubMed]
28.
American Psychiatric Association (APA). Diagnostic and statistical manual of mental disorders. Fourth edition. Washington, DC: American Psychiatric Association, 1994.
29.
Affonso D D, Lovett S, Paul S M. et al. A standardized interview that differentiates pregnancy and postpartum symptoms from perinatal clinical depression. Birth. 1990; 17(3): 12130. [PubMed]
30.
Areias M E, Kumar R, Barros H. et al. Comparative incidence of depression in women and men, during pregnancy and after childbirth. Validation of the Edinburgh Postnatal Depression Scale in Portuguese mothers. Br J Psychiatry. 1996; 169(1): 305. [PubMed]
31.
Berle J, Aarre T, Mykletun A. et al. Screening for postnatal depression. Validation of the Norwegian version of the Edinburgh Postnatal Depression Scale, and assessment of risk factors for postnatal depression. J Affect Disord. 2003; 76(13): 1516. [PubMed]
32.
Campbell S B, Cohn J F. Prevalence and correlates of postpartum depression in first-time mothers. J Abnorm Psychol. 1991; 100(4): 5949. [PubMed]
33.
Cooper P J, Murray L, Hooper R. et al. The development and validation of a predictive index for postpartum depression. Psychol Med. 1996; 26(3): 62734. [PubMed]
34.
Cox J L, Connor Y, Kendell R E. Prospective study of the psychiatric disorders of childbirth. Br J Psychiatry. 1982; 140: 1117. [PubMed]
35.
Garcia-Esteve L, Ascaso C, Ojuel J. et al. Validation of the Edinburgh Postnatal Depression Scale (EPDS) in Spanish mothers. J Affect Disord. 2003; 75(1): 716. [PubMed]
36.
Gotlib I H, Whiffen V E, Mount J H. et al. Prevalence rates and demographic characteristics associated with depression in pregnancy and the postpartum. J Consult Clin Psychol. 1989; 57(2): 26974. [PubMed]
37.
Hobfoll S E, Ritter C, Lavin J. et al. Depression prevalence and incidence among inner-city pregnant and postpartum women. J Consult Clin Psychol. 1995; 63(3): 44553. [PubMed]
38.
Kent G N, Stuckey B G, Allen J R. et al. Postpartum thyroid dysfunction: Clinical assessment and relationship to psychiatric affective morbidity. Clin Endocrinol. 1999; 51(4): 42938.
39.
Kitamura T, Shima S, Sugawara M. et al. Psychological and social correlates of the onset of affective disorders among pregnant women. Psychol Med. 1993; 23: 96775. [PubMed]
40.
Kitamura T, Sugawara M, Shima S. et al. Temporal variation of validity of self-rating questionnaires: Improved validity of repeated use of Zung's Self-Rating Depression Scale among women during the perinatal period. J Psychosom Obstet Gynecol. 1999; 20(2): 1127.
41.
Kumar R, Robson K M. A prospective study of emotional disorders in childbearing women. Br J Psychiatry. 1984; 144: 3547. [PubMed]
42.
Lee D, Yip A, Chiu H. et al. A psychiatric epidemiological study of postpartum Chinese women. Am J Psychiatry. 2001; 158(2): 2206. [PubMed]
43.
Lee D, Yip A, Chiu H. et al. Screening for postnatal depression: Are specific instruments mandatory? J Affect Disord. 2001; 63(13): 2338. [PubMed]
44.
Lucas A, Pizarro E, Granada M L. et al. Postpartum thyroid dysfunction and postpartum depression: Are they two linked disorders? Clin Endocrinol. 2001; 55(6): 80914.
45.
Matthey S, Barnett B, Howie P. et al. Diagnosing postpartum depression in mothers and fathers: Whatever happened to anxiety? J Affect Disord. 2003; 74(2): 13947. [PubMed]
46.
Murray D, Cox J L. Screening for depression during pregnancy with the Edinburgh Depression Scale (EPDS). J Reprod Infant Psychol. 1990; 8(2): 99107.
47.
Pop V J, Essed G G, de Geus C A. et al. Prevalence of post partum depression--or is it post-puerperium depression? Acta Obstet Gynecol Scand. 1993; 72(5): 3548. [PubMed]
48.
Watson J P, Elliott S A, Rugg A J. et al. Psychiatric disorder in pregnancy and the first postnatal year. Br J Psychiatry. 1984; 144: 45362. [PubMed]
49.
Whiffen V. Vulnerability of postpartum depression: A prospective multivariate study. J Abnorm Psychol. 1988; 97(4): 46774. [PubMed]
50.
Yamashita H, Yoshida K, Nakano H. et al. Postnatal depression in Japanese women.Detecting the early onset of postnatal depression by closely monitoring the postpartum mood. J Affect Disord. 2000; 58(2): 14554. [PubMed]
51.
Yoshida K, Marks M, Kibe N. et al. Postnatal depression in Japanese women who have given birth in England. J Affect Disord. 1997; 43(1): 69–77. [PubMed]
52.
Cooper P J, Campbell E A, Day A. et al. Non-psychotic psychiatric disorder after childbirth. A prospective study of prevalence, incidence, course and nature. Br J Psychiatry. 1988; 152: 799–806. [PubMed]
53.
O'Hara M W, Zekoski E M, Philipps L H. et al. Controlled prospective study of postpartum mood disorders: Comparison of childbearing and nonchildbearing women. J Abnorm Psychol. 1990; 99(1): 3–15. [PubMed]
54.
Bryan T L, Georgiopoulos A M, Harms R W. et al. Incidence of postpartum depression in Olmsted County, Minnesota. A population-based, retrospective study. J Reprod Med. 1999; 44(4): 351–8. [PubMed]
55.
Georgiopoulos A M, Bryan T L, Wollan P. et al. Routine screening for postpartum depression. J Fam Pract. 2001; 50(2): 117–22. [PubMed]
56.
Endicott J, Spitzer R L. A diagnostic interview: The schedule for affective disorders and schizophrenia. Arch Gen Psychiatry. 1978; 35(7): 837–44. [PubMed]
57.
Spitzer RL, Williams JBW, Gibbon M, First MB. Structured Clinical Interview for DSM-III-R. Washington, DC: American Psychiatric Press, 1990.
58.
First MB, Spitzer RL, Gibbon M, Williams JBW. Structured Clinical Interview for DSM-IV Axis I Disorders (SCID), Clinical Version. Washington, DC: American Psychiatric Press, 1996.
59.
Goldberg D P, Cooper B, Eastwood M R. et al. A standardized psychiatric interview for use in community surveys. Br J Prev Soc Med. 1970; 24(1): 18–23. [PubMed]
60.
Janca A, Ustun T B, Sartorius N. New versions of World Health Organization instruments for the assessment of mental disorders. Acta Psychiatr Scand. 1994; 90(2): 73–83. [PubMed]
61.
Robins L N, Helzer J E, Croughan J. et al. National Institute of Mental Health Diagnostic Interview Schedule. Its history, characteristics, and validity. Arch Gen Psychiatry. 1981; 38(4): 381–9. [PubMed]
62.
Lecrubier Y, Sheehan D, Weiller E. et al. The Mini International Neuropsychiatric Interview (M.I.N.I.), a short diagnostic structured interview: Reliability and validity according to the CIDI. Eur Psychiatry. 1997; 12: 224–31.
63.
Wing JK, Cooper JE, Sartorius N. The Measurement and Classification of Psychiatric Symptoms. Cambridge: Cambridge University Press, 1974.
64.
Montgomery S A, Asberg M. A new depression scale designed to be sensitive to change. Br J Psychiatry. 1979; 134: 382–9. [PubMed]
65.
Spitzer R L, Endicott J, Robins E. Research diagnostic criteria: Rationale and reliability. Arch Gen Psychiatry. 1978; 35(6): 773–82. [PubMed]
66.
American Psychiatric Association (APA). Diagnostic and statistical manual of mental disorders. Third edition. Washington, DC: American Psychiatric Association, 1987.
67.
Pitt B. “Atypical” depression following childbirth. Br J Psychiatry. 1968; 114(516): 1325–35. [PubMed]
68.
Blazer D G, Kessler R C, McGonagle K A. et al. The prevalence and distribution of major depression in a national community sample: The National Comorbidity Survey. Am J Psychiatry. 1994; 151(7): 979–86. [PubMed]
69.
Finlay-Jones R, Brown G W, Duncan-Jones P. et al. Depression and anxiety in the community: Replicating the diagnosis of a case. Psychol Med. 1980; 10(3): 445–54. [PubMed]
70.
Ballard C G, Davis R, Cullen P C. et al. Prevalence of postnatal psychiatric morbidity in mothers and fathers. Br J Psychiatry. 1994; 164(6): 782–8. [PubMed]
71.
Beck C T, Gable R K. Comparative analysis of the performance of the Postpartum Depression Screening Scale with two other depression instruments. Nurs Res. 2001; 50(4): 242–50. [PubMed]
72.
Boyce P, Stubbs J, Todd A. The Edinburgh Postnatal Depression Scale: Validation for an Australian sample. Aust N Z J Psychiatry. 1993; 27(3): 472–6. [PubMed]
73.
Cox J, Chapman G, Murray D. et al. Validation of the Edinburgh Postnatal Depression Scale (EPDS) in non-postnatal women. J Affect Disord. 1996; 39(3): 185–9. [PubMed]
74.
Harris B, Huckle P, Thomas R. et al. The use of rating scales to identify post-natal depression. Br J Psychiatry. 1989; 154: 813–7. [PubMed]
75.
Leverton T J, Elliott S A. Is the EPDS a magic wand? 1. A comparison of the Edinburgh Postnatal Depression Scale and health visitor report as predictors of diagnosis on the Present State Examination. J Reprod Infant Psychol. 2000; 18(4): 279–96.
76.
Murray L, Carothers A. The validation of the Edinburgh Post-natal Depression Scale on a community sample. Br J Psychiatry. 1990; 157: 288–90. [PubMed]
77.
Whiffen V E. Screening for postpartum depression: A methodological note. J Clin Psychol. 1988; 44(3): 367–71. [PubMed]
78.
Guedeney N, Fermanian J. Validation study of the French version of the Edinburgh Postnatal Depression Scale (EPDS): New results about use and psychometric properties. Eur Psychiatry. 1998; 13: 83–9. [PubMed]
79.
Lawrie T, Hofmeyr G, de Jager M. et al. Validation of the Edinburgh Postnatal Depression Scale on a cohort of South African women. S Afr Med J. 1998; 88(10): 1340–4. [PubMed]
80.
Kitamura T, Shima S, Sugawara M. et al. Temporal variation of validity of self-rating questionnaires: Repeated use of the General Health Questionnaire and Zung's Self-rating Depression Scale among women during antenatal and postnatal periods. Acta Psychiatr Scand. 1994; 90(6): 446–50. [PubMed]
81.
Lee D, Yip A, Chiu H. et al. Screening for postnatal depression using the double-test strategy. Psychosom Med. 2000; 62(2): 258–63. [PubMed]
82.
Lee D, Yip S, Chiu H. et al. Detecting postnatal depression in Chinese women. Validation of the Chinese version of the Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1998; 172: 433–7. [PubMed]
83.
Muzik M, Klier C, Rosenblum K. et al. Are commonly used self-report inventories suitable for screening postpartum depression and anxiety disorders? Acta Psychiatr Scand. 2000; 102(1): 7–13. [PubMed]
84.
Wickberg B, Hwang C. Counselling of postnatal depression: A controlled study on a population based Swedish sample. J Affect Disord. 1996; 39(3): 209–16. [PubMed]
85.
Cox J L, Holden J M, Sagovsky R. Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987; 150: 782–6. [PubMed]
86.
Beck A T, Ward C H, Mendelson M. et al. An inventory for measuring depression. Arch Gen Psychiatry. 1961; 4: 561–71.
87.
Beck AT, Steer RA, Brown GK. Manual for Beck Depression Inventory II (BDI-II). San Antonio, TX: Psychological Corporation, 1996.
88.
Beck A T, Guth D, Steer R A. et al. Screening for major depression disorders in medical inpatients with the Beck depression inventory for primary care. Behav Res Ther. 1997; 35: 785–91. [PubMed]
89.
Radloff L S. The CES-D Scale: A self-report depression scale for research in the general population. Appl Psychol Measure. 1977; 1: 385–401.
90.
Williams J Jr, Pignone M, Ramirez G. et al. Identifying depression in primary care: A literature synthesis of case-finding instruments. Gen Hosp Psychiatry. 2002; 24: 225–37.
91.
Henkel V, Mergl R, Kohnen R. et al. Use of brief depression screening tools in primary care: Consideration of heterogeneity in performance in different patient groups. Gen Hosp Psychiatry. 2004; 26(3): 190–8. [PubMed]
92.
Meager I, Milgrom J. Group treatment for postpartum depression: A pilot study. Aust N Z J Psychiatry. 1996; 30(6): 852–60. [PubMed]
93.
Brisco M. The detection of emotional disorders in the post natal period by health visitors. Health Visit. 1989; 62(11): 336–8. [PubMed]
94.
Stamp G E, Williams A S, Crowther C A. Evaluation of antenatal and postnatal support to overcome postnatal depression: A randomized, controlled trial. Birth. 1995; 22(3): 138–43. [PubMed]
95.
Brugha T S, Wheatley S, Taub N A. et al. Pragmatic randomized trial of antenatal intervention to prevent post-natal depression by reducing psychosocial risk factors. Psychol Med. 2000; 30(6): 1273–81. [PubMed]
96.
Elliott S A, Leverton T J, Sanjack M. et al. Promoting mental health after childbirth: A controlled trial of primary prevention of postnatal depression. Br J Clin Psychol. 2000; 39(Pt 3): 223–41. [PubMed]
97.
Zlotnick C, Johnson S L, Miller I W. et al. Postpartum depression in women receiving public assistance: Pilot study of an interpersonal-therapy-oriented group intervention. Am J Psychiatry. 2001; 158(4): 638–40. [PubMed]
98.
Armstrong K, Fraser J, Dadds M. et al. A randomized, controlled trial of nurse home visiting to vulnerable families with newborns. J Paediatr Child Health. 1999; 35(3): 237–44. [PubMed]
99.
Chabrol H, Teissedre F, Saint-Jean M. et al. Prevention and treatment of post-partum depression: A controlled randomized study on women at risk. Psychol Med. 2002; 32(6): 1039–47. [PubMed]
100.
Chen C H, Tseng Y F, Chou F H. et al. Effects of support group intervention in postnatally distressed women. A controlled study in Taiwan. J Psychosom Res. 2000; 49(6): 395–9. [PubMed]
101.
Dennis C L. The effect of peer support on postpartum depression: A pilot randomized controlled trial. Can J Psychiatry. 2003; 48(2): 115–24. [PubMed]
102.
Fleming A S, Klein E, Corter C. The effects of a social support group on depression, maternal attitudes and behavior in new mothers. J Child Psychol Psychiatry. 1992; 33(4): 685–98. [PubMed]
103.
Hiscock H, Wake M. Randomised controlled trial of behavioural infant sleep intervention to improve infant sleep and maternal mood. Br Med J. 2002; 324(7345): 1062–5. [PubMed] [Free Full Text in PMC]
104.
Honey K L, Bennett P, Morgan M. A brief psycho-educational group intervention for postnatal depression. Br J Clin Psychol. 2002; 41(Pt 4): 405–9. [PubMed]
105.
Horowitz J A, Bell M, Trybulski J. et al. Promoting responsiveness between mothers with depressive symptoms and their infants. J Nurs Scholarsh. 2001; 33(4): 323–9. [PubMed]
106.
Onozawa K, Glover V, Adams D. et al. Infant massage improves mother-infant interaction for mothers with postnatal depression. J Affect Disord. 2001; 63(1–3): 201–7. [PubMed]
107.
Wisner K L, Wheeler S B. Prevention of recurrent postpartum major depression. Hosp Community Psychiatry. 1994; 45(12): 1191–6. [PubMed]
108.
Wisner K L, Perel J M, Peindl K S. et al. Prevention of recurrent postpartum depression: A randomized clinical trial. J Clin Psychiatry. 2001; 62(2): 82–6. [PubMed]
109.
Turk DC, Meichenbaum D, Genest M. Pain and Behavioral Medicine. New York: Guilford, 1983.
110.
US Preventive Services Task Force. Screening for depression: recommendations and rationale. Ann Intern Med. 2002; 136(10): 760–4. [PubMed]
111.
Pignone M P, Gaynes B N, Rushton J L. et al. Screening for depression in adults: a summary of the evidence for the US Preventive Services Task Force. Ann Intern Med. 2002; 136(10): 765–76. [PubMed]
112.
Hoffbrand S, Howard L, Crawley H. Antidepressant treatment for post-natal depression. Nurs Times. 2001; 97(45): 35. [PubMed]
113.
Cooper P, Murray L. The impact of psychological treatments of postpartum depression on maternal mood and infant development. In: Cooper P, Murray L, editor(s). Postpartum depression and child development. New York: Guilford, 1997: 201–20.
114.
O'Hara M W, Stuart S, Gorman L L. et al. Efficacy of interpersonal psychotherapy for postpartum depression. Arch Gen Psychiatry. 2000; 57(11): 1039–45. [PubMed]
Appendix D. Excluded Articles
Abramowitz J S, Schwartz S A, Moore K M. et al. Obsessive-compulsive symptoms in pregnancy and the puerperium: A review of the literature. J Anxiety Disord. 2003; 17(4): 461–78. [PubMed]
Affonso D D, Mayberry L J, Lovett S. et al. Pregnancy and postpartum depressive symptoms. J Womens Health. 1993; 2(2): 157–64.
Affonso D D, Arizmendi T G. Disturbances in post-partum adaptation and depressive symptomatology. J Psychosom Obstet Gynaecol. 1986; 5(1): 15–32.
Areias M E, Kumar R, Barros H. et al. Comparative incidence of depression in women and men, during pregnancy and after childbirth. Validation of the Edinburgh Postnatal Depression Scale in Portuguese mothers. Br J Psychiatry. 1996; 169(1): 30–5. [PubMed]
Areias M E, Kumar R, Barros H. et al. Correlates of postnatal depression in mothers and fathers. Br J Psychiatry. 1996; 169(1): 36–41. [PubMed]
Beck C T. Postpartum depression predictors inventory--revised. Adv Neonatal Care. 2003; 3(1): 47–8. [PubMed]
Beck C T. Recognizing and screening for postpartum depression in mothers of NICU infants. Adv Neonatal Care. 2003; 3(1): 37–46. [PubMed]
Beck C T, Gable R K. Comparative analysis of the performance of the Postpartum Depression Screening Scale with two other depression instruments. Nurs Res. 2001; 50(4): 242–50. [PubMed]
Beck C, Gable R. Postpartum depression screening scale: Spanish version. Nurs Res. 2003; 52(5): 296–306. [PubMed]
Beeghly M, Olson K L, Weinberg M K. et al. Prevalence, stability, and socio-demographic correlates of depressive symptoms in Black mothers during the first 18 months postpartum. Matern Child Health J. 2003; 7(3): 157–68. [PubMed]
Bijl R V, van Zessen G, Ravelli A. Psychiatric morbidity among adults in The Netherlands: The NEMESIS-Study. II. Prevalence of psychiatric disorders. Netherlands Mental Health Survey and Incidence Study. Ned Tijdschr Geneeskd. 1997; 141(50): 2453–60. [PubMed]
Brugha T S, Wheatley S, Taub N A. et al. Pragmatic randomized trial of antenatal intervention to prevent post-natal depression by reducing psychosocial risk factors. Psychol Med. 2000; 30(6): 1273–81. [PubMed]
Caravale B, Allemand F, Libenson M H. Factors predictive of seizures and neurologic outcome in perinatal depression. Pediatr Neurol. 2003; 29(1): 18–25. [PubMed]
Carro M G, Grant K E, Gotlib I H. et al. Postpartum depression and child development: An investigation of mothers. Dev Psychopathol. 1993; 5(4): 567–79.
Chan S, Levy V. Postnatal depression: A qualitative study of the experiences of a group of Hong Kong Chinese women. J Clin Nurs. 2004; 13(1): 120–3. [PubMed]
Cooper P J, Campbell E A, Day A. et al. Non-psychotic psychiatric disorder after childbirth. A prospective study of prevalence, incidence, course and nature. Br J Psychiatry. 1988; 152: 799–806. [PubMed]
Cooper P J, Murray L, Hooper R. et al. The development and validation of a predictive index for postpartum depression. Psychol Med. 1996; 26(3): 627–34. [PubMed]
Cox J L, Holden J M, Sagovsky R. Detection of postnatal depression. Development of the 10-item Edinburgh Postnatal Depression Scale. Br J Psychiatry. 1987; 150: 782–6. [PubMed]
Cutrona C E. Causal attributions and perinatal depression. J Abnorm Psychol. 1983; 92(2): 161–72. [PubMed]
Demyttenaere K, Lenaerts H, Nijs P. et al. Individual coping style and psychological attitudes during pregnancy and predict depression levels during pregnancy and during postpartum. Acta Psychiatr Scand. 1995; 91(2): 95–102. [PubMed]
Dennis C. Can we identify mothers at risk for postpartum depression in the immediate postpartum period using the Edinburgh Postnatal Depression Scale? J Affect Disord. 2004; 78(2): 163–9. [PubMed]
Eberhard-Gran M, Tambs K, Opjordsmoen S. et al. A comparison of anxiety and depressive symptomatology in postpartum and non-postpartum mothers. Soc Psychiatry Psychiatr Epidemiol. 2003; 38(10): 551–6. [PubMed]
Edwards M, Waldorf M. Reclaiming Birth: History and heroines of American Childbirth Reform. 1984: The Crossing Press, 1984: 4–6.
Ellison M, Hall J. Social stigma and compounded losses: Quality-of-life issues for multiple-birth families. Fertil Steril. 2003; 80(2): 405–14. [PubMed]
Evins G G, Theofrastous J P, Galvin S L. Postpartum depression: A comparison of screening and routine clinical evaluation. Am J Obstet Gynecol. 2000; 182(5): 1080–2. [PubMed]
Feggetter G, Cooper P, Gath D. Non-psychotic psychiatric disorders in women one year after childbirth. J Psychosom Res. 1981; 25(5): 369–72. [PubMed]
Fergusson D M, Horwood L J, Thorpe K. Changes in depression during and following pregnancy. ALSPAC Study Team. Study of Pregnancy and Children. Paediatr Perinat Epidemiol. 1996; 10(3): 279–93. [PubMed]
Ghubash R, Abou-Saleh M T, Daradkeh T K. The validity of the Arabic Edinburgh Postnatal Depression Scale. Soc Psychiatry Psychiatr Epidemiol. 1997; 32(8): 474–6. [PubMed]
Gilman S E, Kawachi I, Fitzmaurice G M. et al. Family disruption in childhood and risk of adult depression. Am J Psychiatry. 2003; 160(5): 939–46. [PubMed]
Greene S M, Nugent J K, Wieczorek Deering D E. et al. The patterning of depressive symptoms in a sample of first-time mothers. Ir J Psychol. 1991; 12(2): 263–75.
Guedeney A, Doleans M C, Huot-Marchand M. Early screening of withdrawal reaction in the Maternal and Infant Welfare Protection program. Arch Pediatr. 2003; 10(Suppl 1): 131s–3s. [PubMed]
Guedeney N, Fermanian J, Guelfi J D. et al. The Edinburgh Postnatal Depression Scale (EPDS) and the detection of major depressive disorders in early postpartum: Some concerns about false negatives. J Affect Disord. 2000; 61(1–2): 107–12. [PubMed]
Harris B, Fung H, Johns S. et al. Transient postpartum thyroid dysfunction and postnatal depression. J Affect Disord. 1989; 17(3): 243–9. [PubMed]
Harris B, Johns S, Fung H. et al. The hormonal environment of post-natal depression. Br J Psychiatry. 1989; 154: 660–7. [PubMed]
Hobfoll S E, Ritter C, Lavin J. et al. Depression prevalence and incidence among inner-city pregnant and postpartum women. J Consult Clin Psychol. 1995; 63(3): 445–53. [PubMed]
Holt W J. The detection of postnatal depression in general practice using the Edinburgh postnatal depression scale. N Z Med J. 1995; 108(994): 57–9. [PubMed]
Honey K L, Bennett P, Morgan M. Predicting postnatal depression. J Affect Disord. 2003; 76(1–3): 201–10. [PubMed]
Kessler R C, McGonagle K A, Zhao S. et al. Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States. Results from the National Comorbidity Survey. Arch Gen Psychiatry. 1994; 51(1): 8–19. [PubMed]
Kessler R C, Zhao S, Blazer D G. et al. Prevalence, correlates, and course of minor depression and major depression in the National Comorbidity Survey. J Affect Disord. 1997; 45(1–2): 19–30. [PubMed]
Kitamura T, Shima S, Sugawara M. et al. Psychological and social correlates of the onset of affective disorders among pregnant women. Psychol Med. 1993; 23: 967–75. [PubMed]
Kurki T, Hiilesmaa V, Raitasalo R. et al. Depression and anxiety in early pregnancy and risk for preeclampsia. Obstet Gynecol. 2000; 95(4): 487–90. [PubMed]
Leavitt JW. Brought to Bed: Childbearing in America 1750 to 1950. New York: Oxford University Press, 179–86.
Lee D, Yip A, Chan S. et al. Postdelivery screening for postpartum depression. Psychosom Med. 2003; 65(3): 357–61. [PubMed]
Leopold K, Zoschnick L. Postpartum depression. Female Pat. 1997; 22: 409.
Leverton TJ, Elliott SA. Transition to parenthood groups: A preventive intervention for perinatal depression? In: van Hall EV, Everaerd W, eds. The free woman: women's health in the 1990s. Invited papers of the 9th international conference of psychosomatic obstetrics and gynecology. Lancaster, Pa: Parthenon Press, 1989: 479–86.
McGill H, Burrows V L, Holland L A. et al. Postnatal depression: A Christchurch study. N Z Med J. 1995; 108(999): 162–5. [PubMed]
McKenry P C, Browne D H, Kotch J B. et al. Mediators of depression among low-income, adolescent mothers of infants: A longitudinal perspective. J Youth Adolescence. 1990; 19(4): 327–47.
McMahon C, Barnett B, Kowalenko N. et al. Postnatal depression, anxiety and unsettled infant behaviour. Aust N Z J Psychiatry. 2001; 35(5): 581–8. [PubMed]
Meager I, Milgrom J. Group treatment for postpartum depression: A pilot study. Aust N Z J Psychiatry. 1996; 30(6): 852–60. [PubMed]
Morris-Rush J K, Freda M C, Bernstein P S. Screening for postpartum depression in an inner-city population. Am J Obstet Gynecol. 2003; 188(5): 1217–9. [PubMed]
Muzik M, Klier C, Rosenblum K. et al. Are commonly used self-report inventories suitable for screening postpartum depression and anxiety disorders? Acta Psychiatr Scand. 2000; 102(1): 7–13. [PubMed]
Neugebauer R. Rate of depression in the puerperium. Br J Psychiatry. 1983; 143: 421–2. [PubMed]
Nott P N. Extent, timing and persistence of emotional disorders following childbirth. Br J Psychiatry. 1987; 151: 523–7. [PubMed]
O'Hara MW, Zekoski EM. Postpartum depression: A comprehensive review. Motherhood and Mental Illness 2. Butterworth & Co. Ltd, 1988: 17–63.
Okano T, Nomura J, Kumar R. et al. An epidemiological and clinical investigation of postpartum psychiatric illness in Japanese mothers. J Affect Disord. 1998; 48(2–3): 233–40. [PubMed]
Owen P J, Lazarus J H. The treatment of post-partum thyroid disease. J Endocrinol Invest. 2003; 26(4): 290–1. [PubMed]
Peindl K S, Wisner K L. Successful recruitment strategies for women in postpartum mental health trials. J Psychiatr Res. 2003; 37(2): 117–25. [PubMed]
Pfost K S, Lum C U, Stevens M J. Femininity and work plans protect women against postpartum dysphoria. Sex Roles. 1989; 21(5–6): 423–31.
Philipps L H, O'Hara M W. Prospective study of postpartum depression: 4 1/2-year follow-up of women and children. J Abnorm Psychol. 1991; 100(2): 151–5. [PubMed]
Posmontier B, Horowitz J A. Postpartum practices and depression prevalences: Technocentric and ethnokinship cultural perspectives. J Transcult Nurs. 2004; 15(1): 34–43. [PubMed]
Posner N A, Unterman R R, Williams K N. et al. Screening for postpartum depression: An antepartum questionnaire. J Reprod Med. 1997; 42: 207–15. [PubMed]
Priest S R, Henderson J, Evans S F. et al. Stress debriefing after childbirth: A randomised controlled trial. Med J Aust. 2003; 178(11): 542–5. [PubMed]
Rahman A, Iqbal Z, Harrington R. Life events, social support and depression in childbirth: Perspectives from a rural community in the developing world. Psychol Med. 2003; 33(7): 1161–7. [PubMed]
Rees W D. Parental depression before and after childbirth. An assessment with the Beck Depression Inventory. J R Coll Gen Pract. 1971; 21(102): 26–31. [PubMed] [Free Full Text in PMC]
Schaper A, Rooney B, Kay N. et al. Use of the Edinburgh Postnatal Depression Scale to identify postpartum depression in a clinical setting. J Reprod Med. 1994; 39(8): 620–4. [PubMed]
Shakespeare J, Blake F, Garcia J. A qualitative study of the acceptability of routine screening of postnatal women using the Edinburgh Postnatal Depression Scale. Br J Gen Pract. 2003; 53(493): 614–9. [PubMed] [Free Full Text in PMC]
Sharp D J. Validation of the 30-item General Health Questionnaire in early pregnancy. Psychol Med. 1988; 18(2): 503–7. [PubMed]
Shimizu Y, Kaplan B. Postpartum depression in the United States and Japan. J Cross-Cultural Psychol. 1987; 18(1): 15–30.
Silver L. Postnatal depression: An overview. J Fam Health Care. 2003; 13(6): 144–5. [PubMed]
Teissedre F, Chabrol H. Detecting women at risk for postnatal depression using the Edinburgh Postnatal Depression Scale at 2 to 3 days postpartum. Can J Psychiatry. 2004; 49(1): 51–4. [PubMed]
Thorpe K, Dragonas T, Golding J. The effects of psychosocial factors on the mother's emotional well-being. J Reprod Infant Psychol. 1992; 10(4): 205.
Troutman B R, Cutrona C E. Nonpsychotic postpartum depression among adolescent mothers. J Abnorm Psychol. 1990; 99: 69–78. [PubMed]
Verkerk G J, Pop V J, Van Son M J. et al. Prediction of depression in the postpartum period: A longitudinal follow-up study in high-risk and low-risk women. J Affect Disord. 2003; 77(2): 159–66. [PubMed]
Wang S Y, Jiang X Y, Jan W C. et al. A comparative study of postnatal depression and its predictors in Taiwan and mainland China. Am J Obstet Gynecol. 2003; 189(5): 1407–12. [PubMed]
Webster J, Pritchard M, Creedy D. et al. A simplified predictive index for the detection of women at risk for postnatal depression. Birth. 2003; 30(2): 101–8. [PubMed]
Whiffen V E. Screening for postpartum depression: A methodological note. J Clin Psychol. 1988; 44(3): 367–71. [PubMed]
Whiffen V. Vulnerability of postpartum depression: A prospective multivariate study. J Abnorm Psychol. 1988; 97(4): 467–74. [PubMed]
Wickberg B, Hwang C. Counselling of postnatal depression: A controlled study on a population based Swedish sample. J Affect Disord. 1996; 39(3): 209–16. [PubMed]
Yonkers K A, Ramin S M, Rush A J. et al. Onset and persistence of postpartum depression in an inner-city maternal health clinic system. Am J Psychiatry. 2001; 158(11): 1856–63. [PubMed]
Footnotes
a

Two studies assessed the mood of fathers in addition to the mothers.30, 45 We do not address the comparison of mothers and fathers in this chapter because it is beyond the scope of this study.
