NIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal
Eur Neuropsychopharmacol. Author manuscript; available in PMC Dec 26, 2007.
Published in final edited form as:
PMCID: PMC2151980

The Montgomery Äsberg and the Hamilton Ratings of Depression

A Comparison of Measures
Thomas Carmody, Ph.D.,1 A. John Rush, M.D.,1 Ira Bernstein, Ph.D.,2 Diane Warden, Ph.D., M.B.A.,1 Stephen Brannan, M.D.,3 Daniel Burnham, Ph.D.,4 Ada Woo, M.A.,2 and Madhukar Trivedi, M.D.1


The 17-item Hamilton Rating Scale for Depression (HRSD17) and the Montgomery Äsberg Depression Rating Scale (MADRS) are two widely used clinician-rated symptom scales. A 6-item version of the HRSD (HRSD6) was created by Bech to address the psychometric limitations of the HRSD17. The psychometric properties of these measures were compared using classical test theory (CTT) and item response theory (IRT) methods. IRT methods were used to equate total scores on any two scales. To test the robustness of the results, data were drawn from two distinctly different outpatient studies of nonpsychotic major depression: a 12-month study of highly treatment-resistant patients (n=233) and an 8-week acute phase drug treatment trial (n=985).

MADRS and HRSD6 items generally contributed more to the measurement of depression than HRSD17 items as shown by higher item-total correlations and higher IRT slope parameters. The MADRS and HRSD6 were unifactorial while the HRSD17 contained 2 factors. The MADRS showed about twice the precision in estimating depression as either the HRSD17 or HRSD6 for average severity of depression. An HRSD17 of 7 corresponded to an 8 or 9 on the MADRS and 4 on the HRSD6.

The MADRS would be superior to the HRSD17 in the conduct of clinical trials.

Keywords: MADRS, HRSD, item response theory, classical test theory, psychometrics


The measurement of depressive symptom severity is important not only for the conduct of efficacy and effectiveness trials but increasingly also for the proper implementation of treatment guideline recommendations for major depressive and other mood disorders (Crismon et al., 1999; Depression Guideline Panel, 1993; Rush et al., 2003; Trivedi et al., 2004). A number of self-reports (e.g., Carroll Rating Scale, Beck Depression Inventory, Zung Self-Rating Scale, and Inventory of Depressive Symptomatology - Self Report) and clinician ratings (e.g., Hamilton Rating Scale for Depression, Montgomery Äsberg Depression Rating Scale, and Inventory of Depressive Symptomatology - Clinician-rated) are available. Perhaps the two most popular clinical ratings are the Hamilton Rating Scale for Depression (HRSD), which comes in several versions (e.g., 17, 21, 24, 28, and 31 items) (Hamilton, 1960; Hamilton, 1967) and the 10-item Montgomery Äsberg Depression Rating Scale (MADRS) (Montgomery and Äsberg, 1979). The MADRS is used frequently in European registration and other clinical trials, while the Hamilton continues to be more widely used in the United States, though recent reports (Bagby et al., 2004; Zimmerman et al., 2005) have highlighted significant shortcomings in the HRSD.

The MADRS has been reported as equivalent to or more sensitive to change in symptoms over time than the HRSD17 (Mulder et al., 2003; Rivera et al., 2000; Senra, 1996) and equivalent to the HRSD17 in detecting drug/placebo differences (Khan et al., 2002). The MADRS has been reported to be unifactorial after treatment (Galinowski and Lehert, 1995; Rocca et al., 2002), although more than one factor has been found using ratings with more limited ranges in total score (i.e., prior to treatment) (Corruble et al., 1999; Craighead and Evans, 1996; Galinowski and Lehert, 1995; Hammond, 1998; Rocca et al., 2002). A meta-analysis (Faries et al., 2000), however, found that the superiority of the MADRS or the HRSD in detecting differences between drug and placebo depended on the class of the medication, and the specific effects and side effects of the medication.

The HRSD17 has been found consistently to be multidimensional (Bech et al., 1981; Gibbons et al., 1993; Hamilton, 1967; Maier et al., 1988), which may reduce its sensitivity to detecting changes in depression severity or in differentiating between two treatments. Prior analyses of the HRSD17 have identified specific problematic items in terms of response characteristics (Bagby et al., 2004; Santor and Coyne, 2001). Several briefer versions of the HRSD have been developed to improve upon the HRSD17 by creating a more unifactorial measure of depression that, consequently, should be more sensitive to detecting changes in depression or to detecting drug/placebo differences than the HRSD17. The most commonly used brief HRSD may be that developed by Bech (Bech et al., 1975) — a 6-item scale that includes the following items: depressed mood, guilt, work and activities, retardation, psychic anxiety, and somatic symptoms general. In fact, the HRSD6 has been found to be more clearly unidimensional (Bagby et al., 2004; Bech et al., 1992; Bech et al., 1997; Bech et al., 1984; Bech et al., 1975), more sensitive to change than the HRSD17 (de Montigny et al., 1981; O’Sullivan et al., 1997), and equivalent to (Hooper and Bakish, 2000) or more sensitive to detecting drug/placebo or drug/drug differences than the HRSD17 (Bech et al., 2000; Faries et al., 2000). This briefer version appears to have less psychometric bias as a result of side effects (Moller, 2001).

Item response theory (IRT) models (Embretson and Reise, 2000; Hambleton and Swaminathan, 1985; Hulin et al., 1983; Nunnally and Bernstein, 1994) represent an important and increasingly sophisticated framework for examination of the psychometric properties of rating scale total scores and individual items. The Rasch model has been used to examine the psychometric properties of the HRSD17 (Bech et al., 1981) and recently the more complex Samejima’s graded IRT model has been used to further examine the HRSD17 (Rush et al., 2005a). Unlike classical test theory (CTT) methods, IRT methods can be used to create conversion tables that allow a reliable crosswalk between total scale scores (Orlando et al., 2000). Such conversion tables allowing a crosswalk between the HRSD17, MADRS, and Bech’s HRSD6 would greatly facilitate extrapolating findings from published reports that use one scale to allow reliable estimates of the same study results using the alternative scale (e.g., MADRS to HRSD17). In addition, such information would clarify what total symptom scores are comparable in defining remission, as well as mild, moderate, and severe symptom levels. Ideally, the validity of such conversion tables would be higher if they were developed from a large diverse study sample or samples.

This report provides both CTT and IRT results on two distinctly different depressed outpatient samples, each of which was collected for a different research purpose. These results provide an empirical basis for converting one scale total score into another scale total score. Further, since the HRSD17 and the MADRS were collected at the same time in each of the two samples, these two clinician ratings, as well as the HRSD6, which was extracted from the HRSD17, can be compared on the basis of item response and other psychometric features.


Two datasets were analyzed for this report. The first (Study 1) (n=233) was generated from a 12-month uncontrolled, long-term study of adult outpatients (18-75 years old) with highly treatment-resistant, nonpsychotic major depressive episodes (MDEs) who participated in a study of adjunctive vagus nerve stimulation added onto ongoing diverse medication regimens (Rush et al., 2005b). Diagnoses were rendered with the Structured Clinical Interview for DSM-IV (SCID) (First et al., 1994). This population included 208 (89.3%) patients with major depressive disorder and 25 (10.7%) in a depressed phase of bipolar I (n=12) or bipolar II (n=13) disorder. The baseline features of this sample included 62.2% female with an average age 47.2 (SD=8.9) (range: 24 to 72), 96.6% Caucasian with an average baseline HRSD17 total score of 21.9 (SD=4.4) (range: 13 to 37), an average baseline HRSD6 total score of 12.4 (SD=2.5) (range: 6 to 19), and an average baseline MADRS total score of 31.9 (SD=6.7) (range: 14 to 50).

This patient group had not responded adequately to 2-6 trials of known effective treatments delivered at adequate doses and durations in the current MDE as assessed by the Antidepressant Treatment History Form (Oquendo et al., 1999; Prudic et al., 1996; Prudic et al., 1990; Sackeim et al., 1990; Sackeim et al., 2000; Sackeim, 2001). When counting all clinical treatments received, patients had on average received over 12 different medications in the current MDE.

Raters were not blind to treatment when the Study 1 data analyzed in this report were collected. Data at study exit (or the date closest to 12 months following study initiation) were subjected to analysis. Patients with no post-baseline data were excluded. The HRSD28 was collected using a structured interview modeled after Williams et al. (Williams, 1988). The HRSD17 was extracted from the HRSD28 for these analyses. Study 1 data were supplied by Cyberonics, Inc.

The second sample (Study 2) (n=985) included only outpatients with nonpsychotic major depressive disorder (MDD) defined by DSM-IV. A complete psychiatric evaluation following APA guidelines (American Psychiatric Association, 2000) was performed by a psychiatrist after a screening interview using the Mini International Neuropsychiatric Interview (MINI) (Sheehan et al., 1998). These subjects were randomized to one of three treatment cells (placebo, standard antidepressant, experimental antidepressant) for an 8-week double-blind treatment phase. A minimum total score of 20 on the HRSD17 and a minimum item 1 (sad mood) score of 2 were required for study entry. Treatment-resistant patients were excluded. Altogether, 59.9% were female with an average age of 39.8 (SD=11.6) (range: 18-65); 81.5% were Caucasian. The average baseline HRSD17 total score was 23.6 (SD=2.9) (range: 20 to 35); the average baseline HRSD6 total score was 12.8 (SD=1.7) (range: 7 to 18). The average baseline MADRS total score was 28.6 (SD=5.1) (range: 10 to 45). Raters were blind to treatment assignment. Raters were trained in the conventional way by completing the HRSD17 while watching videotaped interviews. For Study 2, the HRSD17 was obtained without a structured interview. For these analyses, data were supplied by GlaxoSmithKline.

These two data sets were chosen to maximize the generalizability of the findings in this report. Exit ratings were chosen to provide a maximum range in symptom scores. In both data sets, the HRSD17 and MADRS were collected by the same evaluator in the same interview. The HRSD6 was extracted and totaled from the HRSD17 items.

Statistical Analyses

Classical test theory (CTT) measures of consistency, namely Cronbach’s alpha (α) (Cronbach, 1951) and item-total correlations (not corrected for item/total overlap), were computed for the HRSD17, MADRS, and HRSD6 at study exit for each study. Also, effect sizes were computed for each total score and item for each measure within each study. The effect size was computed as the mean baseline-to-exit change in the item or total score divided by the standard deviation of that change. Let $D = X_{\text{Base}} - X_{\text{Exit}}$, where $X_{\text{Base}}$ and $X_{\text{Exit}}$ are the baseline and exit scores and $D$ is their difference for a given item or total score. The effect size is then simply $E = \bar{D}/S_D$, where $\bar{D}$ is the mean of $D$ and $S_D$ is its standard deviation.
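The three CTT quantities described above can be illustrated with a minimal sketch. The data below are hypothetical ratings invented for illustration, the function names are not from the original analyses, and the sample (n - 1) standard deviation is an assumption, since the text does not state which denominator was used.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator; an assumption)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cronbach_alpha(rows):
    """Cronbach's alpha; rows[i][j] is the score of patient i on item j."""
    k = len(rows[0])
    item_vars = [sample_sd([r[j] for r in rows]) ** 2 for j in range(k)]
    total_var = sample_sd([sum(r) for r in rows]) ** 2
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def item_total_correlations(rows):
    """Uncorrected: each item is correlated with a total that includes it."""
    totals = [sum(r) for r in rows]
    return [pearson([r[j] for r in rows], totals) for j in range(len(rows[0]))]

def effect_size(baseline, exit_scores):
    """E = mean(D) / sd(D), with D = baseline - exit for each patient."""
    d = [b - e for b, e in zip(baseline, exit_scores)]
    return mean(d) / sample_sd(d)

# Hypothetical exit ratings for four patients on a three-item scale.
exit_items = [[0, 1, 1], [1, 1, 2], [2, 3, 2], [3, 3, 4]]
print(round(cronbach_alpha(exit_items), 3))
print([round(r, 3) for r in item_total_correlations(exit_items)])
# Hypothetical baseline and exit total scores for the same four patients.
print(round(effect_size([22, 25, 21, 28], [10, 17, 11, 16]), 3))
```

Because the item-total correlations are not corrected for overlap, each item contributes to the total it is correlated with, which inflates the values somewhat for short scales such as the HRSD6.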

IRT methods were used to equate total scores for each pair of scales. Samejima’s graded IRT model (Samejima, 1997) item parameters were estimated for each item of each measure. These parameters were used according to the procedure of Orlando et al. (Orlando et al., 2000) (and associated software) to generate an IRT score for each possible total score on the HRSD17, MADRS, and HRSD6. The IRT score, usually called theta, is a unitless measure of depression estimated from the IRT procedure commonly scaled to a mean of 0 and a standard deviation of 1. The total scores for each pair of scales were equated by matching the IRT scores for the two corresponding scales (Orlando et al., 2000). When an IRT score did not match exactly, best judgment was used to equate the scales taking into account the matching of total scores immediately above and below the total score in question.
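The equating step can be sketched as follows. This is a simplified illustration and not the software of Orlando et al.: it assumes a standard normal prior, a fixed theta grid, and expected-theta scoring of each total score, and the item parameters (slope a, ordered thresholds b) are invented. Ties and near-ties, which the text resolves by judgment, are resolved here simply by nearest theta.

```python
import math

def category_probs(theta, a, bs):
    """Samejima graded model: probability of each response category 0..len(bs)."""
    # Cumulative probabilities P(X >= k), padded with 1 (k = 0) and 0 (above the top).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(bs) + 1)]

def score_distribution(theta, items):
    """Probability of each total score at a fixed theta (item-by-item convolution)."""
    dist = [1.0]
    for a, bs in items:
        probs = category_probs(theta, a, bs)
        new = [0.0] * (len(dist) + len(probs) - 1)
        for s, ps in enumerate(dist):
            for k, pk in enumerate(probs):
                new[s + k] += ps * pk
        dist = new
    return dist

def summed_score_theta(items, grid=None):
    """Expected theta for each total score under a standard normal prior."""
    grid = grid or [-4 + 0.1 * i for i in range(81)]
    weights = [math.exp(-0.5 * t * t) for t in grid]
    n_scores = sum(len(bs) for _, bs in items) + 1
    num, den = [0.0] * n_scores, [0.0] * n_scores
    for t, w in zip(grid, weights):
        for s, p in enumerate(score_distribution(t, items)):
            num[s] += t * p * w
            den[s] += p * w
    return [n / d for n, d in zip(num, den)]

def equate(thetas_from, thetas_to):
    """Map each total on one scale to the total on the other with the nearest theta."""
    return [min(range(len(thetas_to)), key=lambda j: abs(thetas_to[j] - t))
            for t in thetas_from]

# Hypothetical item parameters for two short scales.
scale_a = [(1.8, [-1.0, 0.0, 1.0]), (1.2, [-0.5, 0.8])]
scale_b = [(1.5, [-1.5, -0.5, 0.5, 1.5]), (1.0, [0.0, 1.0]), (2.0, [-0.8, 0.6])]
theta_a = summed_score_theta(scale_a)
theta_b = summed_score_theta(scale_b)
for score, match in enumerate(equate(theta_a, theta_b)):
    print(f"scale A total {score} ~ scale B total {match}")
```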

The graded IRT model was also used to compute the test information function (TIF) (Birnbaum, 1968) for each scale in each study. The “information” provided by a scale at a given level of severity is inversely related to the standard error of the severity estimate at that level (formally, information is the reciprocal of the squared standard error). Thus, a total score that provides a precise estimate of symptom severity contains more “information” than a total score that provides an imprecise estimate. The TIF allows one to see at which levels of symptom severity any given measure’s total scores provide the most precise estimates and also to compare the precision of two or more measures across all levels of symptom severity.
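A test information function for the graded model can be sketched as below. The item parameters are hypothetical and the function names are illustrative. The sketch uses the usual IRT convention that the standard error of theta is 1 divided by the square root of the information, so a scale with twice the information has a standard error about 0.71 times as large.

```python
import math

def cumulative(theta, a, b):
    """P(X >= k) for threshold b under the graded model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, bs):
    """Sum over response categories of P_k'(theta)^2 / P_k(theta)."""
    cum = [1.0] + [cumulative(theta, a, b) for b in bs] + [0.0]
    info = 0.0
    for k in range(len(bs) + 1):
        p = cum[k] - cum[k + 1]
        # The derivative of each cumulative curve is a * P * (1 - P).
        dp = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p > 0:
            info += dp * dp / p
    return info

def scale_information(theta, items):
    """Test information: the sum of the item informations at theta."""
    return sum(item_information(theta, a, bs) for a, bs in items)

def standard_error(theta, items):
    """Standard error of the theta estimate implied by the information."""
    return 1.0 / math.sqrt(scale_information(theta, items))

# Hypothetical scales: one with steeper slopes than the other.
steep = [(2.0, [-1.0, 0.0, 1.0]), (1.8, [-0.5, 0.5, 1.5])]
shallow = [(1.0, [-1.0, 0.0, 1.0]), (0.9, [-0.5, 0.5, 1.5])]
for theta in (-2, -1, 0, 1, 2):
    print(theta,
          round(scale_information(theta, steep), 2),
          round(scale_information(theta, shallow), 2))
```

Plotting `scale_information` over a range of theta values gives curves of the kind shown in the test information figures.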

An assumption of the IRT approach is that the measures assess only depression (i.e., are unidimensional, see the above IRT references). Therefore, a principal components factor analysis was conducted to assess the dimensionality of each measure in each study. Parallel analysis (Horn, 1965; Humphreys and Ilgen, 1969; Humphreys and Montanelli, 1975; Montanelli and Humphreys, 1976) was used to infer how many “real” factors (dimensions) were present in the data. This approach avoids some of the limitations of the traditional Kaiser-Guttman eigenvalue-greater-than-1 rule. Parallel analysis involves comparing the eigenvalues from a principal components analysis of the real data to eigenvalues that might be expected to arise by chance alone. To determine how large the latter eigenvalues might be, we generated a series of simulated datasets consisting of random numbers (where correlations between all variables are zero) using the same number of observations and variables (items) as the real data. Eigenvalues of the principal components for each simulated dataset are computed and averaged over replications. The largest real data eigenvalue is compared to the largest simulated data eigenvalue, then the second largest real and simulated eigenvalues are compared, and so on until we find a real eigenvalue smaller than the corresponding simulated eigenvalue. The number of principal components for which the real eigenvalues exceed the simulated eigenvalues defines the dimensionality.
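The parallel analysis procedure described above can be sketched with the standard library alone. Several choices here are assumptions rather than details from the report: normally distributed random data, 20 replications, and a small Jacobi routine for the eigenvalues of the correlation matrix. The example data are simulated from a single hypothetical common factor.

```python
import math
import random

def correlation_matrix(rows):
    """Pearson correlations between the columns (items) of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(p)]
    return [[sum(r[i] * r[j] for r in centered) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    p = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(i + 1, p)) < 1e-20:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0:
                    continue
                diff = a[i][i] - a[j][j]
                angle = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[i][j] / diff)
                c, s = math.cos(angle), math.sin(angle)
                for k in range(p):  # rotate columns i and j
                    a[k][i], a[k][j] = c * a[k][i] + s * a[k][j], c * a[k][j] - s * a[k][i]
                for k in range(p):  # rotate rows i and j
                    a[i][k], a[j][k] = c * a[i][k] + s * a[j][k], c * a[j][k] - s * a[i][k]
    return sorted((a[i][i] for i in range(p)), reverse=True)

def parallel_analysis(rows, replications=20, seed=0):
    """Count leading components whose eigenvalue exceeds the random-data average."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    real = eigenvalues(correlation_matrix(rows))
    simulated = [0.0] * p
    for _ in range(replications):
        noise = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        for k, value in enumerate(eigenvalues(correlation_matrix(noise))):
            simulated[k] += value / replications
    n_factors = 0
    for r, s in zip(real, simulated):
        if r <= s:
            break
        n_factors += 1
    return n_factors, real, simulated

# Hypothetical data: 200 patients, 6 items driven by one common severity factor.
data_rng = random.Random(1)
rows = []
for _ in range(200):
    severity = data_rng.gauss(0, 1)
    rows.append([severity + data_rng.gauss(0, 0.7) for _ in range(6)])
n_factors, real, simulated = parallel_analysis(rows)
print(n_factors, [round(v, 2) for v in real[:2]], [round(v, 2) for v in simulated[:2]])
```

The simulated datasets match the real data in the number of observations and items, so the random-data eigenvalues reflect the sampling variability expected for that sample size.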

These analyses were applied to (1) the 17-item Hamilton Rating Scale for Depression (HRSD17), (2) the 10-item Montgomery Äsberg Depression Rating Scale (MADRS), and (3) the 6-item Hamilton Rating Scale for Depression (HRSD6) defined by Bech et al. (Bech et al., 1975).


Classical Test Theory Analyses


All three measures were highly correlated with each other at exit in both studies. In Study 1, the correlation between the HRSD17 and HRSD6 total scores was 0.89; between the HRSD17 and MADRS, the correlation was 0.88, and between the HRSD6 and MADRS, the correlation was 0.86. In Study 2, all the correlations were slightly higher: HRSD17 vs. HRSD6 was 0.94, HRSD17 vs. MADRS was 0.92, and HRSD6 vs. MADRS was 0.91.

Internal Consistency

Cronbach’s alpha showed highly acceptable internal consistency for all measures using study exit data. For the HRSD17, the values were 0.81 (Study 1) and 0.88 (Study 2). For the MADRS, values were slightly higher: 0.90 (Study 1) and 0.92 (Study 2). Finally, for the HRSD6, the values were 0.78 (Study 1) and 0.86 (Study 2).

Item Total Correlations

Table 1 summarizes the item total correlations for each test in each study. Most items on the MADRS correlated with the total score at ≥ 0.60 (both studies). For the HRSD17, only 4 of 17 items (Study 1) and only 6 of 17 items (Study 2) correlated at ≥ 0.60 with the total score. In addition, median item-total correlations were 0.75 (Study 1) and 0.78 (Study 2) for the MADRS. HRSD6 item-total correlations were about the same magnitude as for the MADRS. For the HRSD17 median item total correlations were lower (0.50 for Study 1 and 0.56 for Study 2).

Table 1
Item Total Correlations for Each Measure in Each Study

Item Response Theory Analyses

Item Performance

Table 2 summarizes the item performance for the HRSD17 for each study. Tables 3 and 4 provide similar information for the MADRS and HRSD6, respectively. Items with numerically higher slopes (“a” values) contribute more to the measurement of depression because these items are better able to discriminate among different levels of depression. For the HRSD17, only 7/17 items had a value ≥ 1.0 (Study 1), while for Study 2, 13/17 items were ≥ 1.0. For the MADRS, 9/10 items in each study had slopes of ≥ 1.0. For the HRSD6, all items in both studies had slopes ≥ 1.0.

Table 2
Item Response Analyses for the HRSD17 for Study 1 (n=233) and Study 2 (n=985)
Table 3
Item Response Analyses for the MADRS for Study 1 (n=233) and Study 2 (n=985)
Table 4
Item Response Analyses for the HRSD6 for Study 1 (n=233) and Study 2 (n=985)


Principal components factor analyses were conducted on each measure for each study. For the HRSD17 in Study 1, two factors were identified using parallel analysis to determine the number of factors. The averages of the first three eigenvalues from the simulated datasets were 1.50, 1.39, and 1.31, which were compared to the first 3 real eigenvalues of 4.33, 1.73, and 1.19. Two factors were chosen because the first two real data eigenvalues were larger than the first two simulated data eigenvalues, while the third was smaller. After oblique rotation, factor 1 included items for depressed mood (1), suicide (3), work and activities (7), retardation (8), psychic anxiety (10), somatic symptoms general (13), and libido (14). These items had loadings greater than 0.3 on the first factor and less than 0.1 on the second factor. Factor 2 included initial insomnia (4), middle insomnia (5), delayed insomnia (6), and reduced appetite/gastrointestinal symptoms (12). The other 6 items did not clearly belong to either factor. These items had loadings of similar magnitude on both factors.

The HRSD17 in Study 2 also revealed two factors based on the comparison of the first 3 simulated data eigenvalues of 1.23, 1.19, and 1.15 to real data eigenvalues of 5.77, 1.30, and 1.11. After oblique rotation, factor 1 included depressed mood (1), guilt (2), suicide (3), work and activities (7), psychic anxiety (10), somatic symptoms general (13), and hypochondriasis (15). Factor 2 items included initial insomnia (4), middle insomnia (5), and delayed insomnia (6). The other 7 items did not clearly belong to either factor.

For the MADRS, only one factor was identified for Study 1 because the first real eigenvalue of 5.41 was much larger than the first simulated eigenvalue of 1.33, while the second real eigenvalue of 1.06 was smaller than the second simulated eigenvalue of 1.23. One MADRS factor was also identified for Study 2 based on simulated eigenvalues of 1.15 and 1.11 compared with real eigenvalues of 6.00 and 0.91.

Comparison of simulated versus real data eigenvalues showed the HRSD6 to be unifactorial in both Study 1 and Study 2. In Study 1, the first 2 eigenvalues were 1.21 and 1.11 (simulated) versus 2.92 and 0.92 (real). In Study 2, the first 2 eigenvalues were 1.10 and 1.06 (simulated) versus 3.57 and 0.73 (real).
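As a check on the decisions reported above, the parallel analysis stopping rule can be applied directly to the eigenvalues quoted in the text. The sketch below uses only those reported values, and the function name is illustrative. Only the first two or three eigenvalues are reported, which is sufficient here because the rule stops within the reported values in every case.

```python
def count_factors(real, simulated):
    """Number of leading real eigenvalues that exceed their simulated counterparts."""
    count = 0
    for r, s in zip(real, simulated):
        if r <= s:
            break
        count += 1
    return count

# Eigenvalues as reported in the text: (real, simulated averages).
reported = {
    "HRSD17, Study 1": ([4.33, 1.73, 1.19], [1.50, 1.39, 1.31]),
    "HRSD17, Study 2": ([5.77, 1.30, 1.11], [1.23, 1.19, 1.15]),
    "MADRS, Study 1": ([5.41, 1.06], [1.33, 1.23]),
    "MADRS, Study 2": ([6.00, 0.91], [1.15, 1.11]),
    "HRSD6, Study 1": ([2.92, 0.92], [1.21, 1.11]),
    "HRSD6, Study 2": ([3.57, 0.73], [1.10, 1.06]),
}
factors = {name: count_factors(real, sim) for name, (real, sim) in reported.items()}
for name, n in factors.items():
    print(f"{name}: {n} factor(s)")
```

The rule yields two factors for the HRSD17 and one factor for the MADRS and HRSD6 in both studies, in agreement with the text.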

Conversion Tables

Table 5 summarizes the IRT conversions for each rating scale total score for Study 1 and Study 2 combined. In the combined sample, an HRSD17 of 7 was comparable to a MADRS total score of 8 or 9, and an HRSD6 total score of 4.

Table 5
IRT Conversion Table for the HRSD17, MADRS, and HRSD6 (Studies 1 and 2 combined) (n=1218)

Relative Precision

Figures 1 and 2 show the test information functions (TIFs) for each scale in Study 1 and Study 2, respectively. In these figures theta represents a unitless measure of depression estimated from the IRT procedure scaled to a mean of 0 and a standard deviation of 1. In Study 1, the TIF for the MADRS was over twice that of the HRSD17 or HRSD6 for thetas from -1 to +2, indicating that for patients with average levels of depression, the MADRS was about 2 times as precise as the HRSD17. For very high or low levels of depression, the differences between scales were not as great. Similar findings were seen for Study 2. The small improvement in precision from the HRSD6 to the HRSD17 showed that most of the precision of the HRSD17 came from the 6 items included in the HRSD6. The remaining 11 items did not add substantially to the precision of the HRSD17.

Figure 1
Test Information Functions for Study 1 (n=233)
Figure 2
Test Information Functions for Study 2 (n=985)

Item Effect Sizes

Table 6 displays the item and total score effect sizes (baseline to exit) for each scale in each study. Not surprisingly, the more treatment-resistant sample (Study 1) had lower overall item and total score effect sizes with each of the three measures. The items on the MADRS with the highest effect size (in both studies) were reported sadness, apparent sadness, concentration, lassitude, inability to feel, and pessimistic thoughts. For the HRSD17, the items with the largest effect sizes included depressed mood, work and activities, guilt, retardation, psychic anxiety, and somatic symptoms general.

Table 6
Effect Sizes for Study 1 (n=232) and Study 2 (n=985)


When the HRSD17, MADRS, and HRSD6 were compared across two different populations, the total scores obtained at study exit on all three measures were highly correlated. The MADRS consistently had the highest Cronbach alpha levels. Both the MADRS and the HRSD6 were unifactorial. The HRSD17 had two factors, with 6-7 items not loading on either factor or loading on both factors. This is consistent with prior reports of the multidimensionality of the HRSD17 and unidimensionality of the MADRS and the HRSD6 (Bagby et al., 2004; Bech et al., 1992; Bech et al., 1997; Bech et al., 1984; Bech et al., 1981; Bech et al., 1975; Galinowski and Lehert, 1995; Gibbons et al., 1993; Hamilton, 1967; Maier et al., 1988; Rocca et al., 2002).

Given the unifactorial nature of the MADRS and the HRSD6, it is not surprising that more MADRS items and more HRSD6 items evidenced high (≥ 0.60) item total correlations as compared to items on the HRSD17. IRT analyses revealed that for the HRSD17, depressed mood, work and activities, somatic symptoms general, suicide, psychic anxiety, and guilt were most highly reflective of the core concept of depression (both studies) as shown by larger values of the slope (a) parameter. All items on the HRSD6 related highly to core depression. For the MADRS, IRT analyses revealed that most (8/10) items related highly to the core concept of depression (both studies). Reduced sleep and reduced appetite on the MADRS also related to overall depression, but less so than the other 8 items.

For the HRSD17, IRT analyses revealed several items that were minimally related to the core concept of depression (e.g., hypochondriasis, loss of insight, weight loss, agitation). For the MADRS, only reduced sleep in Study 1 and reduced appetite in Study 2 showed a similarly weak relationship to the core concept of depression, and no HRSD6 items showed such a weak relationship to core depression.

A majority of HRSD17, MADRS, and HRSD6 items rarely exhibited the full range of response options (where b0, b1, b2, etc. were >2.0 or 7 score units), which is understandable given the outpatient nature of the samples. However, for 7 HRSD17 items, 2 MADRS items, and 2 HRSD6 items, the most severe responses were endorsed so infrequently that the “b” parameters were either very large (20+) or unestimable (HRSD17: guilt, suicide, retardation, agitation, hypochondriasis, loss of insight, and weight loss; MADRS: pessimism and suicide; HRSD6: guilt and retardation).

Conversion tables were generated by combining data from both studies. An HRSD17 of 7 approximated a MADRS of 8 or 9 (i.e., 8.5). An HRSD17 total of 20 approximated a MADRS of 27. In general, results were comparable to those reported by Hawley et al. (Hawley et al., 1998), who recommended the conversion MADRS = 1.3 × HRSD17 + 0.7 based on a regression analysis.
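The comparison with the regression equation can be made concrete with a short calculation. The sketch below uses only the two IRT equivalents quoted in the text and the regression coefficients attributed to Hawley et al.; the function name is illustrative.

```python
def hawley_madrs(hrsd17_total):
    """Regression estimate attributed in the text to Hawley et al. (1998)."""
    return 1.3 * hrsd17_total + 0.7

# IRT-based equivalents quoted in the text: HRSD17 7 -> MADRS 8 or 9 (8.5); 20 -> 27.
irt_equivalents = {7: 8.5, 20: 27.0}
for hrsd17, irt_madrs in irt_equivalents.items():
    predicted = hawley_madrs(hrsd17)
    print(f"HRSD17 {hrsd17}: regression {predicted:.1f}, IRT {irt_madrs:.1f}, "
          f"difference {predicted - irt_madrs:+.1f}")
```

The regression gives 9.8 for an HRSD17 of 7 and 26.7 for an HRSD17 of 20, so the two approaches differ by about 1.3 MADRS points at the lower score and 0.3 points at the higher score.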

All MADRS items had acceptable effect sizes, and were therefore sensitive to change over time, although the sleep and appetite items were less sensitive to change. The MADRS had a consistent order to the effect size for each item in both studies, which revealed the consistency of the MADRS items in measuring the degree to which symptoms improved in two very different depressed samples. The items with the greatest effect size for change over time in the HRSD17 were the six items assessed by the HRSD6 (e.g., work and activities, sad mood, psychic anxiety, etc.). HRSD17 items with the lowest effect size (both studies) included loss of insight, weight loss, hypochondriasis, libido, appetite, agitation, and the three insomnia items. The inclusion of items that are relatively less sensitive to change over time risks decreased power to detect change over time. These results support the conclusion that the MADRS is preferred over the HRSD17 in measuring depression severity and change in depression severity over time given its unifactorial structure, the high and consistent relationship between items and the measured concept of depression (by IRT) or to total score (by CTT), and its greater precision (Figures 1 and 2). This report, however, did not compare the scales in terms of drug vs. placebo effect sizes.

A variety of additional factors also recommend the MADRS over the HRSD17. These factors include: (1) each MADRS item measures a single symptom whereas some HRSD17 items measure two related but different concepts, such as work and activities, making it difficult to relate the item score to a specific symptom; (2) the even weighting of items by the MADRS such that each symptom item may contribute equally to the total score; (3) the use of a broader range of responses to individual items with the MADRS as opposed to the HRSD17, suggesting increased sensitivity of the items to change, and (4) the 10-item MADRS typically takes 15 minutes to complete compared with 15 to 20 minutes for the 17-item HRSD (Rush et al., 2000). In addition, the HRSD17 is known to perform better with a structured interview and trained raters, while there is no need for a structured interview with the MADRS. In fact, the MADRS outperformed the HRSD17 in Study 1, even when a structured interview was used. From the point of view of enhancing the cost efficiency of trials, the MADRS should be the first choice, since it typically takes no longer to complete than the HRSD17, yet it provides substantially better precision in assessing symptom severity.

These conclusions are strengthened by the consistency of results across two very different patient populations and by the fact that in both studies the HRSD17 and MADRS were administered at the same visit, thus eliminating as many extraneous influences as possible. However, the administration of both measures at the same interview by the same interviewer may have allowed responses on one measure to subtly influence ratings on the other. Since the order of the two measures was not strictly randomized, we cannot estimate the effect of this potential confound.

Neither the HRSD17 nor the MADRS, however, assesses all of the core criterion symptoms used in DSM-IV-TR to diagnose a major depressive episode. In that regard, neither is fully adequate to define severity of depression or remission, assuming remission refers to the virtual absence of core criterion symptoms of MDD. The HRSD17 lacks ratings of oversleeping, overeating, and concentration. The MADRS lacks ratings of oversleeping and overeating, as well as interest (though it assesses inability to feel), energy (though it assesses lassitude), self criticism (guilt), and psychomotor changes.


These data provide a strong basis for selecting the MADRS rather than the HRSD17 in the conduct of efficacy and effectiveness trials, given its unifactorial nature and other preferable psychometric properties, and the avoidance of a structured interview and training (for experienced clinicians). How other clinical or self-report depression measures compare with the MADRS deserves study.


This project was funded in part by the National Institute of Mental Health (NIMH), National Institutes of Health (MH-68851 to the University of Texas Southwestern Medical Center at Dallas, A. John Rush, M.D., PI, and by MH-68852 to the University of Texas at Arlington, Ira H. Bernstein, Ph.D., PI).

The authors wish to express their appreciation to both Cyberonics Inc. and GlaxoSmithKline for providing the datasets used in this report. The authors personally conducted these analyses and received no support for conducting these analyses or preparing this report. A. John Rush, M.D. has received payments as a consultant and speaker for both GlaxoSmithKline and Cyberonics Inc. Madhukar Trivedi, M.D. has received payments as a consultant to Cyberonics. Daniel Burnham, M.D. is an employee and stockholder of GlaxoSmithKline. Steve Brannan, M.D. is an employee and stockholder of Cyberonics, Inc.


  • American Psychiatric Association . Diagnostic and Statistical Manual of Mental Disorders. 4th ed, Text Revision American Psychiatric Press; Washington DC: 2000.
  • Bagby RM, Ryder AG, Schuller DR, Marshall MB. The Hamilton Depression Rating Scale: has the gold standard become a lead weight? Am. J. Psychiatry. 2004;161(12):2163–2177. [PubMed]
  • Bech P, Allerup P, Gram LF, Reisby N, Rosenberg R, Jacobsen O, Nagy A. The Hamilton Depression Scale. Evaluation of objectivity using logistic models. Acta Psychiatr. Scand. 1981;63:290–299. [PubMed]
  • Bech P, Allerup P, Maier W, Albus M, Lavori P, Ayuso JL. The Hamilton scales and the Hopkins Symptom Checklist (SCL-90). A cross-national validity study in patients with panic disorders. Br. J. Psychiatry. 1992;160:206–211. [PubMed]
  • Bech P, Allerup P, Reisby N, Gram LF. Assessment of symptom change from improvement curves on the Hamilton depression scale in trials with antidepressants. Psychopharmacology. 1984;84:276–281. [PubMed]
  • Bech P, Cialdella P, Haugh MC, Birkett MA, Hours A, Boissel JP, Tollefson GD. Meta-analysis of randomised controlled trials of fluoxetine v. placebo and tricyclic antidepressants in the short-term treatment of major depression. Br. J. Psychiatry. 2000;176:421–428. [PubMed]
  • Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG. Quantitative rating of depressive states. Acta Psychiatr. Scand. 1975;51(3):161–170. [PubMed]
  • Bech P, Stage KB, Nair NP, Larsen JK, Kragh-Sorensen P, Gjerris A. The Major Depression Rating Scale (MDS). Inter-rater reliability and validity across different settings in randomized moclobemide trials. Danish University Antidepressant Group. J. Affect. Disord. 1997;42(1):39–48. [PubMed]
  • Birnbaum A. Some latent trait models and their use in inferring an examinee’s ability. In: Lord FM, Novick MR, editors. Statistical Theories of Mental Test Scores. Addison-Wesley; Reading, MA: 1968.
  • Corruble E, Legrand JM, Duret C, Charles G, Guelfi JD. IDS-C and IDS-SR: psychometric properties in depressed in-patients. J. Affect. Disord. 1999;56(23):95–101. [PubMed]
  • Craighead WE, Evans DD. Factor analysis of the Montgomery-Asberg Depression Rating Scale. Depression. 1996;4(1):31–33. [PubMed]
  • Crismon ML, Trivedi M, Pigott TA, Rush AJ, Hirschfeld RM, Kahn DA, DeBattista C, Nelson JC, Nierenberg AA, Sackeim HA, Thase ME. The Texas Medication Algorithm Project: report of the Texas Consensus Conference Panel on Medication Treatment of Major Depressive Disorder. J. Clin. Psychiatry. 1999;60(3):142–156. [PubMed]
  • Cronbach LJ. Coefficient alpha and the internal structure of tests. Psychometrika. 1951;16:297–334.
  • de Montigny C, Grunberg F, Mayer A, Deschenes JP. Lithium induces rapid relief of depression in tricyclic antidepressant drug non-responders. Br. J. Psychiatry. 1981;138:252–256. [PubMed]
  • Depression Guideline Panel. Treatment of Major Depression. U.S. Department of Health and Human Services, Public Health Service, Agency for Health Care Policy and Research; Rockville, MD: 1993. Clinical Practice Guideline, Number 5: Depression in Primary Care: Volume 2.
  • Embretson SE, Reise SP. Item Response Theory for Psychologists. Lawrence Erlbaum Associates; Mahwah, NJ: 2000.
  • Faries D, Herrera J, Rayamajhi J, DeBrota D, Demitrack M, Potter WZ. The responsiveness of the Hamilton Depression Rating Scale. J. Psychiatr. Res. 2000;34(1):3–10. [PubMed]
  • First MB, Spitzer R, Gibbon M, Williams J. Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I), Patient Edition. NY State Psychiatric Institute Biometrics Research Department; New York: 1994.
  • Galinowski A, Lehert P. Structural validity of MADRS during antidepressant treatment. Int. Clin. Psychopharmacol. 1995;10(3):157–161. [PubMed]
  • Gibbons RD, Clark DC, Kupfer DJ. Exactly what does the Hamilton Depression Rating Scale measure? J. Psychiatr. Res. 1993;27:259–273. [PubMed]
  • Hambleton RK, Swaminathan H. Item Response Theory. Kluwer-Nijhoff; Boston: 1985.
  • Hamilton M. A rating scale for depression. J. Neurol. Neurosurg. Psychiatry. 1960;23:56–62. [PMC free article] [PubMed]
  • Hamilton M. Development of a rating scale for primary depressive illness. Br. J. Soc. Clin. Psychol. 1967;6(4):278–296. [PubMed]
  • Hammond MF. Rating depression severity in the elderly physically ill patient: reliability and factor structure of the Hamilton and the Montgomery-Asberg Depression Rating Scales. Int. J. Geriatr. Psychiatry. 1998;13(4):257–261. [PubMed]
  • Hawley CJ, Gale TM, Smith VRH, Sen P. Depression rating scales can be related to each other by simple equations. Int. J. Psychiatry Clin. Pract. 1998;2:215–219.
  • Hooper CL, Bakish D. An examination of the sensitivity of the six-item Hamilton Rating Scale for Depression in a sample of patients suffering from major depressive disorder. J. Psychiatry Neurosci. 2000;25(2):178–184. [PMC free article] [PubMed]
  • Horn JL. An empirical comparison of various methods for estimating common factor scores. Educ. Psychol. Meas. 1965;25:313–322.
  • Hulin CL, Drasgow F, Parsons CK. Item-Response Theory: Applications to Psychological Measurement. Dow Jones-Irwin; Homewood, IL: 1983.
  • Humphreys LG, Ilgen D. Note on a criterion for the number of common factors. Educ. Psychol. Meas. 1969;29:571–578.
  • Humphreys LG, Montanelli RG, Jr. An investigation of the parallel analysis criterion for determining the number of common factors. Multivariate Behav. Res. 1975;10:193–206.
  • Khan A, Khan SR, Shankles EB, Polissar NL. Relative sensitivity of the Montgomery-Asberg Depression Rating Scale, the Hamilton Depression rating scale and the Clinical Global Impressions rating scale in antidepressant clinical trials. Int. Clin. Psychopharmacol. 2002;17(6):281–285. [PubMed]
  • Maier W, Philipp M, Heuser I, Schlegel S, Buller R, Wetzel H. Improving depression severity assessment--I. Reliability, internal validity and sensitivity to change of three observer depression scales. J. Psychiatr. Res. 1988;22:3–12. [PubMed]
  • Moller HJ. Methodological aspects in the assessment of severity of depression by the Hamilton Depression Scale. Eur. Arch. Psychiatry Clin. Neurosci. 2001;251(Suppl 2):II13–II20. [PubMed]
  • Montanelli RG, Jr., Humphreys LG. Latent roots of random data correlation matrices with squared multiple correlations on the diagonal: a Monte Carlo study. Psychometrika. 1976;41:341–348.
  • Montgomery SA, Äsberg M. A new depression scale designed to be sensitive to change. Br. J. Psychiatry. 1979;134:382–389. [PubMed]
  • Mulder RT, Joyce PR, Frampton C. Relationships among measures of treatment outcome in depressed patients. J. Affect. Disord. 2003;76(1–3):127–135. [PubMed]
  • Nunnally JC, Bernstein IH. Psychometric Theory. 3rd edn. McGraw-Hill; New York: 1994.
  • O’Sullivan RL, Fava M, Agustin C, Baer L, Rosenbaum JF. Sensitivity of the six-item Hamilton Depression Rating Scale. Acta Psychiatr. Scand. 1997;95:379–384. [PubMed]
  • Oquendo MA, Malone KM, Ellis SP, Sackeim HA, Mann JJ. Inadequacy of antidepressant treatment for patients with major depression who are at risk for suicidal behavior. Am. J. Psychiatry. 1999;156(2):190–194. [PubMed]
  • Orlando M, Sherbourne CD, Thissen D. Summed-score linking using item response theory: application to depression measurement. Psychol. Assess. 2000;12(3):354–359. [PubMed]
  • Prudic J, Haskett RF, Mulsant B, Malone KM, Pettinati HM, Stephens S, Greenberg R, Rifas SL, Sackeim HA. Resistance to antidepressant medications and short-term clinical response to ECT. Am. J. Psychiatry. 1996;153:985–992. [PubMed]
  • Prudic J, Sackeim HA, Devanand DP. Medication resistance and clinical response to electroconvulsive therapy. Psychiatry Res. 1990;31:287–296. [PubMed]
  • Rivera CS, Perez CR, Cao ES. Use of three depression scales for evaluation of pretreatment severity and of improvement after treatment. Psychol. Rep. 2000;87:389–394. [PubMed]
  • Rocca P, Fonzo V, Ravizza L, Rocca G, Scotta M, Zanalda E, Bogetto F. A comparison of paroxetine and amisulpride in the treatment of dysthymic disorder. J. Affect. Disord. 2002;70(3):313–317. [PubMed]
  • Rush AJ, Bernstein IH, Trivedi MH, Carmody TJ, Wisniewski S, Mundt JC, Shores-Wilson K, Biggs MM, Nierenberg AA, Fava M. An evaluation of the Quick Inventory of Depressive Symptomatology and the Hamilton Rating Scale for Depression: a STAR*D report. Biol. Psychiatry. 2005a. Epub ahead of print, doi:10.1016/j.biopsych.2005.08.022 [PMC free article] [PubMed]
  • Rush AJ, Crismon ML, Kashner TM, Toprac MG, Carmody TJ, Trivedi MH, Suppes T, Miller AL, Biggs MM, Shores-Wilson K, Witte BP, Shon SP, Rago WV, Altshuler KZ. Texas Medication Algorithm Project, phase 3 (TMAP-3): rationale and study design. J. Clin. Psychiatry. 2003;64(4):357–369. [PubMed]
  • Rush AJ, Pincus HA, First MB, Blacker D, Endicott J, Keith SJ, Phillips KA, Ryan ND, Smith GR, Tsuang MT, Widiger TA, Zarin DA. Handbook of Psychiatric Measures. American Psychiatric Association; Washington, DC: 2000.
  • Rush AJ, Sackeim HA, Marangell LB, George MS, Brannan SK, Davis SM, Lavori P, Howland R, Kling MA, Rittberg B, Carpenter L, Ninan P, Moreno F, Schwartz T, Conway C, Burke M, Barry JJ. Effects of 12 months of vagus nerve stimulation in treatment-resistant depression: a naturalistic study. Biol. Psychiatry. 2005b;58(5):355–363. [PubMed]
  • Sackeim HA. The definition and meaning of treatment-resistant depression. J. Clin. Psychiatry. 2001;62(Suppl 16):10–17. [PubMed]
  • Sackeim HA, Prudic J, Devanand DP, Decina P, Kerr B, Malitz S. The impact of medication resistance and continuation pharmacotherapy on relapse following response to electroconvulsive therapy in major depression. J. Clin. Psychopharmacol. 1990;10:96–104. [PubMed]
  • Sackeim HA, Prudic J, Devanand DP, Nobler MS, Lisanby SH, Peyser S, Fitzsimons L, Moody BJ, Clark J. A prospective, randomized, double-blind comparison of bilateral and right unilateral electroconvulsive therapy at different stimulus intensities. Arch. Gen. Psychiatry. 2000;57:425–434. [PubMed]
  • Samejima F. Graded response model. In: van der Linden WJ, Hambleton RK, editors. Handbook of Modern Item Response Theory. Springer-Verlag; New York: 1997. pp. 85–100.
  • Santor DA, Coyne JC. Examining symptom expression as a function of symptom severity: item performance on the Hamilton Rating Scale for Depression. Psychol. Assess. 2001;13(1):127–139. [PubMed]
  • Senra C. Evaluation and monitoring of symptom severity and change in depressed outpatients. J. Clin. Psychol. 1996;52(3):317–324. [PubMed]
  • Sheehan DV, Lecrubier Y, Sheehan KH, Amorim P, Janavs J, Weiller E, Hergueta T, Baker R, Dunbar GC. The Mini-International Neuropsychiatric Interview (M.I.N.I.): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J. Clin. Psychiatry. 1998;59(Suppl 20):22–33. [PubMed]
  • Trivedi MH, Rush AJ, Crismon ML, Kashner TM, Toprac MG, Carmody TJ, Key T, Biggs MM, Shores-Wilson K, Witte B, Suppes T, Miller AL, Altshuler KZ, Shon SP. Clinical Results for Patients With Major Depressive Disorder in the Texas Medication Algorithm Project. Arch. Gen. Psychiatry. 2004;61(7):669–680. [PubMed]
  • Williams JB. A structured interview guide for the Hamilton Depression Rating Scale. Arch. Gen. Psychiatry. 1988;45(8):742–747. [PubMed]
  • Zimmerman M, Posternak MA, Chelminski I. Is the cutoff to define remission on the Hamilton Rating Scale for Depression too high? J. Nerv. Ment. Dis. 2005;193(3):170–175. [PubMed]