NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Abou-Setta AM, Mousavi SS, Spooner C, et al. First-Generation Versus Second-Generation Antipsychotics in Adults: Comparative Effectiveness [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Aug. (Comparative Effectiveness Reviews, No. 63.)

Summary and Discussion

This report provides a comprehensive synthesis of the evidence on the comparative effectiveness and safety of first-generation antipsychotics (FGAs) versus second-generation antipsychotics (SGAs) in adults with schizophrenia, schizophrenia-related psychoses, and bipolar disorder. We included studies that directly compared one FGA versus one SGA. We did not include: studies comparing various FGAs or those comparing various SGAs; trials with no active comparator (e.g., no treatment or placebo-controlled trials); or those trials comparing antipsychotics not approved by the U.S. Food and Drug Administration (FDA) or no longer available in the U.S. (Appendix O). The strength of evidence (SoE) for core illness symptoms and key adverse events (AEs) is summarized by comparison in the tables below.

We identified a large number of studies comparing individual FGAs with individual SGAs, with the majority of studies being efficacy trials.41 The decision to limit the scope of the review to only randomized controlled trials (RCTs) and non-RCTs with additional input from long-term cohort studies was based on a previous literature search and expert opinion. The technical expert panel felt that there were sufficient RCTs in this area to adequately address the questions. To supplement the RCTs (which are often short-term), it was decided that long-term cohort studies would also be included as these could provide data on long-term outcomes (e.g., mortality, tardive dyskinesia, diabetes mellitus, and metabolic syndrome) that are not usually available in the more rigorously designed RCTs. Overall, 113 studies provided data on 22 different comparisons for patients with schizophrenia or schizophrenia-related psychoses. Fewer studies provided evidence comparing antipsychotic drugs in patients with bipolar disorder (n = 11). One trial included patients with schizophrenia or bipolar disorder. The most frequent comparisons involved haloperidol, with 43 studies comparing haloperidol with risperidone and 37 studies comparing haloperidol with olanzapine. Nevertheless, the number of studies available for each comparison and outcome was often limited. Although many studies reported data for core illness symptoms, a total of 111 scales and subscales or composite outcomes were used across studies. The heterogeneity in outcome assessment tools and the small number of studies within specific comparisons precluded drawing firm conclusions that may be directly relevant to front-line clinical decisions. Further, the primary outcomes were most often for core illness symptoms and did not cover the full spectrum of outcomes that clinicians and patients need information about for medication decisionmaking (e.g., patient employment and functioning).

Outcomes potentially important to patients were rarely assessed in the studies, including health-related quality of life, social and occupational functioning, legal interactions, and certain symptoms, such as depression or anxiety. This limits the potential applicability to real-life functions and naturalistic outcomes. For future reviews, these important outcomes may be searched for in long-term observational studies, but it is unclear whether these types of outcomes are assessed in the research literature.

Data were provided primarily from randomized controlled trials (RCTs); however, in our quality assessment, most of the trials were found to have unclear risk of bias due to insufficient reporting of the methods to prevent selection bias (i.e., random sequence generation and allocation concealment) and performance bias (i.e., proper blinding of participants and study personnel). Across medications in different fields, inadequate randomization and inadequate allocation concealment have been associated with treatment effect estimates that are exaggerated by an average of 12 and 18 percent, respectively.164,165 Within this clinical field, trials that are single-blind or open-label have been found to favor the SGAs over FGAs;6 hence, results need to be interpreted in light of these limitations of the primary studies.

Despite our efforts to identify long-term safety data from observational studies, only two retrospective cohort studies provided data for a minimum 2-year followup period. In contrast, we included eight RCTs providing data over a followup period of 2 or more years. Short-term efficacy trials, which are accepted by the regulatory authorities, may not identify time-dependent AEs, such as tardive dyskinesia, diabetes mellitus, metabolic syndrome, or mortality. The optimal and minimal acceptable duration of followup in trials remains to be determined, but may arbitrarily be set at 2 years in order to capture important clinical and patient-related outcomes (e.g., occupational functioning measures and long-term safety). Even in long-term trials, researchers need to document and report these outcomes, because the evidence to date shows a gap in the literature with regard to them.

The majority of studies were industry-funded (n = 88; 70 percent), which can increase the chance of pro-industry findings.166 Full disclosure of the nature and extent of industry involvement in the design, conduct, and analysis of such studies can help readers better evaluate the likelihood of industry bias in trial results. Of further note, funding was not disclosed for 19 percent of studies (n = 24), highlighting the need for transparency in reporting the nature and extent of financial support. Industry bias in the studies included in this review may include the choice of medication comparisons, dosages and outcomes being driven by the funder’s interests and priorities. For instance, the largest volume of research within the report compared haloperidol with olanzapine or risperidone, whereas many other drugs have not been extensively examined.

The evidence is summarized by key question (KQ) in the sections that follow. Overall, there were few differences of clinical importance between the active drug comparisons. In general, few differences between FGAs and SGAs on symptom improvement were identified; however, we cannot assume that the drugs are equivalent. Rather, the analyses were often unable to detect differences because of the small number of trials for any given comparison and outcome. Moreover, most of the trials were designed as superiority trials testing the a priori hypothesis that SGAs are more efficacious than FGAs; hence, the individual trials, and some of the pooled analyses, may not have adequate power to confirm equivalence. FGAs generally had poorer safety profiles during the studies’ followup period.
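
The distinction between failing to detect a difference and demonstrating equivalence can be illustrated with a minimal Python sketch. It is not an analysis from this report: the mean differences, standard errors, and equivalence margin are hypothetical values chosen for illustration, and the normal-approximation confidence interval is a simplifying assumption.

```python
from statistics import NormalDist


def interpret_difference(diff, se, margin, level=0.95):
    """Classify a between-drug difference using a confidence interval.

    diff   -- hypothetical mean difference (e.g., on a symptom scale)
    se     -- hypothetical standard error of that difference
    margin -- hypothetical equivalence margin (largest difference
              considered clinically unimportant)
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95 percent
    lower, upper = diff - z * se, diff + z * se
    # A superiority test is "significant" only if the interval excludes 0.
    significant = lower > 0 or upper < 0
    # Equivalence is supported only if the whole interval lies within the margin.
    equivalent = -margin < lower and upper < margin
    return significant, equivalent


# Small trial: wide interval, so neither a difference nor equivalence is shown.
small_trial = interpret_difference(diff=1.0, se=2.0, margin=3.0)
# Large trial: narrow interval inside the margin, so equivalence is supported.
large_trial = interpret_difference(diff=0.5, se=0.5, margin=3.0)
print(small_trial, large_trial)
```

In this sketch both hypothetical trials are nonsignificant, but only the more precise one supports equivalence, which is why a finding of no difference from a few small trials should not be read as evidence that two drugs perform the same.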

Key Question 1. Core Illness Symptoms

The findings for core illness symptoms are presented for each condition in Table 106. Comparisons and outcomes for which there was insufficient strength of evidence (e.g., evidence from single trials) to draw a conclusion are not displayed in the tables. The evidence comparing individual FGAs and SGAs was insufficient to draw conclusions for the following comparisons: chlorpromazine versus olanzapine, quetiapine, and ziprasidone; fluphenazine versus olanzapine, quetiapine, and risperidone; haloperidol versus asenapine; perphenazine versus aripiprazole, olanzapine, quetiapine, risperidone, and ziprasidone; trifluoperazine versus clozapine.

Table 106. Summary of the strength of evidence for core illness symptoms (KQ1).

For schizophrenia or schizophrenia-related psychoses, seven studies provided data on core illness symptoms for chlorpromazine versus clozapine. No differences were found for positive, negative, or general psychopathology. Clozapine showed benefits for total score (moderate SoE).

Eight studies provided data on core illness symptoms for haloperidol versus aripiprazole. No differences were found for positive or general psychopathology, global ratings, or total symptom score. The SoE was low for positive outcomes, global ratings and total scores; the SoE was insufficient for general psychopathology. Aripiprazole showed benefits for negative symptoms (moderate SoE).

Eight studies provided data on core illness symptoms for haloperidol versus clozapine. No significant differences were found for positive symptoms, negative symptoms, or general psychopathology (low SoE). The findings were discordant for total symptom score: no difference was found based on the Brief Psychiatric Rating Scale (BPRS) and the Positive and Negative Syndrome Scale (PANSS) (low SoE), whereas one study showed benefits for clozapine on the Clinical Global Impression–Improvement (CGI–I) and Severity (CGI–S) scales (insufficient SoE).

Twenty-seven studies provided data on core illness symptoms for haloperidol versus olanzapine. No differences were found for positive symptoms (low SoE). Olanzapine was favored for negative symptoms (moderate SoE). In terms of general psychopathology, a significant benefit for olanzapine was found based on the Hamilton Rating Scale for Depression (HAM–D), Montgomery-Asberg Depression Rating Scale (MADRS), and Young Mania Rating Scale (YMRS). No differences were observed for the other five scales of general symptoms assessed. The SoE varied across outcomes from insufficient to moderate. Olanzapine was favored for global ratings and total symptom scores based on the CGI–S and PANSS; however, no differences were found for the other four scales assessed. The SoE for these outcomes also varied from insufficient to moderate.

Nine studies provided data on core illness symptoms for haloperidol versus quetiapine. No significant differences were found for positive, negative, or general psychopathology. A significant difference favoring haloperidol was found for one (CGI–S) of the five global ratings and total symptom scores assessed. The SoE across outcomes ranged from insufficient to moderate.

Thirty-one studies provided data on core illness symptoms for haloperidol versus risperidone. There were no differences for positive symptoms (low SoE). Risperidone was favored for negative symptoms based on the Scale for the Assessment of Negative Symptoms (SANS) and PANSS (negative) (moderate SoE). No differences were found for any of the six measures used to assess general psychopathology (low or insufficient SoE). Seven of the global ratings or total symptom scores showed no differences, whereas the Symptom Checklist (SCL–90–R) showed a benefit for risperidone (low or insufficient SoE).

Seven studies provided data on core illness symptoms for haloperidol versus ziprasidone. There were no significant differences in terms of negative symptoms, general psychopathology, global ratings, or total score (low or insufficient SoE). No studies provided data on positive symptoms.

A total of 11 studies examined patients with bipolar disorder. The most frequent comparison was haloperidol versus risperidone (four RCTs). No significant differences were found for mood (mania), mood (depression), positive or negative symptoms, or global ratings and total scores (low or insufficient SoE). Two studies compared haloperidol versus olanzapine and found no significant differences in sleep, mood (mania), mood (depression), global ratings, or total symptom scores (low or insufficient SoE). Two studies compared haloperidol with aripiprazole and found no differences in mood (mania), mood (depression), positive or negative symptoms, or global ratings and total symptom scores (low or insufficient SoE). Single studies compared chlorpromazine versus clozapine and haloperidol versus quetiapine and ziprasidone (insufficient SoE).

Key Question 2. Functional Outcomes and Health Care Resource Utilization

The findings for functional outcomes and health care system utilization are presented for each condition and comparison in Table 107. We did not assess the SoE for outcomes in KQ2.

Table 107. Summary of evidence for functional outcomes, health care system utilization, and other outcomes (KQ2).

Results for functional outcomes were available from 9 head-to-head comparisons in studies of patients with schizophrenia or schizophrenia-related psychoses. No significant differences in functional outcomes were observed between groups for most of the comparisons. However, in most cases evidence came from single studies. Results for health care system utilization were available for 10 head-to-head comparisons, and no differences were found for any comparison.

Only one trial comparing haloperidol with olanzapine provided data on functional outcomes in patients with bipolar disorder. Significant differences were found favoring olanzapine for the number of individuals actively working for pay. No differences were found for household or work activities impairment.

Key Question 3. Medication-Associated Adverse Events and Safety

The findings for the AEs that were deemed most clinically important are summarized in Table 108. The evidence comparing individual FGAs and SGAs was insufficient to draw conclusions for the following outcomes and comparisons: tardive dyskinesia (chlorpromazine vs. clozapine and ziprasidone; haloperidol vs. clozapine, olanzapine, quetiapine, and ziprasidone), mortality (chlorpromazine vs. clozapine and ziprasidone; haloperidol vs. risperidone; thioridazine vs. clozapine and risperidone), diabetes mellitus (haloperidol vs. olanzapine; perphenazine vs. olanzapine, quetiapine, risperidone, and ziprasidone), and metabolic syndrome (haloperidol vs. clozapine; perphenazine vs. olanzapine, quetiapine, risperidone, and ziprasidone).

Table 108. Summary of the strength of evidence for medication-associated adverse events and safety (KQ3).

Two trials each provided data on mortality for chlorpromazine versus clozapine and haloperidol versus aripiprazole; no significant differences were found, although the length of followup of the trials for the latter comparison was only 24 hours. For metabolic syndrome, two trials provided data for haloperidol versus olanzapine and showed no significant difference in incidence of metabolic syndrome. The SoE for these comparisons was low, suggesting that further research may change the results and change our confidence in the results.

Data were also recorded for general measures of adverse events (AEs) and specific AEs by physiological system (e.g., cardiovascular, endocrine); these outcomes were not assessed for SoE. For general measures of AEs, significant differences were found in the incidence of patients with AEs and withdrawals due to AEs for several comparisons. Most often, the comparison included haloperidol, and the risk was consistently higher for the FGA. The most frequently reported AEs with significant differences were in the category of extrapyramidal symptoms (EPS) and most often involved a comparison with haloperidol. In the vast majority of cases, the SGA had the preferred AE profile for EPS.

We were unable to adequately examine persistence and reversibility of AEs due to the relatively short followup of the included studies: study followup periods averaged 8 weeks. It is unclear whether the persistence and reversibility of several significant AEs (e.g., metabolic conditions, changes in body mass index or weight, and cardiovascular effects) could be reasonably examined during this time period.

Key Question 4. Other Outcomes

The findings for other outcomes are presented for each condition and comparison in Table 109. We did not assess the SoE for outcomes in KQ4.

Table 109. Summary of the evidence for other outcomes (KQ4).

Results for other outcomes were available for 19 head-to-head comparisons in studies of patients with schizophrenia or schizophrenia-related psychoses. Few significant differences were found across the comparisons and outcomes examined. For most significant findings, the SGA was preferred. The most commonly reported other outcome was response rate. A significant difference in response rates based on three studies was found favoring clozapine compared with chlorpromazine. Olanzapine was favored over haloperidol for remission (3 trials) and response rates (14 trials). Significant differences were found favoring aripiprazole over haloperidol for caregiver satisfaction (1 trial) and patient satisfaction (1 trial). Risperidone was favored over haloperidol for relapse rates (6 trials). Olanzapine was favored over perphenazine for time to all-cause medication discontinuation (1 trial). Health-related quality of life (HRQoL) was evaluated for the following comparisons, and no significant differences were found: haloperidol versus olanzapine, quetiapine, risperidone, and ziprasidone (1 trial each); olanzapine, quetiapine, risperidone, and ziprasidone (1 trial each). There was a significant difference in HRQoL favoring perphenazine over aripiprazole (1 trial).

Results for other outcomes were available for three head-to-head comparisons in studies of patients with bipolar disorder. Significant differences were found for health-related quality of life in one trial comparing haloperidol versus olanzapine: haloperidol was favored for the mental summary score, and olanzapine was favored for the physical summary score. One study showed a significant difference favoring haloperidol compared with ziprasidone for response rates.

Key Question 5. Subgroups

A total of 41 studies compared outcomes for predefined subgroups. Among the studies of patients with schizophrenia and schizophrenia-related psychoses, data were most often available for race and treatment resistance. The race most often examined was Asian. No notable differences were observed for the subgroups compared to the overall findings.

The only subgroup available for analysis in studies of patients with bipolar disorder was disorder subtype, specifically bipolar I and bipolar II. The results were consistent with the overall findings. A significant difference favored haloperidol compared with ziprasidone for core illness symptoms (YMRS) in patients with bipolar I disorder.

Results in the Context of Previous Literature

The results of this review are similar in some respects to another recent systematic review of SGAs versus FGAs, although the present review is broader in scope in terms of medications included, patient populations, and outcomes.6 There were a number of methodological differences between the previous review and this one; the previous review included non-FDA-approved antipsychotics, restricted the analysis to only double-blinded trials, included only studies examining optimum SGA dosage and oral route of administration, and pooled data across efficacy outcome measures. The differences in the methodologies may have led to slightly different conclusions regarding individual SGAs.

The previous review compared nine SGAs (six of which were included in this report) with FGAs for overall efficacy (total symptom scores), positive, negative, and depressive symptoms, relapse, quality of life, EPS, weight gain, and sedation. They reported that the FDA-approved SGAs clozapine, olanzapine, and risperidone fared better than FGAs in overall efficacy. In terms of global ratings and total symptom scores, we found that clozapine was more efficacious than chlorpromazine, but not compared with haloperidol. We found that olanzapine performed better than haloperidol on one of the three total symptom scores assessed. We found no differences between haloperidol and risperidone for the five total symptom scores reported. The previous review found that SGAs were not superior to FGAs for negative symptoms. We found no difference in negative symptoms for haloperidol versus clozapine; however, we found evidence that olanzapine was more efficacious than haloperidol for negative symptoms, whereas the evidence for risperidone compared with haloperidol was mixed. In general, the findings for AEs were consistent between reviews, showing poorer safety profiles with respect to EPS for FGAs (specifically haloperidol) and more weight gain among the SGAs (in particular, olanzapine and risperidone).

The general results of our review for schizophrenia are consistent with the results of two widely cited trials in this clinical field: Clinical Antipsychotic Trials of Intervention Effectiveness (CATIE)23 and Cost Utility of the Latest Antipsychotic drugs in Schizophrenia Study (CUtLASS).21 The CATIE trial was included in this review and was designed to evaluate whether FGAs were inferior to SGAs in efficacy and safety. Findings from the CATIE trial suggested that the FGA perphenazine and various SGAs (olanzapine, quetiapine, risperidone, and ziprasidone) differed more in their side-effect profiles than in their therapeutic effects. The study, like this review, also demonstrated that effectiveness across medications varied, and in some cases, this difference was clinically important. For example, in CATIE, clozapine was most effective for patients whose symptoms did not improve with first-line treatment, but it had a worse side-effect profile, and quetiapine (SGA) was more effective for patients who did not tolerate perphenazine (FGA). The CUtLASS trial was not included in our review as this trial compared FGAs and SGAs as classes rather than as individual medications. However, this study, like CATIE and our review, found no significant differences in effectiveness between 13 grouped FGAs and four grouped SGAs. This body of work suggests that there are no clear-cut advantages of either medication class, and that there is a range of medication choice (both FGA and SGA) for prescribing clinicians and their patients to consider during treatment initiation and maintenance. Establishing how to use existing medications safely and effectively to optimize patient outcomes, and determining the role and effect of long-term antipsychotic use (> 2 years), remain urgent needs both for the treatment of schizophrenia and for the use of antipsychotics in bipolar disorder.

In general, there were some differences between this review and the above-cited studies with respect to methods and scope. Our review was very broad in scope, including all patient populations and all FDA-approved medications, regardless of dose or route of administration. We also included an exhaustive list of outcomes. The extent of outcomes we examined was substantial; however, many outcomes were reported too sparsely to provide strong evidence. Moreover, one of the contributions of this comprehensive synthesis is that it highlights this problem of variable outcome selection across trials while providing extensive details to consider when making treatment choices on an individual basis. One of the unique features of our review was the SoE assessments. Although previous trials and reviews have reported some significant findings, our SoE assessments provide information on how confident we can be in those results and how likely the effects are to change with future research. In most cases, the SoE was insufficient or low, highlighting the likelihood that future research may change the estimates of effect and the need for a stronger evidence base to inform clinical practice.

Applicability

This report included studies that compared an individual FGA to an individual SGA. Placebo-controlled studies or studies comparing an FGA versus another FGA, or an SGA versus another SGA, were not included. Therefore, the evidence is focused on the comparative effectiveness of FGAs versus SGAs, but not on their effectiveness and safety compared to placebo or other active agents. Overall, there were 22 head-to-head comparisons across the relevant studies; however, within most comparisons there were few studies.

The focus of this review was adults, age 18 to 64 years, with schizophrenia or schizophrenia-related psychoses and bipolar disorder. The average age across studies ranged from 21 to 50 years (median = 37 years [interquartile range (IQR), 33 to 41]). Most studies were highly selective in patient enrollment and included patients who (1) met strict diagnostic criteria for case definition, (2) had few comorbidities, and (3) used few or no concomitant medications. Older adults and the most seriously ill patients were also underrepresented. Such highly selective criteria may increase the likelihood of drug benefit and decrease the likelihood of AE occurrence. Almost half the studies involved hospitalized patients (inpatient treatment) (62 of 125 studies) or mixed inpatient and outpatient populations (26 studies); relatively few studies examined only outpatient treatment populations (19 studies). As such we judge the results of this report to be applicable to patients in outpatient and inpatient treatment settings.

Another factor that restricts the applicability of results is the limited duration of followup. The limited long-term (≥2 years) followup data preclude detection of serious adverse events (SAEs) that may develop over the course of several years. The average length of followup in the included studies was only 8 weeks (IQR, 6 to 26 weeks). Further, a priori, we defined the following key AEs: diabetes mellitus, mortality, tardive dyskinesia, and major metabolic syndrome. In order to identify evidence for these important outcomes, we expanded our scope to search for and include cohort studies with a minimum 2-year duration. Despite a comprehensive search, we identified only two cohort studies meeting our criteria. This is an important limitation that needs to be considered when interpreting the results and applying them in clinical practice.

Limitations of Existing Evidence

Inconsistency in treatment comparisons, outcomes, outcome measurement, and patient populations across studies makes drawing firm clinical conclusions difficult. Few studies compared the same antipsychotic medications and dosage using similar measures; various scales and surrogate measures were used to assess efficacy for different outcomes and AEs. Consensus is needed regarding outcomes and measures used to assess outcomes. Surrogate outcome measures may have been attractive alternatives in studies given their ability to save time (e.g., shorter followup durations) and their ease of assessment. However, their main limitation may be the lack of correlation between the results from surrogate and clinically meaningful outcomes. This inconsistency can lead to recommendations of harmful medications or the exclusion of beneficial medications. Examples of surrogate outcome measures in this report include laboratory values to indicate treatment-emergent metabolic syndrome, a clinical outcome. Additionally, functional outcomes and symptomatic outcomes (e.g., sedation, restlessness) were rarely and unequally reported throughout the trial reports, even though these outcomes are often vital to patient compliance.

A key limitation and challenge in synthesizing and interpreting this body of evidence is the heterogeneous patient populations across and within studies, which is in part driven by the complex nature of these disorders and their course over time. The studies we included had very mixed populations with respect to disorder subtypes, comorbid drug or alcohol use, treatment resistance, and number of previous episodes. These variables may create differential response to treatment, and this has been the basis for recommendations around personalized medicine in this area.167 We conducted extensive subgroup and sensitivity analyses to explore these varying features. In many cases, the subgroup analyses were consistent with the overall estimates of effect. In cases where some differences were found, these were often based on small numbers of studies within the subgroups. In any case, the results of subgroup analyses should be interpreted as hypothesis generating rather than hypothesis confirming. Our findings may provide some information to make treatment decisions for individual patients, but need to be confirmed in future research. Moreover, treatment decisions and future research should take into consideration individual characteristics that can influence response to treatment, including needs, preferences, past treatment history and response to previous medications, and clinical factors such as family history of medical conditions (e.g., diabetes, hypertension). Patients should have access to a range of options that meet their differing needs and response patterns, as well as changes in these over time.

An additional limitation and challenge of synthesis in this area is that characteristics of the research may have changed over time, including drug doses and patient populations. For instance, relatively higher doses of haloperidol may have been used in earlier studies. Further, patients in more recent studies may not have been exposed to an FGA, whereas in earlier studies patients may have been exposed, and have become resistant, to an FGA. An additional problem is that patient response to treatment may vary depending on what medication they were taking prior to entering the study.23 Information on baseline medication was not often provided in the individual studies; this information should be collected and reported in future studies for clinicians who are reviewing study findings and should be acknowledged when considering treatment options for individual patients.

Another important limitation in this body of evidence pertains to the instruments used to measure outcomes. Over 100 different scales and subscales or composite outcomes were used to assess efficacy outcomes across the studies. Although some outcomes and scales were assessed fairly consistently for core symptoms across conditions, such as the PANSS and BPRS for schizophrenia, measurement of core symptoms using subscale scores, different criteria, and different measures was common. The CGI scores reported across the studies make study outcomes relevant for clinicians; however, the heterogeneity in the different types of scales used to measure global improvements makes comparisons of patient improvement across studies and interventions challenging.

We also identified a vast array of different measures to assess functional capacity. For instance, 80 different measures were used among studies comparing haloperidol with olanzapine. For most measures, only single trials provided data. This is problematic in that when significant differences are found, we are not able to discern whether they are real differences or arise due to multiple statistical testing. Discussion and consensus are also needed on outcomes that can provide more information on patient functioning and well-being. This includes a systematic assessment of outcomes potentially important to patients, such as health-related quality of life, social and occupational functioning, and legal interactions.
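
The multiple-testing concern can be made concrete with a short Python sketch. It assumes, purely for illustration, that each measure is tested independently at a significance level of 0.05 and that no true differences exist; neither assumption is taken from the included studies, whose outcome measures are likely correlated.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false-positive result among n_tests
    independent tests when no true difference exists (assumed setup)."""
    return 1 - (1 - alpha) ** n_tests


# 80 is the number of measures cited above for haloperidol versus olanzapine.
for n in (1, 10, 80):
    print(n, round(familywise_error_rate(n), 3))
```

Under these assumptions, testing 80 measures makes at least one spurious significant finding very likely (a probability of about 0.98), which is why isolated significant results from single trials should be read cautiously.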

An important limitation of this review and other systematic reviews is the design and quality of the primary included studies. The majority of studies providing data for this report were RCTs (n = 123); however, most were designed as superiority trials, often with an a priori hypothesis that the SGA would be more efficacious.25 The individual studies and, in many cases, the pooled results may not have had sufficient power to detect equivalence or noninferiority between drugs. These study designs are consistent with the CATIE superiority trial design, but given the well-regarded results of CATIE and the findings from this review and others,6 future trials need a sound rationale for choosing a superiority design over an equivalence or noninferiority design. On another note, we assessed risk of bias in RCTs using an empirically derived tool developed by The Cochrane Collaboration and assessed the methodological quality of cohort studies using the Newcastle-Ottawa Scale. All of the trials had an unclear risk of bias (n = 78; 63 percent) or high risk of bias (n = 45; 37 percent). Only 15 RCTs (12 percent) were evaluated as having adequately generated the allocation sequence, and 6 RCTs (5 percent) had an adequately concealed allocation process. The measures employed by the study investigators to ensure that the allocation sequence was random and that allocation occurred without foreknowledge of treatment assignments were unclear in the majority of the trials. These features should be routinely employed in order to avoid selection bias.

Only 17 percent of RCTs (n = 20) reported blinding study investigators and participants (26 percent had unclear reporting), which is another important limitation of this body of evidence as a lack of blinding has been shown to produce exaggerated treatment effects.6 Blinding through use of matched placebo tablets that appear and taste similar to the study medication may reduce the risk that the knowledge of which intervention was received, rather than the active drug itself, affected outcomes. Studies should also consistently ensure and report that outcome assessors are blinded to treatment allocation. Incomplete outcome data was a limitation in almost half of the trials (unclear risk of bias, 26 percent; high risk of bias, 20 percent) due to loss to followup and inadequate handling of missing data in the reporting and analysis, which may have exaggerated reported treatment effects. The majority of trials were free of selective reporting (97 percent) and other sources of bias (e.g., significant baseline imbalances between study groups) (84 percent).

Two cohort studies were included in this review, due to their focus on AEs (tardive dyskinesia and mortality rates). These studies were identified as being good quality cohorts, receiving a rating of 8 out of 9 points on the Newcastle-Ottawa Scale. However, these cohort studies are limited by their design; the lack of randomization for treatment allocation makes the results vulnerable to bias due to a lack of comparability between treatment groups.

With regard to bipolar disorder, none of the included studies was limited to individuals with bipolar depression; therefore, no conclusions can be made about the comparative effectiveness of interventions for this condition.

This comparative effectiveness review (CER) has several limitations. Only English-language studies were eligible for inclusion in the review; therefore, it is possible that relevant studies published in other languages may have affected the review findings. However, our findings are consistent with a similar review that included non-English studies.6 The scope of this report was limited to the direct comparison of individual FGAs with individual SGAs. Although this produces results that are internally valid, there is a risk that findings of no difference lead to false conclusions of equal efficacy by the reader. Future research incorporating indirect analyses through mixed treatment comparisons may add more strength to the evidence base. Further, we cannot make conclusions on the comparison of antipsychotics within the same drug class or with placebo. Therefore, without formal indirect quantitative or qualitative comparisons, no conclusions can be made as to the comparative efficacy of drugs in the same class. In addition, evidence on the use of other drug classes (e.g., anticonvulsants) that are frequently used in the treatment of these patient populations is not considered in this report. Finally, specific patient populations (e.g., patients with prior antipsychotic resistance) were under-studied in long-term trials, precluding firm conclusions relating to comparative effectiveness, response and remission rates, and side-effect profiles. The inclusion criteria of many studies were highly selective, primarily examining inpatients with no serious mental illness and who were not alcohol or substance users. This may not be generally reflective of patients with schizophrenia, as there is a high prevalence of comorbid disorders and alcohol or illicit drug use in this patient population.

This report presents a synthesis of the available evidence on the effectiveness and safety of antipsychotics in the adult population. However, we do not make clinical recommendations on the use of these medications, as this is the purview of the user group. We trust that the evidence presented in this report will be helpful in the further development of clinical practice guidelines in this field.

Future Research

This review identified a growing body of literature examining the effectiveness of FGAs and SGAs for treating schizophrenia and related psychoses. However, for many of the individual comparisons there were few trials. There is a need for consensus on the most important FGA and SGA comparisons. For many of the comparisons, the FGA was haloperidol. As haloperidol is known to have a poor AE profile, using this as the standard comparison may exaggerate the apparent safety profile of the SGA being compared. Consensus is needed on which comparisons will be the most informative and provide the most valid and accurate information to inform clinical decisions.

For treating bipolar disorder, more head-to-head trials are needed to compare the effectiveness of currently approved FGAs and SGAs. Given that antipsychotic medications are used to augment treatment with mood-stabilizing medications to ensure effective treatment of core illness symptoms for various forms of the disorder (e.g., acute mania, bipolar depression) and maintenance treatment, further research is necessary to better understand the impact of treatment on patient safety and function.

More longitudinal research is also needed on long-term AEs. Only two cohort studies were identified for this review that examined SAEs with long-term antipsychotic use; however, these studies only provided evidence for two SAEs: tardive dyskinesia and mortality rates. Studies examining the naturalistic and long-term efficacy and, particularly, the safety of antipsychotics over the course of several years and across a number of important AEs are required. Such studies could be modeled after longitudinal cohort studies in other fields, such as the Framingham study168 that has been ongoing for decades to examine risk factors for cardiovascular disease.

Short- and long-term evaluations of the effectiveness of FGAs and SGAs with patient subpopulations, including patients with medical and neurological comorbidities, are needed. Further, there is a need for studies investigating how drug dose, age, and other factors, such as comorbidities, influence the occurrence of SAEs, which would help estimate possible risks in specific patient populations.

Future studies should examine functional naturalistic outcomes that are important to patients. These outcomes include health-related quality of life and other patient-reported outcomes, relationships, academic and occupational performance, and legal interactions.

Conclusions

This report provides a comprehensive synthesis of the evidence on the comparative effectiveness and safety of individual FDA-approved FGAs compared with individual FDA-approved SGAs. The report provides extensive details in terms of study characteristics and methodological features, which may help inform individual treatment decisions. The focus of the report was adults age 18 to 64 years with schizophrenia, schizophrenia-related psychoses, and bipolar disorder. The vast majority of relevant studies involved patients with schizophrenia or schizophrenia-related psychoses. Studies most often involved haloperidol, which was compared most frequently with risperidone (43 studies) and olanzapine (37 studies). Numerous studies provided data on core illness symptoms; however, many different scales were used to assess outcomes, which limited the quantitative pooling of data. Few notable differences of clinical importance were identified. In the majority of cases where significant differences were observed, the SGA showed greater improvement in core illness symptoms. Further, the SoE was low or insufficient for most comparisons, suggesting that future research may change the results and change our confidence in the results.

Data on the relative effectiveness of individual FGAs and SGAs for functional outcomes, health care system utilization, and other outcomes were generally sparse. Numerous tasks and tests were used to assess functional capacity. In most cases, only single studies contributed to each measure. The variety of functional measures assessed across studies precluded firm conclusions regarding the overall effectiveness of individual drugs in terms of patient functioning. Few studies reported on health care system utilization or patient-important outcomes. Where health-related quality of life was assessed, no differences were found.

The scope of this report included cohort studies with a minimum followup of 2 years in order to identify AEs of most clinical importance, including diabetes mellitus, mortality, tardive dyskinesia, and major metabolic syndrome. Only two studies with long-term followup were identified; hence, evidence on these important AEs is limited and urgently needed. A variety of AEs associated with numerous physiological systems were reported. The AEs most often reported involved EPS, which occurred more frequently for FGAs, particularly haloperidol, than for SGAs.

The evidence for important subgroups was limited. The most frequently examined subgroups were race and treatment resistance. There were no notable differences in outcomes for these subgroups compared to the overall results.

Future research needs to incorporate design elements to minimize bias, in particular blinding of investigators, patients, and outcome assessors and adequate handling and reporting of missing data. Researchers need to ensure and report on appropriate methods for sequence generation and allocation concealment. Long-term longitudinal studies of at least 2-year duration are needed to detect important differences in the relative safety profile of individual FGAs and SGAs.

In summary, data on the comparative effectiveness of individual FGAs and SGAs precluded drawing firm conclusions for outcomes that are directly relevant to front-line clinical decisions. Overall, there were few statistically significant differences. Outcomes potentially important to patients were rarely assessed. Finally, data on long-term safety are lacking and urgently needed.
