Opioid‐specific medication‐assisted therapy and its impact on criminal justice and overdose outcomes

Abstract
Background: The overlap between justice system involvement and drug use is well‐documented. Justice‐involved people who misuse opioids are at high risk for relapse and criminal recidivism. Criminal justice policymakers consider opioid‐specific medication‐assisted therapies (MATs) one approach for improving outcomes for this population. More research is needed that explores the impacts of opioid‐specific MATs for justice‐involved people.
Objectives: This study sought to assess the effects of opioid‐specific MAT for reducing the frequency and likelihood of criminal justice and overdose outcomes for currently or formerly justice‐involved individuals.
Search Methods: Records were searched between May 7, 2021 and June 23, 2021. We searched a total of sixteen proprietary and open access databases that included access to gray literature and conference proceedings. The bibliographies of included studies and relevant reviews were also searched.
Selection Criteria: Studies were eligible for inclusion in the review if they: (a) assessed the effects of opioid‐specific MATs on individual‐level criminal justice or overdose outcomes; (b) included a currently or formerly justice‐involved sample; (c) used a randomized or strong quasi‐experimental design; and (d) were published in English between January 1, 1960 and October 31, 2020.
Data Collection and Analysis: We used the standard methodological procedures as expected by The Campbell Collaboration.
Main Results: Twenty studies were included, representing 30,119 participants. The overall risk of bias for the experimental studies ranged from “some” to “high” and for quasi‐experimental studies ranged from “moderate” to “serious.” As such, findings must be interpreted against the backdrop of less‐than‐ideal methodological contexts. Of the 20 included studies, 16 included outcomes that were meta‐analyzed using mean log odds ratios (which were reported as mean odds ratios). Mean effects were nonsignificant for reincarceration (odds ratio [OR] = 0.93 [0.68, 1.26], SE = 0.16), rearrest (OR = 1.47 [0.70, 3.07], SE = 0.38), and fatal overdose (OR = 0.82 [0.56, 1.21], SE = 0.20). For nonfatal overdose, the average effect was significant (OR = 0.41 [0.18, 0.91], SE = 0.41, p < 0.05), suggesting that those receiving MAT had nearly 60% reduced odds of a nonfatal overdose.
Implications for Policy, Practice, and Research: The current review supports some utility for adopting MAT for the treatment of justice‐involved people with opioid addiction; however, more studies that employ rigorous methodologies are needed. Researchers should work with agencies to improve adherence to medication regimens, strengthen study design, and collect more detailed information on participants, their criminal and substance use histories, onset, and severity. This would help clarify whether treatment and control groups are indeed comparable and provide better insight into the potential reasons for participant dropout, treatment failure, and the occurrence of recidivism or overdose. Outcomes should be assessed in multiple ways, if possible (e.g., self‐report and official record), as reliance on official data alone may undercount participants' degree of criminal involvement.

multiple ways across the full follow-up period, including from self-reported and official records.

| How up-to-date is this review?
The review authors searched for studies between May and June 2021.

The overlap between criminal justice system involvement and drug use is well-documented across a variety of countries and samples (e.g., Boutwell et al., 2007; Dolan et al., 2007; European Monitoring Centre on Drugs and Drug Addiction [EMCDDA], 2012; Winkelman et al., 2018). Once released from secure correctional facilities, people with opioid addiction are at high risk for relapse and criminal recidivism. Specifically, a meaningful minority of deaths of former inmates is attributable to opioid overdose (Binswanger et al., 2013; Singleton et al., 2016; World Health Organization [WHO], 2010), and a significant percentage of former inmates will recidivate within five years (Fazel & Wolf, 2015).
Criminal justice agencies have been particularly overwhelmed by the recent opioid epidemic. Treating opioid (and other substance) addiction as a means to reduce risk for future criminality and improve public safety is inherently a responsibility for the criminal justice system, as the influence of substance use on criminal activity is well documented in the literature (Bonta & Andrews, 2017). Of course, one could also argue that opioid addiction, its withdrawal symptoms, and its recovery constitute a serious medical condition for which criminal justice agencies have a responsibility to treat and manage, in accordance with the United Nations requirements for the Basic Principles for the Treatment of Prisoners (United Nations, 1990). In fulfilling their responsibility to provide adequate health care for individuals in their custody, and through their efforts to rehabilitate offenders to reduce their risk for recidivism more generally, correctional providers must treat substance use disorders. When they do, they may impact criminal recidivism as well as health outcomes like future overdose. As such, it is necessary to deploy the most effective treatment available to achieve maximum impact on these outcomes.
Policy recommendations (WHO, 2009) place emphasis on the use of medication-assisted treatments (MAT) as a front-line defense among correctional populations, because their efficacy and effectiveness have been well established in other contexts (Belenko et al., 2013; Koehler et al., 2014). Despite these policy recommendations, criminal justice agencies have been reluctant or slow to adopt MAT (Matusow et al., 2013; Parrino et al., 2015). Many factors may contribute to the poor uptake of this particular approach for managing and treating opioid addiction. It is possible that practitioners may question the utility of MAT for impacting public safety outcomes, the chief policy concern of the criminal justice system. Indeed, the uptake of psychological research evidence (particularly that which establishes a strong link between addiction and criminality) into correctional policy and practice has been slow, at best (Gannon & Ward, 2014). Moreover, there may be confusion or even hesitation among practitioners in correctional settings about their responsibility to encourage or administer an intervention that traditionally falls under the purview of health care providers.

| Description of the intervention
A variety of MAT drugs are currently used for the management and treatment of opioid addiction. This review focuses on the most modern and commonly used drugs for treating opioid addiction over the long term, in the form of supervised maintenance programs, drug substitution, or antagonist protocols. Thus, this review does not examine the effects of naloxone, which is used to revive someone during a single emergent opioid overdose event.
The drugs examined in the current review include opioid agonists (heroin, methadone, and levo-alpha-acetyl-methadol), partial agonists (buprenorphine), and antagonists (naltrexone). Opioid agonists are drugs that work on the opioid receptors in the brain and produce a full opioid effect. Heroin and methadone maintenance MAT services must be administered under the supervision of medical professionals in a highly controlled environment and on a regimented schedule.
This approach is designed to help reduce illicit or off-label use of opioids, cravings, and, gradually, the amount of opioid intake over time. Partial agonists like buprenorphine also operate on the opioid receptors but produce weaker euphoric effects than felt with full agonists. This class of MATs is also designed to help lower dependency symptoms, misuse, cravings, and symptoms of withdrawal.
Buprenorphine is a longer-acting agent, so it can be administered less frequently and has approval to be administered in a variety of clinical settings. Opioid antagonists like naltrexone block the opioid receptors entirely, so that if a patient uses an opioid, they will not be able to achieve any euphoric effects. Naltrexone is designed to relieve withdrawal and cravings and must be administered by a doctor, nurse, or nurse practitioner.

| How the intervention might work
The impact of opioid-specific MAT on overdose outcomes is well established. MAT reduces cravings, illicit drug use, and the amount of opioid use over time (Belenko et al., 2013; Koehler et al., 2014). All of these, in turn, aid in the reduction of overdose outcomes. The mechanisms by which opioid-specific MAT impacts criminal justice outcomes are less understood. However, prior research on substance use treatment in general and correctional rehabilitation theory suggest MAT could reduce criminal risk. As substance use is a robust predictor of criminal involvement, reducing substance use may reduce future criminal involvement (see Bonta & Andrews, 2017). By extension, any intervention targeting addiction, including MAT, may operate to reduce recidivism risk. More specifically, because MAT facilitates reductions in risky drug use, opioid users may engage less frequently or not at all in drug-related behaviors that warrant a criminal justice response (i.e., drug use, possession, trafficking, paraphernalia possession). Similarly, by reducing cravings and use, people may no longer be motivated to engage in criminal activity that supports, or fuels, their addiction (e.g., burglary).

| Why it is important to do this review
The current evidence base for MAT on overdose outcomes is strong, but little is known about its impact on the subsample of people with an opioid addiction who also are involved with the criminal justice system.
Because these individuals face challenges posed by addiction and criminal justice involvement, they likely have different experiences, needs, and risks than people not facing this combination of challenges.
Thus, it is necessary to identify whether the same positive clinical outcomes seen among non-offender or mixed groups can be observed among people with current or prior criminal justice involvement. Further, although addressing substance use should reduce criminal risk (Bonta & Andrews, 2017), it is unclear if it is enough to reduce recidivism for people with a serious opioid addiction. A rigorous and systematic synthesis of the evidence base on the effectiveness of MAT for improving public safety will allow criminal justice agencies to make informed decisions about policy, practice, and the allocation of resources. In light of the range of MAT options currently available and the pressing need for methodologically robust results and changes to the underlying legal and public health landscape, an updated and complete review is particularly policy relevant today. This systematic review is an update and modification of a 2009 Campbell Systematic Review entitled "Effects of Drug Substitution Programs on Offending among Drug-Addicts" (Egli et al., 2009). Although the authors of that review reported the intent to publish an update every five years, no update has yet been published. To the current authors' knowledge, an update is also not currently in progress or planned. As ten years of research have amassed on this topic, particularly during the height of the "opioid epidemic" and with the application of newer MAT therapies in opioid treatment (e.g., naltrexone), it is necessary to update the 2009 review. Further, this review is more comprehensive, because it includes both criminal justice and overdose outcomes observed among exclusive criminal justice samples and incorporates studies examining a variety of pharmacological interventions for opioid use.

| Objectives
The current review provides criminal justice and substance use treatment decision-makers with information regarding the efficacy and effectiveness of opioid-specific medication-assisted therapies (MAT) on offending and overdose outcomes. Specifically, the authors address the following objectives:
1. To assess the effects of opioid-specific MATs for reducing the frequency or likelihood of criminal justice outcomes (as defined by official or self-reported indices of offending, arrest, conviction, or incarceration) for individuals currently or previously involved in the criminal justice system; and
2. To assess the effects of opioid-specific MATs for reducing the frequency or likelihood of opioid overdose for individuals currently or previously involved in the criminal justice system.
The objectives will help to inform criminal justice and substance use treatment policymakers on the usefulness of opioid-specific MATs in reducing criminal justice outcomes in criminal justice settings, or overdose outcomes in treatment settings with the criminal justice population. Implementing or maintaining an opioid-specific MAT program is a practical decision that requires the use of program resources, and therefore it is important that program leadership have a foundational understanding of the intervention and its established efficacy and effectiveness.

| METHODS
The current review is an update and expansion of a Campbell Collaboration publication "Effects of Drug Substitution Programs on Offending among Drug-Addicts" (Egli et al., 2009). The associated protocol can be found at https://onlinelibrary.wiley.com/doi/full/10.1002/cl2.1138 (Strange et al., 2021).

| Types of studies
To be eligible for inclusion in this review, studies were required to use a strong quasi-experimental or randomized experimental design that prospectively tests the effects of the MAT for opioid use disorder on criminal justice and overdose outcomes. Due to the difficulty of conducting randomized controlled trials (RCTs) in criminal justice settings, it is necessary to examine quasi-experimental studies that employ more rigorous design features. Specifically, all quasi-experimental studies were required to either use a matching procedure when testing differences in the treatment and comparison groups or use statistical controls for baseline group differences (if observed). This was necessary to ensure equivalent comparison groups. All studies were required to use an individual level unit of analysis.
3.1.2 | Types of participants
Study samples were required to consist of opioid-using adults and adolescents who are male, female, or nonbinary, and racially/ethnically diverse. All participants had to have current opioid use as indicated by self-report or diagnosis; participants were not required to have an opioid-specific substance use disorder (OUD) but were likely to, given that opioid-specific MAT is typically administered to people with a known diagnosis of OUD. Additionally, all participants in the study samples had to have current or prior criminal justice involvement, as indicated by self-report or official report of prior or current arrest, incarceration, charges, or convictions. This review did not include studies in which the sample had no current or prior criminal justice involvement, was mixed with respect to criminal justice involvement, or for which this information was missing from the manuscript.

| Types of interventions
In contrast to the original review, which included MAT for other illicit substance use (e.g., cocaine), this review focused solely on MAT for opioid use disorder. Specifically, this review included studies that tested the impacts of heroin and methadone maintenance, buprenorphine, levo-alpha-acetyl-methadol, and/or naltrexone as the independent variable. The comparison and control group for the quasi-experimental and experimental designs, respectively, could be any intervention that was not an opioid-specific MAT: an alternative medication not specifically intended for opioid use treatment (e.g., an anti-depressant), "talk therapy" (i.e., any individual or group counseling, using any theoretical model or approach; e.g., cognitive behavioral therapy, group processing, psychotherapy), no intervention, forced detoxification, wait list control, or a placebo. Additionally, the review allowed for comparison of two opioid-specific MAT conditions (e.g., methadone vs. buprenorphine), as well as combined MAT + talk therapy versus a comparison condition fitting the above criteria. We did not impose restrictions on the number of treatment versus control conditions, but because biomedical or pharmaceutical research with criminal justice populations can be logistically challenging, we did not anticipate many studies with multiple conditions. The current review included studies of opioid-specific MAT meeting the inclusion criteria, regardless of where it was administered or delivered (e.g., community, court, institutional). This was an expansion upon the original review, which excluded incarceration-based treatment programs.

| Types of outcomes
The primary dependent variable, criminal justice involvement, was determined through self-report or official record, and could include any of the following outcomes: reconviction, rearrest, reincarceration, or reoffending. Outcomes could be in the form of failure proportions, mean frequencies, or survival rates.
The secondary dependent variable examined was opioid overdose, which could also be determined through self-report (nonfatal overdose) or official record (nonfatal and fatal overdose). Nonfatal outcomes could be in the form of mean frequencies, and both overdose outcomes could be in the form of failure proportions or survival rates.

The studies for the current review were accessed on the following platforms (via access from the University of Cincinnati), followed by the specific databases and dates of coverage in parentheticals: EBSCOhost (Criminal Justice Abstracts [1910-present], SocINDEX with Full Text [1895-present], Legal Collection [1965-present], Wilson Omnifile [1980, PsycINFO [1872-present], Social Work Abstracts [1965-present]

| Searching other sources
Furthermore, and similar to Egli et al. (2009), the authors of the current review consulted the bibliographies of other relevant reviews for additional studies to include, as well as the bibliographies of the included studies. Irrespective of electronic availability, the authors contacted university libraries and first/corresponding authors to retrieve all articles that appeared to meet the criteria for inclusion.
For the current review, search terms were harvested according to their demonstrated success in drawing out relevant and complete results for studies regarding the effectiveness of opioid-specific MAT. This method was adapted from the rigorous strategies often employed in systematic reviews from the medical field. First, 10 "gold-standard" articles were selected from the Egli et al. (2009) review (i.e., those studies that best reflected the type of studies desired for the current review, both methodologically and in subject matter). These articles were entered into the PubMed database, where a Medical Subject Heading (MeSH) analysis generated a list of common terms across all ten gold-standard articles. The author team identified relevant terms from the MeSH analysis and then brainstormed potential variants of each term and Boolean operators (including variants of the terminology, spelling, use of quotations, and truncations) to determine the version of each term that was most likely to draw complete and relevant results.
Each term was tested using the Criminal Justice Abstracts database for its breadth of subject matter. From this process two core search strings were created, each with the same general base terms, but unique outcome measure(s) (i.e., the specified criminal justice or overdose outcomes). Search strings were created such that studies were retrieved if they contained any of the base terms, and the outcome. Some search strings were modified due to database functionality. All final search strings are listed by platform and database in Supporting Information Appendix 1. Results from all source types were considered in the initial phase of the search (e.g., newspapers, journals, letters, conference abstracts) unless otherwise indicated in Supporting Information Appendix 1 (due to issues with volume and relevance of results).
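The structure of the two core search strings described above (shared base terms, a distinct set of outcome terms, and retrieval when any base term co-occurs with an outcome term) can be sketched as follows. This is a minimal illustration only: the function name and every term listed are hypothetical placeholders, and the actual strings used in the review are those given in Supporting Information Appendix 1.

```python
def build_search_string(base_terms, outcome_terms):
    """Combine terms as (any base term) AND (any outcome term)."""
    def quote(term):
        # Quote multi-word phrases so a database treats them as exact phrases.
        return f'"{term}"' if " " in term else term

    base = " OR ".join(quote(t) for t in base_terms)
    outcome = " OR ".join(quote(t) for t in outcome_terms)
    return f"({base}) AND ({outcome})"


# Hypothetical terms, for illustration only (not the review's actual terms).
base_terms = ["methadone", "buprenorphine", "naltrexone",
              "medication assisted treatment"]
cj_outcomes = ["recidiv*", "rearrest*", "reincarcerat*"]
overdose_outcomes = ["overdos*"]

# Two strings: same base terms, different outcome measures.
cj_string = build_search_string(base_terms, cj_outcomes)
overdose_string = build_search_string(base_terms, overdose_outcomes)
print(cj_string)
print(overdose_string)
```

Truncation symbols and quoting rules differ across databases, which is consistent with the note above that some strings had to be modified for database functionality.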

| Description of methods used in primary research
The studies included in the current review employed an experimental or strong quasi-experimental design and measured the impacts of the specified MATs on individual-level criminal and overdose outcomes for people with opioid use problems who are currently or previously justice system-involved.
In the quasi-experimental and experimental studies, the treatment group could have received an opioid-specific MAT (e.g., buprenorphine, naltrexone, methadone maintenance, heroin maintenance, levo-alpha-acetyl-methadol), and the control group could have received a different type of opioid-specific MAT (e.g., methadone compared to buprenorphine), a placebo, some sort of alternative medication not specific to opioid addiction, talk therapy (e.g., individual or group counseling), or no treatment at all. Additionally, the treatment condition could also have been a MAT + talk therapy treatment. Coders attempted to subclassify all talk therapy interventions into cognitive-behavioral (CBT) versus other, since cognitive-behavioral therapies traditionally produce greater effects and have a larger evidence base than other approaches in the treatment of substance use disorder (McHugh et al., 2010). However, the descriptions of the psychosocial/talk therapy interventions in the original articles lacked the specificity needed to support this level of detail in coding.

| Criteria for determination of independent findings
Egli et al. (2009) discussed three potential avenues for the nonindependence of findings: (1) multiple indicators of offending reported from a single study (e.g., arrest, criminal offending); (2) the same outcome measured at multiple points in time; and (3) the same data being reported across multiple studies. The criteria for the determination of independent findings are the same for the current review as is standard in Campbell Review protocols (see e.g., Lipsey & Landenberger, 2006).
[...] and (4) reoffending. Upon completion of coding, only two of these outcomes were consistently measured across multiple studies, lending themselves to meta-analysis: reincarceration and rearrest. Studies that reported the other criminal justice outcomes that were not included in the meta-analysis are discussed in narrative format (e.g., Bellin et al., 1999) so as not to preclude them from contributing information to the review.
Given that only four studies reported both reincarceration and rearrest outcomes, and because these are unique outcomes that are often correlated in the literature but not necessarily interdependent, these four studies are meta-analyzed in each criminal justice outcome. Importantly, no one study is represented twice within an analysis. Following a similar logic, nonfatal and fatal overdoses are meta-analyzed separately and include studies that report both outcomes. For studies that reported outcomes at multiple points, the outcome with the longest follow-up or with the follow-up most similar to that used across the other studies was coded, typically six or 12 months. This was done to encourage as much comparability as possible given the unique methods employed across some studies (Lipsey & Landenberger, 2006).
In the event that multiple publications reported results using the same set of data, the study with the most complete and detailed outcome information was used as the primary coding source. Following the study coding protocol, coders also referenced published study protocols (e.g., clinical trial registrations) and affiliated publications to ensure accurate and complete coding of study methodologies and findings. A list of all reports of the included studies (i.e., study "families") can be found in Supporting Information Appendix 2.
In addition to the above avenues for the potential nonindependence of findings, it is also possible that multi-arm studies will include more than one eligible comparator condition. For these studies the authors combined MAT and comparator conditions so that only a single pairwise comparison was computed. This is in line with recommendations from Higgins, Eldridge, et al. (2019) and prevents an intervention group from being double counted.
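For binary outcomes, one common way to collapse several eligible arms into a single group is to sum the event counts and the sample sizes across those arms before computing the pairwise comparison. The sketch below assumes that approach; the `Arm` class, the `combine_arms` helper, and all counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    label: str
    events: int  # participants with the outcome (e.g., reincarcerated)
    total: int   # participants analyzed in this arm


def combine_arms(label, arms):
    """Pool several arms into one group by summing events and sample sizes."""
    return Arm(label,
               sum(a.events for a in arms),
               sum(a.total for a in arms))


# Hypothetical counts for a three-arm study with two MAT arms.
mat_a = Arm("MAT arm A", events=12, total=50)
mat_b = Arm("MAT arm B", events=9, total=48)
control = Arm("treatment as usual", events=15, total=52)

mat_combined = combine_arms("MAT (combined)", [mat_a, mat_b])
print(mat_combined, control)
```

Each participant then contributes to exactly one side of a single comparison, so no group is counted twice.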

| Selection of studies
Once a full set of potentially relevant citations was identified, the authors received assistance from a Campbell Collaboration representative to de-duplicate the results using EndNote®. After de-duplication, all remaining citations were uploaded to DistillerSR® systematic review software. Three members of the author team and six students trained by the study authors independently reviewed all potentially relevant studies against the inclusion criteria. All studies were screened in two phases. In the first phase, the titles and abstracts were reviewed to determine if basic inclusion criteria appeared to be met, that is, (a) the experimental or strong quasi-experimental evaluation of the effectiveness of MAT services (b) on criminal or overdose outcomes (c) for people with opioid use disorder (d) who are or have been involved in the criminal justice system. Studies meeting these criteria, or any study for which this information could not be readily determined from the title or abstract, were retained for screening in Phase 2.
In Phase 2, the full text of each study was reviewed by the second author. All studies with inappropriate design and/or rigor, irrelevant independent or dependent variables, and ineligible sample characteristics were removed from consideration for inclusion. Reviews and meta-analyses were also removed from inclusion but flagged so that the coding team could later review their reference lists for studies that should be included in the current review but were not identified through the initial search. As a check to ensure relevant studies were not mistakenly excluded at either phase, the "Check for Screening Errors" function in the DistillerSR® software was employed for all excluded studies that went through both phases of review and, separately, for all studies excluded at Phase 1. This software feature uses machine learning to identify potentially misclassified studies based upon characteristics of the studies included. The second author re-reviewed in detail 382 total citations identified by the software and added back in 14 citations mistakenly excluded at earlier phases of review. Figure 1 contains the PRISMA flow chart.

| Data extraction and management
The Egli et al. (2009) team created a coding protocol for the original review that provided a systematic method of extracting information regarding each study's research design, program, nature of the outcome measures, and outcome data. This protocol was made available to the study authors to promote consistency in the coding procedures. The current study team updated the protocol to reflect the changes from the original to the current review. The updated coding protocol included the systematic extraction of information regarding the study identification, content and methodological inclusion criteria and rigor, control and treatment sample descriptive information, actions taken upon the control and treatment samples, treatment characteristics, the types and measurement of outcome data, and effect size information (see Supporting Information Appendix 3 for the updated coding protocol).
A team of eight Ph.D. and doctoral-level coders was trained on the updated coding protocol, and two coders coded each study independently. If discrepancies were observed, a third coder not originally assigned to that study resolved the discrepancy. Discrepancies between the coders were quite rare and were often the result of coding information in the wrong place, as opposed to coding the information incorrectly.

| Assessment of risk-of-bias for included studies
We used two tools to assess risk-of-bias in our included studies: (1) The Revised Cochrane risk-of-bias tool for randomized trials [...] be expressed only about issues that are likely to affect the ability to draw reliable conclusions from the study" (Higgins, Savović, et al., 2019, p. 4). Coders follow the RoB manual to answer specific "signaling questions" posed in each domain rated. Coders follow the instructions based on the answers to these signaling questions to yield specific overall judgments of risk in each domain.
The manual provides specific definitions of low, some, and high risk for each domain, which further helps coders to reliably rate the domain. The overall risk of bias for the whole study must be at least the level of the domain rated as highest risk (e.g., if one domain is rated as some concern [two] and all others are rated as low risk [one], the study must be rated as some concern [two]).
Additionally, if multiple domains are rated as some concern [two], but none are rated as high concern individually, the study's overall rating could be either some [two] or high [three] risk, depending on the extent of the issues within and across domains.
[...] (4) Bias due to deviations from intended interventions; (5) Bias due to missing data; (6) Bias in measurement of the outcomes; and (7) Bias in the selection of reported results. Each domain is scored on a 1-5 scale, where 1 = low risk of bias ("comparable to a well-performed randomized trial with regard to this domain"), 2 = moderate risk of bias ("study is sound for a non-randomized study with regard to this domain but cannot be considered comparable to a well-performed randomized trial"), 3 = serious risk of bias ("the study has some important problems in this domain"), 4 = critical risk of bias ("the study is too problematic in this domain to provide any useful evidence on effects of the intervention and should not be included in any synthesis"), and 5 = no information ("no information on which to base a judgment about risk of bias for this domain"). Like the RoB, the ROBINS-I tool provides signaling questions and detailed guidance to coders on how to render an overall rating of risk of bias for each domain. Also like the RoB, the overall rating for the study must be at least equal to the highest score (one through four) in any one domain.
Studies can be rated with an overall risk of five (no information) only if one or more domains is rated as no information [five] and the other domains are rated as low risk [one].
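The ROBINS-I aggregation rule described above can be expressed as a short function. This is a sketch of the stated rule only, using the 1-5 numeric coding given in the text; the function name is hypothetical, and the returned value is the minimum permissible overall rating, since coders may judge a study to warrant a higher rating when problems accumulate across domains.

```python
NO_INFORMATION = 5  # numeric code used in the text for "no information"


def robins_i_minimum_overall(domain_scores):
    """Lowest overall rating permitted by the rule described in the text.

    domain_scores: one score per ROBINS-I domain, coded 1-4 for
    low/moderate/serious/critical risk and 5 for no information.
    """
    rated = [s for s in domain_scores if s != NO_INFORMATION]
    has_no_information = len(rated) < len(domain_scores)
    # "No information" overall only if every other domain is low risk.
    if has_no_information and all(s == 1 for s in rated):
        return NO_INFORMATION
    # Otherwise the overall rating is at least the worst rated domain.
    return max(rated)


# Hypothetical domain ratings for three studies (seven domains each).
print(robins_i_minimum_overall([1, 2, 3, 1, 1, 2, 1]))
print(robins_i_minimum_overall([1, 1, 5, 1, 1, 1, 1]))
print(robins_i_minimum_overall([1, 2, 5, 1, 1, 1, 1]))
```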
Two of the study authors read both the RoB and the ROBINS-I manuals and referenced them frequently while completing the ratings of risk of bias for each study. The two coders independently rated each domain for each study first. Scores were then compared and the coders discussed any domains for which there was disagreement and arrived at an overall risk of bias rating for each study, consistent with each tool's scoring guidelines. The percent agreement in initial coding (i.e., pre-discussion/consensus), plus the overall judgment of risk of bias for each study, and the rationale supporting these

| Measures of treatment effects
The statistical procedures and conventions align closely with those that were used in the Egli et al. (2009) review, as the types of studies and outcomes that were included are similar. The most detailed numerical data were coded to facilitate similar analyses across the included studies. For binary offending outcomes (e.g., arrest, conviction, incarceration, and criminal involvement) and overdose outcomes (fatal and nonfatal), odds ratios were computed for the individual studies and mean logged odds ratios were used in the meta-analyses. We exponentiated and inverted the mean logged odds ratios and reported these in the tables, forest plots, and text to show a positive mean treatment effect (i.e., an odds ratio [OR] < 1 indicates a reduction in the outcome). Continuous or quasi-continuous measures of these outcomes (e.g., average number of arrests) were rarely and inconsistently reported across studies and therefore were not meta-analyzed.
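As an illustration of the effect size metric, the sketch below computes a log odds ratio and its standard error from a 2x2 table and back-transforms it to an odds ratio with a 95% confidence interval. The function names and the example counts are hypothetical; the sketch assumes non-zero cells and codes the outcome so that OR < 1 already indicates fewer events in the treatment group. It also assumes that the standard errors reported in the abstract are on the log odds scale.

```python
import math


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its standard error from a 2x2 table.

    Assumes every cell is non-zero; no continuity correction is applied.
    """
    a, b = events_t, n_t - events_t  # treatment: outcome / no outcome
    c, d = events_c, n_c - events_c  # comparison: outcome / no outcome
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def odds_ratio_ci(log_or, se, z=1.96):
    """Back-transform a log odds ratio to an OR with a 95% interval."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical counts: 10 of 100 treated vs. 20 of 100 comparison cases.
log_or, se = log_odds_ratio(10, 100, 20, 100)
print(odds_ratio_ci(log_or, se))

# Consistency check on the reported nonfatal overdose result
# (OR = 0.41, SE = 0.41), assuming the SE is on the log odds scale.
or_, lower, upper = odds_ratio_ci(math.log(0.41), 0.41)
print(round(lower, 2), round(upper, 2), f"{(1 - or_) * 100:.0f}% lower odds")
```

Under that assumption, the back-transformed interval agrees with the reported [0.18, 0.91] to within rounding of the published OR and SE, and an OR of 0.41 corresponds to the "nearly 60%" reduction in odds described in the abstract.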

| Dealing with missing data
One study (Bellin et al., 1999) did not report the necessary data to allow its inclusion in the meta-analysis. Despite successful contact and correspondence with the lead author, these data were no longer available or on record and, as such, the study could not be included in the meta-analysis.

| Assessment of heterogeneity
To assess heterogeneity, we used the homogeneity Q test. A p-value of 0.10 was set as the cut off for significance as higher quality studies are likely to have smaller sample sizes, which may reduce the statistical power of the Q test and increase the likelihood of a type II error.
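A minimal sketch of the homogeneity Q test with the 0.10 cutoff follows. It uses only the standard library, so the chi-square upper-tail probability is computed from the closed-form series for integer degrees of freedom; the function names and example values are assumptions for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: complementary error function plus a finite sum.
    m = (df - 1) // 2
    tail = sum(y ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def q_test(effects, variances, alpha=0.10):
    """Return (Q, df, p, heterogeneous) for inverse-variance weighted effects."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two effects with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    p = chi2_sf(q, df)
    return q, df, p, p < alpha
```

Setting `alpha=0.10` rather than 0.05 makes the test more willing to flag heterogeneity, which offsets its low power when studies are few or small.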

| Data and analysis
The current review complies with the standards of meta-analysis as specified in Practical Meta-Analysis by Lipsey and Wilson (2001). The two types of included studies (RCTs and quasiexperiments) were meta-analyzed together using SPSS v.28 (IBM Corp., 2021). As stated above, in multi-arm studies (i.e., in which there was more than one eligible comparator condition) the authors combined the intervention and comparator conditions so that only a single pairwise comparison was computed. This prevents an intervention group from being "double counted," which would introduce a unit-of-analysis error (Higgins, Savović, et al., 2019).
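Collapsing several arms into one condition amounts to summing their event counts and sample sizes before computing a single pairwise effect size. The sketch below shows this with hypothetical counts and a hypothetical function name.

```python
def combine_arms(arms):
    """Sum (events, n) pairs from several study arms into one condition."""
    events = sum(e for e, _ in arms)
    total = sum(n for _, n in arms)
    if events > total:
        raise ValueError("events cannot exceed participants")
    return events, total


# Hypothetical example: two treatment arms merged into one MAT condition.
mat_events, mat_n = combine_arms([(10, 50), (15, 60)])
```

Each participant then contributes to exactly one comparison, which is what avoids the double counting described above.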
Combining arms affected six studies in total, for which two of the authors independently classified the comparator arms into either treatment or control groups based on both the similarity of the intervention received and the timing of the condition. The two raters had 100% agreement on these classifications. Study arms were combined in the following manner: (1) in Farabee et al. (2020), the naltrexone and naltrexone + patient navigation groups were combined into one MAT condition; (2)
To compute mean effect sizes (i.e., log odds ratios), the inverse variance weighting method of meta-analysis was used (Lipsey & Wilson, 2001), and random effects models were assumed a priori.
Fixed effects models were conducted first to examine any heterogeneity of effects due to sample size. There was no significant evidence of funnel plot asymmetry and, as such, results from random effects models are reported. No effect size outliers were observed, and no data were imputed for missing values.
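To make the inverse variance weighting concrete, here is a minimal sketch of a random effects mean log odds ratio. It assumes the DerSimonian-Laird method-of-moments estimator for the between-study variance, which may differ from the estimator used in the SPSS analyses; the function name and inputs are hypothetical.

```python
import math


def random_effects_pool(effects, variances):
    """Return (mean, standard error, tau^2) under a random effects model."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed_mean = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed_mean) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1

    # Method-of-moments between-study variance, truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mean, se, tau2
```

When the estimated between-study variance is zero, the random effects weights reduce to the fixed effects weights and the two models give the same mean.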

Subgroups
The potential moderators of MAT effectiveness on criminal justice and overdose outcomes determined a priori included: (1) study design elements (i.e., experimental, quasi-experimental, follow-up period); and (2) treatment elements (e.g., supplementing MAT with individual or group CBT or non-CBT counseling, medication dosage and adherence, treatment length). Secondary a priori potential moderators included gender, race, and age of the sample, location/context of treatment (e.g., jail, prison, community, court), and era (i.e., before or during the opioid epidemic). Upon completion of coding, we were only able to empirically examine the effects of study design type (experimental vs. quasi-experimental) on the outcome. The other potential moderators were either missing, inconsistently reported across the studies, or there was not enough variability across the studies that reported the variable and measured it consistently.

| Sensitivity analysis
Relative to the quasi-experimental studies, the experimental studies employed smaller sample sizes and, in many cases, were underpowered. As such, we assessed the impacts of sample size using meta-regression, in which a covariate for sample size was included in the initial models. This was nonsignificant in the analyses of all four outcomes (reincarceration, rearrest, fatal overdose, and nonfatal overdose).
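A sample size covariate can be tested with a weighted regression of the log odds ratios on sample size. The sketch below is a simplified fixed effects version with a single covariate and a z-test for the slope; it is an assumption about form only and does not reproduce the models actually fitted, and all names are hypothetical.

```python
import math


def meta_regression_slope(effects, variances, covariate):
    """Inverse-variance weighted slope of effect size on one covariate.

    Returns (slope, standard error, z, two-sided p). Assumes fixed effects
    weights; a mixed effects model would add tau^2 to each variance.
    """
    if not (len(effects) == len(variances) == len(covariate)) or len(effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, covariate)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, effects)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, covariate))
    if sxx == 0:
        raise ValueError("covariate has no variability")
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, covariate, effects))
    slope = sxy / sxx
    se = math.sqrt(1.0 / sxx)  # weights are treated as known inverse variances
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return slope, se, z, p
```

A nonsignificant slope, as reported for all four outcomes, indicates that effect sizes did not vary systematically with sample size.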

| Included studies
Of the 20 studies included in this review, 18 were peer reviewed publications, one was a student thesis, and one was an agency report.
There were six quasi-experimental studies and fourteen experimental studies. All quasi-experimental studies examined the effectiveness of methadone, though one study (Marsden et al., 2017) had participants who were "MAT-exposed" and could have had methadone or buprenorphine. Among the experimental studies, most assessed the effectiveness of methadone (n = 7), or naltrexone (n = 6), followed by buprenorphine (n = 2), and LAAM (n = 1). Figure 1 shows the study selection process and Table 1 shows the characteristics of the included studies (Supporting Information Appendix 2 is a supplement to Table 1, which contains all reports, that is, "study families," for each included study).
About two-thirds of included studies (n = 13) had only one comparator condition, or if they had more than one, only one comparator condition was eligible for inclusion in this review. The remaining seven studies had two comparator conditions or only two comparator conditions that were eligible for inclusion. Two studies explicitly compared different types of opioid-specific MAT, and four studies had a comparator condition that entailed either a different dosage or adherence level for the same drug or a MAT + condition (e.g., MAT with patient navigation). When the comparator conditions did not include any MAT, they were subclassified into two groups: (1) "no treatment" (n = 14; e.g., nothing or referral to treatment, wait list control, or detoxification); or (2) "treatment as usual" (e.g., non-specified psychosocial treatment Norway (n = 1), England (n = 1), and Australia (n = 1). All studies examined the impact of MAT that was first administered while participants were incarcerated in a jail or prison setting. Outcomes were typically assessed at 12 months (n = 7) and 6 months (n = 5).
One study reported three-month outcomes, one reported nine-month outcomes, three studies reported outcomes of two years or longer, and three reported variable lengths of follow up for dif-
Totals may not sum to 100% due to rounding for display purposes.
Missing information on treatment length and timing indicates these were not clearly specified in the respective article.
d Includes only declarations of interest that are explicitly named by study authors.
e "The initial naltrexone dose was 25 mg. During the first week, subjects returned for two more visits, and on the second visit the dose was increased to 50 mg and on the third visit the dose was 100 mg. Beginning in the second week, subjects receive 150 mg of naltrexone twice a week (300 mg per week) for a total of 26 weeks or 6 months" (Coviello et al., 2010, p. 424).
on Mondays and Wednesdays and 20 mg (or 2.5 times the daily dose) on Fridays. Allowance was made to increase the Friday dose to 3 times the daily dose if needed" (Gordon et al., 2018, p. 35
(Gordon et al., 2018, p. 238)
i "Dr. Schwartz serves as a senior fellow at the Open Society Institute-Baltimore, which funded a previous study conducted by Dr. Kinlock et al., 2008, p. 238
(Marsden et al., 2017, p. 1416).
reviews that employ slightly more relaxed inclusion criteria. Group 3 includes seven studies that otherwise meet the criteria for inclusion in this review but were either still in ongoing data collection or were not available or ready for inclusion, per correspondence with the lead study authors. These studies should be included in any updates of the current review. Finally, the one study listed in Group 2 was removed after the risk of bias assessment and before meta-analysis and synthesis because it was determined to have "critical" risk of bias in the deviations from intended interventions domain and moderate to serious risk of bias in five of the six remaining domains. Per the ROBINS-I guidelines, a rating of critical risk in any domain suggests that a study should not be included in the synthesis.

| Experimental studies
Across the 14 experimental studies, none were rated as low risk of bias. Ten had some risk of bias, and four had high risk of bias.
Overall study ratings and justifications are listed in Supporting Information Appendix 4.

Bias arising from the randomization process
In total, 10 studies were rated low risk of bias and four as some concern in this domain (Cornish et al., 1997; Coviello et al., 2010; Kinlock et al., 2005; McKenzie et al., 2012). None were rated high risk of bias in this domain.

Bias due to deviations from intended interventions
Three studies were rated high risk of bias in this domain (Kinlock et al., 2005; Schwartz et al., 2020), and the remaining 11 were rated as some concern. None were rated low risk of bias in this domain.

Bias in the measurement of the outcome
Almost all studies (n = 11) were rated as low risk of bias in this domain, and three were rated as some concern (Magura et al., 2009; Schwartz et al., 2020). None were rated as high risk of bias in this domain.

Bias in the selection of reported results
All studies were rated as low risk of bias in this domain.

| Quasi-Experimental studies
The six quasi-experimental studies included in this review can be considered among the most rigorous available to date that examine MAT for overdose and criminal justice outcomes among criminal justice samples. Even with the strict inclusion criteria, however, two had a moderate risk of bias, and two had a serious risk of bias. Two others were quite rigorous but were missing information needed to score at least one of the domains. All studies are included in the analyses despite their risk of bias. Supporting Information Appendix 5 details all risk of bias ratings and corresponding justifications.

Bias due to confounding
One study was rated as serious risk (Westerberg et al., 2016), one as moderate risk (Bellin et al., 1999), and four as low risk (Farrell-MacDonald et al., 2014; Haas, 2020; Marsden et al., 2017; McSwain et al., 2014) in this domain.

Bias in selection of participants into the study
Three studies were rated as moderate risk (Bellin et al., 1999; Haas, 2020; Marsden et al., 2017) and three were rated as low risk of bias in this domain (Farrell-MacDonald et al., 2014; McSwain et al., 2014; Westerberg et al., 2016).

Bias in classification of the interventions
One study (Zaller et al., 2013) was rated as serious risk, one study as moderate risk (Marsden et al., 2017), and the remaining studies (n = 5) were rated as low risk of bias in this domain.

Bias due to deviations from intended interventions
One study had low risk (Haas, 2020), one had moderate risk (Marsden et al., 2017), one had serious risk (Westerberg et al., 2016), and three studies (Bellin et al., 1999; Farrell-MacDonald et al., 2014; McSwain et al., 2014) did not have enough information to be able to rate this domain.

Bias due to missing data
One study had moderate risk (Marsden et al., 2017), and the remaining (n = 5) studies had low risk of bias in this domain.

Bias in measurement of the outcomes
All studies had low risk of bias in this domain.

Bias in the selection of reported results
One study had moderate risk of bias in this domain (Bellin et al., 1999). All others (n = 5) had low risk of bias in this domain.

| Criminal justice outcomes (raw effects)
The raw effects of MAT on criminal justice outcomes across all included studies are displayed in Table 2. Outcomes included in the table are those that are most comparable to outcomes reported by other studies. The offenses for reincarceration that are reported were selected because they were the most common.
Bold font indicates statistical significance.
b This study is not included in the meta-analysis as its outcomes differed too much from those observed across other studies.
c This study was not included in the meta-analyses as it compared two treatment conditions.
d This study was not included in the meta-analyses as it compared two treatment conditions.
e This study is not included in the meta-analysis as its outcome differed too much from those observed across other studies.
f This study is not included in the meta-analysis as significant differences were observed between treatment group and each comparator.

| Overdose outcomes (raw effects)
The raw effects of MAT on overdose outcomes across all included studies are displayed in Table 3.

| Quality of the evidence
There are several important methodological limitations in the original studies included in this review. Across all the included studies, none were given a low risk of bias rating across two independent coders. In both quasi-experimental and experimental designs, deviations from the intended intervention/group assignment were consistently problematic. Underpowered studies, poor medication adherence among treatment participants (when these data were available), compensatory contamination of control participants, and missing or insufficient information accounting for experimental and control group contamination may have biased our results. The findings could also be a function of how study outcomes were measured. For instance, although most jurisdictions routinely track fatal overdose rates in specific communities, very few track these at the individual level in a systematic way that includes identifiable information. This makes it difficult for researchers to access and link these outcomes to individual study participants.
For criminal outcomes, three of the four indices examined in this review were typically accessed through official records, which can provide inaccurate estimates of the true incidence of criminality.
Further, procedural factors (e.g., bail, plea bargaining) often play an important role in outcomes such as conviction or incarceration. In contrast, self-reported criminal behavior may provide a more "pure" and potentially more accurate index of criminality; however, it was often assessed in the included studies using the Addiction Severity Index, which requires respondents to report outcomes only over the "past 30 days." Thus, for both official records and self-report in these studies, there is a likely underestimation of criminal behavior.

| Potential biases in the review process
This review included an agency report, a student dissertation, and several peer reviewed publications. Additionally, the study authors were fairly liberal in the inclusion of citations during the screening stage so as to ensure that no eligible but unpublished or not-yet-published studies were excluded from consideration. The inclusion of several conference presentations and clinical trial protocols allowed for the identification of studies that were missed in the initial search. This approach also permitted the identification of several studies that would have otherwise been included in this review if the timing were later (and should be included in an update). Indeed, seven studies meeting inclusion criteria were still in active data collection and/or were not yet available for sharing with our team (per study authors).

| Agreements or disagreements with other studies or reviews
To our knowledge there are no other systematic reviews that include only studies which assess opioid-specific MAT
FIGURE 5 Fatal overdose forest plot

| AUTHORS' CONCLUSIONS
Our results suggest that MAT can yield meaningful reductions in nonfatal overdose among those involved in the criminal justice system. They do not support MAT's ability to reduce fatal overdose or criminal outcomes for people with current or prior justice system involvement. Given the design rigor of the included studies, this conclusion might be considered by some to be foregone.
While it is possible that MAT is ineffective at reducing these outcomes, substantial methodological issues outside the main design render these findings more tentative. Under more ideal research conditions (e.g., where medication adherence was improved, attrition was reduced, fatal overdose data were more readily accessible, and sample sizes were larger), the study authors would be more confident in the estimates of MAT's impacts on overdose and criminal outcomes.

| Implications for practice
The opioid epidemic is now in its third decade with no signs of slowing. Whole communities feel the toll of this crisis. As policymakers and practitioners work to identify solutions to reduce the harm of opioid addiction, particularly for public health and criminal justice outcomes, they must deploy multiple strategies at once and emphasize those that have the strongest impact and evidence.
MAT is one tool in this effort. Its harm reduction utility in treatment-seeking samples is well-established. The findings from this review suggest that MAT's impact on nonfatal overdose also extends to individuals who are justice system involved, though the findings must be interpreted in light of considerable risk of bias in the evidence. One must be cautious not to oversell the promise of MAT as an antidote to criminality and overdose among people involved in the justice system. Indeed, these are highly complex social and health outcomes; both addiction and criminal behavior are influenced by a wide range of risk factors that also must be targeted in interventions. Timely access to appropriate and evidence-based treatment must be coupled with an infrastructure of resources and social support.

ACKNOWLEDGMENTS
The author team would like to acknowledge and thank the following individuals for their respective contributions to the project: Liz Eggins and Ajima Olaghere from the Campbell Collaboration for their technical and editorial assistance (including deduplicating the results); DistillerSR ® software. The team would also like to thank NIHR for providing the incentive award, Charlotte Gill for her assistance, and the peer reviewers for their helpful comments on earlier drafts of the manuscript.

CONFLICT OF INTERESTS
One member of the author team, Dr. Jordan Hyatt, conducted a study that was included in this review. The team took the steps to ensure that he was not involved in the decision-making concerning its inclusion or exclusion. There are no other potential conflicts of interest in this study.

AUTHOR CONTRIBUTIONS
All study authors (with the exception of DP, who had not yet joined the project) contributed to conceptualizing and/or writing the study's protocol.

DIFFERENCES BETWEEN PROTOCOL AND REVIEW
There were deviations from the study protocol at search, screening, coding, and analysis stages. These decisions are detailed below in the order that they were made.
To streamline the search process, we reduced the number of strings from seven to two: one for the group of criminal justice outcomes and one for overdose. This did not change the substantive nature of the search but combined the terms into strings representing either category of outcome.
Delimiters were also added to the search strings for certain databases (e.g., searching specific indices or source types) to draw more relevant results and/or temper the high volume retrieved. This was done under the guidance of the Campbell Collaboration and the delimiters used are noted in Supporting Information Appendix 1.
Several platforms/databases were removed or replaced due to poor search functionality and/or the inability to refine sufficiently large results. These included the following platforms (and databases): Gale Uni. These sources overlapped in journal coverage with the databases that were used, which suggested they also would not add much unique content to the results. ClinicalTrials.gov was replaced with the Cochrane Register of Trials and MEDLINE to draw more relevant results by way of better search refinement tools. Summon was removed as the author team did not have institutional access to this database. Science.gov could be searched through crimesolutions.gov and therefore did not need to be searched separately. Last, Google Scholar was used to "forward search" included articles only, as there was no reliable way to search this database and no transparency in the search algorithms. All decisions to remove or replace databases specified in the protocol were made after much troubleshooting and in conjunction with the Campbell Collaboration.
The expanded list of criminal justice outcomes specified in the protocol (including specialized court docket failure, mandated treatment failure, and revocation of community supervision) was reduced to self-reported and official indices of offending, arrest, conviction, and incarceration. This made our review a "pure" update in terms of the original categories of criminal justice outcomes reported in Egli et al. (2009).
The study team did not specify reference screening software in the protocol. Given the high volume of search results (n = 27,361 after deduplication), the team purchased access to DistillerSR® (at the recommendation of the Campbell Collaboration) and proceeded with screening using a team of graduate students in addition to three of the study authors. Using the DistillerSR® "Check for Screening Errors" tool also allowed us to replace the hand-checking strategies for screening errors outlined in the protocol with a more reliable method.
In terms of coding, it was reported in the protocol that we would use the same coding scheme as Egli et al. (2009). We instead modified their coding scheme to capture differences in the outcomes examined and in the methodological and reporting standards. This updated coding scheme is included in Supporting Information Appendix 3.
Due to time constraints, we could not contact all authors of the included studies for unpublished data. However, multiple databases that we searched included gray literature and drew unpublished results (some of which were tied to the included studies, including conference presentations). This suggests that the risk of publication bias remains low even without having contacted study authors. This is also why we did not display or report contour enhanced funnel plots.
In terms of the analyses, we did not examine the effects of moderators because they were not recorded in a consistent manner in the original studies. Furthermore, in terms of the timing of treatment as a potential moderator, nearly all interventions were initiated while individuals were incarcerated and then followed up in the community. As such, there was little to no variability to examine.
Last, continuous or quasi-continuous measures of outcomes (e.g., average number of arrests) were rarely and inconsistently reported across studies. These results were not meta-analyzed but instead reported in a narrative format.

SOURCES OF SUPPORT
This study/project is funded by the National Institute for Health Research (NIHR) Incentive Award Scheme 2020 Reference 133293.
The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.