Online interventions for reducing hate speech and cyberhate: A systematic review

Abstract

Background: The unique feature of the Internet is that individual negative attitudes toward minoritized and racialized groups, and more extreme, hateful ideologies, can find their way onto specific platforms and instantly connect people sharing similar prejudices. The enormous frequency of hate speech/cyberhate within online environments creates a sense of normalcy about hatred and the potential for acts of intergroup violence or political radicalization. While there is some evidence of effective interventions to counter hate speech through television, radio, youth conferences, and text messaging campaigns, interventions for online hate speech have only recently emerged.

Objectives: This review aimed to assess the effects of online interventions to reduce online hate speech/cyberhate.

Search Methods: We systematically searched 2 database aggregators, 36 individual databases, 6 individual journals, and 34 websites, and we scrutinized the bibliographies and annotated bibliographies of published reviews of related literature.

Inclusion Criteria: We included randomized and rigorous quasi-experimental studies of online hate speech/cyberhate interventions that measured the creation and/or consumption of hateful content online and included a control group. Eligible populations included youth (10–17 years) and adult (18+ years) participants of any racial/ethnic background, religious affiliation, gender identity, sexual orientation, nationality, or citizenship status.

Data Collection and Analysis: The systematic search covered January 1, 1990 to December 31, 2020, with searches conducted between August 19, 2020 and December 31, 2020, and supplementary searches undertaken between March 17 and 24, 2022. We coded characteristics of the intervention, sample, outcomes, and research methods. We extracted quantitative findings in the form of a standardized mean difference effect size. We computed a meta-analysis on two independent effect sizes.

Main Results: Two studies were included in the meta-analysis, one of which had three treatment arms. For the purposes of the meta-analysis, we chose the treatment arm from the Álvarez-Benjumea and Winter (2018) study that most closely aligned with the treatment condition in the Bodine-Baron et al. (2020) study. However, we also present additional single effect sizes for the other treatment arms from the Álvarez-Benjumea and Winter (2018) study. Both studies evaluated the effectiveness of an online intervention for reducing online hate speech/cyberhate. The Bodine-Baron et al. (2020) study had a sample size of 1570 subjects, while the Álvarez-Benjumea and Winter (2018) study had a sample size of 1469 tweets (nested in 180 subjects). The mean effect was small (g = −0.134, 95% confidence interval [−0.321, −0.054]). Each study was assessed for risk of bias on the following domains: randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported results. Both studies were rated as "low risk" on the randomization process, deviations from intended interventions, and measurement of the outcome domains. We assessed the Bodine-Baron et al. (2020) study as having "some" risk of bias regarding missing outcome data and "high risk" for selective outcome reporting bias. The Álvarez-Benjumea and Winter (2018) study was rated as "some concern" on the selective outcome reporting bias domain.
Authors' Conclusions: The evidence is insufficient to determine the effectiveness of online hate speech/cyberhate interventions for reducing the creation and/or consumption of hateful content online. Gaps in the evaluation literature include the lack of experimental (random assignment) and quasi-experimental evaluations of online hate speech/cyberhate interventions, the neglect of the creation and/or consumption of hate speech in favor of the accuracy of detection/classification software, and the failure to assess heterogeneity among subjects by including both extremist and non-extremist individuals in intervention studies. We provide suggestions for how future research on online hate speech/cyberhate interventions can fill these gaps moving forward.

The stronger such negative attitudes are, the more hostile the action will be. Allport (1954) put forward a scale of acts of prejudice to illustrate different degrees of acting out harmful attitudes, which starts with antilocution (or what we call hate speech), described as explicitly expressing prejudices through negative verbal remarks to either friends or strangers. Avoidance is the next level on the scale of prejudice, with people avoiding members of certain groups, followed by discrimination, where distinctions are made between people based on prejudices, leading to the active exclusion of members of specific groups (Allport, 1954). This level of acting on prejudices is rooted in institutional or systemic prejudices, such as the differential treatment of people within employment or education practices as well as within the criminal justice system, or through the social exclusion of certain minoritized group members. Physical attack is the next level on the scale of prejudice, which includes violence against members of certain groups by physically acting on negative attitudes or prejudices. The last level, extermination, includes ultimate acts of violence against members of specific groups, an expression of prejudice that systematically eradicates an entire group of people (e.g., genocide). Allport's (1954) scale of prejudice makes it clear how hate speech/cyberhate is connected to more extreme forms of violence motivated by specific biases, with hate speech (or antilocution) being only the starting point (Bilewicz & Soral, 2020). The importance of this scale of prejudice is not only that it clearly illustrates a range of different ways and intensity levels of acting out prejudices, but also the "progression from verbal aggression to physical violence or, in other words, the performative potential of hate speech" (Kopytowska & Baider, 2017, p. 138). This is where interventions at the lower level of prejudices, specifically online interventions targeting online hate speech/cyberhate, become important.
Because different countries inconsistently conceptualize the same hate speech phenomenon, there is no universal definition of hateful conduct online. This, unfortunately, affects our ability to develop a comprehensive search of the literature. However, there is some consensus that hate speech targets disadvantaged social groups (Jacobs & Potter, 1998). Bakalis (2018) more narrowly defines cyberhate as "any use of technology to express hatred toward a person or persons because of a protected characteristic-namely race, religion, gender, sexual orientation, disability and transgender identity" (p. 87). Another definition that also points out the ambiguity and challenges involved with identifying more subtle forms of hate speech, and also makes reference to the potential threat of hate speech escalating to offline violence, is put forward by Fortuna and Nunes (2018): "Hate speech is any language that attacks or diminishes, that incites violence or hate against groups, based on specific characteristics such as physical appearance, religion, descent, national or ethnic origin, sexual orientation, gender identity or other, and it can occur with different linguistic styles, even in subtle forms or when humor is used" (p. 5).
In this systematic review, we distinguish hate speech/cyberhate specifically from other forms of harmful online activity, such as cyberbullying, harassment, trolling, or flaming, as perpetrators of such online behavior repeatedly and systematically target specific individuals to cause distress, to seek out negative reactions, or to create discord on the Internet. Research focused on desensitization suggests that being exposed to hate speech leads to a normalization of prejudiced attitudes, which further leads to an increase in outgroup bias toward groups targeted by such speech (Soral et al., 2018). With society increasingly recognizing that it is inappropriate to express prejudices in public settings, interventions may include some form of social norm nudging to reduce such prejudices or interventions that "nudge behavior in the desired direction" (Titley et al., 2014, p. 60).
Therefore, hate speech not only affects minoritized group members but also has an influence on the opinions of majority group members (Soral et al., 2018), which makes strategies that can elicit change in people's prejudice-related attitudes crucial (see, e.g., Zitek & Hebl, 2007).
We specifically chose to assess the effectiveness of online hate speech/cyberhate interventions for two reasons. First, the unique feature of the Internet is that such individual negative attitudes toward minoritized groups and more extreme, hateful ideologies can find their way onto certain platforms and can instantly connect people sharing similar prejudices. By closing the social and spatial distance, the Internet creates a form of collective identity (Perry, 2000) and can convince individuals with even the most extreme ideologies that others out there share their views (Gerstenfeld et al., 2003). In addition, the enormous frequency of hate speech/cyberhate within online environments creates a sense of normativity around hatred and the potential for acts of intergroup violence or political radicalization (Bilewicz & Soral, 2020, p. 9).
Seeing other people post prejudiced comments online can lead to the adoption of an online group's biases and can influence an individual's own perceptions and feelings toward the targeted, stigmatized group (Hsueh et al., 2015). Second, in contrast to cyberbullying and other individually targeted online behavior, hate speech/cyberhate is more general and does not necessarily target a specific individual (Al-Hassan & Al-Dossari, 2019). Instead, hate speech/cyberhate heavily features prejudice, bias, and intolerance toward certain groups within society, with most hate speech happening online. Interventions that take place online are therefore an important way to challenge prejudice and bias, potentially reaching masses of people across the globe.
It is important to challenge hate speech, especially since hate movements have increasingly crossed into the mainstream (Futrell & Simi, 2017). With hate speech/cyberhate posing a threat to the social order by violating social norms (Soral et al., 2018), perceptions of social norms as either supporting or opposing prejudice have been found to have an influence on how individuals react online (Hsueh et al., 2015). Governments around the world face increased demand for understanding and countering hateful ideology and violent extremism both online and offline (e.g., the Christchurch Call). The US Government's 2021 national strategy for countering domestic terrorism highlights the importance of ongoing research and analysis, the sharing of knowledge and best practices internationally, and the countering of hateful ideologies and propaganda. The goal of this systematic review is to examine the effectiveness of online campaigns and strategies for reducing online hate speech and cyberhate. In doing so, we take a step toward better understanding the complex and multifaceted nature of this type of hateful messaging.

| Description of the intervention
The Internet provides an opportunity to reach masses of people: both those who are exposed to hateful content and hateful ideology online and those who consume and spread such content. Online interventions that address such hateful online behavior therefore become crucial. This systematic review set out to focus on online interventions addressing online hate speech and cyberhate, with interest in interventions deployed on websites, text messaging applications, and online and social media platforms including, but not limited to, Facebook, Instagram, TikTok, WhatsApp, Google, YouTube, and Snapchat. We focused specifically on online interventions that aimed to change people's online behavior and encouraged individuals or groups to conform to established social norms. Such social norms, for example, can be communicated through community standards created on the online platforms themselves (e.g., Facebook, Twitter, etc.), through more formal online training courses, or through anti-hate speech/anti-cyberhate campaigns teaching people to recognize hate, embrace diversity, and stand up to bias. Such prevention campaigns are designed to challenge bias and build ally behaviors by supplying people with constructive responses to combat, for example, antisemitism, racism, and homophobia, as well as resources to help people explore and critically reflect on current events. Other interventions we set out to find in this systematic review addressed online hate speech/cyberhate by adding messages to hateful online comments, countering hateful content or extremist ideology, or redirecting people to more credible sources.

| How the intervention might work
Regardless of how an individual develops certain racial, religious, or sexual biases, in this systematic review we were interested in online interventions that targeted and reduced the consumption and creation of original hateful content, such as spreading antisemitic Tweets and/or homophobic blog posts, as well as accessing and consuming hate speech material online (e.g., watching or reading hate speech videos or blogs). For example, Bodine-Baron et al. (2020) used rather broad messaging approaches by promoting racial sensitivity and inclusion through hashtag campaigns (e.g., "#CapekGakSih" ["Aren't You Tired?"] and "#AkuTemanmu" ["I Am Your Friend"]) on Facebook, Instagram, Twitter, and YouTube. These campaigns were designed to recast online encounters as opportunities for personal growth and shared humanity. The campaigns disputed and contradicted negative stereotypes associated with specific cultures, people, and institutions by sharing different points of view based on human rights values such as openness, respect for difference, freedom, and equality. Moreover, such interventions involved blanket bans on specific behaviors enforced through the public promotion of norms or individual sanctions enforced by moderators.
Other interventions, such as the "Redirect Method," are narrower in their messaging. These interventions generate curated playlists and collections of authentic content that challenge hate speech/cyberhate narratives and propaganda (Helmus & Klein, 2018). For instance, people who are directly searching for extremist content online may be linked to videos and written content that confronts such claims. These videos are designed to be objective in appearance instead of containing material that explicitly counters extremist propaganda. The underlying goal of this type of intervention is to provide credible content that effectively undermines extremist messaging but does not overtly attack the source of propaganda. There were three key findings associated with the Redirect Method (Helmus & Klein, 2018). First, the Redirect Method reached a portion of the "low-prevalence, high-risk" audience that advertising services are not designed to reach. Second, it created friction between search queries for white supremacist and/or neo-Nazi communities and positive search results; and finally, it functioned as the conduit between high-risk individuals and their respective delivery partners (e.g., Life After Hate) such that some passive searches became active conversations.
With that said, more effort should be given to expanding the keyword list and creating partner microsites with content specifically tailored to the needs of the redirected individuals. Online platforms, such as Twitter and Facebook, have started to employ such methods, redirecting people who comment on or share "fake news" or conspiracy theories, which often are fraught with prejudicial undertones and are harmful to minoritized groups, to more credible content and news sources.

| Why it is important to do this review
Findings from this systematic review enhance our understanding of the effectiveness of online anti-hate speech/anti-cyberhate interventions, help ensure that programming funds are dedicated to the most effective efforts, and play a critical role in helping individual programs improve the quality of their service provision. Our findings also inform governments and policymakers about the current state of such online efforts, about what works and which modes of intervention to implement, and help guide economically viable investments in nation-state security.
Our search of the scholarly literature identified one review, Blaya (2019), as similar to the current topic. Blaya's (2019) review, however, focused on the prevalence, type, and characteristics of existing interventions for counteracting cyberhate and did not include a meta-analysis. Two other similar reviews focused on exposure to extremist online content (Hassan et al., 2018) and communication channels associated with cyber-racism (Bliuc et al., 2018). A search of the Campbell Library using key terms (hate OR radical*) identified two protocols and one review for further inspection to assess potential overlap. The protocols include "Psychosocial processes and intervention strategies behind Islamist deradicalization: A scoping review" by de Carvalho and colleagues (2019) and "Police programs that seek to increase community connectedness for reducing violent extremism behavior, attitudes and beliefs" by Mazerolle and colleagues (2020). A further review on a similar topic is a recently completed Campbell review (January 2020), "Counter-narratives for the prevention of violent radicalization: A systematic review of targeted interventions," by Carthy et al. (2018) at the National University of Ireland, Galway.
Our review is distinguished from the de Carvalho and colleagues' (2019) review in that we focus on hate speech and cyberhate generally, without delimiting our approach to a specific type of radicalization (e.g., Islamist). Furthermore, we elected to complete a systematic review and meta-analysis. Likewise, the protocol by Mazerolle and colleagues (2020) focuses on interventions involving police officers either as initiators, recipients, or implementers of community connectedness interventions. Our review focuses specifically on any online intervention, which may or may not involve the police, but police are neither the focus nor the basis of the online intervention strategy. Judging from Carthy et al.'s (2018) protocol, our review also captured counter-narrative interventions but differed in the setting, timing, and scope of interventions. Specifically, we were interested in online interventions that extend beyond counter-messaging campaigns to include the broad array of interventions outlined above, and that extend beyond radicalization to include everyday hate and prejudice. In addition to conducting a meta-analysis, our review builds on Blaya's (2019) work by expanding the population parameters to include both adolescents and adults. Blaya (2019) limited her search to interventions aimed toward children and adolescents (e.g., young adults, teenagers) and did not focus on extremism.

| OBJECTIVES
The main objective of this review was to synthesize the available evidence on the effectiveness of online interventions aimed at reducing the creation and/or consumption of online hate speech/cyberhate material. We initially sought to examine differences in intervention effectiveness based on the type of intervention and on individual characteristics. However, we were unable to complete these analyses; later in this review, we explain why we are currently unable to answer research questions 2 and 3 (concerning intervention type and individual characteristics, respectively). As set out within our protocol (see Windisch et al., 2021), we planned to include both experimental and quasi-experimental quantitative studies in this review, as these methodological approaches are the most effective strategies for isolating the effect of the intervention. Therefore, eligible quantitative study designs included the following:

| Experimental designs
Eligible experimental designs involved random assignment of participants to distinct treatment and control group(s). Designs that involved quasi-random assignment of participants, such as alternate case assignment, were also eligible and were coded as experimental designs.

| Quasi-experimental designs
All eligible quasi-experimental designs must have included participants in a control condition compared to participants in a treatment condition. Eligible studies included those that reported matching procedures (individual- or group-level) and statistical procedures employed to achieve equivalency between groups. Statistical procedures included, but were not limited to, propensity score matching, regression analysis, and analysis of covariance. Furthermore, in anticipation of a limited quantitative evidence base, we also included quasi-experimental studies with unmatched comparison groups that provided a baseline assessment of outcomes for both groups. Finally, time-series analyses were also included. Eligible time-series designs included short interrupted time-series designs with a control group (fewer than 25 pre/post observations) and long interrupted time-series designs with or without a control group (more than 25 pre/post observations). Ineligible quasi-experimental designs involved studies with a comparison group consisting of participants who either refused to participate in the study or who initially participated but dropped out before the study began.
Eligible comparison conditions included other online interventions or conditions in which participants did not receive or experience an online intervention.

| Types of participants
Both youth and adult participants of any racial/ethnic background, religious affiliation, gender identity, sexual orientation, nationality, or citizenship status were eligible for this review. The eligible youth population included study participants aged 10 through 17. The eligible adult population included study participants aged 18 and older.
Studies in which only a subset of the sample was eligible for inclusion (for example, if study subjects participated in both online and offline hate speech interventions) were excluded. This exclusion was necessary to focus our review specifically on the effects of online interventions on changes in hate speech behavior online, especially when we were unable to extract data unique to the online subset.
We did not anticipate excluding studies based on sample eligibility, as our inclusion criteria were wide-ranging, and we took reasonable steps to locate studies that only involved online interventions.

| Types of interventions
We adopted Blaya's (2019) four-part typology of intervention strategies to outline the potential universe of eligible interventions. The first intervention strategy is the adaptation of legal responses to hate speech/cyberhate, which includes the countering of violent extremism and aims to address cybercrime. More specifically, eligible online interventions range from disrupting hateful content online via specific "crackdowns" (e.g., server shutdowns, deletion of social media accounts) to responding to online hate using targeted strategies (e.g., through counter-narratives or the modification of hateful content). Examples of studies focusing on online crackdowns include the monitoring and investigation of online accounts and content takedowns, online content monitoring and censorship (Álvarez-Benjumea & Winter, 2018), modifying hateful online comments into non-hateful comments (Salminen et al., 2018), and possibly changing algorithms to divert users out of online echo chambers. We were also interested in interventions such as the recent takedown of 8chan after this online platform was linked to "in real life" attacks in New Zealand and the United States, as well as in interventions that disrupt further hateful online content and radicalization after similar trigger events.
Disrupting hateful content online via such crackdowns has raised free speech concerns, as well as concerns that online users and hateful groups will simply move on to other online platforms.
Responding to hateful content online using targeted strategies has, therefore, been suggested as an effective online intervention.
Examples include message priming using endorsements from religious elites, the use of bots to sanction online harassers, automatically generating responses to intervene in online conversations where hate speech has been detected (Qian et al., 2019), and redirecting online users to YouTube videos debunking, for example, ISIS recruiting themes (https://redirectmethod.org/). Our systematic review included a broad range of online interventions, many of which have only recently emerged. Two other strategies identified by Blaya (2019) are the automatic identification and regulation of hate speech/cyberhate using technology, as well as the creation of online counter-spaces and counter-communication initiatives. These interventions include online counter-narrative marketing campaigns, the establishment and/or use of online counter-spaces, online education-based interventions, online citizenship training, and online legislative initiatives narrowly defined to address extremist ideologies and hate speech that incites targeted violence and radicalization. In general, such interventions seek to prevent or minimize the occurrence of violent extremism or radicalization, including the spread of hate speech and extremist propaganda, by disrupting recruitment channels and creating opportunities to leave such groups.
The fourth and final intervention strategy eligible for this systematic review involves educational programs that, for example, provide people with online literacy skills and challenge racism (Blaya, 2019). We included online empowerment/resilience approaches of this kind within the scope of eligible interventions.

| Types of outcomes
The primary outcome of interest in this systematic review was the creation and/or consumption of hateful online content. By creation, we refer to the production and authorship of original hateful content such as posting antisemitic Tweets, uploading racist YouTube videos, and/or writing homophobic blog posts (Ligon et al., 2018). The consumption of hate speech material may include visiting or being a member of a hate website/online group, watching or reading hate speech videos or blogs, being a target of online hate speech/cyberhate, or reporting hate speech material (Ligon et al., 2018). Secondary outcomes of interest in this review included affective and emotional states, such as anger, fear, emotional unrest, depression, anxiety, and mood swings, as well as attitudes toward hate speech/cyberhate. We included these secondary outcomes to capture interventions that may not have measured behavioral changes around hateful content online but may have otherwise impacted participants' affective and emotional states, which in turn can affect the creation and/or consumption of hateful content online and, more specifically, can influence reactions to or reporting of online hate speech/cyberhate material.
This systematic review focused specifically on online interventions and their impact on changes in online hate speech/cyberhate behavior. We, therefore, excluded offline hate behavior outcomes (i.e., hate incidents and hate crimes). As mentioned earlier, we wanted to capture online interventions that can reach masses of people across the globe, with the prospect of changing and challenging the vast amount of online (compared to offline) hate that is being seen and spread in the virtual world. In addition, it was necessary to clearly distinguish our study setting from those of previous reviews. Eligible studies had to report a primary or secondary outcome (or both) to be included. There were no exclusion criteria on the source of outcome data. We planned to include data for the primary and secondary outcome measures from any source, including institutional records, direct observations, and surveys or questionnaires completed by participants.

| Adverse effects
There was also the possibility of adverse effects of online interventions on online hate speech/cyberhate. We included any measure of unintended adverse effects from strategies to increase the scale of implementation of potentially effective anti-hate speech interventions for participants, including, for example, adverse changes to emotional or psychological well-being, defensiveness, guilt, shame, resistance to the teaching, miscommunication, creation of barriers, and dysfunctional adaptation behaviors. Adverse effects could have also included nonindividual effects such as relocation of hate speech/ cyberhate to other platforms instead of a reduction of hate speech/ cyberhate. We included all adverse effects described in eligible studies in this meta-analysis.

| Other inclusion criteria
We focused on the period between 1990 and 2020. For purposes of the current study, we opted for an inclusive approach by designating 1990 as the lower end of our search period. Based on prior research, 1990 marks the period in which the Internet transitioned to a wider infrastructure and a broad-based global community (Leiner et al., 2009). While it is conceivable that instances of hate speech or cyberhate were present online before then through mailing lists or emails, the odds that experimental evaluations of online interventions were conducted in that era are slim.
Our population of studies was limited to studies published in English and German but was inclusive of studies completed in any geographical region, as we focused on online content consumed and shared across geographic and nation-state boundaries. The language parameters reflect the language abilities of the review team. Our full-text coding captured the geographic location where studies were conducted and study participants were located.

| Terms used to search
We conducted our systematic search between August 19, 2020 and December 31, 2020. We used Zotero to manage references and implement the search strategy below. We documented the search process using the following fields: date, reviewer initials, database/website/journal searched, final search string, total yield, and notes to capture any aberrant cases (see Supporting Information: Appendix A for a complete search record). Search terms were developed based on implementation and dissemination research terminology and included search filters used in previous reviews (see, e.g., Blaya, 2019). The search strategy was conducted using the search terms specified below within the default search field of each database, meaning we did not search within the Title, Abstract, Keywords (supplied by the author), and indexing-term fields as specified in our protocol. If and when used, these fields were used to refine searches by increasing specificity. The search string combined numbered groups of terms with AND; the complete set of term groups appears in Appendix A, and the final groups were as follows:

3. Intervention terms: interven* OR option* OR strategy* OR "counter narrative*" OR "nudge" OR "norm* intervention" OR "norm* nudge" OR counternarrative* OR "alternative narrative*" OR campaign* OR counter* OR peer-to-peer OR prevent* OR disrupt* OR stop* OR fight* OR redirect* OR "censoring hate content"

AND

4. Evaluation terms: comparison* OR quantitative OR quasi-experiment* OR survey* OR interview* OR poll* OR mixed-methods OR individual-level OR group-level OR control* OR experiment* OR study OR studies OR evaluat* OR MTurk OR longitudinal OR random* OR "digital method*" OR "machine learning" OR "natural language processing" OR multisectoral OR review*

AND

5. Year limiter: 1990–2020

| Electronic searches
The search strategy described above was applied to the following databases, which cover easily accessible sources as well as gray literature.
Gray literature includes reports, working papers, white papers, government documents, and generally non-peer-reviewed works.

Academic databases
EBSCOHost platform

| Assessment of risk of bias in included studies
Two reviewers independently evaluated the risk of bias for the primary outcome using the Cochrane Risk of Bias tool, version 2.0 (Sterne et al., 2019). This tool encourages consideration of the following domains: bias in the randomization process; deviations from the intended intervention (intervention assignment); missing outcome data; bias in the measurement of the outcome; and bias in selecting the reported result.
Two review authors independently judged each source of potential bias as low risk, high risk, or some concerns. We then made an overall risk of bias judgment for each study by combining ratings across the five domains. Specifically, if any of the above domains were rated at high risk, the overall risk of bias judgment would be rated at high risk. Finally, we processed the "risk of bias" assessments using the revised Cochrane risk-of-bias tool for randomized trials (RoB 2) as well as the Cochrane Handbook and the Methodological Expectations of Campbell Collaboration Intervention Reviews (MECCIR) reporting standards. Our risk of bias ratings are available in Table 3 and Figure 4. As the authors of the original studies provided adequate details for this assessment, we did not need to contact corresponding authors for clarification. However, should this become problematic in any future review updates, we would resolve disagreements through discussions with authors.

We planned to address the risk of bias in non-randomized quantitative studies using ROBINS-I, covering bias in the selection of participants and all domains of post-intervention bias (Higgins et al., 2011; Sterne et al., 2016). We coded the experimental and quasi-experimental design type based on assignment (e.g., matching, waitlist control, cohort, etc.) at the study level. However, no quasi-experimental studies fit our inclusion criteria. For future review updates, quasi-experimental studies will be evaluated using the ROBINS-I tool as outlined in the protocol (Windisch et al., 2021; see Figure 1 and the list of studies awaiting classification attached later in this review). Of the twenty studies that we planned to code, six lacked the information necessary for inclusion in a meta-analysis (see Supporting Information: Appendix E). In these situations, we contacted study authors with a request to provide the missing information; some did not respond to our requests, and others responded after the period of performance (see Pigott & Polanin, 2020). We plan to follow up with these authors in subsequent review updates. With our final two studies, we did not encounter issues with missing outcomes or missing participants.

| Assessment of heterogeneity
We intended to use study design, among other factors, to explore heterogeneity between study outcomes using the Q statistic and the I² statistic to describe the percent variation across studies (see Windisch et al., 2021). Post hoc moderating factors could have included the intervention setting, such as an online intervention versus a laboratory or classroom intervention setting. Unfortunately, we could not explore heterogeneity because we lacked viable a priori and post hoc moderators. We attempted to collect information on study sample characteristics (i.e., age, gender, race/ethnicity), but this information was incomplete or not reported within the final set of studies. Furthermore, both studies were randomized, and both interventions took place in an online setting.
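For reference, the Q and I² statistics take their conventional forms (Higgins et al., 2003), where k is the number of studies, ES_i and SE_i are the study-level effect sizes and standard errors, and the w_i are inverse-variance weights:

Q = \sum_{i=1}^{k} w_i \left( ES_i - \overline{ES} \right)^2, \qquad w_i = \frac{1}{SE_i^2}, \qquad I^2 = \max\left( 0,\ \frac{Q - (k - 1)}{Q} \right) \times 100\%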

| Assessment of reporting biases
Publication selection bias is an important consideration when assessing the robustness of meta-analytic findings because statistically significant results are more likely than nonsignificant results to be published (Lipsey & Wilson, 2001; Rothstein et al., 2005). To minimize publication bias, we extended our search to gray literature and included technical reports, theses, and other unpublished works (e.g., government and agency reports) (Rothstein & Hopewell, 2009). One of the two studies included in this review is a technical report found via a gray literature website search (see Bodine-Baron et al., 2020). Unfortunately, we could not use the various available methods (see Coburn & Vevea, 2015) to assess publication bias given the limited number of included studies (n = 2). Nevertheless, we surmise there is a possibility of publication bias in the results given the variety of potential sources of bias, chief among them the availability of eligible studies, in addition to language bias.

| Data synthesis
The underlying nature of the data for this outcome was continuous. As such, we calculated the standardized mean difference for this review, using the Stata meta set command for precomputed effect sizes. We used the formulas shown in Figure 2 to compute standardized mean differences and standard errors. One included study provided proportions (Bodine-Baron et al., 2020). We used the logit method for transformation, dividing the logged odds ratio by 1.83, the standard deviation of the logistic distribution, to rescale the logged odds ratio onto the normal distribution (Lipsey & Wilson, 2001; see Figure 3).
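For readers without access to the figures, the standardized mean difference and its standard error presumably take the standard forms given by Lipsey and Wilson (2001), with treatment and control means \bar{X}_t and \bar{X}_c, standard deviations s_t and s_c, and sample sizes n_t and n_c:

d = \frac{\bar{X}_t - \bar{X}_c}{s_p}, \qquad s_p = \sqrt{\frac{(n_t - 1) s_t^2 + (n_c - 1) s_c^2}{n_t + n_c - 2}}, \qquad SE_d = \sqrt{\frac{n_t + n_c}{n_t \, n_c} + \frac{d^2}{2(n_t + n_c)}}

For the proportion-based outcome, with treatment and control proportions p_t and p_c, the logit method rescales the logged odds ratio by the 1.83 divisor reported above:

d = \frac{1}{1.83} \, \ln\!\left[ \frac{p_t / (1 - p_t)}{p_c / (1 - p_c)} \right]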
We used a random-effects model for the meta-analysis, estimated using the restricted maximum likelihood (REML) method.
All statistical analyses were performed using Stata IC/16.1. Per our protocol (Windisch et al., 2021), we intended to implement robust variance estimation to address statistically dependent effect sizes (correlated effects) using robumeta (Hedges et al., 2010). However, both studies in the meta-analysis contributed one effect size each for the content creation outcome, so we did not employ this method.
Furthermore, we did not have any meaningful moderators with which to fully exploit the benefits of robust variance estimation. Therefore, additional research is needed on all treatment types, on the outcome examined in this meta-analysis, and on the other outcomes of interest that we were unable to meta-analyze within this systematic review (see Windisch et al., 2021).
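For concreteness, the model described above can be specified in Stata along the following lines. This is a minimal sketch, assuming a dataset with one row per study containing precomputed effect sizes (es), standard errors (se), and study labels (study); these variable names are illustrative rather than taken from our analysis files (see Supporting Information: Appendix F for the actual Stata log).

* Declare precomputed effect sizes and fit a random-effects model via REML
meta set es se, studylabel(study) random(reml)

* Pooled effect with 95% CI, plus tau2, I2, and the Q test
meta summarize

* Forest plot of the study-level and pooled effects
meta forestplot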

| Subgroup analysis and investigation of heterogeneity
Heterogeneity was assessed using I² in conjunction with τ² (tau-squared) and χ² (chi-squared). Our protocol (Windisch et al., 2021) outlined planned subgroup analyses, but, as noted above, we lacked the moderators and the study-level variability needed to conduct them.

At the title and abstract screening stage (level 1), we deviated from the protocol by only screening for studies in English and German, as the Arabic and Persian speaker was no longer available to assist us. We also dropped the screening questions "Is the study in English, German, Persian, or Arabic?" and "Was the study conducted between 1990 and 2020?" because we realized that most references included an English title and abstract regardless of the actual study language, and that the year of data collection or of the intervention was often not noted within the title or abstract. The first screening question, regarding whether the study was a quantitative or experimental study, was itself a deviation from the review protocol; we added it to the title and abstract screening stage because we encountered many studies that fit our other two screening questions but were clearly not quantitative or experimental.
We also piloted the coding forms at different stages of the review process in DistillerSR, either to make screening documents more efficient or to remove questions/fields that were deemed unnecessary, and adjusted the coding forms accordingly. Some studies had incomplete information, and we could not obtain a response from study authors in time (see Supporting Information: Appendices C and D for characteristics of excluded studies). These studies were signposted via an asterisk (*) and noted for possible inclusion in an update to this review. Overall, two studies were included in the final review and deemed eligible for the meta-analysis (refer to Figure 1 for the PRISMA flow diagram illustrating the reference distillation process, and see the list of references included in the meta-analysis).

| Included studies
The study characteristics for the two included studies are displayed in the accompanying tables. In the Álvarez-Benjumea and Winter (2018) study, the outcome captured the hostility of comments posted in an online forum, with comments ranging from "friendly" to "hostile." An example of a friendly comment included, "Very brave, I find it great and refreshing. I find despising homosexuals generally bad," whereas an example of a hostile comment included, "Gays are the last thing I would tolerate, especially in public." In contrast, Bodine-Baron and colleagues (2020) assessed how participants would respond to a dispute by airing their feelings on social media. While the authors did not specify the nature of the social media posts, we elected to treat them as antisocial because participants were given five options for how they might respond to a dispute with somebody: 1 = "do nothing," 2 = "talk," 3 = "insult," 4 = "use social media," and 5 = "use violence." Similar to Allport's (1954) scale of prejudice, we viewed these responses as escalating from the least combative behavior (i.e., "do nothing") to behaviors with more life-threatening consequences (i.e., "use violence"). From this perspective, "use social media" was considered more antisocial than insulting someone but less antisocial than using violence. An example of using social media to resolve a dispute would include doxxing a person, blasting their networks with spam, or inserting @mention messages to legitimate users.

Secondary outcomes of interest in this review included affective and emotional states such as anger, fear, emotional unrest, depression, anxiety, mood swings, and attitudes toward hate speech/cyberhate. Our systematic search did not yield any eligible studies that measured these outcomes.

| Excluded studies
Due to the large number of full-text documents screened in our search between August and December 2020 (n = 748), we only broadly indicate the reasons for excluding these studies. Most studies at this stage were excluded due to the absence of an eligible empirical intervention (n = 411) or because study authors assessed the accuracy of online hate speech/cyberhate detection and classification software (n = 291) without testing an intervention. Given the number of excluded studies, the "references to excluded studies" contain only those interventions that were initially deemed eligible but were subsequently excluded for various reasons (n = 21). Out of the 748 retrieved full-text documents, 23 progressed to the full-text coding stage, and 21 of these references were excluded during full-text coding (see the PRISMA flowchart in Figure 1 and the list later within this document) for the following reasons. Six studies were excluded because they lacked the information necessary to complete the meta-analysis, such as standard errors, standard deviations, confidence intervals, or sample sizes; without this information, the potential impact of these studies is unclear. While attempts were made to contact the corresponding author(s) of these studies, we did not receive the required information in time for inclusion. Four additional references were nearly eligible but were ultimately excluded because the online intervention focused more on the effects of media exposure on cognition than on countering the transmission, creation, and/or consumption of online hate speech/cyberhate materials (see, e.g., Shortland et al., 2020). Finally, 11 references were excluded upon further examination because they did not meet our inclusion criteria; for example, in 5 of these studies, the online intervention addressed offline rather than online hate speech/cyberhate behaviors.

| Risk of bias in included studies
Methodological quality and risk of bias were coded during data extraction. Two reviewers independently evaluated the risk of bias using the Cochrane Collaboration's risk of bias tool (RoB 2). In particular, we focused the risk of bias assessment on the following domains: randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported results (Sterne et al., 2019). Our ratings for evaluating the risk of bias were "low risk," "some concerns," and "high risk" of bias (see Table 3 for a summary of risk of bias ratings).
Based on our assessment, we rated both studies as "low risk" on the randomization process domain, as the researchers reported simple random assignment. We rated both studies as "low risk" on the deviations from intended interventions domain, as the authors utilized a "double-blind" procedure.

In terms of missing outcome data, we identified a "high" risk of bias as an issue in one study (Bodine-Baron et al., 2020) because, relative to the baseline-only sample, follow-up respondents were older, more likely to live in Java, more likely to have the Internet at home, and more likely to regularly use social media. The source of this difference in sample characteristics at follow-up relative to baseline was differential attrition, as a portion of the individuals who completed the baseline survey did not complete any of the follow-up surveys.
The Álvarez-Benjumea and Winter (2018) study was rated as "low risk" for the missing outcome data domain as data was available for all randomized participants (see Figure 4 for Risk of Bias Summary).
As mentioned in 4.3.3, we assessed eligible studies for selective outcome reporting bias, which concerns outcome data that authors may not have reported for all variables measured in their study. In this review, the Bodine-Baron et al. (2020) study was rated "high risk" for selective outcome reporting bias. In this study, the authors collected outcome data at multiple time points: at baseline, during the intervention and at its conclusion (5 and 10 weeks, respectively), and during a follow-up (15 weeks). However, the authors only reported baseline and 15-week outcome data.
The Álvarez-Benjumea and Winter (2018) study was rated as "some concern" for the selective outcome reporting bias domain as the authors did not specify if the data that produced the results were analyzed following a prespecified analysis plan. This information was not included in the published manuscripts.

| Effects of interventions
Our systematic search identified a total of two eligible studies, which allowed us to conduct one meta-analysis as well as a presentation of single effect sizes (see Supporting Information: Appendix F for the full Stata log). Of the two eligible studies, we included two effect sizes, one from each study, in our meta-analysis. Although the Bodine-Baron et al. (2020) study included seven effect sizes in total, only one outcome aligned with those reported by Álvarez-Benjumea and Winter (2018). The other six effect sizes were not related to content creation/consumption (e.g., responses to disputes that involved doing nothing, talking, insulting someone, or using violence; and justifying violence based on religious or ethnic insults). We analyzed effect sizes from online interventions designed to reduce the creation and/or consumption of hateful online content, comparing participants exposed to the counter-speaking treatment with the baseline/control group condition. While there were no statistical differences between the groups, the effect size favored the intervention, whereby those in the counter-speaking condition created lower levels of hateful content online.

The Álvarez-Benjumea and Winter (2018) study also included treatments that were deemed different from counter-speaking, namely the censoring of hateful content (i.e., the deletion of prior hateful content and presenting only friendly/neutral content) and the extreme censoring of hateful content (i.e., presenting only friendly content). Table 5 indicates that the effect sizes for these arms also favored the intervention, whereby those in the censored conditions created lower levels of hateful content online.

While our meta-analysis results indicate no variability between studies (τ² = 0.00, Q = 0.09, p = 0.763) and I² = 0%, suggesting that any variability is due to chance, only two studies were included. Q is traditionally underpowered when few studies are included (Altman et al., 2021; Higgins et al., 2003), which is the case for this review. Furthermore, we can presume heterogeneity is given or inevitable (see Bryan et al., 2021; Higgins et al., 2003), particularly for social science research.
These effect sizes and the lack of heterogeneity should be interpreted very cautiously. Both studies measured negative behavior such that higher scores reflected more instances of the hateful online behavior (see the forest plot).
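As a quick check on the reported statistics: with k = 2 studies, Q = 0.09 falls below its degrees of freedom (k - 1 = 1), so the truncation at zero in the I² formula applies:

I^2 = \max\left( 0,\ \frac{Q - (k - 1)}{Q} \right) \times 100\% = \max\left( 0,\ \frac{0.09 - 1}{0.09} \right) \times 100\% = 0\%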

| Overall completeness and applicability of the evidence
While this review offers a meta-analysis of the effectiveness of online interventions aimed at reducing the creation and/or consumption of online hate speech/cyberhate material, it must be acknowledged that the scope and span of online interventions likely extend beyond the two studies included in this review. This is the case for three reasons.
First, while we had promising results in the initial searches of our review, most online interventions did not meet the inclusion criterion for outcomes related to online hate speech/cyberhate. Instead, the evaluative components of many of these campaigns were more reflective of the effects of media exposure on cognition than of countering the transmission and/or consumption of online hate speech/cyberhate materials (e.g., Frischlich et al., 2018; Shortland et al., 2020). Moreover, 290 references were excluded from this review because they measured outcomes related to the accuracy of computer algorithms in classifying and identifying hateful online content rather than overall effectiveness at reducing cyberhate and online hate speech.
Second, we know that additional campaigns exist that fight online hate speech and cyberhate; however, the effectiveness of such tools for reducing hateful content online either still needs to be tested with an experimental study design, or such experimental evaluations have yet to be published.

| Potential biases in the review process
We did not identify any specific biases in the systematic review process.
Although our review only identified two eligible online interventions, several implications for future research follow. First, echoing the gaps noted above, the field needs more experimental (random assignment) and quasi-experimental evaluations of online hate speech/cyberhate interventions. Second, further experimental studies are needed to examine the effect of online interventions on additional outcome measures of interest (see Windisch et al., 2021). For instance, research must attend to the importance of cyberhate and online hate speech within the context of creating (e.g., making videos, posting, sharing, liking, etc.) and transmitting hateful content, rather than the classification and identification of online hate speech/cyberhate and its selected users. As mentioned, many interventions were excluded from this review because they measured outcomes related to the accuracy of computer algorithms in classifying and identifying hateful online content rather than overall effectiveness at reducing cyberhate and online hate speech. If online interventions are to become an evidence-based tool for reducing online hate speech/cyberhate, more emphasis should be given to outcomes that measure the creation and transmission of hateful content.
There is also a need for more empirical studies of online interventions that focus their outcomes specifically on online behavior.
In this review, we found some studies that tested online interventions but then investigated subjects' potential changes in offline behavior instead of also testing for any perceived changes in online behavior.
There is also an opportunity here to test if such online interventions can make a difference in both the online and offline world, as it is likely that online interventions could influence both behaviors. However, further empirical testing is necessary in this regard. We encourage research into truly innovative approaches to addressing this problem that are radically distinct from existing programs. Third, echoing the recommendation put forth by Carthy et al. (2018), we encourage researchers to clearly specify the theoretical frameworks that guided their online intervention and/or campaign. While social norming (Elster, 1989; also see Bicchieri, 2005) emerged as a common theoretical perspective among the eligible studies, there are other useful theoretical frameworks (e.g., social identity theory, terror management theory, subjective uncertainty reduction theory) that warrant thoughtful consideration and testing.
Fourth, we suggest that future studies should focus on exploring online interventions for individuals who may have already been exposed to and/or have become radicalized by more extremist ideologies and/or who have moved on to more radical platforms with fewer rules around hateful content creation online. As pointed out at the beginning of this review and within our protocol (Windisch et al., 2021), hate speech and other prejudice-motivated behaviors need to be considered on a continuum of victimization (Bowling, 1993), with more extreme forms of prejudice-motivated violence founded on "lower level" acts of prejudice and bias (Allport, 1954). Hateful content online on such lower levels of the prejudice scale should therefore not be ignored, and the two studies we found within our systematic review explored interventions at such lower levels of prejudice. However, more empirical studies are necessary to explore online interventions that address online behavior of individuals who have already advanced to more extreme forms of prejudice-motivated violence.
Finally, given the scarcity of experimental (random assignment) and quasi-experimental evaluations of online hate speech/cyberhate interventions, we encourage future research to prioritize these rigorous designs so that the gaps identified throughout this review can be filled.

Systematic review methods
Ajima Olaghere has extensive expertise in statistical analyses. She has co-authored two Campbell Systematic Reviews, one on youth curfews and the other on police-initiated diversion of low-risk youth.

Statistical analysis
Ajima Olaghere and Susann Wiedlitzka have extensive expertise in statistical analyses. Elizabeth Jenaway provided substantial assistance with data management and cleaning.

Information retrieval
Steven Windisch, Ajima Olaghere, Susann Wiedlitzka, and Elizabeth Jenaway all have experience performing systematic searches on various topics and retrieving studies and documents for review.

DECLARATION OF INTERESTS
Ajima Olaghere is an editor for the Crime and Justice Coordinating Group within the Campbell Collaboration. She recused herself from the review of the protocol and of the completed systematic review. Susann Wiedlitzka and the editor overseeing this review know each other on a personal and professional level. Susann Wiedlitzka has also started work on a new hate crime project with another Crime and Justice editor. The following steps have been taken to deal with this potential conflict of interest: multiple layers of review (CJCG Co-chair review, EiC review, Campbell Methods Group review) are already in place because this is a fast-tracked review. In addition, David B. Wilson (Methods Editor) has reviewed and co-signed action letters and associated materials and has been copied on all communications between the editors and the authors of this review.
The editor with a current professional relationship with Susann Wiedlitzka recused herself from the editorial processes for the completed review, but oversaw the editorial process for the protocol, before her current working relationship with Susann Wiedlitzka.