“Stop the Bleed” Education Assessment Tool (SBEAT): Development and Validation

As part of the national Stop the Bleed campaign in the United States, more than a million people have received bleeding control training through the work of many organizations. These public and professional educational experiences are ideally grounded in health sciences, clinical, and educational evidence to be most effective. However, there is currently no standard tool for evaluating the educational quality of these programs. We developed and validated the Stop the Bleed Education Assessment Tool (SBEAT) to provide a standard measure of life-threatening bleeding educational programs knowledge learning outcomes to aid in evaluation and development of this public health program. The SBEAT development included medical, clinical, and educational experts to derive and validate learning outcomes. Specific item writing incorporated focus groups for input on language and then pilot testing before a full community pilot test established a data set, for which a Rasch methodology was applied. The resulting tool used 34 items embedded in 19 survey questions, with item separation statistic of 5.56 (0.97 reliability) and person separation statistic of 2.09 (0.81 reliability) for 171 persons. Overall, the Cronbach Alpha (KR-20) person score “test reliability” equaled 0.85 (SEM = 2.24). The SBEAT project establishes a standardized assessment tool to evaluate the cognitive aspects of first aid for life threatening bleeding. Comparison of outcomes from different teaching styles and methods will allow for the development of best practices for future bleeding control education and help organizations demonstrate value to learners, funders, and policy makers, and advance health sciences education. SBEAT offers a measure for which educational efficiency and efficacy can be judged within a larger effort to prepare people for personal emergencies or large-scale disasters.


Introduction
Public health, school-based, and community-based health educators work to prevent, respond, and help people recover from personal emergencies and disasters by teaching first aid. Internationally and historically, stopping life-threatening bleeding (LTB) is a core competency for first aid providers to surgeons. Although there are many causes for LTB, from road traffic accidents to acts of violence, the epidemiology of death from LTB suggests that a point-of-injury approach to stopping LTB must be improved [1][2][3][4][5][6][7][8]. In the United States the Stop the Bleed (STB) campaign launched in 2015 [9] to translate lifesaving military medical lessons learned from point-of-injury hemorrhage control to benefit the public through awareness and skill education [10][11][12]. After the STB campaign launch, many organizations in and around schools, as well as the community started teaching the lay public (students, faculty, staff, neighbors) and first responders hemorrhage control knowledge, skills, and behaviors.
Aligning educational outcomes and clinical evidence bases to bleeding control education took shape in 2017, when the National Center for Disaster Medicine and Public Health assembled the Stop the Bleed Education Consortium (SBEC), consisting of an international group of trauma, medical education, and public health experts. They put forward, based on analysis of existing programs, educational content and delivery recommendations for STB education programs [13], which may be applied to any LTB curriculum. The SBEC recommended three critical objectives for education programs: increase motivation of learners to act, differentiation of lifethreatening from non-life-threatening bleeding, and learning to apply pressure (both direct and via a tourniquet) to stop bleeding [13]. While these recommendations have proven helpful for creating courses and vetting STB training programs, currently no validated measures exist to compare learner outcomes between or within various educational programs. Hence, it is not known which educational modalities are most effective for various learner groups, or what deficits learners may have at course completion [14][15][16][17][18][19]. In order to compare learner outcomes with objective data, and offer feedback for programmatic improvement, we developed the Stop the Bleed Educational Assessment Tool (SBEAT). The SBEAT measures the cognitive and behavioral elements needed to respond appropriately to LTB emergencies. This article describes the SBEAT's development process and validity assessment from a Rasch approach [20] for its application in assessing any LTB education program holistically, internally and externally.

LTB competencies: a systematic approach to identify and stop hemorrhage
LTB competencies incorporate cognitive, psychomotor, and affective elements for hemorrhage control response to influence the effectiveness and willingness to help in emergencies, by moving through the Chain of Survival Behaviors [21,22]. The challenge of assessing LTB education lies in the validity and reliability of an instrument(s) to be clinically appropriate, psychometrically sound, and educationally informative to organizations and learners [23]. The goal of LTB education, while primarily targeted to the general public, spans the population from laypeople to professionals, which adds complexity to address human factors (e.g., reading level, culture, vernacular) and clinical guideline expectations at various levels of education.

Materials And Methods
We employed a Rasch methodology for the current project to improve psychometric soundness and validate theoretical and statistical constructs measuring the cognitive elements of LTB education beyond any one curriculum [20,24,25]. Rasch methodology ideally is a proactive developmental approach to the instrument and ultimately examines the probability of categorical data (e.g., correct/incorrect) and models it as a function of respondent (aka person) and element being considered (aka item). The Aultman Health Foundation Human Research Review Board (2017.07 JP) approved this research project.
A multi-stage approach was used to strengthen content validity of the proposed scale (see Figure 1). The first stage for assessment tool development in a Rasch approach starts with the identification of the latent variable. In this instance the latent variable consisted of the behaviors to stop LTB in a non-medical setting. The behaviors were drawn from the Stop the Bleed Educational Consortium work [13], the Stop the Bleed™ curriculum, and the international first aid guidelines [22]. In the second stage, content of a course was validated as essential by clinical and health education professionals (n = 28) through a 3-round Delphi study design, with a snowball sample. Professionals contributed to the establishment of competencies needed at various levels of training (i.e., public, lay responder, and professional) in terms of specific learning outcomes derived from the SBEC outline [13]. And they validated a variety of graphic representations (video, photos, animations). Through this phase we attempted to identify all relevant elements for assessment (see Table 1).  In the third stage, community participants (n = 187) contributed to item development. The artifacts used throughout the study, such as the visuals, words, metaphors, and sample questions, were taken to communities of interest in focus groups. These focus groups allowed us to strengthen the tool by clarifying meanings of words, and gain knowledge on plain or vernacular language that would help better represent medical terms and constructs. For example, hemorrhage control experts identified 6 oz (≈177 ml) of blood loss to represent LTB for lay responders. While this particular amount of blood loss may not be inherently lifethreatening for most individuals, the presence of this volume of blood in the setting of continued bleeding could lead to rapid clinical deterioration. This is different than a blood donation, for example, in which the bleeding is controlled and will stop. In an injured person, this volume represents life-threatening bleeding that might benefit from rapid layperson intervention.

FIGURE 1: Item development strategy for SBEAT
People while looking at a 6 oz pool of "blood" described it in three different categories. Volume was described in various measurements including ounces, gallons, and liters. Graphically people described it in terms like "a lot", big, or puddle. And, in emotional terms that included "crap load", "oh damn." This variety impressed upon us the need to have multiple representations to describe LTB when asking people to identify it. During a citizen-scienceproject with high school students and faculty (n = 142) at a conference, individuals polled model items and electronically provided feedback on illustrations (word-clouds, open text) before group debriefing. Feedback provided an opportunity to revise the question format and word choices. For example, a picture of a commercial tourniquet on a non-bleeding limb was mis-identified for something other than a hemorrhage control device by more than 60% of the 142.
From this preparation, we developed a series of items for the cognitive and competency domains that were presented for peer review to the Education and First Aid sub-councils of the national Scientific Advisory Council (SAC) of the American Red Cross (n = 18). Then items were pilot tested (n = 77) to discern final language and domain appropriateness, to build construct validity.
This report summarizes stage four, the Community Assessment for validation. Data was collected September-October 2019 from 16 STB courses. Courses consisted of 1-2 trainers, both with 20+ years of community and professional education experience, lasting no more than 60 minutes, covering the learning outcomes of Stop the Bleed®. Participants took pre-learning assessment 3-7 days prior to face course. Two courses had STB built into a full day first aid course. In total, we extracted data from 172 surveys for analysis (see Figure 2).

Analysis for reliability and validity
We utilized Qualtrics software (Provo, UT, USA), version October 2019, to collect data, with data transformed into binary responses in Microsoft Excel and then uploaded into Winsteps (4.4.7) for analysis. Rasch scaling and modeling uses a mathematical formula to describe the relationship between persons (participants) and items (survey responses) to limit the bias of samples and provide a true interval scale estimation [20]. B n -D i = ln(P ni /1-P ni ) articulates the Rasch mathematical model for dichotomous test items, "based on an appreciation that, to make measurements in the case of right/wrong test items, researchers must consider the difficulty of each test item along a variable and the overall ability level of a test taker with respect to the variable" [25]. Therefore, when a potential first aider (B n ) answers an item (D i ), this relationship will be expressed by the natural log of the respondent correctly answering the item (P ni ) divided by the probability of the respondent not correctly answering the test item (1 − P ni ).
Fit statistics of persons and items provide an indication to the appropriateness (quality control) of the overall tool and each person and item by showing the size of randomness or distortion. The Mean-square expected value (MNSQ) is 1.0, with values <1.0 identifying persons or items that are too predictable and >1.0 being unpredictable [26]. "Infit means are less sensitive than outfit means to extreme responses" [27]. The Means Square Index with values between .50 and 1.50 are acceptable and more preferred as they approach 1.0. If items depart from these parameters, then it alerted us to responses to items that may not be indicative of the STB constructs (e.g., indicative of perhaps reading level or experience).
Next, overall person and item "separation and reliability" provided a perspective of the variable on a continuum. Person separation <2 and reliability <0.8 implies that a survey may need more items because the instrument may not be sensitive enough to distinguish all participants. The advantage of significant person separation >2 is being able to analyze participant response patterns independently of the reliability of the survey items. Item separation <3 and reliability <0.9 implies that the participant sample is not large enough, which limits construct validity [28]. The Cronbach Alpha (KR-20) traditionally is reported as the conventional "test" reliability index in Rasch methodology.
Finally, detecting and evaluating the impact of multidimensionality uses item fit statistics and principal component analysis of residuals (PCA) (www.winsteps.com/winman/principalcomponents.htm). From a Rasch development model, the goal is to limit contrast (number of variables) and to focus on the variable of interest. This process adds to the validity and reliability for the items measuring the same variable or understanding that there may be alternative variables that need to be explored and segregated. Once items demonstrate unidimensionality and free from bias, the Rasch model places all items on a hierarchical linear scale, with the most simplistic items, the lower end, to the most difficult STB underlying concepts, at the upper end. This is plotted against participants ranging from least knowledgeable, lower end, to the most knowledgeable, at the upper end (see Figure 3). After data are transformed through Rasch modeling to a linear measure, yielding individual Person Measures (Winsteps, 2020) they provide person ability estimates that may be statistically assessed for scale measurements (e.g., t-test). Scores can range from -4 to 4, where 0 refers to a 30% probability of success. Independent t-tests were then performed in IBM SPSS software version 26.0 (IBM Corp., Armonk, NY) to identify differences between groups in preand posttest scores. Cronbach's alphas were determined to identify difference between groups and categorical results with Pearson measures reported. We hypothesized that sample characteristics of gender, race, educational level would not have statistical differences between the standardized Rasch Person Measures. We further hypothesized that level of first aid training and specifically hands on training with a tourniquet would be different than those without such training.

Results
Using a Rasch approach, pre and post course SBEAT administrations signify individual responses because the person should be different based on any intervention. In this Community Pilot, 181 surveys were submitted, nine were substantially incomplete, leaving 172 for analysis. Demographics were collected from 136 participants (see Table 2), representing individuals taking the instrument the first time. Recruitment happened predominantly with high school and college students resulting in a sample that was younger, Caucasian, and female with a diversity of previous first aid training, including tourniquet training with practice (11%).  The 83 individual items' Infit measures (MNSQ) fell within the acceptable ranges (person 0.99 and item 1.00). The original modeled data reported person separation statistic of 2.27 (0.84 reliability) for 171 persons (one person dropped from analyses because of a perfect score) and item separation statistic of 5.10 (0.96 reliability) for all 83 items. In examining the Disattenuated (Pearson) Correlation, with extreme scores removed, showed about half as much variance as they did independently, meaning possible different variables between the three clusters identified as well as a negative correlation between clusters, however the unexplained variance all fell below three eigenvalues (PCA), representing that these were a combination of random and accidental variance.

Characteristic Frequency (n) Percentage
Of the 83 items, several were redundant in differentiating individuals, as well as redundant in measuring a learning outcome. Refinement of the model commenced to reduce the number of items to a meaningful measure that would be practical to administer in the field. The final model for SBEAT utilized 34 items found within 19 survey questions (see Appendix A). In this version, all the items fell below the 1.5 threshold for fit statistics; however, five items fell below the 0.5 MNSQ, with the average Outfit of 0.85 for both person and item. Upon investigation, the five items all did not change the overall variance explained by measures (34%), as well as provided a lower anchor for the scale to improve person separation, and thus were kept. All the Infit measures (MNSQ) fell within the acceptable ranges (person and item 1.00).

Relationship to other variables
Design of the SBEAT follows the two expectations of Rasch modeling calibration and measurement: "(1) that a more able person should always have a greater probability of success on any item than a less able person, and (2) that any person should always be more likely to do better on an easier item than on a harder one" [29]. In the SBEAT Wright map, persons are represented in a hierarchy of low to high performers. Items, similarly, are presented from easy to hard (see Figure 3). A person score adjacent to an item score correlates to that a person having a 30% chance of answering the item correctly. We chose this metric over the default 50% chance because SBEAT users need greater assurance; it is the competency and not chance being measured.
The average person score (n = 171) was 0.51. Gender, race, and educational level did not differ significantly in Person Measure. Groups we hypothesized being different are summarized in Tables 3-5.

Discussion
We explored simple to complex cognitive and behavioral survey items and their variations to represent the competencies for treating LTB. The rigors of soliciting expert clinicians and educators in identifying learning outcomes reduced the chance of including non-relevant or distracting elements, thus increasing content validity. The inclusion of community input into the structure of building items helped build stronger construct validity arguments by appropriating non-medical language that is accessible to people with post-elementary education. Our substantive theory follows a pattern of being able to recognize an LTB from environmental context (what do you see) and the victim (how are they acting), then choosing an appropriate action based on the injury (use of pressure or tourniquet), and finally the sequence of applying pressure or a tourniquet, including actions in case hemostasis is not achieved. These sequenced competencies also represent modules within various LTB curricula. reliability indicating the internal consistency of the scale. Similarly, we worked to have an appropriate Item separation to verify item hierarchy. The high item separation (5.56) in the model implies a large enough person sample to confirm item difficulty hierarchy/construct validity of SBEAT [28]. Unidimensionality was confirmed as the eigenvalue plot (i.e., scree plot) showed one dominant factor and explained at least 34% of the variance and the contrasts (factors) loadings were under three eigenvalues.
The sample, although young, represented a large audience for STB education in schools and community. The diversity of first aid experience demographically, and specifically with tourniquets contributed to assessing our hypotheses for discrimination by the tool, when compared to demographics that should not influence STB competencies (gender, race, or educational level). Relationship to other variables is an important element when developing a novel tool because other tools are not comparable. With our hypothesis testing we established a high sense of connection as those with more experience in first aid education and specifically tourniquet training performed higher on the SBEAT. This then was followed by people with practice with tourniquets measure significantly higher than those only introduced to tourniquets. Although there is not a discrete or perfect correlation or exclusiveness to the level of training, this in vivo sample represents a range of learners that exist as potential or actual participants. The final SBEAT, as a whole, offers administrators of the tool content and construct validity for use in assessing knowledge and intended behaviors for LTB irrespective of social demographics [30].

SBEAT application
The SBEC aims to standardize the features of bleeding control programs in order to maximize participant success and increase public safety. SBEC recommended establishing tiered training categories for education domains, objectives, definitions, content, educational design, and assessment to motivate learners to act when faced with a hemorrhage emergency, distinguish life-threatening hemorrhage, and learn to apply pressure. The SBEC also suggested educational design including web-based, in-person didactics and in-person skill for different levels of training. SBEAT can be used to comparatively evaluate the learning outcomes of each of these educational designs/modalities, as well as evaluating between tiers of training for all levels of learners.
Rasch measures are reported differently than percentages of correct vs. incorrect responses, which on the surface maybe confusing to those trying to understand the output. Person measures range from -4 to 4, with zero being the mean, and correlate to the item measure. The Wright map provides a picture of the exam items and ability of the SBEAT participants on an interval scale. Theoretically, when participants and items are opposite each other on the map, the difficulty of the item and the ability of the candidate are comparable, so the candidate has approximately a 30% probability of answering the item correctly. More important to the application of SBEAT is the Person Measure to linear statistical analysis.
Psychometrically, we can identify varying amounts of STB competencies in individuals and consequently in educational interventions, with appropriate samples, using the SBEAT. As educational and public health organizations utilize SBEAT, part of the value may be in the development of pretest/posttest versions for completion or certification. The use of Rasch methodology encourages future item development, for which pretest/posttest equivalency can be assessed; assuming each measure remains part of a single variable because it is on a single scale [20]. The development of new items may also serve to keep the instrument fresh and reduces the chance that organizations only teach to specific test items versus larger concepts.
All new items should be piloted and then plotted using Rasch analysis.
Moving forward, SBEAT should be centralized for free web-based administration to organizations that can use it to assess and improve their learners' outcomes based on LTB competencies and local implementation. Additionally, the use of outcomes from a common tool allows assessment between training programs, from which guidelines can be developed. From this effort standard setting activities may also establish mastery levels outcomes at various training levels.

Limitations
As with most first aid educational assessments, SBEAT validity did not include observation of participants in LTB situations. Nor were the individual skills of learners assessed before or during STB education evaluated in relationship to SBEAT measures. We utilized the expertise of individuals with actual experience and practice of STB competencies to establish construct validity of SBEAT. There is also a purposeful lack of categorical definition for educational success described by this instrument, as any threshold value of a certification or documentation of completion of a STB course needs to be valued by an organization in context of the learner and expectations for results. For example, legally obligated responders may need a higher level of performance, where a community member might be judged successful if they know when to call for help, while applying pressure.
The tool may also be limited at the more knowledgeable end as people may obtain a perfect score. Based on the development process we are professionally comfortable with people demonstrating up to the maximum score as there is a finite knowledge of bleeding and control competencies in the first aid domain. The Person and Item separation scores however discriminate within and between groups at various levels of training, which add to the overall value of the tool. However, additional tools are needed to measure the intention to help and skill assessment components to achieve a comprehensive educational assessment of a curriculum, with the SBEAT being a primary component.
In some cases of first aid education, assessments provide a level of feedback to the learner to encourage a person to learn more or feel confident about their level of knowledge, however this development and validity study did not have this intent. If educational organizations desired to do this, modifications to scoring and application of SBEAT would need to take place. We chose to apply a Rasch Analysis to the data, which we believe are equally valued and reliable and not subject to interference from extraneous variables, however other forms of assessment need to now take place in terms of applying it to actual learning outcomes, including intention to aid, to holistically assess all SBEC outcome recommendations.

Conclusions
We have developed and validated the first universal assessment tool for knowledge and competencies to stop LTB. The SBEAT project establishes a standardized assessment tool to evaluate the cognitive aspects of the SBEC objectives and universal first aid competencies. The SBEAT also models a Rasch approach for validating a competency in education. Additional work is needed to merge the motivational and psychomotor aspects of STB competencies. We believe SBEAT to be validated and implementable across training programs to determine the level of learning outcomes of individual students for program assessment and between programs. Comparison of different teaching styles and methods will allow for the development of best practices for future bleeding control education and help organizations demonstrate value to learners, funders, and policy makers. In addition, with a validated widely disseminated assessment, curriculum designers can gather the information needed to continue to design and advance training in bleeding control.

Appendices
Which of the following need immediate first aid to stop the bleeding: Options: Photo of Scrape from a fall Photo of full thickness burn Photo of Cut to thigh GIF file of Spurting blood from leg Which of the following need immediate first aid to stop the bleeding: Options: Blood spotting through one piece of gauze

Blood soaked clothing or bandages
Bleeding that is not soaking through a bandage The overlying clothes are soaked in blood Which of the following need immediate first aid to stop the bleeding: Options: Pooling of blood on the ground Enough blood to fill 1/2 a soda can Two-tablespoons of blood from the injury on the ground A hole in the skin without flowing blood Which of the following need immediate first aid to stop the bleeding: Options: Dripping blood from a cut Gushing blood that will not stop coming out, even with a hand pushing on it