Systematic Comparison of OWAS, RULA, and REBA Based on a Literature Review

This study aimed to systematically compare three representative observational methods for assessing musculoskeletal loading and its association with musculoskeletal disorders (MSDs): the Ovako Working Posture Analysis System (OWAS), Rapid Upper Limb Assessment (RULA), and Rapid Entire Body Assessment (REBA). The comparison was based on a literature review without time limits and covered various factors relevant to observational methods. The comparisons showed that, although the RULA has the significant limitation of using only two classifications for leg postures, (1) the RULA is the most frequently used of the three techniques; (2) many studies adopted the RULA even to evaluate unstable lower limb postures; (3) the RULA assessed postural loads at higher risk levels than the other two methods in most of the studies reviewed; (4) the intra- and inter-rater reliabilities of the RULA were not low; and (5) the risk levels assessed by the RULA were more significantly associated with postural load criteria, such as discomfort, maximum holding time (MHT), and % capable at the trunk, and with MSDs than those assessed by the OWAS and REBA.


Introduction
Work-related musculoskeletal disorders (WMSDs) are among the most frequent occupational disabilities in industry. WMSDs accounted for approximately 67% of occupational injuries and illnesses in Korea in 2019 and for 29-35% in the USA in 1992-2010 [1,2]. They also account for approximately 40% of global compensation costs for occupational and work-related accidents and diseases [3]. Work-related musculoskeletal load due to awkward or static postures, excessive force, repetitive effort, and similar factors is known to be a strong risk factor for developing WMSDs [4]. To manage and prevent WMSDs, it is critical to assess exposure to their risk factors and to implement intervention programs that reduce the load to levels acceptable for workers [5-7].
Therefore, ergonomists and practitioners use various tools to quantify exposure levels. Among these assessment methods, observational techniques have been used more frequently than direct measurement. While the use of direct measurement approaches, including motion capture/measurement systems, electronic goniometers, and push/pull force sensors, increased only minimally, the use of observational methods such as Rapid Upper Limb Assessment (RULA) [8], Rapid Entire Body Assessment (REBA) [9], and the Ovako Working Posture Analysis System (OWAS) [10] by US ergonomists increased significantly between 2005 and 2017 [11]. Observational techniques are inexpensive, easy to use, and flexible, and they do not interfere with workers' tasks or the jobs being performed [5,12-16].
Although many observational methods have been developed and applied to assess risk factors for musculoskeletal disorders (MSDs), the OWAS, RULA, and REBA have been the most frequently applied in industry to assess whole-body load [11,17]. The OWAS technique was developed by Ovako Oy, a Finnish steel company; it identifies four back postures, three arm postures, seven lower-limb postures, and three categories for the weight of loads handled or the force used [10].
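The four classification dimensions above are usually combined into a four-digit posture code (back, arms, legs, load). As a minimal sketch that assumes only the category counts given here, and uses plain digits as placeholder labels rather than the official OWAS posture names, the following enumerates all codes and confirms that the OWAS distinguishes 4 × 3 × 7 × 3 = 252 posture-load combinations. Assigning each code to an action category would additionally require the OWAS decision table, which is not reproduced here.

```python
from itertools import product

# Category counts from the OWAS description above. The digits are placeholder
# labels; the official OWAS posture names are not reproduced here.
BACK_CODES = range(1, 5)   # four back postures
ARM_CODES = range(1, 4)    # three arm postures
LEG_CODES = range(1, 8)    # seven lower-limb postures
LOAD_CODES = range(1, 4)   # three load/force categories


def owas_posture_codes():
    """Return every four-digit OWAS code in (back, arms, legs, load) order."""
    return [
        f"{back}{arms}{legs}{load}"
        for back, arms, legs, load in product(BACK_CODES, ARM_CODES, LEG_CODES, LOAD_CODES)
    ]


codes = owas_posture_codes()
print(len(codes))           # 252
print(codes[0], codes[-1])  # 1111 4373
```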
The RULA was proposed to provide a quick assessment of the load on the musculoskeletal system due to the postures of the neck, trunk, and upper limbs, muscle function, and the external loads exerted [8]. The REBA is a postural analysis system that is sensitive to musculoskeletal risks in a variety of tasks, especially for assessing the working postures found in health care and other service industries [9]. While the OWAS and REBA classify the joint motions of the whole body into groups, the RULA focuses on classifying the upper body, including the trunk.
The three techniques have each been cited approximately 1670-3680 times in the relevant literature, far more than any other observational method [6,14,18]. Joshi and Deshpande [19] reported that, among 39 comparative studies covering 18 observational techniques, the REBA was compared most frequently (69%), followed by the RULA (64%), the Strain Index (36%), and the OWAS (33%).
These studies compared the three techniques from only a few perspectives, which differed from study to study. Consequently, it is difficult to draw a comprehensive conclusion from them about which method is better or more suitable for a given situation. Therefore, this study aimed to systematically compare the OWAS, RULA, and REBA based on a literature review covering nearly all of the perspectives mentioned above.

Materials and Methods
The comparison of the three observational techniques was based mainly on a literature survey. The author had conducted several relevant studies and already held much of the literature considered here. Additional literature was identified from the reference lists of these articles and by searching electronic databases such as ScienceDirect and Scopus, without time limits, using the keywords "OWAS," "RULA," and "REBA." Inclusion was judged from each study's title and abstract. Only peer-reviewed journal papers written in English that dealt with two or more of the methods or with at least one comparison factor were considered; studies presented at conferences or published as edited-book chapters were excluded. Of the 190 papers identified, 109 were discarded because they were unrelated to this study or covered only one method without addressing any of the comparison factors used here. Of the remaining 81 studies, 47 had been conducted or already collected by the author, and 34 were newly selected through the search process. In total, 81 studies were used in the subsequent comparisons.
Although there is no generally accepted standard for systematically comparing or evaluating observational techniques [16], this study compared the three techniques in terms of general characteristics, application fields, risk levels, agreements and correlations between methods, intra- and inter-rater reliability, and validity. These perspectives have been used in existing studies that compared or evaluated ergonomic assessment tools. Validity was investigated by analyzing correspondence with valid reference methods and association with MSDs [16].
The OWAS and RULA classify postural loads by the urgency of corrective action into four action categories and four action levels, respectively, with similar meanings. The REBA groups postural loads into five action levels (0-4), whose interpretations differ slightly from those of the OWAS action categories and RULA action levels. To allow an effective comparison, the five REBA action levels were regrouped into four, considering the meanings of the action categories/levels in the three techniques: new action level 1 (originally action level 0), 2 (originally action levels 1 and 2), 3 (originally action level 3), and 4 (originally action level 4). This regrouping is identical to that used by Kee and Karwowski [24].
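To compare action levels across methods, each method's output must first be placed on the common four-level scale described above. The sketch below assumes the score bands of the original worksheets (RULA grand score 1-2, 3-4, 5-6, and 7 for action levels 1-4 [8]; REBA score 1, 2-3, 4-7, 8-10, and 11-15 for action levels 0-4 [9]) and then applies the REBA regrouping; the function names are illustrative. OWAS action categories are already on a 1-4 scale and need no mapping.

```python
def rula_action_level(grand_score: int) -> int:
    """Map a RULA grand score (1-7) to its action level (1-4)."""
    if not 1 <= grand_score <= 7:
        raise ValueError("RULA grand score must be between 1 and 7")
    if grand_score <= 2:
        return 1
    if grand_score <= 4:
        return 2
    if grand_score <= 6:
        return 3
    return 4


def reba_action_level(reba_score: int) -> int:
    """Map a REBA score (1-15) to its original action level (0-4)."""
    if not 1 <= reba_score <= 15:
        raise ValueError("REBA score must be between 1 and 15")
    if reba_score == 1:
        return 0
    if reba_score <= 3:
        return 1
    if reba_score <= 7:
        return 2
    if reba_score <= 10:
        return 3
    return 4


# Regrouping used in this review: 0 -> 1, {1, 2} -> 2, 3 -> 3, 4 -> 4.
REBA_TO_COMMON_LEVEL = {0: 1, 1: 2, 2: 2, 3: 3, 4: 4}


def reba_common_level(reba_score: int) -> int:
    """REBA score expressed on the four-level scale shared with the OWAS and RULA."""
    return REBA_TO_COMMON_LEVEL[reba_action_level(reba_score)]


print([rula_action_level(s) for s in range(1, 8)])       # [1, 1, 2, 2, 3, 3, 4]
print([reba_common_level(s) for s in (1, 3, 5, 9, 12)])  # [1, 2, 2, 3, 4]
```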
Where the agreement rates and correlations among the three methods, which are used as comparison factors below, were not reported in the original studies, they were calculated where possible from the results presented in those studies or from the author's experimental data.

General Characteristics
Kee [23] summarized the general characteristics of the three methods based on previous studies [5,16,17,19]; the results are shown in Table 1. While the OWAS assesses only posture and force/external load, the RULA and REBA also evaluate the effects of repeated and static postures. Compared with the RULA, the REBA has two additional assessment factors: coupling and dynamic loading. The OWAS does not distinguish between the left and right upper extremities, whereas the RULA and REBA evaluate one side at a time, namely the side considered to be under greater stress; if it is difficult to decide which side is more loaded, both sides are assessed. The OWAS assesses postural loads based on time sampling, while the REBA selects and evaluates the most common, prolonged, or loaded postures [16]. In practice, however, the OWAS and RULA are also generally applied to the most common, prolonged, or loaded postures, as the REBA is. The three methods use four or five action categories or levels to classify risk [8-10]. None of the three techniques considers recovery, duration, vibration, environmental conditions, or psychosocial and individual factors, which are known to affect the occurrence of MSDs [12,16,34,35]. The relative strengths and limitations of the three methods are briefly summarized in Table 1, based on Takala et al. [16].
Table 1 notes: Necessity to decide which side to observe. * O: included; X: not included. ** Dynamic loading means rapid large changes in posture or an unstable base.

Application Fields
Among 166 OWAS-applied studies published between 1977 and 2017, Gómez-Galán et al. [14] identified 12 work environments. The OWAS was most frequently applied in manufacturing industries (34 published articles), followed by healthcare and social assistance activities (22); information and communication (19); agriculture, livestock, forestry, and fishing (19); transportation and storage (10); and construction (9).
Based on 226 RULA-related publications between 1993 and 2019, Gómez-Galán et al. [18] found that manufacturing industries (74 publications) ranked overwhelmingly first in RULA applications, followed by human health and social activities (38); agriculture, forestry, and fishing (18); transportation and storage, administrative and support service activities, and education (12); and information and communication (9).

Risk Levels by Methods
The 44 studies that assessed postural loads using two or more of the three methods are summarized in Table 2, including the author(s), application fields, sample size, and rank order of the risk levels evaluated by the methods. The three techniques have been applied in various fields, such as aerospace, agriculture, bicycle repair, cattle slaughter, construction, experimental environments, food, forestry, hospitals, kitchens, laboratories, lifting tasks, manufacturing, pottery and sculpture, sawmills, service industries, tailoring, typing, and welding. The sample sizes ranged from two working postures [36] to 4251 postures [37]. For effective comparison, some studies regrouped postural loads into three or four risk levels, considering the meanings of the risk categories of the techniques adopted; the number of regrouped risk levels is given in the 'Remarks' column of Table 2. Cremasco et al. [38] used values normalized by a minimum-maximum transformation to compare REBA and RULA scores.
Nine of the 10 studies that applied both the OWAS and RULA revealed that the RULA assessed postural loads at higher risk levels than the OWAS. In the remaining study, by Mukhopadhyay et al. [39], the two techniques both rated nine activities performed in bicycle repair units as action level 4.
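The min-max transformation used by Cremasco et al. [38] rescales each method's score onto a common 0-1 range so that RULA and REBA scores can be compared directly. The sketch below assumes the bounds are the theoretical ranges of the final scores (RULA grand score 1-7, REBA score 1-15); the original study may instead have used the observed minimum and maximum, and the function name is illustrative.

```python
def min_max_normalize(score: float, lowest: float, highest: float) -> float:
    """Rescale a score linearly so that `lowest` maps to 0 and `highest` to 1."""
    if highest <= lowest:
        raise ValueError("highest must exceed lowest")
    return (score - lowest) / (highest - lowest)


# Assumed bounds: the theoretical ranges of each method's final score.
RULA_GRAND_SCORE_RANGE = (1, 7)
REBA_SCORE_RANGE = (1, 15)

rula_norm = min_max_normalize(5, *RULA_GRAND_SCORE_RANGE)  # 4/6
reba_norm = min_max_normalize(8, *REBA_SCORE_RANGE)        # 7/14
print(round(rula_norm, 3), reba_norm)  # 0.667 0.5
```

With these assumed bounds, a mid-range RULA grand score of 5 sits higher on the normalized scale than a REBA score of 8, even though the raw REBA number is larger.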
Of the 13 studies that applied both the OWAS and REBA, 10 (all except those by Kee et al. [25], Isler et al. [37], and Mukhopadhyay et al. [39]) concluded that the REBA rated postural loads as more stressful than the OWAS. According to Kee et al. [25], the OWAS rated postural loads for 72 experimental postures as more stressful than the REBA, but the differences were not statistically significant (Wilcoxon signed-rank test, p > 0.20). Isler et al. [37] assessed 4237 and 4251 postures in the clothing sector using the OWAS and REBA, respectively, and found no significant difference between the postural loads assessed by the two methods (paired t-test, p > 0.10). Mukhopadhyay et al. [39] reported that the OWAS and REBA both assessed nine bicycle repair activities at action category or level 4.
Of the 36 studies that adopted both the RULA and REBA as ergonomic risk assessment tools, 30 showed that the RULA rated the selected postures as more stressful than the REBA. Of the remaining six studies, two, by Mukhopadhyay et al. [39] and Pal and Dhara [36], showed that the RULA and REBA rated the postures identically in bicycle repair and in the uprooting job of rice cultivation, respectively. According to Balaji and Alphin [40] and Bhatia and Singla [41], there were no significant differences between RULA and REBA assessments. Kulkarni and Devalkar [42] reported that the RULA rated five construction activities (granite cutting, brickwork, shuttering, plastering, and material transportation) as action level 3 or 4, whereas the REBA rated them as action level 4. Sain and Meena [43] indicated that the RULA rated 154 postures sampled from four clay brick kiln tasks (spading, mold filling, mold evacuating, and brick carrying) as action level 3 on average, whereas the REBA rated the postures for spading and mold filling as action level 3 and those for mold evacuating and brick carrying as action level 4.
Taken together, these findings show that the RULA generally rated postural loads at the highest risk levels, followed by the REBA and then the OWAS.

Agreement Rates between Methods
Some studies presented assessment results from two or three methods but did not explicitly report the agreement rates between them. For the studies by Cremasco et al. [38], Kulkarni and Devalkar [42], Gallo and Mazzetto [47], Garcia et al. [48], Noh and Roh [49], Sahu et al. [52], and Paini et al. [67], the author of this study calculated the agreement rates from the assessment results presented in the original articles; for Kee et al. [25], they were obtained from the experimental data. The agreement among the three methods varied across studies (Table 3). The rates between the RULA and REBA reported by Chiasson et al. [12], Pal and Dhara [36], Cremasco et al. [38], Jones and Kumar [46], Kulkarni and Devalkar [42], Sahu et al. [52], and Qureshi and Solomon [72] were at least 60%, whereas those in the other studies were lower (≤50.0%). The agreement between the OWAS and RULA in Garcia et al. [48] was null (k = 0): the OWAS rated the risk of upper limb MSDs as 'low' or 'medium,' while the RULA rated it as 'high' or 'extremely high.' The agreement rates between the OWAS and RULA and between the OWAS and REBA were ≤50% in all studies except two [24,25]. The mean agreement between the RULA and REBA was higher than those between the OWAS and RULA and between the OWAS and REBA.
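Agreement rates of this kind can be computed from paired classifications of the same postures. The sketch below assumes that "agreement rate" means the percentage of postures placed in the same level on the common four-level scale, and pairs it with unweighted Cohen's kappa, the chance-corrected statistic reported as k in several of the reviewed studies. The rating lists are hypothetical.

```python
import math
from collections import Counter


def percent_agreement(levels_a, levels_b):
    """Percentage of postures that two methods place in the same risk level."""
    if len(levels_a) != len(levels_b) or not levels_a:
        raise ValueError("need two non-empty rating lists of equal length")
    same = sum(a == b for a, b in zip(levels_a, levels_b))
    return 100.0 * same / len(levels_a)


def cohens_kappa(levels_a, levels_b):
    """Unweighted Cohen's kappa: observed agreement corrected for chance."""
    n = len(levels_a)
    p_observed = percent_agreement(levels_a, levels_b) / 100.0
    freq_a, freq_b = Counter(levels_a), Counter(levels_b)
    # Chance agreement: product of the two methods' marginal proportions per level.
    p_chance = sum(freq_a[level] * freq_b[level] for level in freq_a) / (n * n)
    if p_chance == 1.0:
        return math.nan  # undefined: both methods used one identical level throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical common-scale levels for five postures.
owas = [1, 2, 2, 3, 4]
reba = [2, 2, 2, 3, 4]
print(percent_agreement(owas, reba))       # 80.0
print(round(cohens_kappa(owas, reba), 3))  # 0.706

# Levels that never overlap give 0% agreement; with these marginals kappa is 0.
print(cohens_kappa([1, 2, 1, 2], [3, 4, 4, 3]))  # 0.0
```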

Inter-and Intra-Rater Reliability
Karhu et al. [10] and de Bruijn et al. [73] reported the intra- and inter-rater reliabilities of the OWAS, using 36,240 observations of 52 tasks in the original OWAS study and 45 slides showing nurses in different working postures, respectively. The intra-rater reliabilities reported by Karhu et al. [10] and de Bruijn et al. [73] ranged from 70 to 100% and from 83 to 100%, respectively (Table 5). In de Bruijn et al. [73], intra-rater reliability over a 4-week interval ranged from 88 to 97% (mean: 92%), and over a 3.5-month interval from 83 to 100% (mean: 90%), depending on the body part. Five studies, by Karhu et al. [10], de Bruijn et al. [73], Kivi and Mattila [74], Mattila et al. [75], and Lins et al. [76], reported inter-rater reliabilities for the OWAS, with agreement values ranging from 23% to 100% depending on the raters and body parts. Lins et al. [76] reported k values along with the agreement rates for each body part assessed, which corresponded to 'very good' agreement. They also found no significant differences between raters with and without prior training in physical therapy.
McAtamney and Corlett [8] reported in the original RULA study that scoring was highly consistent among 120 raters. Dockrell et al. [77] reported intra- and inter-rater reliabilities for six raters (three physiotherapists and three undergraduate physiotherapy students) based on 24 children's computing postures. The intraclass correlation coefficients (ICCs) for intra-rater reliability were 0.27-0.86 for the action levels and 0.47-0.84 for the grand scores, depending on the rater. The ICCs for inter-rater reliability were 0.54-0.72 for the action levels and 0.50-0.77 for the grand scores. In addition, intra-rater reliability was generally higher for older than for younger children, especially when the physiotherapists assessed the postures. Laeser et al. [78] found that inter-rater reliability for the keyboarding and mousing tasks of 58 sixth- and eighth-grade students, measured with Kendall's coefficient of concordance, was statistically significant (Kendall's W = 0.773), and that the median score of a team of independent observers for four subjects' videotaped postures was significantly correlated with the ratings of the lead investigator (the study's first author) (Pearson's r = 0.96, p < 0.05). Breen et al. [79] and Oates et al. [80] reported RULA inter-rater reliabilities of 94.6% and Ebel r = 0.73, respectively, based on observations of children using computers.
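Kendall's coefficient of concordance, as reported by Laeser et al. [78], measures how consistently several raters rank the same subjects, ranging from 0 (no concordance) to 1 (complete concordance). A minimal sketch of the standard formula follows, with a tie correction because integer RULA scores tie often; the scores are made up, and no significance test is included.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kendalls_w(scores_by_rater):
    """Kendall's W for m raters scoring the same n subjects, with tie correction."""
    m, n = len(scores_by_rater), len(scores_by_rater[0])
    rank_rows = [average_ranks(row) for row in scores_by_rater]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Each group of t tied scores given by one rater contributes t**3 - t.
    ties = sum(t ** 3 - t for row in scores_by_rater for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical RULA grand scores from three raters for five postures.
scores = [
    [3, 4, 6, 7, 5],
    [3, 5, 6, 7, 4],
    [2, 4, 6, 7, 5],
]
print(round(kendalls_w(scores), 3))  # 0.956
```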
Widyanti [84] examined the inter-rater reliabilities of the OWAS, RULA, and REBA using % agreement and k values among 50 new raters. The mean inter-rater reliabilities of the OWAS, RULA, and REBA were 57.07% (k = 0.39), 58.25% (k = 0.20), and 50.14% (k = 0.26), respectively, with no significant differences in % agreement or k values among the three methods. In summary, the inter- and intra-rater reliabilities of the OWAS were the highest, followed by those of the RULA and then the REBA.
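Multi-rater agreement such as that examined by Widyanti [84] is often summarized with Fleiss' kappa; the review does not state which kappa statistic that study used, so Fleiss' kappa here is an assumption. The sketch below returns both the mean pairwise percentage agreement and Fleiss' kappa for hypothetical action levels assigned by four raters to three postures.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Return (mean pairwise % agreement, Fleiss' kappa).

    ratings[i] lists the level each rater assigned to posture i; every posture
    must be rated by the same number of raters (at least two).
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("each posture needs the same number (>= 2) of ratings")
    n_items = len(ratings)
    level_totals = Counter()
    item_agreement = []
    for row in ratings:
        counts = Counter(row)
        level_totals.update(counts)
        # Share of ordered rater pairs that agree on this posture.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        item_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_bar = sum(item_agreement) / n_items
    p_chance = sum((c / (n_items * n_raters)) ** 2 for c in level_totals.values())
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when every rating is the same level")
    return 100.0 * p_bar, (p_bar - p_chance) / (1.0 - p_chance)


# Hypothetical action levels from four raters for three postures.
ratings = [
    [1, 1, 1, 1],
    [2, 2, 3, 3],
    [4, 4, 4, 3],
]
agreement, kappa = fleiss_kappa(ratings)
print(round(agreement, 2), round(kappa, 3))  # 61.11 0.472
```

The gap between the two numbers illustrates why a study can report a moderate % agreement alongside a much lower k: the kappa discounts the agreement expected by chance from how often each level is used.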

Validation of the Three Methods
Previous studies by Kayis and Kothiyal [85], Olendorf and Drury [86], and Hellig et al. [87] found that OWAS assessments were associated with subjective load criteria such as the Borg scale [88] and perceived exertion scores (Table 6). Kayis and Kothiyal [85] and Hellig et al. [87,89] indicated that OWAS action levels agreed or correlated with biomechanical measures, including L5/S1 compressive forces and the corresponding muscle activities. In contrast, van der Beek et al. [90] found that the rankings of distinct scaffolding tasks by the OWAS differed from those by the revised NIOSH (National Institute for Occupational Safety and Health) lifting equation [91], the lifting guidelines for the Dutch construction industry (Arbouw method) [92], and a rapid appraisal of the NIOSH lifting equation (practitioner's method).
McAtamney and Corlett [8] reported that the RULA neck and lower arm scores, and scores A and B, were significantly related to self-reported pain, ache, or discomfort in the neck and lower arm and in the corresponding functional unit regions, respectively (p < 0.01). Several studies showed that RULA action levels, grand scores, and body part scores were associated with electromyography signals (root mean square amplitude), discomfort, job attitude scores, and maximum holding time (MHT) [23,25,79,93]. Massaccesi et al. [31] indicated that RULA trunk and neck scores were significantly associated with self-reported pains, aches, or discomforts in the trunk and neck regions. In these two studies, self-reported pains, aches, or discomforts were measured using the Body Discomfort Chart by Corlett and Bishop [94]. Shuval and Donchin [33] and Yazdanirad et al. [27] reported that RULA assessment results or scores were associated with the prevalence of upper extremity symptoms evaluated using the Nordic Musculoskeletal Questionnaire [95].
Based on multivariate logistic regression models, Rathore et al. [32] showed that REBA scores A and B and the REBA score were significantly correlated with musculoskeletal symptoms in different body regions. Data on subjective MSD symptoms were collected using the Cornell Musculoskeletal Discomfort Questionnaire [96].
Domingo et al. [29] and Vahdatpour and Sayed-Marramazani [68] examined the association of subjective MSD symptoms with postural loads assessed by two of the three methods (the RULA and REBA, and the OWAS and RULA, respectively). The former found inconsistency between the subjective symptoms and the RULA and REBA assessments (Spearman rank-order correlation coefficients: 0.1 for the RULA and 0.46 for the REBA). The latter found, based on Spearman correlation coefficients, that although the OWAS action level had no significant association with the incidence of MSDs, the RULA risk level was directly related to the incidence of MSDs in the neck, upper back, and knees (p < 0.05).
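Spearman's rank-order correlation, used by Domingo et al. [29] and Vahdatpour and Sayed-Marramazani [68], is the Pearson correlation computed on ranks, with tied values receiving their average rank. The following minimal sketch uses hypothetical scores and computes only the coefficient, not its p-value.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 0-based positions, 1-based ranks
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: RULA grand scores and symptom-severity ratings for five workers.
rula_scores = [3, 5, 7, 4, 6]
symptom_scores = [1, 2, 4, 2, 3]
print(round(spearman_rho(rula_scores, symptom_scores), 3))  # 0.975
```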
Some studies have compared and validated the three techniques simultaneously. Choi et al. [21] and Kong et al. [26] assessed 196 working postures in agricultural tasks with the Agricultural Upper Limb Assessment (AULA), Agricultural Lower Limb Assessment (ALLA), OWAS, RULA, and REBA, and compared the results with subjective evaluations by 16 ergonomics experts on a 10-point scale. Based on quadratic weighted k values, the RULA showed 'moderate' and 'good' agreement with the expert evaluations in the two studies (0.599 and 0.627, respectively), whereas the OWAS and REBA showed 'moderate' agreement (OWAS: 0.538 [21] and 0.501 [26]; REBA: 0.578 [21] and 0.490 [26]).
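The quadratic weighted kappa used by Choi et al. [21] and Kong et al. [26] penalizes disagreements by the squared distance between ordered levels, so a one-level miss costs far less than a three-level miss. The sketch below assumes that the expert ratings have already been collapsed onto the same four levels as the method being tested; how the original studies mapped the 10-point scale is not described here, and the data are hypothetical.

```python
def quadratic_weighted_kappa(ratings_a, ratings_b, levels):
    """Cohen's kappa with quadratic disagreement weights over ordered `levels`."""
    k, n = len(levels), len(ratings_a)
    if k < 2 or n == 0 or len(ratings_b) != n:
        raise ValueError("need >= 2 levels and two equal-length, non-empty rating lists")
    index = {level: i for i, level in enumerate(levels)}
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    weighted_observed = weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * row_totals[i] * col_totals[j] / n
    if weighted_expected == 0:
        raise ValueError("kappa is undefined when all ratings fall in one level")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical: a method's action levels vs. expert ratings already collapsed
# onto the same four levels.
method = [1, 2, 3, 4, 2]
expert = [1, 2, 4, 4, 3]
print(round(quadratic_weighted_kappa(method, expert, levels=[1, 2, 3, 4]), 3))  # 0.844
```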
Kee [23,30] and Kee et al. [25] compared the three techniques based on perceived discomfort, MHT, % capable at the shoulder and trunk, and association with MSDs, using correlation analysis, the chi-squared test, and logistic regression analysis. The RULA grand score and REBA score were significantly correlated with discomfort, MHT, and % capable at the trunk [23,25], and the RULA grand score and action level and the REBA action level were significantly associated with MSDs [30]. The OWAS action category was significantly associated with % capable at the trunk, but with a lower correlation coefficient. However, the OWAS action category was not significantly correlated with discomfort or MHT [23,25], and neither the OWAS action category nor the REBA score was associated with MSDs [30]. None of the OWAS action category, RULA grand score, or REBA score was significantly correlated with % capable at the shoulder [23]. Kee [30] also showed that, in the logistic regression analyses, the percentage concordances for the RULA grand score and action level (52.4% and 44.8%, respectively) were significantly higher than that for the REBA action level (22.1%).
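Percentage concordance, as reported by logistic regression software such as SAS PROC LOGISTIC, is assumed here to be the share of all (MSD case, non-case) pairs in which the case has the higher predicted probability; pairs with equal predicted probabilities count as ties. A minimal sketch with hypothetical probabilities follows.

```python
def percent_concordance(predicted, outcomes):
    """Percent concordant, discordant, and tied over all (case, non-case) pairs."""
    cases = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_cases = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = discordant = tied = 0
    for p_case in cases:
        for p_non_case in non_cases:
            if p_case > p_non_case:
                concordant += 1
            elif p_case < p_non_case:
                discordant += 1
            else:
                tied += 1
    total = len(cases) * len(non_cases)
    return 100 * concordant / total, 100 * discordant / total, 100 * tied / total


# Hypothetical fitted probabilities (1 = diagnosed MSD case, 0 = non-case).
probabilities = [0.2, 0.2, 0.5, 0.5, 0.8]
has_msd = [0, 1, 0, 1, 1]
print([round(v, 1) for v in percent_concordance(probabilities, has_msd)])  # [50.0, 16.7, 33.3]
```

When the predictor is a single action level with only four values, many pairs share the same predicted probability and count as ties rather than concordant pairs, which caps percent concordant well below 100%.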
Widyanti [84] assessed the validity of the OWAS, RULA, and REBA based on the correlation between the ratings of 50 new raters and those of an ergonomics expert, which served as the gold standard. The study found no significant differences between the ratings of the new raters and the expert for the OWAS, RULA, or REBA (Wilcoxon signed-rank test, p > 0.05), and significant correlations between the new raters' and the expert's ratings for the OWAS (r = 0.802, p < 0.01), RULA (r = 0.799, p < 0.01), and REBA (r = 0.790, p < 0.01).

Discussion
This literature review systematically compared three observational techniques for assessing postural loads and/or their association with MSDs from various viewpoints, including general characteristics, application fields, risk levels, agreements and correlations between methods, intra- and inter-rater reliability, and validity. The findings indicate that, although it focuses on classifying upper limb postures rather than whole-body postures, the RULA may have some advantages over the other two methods in estimating postural loads and their association with MSDs, application fields, intra- and inter-rater reliability, and validity. This agrees with previous studies suggesting that the RULA gives more sensitive assessment results than the OWAS and REBA [58,69]. Consequently, the RULA is a more precautionary method for protecting the health of workers or operators [38].
Although the three methods have been applied in various fields to assess postural loads, they have been most frequently adopted in manufacturing, followed by health and social activities; agriculture, forestry, and fishing; information and communication; and transportation and storage [6,14,18]. Because it focuses on upper limb postures, the RULA has been widely used to assess workloads in administrative and support service activities and in education, which involve considerable sedentary work [18]. Following the methods' characteristics, Shah et al. [97] used the RULA to analyze sitting postures and the REBA to assess standing postures. However, the RULA has often been used to rate postural loads even in agriculture, forestry and fishing, mining and quarrying, construction, and shipbuilding, as well as in general hospitals, all of which frequently involve unstable or awkward lower limb postures such as squatting and kneeling [18,98].
Many studies have evaluated postural loads in industrial settings and experimental environments using two or three of the OWAS, RULA, and REBA. Of these, studies using the RULA and REBA were the most common (36 studies), followed by those using the OWAS and REBA (13 studies) and the RULA and OWAS (10 studies) (Table 2). The review revealed that the RULA rated postural loads for the same postures as more stressful than or equal to the OWAS in 10 of 10 studies (100.0%) and the REBA in 34 of 36 studies (94.4%), and that the REBA rated postural loads as more stressful than or equal to the OWAS in 12 of 13 studies (92.3%). In summary, 36 of 38 studies showed that the RULA rated postural loads in various fields as more stressful than or equal to the OWAS and REBA. Of the four studies in which the RULA rated musculoskeletal loads at the same risk level as the OWAS or REBA, two, by Balaji and Alphin [40] and Bhatia and Singla [41], found no statistically significant differences between the RULA and REBA. The study by Pal and Dhara [36] had a sample of only two postures, too small to support a general conclusion about the order of postural loads evaluated by the techniques. Mukhopadhyay et al. [39] found that the three techniques all rated nine activities performed in bicycle repair units as action level 4, possibly because the postural loads of these activities were high enough to reach or exceed action category/level 4. Overall, the RULA tends to rate postural loads as the most stressful, followed by the REBA and the OWAS.
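The percentages above follow from the per-pair outcomes listed in the Risk Levels section. A short tally reproduces them under the classification of borderline studies stated in the comments, which is an assumption about how those studies were counted.

```python
# Outcome counts per method pair, taken from the Risk Levels section.
# "no_sig_diff" counts studies reporting no significant difference; the study by
# Kee et al. [25] (OWAS higher, not significant) is counted as "lower", which
# reproduces the 12 of 13 figure.
pair_outcomes = {
    "RULA vs OWAS": {"higher": 9, "identical": 1, "no_sig_diff": 0, "lower": 0},
    "RULA vs REBA": {"higher": 30, "identical": 2, "no_sig_diff": 2, "lower": 2},
    "REBA vs OWAS": {"higher": 10, "identical": 1, "no_sig_diff": 1, "lower": 1},
}


def higher_or_same(counts):
    """Return (favourable, total, percent), where favourable excludes 'lower'."""
    total = sum(counts.values())
    favourable = total - counts["lower"]
    return favourable, total, 100 * favourable / total


for pair, counts in pair_outcomes.items():
    favourable, total, pct = higher_or_same(counts)
    print(f"{pair}: {favourable}/{total} = {pct:.1f}%")
# RULA vs OWAS: 10/10 = 100.0%
# RULA vs REBA: 34/36 = 94.4%
# REBA vs OWAS: 12/13 = 92.3%
```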
Agreement rates between the RULA and REBA were reported more often than those between the OWAS and RULA or between the OWAS and REBA, and they were higher on average, followed by the rates between the OWAS and REBA and then those between the OWAS and RULA (Table 3). This trend may be partly attributed to the ranking of risk levels among the three methods (RULA > REBA > OWAS). The agreements varied widely, from null (k = 0) to 100%, which indicates that the risk assessment results of the three methods do not agree or correlate well.
The correlation coefficients among the three techniques ranged from 0.415 to 0.785 across studies (Table 4), with no consistent trend. Most studies showed low correlation coefficients (<0.5), implying that the three methods yielded different assessment results. The low agreements and correlations may stem from differences among the methods in their ability to assess musculoskeletal loads and in their risk levels (action levels and categories), and from the different weights assigned to risk factors when calculating scores (scores A, B, C, and D and the grand score in the RULA; scores A, B, and C and the REBA score in the REBA) [99,100].
It is challenging to compare the intra- and inter-rater reliabilities of the three techniques directly (Table 5), because the studies used different measures, such as agreement rate (%), ICC, Kendall's W, Ebel r, and k values. The two ratings used to evaluate intra-rater reliability should be separated by a sufficiently long interval (2-3 weeks) to prevent recall bias [101]. However, Karhu et al. [10] tested intra-rater reliability in the morning and afternoon of the same day, which may have biased their reliability estimate. Based on this review (Table 5), the intra- and inter-rater reliabilities of the OWAS were the highest, followed by those of the RULA and the REBA, in that order. This may be attributed to the differing complexity of the three methods: the simplest and easiest-to-use method, the OWAS, had the highest reliabilities. This result agrees with that of Takala et al. [16]. Except for the studies by Hignett and McAtamney [9], McAtamney and Corlett [8], and Widyanti [84], the number of raters was very small (1-9 raters). Discrepancies among raters are known to occur frequently when a body segment lies at the border between two ranges of an evaluation factor, such as body posture, external load, or dynamic activity [8].
The validity of an observational technique can be evaluated by comparing it with a more valid method (concurrent validity, which assesses the correspondence of a method with more valid ones) or by estimating the association of its assessment results with MSDs (predictive validity, which assesses the association of a method's risk estimates with MSDs) [16]. The validation studies reviewed here are summarized in Table 6. Of the 21 studies, 12 (57.1%) assessed concurrent validity [21,23,25,26,79,84-87,89,90,93] and 10 assessed predictive validity [8,27-33,68,93]; the study by Fountain [93] addressed both. Concurrent validity was assessed by comparing the results of the three methods with subjective measures (Borg scale, exertion, discomfort, MHT, job attitude scores, and ergonomics experts' evaluations) and objective criteria (L5/S1 compressive forces, outputs of the revised NIOSH lifting equation, and muscle activity). The 10 predictive validity studies were based on the prevalence of MSD symptoms and association with MSDs. Nine of them obtained MSD symptom data using questionnaires [8,27-29,31-33,68,93], whereas Kee [30] used actual MSD cases diagnosed by physicians at automobile and automotive parts manufacturing and construction sites. The nine questionnaire-based studies each investigated the association of subjective MSD symptoms with assessment results from only one or two of the three techniques, which makes a comparison of all three methods impossible. Kee [30] examined the association of actual MSD cases with the risk levels evaluated by the three methods and concluded that the RULA could predict MSDs more accurately than the OWAS and REBA.
In Kee's study [30], the RULA action level was more significantly associated with MSDs than the REBA action level (p < 0.01 and p < 0.05, respectively, in the chi-squared test), and the RULA also showed a higher percentage concordance than the REBA in the logistic regression analysis (44.8% and 22.1%, respectively).
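The two statistics reported for Kee [30] can be illustrated with a minimal Python sketch. The contingency table and predicted probabilities below are hypothetical and are not taken from that study. "Percentage concordance" is assumed here to follow the definition commonly printed in logistic regression output (for example, "Percent Concordant" in SAS): the share of all (case, non-case) pairs in which the case received the higher predicted probability, with tied pairs counted in the denominator only. The source does not state which definition was used.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table of counts.

    Assumes every row total and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if action level and MSD status were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def percent_concordant(predicted, outcome):
    """Percent of (case, non-case) pairs in which the case has the higher predicted value.

    Tied pairs count in the denominator but not as concordant.
    """
    cases = [p for p, y in zip(predicted, outcome) if y == 1]
    non_cases = [p for p, y in zip(predicted, outcome) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = sum(1 for pc in cases for pn in non_cases if pc > pn)
    return 100.0 * concordant / (len(cases) * len(non_cases))


# Hypothetical counts: rows are action levels 1-4, columns are (MSD case, no MSD).
table = [[2, 18], [5, 15], [8, 12], [10, 10]]
statistic, dof = pearson_chi_squared(table)
print(f"chi-squared = {statistic:.2f} with {dof} degrees of freedom")

# Hypothetical predicted MSD probabilities from a fitted logistic model, with outcomes.
predicted = [0.1, 0.4, 0.4, 0.7, 0.2, 0.9]
outcome = [0, 0, 1, 1, 0, 1]
print(f"percent concordant = {percent_concordant(predicted, outcome):.1f}%")
```

The p-value for the chi-squared test would then be taken from the upper tail of the χ² distribution with the returned degrees of freedom (for instance, with `scipy.stats.chi2.sf`). Under the assumed concordance definition, a predictor with only a few distinct values, such as a four-level action level, produces many tied pairs, which lowers the percent concordant even when the association is in the expected direction.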
The review results should be interpreted with caution, because (1) some studies had extremely small sample sizes or numbers of raters (the sample sizes for Pal and Dhara [36], Cremasco et al. [38], Noh and Roh [49], Boulila et al. [60], Dev et al. [61], Paini et al. [67], and Kamath et al. [71] in Table 2 were fewer than 10), (2) the agreements and correlations showed large standard deviations (Tables 3 and 4), and (3) this study is mainly based on literature by various researchers with different experiences and fields of knowledge. These factors may have introduced bias into the results. Further research with statistically sufficient sample sizes is required to obtain more reliable assessment results and more reliable estimates of intra- and inter-rater reliability.
It should be noted that, although this study systematically compared three representative observational methods using categorized or itemized characteristics frequently found in relevant studies, it is difficult to compare observational techniques directly and to draw general conclusions, because each technique has its own strengths and limitations and the techniques differ from one another. These differences may be attributed to the characteristics of the methods, including the applicable fields, settings, and populations; the risk factors assessed (e.g., posture, repetition, intensity, force exerted, vibration, coupling, and temperature); the body parts covered (e.g., whole body, upper limb, and arm/hand); application procedures; observation strategies; decision rules; and evaluation outcomes (e.g., quantitative risk scores and qualitative job analyses). Further studies that integrate and interpret the itemized comparison results may be required to reach more general conclusions.

Conclusions
To overcome the shortcomings of existing studies, which dealt with only one or two of the three techniques or compared the methods from only a few viewpoints, this study systematically compared the OWAS, RULA, and REBA from several viewpoints related to observational methods, such as general characteristics, applied fields, risk levels, correlations and agreements between methods, intra- and inter-rater reliability, and validation. The results showed that (1) the RULA is the most frequently used of the three techniques in the US [11]; (2) many studies adopted the RULA, which has only two postural codes for the legs, to assess postural loads in tasks where unstable or awkward lower limb postures, such as squatting and kneeling, occurred frequently; (3) in most studies reviewed in this research, the RULA assessed postural loads at higher risk levels than the OWAS and REBA; (4) the intra- and inter-rater reliabilities for the RULA were moderate rather than low; and (5) the risk levels assessed by the RULA were more significantly associated with MSDs than those assessed by the OWAS and REBA.
Based on these findings, it may be argued that the RULA is better suited for assessing postural loads and their association with MSDs. This is supported by the view that, in the safety field, it may be more desirable to rate certain postures or tasks in industry as more stressful, rather than less stressful, in order to prevent WMSDs [23].
Although the RULA has the advantages stated above, it should be noted that it may have sensitive and insensitive zones depending on the evaluation factors and on Scores A and B [102]. In addition, the percentage concordance of the RULA in the logistic regression analysis was not high (44.8%). Further studies are needed to develop a new observational technique with fewer insensitive zones and a more significant association with MSDs.

Conflicts of Interest:
There are no conflicts of interest to declare.