The reliability and validity of the short version of the WHO Quality of Life Instrument in an Arab general population

BACKGROUND AND OBJECTIVES: There is rising interest in quality of life (QOL) research in Arab countries. The aim of this study was to assess, in a nationwide sample of Kuwaiti subjects, the reliability and validity of the World Health Organization Quality of Life instrument (WHOQOL-BREF), a shorter version of the widely used QOL assessment instrument that comprises 26 items in the domains of physical health, psychological health, social relationships, and the environment. METHODS: A one-in-three systematic random proportionate sample of consenting Kuwaiti nationals attending large cooperative stores and municipal government offices in the six governorates completed the Arabic translation of the questionnaire. The indices assessed included test-retest reliability, internal consistency, item internal consistency (IIC), item discriminant validity (IDV), known-groups validity and construct validity. RESULTS: There were 3303 participants (44.8% males, 55.2% females, mean age 35.4 years, range 16 to 87 years). The intra-class correlation coefficient for test-retest reliability and the Cronbach's alpha values for the full questionnaire and the domains met the ≥0.7 criterion. Of the 24 items that constitute the domains, 21 met the IIC requirement of correlation ≥0.4 with the corresponding domain, while 16 met the IDV criterion of having a higher correlation with their corresponding domain than with other domains. Domain scores discriminated significantly between well and sick groups. In the factor analysis, four strong factors emerged with the same construct as in the WHO report. CONCLUSION: The Arabic translation of the WHOQOL-BREF has impressive reliability and validity indices. The poor IDV findings are due to the multidimensional nature of the questionnaire. The highly significant validity indices should reassure researchers that the questionnaire represents the same constructs across cultures. Negatively worded items possibly need refinement.

Jude U. Ohaeri, Abdel W. Awadalla

Recent interest in quality of life (QOL) as an outcome measure in medicine has inspired studies in chronic medical 1-3 and general population subjects 4,5 in Arab and neighboring countries. To further this interest, it is important to use assessment instruments that are psychometrically sound and have cross-cultural validity, so as to make findings comparable across countries. In this regard, the short version of the World Health Organization's QOL Instrument, the WHOQOL-BREF, 6 is of interest for the following reasons: First, it was simultaneously developed in diverse cultures, thus overcoming the usual controversy over the problem of applying a questionnaire articulated in one culture in a different culture. 7 This means that the instrument has a strong potential for easy cross-cultural applicability, since the items are framed in culture-neutral terminology. Second, the items include widely valued contextual factors of life that are not generally regarded as health-related. 8 Therefore, it is a generic instrument that assesses health-related QOL (HRQOL), and social, environmental and subjective well-being issues.
The Arabic translation of the WHOQOL-BREF has been shown to have highly significant structural integrity characteristics. 9 The Persian translation has been shown to have significant values for some reliability and validity indices. 4 In this study, we sought to contribute to published data on the Arab population by sampling Kuwaiti subjects. Our specific objectives were to assess the following reliability and validity indices, in comparison with the WHO 23-country report: 6 (i) the test-retest reliability of the WHOQOL-BREF; (ii) the floor/ceiling effect and acceptability of the WHOQOL-BREF, as well as the internal consistency of the full questionnaire and its constituent four domains of QOL; (iii) the item internal consistency (IIC) and item discriminant validity (IDV); 10 (iv) the construct validity by exploratory factor analysis, and known-groups validity by analyzing the differences in domain scores between sick and well subjects. Based on published information, 4,6,9 we hypothesized that the reliability and validity indices in an Arab population would be consistent with these indices in other populations. Furthermore, all domains were expected to be strongly and positively correlated with the concept of overall QOL and health (i.e., general facet on health and QOL). 6

METHODS
Of the 3.4 million population of Kuwait, Kuwaiti nationals make up 1.1 million (48.9% male, 51.1% female) (2007 census). About 97% live in urban areas. The adult literacy rate in Arabic is 83.5% (2003 estimate) and the unemployment rate is 2.3% (2004 estimate by the Kuwait Public Authority for Civil Information). Administratively, the country is divided into six governorates, each with a centrally located large cooperative supermarket store, as well as municipal government and immigration offices. Our sampling framework was the six governorates, and participants were recruited at the above locations. Our sample size was guided by the recommendation of the International Quality of Life Assessment (IQOLA) project researchers that the sample size for general population norming should be 2500-3000. 10 We obtained census data for each governorate from the Public Authority for Civil Information. Using quota sampling, we generated the number of potential participants that would constitute a proportionate representation of governorates, by gender, for a sample size of at least 3000 subjects. Although Kuwait has the infrastructure for generating a sampling framework for a representative general population study (e.g. telephone listing, voting register), the strict conservative culture makes it impossible to carry out house-to-house surveys for private studies such as ours. To overcome this obstacle, researchers have attempted to target people who visit popular government administrative offices (e.g. municipal government and immigration offices) and the large cooperative shops. In a recent Kuwaiti study, 11 it was found that the large cooperative shops constituted convenient venues for interviewing women in traditional social roles who are not represented in the work force. In a study comparing the attitudes towards women in Kuwait and Qatar, Abdalla 12 overcame the difficulty of house-to-house surveys by interviewing respondents in their work places.
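The proportionate quota step described above can be illustrated with a minimal sketch. It assumes that each governorate-by-gender stratum receives a share of the target sample equal to its share of the census population; the function name, the largest-remainder rounding rule and the census counts are hypothetical illustrations, not the study's actual procedure or data.

```python
def proportionate_quotas(stratum_counts, total_sample):
    """Split total_sample across strata in proportion to their population counts.

    Largest-remainder rounding is used (an assumption of this sketch) so that
    the quotas sum exactly to total_sample.
    """
    population = sum(stratum_counts.values())
    exact = {s: total_sample * n / population for s, n in stratum_counts.items()}
    quotas = {s: int(share) for s, share in exact.items()}  # round down first
    leftover = total_sample - sum(quotas.values())
    # Give the remaining places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - quotas[s], reverse=True)
    for s in by_remainder[:leftover]:
        quotas[s] += 1
    return quotas


# HYPOTHETICAL census counts for two made-up governorates (not study data).
census = {
    ("Governorate A", "male"): 120_000,
    ("Governorate A", "female"): 130_000,
    ("Governorate B", "male"): 80_000,
    ("Governorate B", "female"): 70_000,
}
quotas = proportionate_quotas(census, 3000)
for stratum, quota in quotas.items():
    print(stratum, quota)
```

Recruitment in a stratum would then stop once its quota is filled, which corresponds to continuing the one-in-three approach until the governorate quota is reached.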
To obtain a fairly representative national sample, we used the above methodologies to sample prospective respondents in these popularly accessed facilities of each of the governorates. This sampling methodology is feasible in Kuwait because of the small population, the urban residence, and the favorable education and employment circumstances. Hence, it is reasonable to assume that a fairly representative proportion of Kuwaiti nationals patronize the above popular urban places, and that they are literate enough in Arabic to complete uncomplicated questionnaires on their own. 11
For the operational definition of QOL we accepted the WHO definition of QOL as individuals' perception of life in the context of the culture and value system in which they live and in relation to their goals, expectations, standards and concerns. 6 The measure focused on subjective QOL, as distinct from objective QOL. 13

The WHOQOL-BREF
The WHOQOL-BREF is a 26-item, self-administered, generic questionnaire that is a short version of the WHOQOL-100 scale. 6 The response options range from 1 (very dissatisfied/very poor) to 5 (very satisfied/very good). Assessments are made over the preceding two weeks. It consists of domains and facets (or sub-domains). The items on "overall rating of QOL" (OQOL) and subjective satisfaction with health constitute the general facet on OQOL and health. The more popular model for interpreting the scores has four domains, namely, physical health (seven items), psychological health (six items), social relations (three items) and environment (eight items). Our analysis was based on this model. The domain scores of the WHOQOL-BREF can be computed in three ways. The first is a summation of the raw scores of the constituent items. The second and third ways consist of transforming the raw scores. In the second way, the raw scores are transformed into scores that range from 4-20, to be in line with the WHOQOL-100 Instrument. The third way converts the 4-20 scores onto a 0-100% scale. 14
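The three scoring options can be sketched as follows. The text does not spell out the conversion formulas, so this sketch assumes the commonly cited ones (domain mean multiplied by 4 for the 4-20 score, then (score − 4) × 100/16 for the 0-100 scale); these should be checked against the official scoring manual. It also assumes that negatively worded items have already been reverse-scored and that all items in the domain were answered. The function name and example responses are hypothetical.

```python
def domain_scores(item_scores):
    """Return (raw_sum, score_4_20, score_0_100) for one domain.

    item_scores: responses (1-5) to the items of a single domain, with
    negatively worded items ALREADY reverse-scored (assumption of this sketch).
    """
    if not item_scores or any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("each item score must be between 1 and 5")
    raw_sum = sum(item_scores)                      # first way: raw summation
    score_4_20 = 4 * raw_sum / len(item_scores)     # second way: mean x 4
    score_0_100 = (score_4_20 - 4) * 100 / 16       # third way: rescale 4-20 to 0-100
    return raw_sum, score_4_20, score_0_100


# HYPOTHETICAL responses to the seven physical health items.
physical = [4, 3, 5, 4, 4, 3, 5]
print(domain_scores(physical))
```

Multiplying the mean by 4 rather than using the raw sum makes domains with different numbers of items comparable, which is why the transformed scores are usually preferred for reporting.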

Data collection
The study took place in 2006-2007. The questionnaire was translated into Arabic by the method of back-translation. Ethical approval for the study was obtained from Kuwait University and the Kuwait Foundation for the Advancement of Science (KFAS). We obtained permission from the authorities of each study location to interview Kuwaiti nationals attending their facility. All participants gave verbal informed consent. Staff of a professional social research company were responsible for circulating and retrieving the questionnaires. These research assistants (RAs) were all Kuwaiti nationals, aged 21 to 27 years, who had previously undergone a 2-month course on research methods and interview techniques at Kuwait University. Pairs of RAs (one male, one female) worked in their own governorates of residence, so that they were familiar with the cultural environment. As a rule, prospective female respondents were approached by a female RA. To recruit prospective respondents, the RAs were positioned at the main entrance of a location, introduced themselves as Kuwaitis conducting research, and politely asked the nationality of the prospective participant. This was to ensure that only Kuwaiti nationals were recruited. Thereafter, the objectives of the study were explained and the subject was informed that no penalty would result from declining to participate. We used systematic random sampling of one-in-three subjects (by gender). If a subject declined to participate, the next third person was approached. We continued in this way until we reached the population quota for the governorate. At the preliminary stage of the study, the RAs were trained in the use of the study's questionnaires over a one-week period. We did not compute inter-rater reliability because the questionnaires were all administered as self-report. Only subjects literate in Arabic were invited to participate. In all cases, the participants completed the questionnaire anonymously and privately. The RAs were nearby to offer assistance in clarifying the items. All subjects completed the Arabic translation of the questionnaire. Test-retest reliability was assessed by administering the questionnaire twice in a one-week period to 50 subjects who did not participate in the main study.

Data analysis
Data were analyzed by SPSS version 11 (SPSS Inc, Chicago, Illinois). QOL domain scores (range, 4-20 and 0% to 100%) were generated by organizing the items into the four domains as recommended. 6 Data for test-retest reliability were analyzed by intra-class correlation coefficient (ICC). The internal consistency for the full questionnaire and domains was assessed by Cronbach's alpha for the 3303 participants, with a recommended cut-off value of ≥0.7. The acceptability of the questionnaire was assessed by the missing values or proportion of respondents who failed to complete each item. A cut-off value of <2.5% is recommended for each item. 10 The proportion of respondents scoring at the lowest level (floor effect) and the highest level (ceiling effect) for each item was assessed. This is a measure of sensitivity to change and how far the item can be assumed to be capturing the full range of potential responses in the population. 10
Item internal consistency (IIC) and item discriminant validity (IDV), measured by the Pearson correlation, were assessed after controlling for item overlap in the corresponding scale. The IIC and IDV concern the relationship of each item to its hypothesized scale or domain. The IIC rule requires that the item should correlate r ≥0.4 with its adjusted scale score. For IDV, the item should have the highest correlation with its scale, in comparison with other scales in the questionnaire. 10
The construct validity of the WHOQOL-BREF was assessed by exploratory factor analysis, using principal components analysis with orthogonal rotation for factors that have eigenvalues above 1. 6 Known-groups validity was assessed by testing the significance of differences in domain scores between subjects who rated themselves as being well and those who rated themselves as being sick, using standardized effect size calculations. The relationship between the general facet and the domains was assessed by the Pearson correlation. Missing data were handled by excluding cases analysis by analysis. The level of statistical significance was set at 5%, and all tests were two-tailed.
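As an illustration of the internal consistency criterion, the following minimal sketch computes Cronbach's alpha from the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The study itself used SPSS; this function, its name and the toy responses are hypothetical and assume complete cases only.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale.

    rows: one list of k item scores per respondent, complete cases only
    (cases with missing items are assumed to have been excluded beforehand).
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# HYPOTHETICAL responses of five people to a three-item domain (not study data).
responses = [
    [4, 4, 5],
    [3, 3, 3],
    [5, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}; meets the 0.7 criterion: {alpha >= 0.7}")
```

Because alpha tends to increase with the number of items, short scales such as a three-item domain can fall below the 0.7 criterion even when the items are reasonably inter-correlated.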

RESULTS

Socio-demographic characteristics
Of the 3376 subjects who agreed to participate in the study, 73 questionnaires were voided because the subjects left more than 20% of the WHOQOL-BREF items unanswered, the exclusion threshold recommended by the WHOQOL Group. 6 Hence we report data for 3303 subjects. The 3303 participants consisted of 44.8% (n=1479) men and 55.2% (n=1824) women, aged 16 to 87 years with a mean (SD) of 35.4 (11.9) years, with a proportionate representation of the six governorates. As in the Kuwaiti national general population, women were in the majority and 2.4% of the subjects were 65 years of age or older. The subjects were predominantly educated (59.9% had at least college education), employed in skilled work (58.4%), and married (60.8%).

Descriptive statistics and internal consistency
The ICC for the test-retest statistic (0.95) was highly significant. Similarly, the internal consistency values for the full questionnaire and the domains met the 0.7 Cronbach's alpha value requirement (Table 1). With the exception of the item on sexual satisfaction (7.9%), the proportion of missing values ranged from 0.06% to <1% for 18 items, and from 1% to 1.8% for seven items. This shows that the contents of the questionnaire were generally acceptable to the respondents. Our rate of missing values for the item on sexual satisfaction is comparable to the 6.0% reported by the WHO, but higher than the 2.5% reported from Iran 4 where the questionnaire was interviewer-administered. The mean (SD) scores for the items ranged from 3.39 (1.2) to 3.89 (0.99) for 23 items; it was 4.0 (1.1) for two items (mobility and transport), and 2.96 (0.98) for one item (negative feelings). With regard to the floor and ceiling effects, the total frequency of lowest scores was 4.7% (range, 2.2% to 8.9%), while the total frequency of highest scores was 24.9% (range, 6.8% to 42.2%). The comparable figures for the WHO report were 4.0% (range, 1.7% to 8.8%) and 17.5% (range, 10.1% to 35.2%), respectively. 6
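The acceptability and floor/ceiling figures reported above could be tabulated per item along the following lines. This is a minimal sketch: the function name and toy responses are hypothetical, and it assumes that the missing-value rate is taken over all respondents while the floor and ceiling percentages are taken over those who answered the item (the text does not state the denominators).

```python
def item_profile(responses, lowest=1, highest=5):
    """Missing-value, floor and ceiling percentages for a single item.

    responses: one answer per respondent, with None marking a missing answer.
    Denominators are an assumption of this sketch (see lead-in).
    """
    n_total = len(responses)
    answered = [r for r in responses if r is not None]
    return {
        "missing_pct": 100 * (n_total - len(answered)) / n_total,
        "floor_pct": 100 * sum(r == lowest for r in answered) / len(answered),
        "ceiling_pct": 100 * sum(r == highest for r in answered) / len(answered),
    }


# HYPOTHETICAL answers of ten respondents to one item (not study data).
profile = item_profile([1, 5, 5, 3, None, 4, 2, 5, 3, 4])
print(profile)
print("acceptable (<2.5% missing):", profile["missing_pct"] < 2.5)
```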

Item internal consistency and item discriminant validity
Of the 24 items that constitute the domains, 21 met the IIC requirement of correlation ≥0.4 (Table 2). Only the three negatively worded items (on pain, need for medical treatment and negative feelings) did not meet this requirement. However, these three items were among the 16 that met the IDV criterion of having a higher correlation with their corresponding domain than with other domains. Two more items had equal correlation with at least one more domain other than their corresponding domain. In the WHO 23-country report, 6 it was noted that in 7 of 24 centers, the items on pain and the need for medical treatment were generally problematic in item-total correlations in the physical domain. In addition, poor item-total correlation (<0.3) was noted for negative feelings in one center.
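The two item-scaling rules can be sketched as follows: each item is correlated with its own domain score after the item itself has been removed (the correction for overlap), and with the unadjusted scores of the other domains. The study used SPSS; the function names, item labels and toy data here are hypothetical, and domain scores are taken as simple item sums for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_scaling_tests(data, domains, threshold=0.4):
    """IIC and IDV checks for every item.

    data: item name -> list of scores (one per respondent, complete cases).
    domains: domain name -> list of item names belonging to that domain.
    """
    n = len(next(iter(data.values())))
    results = {}
    for domain, items in domains.items():
        for item in items:
            # Own-domain score with the item removed (corrects for overlap).
            own = [sum(data[j][i] for j in items if j != item) for i in range(n)]
            r_own = pearson(data[item], own)
            r_others = [
                pearson(data[item], [sum(data[j][i] for j in other_items) for i in range(n)])
                for other, other_items in domains.items()
                if other != domain
            ]
            results[item] = {
                "r_own": r_own,
                "iic_ok": r_own >= threshold,                # IIC: r >= 0.4
                "idv_ok": all(r_own > r for r in r_others),  # IDV: highest with own scale
            }
    return results


# HYPOTHETICAL two-domain toy data for six respondents (not study data).
data = {
    "p1": [1, 2, 3, 4, 5, 3], "p2": [2, 2, 3, 5, 5, 3],
    "s1": [5, 1, 4, 2, 3, 3], "s2": [4, 2, 5, 1, 3, 3],
}
domains = {"physical": ["p1", "p2"], "social": ["s1", "s2"]}
results = item_scaling_tests(data, domains)
for item, outcome in results.items():
    print(item, round(outcome["r_own"], 2), outcome["iic_ok"], outcome["idv_ok"])
```

An item can pass one rule and fail the other: a correlation of 0.35 with its own domain fails IIC yet still passes IDV if all correlations with other domains are lower, which is the pattern reported for the three negatively worded items.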

Construct validity: factor analysis
In factor analysis with all the 26 items, four strong factors (i.e., each with >3 items) and one weak factor (with 3 items) emerged, accounting for 58.7% of the variance. Each item loaded highly (≥0.45) on the corresponding factor (Table 3). Interestingly, the four strong factors emerged exactly in the same sequence and with the same construct as in the WHO report. The three negatively worded items constituted the fifth factor. Of the 8 items in Factor I, five belong to the original physical health domain. It is noteworthy that the two general facet items (representing subjective well-being) were in this factor. Factor II is conceptually similar to the original psychological health domain because, of the five constituent items, three belong to psychological health. Factor III is conceptually similar to the original social relations domain because it contains the three items that define that domain. Factor IV appears to be a tighter definition of the environment domain because all seven items were derived from that domain.

DISCUSSION
A major limitation of the study is that the subjects were not a representative house-to-house sample of the general population. In particular, those who did not attend the study locations were not represented. This limitation is inherent in the socio-cultural circumstances of the country. 11,12 In view of this limitation, it was not practically possible to keep track of the number of refusals and their characteristics. However, it is reasonable to assume that our nationwide sample had adequate characteristics to estimate the psychometric properties of the WHOQOL-BREF. 10
In line with the hypothesis that the reliability and validity indices are similar to those in other populations, the Arabic translation of the WHOQOL-BREF generally met the statistical criteria for the reliability issues investigated. In particular, our findings were similar to those of the WHO 23-country report. 6 There are three notable discrepancies: the relatively high proportion of missing values for the item on sexual satisfaction, the failure of the three negatively worded items to meet the IIC requirement, and the fact that only 16 items met the IDV requirement. A relatively minor discrepancy is the fact that the alpha coefficient for the social relations domain (0.69) was just short of the 0.7 mark. These discrepancies are well known in the literature. The sexual item is usually a problem where respondents complete the questionnaire on their own, as in our study. In a Taiwanese study, the rate of missing values ranged from 0.9% to 7.7%. 15 In general population studies from Poland and Iran, only the social relations domain failed to meet the required value for internal consistency. 4,16 This problem is related to the fact that the domain has only three items.
We could not attribute the poor IIC performance of the three negatively worded items to translation problems because they met the IDV requirement and the problem was also evident in some centers in the WHO report. Furthermore, the items loaded very highly (>0.5) in the factor analysis. The issue of poor performance in IIC and IDV has been noted in other ways by studies that used the Rasch and item response theory models, 17,18 and is attributable to the multidimensional nature of the instrument. 18
Our most impressive findings were for construct and known-groups validity. Our first four factors are a fair representation of the original WHO data 6 and have been replicated by others. 19,20 In a Sudanese study, the following factors were replicated: social relations, environment, and the small factor containing the three negatively worded items. Furthermore, in confirmatory factor analysis, the four-domain model had the best fit indices. 9 Our results thus support the impression that the same perceptions of the WHOQOL-BREF are found across cultures and disease conditions, 19 a cardinal requirement for a cross-culturally useful questionnaire. Furthermore, the two general facet items loaded highly on Factor I. This indicates the importance of the general facet as a first deconstruct of the QOL construct, 21 and hence it is a fair representation of the idea of global QOL/subjective well-being. 22,23 In support of this point, the general facet was highly significantly correlated with the four domains. 6
In conclusion, the Arabic translation of the WHOQOL-BREF has high reliability and validity indices. It represents the same constructs across cultures. 19 Our validity data support the impression that the WHOQOL-BREF is a cross-culturally valid generic instrument that is composed of parts that address the main issues of the subjective QOL construct in medicine, namely HRQOL, contextual issues and subjective well-being. 8 A possible area needing further development is the phrasing of negatively worded items, perhaps by having them rated in the same positive direction as the other items.