J Gen Intern Med. Nov 2009; 24(11): 1211–1216.
Published online Sep 16, 2009. doi:  10.1007/s11606-009-1105-7
PMCID: PMC2771237

Developing Predictive Models of Health Literacy

Abstract

INTRODUCTION

Low health literacy (LHL) remains a formidable barrier to improving health care quality and outcomes. Given the lack of precision of single demographic characteristics to predict health literacy, and the administrative burden and inability of existing health literacy measures to estimate health literacy at a population level, LHL is largely unaddressed in public health and clinical practice. To help overcome these limitations, we developed two models to estimate health literacy.

METHODS

We analyzed data from the 2003 National Assessment of Adult Literacy (NAAL), using linear regression to predict mean health literacy scores and probit regression to predict the probability of an individual having ‘above basic’ proficiency. Predictors included gender, age, race/ethnicity, educational attainment, poverty status, marital status, language spoken in the home, metropolitan statistical area (MSA) and length of time in U.S.

RESULTS

All variables except MSA were statistically significant, with lower educational attainment being the strongest predictor. Our linear regression model and the probit model accounted for about 30% and 21% of the variance in health literacy scores, respectively, nearly twice as much as the variance accounted for by either education or poverty alone.

CONCLUSIONS

Multivariable models permit a more accurate estimation of health literacy than single predictors. Further, such models can be applied to readily available administrative or census data to produce estimates of average health literacy and identify communities that would benefit most from appropriate, targeted interventions in the clinical setting to address poor quality care and outcomes related to LHL.

KEY WORDS: health literacy, estimation, multivariable model, community

INTRODUCTION

Low health literacy (LHL) remains a formidable barrier to reducing gaps in health care quality and improving outcomes. Approximately one-third of the population (36%) is estimated to have basic or below basic health literacy,1 defined as the “degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions”.2 Individuals with LHL find it difficult to understand directions for taking medicine, to calculate a dose of an over-the-counter medication for a child, or to comprehend a consent form.3,4 LHL may also contribute to suboptimal care and outcomes through lower participation in screening programs,5 reduced ability to understand and act on the advice of a health professional,6 and limited ability to access and navigate the health care system.7,8

Although health literacy and general literacy may be linked, researchers contend that the complexity of the health care system, the medical jargon used by many providers,9 and the exposure to novel health concepts (often while under a great deal of stress) have the potential to negatively impact one’s health literacy skills, even among those with adequate literacy.10 Therefore, the prevalence of limited literacy is even higher when considered within a health context.11

Despite the availability of direct measures of health literacy, including the Rapid Estimate of Adult Literacy in Medicine (REALM), the Test of Functional Health Literacy in Adults (TOFHLA), and the Newest Vital Sign,12–15 the role of LHL in health outcomes remains largely unaddressed in public health and clinical practice. While these measures and other screening questions can be used by providers to identify individual patients at higher risk for LHL, administering such measures is logistically complex, and the measures themselves are limited largely to assessing reading ability and medical vocabulary and provide little, if any, information on other skills integral to health literacy.16,17 Such measures were also designed for individual rather than community-level assessment, and provide little information about the level of health literacy within one’s patient population or community overall. Thus, there is a need for a predictive model that can use currently available data to help medical and public health practitioners, researchers, and health centers identify whether LHL may be a significant problem in the community or population they serve. The development of such a model may help set the stage for the development of community-level measures, thus advancing action and facilitating efforts to target health literacy interventions in the practice setting and in areas of greatest need in the community at large.

Absent a predictive model, some providers use level of educational attainment or income as a proxy for health literacy, a practice that may lead to under- or over-estimation of the roles of each. While studies have identified individual characteristics associated with health literacy, such as lower educational attainment, older age, lower income, and minority race or ethnicity,12,13,15,18–23 it is not clear whether using a combination of predictive factors in the form of a multivariable model is significantly more accurate than relying on a single variable.

Only recently have studies attempted to examine how these social factors work together,11,21,24,25 and few, if any, of these models have included other constructs hypothesized to predict health literacy, such as marital status, rurality,26 language spoken in the home,1 and length of residence in the U.S. While the existing multivariable models demonstrate the utility, feasibility, and validity of such predictive models of health literacy, each has limitations. In Canada, for example, a multivariable model predicting health literacy included constructs such as daily reading at home and at work in addition to demographic characteristics.11 The amount of daily reading at home was the strongest predictor of health literacy in this model, yet such measures are not readily available in administrative or census data, reducing the model’s utility to generate community-level estimates. Recent U.S. models of health literacy have also been developed using data from elderly and/or Medicare populations. However, these models may have limited applicability to the general population, because the association of health literacy with demographic characteristics may vary with the age of the population of interest. Some have argued, for example, that among the elderly, income is not a strong predictor of health literacy, as many are no longer employed.27

Unfortunately, there have been no attempts to examine combinations of known predictors of LHL in a nationally representative sample of U.S. adults. Developing such a predictive model has significant potential to advance efforts to address poor quality care and outcomes related to health literacy by providing practitioners and public health officials with information on the average health literacy of the community they serve. Individuals and organizations serving communities with lower average health literacy may then target and implement a range of additional supports and strategies to increase individuals’ access to and understanding of health information.

As a first step toward overcoming the limitations of existing multivariable models, and to provide clinicians and health care providers with a means to estimate the health literacy of the community they serve, we developed two related models of health literacy that can be applied to widely available census data. Both used an identical set of demographic factors to predict health literacy as measured by the National Assessment of Adult Literacy (NAAL), a large, nationally-representative survey of adults in the United States. The first model estimates a mean health literacy score; the second estimates the probability of having health literacy skills in the “above basic” range (i.e., intermediate or proficient).1 We also examine whether such models are better predictors of community health literacy as compared to commonly used proxies such as education or income.

METHODS

Data

We used data from the 2003 NAAL household sample, an in-person assessment of literacy among a nationally-representative sample of 18,541 community-dwelling US adults (16 years and older) conducted by the U.S. Department of Education28 (response rate 60.1 percent29). The goal of the NAAL was to measure literacy by assessing the extent to which individuals could understand and use written materials encountered in everyday activities (e.g., reading a bus schedule or newspaper editorial). The NAAL included a component specific to health literacy, assessing the ability to effectively use health-related materials (e.g., medication label, written directions from doctor, consent form) to accomplish specific tasks, and was the first large-scale assessment of health literacy in the United States. Twenty-eight of the 152 NAAL items assessed health literacy, with each individual responding to about a third of the questions as part of the randomized block design of the NAAL survey.30 Our predictive models utilized data from the health literacy component of the NAAL specifically. More information about the NAAL, its multi-stage sampling design, and scoring procedures can be found in the 2006 National Center for Education Statistics report.1 Our analytic sample was limited to those who were 18 years of age or older, and were not missing items used to construct their health literacy score (n = 17,466).

Study Variables

Health Literacy

The NAAL assessed health literacy on a 0 to 500 point scale (mean = 245, standard deviation = 55).1 The National Research Council also classified these continuous scores into four ordinal performance categories to reflect an individual’s ability to successfully complete tasks of a given difficulty: below basic (0–184), basic (185–225), intermediate (226–309), and proficient (310–500).1 About 14% of the population had below basic health literacy skills, indicating the ability only to perform tasks such as circling the date on an appointment slip. About 22% of individuals had basic health literacy skills, indicating, for example, the ability to give two reasons why a person should be tested for a specific disease, using information in a clearly written pamphlet. About 53% of the population had intermediate health literacy skills; such individuals can perform moderately challenging health literacy activities, including determining at what time a person can take a prescription medication, using information on the prescription drug label. Only about 12% of the population had proficient health literacy, indicative of the skills necessary to perform more complex and challenging literacy activities, such as calculating one’s personal share of employer health costs using a table.1
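The category cut points above can be expressed as a small lookup. The following is a minimal sketch rather than NAAL scoring code; the function names are hypothetical, and treating a non-integer score that falls between two published ranges (for example 184.5) as belonging to the lower category is an assumption.

```python
def naal_category(score: float) -> str:
    """Map a 0-500 health literacy score to its performance category.

    Cut points follow the ranges quoted in the text: below basic (0-184),
    basic (185-225), intermediate (226-309), proficient (310-500).
    Assumption: fractional scores between ranges go to the lower category.
    """
    if not 0 <= score <= 500:
        raise ValueError("health literacy scores range from 0 to 500")
    if score < 185:
        return "below basic"
    if score < 226:
        return "basic"
    if score < 310:
        return "intermediate"
    return "proficient"


def is_above_basic(score: float) -> bool:
    """True when the score is in the intermediate or proficient range."""
    return naal_category(score) in ("intermediate", "proficient")
```

The second function corresponds to the dichotomous ‘above basic’ outcome used in the probit model.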

Socio-demographic Predictors

Variables included in the model had to meet two criteria. First, they had to be available in both the NAAL and census data, given our focus on developing a model to estimate and map community-level health literacy. Second, they had to be established or strongly hypothesized to be associated with health literacy. We included as predictors gender, age, race/ethnicity, education, poverty status, marital status, residence in a metropolitan statistical area (MSA), language other than English spoken in the home, and years residing in the United States (parameterized categorically as shown in Table 1). We used six categories of educational attainment; however, given that younger individuals (18–24) may not have completed their education at the time of the NAAL, we present estimates of the association between educational attainment and health literacy separately for individuals 18–24 years of age and 25 or older. Income was collected by the NAAL in 2003 dollars. We used this information to generate categories representing income relative to the federal poverty limit (FPL), as defined by the U.S. Census Bureau.31

Table 1
Sample Demographics (N = 17,466)

Analysis

We developed two predictive models of health literacy. The first is a linear regression model predicting the mean (or average) health literacy score using the continuous form of that outcome. The second is a probit model predicting the proportion of the population scoring above the basic level of health literacy (the complement of “basic or below basic” health literacy), coded so that positive coefficients correspond to better health literacy in both models. To assess the extent to which these multivariate models add predictive strength over a single predictor, we compared the model coefficients and r-squared of the multivariate linear model to bivariate models using only educational attainment or income as a predictor.
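To make the two model forms concrete, the sketch below shows how a fitted linear model yields a predicted score and how a fitted probit model yields a predicted probability through the standard normal cumulative distribution function. It is illustrative only: the function names, the predictor names, and every coefficient value are hypothetical placeholders, not estimates from Table 2.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_prediction(intercept: float, coefs: dict, x: dict) -> float:
    """Intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(coefs[name] * value for name, value in x.items())


def probit_probability(intercept: float, coefs: dict, x: dict) -> float:
    """P(above basic) = Phi(intercept + sum of coefficient * predictor)."""
    return standard_normal_cdf(linear_prediction(intercept, coefs, x))


if __name__ == "__main__":
    # Hypothetical 0/1 indicators for one person; not real NAAL variables.
    person = {"age_75_plus": 1, "below_poverty": 0}
    # Hypothetical coefficients, chosen only to show the mechanics.
    linear_coefs = {"age_75_plus": -30.0, "below_poverty": -10.0}
    probit_coefs = {"age_75_plus": -0.6, "below_poverty": -0.2}
    print(linear_prediction(250.0, linear_coefs, person))
    print(probit_probability(0.5, probit_coefs, person))
```

Because both outcomes are coded so that higher values mean better health literacy, a negative coefficient lowers the predicted score in the first model and the predicted probability in the second.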

Given that, by design, each individual responded to only about one-third of the NAAL items, we employed the standard imputation and analysis methods that correspond to such a design.32 For each individual, five estimated health literacy scores were generated based on their responses to the NAAL items; these estimates capture uncertainty based on different individuals being asked different items. The five estimates are integrated into a single set of means, regression coefficients, and p-values using a standard process of averaging the results of five parallel regression models. All analyses included probability weights and accounted for clustering. Analyses were conducted using SAS 9.1.
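The pooling step can be sketched as follows. The text states only that results from five parallel regressions are averaged; the variance formula below (within-imputation variance plus an inflated between-imputation variance) is the usual combining rule from the multiple imputation literature and is assumed here rather than taken from the authors’ SAS code. The function name is hypothetical.

```python
from statistics import mean, variance


def pool_plausible_values(estimates, squared_standard_errors):
    """Combine one coefficient estimated once per plausible value.

    estimates: the coefficient from each of the m parallel regressions.
    squared_standard_errors: its squared standard error in each regression.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(squared_standard_errors):
        raise ValueError("need matching inputs from at least two regressions")
    pooled = mean(estimates)                 # average of the m estimates
    within = mean(squared_standard_errors)   # average sampling variance
    between = variance(estimates)            # sample variance across the m fits
    total = within + (1 + 1 / m) * between   # assumed standard combining rule
    return pooled, total ** 0.5
```

The between-regression term is what carries the uncertainty from different individuals having been asked different items; ignoring it would understate the standard errors.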

RESULTS

Table 1 presents the weighted percentages of sample characteristics as well as the mean health literacy score (averaging the five parallel values) and percent scoring ‘above basic’. Forty-eight percent of the study population was male, 71% was White, 11% Black, and 12% Hispanic. One-seventh (14%) had less than a high school diploma and 27% reported an income below 200% of the federal poverty limit. The majority resided in an MSA, had never been married, were born in the U.S., and spoke English at home. Only 6% had lived in the U.S. less than 10 years. As expected, older individuals, minorities, those with less education, those with lower incomes, those who were divorced, widowed or separated, and those who had been living in the U.S. for fewer years had lower mean health literacy.

Predictive Models of Health Literacy

Table 2 presents the results of the multivariate models predicting health literacy as a continuous and dichotomous (‘above basic’) outcome. All variables, with the exception of living in an MSA and language spoken in the home, made significant independent contributions to the models. Linear regression results are presented as unstandardized regression coefficients; probit results are presented as marginal effects, which can be interpreted as the change in probability of having ‘above basic’ health literacy with a one-unit change in the predictor. The adjusted R² for the linear regression model and the probit model33 indicate that these demographic predictors accounted for 30% and 21% of the variance in health literacy scores, respectively. Educational attainment was strongly positively associated with health literacy, with the 40.7 point mean difference between the lowest and highest categories among those 25 and older corresponding to 0.74 standard deviations, a “large” effect.34 Blacks and Hispanics had health literacy scores that averaged 0.6 standard deviations lower than non-Hispanic whites; mean health literacy for those aged 75 and older was 0.7 standard deviations lower than those aged 18–24. To a lesser extent, health literacy was lower for those with lower incomes and recent immigrants.
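The conversions in this paragraph are simple to reproduce. The sketch below expresses a point difference in standard deviation units using the reported population standard deviation of 55, and shows one way a probit marginal effect for a 0/1 predictor can be computed at a given baseline. The function names are hypothetical, and evaluating the marginal effect at a single baseline profile (rather than averaging over the sample) is an assumption, since the text does not say which approach was used.

```python
import math

POPULATION_SD = 55.0  # standard deviation of NAAL health literacy scores (from the text)


def standardized_difference(point_difference: float, sd: float = POPULATION_SD) -> float:
    """Express a difference in score points in standard deviation units."""
    return point_difference / sd


def _normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def discrete_marginal_effect(baseline_index: float, coefficient: float) -> float:
    """Change in P(above basic) when a 0/1 predictor switches from 0 to 1.

    baseline_index is the probit linear index with the predictor set to 0.
    Assumption: the effect is evaluated at this one baseline profile.
    """
    return _normal_cdf(baseline_index + coefficient) - _normal_cdf(baseline_index)


if __name__ == "__main__":
    # 40.7 points between the lowest and highest education categories.
    print(round(standardized_difference(40.7), 2))
    # 0.6 standard deviations expressed back in score points.
    print(round(0.6 * POPULATION_SD, 1))
```

By the same arithmetic, the 0.6 and 0.7 standard deviation gaps reported for race/ethnicity and age correspond to roughly 33 and 38.5 score points.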

Table 2
Linear (column 2) and Probit (column 3) Multivariate Models Predicting Health Literacy (N = 17,466)

Comparison of Multivariate to Bivariate Models

The linear and probit multivariate models explained approximately twice as much variance as a model using educational attainment alone (30% vs. 15.5% for the linear model, 21% vs. 10% for the probit; results not shown). Similarly, a model using only income as a predictor explained only 11% of the variance for the linear model and 8% for the probit. Applying Cohen’s criteria for multivariate shared variance, the effect sizes for the individual variables can be classified as small to medium, while the effect size for the multivariable model can be classified as medium (for probit) to large (for linear model).35
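These classifications can be checked with a short calculation, assuming the criterion meant is Cohen’s f² = R² / (1 − R²) with conventional thresholds of 0.02 (small), 0.15 (medium), and 0.35 (large). The text does not name the index, so this choice is an assumption, and the function names are hypothetical.

```python
def cohens_f2(r_squared: float) -> float:
    """Cohen's f-squared effect size for a proportion of variance explained."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return r_squared / (1 - r_squared)


def cohen_label(f2: float) -> str:
    """Conventional labels: 0.02 small, 0.15 medium, 0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Variance explained as reported in the text.
    reported = {
        "multivariable, linear": 0.30,
        "multivariable, probit": 0.21,
        "education only, linear": 0.155,
        "education only, probit": 0.10,
        "income only, linear": 0.11,
        "income only, probit": 0.08,
    }
    for name, r2 in reported.items():
        f2 = cohens_f2(r2)
        print(f"{name}: f2 = {f2:.3f} ({cohen_label(f2)})")
```

Under this assumption the reported values reproduce the stated pattern: the single-predictor models fall in the small to medium range, the multivariable probit model is medium, and the multivariable linear model is large.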

DISCUSSION

Using a nationally representative sample, we developed two predictive models of health literacy: one estimating mean health literacy, and one estimating the probability of having health literacy skills in the ‘above basic’ (intermediate or proficient) range. Lower educational attainment, racial/ethnic minority status, older age, lower income, and recent immigration to the U.S. were associated with lower estimated health literacy. Individuals who were not married also had lower health literacy, on average, although the association was much weaker (p < 0.05).

While the results of these models are consistent with previous work in this area, several findings merit further comment. First, despite controlling for a host of related characteristics, race and ethnicity were strongly associated with health literacy. The strength of this association was somewhat surprising, although it may be explained in part by unmeasured factors such as quality of education, which are correlated with both race/ethnicity and health literacy. Schools serving a high proportion of minority students, for example, are less likely to offer advanced placement courses and to have effective teachers in terms of years of experience and number of teachers with certifications in their primary teaching field.36 Given that racial/ethnic minorities tend to cluster in both inner-cities and rural areas where the quality of education may be lower, this may help to explain the observed racial/ethnic differences in health literacy.

Somewhat surprising was the lack of association between language spoken in the home and health literacy. Results from our models suggest that recent immigration to the U.S., rather than language spoken at home per se, is a stronger predictor of health literacy. Note, however, that our models were based on NAAL data, which assesses health literacy in the English language. Therefore, one’s health literacy skills may be higher in their native language than estimated by our models.

Another unexpected finding was that no difference in health literacy was found between those living in rural and urban locations. Results, however, may be limited by the only available measure of rurality in the NAAL: a dichotomous measure of MSA. It is more likely that health literacy follows an inverse U-shaped curve, where health literacy is lower, on average, among individuals residing in rural or urban areas, with individuals in suburban areas having higher health literacy, on average. Finally, the multivariate model was a stronger predictor of health literacy and explained substantially more of the variance than commonly used health literacy proxies such as educational attainment or income.

Several limitations to these models are worth noting. First, the NAAL assessed health literacy using only printed materials. As a result, our models focus on the ability to read materials to accomplish health-related tasks. They do not predict oral language (speaking) or aural language (listening) skills, which have been cited as important components of health literacy.16,17 Consequently, predictions of health literacy based on our models will not fully capture a broader conceptualization of health literacy. Second, although characteristics in our model were selected based on existing research findings and theoretical justification, there are likely other unmeasured characteristics that contribute to health literacy that were not included in the model, such as quality of education and state of residence. To assess the potential for regional variation in the models, we conducted stratified analyses for the four US Census regions (Northeast, Midwest, South, and West; results not shown). Models predicting the mean health literacy score for each region explained between 27% and 31% of the variance in health literacy scores. While there were minor regional differences in the models, none of these were statistically significant (F test = 0.48, p-value = 0.99 for linear models; F test = 0.28, p-value = 1.0 for probit models).

These predictive models of health literacy expand our understanding of factors that contribute to low health literacy in the general population, and allow us to estimate the average health literacy of communities. With such estimates, individuals and organizations serving communities with lower average health literacy may target and implement a range of additional supports and strategies to increase individuals’ access to and understanding of health information. This includes, for example, offering in-depth patient counseling with nursing staff or health educators, where important information related to diagnosis, treatment and follow-up can be discussed using plain language and in a less intimidating environment. A significant advantage of such models is that, when applied to census data and well-defined geographic areas such as census tracts, the average health literacy of a region can be mapped. Such maps provide visual insight into local areas or “hot spots” of lower average health literacy within the community, further helping to promote effective and appropriately targeted action to reduce disparities, poor quality care and poor outcomes related to limited health literacy.
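As a rough illustration of how the linear model could be applied to census data, the sketch below estimates an area’s mean score from the share of residents in each demographic category and then ranks areas from lowest to highest. Because the model is linear, the mean of the individual predictions equals the prediction at the mean predictor values, so marginal proportions suffice; this shortcut does not carry over to the probit model. All names, coefficients, and proportions are hypothetical placeholders rather than values from Table 2 or from any census file.

```python
def community_mean_score(intercept: float, coefs: dict, proportions: dict) -> float:
    """Estimated mean score for an area from category proportions.

    proportions maps each 0/1 predictor to the share of residents with
    that characteristic (reference categories are omitted).
    """
    return intercept + sum(coefs[name] * share for name, share in proportions.items())


def lowest_literacy_areas(area_means: dict, n: int = 3) -> list:
    """Return the n area identifiers with the lowest estimated means."""
    return sorted(area_means, key=area_means.get)[:n]


if __name__ == "__main__":
    # Hypothetical coefficients and tract-level shares, for illustration only.
    coefs = {"less_than_high_school": -40.0, "below_poverty": -20.0}
    tracts = {
        "tract_a": {"less_than_high_school": 0.10, "below_poverty": 0.15},
        "tract_b": {"less_than_high_school": 0.30, "below_poverty": 0.40},
    }
    means = {t: community_mean_score(250.0, coefs, p) for t, p in tracts.items()}
    print(lowest_literacy_areas(means, n=1))
```

The ranked list is the kind of output that could feed a map of lower-literacy “hot spots” at the census tract level.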

Acknowledgements

This research was supported by the Missouri Foundation for Health.

Conflict of Interest: None disclosed.

References

1. National Center for Education Statistics. The Health Literacy of America’s Adults: Results from the 2003 National Assessment of Adult Literacy. Washington, DC: U.S. Department of Education; 2006.
2. Ratzan S, Parker R. Introduction. In: Selden C, Zorn M, Ratzan S, Parker R, eds. National Library of Medicine Current Bibliographies in Medicine: Health Literacy. Bethesda, MD: National Institutes of Health, U.S. Department of Health and Human Services, 2000.
3. Kirsch I. The International Adult Literacy Survey (IALS): Understanding what was measured. Princeton, NJ: Statistics and Research Division of the Educational Testing Service; 2001:61.
4. Kirsch I, Jungeblut A, Jenkins L, Kolstad A. Adult literacy in America: A First Look at the Results of the National Adult Literacy Survey. Washington, DC: National Center for Education Statistics, US Department of Education; 1993.
5. Dolan N, Ferreira M, Davis TC, Fitzgibbon M, Rademaker A, Liu D, et al. Colorectal cancer screening knowledge, attitudes, and beliefs among veterans: Does Literacy make a difference? J Clin Oncol. 2004;22(13):2617–22. [PubMed]
6. Chew L, Bradley K, Flum D, Cornia P, Koepsell TD. The impact of low health literacy on surgical practice. Am J Surg. 2004;188(3):250–3. [PubMed]
7. Davis TC, Wolf MS, Bass PF, Middlebrooks M, Kennen E, Baker DW, et al. Low literacy impairs comprehension of prescription drug warning labels. J Gen Intern Med. 2006;21(8):847–51. [PMC free article] [PubMed]
8. Kripalani S, Henderson L, Chiu E, Robertson R, Kolm P, Jacobson T. Predictors of medication self-management skill in a low-literacy population. J Gen Intern Med. 2006;21(8):852–6. [PMC free article] [PubMed]
9. Castro CM, Wilson C, Wang F, Schillinger D. Babel Babble: Physicians’ use of unclarified medical jargon with patients. Am J Health Behav. 2007;31(Suppl 1):585–95. [PubMed]
10. Weiss BD. Epidemiology of Low Health Literacy. In: Schwartzberg J, VanGeest J, Wang C, eds. Understanding Health Literacy: Implications for Medicine and Public Health. USA: American Medical Association; 2005:17–40.
11. Health Literacy in Canada: A Healthy Understanding 2008. Ottawa: Canadian Council on Learning, 2008:38.
12. Parker R, Baker D, Williams M, Nurss J. The Test of Functional Health Literacy in Adults: A new instrument for measuring patients’ literacy skills. J Gen Intern Med. 1995;10:537–41. [PubMed]
13. Davis TC, Crouch M, Long S, Jackson R, Bates P, George R, et al. Rapid assessment of literacy levels of adult primary care patients. Fam Med. 1991;23:433–5. [PubMed]
14. Osborn CY, Weiss BD, Davis TC, Skripkauskas S, Rodrigue C, Bass PF, et al. Measuring adult literacy in health care: performance of the newest vital sign. Am J Health Behav. 2007;31(1):S36–46. [PubMed]
15. Davis TC, Long S, Jackson R, Mayeaux E, George R, Murphy P, et al. Rapid estimate of adult literacy in medicine; a shortened screening instrument. Fam Med. 1993;25(6):391–5. [PubMed]
16. Berkman N, DeWalt DA, Pignone MP, Sheridan S, Lohr K, Lux L, et al. Literacy and Health Outcomes. Evidence Report/Technology Assessment No. 87. Rockville, MD: Agency for Healthcare Research and Quality; 2004.
17. Institute of Medicine CoHL. Health Literacy: A Prescription to End Confusion. Washington, DC: National Academies Press; 2004.
18. Osborn CY, Paasche-Orlow MK, Davis TC, Wolf MS. Health literacy: an overlooked factor in understanding HIV health disparities. Am J Prev Med. 2007;33(5):374–8. [PubMed]
19. DeWalt DA, Pignone MP. Health literacy and health outcomes: Overview of the Literature. In: Schwartzberg J, VanGeest J, Wang C, eds. Understanding Health Literacy. USA: American Medical Association; 2005.
20. Gazmararian JA, Baker DW, Williams MV, Parker R, Scott T, Green D, et al. Health literacy among Medicare enrollees in a managed care organization. JAMA. 1999;281(6):545–51. [PubMed]
21. Hanchate A, Ash A, Gazmararian J, Wolf M, Paasche-Orlow M. The Demographic Assessment for Health Literacy (DAHL): A New Tool for Estimating Associations between Health Literacy and Outcomes in National Surveys. J Gen Intern Med. 2008;23:6. [PMC free article] [PubMed]
22. Paasche-Orlow MK, Parker R, Gazmararian JA, Nielsen-Bohlman L, Rudd R. The prevalence of limited health literacy. J Gen Intern Med. 2005;20(2):175–84. [PMC free article] [PubMed]
23. Artinian N, Lange M, Templin T, Stallwood L, Hermann C. Functional Health Literacy in An Urban Primary Care Clinic. Internet J Adv Nurs Pract. 2003;5(2).
24. Miller M, Degenholtz H, Gazmararian J, Lin C, Ricci E, Sereika S. Identifying elderly at greatest risk of inadequate health literacy: a predictive model for population-health decision makers. Res Social Adm Pharm. 2007;3(1):70–85. [PubMed]
25. Paasche-Orlow MK, Hanchate A. Validation of a Derived Indicator of Health Literacy. Society of General Internal Medicine. Toronto, Ontario: Journal of General Internal Medicine, 2007.
26. A survey of rural women’s health literacy and sources of health information. American Public Health Association 134th Annual Meeting; 2006; Boston, MA.
27. Morales L, Rogowski J, Freedman V, Wickstrom S, Adams J, Escarce J. Sociodemographic differences in use of preventive services by women enrolled in Medicare+Choice plans. Preventive Medicine. 2004;39:738–45. [PubMed]
28. National Center for Education Statistics. National Assessment of Adult Literacy (NAAL), 2003.
29. Kutner M, Greenberg E, Jin Y, Paulsen C. The Health Literacy of America’s Adults: Results from the 2003 National Assessment of Adult Literacy. Washington, DC: National Center for Education Statistics; 2006.
30. White S, Dillow S. Key Concepts and Features of the 2003 National Assessment of Adult Literacy. Washington, DC: U.S. Department of Education, Institute of Education Sciences; 2005.
31. U.S. Census Bureau HaHESD. How the Census Bureau Measures Poverty, 2008.
32. Rubin D. Multiple imputation for nonresponse in surveys. New York: John Wiley & Sons; 1987.
33. Allen N, Carlson J, Zelenak C. The NAEP 1996 Technical Report. Washington: U.S. Department of Education, Office of Educational Research and Improvement; 1999.
34. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Erlbaum; 1988.
35. Cohen J. A Power Primer. Psychol Bull. 1992;112(1):155–9. [PubMed]
36. Rueben K, Murray S. Racial Disparities in Education Finance: Going Beyond Equal Revenues. Tax Policy Center Conference: "Race, Ethnicity, Poverty and the Tax-Transfer System on Racial Inequality and the Tax System". Washington, DC: Urban Institute; 2008.

Articles from Journal of General Internal Medicine are provided here courtesy of Society of General Internal Medicine