Health Serv Res. Feb 2005; 40(1): 227–252.
PMCID: PMC1361135

Testing for Statistical Discrimination in Health Care

Abstract

Objective

To examine the extent to which doctors' rational reactions to clinical uncertainty (“statistical discrimination”) can explain racial differences in the diagnosis of depression, hypertension, and diabetes.

Data Sources

Main data are from the Medical Outcomes Study (MOS), a 1986 study conducted by RAND Corporation in three U.S. cities. The study compares the processes and outcomes of care for patients in different health care systems. Complementary data from the National Health and Nutrition Examination Survey III (NHANES III) and the National Comorbidity Survey (NCS) are also used.

Study Design

Across three systems of care (staff health maintenance organizations, multispecialty groups, and solo practices), the MOS selected 523 health care clinicians. A representative cross-section (21,480) of patients was then chosen from a pool of adults who visited any of these providers during a 9-day period.

Data Collection

We analyzed a subsample of the MOS data consisting of patients of white family physicians or internists (11,664 patients). We obtained variables reflecting patients' health conditions and severity, demographics, socioeconomic status, and insurance from the patients' screener interview (administered by MOS staff prior to the patient's encounter with the clinician). We used the reports made by the clinician after the visit to construct indicators of doctors' diagnoses. We obtained prevalence rates from NHANES III and NCS.

Findings

We find evidence consistent with statistical discrimination for diagnoses of hypertension, diabetes, and depression. In particular, we find that if clinicians act like Bayesians, plausible priors held by the physician about the prevalence of the disease across racial groups could account for racial differences in the diagnosis of hypertension and diabetes. In the case of depression, we find evidence that race affects decisions through differences in communication patterns between doctors and white and minority patients.

Conclusions

To contend effectively with inequities in health care, it is necessary to understand the mechanisms behind the problem. Discrimination stemming from prejudice is of a very different character than discrimination stemming from the application of rules of conditional probability as a response to clinical uncertainty. While in the former case, doctors are not acting in the best interests of their patients, in the latter, they are doing the best they can, given the information available. If miscommunication is the culprit, then efforts should be aimed at reducing disparities in the ways in which doctors communicate with patients.

Keywords: Health-care disparities, race/ethnicity, statistical discrimination, clinical decision making, clinical uncertainty

In the U.S. context, a “disparity” refers to the unfair treatment of patients on the basis of race or ethnicity. In its recent report, “Unequal Treatment,” the Institute of Medicine (2002) defines a racial disparity as a difference in treatment provided to members of different racial (or ethnic) groups not justified by the underlying health conditions or preferences about treatment of the patient. Disparities by this definition have many sources. They can stem from an array of social factors, such as the patients' socioeconomic status, insurance, or geography, with the key one probably being that minorities1 are much more likely than whites to be uninsured or to be in plans with restrictive payment policies (Monheit and Vistnes 2000; Phillips, Mayer, and Aday 2000). Social factors do not, however, fully account for all of the unjustified differences between whites and minorities. Disparities also emerge from the face-to-face decisions of doctors when insurance and other social factors can be ruled out: such differences have been referred to as discrimination.

An increasing body of literature has documented and condemned disparities originating at the clinical encounter (Bach et al. 1999; Schulman et al. 1999; Mayberry, Mili, and Ofili 2000; Geiger 2001). Also, conceptual research has identified potential sources of health disparities within the clinical encounter (Einbinder and Schulman 2000; Balsa and McGuire 2001; Balsa and McGuire 2003; Bloche 2001; van Ryn 2001). However, we are aware of no paper trying to measure empirically the magnitude and importance of these different mechanisms. In this paper, we intend to shed light on some of the processes behind the observed discriminatory patterns in the provision of health care to patients of different racial and ethnic groups.

The Institute of Medicine (2002) identifies three mechanisms that may explain discrimination at the medical encounter2: simple prejudice against members of a minority group, stereotypes that a doctor holds about the health-related behavior of minorities (such as, “blacks do not comply with treatment recommendations”), or the rational application of probabilistic decision rules when uncertainty surrounds the doctor's estimate of a patient's health status. In the same way the term is applied in labor economics and related fields, we refer to this latter form of discrimination as “statistical.” The basic idea of statistical discrimination in the health context is that uncertainty about the patient's severity of illness can induce the doctor to behave differently with otherwise identical members of different race/ethnic groups. If the underlying prevalence of the illness is associated with race, the doctor might take race into account in deciding about the diagnosis and treatment of a particular patient. Or, race/ethnicity might be associated with poor doctor–patient communication (Balsa and McGuire 2001), interfering with the doctor's ability to discern and respond appropriately to the patient's health status (Cooper and Roter 2001). Our objective in this paper is to test whether statistical discrimination can explain a race effect in clinical decisions and the extent to which this mechanism can account for the observed racial differentials in health care data. Our tests of statistical discrimination revolve around whether a race effect can be interpreted as working through the route of information in its influence on the physician's decision to diagnose a certain condition.

Methodologically, our paper integrates the normative literature on clinical decisionmaking (Weinstein et al. 1980) with the economic literature on statistical discrimination. Decision-theoretic approaches to diagnosis are common in the medical literature (e.g., Mushlin et al. 1997; Fendrick et al. 1995), although the connection with discrimination and disparities appears not to have been made previously. In economics, attempts to distinguish statistical discrimination from prejudice (or “taste discrimination”) have been made in the area of wage differentials (Altonji and Pierret 2001) and differences in vehicle detention rates (Knowles, Persico, and Todd 2001). Our paper is in the same spirit as these investigations.

Disparities and discrimination are complex, with multiple sources and operating through multiple mechanisms. In this paper, we study the diagnostic phase of a clinical encounter, chosen by us to be the domain of decision making where information-related forces on the clinician are most likely to play out. Most of the literature on disparities is concerned, by contrast, with decisions about treatment. The role of statistical discrimination in relation to other possible causes of discrimination may well differ in other parts of the decision-making sequence.

STATISTICAL DISCRIMINATION AS A MECHANISM BEHIND DISPARITIES

Uncertainty, Bayes' Rule, and Differential Predictions by Race

The basic idea behind statistical discrimination is that doctors, unencumbered by prejudice or stereotypic beliefs, and in the presence of uncertainty about patients' underlying condition, may use race in making a diagnosis of a patient. As doctors are instructed, “The starting point of any diagnostic process is the patient presenting with a constellation of symptoms and signs.” (User's Guide to the Medical Literature [UG] website, American Medical Association). The immediate symptoms must be put in context of the clinical information that the doctor brings to the encounter. A doctor hears a symptom report from the patient and weighs this along with her prior belief about the likelihood that the patient has the condition given what else the doctor knows. This process is formalized by Bayes' rule, whereby the doctor's posterior assessment of the probability that the patient has the disease, given an observed symptom, equals:

$$\Pr(\text{disease}\mid\text{symptom}) = \frac{\Pr(\text{symptom}\mid\text{disease}) \times \Pr(\text{disease})}{\Pr(\text{symptom})} \qquad (1)$$

After observing a symptom, a doctor can be regarded as revising a prior estimate and coming to a decision about diagnosis and treatment, even if she does not engage in an explicit conditional probabilistic analysis. The doctor may have a threshold probability above which she decides that a diagnosis is warranted.
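As a sketch of how equation (1) can be read, the denominator may be expanded by the law of total probability, which separates the role of the prior, Pr(disease), from the properties of the symptom as a test. The threshold symbol τ in the second line is notation introduced here for illustration only; it does not appear in the original text.

$$\Pr(\text{symptom}) = \Pr(\text{symptom}\mid\text{disease})\,\Pr(\text{disease}) + \Pr(\text{symptom}\mid\text{no disease})\,\bigl[1 - \Pr(\text{disease})\bigr]$$

$$\text{diagnose if}\quad \Pr(\text{disease}\mid\text{symptom}) \ge \tau$$

Under this reading, two patients who report the same symptom can fall on opposite sides of τ if either the prior or the conditional probabilities of the symptom differ between their groups.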

Within this model, race can matter in two ways. One form of statistical discrimination results when physicians use race as an indicator of the underlying likelihood that a patient has a condition. For instance, the doctor may believe that the prevalence of the disease (Pr[disease] in (1)) differs by racial/ethnic group. A second form arises when a physician reads diagnostic signals with more noise from members of minority groups (different Pr[symptom|disease] for patients of different ethnic/racial groups in (1)). If doctors fail to understand certain patients, because of differing language, culture or communication patterns, they may be more likely to fail to detect symptoms when they are present or vice versa. We refer to these forms of statistical discrimination as a prevalence hypothesis and a miscommunication hypothesis about the role of race.3 The rest of this section illustrates the concept of statistical discrimination with an example.

An Example of a Symptom–Disease Relationship That Differs by Racial Groups

Table 1 shows the relationship between a symptom report and the presence of major depression for African-American and white respondents according to data from the National Health and Nutrition Examination Survey III (NHANES III), a national survey designed by the National Center for Health Statistics to estimate the prevalence of selected diseases and risk factors.4 We use this information to illustrate how clinical uncertainty may lead to clinical discrimination. The symptom question is “Have you ever had two weeks or more during which you felt sad, blue, depressed, or when you lost all interest and pleasure in things that you usually cared about or enjoyed?”5 The 2 × 2 matrix for each group shows the percent of respondents who fell into the four possible combinations of reporting the symptom yes/no and having a DSM-III diagnosis of depression or not. The matrices are obviously different, and some of these differences are summarized in the lower part of Table 1. The prevalence of major depression is greater for whites, 10.4 percent compared with 6.2 percent for African Americans. The properties of the symptom test differ between the groups as well. The test is more sensitive for African Americans, meaning the probability of reporting the symptom given the disease is present is greater for African Americans (6.0/6.2=96.6 percent versus 9.7/10.4=93.2 percent for whites), whereas the test is more specific for whites, meaning the probability of not reporting the symptom given the disease is absent is higher for whites (59.8/89.6=66.8 percent versus 57.6/93.8=61.4 percent).

Table 1
Symptom–Disease Relationship in Depression (NHANES III)

From the standpoint of the doctor, when hearing a patient endorse the symptom, “yes there has been a two-week period… .,” the positive predictive value (given by equation [1] when symptom is positive) is the key number. When the doctor hears the report from an African American, the right inference is to say that there is a 14.1 percent chance (5.96/42.15) that this patient has major depression, whereas for a white patient who says the same thing, the right inference is that there is a 24.6 percent chance (9.73/39.47), nearly double the African-American number.

Decisions about collecting more information or about treatment depend on the likelihood that a doctor thinks the patient has a condition. The doctor could well recommend treatment for a patient for whom the chances of having the condition are one in four, but might not make the recommendation for a patient with a 14 percent chance, roughly the magnitude of the racial differences here on the basis of the answer to one symptom question. This would be statistical discrimination. Assuming all else about these patients is the same, and even with exactly the same test result, the doctor recommends treatment for the white and no treatment for the African-American patient.

What explains the better predictive power of these symptoms for whites than for African Americans? There are two factors at work in the example in Table 1: the higher prevalence rates for whites, and the more reliable signal contained in the symptom. Whites start off being more likely than African Americans to be depressed in a ratio of 10.4/6.2 even prior to the doctor hearing the symptom. Both percentages go up after a positive report, to 24.6/14.1. Whites go up a bit more because more information is conveyed for this group.6 We can illustrate these two effects by simple simulations. If African Americans are given the same sensitivity and specificity as whites, but keeping their underlying prevalence lower, their positive predictive value would go to 15.6 percent, erasing about one-tenth of the difference between them and whites. If they were to keep their own sensitivity/specificity but have the higher prevalence of whites, the positive predictive value would move to 22.6 percent, erasing about 80 percent of the difference. Both factors contribute substantially to the worse predictive value of the test for African Americans.
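The two simulations described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the rounded sensitivity, specificity, and prevalence figures quoted in the text rather than the underlying Table 1 cells, so its results can differ from the reported percentages by a few tenths of a percentage point. The function and variable names are ours.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule (equation 1) for a patient who reports the symptom."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Rounded rates quoted in the text for Table 1 (NHANES III).
AFRICAN_AMERICAN = {"sensitivity": 0.966, "specificity": 0.614, "prevalence": 0.062}
WHITE = {"sensitivity": 0.932, "specificity": 0.668, "prevalence": 0.104}

ppv_african_american = positive_predictive_value(**AFRICAN_AMERICAN)
ppv_white = positive_predictive_value(**WHITE)

# Simulation 1: African-American prevalence, white sensitivity/specificity.
ppv_swap_test_properties = positive_predictive_value(
    WHITE["sensitivity"], WHITE["specificity"], AFRICAN_AMERICAN["prevalence"]
)

# Simulation 2: African-American sensitivity/specificity, white prevalence.
ppv_swap_prevalence = positive_predictive_value(
    AFRICAN_AMERICAN["sensitivity"],
    AFRICAN_AMERICAN["specificity"],
    WHITE["prevalence"],
)

if __name__ == "__main__":
    gap = ppv_white - ppv_african_american
    for label, value in [
        ("African American (observed)", ppv_african_american),
        ("White (observed)", ppv_white),
        ("Swap test properties", ppv_swap_test_properties),
        ("Swap prevalence", ppv_swap_prevalence),
    ]:
        share_closed = (value - ppv_african_american) / gap
        print(f"{label}: PPV {value:.1%}, share of gap closed {share_closed:.0%}")
```

Swapping in the white prevalence closes most of the gap in positive predictive value, while swapping in the white test properties closes only a small part of it, in line with the magnitudes reported above.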

METHODS

Our tests of statistical discrimination are based on the comparison of a traditional disparities regression, a regression in which race effects are only accounted for through the coefficient of a race dummy, with a statistical discrimination regression, in which new variables that account for information-related processes are introduced. A traditional disparities regression would consist, for instance, of a probit or logistic regression explaining the likelihood that a doctor diagnoses a certain condition, given indicators of the patient's disease (Si), patient's minority status (Ri), patient's characteristics (Xi), and doctor's characteristics (Yj). Formally, the coefficient of interest would be that of Ri in the following expression:

$$\Pr(\text{diagnosis}) = \Phi(S_i, R_i, X_i, Y_j) \qquad (2)$$

While the usual claim is that a negative and significant sign of the race (minority) coefficient is evidence for discrimination, the traditional disparities regression offers no information for why diagnosis is lower for some racial groups. A negative sign on the race variable admits many possible interpretations. Doctors might be prejudiced against minority groups, might hold stereotypes about minority patients, or might have different information about members of different racial groups.

A statistical discrimination regression accounts, in addition, for the doctor's process of inference when information about the patient's health status is imperfect. Assume that the doctor's posterior assessment of the probability that the patient has the disease (the equivalent to equation (1)) is given by the expression p(θi, Si), where θi is the prevalence of the disease among individuals similar to patient i, and Si is a symptom experienced by patient i (which could be observed with more or less accuracy by the doctor).7 The statistical discrimination expression would be of the form:

$$\Pr(\text{diagnosis}) = \Phi\bigl(p(\theta_i, S_i), R_i, X_i, Y_j\bigr) \qquad (3)$$

Race effects in the diagnosis decision could work through the race dummy Ri, but also through different prevalence rates (θi's) across races (the prevalence hypothesis) or through different weights given by the provider to the patient's signal, Si, when doctor–patient communication patterns differ systematically across patients of diverse racial/ethnic groups (the miscommunication hypothesis).

To make (3) operational, we construct and include as a regressor a prior that is common for all doctors across race, age, and gender categories. While the assumption that physicians share a single prior distribution on the prevalence of each disease is strong, it helps us to test a relevant empirical question: the degree to which “rational” priors could explain a clinician's decisionmaking process. It is true that factors like the physician's experience (such as age or volume of practice) and geographic location are likely to influence physicians' beliefs. Our regressions may control for some of the systematic variation in the priors by considering physician characteristics or physician fixed effects. In addition to the prior, we include as regressors interactions of race and signal and interactions of prior and signal. The basic estimating equation is of the form8:

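$$\Pr(\text{diagnosis}) = \Phi\bigl(\lambda_1 S_i + \lambda_2 S_i \hat{\theta}_i + \lambda_3 \hat{\theta}_i + \lambda_4 R_i + \lambda_5 R_i S_i + X_i \beta + Y_j \eta\bigr) \qquad (4)$$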

with λ1, …, λ5, β, and η parameters.

To test for the prevalence effect of statistical discrimination, we evaluate whether the prior captures some of the effects of race, age, and gender in our equations for diagnosis of the two illnesses. We test the model in (4) versus the traditional disparities regression in (2), a competing model that assumes no significant effect of the prior. The restrictions we test are whether λ2 and λ3 are jointly equal to zero. If we find (4) to be a superior model, we then test the extent to which the prior terms account for the effects of race, age, and gender as found in a traditional disparities regression. A finding that the terms in the prior are significant and positive and that the main effects of race, age, and gender diminish once we account for the prior is consistent with the hypothesis of statistical discrimination in its prevalence form. We also use a Bayesian Information Criterion (BIC) to compare the fit of the data with the alternative models.9
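The model comparison described here can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes the Schwarz form of the BIC (the text does not say which variant is used) and a likelihood-ratio statistic as one possible way of testing two joint restrictions. The log-likelihoods, parameter counts, and sample size are hypothetical.

```python
import math


def chi2_sf_2df(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 2 degrees
    of freedom (closed form: exp(-x / 2))."""
    return math.exp(-statistic / 2.0)


def likelihood_ratio_statistic(loglik_restricted, loglik_full):
    """LR statistic for nested models; 2 restrictions imply 2 degrees of freedom."""
    return 2.0 * (loglik_full - loglik_restricted)


def bic(log_likelihood, n_params, n_obs):
    """Schwarz BIC (an assumed variant): lower values indicate a preferred model."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fitted values, for illustration only.
N_OBS = 5000
LOGLIK_TRADITIONAL, K_TRADITIONAL = -3000.0, 20  # regression (2)
LOGLIK_STATISTICAL, K_STATISTICAL = -2990.0, 22  # regression (4): adds lambda_2, lambda_3

lr = likelihood_ratio_statistic(LOGLIK_TRADITIONAL, LOGLIK_STATISTICAL)
p_value = chi2_sf_2df(lr)
bic_traditional = bic(LOGLIK_TRADITIONAL, K_TRADITIONAL, N_OBS)
bic_statistical = bic(LOGLIK_STATISTICAL, K_STATISTICAL, N_OBS)

if __name__ == "__main__":
    print(f"LR statistic: {lr:.2f}, p-value: {p_value:.5f}")
    print(f"BIC traditional: {bic_traditional:.1f}, BIC statistical: {bic_statistical:.1f}")
```

With these made-up inputs the richer model is preferred on both criteria because the gain in log-likelihood outweighs the penalty for two extra parameters. The same helper applied to the χ2(2) statistic of 13.32 reported for hypertension gives a p-value of about 0.0013.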

The miscommunication hypothesis implies that the way a doctor perceives a signal of disease differs across racial/ethnic groups. Poor communication between doctor and patient is like noise between the patient's report and the signal actually observed by the doctor. If a race effect is because of miscommunication between the white doctor and minority patients, then the doctor's diagnostic decisions should be less related to the patient's signal in the case of nonwhite patients. To account for this, we interact the patient's signal of the disease Si with a race (minority) dummy, Ri. We expect miscommunication to be reflected in a lower weight on the signal for minorities (λ5<0).
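To make the role of λ5 concrete, the sketch below evaluates the probit index of equation (4) for a white and a minority patient with the same prior, with and without a positive signal. All coefficient values are hypothetical and chosen only for illustration; the `controls_index` argument stands in for the Xiβ + Yjη terms.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, the probit link (Phi) in equation (4)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def diagnosis_probability(signal, prior, minority, lam, controls_index=0.0):
    """Pr(diagnosis) from equation (4) for one patient."""
    index = (
        lam[1] * signal
        + lam[2] * signal * prior
        + lam[3] * prior
        + lam[4] * minority
        + lam[5] * minority * signal
        + controls_index
    )
    return std_normal_cdf(index)


def signal_effect(prior, minority, lam, controls_index=0.0):
    """Change in Pr(diagnosis) when the patient's signal goes from 0 to 1."""
    with_signal = diagnosis_probability(1, prior, minority, lam, controls_index)
    without_signal = diagnosis_probability(0, prior, minority, lam, controls_index)
    return with_signal - without_signal


# Hypothetical coefficients: lambda_5 < 0 encodes the miscommunication hypothesis.
HYPOTHETICAL_LAMBDA = {1: 1.2, 2: 0.5, 3: 1.0, 4: 0.0, 5: -0.4}

effect_white = signal_effect(0.10, 0, HYPOTHETICAL_LAMBDA, controls_index=-1.0)
effect_minority = signal_effect(0.10, 1, HYPOTHETICAL_LAMBDA, controls_index=-1.0)

if __name__ == "__main__":
    print(f"Signal effect, white patient:    {effect_white:.3f}")
    print(f"Signal effect, minority patient: {effect_minority:.3f}")
```

Because λ4 is set to zero here, the two patients have the same probability of diagnosis when no signal is present; the negative λ5 alone makes the doctor's decision less responsive to the minority patient's signal.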

Data

Data Sources: The Medical Outcomes Study (MOS), NHANES III, and National Comorbidity Survey (NCS)

The main data we use are from the screener sample (cross-section) of patients in the MOS. The MOS was an observational study of adult patients who received medical care under three different systems of health care (staff model health maintenance organizations [HMOs], large multispecialty groups, and solo practices) at three U.S. sites (Los Angeles, Boston, and Chicago). The MOS was designed to compare the processes and outcomes of care within these systems for patients with hypertension, diabetes, depression, and heart disease. Within each system of care, a representative sample of physicians (general internists, family physicians, cardiologists, endocrinologists, psychiatrists), psychologists, and other mental health providers was selected (523 in total). A representative cross-section (21,481) of English-speaking patients was chosen from a pool of adults (ages 18 and older) who visited one of the MOS participating clinicians during a 9-day period in the autumn of 1986.10

The MOS has several advantages for us in comparison with other health care data sets. Most importantly, patients completed screening questionnaires on their clinical characteristics on the same day that they were seen and evaluated by a physician. Having independent measures of the patient's health status and morbidities allows us to control for these factors in a study of what doctors report. Furthermore, the MOS contains information about diseases for which the physician has different tools for assessment and for which the importance of clinical uncertainty is likely to differ.

In this paper, we use only a subsample of the MOS data consisting of patients of white family physicians or internists. We consider in our analysis two groups of patients: whites and minorities (including blacks, Hispanics, Asians, Native American, and other minorities). Although black patients were adequately represented in the MOS, the number of patients in other minority groups was too small to study separately. Also, there were too few black physicians or physicians from other racial/ethnic groups to allow us to study other racial/ethnic combinations reliably.11 Our analysis compares three of the conditions surveyed in the MOS: depression, hypertension, and diabetes. We do not consider an analysis of heart disease because few patients had this condition. The final sample we use consists of 11,664 patients (9,441 white and 2,223 minority patients). The screener questions about hypertension were distributed only to around half of the sample, explaining the smaller numbers of observations used in the analysis of this condition.

We used complementary data from the NHANES III to compute external prevalence rates for hypertension and diabetes, and from the NCS to compute them for depression. The NHANES III is a national survey designed by the National Center for Health Statistics to estimate the national prevalence of selected diseases and risk factors. The survey was implemented in two phases (1988–1991 and 1991–1994) in households selected from 81 counties (approximately 33,000 individuals) and consisted of both an interview and a medical examination. We use the first phase of the NHANES III to perform our computations because it is most proximate to the date of the MOS data. The NCS, fielded from the Fall of 1990 to the Spring of 1992, constitutes the first nationally representative mental health survey in the U.S. that uses a fully structured research diagnostic interview to assess the prevalence and correlates of psychiatric disorders. It is based on a stratified, multistage area probability sample of persons aged 15–54 years in the noninstitutionalized civilian population in the 48 coterminous states. A total of 8,098 respondents participated in the baseline survey.

Variables

We describe below the variables used in the analysis. Most of the variables are constructed on the basis of the MOS data, so we make no explicit reference to the data source when that is the case. We explicitly indicate when NHANES III or NCS are the sources of variables.

Signals of Severity

We assess whether a patient emits a positive signal for a certain condition by analyzing patients' responses to a set of questions asked in the screener form distributed to them before the visit. We assume that a patient signals positive for hypertension when the patient answers “yes” to the question “Have you ever been told you have hypertension?” For depression, we use a screener based on the patient's report about his/her mental health history and recent symptoms (Wells et al. 1996).12 Patients with a screener value greater than a certain threshold are considered to have a high probability of current major depression. We consider a patient to be diabetic when he/she acknowledges having been told that he/she had diabetes or if he/she reports taking insulin injections. For each condition, we construct a variable we denote as the “signal,” which takes the value of 1 if the patient has rated positive and zero otherwise. Notice that we treat the patient's response to screener questions as a signal. Whether the doctor ends up receiving this information or not will depend on both the patient's and the doctor's willingness and ability to communicate. Also, we do not assume that the patient's response represents “truth.”

Doctors' Diagnoses

We assume that there is a positive identification of hypertension or diabetes if the doctor either indicates the condition as the main reason for the visit or answers positively to the question: “Does this patient have hypertension/diabetes?” We assume that the doctor diagnoses depression whenever she either indicates depression as the main problem addressed in the visit, counsels the patient for depression, refers the patient to a mental health specialist, or acknowledges that the patient was depressed for 2 weeks or more in the past year. This is based on a definition of recognition of depression used for the MOS in Borowsky et al. (2000).

Patients' and Clinicians' Information

We control in our analysis for patients' age, gender, race, marital status, employment status, education (highest grade completed), income, location, and patient's insurance status (whether insured or not and whether the plan is an HMO or fee for service). In addition to the disease-specific information, we also control for some other clinical information reported by the patient before the visit: the patient's self-report of health status, the patient's body mass index, and whether the patient is a smoker. With respect to clinicians' characteristics, we adjust for age, practice type (staff, solo, or group), and specialty (family practice or internist). We also take advantage of the multiple observations available for each doctor and check for robustness by using doctor fixed effects.

Doctors' Priors

The “prior” in equation (3) is θi, the doctor's assessment of the probability that patient i has the illness before the doctor reads any patient signal. We assume that all physicians share the same “prior” estimate of a patient in an age–gender–race cell. Sixteen different priors are computed (four age categories, two gender categories, and two race categories). We first construct the priors using epidemiological (external) information. For hypertension, we use both the Blood Pressure Section of the Physician's exam and the Blood Pressure section of the Interview Questionnaire in NHANES III, phase 1 (1988–1991) and define a person as hypertensive when the systolic blood pressure average (average of three readings) is above 140 and/or the diastolic average is above 90, and/or the person is taking medication for hypertension. After excluding pregnant women and those below 18 years of age, we compute the prevalence rates as weighted averages of the hypertension indicator by race, age, and sex categories (the weights are nationally representative). In the case of diabetes, we construct the priors on the basis of three questions of the Interview Questionnaire in NHANES III, phase 1 (1988–1991): “Were you ever told you had diabetes?”, “Are you now taking diabetes pills?”, “Are you now taking insulin?” For depression, the rates are defined as the number of people with major depressive disorder within the total sample. The diagnostic interview used to generate the diagnosis of major depression in the NCS was a modified version of the Composite International Diagnostic Interview. Since the NCS only surveyed people aged below 59 years, when using these data we exclude from the analysis patients aged 60 years or more.
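The hypertension rule and the weighted prevalence computation can be sketched as below. This is an illustration of the definition as stated in the text, not the authors' code; the function names, argument names, and the toy records are hypothetical, and no actual NHANES III variable names are assumed.

```python
def is_hypertensive(systolic_readings, diastolic_readings, on_medication):
    """Hypertension indicator as defined in the text: average systolic above 140,
    and/or average diastolic above 90, and/or taking antihypertensive medication."""
    mean_systolic = sum(systolic_readings) / len(systolic_readings)
    mean_diastolic = sum(diastolic_readings) / len(diastolic_readings)
    return mean_systolic > 140 or mean_diastolic > 90 or on_medication


def weighted_mean(indicators, weights):
    """Survey-weighted prevalence of a 0/1 indicator within one demographic cell."""
    return sum(w * x for x, w in zip(indicators, weights)) / sum(weights)


# Hypothetical respondents in a single age-gender-race cell:
# (systolic readings, diastolic readings, on medication, sampling weight).
TOY_CELL = [
    ([150, 148, 152], [85, 88, 86], False, 2.0),  # high systolic average
    ([120, 118, 122], [78, 80, 79], False, 1.0),  # normotensive
    ([130, 128, 132], [82, 84, 80], True, 1.0),   # controlled by medication
]

indicators = [int(is_hypertensive(s, d, med)) for s, d, med, _ in TOY_CELL]
weights = [w for _, _, _, w in TOY_CELL]
cell_prior = weighted_mean(indicators, weights)

if __name__ == "__main__":
    print(f"Indicators: {indicators}, weighted prevalence: {cell_prior:.2f}")
```

Repeating the weighted mean within each of the sixteen age, gender, and race cells would yield one external prior per cell.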

An alternative way to construct priors is by using the information reported by the patient in the screener questionnaire of the MOS.13 Within this second approach, we define priors as averages, for each condition, of the patients' signals within cells defined by age, gender, and race categories. Formally, $\hat{\theta}_{ik} = \frac{1}{n_k}\sum_{i} S_{ik}$, where Sik is the signal of individual i in cell k, nk is the number of patients in cell k, and cells are defined by age, race, and sex categories. In computing these priors, we use not only observations from patients visiting white general physicians (GPs) or internists (the core of our analysis) but also observations from patients visiting minority GPs or internists (13,000 observations for depression and 6,764 for hypertension). The average number of observations per cell exceeds 400 for hypertension, and the smallest cell contains 93 observations, allaying concerns about small number problems.
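A minimal sketch of this cell-mean construction follows, assuming each patient record carries an age category, a gender, a minority indicator, and a 0/1 signal. The field names and the toy data are hypothetical and are not taken from the MOS files.

```python
from collections import defaultdict


def cell_mean_priors(patients):
    """Average the 0/1 signal within each (age group, gender, minority) cell
    and return a mapping from cell to its prior."""
    signal_sum = defaultdict(int)
    cell_size = defaultdict(int)
    for patient in patients:
        cell = (patient["age_group"], patient["gender"], patient["minority"])
        signal_sum[cell] += patient["signal"]
        cell_size[cell] += 1
    return {cell: signal_sum[cell] / cell_size[cell] for cell in cell_size}


# Hypothetical screener records, for illustration only.
TOY_PATIENTS = [
    {"age_group": "18-44", "gender": "F", "minority": 0, "signal": 1},
    {"age_group": "18-44", "gender": "F", "minority": 0, "signal": 0},
    {"age_group": "18-44", "gender": "F", "minority": 0, "signal": 0},
    {"age_group": "18-44", "gender": "F", "minority": 0, "signal": 0},
    {"age_group": "65+", "gender": "M", "minority": 1, "signal": 1},
    {"age_group": "65+", "gender": "M", "minority": 1, "signal": 0},
]

priors = cell_mean_priors(TOY_PATIENTS)

if __name__ == "__main__":
    for cell, prior in sorted(priors.items()):
        print(cell, f"{prior:.2f}")
```

Each patient would then be assigned the prior of his or her own cell when estimating equation (4).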

There are advantages and disadvantages to using either priors from epidemiological data or priors based on the MOS screener. On the one hand, epidemiological data are completely independent of doctors' diagnoses and of patients' perceptions. Unlike the MOS priors, they are free of potential biases generated by doctors' attitudes or patients' knowledge of their diseases. Moreover, epidemiological data are likely to be at least as strong an influence on the prior as any information obtained from the patient during the screening interview. On the other hand, the external prevalence rates we use in this paper represent the U.S. population as a whole, and not the population that visits a primary physician. These prevalence rates would not capture, for instance, doctors' awareness of sick minorities being less likely to visit a physician than whites. Because neither of these measures satisfies us completely, we perform our main analysis using priors from epidemiological data and then check the robustness of the results using MOS priors.

Table of Means

Table 2 describes the variables used by minority status. Depression prevalence rates (“priors”), constructed on the basis of NCS, average 6.9 percent for whites and 7.6 percent for minorities (a significant difference). The NHANES data, on the other hand, indicate higher rates of diabetes in minorities than in whites (6.4 versus 5.1 percent) but similar rates of hypertension (23 percent) for both groups. The rates of disease derived from the MOS signals maintain in general the sign of the white–minority differences shown for the external prevalence rates. But because patients visiting a clinician are more likely to be sick than the average person in the population, rates of disease portrayed by the MOS screener are higher than those constructed on the basis of NHANES III or NCS data. For hypertension and diabetes, the differences in rates of diagnosis across groups run in the same direction as those derived from the patients' signals. But for depression, the difference runs in the opposite way: doctors are less likely to diagnose depression in minority patients than in white patients.

Table 2
Description of the Data by Minority Status

Minorities in the MOS are younger, more likely to be female, less likely to be married, more likely to be unemployed, and have a lower income than whites. They are more likely to have no insurance, more likely to be in an HMO, and more likely to come from LA or Chicago rather than Boston. Minorities are more likely to assess their health as poor and to be overweight. Also, they see staff physicians and group-practice physicians at higher rates than whites.

RESULTS

Tables 3, 4, and 5 show the results for hypertension, diabetes, and depression, respectively.14 The first column in each table depicts a traditional disparities regression in which the physician's diagnosis is estimated as a function of patients' sociodemographics (including race, age, gender) and other factors, physicians' characteristics, and a control for the presence of the disease obtained from the patient's report in the screening survey before the visit. We evaluate this model against that of a more comprehensive one that allows for information-related effects, as specified in equation (4). Columns 2–6 show the results for the statistical discrimination model.

Table 3. Hypertension
Table 4. Diabetes
Table 5. Depression

Hypertension

In a traditional disparities regression (column 1 in Table 3), detection of hypertension is significantly affected by gender (p=.003) and age (p=.012, .000, and .000 for each age category): men and elderly patients are more likely to be diagnosed with hypertension. The coefficient on minority status is positive but nonsignificant (p=.167). The second column in the table shows the results of the full statistical discrimination specification, given by equation (4). The terms in the prior turn out to be very significant and of the right sign (jointly, they are significant at 0.1 percent, with a χ²(2) of 13.32). We also notice that the main effects of the gender and age variables become insignificant once the prior is accounted for. Using the BIC, the comparison of these two models leads to a rejection of the traditional disparities regression for hypertension. In other words, the full statistical discrimination specification fits the data better than the traditional disparities regression. To ensure that multicollinearity is not behind these results, we next check each variable singly. Columns (4)–(6) contain the results of adding race, age, and gender to the statistical discrimination expression in turn. When comparing these columns with column (1), we observe that the effects of gender and age virtually disappear or are reduced in magnitude. The race effect also goes to zero. These results are evidence that age, gender, and race effects can be understood as working through prevalence in the doctor's diagnosis of hypertension.15
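The BIC comparison used above can be sketched as follows. This is only an illustration of the criterion itself (Schwarz's BIC, k·ln(n) − 2·ln(L), where a smaller value indicates the preferred model); the log-likelihoods, parameter counts, and sample size are hypothetical and are not taken from Table 3.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical fitted values for two non-nested models (NOT the paper's numbers).
n_obs = 5000
bic_traditional = bic(log_likelihood=-2510.0, n_params=12, n_obs=n_obs)
bic_statistical = bic(log_likelihood=-2495.0, n_params=11, n_obs=n_obs)

# The model with the smaller BIC is preferred.
if bic_statistical < bic_traditional:
    preferred = "statistical discrimination"
else:
    preferred = "traditional disparities"
print(preferred, round(bic_traditional, 1), round(bic_statistical, 1))
```

Because the criterion penalizes each model for its number of parameters rather than relying on a likelihood-ratio statistic, it can rank models that are not nested, which is the situation described in the text.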

On the other hand, the coefficient on the race times signal interaction is negative but insignificant for hypertension. The null result on such an interaction is not surprising given the lack of race effect to begin with in the traditional disparities regression. Miscommunication is unlikely to be an issue in the diagnosis of hypertension.

Diabetes

The results for diabetes (Table 4) are similar to those reported for hypertension, although less robust. The initial race effect in the traditional disparities regression, after controlling for the signal, is positive but insignificant (p=.141). Women are less likely to be diagnosed than men (p=.05), and older patients are more likely to be diagnosed than younger ones (p=.008 and .001). Accounting for the complete statistical discrimination specification (column 2) does not improve the fit of the data relative to the traditional disparities regression. Multicollinearity may be behind this initial misfit: once we remove the demographics from column 2, the fit of the information-based model improves (see column 3). The prior becomes significant (p=.000), and remains so when we study the individual effects of each of the race, age, and gender variables. Controlling for the prior reduces the coefficient on the race dummy substantially and makes the age effects insignificant. The effect of gender, on the other hand, remains strong, even after accounting for the prior. The results suggest some prevalence effect in the diagnosis of diabetes in the case of age and race, but no evidence for prevalence in gender. We find no support for the miscommunication hypothesis in the diagnosis of diabetes. The race–signal interaction in column 2 is insignificant (p=.844) and remains insignificant when we run a specification in which only miscommunication (and no prevalence) is modeled.

Depression

As in Borowsky et al. (2000), who also used MOS data but studied only depression, the traditional disparities regression in column 1 of Table 5 shows that minorities are less likely to have depression diagnosed than whites. Women are more likely to be diagnosed with depression and, as in hypertension, middle-aged and older patients have a higher probability of being diagnosed than younger patients. While not shown in Table 5, depressed patients are more likely to be diagnosed when they have reported they are in bad health, but are less likely to be recognized in the first visit with the doctor. Also, being single or unemployed and seeing a GP instead of an internist increase the chances of a depression diagnosis.

The statistical discrimination model yields mixed results. With respect to the prevalence hypothesis, the coefficients on the prior variables are jointly significant but of the wrong sign (column 2, Table 5). Using the BIC, the traditional disparities regression cannot be rejected against the full specification of the statistical discrimination model. The effects of age and gender remain significant even when the prior is in the model (p=.042 and p=.033 for two age categories and p=.001 for gender), and the magnitudes of the coefficients on race, gender, and age do not diminish.16 The analysis suggests that either the doctor is not acting in a Bayesian fashion, or the priors she holds are not captured in our estimate of θi.

We do find, however, evidence of miscommunication in the case of depression. The coefficient on the race times signal interaction is negative and marginally significant at p=.084 in the full specification of the statistical discrimination regression. Because the prevalence hypothesis did not fit the data in the case of this condition, we run a new specification of the depression regression excluding the terms in the prior (Table 5, column 7). A significant coefficient of the minority–signal interaction in such a specification (p=.049) suggests that doctors dealing with cases of depression rely less on the minority patients' reports.17 Miscommunication between white doctors and minority patients appears to be one of the forces behind the disparities observed in the diagnosis of depression. The fact that we find miscommunication to be relevant for depression but not for hypertension or diabetes is reasonable given that the detection of depression relies more on communication than the detection of the other conditions.
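To see why a less informative signal would show up as a weaker link between the patient's report and the diagnosis, consider a textbook Bayes' rule calculation. This is a minimal sketch and not the structural model behind equation (4): the binary signal, the `sensitivity` and `specificity` parameters, and all numeric values are assumptions made for illustration.

```python
def posterior(prior: float, sensitivity: float, specificity: float, signal: int) -> float:
    """P(disease | signal) for a binary signal, by Bayes' rule."""
    if signal == 1:
        num = sensitivity * prior
        den = num + (1.0 - specificity) * (1.0 - prior)
    else:
        num = (1.0 - sensitivity) * prior
        den = num + specificity * (1.0 - prior)
    return num / den


def signal_response(prior: float, sensitivity: float, specificity: float) -> float:
    """How far the posterior moves when the signal switches from 0 to 1."""
    return (posterior(prior, sensitivity, specificity, 1)
            - posterior(prior, sensitivity, specificity, 0))


prior = 0.10  # hypothetical prevalence
clear = signal_response(prior, sensitivity=0.9, specificity=0.9)  # well-communicated signal
noisy = signal_response(prior, sensitivity=0.7, specificity=0.7)  # poorly communicated signal
print(round(clear, 3), round(noisy, 3))
```

With the same prior, the posterior responds much less to the noisier signal. In a regression of diagnosis on the signal, that weaker response is what a negative race–signal interaction would pick up if signals from minority patients reach the doctor with more noise.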

Communication between doctors and patients requires that a signal be emitted, be heard, and be internalized. What we call miscommunication in this paper conflates these three steps. Put differently, what we have found is that the physician does not respond to an available signal. We cannot tell with the data available whether the patient did express the signal at all or whether the doctor understood it but ignored it. But we can address a related concern: whether the doctor's misunderstanding stemmed from real communication problems with the patient or from less effort or time put into the encounter. It is possible that prejudice shows itself in a lower level of effort or time on behalf of certain patients, leading to miscommunication. As a response to this issue, we introduce into the analysis a variable that may be indicative of the physician's effort: the amount of time that the physician spent with the patient (as reported by the patient). We run a regression of “duration of visit” as the dependent variable on all patient and physician characteristics, including race and controls for different comorbidities. The coefficient on race is significant and positive (p=.032) when controlling for the main comorbidities in the study (diabetes, depression, hypertension, and myocardial infarction), and remains positive, although insignificant (p=.162), when adding other comorbidities, such as cancer, arthritis, paralysis, amputations, and others. This implies that doctors are not spending less time with minority patients than with white ones. We take this as evidence that the observed miscommunication is unlikely to be a manifestation of prejudice, at least if prejudice is measured by a shorter duration of the medical visit.

Black Patients; Nonwhite Physicians

Results are unchanged when we include only black patients in the minority category. We also checked the data for evidence of “racial” determinants of physicians' diagnostic decisions in the case of nonwhite physicians, running separate models for this group of 40 physicians and 735 and 1,399 patients for hypertension and depression/diabetes, respectively. There is no evidence of a race effect in any of the traditional disparities regressions we run for the conditions in the study. This may be because of imprecise estimates from a smaller sample, a different pattern of behavior for nonwhite physicians, or systematically different patients in unobserved factors.18 Unfortunately, interpreting this finding does not add much to the identification of miscommunication versus prejudice discrimination. While minority physicians may be less likely to feel prejudice against minority patients, they may also be more likely to communicate better with them.19 For depression, age and gender appear to have a significant influence on doctors' diagnostic decisions, while only age is significant in the case of hypertension and only gender in the case of diabetes. Once we account for the statistical discrimination structure, we find some evidence of prevalence for age and gender in hypertension, but not in diabetes. We find no evidence of heightened miscommunication between nonwhite doctors and their patients for depression. This result is not surprising since we found no significant race effects in the traditional disparities regression for minority physicians.

Robustness of Prevalence Results

We performed several robustness checks that support the prevalence results. First, we ran the statistical discrimination equations with new priors constructed on the basis of the MOS signals. We also tried different specifications for the priors (for the MOS priors as well as for the NHANES and NCS priors), such as considering different priors by location. To account for the fact that we were running a second-stage regression with a prior estimated in a first stage, we bootstrapped the standard errors of the joint estimation. In addition, we reran the prevalence equations with doctor fixed effects and refined the estimation of the posteriors by adding a quadratic term in the prior. For most specifications, the conclusions obtained in the main framework remained valid. Results were not robust, however, to the use of location-specific priors.
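The logic of bootstrapping a two-stage estimate can be sketched as below. This is a deliberately simplified illustration, not the estimation actually performed: it uses synthetic data, takes cell-level signal means as the first-stage prior, replaces the second-stage model with a one-regressor linear probability slope, and resamples individual patients rather than clinician clusters. All names and numbers are hypothetical.

```python
import random
import statistics


def cell_means(cells, signals):
    """First stage: prior for each cell = mean of the signal in that cell."""
    totals, counts = {}, {}
    for c, s in zip(cells, signals):
        totals[c] = totals.get(c, 0) + s
        counts[c] = counts.get(c, 0) + 1
    return {c: totals[c] / counts[c] for c in totals}


def ols_slope(x, y):
    """Slope of a one-regressor least-squares fit of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def two_stage_slope(rows):
    """Estimate the prior, then regress diagnosis on the estimated prior."""
    cells = [r[0] for r in rows]
    prior = cell_means(cells, [r[1] for r in rows])
    return ols_slope([prior[c] for c in cells], [r[2] for r in rows])


def bootstrap_se(rows, n_boot=200, seed=0):
    """Resample rows with replacement and redo BOTH stages on each draw."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]
        estimates.append(two_stage_slope(sample))
    return statistics.stdev(estimates)


# Synthetic data: (cell, signal, diagnosis), with hypothetical prevalences.
rng = random.Random(1)
rows = []
for cell, p in {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}.items():
    for _ in range(100):
        rows.append((cell, int(rng.random() < p), int(rng.random() < p)))

se = bootstrap_se(rows)
print(round(two_stage_slope(rows), 3), round(se, 3))
```

The key point is that the prior is re-estimated inside every bootstrap draw, so the resulting standard error reflects sampling variation in the first stage as well as in the second.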

Conclusions

Race/ethnicity effects are being intensively investigated in health care data, where findings of lower rates of diagnosis and treatment for minorities document a fundamental inequity in health care. Our paper proposes and tests a specific mechanism for why race matters. We analyze two implications of the statistical discrimination idea. The first one claims that racial effects on doctors' diagnostic decisions operate through the priors doctors hold about the patient's condition, even before observing any particular signal from the patient. The other variant states that “race effects” appear because of communication problems between white doctors and minority patients. We find strong evidence that race works through a “prior,” in Bayesian fashion, for hypertension and diabetes. Furthermore, in the case of depression, we find evidence that race affects decisions through communication. While we do not explore it in this manuscript, communication may also play a role in diagnosis through other variables, like age and gender.

Our findings of statistical discrimination should not be taken as implying that Bayesian behavior is responsible for most health care disparities. We only study the diagnostic decision, the precursor to decisions about resource use. If disparities emerge in the diagnostic process, empirical research conditional upon diagnosis may understate the full magnitude of disparities. We chose to begin our analysis of the role of information in discrimination with the stage of treatment where information matters most. At the same time we recognize that disparities are mainly about treatment differences. A more complete analysis would follow patients as they and their clinicians make decisions about treatment. In these next stages of the process, stereotypes or prejudices may play more of a role.

The perspective of statistical discrimination illuminates how racial/ethnic discrimination might emerge from clinicians' efficient use of information. While we think this point is very important, it should not obscure focus on another aspect of the information issue: clinicians' obligation to find out about the health state of their patients in order to make good decisions. Communication problems can be overcome with more effort. Reliance on “priors” related to age, gender, or race, when low-cost reliable tests are available, is difficult to justify. We hope that our findings in support of statistical discrimination do not close off analysis of how to address disparities in care and outcomes, but rather open up fruitful avenues of research and policy.

There are some limitations to this study we have to consider. First, the model we estimate assumes that prejudice enters as a cost and affects the physician's utility in a linear way, and in particular does not interact with the severity of the patient's condition. If prejudice is different for patients of different underlying severities, it could offer another interpretation of our Race–Signal coefficient. Second, our measure of diagnosis may not always capture the fact that the doctor has recognized the condition. For instance, in the case of depression, physicians may have recognized a mental health problem as a secondary, rather than a primary, reason for a visit. Third, we assume that the screeners are good approximations of the patients' underlying health conditions and that they are equally valid for patients from all groups. This assumption may be incorrect for several reasons. The same questions may have different significance for members of different cultural groups and fail to capture the same objective measures. Additionally, if patients' knowledge of their condition depends on earlier contacts with providers, and prejudice affected this previous exposure, then the signal (constructed on the basis of patients' reports) would already reflect a bias. Furthermore, patients may not be aware of their own condition, and the lack of correlation between patients' signals and doctors' diagnoses may signal “good diagnosticians” rather than problems of communication.20 Finally, there might be some self-selection biasing the results. Since some patients can choose the doctor they want to be treated by, possibly on the basis of physician communication styles, the results we obtain are likely to be milder than they would be were patients randomly assigned to doctors. Also, our sample excludes non-English-speaking patients. Even with such a bias working against us, we find evidence for miscommunication in the case of depression.

To contend effectively with inequities in health care and outcomes, it certainly is necessary to know the source of the problem. Discrimination stemming from prejudice is of a very different character than discrimination stemming from application of rules of conditional probability. When doctors indulge prejudices, they are not acting in the best interests of their patients; when they apply rules of conditional probability they are doing the best they can, given the information available.21 Sorting out the role of these two classes of explanation is critical for empirical research and policymaking. But statistical reasons for discrimination may also be unfair. A finding in the case of depression that miscommunication is the explanation for a race effect does not ameliorate the disparity in services or outcomes that results, even though it does not stem from doctors' ill will. Minorities still fare worse than whites in the clinical encounter, and attempts to bridge this gap through policies that improve doctor–patient communication should be encouraged. Better communication between doctors and patients will improve doctors' understanding of patients' conditions, reduce reliance on population averages in clinical decisions, and thereby improve the match between treatment and the health needs of the patient.

Acknowledgments

This research was supported by Grant PO1 MH59876 from the National Institute of Mental Health. We are grateful to John Ayanian, Carlos Blanco, Randy Ellis, Kevin Lang, Sharon Lise-Normand, Marc Rysman, Jon Skinner, Milt Weinstein, Alan Zaslavsky, and HSR editors and referees for helpful comments.

Footnotes

1We use “Minority” as a shorthand term for black, Hispanic, Asian, Native American, and/or any other nonwhite racial group. We recognize that in some U.S. regions, the number of people in some of these groups may exceed that of whites.

2These mechanisms refer to discrimination by doctors (the subject of this paper). Disparities at the clinical encounter could also stem from higher patient refusal rates among members of ethnic/racial minorities. There is some literature on patient refusal, although Geiger (2001) and IOM (2002) discount this as a major factor.

3Both hypotheses draw heavily on the labor economics literature on statistical discrimination. A first strand asserts that differences in wages across ethnic groups are related to the existence of group differences in the quality of employers' information (Aigner and Cain 1977; Lundberg and Startz 1983). The other strand attributes differences in outcomes to decision makers' use of priors that differ across racial groups (Arrow 1973; Coate and Loury 1993). In both lines of work uncertainty can generate disparities.

4See the data section for a description of NHANES III. Within the medical examination part, a Diagnostic Interview Schedule (DIS) was administered to individuals aged 15–39 years to detect the presence of mental illnesses.

5This symptom is one of many symptoms used to construct the DIS diagnoses.

6“More information being conveyed” by whites may be a reflection of both how the questions are phrased, understood, and answered, and of how the answers are understood. It does not necessarily indicate where the problems of communication arise.

7As the signal becomes more informative about the underlying disease, the likelihood of diagnosing the disease given a positive signal p(θi, Si=1) increases. In the limit, if the signal is perfectly informative, the doctor always diagnoses the disease when observing a positive signal, and the priors play no role in the diagnosis. The degree of reliance on the priors depends on the informative content of the signal. Doctors are less likely to rely on priors when the underlying disease of the patient is easily verifiable and more likely to rely on priors when the costs of diagnosis are high.

8The estimating equations are derived from a structural model of decision making under imperfect information. Such a model can be made available by authors upon request.

9The limitation of usual log-likelihood-based measures is that the null model has to be a subset of the alternative. Since our models are not nested, we use an alternative model selection criterion, the BIC. A model is better than another model if it has a smaller BIC value.

10Weights were constructed to account for the study design effects of sampling probabilities for patients and clinicians. The final weights were derived from three components. The first one adjusted for the different quota for clinicians in each specialty in the solo and small group system of care. In the large multi-specialty groups and HMO systems, every clinician was selected if eligible. The second component adjusted for the differential probability of being sampled within the screening period. The third component adjusted for the differential probability of patients visiting the office within a two-week period. For more details, see Rogers et al. (1992).

11In the Results section, we consider black physicians and distinguish black from other minority patients.

12The questions asked were: Have you had 2 weeks or more in the past year during which you felt sad, blue, or depressed? Have you felt depressed or sad much of the time in the past year? During the past week, how often have you felt depressed/had crying spells/felt sad/enjoyed life/slept restlessly/felt that people disliked you?

13Another way would be to use physicians' diagnoses. One could argue that a prior is a belief on the part of the physician, and the average rate of diagnosis in an age–gender–race cell might better capture the physician's prior belief about the prevalence of some disease for that set of people. Using diagnoses presents two major problems. First, we would be using as a regressor the mean of the variable we are seeking to explain, raising identification problems. Second, we want the prior to represent the underlying distribution of disease in the population without incorporating other beliefs (such as stereotypes or prejudice) that may be reflected in doctors' diagnosis rates.

14We weighted observations to account for the MOS sample design, and computed robust standard errors recognizing clustering at the clinician level.

15Note that, before conditioning for other factors, the initial disparity here is negative: minorities are more likely to be diagnosed with hypertension than whites. The general principles of the prevalence form of statistical discrimination are as likely to hold in the more common cases where minority patients are less likely to be diagnosed than whites.

16The prior for depression does not vary much across age–gender–race cells. Econometrically, if the variance in the prior is too small, it would be tantamount to having another constant term in the regression. This could result in imprecise estimates of the prior and its interactions. To dismiss this problem, we run new regressions assuming no constant. Results are still imprecise and far from what theory would predict.

17We also ran a regression with a continuous signal of depression. The interaction between this continuous variable and race remains negative and significant.

18We comment on the “selection” of patients to minority physicians in the Conclusions.

19We also ran a regression that considers jointly all physicians (white and minority). The likelihood of diagnosing depression now depends on the doctor's race, on interactions of the patient's race and the doctor's race, on interactions of the patient's race, the signal, and the doctor's race, and on the other characteristics used previously. None of the variables associated with the race of the physician is significant.

20If minorities are less likely to have contact with the medical system, they are less likely to be aware of their health problems. In this case, the lower correlation between signal and diagnosis for this group of patients would indicate lack of trust in the medical system, but not necessarily miscommunication in the medical encounter.

21The information available may not always reflect the “truth.” Stereotypes may lie behind priors held by doctors.

References

  • Aigner D, Cain G. “Statistical Theories of Discrimination in Labor Markets.” Industrial and Labor Relations Review. 1977;30(2):175–87.
  • Altonji J, Pierret C. “Employer Learning and Statistical Discrimination.” Quarterly Journal of Economics. 2001;116(1):313–50.
  • Arrow K. “The Theory of Discrimination.” In: Ashenfelter O, Rees A, editors. Discrimination in Labor Markets. Princeton, NJ: Princeton University Press; 1973. pp. 3–33.
  • Bach P, Cramer L, Warren J, Begg C. “Racial Differences in the Treatment of Early-Stage Lung Cancer.” The New England Journal of Medicine. 1999;341(16):1198–205. [PubMed]
  • Balsa A, McGuire T. “Statistical Discrimination in Health Care.” Journal of Health Economics. 2001;20:881–907. [PubMed]
  • Balsa A, McGuire T. “Prejudice, Clinical Uncertainty and Stereotyping as Sources of Health Disparities.” Journal of Health Economics. 2003;22:89–116. [PubMed]
  • Bloche MG. “Race and Discretion in American Medicine.” Yale Journal of Health Policy, Law, and Ethics. 2001;1:95–131. [PubMed]
  • Borowsky S, Rubenstein LV, Meredith LS, Camp P, Jackson-Triche M, Wells KB. “Who Is at Risk for Non-Detection of Mental Health Problems in Primary Care?” Journal of General Internal Medicine. 2000;15:381–8. [PMC free article] [PubMed]
  • Coate S, Loury G. “Will Affirmative-Action Policies Eliminate Negative Stereotypes?” American Economic Review. 1993;83(5):1220–40.
  • Cooper LA, Roter DL. 2001. Patient–Provider Communication: The Effect of Race and Ethnicity on Process and Outcomes of Health Care. Unpublished, Johns Hopkins University.
  • Einbinder LC, Schulman KA. “The Effect of Race on the Referral Process for Invasive Cardiac Procedures.” Medical Care Research and Review. 2000;57(1):162–80. [PubMed]
  • Fendrick AM, Chernew ME, Hirth RA, Bloom BS. “Alternative Management Strategies for Patients with Suspected Peptic Ulcer Disease.” Annals of Internal Medicine. 1995;123:260–8. [PubMed]
  • Geiger HJ. 2001. Racial and Ethnic Disparities in Diagnosis and Treatment: A Review of the Evidence and a Consideration of Causes. Unpublished.
  • Knowles J, Persico N, Todd P. “Racial Bias in Motor Vehicle Searches: Theory and Evidence.” Journal of Political Economy. 2001;109(1):203–29.
  • Lundberg S, Startz R. “Private Discrimination and Social Intervention in Competitive Labor Markets.” American Economic Review. 1983;73:340–7.
  • Mayberry R, Mili F, Ofili E. “Racial and Ethnic Differences in Access to Medical Care.” Medical Care Research and Review. 2000;57(1):108–45. [PubMed]
  • Monheit AC, Vistnes J. “Health Insurance Coverage and Access to Care.” Medical Care Research and Review. 2000;57(1):11–35. [PubMed]
  • Mushlin AI, Mooney C, Holloway RG, Detsky AS, Mattson DH, Phelps CE. “The Cost-Effectiveness of Magnetic Resonance Imaging in Patients with Equivocal Neurological Symptoms.” International Journal of Technology Assessment in Health Care. 1997;13:21–34. [PubMed]
  • Phillips KA, Mayer ML, Aday LA. “Barriers to Care among Racial/Ethnic Groups under Managed Care.” Health Affairs. 2000;19(4):65–75. [PubMed]
  • Rogers W, McGlynn E, Berry S, Nelson EC, Perrin E, Zubkoff M, Greenfield S, Wells BK, Stewart AL, Arnold SB, Ware JE. “Methods of Sampling.” In: Stewart AL, Ware JE Jr., editors. Measuring Functioning and Well-Being: The Medical Outcomes Study Approach. Durham, NC: Duke University Press; 1992. pp. 27–47.
  • Schulman KA, Berlin JA, Harless W, et al. “The Effects of Race and Sex on Physicians' Recommendations for Cardiac Catheterization.” New England Journal of Medicine. 1999;340(8):618–26. [PubMed]
  • Weinstein M, Fineberg H, Elstein A, Frazier H, Neuhauser D, Neutra R, McNeil B. Clinical Decision Analysis. Philadelphia: W.B. Saunders Company; 1980.
  • Wells KS, Sturm R, Sherbourne CD, Meredith LS. Caring for Depression. Cambridge, MA: Harvard University Press; 1996.

Articles from Health Services Research are provided here courtesy of Health Research & Educational Trust