HHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal.
Soc Sci Res. Author manuscript; available in PMC 2012 Mar 1.
Published in final edited form as:
Soc Sci Res. 2011 Mar 1; 40(2): 523–537.
doi:  10.1016/j.ssresearch.2010.11.007
PMCID: PMC3106307

Eyes on the Block: Measuring Urban Physical Disorder Through In-Person Observation


In this paper, we present results from measuring physical disorder in Los Angeles neighborhoods. Disorder measures came from structured observations conducted by trained field interviewers. We examine inter-rater reliability of disorder measures in depth. We assess the effects of observation conditions on the reliability of reporting. Finally, we examine the relationships between disorder, other indicators of neighborhood status, and selected individual outcomes.

Our results indicate that there is considerable variation in the level of agreement among independent observations across items, although overall agreement is moderate to high. Durable indicators of disorder provide the most reliable measures of neighborhood conditions. Circumstances of observation have statistically significant effects on the observers’ perceived level of disorder. Physical disorder is significantly related to other indicators of neighborhood status, and to children’s reading and behavior development. This result suggests a need for further research into the effects of neighborhood disorder on children.

Keywords: Neighborhood, disorder, reliability, measurement

1. Introduction

1.1 Background

A large literature suggests that physical disorder (e.g., dilapidated buildings, trash, broken sidewalks, and graffiti) in urban areas may increase crime rates, disruptive behavior, stress levels, and health and psychological problems among neighborhood residents (Hill et al., 2005; Ross and Mirowsky, 2001; Taylor, 2001). The theory that reducing physical disorder can increase safety and social control has led to anti-graffiti programs, vacant land management, and other initiatives to reduce “urban decay” in many cities (Barnard, 2006; Wachter and Gillen, 2006).1 However, researchers have only recently begun to develop methods of measuring urban physical disorder (Raudenbush and Sampson, 1999; Taylor et al., 1985). Comprehensive and reliable measures are essential to testing hypotheses about the causes and consequences of physical disorder and to identifying urban communities where disorder is most acute.

In this paper, we examine the results from measuring physical disorder in the Los Angeles Family and Neighborhood Survey, Wave 1 (L.A.FANS-1). In L.A.FANS-1, multiple trained observers walked through study neighborhoods while systematically recording what they observed on standardized forms. Although expensive, independent neighborhood observations can provide more objective assessments of neighborhood conditions than the proxy measures used by most studies, such as resident reports of disorder.2 We examine this method of assessing physical disorder in Los Angeles by addressing four questions. First, do independent observers of the same neighborhood perceive similar levels of physical disorder? If independent observers agree about disorder levels, the items that they were asked to code are more likely to be well-defined, readily observable, and replicable. Second, how do observation circumstances (time of day, day of the week, and season) and observers’ prior experience with the neighborhood affect their perceptions of physical disorder? Earlier research and common sense suggest that what is observed in a neighborhood may differ between morning and evening (Raudenbush and Sampson, 1999), between weekdays and weekends, and across seasons, although the effect of situational variables may be quite modest for items that change little over time (e.g., the number of street lanes) compared to items that can vary substantially over time (e.g., presence of litter). Observers’ previous experience in the neighborhood may also affect what they observe, despite standardized training and methods. We assess how important these factors are in determining what observers report. Third, what types of neighborhoods in Los Angeles County have the highest levels of physical disorder? In answering this question, we also examine how well social characteristics available from sources such as the decennial census serve as proxies for more comprehensive and direct measures of neighborhood disorder. Our fourth question is whether physical disorder is associated with child and adult well-being indicators such as cognitive development, behavioral problems, and mental health, as previous research suggests. Our aim is to assess whether disorder has any independent effect on these selected outcomes, beyond the effects of other indicators of neighborhood disadvantage.

1.2 The Effects of Physical Disorder

Physical disorder includes the condition of streets, sidewalks, building exteriors, and other characteristics visible to any passer-by. Both the criminology and social epidemiology literatures have been concerned with the consequences of urban physical disorder for neighborhood and individual welfare. Below, we provide a brief overview of each literature.

1.2.1. Physical Disorder and Crime

The “broken windows” theory first advanced in the 1980s (Wilson and Kelling, 1982) suggests that physical disorder affects crime rates in two ways. First, physical disorder visually advertises that residents tolerate infractions against social order and are unlikely to intervene to stop crime and disorderly conduct. To potential criminals, disorder indicates poor social control, which they can exploit. Second, residents feel personally threatened by disorderly elements of their neighborhood environment. Therefore, they retreat into their homes, spending less time in public spaces and investing less in relationships with neighbors (Skogan, 1990; Wilson and Kelling, 1982). As a consequence, few people spend time on the street, undermining social control and further increasing opportunity for disorderly conduct and crime. Jane Jacobs believed this dearth of “eyes on the street” was a key mechanism of urban decay (Jacobs, 1961; Skogan, 1990). The neighborhood’s ability to act collectively (e.g., to maintain order) is also impaired because residents do not know or trust each other (Sampson and Raudenbush, 1999). Residents are less likely to intervene to prevent disorderly conduct and crime because they fear that their neighbors will not back them up or may even threaten them.

Physical disorder can generate further disorder, since it encourages residents to move to less disordered neighborhoods (Skogan, 1990). Because this option is less available to poor residents, disorder can contribute to a concentration of poverty and to disinvestment in housing and businesses (Wilson, 1987). The remaining residents are less enfranchised, have fewer resources, and may feel less ownership of their streets (Sampson and Raudenbush, 1999; Skogan, 1990). Because they are disenfranchised, they are also less likely to maintain public spaces and to keep physical disorder at bay by, for example, picking up litter, painting over graffiti, and maintaining yards and buildings.

Cross-sectional studies show that neighborhood disorder is associated with higher crime rates and fear of crime (Kelling and Coles, 1998; Sampson and Raudenbush, 1999; Skogan, 1990). However, crime may, in fact, cause perceived disorder rather than the reverse (Harcourt, 2001). Sampson and Raudenbush (1999) argue that disorder is a symptom, not a cause, of poor social control and crime in the neighborhood. Therefore, cleaning up minor infractions such as graffiti and cigarette butts will have little effect on burglaries and homicides. Rather, the solution to both problems is to improve social control through strengthening trust and collective efficacy within neighborhoods.

1.2.2. Physical Disorder and Stress

Research in social epidemiology suggests that physical disorder can also cause chronic stress among neighborhood residents. Residents may experience fear of crime and violence, feelings of hopelessness, or feelings of isolation, each of which is a source of chronic stress. Chronic stress, in turn, increases the risk of negative health outcomes such as obesity, high blood pressure, heart disease, and depression (Hill et al., 2005; McEwen, 1998; Molnar et al., 2004; Ross and Mirowsky, 2001; Sampson and Raudenbush, 1999). Reducing physical disorder may, therefore, improve the mental and physical health of neighborhood residents; the effect may be particularly strong in poor communities, where both physical disorder and poor health are more common.

Empirical evidence on the disorder-health relationship is limited and often indirect. For example, Moving to Opportunity (MTO) – a randomized intervention in which selected residents of subsidized housing were moved to middle-class neighborhoods – found that mental health and feelings of safety improved significantly among adults and female youth in the treatment group compared to control group members who remained in poor neighborhoods (Kling et al., 2007). However, this study did not directly examine physical disorder.

Ross and Mirowsky (2001) found that residents reporting high neighborhood physical disorder also reported worse health status and more limitations in physical functioning than their counterparts in neighborhoods with less disorder. Their models suggest that stress—specifically, fear of crime and violence—is the mechanism linking health to physical disorder. Their study relies on self-reported disorder as well as self-reported fear and health, which raises some questions about the causal order of variables.

1.3 Neighborhood Social Characteristics and Physical Disorder

Socially disadvantaged neighborhoods appear to have higher levels of physical disorder (Cohen et al., 2003; Sampson and Raudenbush, 1999), for several reasons. Residents of poor neighborhoods have less income, and often less time, to maintain their homes, yards, public spaces, and businesses. Less political clout makes it harder to obtain public maintenance services such as graffiti cleanup and sidewalk repair. Businesses in poor neighborhoods also have fewer resources to maintain their property and contribute to neighborhood improvement projects (Alwitt and Donley, 1997). Under-investment can lead to vacant or abandoned properties, which are themselves a form of physical disorder (Sampson and Raudenbush, 1999).

High levels of residential turnover and a high proportion of renters may also contribute to physical disorder. Frequent residential turnover makes it difficult for neighbors to get to know each other, establish trust, and exercise social control (Ross and Jang, 2000; Sampson et al., 1997). High homeownership rates are associated with residential stability and may improve both property maintenance and residents’ ability to control their physical surroundings (Sampson and Raudenbush, 1999). Homeowners have more incentive than renters to invest in home and neighborhood improvement because such investment raises property values. For the same reasons, they are also more likely to share norms with other homeowners about appropriate behavior (e.g., trash disposal, building and yard maintenance).

Immigrant concentration may also contribute to physical disorder, independently of poverty and residential instability. Where immigrant neighborhoods are more culturally and linguistically diverse, residents may be less likely to form social bonds, and therefore to collaborate to limit or remove physical disorder (Sampson et al., 1997). In Los Angeles, immigrants made up more than one-third of the population in 2000 (Malone et al., 2003). However, immigrant neighborhoods in Los Angeles are, on average, less ethnically diverse than other neighborhoods: the high level of immigration from Mexico and Central America combined with residential settlement patterns has generated highly concentrated immigrant neighborhoods. In 2000, the average Latino person in Los Angeles lived in a census tract that was 78% Latino (Ortiz and Telles, 2008). Although less ethnically diverse, immigrant neighborhoods in Los Angeles are often more disenfranchised and less able to marshal resources or obtain public services. For this reason alone, they may have higher levels of physical disorder.

Thus, previous research suggests that physical disorder is likely to be more common in neighborhoods that are poor, have high residential turnover rates, have lower owner-occupancy rates, and have high concentrations of immigrants.

1.4 Measurement of Physical Disorder

Although “windshield surveys” have a long history in urban studies and public health, most research on neighborhood disorder relies on residents’ perceptions of disorder (Elo et al., 2009; Perkins and Taylor, 1996; Ross and Mirowsky, 2001; Sampson and Raudenbush, 1999). Several recent studies have attempted to develop independent and objective measures of disorder. In an early study, Perkins et al. (1992) focused on observation of three theoretical constructs in Baltimore neighborhoods: physical and social incivilities (e.g., litter, vandalism, harassment, selling drugs), territorial functioning (e.g., property maintenance, “neighborhood watch” signs), and defensible space (e.g., lighting, fences). Each sampled block was observed simultaneously by two trained observers who were instructed not to discuss ratings with each other. Intra-class correlation coefficients (ICCs) and Cronbach’s alpha statistics calculated for the two observers’ ratings of each block were remarkably high for many physical characteristics. For example, the ICCs for the estimated percentage of open block frontage that was unused vacant lots, parking lots, public playgrounds, and public gardens ranged from 0.97 to 0.99, suggesting either that the two observers were highly consistent or, possibly, that the observers at least sometimes violated the prohibition on discussing observations. However, agreement on some items (e.g., number of abandoned cars on the street) was considerably lower (Perkins et al., 1992: Table 1).

Table 1
Description of the L.A.FANS Neighborhood Observation Study Design

The Project on Human Development in Chicago Neighborhoods (PHDCN) used a motor vehicle with videotape cameras on each side and one observer for each side of the block to observe social disorder (e.g., adults loitering, people drinking) and physical disorder (Raudenbush and Sampson, 1999; Sampson and Raudenbush, 1999). Videotapes were coded by independent observers, and differences were reconciled. Several other studies have used shortened or modified versions of the PHDCN instruments (Franzini et al., 2009; Grafova, 2008; Kelly et al., 2007; Wei et al., 2005), while some studies have developed their own questions on physical disorder (Miles, 2006). The most common items in all these studies include the presence or absence of litter and garbage, graffiti, beer and liquor containers, broken glass, abandoned cars, vacant lots, condoms, and drug paraphernalia.

Most studies of observed disorder do not assess inter-rater reliability or the effects of observer characteristics on the observation results (Schaefer-McDaniel et al., 2010). For example, in PHDCN, only one person in the vehicle observed each block face. Although several coders coded the videotapes independently, the authors do not report inter-coder agreement. However, when a random 10 percent of all block faces were recoded from the videotapes by new coders, the level of agreement between the old and new coders was 98 percent (Raudenbush and Sampson, 1999).

In Wei et al. (2005), only one observer completed the observations, but in 5% of the blocks, a researcher subsequently reassessed the block. The ICC for the two observations was 0.68. In several studies, only one person observed a given block face (Grafova, 2008; Miles, 2006) and in other studies where two or more people completed the observation it is not clear whether they conferred about the results (Franzini et al., 2009; Kelly et al., 2007).

2. Material and Methods

2.1 Data and Measurement

We use data from Wave 1 of the Los Angeles Family and Neighborhood Survey conducted in 2000–2001 in Los Angeles County. To assess physical disorder, L.A.FANS-1 trained observers to record levels of disorder using a standardized list of items. Each block face was observed by multiple observers working independently, at different times. Furthermore, all observers conducted observations in multiple tracts and block faces. Observers were recruited and trained as professional field interviewers for the L.A.FANS project, and were required to have a college education. They were selected to be of diverse race/ethnic backgrounds and region of residence within Los Angeles. The specific tracts and blocks that observers were assigned to visit were based on the observers’ residential location in order to reduce travel costs (because they worked from home); this procedure resulted in a correlation between observers’ race and ethnicity and that of the neighborhoods they observed. The non-random assignment of observers to tracts means that attempting to control for correlation in observations performed by the same observer is problematic because it may yield spurious findings. The scheduling of observations and the time to completion were also variable.3 The observation forms were adapted from those used in PHDCN.

L.A.FANS is based on a stratified probability sample of 1990 census tracts (Sastry et al., 2006). Three strata were defined based on the percent in poverty in 1997: very poor (tracts in the top 10 percent of the poverty distribution), poor (those in the next 30 percent), and non-poor (bottom 60 percent). Tracts in the very poor and poor strata were oversampled. In each tract, census blocks were sampled with probability proportional to population size. Observations were completed by specially trained L.A.FANS interviewers for each block face in sampled blocks. Observers completed three weeks of training in field interviewing, and an additional two days of training in completing the physical and social disorder observation protocol (Sastry and Pebley, 2003). To complete each observation, observers first drove around the entire block, then walked along each block face, observing both sides of the street. At the end of each block face, the observer completed a standardized form (Sastry and Pebley, 2003). Signs of both physical and social disorder were recorded, but the social disorder indicators are not included in this analysis because of low frequency and poor reliability.
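To illustrate what selecting blocks “with probability proportional to population size” involves, here is a minimal sketch of systematic PPS sampling, one standard way to carry out such a selection. It is not the L.A.FANS selection code: the actual procedure is documented in Sastry et al. (2006) and may differ in details such as the handling of very large blocks. The function name and arguments are hypothetical.

```python
import random


def pps_systematic_sample(sizes, n, seed=None):
    """Systematic probability-proportional-to-size (PPS) selection.

    sizes: positive measures of size, e.g. census-block population counts.
    n: number of selections to make.
    Returns the indices of the selected units, in list order. A unit whose
    size exceeds the sampling interval can be selected more than once.
    """
    total = float(sum(sizes))
    interval = total / n
    rng = random.Random(seed)
    start = rng.uniform(0, interval)  # random start within the first interval
    points = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    next_point = 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # Every selection point that falls inside this unit's cumulative
        # range selects the unit, so larger units are hit more often.
        while next_point < n and points[next_point] < cumulative:
            selected.append(idx)
            next_point += 1
    return selected
```

For example, `pps_systematic_sample([120, 45, 300, 80, 15, 240], n=2, seed=42)` draws two blocks from six hypothetical blocks, with the 300-person block about twenty times as likely to be chosen as the 15-person block.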

Basic characteristics of the block face sample are shown in Table 1. In total, the L.A.FANS sample includes 2,071 block faces, 422 blocks, and 65 tracts. On average, each block comprised about 5 block faces, and there were 6.5 blocks per tract. An average of about 3 independent observations was completed per block face, 14 per block, and 92 per tract. Thirty-five unique observers completed a mean of 171 block-face observations each. Multiple observations were completed for 98 percent of block faces, and three or more observations were completed for 80 percent of block faces.
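The per-unit averages above follow from the reported counts. The arithmetic below is a quick consistency check. The total number of observations is not reported directly; as an assumption, it is back-calculated here from 35 observers times a mean of 171 observations each, so it is approximate.

```python
# Counts reported for the L.A.FANS-1 block-face sample
block_faces, blocks, tracts = 2071, 422, 65
observers, mean_obs_per_observer = 35, 171

# Implied total number of block-face observations (approximate: 171 is a mean)
total_obs = observers * mean_obs_per_observer

ratios = {
    "block_faces_per_block": block_faces / blocks,
    "blocks_per_tract": blocks / tracts,
    "obs_per_block_face": total_obs / block_faces,
    "obs_per_block": total_obs / blocks,
    "obs_per_tract": total_obs / tracts,
}
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

This gives about 4.9 block faces per block, 6.5 blocks per tract, and roughly 2.9, 14.2, and 92.1 observations per block face, block, and tract. These values agree with the rounded figures in the text.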

Table 2 shows block-face level summary statistics for the situational characteristics of the observations. Observations were fairly evenly spread across the days of the week. About three-quarters of observations were completed at midday or in the afternoon. Almost all observations were completed in a single fieldwork campaign in the winter months. Wintertime temperatures are generally cool, but not cold, and there is occasional rain. Most observations were completed by observers who had no previous knowledge of the block being observed. The average time spent observing a block face and completing the form was 60 minutes, although there was considerable variation among observations as indicated by the large standard deviation (49 minutes).

Table 2
Block-face level Situational Characteristics for L.A.FANS Neighborhood Observations

The block face observation form contained 40 items.4 Specific items are shown in Table 3, along with their minimum and maximum values, mean, and standard deviation. Most responses were recorded using one of two Likert-style ordered response scales. One scale contained four response options: “none”, “a little”, “some”, or “a lot”. The other contained five response options: “none”, “very few”, “some”, “many”, or “all”. A few items were recorded on the original form as “yes” or “no” for the presence or absence of a condition.

Table 3
Block-Face Summary Statistics for L.A.FANS Neighborhood Observations Items

Sampson and Raudenbush found that most of the systematic variation in neighborhood observations in PHDCN was captured by dichotomizing the responses into “none” and “any” categories (Raudenbush and Sampson, 1999; Sampson and Raudenbush, 1999). To determine whether this finding holds in L.A.FANS-1, we conducted two analyses. First, we compared, at the block-face level, ICCs for each variable, dichotomized at each possible cutoff point in the scale (results not shown). The ICCs were almost always highest when data were dichotomized between “none” versus “any” categories, suggesting that this scheme was the most appropriate. Second, we conducted an analysis of inter-rater reliability for both the ordered form and the dichotomized form of these variables, to determine whether the results differ depending on specification, as described below.

The strongly residential and suburban character of L.A.FANS blocks (and most of Los Angeles County) is apparent from the traffic, land use, and building type variables. Streets were generally narrow residential streets with two lanes of traffic. The primary land use was residential and the majority of block faces contained stand-alone houses – the predominant housing type in Los Angeles County. Although observers coded 10 percent of block faces as having no residential housing, all blocks selected for L.A.FANS had some residential housing according to the 1990 census and other administrative data (Sastry et al., 2006).

To assess the association of physical disorder with neighborhood economic, social, and demographic characteristics, we used five tract-level summary variables created by the L.A.FANS project for all Los Angeles tracts using data from the 2000 U.S. Census.5 Data from the 2000 Census were matched with 1990 tracts using a standard cross-walk. The five variables are concentrated disadvantage, concentrated affluence, ethnic diversity, residential stability, and immigrant concentration indices. The concentrated disadvantage index includes the percent of the population: in poverty, with annual family income <$24,000, in female headed households, receiving public assistance, nonwhite, and <18 years old. The concentrated affluence index includes the percent: in executive or professional occupations, with 13+ years of schooling (adults 25+ only), with annual family income >$75,000, white, who speak English “very well” (among adults), and who speak only English (among adults). The ethnic diversity index reflects the probability that any two people chosen at random from the tract would be of different race/ethnicity. For this measure, race/ethnic groups are defined as: Latino, white, African American, Asian, and other. The score is calculated as:

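$$D = 1 - \left(p_{\text{white}}^2 + p_{\text{black}}^2 + p_{\text{Latino}}^2 + p_{\text{API}}^2 + p_{\text{other}}^2\right)$$

where $p_g$ is the proportion of the tract population in race/ethnic group $g$.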
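The ethnic diversity index is one minus the sum of squared group shares, where the shares are expressed as proportions. The result is the probability that two residents drawn at random (with replacement) belong to different groups. Below is a minimal sketch. The function name is hypothetical, and the example shares are illustrative, not L.A.FANS data.

```python
def ethnic_diversity(shares):
    """Probability that two randomly drawn residents differ in race/ethnicity.

    shares: proportions (0-1) for Latino, white, African American, Asian/API,
    and other; they must sum to 1.
    """
    if abs(sum(shares) - 1.0) > 1e-6:
        raise ValueError("shares must sum to 1")
    # One minus the probability that both draws fall in the same group
    return 1.0 - sum(p * p for p in shares)
```

A tract that is entirely one group scores 0. A tract split evenly across the five groups reaches the maximum of 0.8. A hypothetical tract that is 78% Latino, 12% white, and 4%, 4%, and 2% in the remaining groups scores about 0.37.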

The residential stability index was constructed using factor analysis and includes percent of: dwellings in multi-unit housing, owner-occupied housing units, households living in the same residence as 5 years earlier, and non-family households. The immigrant concentration index, also constructed using factor analysis, includes the percent of the population: non-citizen, foreign born, foreign born who arrived since 1990, foreign born who arrived since 1995, Spanish speaking, and Latino. In initial multivariate models (not shown), we also included population density and land use. However, the coefficients for these variables were not statistically significant and are omitted from analyses presented here.

To assess the association of physical disorder with individual-level outcomes, we use data from two subsamples within L.A.FANS-1. First, we look for significant effects of disorder on children’s academic achievement and behavioral problems. Then we look for significant effects of observed disorder on adult depression.

L.A.FANS was based on a multistage clustered sampling design (Sastry et al., 2006). In the first stage, census tracts in Los Angeles County were sampled from three poverty strata (non-poor, poor, and very poor) with probability proportional to size. The second stage sampled census blocks within each tract. In the third stage, households were sampled within each block and screened. Approximately 40–50 households were interviewed in each census tract, for a total sample size of 3,100 households.

In each sampled household, all household residents were listed and classified as adults (age 18 and older) or children (age 17 and younger). One adult was sampled at random and interviewed in each household. In households with children, one child was also sampled at random. If the child had one or more siblings age 17 or younger, one of them was chosen at random as a second sampled child. The sampled children’s primary caregiver, who was nearly always the children’s mother, was also interviewed. Sampled children ages 3 to 17 and their mothers completed subtests of the Woodcock-Johnson Revised standardized assessments (Woodcock and Johnson, 1989) to assess reading and mathematics skills. Mothers were asked to complete the Behavior Problems Index (BPI) (Peterson and Zill, 1986) for each sampled child who was 3 to 17 years of age. The BPI measures are designed to provide a standardized score of behavior problems across ages. We report the association of neighborhood physical disorder with two subscales of the BPI: the internalizing index (BPI-I), assessing the presence of withdrawn and sad behaviors, and the externalizing index (BPI-E), assessing the presence of aggressive and related behaviors. BPI-I analyses are based on 1,957 children in 1,282 families, and BPI-E analyses are based on 1,954 children in 1,276 families.

Finally, we look at the association between adult depression outcomes and neighborhood-level disorder. The Composite International Diagnostic Interview, Short Form questionnaire (CIDI) (Kessler et al., 1998) was administered to all primary caregivers in the L.A.FANS-1 sample. We use the estimated probability that a respondent would meet diagnostic criteria for depression if given the full CIDI interview as our outcome, for a sample of 1,511 adults.

2.2 Analytic Approach

Our first two goals are to determine the extent to which independent observers of the same block face recorded the same levels of physical disorder – i.e., the degree of inter-rater reliability – and to investigate whether situational characteristics (time of day, day of the week and season) affected perceptions of physical disorder. We also assess whether dichotomized versions of the ordered observation items perform as well as the original ordered items themselves on inter-rater reliability.

We use ICCs to measure inter-rater agreement. Kappa statistics are sometimes used for this purpose, but they are limited because they do not allow comparison of the amount of agreement across variables (Landis and Koch, 1977). ICCs provide a continuous measure of the degree of agreement among independent observers and are therefore comparable across variables. The magnitude of the ICC provides a means of comparing the strength of agreement on one aspect of disorder (e.g., trash) with that on other aspects (e.g., conditions of buildings). We estimate ICCs using a set of multilevel models with random effects at the block face level, the tract level, or both. We estimate models in these three ways because we expect the extent of variation in each disorder indicator to depend on geographic scale: some items are likely to vary widely within a tract, while others are not. We also estimate models with and without the situational variables to assess how these variables affect the reliability of observation among observers.
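To make the variance decomposition concrete, consider a model with random effects at both the block face and tract levels. One common way to express the net contribution of each level is as that level's share of the total variance. The sketch below assumes this convention. For binary items analyzed on the latent logistic scale, the observation-level variance is conventionally fixed at π²/3. The paper does not state its exact ICC formula, so the function (whose name is hypothetical) is illustrative only.

```python
import math


def variance_partition(var_blockface, var_tract, var_residual=None):
    """Share of total variance at the block-face and tract levels.

    var_blockface, var_tract: estimated random-effect variances.
    var_residual: observation-level variance; if None, it is fixed at
    pi**2 / 3, the variance of the standard logistic distribution
    (latent-response convention for binary outcomes).
    Passing 0 for one level gives the ICC of a single-random-effect model.
    """
    if var_residual is None:
        var_residual = math.pi ** 2 / 3
    total = var_blockface + var_tract + var_residual
    return {"block_face": var_blockface / total, "tract": var_tract / total}
```

Under this convention, an item with a larger tract share than block-face share reflects conditions that are common across the tract rather than specific to one block face.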

Although examining each observed characteristic can be informative, our central interest is in the latent or composite construct of neighborhood physical disorder. We created two summary scales to measure this construct. The first scale approximates Sampson and Raudenbush’s (1999) physical disorder scale from PHDCN data. The PHDCN scale included 10 items: cigarettes/cigars, garbage or litter, empty beer bottles, three types of graffiti (political, tagging, and gang), painted-over graffiti, abandoned cars, condoms, and needles/syringes. In the L.A.FANS observation form, the three types of graffiti were combined into one item.6 Condoms and needles/syringes were also combined into a single item. Thus, the L.A.FANS scale is based on 7 rather than 10 individual items capturing the same elements of disorder.
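The correspondence between the 10 PHDCN items and the 7 L.A.FANS items can be written out explicitly. This is useful, for example, when harmonizing PHDCN-style binary indicators to the L.A.FANS item set. In L.A.FANS itself, observers recorded the combined items directly. The item labels below are hypothetical, not the wording on either observation form.

```python
# Hypothetical item labels; the actual forms use their own wording.
PHDCN_TO_LAFANS = {
    "cigarettes_cigars": "cigarettes_cigars",
    "garbage_litter": "garbage_litter",
    "empty_beer_bottles": "empty_beer_bottles",
    "graffiti_political": "graffiti",
    "graffiti_tagging": "graffiti",
    "graffiti_gang": "graffiti",
    "painted_over_graffiti": "painted_over_graffiti",
    "abandoned_cars": "abandoned_cars",
    "condoms": "condoms_needles",
    "needles_syringes": "condoms_needles",
}


def collapse_items(phdcn_obs):
    """Collapse 10 binary PHDCN-style indicators into 7 L.A.FANS-style items.

    A combined item is coded present (1) if any item mapped to it is present.
    """
    out = {}
    for item, present in phdcn_obs.items():
        target = PHDCN_TO_LAFANS[item]
        out[target] = max(out.get(target, 0), int(bool(present)))
    return out
```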

The PHDCN scale is based on a relatively narrow definition of physical disorder. To take advantage of the wider range of indicators available in L.A.FANS, we created a second scale using factor analysis with varimax rotation to select the items to be included. The factor analysis included all observed physical characteristics of the block face. The items most strongly associated with the first factor were selected, using an eigenvalue cutoff of 1. These 20 items include the 7 items in the PHDCN scale. The scale has a Cronbach’s alpha of 0.90 and is slightly right-skewed.
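For reference, Cronbach's alpha for k items is α = k/(k − 1) × (1 − Σσ²ᵢ / σ²_total), where σ²ᵢ is the variance of item i and σ²_total is the variance of the summed score. Below is a minimal sketch of the computation. The function name and the column-per-item data layout are assumptions, and the paper does not say which software it used.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k item columns, each a list of n scores
    (e.g., 0/1 codes for each block-face observation).
    """
    k = len(items)
    n = len(items[0])
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance(totals)
    # Population and sample variances give the same ratio if used consistently
    return (k / (k - 1)) * (1 - item_var / total_var)
```

Alpha approaches 1 when items move together across observations. This is the sense in which the 20-item scale, with an alpha of 0.90, is internally consistent.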

Next, we estimated multilevel item response models to analyze the neighborhood physical disorder scales. These models also allow us to assess the effects of situational variables (time of day, season, etc.) on reliability of reporting among observers of the same place. We estimated standard one-parameter item response models with random effects at the block face and tract levels (Raudenbush and Sampson, 1999). The dependent variables in these models are the binary item responses for the entire set of 7 or 20 items (for each of the two scales). The models were estimated using multilevel logistic regression via a maximum likelihood estimation procedure. These models can be viewed as three-level models, with the first level being item responses within the block face, the second level being block faces, and the third level being tracts. Dummy variables that identify each individual item appear at the first level of the models (which do not include a constant), while covariates that reflect situational characteristics appear at the models’ second level. The coefficients on the item dummies represent the log-odds of observing each item and consequently reflect how much each item contributes to tract-level disorder, or its severity. In particular, an item with a large negative coefficient is rarer and, hence, when that item is observed it is associated with a higher overall disorder score. The model-based estimate of the random effect for each tract represents the physical disorder scale for that tract. We also calculated tract-level ICCs for the scales using the estimated variances of random effects at the block face and tract levels.
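One way to write the model just described is as a three-level logistic item response model. This formulation reflects our reading of the text, and the notation is ours, not the authors'. Let $Y_{ijk}$ be the binary response to item $i$ on block face $j$ in tract $k$. Let $D_{mijk}$ be an indicator equal to 1 when that response refers to item $m$, and let $\mathbf{x}_{jk}$ denote the situational covariates:

$$\log\frac{P(Y_{ijk}=1)}{1-P(Y_{ijk}=1)} = \sum_{m=1}^{M}\alpha_m D_{mijk} + \mathbf{x}_{jk}^{\top}\boldsymbol{\beta} + u_{jk} + v_k, \qquad u_{jk}\sim N(0,\tau_u), \quad v_k\sim N(0,\tau_v)$$

Here $M = 7$ or $20$, and there is no overall intercept. A large negative $\alpha_m$ marks a rare, and hence severe, item, and the predicted $v_k$ serves as tract $k$'s disorder score. A tract-level ICC built from the two random-effect variances, consistent with the description above, would be

$$\rho_{\text{tract}} = \frac{\tau_v}{\tau_v + \tau_u}.$$

This is one plausible reading. The paper may define the ICC differently, for example by adding the latent logistic variance $\pi^2/3$ to the denominator.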

In the final part of the analysis, we take two approaches to examining whether neighborhood disorder is substantively different from other measures of neighborhood disadvantage. First, we estimate a linear regression of tract-level physical disorder on tract-level structural characteristics. Variables describing these characteristics are derived from 2000 Census data, as described above. Our goal is to assess which types of tracts are most likely to have high levels of physical disorder and how effectively census tract characteristics serve as proxy variables for direct observation of neighborhood physical disorder in this setting.

Second, we use the tract disorder scores as independent variables in models of selected individual-level outcomes. Our aim is to assess whether disorder has any independent effect on these outcomes, beyond the effects of other kinds of neighborhood disadvantage. We examine the effect of disorder on children’s math and reading achievement, children’s behavioral problems, and adult depression. Child outcomes are modeled with random effects at the family and tract levels; the adult outcome is modeled with a random effect at the tract level.

3. Results and Discussion

3.1 Observer Agreement

Table 4 shows the ICCs from the multilevel random effects models without and with the situational variables. Model 1 includes a block face-level random effect and no covariates while Model 2 adds the situational variables. Similarly, Models 3 and 4 include tract-level random effects, without and with situational variables, respectively. Model 5 includes both block face-level and tract-level random effects in the same model plus the situational variables.

Table 4
Block-Face and Tract Intra-Class Correlation Coefficients across Multiple Observers for L.A.FANS Neighborhood Observations Items

There are four key results in Table 4. First, the level of inter-rater agreement at the block face level is generally high, ranging from a low of 0.32 to a high of 0.95 (Model 1). In Models 3 and 4, the tract-level ICCs are generally lower than the block face ICCs for the same items (Models 1 and 2), although the overall level of inter-rater agreement at the tract level is still moderate to high (range for Model 3: 0.11 to 0.76). However, Models 1 through 4 do not clearly distinguish block face and tract effects, because the included random effect picks up the influence of the omitted one. Model 5 includes both tract and block face random effects simultaneously and hence provides net estimates of the block face and tract ICCs. In contrast to the results in the preceding columns, the ICCs estimated in Model 5 are sometimes considerably larger at the tract level than at the block face level, and sometimes the reverse. Not surprisingly, block face ICCs are generally larger than tract ICCs for items that are block face-specific and exogenously generated, such as the number of traffic lanes, the availability of public transportation, and whether there are street barricades and trees. By contrast, tract ICCs are higher for items reflecting conditions that are common throughout the local area, such as graffiti, abandoned cars, and trash and garbage. Note that in Model 5, the tract ICCs are larger than the block face ICCs for the variables included in the 7-item scale (items 10 and 11–17), described below.

The second key result in Table 4 is that holding constant the situational variables makes little difference in the size of the ICCs (e.g., contrast Model 1 with Model 2 and Model 3 with Model 4). This finding suggests that relatively little of the divergence in ratings among observers is due to variations in the circumstances of the observations such as season of the year or day of the week.

Third, for the ordered variables based on Likert-type scales, dichotomizing them into “none” and “any” results in only minor changes to the ICCs in all of the five model specifications. The ICCs for the dichotomized variables are often very close to, and occasionally larger than, the ICCs for the ordered versions of the variables. This finding is consistent with Raudenbush and Sampson’s (1999) results from PHDCN and suggests that it is difficult for observers to distinguish consistently between qualitative descriptions such as “a little,” “some,” and “a lot.”
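The “none” versus “any” recoding is simple to express. The sketch below also computes raw percent agreement between two observers before and after recoding, to show how disagreements between the adjacent middle categories disappear once they are collapsed. Percent agreement is used here only for illustration; it is not the ICC reported in Table 4, and the ratings are hypothetical.

```python
ORDERED_LEVELS = ("none", "a little", "some", "a lot")

def dichotomize(rating):
    """Collapse an ordered Likert-type rating to 0 ("none") or 1 ("any")."""
    level = rating.strip().lower()
    if level not in ORDERED_LEVELS:
        raise ValueError(f"unexpected rating: {rating!r}")
    return 0 if level == "none" else 1

def percent_agreement(ratings_a, ratings_b, transform=None):
    """Share of block faces on which two observers give the same rating,
    optionally after applying `transform` to each rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    f = transform if transform is not None else (lambda r: r)
    matches = sum(f(a) == f(b) for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical ratings of the same four block faces by two observers.
observer_a = ["none", "a little", "some", "a lot"]
observer_b = ["none", "some", "a little", "a lot"]
print(percent_agreement(observer_a, observer_b))               # 0.5
print(percent_agreement(observer_a, observer_b, dichotomize))  # 1.0
```

The two observers disagree only about whether a condition is present “a little” or “some”; after dichotomizing, that disagreement no longer counts against them.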

Fourth, the aspects of disorder that can change relatively easily over time such as garbage, strong odors, drug paraphernalia, cigarette butts, and beer or liquor containers have lower ICCs than more enduring aspects such as graffiti, vacant lots and abandoned buildings. More temporally stable aspects of the physical environment are more reliable items in terms of inter-rater agreement.

3.2 Situational Variables and their Contribution to Observing Disorder

The results in Table 4 suggest that situational variables account for very little of the variation among observers in their observations. Next we assess how much the situational variables affect the likelihood of observing physical disorder, across items, observers, block faces and tracts. We also assess the degree of severity contributed by each item in our scales. To do so, we estimate multilevel item response models. The parameter estimates and standard errors from these models for the 7-item PHDCN-equivalent scale and the 20-item expanded scale are shown in Table 5.

Table 5
Multilevel Item Response Models of Neighborhood Physical Characteristics Based on the L.A.FANS Neighborhood Observations Items

At the top of Table 5 are the estimated coefficients for each of the individual observation items that make up the scale. Items with lower probabilities of occurrence (see Table 3) have more negative coefficients, while those with higher probabilities of occurrence have larger positive coefficients. For the 7-item scale, the item with the lowest probability of occurrence (drug paraphernalia, observed on just 3 percent of block faces) has the most negative coefficient, while the item with the highest probability of occurrence (garbage, litter or glass, observed on 73 percent of block faces) has the largest positive coefficient. A similar pattern holds for the 20-item scale. The coefficients thus reflect how much each indicator contributes to the overall neighborhood disorder score. Graffiti is the only item in the 7-item scale whose estimated coefficient is not significantly different from zero. This result reflects the fact that graffiti was observed on approximately half of all block faces, so its log-odds of occurrence are close to zero; hence this item is unlikely to be strongly related to whether a tract is high or low on the neighborhood disorder scale. A parallel set of findings emerges for the 20-item scale. The large observed variation in the estimates of item severity is an indication that the scale is well-behaved (Raudenbush and Sampson, 1999).
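A rough way to see why prevalence and coefficients line up is to convert the marginal prevalences reported in the text into log-odds. This is only an approximation: the model's item coefficients are conditional on the random effects and situational covariates, so they will generally not equal these marginal values, but the ordering carries over, and an item seen on about half of block faces has log-odds near zero. The 0.50 used for graffiti stands in for “approximately half.”

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

# Marginal prevalences from the text (share of block faces where observed).
prevalence = {
    "drug paraphernalia": 0.03,
    "graffiti": 0.50,
    "garbage, litter or glass": 0.73,
}
for item, p in prevalence.items():
    print(f"{item}: p = {p:.2f}, log-odds = {logit(p):+.2f}")
```

This gives log-odds of about -3.48 for drug paraphernalia, 0.00 for graffiti, and +0.99 for garbage, litter or glass, matching the ordering of the coefficients described above.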

The situational variables are shown next in Table 5. Their estimated parameters are interpreted as systematic effects, across items, block faces, tracts, and interviewers, on the likelihood of observing disorder. These variables are jointly significant in both the 7-item and 20-item models. With only one exception (the effect of time of day in the 20-item model), each situational variable is statistically significant, based on joint tests of the set of dummy variables used to represent each categorical variable (results not shown). The magnitudes of the estimated effects are, however, relatively modest, with the largest effects found for season. Nevertheless, the results suggest that the likelihood of observing disorder decreased with the time spent observing; was highest if the observation was conducted mid-week (Wednesday) or on the weekend (Saturday or Sunday); was higher if the observation was conducted around midday (for the 7-item scale only); was higher in the summer and winter; and was higher if the interviewer knew the neighborhood through the L.A.FANS study.
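The text does not say which joint test was used for each categorical variable; a Wald test of the hypothesis that all of its dummy coefficients are zero is a common choice and is assumed here. The statistic W = b′V⁻¹b is compared with a chi-square critical value whose degrees of freedom equal the number of dummies. Function names, coefficients, and covariances below are hypothetical.

```python
# 5% critical values of the chi-square distribution, by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def wald_statistic(coefs, cov):
    """Joint Wald statistic W = b' V^{-1} b for H0: every coefficient in b is zero.

    coefs: estimated coefficients for the dummies of one categorical variable.
    cov: their estimated covariance matrix (list of lists).
    """
    k = len(coefs)
    # Forward elimination with partial pivoting on the augmented system [V | b].
    aug = [[float(v) for v in cov[i]] + [float(coefs[i])] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    # Back substitution gives x = V^{-1} b; then W = b . x.
    x = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (aug[i][k] - tail) / aug[i][i]
    return sum(float(b) * xi for b, xi in zip(coefs, x))

def jointly_significant(coefs, cov):
    """True if the dummies are jointly significant at the 5% level."""
    return wald_statistic(coefs, cov) > CHI2_CRIT_05[len(coefs)]

# Hypothetical season effects: three dummies against a reference season.
season_b = [0.30, 0.05, 0.25]
season_v = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
print(wald_statistic(season_b, season_v), jointly_significant(season_b, season_v))
```

With uncorrelated estimates the statistic reduces to the sum of squared z-scores (here 9 + 0.25 + 6.25 = 15.5), well above the 5% critical value of 7.815 for three degrees of freedom, even though one dummy on its own is small.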

At the bottom of Table 5, we show the variances of the block-face and tract random effects. Both random effects are statistically significant, and in both the 7-item and the 20-item models the variance of the tract random effect is about three times that of the block-face random effect. In Table 6, we present the corresponding ICCs for block faces and tracts. Both ICCs are larger for the 7-item scale. The tract ICC is 0.36 for the 7-item scale and 0.30 for the 20-item scale. The tract ICCs are roughly three times the block-face ICCs, which are 0.14 for the 7-item scale and 0.10 for the 20-item scale. These ICCs are similar to those for the individual items (shown in Model 5 in Table 4).
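The text does not spell out the ICC formula. A common convention for multilevel logistic models, assumed here, treats the level-1 residual as a standard logistic variable with variance π²/3 (about 3.29), so each ICC is one random-effect variance divided by the sum of both variances and π²/3. The variances used below are hypothetical, not the Table 5 estimates.

```python
import math

# Level-1 variance of a standard logistic latent variable (about 3.29).
LEVEL1_VARIANCE = math.pi ** 2 / 3

def net_iccs(tau_tract, tau_face):
    """Tract and block-face ICCs from a model containing both random effects."""
    total = tau_tract + tau_face + LEVEL1_VARIANCE
    return tau_tract / total, tau_face / total

def face_only_icc(tau_tract, tau_face):
    """Approximate block-face ICC when the tract effect is omitted: the
    block-face random effect then absorbs the tract variance as well."""
    between = tau_tract + tau_face
    return between / (between + LEVEL1_VARIANCE)

# Hypothetical variances, with the tract variance three times the block-face one.
tract_icc, face_icc = net_iccs(tau_tract=3.0, tau_face=1.0)
print(round(tract_icc, 3), round(face_icc, 3))
print(round(face_only_icc(3.0, 1.0), 3))
```

Because both ICCs share the same denominator, their ratio equals the ratio of the variances, which is why tract variances about three times the block-face variances go together with tract ICCs roughly three times the block-face ICCs. The second function illustrates, under the same assumption, why Models 1 through 4 in Table 4 cannot separate the two levels: a block-face-only model attributes the combined block-face and tract variance to the block face, so its ICC approximates the sum of the two net ICCs.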

Table 6
Summary Measures for Tract-Level Disorder Scales Based on L.A.FANS Neighborhood Observations

In the bottom panel of Table 6 we present summary statistics for the tract physical disorder scales that were obtained from the models presented in Table 5 as predicted values of the tract-specific random effects. The 7-item scale has a mean of zero, a standard deviation of 1.54 and a range of −2.71 to 2.35. The scale values have the same metric as the estimated coefficients for the individual observation items. Thus, as Raudenbush and Sampson (1999) point out, differences between tracts in their scores on the physical disorder scale can be interpreted as expected differences in the log-odds of finding disorder across the items in the scale. The interpretability of the scale and its well-behaved distributional properties mean that we can use it to characterize neighborhood physical disorder and to analyze the causes and consequences of disorder.
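Because the tract scores are on the log-odds metric, differences between tracts can be converted into odds ratios. The sketch below applies this to the summary values reported for the 7-item scale. The function names are hypothetical, and the interpretation assumes that the block-face effect and the situational covariates are held fixed, as in the item response models described earlier.

```python
import math

def odds_ratio(score_a, score_b):
    """Expected ratio of the odds of observing a given scale item in tract A
    versus tract B, holding block-face effects and situational covariates fixed."""
    return math.exp(score_a - score_b)

def item_probability(item_coef, tract_score, block_effect=0.0):
    """Probability of observing an item in a tract (situational terms omitted)."""
    return 1.0 / (1.0 + math.exp(-(item_coef + tract_score + block_effect)))

# Summary values reported for the 7-item scale.
SD = 1.54
LOW, HIGH = -2.71, 2.35
print(round(odds_ratio(SD, 0.0), 2))    # tracts one standard deviation apart
print(round(odds_ratio(HIGH, LOW), 1))  # most versus least disordered tract
```

Two tracts one standard deviation apart (1.54 log-odds units) differ by an odds ratio of about 4.7 for any given item, and the extremes of the observed range (5.06 units apart) by an odds ratio of about 158.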

3.3 Neighborhood Social Characteristics and Physical Disorder

Next, we examine what types of tracts in Los Angeles County have the highest levels of physical disorder. Previous research suggests that high poverty, ethnically diverse, residentially unstable, and high immigrant neighborhoods are more likely to have significant physical disorder. We test these hypotheses using tract-level multivariate models in which the 7-item scale and the 20-item scale are regressed on variables (discussed above) that describe tract levels of concentrated disadvantage, concentrated affluence, immigrant concentration, residential stability, and ethnic diversity.7

The results are shown in Table 7. The results for the 7-item scale and the 20-item scale are similar, and the two models fit the data equally well, based on the model F-statistic. Physical disorder is significantly higher in disadvantaged neighborhoods and lower in affluent ones. Although concentrated disadvantage and concentrated affluence are highly correlated with each other (r = −0.88), initial analysis (not shown) revealed that together they predict neighborhood disorder better than either does on its own, suggesting that each makes a separate contribution to the presence of physical disorder. These two variables also have a large combined effect: together they explain about 84% of the variation in disorder with no other variables included in the model (results not shown). The strong predictive value of these two variables suggests that they would be effective proxies for physical disorder, at least in Los Angeles County in 2000–2001. This result contrasts with the findings of Raudenbush and Sampson (1999), who report a correlation of about 0.64 between concentrated poverty and physical disorder in Chicago neighborhoods, substantially lower than the association we find for Los Angeles neighborhoods (an R² of 0.84 corresponds to a multiple correlation of about 0.92).
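The comparison with Chicago mixes two metrics: a bivariate correlation for Chicago and a share of variance explained by two predictors for Los Angeles. Putting both on a common footing, and quantifying how strongly the two correlated predictors overlap, is simple arithmetic. The function names are hypothetical, and the variance inflation factor formula shown applies only to a model with exactly two predictors.

```python
import math

def variance_explained(r):
    """Share of variance explained by a single predictor with correlation r."""
    return r ** 2

def multiple_correlation(r_squared):
    """Multiple correlation R implied by a model's R-squared."""
    return math.sqrt(r_squared)

def vif_two_predictors(r12):
    """Variance inflation factor for each of two predictors correlated at r12."""
    return 1.0 / (1.0 - r12 ** 2)

print(variance_explained(0.64))            # Chicago, bivariate
print(multiple_correlation(0.84))          # Los Angeles, two predictors
vif = vif_two_predictors(-0.88)
print(vif, math.sqrt(vif))                 # VIF and standard-error inflation
```

A correlation of 0.64 corresponds to about 41% of variance shared, whereas an R² of 0.84 implies a multiple correlation of about 0.92. A correlation of −0.88 between the two indices gives a variance inflation factor of about 4.4 for each coefficient, so their standard errors are roughly 2.1 times what they would be with uncorrelated predictors: substantial collinearity, though below the commonly cited rule-of-thumb threshold of 10.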

Table 7
Linear Regression Models of L.A.FANS Tract Disorder Score on Neighborhood Structural Characteristics

Immigrant concentration is not significantly associated with physical disorder. Ethnic diversity is associated with a statistically significant decrease in disorder (marginally significant in the model based on the 7-item scale), contradicting the hypothesis that residents of diverse neighborhoods find it more difficult to exercise social control over physical disorder. This finding may reflect racial/ethnic segregation patterns in Los Angeles, where ethnically homogeneous tracts are predominantly poor and Latino. Thus, in this setting, diversity may serve as another indicator of more advantaged neighborhood status. Greater residential stability is associated with lower levels of physical disorder, supporting the hypothesis that higher neighborhood turnover makes it more difficult for residents to exert control over their neighborhood environment.

Finally, we perform a preliminary assessment of whether disorder is associated with selected individual-level outcomes. We add disorder as an independent variable in models of child math and reading achievement scores, child behavioral problems, and adult depression, as well as a large set of control variables (see Table 8 notes). Results indicate that disorder at the tract level is significantly associated with poorer reading achievement, internalizing behavior problems, and externalizing behavior problems for children, but is not associated with child math achievement or adult depression outcomes (Table 8, Models 1-a, 2-a, 3-a, 4-a, and 5-a).

Table 8
Models of Selected Outcomes using L.A.FANS Tract Disorder as a Covariate

In a second set of models (Table 8, Models 1-b, 2-b, 3-b, 4-b, and 5-b), we include other neighborhood-level factors to determine whether the explanatory power of disorder persists when other tract-level measures of socioeconomic disadvantage are included in the model. Once tract-level concentrated affluence and other measures of neighborhood disadvantage are added, disorder is no longer significantly associated with children’s reading scores. However, it remains a significant predictor of children’s internalizing and externalizing behavior problems, suggesting that disorder contributes to child behavior problems beyond the contributions of other measures of neighborhood disadvantage.

4. Conclusions

Concern about the potentially pernicious effects of physical and social disorder on residents of poor urban neighborhoods has pervaded policy and academic discussion in many fields. However, the development of reliable measures of disorder has lagged until recently. In this paper, we evaluated a method of assessing physical disorder in which multiple trained independent observers performed an observational survey on foot in a stratified probability sample of neighborhoods in Los Angeles County.

Unlike many previous studies, the L.A.FANS data allow us to investigate the reliability of measures of specific aspects of physical disorder and the effects of situational variables. Our results show that inter-rater agreement levels are generally high for multiple observers of the same block face and that these levels vary considerably by the item observed. More subjective and transitory aspects of disorder – e.g., garbage, strong odors, drug paraphernalia, cigarette butts, and beer or liquor containers – have lower levels of agreement than more enduring and objective aspects – e.g., vacant lots and abandoned buildings. This is an important finding because observational measures used in studies often include or are limited to the more subjective and transitory items to measure physical disorder. These results suggest that less ephemeral indicators of disorder such as vacant lots and abandoned buildings may provide a more reliable measure of neighborhood conditions – although these types of disorder may be less within residents’ power to control.

Levels of disorder observed were modestly affected by the length of time the observation took, day of the week, time of day, season, and the observers’ previous experience in the neighborhood. These results suggest that fieldwork designed to assess physical disorder should seek to minimize variation in scheduling neighborhood observations across days of the week, time of day, and season or, more realistically, should control for these variables in models based on neighborhood observations. Whenever possible, studies should also employ multiple trained observers to code each block face to assess inter-observer agreement and the effects of observers’ characteristics on the level of disorder observed. Multiple observations of each location also allow studies to improve the quality of observations by creating variables which remove the effects of interviewer characteristics as we have done in this paper. Increasing the number of independent observations also improves the reliability and the precision of the estimated neighborhood physical disorder scales.
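The gain from additional observations can be illustrated with a generalizability-style reliability formula for a tract mean, in the spirit of the ecometric measures of Raudenbush and Sampson (1999). This is a sketch under assumptions not stated in the text: observations of the same block face share one block-face effect (as in the item response models described earlier), the level-1 variance is approximated by π²/3, and all function names and input values are hypothetical.

```python
import math

def tract_reliability(tau_tract, tau_face, n_faces, n_obs_per_face, n_items,
                      level1_var=math.pi ** 2 / 3):
    """Reliability of a tract's mean item log-odds under a three-level model.

    Averaging over n_faces block faces divides the block-face variance by
    n_faces; averaging over all item responses in the tract divides the
    level-1 variance by the total number of responses.
    """
    n_responses = n_faces * n_obs_per_face * n_items
    error_var = tau_face / n_faces + level1_var / n_responses
    return tau_tract / (tau_tract + error_var)

# Hypothetical inputs: 5 block faces per tract, 7 items per observation.
for observers in (1, 2, 3):
    r = tract_reliability(2.0, 1.0, n_faces=5, n_obs_per_face=observers, n_items=7)
    print(observers, round(r, 3))
```

With these values, moving from one to three observers per block face raises reliability only from about 0.87 to about 0.90, because the block-face term, which extra observers of the same block face do not reduce, dominates the error; observing more block faces per tract shrinks both terms.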

In Los Angeles County, concentrated disadvantage and affluence are strong predictors of physical disorder. Residential stability is significantly associated with lower physical disorder. Contrary to our expectation, higher levels of ethnic diversity are weakly associated with less, rather than more, physical disorder. The reason may be that neighborhoods with low ethnic diversity in Los Angeles are predominantly Latino and are more likely to be disadvantaged in other ways. In contrast to Sampson and Raudenbush’s (1999) findings that Chicago neighborhoods with high levels of immigrant concentration had significantly more physical disorder, in our study, the coefficients on immigrant concentration were not significant – suggesting a very different effect of immigrant characteristics and settlement in the two cities.

Neighborhood observation by trained observers is an important means of measuring physical disorder in large social surveys. Our results indicate the importance of high quality training of observers, consideration of which aspects of physical disorder are more reliably observed, and, when possible, the use of multiple independent observers to allow researchers to examine the reliability of observations and improve the quality of the derived neighborhood scales.

Finally, our results indicate that neighborhood disorder may be associated with certain child outcomes. Although the effects of disorder on individual outcomes are partially captured by other measures of neighborhood disadvantage, physical disorder is an independent predictor for some outcomes, particularly child behavior problems. This result suggests a need for further research into the effects of neighborhood disorder on children.


The authors gratefully acknowledge support from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (Grants R01 HD35944 and R01 HD41486) and from the Russell Sage Foundation’s Social Inequality Project.

Abbreviations used in this paper

L.A.FANS-1: Los Angeles Family and Neighborhood Survey, Wave 1
PHDCN: Project on Human Development in Chicago Neighborhoods
ICCs: Intra-class correlation coefficients
BPI-I: Behavior Problems Index (Internalizing behaviors)
BPI-E: Behavior Problems Index (Externalizing behaviors)
CIDI: Composite International Diagnostic Interview, Short Form questionnaire


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

1. The “broken windows” theory of the effects of physical and social disorder on crime rates has also led, more famously, to changes in urban policing strategies focused on reducing petty crime (Bratton and Kelling, 2006). Although policing strategies are an important policy issue, they are outside the scope of this paper.

2. One limitation of independent observations is that they do not capture residents’ perceptions or feelings about specific aspects of physical disorder (Raudenbush and Sampson, 1999). This information can, however, be collected in other ways, e.g., through interviews with residents themselves.

3. Under optimal conditions, both the observers and the timing of visits would be randomly assigned across block faces. However, field conditions and budget constraints prevented such randomization in L.A.FANS.

4. The observation form and manual are available at: www.rand.org/pubs/drafts/2005/DRU2400.6-1.pdf

5. This analysis is done at the tract level because census data on social characteristics are not available at the block level. They are available at the block group level, but because L.A.FANS observations were conducted only in sampled blocks, not all blocks were observed in many block groups.

6. The L.A.FANS item on the presence of beer bottles was also broader than in the PHDCN, referring to both “beer containers and liquor bottles.”

7. Initial analysis (not shown) compared models estimating the effects of the concentrated affluence and disadvantage variables on the 20-item scale with models including other specifications of neighborhood disadvantage (i.e., concentrated disadvantage alone, concentrated affluence alone, the percent of the population in poverty, median family income, the percent of families earning more than $75,000 per year, the percent receiving public assistance, and the percent of female-headed households). The combination of the concentrated disadvantage and affluence indices explained the most variance (highest adjusted R²) and had the lowest Bayesian Information Criterion (BIC) across models.


  • Alwitt LF, Donley TD. Retail Stores in Poor Urban Neighborhoods. Journal of Consumer Affairs. 1997;31:139–164.
  • Barnard L. Graffiti Abatement and Management. Law and Order. 2006;50:115–119.
  • Cohen DA, Farley TA, Mason K. Why is poverty unhealthy? Social and physical mediators. Social Science & Medicine. 2003;57:1631–1641.
  • Elo IT, Mykyta L, Margolis R, Culhane JF. Perceptions of Neighborhood Disorder: The Role of Individual and Neighborhood Characteristics. Soc Sci Q. 2009;90:1298.
  • Franzini L, Elliott MN, Cuccaro P, et al. Influences of physical and social neighborhood environments on children’s physical activity and obesity. Am J Public Health. 2009;99:271–278.
  • Grafova IB. Overweight children: assessing the contribution of the built environment. Prev Med. 2008;47:304–308.
  • Harcourt BE. Illusion of Order: The False Promise of Broken Windows Policing. Cambridge, MA: Harvard University Press; 2001.
  • Hill TD, Ross CE, Angel RJ. Neighborhood disorder, psychophysiological distress, and health. J Health Soc Behav. 2005;46:170–186.
  • Jacobs J. The Death and Life of Great American Cities. Vintage Books.
  • Kelling GL, Coles CM. Fixing Broken Windows: Restoring Order and Reducing Crime in Our Communities. Free Press; 1998.
  • Kelly CM, Schootman M, Baker EA, et al. The association of sidewalk walkability and physical disorder with area-level race and poverty. J Epidemiol Community Health. 2007;61:978–983.
  • Kessler RC, Andrews G, Mroczek D, et al. The World Health Organization Composite International Diagnostic Interview Short Form (CIDI-SF). International Journal of Methods in Psychiatric Research. 1998;7:171–185.
  • Kling JR, Liebman JB, Katz LF. Experimental analysis of neighborhood effects. Econometrica. 2007;75:83–119.
  • Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
  • Malone N, Baluja KF, Costanzo JM, Davis CJ. The Foreign-Born Population: 2000. Census 2000 Brief C2KBR-34. 2003.
  • McEwen BS. Stress, adaptation, and disease: Allostasis and allostatic load. In: McCann SM, et al., editors. 1998. pp. 33–44.
  • Miles R. Neighborhood disorder and smoking: findings of a European urban survey. Soc Sci Med. 2006;63:2464–2475.
  • Molnar BE, Gortmaker SL, Bull FC, Buka SL. Unsafe to play? Neighborhood disorder and lack of safety predict reduced physical activity among urban children and adolescents. Am J Health Promot. 2004;18:378–386.
  • Ortiz V, Telles EE. Generations of Exclusion: Mexican Americans, Assimilation, and Race. New York: Russell Sage Foundation; 2008.
  • Perkins DD, Taylor RB. Ecological assessments of community disorder: Their relationship to fear of crime and theoretical implications. American Journal of Community Psychology. 1996;24:63–107.
  • Perkins DD, Meeks JW, Taylor RB. The Physical Environment of Street Blocks and Resident Perceptions of Crime and Disorder: Implications for Theory and Measurement. Journal of Environmental Psychology. 1992;12:21–34.
  • Peterson JL, Zill N. Marital disruption, parent-child relationships, and behavioral problems in children. Journal of Marriage and the Family. 1986;48:296–308.
  • Raudenbush SW, Sampson RJ. Ecometrics: Toward a science of assessing ecological settings, with application to the systematic social observation of neighborhoods. Sociological Methodology. 1999;29:1–41.
  • Ross CE, Jang SJ. Neighborhood Disorder, Fear, and Mistrust: The Buffering Role of Social Ties with Neighbors. American Journal of Community Psychology. 2000;28:401–420.
  • Ross CE, Mirowsky J. Neighborhood disadvantage, disorder, and health. J Health Soc Behav. 2001;42:258–276.
  • Sampson RJ, Raudenbush S. Systematic Social Observation of Public Spaces: A New Look at Disorder in Urban Neighborhoods. American Journal of Sociology. 1999;105:603–651.
  • Sampson RJ, Raudenbush SW, Earls F. Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy. Science. 1997;277:918–924.
  • Sastry N, Pebley AR. The Los Angeles Family and Neighborhood Survey: Neighborhood Observation Forms and Interviewer Manual. RAND Unrestricted Manuscript Drafts. Santa Monica.
  • Sastry N, Ghosh-Dastidar B, Adams J, Pebley AR. The design of a multilevel survey of children, families, and communities: The Los Angeles Family and Neighborhood Survey. Social Science Research. 2006;35:1000–1024.
  • Schaefer-McDaniel N, O'Brien Caughy M, O'Campo P, Gearey W. Examining methodological details of neighbourhood observations and the relationship to health: a literature review. Social Science & Medicine. 2010;70:277–292.
  • Skogan WG. Disorder and Decline. Berkeley and Los Angeles, CA: University of California Press; 1990.
  • Taylor RB, Shumaker SA, Gottfredson SD. Neighborhood-Level Links between Physical Features and Local Sentiments: Deterioration, Fear of Crime, and Confidence. Journal of Architectural and Planning Research. 1985;2:261–275.
  • Wachter SM, Gillen KC. Public Investment Strategies: How They Matter for Neighborhoods in Philadelphia. 2006 [Accessed on October 23, 2009]; At: http://www.upenn.edu/penniur/pdf/Public%20Investment%20Strategies.pdf.
  • Wei E, Hipwell A, Pardini D, et al. Block observations of neighbourhood physical disorder are associated with neighbourhood crime, firearm injuries and deaths, and teen births. J Epidemiol Community Health. 2005;59:904–908.
  • Wilson JQ, Kelling GL. The Police and Neighborhood Safety: Broken Windows. Atlantic Monthly. 1982.
  • Wilson WJ. The Truly Disadvantaged: The Inner City, the Underclass, and Public Policy. Chicago: University of Chicago Press; 1987.