NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
Soc Sci Med. Author manuscript; available in PMC Oct 11, 2012.
Published in final edited form as:
PMCID: PMC3469580

What is a cohort effect? Comparison of three statistical methods for modeling cohort effects in obesity prevalence in the United States, 1971–2006


Analysts often use different conceptual definitions of a cohort effect, and therefore different statistical methods, which lead to differing empirical results. A definition often used in sociology assumes that cohorts have unique characteristics confounded by age and period effects, whereas epidemiologists often conceive that period and age effects interact to produce cohort effects. The present study aims to illustrate these differences by estimating age, period, and cohort (APC) effects on obesity prevalence in the U.S. from 1971 to 2006 using both conceptual approaches. Data were drawn from seven cross-sectional waves of the National Health and Nutrition Examination Survey. Obesity was defined as BMI≥30 for adults and ≥95th percentile for children under the age of 20. APC effects were estimated using the classic constraint-based method (first-order effects estimated and interpreted), the Holford method (first-order effects estimated but second-order effects interpreted), and the median polish method (second-order effects estimated and interpreted). Results indicated that all methods reported significant age and period effects, with lower obesity prevalence in early life as well as increasing prevalence in successive surveys. Positive cohort effects for more recently born cohorts emerged based on the constraint-based model; when cohort effects were considered second-order estimates, no significant effects emerged. First-order estimates of age-period-cohort effects are often criticized because of their reliance on arbitrary constraints, but may be conceptually meaningful for sociological research questions. Second-order estimates are statistically estimable and produce conceptually meaningful results for epidemiological research questions. Age-period-cohort analysts should explicitly state the definition of a cohort effect under consideration. Our analyses suggest that the prevalence of obesity in the U.S. in the latter part of the 20th century rose across all birth cohorts, in the manner expected based on estimated age and period effects. As such, the absence or presence of cohort effects depends on the conceptual definition and therefore statistical method used.

Keywords: USA, age-period-cohort, cohort effect, Holford method, median polish, obesity


Both medical sociology and epidemiology seek to understand the distribution and etiology of health outcomes. However the two disciplines often approach health problems from different theoretical starting points. For example, epidemiologists typically use information on health distributions to identify causes of disease, whereas medical sociologists generally seek to understand the impact of the social environment on populations regardless of the particular disease that arises from adverse social conditions (Aneshensel, Rutter, & Lachenbruch, 1991; Syme & Yen, 2000). Thus, epidemiologists often begin with a specific outcome and seek to identify salient exposures; sociologists often begin with an exposure and seek to identify salient health outcomes. These different conceptualizations inform the way in which research questions are asked and the analyses used to answer the questions. The study of cohort effects is one such area in which rich research traditions in both epidemiology and medical sociology emerge. Cohort effects (sometimes referred to as “generation effects” [Last, 2001]) are generally conceptualized as variation in the risk of a health outcome according to the year of birth, often coinciding with shifts in the population exposure to risk factors over time. Cohort analysis is used to identify particularly at-risk birth cohorts, providing vital information for both public health surveillance and for the identification of etiologic factors. This paper will illustrate conceptual differences in defining the cohort effect, the way in which these differences translate into separate statistical modeling strategies, and the alternative conclusions that can arise based on the differences.

We highlight these differences by estimating cohort effects in obesity prevalence in the United States during the last 40 years. Obesity prevalence has risen dramatically (Flegal, Carroll, Ogden, & Johnson, 2002; Ogden, Carroll, Curtin, McDowell, Tabak, & Flegal, 2006), and the factors most important in this increase remain unclear (Drewnowski, 2007; James, 2008; Prentice & Jebb, 2003). While most explanations focus on individual eating behavior and physical activity, other hypotheses suggest that rising obesity may be attributed to lack of sleep, decreasing smoking prevalence, or, provocatively, even one’s in utero environment. Specifically, the fetal over-nutrition hypothesis posits that increasing in utero exposure to maternal obesity may lead to inter-generational increases in offspring obesity (Cole, Power, & Moore, 2008; Gillman, 2004; Lawlor, Timpson, Harbord, Leary, Ness, McCarthy et al., 2008; Keith, Redden, Katzmarzyk, Boggiano, Hanlon et al., 2006). The contribution of this phenomenon may manifest as cohort effects in obesity prevalence, as each successively younger cohort is at higher risk for obesity. Evaluation of cohort effects in obesity prevalence can shed light on the plausibility of the over-nutrition hypothesis as well as other novel hypotheses that attempt to explain secular increases in obesity in the U.S.

Conceptual differences in the definition of a cohort effect

Cohort analysis began in the early 20th century as a descriptive tool to better understand mortality (Kuh & Davey Smith, 1993), mostly for the purpose of forecasting and calculating life expectancy (Tutt, 1953). Since these earliest studies, the definition, identification, and interpretation of cohort effects have been a subject of controversy (Derrick, 1928; Kermack, McKendrick, & McKinlay, 1934). To define a cohort effect, it is necessary to first define the related effects associated with “age” and “period.” Age effects describe the common developmental processes that are associated with particular ages or stages in the life course. In other words, age effects represent accumulated exposure and/or the physiological changes associated with the process of aging. Period effects are the result of widespread environmental changes, the ubiquitous, population-wide exposures that occur at a circumscribed point in time. Two alternative accounts of the cohort effect exist, with one definition being relatively more common in medical sociology and the other relatively more common in epidemiology. Of course, the fields of medical sociology and epidemiology are not mutually exclusive; neither are the two conceptualizations of a cohort effect exclusive to any field. Further, researchers from different fields may utilize similar constructs yet still conceptualize different research questions. However, for simplicity, we will refer to the two cohort conceptualizations as the “epidemiologic definition” and the “sociologic definition,” and discuss how these definitional differences give rise to alternative research questions, analyses, and interpretations.

The epidemiologic definition of a cohort effect suggests that a cohort effect occurs when different distributions of disease arise from a changing or new environmental cause affecting age groups differently. A cohort effect, therefore, is conceptualized as a period effect that is differentially experienced through age-specific exposure or susceptibility to that event or cause (i.e., interaction or effect modification). These effects can be short-lived or have long-term consequences on the health outcomes of the individuals within the affected cohort. In both public health and social science, we are often most interested in the identification of cohort effects that result in long-term health risks, but short-term fluctuations in health that result from age by period interactions are also important to document. As an obesity-related example, we might imagine that the availability and accessibility of sugar-laden soft drinks increases in the population (period effect), but the effect of that increase was more pronounced among the youngest cohorts because of higher consumption among children relative to adults (period by age interaction). In other words, a cohort effect could arise when a population-level environmental cause is unequally distributed in the population. Alternatively, a cohort effect could arise because a population-level exposure differentially affects age groups who are in the midst of a critical developmental period, during which exposure has long-lasting effects on lifetime disease risk (Gluckman & Hanson, 2004; Lawlor, Timpson, Harbord, Leary, Ness, McCarthy et al., 2008). This definition of the cohort effect applies more often to epidemiologic research questions, in which the primary objective is to better explain a particular pattern or emergence of a population health outcome in the most parsimonious yet comprehensive manner possible.

An alternative definition of a cohort effect has arisen, primarily although not exclusively out of sociological theory. This more sociologically-oriented view grows out of the conceptual starting place that the cohort itself represents an exposure that is rich with explanatory power. Thus, the conceptual orientation is on cohorts, and on determining the ways in which cohort membership affects the lives of persons across the life course. This sociological view of cohort was popularized by demographer Norman Ryder in his seminal 1965 publication ‘The Cohort as a Concept in the Study of Social Change’ (Ryder, 1965). Ryder posited that a cohort can be conceived as a structural category, whereby the unique circumstances and conditions through which cohorts emerge, come of age, and die provide a record of social and structural change. As a result, the conditions, barriers, and resources that each cohort is born into and in which they live their collective lives may uniquely shape the patterns and experiences of health and mortality for that cohort. Investigations adopting this conceptualization of the cohort effect seek to quantify the unique risks that are associated with cohort membership, defined broadly and inclusively with all exogenous factors that may impact the health of each cohort. Under the sociological definition, the long-term health risks of being born in a certain cohort are of primary interest, whereas short-term fluctuations in health among members of certain birth cohorts do not reflect the broad structural forces that shape health across the life course. In the obesity example, we might posit that the obesity epidemic is shaped by coming of age in a media-saturated environment where sedentary lifestyles are socially acceptable and where many families are priced out of healthy, nourishing food.
The variation of a specific environmental cause across age is not of primary interest in this particular example; instead, the totalities of the societal structures that create reservoirs of risk and resilience across different cohorts become the exposures of interest for the sociological inquiry.

In contrast to the epidemiological definition, which defines a cohort effect as the interaction of period and age effects, the sociological definition conceives of age and period as confounders of the cohort effect (K. O. Mason, Mason, Winsborough, & Poole, 1973). As described, sociologists often conceptualize cohort effects as representing the totality of environmental influences for a particular birth group that are unique to the cohort itself. The effects of period and age obscure the ability to quantify a cohort effect because all three variables are linked with time. Thus, when examining population health outcomes, we do not know whether the prevalence is changing because of the experience of cohorts, changes in the age structure of the population, or the introduction or removal of widespread environmental influences. Teasing apart the independent effects of historical influences (cohort effects), contemporaneous influences (period effects), and exposure accumulation (age effects) becomes necessary to obtain a unique estimate of the cohort effect under the assumptions of the sociological definition.

The translation of conceptual into statistical

Age-period-cohort modeling strategies can be defined as statistical attempts to partition variance into the unique components attributable to age, period, and cohort effects. Regardless of conceptual definition, the majority of APC modeling strategies developed over the past thirty years assume that cohort effects can exist independently of age and period effects (the sociological definition). Therefore, age, period, and cohort are often modeled as having a linear relationship with the outcome of interest, and each linear slope is estimated controlling for the additive effect of the other two. These linear relationships are termed “first-order effects.” However, no statistical model can simultaneously estimate age, period, and cohort effects because of the collinearity among the three variables (Cohort = Period − Age). This collinearity results in a statistically non-identifiable design matrix, making simultaneous mathematical modeling of the linear functions of three effects impossible without additional restrictions in the model.
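The collinearity described above can be checked numerically. The following is a minimal sketch, assuming a small hypothetical grid of equally spaced, integer-coded age groups and periods (not the NHANES data): it builds a design matrix with an intercept and one linear term each for age, period, and cohort, and shows that the matrix has one fewer independent column than it has columns.

```python
import numpy as np

# Hypothetical coded grid: 5 age groups and 4 survey periods, equally spaced.
age_codes = np.arange(5)
period_codes = np.arange(4)
age, period = [g.ravel() for g in np.meshgrid(age_codes, period_codes, indexing="ij")]
cohort = period - age  # Cohort = Period - Age holds for every cell

# Design matrix: intercept plus one linear term per factor.
X = np.column_stack([np.ones(age.size), age, period, cohort]).astype(float)

n_columns = X.shape[1]
rank = np.linalg.matrix_rank(X)

# The combination age - period + cohort is zero in every row, so the
# three slopes cannot be separated without an extra restriction.
null_direction = np.array([0.0, 1.0, -1.0, 1.0])
is_null = np.allclose(X @ null_direction, 0.0)
print(n_columns, rank, is_null)
```

Because any multiple of the null direction can be added to the slope estimates without changing the fitted values, the data alone cannot choose among the resulting sets of first-order effects.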

Research aimed at solving or mitigating this identifiability problem has generated a considerable body of literature and fostered the development of a variety of methodological approaches (e.g., Clayton & Schifflers, 1987; Glenn, 2005; K. O. Mason et al., 1973; O’Brien, 2000; Robertson & Boyle, 1986; Rodgers, 1982; Yang, Schulhofer-Wohl, Fu, & Land, 2008). The first and most common approach to mitigating the identification problem is the constraint-based regression (W. M. Mason & Fienberg, 1985), in which at least one category of age, period, and cohort is constrained in some manner. While this type of modeling strategy produces simultaneous estimates of age, period, and cohort effects, it has been criticized in the statistical literature because the results are sensitive to the constraint chosen and there is no empirical way to confirm the validity of the chosen constraints (Glenn, 2005; Holford, 1991; Kupper et al., 1985).

An alternative approach was developed by Theodore Holford (Holford, 1983, 1991, 1992). Acknowledging that constraint-based approaches were limited, Holford (and others [Clayton & Schifflers, 1987]) advocated for a focus on those aspects of the APC model that are immune to the constraints chosen for model identification: second-order effects. Second-order effects are those which have a nonlinear relationship with the outcome of interest. While there are many types of second-order effects that can be estimated in a model, the Holford approach focuses on linear contrasts, a measure which can be interpreted as reflecting a change in the direction or steepness of an underlying linear slope. Linear contrasts are calculated using first-order estimates derived from the constraint-based regression model. However, the magnitude of the underlying slope – the first-order estimate – remains uninterpreted. Thus, a perfectly linear slope as measured by a first-order estimate would evidence no significant linear contrast (second-order effect). Using the obesity example, suppose that the underlying unobservable truth is that the obesity rate is increasing linearly across birth cohorts, but the speed of this increase begins decelerating in a certain birth cohort. The deceleration would be detected in the estimates derived from the Holford approach, but not the underlying magnitude of the linear slope. The Holford method is commonly used in cancer epidemiology as a way to estimate cohort effects (e.g., Zheng, Holford, Chen, Ma, Flannery, Liu et al., 1996). 
The Holford approach can be conceptualized as a hybrid of the sociological and epidemiologic definition; while conceptually the Holford approach acknowledges the interpretive utility of linear effects for age, period, and cohort (i.e., the sociologically-oriented approach), it accepts the reality that these linear effects are not validly simultaneously estimable and thus focuses on the estimation and interpretation of the non-linear effects (i.e., the epidemiologically-oriented approach).

A third approach to age-period-cohort analysis is to reject first-order effects entirely and focus only on the second-order effects produced by the interaction of age and period effects (Greenberg et al., 1950; Keyes & Li, 2008; Selvin, 1996). The median polish technique (Keyes & Li, 2008; Selvin, 1996; Shahpar & Li, 1999; Tukey, 1977) is an example of an age-period-cohort method that explicitly defines cohort effects as age by period interactions and does not depend on the estimation of first-order effects. This method unambiguously applies the conceptual definition of cohort effects that is common in epidemiology. It captures non-linearities in the age and period effects and partitions this non-linear variance into a systematic component (cohort effect) and an unsystematic component (random error). In statistical models, interaction effects are, by definition, second-order effects because they represent deviations from linearity. Like the Holford method, the median polish method produces second-order effects that model non-linearities; the difference between these two methods is in how the second-order effects are calculated. In Holford-based models, the second-order effects represent changes in slope, which are derived from the first-order linear slopes of fitted age, period, and cohort effects. The median polish neither estimates first-order effects nor recognizes them as valid. First-order cohort effects that control for the simultaneous linear effects of age and period are not of interest; instead, only the second-order joint effect of age and period is estimated and interpreted in the median polish approach.

The present analysis will highlight the implications of different conceptual definitions of a cohort effect by comparing three statistical methods to identify cohort effects on the prevalence of obesity in the United States from 1971–2006. The first method is the traditional constraint-based regression technique (K. O. Mason et al., 1973), which attempts to quantify cohort effects in an additive model with age and period effects as confounders. The second is the Holford model (Holford, 1983, 1992), which estimates the cohort effect as a second-order function in a model in which first- and second-order age and period effects are considered confounders of the first- and second-order cohort effects. The third is the median polish technique (Keyes & Li, 2008), which estimates the cohort effect as a partial interaction (second-order effect) of age and period effects.

These three methods were chosen to highlight an evolution of statistical methods: the constraint-based approach explicitly focuses on first-order effects but is limited by the identification problem; the Holford method is built on first-order effects but presents results of constraint-invariant second-order effects; and the median polish does not model first-order effects and interprets only second-order effects. The purpose of these comparisons is to explicitly describe the way in which the models make different assumptions about cohort effects and how these assumptions translate into results with varying interpretations and public health implications. Using obesity prevalence data for the past 40 years to compare the assumptions and interpretations of three APC modeling techniques provides substantively rich and insightful results regarding the role of age, period, and cohort effects in the U.S. obesity epidemic.


Sources of data

Data were drawn from seven cross-sectional waves of the National Health and Nutrition Examination Survey (NHANES). The first wave was conducted from 1971 to 1975, and the most recent in 2005–2006. Each wave provides nationally representative data for the US civilian non-institutionalized population. NHANES utilized a complex, stratified, multi-stage probability cluster sampling design (National Center for Health Statistics, 1978, 1994, 2005); thus, all analyses are weighted to adjust for oversampling. The sample included persons age 2 to 74, with sample sizes ranging from 9,282 in the 1999–2000 wave to 23,808 in the NHANES I wave (1971–1975). Individuals were excluded from the analytic sample under three conditions:

  1. Pregnancy (N=1,518), as pregnancy-related weight gain does not represent usual weight status.
  2. Non-U.S. born (N=10,575), to control for confounding of cohort composition due to in-migration.
  3. Missing both measured and self-reported information on height and weight (N=1,187).

The final combined analytic sample comprised 91,755 respondents.


Body Mass Index (BMI) was calculated from height and weight data, as measured by trained clinical staff during a medical examination. For those with missing examination data, self-reported height and weight were used when available (N=1,654). For adults (≥20 years), the standard CDC criterion of BMI≥30 kg/m2 was used to categorize respondents as obese. For children and adolescents (≥2 years & <20), obesity status was defined as BMI at or above the 95th percentile for sex and age, according to the 2000 CDC growth charts for the United States (Kuczmarski, Ogden, Grummer-Strawn, Flegal, Guo, Wei et al., 2000).

Age (in years) was self-reported by the respondent at the time of the survey (range 2–74). Period was recorded as the year in which the survey was completed (range 1971–2006). Cohort was defined as the year in which someone was born, calculated as survey year minus age (range 1901–2005).
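The variable definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' processing code: the function names are hypothetical, only the adult BMI≥30 rule is implemented, and the growth-chart rule for respondents under 20 is deliberately left out because it requires the CDC reference tables.

```python
def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2 from weight in kilograms and height in metres."""
    return weight_kg / height_m ** 2

def is_obese_adult(weight_kg, height_m, age_years):
    """Apply the adult criterion (BMI >= 30) to respondents aged 20 or older."""
    if age_years < 20:
        # Children and adolescents need the sex- and age-specific 95th
        # percentile from the 2000 CDC growth charts, not handled here.
        raise ValueError("growth-chart rule required for ages under 20")
    return body_mass_index(weight_kg, height_m) >= 30.0

def birth_cohort(survey_year, age_years):
    """Cohort = survey year minus age, as defined in the text."""
    return survey_year - age_years

# Hypothetical respondent: 95 kg, 1.75 m, aged 34, surveyed in 1999.
example_obese = is_obese_adult(95.0, 1.75, 34)
example_cohort = birth_cohort(1999, 34)
```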

Statistical analysis

Method 1: Constraint-based approach

In a general three-factor regression model (the three factors are age, period, and cohort), the risk of the dichotomous outcome (Yijk) is estimated as a function of the scalar αi, the ith of m-1 age effects; the scalar βj, the jth of n-1 period effects; and the scalar γk, the kth of m+n-2 cohort effects. The natural log of Yijk is modeled as the sum of a constant term (μ), αi, βj, γk, and an error term (εij):

\[ \ln(Y_{ijk}) = \mu + \alpha_i + \beta_j + \gamma_k + \varepsilon_{ij} \qquad (1) \]

The above model is not identifiable because of the collinearity among age, period, and cohort (Cohort = Period − Age). Identification is only possible if at least two parameters are constrained (e.g., constraining the effects for the youngest age group and the second-youngest age group to be zero: α1 = 0 and α2 = 0).
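The role of the constraint can be illustrated with synthetic data. The sketch below is an assumption-laden toy, not the model fitted in this paper: it uses a hypothetical 5 by 4 grid of equally spaced age groups and periods, ordinary least squares on a made-up log prevalence that contains age and period structure but no cohort term, and two alternative identifying constraints. Both constrained models reproduce the data equally well, yet they report different cohort coefficients.

```python
import numpy as np

m, n = 5, 4  # hypothetical numbers of age groups and periods
a, p = [g.ravel() for g in np.meshgrid(np.arange(m), np.arange(n), indexing="ij")]
k = p - a + (m - 1)  # cohort index, 0 = earliest-born cohort

# Synthetic log prevalence with age and period structure but NO cohort term.
y = -2.0 + 0.3 * a - 0.03 * a**2 + 0.2 * p

def indicators(idx, n_levels, dropped):
    """Return kept levels and their 0/1 columns, omitting `dropped` levels."""
    kept = [lev for lev in range(n_levels) if lev not in dropped]
    return kept, np.column_stack([(idx == lev).astype(float) for lev in kept])

def design(drop_age, drop_cohort):
    """Intercept plus age, period, and cohort indicator columns."""
    _, A = indicators(a, m, drop_age)
    _, P = indicators(p, n, {0})
    kept_c, C = indicators(k, m + n - 1, drop_cohort)
    return kept_c, np.column_stack([np.ones(a.size), A, P, C])

def fit(drop_age, drop_cohort):
    """Least-squares fit; returns fitted values and cohort coefficients."""
    kept_c, X = design(drop_age, drop_cohort)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, dict(zip(kept_c, coef[-len(kept_c):]))

# Reference categories alone leave the design matrix one rank short.
_, X_ref_only = design({0}, {0})
deficiency = X_ref_only.shape[1] - np.linalg.matrix_rank(X_ref_only)

# Constraint A: the two youngest age groups are set to zero.
fitted_a, cohort_a = fit({0, 1}, {0})
# Constraint B: the two earliest-born cohorts are set to zero instead.
fitted_b, cohort_b = fit({0}, {0, 1})

same_fit = np.allclose(fitted_a, fitted_b)
```

In this toy setting, constraint A yields cohort coefficients that trend linearly across cohorts while constraint B yields cohort coefficients near zero, even though the fitted values are identical. The data cannot distinguish the two, which is the sensitivity to the chosen constraint discussed in the text.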

As noted earlier, model results are sensitive to the chosen constraint, and the validity of a given constraint cannot be empirically tested (Glenn, 2005). Thus, researchers often use graphical approaches, external information, and theory to select constraints for the model. We used graphical data describing the obesity trends by period, age, and cohort to establish the choice of constraints for this model (Utz, 2005). First, based on examination of age trends plotted by period of observation (Figure 1), the age effect was presumed to follow a quadratic function, where the slope of obesity prevalence increased the fastest at earlier ages and then decelerated in mid-adulthood. Second, based on examination of period trends stratified by age (not shown), period was presumed to follow the observed pattern in which obesity prevalence remained relatively unchanged during the 1970s but increased dramatically during the 1980s and 1990s. Finally, later-born cohorts were presumed to have successively higher body weights than earlier-born cohorts, based on the trends observed in cohort-stratified age trends of obesity rates (Figure 2). While other constraints could have been chosen based on alternative readings of the same data, the constraints chosen were motivated by a careful reading of the available evidence. Alternate constraints were tested to establish the stability of the effects associated with the chosen constraints.

Figure 1
Prevalence of Obesity in the United States (1971–2006), by Age & by Period* (N=91,755)
Figure 2
Prevalence of Obesity in the United States (1971–2006), by Age & Birth Cohort (N=91,755)

Method 2: Holford approach

The Holford approach focuses on second-order effects known as ‘curvatures’ (Holford, 1991). Curvatures are the linear contrasts derived from the parameter estimates from a constraint-based three-factor model. Curvatures evaluate changes in the direction of the slope of the underlying age, period, and cohort effects without estimating the magnitude of the actual slope itself. The curvature can be interpreted as summarizing the overall direction of the non-linear trends over time (Holford, 1991). Curvatures are specific to each factor in the three-factor APC models. Curvatures follow the form:


where πh is the h-th age, period or cohort parameter estimate as derived from a constraint-based APC model such as that described in Equation 1. Standard errors were estimated using a variance formula that accounts for auto-correlation, as the estimates for changes in the slope of two adjacent years would be expected to be correlated (Shahpar & Li, 1999). Statistical significance (i.e., whether the increase or decrease in slope was significantly different from zero) was assessed for all three effects using chi-square tests.

In the Holford approach, the first-order estimates used to derive the second-order functions are not interpreted. The development of the Holford approach was sparked by the recognition that the same curvature estimates will emerge regardless of the particular constraint chosen for model identification; thus, the results are not reliant on any constraint (Holford, 1991, 1992; McNally, Alexander, Staines, & Cartwright, 1997; Robertson, Gandini, & Boyle, 1999). We calculated curvature estimates for both a model with age entered as a dummy variable and, separately, with the squared-term for age (a second-order effect) as this was the model chosen to best represent the constraint-based approach. Resulting curvature estimates were not dependent on the model constraints chosen. We present linear contrasts for the three-factor model with dummy variables for all three effects here for interpretability.
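The constraint invariance of curvatures can be illustrated with a short sketch. It assumes that a curvature is the second difference of three adjacent parameter estimates, which is one common form of such a contrast; the exact contrast used in the analysis is not reproduced here, and the effect values below are made up. Adding any linear trend to the effects, which is what a change of identifying constraint does, leaves the second differences unchanged.

```python
import numpy as np

def curvatures(effects):
    """Second differences of adjacent effects (assumed curvature form)."""
    e = np.asarray(effects, dtype=float)
    return e[2:] - 2.0 * e[1:-1] + e[:-2]

# Hypothetical cohort effects under one identifying constraint.
effects_one = np.array([0.0, 0.1, 0.4, 0.9, 1.0])

# The same effects under another constraint differ by a linear trend
# (arbitrary slope 0.27 and shift -1.0 chosen for illustration).
effects_two = effects_one + 0.27 * np.arange(effects_one.size) - 1.0

curv_one = curvatures(effects_one)
curv_two = curvatures(effects_two)
```

The first-order slopes of `effects_one` and `effects_two` differ, but `curv_one` and `curv_two` agree, which is why only the curvatures are interpreted.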

Method 3: Median polish approach

The third method explored in this paper is the median polish approach. This approach explicitly tests whether the effect of age and period interact to produce an effect that is more than what would be expected given their additive influences. The median polish approach estimates a two-factor model (age and period); thus no constraints are necessary (as is the case with the three-factor model with collinear slopes for age, period, and cohort). The foundation of the median polish approach lies with Tukey (Tukey, 1977), and was first used for APC analysis by Selvin to graphically display cohort effects as partial interactions of age and period effects (Selvin, 1996). Readers are referred to these sources for a more complete description of the median polish method, including descriptions of alternative strategies such as mean polish.

Median polish analysis relies on the use of a contingency table with obesity prevalence stratified by m age groups in rows and n period in columns (e.g., Table 1). The median polish approach removes the additive effect of age (row) and period (column) by iteratively subtracting the median value of each row and column. After several iterations, the residual values stabilize (the median residual of each row or column approximates zero). These residuals are then regressed on indicator variables for cohort membership using standard linear regression; the extent to which the cohort variable predicts the residual is the cohort effect. The remaining residual unaccounted for by cohort is considered to be non-systematic random error.
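The iterative procedure described above can be sketched as follows. This is a minimal illustration of Tukey-style median polish rather than the analysis code used in this study; the function name, the stopping tolerance, and the example table are assumptions, and the input is taken to be a table of log prevalence with age groups in rows and periods in columns.

```python
import numpy as np

def median_polish(table, max_iter=50, tol=1e-8):
    """Split a two-way table into overall, row, column, and residual parts."""
    resid = np.asarray(table, dtype=float).copy()
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row_eff += row_med
        # Re-centre the column effects, moving their median to the overall term.
        shift = np.median(col_eff)
        col_eff -= shift
        overall += shift
        # Sweep column medians out of the residuals into the column effects.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col_eff += col_med
        # Re-centre the row effects in the same way.
        shift = np.median(row_eff)
        row_eff -= shift
        overall += shift
        # Stop once a full sweep no longer changes the residuals.
        if max(np.abs(row_med).max(), np.abs(col_med).max()) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical purely additive table: row effect plus column effect.
rows = np.array([-1.0, -0.5, 0.0, 0.2])
cols = np.array([0.0, 0.1, 0.4])
additive = rows[:, None] + cols[None, :]
overall, row_eff, col_eff, resid = median_polish(additive)
rebuilt = overall + row_eff[:, None] + col_eff[None, :] + resid
```

For a purely additive table the residuals vanish; any pattern left in the residuals of a real table is the non-additive age by period component that is subsequently examined for cohort structure.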

Table 1
Age-Period contingency table for obesity prevalence by age (rows) and period (columns) in the United States, 1971–2006 (N=91,755)

A general heuristic formula for the approach follows the notation described above in equation 1, whereby the outcome (Yijk) is a function of the scalar αi, the ith of m-1 age effects, and the scalar βj, the jth of n-1 period effects. The natural log of Yijk is modeled as the sum of a constant term (μ), αi, βj, and an error term (εij):

\[ \ln(Y_{ijk}) = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \qquad (3) \]

Note that the difference between equation 3 and equation 1 is that there is no term for cohort (γk) in equation 3. This is the fundamental conceptual difference between the two approaches: the median polish approach does not consider a cohort effect to be an additive effect, net of age and period effects. The error term in equation 3 represents variance unaccounted for by the additive effect of age and period. As previously stated, this variance is then partitioned into systematic and non-systematic components using simple linear regression, where er (the residuals from the median polish) is a function of the intercept μk; the scalar γk (the kth of m+n-2 cohort effects); and ek (the error term representing the random error unaccounted for by the cohort effect):

\[ e_r = \mu_k + \gamma_k + e_k \]

The median polish method tests whether these deviations from additive age and period influence follow a systematic pattern that can be predicted by birth cohort; if so, the deviations are attributed to cohort effects. Confidence intervals for the estimated cohort effects are derived using generalized linear regression models. Uncertainty in the prevalence estimates from the underlying contingency table is not incorporated in the confidence interval calculation.
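The regression of residuals on cohort membership can be sketched as below. This is an illustrative ordinary least-squares version under stated assumptions: the function name is hypothetical, age groups and periods are assumed to be equally spaced so that each diagonal of the table corresponds to one cohort (the NHANES waves are not equally spaced, so a real analysis would need an explicit cohort assignment), and no confidence intervals are computed.

```python
import numpy as np

def cohort_effects(resid, reference=0):
    """Regress polish residuals on cohort indicators relative to `reference`."""
    resid = np.asarray(resid, dtype=float)
    m, n = resid.shape
    i, j = np.meshgrid(np.arange(m), np.arange(n), indexing="ij")
    # Cells on the same diagonal share a birth cohort; 0 = earliest-born.
    k = (j - i + (m - 1)).ravel()
    levels = [c for c in range(m + n - 1) if c != reference]
    X = np.column_stack(
        [np.ones(k.size)] + [(k == c).astype(float) for c in levels]
    )
    coef, *_ = np.linalg.lstsq(X, resid.ravel(), rcond=None)
    # Intercept = mean residual in the reference cohort; the remaining
    # coefficients are the m+n-2 cohort contrasts against that cohort.
    return coef[0], dict(zip(levels, coef[1:]))

# Hypothetical residual table: zero everywhere except one cohort diagonal.
toy_resid = np.zeros((4, 3))
for d in range(3):
    toy_resid[d, d] = 0.5  # cells with j - i == 0 belong to cohort index 3
intercept, effects = cohort_effects(toy_resid, reference=0)
```

In the toy table only cohort index 3 carries a systematic residual, so its estimated effect equals the planted value and all other cohort contrasts are zero.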

Graphical trend analyses and the contingency table used in the median polish analysis were produced using basic spreadsheet software (Excel, Microsoft Office 2006). Parameter estimates for the constraint-based approach, the Holford approach, and the median polish analysis were obtained using STATA version 9.0 (Stata Corp LP, College Station, Tex).


Graphical analysis

Obesity prevalence from 1971 to 2006 was plotted in two different ways in Figure 1 and Figure 2. Figure 1 shows the prevalence of obesity stratified by age and period. The age distributions of obesity exhibited curvilinear shapes, with prevalence increasing throughout the life course until approximately age 30, when prevalence begins to stabilize or decrease. While the age-specific slope of obesity prevalence was constant across period, the absolute magnitude of obesity increased for each age group in each successive time period (suggestive of a period effect). Figure 2 shows obesity prevalence stratified by age and cohort. All but the earliest-born cohorts exhibited increased risk of obesity in more recent periods. The later-born cohorts had steep increases in obesity prevalence at young ages. For example, the prevalence of obesity among those ages 10–14 in 1991–1994 (i.e., the cohort born approximately 1981–1985) was 13.6%; earlier born cohorts did not exhibit prevalence of 13% until much later ages. These graphical trends formed the basis for the chosen model constraints in Method 1.

Method 1: Constraint-based approach

With constraints on age (entered into the model as two parameters: age and age-squared) and period (periods before 1980 were constrained to be equal), a model using the constraint-based approach was identifiable. Table 2 shows the results of the constrained log-linear model of age, period, and cohort on obesity prevalence. Results indicated that the curvilinear relationship specified for age is significantly associated with obesity prevalence, consistent with the graph presented in Figure 1. Significant period effects were also observed: compared to the pre-1980 period, obesity prevalence increased in the U.S. population consistently. The period in which individuals were at the highest risk of obesity appeared to be 1999–2000 (RR=2.33, 95% C.I. 1.91–2.85). Finally, significant cohort effects were observed. Those cohorts born prior to 1941 had decreased risk for obesity, while those born after 1945 had increased risk for obesity after controlling for age and period effects. The magnitude of the cohort-specific risk ratios generally increased for cohorts younger than the reference cohort, and decreased for cohorts older than the reference cohort.
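Risk ratios such as those reported above are obtained by exponentiating coefficients from the log-linear model. The sketch below shows that conversion under the assumption of a Wald-type interval that is symmetric on the log scale, which the text does not state explicitly. The coefficient and standard error are back-calculated from the reported 1999–2000 estimate purely for illustration and are not values taken from Table 2.

```python
import math

def risk_ratio_with_ci(coef, se, z=1.96):
    """Exponentiate a log-scale coefficient and its Wald interval."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)

# Back-calculated from RR=2.33, 95% C.I. 1.91-2.85 (illustration only).
coef = math.log(2.33)
se = (math.log(2.85) - math.log(1.91)) / (2 * 1.96)

rr, lower, upper = risk_ratio_with_ci(coef, se)
```

The recovered interval agrees with the reported bounds to two decimal places, which is consistent with, though not proof of, an interval constructed on the log scale.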

Table 2
Constraint-based generalized log-linear approach: age-period-cohort effects on obesity prevalence in the United States, 1971–2006 (N=91,755)

Method 2: Holford approach

Curvatures for age, period, and cohort are presented in Figure 3. Results for age were consistent with the quadratic relationship shown in Figure 1 and replicated in Table 2; the slope of the age effect increased until approximately the late 20s (age 5–9 to 10–14, χ²=30.3, p<0.01; age 10–14 to 15–19, χ²=10.4, p=0.01; age 15–19 to age 20–24, χ²=30.3, p<0.01; 3.69, p=0.05), at which time the increase in slope slowed and then remained constant (slope changes were not statistically significantly different from zero). In addition, statistically significant increases were observed for two consecutive periods (1991–1994 to 1999–2000 [χ²=5.05, p=0.02], and 1999–2000 to 2001–2006 [χ²=6.2, p=0.01]), indicating that the increasing slope for these periods was significantly different from zero. There were no significant changes in the slopes observed for cohort, indicating no evidence of a cohort effect.
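The reason changes in slope can be interpreted without an identifying constraint is that they are unaffected by any linear trend added to the effects. A minimal Python sketch of this property follows, using second differences as one common way to express curvature. The effect values and the added slope are hypothetical, not estimates from this study, and `second_differences` is an illustrative helper name.

```python
def second_differences(effects):
    """Change in slope between adjacent intervals:
    (e[k+1] - e[k]) - (e[k] - e[k-1])."""
    return [
        effects[k + 1] - 2 * effects[k] + effects[k - 1]
        for k in range(1, len(effects) - 1)
    ]

# Hypothetical log-scale cohort effects for five successive cohorts.
cohort_effects = [0.00, 0.10, 0.30, 0.60, 0.70]

# Add an arbitrary linear trend, mimicking a different (equally
# defensible) identifying constraint on the first-order effects.
slope = 0.25
tilted_effects = [e + slope * k for k, e in enumerate(cohort_effects)]

curv_original = second_differences(cohort_effects)
curv_tilted = second_differences(tilted_effects)

# The curvatures agree up to floating-point error.
print(curv_original)
print(curv_tilted)
```

Whatever value is chosen for `slope`, the two curvature lists agree, whereas the effects themselves differ. This is why second-order quantities are estimable while first-order slopes depend on the constraint.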

Figure 3
Holford approach: estimated curvature of age-period-cohort effects on obesity prevalence in the United States, 1971–2006 (N=91,755)

Method 3: Median polish approach

Table 3 shows the results of the median polish approach. As in the other approaches, significant age effects were observed: compared to the reference group of age 30–34, those at younger ages were significantly less likely to be obese. Consistent with the results of the constraint-based and Holford approaches, there was also a significant period effect: compared to the reference period of 1971–1975, there was a statistically significant increase in obesity prevalence from 1989 through 2006. Results indicated no evidence of a systematic non-additive age-by-period interaction effect. Thus, no cohort effect was detected. Median polish analyses using different cohorts as the reference category also did not find any significant differences in the effect of age or period among cohort groups.
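The logic of the median polish can be sketched in a few lines of Python. This is a generic Tukey-style median polish on an age-by-period table, not the exact procedure or data used in this analysis: the table values are hypothetical and `median_polish` is an illustrative name. The additive age and period components are removed by alternately sweeping out row and column medians; whatever remains in the residuals is the non-additive age-by-period part that this approach treats as the cohort effect.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Decompose table[i][j] into overall + row[i] + col[j] + resid[i][j]."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [row[:] for row in table]
    overall, row_eff, col_eff = 0.0, [0.0] * n_rows, [0.0] * n_cols
    for _ in range(max_iter):
        change = 0.0
        for i in range(n_rows):  # sweep row (age) medians
            m = median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
            change += abs(m)
        m = median(col_eff)  # recenter column effects
        overall += m
        col_eff = [c - m for c in col_eff]
        for j in range(n_cols):  # sweep column (period) medians
            m = median(resid[i][j] for i in range(n_rows))
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
            change += abs(m)
        m = median(row_eff)  # recenter row effects
        overall += m
        row_eff = [r - m for r in row_eff]
        if change < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical prevalence table: rows are age groups, columns are periods.
age_part = [-0.05, 0.00, 0.04]
period_part = [0.00, 0.03, 0.08]
additive = [[0.10 + a + p for p in period_part] for a in age_part]

# Same table with an extra 0.05 in one age-by-period cell.
with_interaction = [row[:] for row in additive]
with_interaction[2][2] += 0.05

_, _, _, resid_add = median_polish(additive)
overall_int, row_int, col_int, resid_int = median_polish(with_interaction)
```

For the purely additive table the residuals are numerically zero, matching the "no cohort effect" reading above, while the perturbed cell reappears as a residual of about 0.05. Because medians rather than means are swept out, a single unusual cell does not distort the estimated age and period effects.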

Table 3
Median polish approach: age-period-cohort effects on obesity prevalence in the United States, 1971–2006 (N=91,755)


We explored three statistical methods to estimate age, period, and cohort effects on obesity prevalence in the United States from 1971–2006 using nationally representative data. All models agreed that there was an age effect, whereby risk increased in childhood and early adulthood then stabilized by mid-adulthood. All models also found a strong, positive period effect whereby obesity prevalence increased across all ages beginning in the 1980s. Models diverged, however, on the estimation of cohort effects. Under the constraint-based model, a statistically significant cohort effect emerged in which successively born cohorts had increased risk of obesity, even after including age and period as covariates. However, under the Holford and median polish approaches, no significant cohort effect was observed. The difference between the constraint-based model and the other two approaches is in the empirical operationalization of cohort effects relative to age and period effects. The constraint-based approach estimated a linear or first-order cohort effect while controlling for age and period effects, while the Holford and median polish approaches estimated a non-linear or second-order cohort effect representing the interaction between, or effect modification of, age and period effects.

Reconciling differences across statistical methods

The differences in the results from these models do not necessarily represent non-replication; rather, the models are applying and estimating different conceptual definitions of a cohort effect. The constraint-based approach defines cohort effects as historical influences unique to cohorts and independent of, but confounded by, the effects of the contemporaneous environment and the accumulation of exposure experience across age. The median polish approach defines the cohort effect as modification in the effects of the contemporaneous environment across age. The Holford approach applies a definition somewhere in the middle; while first-order effects are estimated and considered conceptually meaningful, only non-linear second-order effects are interpreted (more consistent with the non-linear median polish model).

The interpretation of the constraint-based model does not necessarily conflict with the median polish or Holford results, but instead argues that birth cohorts index a risk for obesity that operates outside of the effects of contemporaneous environmental exposures (period effects) and the accumulation of exposure experience (age effects). Adopting the epidemiological definition of a cohort effect, our results suggest that the environmental causes of obesity have not varied across age groups to cause the emergence of the obesity epidemic in America. Additionally, from the sociological definition of a cohort effect, we learned that there may be structural factors unique to the experience of each cohort as they progress through the life course that would produce higher obesity rates independently of the concurrent environmental conditions that ubiquitously impact the population at large.

The question that emerges from this exercise is which statistical model is more conceptually relevant: the model that allows for an additive effect of cohort in the presence of age and period effects (constraint-based approach) or the models that do not (Holford and median polish approaches)? Unfortunately, no statistical test can tell us, with empirical certainty, which definition of a cohort effect is relevant. The first-order constraint-based approaches have been criticized in the statistical literature because of the sensitivity of the model to the constraint used (Glenn, 1976, 2005; Kupper et al., 1985). Nevertheless, this approach remains the most popular to date, and new methods continue to be developed with varying methods to impose these constraints in order to interpret first-order effects of cohort (Yang, Schulhofer-Wohl, Fu, & Land, 2008). The sociologically-oriented conceptualization of a cohort effect inherently relies on such models because a cohort is conceived of as a meaningful category which indexes barriers and resources that exist independently of the ubiquitous environmental conditions coinciding with the cohort’s collective experience through the life course. In contrast, the epidemiologically-oriented definition of a cohort effect as an interaction between age and period relies on statistical models that do not suffer from identification problems. These models are most appropriate when the goal of the analysis is to understand how environmental exposures impact the health of populations differentially across age, the results of which can be used to inform public health programs for prevention and intervention. However, the epidemiologically-oriented models are not appropriate when linear cohort effects are hypothesized to be operative. Hypothesizing about the linearity or non-linearity of cohort effects, however, is a theoretical rather than statistical exercise, underscoring the importance of precise definition and theory in model selection decisions.
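The sensitivity of first-order estimates to the chosen constraint follows from the exact linear dependency among the three time scales (cohort = period − age). The Python sketch below illustrates this with hypothetical log-scale effects that are not taken from this study; `linear_predictor` and `tilt` are illustrative names. A set of effects with no cohort trend at all, and a "tilted" set with a steadily increasing cohort trend, produce identical fitted values in every age-by-period cell, so the data alone cannot choose between them.

```python
def linear_predictor(mu, age, period, cohort):
    """Additive APC predictor for each age-by-period cell.
    Cohort index k = j - i + (n_age - 1): later periods and
    younger ages correspond to more recently born cohorts."""
    n_age = len(age)
    return [
        [mu + age[i] + period[j] + cohort[j - i + n_age - 1]
         for j in range(len(period))]
        for i in range(n_age)
    ]

def tilt(mu, age, period, cohort, t):
    """Shift linear trend between the three sets of effects
    without changing any fitted value."""
    n_age = len(age)
    return (
        mu - t * (n_age - 1),
        [a + t * i for i, a in enumerate(age)],
        [p - t * j for j, p in enumerate(period)],
        [c + t * k for k, c in enumerate(cohort)],
    )

# Hypothetical effects: three age groups, three periods, five cohorts.
mu = -2.0
age = [-0.6, -0.2, 0.0]
period = [0.0, 0.3, 0.7]
cohort = [0.0, 0.0, 0.0, 0.0, 0.0]  # no cohort effect at all

original = linear_predictor(mu, age, period, cohort)
tilted_params = tilt(mu, age, period, cohort, t=0.15)
tilted = linear_predictor(*tilted_params)

# Largest difference in fitted values across all cells.
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(original, tilted)
    for a, b in zip(row_a, row_b)
)
print(max_gap, tilted_params[3])
```

Any value of `t` gives the same fit, which is why a constraint must be imposed before first-order effects can be reported, and why the choice of that constraint is a theoretical rather than a statistical decision.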

The need for construct specification and explicit theory in age-period-cohort research

The fundamental difference between the two definitions of a cohort effect derives from the fact that the same variables are used as proxies for different constructs. The differing results and interpretations obtained here emphasize the need to specify the underlying constructs which period, age, and cohort are used as proxies to represent. Hobcraft et al. (1982) and others (Winship & Harding, 2008) have noted that the constructs which we use age, period, and cohort to represent are often distinct from the true constructs of interest. Ambiguity can arise when analysts are unsure of the specific constructs that birth year might represent. An approach which ignores first-order effects may fail to detect the effects of historical indices of barriers and resources that are unique to cohorts. Likewise, the constraint-based approach would fail to detect constructs such as the unique risk of disease outcomes that arises due to the variation in the effect of contemporaneous influences across age. We recommend directly measuring and testing specific constructs as follow-up to descriptive APC analysis or as an initial analytic step if one hypothesizes a priori that specific constructs are acting to produce significant cohort effects.

In summary, APC models do not test hypotheses about the effects of environmental or historical influences; instead, they organize data and provide useful mathematical formulae for summarizing disease rates over time. Whether age and period should be treated as confounders or effect modifiers is not a statistical question; it is a theoretical question for which there is no statistical answer. Researchers who adopt the definition of age and period as confounders of the cohort effect will continue to be plagued by the identification problem unless the specific causal mechanisms that underlie the effect can be measured and analyzed. Analysts who consider cohort effects as second-order effects will obtain consistently reliable empirical results that are not dependent on arbitrarily defined constraints, but may miss some constructs that manifest as first-order effects, and their estimates may not actually represent the ‘cohort’ as a unique category in and of itself.

Obesity in the United States: implications for public health efforts

Based on our analyses of obesity rates, we found evidence supporting both the existence and absence of a cohort effect. We believe that both interpretations of the data indicate important, albeit different, substantive interpretations of how different cohorts have been affected by the obesity epidemic of recent decades. In the present analysis, the interpretation of the median polish and Holford models, which adopt the epidemiological definition of a cohort effect as the interaction of age and period, suggests that the environmental changes of the past thirty years that have been hypothesized to increase rates of obesity (e.g., food supply, mass media advertising, and generalized changes in physical activity patterns) have affected age groups similarly. Thus, the median polish and Holford results do not offer evidence to support the fetal overnutrition hypothesis (Cole et al., 2008; Kivimaki, Lawlor, Smith, Elovainio, Jokela, Keltikangas-Jarvinen et al., 2007; Lawlor, 2008) or other hypotheses of increased physiological susceptibility to obesity in more recently born cohorts.

However, the results from the constraint-based approach, which adopts the sociologically-oriented definition of a cohort effect, indicate that there may indeed be a greater burden of lifetime obesity in more recently born cohorts, in that these cohorts have a higher prevalence of obesity at younger ages than older cohorts did. As each successive cohort enters midlife and older ages with a greater percentage of obese individuals than the cohort before it, obesity-related chronic ailments such as cardiovascular disease, diabetes, and osteoarthritis may become more prevalent and have long-lasting effects on the health of those cohorts. The recent increase in adult-onset (Type II) diabetes (Kwon et al., 2008; Mokdad, Ford, Bowman, Dietz, Vinicor, Bales et al., 2003) is a foreboding example of how higher rates of obesity at the earlier stages of the life course could ultimately lead to an expansion of morbidity or premature mortality among the more recently born birth cohorts.

This exercise, in which we decomposed the age, period, and cohort effects in obesity rates in the United States since 1971 using multiple conceptual and statistical approaches, shows that the common problem of non-replication across APC modeling strategies may stem not solely from the statistical shortcomings of some modeling approaches relative to others, but also from a research community that does not consistently agree on what constitutes a cohort effect. More importantly, we believe that this methodological exercise has demonstrated the importance of considering more carefully the constructs that underlie our models and the best statistical strategy to estimate these constructs. If we are to maximize the true utility of age-period-cohort models for advancing scientific knowledge, the dialogue should shift from one focusing solely on the statistical validity of modeling approaches to one that also considers the substantive and conceptual definitions of what constitutes a cohort effect.


This research was supported in part by a fellowship from the National Institute of Mental Health (T32MH013043-36, Keyes) and grants from the National Institute on Aging (R01AG13642, Li) and the National Institute on Alcohol Abuse and Alcoholism (R01AA09963, Li). Dr. Robinson is a Robert Wood Johnson Foundation Health & Society Scholar at the University of Michigan in the Center for Social Epidemiology and Population Health. The authors thank the Robert Wood Johnson Foundation Health & Society Scholars program for its financial support.


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Contributor Information

Katherine Margaret Keyes, Columbia University, New York, NY UNITED STATES.

Rebecca L Utz, University of Utah.

Whitney Robinson, University of Michigan.

Guohua Li, Columbia University.


  • Allman-Farinelli MA, Chey T, Bauman AE, Gill T, James WP. Age, period and birth cohort effects on prevalence of overweight and obesity in Australian adults from 1990 to 2000. Eur J Clin Nutr. 2008;62(7):898–907.
  • Aneshensel CS, Rutter CM, Lachenbruch PA. Social structure, stress, and mental health: competing conceptual and analytic methods. American Sociological Review. 1991;56(2):166–178.
  • Centers for Disease Control and Prevention. National Health and Nutrition Examination Survey. Available at: http://www.cdc.gov/nchs/about/major/nhanes/datalink.htm. Accessed May 9, 2009.
  • Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-period-cohort models. Stat Med. 1987;6(4):469–481.
  • Cole TJ, Power C, Moore GE. Intergenerational obesity involves both the father and the mother. Am J Clin Nutr. 2008;87(5):1535–1536; author reply 1536–1537.
  • Derrick VPA. Observations on (i) errors of ages in the population statistics of England and Wales and, (ii) the changes in mortality indicated by the national records. Journal of the Institute of Actuaries. 1928;59:125–137.
  • Drewnowski A. The real contribution of added sugars and fats to obesity. Epidemiol Rev. 2007;29:160–171.
  • Easterlin RA. Population, Labor Force, and Long Swings in Economic Growth: The American Experience. New York: National Bureau of Economic Research; 1968.
  • Flegal KM, Carroll MD, Ogden CL, Johnson CL. Prevalence and trends in obesity among US adults, 1999–2000. JAMA. 2002;288(14):1723–1727.
  • Gillman MW. A life course approach to obesity. In: Kuh D, Ben-Shlomo Y, editors. A life course approach to chronic disease epidemiology. Oxford: Oxford University Press; 2004. pp. 189–217.
  • Glenn ND. Cohort Analysts’ Futile Quest: Statistical Attempts to Separate Age, Period, and Cohort Effects. American Sociological Review. 1976;41:900–905.
  • Glenn ND. Cohort analysis. 2nd ed. Thousand Oaks, CA: Sage Publications Inc; 2005.
  • Gluckman PD, Hanson MA. The developmental origins of the metabolic syndrome. Trends Endocrinol Metab. 2004;15(4):183–187.
  • Greenberg BG, Wright JJ, Sheps CG. A technique for analyzing some factors affecting the incidence of syphilis. American Statistical Association Journal. 1950;45(251):373–399.
  • Hobcraft J, Menken J, Preston S. Age, Period, and Cohort Effects in Demography: A Review. Population Index. 1982;48(1):4–43.
  • Holford TR. The estimation of age, period and cohort effects for vital rates. Biometrics. 1983;39:311–324.
  • Holford TR. Understanding the Effects of Age, Period, and Cohort on Incidence and Mortality Rates. Annual Reviews in Public Health. 1991;12:425–457.
  • Holford TR. Analysing the temporal effects of age, period and cohort. Stat Methods Med Res. 1992;1(3):317–337.
  • James WP. The fundamental drivers of the obesity epidemic. Obes Rev. 2008;9(Suppl 1):6–13.
  • Keith SW, Redden DT, Katzmarzyk PT, Boggiano MM, Hanlon EC, Benca RM, et al. Putative contributors to the secular increase in obesity: exploring the roads less traveled. Int J Obesity. 2006;30:1585–1594.
  • Kermack WO, McKendrick AG, McKinlay PL. Death-rates in Great Britain and Sweden. Some general regularities and their significance. Lancet. 1934 Mar;31:698–703.
  • Keyes KM, Li G. A comprehensive approach to age-period-cohort analysis [abstract]. Am J Epidemiol. 2008;167(11):S109.
  • Kivimaki M, Lawlor DA, Smith GD, Elovainio M, Jokela M, Keltikangas-Jarvinen L, et al. Substantial intergenerational increases in body mass index are not explained by the fetal overnutrition hypothesis: the Cardiovascular Risk in Young Finns Study. Am J Clin Nutr. 2007;86(5):1509–1514.
  • Kuczmarski RJ, Ogden CL, Grummer-Strawn LM, Flegal KM, Guo SS, Wei R, et al. CDC growth charts: United States. Adv Data. 2000;(314):1–27.
  • Kupper LL, Janis JM, Karmous A, Greenberg BG. Statistical age-period-cohort analysis: a review and critique. J Chronic Dis. 1985;38(10):811–830.
  • Last JM. A Dictionary of Epidemiology. 4th ed. Oxford University Press; 2001.
  • Lawlor DA, Timpson NJ, Harbord RM, Leary S, Ness A, McCarthy MI, et al. Exploring the developmental overnutrition hypothesis using parental-offspring associations and FTO as an instrumental variable. PLoS Med. 2008;5(3):e33.
  • Lawlor DA. The developmental origins of health and disease: where do we go from here? Epidemiology. 2008;19(2):206–208.
  • Leveille SG, Wee CC, Iezzoni LI. Trends in obesity and arthritis among baby boomers and their predecessors, 1971–2002. Am J Public Health. 2005;95(9):1607–1613.
  • Mason KO, Mason WM, Winsborough HH, Poole K. Some methodological issues in cohort analysis of archival data. American Sociological Review. 1973;38:242–258.
  • Mason WM, Fienberg SE. Cohort Analysis in Social Research: Beyond the Identification Problem. New York: Springer-Verlag; 1985.
  • McNally RJ, Alexander FE, Staines A, Cartwright RA. A comparison of three methods of analysis for age-period-cohort models with application to incidence data on non-Hodgkin’s lymphoma. Int J Epidemiol. 1997;26(1):32–46.
  • Mokdad AH, Ford ES, Bowman BA, Dietz WH, Vinicor F, Bales VS, et al. Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA. 2003;289(1):76–79.
  • National Center for Health Statistics. Plan and Operation of the Health and Nutritional Examination Survey, United States: 1971–1973. Programs and Collection Procedure. Vital and Health Statistics. 1978;10:1–46.
  • National Center for Health Statistics. Plan and Operation of the Third National Health and Nutritional Examination Survey, 1988–1994. Vital and Health Statistics. 1994;32:1–407.
  • National Center for Health Statistics. Analytic and Reporting Guidelines: The National Health and Nutrition Examination Survey (NHANES). 2005. Available at: http://www.cdc.gov/nchs/data/nhanes/nhanes_03_04/nhanes_analytic_guidelines_dec_2005.pdf. Accessed May 2, 2009.
  • O’Brien RM. Age Period Cohort Characteristic Models. Social Science Research. 2000;29:123–139.
  • Ogden CL, Carroll MD, Curtin LR, McDowell MA, Tabak CJ, Flegal KM. Prevalence of overweight and obesity in the United States, 1999–2004. JAMA. 2006;295(13):1549–1555.
  • Popkin BM, Conde W, Hou N, Monteiro C. Is there a lag globally in overweight trends for children compared with adults? Obesity (Silver Spring). 2006;14(10):1846–1853.
  • Prentice AM, Jebb SA. Fast foods, energy density and obesity: a possible mechanistic link. Obes Rev. 2003;4(4):187–194.
  • Preston SH, Wang H. Sex mortality differences in the United States: the role of cohort smoking patterns. Demography. 2006;43(4):631–646.
  • Robertson C, Boyle P. Age, period and cohort models: The use of individual records. Statistics in Medicine. 1986;5:527–538.
  • Robertson C, Gandini S, Boyle P. Age-Period-Cohort Models: A Comparative Study of Available Methodologies. Journal of Clinical Epidemiology. 1999;52:569–583.
  • Rodgers WL. Estimable Functions of Age, Period, and Cohort Effects. American Sociological Review. 1982;47(6):774–787.
  • Ryder N. The Cohort as a Concept in the Study of Social Change. American Sociological Review. 1965;30(6):843–861.
  • Selvin S. Statistical analysis of epidemiologic data. New York: Oxford University Press; 1996.
  • Shahpar C, Li G. Homicide mortality in the United States, 1935–1994: age, period, and cohort effects. Am J Epidemiol. 1999;150(11):1213–1222.
  • Syme SL, Yen IH. Social Epidemiology and Medical Sociology: Different Approaches to the Same Problem. In: Bird CE, Conrad P, Fremont AM, editors. Handbook of Medical Sociology. Upper Saddle River, New Jersey: Prentice Hall; 2000.
  • Tukey JW. Exploratory Data Analysis. Reading, MA: Addison-Wesley Publishing Company; 1977.
  • Tutt LWG. The Mortality Aspect of Population Projections. Transactions of the Faculty of Actuaries. 1953;21:3–50.
  • Utz R. Obesity in America, 1960–2000: Is it an Age, Period, or Cohort Phenomenon? Population Association of America; Philadelphia, PA: 2005.
  • Winship C, Harding DJ. A Mechanism-Based Approach to the Identification of Age-Period-Cohort Models. Sociological Methods & Research. 2008;36(3):362–401.
  • Yang Y, Schulhofer-Wohl S, Fu WJ, Land KC. The Intrinsic Estimator for Age-Period-Cohort Analysis: What it is and How to Use it? American Journal of Sociology. 2008;113:1697–1736.
  • Zheng T, Holford TR, Chen Y, Ma JZ, Mayne ST, Liu W, et al. Time trend and age-period-cohort effect on incidence of bladder cancer in Connecticut, 1935–1992. Int J Cancer. 1996;68(2):172–176.