

# A Varying-Coefficient Cox Model for the Effect of Age at a Marker Event on Age at Menopause

^{1}Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

^{2}Department of Neurology, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

^{3}Department of Epidemiology, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

## Summary

It is of recent interest in reproductive health research to investigate the validity of a marker event for the onset of menopausal transition and to estimate age at menopause using age at the marker event. We propose a varying-coefficient Cox model to investigate the association between age at a marker event, defined as a specific bleeding pattern change, and age at menopause, where both events are subject to censoring and their association varies with age at the marker event. Estimation proceeds using the regression spline method. The proposed method is applied to the Tremin Trust data to evaluate the association between age at onset of the 60-day menstrual cycle and age at menopause. The performance of the proposed method is evaluated using a simulation study.

**Keywords:**

*B*-splines, Cox regression, Generalized cross-validation, Marker events, Nonparametric regression, Survival analysis, Time-dependent covariates

## 1. Introduction

It is of recent interest in female reproductive aging research to identify marker events for the onset of the menopausal transition, and to investigate their use for estimating age at menopause. Menopause is defined as the final menstrual period (FMP), with the FMP confirmed after at least 12 months of amenorrhea. Although several marker events based on menstrual bleeding criteria have been proposed (Mitchell, Woods, and Mariella, 2000; Soules et al., 2001; Taffe and Dennerstein, 2001), there is a lack of appropriate statistical models to formally evaluate their validity due to the complex nature of the data.

This article is motivated by the analysis of the Tremin Trust data (Treloar et al., 1967). This data set provides a unique opportunity to evaluate the association between age at menopause and ages at onset of the marker events that reproductive health experts have proposed based on bleeding criteria. The study enrolled 1,997 white college students at the University of Minnesota between 1935 and 1939 and followed them for up to 40 years through their reproductive lives. Study participants used menstrual diary cards to record the days on which bleeding was experienced. Only limited covariate information was available in the data.

Lisabeth et al. (2004) analyzed a subset of 562 women from the original Tremin Trust cohort who were aged 25 or younger at enrollment and still participating in the study at age 35. A total of 282 women experienced the 60-day cycle marker event; the median age at the 60-day cycle marker was 48.7 years. A total of 193 women experienced natural menopause; the median age at menopause was 51.7 years. Nine women experienced menopause without having the 60-day cycle marker, and 271 women were censored for both the 60-day cycle and menopause events. These 271 women, who were censored for the marker event, were among the 369 women censored for menopause, and their censoring times for the two events were the same. The median age at enrollment was 19 years; age at menarche had a median of 12 years and ranged from 9 to 18 years; and the length of follow-up ranged from 9 to 39 years, with a median of 27 years. The descriptive analysis of Lisabeth et al. (2004) suggests that the 60-day cycle might be a useful marker for predicting age at menopause.
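As a quick consistency check, the counts above fit together as follows (simple bookkeeping from the reported numbers, not an additional result):

$$
\begin{aligned}
\text{censored for the marker} &= 562 - 282 = 280,\\
\text{censored for both events} &= 280 - 9 = 271,\\
\text{censored for menopause} &= 562 - 193 = 369,\\
\text{menopause after an observed marker} &= 193 - 9 = 184,\\
\text{observed marker, censored for menopause} &= 282 - 184 = 98 = 369 - 271.
\end{aligned}
$$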

To explore the relationship between age at the 60-day cycle marker and age at menopause, we first restricted attention to the 282 women with an observed marker event and classified them into groups by age at onset of the marker event, [35, 40), [40, 43), and so on, similar to the grouping of Lisabeth et al. (2004). Within each marker age group, we estimated the quartiles of age at menopause using the Kaplan–Meier method and displayed them as a boxplot; these boxplots are given in Figure 1, with the number of women in each marker age group shown above the corresponding boxplot. Figure 1 shows that the relationship between age at the 60-day cycle marker and age at menopause is complicated and varies with age at the 60-day cycle marker. This analysis, however, is only exploratory and may be quantitatively misleading, because women who were censored for the marker event were excluded. In other words, the complete case analysis makes the strong assumption that the marker event is missing completely at random (see, e.g., Little and Rubin, 2002). This article proposes a statistical model that handles censored markers without making this assumption; see Section 6 for further discussion.
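A minimal Python sketch of this exploratory summary, assuming NumPy arrays of marker ages, menopause ages (observed or censored), and menopause indicators for the women with an observed marker. The function names are hypothetical, and each quartile is read off the Kaplan–Meier curve as the first event time at which the estimated survival falls to 1 − p or below (`nan` if it never does); other quantile conventions exist.

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier estimate: distinct event times and S(t) just after each."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    event_times = np.unique(time[event == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(time >= t)                  # still under observation at t
        deaths = np.sum((time == t) & (event == 1))  # menopauses observed at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

def km_quantile(time, event, p, tol=1e-12):
    """First event time at which S(t) <= 1 - p; nan if the curve never gets there."""
    t, s = kaplan_meier(time, event)
    hits = np.nonzero(s <= 1.0 - p + tol)[0]
    return float(t[hits[0]]) if hits.size else float("nan")

def quartiles_by_marker_group(marker_age, meno_age, meno_event, breaks):
    """KM quartiles of age at menopause within marker-age groups [a, b)."""
    marker_age, meno_age, meno_event = map(np.asarray, (marker_age, meno_age, meno_event))
    out = {}
    for a, b in zip(breaks[:-1], breaks[1:]):
        in_group = (marker_age >= a) & (marker_age < b)
        out[(a, b)] = [km_quantile(meno_age[in_group], meno_event[in_group], p)
                       for p in (0.25, 0.50, 0.75)]
    return out
```

As the text stresses, such a summary silently drops every woman whose marker is censored, which is exactly what the model below avoids.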


The first scientific aim is to quantify the association between age at the 60-day marker and age at menopause using a statistical model. The second, of particular interest to clinicians and to women themselves, is to estimate the distribution of age at menopause given age at onset of the 60-day cycle marker. For example, a woman who first experiences a 60-day cycle at age 40 would like to learn from her physician her predicted median age at menopause. Clinically, this information would help determine a woman’s need for continued contraception and whether to initiate interventions such as bone density screening.

Several approaches have been proposed for modeling intermediate marker events. Crowley and Hu (1977) analyzed the Stanford heart transplant data using the Cox partial likelihood method, treating transplant status, an intermediate marker event, as a time-dependent covariate. Lefkopoulou and Zelen (1995) and Nam and Zelen (2001) studied the same model from a different angle, which leads to a contingency table interpretation. For an overview of existing methods for handling intermediate marker events, see Kalbfleisch and Prentice (2002, Section 6.4). All of these authors assumed a constant regression coefficient for the effect of the intermediate marker event (see Figure 2a). The results in Figure 1, however, suggest that this assumption is not appropriate for the Tremin Trust data. We need to allow the regression coefficient of the onset of the 60-day marker event to vary with age at the marker event. We hence consider a varying-coefficient model.


Hastie and Tibshirani (1993) proposed general varying-coefficient models. In the Cox model setting, such models commonly assume that the regression coefficient *β*(·) of a time-dependent covariate is a function of follow-up time; see, e.g., Murphy and Sen (1991) and Marzec and Marzec (1997). Their model can be illustrated by replacing the dotted line (and dashed line) in Figure 2a by an arbitrary curve that need not be parallel to the log-baseline hazard function. In the Tremin Trust data, however, the interest lies in evaluating the effect of age at the 60-day cycle marker event on age at menopause as a function of age at the marker event, as suggested by Figure 1. Hence, it is natural and biologically more meaningful to let the regression coefficient *β*(·) of the time-dependent covariate indicating onset of the marker event be a function of age at the marker event rather than of follow-up time. The resulting regression coefficients are more directly interpretable in terms of the scientific questions posed by the Tremin Trust data. This model is illustrated in Figure 2b. Both situations are special cases of the general framework of varying-coefficient models for survival data of Hastie and Tibshirani (1993). The first has been investigated in detail, whereas the second has received limited attention. Parameters in both models can be estimated using the (penalized) partial likelihood method.

The model discussed in this article can also be regarded as falling within the general framework of the illness-death model of Joly et al. (2002). However, the estimation procedure of Joly et al. (2002) applies only to the case where the effect of the marker event varies with follow-up time, not with the time at onset of the marker event, which is our main interest.

The remainder of the article is organized as follows. We introduce in Section 2 a varying-coefficient Cox model for age at menopause, where the onset of the 60-day cycle marker is a time-dependent binary covariate and its coefficient is assumed to be a smooth function of the marker event age. We discuss in Section 3 an estimation procedure using regression splines. We analyze in Section 4 the Tremin Trust data, and conduct a simulation study in Section 5 to evaluate the performance of the proposed method, followed by concluding remarks in Section 6.

## 2. The Varying-Coefficient Model

Suppose the data consist of $n$ subjects. Let $Y_i$ be the observed time to the event of interest for the $i$th subject ($i = 1, \dots, n$), defined as the minimum of the survival time $T_i$ (e.g., age at menopause) and the censoring time $C_i$. We assume independent censoring. Let $\Delta_i$ be a censoring indicator, which takes value 1 if a failure is observed and 0 otherwise. Let $Z_i(t)$ be a time-dependent covariate.

Assume $\lambda_0(t)$ is the baseline hazard and $\lambda_i\{t \mid Z_i(t)\}$ is the hazard rate of the survival time to the endpoint event at $t$ given $Z_i(t)$. A standard Cox model with a time-dependent covariate has the following form:

$$\lambda_i\{t \mid Z_i(t)\} = \lambda_0(t)\exp\{\beta Z_i(t)\}. \qquad (1)$$

It is common to use (1) to model the effect of an intermediate marker event (Crowley and Hu, 1977; Kalbfleisch and Prentice, 2002).

In the Tremin Trust data, *t* is age, time to the endpoint event is age at menopause, and time to the marker event is age at the first occurrence of the 60-day cycle marker event. Model (1) assumes that the log relative risk comparing subjects who have experienced the marker event with those who have not is a constant *β*, regardless of the age at which the marker event occurred. However, the discussion in Section 1 suggests that in the Tremin Trust data the association between age at menopause and age at the marker event varies with age at the marker event.

Let $S_i$ be the age at the 60-day marker event for woman $i$. Define

$$Z_i(t) = \begin{cases} 1, & t \ge S_i,\\ 0, & t < S_i. \end{cases} \qquad (2)$$

Equivalently, $Z_i(t) = I(t \ge S_i)$, where $I(\cdot)$ is an indicator function. We extend model (1) to allow the association between age at menopause and age at the marker event to depend on age at the marker event $S_i = s$ as

$$\lambda_i\{t \mid Z_i(t), S_i = s\} = \lambda_0(t)\exp\{\beta(s) Z_i(t)\}, \qquad (3)$$

where $\beta(s)$ is an unknown smooth function.

The interpretation of models (1) and (3), and the difference between them, are illustrated on the log-hazard scale in Figures 2a and 2b by contrasting two subjects who experience the marker event at time 1 ($s_1$) and time 2 ($s_2$), respectively. Under the constant-coefficient Cox model (1), the first subject’s log hazard equals the log-baseline hazard before time 1 and shifts by $\beta$ from time 1 onward, while the second subject’s log hazard equals the log-baseline hazard before time 2 and shifts by the same amount $\beta$ from time 2 onward. Under the varying-coefficient Cox model (3), the two subjects’ log hazards also shift at times 1 and 2, respectively, but by different amounts $\beta(s_1)$ and $\beta(s_2)$. The lines are all parallel, reflecting the proportional hazards assumption. Note also that $Z_i(t)$ is observable at every $t$ at which subject $i$ is in the risk set, even if $S_i$ itself is not observed: for a woman still under follow-up at $t$ whose marker has not been observed by $t$, $Z_i(t) = 0$. Thus, $\beta(s)$ is estimable.
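To make the contrast concrete, the sketch below computes a subject's log hazard as a function of age under either model. The log-baseline hazard and both coefficient functions are hypothetical choices for illustration (the varying one borrows the 3 sin(π*s*) shape used in Section 5); the point is only that the jump at the marker age is the same for everyone under model (1) but depends on the marker age under model (3).

```python
import numpy as np

def log_hazard(t, s, log_baseline, beta):
    """log lambda(t) = log lambda_0(t) + beta(s) * I(t >= s) for marker age s.

    A constant beta gives model (1); a beta that varies with s gives model (3).
    """
    t = np.asarray(t, dtype=float)
    z = (t >= s).astype(float)                     # Z(t) = I(t >= s)
    return log_baseline(t) + beta(s) * z

# Hypothetical ingredients, for illustration only.
log_baseline = lambda t: -2.0 + 0.5 * t            # any log-baseline hazard
beta_constant = lambda s: 1.0                      # model (1)
beta_varying = lambda s: 3.0 * np.sin(np.pi * s)   # model (3)

t = np.linspace(0.0, 1.0, 11)
s1, s2 = 0.3, 0.6
# Shift relative to the log-baseline: 0 before the marker, beta(s) after it.
shift_1 = log_hazard(t, s1, log_baseline, beta_varying) - log_baseline(t)
shift_2 = log_hazard(t, s2, log_baseline, beta_varying) - log_baseline(t)
```

Under `beta_constant` the two shifts would be identical in size and differ only in where they start.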

If baseline covariates $\mathbf{X}_i$ are available, model (3) can easily be extended to incorporate them as

$$\lambda_i\{t \mid Z_i(t), S_i = s, \mathbf{X}_i\} = \lambda_0(t)\exp\{\beta(s) Z_i(t) + \boldsymbol{\gamma}'\mathbf{X}_i\}, \qquad (4)$$

where $\mathbf{X}_i$ is age at menarche in the Tremin Trust data. Because model (3) is a special case of model (4), we focus on model (4) in this article.

## 3. The Estimation Procedure

### 3.1 Estimation Using *B*-Splines

We estimate the nonparametric function *β*(*s*) by the regression spline method, approximating *β*(*s*) with a natural cubic *B*-spline basis. Let *K* be the number of interior knots. Knots are usually placed so that there are roughly equal numbers of observed data points between any two adjacent knots; this can be done by placing the *j*th interior knot at the 100*j*/(*K* + 1) percentile (*j* = 1, …, *K*) of the observed marker event times. Section 3.2 discusses selection of the number of knots *K* by generalized cross-validation.
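The knot placement rule can be sketched with NumPy as follows. This assumes NumPy's default linear interpolation when a percentile falls between observations and, following the choice described in Section 4, takes the two extreme observed marker ages as boundary knots; the helper names are hypothetical.

```python
import numpy as np

def interior_knots(observed_marker_ages, K):
    """Place the j-th interior knot at the 100 j / (K + 1) percentile, j = 1..K."""
    percents = 100.0 * np.arange(1, K + 1) / (K + 1)
    return np.percentile(observed_marker_ages, percents)

def all_knots(observed_marker_ages, K):
    """Interior knots plus the two extreme observed values as boundary knots."""
    ages = np.asarray(observed_marker_ages, dtype=float)
    return np.concatenate([[ages.min()], interior_knots(ages, K), [ages.max()]])
```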

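Because a natural cubic spline is constrained to be linear beyond the two boundary knots (a cubic spline on these $K + 2$ knots has $K + 6$ degrees of freedom, and linearity beyond each boundary knot removes two), the function $\beta(s)$ can be parameterized using $K + 2$ natural cubic $B$-spline basis functions $B_k(s)$ ($k = 1, \dots, K + 2$) as

$$\beta(s) = \sum_{k=1}^{K+2} \theta_k B_k(s). \qquad (5)$$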
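The sketch below builds one basis for the space of natural cubic splines with given knots, using the standard truncated-power construction. It spans the same function space as the *K* + 2 natural cubic *B*-spline basis functions in (5) but is not the *B*-spline basis itself (*B*-splines are numerically better conditioned and would be preferred in a real fit); the knots and coefficients in the usage lines are hypothetical.

```python
import numpy as np

def natural_cubic_basis(s, knots):
    """Natural cubic spline basis evaluated at s, truncated-power form.

    knots: increasing knots xi_1 < ... < xi_M, boundary knots included, so
    M = K + 2 for K interior knots. Returns an array of shape (len(s), M).
    Every column is cubic between knots and linear beyond xi_1 and xi_M.
    """
    s = np.atleast_1d(np.asarray(s, dtype=float))
    xi = np.asarray(knots, dtype=float)
    M = xi.size

    def d(k):
        # d_k(s) = [(s - xi_k)^3_+ - (s - xi_M)^3_+] / (xi_M - xi_k)
        return (np.clip(s - xi[k], 0.0, None) ** 3
                - np.clip(s - xi[-1], 0.0, None) ** 3) / (xi[-1] - xi[k])

    columns = [np.ones_like(s), s]
    columns += [d(k) - d(M - 2) for k in range(M - 2)]
    return np.column_stack(columns)

# beta(s) = sum_k theta_k N_k(s): the analogue of (5) in this basis.
knots = np.array([0.0, 0.3, 0.6, 1.0])        # K = 2 interior knots
theta = np.array([0.5, -1.0, 2.0, 0.7])        # hypothetical coefficients
beta_values = natural_cubic_basis(np.linspace(0.0, 1.0, 5), knots) @ theta
```

The quadratic terms of the differences `d(k) - d(M - 2)` cancel beyond the last knot, which is what makes every column linear there.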

Replacing $\beta(s)$ by its $B$-spline approximation in equation (5), model (4) can be written as

$$\lambda_i\{t \mid \tilde{\mathbf{Z}}_i(t), \mathbf{X}_i\} = \lambda_0(t)\exp\{\boldsymbol{\theta}'\tilde{\mathbf{Z}}_i(t) + \boldsymbol{\gamma}'\mathbf{X}_i\}, \qquad (6)$$

where $\boldsymbol{\theta} = (\theta_1, \dots, \theta_{K+2})'$ and $\tilde{\mathbf{Z}}_i(t) = \{B_1(S_i)Z_i(t), \dots, B_{K+2}(S_i)Z_i(t)\}'$. Note that $\tilde{\mathbf{Z}}_i(t)$ is always observable during follow-up because $Z_i(t)$ is. Specifically, if the marker event is observed at $S_i$ for the $i$th woman during follow-up, then $\tilde{\mathbf{Z}}_i(t) = \mathbf{0}$ if $t < S_i$ and $\tilde{\mathbf{Z}}_i(t) = \{B_1(S_i), \dots, B_{K+2}(S_i)\}'$ if $t \ge S_i$. If the marker event is not observed, i.e., $S_i$ is censored, then $\tilde{\mathbf{Z}}_i(t) = \mathbf{0}$ at every observed follow-up time $t$.

Model (6) is now a standard Cox proportional hazards model with the time-dependent covariate vector $\tilde{\mathbf{Z}}_i(t)$ and the baseline covariate vector $\mathbf{X}_i$. Thus, the parameters $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ can be estimated by the partial likelihood method. Denote the maximum partial likelihood estimators of $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ by $(\widehat{\boldsymbol{\theta}}, \widehat{\boldsymbol{\gamma}})$ and their covariance estimators by $\text{cov}(\widehat{\boldsymbol{\theta}})$ and $\text{cov}(\widehat{\boldsymbol{\gamma}})$. The nonparametric function $\beta(s)$ can then be estimated by

$$\widehat{\beta}(s) = \sum_{k=1}^{K+2}\widehat{\theta}_k B_k(s). \qquad (7)$$

Pointwise confidence intervals for $\beta(s)$ can be constructed using the variance estimator $\text{var}\{\widehat{\beta}(s)\} = \mathbf{B}(s)'\,\text{cov}(\widehat{\boldsymbol{\theta}})\,\mathbf{B}(s)$, where $\mathbf{B}(s) = \{B_1(s), \dots, B_{K+2}(s)\}'$.
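Given the fitted coefficients, the estimate of $\beta(s)$ and a 95% pointwise interval follow from a few matrix operations. A sketch, assuming `theta_hat` and `cov_theta` come from whatever Cox software was used to fit (6) and that `B_s` holds the basis evaluated at the ages of interest; `z = 1.96` gives normal-theory 95% limits, and the function name is hypothetical.

```python
import numpy as np

def beta_hat_with_ci(B_s, theta_hat, cov_theta, z=1.96):
    """Spline estimate of beta(s) and pointwise normal-theory CI.

    B_s: (m, K+2) matrix whose rows are B(s)' at m evaluation points.
    theta_hat, cov_theta: fitted spline coefficients and their covariance.
    """
    B_s = np.atleast_2d(np.asarray(B_s, dtype=float))
    est = B_s @ theta_hat
    var = np.einsum("ij,jk,ik->i", B_s, cov_theta, B_s)  # B(s)' cov(theta) B(s) per row
    se = np.sqrt(var)
    return est, est - z * se, est + z * se
```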

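As discussed in Section 1, both clinicians and women themselves are interested in estimating age at menopause for a woman who has experienced the 60-day marker event at a given age. We first estimate the baseline cumulative hazard function $\Lambda_0(t)$ using the Breslow estimator,

$$\widehat{\Lambda}_0(t) = \sum_{i=1}^{n} \frac{\Delta_i I(Y_i \le t)}{\sum_{j=1}^{n} I(Y_j \ge Y_i)\exp\{\widehat{\boldsymbol{\theta}}'\tilde{\mathbf{Z}}_j(Y_i) + \widehat{\boldsymbol{\gamma}}'\mathbf{X}_j\}}. \qquad (8)$$

Then, the survival function for menopause given both age at the marker event $S = s$ and covariates $\mathbf{X} = \mathbf{x}$ can be estimated by

$$\widehat{S}(t \mid s, \mathbf{x}) = \exp\left[-\int_0^t \exp\{\widehat{\beta}(s) z(u) + \widehat{\boldsymbol{\gamma}}'\mathbf{x}\}\, d\widehat{\Lambda}_0(u)\right], \qquad (9)$$

where $z(u) = I(u \ge s)$.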
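A direct NumPy sketch of the Breslow estimator (8) and the predicted survival function (9), assuming a single baseline covariate (age at menarche, as in Section 4) and times measured on the analysis scale. Unobserved markers are coded with `marker_age = np.inf` so that $Z_j(t) = 0$ throughout, `beta_at_marker` holds $\widehat{\beta}(S_j)$ for observed markers (0 otherwise), and tied event times are handled in the Breslow way by dividing the number of events by a single risk-set sum. The function names are hypothetical.

```python
import numpy as np

def breslow(Y, delta, marker_age, beta_at_marker, x, gamma):
    """Breslow estimate (8) of Lambda_0 at the distinct observed event times."""
    Y, delta = np.asarray(Y, float), np.asarray(delta, int)
    marker_age, beta_at_marker, x = (np.asarray(a, float)
                                     for a in (marker_age, beta_at_marker, x))
    event_times = np.unique(Y[delta == 1])
    increments = []
    for t in event_times:
        at_risk = Y >= t
        z = (t >= marker_age).astype(float)            # Z_j(t) = I(t >= S_j)
        risk_score = np.exp(beta_at_marker * z + gamma * x)
        d = np.sum((Y == t) & (delta == 1))            # events tied at t
        increments.append(d / np.sum(risk_score[at_risk]))
    return event_times, np.cumsum(increments)

def predicted_survival(t_grid, event_times, cumhaz, beta_s, s, gamma, x):
    """Estimate (9): survival to time t for marker age s and covariate x."""
    dLambda = np.diff(np.concatenate([[0.0], cumhaz]))
    z = (event_times >= s).astype(float)               # z(u) = I(u >= s)
    H = np.cumsum(np.exp(beta_s * z + gamma * x) * dLambda)
    idx = np.searchsorted(event_times, t_grid, side="right") - 1
    H_t = np.where(idx >= 0, H[np.clip(idx, 0, None)], 0.0)
    return np.exp(-H_t)
```

Percentiles such as those reported later can be read off the returned curve as the first time at which it drops to 1 − p or below.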

### 3.2 Estimation of the Number of Knots

An advantage of the use of a regression spline for estimating the nonparametric function *β*(*s*) is its computational simplicity. However, this method requires estimation of the number of knots. For uncensored data, cross-validation (CV) and generalized cross-validation (GCV) are commonly used; see, e.g., Hastie and Tibshirani (1990). For survival data, O’Sullivan (1988) proposed CV and GCV for choosing the smoothing parameter for the smoothing spline estimator assuming that the baseline cumulative hazard function
${\mathrm{\Lambda}}_{0}(t)={\int}_{0}^{t}{\lambda}_{0}(u)du$ is known. We extend O’Sullivan’s method to choose the number of knots in the regression spline setting and account for the fact that Λ_{0}(*t*) is unknown and is estimated.

We first consider the case when $\Lambda_0(t)$ is known. Following O’Sullivan (1988), under model (6) with a given number of knots $K$, if $\Lambda_0(t)$ is a known function, then the likelihood function of $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ is available and can be maximized using an iteratively reweighted least squares algorithm. If the estimators of $(\boldsymbol{\theta}, \boldsymbol{\gamma})$ at the $l$th iteration are $(\widehat{\boldsymbol{\theta}}_{(l)}, \widehat{\boldsymbol{\gamma}}_{(l)})$, the working weight $w_i$ and the working dependent variable $y_i$ for subject $i$ can then be written as

One calculates $(\widehat{\boldsymbol{\theta}}_{(l+1)}, \widehat{\boldsymbol{\gamma}}_{(l+1)})$ by minimizing $\sum_{i=1}^{n} w_i\{y_i - \boldsymbol{\theta}'\tilde{\mathbf{Z}}_i(Y_i) - \boldsymbol{\gamma}'\mathbf{X}_i\}^2$. Let $\tilde{\mathbf{X}}_i = \{\tilde{\mathbf{Z}}_i(Y_i)', \mathbf{X}_i'\}'$, and let $\tilde{\mathbf{X}}$ be the matrix whose $i$th row is $\tilde{\mathbf{X}}_i'$. Denote the working dependent variable, the working weight matrix, and the predicted value vector at convergence by $\widehat{\mathbf{y}} = (\widehat{y}_1, \dots, \widehat{y}_n)'$, $\widehat{\mathbf{W}} = \text{diag}(\widehat{w}_1, \dots, \widehat{w}_n)$, and $\widehat{\mathbf{f}} = (\widehat{f}_1, \dots, \widehat{f}_n)'$. Then $\widehat{\mathbf{f}} = \tilde{\mathbf{X}}(\tilde{\mathbf{X}}'\widehat{\mathbf{W}}\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}'\widehat{\mathbf{W}}\widehat{\mathbf{y}} = \widehat{\mathbf{H}}\widehat{\mathbf{y}}$, where $\widehat{\mathbf{H}}$ is the linearized hat matrix. The GCV, which is a function of the number of knots $K$, is given by

where $\overline{h}$ is the average of the diagonal elements of $\widehat{\mathbf{H}}$, the so-called mean leverage.
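The sketch below assumes specific forms for the working quantities and for the GCV statistic (10); these are assumptions, not necessarily the authors' exact expressions. It uses Poisson-type working weights $w_i = \Lambda_0(Y_i)\exp(\eta_i)$ and working responses $y_i = \eta_i + (\Delta_i - w_i)/w_i$ with $\eta_i = \boldsymbol{\theta}'\tilde{\mathbf{Z}}_i(Y_i) + \boldsymbol{\gamma}'\mathbf{X}_i$, the usual choice when the baseline cumulative hazard is treated as known, and the common GCV form $n^{-1}\sum_i \widehat{w}_i(\widehat{y}_i - \widehat{f}_i)^2/(1 - \overline{h})^2$. The weighted least squares step, the linearized hat matrix, and the mean leverage follow the description above; the function names are hypothetical.

```python
import numpy as np

def irls_known_baseline(Xt, delta, Lambda0_at_Y, n_iter=50, tol=1e-10):
    """IRLS for (theta, gamma) with Lambda_0 treated as known.

    Xt: (n, p) matrix whose i-th row is X~_i' = {Z~_i(Y_i)', X_i'}.
    Lambda0_at_Y: Lambda_0(Y_i), assumed > 0 for every subject.
    """
    Xt = np.asarray(Xt, dtype=float)
    delta = np.asarray(delta, dtype=float)
    Lambda0_at_Y = np.asarray(Lambda0_at_Y, dtype=float)
    coef = np.zeros(Xt.shape[1])
    for _ in range(n_iter):
        eta = Xt @ coef
        w = Lambda0_at_Y * np.exp(eta)               # working weights (assumed form)
        y = eta + (delta - w) / w                    # working response (assumed form)
        XtW = Xt.T * w                               # X~' W
        new = np.linalg.solve(XtW @ Xt, XtW @ y)     # weighted least squares step
        converged = np.max(np.abs(new - coef)) < tol
        coef = new
        if converged:
            break
    eta = Xt @ coef                                  # working quantities at convergence
    w = Lambda0_at_Y * np.exp(eta)
    y = eta + (delta - w) / w
    return coef, w, y

def gcv_score(Xt, w, y):
    """Assumed GCV form: mean of w_i (y_i - f_i)^2 over (1 - mean leverage)^2."""
    Xt = np.asarray(Xt, dtype=float)
    XtW = Xt.T * w
    H = Xt @ np.linalg.solve(XtW @ Xt, XtW)          # X~ (X~' W X~)^{-1} X~' W
    f = H @ y                                        # predicted values
    h_bar = np.mean(np.diag(H))                      # mean leverage
    return np.mean(w * (y - f) ** 2) / (1.0 - h_bar) ** 2
```

Calling `gcv_score` for each candidate *K*, with `Xt` rebuilt from that *K*'s basis but the same `Lambda0_at_Y`, gives the common-baseline comparison used in the modified procedure below.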

We now consider the case when the baseline cumulative hazard $\Lambda_0(t)$ is unknown and is estimated by the Breslow estimator (8). O’Sullivan (1988) suggested calculating the Breslow estimator of $\Lambda_0(\cdot)$ for each $K$ and plugging it into (10) as if it were known. However, this plug-in procedure ignores the fact that different choices of $K$ yield different estimators of $\Lambda_0(t)$, whereas the GCV criterion presumes that the same true baseline hazard underlies every $K$. We therefore propose a modified procedure that accounts for this.

First, a series of Cox models (6) is fitted over a range of numbers of interior knots $K$; we used $K = 1, \dots, 20$ in the analysis of the Tremin Trust data. For each $K$, the cumulative baseline hazard estimator $\widehat{\Lambda}_0(t; K)$ and the $B$-spline estimator $\widehat{\beta}(s; K)$ are calculated and plugged into equation (10) to calculate GCV($K$). Note that different baseline hazard estimators are used for different $K$ at this step. We then select the $K$ that minimizes GCV($K$), call it $K_*$, and obtain the corresponding baseline hazard estimator $\widehat{\Lambda}_0(t; K_*)$. At the next step, we replace the true $\Lambda_0(t)$ by $\widehat{\Lambda}_0(t; K_*)$, treat it as fixed and known, recalculate the GCV statistic (10) with the least squares procedure above for each candidate $K$ (1–20 in our analysis), and select the updated $K_*$ that minimizes GCV($K$). Here a common $\widehat{\Lambda}_0(t; K_*)$ is used to calculate GCV for all candidate values of $K$. The procedure is repeated with the newly updated common $\widehat{\Lambda}_0(t; K_*)$ until the $K_*$ chosen at the current step equals the $K_*$ from the previous step. The CV statistic can be calculated similarly.
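The iteration can be written as a short loop. In this sketch, `fit_baseline(K)` and `gcv(K, Lambda0)` are hypothetical hooks standing in for the user's own Cox fit of (6) and for the GCV computation (10) with a fixed baseline; `max_rounds` is a safeguard added here, since the procedure as described does not say what to do if the selected *K* oscillates.

```python
def select_num_knots(candidates, fit_baseline, gcv, max_rounds=50):
    """Modified GCV choice of the number of interior knots K.

    fit_baseline(K): Breslow estimate of Lambda_0 from the fit of (6) with K knots.
    gcv(K, Lambda0): GCV(K) computed with Lambda0 held fixed and treated as known.
    """
    candidates = list(candidates)
    # Step 1: each K is scored with its own baseline estimate.
    scores = {K: gcv(K, fit_baseline(K)) for K in candidates}
    K_star = min(scores, key=scores.get)
    for _ in range(max_rounds):
        common = fit_baseline(K_star)           # common baseline from current K*
        scores = {K: gcv(K, common) for K in candidates}
        K_new = min(scores, key=scores.get)
        if K_new == K_star:                     # choice unchanged: stop
            return K_star
        K_star = K_new
    return K_star                               # safeguard if the choice oscillates
```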

## 4. The Analysis of the Tremin Trust Data

We applied the proposed varying-coefficient Cox model to the Tremin Trust data. The goals were to investigate the relationship between age at menopause and age at the 60-day cycle marker event, and to estimate the distribution of age at menopause given any particular age at onset of the 60-day cycle marker. The data used in our analysis were the same as those used in Lisabeth et al. (2004), described in detail in Section 1. We used age 35 as the time origin.

For each woman, the data set contained the observed menopause age, which is the minimum of the age at menopause and the censoring age; a censoring indicator for age at menopause; a 60-day cycle marker event indicator; the age at the marker event, if it occurred during follow-up; and age at menarche.

Let $Z_i(t)$ be a time-dependent binary indicator for onset of the 60-day cycle marker event, and let the baseline covariate $X_i$ be age at menarche. We first fitted model (4) assuming a simple parametric form for $\beta(s)$, namely a cubic polynomial. The linear, quadratic, and cubic terms were all highly significant; the significance of even the highest-order term suggests that a third-order polynomial may not adequately describe the effect of age at the marker event. We then fitted the semiparametric varying-coefficient Cox model (4), estimating $\beta(s)$ nonparametrically by the $B$-spline method via the Cox model (6).

The method of Therneau and Grambsch (2000) was used to expand the data set for the time-dependent covariate $Z_i(t)$. Observed marker times were used to determine knot locations and to generate the natural cubic $B$-spline basis functions $B_k(s)$ used for estimating $\beta(s)$, with the two extreme observed marker times serving as the two boundary knots. The optimal number of interior knots selected by the GCV method of Section 3.2 is $K_{\text{optimal}} = 8$. The spline estimator of $\beta(s)$ and its 95% pointwise confidence interval are plotted in Figure 3. For illustration, we also approximated $\beta(s)$ by piecewise constants, $\beta(s) = \sum_{k=1}^{K+1}\beta_k I[s_{k-1} < s \le s_k]$, where $\{s_0, s_1, \dots, s_{K+1}\}$ is the set of knots including the boundary knots, and fitted $\lambda_i\{t \mid Z_i(t), X_i\} = \lambda_0(t)\exp\{\sum_{k=1}^{K+1}\beta_k I[s_{k-1} < S_i \le s_k] Z_i(t) + \gamma X_i\}$. The piecewise constant estimator of $\beta(s)$ using the age intervals [35, 38), [38, 40), etc., is superimposed in Figure 3. The $B$-spline and piecewise constant estimates of $\beta(s)$ agree well.
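The data preparation can be sketched as follows, assuming one record per woman with ages in years and age 35 as the time origin. Each woman's follow-up is split at an observed marker age into counting-process (start, stop] rows, in the spirit of the Therneau and Grambsch (2000) approach; the spline covariates of (6) on a row are then `z` times the basis evaluated at her marker age, and the piecewise-constant model uses the interval indicators below. The function names, the handling of a marker recorded before the origin or exactly at the menopause age, and the assignment of the smallest knot to the first interval are all assumptions.

```python
import numpy as np

def expand_start_stop(meno_age, meno_event, marker_age, marker_event, origin=35.0):
    """Counting-process rows (id, start, stop, event, z), times measured from origin.

    Follow-up is split at an observed marker age so that z = Z_i(t) is constant
    on each (start, stop] interval.
    """
    rows = []
    for i, (y, d, s, m) in enumerate(zip(meno_age, meno_event, marker_age, marker_event)):
        stop = y - origin
        if m and s <= origin:                   # marker before the origin: z = 1 throughout
            rows.append((i, 0.0, stop, int(d), 1))
        elif m and s < y:                       # marker during follow-up: split at s
            rows.append((i, 0.0, s - origin, 0, 0))
            rows.append((i, s - origin, stop, int(d), 1))
        else:                                   # marker censored (or recorded at/after y)
            rows.append((i, 0.0, stop, int(d), 0))
    return rows

def piecewise_columns(marker_age, z, knots):
    """Covariates I[s_{k-1} < S_i <= s_k] * z, k = 1..K+1 (knots include boundaries).

    The smallest knot s_0 is itself an observed marker age; it is assigned to
    the first interval here so that woman is not dropped.
    """
    knots = np.asarray(knots, dtype=float)
    cols = [(knots[k - 1] < marker_age <= knots[k]) or (k == 1 and marker_age == knots[0])
            for k in range(1, knots.size)]
    return np.array(cols, dtype=float) * z
```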

Figure 3. Estimated *β*(*s*) using the *B*-spline and the step function for the Tremin Trust data: solid line, estimated *β*(*s*) using the *B*-spline basis; dotted line, 95% CI; dashed line, estimated *β*(*s*) using piecewise constants.


The results in Figure 3 suggest that the 60-day cycle marker is strongly associated with age at menopause and that its effect varies with age at the 60-day cycle marker event. When age at the marker event is close to 35, however, the estimated *β*(*s*) does not differ significantly from zero, which suggests that a marker occurring around age 35 carries little information about age at menopause. The curve is mostly positive; it increases up to age 44 and decreases thereafter. Thus, before age 44, the association between age at menopause and age at the 60-day cycle marker becomes stronger as age increases: among women who first experience the 60-day cycle before 44, the later the onset of the 60-day cycle, the sooner a woman is likely to reach menopause. For example, consider two women, the first of whom experiences the 60-day cycle at age 39 and the second at age 42. Then the relative risk of menopause at any age after 42 for the second woman is $\exp\{\widehat{\beta}(42) - \widehat{\beta}(39)\} = \exp(4.1 - 2.2) = 6.7$ times that of the first woman (*p*-value < 0.0001).

The estimated *β*(*s*) curve decreases after age 44, indicating that after age 44 the association between age at menopause and age at the 60-day cycle marker becomes weaker as age increases: among women who first experience the 60-day cycle after 44, the later the onset of the 60-day cycle, the later a woman is likely to reach menopause. For example, consider two women, the first of whom experiences the 60-day cycle at age 48 and the second at age 51. Then the relative risk of menopause at any age after 51 for the second woman is $\exp\{\widehat{\beta}(51) - \widehat{\beta}(48)\} = \exp(1.9 - 3.2) = 0.27$ times that of the first woman (*p*-value < 0.0001). In other words, the relative risk of menopause at any age after 51 for the first woman is 1/0.27 = 3.7 times that of the second woman. The contrast is even larger between ages 48 and 51, when only the first woman has experienced the marker.
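The relative risks quoted above follow from the reported point estimates by simple exponentiation. A quick check, using the rounded $\widehat{\beta}$ values given in the text; the window comparison assumes both women have the same age at menarche.

```python
import math

# Rounded point estimates of beta_hat(s) quoted in the text.
beta_hat = {39: 2.2, 42: 4.1, 48: 3.2, 51: 1.9}

rr_42_vs_39 = math.exp(beta_hat[42] - beta_hat[39])   # about 6.69, reported as 6.7
rr_51_vs_48 = math.exp(beta_hat[51] - beta_hat[48])   # about 0.27
rr_48_vs_51 = 1.0 / rr_51_vs_48                       # about 3.67, reported as 3.7
# Between ages 48 and 51 only the first woman has Z(t) = 1, so her relative
# risk versus the second woman is exp(beta_hat(48)).
rr_window = math.exp(beta_hat[48])                    # about 24.5
rr_menarche = math.exp(-0.16)                         # about 0.85 per 1-year increase
```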

The estimated log relative risk for age at menarche was −0.16 per 1-year increase (*RR* = 0.85, *p*-value = 0.01); that is, a younger age at menarche is significantly associated with an earlier age at menopause. The estimated effect of age at the 60-day cycle marker was essentially unchanged by adjustment for age at menarche: the estimated curves of *β*(*s*) were almost identical with and without this adjustment.

Survival probabilities for age at menopause were calculated using equation (9) for several selected ages at the 60-day cycle marker event, with age at menarche set to 12, the median age at menarche. The estimated survival curves are plotted in Figure 4a, and the corresponding estimated percentiles are summarized in Table 1. These results are consistent with the pattern of the estimated *β*(*s*) curve in Figure 3: for a woman who experiences the 60-day cycle marker before age 44, the later she experiences the marker event, the earlier she is likely to reach menopause; for a woman who experiences it after age 44, the later she experiences the marker event, the later she is likely to reach menopause.



These results are biologically meaningful. Women who are observed to have a 60-day cycle before age 40 may belong to a subgroup of women who cycle infrequently, e.g., women with polycystic ovarian disease, for whom the pattern of change in menstrual bleeding with age may differ from that of other women. Additional research on this subgroup is needed. As a first step, we conducted a subgroup analysis restricted to women whose age at the marker event was 40 or older. The estimated curve (with the same knots) matched the curve in Figure 3 well except at the left end, within the first 2 or 3 years after age 40, where it was lower but still within the pointwise confidence band of the original curve.

Another intuitive quantity for both clinicians and midlife women is the number of years from onset of the marker event to menopause. Its percentiles are obtained by subtracting age at the marker event from the corresponding estimated percentiles of age at menopause; these are also given in Table 1. Survival curves for menopause after onset of the marker event are plotted in Figure 4b.

## 5. The Simulation Study

We conducted a simulation study to evaluate the performance of the natural cubic *B*-spline estimator of *β*(*s*) in model (3). Follow-up time was restricted to the interval [0, 1]. To roughly mimic the shape of the estimated $\widehat{\beta}(s)$ for the 60-day cycle marker event in Figure 3, we took the true *β*(*s*) = 3 sin(π*s*). The age at the marker event *S* was generated from a Weibull distribution with shape parameter 2 and scale parameter 1. The age at menopause *T* was generated from the model λ{*t* | *Z*(*t*)} = λ_{0}(*t*) exp{*β*(*s*)*Z*(*t*)}, where *Z*(*t*) = *I*(*t* ≥ *S*) and the baseline hazard λ_{0}(*t*) = 0.5*t*^{2}, which corresponds to the hazard of a Weibull distribution with shape parameter 2 and scale parameter 4. The censoring time *C* was generated as *C* = *U* · *I*(*U* ≤ 1) + *I*(*U* > 1), where *U* ~ Uniform(0, 2), so that the observed time *Y* = min(*T*, *C*) lies in [0, 1]. About 70% of observations were censored. Each simulated data set had sample size *n* = 500.
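A sketch of one simulated data set under this design, written from the stated ingredients: it uses the baseline hazard $\lambda_0(t) = 0.5t^2$ exactly as written (so $\Lambda_0(t) = t^3/6$) and generates $T$ by inverting the subject-specific cumulative hazard, whose slope changes at $S$. The function name and seed handling are hypothetical, and the realized censoring fraction will vary from run to run.

```python
import numpy as np

def simulate_dataset(n=500, seed=None):
    """One simulated data set (baseline hazard 0.5 t^2 as stated in the text)."""
    rng = np.random.default_rng(seed)
    beta = lambda s: 3.0 * np.sin(np.pi * s)        # true beta(s)
    Lambda0 = lambda t: t ** 3 / 6.0                 # integral of 0.5 u^2 over [0, t]
    Lambda0_inv = lambda u: np.cbrt(6.0 * u)

    S = rng.weibull(2.0, size=n)                     # marker age: shape 2, scale 1
    E = rng.exponential(1.0, size=n)                 # T solves Lambda_i(T) = E
    # Lambda_i(t) = Lambda0(t) for t < S, and
    # Lambda0(S) + exp(beta(S)) * (Lambda0(t) - Lambda0(S)) for t >= S.
    T = np.where(E < Lambda0(S),
                 Lambda0_inv(E),
                 Lambda0_inv(Lambda0(S) + (E - Lambda0(S)) * np.exp(-beta(S))))

    U = rng.uniform(0.0, 2.0, size=n)
    C = np.where(U <= 1.0, U, 1.0)                   # C = U I(U <= 1) + I(U > 1)
    Y = np.minimum(T, C)
    delta = (T <= C).astype(int)                     # 1 = menopause observed
    marker_observed = (S <= Y).astype(int)           # marker seen only before Y
    return Y, delta, S, marker_observed
```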

To reduce the computational burden, we chose the optimal number of interior knots for estimating $\beta(\cdot)$ by minimizing the mean square error of $\widehat{\beta}(\cdot)$, defined as $\text{MSE} = \sum_{j=1}^{J}\{\widehat{\beta}(t_j) - \beta(t_j)\}^2$, where $t_j$, $j = 1, \dots, J$, are equally spaced grid points in (0, 1), with the two boundary knots at 0 and 1. We used $J = 1000$. The MSE criterion suffices for our purpose.
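The knot-selection criterion used in the simulation can be sketched as below. The grid $t_j = j/(J+1)$ is one choice of equally spaced points in (0, 1) and is an assumption, as are the two hypothetical fitted curves in the usage lines; the criterion is a sum rather than an average, as defined above.

```python
import numpy as np

def grid_mse(beta_hat, beta_true, J=1000):
    """Sum over t_j = j / (J + 1), j = 1..J, of (beta_hat(t_j) - beta(t_j))^2."""
    t = np.arange(1, J + 1) / (J + 1)
    return float(np.sum((beta_hat(t) - beta_true(t)) ** 2))

beta_true = lambda t: 3.0 * np.sin(np.pi * t)
# Hypothetical fitted curves, e.g. from fits with different numbers of interior knots.
candidates = {1: lambda t: beta_true(t) + 0.1, 2: lambda t: beta_true(t) + 0.05}
best_K = min(candidates, key=lambda K: grid_mse(candidates[K], beta_true))
```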

We performed 100 simulations and analyzed each simulated data set with the varying-coefficient model (3), approximating *β*(·) by *B*-splines and fitting the Cox model (6). Each data set was analyzed both using all the data and using only the complete cases, i.e., those women who have experienced the marker event or menopause, respectively. The selected numbers of interior knots ranged from 1 to 6, with an average of 1.6. The average of the 100 estimates $\widehat{\beta}(\cdot)$ and the true curve *β*(·) are plotted in Figure 5, together with 95% pointwise confidence intervals for $\widehat{\beta}(\cdot)$ based on the empirical standard errors and on the average of the 100 estimated standard errors. Figure 5 suggests that the pointwise biases of the *B*-spline estimator $\widehat{\beta}(\cdot)$ are close to zero and that the pointwise model-based standard errors of $\widehat{\beta}(\cdot)$ agree well with their empirical counterparts, except near the boundaries. The average of the 100 complete case estimates $\widehat{\beta}(\cdot)$ is also plotted, and its bias is obvious.

## 6. Discussion

We have proposed a varying-coefficient Cox model to investigate the association between time to an intermediate marker event and time to a primary endpoint event, where the coefficient of the time-dependent marker indicator is a nonparametric function of time at the marker event and baseline covariate effects are modeled parametrically. We estimate the nonparametric regression function using *B*-splines, which turns the model into a standard Cox model that can be fitted by the usual partial likelihood method. We select the number of knots using a modification of O’Sullivan’s (1988) GCV method. Our simulation results suggest that the proposed method works well in finite samples.

The large sample theory for the partial likelihood-based regression spline estimator $\widehat{\beta}(s)$ is beyond the scope of this article. For discussions of such spline estimators in linear regression settings, see e.g., Huang (2003). An extension to the Cox model setting requires further research. Our simulation results provide empirical evidence that similar results are likely to hold for Cox regression with varying coefficients.

We calculated the pointwise confidence intervals for the nonparametric function *β*(*s*) in the analysis of the Tremin Trust data. One could also calculate the global confidence band. However, such a global confidence band is often found to be too wide to be practically useful.

We used all the data in our analysis. As in Figure 1, one might want to restrict the analysis to the subset of women who have experienced the marker event when fitting (3). However, such an analysis requires the strong assumption that age at the marker is missing completely at random, which does not hold for right-censored marker events, because whether a marker is observed depends on its own value relative to the censoring time. The estimators would be biased if this assumption were violated (Paik and Tsai, 1997). Our analysis does not require this assumption and adds no major computational complexity compared with the complete case analysis. Interestingly, for the Tremin Trust data the complete case analysis shows only a small bias relative to the analysis using all the data, whereas the bias is obvious in the simulation study (see, e.g., Figure 5).

## Acknowledgments

The work of Nan and Lin was supported in part by U.S. National Cancer Institute grant R01 CA76404. The work of Lisabeth and Harlow was supported in part by U.S. National Institute on Aging grant AG021543.

## References

- Crowley J, Hu M. Covariance analysis of heart transplant survival data. Journal of the American Statistical Association. 1977;72:27–36.
- Hastie TJ, Tibshirani RJ. *Generalized Additive Models*. London: Chapman & Hall; 1990.
- Hastie T, Tibshirani R. Varying-coefficient models. Journal of the Royal Statistical Society, Series B. 1993;55:757–796.
- Huang JZ. Local asymptotics for polynomial spline regression. Annals of Statistics. 2003;31:1600–1635.
- Joly P, Commenges D, Helmer C, Letenneur L. A penalized likelihood approach for an illness-death model with interval-censored data: Application to age-specific incidence of dementia. Biostatistics. 2002;3:433–443.
- Kalbfleisch JD, Prentice RL. *The Statistical Analysis of Failure Time Data*. 2nd ed. Hoboken, New Jersey: John Wiley & Sons; 2002.
- Lefkopoulou M, Zelen M. Intermediate clinical events, surrogate markers and survival. Lifetime Data Analysis. 1995;1:73–85.
- Lisabeth LD, Harlow SD, Gillespie B, Lin X, Sowers MF. Staging reproductive aging: A comparison of proposed bleeding criteria for the menopausal transition. Menopause. 2004;11:186–197.
- Little RJA, Rubin DB. *Statistical Analysis with Missing Data*. Hoboken, New Jersey: John Wiley & Sons; 2002.
- Marzec L, Marzec P. On fitting Cox’s regression model with time-dependent coefficients. Biometrika. 1997;84:901–908.
- Mitchell ES, Woods NF, Mariella A. Three stages of the menopausal transition from the Seattle Midlife Women’s Health Study: Toward a more precise definition. Menopause. 2000;7:334–349.
- Murphy SA, Sen PK. Time-dependent coefficients in a Cox-type regression model. Stochastic Processes and Their Applications. 1991;39:153–180.
- Nam CM, Zelen M. Comparing the survival of two groups with an intermediate clinical event. Lifetime Data Analysis. 2001;7:5–19.
- O’Sullivan F. Nonparametric estimation of relative risk using splines and cross-validation. SIAM Journal on Scientific and Statistical Computing. 1988;9:531–542.
- Paik MC, Tsai WY. On using the Cox proportional hazards model with missing covariates. Biometrika. 1997;84:579–593.
- Soules MR, Sherman S, Parrott EP, Rebar R, Santoro N, Utian W, Woods N. Executive summary: Stages of reproductive aging workshop. Fertility and Sterility. 2001;76:874–878.
- Taffe J, Dennerstein L. Menstrual patterns leading to final menstrual period. Menopause. 2001;9:32–40.
- Therneau TM, Grambsch PM. *Modeling Survival Data: Extending the Cox Model*. New York: Springer-Verlag; 2000.
- Treloar AE, Boynton RE, Behn BG, Brown BW. Variation of the human menstrual cycle through reproductive life. International Journal of Fertility. 1967;12:77–126.

A Varying-Coefficient Cox Model for the Effect of Age at a Marker Event on Age at Menopause. NIHPA Author Manuscripts. Jun 2005; 61(2):576.
