# Joint Modeling of Longitudinal and Cure-survival Data

## Abstract

This article presents semiparametric joint models to analyze longitudinal measurements and survival data with a cure fraction. We consider a broad class of transformations for the cure-survival model, which includes the popular proportional hazards structure and the proportional odds structure as special cases. We propose to estimate all the parameters using the nonparametric maximum likelihood estimators (NPMLE). We provide simple and efficient EM algorithms to implement the proposed inference procedure. The estimators are shown to be asymptotically normal and semiparametrically efficient. Finally, we demonstrate the good performance of the method through extensive simulation studies and a real-data application.

**Keywords:** Cure-survival data, Joint models, Longitudinal data, Nonparametric maximum likelihood, Random effects, Transformation models

## 1 Introduction

In cancer research, along with repeatedly measured biomarkers, it is often observed that a certain proportion of patients are cured, immune, or unsusceptible to the event of interest. For example, cancer can be considered cured if all metastasis-competent tumor cells are successfully removed by treatment, so that no recurrence will be observed. In these situations, because the survival curve never reaches zero owing to the cured patients, standard survival analyses based on non-cure models are ill-suited to inference on cure rates. The existence of a cure fraction can be evident from a stable plateau at the right tail of the survival curve, as in Figure 1 (a).

**Figure 1** (caption fragment): … *t*)) of the uncured subpopulation under *H*(*x*) = *x*. The solid curves are point estimates, and the dotted …

To analyze such cure-survival data, two classes of cure rate models are commonly used: the mixture cure model and the promotion time cure model. The mixture cure model rests on the idea that the underlying population is a mixture of two subpopulations, the cured and the uncured, with respective probabilities *p*_{c}(**Z**_{i}) and 1 − *p*_{c}(**Z**_{i}); that is, *S*_{pop}(*t*|**Z**_{i}) = *p*_{c}(**Z**_{i}) + {1 − *p*_{c}(**Z**_{i})}*S*_{uc}(*t*|**Z**_{i}), where **Z**_{i} is the vector of covariates and *S*_{uc}(*t*|**Z**_{i}) is the conditional survival function of the uncured population (Berkson and Gage, 1952; Laska and Meisner, 1992; Maller and Zhou, 1996; Kuk and Chen, 1992; Sy and Taylor, 2000; Lu and Ying, 2004). It is assumed that all patients in the uncured subpopulation will eventually experience the event, while those in the cured subpopulation never will. Lu and Ying (2004) considered a class of transformation models for the event time in which generalized estimating equations were used for parameter estimation, and the asymptotic properties were established by the usual counting process and its associated martingale theory. However, their linear transformation approach was limited to time-independent covariates. Therefore, as an alternative to the mixture cure models, we focus on the promotion time cure model, which can handle transformations with time-dependent covariates when transformations are needed, for example, when violation of the proportional hazards assumption is of concern.

The promotion time cure model has been proposed under the biological assumption that a patient has *N* metastatic tumor cells remaining after treatment (Yakovlev et al., 1996; Tsodikov, 1998; Chen et al., 1999; Tsodikov et al., 2003) in cancer clinical trials. Let *N*_{i} be the number of metastatic cancerous cells of the *i*th patient, which is an unobservable latent variable following a Poisson distribution with mean π(**Z**_{i}). We denote the time for the *k*th metastatic cancer cell to produce a detectable tumor (promotion time) by $\tilde{T}_{k}$ (*k* = 1, …, *N*_{i}) and assume that, conditional on *N*_{i}, the $\tilde{T}_{k}$'s are independent and identically distributed (iid) with a common distribution function *F*(*t*). Viewing *F*(*t*) = 1 − *S*_{uc}(*t*), it can be interpreted similarly to the distribution function of the uncured patients in the mixture cure model. Then the time to relapse of cancer for the *i*th patient, defined by $T_{i} = \min\{\tilde{T}_{1}, \ldots, \tilde{T}_{N_{i}}\}$, has a survival function of the form

$$S(t \mid \mathbf{Z}_{i}) = \exp\{-\pi(\mathbf{Z}_{i})\,F(t)\}, \qquad (1)$$

where π(**Z**_{i}) is a known link function. In the promotion time cure model (1), the survival function is integrated into one formulation regardless of cured or uncured status, and we can see that (1) retains the proportional hazards structure when π(**Z**_{i}) = exp(β^{T}**Z**_{i}). Moreover, if the regression coefficients β include an intercept term, say β_{0}, the baseline cumulative hazard function is equal to exp(β_{0})*F*(*t*), which implies that model (1) becomes the Cox proportional hazards model with a bounded baseline cumulative hazard. If cured patients exist in the population, the survival rate at *t* = ∞ can naturally be interpreted as the cure rate, i.e., the cure rate is *S*(∞|**Z**_{i}) = exp{−π(**Z**_{i})} ≠ 0, leading to an improper survival function.
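The latent-cell construction behind the promotion time cure model can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes a hypothetical value π(**Z**_{i}) = 1.2 and exponential promotion times, *F*(*t*) = 1 − exp(−*t*), and compares the simulated survival of *T*_{i} = min of the promotion times with exp{−π(**Z**_{i})*F*(*t*)}. The helper names are illustrative only.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw from Poisson(mean) by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def relapse_time(rng, pi_z):
    """One relapse time: the minimum of N iid Exp(1) promotion times, N ~ Poisson(pi_z).

    Returns math.inf when N = 0, i.e. the subject is cured.
    """
    n_cells = poisson_draw(rng, pi_z)
    if n_cells == 0:
        return math.inf
    return min(rng.expovariate(1.0) for _ in range(n_cells))


def empirical_survival(times, t):
    """Fraction of simulated subjects still event-free at time t."""
    return sum(1 for x in times if x > t) / len(times)


def model_survival(pi_z, t):
    """Population survival exp{-pi(Z) F(t)} with the assumed F(t) = 1 - exp(-t)."""
    return math.exp(-pi_z * (1.0 - math.exp(-t)))


rng = random.Random(2013)
pi_z = 1.2  # hypothetical value of pi(Z_i), chosen only for illustration
times = [relapse_time(rng, pi_z) for _ in range(50_000)]
cure_fraction = sum(1 for x in times if math.isinf(x)) / len(times)
```

Under these assumptions, `cure_fraction` should sit close to exp(−1.2), the plateau of the improper survival function, and `empirical_survival(times, t)` should track `model_survival(pi_z, t)` at any finite *t*.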

Some of these curable diseases are associated with longitudinal biomarkers, and it is often of interest to model these two types of data as outcomes simultaneously. We therefore propose a joint model to analyze longitudinal and survival data with a cure fraction. While a great deal of work has been done on joint modeling of longitudinal and cure-survival data based on mixture cure models (Law et al., 2002; Yu et al., 2004, 2008), there is scant literature on joint modeling based on promotion time cure models. For a more detailed review of joint mixture cure models, see Yu et al. (2004). Here we instead provide a brief review of the alternative approach that we adopt, the joint promotion time cure model. Brown and Ibrahim (2003) and Chen et al. (2004) proposed joint promotion time cure models with emphasis on two different types of longitudinal models. To model immune response, Brown and Ibrahim (2003) proposed a longitudinal model with a point mass at zero whose probability changed over time and across subjects. Chen et al. (2004), on the other hand, considered the true immune response to be unobservable and adopted a longitudinal model in the context of measurement error. Both used a piecewise constant function to estimate the baseline distribution *F*(*t*). For inference, they used Bayesian approaches via the Gibbs sampler and Markov chain Monte Carlo sampling, which may be straightforward ways to proceed but are computationally intensive compared to frequentist analyses. Therefore, we propose a joint cure model based on a frequentist approach to offset the complication brought by relaxing the functional form of the baseline distribution *F*(*t*) as well as the proportionality assumption.

The objective of this article is to present a flexible joint promotion time cure model based on frequentist inference, which is a new approach relative to the existing joint promotion time cure models. To account for the long-term plateau at the tail of the survival distribution resulting from the existence of cured patients, we propose a broad class of transformed promotion time cure models, which integrate the popular proportional hazards and proportional odds cure models into one general form. Inference procedures using nonparametric maximum likelihood estimation (NPMLE) are developed, a simple and efficient algorithm is provided for implementation, and the new joint cure model is illustrated with the colorectal cancer data from the Health Professionals Follow-up Study (HPFS). The proposed work advances existing joint cure models in: 1) flexibility through a nonparametric baseline distribution function and transformation models; 2) extended ability to handle time-varying covariates; and 3) well-established asymptotic properties of the NPMLEs.

## 2 Joint Transformation Models

### 2.1 Joint Models

Let *Y*(*t*) be the longitudinal measurement at time *t*, *T* be the time to the survival event, and $\mathcal{Z}$ = {*Z*(*t*); *t* ≥ 0} be the covariate process, where *Z*(*t*) is the vector of external covariates at time *t*, possibly time-varying. We introduce latent random effects to account for the correlation between the longitudinal and survival components on the same subject. In particular, let *b* denote the subject-specific random effects following a multivariate normal distribution with mean zero and covariance matrix Σ_{b}. We further assume that *Y*(*t*) and *T* are independent, conditional on $\mathcal{Z}$ and *b*. Then the proposed joint model for the longitudinal data *Y*(·) and the population survival function of *T* with a cure fraction is given by

$$\begin{aligned} Y(t) &= \alpha^{T} Z_{1}(t) + b^{T}\tilde{Z}_{1}(t) + \epsilon(t), \\ S(t \mid \mathcal{Z}, b) &= \exp\left\{-H\left(\int_{0}^{t} e^{\beta^{T} Z_{2}(u) + (\psi \circ b)^{T}\tilde{Z}_{2}(u)}\, dF(u)\right)\right\}, \end{aligned} \qquad (2)$$

where α and β are vectors of unknown regression parameters in the longitudinal and survival components, respectively, *Z*_{k}(*t*) and $\tilde{Z}_{k}(t)$ (*k* = 1, 2) are subsets of *Z*(*t*) plus the unit component, and *F*(*t*) is an unspecified distribution function of the event times. In addition, ε(*t*) is a white noise process with mean zero and variance $\sigma_{e}^{2}$, ψ is a set of unknown constants with the same number of elements as *b*, and ψ ○ *b* denotes the component-wise product of ψ and *b*. Note in (2) that the correlation among the longitudinal outcomes is formulated through the latent random effects *b*, and that the association between the longitudinal outcomes and the event time is characterized by ψ through the shared latent variables *b*. Thus, for a fixed covariate process $\mathcal{Z}$, ψ > 0 implies that the larger the longitudinal measures are, the higher the hazard rate of the event is. On the other hand, ψ = 0 implies that the association can be fully explained by the common covariates in both the longitudinal and survival components. The transformation function *H*(·) is assumed to be continuously differentiable and strictly increasing, and we discuss *H*(·) in more detail in Section 2.2.

We note that the survival model for the entire population in (2) encompasses more general regression models, extending to a cure rate defined as the asymptotic value of the population survival function as *t* → ∞. This definition does not imply that the observed survival time should be infinite, since the censoring time (by death from other causes or the end of study, for example) is finite with probability 1. In practice, a sufficiently long follow-up period from a clinician's perspective can be interpreted as *t* = ∞. That is, the cure rate model can be expressed as

$$\lim_{t\to\infty} S(t \mid \mathcal{Z}, b) = \exp\left\{-H\left(\int_{0}^{\infty} e^{\beta^{T} Z_{2}(u) + (\psi \circ b)^{T}\tilde{Z}_{2}(u)}\, dF(u)\right)\right\}.$$

Thus, our joint cure-survival model (2) allows us to explore a link between the longitudinal measures and the probability of being cured through the shared random effects as well as the covariates. In particular, when *Z*_{2}(*t*) and $\tilde{Z}_{2}(t)$ are time-independent covariates, *z*_{2} and $\tilde{z}_{2}$, respectively, the cure rate can be simplified to

$$E_{b}\left[\exp\left\{-H\left(e^{\beta^{T} z_{2} + (\psi \circ b)^{T}\tilde{z}_{2}}\right)\right\}\right],$$

where *E*_{b} denotes expectation with respect to *b*. In fact, it is always true that the conditional cure rate satisfies lim_{t→∞} *E*_{b}[*S*(*t* | $\mathcal{Z}$, *b*)] > 0 (improper survival function), because *H*(·) is assumed to be finite.

Let *C* be the non-informative censoring time, which is independent of (*Y*(·), *T*, *b*) given $\mathcal{Z}$, and let *X* = min(*T*, *C*) denote the observed event time. The observed data for the *i*th subject with *m*_{i} repeated measurements are defined as *O*_{i} = {*Y*_{i}(*t*_{ik}), *X*_{i}, Δ_{i}, *Z*(*t*); *t*_{ik} ≤ *X*_{i}, *t* ≤ *X*_{i}, *i* = 1, …, *n*, *k* = 1, …, *m*_{i}}, where Δ_{i} = *I*(*T*_{i} ≤ *C*_{i}) with *I*(·) being the indicator function. Under model (2), the log-likelihood function for the observed data is given by

where *f*(*b*; Σ_{b}) is the density function of *b* with the parameters Σ_{b}, and *f*(*t*) = *dF*(*t*)/*dt* and *H*′(*x*) = *dH*(*x*)/*dx* are the first derivatives of *F*(*t*) and *H*(*x*), respectively.

### 2.2 Transformation of Promotion Time Cure Models

In model (2), *H*(·) represents a transformation function of the conditional cumulative hazard function, which must be pre-specified in the analysis. For example, *H*(*x*) can take the form of the logarithmic transformation,

$$H(x) = \begin{cases} \log(1 + \eta x)/\eta, & \eta > 0, \\ x, & \eta = 0. \end{cases}$$

The choices of η = 0 and η = 1 lead to the proportional hazards structure and the proportional odds structure, respectively.
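The two special cases can be made concrete with a short numerical sketch. It assumes the logarithmic family *H*(*x*) = log(1 + η*x*)/η, with *H*(*x*) = *x* at η = 0, which is the form quoted in Section 3.1; the function names are illustrative and not taken from the authors' software.

```python
import math


def transform_h(x, eta):
    """Logarithmic transformation H(x) = log(1 + eta*x)/eta, with H(x) = x at eta = 0."""
    if x < 0 or eta < 0:
        raise ValueError("x and eta must be non-negative")
    if eta == 0:
        return x
    return math.log1p(eta * x) / eta


def survival(x, eta):
    """exp{-H(x)}, where x stands for the cumulative quantity passed into H."""
    return math.exp(-transform_h(x, eta))


def failure_odds(x, eta):
    """Odds of failure, (1 - S)/S, implied by survival(x, eta)."""
    s = survival(x, eta)
    return (1.0 - s) / s
```

With η = 0 the survival is exp(−*x*), so a multiplicative covariate effect on *x* acts on the cumulative hazard (proportional hazards). With η = 1 the survival is 1/(1 + *x*), so `failure_odds(x, 1.0)` equals *x* and the same multiplicative effect acts on the odds (proportional odds).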

In fact, the transformation *H*(·) can be derived from a biological explanation. Recall that the promotion time cure model without transformation in (1) is based on the conditional independence assumption of {$\tilde{T}_{k}$ | *N*_{i}; *k* = 1, …, *N*_{i}}. However, this assumption may not be satisfied in practice, since there are common features shared by the same patient, such as the patient's underlying health condition or dietary habits. To adjust for the correlated cancer progression times, Zeng et al. (2006) introduced a subject-specific frailty ζ_{i} and assumed that {$\tilde{T}_{k}$ | *N*_{i}, ζ_{i}; *k* = 1, …, *N*_{i}} are mutually independent with the distribution function *F*(*t*). Note that ζ_{i} can reflect the underlying heterogeneity in the rate of metastatic cancer cells through *N*_{i}, which follows the Poisson distribution with mean ζ_{i}π(**Z**_{i}), conditional on (**Z**_{i}, ζ_{i}). Following a derivation similar to that of (1), the resulting survival function for the time to relapse *T* takes the form

$$S(t \mid \mathbf{Z}_{i}) = E_{\zeta_{i}}\left[\exp\{-\zeta_{i}\,\pi(\mathbf{Z}_{i})\,F(t)\}\right], \qquad (4)$$

where *E*_{ζi} denotes the expectation with respect to ζ_{i}. Explicitly specifying the distribution for ζ_{i} as a gamma distribution with unit mean and variance η, for instance, we can now see a desirable connection between (4) and the transformation *H*(·), as follows:

$$E_{\zeta_{i}}\left[\exp\{-\zeta_{i}\,\pi(\mathbf{Z}_{i})\,F(t)\}\right] = \{1 + \eta\,\pi(\mathbf{Z}_{i})\,F(t)\}^{-1/\eta} = \exp\left[-H\{\pi(\mathbf{Z}_{i})\,F(t)\}\right], \qquad H(x) = \log(1 + \eta x)/\eta.$$

## 3 Inference Procedure

### 3.1 NPMLEs for Joint Transformation Models

We propose to use the nonparametric maximum likelihood estimation (NPMLE) for estimating parameters $\theta =(\alpha ,\beta ,\psi ,{\sigma}_{e}^{2},\text{Vec}({\mathrm{\Sigma}}_{b}))$ and infinite-dimensional parameter *F*(*t*), where Vec(Σ_{b}) denotes the vector consisting of the upper triangular elements of Σ_{b}. To obtain the NPMLEs, in the log-likelihood function (3), we treat *F* as a step function with jumps only at the observed failure times and replace *f*(*t*) by the jump size of *F* at *t*, which is denoted by *F*{*t*}.

For commonly used transformation functions such as the logarithmic transformation, exp{−*H*(*x*)} can be expressed as the Laplace transform of some function ϕ(*t*), *t* ≥ 0, such that

$$\exp\{-H(x)\} = \int_{0}^{\infty} \exp(-xt)\,\phi(t)\,dt.$$

For example, if we choose ϕ(*t*) = *t*^{1/η−1} exp(−*t*/η)/{Γ(1/η) η^{1/η}}, then it is true that *H*(*x*) = log(1 + η*x*)/η. Applying the Laplace transformation with a subject-specific frailty ζ_{i} and using the fact that

the observed log-likelihood function (3) can be rewritten as

where $q_{2i}(t) = \beta^{T} Z_{2i}(t) + (\psi \circ b)^{T}\tilde{Z}_{2i}(t)$, and we assume that ζ_{i} and *b* are independent. The most attractive feature of taking the transformation in this way is that the modified log-likelihood (5) can be seen as that of a proportional hazards frailty model with the conditional hazard function

This makes the algorithm more stable and computationally efficient.
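The Laplace-transform representation that underlies this frailty formulation can be verified numerically. The sketch below is an illustration under stated assumptions rather than part of the estimation procedure: it takes ϕ(*t*) to be the gamma density with shape 1/η and scale η given above, draws ζ from it with Python's `random.gammavariate`, and compares the Monte Carlo average of exp(−ζ*x*) with the closed form (1 + η*x*)^{−1/η}. The function names, seed, and number of draws are arbitrary choices.

```python
import math
import random


def laplace_gamma_frailty(x, eta, n_draws=200_000, seed=7):
    """Monte Carlo estimate of E[exp(-zeta * x)], zeta ~ Gamma(shape=1/eta, scale=eta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        zeta = rng.gammavariate(1.0 / eta, eta)  # mean 1, variance eta
        total += math.exp(-zeta * x)
    return total / n_draws


def closed_form(x, eta):
    """exp{-H(x)} with H(x) = log(1 + eta*x)/eta, i.e. (1 + eta*x)^(-1/eta)."""
    return (1.0 + eta * x) ** (-1.0 / eta)
```

For example, `laplace_gamma_frailty(2.0, 1.0)` should agree with `closed_form(2.0, 1.0)`, which is 1/3, up to Monte Carlo error.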

Now, the computation of the NPMLEs is identical to maximizing the modified log-likelihood function with respect to θ and all jump sizes of *F* at the observed failure times. This maximization can be carried out through the following EM algorithm.

### 3.2 EM Algorithm

We describe the EM algorithm, treating ζ_{i} and *b* as missing data, to compute the NPMLEs of (θ, *F*{·}). In the E-step, we calculate the conditional expectation of the log-likelihood function for the complete data, given the observed data *O*_{i} and the current parameter estimates. In particular, we need to evaluate integrals of certain functions of (ζ_{i}, *b*), say *Ê*[ζ_{i}*g*_{i}(*b*) | *O*_{i}]. Hereafter, we drop the conditioning on the observed data and the current parameter estimates, and abbreviate *Ê*[ζ_{i}*g*_{i}(*b*) | *O*_{i}] as *Ê*[ζ_{i}*g*_{i}(*b*)]. This expectation can be computed by first obtaining the *nested* conditional expectation of ζ_{i}, given *b* and the observed data. That is, *Ê*[ζ_{i}*g*_{i}(*b*)] can be calculated as *Ê*_{b}[*Ê*_{ζi}[ζ_{i} | *b*]*g*_{i}(*b*)]. With the fact that the conditional distribution of ζ_{i} given *b* is proportional to

and the useful relationships by the Laplace transformation, the conditional expectation of ζ_{i} given *b* has the form of

where $\tilde{x}_{i}(b) = \int_{0}^{X_{i}} e^{\beta^{T} Z_{2i}(u) + (\psi \circ b)^{T}\tilde{Z}_{2i}(u)}\, dF(u)$. Once *Ê*_{ζi}[ζ_{i} | *b*] is calculated, which is a function of *b*, the conditional expectation *Ê*[ζ_{i}*g*_{i}(*b*)] can be computed using numerical approximation methods such as Gaussian quadrature with Hermite orthogonal polynomials. Since the conditional distribution of *b* given *O*_{i} is proportional to Γ(*O*_{i} | *b*)*f*(*b*; Σ_{b}), the conditional expectation is calculated by

where

In the M-step, we maximize the following objective function of the expected log-likelihood for the complete data:

under the restriction $\sum_{i=1}^{n}\Delta_{i}F\{X_{i}\} = 1$. Maximizing the above objective function over $(\alpha, \sigma_{e}^{2}, \Sigma_{b})$ is simple, whereas the remaining parameters (β, ψ, *F*{·}) have no closed-form maximizers and therefore require a reliable numerical approach. By introducing the Lagrange multiplier μ, we solve the following equation for β:

the following equation for ψ:

and the following equation for μ:

where *R*_{j}(*t*) = *I*(*X*_{j} ≥ *t*) and $q_{2j}(t) = \beta^{T} Z_{2j}(t) + (\psi \circ b)^{T}\tilde{Z}_{2j}(t)$. The restricted NPMLE along with a random effect induced transformation EM was presented by Tsodikov (2002), and the approach has been used in different models as reviewed in Tsodikov et al. (2003). Hence *F* is estimated as a step function with the following jump size at *X*_{i}:
To solve these equations at each M-step, we consider a two-step optimization. In the first step, we estimate μ using the bisection method based on equation (8) and the fact that *F*{*X*_{i}} > 0 (*i* = 1, …, *n*). Since the left side of (8) is a monotone decreasing function of μ when *F*{*X*_{i}} is considered as a function of μ in (9), the solution always exists. In the second step, to update β and ψ, we plug the estimate of μ into equations (6) and (7), treat them as functions of β and ψ, and solve the equations using a one-step Newton-Raphson algorithm. The jump sizes of *F* can then be updated easily by equation (9) with the updated estimates.
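The bisection step for the Lagrange multiplier can be sketched as follows. Equations (8) and (9) are not reproduced in this text, so the sketch rests on an assumption: each jump size is taken to have the form *F*{*X*_{i}} = 1/(*s*_{i} + μ) at an observed failure time, where *s*_{i} > 0 is a risk-set sum computed in the E-step, and μ is chosen so that the jump sizes add up to one. The names `risk_sums`, `jump_sizes`, and `solve_multiplier` are hypothetical.

```python
def jump_sizes(risk_sums, mu):
    """Assumed form of the jump sizes: F{X_i} = 1 / (s_i + mu) at each failure time."""
    return [1.0 / (s + mu) for s in risk_sums]


def solve_multiplier(risk_sums, tol=1e-12, max_iter=200):
    """Bisection for mu such that the jump sizes sum to one.

    sum_i 1/(s_i + mu) is strictly decreasing in mu on (-min(s), inf), tends to
    +inf at the left end, and is below one at mu = len(risk_sums) when all
    s_i > 0, so a root is bracketed and every jump size stays positive.
    """
    lo = -min(risk_sums)          # the constraint sum blows up here
    hi = float(len(risk_sums))    # the constraint sum is below one here
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(jump_sizes(risk_sums, mid)) > 1.0:
            lo = mid              # sum too large: move mu up
        else:
            hi = mid              # sum too small (or exact): move mu down
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

The monotonicity that guarantees a solution in the text is what makes the bracket valid here; with a different form of (9), only `jump_sizes` and the initial bracket would need to change.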

To obtain the NPMLEs, we iterate the E-step and M-step until the parameter estimates converge. The variances of the NPMLEs can be estimated from the inverse of the observed information matrix for all parameters (θ, *F*{·}), under the restriction $\sum_{i=1}^{n}\Delta_{i}F\{X_{i}\} = 1$. The observed information matrix can be computed from the complete-data log-likelihood function, denoted by $\ell_{i}^{c}$ for the *i*th subject, using the Louis formula (Louis, 1982):

$$-\sum_{i=1}^{n}\hat{E}\left[\nabla^{2}\ell_{i}^{c}\right] - \sum_{i=1}^{n}\left\{\hat{E}\left[(\nabla\ell_{i}^{c})^{2}\right] - \left(\hat{E}\left[\nabla\ell_{i}^{c}\right]\right)^{2}\right\},$$

where *u*^{2} = *uu*^{T}, ∇ and ∇^{2} denote the first and the second derivatives with respect to the parameters, and *Ê* denotes the conditional expectation of a function of *b* given the observed data, evaluated at the NPMLEs.

## 4 Asymptotic Properties

Let ($\hat{\theta}$, $\hat{F}$) denote the NPMLEs and (θ_{0}, *F*_{0}) denote the true parameter values of (θ, *F*). We establish the asymptotic properties of the NPMLEs under the following regularity conditions:

- (A1) The true parameter value θ_{0} belongs to the interior of a compact set Θ within the domain of θ.
- (A2) With probability 1, *Z*(*t*) is left-continuous with uniformly bounded left and right derivatives in [0, ∞].
- (A3) For some constant δ_{0}, *P*(*C* = ∞ | $\mathcal{Z}$) > δ_{0} > 0 with probability 1.
- (A4) For some positive constant $M_{0}$, $M_{0}^{-1} < \sigma_{0e}^{2} < M_{0}$ and $M_{0}^{-1} < c^{T}\Sigma_{0b}c < M_{0}$ for any constant vector *c* with ‖*c*‖ = 1.
- (A5) The transformation function *H*(·) is four-times differentiable with *H*(0) = 0 and *H*′(0) > 0. In addition, there exist positive constants μ_{0} and κ_{0} such that $$(1+x)\,H'(x)\exp\{-H(x)\} \le \mu_{0}(1+x)^{-\kappa_{0}}.$$ Furthermore, there exists a constant ρ_{0} > 0 such that $$\sup_{x}\left\{\frac{|H''(x)| + |H^{(3)}(x)| + |H^{(4)}(x)|}{H'(x)(1+x)^{\rho_{0}}}\right\} < \infty,$$ where *H*^{(3)} and *H*^{(4)} are the third and fourth derivatives.
- (A6) For any deterministic function *c*(*t*) and a constant vector υ such that *c*(*t*) ≠ 0 or υ ≠ 0, *P*{*c*(*t*) + υ^{T}*Z*(*t*) = 0; *t* ∈ [0, ∞]} = 0.
- (A7) With some positive probability, $\tilde{Z}_{1}^{T}\tilde{Z}_{1}$ has full rank, where $\tilde{Z}_{1}$ denotes a matrix with each row equal to the observed covariate $\tilde{Z}_{1}(t)^{T}$ at the time of each measurement.
- (A8) Let *K* be the number of repeated measures and let *d*_{b} be the dimension of *b*. With probability one, *P*(*K* > *d*_{b} | $\mathcal{Z}$, *X*) > 0.

Conditions (A1) – (A3) are the standard assumptions in survival analysis. Condition (A4) is necessary to prove the existence of the NPMLEs. It can be easily verified that Condition (A5) holds for all commonly used transformations, including the logarithmic transformations described in Section 2. Conditions (A6) – (A7) entail the linear independence of the design matrices of covariates for the fixed and random effects. Condition (A8) prescribes that some subjects have more than *d*_{b} repeated measures.

Under the above conditions, the following theorem shows the consistency of the NPMLEs ($\hat{\theta}$, $\hat{F}$).

**Theorem 1**
*Under Conditions (A1) – (A8),*

Theorem 1 then leads to the following results on the asymptotic normality of ($\hat{\theta}$, $\hat{F}$) and the asymptotic efficiency of $\hat{\theta}$.

**Theorem 2**
*Under Conditions (A1) – (A8)*, $\sqrt{n}\,(\hat{\theta} - \theta_{0}, \hat{F}(t) - F_{0}(t))$ *converges weakly to a zero-mean Gaussian process in* $R^{d_{\theta}} \times BV[0, \infty]$, *where d*_{θ} *is the dimension of* θ *and BV* [0, ∞] *denotes the space of all functions of bounded variation on* [0, ∞]. *Furthermore, the asymptotic covariance matrix of* $\sqrt{n}\,(\hat{\theta} - \theta_{0})$ *achieves the semiparametric efficiency bound for* θ_{0}.

Furthermore, in the Appendix, we show that the inverse of the observed information matrix is a consistent estimator of the asymptotic covariance matrix of the NPMLEs. This result allows us to make inference for any functional of (θ, *F*(*t*)). To prove Theorems 1–2, we apply the general asymptotic theory of Zeng and Lin (2007). The desired asymptotic properties of the NPMLEs follow from the arguments in Appendix B of Zeng and Lin (2007), provided we can verify that their regularity conditions hold in our joint cure-survival model setting. Checking the regularity conditions, however, is challenging in our case. The detailed proofs are provided in the Appendix.

## 5 Simulation Studies

In this section, we demonstrate the finite sample performance of the proposed method through extensive simulation studies. The longitudinal data are generated from

and the survival data with a cure proportion are generated from transformation models

where *z*_{1} is a dichotomous covariate taking the value 0 or 1 with equal probability 0.5, *z*_{2} is a continuous covariate generated from a uniform distribution on [−1, 1], and $\epsilon(t) \sim N(0, \sigma_{e}^{2})$ with $\sigma_{e}^{2} = 1$. The true failure distribution function in the uncured subpopulation is set to be *F*(*t*) = 1 − exp(−*t*).

For each subject, the correlation within repeated measures is reflected by the subject-specific random intercept $b \sim N(0, \sigma_{b}^{2})$ with $\sigma_{b}^{2} = 0.5$, and negative, no, and positive dependence between the longitudinal measures and the cure-survival rate are simulated through ψ values of −0.3, 0, and 0.3, respectively. For the cure-survival model, we consider three types of transformations *H*(·) representing the proportional hazards structure (η = 0), the proportional odds structure (η = 1), and an intermediate transformation with η = 0.5.

The non-informative censoring time *C*_{i} is generated from a uniform distribution with varying rates, depending on the chosen transformation, to yield a 30–45% chance of being right-censored and a 20% chance of being cured. We set the longitudinal measures to be observed every 0.2 units of time so that each individual has about 3 repeated measures on average.
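Event times from a transformation cure model of this kind can be drawn by inverting the survival function. The sketch below is an illustration under assumptions, not the simulation code used for the tables: it takes a scalar random intercept, a survival function of the form exp{−*H*(*e*^{lp}*F*(*t*))} with *F*(*t*) = 1 − exp(−*t*), and hypothetical values for the regression coefficients `beta` and the censoring bound `censor_max`, none of which are taken from the article. Longitudinal measurements are not generated here.

```python
import math
import random


def inverse_h(y, eta):
    """Inverse of H(x) = log(1 + eta*x)/eta (the identity at eta = 0)."""
    return y if eta == 0 else math.expm1(eta * y) / eta


def draw_event_time(rng, linear_predictor, eta):
    """Draw T from S(t) = exp{-H(e^{lp} F(t))} with F(t) = 1 - exp(-t).

    Solves S(T) = U for a uniform U; returns math.inf for a cured subject.
    """
    u = 1.0 - rng.random()                      # uniform on (0, 1]
    target = inverse_h(-math.log(u), eta)       # required value of e^{lp} F(T)
    f_value = target / math.exp(linear_predictor)
    if f_value >= 1.0:                          # F(t) < 1 for all finite t: cured
        return math.inf
    return -math.log1p(-f_value)                # invert F(t) = 1 - exp(-t)


def simulate_subject(rng, beta, psi, eta, censor_max):
    """One subject: covariates, random intercept, event time, uniform censoring."""
    z1 = 1 if rng.random() < 0.5 else 0
    z2 = rng.uniform(-1.0, 1.0)
    b = rng.gauss(0.0, math.sqrt(0.5))          # random intercept, variance 0.5
    lp = beta[0] + beta[1] * z1 + beta[2] * z2 + psi * b
    t = draw_event_time(rng, lp, eta)
    c = rng.uniform(0.0, censor_max)
    return min(t, c), int(t <= c), z1, z2
```

For a fixed linear predictor, the share of infinite draws should approach exp{−*H*(*e*^{lp})}, which is the conditional cure rate implied by the model.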

The results based on 1000 replications are presented in Tables 1–3 for *n* = 200 and *n* = 400. Tables 1–3 include the average of the differences between the true parameter and the estimates (Bias), the sample standard deviation of the parameter estimators (SE), the average of the standard error estimators (SEE), and the coverage probability of 95% confidence intervals (CP). The confidence intervals for $\sigma_{e}^{2}$ and $\sigma_{b}^{2}$ are constructed based on the Satterthwaite approximation.
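The four summary measures can be computed from the replicate-level output as in the following sketch. It makes two assumptions that the text does not settle: bias is taken as the mean estimate minus the true value, and the 95% intervals are Wald intervals (estimate ± 1.96 × standard error), which would not apply to the variance components handled by the Satterthwaite approximation. The function name and the toy inputs are hypothetical.

```python
import statistics


def summarize(estimates, std_errors, truth, z=1.959963984540054):
    """Bias, SE, SEE and CP for one parameter across simulation replicates."""
    bias = statistics.fmean(estimates) - truth       # assumed sign convention
    se = statistics.stdev(estimates)                 # empirical spread of estimates
    see = statistics.fmean(std_errors)               # average estimated standard error
    covered = [abs(est - truth) <= z * s             # Wald interval covers the truth
               for est, s in zip(estimates, std_errors)]
    cp = sum(covered) / len(covered)
    return {"Bias": bias, "SE": se, "SEE": see, "CP": cp}


# Toy replicate output, made up purely to exercise the function.
toy = summarize([0.9, 1.1, 1.0, 1.4], [0.1, 0.1, 0.1, 0.1], truth=1.0)
```

When SE and SEE are close and CP is near 0.95, the estimated standard errors reflect the true sampling variation, which is the pattern the tables are meant to check.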

Table 1 shows that the NPMLEs under the proportional hazards structure *H*(*x*) = *x* are virtually unbiased, the standard error estimators calculated via the Louis formula reflect the true variations of the proposed estimators well, and the coverage probabilities are in a reasonable range, even with a moderate sample size of 200. As the sample size increases to 400, the biases increase slightly for some estimates; however, they are still very small compared to the sizes of the true parameter values, the variations of the parameter estimators become smaller, and the coverage probabilities still lie in a reasonable range. The simulation results shown in Tables 2–3 are similar to those in Table 1, indicating that the proposed method also works well for *H*(*x*) = 2 log(1 + *x*/2) and *H*(*x*) = log(1 + *x*).

## 6 Data Application

The proposed method was applied to the data from the Health Professionals Follow-up Study (HPFS), a large observational study of male health professionals living in the United States. The main interest of this analysis was to jointly model the relationship between longitudinal vitamin D intake and the survival-cure rate of colorectal cancer (CRC) as endpoints. Since the focus was on the cure of CRC, we restricted our study population to 810 patients who were diagnosed with colorectal cancer between January 1986 and January 2006 and who had no missing values in any of the covariates included in the model.

For each subject, vitamin D intake was assessed via food frequency questionnaires at approximately four-year intervals between 1986 and 2002. To identify the cure of CRC, we set colorectal cancer-specific death as the event of interest (Ng et al., 2008), while treating deaths from causes other than CRC and survival until January 2006 as censored. The cured patients can then be defined as the subpopulation of censored patients who have been followed up long enough to be considered cured. In the HPFS, during 20 years of follow-up, 250 (31%) colorectal cancer-specific deaths were observed. Based on the Kaplan-Meier survival curve in Figure 1 (a), we found that the estimated survival rate at the end of the study was very high (64%) even after a sufficient follow-up period (i.e., 20 years), and the earliest time at which the curve goes flat, 12.5 years of follow-up, was taken as the point at which all remaining disease-free survivors were declared cured. There were 116 patients who were considered cured in the HPFS. We note that some of the patients who were right-censored before 12.5 years might indeed have been cured of CRC, but this was inconclusive due to the right-censoring.

We fitted the proposed joint cure model for the longitudinal vitamin D trend and CRC death with the patient's medical information at diagnosis: age, body mass index (BMI), and indicators for tumor differentiation grade (i.e., 1 = poor or unspecified, 0 = well or moderate) and distant metastases (i.e., 1 = yes, 0 = no) were included as covariates. Among them, age was centered at its mean of 68 and divided by 10 to represent a decade, and BMI was centered at its mean of 25 kg/m^{2}. In addition, a subject-specific random intercept was included in both the longitudinal and cure-survival models to account for the correlation between these two outcomes. To explore the possibility of the proportional hazards and the proportional odds structures in the cure-survival data, we also applied the transformation models *H*(*x*) = log(1 + η*x*)/η to the cure-survival data, varying η over [0, 1] in increments of 0.1. We used the Akaike information criterion (AIC) to determine the best form of transformation (i.e., η), and the smallest AIC value was achieved at η = 0, implying that the joint proportional hazards cure (PHC) model was the best fit to the data. Although in the HPFS example the final transformation turned out to be the joint PHC model, in the absence of model-diagnostic tools for joint modeling it is valuable to consider transformations to confirm that the fit is the best among the class of transformations considered. To show the impact of the transformation on the parameter estimates, Table 4 summarizes the analysis results under the joint PHC model (η = 0) and the joint proportional odds cure model (η = 1).

Table 4 (footnote fragment): … χ^{2} distributions is used for testing variances.

Under the selected best transformation model, Figure 1 (b) displays the estimated baseline survival distribution for the uncured patients (*X*_{i} < ∞) along with its pointwise 95% confidence intervals. In Figure 1 (b) we note that the tail probability of the estimated baseline survival curve reached zero. The results in Table 4 show that 1) older patients tended to take more vitamin D and were less likely to be cured of CRC; and 2) patients with distant metastases appeared to take less vitamin D and were less likely to be cured of CRC. The significant negative $\hat{\psi}$ suggested that there was a protective effect of vitamin D intake in relation to the risk of CRC death, which was not explained by the common covariates in both the longitudinal and survival components (*p* = 0.026). As an example of quantitative interpretation, the marginal survival rates (≤12.5 years) and cure rates (>12.5 years) for the whole population are given in Figure 2. For instance, when comparing the curves to the reference (age 68, BMI = 25 kg/m^{2}, well or moderate differentiation, no distant metastases), we can see that the cure rate at age 78 decreased to 68% from 80% at age 68, while that for CRC patients with distant metastases decreased to 6% from 80% with no distant metastases. The curves were obtained as *E*_{b}[*S*(*t* | *z*_{2}, *b*)] = *E*_{b}[exp{−*H*(*e*^{β^{T}*z*_{2}+ψ*b*}*F*(*t*))}] evaluated at the NPMLEs for a given covariate *z*_{2}, and their 95% pointwise confidence intervals can also be obtained by applying the functional delta method and evaluating at the NPMLEs.
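The expectation over the random intercept in the marginal survival curve can be approximated by Gauss-Hermite quadrature, the same device mentioned for the E-step. The sketch below is a minimal illustration under assumptions: a scalar random intercept *b* ~ *N*(0, σ_{b}^{2}), a fixed 5-point rule with standard tabulated nodes and weights, and a user-supplied baseline distribution function. All function and argument names are hypothetical, and no fitted values from the HPFS analysis are used.

```python
import math

# 5-point Gauss-Hermite rule for the weight exp(-x^2) (standard tabulated values).
GH_NODES = (-2.020182870456086, -0.958572464613819, 0.0,
            0.958572464613819, 2.020182870456086)
GH_WEIGHTS = (0.019953242059046, 0.393619323152241, 0.945308720482942,
              0.393619323152241, 0.019953242059046)


def normal_expectation(g, sigma_b):
    """Approximate E[g(b)] for b ~ N(0, sigma_b^2) via the change b = sqrt(2)*sigma_b*x."""
    total = sum(w * g(math.sqrt(2.0) * sigma_b * x)
                for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def transform_h(x, eta):
    """H(x) = log(1 + eta*x)/eta, with H(x) = x at eta = 0."""
    return x if eta == 0 else math.log1p(eta * x) / eta


def marginal_survival(t, lin_pred, psi, sigma_b, eta, baseline_cdf):
    """E_b[exp{-H(e^{lin_pred + psi*b} F(t))}] for a scalar random intercept b."""
    f_t = baseline_cdf(t)
    return normal_expectation(
        lambda b: math.exp(-transform_h(math.exp(lin_pred + psi * b) * f_t, eta)),
        sigma_b,
    )
```

Passing a baseline function that returns 1 (that is, *F*(∞) = 1) gives the marginal cure rate. In practice more quadrature nodes may be needed when ψσ_{b} is large, and the fitted step function for *F* would take the place of `baseline_cdf`.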

## 7 Concluding Remarks

We have proposed a joint transformation model for longitudinal and survival data that takes into account the possibility of patients being cured of or immune to the disease. The proposed approach has the advantages of handling time-varying covariates and providing an easier way to explore a large class of cure models in a unified way. We have used the NPMLEs for estimating the model parameters, and the resulting NPMLEs have been shown to be asymptotically normal and efficient. Simulation studies have shown that the proposed estimation procedures produce consistent estimators, and the new EM algorithm makes it possible to compute the NPMLEs in a simpler and more stable way.

As an example of *H*(·), we considered a class of logarithmic transformations, which can be misspecified in practice because of limited knowledge or complex relationships between covariates. As an alternative choice, we also explored the performance of a class of Box-Cox transformations, and the selected transformation function was robust to the class of transformations considered. In our experience, the form of the transformation class appears to be less important than the choice of the transformation parameter η. We used the AIC to determine the best transformation parameter, but other model selection criteria exist, such as the Bayesian information criterion and cross-validation ('leave-one-subject-out'). The differentiability conditions on *H*, as in the first part of Condition (A5), are satisfied by any class of transformations induced by a random effect. In fact, the validity of the asymptotic properties proven here is not restricted to these frailty-induced transformations. Other transformations that are not generated by a frailty, for instance, the Box-Cox transformations with γ > 1, can also satisfy Condition (A5). We further note that in this article the frailty representation based on the Laplace transform was introduced to facilitate EM computation.
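The link between a frailty and a transformation through the Laplace transform can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes the logarithmic class has the form $H(x) = \log(1 + \eta x)/\eta$ and the Box-Cox class has the form $H(x) = \{(1 + x)^{\gamma} - 1\}/\gamma$. It then checks by Monte Carlo that a gamma frailty $\xi$ with mean 1 and variance $\eta$ satisfies $E[e^{-x\xi}] = \exp\{-H(x)\}$ for the logarithmic class. All function names are hypothetical.

```python
import numpy as np


def log_transform(x, eta):
    """Assumed logarithmic class: H(x) = log(1 + eta*x)/eta for eta > 0, H(x) = x at eta = 0."""
    x = np.asarray(x, dtype=float)
    return x if eta == 0 else np.log1p(eta * x) / eta


def box_cox_transform(x, gamma):
    """Assumed Box-Cox class: H(x) = ((1 + x)**gamma - 1)/gamma, H(x) = log(1 + x) at gamma = 0."""
    x = np.asarray(x, dtype=float)
    return np.log1p(x) if gamma == 0 else ((1.0 + x) ** gamma - 1.0) / gamma


def gamma_frailty_laplace(x, eta, n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[exp(-x * xi)] for xi ~ Gamma with mean 1 and variance eta."""
    rng = np.random.default_rng(seed)
    xi = rng.gamma(shape=1.0 / eta, scale=eta, size=n_draws)
    return float(np.mean(np.exp(-x * xi)))


if __name__ == "__main__":
    x, eta = 0.7, 0.5
    print("Monte Carlo Laplace transform:", gamma_frailty_laplace(x, eta))
    print("exp(-H(x)), logarithmic class:", float(np.exp(-log_transform(x, eta))))
```

Under these assumed forms, the two classes coincide at $\gamma = 0$ and $\eta = 1$, and both reduce to the proportional hazards structure $H(x) = x$ at $\gamma = 1$ and $\eta = 0$, respectively.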

In this paper, we assumed that the number of repeated measurements is independent of the cure-survival data. To account for informative observation times, our joint cure model can be extended by jointly modeling another recurrent event process. Another promising extension of our joint cure model would be to generalized linear mixed models (GLMMs) for analyzing discrete longitudinal outcomes. The general approach presented here remains applicable to GLMMs, but the parts of the estimation procedure related to the longitudinal component would need to be modified accordingly.

## Acknowledgements

The authors would like to thank the referees for helpful comments and gratefully acknowledge use of the Health Professionals Follow-up Study data, funded by NIH/NCI grant P01 CA055075.

## Appendix

#### Proofs of Asymptotic Properties

This section proves Theorems 1–2 stated in Section 4 by applying the general asymptotic theory of Zeng and Lin (2007). Specifically, it is easy to see that our conditions (A1)–(A8) imply (C1)–(C4), (C6), and (C8) of Zeng and Lin (2007), so it remains to prove the two identifiability conditions (C5) and (C7) of Zeng and Lin (2007). The first identifiability condition is the key step in proving the consistency of the NPMLEs, and the second ensures the invertibility of the observed information matrix at the true parameters, which is needed for the proof of asymptotic normality.

**Proof 1**
*First, we verify the first identifiability condition (C5) in Appendix B of*
Zeng and Lin (2007). *Suppose that the likelihood function for*
$(\alpha, \beta, \psi, \sigma_e^2, \text{Vec}(\Sigma_b))$
*is the same as that for the true parameter values*
$(\alpha_0, \beta_0, \psi_0, \sigma_{0e}^2, \text{Vec}(\Sigma_{0b}))$. *That is, for arbitrary K* > 0,

*where bold* $\mathbf{Y}$ *denotes the vector of the observed longitudinal measures at times* $s_1, \ldots, s_K$, *and* $\mathbf{Z}_1$ *and* $\tilde{\mathbf{Z}}_1$ *in bold type denote matrices with each row equal to the observed covariates* $Z_1(s_k)^T$ *and* $\tilde{Z}_1(s_k)^T$ *at* $k = 1, \ldots, K$, *respectively. In addition,* $q(t) = \int_0^t e^{\beta^T Z_2(u) + (b \circ \psi)^T \tilde{Z}_2(u)}\, dF(u)$, *and* $q_0(t)$ *is* $q(t)$ *evaluated at the true parameter values, and* $f(b; \Sigma_b)$ *is the density function of the (multivariate) normal distribution with mean zero and covariance matrix* $\Sigma_b$.

*We now apply the following operations to both sides of* (10).

**Step 1:**
*For the proof of the identifiability of the longitudinal component, we consider a case* Δ = 0 *and X* ≈ 0.

*Using the fact that* $\int b\, f(b; \Sigma_b)\, db = \int b\, f(b; \Sigma_{0b})\, db = 0$ *and considering* $E[Y(s_k)]$ *conditional on b, we have* $\alpha^T Z_1(s_k) = \alpha_0^T Z_1(s_k)$ *for* $k = 1, \ldots, K$. *By Condition (A6), we obtain* $\alpha = \alpha_0$.

*Similarly, we consider* $E[Y(s_k) Y(s_{k'})]$ *and* $\text{Var}(Y(s_k))$, *given b, and obtain for* $k \ne k'$

*followed by the proof of* Σ_{b} = Σ_{0b}
*from (A6), and*

*for k* = 1, …, *K. Accordingly, we have that*
${\sigma}_{e}^{2}={\sigma}_{0e}^{2}$.

**Step 2:**
*For the survival component, suppose* Δ = 0 *and X* = *t*. *Then,*
(10)
*implies*

*where b follows a normal distribution with mean* $\mu_b = V_b \tilde{\mathbf{Z}}_1^T (\mathbf{Y} - \mathbf{Z}_1 \alpha_0)/\sigma_{0e}^2$ *and covariance matrix* $V_b = [\Sigma_{0b}^{-1} + \tilde{\mathbf{Z}}_1^T \tilde{\mathbf{Z}}_1/\sigma_{0e}^2]^{-1}$. *For fixed* $\mathbf{Y}$, $\mathbf{Z}_1$, *and* $\tilde{\mathbf{Z}}_1$, *since b is a complete statistic for* $\mu_b$, *we have that*

*Furthermore, it follows from the one-to-one mapping of H and the exponential function that*

*with probability 1. By taking the expectation with respect to b for fixed* $\mathbf{Y}$, $\mathbf{Z}_1$, *and* $\tilde{\mathbf{Z}}_1$, *we conclude that* $\beta = \beta_0$, $f(t) = f_0(t)$ *and* $\psi = \psi_0$ *from Condition (A6).*
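The conditional distribution of *b* used in Step 2 can be checked numerically. The sketch below assumes a linear mixed model of the form $\mathbf{Y} = \mathbf{Z}_1 \alpha + \tilde{\mathbf{Z}}_1 b + e$ with $b \sim N(0, \Sigma_b)$ and $e \sim N(0, \sigma_e^2 I)$, which is the structure implied by the expressions for $\mu_b$ and $V_b$. It compares those expressions with the standard conditioning formula for a joint normal vector. Function names and inputs are illustrative assumptions.

```python
import numpy as np


def conditional_b(y, Z1, Z1_tilde, alpha, Sigma_b, sigma2_e):
    """Mean and covariance of b given Y, written in the precision form used in the proof."""
    precision = np.linalg.inv(Sigma_b) + Z1_tilde.T @ Z1_tilde / sigma2_e
    V_b = np.linalg.inv(precision)
    mu_b = V_b @ Z1_tilde.T @ (y - Z1 @ alpha) / sigma2_e
    return mu_b, V_b


def conditional_b_joint_normal(y, Z1, Z1_tilde, alpha, Sigma_b, sigma2_e):
    """The same quantities from the usual conditioning formula for a joint normal vector."""
    var_y = Z1_tilde @ Sigma_b @ Z1_tilde.T + sigma2_e * np.eye(len(y))
    cov_by = Sigma_b @ Z1_tilde.T
    mu_b = cov_by @ np.linalg.solve(var_y, y - Z1 @ alpha)
    V_b = Sigma_b - cov_by @ np.linalg.solve(var_y, cov_by.T)
    return mu_b, V_b


if __name__ == "__main__":
    # Hypothetical design with K = 6 visits and a two-dimensional random effect.
    rng = np.random.default_rng(1)
    Z1 = rng.normal(size=(6, 2))
    Z1_tilde = rng.normal(size=(6, 2))
    alpha = np.array([0.5, -1.0])
    Sigma_b = np.array([[1.0, 0.3], [0.3, 2.0]])
    y = rng.normal(size=6)
    print(conditional_b(y, Z1, Z1_tilde, alpha, Sigma_b, 0.7))
    print(conditional_b_joint_normal(y, Z1, Z1_tilde, alpha, Sigma_b, 0.7))
```

The two routes agree by the Woodbury matrix identity. The precision form is convenient here because it only requires inverting a matrix whose size is the dimension of *b*.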

**Proof 2**
*Next, we verify the second identifiability condition (C7) in Appendix B of*
Zeng and Lin (2007). *It starts from the score equation along the path*
$(\alpha_0 + \xi\nu_1, \beta_0 + \xi\nu_2, \psi_0 + \xi\nu_3, \sigma_{0e}^2 + \xi\nu_4, \text{Vec}(\Sigma_{0b}) + \xi\nu_b, F_0 + \xi \int h\, dF_0)$. *We define* $D_b$ *as the symmetric matrix such that* $\text{Vec}(D_b) = \nu_b$.

**Step 1:**
*To simplify the score equation for the proofs of* $\nu_1 = 0$, $\nu_4 = 0$ *and* $D_b = 0$, *we consider the same case* $\Delta = 0$ *and* $X \approx 0$ *as used in Step 1 of the first identifiability proof. We define*

*then, the score equation is given by*

*By comparing coefficients for the constant, linear and quadratic terms of* $(\mathbf{Y} - \mathbf{Z}_1 \alpha_0)$, *we have that*

*Since* $[I - \tilde{\mathbf{Z}}_1 V_b \tilde{\mathbf{Z}}_1^T/\sigma_{0e}^2]$ *is positive definite, we can see that* $\nu_1 = 0$ *in* (13). *To simplify* (14), *we multiply both sides of* (14) *by* $\tilde{\mathbf{Z}}_1^T$ *from the left, by* $\tilde{\mathbf{Z}}_1$ *from the right, and then by* $[\tilde{\mathbf{Z}}_1^T \tilde{\mathbf{Z}}_1]^{-1}$ *from the right. Using the fact that* $\Sigma_{0b}^{-1} V_b = I - \tilde{\mathbf{Z}}_1^T \tilde{\mathbf{Z}}_1 V_b/\sigma_{0e}^2$, *which follows from the definition of* $V_b$, *equation* (14) *becomes*

*and the*
equation (12)
*becomes*

*After taking the trace of*
(15)
*and subtracting from the*
equation (16), *we obtain that*

*where* $d_b$ *stands for the dimension of b. Based on Condition (A8), we conclude that* $\nu_4 = 0$, *and hence* $D_b = 0$ *in* (15) *by Condition (A7).*
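Two matrix facts carry Step 1: the positive definiteness of $I - \tilde{\mathbf{Z}}_1 V_b \tilde{\mathbf{Z}}_1^T/\sigma_{0e}^2$, and the identity $\Sigma_{0b}^{-1} V_b = I - \tilde{\mathbf{Z}}_1^T \tilde{\mathbf{Z}}_1 V_b/\sigma_{0e}^2$ implied by the definition of $V_b$. The sketch below checks both on an arbitrary example. The function name and the inputs are hypothetical, and a numerical check of this kind is an illustration, not a substitute for the proof.

```python
import numpy as np


def step1_matrices(Z1_tilde, Sigma_b, sigma2_e):
    """Return M = I - Z V_b Z^T / sigma2_e and both sides of Sigma_b^{-1} V_b = I - Z^T Z V_b / sigma2_e."""
    K, d_b = Z1_tilde.shape
    Sigma_inv = np.linalg.inv(Sigma_b)
    V_b = np.linalg.inv(Sigma_inv + Z1_tilde.T @ Z1_tilde / sigma2_e)
    M = np.eye(K) - Z1_tilde @ V_b @ Z1_tilde.T / sigma2_e
    lhs = Sigma_inv @ V_b
    rhs = np.eye(d_b) - Z1_tilde.T @ Z1_tilde @ V_b / sigma2_e
    return M, lhs, rhs


if __name__ == "__main__":
    # Hypothetical design: K = 5 measurement times, two-dimensional random effect.
    rng = np.random.default_rng(2)
    Z1_tilde = rng.normal(size=(5, 2))
    Sigma_b = np.array([[1.0, 0.3], [0.3, 2.0]])
    M, lhs, rhs = step1_matrices(Z1_tilde, Sigma_b, sigma2_e=0.7)
    print("smallest eigenvalue of M:", np.linalg.eigvalsh(M).min())
    print("identity holds:", np.allclose(lhs, rhs))
```

By the Woodbury identity, the matrix `M` equals $\sigma_{0e}^2$ times the inverse of the marginal covariance of $\mathbf{Y}$, which is one way to see why it is positive definite.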

**Step 2:**
*For the second identifiability of the survival component, we set* Δ = 0 *and X* = *t. Then, the score equation can be written as*

*where* $\dot{q}_0(t) = \int_0^t \{h(u) + \nu_2^T Z_2(u) + (\nu_3 \circ b)^T \tilde{Z}_2(u)\}\, e^{\beta_0^T Z_2(u) + (b \circ \psi_0)^T \tilde{Z}_2(u)}\, dF_0(u)$ *and b is normally distributed with mean* $\mu_b$ *and covariance matrix* $V_b$. *By the completeness of the exponential family of b, we have*

*for any fixed* $\mathbf{Y}$, $\mathbf{Z}_1$ *and* $\tilde{\mathbf{Z}}_1$ *with probability 1. Since* $H(q_0(t)) > 0$ *for all* $t > 0$ *from (A5), we obtain* $\dot{q}_0(t) = 0$,

*and hence*

*Clearly, we attain* ν_{2} = 0, ν_{3} = 0 *and h* = 0 *by (A6)*.

Finally, we complete the proofs of Theorems 1–2 by applying Theorems 1–2 in Zeng and Lin (2007). Let $I_n$ denote the negative Hessian matrix of the observed log-likelihood function with respect to (θ, *F*{·}). As a remark, by following Theorem 3 in Zeng and Lin (2007), we can show that $I_n$ is invertible for large *n*, and $(\nu^T, U^T)\, n I_n^{-1}\, (\nu^T, U^T)^T$ is a consistent estimator of the asymptotic variance of

where *U* is the vector of *u*(·) at the observed failure times.

## Contributor Information

Sehee Kim, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

Donglin Zeng, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, U.S.A.

Yi Li, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

Donna Spiegelman, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115, U.S.A.

## References

- Berkson J, Gage R. Survival curve for cancer patients following treatment. Journal of the American Statistical Association. 1952;47:501–515.
- Brown E, Ibrahim J. Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials. Biometrics. 2003;59:686–693.
- Chen M, Ibrahim J, Sinha D. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association. 1999;94:909–919.
- Chen M, Ibrahim J, Sinha D. A new joint model for longitudinal and survival data with a cure fraction. Journal of Multivariate Analysis. 2004;91:18–34.
- Kuk A, Chen C. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79:531–541.
- Laska E, Meisner M. Nonparametric estimation and testing in a cure model. Biometrics. 1992;48:1223–1234.
- Law N, Taylor J, Sandler H. The joint modeling of a longitudinal disease progression marker and the failure time process in the presence of cure. Biostatistics. 2002;3:547–563.
- Louis T. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226–233.
- Lu W, Ying Z. On semiparametric transformation cure models. Biometrika. 2004;91:331–343.
- Maller R, Zhou X. Survival analysis with long-term survivors. Chichester: Wiley; 1996.
- Ng K, Meyerhardt J, Wu K, Feskanich D, Hollis B, Giovannucci E, Fuchs C. Circulating 25-hydroxyvitamin D levels and survival in patients with colorectal cancer. Journal of Clinical Oncology. 2008;26:2984–2991.
- Sy J, Taylor J. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56:227–236.
- Tsodikov A. A proportional hazards model taking account of long-term survivors. Biometrics. 1998;54:1508–1516.
- Tsodikov A. Semiparametric models of long- and short-term survival: an application to the analysis of breast cancer survival in Utah by age and stage. Statistics in Medicine. 2002;21:895–920.
- Tsodikov A, Ibrahim J, Yakovlev A. Estimating cure rates from survival data: an alternative to two-component mixture models. Journal of the American Statistical Association. 2003;98:1063–1078.
- Yakovlev A, Tsodikov A, Asselain B. Stochastic models of tumor latency and their biostatistical applications. New Jersey: World Scientific; 1996.
- Yu M, Law N, Taylor J, Sandler H. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica. 2004;14:835–862.
- Yu M, Taylor J, Sandler H. Individual prediction in prostate cancer studies using a joint longitudinal survival-cure model. Journal of the American Statistical Association. 2008;103:178–187.
- Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B. 2007;69:507–564.
- Zeng D, Yin G, Ibrahim J. Semiparametric transformation models for survival data with a cure fraction. Journal of the American Statistical Association. 2006;101:670–684.
