J Stat Theory Pract. Author manuscript; available in PMC Aug 5, 2013.
Published in final edited form as:
J Stat Theory Pract. Apr 1, 2013; 7(2): 324–344.
Published online Apr 5, 2013. doi:  10.1080/15598608.2013.772036
PMCID: PMC3733282

Joint Modeling of Longitudinal and Cure-survival Data


This article presents semiparametric joint models for analyzing longitudinal measurements and survival data with a cure fraction. We consider a broad class of transformations for the cure-survival model, which includes the popular proportional hazards and proportional odds structures as special cases. We propose to estimate all parameters by nonparametric maximum likelihood (NPMLE) and provide a simple and efficient EM algorithm to implement the inference procedure. The estimators are shown to be asymptotically normal and semiparametrically efficient. Finally, we demonstrate the good performance of the method through extensive simulation studies and a real-data application.

Keywords: Cure-survival data, Joint models, Longitudinal data, Nonparametric maximum likelihood, Random effects, Transformation models

1 Introduction

In cancer research, along with repeatedly measured biomarkers, it is often observed that a certain proportion of patients are cured of, immune to, or unsusceptible to the event of interest. For example, a cancer can be considered cured if all metastasis-competent tumor cells are successfully removed by treatment, so that no recurrence will be observed. In these situations, since the survival curve never reaches zero because of the cured patients, standard survival analyses based on non-cure models are ill-suited for inference on cure rates. The existence of a possible cure fraction is evident from a stable plateau in the right tail of the survival curve, as in Figure 1 (a).

Figure 1
In the HPFS data (N=810) (a) Kaplan-Meier survival curve of the entire study population; (b) Estimated baseline survival function (i.e. F(t)) of the uncured subpopulation under H(x) = x. The solid curves are point estimates, and the dotted ...

To analyze such cure-survival data, two classes of cure rate models are commonly used: the mixture cure model and the promotion time cure model. The mixture cure model is based on the concept that the underlying population is a mixture of two subpopulations, the cured and the uncured, with respective probabilities pc(Zi) and 1 − pc(Zi); that is, Spop(t|Zi) = pc(Zi) + {1 − pc(Zi)}Suc(t|Zi), where Zi is the vector of covariates and Suc(t|Zi) is the conditional survival function of the uncured population (Berkson and Gage, 1952; Laska and Meisner, 1992; Maller and Zhou, 1996; Kuk and Chen, 1992; Sy and Taylor, 2000; Lu and Ying, 2004). It is assumed that all patients in the uncured subpopulation will eventually experience the event, while those in the cured subpopulation never will. Lu and Ying (2004) considered a class of transformation models for the event time, using generalized estimating equations for parameter estimation and establishing the asymptotic properties by the usual counting process and associated martingale theory. However, their linear transformation approach was limited to time-independent covariates. Therefore, as an alternative to the mixture cure models, we focus on the promotion time cure model, which can accommodate transformations with time-dependent covariates when transformations are needed, for example, when violation of the proportional hazards assumption is a concern. The promotion time cure model has been proposed in cancer clinical trials under the biological assumption that a patient has N metastatic tumor cells remaining after treatment (Yakovlev et al., 1996; Tsodikov, 1998; Chen et al., 1999; Tsodikov et al., 2003). Let Ni be the number of metastatic cancerous cells of the ith patient, an unobservable latent variable following a Poisson distribution with mean π(Zi).
We denote by Tk (k = 1, …, Ni) the time for the kth metastatic cancer cell to produce a detectable tumor (the promotion time) and assume that, conditional on Ni, the Tk's are independent and identically distributed (iid) with common distribution function F(t). Viewing F(t) as 1 − Suc(t), it can be interpreted as the counterpart of the distribution function of the uncured patients in the mixture cure model. Then the time to relapse of cancer for the ith patient, defined by Ti = min{T1, …, TNi}, has survival function

S(t | Z_i) = P[N_i = 0] + Σ_{k≥1} P[T_1 > t, …, T_k > t | N_i = k] P[N_i = k]
           = exp{−π(Z_i)} + Σ_{k≥1} {1 − F(t)}^k π(Z_i)^k exp{−π(Z_i)}/k!
           = exp{−π(Z_i) F(t)},    (1)

where π(Zi) is a known link function. In the promotion time cure model (1), the survival function takes a single form regardless of cure status, and (1) retains the proportional hazards structure when π(Zi) = exp(β^T Zi). Moreover, if the regression coefficients β include an intercept term, say β0, the baseline cumulative hazard function equals exp(β0)F(t), so that model (1) becomes the Cox proportional hazards model with a bounded baseline cumulative hazard. If cured patients exist in the population, the survival rate at t = ∞ is naturally interpreted as the cure rate, i.e., S(∞|Zi) = exp{−π(Zi)} ≠ 0, leading to an improper survival function.
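The closing identity above, S(t|Z_i) = exp{−π(Z_i)F(t)}, can be checked by Monte Carlo through the latent Poisson representation. The Python sketch below uses illustrative choices π(Z) = e^{0.5} and F(t) = 1 − e^{−t}, together with the fact that the minimum of N iid Exp(1) promotion times is Exp(N).

```python
import numpy as np

rng = np.random.default_rng(0)

def promotion_time_survival(t, pi_z, F):
    # Closed form of the promotion time cure model: S(t|Z) = exp{-pi(Z) F(t)}
    return np.exp(-pi_z * F(t))

def simulate_relapse_times(n, pi_z, rng):
    # T = min(T_1, ..., T_N) with N ~ Poisson(pi_z) and T_k iid Exp(1);
    # a subject with N = 0 is cured, coded as T = inf
    N = rng.poisson(pi_z, size=n)
    T = np.full(n, np.inf)
    pos = N > 0
    T[pos] = rng.exponential(1.0 / N[pos])  # min of N iid Exp(1) is Exp(rate N)
    return T

pi_z = np.exp(0.5)                  # illustrative pi(Z), e.g. beta^T Z = 0.5
F = lambda t: 1.0 - np.exp(-t)      # illustrative promotion time distribution

T = simulate_relapse_times(200_000, pi_z, rng)
t0 = 1.0
empirical = np.mean(T > t0)                        # Monte Carlo S(t0|Z)
closed_form = promotion_time_survival(t0, pi_z, F)
cure_rate = np.exp(-pi_z)                          # S(inf|Z) = exp{-pi(Z)}
```

The empirical survival at t0 and the empirical cure fraction (subjects with N = 0) match the closed forms exp{−π(Z)F(t0)} and exp{−π(Z)}, respectively, up to Monte Carlo error.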

Some curable diseases are associated with longitudinal biomarkers, and it is often of interest to model these two different types of outcomes simultaneously. We therefore propose a joint model to analyze longitudinal and survival data with a cure fraction. While a great deal of work has been done on joint modeling of longitudinal and cure-survival data based on mixture cure models (Law et al., 2002; Yu et al., 2004, 2008), the literature on joint modeling based on promotion time cure models is scant. For a detailed review of joint mixture cure models, see Yu et al. (2004). Here we briefly review the alternative approach, the joint promotion time cure models that we adopt. Brown and Ibrahim (2003) and Chen et al. (2004) proposed joint promotion time cure models with emphasis on two different types of longitudinal models. To model immune response, Brown and Ibrahim (2003) proposed a longitudinal model with a point mass at zero that varied over time and across subjects in probability. Chen et al. (2004), on the other hand, treated the true immune response as unobservable and adopted a longitudinal model in the measurement-error context. Both used a piecewise constant function to estimate the baseline distribution F(t). For inference, they used Bayesian approaches via the Gibbs sampler and Markov chain Monte Carlo sampling, which are straightforward to implement but computationally intensive compared with frequentist analyses. We therefore propose a joint cure model based on a frequentist approach that balances the complications introduced by relaxing both the functional form of the baseline distribution F(t) and the proportionality assumption.

The objective of this article is to present a flexible joint promotion time cure model based on frequentist inference, a new approach relative to the existing joint promotion time cure models. To account for the long-term plateau at the tail of the survival distribution resulting from the existence of cured patients, we propose a broad class of transformed promotion time cure models, which integrates the popular proportional hazards and proportional odds cure models into one general form. Inference procedures using nonparametric maximum likelihood estimation (NPMLE) are developed, a simple and efficient algorithm is provided for implementation, and the new joint cure model is illustrated with colorectal cancer data from the Health Professionals Follow-up Study (HPFS). The proposed work advances existing joint cure models in three ways: 1) flexibility through a nonparametric baseline distribution function and transformation models; 2) the extended ability to handle time-varying covariates; and 3) well-established asymptotic properties of the NPMLEs.

2 Joint Transformation Models

2.1 Joint Models

Let Y(t) be the longitudinal measurement at time t, T the time to the survival event, and 𝒵 = {Z(t); t ≥ 0} the covariate process, where Z(t) is the vector of external covariates at time t, possibly time-varying. We introduce latent random effects to account for the correlation between the longitudinal and survival components of the same subject. In particular, let b denote the subject-specific random effects, following a multivariate normal distribution with mean zero and covariance matrix Σ_b. We further assume that Y(t) and T are independent conditional on 𝒵 and b. The proposed joint models for the longitudinal data Y(·) and the population survival function of T with a cure fraction are given by

Y(t) = α^T Z_1(t) + b^T Z̃_1(t) + ε(t),
S(t | 𝒵, b) = exp{ −H( ∫_0^t e^{β^T Z_2(u) + (ψ∘b)^T Z̃_2(u)} dF(u) ) },    (2)

where α and β are vectors of unknown regression parameters in the longitudinal and survival components, respectively, Z_k(t) and Z̃_k(t) (k = 1, 2) are subsets of Z(t) plus the unit component, and F(t) is an unspecified distribution function of the event times. In addition, ε(t) is a white noise process with mean zero and variance σ_e², ψ is a set of unknown constants with the same number of elements as b, and ψ∘b denotes the componentwise product of ψ and b. Note in (2) that the correlation among the longitudinal outcomes is induced by the latent random effects b, and that the association between the longitudinal outcomes and the event time is characterized by ψ through the shared latent variables b. Thus, for a fixed covariate Z, ψ > 0 implies that larger longitudinal measures correspond to a higher hazard rate of the event, whereas ψ = 0 implies that the association is fully explained by the covariates common to both components. The transformation function H(·) is assumed to be continuously differentiable and strictly increasing; we discuss H(·) in more detail in Section 2.2.

We note that the survival model for the entire population in (2) encompasses more general regression models, with the cure rate defined as the limiting value of the population survival function as t → ∞. This definition does not imply that an observed survival time can be infinite, since the censoring time (by death from other causes or the end of study, for example) is finite with probability 1. In practice, a follow-up period that is sufficiently long from a clinician's perspective can be interpreted as t = ∞. The cure rate model can thus be expressed as

lim_{t→∞} S(t | 𝒵, b) = exp{ −H( ∫_0^∞ e^{β^T Z_2(u) + (ψ∘b)^T Z̃_2(u)} dF(u) ) }.

Thus, our joint cure-survival model (2) allows us to explore a link between the longitudinal measures and the probability of being cured through the shared random effects as well as the covariates. In particular, when Z_2(t) and Z̃_2(t) are time-independent covariates, z_2 and z̃_2, respectively, the cure rate simplifies to

lim_{t→∞} E_b[S(t | 𝒵, b)] = E_b[ exp{ −H( e^{β^T z_2 + (ψ∘b)^T z̃_2} ) } ],

since F(∞) = 1, where E_b denotes expectation with respect to b. In fact, the conditional cure rate lim_{t→∞} E_b[S(t | 𝒵, b)] is always positive (an improper survival function), because H(·) is assumed to be finite.
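With a scalar random intercept b ~ N(0, σ_b²), the expectation E_b in the cure-rate formula is a one-dimensional Gaussian integral, which Gauss-Hermite quadrature evaluates directly. The sketch below uses illustrative values β^T z_2 = 0.5, ψ = 0.3, and σ_b² = 0.5; H is taken as the identity (proportional hazards) or log(1 + x) (proportional odds).

```python
import numpy as np

def cure_rate(beta_z, psi, sigma_b, H, n_nodes=40):
    # E_b[ exp{ -H( e^{beta^T z2 + psi*b} ) } ] for scalar b ~ N(0, sigma_b^2),
    # by Gauss-Hermite quadrature with the substitution b = sqrt(2)*sigma_b*x
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * x
    vals = np.exp(-H(np.exp(beta_z + psi * b)))
    return np.sum(w * vals) / np.sqrt(np.pi)

H_ph = lambda x: x            # proportional hazards structure (eta = 0)
H_po = np.log1p               # proportional odds structure (eta = 1)

cr_ph = cure_rate(beta_z=0.5, psi=0.3, sigma_b=np.sqrt(0.5), H=H_ph)
cr_po = cure_rate(beta_z=0.5, psi=0.3, sigma_b=np.sqrt(0.5), H=H_po)
```

Setting ψ = 0 removes the random effect from the survival component, and the formula collapses to exp{−H(e^{β^T z_2})}, a useful sanity check for the quadrature.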

Let C be the non-informative censoring time, independent of (Y(·), T, b) given 𝒵, and let X = min(T, C) denote the observed event time. The observed data for the ith subject with m_i repeated measurements are O_i = {Y_i(t_ik), X_i, Δ_i, Z(t); t_ik ≤ X_i, t ≤ X_i}, i = 1, …, n, k = 1, …, m_i, where Δ_i = I(T_i ≤ C_i) with I(·) the indicator function. Under model (2), the log-likelihood function for the observed data is given by

Σ_{i=1}^n log ∫_b Π_{k=1}^{m_i} [ (2πσ_e²)^{−1/2} exp{ −(Y_i(t_ik) − α^T Z_{1i}(t_ik) − b^T Z̃_{1i}(t_ik))² / (2σ_e²) } ]
    × [ f(X_i) e^{β^T Z_{2i}(X_i) + (ψ∘b)^T Z̃_{2i}(X_i)} H′( ∫_0^{X_i} e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u) ) ]^{Δ_i}
    × exp{ −H( ∫_0^{X_i} e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u) ) } f(b; Σ_b) db,    (3)

where f(b; Σb) is the density function of b with the parameters Σb, and f(t) = dF(t)/dt and H′(x) = dH(x)/dx are the first derivatives of F(t) and H(x), respectively.

2.2 Transformation of Promotion Time Cure Models

In model (2), H(·) is a transformation of the conditional cumulative hazard function and must be pre-specified in the analysis. For example, H(x) can take the form of the logarithmic transformation

H(x) = log(1 + ηx)/η,  η ≥ 0,

with H(x) = x when η = 0. The choices η = 0 and η = 1 lead to the proportional hazards structure and the proportional odds structure, respectively.
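A minimal sketch of this transformation family, with the η → 0 limit handled explicitly:

```python
import numpy as np

def H_log(x, eta):
    # Logarithmic transformation family H(x) = log(1 + eta*x) / eta;
    # the eta -> 0 limit is H(x) = x (proportional hazards), and
    # eta = 1 gives H(x) = log(1 + x) (proportional odds)
    x = np.asarray(x, dtype=float)
    if eta == 0.0:
        return x
    return np.log1p(eta * x) / eta

x = np.linspace(0.0, 5.0, 6)
h_near_ph = H_log(x, 1e-8)   # numerically indistinguishable from x
h_po = H_log(x, 1.0)         # log(1 + x)
```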

In fact, the transformation H(·) can be derived from a biological argument. Recall that the promotion time cure model without transformation in (1) rests on the conditional independence of {T_k | N_i; k = 1, …, N_i}. This assumption may not hold in practice, since features shared by the same patient, such as underlying health condition or dietary habits, induce correlation. To adjust for correlated cancer progression times, Zeng et al. (2006) introduced a subject-specific frailty ζ_i and assumed that {T_k | N_i, ζ_i; k = 1, …, N_i} are mutually independent with distribution function F(t). The frailty ζ_i reflects the underlying heterogeneity in the rate of metastatic cancer cells through N_i, which follows a Poisson distribution with mean ζ_i π(Z_i), conditional on (Z_i, ζ_i). Following a derivation similar to (1), the resulting survival function for the time to relapse T takes the form

S(t | Z_i) = E_{ζ_i}[ exp{ −ζ_i π(Z_i) F(t) } ],    (4)
where E_{ζ_i} denotes the expectation with respect to ζ_i. Explicitly specifying the distribution of ζ_i, for instance as a gamma distribution with unit mean and variance η, we can now see the desired connection between (4) and the transformation H(·):

E_{ζ_i}[ exp{ −ζ_i x } ] = (1 + ηx)^{−1/η} = exp{ −log(1 + ηx)/η } = exp{ −H(x) },

with x = π(Z_i)F(t).
3 Inference Procedure

3.1 NPMLEs for Joint Transformation Models

We propose to use the nonparametric maximum likelihood estimation (NPMLE) for estimating parameters θ=(α,β,ψ,σe2,Vec(Σb)) and infinite-dimensional parameter F(t), where Vec(Σb) denotes the vector consisting of the upper triangular elements of Σb. To obtain the NPMLEs, in the log-likelihood function (3), we treat F as a step function with jumps only at the observed failure times and replace f(t) by the jump size of F at t, which is denoted by F{t}.

For commonly used transformation functions such as a logarithmic transformation, exp{−H(x)} can be expressed as the Laplace transformation of some function ϕ(t), t ≥ 0, such that

exp{−H(x)} = ∫_0^∞ exp(−xt) ϕ(t) dt.

For example, choosing ϕ(t) = t^{1/η−1} exp(−t/η)/{Γ(1/η) η^{1/η}}, the gamma density with unit mean and variance η, yields H(x) = log(1 + ηx)/η. Applying the Laplace transformation with a subject-specific frailty ζ_i and using the fact that

H′(x) exp{−H(x)} = ∫_0^∞ ζ exp(−xζ) ϕ(ζ) dζ,

the observed log-likelihood function (3) can be rewritten as

l_n(θ, F{·}) = Σ_{i=1}^n log ∫_b Π_{k=1}^{m_i} [ (2πσ_e²)^{−1/2} exp{ −(Y_i(t_ik) − α^T Z_{1i}(t_ik) − b^T Z̃_{1i}(t_ik))² / (2σ_e²) } ]
    × ∫_{ζ_i} [ ζ_i F{X_i} e^{q_{2i}(X_i)} ]^{Δ_i} exp{ −∫_0^{X_i} ζ_i e^{q_{2i}(u)} dF(u) } ϕ(ζ_i) dζ_i
    × f(b; Σ_b) db,    (5)

where q_{2i}(t) = β^T Z_{2i}(t) + (ψ∘b)^T Z̃_{2i}(t), and ζ_i and b are assumed independent. The most attractive feature of introducing the transformation in this way is that the modified log-likelihood (5) can be viewed as a proportional hazards frailty model with conditional hazard function

λ(t | 𝒵(t), ζ_i, b_i) = ζ_i f(t) exp{ β^T Z_{2i}(t) + (ψ∘b)^T Z̃_{2i}(t) }.

This makes the algorithm more stable and computationally efficient.
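Both Laplace-transformation identities above can be verified by Monte Carlo for the gamma frailty (unit mean, variance η) that generates the logarithmic transformation; η = 0.5 and the evaluation point x = 1.7 below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

eta = 0.5
shape, scale = 1.0 / eta, eta      # gamma frailty with mean 1 and variance eta

H = lambda x: np.log1p(eta * x) / eta      # H(x) = log(1 + eta*x)/eta
Hp = lambda x: 1.0 / (1.0 + eta * x)       # H'(x)

x0 = 1.7
zeta = rng.gamma(shape, scale, 1_000_000)
lap0 = np.mean(np.exp(-x0 * zeta))             # ~ E[e^{-x0 zeta}] = e^{-H(x0)}
lap1 = np.mean(zeta * np.exp(-x0 * zeta))      # ~ E[zeta e^{-x0 zeta}] = H'(x0) e^{-H(x0)}
```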

Now, computing the NPMLEs amounts to maximizing the modified log-likelihood function with respect to θ and all jump sizes of F at the observed failure times. This maximization can be carried out through the following EM algorithm.

3.2 EM Algorithm

We describe the EM algorithm for computing the NPMLEs of (θ, F{·}), treating ζ_i and b as missing data. In the E-step, we calculate the conditional expectation of the complete-data log-likelihood, given the observed data O_i and the current parameter estimates. In particular, we need to evaluate integrals of certain functions of (ζ_i, b), say Ê[ζ_i g_i(b) | O_i]. Hereafter we suppress the conditioning on the observed data and current parameter estimates and abbreviate Ê[ζ_i g_i(b) | O_i] as Ê[ζ_i g_i(b)]. This expectation becomes tractable by first obtaining the nested conditional expectation of ζ_i given b and the observed data; that is, Ê[ζ_i g_i(b)] can be calculated as Ê_b[Ê_{ζ_i}[ζ_i | b] g_i(b)]. Using the fact that the conditional distribution of ζ_i given b is proportional to

h(ζ_i, b) = ζ_i^{Δ_i} exp{ −∫_0^{X_i} ζ_i e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u) } ϕ(ζ_i),

together with the useful Laplace-transformation identities, the conditional expectation of ζ_i given b has the form

Ê_{ζ_i}[ζ_i | b] = H′(Λ_i(b)) − Δ_i H″(Λ_i(b)) / H′(Λ_i(b)),

where Λ_i(b) = ∫_0^{X_i} e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u). Once Ê_{ζ_i}[ζ_i | b], a function of b, is calculated, the conditional expectation Ê[ζ_i g_i(b)] can be computed by numerical approximation, such as Gaussian quadrature with Hermite orthogonal polynomials. Since the conditional distribution of b given O_i is proportional to Γ(O_i | b) f(b; Σ_b), the conditional expectation is calculated as

Ê[ζ_i g_i(b)] = ∫_b Ê_{ζ_i}[ζ_i | b] g_i(b) Γ(O_i | b) f(b; Σ_b) db / ∫_b Γ(O_i | b) f(b; Σ_b) db,


where

Γ(O_i | b) = exp{ −Σ_{k=1}^{m_i} [ (b^T Z̃_{1i}(t_ik))² − 2(Y_i(t_ik) − α^T Z_{1i}(t_ik)) b^T Z̃_{1i}(t_ik) ] / (2σ_e²) }
    × exp{ Δ_i [ (ψ∘b)^T Z̃_{2i}(X_i) + log H′( ∫_0^{X_i} e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u) ) ] }
    × exp{ −H( ∫_0^{X_i} e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u) ) }.
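For a scalar random intercept the outer integral over b is one-dimensional, and the Gauss-Hermite rule mentioned above applies directly. In the sketch below the conditional likelihood Γ(O|b) ∝ exp{−(y − b)²/2} is a hypothetical stand-in with a known closed-form posterior mean, used only to check the quadrature.

```python
import numpy as np

def posterior_expectation(g, log_Gamma, sigma_b, n_nodes=40):
    # E[g(b) | O] = ∫ g(b) Γ(O|b) f(b; σ_b²) db / ∫ Γ(O|b) f(b; σ_b²) db
    # for scalar b ~ N(0, σ_b²); the substitution b = sqrt(2)*σ_b*x absorbs
    # the normal density into the Gauss-Hermite weight e^{-x²}
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * x
    gam = np.exp(log_Gamma(b))
    return np.sum(w * g(b) * gam) / np.sum(w * gam)

# toy conditional likelihood: one observation y with unit error variance
y, sigma_b = 1.2, 1.0
log_Gamma = lambda b: -0.5 * (y - b) ** 2
post_mean = posterior_expectation(lambda b: b, log_Gamma, sigma_b)
# normal-normal conjugacy gives E[b | O] = y * sigma_b^2 / (sigma_b^2 + 1) = 0.6
```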

In the M-step, we maximize the following objective function of the expected log-likelihood for the complete data:

Σ_{i=1}^n Σ_{k=1}^{m_i} { −log σ_e²/2 − Ê[ (Y_i(t_ik) − α^T Z_{1i}(t_ik) − b^T Z̃_{1i}(t_ik))² / (2σ_e²) ] }
    + Σ_{i=1}^n Δ_i { Ê[log ζ_i] + log F{X_i} + β^T Z_{2i}(X_i) + Ê[ψ∘b]^T Z̃_{2i}(X_i) }
    + Σ_{i=1}^n { −Ê[ ∫_0^{X_i} ζ_i e^{β^T Z_{2i}(u) + (ψ∘b)^T Z̃_{2i}(u)} dF(u) ] + Ê[ log ϕ(ζ_i) + log f(b; Σ_b) ] },

under the restriction Σ_{i=1}^n Δ_i F{X_i} = 1. Maximizing this objective function over (α, σ_e², Σ_b) is simple, whereas the remaining parameters (β, ψ, F{·}) do not have closed-form maximizers, so a reliable numerical approach is required. Introducing a Lagrange multiplier μ, we solve the following equation for β:

Σ_{i=1}^n Δ_i { Z_{2i}(X_i) − F{X_i} Σ_j R_j(X_i) Ê[ζ_j e^{q_{2j}(X_i)}] Z_{2j}(X_i) } = 0,    (6)
the following equation for ψ:

Σ_{i=1}^n Δ_i { Ê[b] ∘ Z̃_{2i}(X_i) − F{X_i} Σ_j R_j(X_i) Ê[ζ_j e^{q_{2j}(X_i)} b] ∘ Z̃_{2j}(X_i) } = 0,    (7)
and the following equation for μ:

Σ_{i=1}^n Δ_i / ( μ + Σ_j R_j(X_i) Ê[ζ_j e^{q_{2j}(X_i)}] ) = 1,    (8)
where R_j(t) = I(X_j ≥ t) and q_{2j}(t) = β^T Z_{2j}(t) + (ψ∘b)^T Z̃_{2j}(t). The restricted NPMLE together with a random-effect-induced transformation EM was presented by Tsodikov (2002), and the approach has been used in various models, as reviewed in Tsodikov et al. (2003). Hence F is estimated as a step function with the following jump size at X_i:

F{X_i} = Δ_i / ( μ + Σ_j R_j(X_i) Ê[ζ_j e^{q_{2j}(X_i)}] ).    (9)
To solve these equations at each M-step, we use a two-step optimization. In the first step, we estimate μ by the bisection method based on equation (8) and the fact that F{X_i} > 0 (i = 1, …, n). Since the left side of (8) is a monotone decreasing function of μ, viewing F{X_i} as a function of μ through (9), the solution always exists. In the second step, to update β and ψ, we plug the estimate μ̂ into equations (6) and (7), treat them as functions of β and ψ, and solve them with a one-step Newton-Raphson update. The jump sizes of F are then easily updated from (9) with μ̂.
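The first step of this two-step update can be sketched as follows. Here A_i is a hypothetical precomputed stand-in for the at-risk total Σ_j R_j(X_i)Ê[ζ_j e^{q_{2j}(X_i)}] at the ith observed failure time; the bisection exploits the monotonicity of the left side of (8) in μ.

```python
import numpy as np

def update_jump_sizes(A, tol=1e-12):
    # Bisection for the Lagrange multiplier mu solving sum_i 1/(mu + A_i) = 1
    # over the observed failure times, followed by the jump-size update
    # F{X_i} = 1/(mu + A_i); A_i stands in for the at-risk totals.
    A = np.asarray(A, dtype=float)
    g = lambda mu: np.sum(1.0 / (mu + A)) - 1.0
    lo = -A.min() + 1e-10             # g -> +inf as mu approaches -min(A)
    hi = -A.min() + len(A) + 1.0      # every term <= 1/(len(A)+1), so g(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    mu = 0.5 * (lo + hi)
    return mu, 1.0 / (mu + A)

mu, jumps = update_jump_sizes([5.0, 3.0, 2.0, 4.0])  # illustrative at-risk totals
```

By construction the returned jump sizes are positive and sum to one, matching the restriction imposed on F.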

To obtain the NPMLEs, we iterate between the E-step and M-step until the parameter estimates converge. The variances of the NPMLEs can be estimated from the inverse of the observed information matrix for all parameters (θ, F{·}), under the restriction Σ_{i=1}^n Δ_i F{X_i} = 1. The observed information matrix can be computed from the complete-data log-likelihood ℓ_i^c for the ith subject via the Louis formula (Louis, 1982),

Σ_{i=1}^n Ê[ −∇² ℓ_i^c ] − Σ_{i=1}^n { Ê[ (∇ℓ_i^c)^{⊗2} ] − Ê[ ∇ℓ_i^c ]^{⊗2} },
where u^{⊗2} = uu^T, ∇ and ∇² denote the first and second derivatives with respect to the parameters, and Ê denotes the conditional expectation given the observed data, evaluated at the NPMLEs.

4 Asymptotic Properties

Let (θ̂, F̂) denote the NPMLEs and (θ_0, F_0) the true values of (θ, F). We establish the asymptotic properties of the NPMLEs under the following regularity conditions:

  • (A1)
    The true parameter value θ0 belongs to the interior of a compact set Θ within the domain of θ.
  • (A2)
    With probability 1, Z(t) is left-continuous with uniformly bounded left and right derivatives in [0, ∞].
  • (A3)
    For some constant δ0, P(C = ∞| Z) > δ0 > 0 with probability 1.
  • (A4)
    For some positive constant M_0, M_0^{−1} < σ_{0e}² < M_0 and M_0^{−1} < c^T Σ_{0b} c < M_0 for any constant vector c with ‖c‖ = 1.
  • (A5)
    The transformation function H(·) is four times differentiable with H(0) = 0 and H′(0) > 0. In addition, there exist positive constants μ_0 and κ_0 such that
    (1 + x) H′(x) exp{−H(x)} ≤ μ_0 (1 + x)^{−κ_0}.
    Furthermore, there exists a constant ρ0 > 0 such that
    where H^{(3)} and H^{(4)} are the third and fourth derivatives of H.
  • (A6)
    For any deterministic function c(t) and constant υ such that c(t) ≠ 0 or υ ≠ 0, P{c(t) + υ^T Z(t) = 0; t ∈ [0, ∞]} = 0.
  • (A7)
    With some positive probability, Z̃_1^T Z̃_1 has full rank, where Z̃_1 denotes the matrix whose rows are the observed covariates Z̃_1(t)^T at the measurement times.
  • (A8)
    Let K be the number of repeated measures and let db be the dimension of b. With probability one, P(K > db| Z,X) > 0.

Conditions (A1) – (A3) are standard assumptions in survival analysis. Condition (A4) is necessary to prove the existence of the NPMLEs. It is easily verified that Condition (A5) holds for all commonly used transformations, including the logarithmic transformations described in Section 2. Conditions (A6) – (A7) ensure the linear independence of the design matrices of covariates for the fixed and random effects. Condition (A8) requires that some subjects have more than d_b repeated measures.

Under the above conditions, the following theorem establishes the consistency of the NPMLEs (θ̂, F̂).

Theorem 1 Under Conditions (A1) – (A8),

|θ̂ − θ_0| → 0,  sup_{t∈[0,∞]} |F̂(t) − F_0(t)| → 0,  a.s.

Theorem 1 then leads to the following results on the asymptotic normality of (θ̂, F̂) and the asymptotic efficiency of θ̂.

Theorem 2 Under Conditions (A1) – (A8), √n(θ̂ − θ_0, F̂(t) − F_0(t)) converges weakly to a zero-mean Gaussian process in R^{d_θ} × BV[0, ∞], where d_θ is the dimension of θ and BV[0, ∞] denotes the space of functions of bounded variation on [0, ∞]. Furthermore, the asymptotic covariance matrix of √n(θ̂ − θ_0) achieves the semiparametric efficiency bound for θ_0.

Furthermore, in the Appendix we show that the inverse of the observed information matrix is a consistent estimator of the asymptotic covariance matrix of the NPMLEs. This result allows us to make inference for any functional of (θ, F(t)). To prove Theorems 1–2, we apply the general asymptotic theory of Zeng and Lin (2007). The desired asymptotic properties of the NPMLEs follow from the arguments in Appendix B of Zeng and Lin (2007) once we verify that their regularity conditions hold in our joint cure-survival setting. Checking these regularity conditions, however, is challenging in our case. Detailed proofs are provided in the Appendix.

5 Simulation Studies

In this section, we demonstrate the finite sample performance of the proposed method through extensive simulation studies. The longitudinal data are generated from


and the survival data with a cure proportion are generated from transformation models

S(t | z_1, z_2, b) = exp{ −H( e^{0.5 z_1 − z_2 + ψb} F(t) ) },

where z_1 is a dichotomous covariate taking the values 0 and 1 with equal probability 0.5, z_2 is a continuous covariate generated from a uniform distribution on [−1, 1], and ε(t) ~ N(0, σ_e²) with σ_e² = 1. The true failure distribution in the uncured subpopulation is F(t) = 1 − exp(−t).

For each subject, the correlation within repeated measures is reflected by the subject-specific random intercept b~N(0,σb2) with σb2=0.5, and the negative, no, and positive dependences between the longitudinal measures and the cure-survival rate are simulated through different ψ values of −0.3, 0, and 0.3, respectively. For the cure-survival model, we consider three types of transformations H(·) representing the proportional hazards structure (η = 0), the proportional odds structure (η = 1), and a transformation in the middle of them with η = 0.5.

The non-informative censoring time C_i is generated from a uniform distribution with rates varying by transformation, designed to yield a 30–45% chance of being right-censored and a 20% chance of being cured. Longitudinal measures are observed every 0.2 units of time, so each individual has about 3 repeated measures on average.
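The survival part of this design under H(x) = x can be generated through the latent Poisson representation of the promotion time model; the censoring upper bound below is an illustrative choice, not necessarily the value used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_cure_survival(n, psi=0.3, sigma_b2=0.5, cens_upper=6.0):
    # S(t | z1, z2, b) = exp{ -exp(0.5 z1 - z2 + psi b) F(t) }, F(t) = 1 - e^{-t},
    # generated via N ~ Poisson(exp(0.5 z1 - z2 + psi b)) latent promotion times
    z1 = rng.integers(0, 2, n).astype(float)
    z2 = rng.uniform(-1.0, 1.0, n)
    b = rng.normal(0.0, np.sqrt(sigma_b2), n)
    theta = np.exp(0.5 * z1 - z2 + psi * b)
    N = rng.poisson(theta)
    T = np.full(n, np.inf)                       # N = 0 means cured
    T[N > 0] = rng.exponential(1.0 / N[N > 0])   # min of N iid Exp(1) times
    C = rng.uniform(0.0, cens_upper, n)
    X = np.minimum(T, C)
    delta = (T <= C).astype(int)
    return X, delta, T

X, delta, T = simulate_cure_survival(50_000)
latent_cure = np.mean(np.isinf(T))   # latent cure fraction
censor_rate = 1.0 - delta.mean()     # observed right-censoring rate
```

Cured subjects (T = ∞) are always right-censored, so the observed censoring rate necessarily exceeds the latent cure fraction.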

The results based on 1000 replications are presented in Tables 1–3 for n = 200 and n = 400. The tables report the average difference between the estimates and the true parameter (Bias), the sample standard deviation of the parameter estimators (SE), the average of the standard error estimators (SEE), and the coverage probability of 95% confidence intervals (CP). The confidence intervals for σ_e² and σ_b² are constructed using the Satterthwaite approximation.
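The four summaries can be computed from replicate-level output as below; the synthetic normal draws merely check the bookkeeping under a known sampling distribution.

```python
import numpy as np

def summarize(estimates, ses, true_value):
    # Bias: mean estimate minus truth; SE: sample SD of the estimates;
    # SEE: mean of the standard error estimates; CP: 95% Wald coverage
    estimates, ses = np.asarray(estimates), np.asarray(ses)
    bias = estimates.mean() - true_value
    se = estimates.std(ddof=1)
    see = ses.mean()
    covered = (estimates - 1.96 * ses <= true_value) & (true_value <= estimates + 1.96 * ses)
    return bias, se, see, covered.mean()

rng = np.random.default_rng(2)
est = rng.normal(0.5, 0.1, 5000)                   # estimator ~ N(truth, 0.1^2)
bias, se, see, cp = summarize(est, np.full(5000, 0.1), true_value=0.5)
```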

Table 1
Simulation results for H(x) = x. tp represents the pth percentile.
Table 3
Simulation results for H(x) = log(1 + x). tp represents the pth percentile.

Table 1 shows that the NPMLEs under the proportional hazards structure H(x) = x are essentially unbiased, the standard error estimators computed via the Louis formula reflect the true variation of the proposed estimators well, and the coverage probabilities are in a reasonable range, even with a moderate sample size of 200. As the sample size increases to 400, the biases increase slightly for some estimates but remain very small relative to the true parameter values, the variation of the parameter estimators becomes smaller, and the coverage probabilities remain reasonable. The results in Tables 2–3 are similar to those in Table 1, indicating that the proposed method also works well for H(x) = 2 log(1 + x/2) and H(x) = log(1 + x).

Table 2
Simulation results for H(x) = 2 log(1+x/2). tp represents the pth percentile.

6 Data Application

The proposed method was applied to data from the Health Professionals Follow-up Study (HPFS), a large observational study of male health professionals living in the United States. The main interest of this analysis was to jointly model longitudinal vitamin D intake and the survival-cure rate of colorectal cancer (CRC). Since the focus was on the cure of CRC, we restricted our study population to the 810 patients who were diagnosed with colorectal cancer between January 1986 and January 2006 and had no missing values in any of the covariates included in the model.

For each subject, vitamin D intake was assessed via food frequency questionnaires at approximately four-year intervals between 1986 and 2002. To identify cure of CRC, we set colorectal cancer-specific death as the event of interest (Ng et al., 2008), treating deaths from causes other than CRC, and patients alive as of January 2006, as censored. The cured patients can then be defined as the subpopulation among the censored who were followed up long enough to be considered cured. Over 20 years of follow-up in the HPFS, 250 (31%) colorectal cancer-specific deaths were observed. Based on the Kaplan-Meier survival curve in Figure 1 (a), the estimated survival rate at the end of study remained high (64%) even after a sufficient follow-up period (20 years), and the earliest time at which the curve goes flat, 12.5 years of follow-up, was taken as the point beyond which all remaining disease-free survivors were declared cured. By this definition, 116 patients in the HPFS were considered cured. We note that some patients right-censored before 12.5 years might indeed have been cured of CRC, but this is inconclusive due to the right-censoring.

We fitted the proposed joint cure model for the longitudinal vitamin D trend and CRC death with the patient's medical information at diagnosis: age, body mass index (BMI), and indicators of tumor differentiation grade (1 = poor or unspecified, 0 = well or moderate) and distant metastases (1 = yes, 0 = no) were included as covariates. Age was centered at the mean of 68 and divided by 10 to represent a decade, and BMI was centered at the mean of 25 kg/m². In addition, a subject-specific random intercept was included in both the longitudinal and cure-survival models to account for the correlation between the two outcomes. To explore both the proportional hazards and the proportional odds structures for the cure-survival data, we applied the transformation models H(x) = log(1 + ηx)/η, varying η over [0, 1] in increments of 0.1. We used the Akaike information criterion (AIC) to determine the best transformation (i.e., η); the smallest AIC was achieved at η = 0, implying that the joint proportional hazards cure (PHC) model fit the data best. Although the final transformation in the HPFS example turned out to be the joint PHC model, in the absence of model-diagnostic tools for joint modeling it is valuable to consider a class of transformations and confirm that the chosen fit is best within that class. To show the impact of the transformation on the parameter estimates, Table 4 summarizes the analysis results under the joint PHC model (η = 0) and the joint proportional odds cure model (η = 1).
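The grid search over η can be sketched as follows; the profile log-likelihood is a hypothetical stand-in, since in the real analysis each value comes from maximizing the joint likelihood at that η.

```python
import numpy as np

def select_eta(etas, profile_loglik, n_params):
    # AIC = -2 * maximized log-likelihood + 2 * number of parameters,
    # minimized over the transformation grid
    aics = np.array([-2.0 * profile_loglik(e) + 2.0 * n_params for e in etas])
    return etas[int(np.argmin(aics))], aics

etas = np.round(np.arange(0.0, 1.01, 0.1), 1)       # eta in [0, 1] by 0.1
toy_loglik = lambda e: -1500.0 - 3.0 * e            # hypothetical: peak at eta = 0
best_eta, aics = select_eta(etas, toy_loglik, n_params=8)
```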

Table 4
Analysis results for the HPFS study. The 50:50 mixture of χ2 distributions is used for testing variances.

Under the selected transformation model, Figure 1 (b) displays the estimated baseline survival distribution for the uncured patients (X_i < ∞) along with pointwise 95% confidence intervals; note that the tail probability of the estimated baseline survival curve reaches zero. The results in Table 4 show that 1) older patients tended to take more vitamin D and were more likely to be uncured of CRC; and 2) patients with distant metastases appeared to take less vitamin D and were more likely to be uncured of CRC. The significant negative ψ̂ suggested a protective effect of vitamin D intake on the risk of CRC death beyond what is explained by the covariates common to the longitudinal and survival components (p = 0.026). As an example of quantitative interpretation, the marginal survival rates (≤ 12.5 years) and cure rates (> 12.5 years) for the whole population are given in Figure 2. For instance, relative to the reference (age 68, BMI = 25 kg/m², well or moderate differentiation, no distant metastases) with a cure rate of 80%, the cure rate decreased to 68% at age 78 and to 6% for CRC patients with distant metastases. The curves were obtained by evaluating E_b[S(t | z_2, b)] = E_b[exp{−H(e^{β^T z_2 + ψb} F(t))}] at the NPMLEs for given covariates z_2, and their 95% pointwise confidence intervals can be obtained by the functional delta method, also evaluated at the NPMLEs.

Figure 2
Predicted marginal survival rates of the entire population using the results in Table 4. The rates beyond the cure threshold are interpreted as cure rates (CR). Reference rate is taken for age of 68, BMI = 25kg/m2, well or moderate differentiation, and ...

7 Concluding Remarks

We have proposed a joint transformation model for longitudinal and survival data that accounts for the possibility of patients being cured of, or immune to, the disease. The proposed approach has the advantages of handling time-varying covariates and providing an easy way to explore a large class of cure models in a unified manner. We used NPMLEs to estimate the model parameters, and the resulting NPMLEs were shown to be asymptotically normal and efficient. Simulation studies showed that the proposed estimation procedure produces consistent estimators, and the new EM algorithm computes the NPMLEs in a simple and stable way.

As an example of H(·), we considered a class of logarithmic transformations, which can be misspecified in practice because of limited knowledge or complex relationships among covariates. As an alternative choice, we also explored the performance of a class of Box-Cox transformations,

\[
H(x)=\frac{(1+x)^{\gamma}-1}{\gamma}\quad(\gamma>0),\qquad H(x)=\log(1+x)\ \text{at}\ \gamma=0,
\]

and the selected transformation function was robust to the class of transformations considered. In our experience, the form of the transformation class appears less important than the choice of the transformation parameter η. We used the AIC to determine the best transformation parameter, but other model-selection criteria exist, such as the Bayesian information criterion and ('leave-one-subject-out') cross-validation. The differentiability conditions on H, as in the first part of Condition (A5), are satisfied by any class of transformations induced by a random effect. Indeed, the validity of the asymptotic properties proven here is not restricted to these frailty-induced transformations: transformations not generated by a frailty, for instance the Box-Cox transformations with γ > 1, can also satisfy Condition (A5). We further note that in this article the frailty representation via the Laplace transform was introduced to facilitate the EM computation.
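The two transformation classes discussed above overlap at their boundary members: both contain the proportional hazards model H(x) = x and the proportional odds model H(x) = log(1 + x). A small sketch verifies this and illustrates AIC-based selection of the transformation parameter; the `loglik` argument is a hypothetical stand-in for the maximized log-likelihood produced by the EM fit, not the paper's algorithm itself:

```python
import numpy as np

def H_log(x, eta):
    # logarithmic class: H(x) = log(1 + eta*x)/eta; the limit eta -> 0 gives H(x) = x
    return np.log1p(eta * x) / eta if eta > 0 else np.asarray(x, dtype=float)

def H_boxcox(x, gamma):
    # Box-Cox class: H(x) = {(1+x)^gamma - 1}/gamma; the limit gamma -> 0 gives log(1+x)
    return ((1.0 + x) ** gamma - 1.0) / gamma if gamma > 0 else np.log1p(x)

x = np.linspace(0.0, 5.0, 101)
# both classes contain the proportional hazards model H(x) = x ...
assert np.allclose(H_log(x, 0.0), H_boxcox(x, 1.0))
# ... and the proportional odds model H(x) = log(1 + x)
assert np.allclose(H_log(x, 1.0), H_boxcox(x, 0.0))

def select_transformation(params, loglik, n_params):
    """Pick the transformation parameter minimizing AIC = 2k - 2*loglik.
    `loglik` is a placeholder for the profiled log-likelihood of the fitted model."""
    aic = [2 * n_params - 2 * loglik(p) for p in params]
    return params[int(np.argmin(aic))]

# toy concave log-likelihood peaking at 0.5, purely for illustration
best = select_transformation([0.0, 0.5, 1.0], lambda e: -(e - 0.5) ** 2, n_params=3)
```

The same grid-plus-AIC scheme applies unchanged to either class, which is one reason the choice of class matters less than the choice of parameter.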

In this paper, we assumed that the observation times of the repeated measures are independent of the cure-survival data. To account for informative observation times, our joint cure model can be extended by jointly modeling an additional recurrent event process. Another promising extension would be to the context of generalized linear mixed models (GLMMs) for analyzing discrete longitudinal outcomes. The general approach presented here still applies to GLMMs, but the parts of the estimation procedure specific to the longitudinal component would need to be modified accordingly.


Acknowledgments

The authors would like to thank the referees for helpful comments and gratefully acknowledge use of the Health Professionals Follow-up Study data, funded by NIH/NCI grant P01 CA055075.


Proofs of Asymptotic Properties

This section proves Theorems 1–2 stated in Section 4 by applying the general asymptotic theory of Zeng and Lin (2007). Specifically, it is easy to see that our conditions (A1)–(A8) imply (C1)–(C4), (C6), and (C8) of Zeng and Lin (2007), so it remains to verify their two identifiability conditions (C5) and (C7). The first identifiability condition is the key step in proving the consistency of the NPMLEs; the second entails the invertibility of the observed information matrix at the true parameters, which is needed for the proof of asymptotic normality.

Proof 1 First, we verify the first identifiability condition (C5) in Appendix B of Zeng and Lin (2007). Suppose that the likelihood function for (α, β, ψ, σ_e², Vec(Σ_b)) is the same as that for the true parameter values (α_0, β_0, ψ_0, σ_{0e}², Vec(Σ_{0b})). That is, for arbitrary K > 0,

\[
\int_b (2\pi\sigma_e^2)^{-K/2}\exp\Big\{-\frac{(\mathbf{Y}-\mathbf{Z}_1\alpha-\tilde{\mathbf{Z}}_1 b)^T(\mathbf{Y}-\mathbf{Z}_1\alpha-\tilde{\mathbf{Z}}_1 b)}{2\sigma_e^2}\Big\}\Big[f(x)\,e^{\beta^T Z_2(x)+(b\circ\psi)^T\tilde{Z}_2(x)}H'(q(x))\Big]^{\Delta}e^{-H(q(x))}f(b;\Sigma_b)\,db
\]
\[
=\int_b (2\pi\sigma_{0e}^2)^{-K/2}\exp\Big\{-\frac{(\mathbf{Y}-\mathbf{Z}_1\alpha_0-\tilde{\mathbf{Z}}_1 b)^T(\mathbf{Y}-\mathbf{Z}_1\alpha_0-\tilde{\mathbf{Z}}_1 b)}{2\sigma_{0e}^2}\Big\}\Big[f_0(x)\,e^{\beta_0^T Z_2(x)+(b\circ\psi_0)^T\tilde{Z}_2(x)}H'(q_0(x))\Big]^{\Delta}e^{-H(q_0(x))}f(b;\Sigma_{0b})\,db,
\tag{10}
\]

where the bold Y denotes the vector of the observed longitudinal measures at times s_1, …, s_K, and Z_1 and Z̃_1 in bold type denote matrices whose k-th rows equal the observed covariates Z_1(s_k)^T and Z̃_1(s_k)^T, k = 1, …, K, respectively. In addition, q(t) = ∫_0^t e^{β^T Z_2(u) + (b∘ψ)^T Z̃_2(u)} dF(u), q_0(t) is q(t) evaluated at the true parameter values, and f(b; Σ_b) is the density function of the (multivariate) normal distribution with mean zero and covariance matrix Σ_b. We now take the following steps on both sides of (10).

Step 1: To prove the identifiability of the longitudinal component, we consider the case where Δ = 0 and X ≈ 0.

Using the fact that ∫_b b f(b; Σ_b) db = ∫_b b f(b; Σ_{0b}) db = 0 and considering E[Y(s_k)] conditional on b, we have α^T Z_1(s_k) = α_0^T Z_1(s_k) for k = 1, …, K. By Condition (A6), we obtain α = α_0. Similarly, considering E[Y(s_k)Y(s_{k′})] and Var(Y(s_k)) given b, we obtain, for k ≠ k′,

\[
\int_b \{\alpha_0^T Z_1(s_k)+b^T\tilde{Z}_1(s_k)\}\{\alpha_0^T Z_1(s_{k'})+b^T\tilde{Z}_1(s_{k'})\}\,f(b;\Sigma_b)\,db
=\int_b \{\alpha_0^T Z_1(s_k)+b^T\tilde{Z}_1(s_k)\}\{\alpha_0^T Z_1(s_{k'})+b^T\tilde{Z}_1(s_{k'})\}\,f(b;\Sigma_{0b})\,db,
\]

which yields Σ_b = Σ_{0b} by Condition (A6), and

\[
\int_b \{\sigma_e^2 + b^T\tilde{Z}_1(s_k)\tilde{Z}_1(s_k)^T b\}\,f(b;\Sigma_b)\,db
=\int_b \{\sigma_{0e}^2 + b^T\tilde{Z}_1(s_k)\tilde{Z}_1(s_k)^T b\}\,f(b;\Sigma_{0b})\,db,
\]

for k = 1, …, K. Accordingly, we have σ_e² = σ_{0e}².

Step 2: For the survival component, suppose Δ = 0 and X = t. Then, (10) implies

\[
E_b\big[e^{-H(q(t))}\big]=E_b\big[e^{-H(q_0(t))}\big],
\]
where b follows a normal distribution with mean μ_b = V_b Z̃_1^T(Y − Z_1α_0)/σ_{0e}² and covariance matrix V_b = [Σ_{0b}^{−1} + Z̃_1^T Z̃_1/σ_{0e}²]^{−1}. For fixed Y, Z_1, and Z̃_1, since b is a complete statistic for μ_b, we have

\[
\exp\Big\{-H\Big(\int_0^t e^{\beta^T Z_2(u)+(b\circ\psi)^T\tilde{Z}_2(u)}\,dF(u)\Big)\Big\}
=\exp\Big\{-H\Big(\int_0^t e^{\beta_0^T Z_2(u)+(b\circ\psi_0)^T\tilde{Z}_2(u)}\,dF_0(u)\Big)\Big\}.
\]

Furthermore, since H and the exponential function are one-to-one, it follows that

\[
\int_0^t e^{\beta^T Z_2(u)+(b\circ\psi)^T\tilde{Z}_2(u)}\,dF(u)
=\int_0^t e^{\beta_0^T Z_2(u)+(b\circ\psi_0)^T\tilde{Z}_2(u)}\,dF_0(u)
\]
with probability 1. Taking the expectation with respect to b for fixed Y, Z_1, and Z̃_1, we conclude that β = β_0, f(t) = f_0(t), and ψ = ψ_0 by Condition (A6).

Proof 2 Next, we verify the second identifiability condition (C7) in Appendix B of Zeng and Lin (2007). We start from the score equation along the path (α_0 + ξν_1, β_0 + ξν_2, ψ_0 + ξν_3, σ_{0e}² + ξν_4, Vec(Σ_{0b}) + ξν_b, F_0 + ξ∫h dF_0). We define D_b as the symmetric matrix such that Vec(D_b) = ν_b.

Step 1: To simplify the score equation for the proofs of ν_1 = 0, ν_4 = 0, and D_b = 0, we consider the same case Δ = 0 and X ≈ 0 as in Step 1 of the first identifiability proof. We define

\[
V_b^{-1}=\Sigma_{0b}^{-1}+\tilde{\mathbf{Z}}_1^T\tilde{\mathbf{Z}}_1/\sigma_{0e}^2,
\qquad
\mu_b=V_b\tilde{\mathbf{Z}}_1^T(\mathbf{Y}-\mathbf{Z}_1\alpha_0)/\sigma_{0e}^2;
\]

then, the score equation is given by


By comparing the coefficients of the constant, linear, and quadratic terms in (Y − Z_1α_0), we obtain




Since [I − Z̃_1 V_b Z̃_1^T/σ_{0e}²] is positive definite, we see that ν_1 = 0 in (13). To simplify (14), we multiply both sides of (14) by Z̃_1^T from the left, by Z̃_1 from the right, and then by [Z̃_1^T Z̃_1]^{−1} from the right. Using the fact that Σ_{0b}^{−1} V_b = I − Z̃_1^T Z̃_1 V_b/σ_{0e}², the equation (14) becomes


and the equation (12) becomes


After taking the trace of (15) and subtracting it from (16), we obtain


where db stands for the dimension of b. Based on Condition (A8), we conclude that ν4 = 0, and hence Db = 0 in (15) by Condition (A7).
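The matrix identity V_b^{−1} = Σ_{0b}^{−1} + Z̃_1^T Z̃_1/σ_{0e}², and its consequence Σ_{0b}^{−1} V_b = I − Z̃_1^T Z̃_1 V_b/σ_{0e}², can be verified numerically. The following sanity check is not part of the proof; the dimensions and matrices are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

K, db = 6, 2                          # K repeated measures, db random effects
Z1t = rng.standard_normal((K, db))    # stands in for the design matrix Z~1
A = rng.standard_normal((db, db))
Sigma_0b = A @ A.T + db * np.eye(db)  # an arbitrary positive-definite covariance
sigma2 = 0.7                          # stands in for sigma_{0e}^2

# V_b = [Sigma_{0b}^{-1} + Z~1' Z~1 / sigma^2]^{-1}
Vb = np.linalg.inv(np.linalg.inv(Sigma_0b) + Z1t.T @ Z1t / sigma2)

# identity used to simplify (14): Sigma_{0b}^{-1} V_b = I - Z~1' Z~1 V_b / sigma^2
lhs = np.linalg.inv(Sigma_0b) @ Vb
rhs = np.eye(db) - Z1t.T @ Z1t @ Vb / sigma2
assert np.allclose(lhs, rhs)

# I - Z~1 V_b Z~1' / sigma^2 is positive definite (all eigenvalues > 0),
# which is the property that forces nu_1 = 0 in (13)
M = np.eye(K) - Z1t @ Vb @ Z1t.T / sigma2
assert np.all(np.linalg.eigvalsh(M) > 0)
```

The identity is immediate from the definition of V_b: multiplying V_b^{−1} = Σ_{0b}^{−1} + Z̃_1^T Z̃_1/σ_{0e}² on the right by V_b and rearranging gives it directly.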

Step 2: For the second identifiability of the survival component, we set Δ = 0 and X = t. Then, the score equation can be written as


where q̃_0(t) = ∫_0^t {h(u) + ν_2^T Z_2(u) + (ν_3∘b)^T Z̃_2(u)} e^{β_0^T Z_2(u) + (b∘ψ_0)^T Z̃_2(u)} dF_0(u) and b is normally distributed with mean μ_b and covariance matrix V_b. By the completeness of the exponential family for b, we have

\[
H'(q_0(t))\,\tilde{q}_0(t)=0
\]
for any fixed Y, Z_1, and Z̃_1 with probability 1. Since H′(q_0(t)) > 0 for t > 0 by (A5), we obtain q̃_0(t) = 0, and hence

\[
\int_0^t \{h(u)+\nu_2^T Z_2(u)+(\nu_3\circ b)^T\tilde{Z}_2(u)\}\,e^{\beta_0^T Z_2(u)+(b\circ\psi_0)^T\tilde{Z}_2(u)}\,dF_0(u)=0.
\]
Clearly, ν_2 = 0, ν_3 = 0, and h = 0 by Condition (A6).

Finally, we complete the proofs of Theorems 1–2 by invoking Theorems 1–2 of Zeng and Lin (2007). Let I_n denote the negative Hessian matrix of the observed log-likelihood function with respect to (θ, F{·}). As a remark, following Theorem 3 of Zeng and Lin (2007), we can show that I_n is invertible for large n, and that n(ν^T, U^T) I_n^{−1} (ν^T, U^T)^T is a consistent estimator of the asymptotic variance of

\[
n^{1/2}\Big\{\nu^T\hat{\theta}+\int u(t)\,d\hat{F}(t)\Big\},
\]
where U is the vector of u(·) at the observed failure times.

Contributor Information

Sehee Kim, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

Donglin Zeng, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, U.S.A.

Yi Li, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A.

Donna Spiegelman, Departments of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, MA 02115, U.S.A.


  • Berkson J, Gage R. Survival curve for cancer patients following treatment. Journal of the American Statistical Association. 1952;47:501–515.
  • Brown E, Ibrahim J. Bayesian approaches to joint cure-rate and longitudinal models with applications to cancer vaccine trials. Biometrics. 2003;59:686–693.
  • Chen M, Ibrahim J, Sinha D. A new Bayesian model for survival data with a surviving fraction. Journal of the American Statistical Association. 1999;94:909–919.
  • Chen M, Ibrahim J, Sinha D. A new joint model for longitudinal and survival data with a cure fraction. Journal of Multivariate Analysis. 2004;91:18–34.
  • Kuk A, Chen C. A mixture model combining logistic regression with proportional hazards regression. Biometrika. 1992;79:531–541.
  • Laska E, Meisner M. Nonparametric estimation and testing in a cure model. Biometrics. 1992;48:1223–1234.
  • Law N, Taylor J, Sandler H. The joint modeling of a longitudinal disease progression marker and the failure time process in the presence of cure. Biostatistics. 2002;3:547–563.
  • Louis T. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B. 1982;44:226–233.
  • Lu W, Ying Z. On semiparametric transformation cure models. Biometrika. 2004;91:331–343.
  • Maller R, Zhou X. Survival analysis with long-term survivors. Chichester: Wiley; 1996.
  • Ng K, Meyerhardt J, Wu K, Feskanich D, Hollis B, Giovannucci E, Fuchs C. Circulating 25-hydroxyvitamin D levels and survival in patients with colorectal cancer. Journal of Clinical Oncology. 2008;26:2984–2991.
  • Sy J, Taylor J. Estimation in a Cox proportional hazards cure model. Biometrics. 2000;56:227–236.
  • Tsodikov A. A proportional hazards model taking account of long-term survivors. Biometrics. 1998;54:1508–1516.
  • Tsodikov A. Semiparametric models of long- and short-term survival: an application to the analysis of breast cancer survival in Utah by age and stage. Statistics in Medicine. 2002;21:895–920.
  • Tsodikov A, Ibrahim J, Yakovlev A. Estimating cure rates from survival data: an alternative to two-component mixture models. Journal of the American Statistical Association. 2003;98:1063–1078.
  • Yakovlev A, Tsodikov A, Asselain B. Stochastic models of tumor latency and their biostatistical applications. New Jersey: World Scientific; 1996.
  • Yu M, Law N, Taylor J, Sandler H. Joint longitudinal-survival-cure models and their application to prostate cancer. Statistica Sinica. 2004;14:835–862.
  • Yu M, Taylor J, Sandler H. Individual prediction in prostate cancer studies using a joint longitudinal survival-cure model. Journal of the American Statistical Association. 2008;103:178–187.
  • Zeng D, Lin DY. Maximum likelihood estimation in semiparametric regression models with censored data. Journal of the Royal Statistical Society, Series B. 2007;69:507–564.
  • Zeng D, Yin G, Ibrahim J. Semiparametric transformation models for survival data with a cure fraction. Journal of the American Statistical Association. 2006;101:670–684.