
# Structural Nested Mean Models for Assessing Time-Varying Effect Moderation

^{1}Center for Health Services Research in Primary Care, Durham VA Medical Center

^{2}Department of Biostatistics & Bioinformatics, Duke University Medical Center

^{3}Center for Clinical Epidemiology and Biostatistics, University of Pennsylvania Medicine

^{4}Department of Statistics & Institute for Social Research, University of Michigan

## SUMMARY

This article considers the problem of assessing causal effect moderation in longitudinal settings in which treatment (or exposure) is time-varying and so are the covariates said to moderate its effect. *Intermediate Causal Effects* that describe time-varying causal effects of treatment conditional on past covariate history are introduced and considered as part of Robins’ Structural Nested Mean Model. Two estimators of the intermediate causal effects, and their standard errors, are presented and discussed: The first is a proposed 2-Stage Regression Estimator. The second is Robins’ G-Estimator. The results of a small simulation study that begins to shed light on the small versus large sample performance of the estimators, and on the bias-variance trade-off between the two estimators are presented. The methodology is illustrated using longitudinal data from a depression study.

**Keywords:** Causal inference, Effect modification, Estimating equations, G-Estimation, 2-stage estimation, Time-varying treatment, Time-varying covariates, Bias-variance trade-off

## 1. Introduction

In this article, we are interested in assessing the causal effect of treatments as a function of variables that may lessen or increase this effect. That is, we are interested in assessing *effect moderation* (or *effect modification*) of the causal effect of treatments on outcomes. Studying effect moderation usually involves characterizing a population along different levels of a concomitant pre-treatment variable *S*, and then studying how the causal effect of treatment *A* on *Y* varies according to the different levels of *S*. Typically, effect moderation is assessed using treatment-moderator interaction terms (e.g., *A* × *S*) in regression models for *Y* given (*S*, *A*) (Baron and Kenny, 1986; Kraemer et al., 2002).

A distinctive feature of assessing effect moderation in the *time-varying setting* is that both treatment (the cause *A*) and the set of putative moderators (*S*) vary over time. This feature of the data provides both an *opportunity* for improved empirical research and a *methodological challenge*. An *opportunity* presents itself in the form of more varied and interesting questions that scientists may ask from time-varying data. For instance, consider our motivating example, the PROSPECT study (Bruce and Pearson, 1999; Bruce et al., 2004), in which, over time, some patients switch out of depression treatment with their mental health specialist. Using time-varying information about suicidal thoughts, we can ask how switching out of treatment early versus later affects depression severity scores as a function of time-varying suicidal ideation.

A *methodological challenge* arises because moderators of the effect of future treatment may themselves be outcomes of earlier instances of treatment (Robins, 1987, 1989b, 1994, 1997); or, in the context of PROSPECT, suicidal ideation measured at the second visit (*S*_{2}) is a moderator of the effect of switching out of treatment after the second visit (*A*_{2}) on depression severity (*Y*), and switching off of treatment after the baseline visit (*A*_{1}) affects suicidal ideation at the second visit (*S*_{2}). In this setting, a naïve extension of the treatment-moderator interaction framework, in which, for instance, a regression model such as the following one is used, creates at least two problems for causal inference:

First, conditioning on *S*_{2} cuts off any portion of the effect of *A*_{1} on *Y* that occurs via *S*_{2}, including *A*_{1} × *S*_{1} interaction effects. Secondly, there are likely common, unknown, causes of both *S*_{2} and *Y*; thus, conditioning on *S*_{2} (an outcome of treatment *A*_{1}) in (1) may introduce biases in the coefficients of the *A*_{1} terms. The end result is that *A*_{1} and its interactions (e.g., β_{1} and β_{3}) may appear to be (un)correlated with *Y* solely because *A*_{1} impacts *S*_{2} and both *S*_{2} and *Y* are affected by a common unknown cause. These problems can occur regardless of whether *A*_{1} and/or *A*_{2} are randomized (Robins, 1987, 1989b, 1994, 1997).

A framework for studying time-varying effect moderation that also addresses both of these challenges involves the notion of a *conditional intermediate causal effect* at each time point. These causal effects are a part of Robins’ Structural Nested Mean Model (SNMM; Robins, 1994). They isolate the average effects of treatment at each time interval as a function of moderators available prior to that time interval.

This article contributes to the literature on modelling and estimating causal effects in the time-varying setting by (1) clarifying and illustrating the use of Robins’ SNMM to assess time-varying effect moderation, (2) proposing a 2-Stage parametric regression estimator for the parameters of a SNMM, (3) comparing the proposed parametric estimator to Robins’ Semi-parametric G-Estimator (Robins, 1994) in terms of a bias-variance trade-off, and (4) suggesting how the proposed 2-Stage estimator can be used to obtain high quality starting values for the G-Estimator.

In Section 2, the causal effects of interest are defined in the context of Robins’ SNMM. The two estimators of the intermediate causal effects are presented and discussed in Section 3. The results of a small simulation study that sheds light on the bias-variance trade-off between the two estimators are presented in Section 4. The methodology is illustrated in Section 5 using data from the PROSPECT study. Finally, a discussion of the paper, including methodological improvements suggested by the illustrative analysis, is presented in Section 6.

## 2. Effect Moderation with Time-varying Treatment and Time-varying Moderators

### 2.1 Notation and Potential Outcomes

To define the structural parameters and to state the structural assumptions necessary for valid causal inference we use the potential outcomes framework for causation (Rubin, 1974; Holland, 1986; Robins, 1987, 1989a, 1994, 1997, 1999). In general, we suppose there are $K$ time intervals under study. Treatment is denoted by $a_t$ at each time interval $t$, where $t = 1, \dots, K$. We denote the treatment pattern/vector over $K$ intervals by $\bar{a}_K = (a_1, \dots, a_K)$, where $a_t = 0$ represents standard, or baseline, treatment. Let $\mathcal{A}_K$ be the countable collection of all possible treatment vectors. Corresponding to each fixed value of the treatment vector, $\bar{a}_K$, we conceptualize potential, possibly counterfactual, intermediate responses $\{S_2(a_1), \dots, S_K(\bar{a}_{K-1})\}$ and a potential final response $Y(\bar{a}_K)$. Thus, for example, $S_{t+1}(\bar{a}_t)$ is the response at the end of the $t$th interval that a subject would have, had he/she followed the treatment pattern $\bar{a}_t$. A subject’s complete set of potential responses, intermediate and final, is denoted by $O = \{S_2(a_1), S_3(\bar{a}_2), \dots, S_K(\bar{a}_{K-1}), Y(\bar{a}_K) : \bar{a}_K \in \mathcal{A}_K\}$. The intermediate outcomes denoted by $S$ are the putative time-varying moderators of the effect of $\bar{a}_K$ on $Y(\bar{a}_K)$. Below, we define more formally, in the context of Robins’ Structural Nested Mean Model, what it means for $S_t(\bar{a}_{t-1})$ to be a moderator of the effects of $(a_t, \dots, a_K)$ on the response. Putative baseline moderators are denoted by the vector $S_1$.

### 2.2 The Conditional Intermediate Causal Effects

Henceforth, for simplicity, we focus on the case where *K* = 2. Thus, we have the following objects at our disposal: (*S*_{1}, *a*_{1}, *S*_{2}(*a*_{1}), *a*_{2}, *Y*(*a*_{1}, *a*_{2})). The response *Y*(*a*_{1}, *a*_{2}) is taken to be continuous with unbounded support. We are only concerned with modeling the mean of the response *Y*(ā_{K}) as a function of ā_{K} and *S*_{K}(ā_{K−1}). Thus, we do not consider treatment or covariate effects on the variance of the response. Using potential outcomes we can express the average causal effect of ā_{2} on *Y*(*a*_{1}, *a*_{2}) as *E*[*Y*(*a*_{1}, *a*_{2}) − *Y*(0, 0)], where *a*_{t} = 0 is the baseline level of treatment. Using S̄_{2}(*a*_{1}) and applying the law of iterated expectations, we can write this difference as an arithmetic decomposition of conditional means:

$$E[Y(a_1, a_2) - Y(0, 0)] = E\big[\,E[Y(a_1, a_2) - Y(a_1, 0) \mid \bar{S}_2(a_1)]\,\big] + E\big[\,E[Y(a_1, 0) - Y(0, 0) \mid S_1]\,\big] \qquad (2)$$

with the outer expectations in Equation (2) over S̄_{2}(*a*_{1}) and *S*_{1}, respectively. The inner expectations are *conditional intermediate causal effects* of treatment. Let μ_{2}(S̄_{2}(*a*_{1}), ā_{2}) denote *E*[*Y*(*a*_{1}, *a*_{2}) − *Y*(*a*_{1}, 0) | S̄_{2}(*a*_{1})], the effect of treatment (*a*_{1}, *a*_{2}) relative to the treatment (*a*_{1}, 0) within levels of S̄_{2}(*a*_{1}); and let μ_{1}(*S*_{1}, *a*_{1}) denote *E*[*Y*(*a*_{1}, 0) − *Y*(0, 0) | *S*_{1}], the effect of treatment (*a*_{1}, 0) relative to (0, 0) within levels of *S*_{1}. Note the following constraints on the causal effects: μ_{2}(S̄_{2}(*a*_{1}), *a*_{1}, 0) = 0 and μ_{1}(*S*_{1}, 0) = 0. The effects μ_{1} and μ_{2} are *intermediate* causal effects because they isolate the causal effect of treatment at time 1 and time 2, respectively. The “isolation” here is achieved by setting future instances of treatment at their inactive levels (in our case, the zero level). Hence, μ_{1} corresponds to a contrast of the potential outcomes in *a*_{1} for which *a*_{2} (a future level of treatment) is set to its inactive level. On the other hand, μ_{2}, which corresponds to the effect at the last time point, is defined exclusively as a contrast in *a*_{2} where, in general, *a*_{1} can take on any value in its domain.

### 2.3 Robins’ Structural Nested Mean Model

In this section we use the Structural Nested Mean Model (SNMM), developed by Robins (1994), to combine the intermediate average causal effect functions additively in a model for the conditional mean of *Y*(*a*_{1}, *a*_{2}) given S̄_{2}(*a*_{1}). Using the SNMM, the conditional mean of *Y*(*a*_{1}, *a*_{2}) given S̄_{2}(*a*_{1}) is expressed as:

$$E[Y(a_1, a_2) \mid \bar{S}_2(a_1)] = \beta_0 + \epsilon_1(S_1) + \mu_1(S_1, a_1) + \epsilon_2(\bar{S}_2(a_1), a_1) + \mu_2(\bar{S}_2(a_1), \bar{a}_2) \qquad (3)$$

where β_{0} = *E*[*Y*(0, 0)], the mean response to baseline treatment averaged over levels of S̄_{2}(*a*_{1}); ε_{2}(S̄_{2}(*a*_{1}), *a*_{1}) = *E*[*Y*(*a*_{1}, 0) | S̄_{2}(*a*_{1})] − *E*[*Y*(*a*_{1}, 0) | *S*_{1}]; and ε_{1}(*S*_{1}) = *E*[*Y*(0, 0) | *S*_{1}] − *E*[*Y*(0, 0)]. In the Appendix we describe the model for general *K* time points. The SNMM depicts how the intermediate effect functions relate to the conditional mean of *Y*(*a*_{1}, *a*_{2}) given the past. Note that ε_{1} and ε_{2} are defined so that the decomposition on the right hand side of Equation (3) is indeed equal to *E*[*Y*(*a*_{1}, *a*_{2}) | S̄_{2}(*a*_{1})]. They satisfy the constraints *E*[ε_{2}(S̄_{2}(*a*_{1}), *a*_{1}) | *S*_{1}] = 0, and *E*_{S1}[ε_{1}(*S*_{1})] = 0, and they are considered nuisance parameters because they contain no information regarding the conditional intermediate causal effects of ā_{2} on the mean of *Y*(*a*_{1}, *a*_{2}).

## 3. Estimation Strategies for the SNMM

We consider two estimation strategies for β.

### 3.1 Observed Data and Assumptions Underlying Estimation

Denote the *observed* treatment history by the random vector $\bar{A}_K := (A_1, A_2, \dots, A_K)$; denote the *observed* time-varying covariate history by the random vector $\bar{S}_K := (S_1, S_2, \dots, S_K)$; and denote the *observed* outcome by the random variable $Y$. Both estimation strategies described below rely on the assumptions of *consistency* and *sequential ignorability* (Robins, 1994, 1997) in order *to make causal inferences*.

The *consistency assumption* states that $Y = Y(\bar{A}_K) = \sum_{\bar{a}_K \in \mathcal{A}} I(\bar{a}_K = \bar{A}_K)\, Y(\bar{a}_K)$, where $I(\bar{a}_K = \bar{A}_K)$ denotes the indicator function that $\bar{a}_K$ is equal to $\bar{A}_K$. The consistency assumption is the link between objects defined as potential outcomes, and objects that are actually observed. Assuming consistency for the intermediate outcomes $\{S_2(a_1), S_3(\bar{a}_2), \dots, S_K(\bar{a}_{K-1}) : \bar{a}_K \in \mathcal{A}_K\}$ as well, then a similar relationship holds between the counterfactual objects in the SNMM and a corresponding set of observed data. The actual data observed (not to be confused with the complete set of potential outcomes, $O$, defined above) for one individual in our study is $D = (S_1, A_1, S_2, A_2, \dots, S_K, A_K, Y)$ where for each $t > 1$, $S_t$ takes on some value in the set $\{S_t(\bar{a}_{t-1}) : \bar{a}_{t-1} \in \mathcal{A}_K\}$, $A_t$ takes on some value in the collection $\mathcal{A}_K$, and $Y$ takes on some value in the set $\{Y(\bar{a}_K) : \bar{a}_K \in \mathcal{A}_K\}$.

Another key assumption used to identify the causal parameters of the SNMM using observed data is the *sequential ignorability assumption*: For each $t = 1, 2, \dots, K$, $A_t$ is independent of $O$ given $(S_1, A_1, S_2, A_2, \dots, S_t)$, where, recall, $O$ is the entire set of potential outcomes. The assumption implies that observed treatment $A_t$ may depend on the set of observed moderators $\bar{S}_t$, but that no other variables known or unknown, measured or unmeasured, directly affect both $A_t$ and $O$.

While the causal meaning of the parameters in models for the $\mu_t$’s depends on the above assumptions, the estimation of parameters requires other modelling (or statistical) assumptions. These modelling assumptions include the choice of parametric models for the causal effects. One possible parameterization of the intermediate causal effects is linear in the parameters. For example, in our presentation of both estimators below, we use

$$\mu_t(\bar{S}_t, \bar{A}_t; \beta_t) = A_t H_t \beta_t \qquad (4)$$

where $\beta_t$ represents a $q_t$-dimensional parameter vector at time $t$; and $H_t$ is a known function of $(\bar{S}_t, \bar{A}_{t-1})$. Using this general form ensures that the following constraint is always satisfied: $\mu_t(\bar{S}_t, \bar{A}_{t-1}, 0; \beta_t) = 0$. Typically, the first element in $H_t$ is one. Additional modelling assumptions are described below in turn, as the estimators are presented. Note that since the following subsections are concerned only with estimation, only the observed data $D$ (not the potential outcomes $O$) are considered.

### 3.2 Parametric 2-Stage Estimator:

We propose a parametric 2-Stage Estimator that employs the following *general approach* for estimating the parameters of a SNMM: In the first stage, for every $t$, we model the conditional distribution of $S_t$ given $(\bar{S}_{t-1}, \bar{A}_{t-1})$, denoted by $f_t(S_t \mid \bar{S}_{t-1}, \bar{A}_{t-1})$, based on a set of finite-dimensional parameters $\gamma_t$. Then, based on one or more features of this distribution (e.g., the conditional mean $m_t$), and based on an additional (optional) set of finite-dimensional parameters $\eta_t$ for added flexibility, we pose a model for the nuisance functions at each time point, say $\epsilon_t(\bar{S}_t, \bar{A}_{t-1}; \eta_t, \gamma_t)$. This general model for the nuisance functions is based on the fact that the constraints on $\epsilon_t$ are a function of $f_t$, for every $t$; recall that $\int \epsilon_t \, df_t = 0$, for every $t$. In the second stage, these models for the nuisance functions are put together with models for the intermediate causal effects in a SNMM for the conditional mean of $Y$ given $(\bar{S}_K, \bar{A}_K)$ (see Equation (3)). Estimates for β are then based on solutions to the following estimating equations:

where for any function $V()$ of the observed data $D$, $\mathbb{P}_n V(D)$ denotes $\frac{1}{n}\sum_{i=1}^{n} V(D_i)$. In the following, we present a *particular linear implementation* of this general approach that uses linear models for the $\epsilon_t$’s.

#### 3.2.1 A Linear Regression Implementation of the 2-Stage Estimator

For simplicity, assume that $S_t$ at each time point is univariate (i.e., one time-varying moderator per time $t$ is used); an extension of the method to multivariate $S_t$ is presented in the Supplementary Materials. The proposed parametrization of the nuisance functions used here is based on linear models for the conditional mean of $S_t$ given the past. These conditional means are denoted by $m_t$ and are based on an unknown $l_t$-dimensional vector of parameters $\gamma_t$, so that $m_t(\bar{S}_{t-1}, \bar{A}_{t-1}; \gamma_t) = E(S_t \mid \bar{S}_{t-1}, \bar{A}_{t-1})$. We employ generalized linear models (GLMs; McCullagh and Nelder, 1989) for the $m_t$: Let $F_t$ be a row-vector of the data $(\bar{S}_{t-1}, \bar{A}_{t-1})$. Thus, when $S_t$ is continuous, we use $m_t(\bar{S}_{t-1}, \bar{A}_{t-1}; \gamma_t) = F_t\gamma_t$. When $S_t$ is binary, we use $m_t(\bar{S}_{t-1}, \bar{A}_{t-1}; \gamma_t) = Pr(S_t = 1 \mid \bar{S}_{t-1}, \bar{A}_{t-1}) = \text{expit}(F_t\gamma_t)$. We use the following linear form for parametric models for the error terms $\epsilon_t$ (based on the $\gamma_t$): $\epsilon_t(\bar{S}_t, \bar{A}_{t-1}; \eta_t, \gamma_t) = G_t\eta_t \times (S_t - m_t(\bar{S}_{t-1}, \bar{A}_{t-1}; \gamma_t))$, where $G_t$ is a row-vector summary of the past $(\bar{S}_{t-1}, \bar{A}_{t-1})$, and $\eta_t$ is an unknown $w_t$-dimensional vector of parameters. Denote the “residual” $S_t - m_t(\bar{S}_{t-1}, \bar{A}_{t-1}; \gamma_t)$ by $\delta_t(\bar{S}_t, \bar{A}_{t-1}; \gamma_t)$. A simple model for $\epsilon_t$ will have $G_t = (1)$, so that $\epsilon_t = \eta_{t0}\,\delta_t(\bar{S}_t, \bar{A}_{t-1}; \gamma_t)$, for example. Note that $\delta_t$ ensures that the parameterization satisfies the necessary constraint. Note also that using this linear model for the nuisance functions, we can multiply every element of $G_t$ by the residual $\delta_t$, denoted $G_t^{*}(\gamma_t)$, and re-write the parametric model for the nuisance functions as $\epsilon_t(\gamma_t, \eta_t) = G_t^{*}(\gamma_t)\eta_t$. If $\gamma_t$ were known, this would imply a linear (in the β’s and η’s) parametric model for the SNMM. For example, for $K = 2$, $E[Y \mid \bar{S}_2, \bar{A}_2] = \beta_0 + A_1H_1\beta_1 + A_2H_2\beta_2 + G_1^{*}\eta_1 + G_2^{*}\eta_2$. This idea forms the basis for the linear implementation of the 2-Stage approach, given here for general $K$ time points:

- Stage 1 Regression. Generalized linear model regression analyses are used in the first stage to obtain the estimates $\hat{\gamma}_t$ based on regressions of $S_t$ on $(\bar{S}_{t-1}, \bar{A}_{t-1})$. These are carried out for each time point $t = 1, 2, \dots, K$. When $S_t$ is binary, a logistic regression is used to obtain $\hat{\gamma}_t$.
- Use the predicted means $\hat{m}_t = m_t(\hat{\gamma}_t)$ from the first stage regression to construct the predicted residuals $\hat{\delta}_t = S_t - \hat{m}_t$.
- Combine the model vectors for the conditional intermediate effects (and a column for the intercept) and denote this quantity by $X$; that is, $X = (1, A_1H_1, \dots, A_KH_K)$. Note that $X\beta = \beta_0 + \sum_{t=1}^{K} A_tH_t\beta_t$ represents the functional of interest of the SNMM.
- Multiply each element in $G_t$ by the predicted residual $\hat{\delta}_t$ and denote this quantity by $\widehat{G}_t^{*}$; that is, $\widehat{G}_t^{*} = \hat{\delta}_t G_t$. Note that if $\eta = (\eta_1^T, \dots, \eta_K^T)^T$ were known, then $G^{*}\eta = \sum_{t=1}^{K} \widehat{G}_t^{*}\eta_t$ would represent an estimate of the sum of the nuisance functionals of the SNMM.
- Augment the row-vector $X$ to include the $\widehat{G}_t^{*}$’s; that is, $X_{\text{aug}} = (X, \widehat{G}_1^{*}, \dots, \widehat{G}_K^{*})$. Define the $(1 + \sum_{t=1}^{K} q_t + \sum_{t=1}^{K} w_t)$-dimensional column-vector of parameters $\theta = (\beta^T, \eta^T)^T$.
- Stage 2 Regression. The final step involves a standard linear regression of $Y$ on $X_{\text{aug}}$ to obtain the estimates $\hat{\theta}$, which gives $\hat{\beta}$ and an estimate for the nuisance $\eta_t$’s simultaneously.
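
The steps above can be sketched in a few lines of Python for $K = 2$ with continuous $S_t$. This is only an illustration under stated assumptions, not the authors' software: the function names `ols` and `two_stage_k2`, the choices $H_t = (1, S_t)$, $F_2 = (1, S_1, A_1, S_1A_1)$, $G_1 = (1)$, $G_2 = (1, S_1, A_1)$, and the toy data-generating values (including the intercept of 1.0) are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for a regression of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_k2(S1, A1, S2, A2, Y):
    """Linear 2-Stage estimate for K = 2 (assumed F_t, H_t, G_t; see lead-in)."""
    one = np.ones_like(Y, dtype=float)

    # Stage 1: linear models m_t = F_t gamma_t, then residuals delta_t = S_t - m_t.
    F1 = one[:, None]                              # nothing precedes S_1: m_1 is a mean
    F2 = np.column_stack([one, S1, A1, S1 * A1])   # assumed form of F_2
    d1 = S1 - F1 @ ols(F1, S1)
    d2 = S2 - F2 @ ols(F2, S2)

    # X = (1, A_1 H_1, A_2 H_2) with the assumed H_t = (1, S_t).
    X = np.column_stack([one, A1, A1 * S1, A2, A2 * S2])

    # G*_t = delta_t * G_t with the assumed G_1 = (1) and G_2 = (1, S_1, A_1).
    G1_star = d1[:, None]
    G2_star = np.column_stack([d2, d2 * S1, d2 * A1])

    # Stage 2: a single linear regression of Y on X_aug = (X, G*_1, G*_2).
    X_aug = np.column_stack([X, G1_star, G2_star])
    theta = ols(X_aug, Y)
    q = X.shape[1]
    return theta[:q], theta[q:]    # (beta_hat including the intercept, eta_hat)

# Toy data in which the assumed linear nuisance model is correct (hypothetical values).
rng = np.random.default_rng(0)
n = 20_000
expit = lambda x: 1.0 / (1.0 + np.exp(-x))
S1 = rng.normal(0.5, 0.82, n)
A1 = rng.binomial(1, expit(0.5 - 1.5 * S1))
m2 = 0.5 + 0.10 * S1 - 0.5 * A1 + 0.35 * S1 * A1
S2 = rng.normal(m2, 0.65)
A2 = rng.binomial(1, expit(0.5 - 1.5 * S2))
eps = 0.1 * (S1 - 0.5) + (0.2 + 0.18 * S1 + 0.4 * A1) * (S2 - m2)
Y = rng.normal(1.0 + 0.45 * A1 * (1 + S1) + 0.45 * A2 * (1 + S2) + eps, 1.0)

beta_hat, eta_hat = two_stage_k2(S1, A1, S2, A2, Y)
print(np.round(beta_hat, 2))   # intercept, then the four causal parameters
```

Because the residuals enter the second-stage design as ordinary columns, the whole procedure reduces to three least-squares fits. Standard errors from the second-stage regression alone would ignore the estimation of $\gamma$, so they are not computed here.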

### 3.3 Robins’ Semi-parametric Efficient G-Estimator:

The following estimator, derived in Robins (1994), does not require correct models for the nuisance functions in order to achieve consistency. It is an extension to the longitudinal setting of the semi-parametric regression E-Estimator considered in Robins et al. (1992) and Newey (1990). For *K* = 2, the estimate is based on these estimating functions:

where $p_t(\bar{S}_t, \bar{A}_{t-1}; \alpha_t)$ is a model for $Pr[A_t = 1 \mid \bar{S}_t, \bar{A}_{t-1}]$; $b_2(\bar{S}_2, A_1; \xi_2)$ is a model for $E[Y - H_2\beta_2A_2 \mid \bar{S}_2, A_1]$, and $b_1(S_1; \xi_1)$ is a model for $E[Y - A_2H_2\beta_2 - H_1\beta_1A_1 \mid S_1]$; $\Delta(S_1; \kappa)$ is a model for $E[A_2H_2^T \mid S_1, A_1 = 1] - E[A_2H_2^T \mid S_1, A_1 = 0]$; and $0_{q_1}$ is a $q_1$-dimensional row-vector of zeros. The set of equations (6) is $(q_1 + q_2)$-dimensional because $H_t$ is $q_t$-dimensional for $t = 1, 2$. We denote this system of equations by $\mathbb{P}_n\psi_\beta(D; \alpha, \xi, \kappa)$, where $\alpha = (\alpha_1^T, \alpha_2^T)^T$ (of dimension $r_1 + r_2$), $\xi = (\xi_1^T, \xi_2^T)^T$, and $\kappa$ are all unknown parameters. The conditional variances $\sigma_1^2(S_1)$ and $\sigma_2^2(\bar{S}_2, A_1)$ are defined as $\sigma_1^2(S_1) = \text{Var}(Y - H_2\beta_2A_2 - H_1\beta_1A_1 \mid S_1) = \text{Var}(Y - H_2\beta_2A_2 - H_1\beta_1A_1 \mid S_1, A_1)$ and $\sigma_2^2(\bar{S}_2, A_1) = \text{Var}(Y - H_2\beta_2A_2 \mid \bar{S}_2, A_1) = \text{Var}(Y - H_2\beta_2A_2 \mid \bar{S}_2, \bar{A}_2)$, where the second equality in each follows by assumption (without this partially homogeneous variance assumption, the estimating equations are intractable). In our implementation, we further assume that these variances are constant in $S_1$ and $(\bar{S}_2, A_1)$, respectively. In order to use $\mathbb{P}_n\psi_\beta(D; \alpha, \xi, \kappa)$ for estimation, we substitute estimates of the parameters $\alpha$, $\xi$, and $\kappa$ in $p_t(\alpha_t)$, $b_t(\xi_t)$, and $\Delta(\kappa)$, denoted $\hat{p}_t(\hat{\alpha}_t)$, $\hat{b}_t(\hat{\xi}_t)$, and $\hat{\Delta}(\hat{\kappa})$, and solve for β in the estimating equations $0 = \mathbb{P}_n\psi_\beta(D; \hat{\alpha}, \hat{\xi}, \hat{\kappa})$. The resulting estimator $\tilde{\beta} := \tilde{\beta}(\hat{\alpha}, \hat{\xi}, \hat{\kappa})$ is known as *Robins’ locally efficient semi-parametric G-Estimator* for β. The Appendix describes the estimator for general $K$ time points. Modelling the $b_t(\xi_t)$ terms is important for variance reduction in $\tilde{\beta}$; the next subsection describes a method for obtaining $\hat{\xi}_t$ based on the 2-Stage Estimator (in addition to obtaining $(\hat{\alpha}_t, \hat{\kappa})$).

#### 3.3.1 Implementing Robins’ G-Estimator

To obtain $\hat{\alpha}$, we use logistic regression models at each time point $t$ to model the probability of receiving treatment at time $t$ ($A_t = 1$) given $Z_t$, where $Z_t$ is a row-vector of the data $(\bar{S}_t, \bar{A}_{t-1})$; that is, we use $p_t(\bar{S}_t, \bar{A}_{t-1}; \alpha_t) = \text{expit}(Z_t\alpha_t)$. Then the predicted probabilities from the logistic regression are used to get $\hat{p}_t$. To obtain $\hat{\Delta}$, we use ordinary multivariate regression models for $\lambda(S_1, A_1; \kappa) = E[A_2H_2^T \mid S_1, A_1]$ to get $\hat{\kappa}$, and then predict Δ using $\hat{\Delta}(S_1; \hat{\kappa}) = \lambda(S_1, 1; \hat{\kappa}) - \lambda(S_1, 0; \hat{\kappa})$. Formally, for a fixed $t$, the quantity $b_t$ is a model for the sum of the nuisance functions and intermediate causal effects for all $t' < t$ plus the nuisance function at time $t$. Hence, the 2-stage estimator presented above can be used to obtain the “guesses” $\hat{b}_t(\hat{\xi}_t)$ needed to solve the equations. To do this, we use the relevant portions of the (2-stage) estimated conditional mean at each time $t$ to create the estimates for $\hat{b}_t(\hat{\xi}_t)$. Thus, at time 1, for instance, one can simply use $\hat{b}_1(S_1; \hat{\xi}_1) = \hat{\beta}_0 + \hat{\epsilon}_1(S_1; \hat{\eta}_1, \hat{\gamma}_1)$, where $\xi_1 := (\beta_0, \eta_1^T, \gamma_1^T)^T$ would be estimated using the 2-stage estimator. Finally, the numerical search for the solution to $\mathbb{P}_n\psi_\beta(D; \hat{\alpha}, \hat{\xi}, \hat{\kappa}) = 0$ is itself an iterative process that requires starting values. Here, we use the estimates obtained from the 2-stage estimator, $\hat{\beta}$, as the starting values for this search. The free and publicly available MINPACK FORTRAN subroutine HYBRD was adapted and used to find the zeros of the system of functions.
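
To make the ingredients of this subsection concrete, the sketch below works through a deliberately simplified single-interval analogue. It is not the $K = 2$ system of equations (6): it solves $\mathbb{P}_n (A - \hat{p})\,H^T (Y - AH\beta - \hat{b}) = 0$ with one treatment occasion and constant variance. In this reduced case the equation is linear in β, so a direct linear solve stands in for the HYBRD root search. The function names, the Newton-Raphson logistic fit, and the toy data are assumptions made for illustration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(Z, a, n_iter=25):
    """Logistic regression of binary a on the columns of Z by Newton-Raphson."""
    alpha = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = expit(Z @ alpha)
        score = Z.T @ (a - p)
        info = (Z * (p * (1 - p))[:, None]).T @ Z
        alpha = alpha + np.linalg.solve(info, score)
    return alpha

def g_estimate_one_interval(Y, A, H, p_hat, b_hat):
    """Solve mean[(A - p_hat) H^T (Y - A H beta - b_hat)] = 0 for beta."""
    w = (A - p_hat)[:, None] * H          # rows: (A_i - p_i) H_i
    lhs = w.T @ (A[:, None] * H)          # sum_i (A_i - p_i) H_i^T A_i H_i
    rhs = w.T @ (Y - b_hat)               # sum_i (A_i - p_i) H_i^T (Y_i - b_i)
    return np.linalg.solve(lhs, rhs)

# Toy single-interval data (hypothetical values): effect A * (0.45 + 0.45 S).
rng = np.random.default_rng(1)
n = 20_000
S = rng.normal(0.5, 0.82, n)
A = rng.binomial(1, expit(0.5 - 1.5 * S)).astype(float)
Y = rng.normal(1.0 + 0.3 * (S - 0.5) + A * (0.45 + 0.45 * S), 1.0)

Z = np.column_stack([np.ones(n), S])      # Z_t: data used in the treatment model
H = Z                                     # assumed H = (1, S)
alpha_hat = fit_logistic(Z, A)
p_hat = expit(Z @ alpha_hat)

# Version 1: b = 0. Version 2: a regression-based guess for b.
beta_b0 = g_estimate_one_interval(Y, A, H, p_hat, np.zeros(n))
D = np.column_stack([np.ones(n), S, A, A * S])
c, *_ = np.linalg.lstsq(D, Y, rcond=None)
b_hat = c[0] + c[1] * S                   # part of the fitted mean free of A
beta_b = g_estimate_one_interval(Y, A, H, p_hat, b_hat)
print(np.round(beta_b0, 2), np.round(beta_b, 2))
```

With more than one interval the equations are no longer linear in β once $\hat{\Delta}$ and the blip-down of later effects enter, which is where an iterative solver and good starting values become necessary.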

### 3.4 Estimated Standard Errors for $\hat{\beta}$ and $\tilde{\beta}$

Estimated asymptotic standard errors for $\hat{\beta}$ and $\tilde{\beta}$ are computed using the delta method, based on one-step Taylor series expansions. They are shown in the Supplementary Materials. $\widehat{\text{SE}}(\hat{\beta})$ takes into account the variability in the estimation of γ. $\widehat{\text{SE}}(\tilde{\beta})$ takes into account the variability in the estimation of α.

### 3.5 A Comparison of the Properties of the Two Estimators

The G-Estimating equations $\mathbb{P}_n\psi_\beta$ provide unbiased estimating functions for β given correct models for the intermediate causal effects and the $p_t$’s, regardless of our choice of models for the $b_t$’s and Δ. Indeed, even if Δ = 0 and $b_t = 0$ for all $t$, we still have $E\psi_\beta = 0$. Conversely, given correct models for the intermediate causal effects and the $b_t$’s, unbiasedness is still achieved with the G-Estimator regardless of our choice of models for the $p_t$’s and Δ. This is known as the *double-robustness property* of the G-Estimator (Robins, 1994). Now, provided true models for both $b_t$ and $p_t$ (for all $t$; and a true model for Δ), the resulting estimates are also asymptotically efficient. By efficient, we mean that the asymptotic variance of the resulting G-Estimates of β achieves the semi-parametric efficiency variance bound (Bickel et al., 1993) for this class of models.

The 2-Stage Estimator relies on correct models for *both* the intermediate causal effects and the nuisance functions in order to provide unbiased estimates for β. At the correct model fit, the 2-Stage estimator enjoys better efficiency than the G-Estimator. This gain in precision, however, may be offset by a lack of robustness to mis-specifications in the $\epsilon_t$’s. Exactly how to balance the trade-off between bias and variance (i.e., the choice between these two estimators) is an open question. The simulation experiments in the next section shed light on this question. We do this by purposefully mis-specifying the 2-Stage Estimator and exploring at what level of mis-specification the G-Estimator begins to dominate over the 2-Stage Estimator in terms of mean squared error (MSE). In addition, since we find that the choice of models for $b_t$ has a profound impact on the efficiency of the G-Estimator, the simulations also explore the utility of the 2-Stage estimator as a feasible method for obtaining guesses for $b_t$ versus setting $b_t = 0$.

## 4. Simulation Experiments

### 4.1 The Generative Model

All simulations are based on $N = 1000$ simulated data sets. The generative model mimics the PROSPECT data (discussed briefly in the Introduction, and used in our data analysis illustration below), with $K = 3$. We generated continuous time-varying covariates $\{S_1, S_2, S_3\}$ and continuous outcome $Y$ such that their implied marginal distributions and bivariate correlations are similar to those found in PROSPECT, where $S_t$ is suicidal ideation at time $t$, and $Y$ is end-of-study depression scores. Specifically, $[S_1] \sim N(m_1 = 0.5, \text{sd} = 0.82)$, $[S_2 \mid S_1, A_1] \sim N(m_2 = 0.5 + 0.10S_1 - 0.5A_1 + 0.35S_1A_1, \text{sd} = 0.65)$, and $[S_3 \mid \bar{S}_2, \bar{A}_2] \sim N(m_3 = 0.5 + 0.17S_2 + 0.1S_1 - 0.5A_2 + 0.5S_2A_2, \text{sd} = 0.65)$, where binary treatment $A_t$ (0, 1) at each time point is generated as a binomial random variable with $Pr(A_t = 1 \mid \bar{S}_t, \bar{A}_{t-1}) = p_t = \text{expit}(0.5 - 1.5S_t)$. The nuisance functions were chosen as $\epsilon_1 = 0.1 \times (S_1 - m_1)$, $\epsilon_2 = (0.2 + 0.18S_1 + 0.4A_1 + 0.35A_1S_1 + \sin(4.5S_1)) \times (S_2 - m_2)$, and $\epsilon_3 = (0.3 + 0.18S_2 + 0.4A_2 + 0.35A_2S_2 + \sin(2.5S_2)) \times (S_3 - m_3)$.^{1} The intermediate causal effect functions in the SNMM were set to: $\mu_t = H_t\beta_t = (A_t, A_tS_t) \times (\beta_{t,0}, \beta_{t,1})^T = A_t \times (\beta_{t,0} + \beta_{t,1}S_t)$ for $t = 1, 2, 3$. The true value for all six causal parameters was set to $\beta_{t,j} = 0.45$, where $j = 0, 1$. The outcome $Y$ was generated as a normal random variable with a conditional mean structure according to a SNMM, and residual standard deviation for $Y$ set to 1.0.
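
The generative model just described can be written down directly. The sketch below draws one data set in Python. The intercept $\beta_0$ is not stated in the text, so `beta0=0.0` is an assumption, and the function name `simulate_snmm_k3` and the returned dictionary layout are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def simulate_snmm_k3(n, rng, beta0=0.0, beta=0.45):
    """Draw one data set of size n from the K = 3 generative model.

    beta0 is an assumed intercept (not given in the text); beta is the
    common true value of all six causal parameters.
    """
    # Time 1
    m1 = 0.5
    S1 = rng.normal(m1, 0.82, n)
    A1 = rng.binomial(1, expit(0.5 - 1.5 * S1))
    # Time 2
    m2 = 0.5 + 0.10 * S1 - 0.5 * A1 + 0.35 * S1 * A1
    S2 = rng.normal(m2, 0.65)
    A2 = rng.binomial(1, expit(0.5 - 1.5 * S2))
    # Time 3
    m3 = 0.5 + 0.17 * S2 + 0.1 * S1 - 0.5 * A2 + 0.5 * S2 * A2
    S3 = rng.normal(m3, 0.65)
    A3 = rng.binomial(1, expit(0.5 - 1.5 * S3))

    # Nuisance functions: each is a function of the past times a residual,
    # so each has conditional mean zero given the past.
    eps1 = 0.1 * (S1 - m1)
    eps2 = (0.2 + 0.18 * S1 + 0.4 * A1 + 0.35 * A1 * S1 + np.sin(4.5 * S1)) * (S2 - m2)
    eps3 = (0.3 + 0.18 * S2 + 0.4 * A2 + 0.35 * A2 * S2 + np.sin(2.5 * S2)) * (S3 - m3)

    # Intermediate causal effects mu_t = A_t * (beta_t0 + beta_t1 * S_t).
    mu = sum(A * (beta + beta * S) for A, S in ((A1, S1), (A2, S2), (A3, S3)))

    Y = rng.normal(beta0 + eps1 + eps2 + eps3 + mu, 1.0)
    return {"S1": S1, "A1": A1, "S2": S2, "A2": A2, "S3": S3, "A3": A3, "Y": Y}

data = simulate_snmm_k3(300, np.random.default_rng(2024))
print({k: v.shape for k, v in data.items()})
```

Repeating the call with independent random streams would give the replicate data sets used in a simulation study of this kind.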

### 4.2 The Simulation Design

Two simulation experiments, A and B, were carried out, both using the same generative model described above. Three estimators were compared in both experiments: (1) the 2-Stage Regression Estimator, (2) the G-Estimator with $b_t = 0$, and (3) the G-Estimator using guesses for $b_t$ that are derived from the 2-Stage Estimator. For both versions of the G-Estimator: (a) true logistic regression models were fit to obtain the $\hat{p}_t$ predictions; (b) starting values for the iterative solving procedure were derived from the 2-Stage Estimator; and (c) multivariate linear regression models that included all main effects and all second-order interaction terms were used for $E(A_2H_2^T \mid S_1, A_1)$, $E(A_3H_3^T \mid S_1, A_1)$, and $E(A_3H_3^T \mid \bar{S}_2, \bar{A}_2)$ to obtain $\hat{\Delta}$. True models for the intermediate causal effects, the $\mu_t$’s, were always specified for all three estimators across both experiments.

#### 4.2.1 Experiment A

The first set of experiments was designed to study the small versus large sample properties of the different estimators when the *correct* 2-Stage Estimator is fit to the data. In these simulations, the sample size for each data set was varied (*n* = 50, *n* = 300, and *n* = 1000) and performance measures for point estimates and standard errors (mean, variance, mean squared error, coverage percentage) were compared across the estimators. The (middle) sample size of *n* = 300 is approximately the size of the data set used in our illustrative analysis of Section 5; *n* = 50 and *n* = 1000 were chosen to look at the effect of relatively smaller and larger data sets.

#### 4.2.2 Experiment B

The goal of the second set of experiments is to shed light on the bias-variance trade-off between the 2-Stage Estimator and the two versions of Robins’ G-Estimator. In this set of experiments, the nuisance functions in the 2-Stage Estimator were mis-specified and the relative performance of the estimators (in terms of MSE) was assessed. Only data sets of size *n* = 300 were considered in Experiment B. Mis-specification of the nuisance functions is measured using the Scaled Root-Mean Squared Difference, $\text{SRMSD}(\nu) = \sqrt{\sum_{t=1}^{K} E\left(\epsilon_{t}^{\text{TRUE}} - \epsilon_{t}(\nu)\right)^{2} / \text{Var}(Y)}$, where for a fixed value of ν, ε_{t}(ν) denotes the mis-specified nuisance function at time *t*. SRMSD has the interpretation of an *effect-size*, so that SRMSD values of 0.2 and 0.5, for example, correspond to small and moderate levels of mis-specification, respectively (see Cohen (1988)). We varied values of SRMSD using ν, by replacing every *S*_{t} in the ε_{t}’s (including those in the models for the *m*_{t}’s) with *S*_{t} × *U*, where *U* is a draw from the normal distribution *N*(1, *sd* = ν). Note that when ν = 0 the correct 2-Stage Estimator is fit to the data.
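The SRMSD measure and the perturbation of *S*_{t} described above could be approximated empirically as sketched below. The function names, the array layout, and two readings of the text are assumptions: the expectation is taken of the squared difference (as the name "root-mean squared difference" suggests), and one value of *U* is drawn per observation.

```python
import numpy as np

def srmsd(eps_true, eps_mis, y):
    """Empirical Scaled Root-Mean Squared Difference.

    eps_true, eps_mis: arrays of shape (n, K) holding the true and the
    mis-specified nuisance function values (hypothetical inputs).
    y: array of shape (n,) with the outcome.
    """
    eps_true = np.asarray(eps_true, dtype=float)
    eps_mis = np.asarray(eps_mis, dtype=float)
    # Sample analogue of E[(eps_t^TRUE - eps_t(nu))^2], one value per time point t.
    mean_sq_diff = np.mean((eps_true - eps_mis) ** 2, axis=0)
    return float(np.sqrt(mean_sq_diff.sum() / np.var(y, ddof=1)))

def perturb_covariate(s, nu, rng):
    """Replace S_t by S_t * U with U ~ N(1, sd = nu), one draw per entry (assumed)."""
    s = np.asarray(s, dtype=float)
    u = rng.normal(loc=1.0, scale=nu, size=s.shape)
    return s * u

rng = np.random.default_rng(0)
s = np.array([[1.0, 2.0], [3.0, 4.0]])
# With nu = 0 every U equals 1, so the covariate (and the fitted model) is unchanged.
print(perturb_covariate(s, 0.0, rng))
```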

### 4.3 Simulation Results and Discussion

#### 4.3.1 Experiment A

Table 1 shows the results of Experiment A. As expected according to large sample theory, all three estimators are unbiased for all β_{t,j} when *n* = 1000; and empirical standard deviations (SD) and mean standard errors (MEAN SE) show good agreement for all three estimators when *n* = 1000. All 95% CI coverage probabilities at *n* = 1000 show coverages between the expected 93.6% and 96.4% range for *N* = 1000 replicates, with the exception of the *b*_{t} = 0 G-Estimates for β_{2,1}. Increasing the sample size from *n* = 1000 to *n* = 1200 (results not shown here) brought the *b*_{t} = 0 G-Estimate coverage probability for β_{2,1} to 93.9%, which is within the acceptable range. Performance in terms of mean bias is only slightly worse at *n* = 300 relative to *n* = 1000 across all three estimators, although as expected, the variance (both SD and MEAN SE) increases significantly with the smaller sample size. This trend continues with *n* = 50, as well. The 95% CIs show under-coverage at smaller sample sizes, especially at *n* = 50; and, for the 2-Stage Estimator and the G-Estimator using 2-Stage Guesses, the coverage gets worse for the parameters at later time points.
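The 93.6% to 96.4% range quoted above is consistent with the usual binomial Monte Carlo error for a nominal 95% coverage estimated from *N* = 1000 replicates. The short sketch below reproduces it, assuming a normal approximation to the binomial; the function name `coverage_band` is hypothetical.

```python
import math

def coverage_band(nominal=0.95, n_replicates=1000, z=1.96):
    """Range in which an estimated coverage should fall if the true coverage is nominal."""
    half_width = z * math.sqrt(nominal * (1.0 - nominal) / n_replicates)
    return nominal - half_width, nominal + half_width

low, high = coverage_band()
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # prints "93.6% to 96.4%"
```

A coverage estimate outside this band is evidence that the interval procedure itself, rather than simulation noise, is off.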

Table 1. **...** *b*_{t} = 0, and Robins’ G-Estimator relying on starting guesses for *b*_{t} from the 2-Stage Estimator. Parameters have **...**

REL MSE denotes relative mean squared error of the G-Estimator relative to the 2-Stage Estimator. As expected, the 2-Stage Estimator is equivalent to or better than both G-Estimators in terms of relative mean squared error (REL MSE) across all the scenarios. REL MSE values for the G-Estimator using correct 2-Stage Guesses for *b*_{t} decrease for the parameters at later time points. The same trend is not true under the G-Estimator with *b*_{t} = 0; and in particular, the largest REL MSE values in Table 1 are observed for the G-Estimators of β_{3,1} with *b*_{t} = 0 (across all time points).

In large samples (*n* = 1000), the simulation results suggest that the G-Estimator using the 2-Stage Guesses has variance in between the variance of the other two estimators. This observation is in line with large-sample theoretical results that describe the usefulness of (correctly) modeling the *b*_{t}’s to achieve variance reduction (Robins, 1994). In fact, an interesting trend over time exists such that parameter estimates under both G-Estimators at *t* = 1 have nearly identical variance, whereas for the *t* = 2 parameters the variance of the G-Estimator using the 2-Stage Guesses is approximately half-way between the variance of the G-Estimator with *b*_{t} = 0 and the 2-Stage Estimator; then at the final time point *t* = 3 the G-Estimator with *b*_{t} = 0 and the 2-Stage Estimator come closer to having similar variances.

Despite having markedly larger REL MSEs (especially for the parameters at later time points), the *n* = 50 coverage percentages are much better for Robins’ G-Estimator with bad guesses (*b*_{t} = 0) relative to Robins’ G-Estimator with correct *b*_{t} guesses. This suggests that important improvements are needed in small sample estimation of standard errors when estimated *b*_{t}’s are used with the G-Estimator.

#### 4.3.2 Experiment B

Figure 1 shows the results of Experiment B in a 3 × 2 array of plots. The six panels correspond to the six causal parameters of interest in the SNMM, with each row corresponding to the parameters at a particular time point. Each point corresponds to a separate experiment with *N* = 1000 data sets (replicates) of size *n* = 300. The abscissa specifies different levels of mis-specification of the ε_{t} terms in the 2-Stage fits; measures of mis-specification are shown in terms of both ν and SRMSD. The ordinate measures the relative mean squared error (REL MSE) of the G-Estimator relative to the 2-Stage Estimator. Within each panel are two curves, one for each of the two G-Estimators considered above. The error bars for REL MSE are defined as $\text{REL MSE} \pm 2 \times \sqrt{\text{BVE}}$, where BVE is a bootstrap variance estimate (based on resampling with replacement) of the REL MSE.
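A point and its error bar in such a plot could be computed roughly as follows. This is a sketch under stated assumptions rather than the authors' procedure: REL MSE is taken as the ratio of the G-Estimator's MSE to the 2-Stage Estimator's MSE, the bootstrap resamples simulation replicates in pairs (both estimators fit to the same data set), and the function names and the default of 500 bootstrap draws are hypothetical.

```python
import numpy as np

def rel_mse(g_estimates, two_stage_estimates, true_value):
    """MSE of the G-Estimator divided by MSE of the 2-Stage Estimator."""
    g = np.asarray(g_estimates, dtype=float)
    s = np.asarray(two_stage_estimates, dtype=float)
    mse_g = np.mean((g - true_value) ** 2)
    mse_s = np.mean((s - true_value) ** 2)
    return float(mse_g / mse_s)

def rel_mse_error_bars(g_estimates, two_stage_estimates, true_value,
                       n_boot=500, seed=0):
    """Return (lower, point, upper) with bars at REL MSE +/- 2 * sqrt(BVE)."""
    g = np.asarray(g_estimates, dtype=float)
    s = np.asarray(two_stage_estimates, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(g)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Resample replicate indices with replacement, keeping the pairing.
        idx = rng.integers(0, n, size=n)
        boot[b] = rel_mse(g[idx], s[idx], true_value)
    point = rel_mse(g, s, true_value)
    half_width = 2.0 * np.sqrt(boot.var(ddof=1))  # BVE = bootstrap variance estimate
    return point - half_width, point, point + half_width
```

Under this definition, values above 1.0 favor the 2-Stage Estimator and values below 1.0 favor the G-Estimator.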

Figure 1. *n* = 300: Understanding the bias-variance trade-off in terms of relative mean squared error (REL MSE) of the G-Estimator relative to the 2-Stage Estimator, as a function of mis-specifications of the nuisance functions **...**

Values of ν were varied from 0.0 to 5.0; this, in turn, corresponded to values of SRMSD between 0.0 and 0.55 – that is, from no mis-specification to just beyond “moderate” amounts of mis-specification.

As expected, the curves decline for larger values of ν. The point at which the curves drop below 1.0 denotes the point at which the 2-Stage Estimator no longer dominates in terms of MSE. The results of the experiment indicate that at roughly a “moderate” amount of mis-specification (SRMSD ≈ 0.5), the G-Estimators begin to dominate, although there is some variation by parameter in the trajectories. In the case of β_{10}, for instance, the curve never drops below 1.0, indicating that the 2-Stage Estimator always dominates. In the case of the *t* = 3 parameters using the G-Estimator based on 2-Stage guesses for *b*_{t}, on the other hand, the two estimators perform so similarly at the true model (ν = 0) that the REL MSE quickly falls below 1.0. As expected, the G-Estimator with *b*_{t} = 0 never performed better than the other two estimators.

## 5. An Illustration using the PROSPECT Data

A subset of the PROSPECT data with *n* = 277 is used to illustrate the methodology for *K* = 3 time points. The sample used in this illustration includes only patients randomized to the treatment arm in PROSPECT. *A*_{t} denotes binary treatment assignment at time *t*, where *A*_{t} = 1 means the subject received treatment at time *t*; that is, had contact with a mental health specialist at time *t*. $S_{t} = \overline{\mathit{SSI}}_{t}$ denotes the Scale for Suicidal Ideation at time *t*, a continuous measure of suicidal thoughts for which higher values of $\overline{\mathit{SSI}}$ mean more suicidality. The outcome *Y* is defined as the Hamilton Depression Scale score at the final, 12-month, visit: *Y* = *HAMD*_{12}. Higher levels of *Y* mean more depression. In the actual analysis, square-root-transformed versions of the scores were used for both the *SSI*_{t}’s and *HAMD*_{12}. The monotonic square-root transformation preserves the original SSI and HAMD interpretations, and produces more stable covariate and final outcome measures by correcting for skewness; this, in turn, improves asymptotic approximations to the test statistics employed in the data analysis.

We use simple linear parameterizations as in Equation (4) for the intermediate causal effects μ_{1} through μ_{3}. Specifically, we use μ_{1} = *A*_{1} × (1, *S*_{1}) × (β_{10}, β_{11})^{T}, μ_{2} = *A*_{2} × (1, (*S*_{1} + *S*_{2})/2) × (β_{20}, β_{21})^{T}, and μ_{3} = *A*_{3} × (1, (*S*_{1} + *S*_{2} + *S*_{3})/3) × (β_{30}, β_{31})^{T}. These intermediate causal effects cannot vary according to previous levels of treatment, because in this sample treatment is monotonic; in other words, Ā_{3} ∈ {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)} in PROSPECT. μ_{t} models the causal effect of switching off treatment at time *t*, as a function of a summary score of mean history of suicidal ideation over time. In this analysis, we expect that for S̄_{3} = 0, treatment reduces levels of depression; that is, the time-varying “baseline” effects of treatment are negative: β_{t0} < 0. In addition, we expect that β_{t1} > 0; that is, the reduction in depression as a result of treatment is not as strong for patients with higher levels of suicidal thoughts.
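To make the parameterization concrete, the sketch below evaluates μ_{1} through μ_{3} for one subject, assuming each μ_{t} multiplies the treatment indicator at the same time point *t* by a linear function of the running mean of *S*_{1}, …, *S*_{t}. The function name and all numeric values are hypothetical and are not estimates from PROSPECT.

```python
import numpy as np

def intermediate_effects(a, s, beta):
    """Evaluate mu_t = A_t * (beta_t0 + beta_t1 * mean(S_1..S_t)) for t = 1..K.

    a:    (K,) treatment indicators A_1..A_K
    s:    (K,) moderators S_1..S_K
    beta: (K, 2) rows (beta_t0, beta_t1)
    """
    a = np.asarray(a, dtype=float)
    s = np.asarray(s, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Running means: S_1, (S_1 + S_2)/2, (S_1 + S_2 + S_3)/3, ...
    running_mean = np.cumsum(s) / np.arange(1, len(s) + 1)
    return a * (beta[:, 0] + beta[:, 1] * running_mean)

# Made-up subject: treated at times 1 and 2, off treatment at time 3.
mu = intermediate_effects(
    a=[1, 1, 0],
    s=[2.0, 4.0, 6.0],
    beta=[[-1.0, 0.5], [-2.0, 0.25], [-3.0, 0.1]],
)
print(mu)
```

With negative β_{t0} and positive β_{t1}, as hypothesized in the text, the treatment effect shrinks toward zero (and can change sign) as the mean suicidal ideation score rises.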

The results of this illustrative analysis are shown in Table 2 for both the 2-Stage Regression Method and Robins’ Method. Simple linear regression models were used for the components of ε_{t} in the 2-Stage Method, and for the *p*_{t}, ξ_{t}, and Δ_{t} in Robins’ Method. These models are not shown here for reasons of space.

**...**

Both estimators show good agreement in terms of *direction* of the effects, but not in terms of *magnitude*. Although the 2-Stage Estimates and the G-Estimates do not differ in magnitude when their standard errors are taken into account, the 2-Stage Estimator consistently yields estimates that are closer to zero compared to the estimates provided by the G-Estimator. (The exception is β_{10}.) In our experience with a variety of simulated data sets (including those described in the previous section, and others not shown here), the 2-Stage Estimator did not show consistently smaller estimates compared to the G-Estimates, but this may require more in-depth study.

Neither estimator suggests effect moderation of the effect of treatment during the first four months of treatment given levels of *SSI*_{1}. The effect of no treatment between the 4- and 8-month visits is positive for both estimators (significantly different from zero under Robins’ G-Estimator). The estimated main effect of treatment between the 8- and 12-month visits is negative (significantly different from zero for both estimators). Finally, both the 2-Stage Estimator and Robins’ G-Estimator suggest that (*S*_{1} + *S*_{2} + *S*_{3})/3 moderates the effect of *A*_{3}.

## 6. Discussion

This article presents and discusses the use of intermediate causal effects to study time-varying causal effect moderation. It is shown how the intermediate causal effects are a part of Robins’ Structural Nested Mean Model (Robins, 1994). In fact, the time-varying intermediate causal effect functions presented here are a version of Robins’ *blip* functions. Two estimators of the intermediate causal effects, one parametric and one semi-parametric, are presented and compared. Two simulation experiments that shed light on the bias-variance trade-off between the parametric 2-Stage Regression Estimator and the semi-parametric G-Estimator were carried out.

The SNMM and Robins’ G-Estimator have also been used previously to study the effects of randomization to an intervention in the presence of non-compliance (Goetghebeur and Fischer-Lapp, 1997; Fischer-Lapp and Goetghebeur, 1999), including when the outcome is binary (Robins and Rotnitzky, 2005; Vansteelandt and Goetghebeur, 2003); and the G-Estimator has been used recently for studying causal effect mediation (Ten Have et al., 2007; Joffe et al., 2007). Petersen and van der Laan (2005) have also proposed a method for assessing time-varying effect moderation, called History-Adjusted Marginal Structural Models (HA-MSMs). With HA-MSMs, Petersen and van der Laan have generalized MSMs (Robins, 1999) to allow conditioning on time-varying covariates; this is accomplished by positing different MSMs, one per time point, and estimating them simultaneously. HA-MSMs differ from SNMMs in one important respect; namely, SNMMs are fully structural models for the conditional mean of *Y* given (S̄_{K}, Ā_{K}), whereas with HA-MSMs there is no requirement, for instance, that the model posed for the causal effect of *a*_{1} in the MSM at *t* = 1 be equivalent to the model for the causal effect of *a*_{1} that is *implied* by the last MSM at *t* = *K*. Future work that further compares HA-MSMs and SNMMs for modelling time-varying effect moderation will be important.

The 2-Stage Estimator requires more knowledge about portions of the conditional mean of *Y* given (S̄_{K}, Ā_{K}) than does Robins’ G-Estimator. If this additional knowledge (concerning the nuisance functions ε_{t}) is incorrect, it is possible that the 2-Stage estimate is biased for the true β. On the other hand, scientists may tolerate bias in an estimator if its variance is smaller than that of an unbiased estimator. The simulation studies presented above begin to shed light on this bias-variance trade-off. The simulation experiments suggest that it may be useful to consider parametric estimators such as the 2-Stage Estimator over the G-Estimator under *moderate* mis-specifications in models for the nuisance functions using the parametric estimator. Of course, the scientist will never really know the amount of mis-specification s/he may incur in the process of modeling the error terms. In addition, it may be possible that the scientist will mis-specify the μ_{t} as well.
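The trade-off discussed above rests on the standard decomposition of mean squared error into squared bias plus variance. As a brief illustration, with $\hat{\beta}_{1}$ and $\hat{\beta}_{2}$ used here only as hypothetical labels for a possibly biased estimator and an unbiased one:

$$
\text{MSE}(\hat{\beta}) = E\left\{(\hat{\beta} - \beta)^{2}\right\} = \left\{E(\hat{\beta}) - \beta\right\}^{2} + \text{Var}(\hat{\beta}),
\qquad
\text{MSE}(\hat{\beta}_{1}) < \text{MSE}(\hat{\beta}_{2})
\iff
\left\{E(\hat{\beta}_{1}) - \beta\right\}^{2} < \text{Var}(\hat{\beta}_{2}) - \text{Var}(\hat{\beta}_{1}).
$$

In words, a biased estimator is preferable in MSE terms only while its squared bias stays below the variance it saves relative to the unbiased alternative.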

An important limitation of our simulation Experiment B is that our results are contingent upon our method for exploring the space of mis-specified 2-Stage Regression fits. Though we have found similar results (not shown) when we have considered other one-dimensional paths through the truth, it is possible that other approaches to making the fitted model differ from the correct model may lead to different results. More work, including theoretical work as well as simulation studies, that compares and sheds more light on various bias-variance properties of the two proposed estimators is necessary. In particular, more work is needed in this area to understand the extent to which parametric estimators in noisy settings may dominate semi-parametric estimators of the SNMM, including the different possible generative models (i.e., scenarios) under which this may or may not be true.

While the 2-Stage Estimator can serve as a stand-alone estimator for the intermediate causal effects, it is also quite useful as a method to obtain high quality starting values for Robins’ G-Estimator. Recall that the semi-parametric efficiency of Robins’ G-Estimator requires correct models for the *b*_{t} functions. The 2-Stage Estimator provides a principled method, from the standpoint of attempting to model the nuisance functions and respecting its constraints, for obtaining starting values for *b*_{t}. In moderate to large sample sizes, the simulation experiments show a marked improvement in the performance of Robins’ G-Estimator (in terms of MSE) when the 2-Stage Estimator is used to obtain the *b*_{t} compared to having no model for the *b*_{t}’s. Experiment B demonstrates how this improvement persists (though diminishes, slightly) in moderate sample sizes even as the amount of mis-specification of the nuisance functions in the guesses for *b*_{t} increases.

The methodology was illustrated in Section 5 with observational data from the PROSPECT study. An important concern that comes to mind when interpreting the results of this illustrative analysis is that the assumption of sequential ignorability may be violated in our particular analysis. The illustrative analysis assumes only that suicidality both (a) affects depression outcomes, and (b) determines whether or not a patient receives treatment at the next time point. Yet it may be possible that subjects who were worse off, in terms of having higher depression scores and more emotional and physical problems, are more likely to receive treatment at subsequent visits to the clinic. If this is true, then the estimates of β (under both estimators) are likely biased due to baseline and/or time-varying confounders. A more in-depth (and proper) analysis of these data will seek to understand what are the possible confounders (baseline and/or time-varying) of the effect of treatment, by discovering what are the predictors of time-varying treatment Ā_{3}. It would be possible to adjust for these additional time-varying confounders using the estimation methods proposed here in combination with inverse-probability-of-treatment weights (Robins, 1999), for example, but this is beyond the scope and purpose of this article. This is a promising future research direction that is currently being explored, as it begins to pull together the ideas of estimation used in Marginal Structural Models (Robins, 1999) with the ideas of estimation used in SNMMs (e.g., the methodology presented in this article).

Another natural extension of the SNMM methodology described in this article is to extend it to accommodate a time-varying longitudinal outcome *Y*_{t} (one that is recorded at each visit, for example). This model would require a separate SNMM specification for *Y*_{t} given (S̄_{t}, Ā_{t}) for every *t* (i.e., a SNMM at each time point). A generalization of this sort along a GEE framework should be relatively straightforward; and in this case, both the 2-Stage Estimator and Robins’ G-Estimator presented here can be used with little modification. The development of a Maximum Likelihood Estimator (MLE) is needed, however, before moving towards a growth-model or mixed-models framework. The 2-Stage Estimator can be seen as paving the way for a MLE for the SNMM. Indeed, the 2-Stage Estimator already requires models for the conditional mean of *Y* given (S̄_{K}, Ā_{K}), and for portions of the conditional distribution of *S*_{t} given (S̄_{t−1}, Ā_{t−1}) for all *t*. To develop the MLE, an additional step would involve positing distributional assumptions (e.g., normality) for the *Y* given (S̄_{K}, Ā_{K}) and *S*_{t} given (S̄_{t−1}, Ā_{t−1}) distributions. Note that as moments-based estimators, neither the 2-Stage Estimator nor Robins’ G-Estimator requires distributional assumptions on the full likelihood for (S̄_{K}, Ā_{K}, *Y*).

## Supplementary Material

#### Appendix

**SUPPLEMENTARY MATERIALS:**

Web Appendices referenced throughout this article are available under the Paper Information link at the Biometrics website http://www.tibs.org/biometrics.


## ACKNOWLEDGEMENTS

Funding was provided by NIMH grants R01-MH-61892-01A2 (TenHave), R01-MH-080015-01 (Murphy), and NIDA grant P50-DA-010075-02 (Murphy).

## Footnotes

^{1}The sinusoidal functions were placed in the generative model to add some complexity to the nuisance function terms in the generative model.

## REFERENCES

- Baron R, Kenny D. The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology. 1986;51:1173–1182.
- Bickel P, Klaassen C, Ritov Y, Wellner J. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins University Press; 1993.
- Bruce M, Pearson J. Designing an intervention to prevent suicide: PROSPECT (Prevention of Suicide in Primary Care Elderly: Collaborative Trial). Dialogues in Clinical Neuroscience. 1999;1:100–112.
- Bruce ML, Ten Have TR, Reynolds CFI, Katz IR, Schulberg HC, Mulsant BH, Brown GK, McAvay GJ, Pearson JL, Alexopoulos GS. Reducing suicidal ideation and depressive symptoms in depressed older primary care patients: A randomized controlled trial. Journal of the American Medical Association. 2004;291:1081–1091.
- Cohen J. Statistical power analysis for the behavioral sciences. 2nd edition. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
- Fischer-Lapp K, Goetghebeur E. Practical properties of some structural mean analyses of the effect of compliance in randomized trials. Controlled Clinical Trials. 1999;20:531–546.
- Goetghebeur E, Fischer-Lapp K. The effect of treatment compliance in a placebo-controlled trial: Regression with unpaired data. Applied Statistics. 1997;46:351–364.
- Holland P. Statistics and causal inference. Journal of the American Statistical Association. 1986;81:945–970.
- Joffe M, Small D, Hsu C. Defining and estimating intervention effects for groups who will develop an auxiliary outcome. Statistical Science. 2007;22:74–97.
- Kraemer HC, Wilson G, Fairburn C. Mediators and moderators of treatment effects in randomized clinical trials. Archives of General Psychiatry. 2002;59:877–883.
- McCullagh P, Nelder J. Generalized Linear Models. 2nd edition. London: Chapman and Hall; 1989.
- Newey W. Semiparametric efficiency bounds. Journal of Applied Econometrics. 1990;5:99–135.
- Petersen ML, van der Laan MJ. History-adjusted marginal structural models: Time-varying effect modification. Technical Report 173. U.C. Berkeley Division of Biostatistics; 2005. http://www.bepress.com/ucbbiostat/paper173.
- Robins J. A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases. 1987;40:139s–161s.
- Robins J. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Health Service Research Methodology: A Focus on AIDS. NCHSR, US Public Health Service; 1989a. pp. 113–159.
- Robins J. The control of confounding by intermediate variables. Statistics in Medicine. 1989b;8:679–701.
- Robins J. Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics, Theory and Methods. 1994;23:2379–2412.
- Robins J. Estimating causal effects of time-varying endogenous treatments by g-estimation of structural nested models. In: Berkane M, editor. Latent Variable Modeling and Applications to Causality, Lecture Notes in Statistics. New York: Springer; 1997. pp. 69–117.
- Robins J, Rotnitzky A. Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models. Biometrika. 2005;91:763–783.
- Robins JM. Association, causation, and marginal structural models. Synthese. 1999;121:151–179.
- Robins JM, Mark S, Newey W. Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics. 1992;48:479–495.
- Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology. 1974;66:688–701.
- Ten Have T, Joffe M, Lynch K, Maisto S, Brown G, Beck A. Causal mediation analyses with rank preserving models. Biometrics. 2007;63:926–934.
- Vansteelandt S, Goetghebeur E. Causal inference with generalized structural mean models. Journal of the Royal Statistical Society, Series B. 2003;65:817–835.
