Survival models for familial aggregation of cancer.

It has recently been shown that the relative risks of the order of 2 to 4 that are frequently found for cancer among relatives of affected cases are unlikely to be explainable by shared environmental risk factors. Classical methods of epidemiological analysis are not well suited to such analysis because they assume that the outcomes of each individual are independent. Classical methods of genetic analysis, on the other hand, are limited in their handling of environmental factors and variable ages of onset. The recent development of random effects models for survival analysis, however, appears to bridge this gap. Specifically, a proportional hazards model is postulated for the effects of measured covariates and of one or more components of frailty that are unmeasured but assumed to have some common distribution and known covariance structure within each family. From these assumptions, the posterior expectation of the hazard for each individual can be derived, given the covariate value and the observed and expected disease history of the family. These are then treated as known in a standard partial likelihood analysis; this is essentially a form of expectation-maximization algorithm. However, this does not provide a valid estimate of the covariance matrix because it fails to take account of the variability in the estimates of the frailties; an alternative approach using the imputation-posterior algorithm is suggested. This paper describes extensions of this approach to multivariate frailty distributions, modifications for application to pedigree and case-control studies, some simulation results, and applications to studies of breast cancer in twins and of lung cancer in relation to family smoking habits.


Introduction
Studies of familial aggregation of chronic disease pose considerable complexities for the data analyst. These include a) the correlation in outcomes resulting from the sharing of unmeasured genetic and/or environmental influences within families; b) the different degrees of sharing expected between different types ofrelatives; c) the censored survival time nature of the response variable, requiring the use of appropriate survival analysis techniques; and d) the need to consider multiple measured covariates (usually environmental but possibly including genetic markers); and e) the possibility of gene-environment interactions. Classical segregation and path analysis methods in genetics are designed specifically to address the first two issues, but are not wellsuited to the latter three. Standard logistic and Cox regression techniques in epidemiology are well suited to dealing with the third and fourth issues but assume independent responses across individuals, and so they cannot be directly applied to the first two. The recent development of multivariate survival models (1,2) offers considerable promise to bridge the gap between these two approaches.
Basically the approach postulates the existence of an unobserved variable or variables (frailty) that take on the same or correlated values within a family and reflect unmeasured genetic and environmental influences on disease risk. Conditional on this frailty, the outcome is assumed to be independent between family members and to follow a proportional hazards model. In the analysis the unknown frailties are replaced by their posterior expectations, given the covariate values, periods of observation, outcomes of their family members, and current model parameters. These estimates are used as if they were known in a standard partial likelihood; this is essentially a form of expectation-maximization (EM) algorithm (3). In the next section we review this approach for the case where all family members are assumed to have the same frailty, and we discuss an approach to the problem of variance estimation using the imputation-posterior (IP) algorithm (4). The section "Multivariate Frailty Models" discusses some approaches to the case of correlated frailties. Our approach is closely related to that of Bonney (5), who proposes a logistic regression model for dichotomous outcomes using a regressive approach by ordering the subjects in the data set in such a way that each subject depends only on those preceding him or her in the data. A likelihood is constructed by assuming that the probability of each subject's outcome is given by a logistic function of his own observed covariates and unobserved genotype and then summing over all possible sets of genotypes, weighted by their prior probabilities under a particular inheritance model. Our approach generalizes this model to survival time data and does not require any ordering of the data.
The posterior estimate of the frailty is essentially a comparison of the observed number of cases in the family to an expected number based on the family members' ages and times at risk and their covariate values. The naturalness of this estimate can be seen by some calculations of familial relative risk (i.e., the risk of disease in relatives of an affected family member divided by that in relatives of unaffected family members) based on simple genetic models of single gene inheritance (6). These familial relative risks increase markedly with the number of affected family members but decrease only slowly with the number of unaffected members (Table  1). This occurs because if the disease is rare, each affected member considerably increases the posterior probability that the gene is present in the family, whereas each unaffected member only slightly reduces it, since the absence ofa rare disease is not a particularly informative observation. An important corollary of this observation is that genetic relative risks (i.e., the risk of disease given the gene is present divided by that given the gene is absent) as large as 100 can easily produce familial relative risks of only 2 or 3. Conversely, since environmental risks this large are seldom observed, familial relative risks of 2 or 3 are difficult to explain by shared risk factors (7).
The method just described applies to the analysis of cohort data. A commonly used technique in genetics involves the ascertainment of affected probands and investigation of the disease history of their family members. This design is closely related to the case-control study in epidemiology in which affected cases are ascertained and matched with unaffected and unrelated controls, and their family histories are compared. These designs require some adaptation of the frailty analysis approach, which is discussed in the section "Modifications for Proband and Case-Control Designs." We hoped that the use of multivariate frailty models where i = 1,. . ., I indicates the family, j = 1 .... mi the members of the family X and N = Imi. Note that in this model, the unobserved frailties e,i are assumed to be the same for all members of the family; in the third section we consider extensions in which the Ei are mi-vectors having some known covariance structure. By integrating over possible values of the unknown frailty, Clayton and Cuzick (2) and Self and Prentice (1) show that if the Ei are assumed to have a log-gamma distribution with zero mean and variance -y, then the posterior expectation of the hazard is given by where n(tk) is the number of failures at tk and t E (tk-l,tk).
We found that convergence was improved if the iteration in step a was taken to convergence before a new set of baseline rates was estimated in step b; usually, only two or three cycles were then needed to obtain overall convergence. Self and Prentice (1) estimate the frailties conditional on F,, the history of events, covariate values, and censoring up to time t, in order to avoid having the hazard at time t depend on events in the future. However, this approach fails to produce a true partial likelihood due to the dependence of the frailty estimates on -y, which is estimated from the entire data set including events in the future. It also has the undesirable result that the estimate of the frailty for each family varies as information about the family accumulates, even though the true frailty is assumed to be constant over time; thus, less information is used to predict the frailty effect for the early failures than for the later failures. For this reason, Clayton and Cuzick (2) prefer to base their frailty estimate on the lifetime history of the entire family, which should produce a more efficient estimator. Self and Prentice do not give a variance estimator for i and y, although Clayton and Cuzick do. The problem with naive use ofthe information matrix from the partial likelihood is that it assumes the covariates, including Ei, are known. The obvious correction would be to compute derivatives of the log likelihood with respect to 13 and y including terms for the dependence of Ei on 1, 'y and Xo. But the Xo in turn depend on 13, -y, and i, thus leading f-o an infinite-recursion, which seems intractable. A more promising approach would be to use the full survival likelihood and invert the full information matrix with respect to 13, y, and Xo. Representing the full survival likelihood a-s LF(0,Ao iE) = Hij [Xo(tij) exp(zij 1 + Ei)]Dii expE -AO(tij) exp(z,' + si)], a potential aproach would be to use the full expected likelihood, L , obtained by integrating out the unknown frailties, A more attractive approach is to use the IP algorithm (4), which provides a Monte Carlo estimate of the entire posterior distribution of model parameters. Essentially one would proceed as follows. Given the current estimates of 1, y, and X0, one would draw a single random sample of Ei for each family from their posterior distributions, which is also gamma with parameters 1/y + Di and (1/y + Ei) -'. Then treating these as known, one would draw a single random sample of Xo(t), 1, and y from their respective posterior distributions, g-lven the current estimates of the other parameters. The process continues indefinitely, and after a sufficient number of iterations, it settles down to produce successive random samples from the posterior distributions of each of the parameters. Details of this approach are given in Appendix 1.

Multivariate Frailty Models
The assumption of a common frailty E for all members of the family is simplistic in that family members share different degrees of genetic and environmental influences. This suggests that one might consider replacing Ei by an mi-vector, whose elements have some covariance structure determined by their relationships to each other. The log-gamma distribution is not easily generalized to multivariate settings, but a multivariate normal distribution provides a close approximation. By fitting the first two derivatives of the posterior distribution of Ei, Mack (9) has developed approximate expressions for the posterior expectation of fi which can be used as described above. For example, suppose there are two components of frailty, one a genetic effect and one an environmental effect; let sijp (p = 1,2) denote these two unobserved variables and let Pijkp denote the correlation in Ei.p between members j and k of family i (e.g., 1 for the genetic factor for monozygotic twins, 1/2 for first degree relatives, 1/4 for second degree relatives, etc.; c1 for the environmental factor between spouses, c2 between sibs, C3 between parents and offspring, etc.).
Then we would estimate Fijp by mi 2 (Dik -E k) Pijkp As before, we take the regression coefficient for frailty to be unity and estimate the variance of each component of the frailty distribution and any unknown parameters in their correlation matrices.
Unfortunately, for simple family structures, estimates of the variance and correlation in frailty are virtually colinear. Thus, a strong familial aggregation can be equally well explained either by a large variance with a small correlation or by a small variance with a high correlation (Fig. 1). Thus, it would be necessary to arbitrarily fix at least one of the unknown correlations and hope to estimate the other correlations relative to it, leaving the variance free; with sufficiently rich family structures this may be feasible, although the simulations described later are not encouraging.
We have also considered a two-point frailty distribution based on single gene models ofinheritance. These are expressed in terms of two parameters, the gene frequency and the genetic relative risk, with the association between family members determined by Men- delian laws. Detailed expressions are given elsewhere (9) and summarized in Appendix 2. Unfortunately, this approach also seems to suffer from similar problems of colinearity: there is no obvious way to simultaneously estimate both the gene frequency and the genetic relative risk. It appears that, without other information to estimate the gene frequency, it would be difficult to estimate the two parameters simultaneously, although complex family structures may be more informative. Again, it is tempting to consider an approach using the IP algorithm. For each individual, one would draw a random Eij from its posterior distribution given the subject's disease status Dij, his expected incidence E%j [Eq. (ic) except for the summation over family members], and the current values of Eik for his family members. This is simplified by knowing the conditional independence structure of the family: for example, for a genetic model, Eij is dependent only on the Eik of his or her parents, spouse, and offspring. The rest of the iteration proceeds as described above. Details are given in Appendix 1.

Modifications for Proband and Case-Control Designs
In pedigree analysis, cases of a disease are ascertained from some disease registry (these are called the probands). All their family members are identified and their disease status determined (10). Thus, probands have the disease with probability one by virtue of the sampling scheme. The standard approach in genetic analysis is to exclude these subjects and determine whether the occurrence of disease in the rest of the family is consistent with Mendelian laws given the presence of the affected proband. We propose to use the same principle in fitting the frailty model. Thus, the cohort would consist only of the nonprobands, who would be viewed as a birth cohort. However, the probands would be counted as observed events in Eq. (lb) for estimating the frailties of the nonprobands, calculating their Eu by applying the estimated baseline rates from the cohort of nonprobands to their time from birth to diagnosis.
This can be viewed as a purely internal analysis, which does not take advantage of the information that the cohort may have a higher incidence rate than the population at large. Such a higher incidence rate would presumably be a reflection ofthe same phenomenon that leads to clustering within families and an analysis that takes advantage of that information should be more powerful, particularly if the number of nonproband cases in the family is small. Assuming a set ofpopulation rates were available and thought to be applicable to the cohort (i.e., comparable ascertainment of cases within the family and in the population and no selection biases other than as a result of sampling by probands), then in principle one could use the population rates to derive an alternative estimate of the baseline rates. The difficulty comes in adjusting the population rates for covariates, because their distributions in the population may be different from those in the cohort. If the disease is rare and we are prepared to assume that the covariate distribution in the nonaffected members of the family is representative of the population, then we can approximately estimate the baseline rates as where X(t) is the population rate. An important advantage of the case-control design is that such assumptions are unnecessary as the controls essentially provide an estimator of the baseline rates.
In the typical case-control study, cases of the disease of interest are ascertained and matched with one or more controls drawn from the population at risk. For the purpose of this discussion, the sample of controls might be drawn from population lists, by canvassing the neighborhood ofeach case, by random digit dialing, from lists of the cases' friends, from hospitalized or dead patients with other diseases, but not from relatives of the cases. In the standard approach, a family history covariate, such as the presence or absence of a positive family history, the number of affected family members, or the number affected weighted by their degree of relatedness, is computed for each case and control and used in standard conditional logistic regression analyses. This is inefficient and potentially biased because it does not take into account the number of events expected in each family. For example, if cases tend to have larger families (perhaps through some socioeconomic correlate), then they will be more likely to have positive family histories even if there is no shared genetic or environmental risk. It is natural, therefore, to compute a family history covariate as the difference of observed and expected events. The problem then arises as to how the expected events are to be computed.
One approach would be to treat all the family members other than the sampled cases and controls themselves as a birth cohort and to analyze these data in the same way as just described for the proband design. A simpler approach would be to analyze only the cases and controls using conditional logistic regression, treating the frailty estimate as a known covariate. For this purpose, baseline rates could be estimated either from the cohort of family members or from population rates as previously described, and the cases and controls themselves would not contribute to the frailty estimation (otherwise the case families would have higher expected frailties than the control families under the null hypothesis).
To fully exploit the power of the case-control design, one needs to have covariate values, not just for the cases and controls but also for all their family members. This can be quite difficult to obtain, particularly for those who have died. A reasonable approximation may be to impute values for missing covariate information by randomly sampling from the posterior distribution of covariate values given a) the covariate value of their matched case or control, b) the assumed value of the intra-family correlation of the covariate, c) the age-specific population distribution of covariate values, d) whether they and their matched case or control were affected or not, and e), the current fitted value of the regression coefficient for the covariate. If covariate values are known for some family members, this information will allow the intra-family correlation to be estimated. Detailed examples of such imputation rules are described elsewhere (9). For example, suppose for unaffected subjects at age t, the zij were normally distributed with mean ,u(t), variance cr2(t) and correlation p. Then for affected subjects, the marginal distribution of zu would also be normal with mean ,u(t) + P'o-2(t) and variance of k2(t) (11). Then conditional on zio, Du, and Dio, zu is normally distributed with mean ,u(t) + p3u2(t)Di0 + P[zi0 -,u(t) -Oa2(t)Dio] and variance u2(t)(1 _ p2), where D is an indicator for whether the subject is affected or not.

Simulation Results
In order to assess the feasibility of disentangling genetic and environmental influences using multivariate frailty models, we carried out a number of simulations. In most of these simulations, 25 four-generation cohorts of family members were generated; in relation to a subject in the third generation, the possible relatives would include siblings, parents, grandparents, offspring, aunts/uncles, and nieces/nephews; other family members might also have spouse and in-law relations. Each simulation comprised a total of 418 to 471 members. Two measured environmental covariates (one continuous and one dichotomous) and two multivariate normally distributed frailties (one for genetic and one for environmental influences, with a known covariance structures that were chosen to be as different as possible) were randomly assigned to each family member. Exponentially distributed failure times, conditional on the measured and unmeasured covariates, were generated for the cause of interest as well as for competing causes. Parameter values were adjusted to produce from 27 to 47 cases. Each simulation was analyzed using the method of Self and Prentice including various combinations of genetic and environmental frailty estimates. Because of the amount of computing required for the fitting, it was not feasible to replicate the simulations, so we are unable to describe the test size or power, but the consistency of our findings across simulations suggests that the results are unlikely to be due to chance. Table 2 summarizes the results of a portion of the simulations we conducted. In simulations 1 through 4, a highly significant familial aggregation was detected, but in none were we able to fit both a genetic and an environmental component simultaneously. Indeed, each time we tried, the estimate of the environmental vari- ance was negative and the likelihood for the geneticonly model was always slightly larger than for the environment-only model (even for simulation 3 in which the true environmental variance was larger than the genetic variance). Simulations 2 and 5 are identical except that simulation 2 includes both firstand second-degree relatives (25 families, 469 total subjects, 46 cases), whereas simulation 5 is restricted to first-degree relatives (50 families, 418 total subjects, 34 cases). This was done to assess the relative value of trying to obtain larger pedigrees versus a larger number of small pedigrees. Intuitively, one would imagine that the relative informativeness of the two designs would be approximately proportional to the number of cases from multiple-case families. In simulation 2, these numbered 40, whereas in simulation 5 they numbered 24. Furthermore, the average number of cases in families with more than one case was 4.1 in simulation 2 but only 2.25 in simulation 5. The pattern of likelihood ratio statistics was very similar between the two simulations, but the chi-square values were about 3.3 times larger for simulation 2. Given that the two simulations had roughly the same total number of subjects, we would conclude that the larger pedigrees were more infornative per subject; even scaling the statistics by the number of cases, we would also conclude that the larger pedigrees were more informative per case (presumably owing to the larger number of affected family members per case). However, this conclusion is based on the assumption that the quality of the data on second degree relatives is as good as that of the first degree relatives, which is unlikely to be true in practice.
Simulation 2 was also used to compare the results of the frailty analysis with what might be expected, using simpler family history covariates in standard methods of analysis (Table 3). Somewhat to our surprise, fitting the simulated model using frailty methods did not produce more significant results than a simple binary covariate (presence or absence of other affected family members). Also treating this binary covariate, or the number of affected family members, or the observed minus expected number as fixed covariates (i.e., using all times but excluding the index subject) consistently produced larger chi squares than treating the same covariates as time-dependent (i.e., using only events prior to the current time, but including the index subject); to  (1) assess whether this might be because of some liberality in the procedure under the null hypothesis, simulation 6 was generated with the true frailty variance being zero. All chi squares in this case were trivial. Simulation 7 addressed the case of a measured covariate being intermediate on a causal path from unmeasured genotype to disease (e.g., hormones as mediators of a genetic effect for breast cancer). As expected, the addition of the measured covariate considerably reduced the estimate of the genetic variance, but addition of the genetic frailty did not affect the measured covariate effect.
Finally, in simulation 8 we considered a design that ought to be optimal for separating genetic and environmental influences: monozygotic (MZ) and dizygotic (DZ) twins reared together and apart. Even in this case, the two frailty components could not be fitted simultaneously.

Applications Swedish Cohort Study of Breast Cancer in Twins
The details of this application of the Self and Prentice model for univariate frailty are described elsewhere (12). Basically, a cohort of 11581 female twin pairs was assembled from the Swedish registry oftwin births from 1886 to 1958, consisting of all those for whom both members responded to a questionnaire in 1961 (if born before 1925) or 1971 (if born after 1925) and for whom zygosity could be determined. This was then linked with the Swedish cancer registry that was created in 1958 to identify cancer cases and with the national death registry to assess vital status (13).
To illustrate the methods, we constructed a sample of the cohort consisting of all disease concordant pairs, a random 10% sample of discordant pairs, and a random 1% sample of nondiseased pairs. Because both the observed and expected number ofdisease concordant pairs are overestimated by the same factor, these overestimates will cancel in computing relative risks. Results of these analyses are presented in Table 4.
In the absence of measured covariates, a significant frailty variance was found with an estimate of 1.37 (SE = 0.75). This estimate was reduced only slightly by adjustment for birth cohort, cigarette smoking, and relative weight. (It would have been desirable to have more relevant covariates for breast cancer, but the study was not designed with this disease as its primary focus, and the relevant questions were not asked.) The approximate frailty covariate, observed minus expected cases, produced an estimate of 0.70 for all twins, 0.98 for MZ and 0.55 for DZ twins. The best fit was obtained by taking observed minus expected cases weighted by 1 for MZ and 1/2 for DZ twins. This can be seen as an approximation to the bivariate genetic frailty model.
There was a significant interaction between attained age and this genetic frailty covariate (X21 = 5.44), such that the frailty effect was stronger at younger ages. This is consistent with the suggestion that the genetic effect is strongest for premenopausal breast cancer.

Case-Control Study of Adenocarcinoma of the Lung and Familial Smoking
A population-based case-control study of adenocarcinoma in Los Angeles females was done to assess risk factors, including personal and passive smoking and family history. Details ofthe study design and the major findings can be found in the publication by Wu et al. (14). In particular, a highly significant effect of a family history of lung cancer was found, even after adjusting for personal smoking and other risk factors. In our analysis, we sought to determine whether some of this familial relative risk could be explained by correlation of family members' smoking habits.
For the analyses of passive smoking effects, each case and control was asked questions about the smoking habits of her parents, siblings, spouse, and other cohabitants. We also knew which of the subjects' first-degree relatives had had lung cancer and if so, whether or not they smoked. Finally, we knew how many brothers and sisters the subject had. Because of the design of the questionnaire, however, we did not know the lifetime smoking histories for the subjects' parents (only their status during the subjects' childhood and at diagnosis if they had lung cancer) nor which of the sibs smoked. Using the information that we did have on each family, we therefore tried to impute values for the unknown smoking histories to arrive at a random decision as to whether each family member smoked and if so, his age at starting and quitting and average number of cigarettes per day. This imputation applied the age-specific distributions of variables for cases and controls to affected and unaffected subjects, respectively, in the spirt of the section "Modifications for Proband and Case-Control Designs." The various decision rules are described elsewhere (9).
The analysis is based on the conditional likelihood for the cases and their matched controls, taking as a family history covariate the expectation of the frailty given the lifetime covariate, disease, and censoring histories of the family members. (We have not attempted a cohortstyle analysis because of the large size of the resulting  Table 5). Addition of personal smoking reduced this estimate to 1.99 (LR X 1 = 14.38); to obtain this estimate, the average rates for the entire cohort were used as XO(t) and the smoking covariate was not used in estimating the Ei terms in the frailty. In the next iteration, the smoking-adjusted baseline rates and family members' smoking habits were used to obtain smokingadjusted Ei and Ei, the resulting variance estimated reduced to 1.59 on the first iteration, but was still highly significant (LR X21 = 11.82). Thus, we would conclude that the familial aggregation of lung cancer was only partially explained by familial aggregation of smoking. Although this conclusion can only be tentative in view of the probable high degree of misclassification of family members' smoking habits, we designed the imputation rules in such a way as to maximize the smoking x lung cancer association, thereby giving familial smoking the largest possible opportunity for explaining the association.

Discussion
The methods we have described provide a means of analyzing survival data for families, taking into account their interrelationships and any measured covariates. The latter could include environmental exposures, genetic markers, or variables on a causal pathway from genotype to outcome (such as hormones or reproductive events in breast cancer). Thus, they appear to address the major limitations of classical genetic and epidemiologic methods, as enumerated at the beginning. Numerous details remain to be resolved, however, including the development of a tractable variance estimator, the identifiability of the multivariate models, and the validity of the proposals for applications to noncohort designs. Although we have developed a feasible program for the univariate frailty model (but not the correct variance estimator), it is highly computer-intensive and the proposed extensions to multivariate frailty models and the IP algorithm are likely to be even more so. The simulations suggest that simple approximations may perform quite well. Thus, the development of practical procedures and the study of the power to distinguish various alternatives remain high priorities.
The incorporation of genetic markers into such an analysis will be addressed in a separate paper. Such markers mij could simply be included as measured covariates in the models we have described above. However, an approach that would be more in the spirit of linkage analysis (15) would assume that m is informative about Ei and that conditional on Eij, Dij is independent of m. Assuming a logistic dependence of mij on sij, the contribution of subject ij to the likelihood would be of the form fHij P(Dij Ej,zjij,Eij) P(mij Eij) P(ti) dEl.
This work was supported in part by National Cancer Institute grant numbers CA42949 and CA14089. We are grateful to David Clayton for many helpful suggestions and to Birgitta Floderus and Anna Wu for making available the data that were used for illustration.

Appendix 1 Formulation of the IP Algorithm for Frailty Models
For univariate gamma frailty models, one would proceed as follows: Step 0. One could obtain initial estimates of Xo at ,B y = 0 by a simple Kaplan-Meier survival analysis on the entire cohort, or better, by applying a Cox regression analysis to obtain the maximum likelihood estimate (MLE) of 1 and Xo at y = 0. Better yet, one could use the EM-algorithimi approach described in the text to obtain the MLE of 1B, Xo, and -y.
Step 1. For each7?amiily, one would randomly sample a single value of the frailty Ei from the posterior distribution of eei Fi 1, y, Xo for the current values of these parameters. Assuming the prior distribution of e'i is gamma with shape parameter 11y and scale parameter -y, then the posterior distribution of eEi is also gamma with parameters Di + (1Iy) and [Ei + (l/y)] l Thus, it suffices to randomly sample frailties from this gamma distribution.
Step 2a. Now treating the Ei as known, one randomly samples values of Xo(tk) = Xk4(0) for each failure time tk from their posterior distributions given F, 1 and E. Assuming Xk has a flat prior on [0,o01, then the probability density function (pdf) for Xk is given by the likelihood, Xk eXkSk dXk where Sk = ijRk exp(zi1j' + 'j).
Thus the cumulative distribution function (cd) is simply e XkSk, so it suffices to draw a uniform [0,1] deviate F and compute Xk as -InFISk.
Step 2b. Assuming a flat prior on RK for 13, the posterior distribution of 1B given E and Xo is again given by the likelihood function, whichcan be written as InL(P) = Pl;j^_jDij -IijAo(tij) exp(zt>j'I + E).
Thus, we can draw the next value of 1 from a multivariate normal distribution with mean , -A2 1A1 and covariance matrix A2-1.
Step 2c. One ran=domly samples y from the posterior i'j : 1 + Mijl4 and a-2 is an unknown parameter assumed to be constant for all subjects. Then the posterior density of ij is proportional to exp(EijDij -Eije'ij) N(Eij uij, iZj) which can be approximated by a normal density with mean + (Dij -Ei1)I& and variance iE/(1 + ) In step 1, one would therefore simply sample Eij from these approximating normal distributions. In step 2c, 62 would be sampled from the posterior distribution of o2 given the set of residuals E&j -[LiJ. Assuming a flat prior for lnr, this has density E(&i -ij)2 (1 -Mt,/4) / XN_I Thus, one only computes this sum of squares and divides it by a random chi square deviate.

Appendix 2
Two-Point Model for Frailty among Sibs Let -y = gg, gG, and GG denote the possible genotypes, wr the prevalence of allele G, and R', the relative risk associated with genotype -y (1 of gg or gG and R for GG in a recessive model; 1 for gg and R for gG and GG in a dominant model). We assume a proportional hazards model of the form A(t,z,y) = Xo(t)e'8RV.