We now fit a Cox model with shared frailty to data on child mortality in Guatemala. We have analized this data using piecewise exponential models with Gamma heterogeneity, using Stata, see this page. We now try a Cox model with log-normal heterogeneity using R.
The dataset is available as a Stata file in long format (one record per child)
> library(haven) > gu <- read_dta("https://grodri.github.io/datasets/pebleystupp.dta")
The first step is to conduct an analysis equivalent to Pebley and Stupp’s original. We will split the data at durations 1, 6, 12, 24 and 60. This opens the option of reproducing their results exactly, and also allows us to create a key time-varying covaraite.
> library(survival) > gux <- survSplit(gu, cut = c(1, 6, 12, 24, 60), + end = "time", event="death", start="t0", episode="dur")
The model includes mother’s age and age squared, a linear term on birth order, and an indicator of whether the previous kid in the family died. There are also indicators of the length of the previous interval (in lieu of a factor).
The model also includes a time-varying covariate with time-varying
effects. The variable is the length of the following birth
interval, with indicators f0011
for intervals under a year
and f1223
for intervals of one to two years, capturing very
short and short intervals where a child would face competition from a
sibling.
These are coded as time-varying covariates because a sibling is assumed to affect the life of the index child after it is born. So we consider very short intervals (< 12) only at ahes 12 months and higher, and short intervals (12-23 months) only at ages 24 months and higher. This is the time-varying part. But the effect of very short intervals is also allowed to vary, havving different effects at ages 12-23 and 24 or more. Here are all the variables we need:
> library(dplyr) > gux <- mutate(gux, + mage2 = mage^2, + i011a1223 = f0011 * (t0 == 12), + i011a24p = f0011 * (t0 == 24), + i1223a24p = f1223 * (t0 == 24))
We are now ready to fit the model using Cox’s partial likelihood.
> phaz <- coxph(Surv(t0, time, death) ~ mage + mage2 + borde + pdead + + p0014 + p1523 + p2435 + p36up + i011a1223 + i011a24p + i1223a24p, + data = gux) > phaz Call: coxph(formula = Surv(t0, time, death) ~ mage + mage2 + borde + pdead + p0014 + p1523 + p2435 + p36up + i011a1223 + i011a24p + i1223a24p, data = gux) coef exp(coef) se(coef) z p mage -0.149145 0.861444 0.058235 -2.561 0.0104 mage2 0.002572 1.002576 0.001033 2.491 0.0127 borde 0.061282 1.063199 0.033444 1.832 0.0669 pdead 0.096835 1.101679 0.149488 0.648 0.5171 p0014 0.540841 1.717451 0.212542 2.545 0.0109 p1523 -0.121965 0.885179 0.186371 -0.654 0.5128 p2435 -0.257204 0.773210 0.184563 -1.394 0.1634 p36up -0.391115 0.676302 0.208744 -1.874 0.0610 i011a1223 0.813024 2.254716 0.716411 1.135 0.2564 i011a24p 1.615700 5.031410 0.736588 2.193 0.0283 i1223a24p 0.068558 1.070962 0.376923 0.182 0.8557 Likelihood ratio test=47.65 on 11 df, p=1.648e-06 n= 13594, number of events= 403
The similarity of the results to those obtained using a piecewise
exponential model is remarkable. The easiest was to do the comparison in
R is to use the Poisson trick, defining exposure as
time - t0
and treating death
as Poisson with
mean given by the product of the hazard rate (which is allowed to depend
on duration) and exposure.
We now introduce a shared frailty term at the mother’s level to allow
for intra-family correlation in child survival. R allows fitting a
frailty model via coxph
by adding a frailty()
term to the model formula. There is a new and more general approach in
Therneau’s coxme
library, which includes the
coxme()
function to fit mixed Cox survival models with
Gaussian random effects using a Laplace approximation. In this example
the two approaches give very similar answers, but this is not always the
case.
Here’s a run. All we do is add a term (1 | momid)
to the
model formula to indicate that we want a random intercept at the
mother’s level.
> library(coxme) > sfrail <- coxme(Surv(t0, time, death) ~ mage + mage2 + borde + pdead + + p0014 + p1523 + p2435 + p36up + i011a1223 + i011a24p + i1223a24p + + (1 | momid), data = gux)
Again, the results are remarkably similar to the estimates Guo and I obtained using a piecewise exponential model with gamma frailty, and which we have reproduced exactly using Stata. (This in spite of the fact that we have many ties, an average of almost ten deaths per distinct time.)
Let us compare the fixed effect estimates obtained with and without frailty:
> exp(cbind(coef(phaz), coef(sfrail))) [,1] [,2] mage 0.8614439 0.8563498 mage2 1.0025757 1.0026849 borde 1.0631985 1.0578345 pdead 1.1016790 0.9409469 p0014 1.7174514 1.7646553 p1523 0.8851791 0.9049408 p2435 0.7732102 0.7944649 p36up 0.6763022 0.6884454 i011a1223 2.2547157 2.2188139 i011a24p 5.0314096 5.0531009 i1223a24p 1.0709625 1.0696592
The estimates of the covariate effects are remarkably stable. The one
change worth mentioning is the coefficient for pdead
, which
changes sign, from 10.3% higher risk to 7.3% lower
risk when the previous child died. This variable was clearly acting as a
proxy for unobserved family effects.
The estimate of the variance of the random effect is 0.178. Because a log normal frailty term can be written in the log-scale as σ x where z is standard normal, we can interpret the estimate in the same way as other Cox coefficients. Specifically, exponentiating the standard deviation of 0.421 to obtain 1.524 we learn that children in families with frailty one standard deviation above the mean have 52.4% higher risk than children in average families with the same observed covariate values.
The piecewise exponential model with gamma frailty had a variance of 0.214. To compare results note that when log-frailty is N(0, σ2) frailty itself has variance (exp(σ2)-1) exp(σ2), so 0.178 translates to 0.232, a much closer result. (This also affects the baseline hazard, as mean frailty is not one but exp(σ2/2) or 1.093 in this case, but in a Cox model the point is moot.)
By the way coxph()
can also fit a model with shared
frailty via penalized partial likelihood by adding the model formula
term frailty(momid)
, for which the default is
gamma frailty. It estimates the variance as 0.2. For gamma frailty the
penalized likelihood produces exact maximum likelihood estimates.
Alternatively, one can fit a model using log-normal frailty by relying
on a Laplace approximation to the marginal likelihood.
Stata users: Stata can fit both models. The command
stcox
with the efron
option gives exactly the
same results as here for the first model. Adding the shared frailty
option shared(momid)
fits a model using gamma frailty, but
is very slow, taking 46 minutes on my home machine (compared to about 20
seconds in R). The results, however, are very similar to those obtained
here if you allow for the use of a different frailty distribution (and
almost identical to the R results using gamma rather than lognormal
frailty). In particular, the variance of gamma frailty is estimated as
0.210.