Germán Rodríguez
Generalized Linear Models Princeton University

7.3 Approaches to Survival Modeling

Up to this point we have been concerned with a homogeneous population, where the lifetimes of all units are governed by the same survival function \( S(t) \). We now introduce the third distinguishing characteristic of survival models—the presence of a vector of covariates or explanatory variables that may affect survival time—and consider the general problem of modeling these effects.

7.3.1 Accelerated Life Models*

Let \( T_i \) be a random variable representing the (possibly unobserved) survival time of the \( i \)-th unit. Since \( T_i \) must be non-negative, we might consider modeling its logarithm using a conventional linear model, say

\[ \log T_i = \boldsymbol{x}_i'\boldsymbol{\beta} + \epsilon_i, \]

where \( \epsilon_i \) is a suitable error term, with a distribution to be specified. This model specifies the distribution of log-survival for the \( i \)-th unit as a simple shift of a standard or baseline distribution represented by the error term.

Exponentiating this equation, we obtain a model for the survival time itself

\[ T_i = \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\} T_{0i}, \]

where we have written \( T_{0i} \) for the exponentiated error term. It will also be convenient to use \( \gamma_i \) as shorthand for the multiplicative effect \( \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\} \) of the covariates.

Interpretation of the parameters follows along standard lines. Consider, for example, a model with a constant and a dummy variable \( x \) representing a factor with two levels, say groups one and zero. Suppose the corresponding multiplicative effect is \( \gamma=2 \), so the coefficient of \( x \) is \( \beta=\log(2)=0.6931 \). Then we would conclude that people in group one live twice as long as people in group zero.

There is an interesting alternative interpretation that explains the name ‘accelerated life’ used for this model. Let \( S_0(t) \) denote the survivor function in group zero, which will serve as a reference group, and let \( S_1(t) \) denote the survivor function in group one. Under this model,

\[ S_1(t) = S_0(t/\gamma). \]

In words, the probability that a member of group one will be alive at age \( t \) is exactly the same as the probability that a member of group zero will be alive at age \( t/\gamma \). For \( \gamma=2 \), this would be half the age, so the probability that a member of group one would be alive at age 40 (or 60) would be the same as the probability that a member of group zero would be alive at age 20 (or 30). Thus, we may think of \( \gamma \) as affecting the passage of time. In our example, people in group zero age ‘twice as fast’.

For the record, the corresponding hazard functions are related by

\[ \lambda_1(t) = \lambda_0(t/\gamma)/\gamma, \]

so if \( \gamma=2 \), at any given age people in group one would be exposed to half the risk of people in group zero half their age.

The name ‘accelerated life’ stems from industrial applications where items are put to test under substantially worse conditions than they are likely to encounter in real life, so that tests can be completed in a shorter time.

Different kinds of parametric models are obtained by assuming different distributions for the error term. If the \( \epsilon_i \) are normally distributed, then we obtain a log-normal model for the \( T_i \). Estimation of this model for censored data by maximum likelihood is known in the econometric literature as a Tobit model.

Alternatively, if the \( \epsilon_i \) have an extreme value distribution with p.d.f.

\[ f(\epsilon) = \exp\{ \epsilon - \exp \{ \epsilon \} \}, \]

then \( T_{0i} \) has an exponential distribution, and we obtain the exponential regression model, where \( T_i \) is exponential with hazard \( \lambda_i \) satisfying the log-linear model

\[ \log \lambda_i = \boldsymbol{x}_i'\boldsymbol{\beta}. \]

An example of a demographic model that belongs to the family of accelerated life models is the Coale-McNeil model of first marriage frequencies, where the proportion ever married at age \( a \) in a given population is written as

\[ F(a) = c F_0( \frac{a-a_0}{k} ), \]

where \( F_0 \) is a model schedule of proportions married by age, among women who will ever marry, based on historical data from Sweden; \( c \) is the proportion who will eventually marry, \( a_0 \) is the age at which marriage starts, and \( k \) is the pace at which marriage proceeds relative to the Swedish standard.

Accelerated life models are essentially standard regression models applied to the log of survival time, and except for the fact that observations are censored, pose no new estimation problems. Once the distribution of the error term is chosen, estimation proceeds by maximizing the log-likelihood for censored data described in the previous subsection. For further details, see Kalbfleish and Prentice (1980).

7.3.2 Proportional Hazard Models

A large family of models introduced by Cox (1972) focuses directly on the hazard function. The simplest member of the family is the proportional hazards model, where the hazard at time \( t \) for an individual with covariates \( \boldsymbol{x}_i \) (not including a constant) is assumed to be

\[\tag{7.10}\lambda_i(t|\boldsymbol{x}_i) = \lambda_0(t) \exp\{ \boldsymbol{x}b \}.\]

In this model \( \lambda_0(t) \) is a baseline hazard function that describes the risk for individuals with \( \boldsymbol{x}_i=\boldsymbol{0} \), who serve as a reference cell or pivot, and \( \exp\{\boldsymbol{x}_i'\boldsymbol{\beta}\} \) is the relative risk, a proportionate increase or reduction in risk, associated with the set of characteristics \( \boldsymbol{x}_i \). Note that the increase or reduction in risk is the same at all durations \( t \).

To fix ideas consider a two-sample problem where we have a dummy variable \( x \) which serves to identify groups one and zero. Then the model is

\[ \lambda_i(t|x) = \left\{ \begin{array}{ll} \lambda_0(t)& \mbox{if $x=0$,} \cr \lambda_0(t) e^\beta& \mbox{if $x=1$.} \end{array} \right. \]

Thus, \( \lambda_0(t) \) represents the risk at time \( t \) in group zero, and \( \gamma=\exp\{\beta\} \) represents the ratio of the risk in group one relative to group zero at any time \( t \). If \( \gamma=1 \) (or \( \beta=0 \)) then the risks are the same in the two groups. If \( \gamma=2 \) (or \( \beta=0.6931) \), then the risk for an individual in group one at any given age is twice the risk of a member of group zero who has the same age.

Note that the model separates clearly the effect of time from the effect of the covariates. Taking logs, we find that the proportional hazards model is a simple additive model for the log of the hazard, with

\[ \log \lambda_i(t|\boldsymbol{x}_i) = \alpha_0(t) + \boldsymbol{x}b, \]

where \( \alpha_0(t)=\log \lambda_0(t) \) is the log of the baseline hazard. As in all additive models, we assume that the effect of the covariates \( \boldsymbol{x} \) is the same at all times or ages \( t \). The similarity between this expression and a standard analysis of covariance model with parallel lines should not go unnoticed.

Returning to Equation 7.10, we can integrate both sides from \( 0 \) to \( t \) to obtain the cumulative hazards

\[ \Lambda_i(t|\boldsymbol{x}_i) = \Lambda_0(t) \exp\{ \boldsymbol{x}b \}, \]

which are also proportional. Changing signs and exponentiating we obtain the survivor functions

\[\tag{7.11}S_i(t|\boldsymbol{x}_i) = S_0(t) ^{ \exp\{ \boldsymbol{x}b \}},\]

where \( S_0(t) = \exp\{ -\Lambda_0(t) \} \) is a baseline survival function. Thus, the effect of the covariate values \( \boldsymbol{x}_i \) on the survivor function is to raise it to a power given by the relative risk \( \exp\{ \boldsymbol{x}_i'\boldsymbol{\beta}\} \).

In our two-group example with a relative risk of \( \gamma=2 \), the probability that a member of group one will be alive at any given age \( t \) is the square of the probability that a member of group zero would be alive at the same age.

7.3.3 The Exponential and Weibull Models

Different kinds of proportional hazard models may be obtained by making different assumptions about the baseline survival function, or equivalently, the baseline hazard function. For example if the baseline risk is constant over time, so \( \lambda_0(t) = \lambda_0 \), say, we obtain the exponential regression model, where

\[ \lambda_i(t,\boldsymbol{x}_i) = \lambda_0 \exp \{ \boldsymbol{x}b \}. \]

Interestingly, the exponential regression model belongs to both the proportional hazards and the accelerated life families. If the baseline risk is a constant and you double or triple the risk, the new risk is still constant (just higher). Perhaps less obviously, if the baseline risk is constant and you imagine time flowing twice or three times as fast, the new risk is doubled or tripled but is still constant over time, so we remain in the exponential family.

You may be wondering whether there are other cases where the two models coincide. The answer is yes, but not many. In fact, there is only one distribution where they do, and it includes the exponential as a special case. The one case where the two families coincide is the Weibull distribution, which has survival function

\[ S(t) = \exp \{ - (\lambda t)^p \} \]

and hazard function

\[ \lambda(t) = p \lambda (\lambda t)^{p-1}, \]

for parameters \( \lambda>0 \) and \( p>0 \). If \( p=1 \), this model reduces to the exponential and has constant risk over time. If \( p>1 \), then the risk increases over time. If \( p<1 \), then the risk decreases over time. In fact, taking logs in the expression for the hazard function, we see that the log of the Weibull risk is a linear function of log time with slope \( p-1 \).

If we pick the Weibull as a baseline risk and then multiply the hazard by a constant \( \gamma \) in a proportional hazards framework, the resulting distribution turns out to be still a Weibull, so the family is closed under proportionality of hazards. If we pick the Weibull as a baseline survival and then speed up the passage of time in an accelerated life framework, dividing time by a constant \( \gamma \), the resulting distribution is still a Weibull, so the family is closed under acceleration of time.

For further details on this distribution see Cox and Oakes (1984) or Kalbfleish and Prentice (1980), who prove the equivalence of the two Weibull models.

7.3.4 Time-varying Covariates

So far we have considered explicitly only covariates that are fixed over time. The local nature of the proportional hazards model, however, lends itself easily to extensions that allows for covariates that change over time. Let us consider a few examples.

Suppose we are interested in the analysis of birth spacing, and study the interval from the birth of one child to the birth of the next. One of the possible predictors of interest is the mother’s education, which in most cases can be taken to be fixed over time.

Suppose, however, that we want to introduce breastfeeding status of the child that begins the interval. Assuming the child is breastfed, this variable would take the value one (‘yes’) from birth until the child is weaned, at which time it would take the value zero (‘no’). This is a simple example of a predictor that can change value only once.

A more elaborate analysis could rely on frequency of breastfeeding in a 24-hour period. This variable could change values from day to day. For example a sequence of values for one woman could be 4,6,5,6,5,4,\( \ldots \)

Let \( \boldsymbol{x}_i(t) \) denote the value of a vector of covariates for individual \( i \) at time or duration \( t \). Then the proportional hazards model may be generalized to

\[\tag{7.12}\lambda_i(t, \boldsymbol{x}_i(t)) = \lambda_0(t) \exp\{\boldsymbol{x}_i(t)'\boldsymbol{\beta}\}.\]

The separation of duration and covariate effects is not so clear now, and on occasion it may be difficult to identify effects that are highly collinear with time. If all children were weaned when they are around six months old, for example, it would be difficult to identify effects of breastfeeding from general duration effects without additional information. In such cases one might still prefer a time-varying covariate, however, as a more meaningful predictor of risk than the mere passage of time.

Calculation of survival functions when we have time-varying covariates is a little bit more complicated, because we need to specify a path or trajectory for each variable. In the birth intervals example one could calculate a survival function for women who breastfeed for six months and then wean. This would be done by using the hazard corresponding to \( x(t)=0 \) for months 0 to 6 and then the hazard corresponding to \( x(t)=1 \) for months 6 onwards. Unfortunately, the simplicity of Equation 7.11 is lost; we can no longer simply raise the baseline survival function to a power.

Time-varying covariates can be introduced in the context of accelerated life models, but this is not so simple and has rarely been done in applications. See Cox and Oakes (1984, p.66) for more information.

7.3.5 Time-dependent Effects

The model may also be generalized to allow for effects that vary over time, and therefore are no longer proportional. It is quite possible, for example, that certain social characteristics might have a large impact on the hazard for children shortly after birth, but may have a relatively small impact later in life. To accommodate such models we may write

\[ \lambda_i(t,\boldsymbol{x}_i) = \lambda_0(t) \exp \{ \boldsymbol{x}_i'\boldsymbol{\beta}(t)\}, \]

where the parameter \( \boldsymbol{\beta}(t) \) is now a function of time.

This model allows for great generality. In the two-sample case, for example, the model may be written as

\[ \lambda_i(t|x) = \left\{ \begin{array}{ll} \lambda_0(t)& \mbox{if $x=0$} \cr \lambda_0(t) e^\beta(t)& \mbox{if $x=1$}\end{array} \right., \]

which basically allows for two arbitrary hazard functions, one for each group. Thus, this is a form of saturated model.

Usually the form of time dependence of the effects must be specified parametrically in order to be able to identify the model and estimate the parameters. Obvious candidates are polynomials on duration, where \( \beta(t) \) is a linear or quadratic function of time. Cox and Oakes (1984, p. 76) show how one can use quick-dampening exponentials to model transient effects.

Note that we have lost again the simple separation of time and covariate effects. Calculation of the survival function in this model is again somewhat complicated by the fact that the coefficients are now functions of time, so they don’t fall out of the integral. The simple Equation 7.11 does not apply.

7.3.6 The General Hazard Rate Model

The foregoing extensions to time-varying covariates and time-dependent effects may be combined to give the most general version of the hazard rate model, as

\[ \lambda_i(t,\boldsymbol{x}_i(t)) = \lambda_0(t) \exp\{ \boldsymbol{x}_i(t)'\boldsymbol{\beta}(t) \}, \]

where \( \boldsymbol{x}_i(t) \) is a vector of time-varying covariates representing the characteristics of individual \( i \) at time \( t \), and \( \boldsymbol{\beta}(t) \) is a vector of time-dependent coefficients, representing the effect that those characteristics have at time or duration \( t \).

The case of breastfeeding status and its effect on the length of birth intervals is a good example that combines the two effects. Breastfeeding status is itself a time-varying covariate \( x(t) \), which takes the value one if the woman is breastfeeding her child \( t \) months after birth. The effect that breastfeeding may have in inhibiting ovulation and therefore reducing the risk of pregnancy is known to decline rapidly over time, so it should probably be modeled as a time dependent effect \( \beta(t) \). Again, further progress would require specifying the form of this function of time.

7.3.7 Model Fitting

There are essentially three approaches to fitting survival models:

A complete discussion of these approaches in well beyond the scope of these notes. We will focus on the intermediate or semi-parametric approach because (1) it is sufficiently flexible to provide a useful tool with wide applicability, and (2) it is closely related to Poisson regression analysis.

Math rendered by