The course is organized around five topics listed below. The materials for each topic usually include a handout in PDF format and one or more computing logs in HTML format showing how to do the relevant calculations in R and Stata. You can use the table of contents to jump directly to each computing log.
The beamer slides used in class in the Spring of 2018 are available here.
A bundle with all slides arranged four to a page in a 2x2 layout is
available here.
This half-course, offered in the first half of the spring term, focuses on the statistical analysis of time-to-event or survival data. We introduce the hazard and survival functions, censoring mechanisms, parametric and non-parametric estimation, and the comparison of survival curves. We cover continuous and discrete-time regression models, with emphasis on Cox’s proportional hazards model and partial likelihood estimation. We discuss competing risk models, unobserved heterogeneity, and multivariate survival models, including event history analysis.
The course emphasizes basic concepts and techniques, as well as applications in social science research using R or Stata. Prerequisite: Generalized Linear Models or equivalent. For a more detailed description of the course, including a list of topics covered each week, see the syllabus, available also in printer-friendly PDF.
Materials for week 1 include a handout on
Parametric Survival Models,
a plot of the 2013 U.S. survival and hazard functions, and a
computing log fitting log-normal and Weibull parametric models to
recidivism data.
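As a rough illustration of how the hazard and survival functions of a parametric model relate, here is a minimal sketch of the Weibull case. It is written in plain Python for portability, whereas the course logs use R and Stata. The parametrization S(t) = exp(-(lam*t)^p) is one common choice and is an assumption here, as are the function names.

```python
import math

def weibull_cumulative_hazard(t, lam, p):
    """Cumulative hazard H(t) = (lam * t) ** p (assumed parametrization)."""
    return (lam * t) ** p

def weibull_survival(t, lam, p):
    """Survival function S(t) = exp(-H(t))."""
    return math.exp(-weibull_cumulative_hazard(t, lam, p))

def weibull_hazard(t, lam, p):
    """Hazard h(t) = dH/dt = p * lam * (lam * t) ** (p - 1)."""
    return p * lam * (lam * t) ** (p - 1)
```

With p = 1 the hazard is constant and the model reduces to the exponential; p > 1 gives a rising hazard and p < 1 a declining one.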
Weeks 2 and 3 are devoted to
Non-parametric Estimation in Survival Models.
Computing materials include a
log applying Kaplan-Meier and Mantel-Haenszel, and a log
fitting Cox’s proportional hazards model to a two-group comparison.
See also this application of Cox regression to the recidivism data.
We compare flexible discrete and continuous time models fit to the
same data. By popular demand we have added an example fitting
splines in a piecewise exponential model.
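For readers who want to see the product-limit idea behind the Kaplan-Meier estimator spelled out, the following is a minimal sketch in plain Python. It is not taken from the course logs, which use R and Stata. The function name and the input format (parallel lists of durations and event indicators) are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct failure time.

    times  : observed durations
    events : 1 if the duration ended in failure, 0 if censored
    Returns a list of (time, survival) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t; censored cases tied
        # with a failure are counted as still at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down at times 1, 2 and 3, and the censored observation at time 4 contributes no step.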
Week 4 deals with
Competing Risks, the
analysis of survival time when there are multiple causes of failure.
Additional materials include a discussion of
cumulative incidence,
and Fine and Gray’s competing risk model. The computing logs apply these
ideas to the tenure of U.S. Supreme Court justices, including estimating
cumulative incidence functions and a nice status plot,
fitting a Cox model of competing risks, and fitting
the Fine and Gray model. We close with a competing risk
simulation.
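To make the notion of cumulative incidence concrete, here is a small sketch that simulates competing risks and compares the result with the analytic cumulative incidence function. It is not the simulation from the course log. It assumes constant cause-specific hazards generated from independent latent exponential times with no censoring, and all function names are hypothetical.

```python
import math
import random

def cif_constant_hazards(t, h_cause, h_total):
    """Cumulative incidence by time t for a cause with constant hazard
    h_cause when the overall hazard (sum over causes) is h_total."""
    return (h_cause / h_total) * (1 - math.exp(-h_total * t))

def simulate_competing_risks(n, hazards, seed=1):
    """Draw n (time, cause) pairs; the cause is the index of the
    smallest latent exponential time (assumed independent)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        latent = [rng.expovariate(h) for h in hazards]
        time = min(latent)
        sample.append((time, latent.index(time)))
    return sample

def empirical_cif(sample, cause, t):
    """Share of subjects failing from the given cause by time t
    (valid here only because there is no censoring)."""
    return sum(1 for time, c in sample if c == cause and time <= t) / len(sample)
```

Note that the cumulative incidence of a cause levels off at h_cause / h_total rather than at one, because some subjects fail from the other cause first.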
In week 5 we tackle
Unobserved Heterogeneity,
discussing univariate frailty models and the identification problem,
including very useful formulas for converting back and forth between
subject-specific and population-average hazards. Illustrations include
two Shiny apps: one shows frailty acting on
proportional hazards, and another shows how heterogeneity
can undo a mortality crossover.
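As one concrete instance of converting between subject-specific and population-average hazards, the sketch below assumes gamma-distributed frailty with mean one and variance v acting multiplicatively on the hazard. The handout may cover other distributions. The function names are hypothetical, and the code is plain Python rather than the R or Stata used in the course.

```python
import math

def population_survival(H, v):
    """Population survival given the subject-specific cumulative
    hazard H, under gamma frailty with mean 1 and variance v."""
    return (1 + v * H) ** (-1 / v)

def population_hazard(h, H, v):
    """Population-average hazard from subject-specific hazard h and
    cumulative hazard H (same gamma frailty assumption)."""
    return h / (1 + v * H)

def subject_hazard(hp, Hp, v):
    """Inverse conversion: subject-specific hazard from the population
    hazard hp and population cumulative hazard Hp."""
    return hp * math.exp(v * Hp)
```

Because 1 + v * H grows over time, the population hazard falls increasingly below the subject-specific hazard as frailer individuals are selected out.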
Week 6 is devoted to
Multivariate Survival,
where we review various approaches to the analysis of multiple-spell
survival data, focusing on shared-frailty models. Don’t miss the
computing handouts fitting shared frailty models to child survival data
from Guatemala: we fit a piecewise exponential model using
Stata and a Cox model using R. We also have a discussion of model
interpretation via post-estimation, including computation of survival
probabilities, and a closing note on log-normal frailty.
Note. The computing logs were all produced using the markstat
command to combine
Markdown with Stata and R, as described here;
see also this example.
All the scripts are available on GitHub; just follow the link on each page.
Updated fall 2022