The course is organized around five topics listed below. The materials for each topic usually include a handout in PDF format and one or more computing logs in HTML format showing how to do the relevant calculations in R and Stata. You can use the table of contents to jump directly to each computing log.
The beamer slides used in class in the spring of 2018 are available here. A bundle with all slides arranged four to a page in 2x2 layout is available here.
This half-course, offered in the first half of the spring term, focuses on the statistical analysis of time-to-event or survival data. We introduce the hazard and survival functions, censoring mechanisms, parametric and non-parametric estimation, and comparison of survival curves. We cover continuous and discrete-time regression models with emphasis on Cox’s proportional hazards model and partial likelihood estimation. We discuss competing risk models, unobserved heterogeneity, and multivariate survival models including event history analysis.
The course emphasizes basic concepts and techniques, as well as applications in social science research using R or Stata. Prerequisite: Generalized Linear Models or equivalent. For a more detailed description of the course, including a list of topics covered each week, see the syllabus, available also in printer-friendly PDF.
Materials for week 1 include a handout on Parametric Survival Models, a plot of the 2013 U.S. survival and hazard functions, and a computing log fitting log-normal and Weibull parametric models to recidivism data.
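As a small illustration of the hazard and survival functions of a parametric model, here is a minimal Python sketch for the Weibull distribution. It is not taken from the course logs, which use R and Stata. The parameterization S(t) = exp(-(λt)^p) is an assumption made for this sketch (other texts and packages use different ones), and the function names are hypothetical.

```python
import math

def weibull_survival(t, lam, p):
    """Survival function S(t) = exp(-(lam * t) ** p), assumed parameterization."""
    return math.exp(-((lam * t) ** p))

def weibull_hazard(t, lam, p):
    """Hazard h(t) = p * lam * (lam * t) ** (p - 1), the negative derivative of log S(t)."""
    return p * lam * (lam * t) ** (p - 1)

# Illustrative values only: p > 1 gives a hazard that increases with time.
t, lam, p = 2.0, 0.5, 1.5
s = weibull_survival(t, lam, p)
h = weibull_hazard(t, lam, p)
```

With p = 1 the hazard is constant and the model reduces to the exponential distribution, which is a quick sanity check on any parameterization.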
Weeks 2 and 3 are devoted to Non-parametric Estimation in Survival Models. Computing materials include a log applying Kaplan-Meier and Mantel-Haenszel, and a log fitting Cox’s proportional hazards model to a two-group comparison. See also this application of Cox regression to the recidivism data. We compare flexible discrete and continuous time models fit to the same data. By popular demand we have added an example fitting splines in a piecewise exponential model.
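To make the Kaplan-Meier estimator concrete, the following is a minimal pure-Python sketch of the product-limit calculation. It is an illustration only, not the code in the course logs (which use R and Stata); the function name and the toy data are made up for this example.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct failure time.

    times:  observed durations
    events: 1 if the duration ended in a failure, 0 if it was censored
    """
    deaths = Counter(t for t, d in zip(times, events) if d)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        # Risk set: everyone still under observation just before t
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - deaths[t] / at_risk
        curve.append((t, survival))
    return curve

# Made-up toy data: six subjects, two of them censored
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
km = kaplan_meier(times, events)
```

Censored observations contribute to the risk set up to their censoring time but never produce a drop in the curve, which is the whole point of the estimator.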
Week 4 deals with Competing Risks, the analysis of survival time when there are multiple causes of failure. Additional materials include a discussion of cumulative incidence, and Fine and Gray’s competing risk model. The computing logs apply these ideas to the tenure of U.S. Supreme Court justices, including estimating cumulative incidence functions and a nice status plot, fitting a Cox model of competing risks, and fitting the Fine and Gray model. We close with a competing risk simulation.
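The idea of a cumulative incidence function can be sketched in a few lines of Python. This is a minimal illustration of the standard non-parametric estimator, in which the increment for a cause at each failure time is the overall survival just before that time multiplied by the cause-specific failure proportion. It is not the code from the course logs; the function name, the coding of causes, and the toy data are assumptions made for this example.

```python
def cumulative_incidence(times, causes, cause):
    """Return [(t, CIF(t))] for one cause at each distinct failure time.

    times:  observed durations
    causes: 0 if censored, otherwise an integer code for the cause of failure
    cause:  the code of the cause of interest
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv = 1.0  # overall survival (all causes) just before the current time
    cif = 0.0
    out = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_all = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        d_k = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        cif += surv * d_k / at_risk      # probability of failing from this cause at t
        surv *= 1.0 - d_all / at_risk    # update overall Kaplan-Meier survival
        out.append((t, cif))
    return out

# Made-up toy data with two causes (1 and 2) and one censored case (0)
times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 0, 1, 2, 1]
cif1 = cumulative_incidence(times, causes, 1)
cif2 = cumulative_incidence(times, causes, 2)
```

Unlike one minus a cause-specific Kaplan-Meier curve, these estimates add up across causes to one minus the overall survival, so they can be read as probabilities.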
In week 5 we tackle Unobserved Heterogeneity, discussing univariate frailty models and the identification problem, including very useful formulas for converting back and forth between subject-specific and population-average hazards. Illustrations include two Shiny apps: one shows frailty acting on proportional hazards, and the other shows how heterogeneity can undo a mortality crossover.
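As an illustration of converting between subject-specific and population-average hazards, here is a minimal Python sketch for one special case: multiplicative gamma frailty with mean 1 and variance σ². The choice of gamma frailty is an assumption made for this example, and the handout may present other cases or use different notation. Under that assumption the population hazard is h(t) / (1 + σ² H(t)), where H(t) is the subject-specific cumulative hazard; the function names below are hypothetical.

```python
import math

def population_hazard(h, H, var):
    """Population-average hazard from the subject-specific hazard h and
    cumulative hazard H, assuming gamma frailty with mean 1 and variance var."""
    return h / (1.0 + var * H)

def population_cumhaz(H, var):
    """Population-average cumulative hazard, log(1 + var * H) / var."""
    return math.log1p(var * H) / var

def subject_hazard(h_pop, H_pop, var):
    """Inverse conversion: subject-specific hazard from population quantities."""
    return h_pop * math.exp(var * H_pop)

# Illustrative values: constant subject-specific hazard 0.1 evaluated at t = 5
h, H, var = 0.1, 0.5, 1.0
h_pop = population_hazard(h, H, var)
H_pop = population_cumhaz(H, var)
h_back = subject_hazard(h_pop, H_pop, var)  # recovers h up to rounding
```

Even with a constant subject-specific hazard, the population hazard declines over time as the frailer individuals fail first, which is the selection effect behind the identification problem.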
Week 6 is devoted to Multivariate Survival, where we review various approaches to the analysis of multiple-spell survival data, focusing on shared-frailty models. Don’t miss the computing handouts fitting shared frailty models to child survival data from Guatemala, where we fit a piecewise exponential model using Stata and a Cox model using R. We also have a discussion of model interpretation via post-estimation, including computation of survival probabilities, and a closing note on log-normal frailty.
Note. The computing logs were all produced using the markstat command to combine Markdown with Stata and R, as described here; see also this example.
All the scripts are available on GitHub; just follow the link on each page.
Updated fall 2022