Let’s have another look at the recidivism data. We will split duration into single years with an open-ended category at 5+ and fit a piecewise exponential model with the same covariates as Wooldridge.
We will then treat the data as discrete, assuming that all we know is that recidivism occured somewhere in the year. We will fit a binary data model with a logit link, which corresponds to the discrete time model, and using a complementary-log-log link, which corresponds to a grouped continuous time model.
. use https://www.stata.com/data/jwooldridge/eacsap/recid, clear . gen fail = 1-cens . gen id = _n . stset durat, fail(fail) id(id) Survival-time data settings ID variable: id Failure event: fail!=0 & fail<. Observed time interval: (durat[_n-1], durat] Exit on or before: failure ────────────────────────────────────────────────────────────────────────── 1,445 total observations 0 exclusions ────────────────────────────────────────────────────────────────────────── 1,445 observations remaining, representing 1,445 subjects 552 failures in single-failure-per-subject data 80,013 total analysis time at risk and under observation At risk from t = 0 Earliest observed entry t = 0 Last observed exit t = 81 . stsplit year, at(12 24 36 48 60 100) // max is 81 (5,273 observations (episodes) created) . replace year = year/12 (5,273 real changes made) . local x workprg priors tserved felon alcohol drugs /// > black married educ age . streg i.year `x', distribution(exponential) nohr Failure _d: fail Analysis time _t: durat ID variable: id Iteration 0: log likelihood = -1739.8944 Iteration 1: log likelihood = -1653.6678 Iteration 2: log likelihood = -1587.2575 Iteration 3: log likelihood = -1583.7542 Iteration 4: log likelihood = -1583.7129 Iteration 5: log likelihood = -1583.7129 Exponential PH regression No. of subjects = 1,445 Number of obs = 6,718 No. of failures = 552 Time at risk = 80,013 LR chi2(15) = 312.36 Log likelihood = -1583.7129 Prob > chi2 = 0.0000 ─────────────┬──────────────────────────────────────────────────────────────── _t │ Coefficient Std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── year │ 1 │ .036532 .1093659 0.33 0.738 -.1778212 .2508851 2 │ -.3738156 .1296172 -2.88 0.004 -.6278607 -.1197706 3 │ -.8115436 .1564067 -5.19 0.000 -1.118095 -.5049921 4 │ -.9382311 .1683272 -5.57 0.000 -1.268146 -.6083159 5 │ -1.547178 .2033594 -7.61 0.000 -1.945755 -1.148601 │ workprg │ .0838291 .0907983 0.92 0.356 -.0941324 .2617906 priors │ .0872458 .0134763 6.47 0.000 .0608327 .113659 tserved │ .0130089 .0016863 7.71 0.000 .0097039 .0163139 felon │ -.2839252 .1061534 -2.67 0.007 -.491982 -.0758685 alcohol │ .4324425 .1057254 4.09 0.000 .2252245 .6396605 drugs │ .2747141 .0978667 2.81 0.005 .0828989 .4665293 black │ .4335559 .0883658 4.91 0.000 .2603622 .6067497 married │ -.1540477 .1092154 -1.41 0.158 -.3681059 .0600104 educ │ -.0214162 .0194453 -1.10 0.271 -.0595283 .016696 age │ -.00358 .0005223 -6.85 0.000 -.0046037 -.0025563 _cons │ -3.830127 .280282 -13.67 0.000 -4.37947 -3.280785 ─────────────┴──────────────────────────────────────────────────────────────── . estimates store pwe
> library(haven) > library(survival) > library(dplyr) > recid <- read_dta("https://www.stata.com/data/jwooldridge/eacsap/recid.dta") > recid$fail <- 1 - recid$cens > recidx <- survSplit(recid, cut = seq(12, 60, 12), + start = "t0", end = "durat", event = "fail", episode = "interval") > labels <- paste("(",seq(0,60,12),",",c(seq(12,60,12),81), "]",sep="") > recidx <- mutate(recidx, + exposure = durat - t0, + interval = factor(interval + 1, labels = labels)) > mf <- fail ~ interval + workprg + priors + tserved + felon + + alcohol + drugs + black + married + educ + age > pwe <- glm(mf, offset = log(exposure), data = recidx, family = poisson) > coef(summary(pwe)) Estimate Std. Error z value Pr(>|z|) (Intercept) -3.830127469 0.280267334 -13.6659789 1.621090e-42 interval(12,24] 0.036531989 0.109361775 0.3340471 7.383440e-01 interval(24,36] -0.373815644 0.129611909 -2.8841150 3.925154e-03 interval(36,48] -0.811543632 0.156401452 -5.1888497 2.115971e-07 interval(48,60] -0.938231113 0.168321156 -5.5740534 2.488794e-08 interval(60,81] -1.547177936 0.203348918 -7.6084886 2.773196e-14 workprg 0.083829106 0.090794162 0.9232874 3.558575e-01 priors 0.087245826 0.013473463 6.4753825 9.457203e-11 tserved 0.013008862 0.001685901 7.7162667 1.197865e-14 felon -0.283925203 0.106148770 -2.6747856 7.477705e-03 alcohol 0.432442493 0.105721133 4.0904073 4.306163e-05 drugs 0.274714115 0.097863462 2.8071162 4.998720e-03 black 0.433555955 0.088362277 4.9065729 9.268154e-07 married -0.154047742 0.109211869 -1.4105403 1.583802e-01 educ -0.021416177 0.019444026 -1.1014271 2.707108e-01 age -0.003580003 0.000522249 -6.8549738 7.132557e-12
The estimates are very close to those obtained using a Cox model.
For a discrete-time survival analysis we have to make sure we only include intervals with complete exposure, where we can classify the outcome as failure or survival. The convicts were released between July 1, 1977 and June 30, 1978 and the data were collected in April 1984, so the length of observation ranges between 70 and 81 months. We therefore restrict our attention to 5 years or 60 months. (We could go up to 6 years or 72 months for some convicts, but unfortunately we don’t have the date of release, so we can’t identify these cases and must censor everyone at 60.)
. drop if _t0 >= 60 (921 observations deleted) . logit _d i.year `x' Iteration 0: log likelihood = -1759.0583 Iteration 1: log likelihood = -1654.9242 Iteration 2: log likelihood = -1637.1916 Iteration 3: log likelihood = -1637.1267 Iteration 4: log likelihood = -1637.1267 Logistic regression Number of obs = 5,797 LR chi2(14) = 243.86 Prob > chi2 = 0.0000 Log likelihood = -1637.1267 Pseudo R2 = 0.0693 ─────────────┬──────────────────────────────────────────────────────────────── _d │ Coefficient Std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── year │ 1 │ .0305282 .1193583 0.26 0.798 -.2034098 .2644661 2 │ -.4131403 .1384065 -2.98 0.003 -.6844119 -.1418686 3 │ -.8641487 .1639958 -5.27 0.000 -1.185575 -.5427229 4 │ -.9936625 .1756322 -5.66 0.000 -1.337895 -.6494297 │ workprg │ .1109887 .1003087 1.11 0.269 -.0856129 .3075902 priors │ .0992921 .0164654 6.03 0.000 .0670205 .1315636 tserved │ .0149221 .0021429 6.96 0.000 .0107221 .0191222 felon │ -.3196621 .1178117 -2.71 0.007 -.5505687 -.0887555 alcohol │ .4724998 .1184177 3.99 0.000 .2404055 .7045941 drugs │ .316729 .1086092 2.92 0.004 .1038589 .5295992 black │ .4580275 .0973977 4.70 0.000 .2671315 .6489235 married │ -.2048073 .1204593 -1.70 0.089 -.4409032 .0312885 educ │ -.0267259 .0215052 -1.24 0.214 -.0688754 .0154235 age │ -.0040231 .000584 -6.89 0.000 -.0051678 -.0028784 _cons │ -1.140803 .3084159 -3.70 0.000 -1.745287 -.5363185 ─────────────┴──────────────────────────────────────────────────────────────── . estimates store logit
> recidx <- filter(recidx, interval != "(60,81]") > logit <- glm(mf, data = recidx, family = binomial) # no offset > coef(summary(logit)) Estimate Std. Error z value Pr(>|z|) (Intercept) -1.140802599 0.3084159337 -3.6989094 2.165279e-04 interval(12,24] 0.030528163 0.1193582701 0.2557692 7.981291e-01 interval(24,36] -0.413140262 0.1384064532 -2.9849783 2.835984e-03 interval(36,48] -0.864148699 0.1639957690 -5.2693353 1.369186e-07 interval(48,60] -0.993662524 0.1756321916 -5.6576332 1.534747e-08 workprg 0.110988653 0.1003087410 1.1064704 2.685230e-01 priors 0.099292063 0.0164653717 6.0303566 1.635983e-09 tserved 0.014922136 0.0021429307 6.9634244 3.320994e-12 felon -0.319662098 0.1178116529 -2.7133318 6.661038e-03 alcohol 0.472499810 0.1184176515 3.9901130 6.604183e-05 drugs 0.316729032 0.1086092071 2.9162264 3.542934e-03 black 0.458027506 0.0973977193 4.7026512 2.568049e-06 married -0.204807338 0.1204592720 -1.7002206 8.908944e-02 educ -0.026725931 0.0215052145 -1.2427651 2.139544e-01 age -0.004023087 0.0005840427 -6.8883431 5.644594e-12
We will compare all estimates below.
Finally we use a complementary log-log link
. glm _d i.year `x', family(binomial) link(cloglog) Iteration 0: log likelihood = -1818.8831 Iteration 1: log likelihood = -1640.7899 Iteration 2: log likelihood = -1637.5235 Iteration 3: log likelihood = -1637.5083 Iteration 4: log likelihood = -1637.5083 Generalized linear models Number of obs = 5,797 Optimization : ML Residual df = 5,782 Scale parameter = 1 Deviance = 3275.016541 (1/df) Deviance = .5664159 Pearson = 5908.117581 (1/df) Pearson = 1.021812 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = ln(-ln(1-u)) [Complementary log-log] AIC = .5701253 Log likelihood = -1637.50827 BIC = -46826.57 ─────────────┬──────────────────────────────────────────────────────────────── │ OIM _d │ Coefficient std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── year │ 1 │ .0216127 .1095455 0.20 0.844 -.1930925 .2363179 2 │ -.3926148 .1296978 -3.03 0.002 -.6468179 -.1384117 3 │ -.8249973 .1564479 -5.27 0.000 -1.131629 -.5183651 4 │ -.9483385 .1683997 -5.63 0.000 -1.278396 -.6182811 │ workprg │ .1044651 .0932517 1.12 0.263 -.0783049 .2872351 priors │ .0887063 .0139849 6.34 0.000 .0612964 .1161163 tserved │ .013267 .0017417 7.62 0.000 .0098534 .0166806 felon │ -.2885449 .1091355 -2.64 0.008 -.5024465 -.0746433 alcohol │ .4397795 .1090665 4.03 0.000 .226013 .6535459 drugs │ .2991025 .1002774 2.98 0.003 .1025625 .4956425 black │ .4272096 .0909458 4.70 0.000 .248959 .6054602 married │ -.1830403 .1137539 -1.61 0.108 -.405994 .0399133 educ │ -.0233346 .0201545 -1.16 0.247 -.0628367 .0161674 age │ -.003851 .0005466 -7.04 0.000 -.0049224 -.0027796 _cons │ -1.238797 .2893845 -4.28 0.000 -1.80598 -.6716138 ─────────────┴──────────────────────────────────────────────────────────────── . estimates store cloglog
> cloglog <- glm(mf, data = recidx, family = binomial(link = cloglog)) > coef(summary(cloglog)) Estimate Std. Error z value Pr(>|z|) (Intercept) -1.238795113 0.2893607427 -4.2811444 1.859347e-05 interval(12,24] 0.021613951 0.1095604758 0.1972787 8.436094e-01 interval(24,36] -0.392613793 0.1297681301 -3.0255024 2.482204e-03 interval(36,48] -0.824996440 0.1566132100 -5.2677321 1.381194e-07 interval(48,60] -0.948338328 0.1684247392 -5.6306356 1.795467e-08 workprg 0.104466422 0.0934228228 1.1182109 2.634769e-01 priors 0.088706984 0.0145113085 6.1129556 9.780261e-10 tserved 0.013266906 0.0018142064 7.3127875 2.616567e-13 felon -0.288542238 0.1096491770 -2.6315039 8.500789e-03 alcohol 0.439780479 0.1090998881 4.0309893 5.554258e-05 drugs 0.299102966 0.1003895869 2.9794222 2.887925e-03 black 0.427210098 0.0910947168 4.6897352 2.735589e-06 married -0.183040394 0.1136073451 -1.6111669 1.071433e-01 educ -0.023334468 0.0202349457 -1.1531767 2.488379e-01 age -0.003851008 0.0005486362 -7.0192372 2.230827e-12
All that remains is to compare the estimates
. estimates table pwe cloglog logit, eq(1:1:1) ─────────────┬─────────────────────────────────────── Variable │ pwe cloglog logit ─────────────┼─────────────────────────────────────── year │ 1 │ .03653199 .02161269 .03052816 2 │ -.37381564 -.39261481 -.41314026 3 │ -.81154363 -.82499726 -.8641487 4 │ -.93823111 -.94833849 -.99366251 5 │ -1.5471779 │ workprg │ .0838291 .1044651 .11098865 priors │ .08724582 .08870634 .09929206 tserved │ .01300886 .01326703 .01492214 felon │ -.28392523 -.28854488 -.3196621 alcohol │ .43244249 .43977945 .47249981 drugs │ .27471411 .29910246 .31672903 black │ .43355595 .42720963 .4580275 married │ -.15404774 -.18304032 -.20480734 educ │ -.02141618 -.02333462 -.02672593 age │ -.00358 -.00385099 -.00402309 _cons │ -3.8301275 -1.238797 -1.1408026 ─────────────┴───────────────────────────────────────
> cbind(coef(pwe)[-6], coef(cloglog), coef(logit)) [,1] [,2] [,3] (Intercept) -3.830127469 -1.238795113 -1.140802599 interval(12,24] 0.036531989 0.021613951 0.030528163 interval(24,36] -0.373815644 -0.392613793 -0.413140262 interval(36,48] -0.811543632 -0.824996440 -0.864148699 interval(48,60] -0.938231113 -0.948338328 -0.993662524 workprg 0.083829106 0.104466422 0.110988653 priors 0.087245826 0.088706984 0.099292063 tserved 0.013008862 0.013266906 0.014922136 felon -0.283925203 -0.288542238 -0.319662098 alcohol 0.432442493 0.439780479 0.472499810 drugs 0.274714115 0.299102966 0.316729032 black 0.433555955 0.427210098 0.458027506 married -0.154047742 -0.183040394 -0.204807338 educ -0.021416177 -0.023334468 -0.026725931 age -0.003580003 -0.003851008 -0.004023087
As one would expect, the estimates of the relative risks based on the c-log-log link are closer to the continuous time estimates than those based on the logit link.
This result makes sense because the piece wise exponential and c-log-log link models are estimating the same continuous time hazard, one from continuous and one from grouped data, while the logit model is estimating a discrete time hazard.
Recall that in a continuous time model the relative risk multiplies the hazard or instantaneous failure rate, whereas in a discrete time logit model it multiplies the conditional odds of failure at a given time (or in a given time interval) given survival to that time (or the start of the interval). Interpretation of the results should take this fact into account.
All three approaches, however, lead to similar predicted survival probabilities.