Germán Rodríguez
Demographic Methods Princeton University

Model Mortality Schedules

The state-of the art model mortality schedules are the modified logit relational system of Murray et al (2003) and the log-quadratic model of Wilmoth et al (2012). Let’s see how well they do with the life table in Box 3.1 of the textbook, for Austrian males in 1992.

The Code

I wrote some code to facilitate fitting these models in Stata and R, so the first step will be to read it directly from GitHub:

. clear mata

. quietly do https://grodri.github.io/demography/mrlogit.mata

. quietly do https://grodri.github.io/demography/logquad.mata
> library(dplyr)
> library(ggplot2)
> source("https://grodri.github.io/demography/mrlogit.R")
> source("https://grodri.github.io/demography/logquad.R")

The Data

I have also saved the complete life table in a plain text format as austria92, so you can easily read it in Stata or R.

. import delimited using https://grodri.github.io/datasets/austria92.dat, ///
>   delim(" ", collapse) varn(1) case(preserve) clear
(encoding automatically selected: ISO-8859-1)
(15 vars, 19 obs)

. quietly replace lx = lx/100000 //!

. mata b31 = st_data(., ("age", "a", "q", "m", "lx"))

. mata b31[1::6,]
                 1             2             3             4             5
    ┌───────────────────────────────────────────────────────────────────────┐
  1 │            0   .0680000037   .0086719999   .0087430002             1  │
  2 │            1   1.626000047       .001479        .00037   .9913278222  │
  3 │            5           2.5       .000766       .000153   .9898614287  │
  4 │           10   3.142999887       .000963       .000193      .9891029  │
  5 │           15   2.723999977   .0048719998       .000976   .9881504774  │
  6 │           20   2.519999981   .0064050001       .001285   .9833365679  │
    └───────────────────────────────────────────────────────────────────────┘
> library(dplyr)
> b31 <- read.table("https://grodri.github.io/datasets/austria92.dat", header=TRUE)
> b31x <- mutate(b31, lx=lx/100000) |> select(age, a, q, m, lx)
> head(b31x)
  age     a        q        m        lx
1   0 0.068 0.008672 0.008743 1.0000000
2   1 1.626 0.001479 0.000370 0.9913278
3   5 2.500 0.000766 0.000153 0.9898614
4  10 3.143 0.000963 0.000193 0.9891029
5  15 2.724 0.004872 0.000976 0.9881505
6  20 2.520 0.006405 0.001285 0.9833366

The Modified Logit System

You may want to try taking logits of the survival function and plotting it against the Brass standard. The relationship doesn’t look very linear. So let’s try instead the modified transformation, which uses corrections based on survival to ages 5 and 60.

. mata zx = mrlogit_zx(b31[2::19, 5], "male") //transform data

. mata zs = mrlogit_zs("male")                // transform standard

. getmata zx, force replace

. getmata zs, force replace

. // scatter zx zs
. quietly regress zx zs                       // fit by ols

. predict zfit
(option xb assumed; fitted values)
(1 missing value generated)

. mata b = st_matrix("e(b)") // constant last!

. mata lfit = mrlogit_lx(b[2], b[1], "male")

. // might want to plot
> zx <- mrlogit_zx(b31x$lx[-1], "male") # transform data
> zs <- mrlogit_zs("male")              # transform standard
> # data.frame(zx, zs) |>
> #  ggplot(aes(zs, zx)) + geom_point() + geom_smooth(method="lm")
> b <- coef(lm(zx ~ zs))               # ols fit
> lfit<- mrlogit_lx(b[1], b[2], "male")   # convert to survival
> # data.frame(age=b31$age, lx=b31$lx/100000, fit=c(1, lfit)) |>
> #  ggplot(aes(age, lx)) + geom_point() + geom_line(aes(age, fit))

Of course the fit is going to look good on the survival scale, but let’s get a closer look by computing mortality rates. I first compute death probabilities and then convert to rates with the same assumptions used to construct the table.

. mata qfit = 1 :- (lfit\0) :/ (1\lfit)      // convert to probabilities

. mata n = 1\ 4\ J(17, 1, 5)

. mata mfit = qfit :/ (n - qfit :* (n - b31[,2]))     // and then to rates    

. getmata mfit, replace

. twoway scatter m age, yscale(log) ///      // plot
>   ||   line mfit age, yscale(log) legend(off)

. graph export mrlogits.png, width(500) replace
file mrlogits.png saved as PNG format

> qfit <- 1 - c(lfit, 0)/c(1, lfit)  # death probabilities
> a <- b31$a                         # nax
> n <- c(1, 4, rep(5,17))            # width of intervals
> mfit <- qfit/(n - qfit*(n - a))    # probs to rates
> data.frame(age = b31$agem, mobs = b31$m, mfit) |>
+ ggplot(aes(age, log(mobs))) + geom_point() +
+   geom_line(aes(age,log(mfit)))
> ggsave("mrlogit.png", width=500/72, height=400/72, dpi=72)

The model gets infant mortality right and does an excellent job at age 30 and above, but has a bit of trouble in the 5 to 25 range.

The Log-Quadratic System

Because this model is non-linear on the parameters we need to use an optimization routine, R’s optim() or Stata’s optimize(). In both cases we write a simple “residual sum of squares” function. Our function is “hardwired” to use b31 and the male model. Note that the data go up to age 85+ but the model continues to 110+. For simplicity I will only use ages 0 to 85 and skip the open-ended group. I use as starting values log (1q0) and zero.

. mata:
───────────────────────────────────────────────── mata (type end to exit) ──────────────────────────────
:   obs = log(b31[1::18, 4])        

:   void rss(todo, b, f, g, H) {
>     external obs
>     fit = logquad(b[1], b[2], "male")[1::18]
>     f = -sum((obs - fit):^2)
>   }
note: argument todo unused.
note: argument g unused.
note: argument H unused.

:   S = optimize_init()

:   optimize_init_evaluator(S, &rss())

:   h = log(1 - b31[3, 5])  // log(5q0)

:   optimize_init_params(S, (h, .2))

:   optimize(S)
Iteration 0:   f(p) = -.42176202  
Iteration 1:   f(p) = -.28965267  
Iteration 2:   f(p) = -.28942899  
Iteration 3:   f(p) = -.28942899  
                  1              2
    ┌───────────────────────────────┐
  1 │  -4.801873991    .3079241028  │
    └───────────────────────────────┘

: end
────────────────────────────────────────────────────────────────────────────────────────────────────────
> obs <- log(b31$m[-19])
> rss <- function(b) {
+   fit <- logquad(b[1], b[2], "male")$logmx[1:18]
+   sum((obs - fit)^2)
+ }
> b <- c(log(1- b31$lx[3]/100000), 0)
> r <- optim(b, rss)
> r    
$par
[1] -4.8019124  0.3079723

$value
[1] 0.289429

$counts
function gradient 
      55       NA 

$convergence
[1] 0

$message
NULL

The optimization procedure has converged. All we need now is to plot the result.

. mata b = optimize_result_params(S)

. mata lqfit = logquad(b[1], b[2], "male")[1::18]

. getmata lqfit, force replace

. gen logm = log(m)

. twoway scatter logm age ///
>   ||  line lqfit age,  legend(off)

. graph export logquads.png, width(500) replace
file logquads.png saved as PNG format

> fit <- logquad(r$par[1], r$par[2], "male")$logmx[1:18]
> lqfit <- data.frame(age = b31$agem[-19], obs, fit)
> p <- ggplot(lqfit, aes(age, obs)) + geom_point() + geom_line(aes(age, fit))
> ggsave("logquad.png", plot=p, width=500/72, height=400/72, dpi=72)

The fit is pretty good overall but, not unlike the previous model, it seems to have some difficulty at ages 10 to 25.

Comparison

Here is a graph that overlays the two fits

. gen lmfit = log(mfit)

. twoway scatter logm age ///
>   ||  line lmfit age , lpat(dash) ///
>   ||  line lqfit age , legend(ring(0) pos(5) cols(1) ///
>       order(2 "Modified Logit" 3 "Log Quadratic"))

. graph export mortmods.png, width(500) replace
file mortmods.png saved as PNG format

> fits <- mutate(lqfit, fitm = log(mfit[-19]))
> p <- ggplot(fits, aes(age, obs)) + geom_point() +  
+   geom_line(aes(age, fit, color="c1")) + 
+   geom_line(aes(age, fitm, color="c2")) +
+   scale_color_manual(name="Model", values=c("c1" = "blue", "c2" = "red"), 
+   labels=c("Log-quadratic", "Modified logit")) + 
+   theme(legend.position =c(.9, .12))
> ggsave("mortmodsr.png", plot=p, width=500/72, height=400/72, dpi=72)

There seems to be some room for improvement, but in my opinion it will be hard to do much better without adding a third parameter.