Germán Rodríguez
Demographic Methods Princeton University ## Competing Risks

Let us work through the example in the textbook on multiple decrement and associated single decrement life tables, see Boxes 4.1 and 4.2 on pages 77 and 85.

Rather than just repeat all the calculations, however, I will illustrate a simpler approach to cause-deleted tables, based on the assumption that cause-specific hazards are constant within each age interval. This approach gives almost the same results as the textbook, yet allows us to concentrate on the underlying concepts.

We start by reading the data, which have deaths for all causes and from neoplasms for U.S. females in 1991. I also include the lx and nax columns.

```. set type double

. infile age D Di lx a using ///
>   https://grodri.github.io/datasets/preston41.dat, clear
```
```> url = "https://grodri.github.io/datasets/preston41.dat"
> names(b41) <- c("age","D","Di","lx","a")
```

### Multiple Decrements

The first task is to estimate mortality due to neoplasms in the presence of other causes. We start by estimating overall survival using conventional life table techniques. Then we compute conditional probabilities of death. (These usually would have been computed on the way to lx.)

Many calculations need special handling for the first or last entry. I take advantage that these are set to NA and encapsulate the fix in a function `edit.na()`

```. gen q = 1 - lx[_n + 1]/lx
(1 missing value generated)

. replace q = 1 in -1
```
```> library(dplyr)
> edit.na <- function(x, value) { x[is.na(x)] <- value; x}
> b41 <- mutate(b41,
+   q = edit.na(1 - lead(lx)/lx, 1)) # so q[last]=1
```

The conditional probability of dying of a given cause given survival to the age group is easy to obtain, we just multiply the overall probability by the ratio of deaths of a given cause to all deaths:

```. gen qi = q * Di/D
```
```> b41 <- mutate(b41, qi = q * Di/D)
```

The unconditional counts of deaths of any cause and of a given cause are calculated multiplying by the number surviving to the start of each age group, which is lx. Recall that to die of cause i in the interval [x, x+n) one must survive all causes up to age x.

```. gen d = lx * q

. gen di = lx * qi
```
```> b41 <- mutate(b41, d = lx * q, di = lx * qi)
```

Let us print our results

```. format %8.5f q qi

. format %8.0fc lx d di

. list age lx q qi d di

┌────────────────────────────────────────────────────┐
│ age        lx         q        qi        d      di │
├────────────────────────────────────────────────────┤
1. │   0   100,000   0.00783   0.00003      783       3 │
2. │   1    99,217   0.00168   0.00015      167      14 │
3. │   5    99,050   0.00092   0.00015       91      15 │
4. │  10    98,959   0.00090   0.00012       89      12 │
5. │  15    98,870   0.00236   0.00019      233      19 │
├────────────────────────────────────────────────────┤
6. │  20    98,637   0.00262   0.00025      258      24 │
7. │  25    98,379   0.00314   0.00041      309      41 │
8. │  30    98,070   0.00425   0.00087      417      85 │
9. │  35    97,653   0.00584   0.00171      570     167 │
10. │  40    97,083   0.00818   0.00314      794     304 │
├────────────────────────────────────────────────────┤
11. │  45    96,289   0.01330   0.00582    1,281     561 │
12. │  50    95,008   0.02095   0.00963    1,990     915 │
13. │  55    93,018   0.03371   0.01547    3,136   1,439 │
14. │  60    89,882   0.05155   0.02191    4,633   1,969 │
15. │  65    85,249   0.07669   0.02920    6,538   2,489 │
├────────────────────────────────────────────────────┤
16. │  70    78,711   0.11552   0.03696    9,093   2,909 │
17. │  75    69,618   0.17427   0.04439   12,132   3,091 │
18. │  80    57,486   0.27363   0.05012   15,730   2,881 │
19. │  85    41,756   1.00000   0.10212   41,756   4,264 │
└────────────────────────────────────────────────────┘

. quietly sum(di)

. di r(sum)
21204.543
```
```> select(b41, lx, q, qi, d, di) # rounding!
lx            q           qi     d         di
1  100000 0.0078300000 0.0000313041   783    3.13041
2   99217 0.0016831793 0.0001460632   167   14.49195
3   99050 0.0009187279 0.0001506849    91   14.92534
4   98959 0.0008993624 0.0001240697    89   12.27781
5   98870 0.0023566299 0.0001894838   233   18.73426
6   98637 0.0026156513 0.0002468693   258   24.35044
7   98379 0.0031409142 0.0004142078   309   40.74935
8   98070 0.0042520649 0.0008677315   417   85.09843
9   97653 0.0058369943 0.0017141651   570  167.39336
10  97083 0.0081785689 0.0031350391   794  304.35900
11  96289 0.0133037003 0.0058245125  1281  560.83648
12  95008 0.0209456046 0.0096319746  1990  915.11464
13  93018 0.0337139048 0.0154730071  3136 1439.26817
14  89882 0.0515453595 0.0219109626  4633 1969.40114
15  85249 0.0766929817 0.0291963905  6538 2488.96309
16  78711 0.1155238785 0.0369608321  9093 2909.22406
17  69618 0.1742652762 0.0443948299 12132 3090.67927
18  57486 0.2736318408 0.0501203389 15730 2881.21780
19  41756 1.0000000000 0.1021249119 41756 4264.32782
```

We obtain a total of 21,205 life table deaths due to neoplasms. Thus, the probability that a female will die of this cause under the age-cause-specific rates prevailing in the U.S. in 1991 is 21.2%. Can you compute the probability that a woman who survives to age 50 will die of neoplasms?

In preparation for the next part, note that if we had nmx and we were willing to assume that the hazard is constant in each age group we would have had a slightly different estimate of the survival function. Let us “back out” the rates from the probabilities:

```. gen n = age[_n + 1] - age
(1 missing value generated)

. gen m = q/(n - q * (n - a))
(1 missing value generated)

. replace m = 1/a in -1
```
```> b41 <- mutate(b41, n = c(diff(age),NA),
+   m =  edit.na( q/(n - q * (n - a)), 1/tail(a,1))) # m[last] = 1/a[last]
```

With these rates we compute the cumulative hazard and survival as

```. gen H = sum(n * m)

. gen S = 1

. replace S = exp(-H[_n - 1]) in 2/-1
```
```> b41 <- mutate(b41, H = cumsum(n * m),
+   S = edit.na(exp(-lag(H)), 1)) # S = 1
```

The result differs from the survival function in the text by less than 0.1% in all age groups except the last two, where the difference is 0.2 and 0.6% respectively. I think this is a small price to pay for simplicity.

### Cause-Deleted Life Tables (I)

The next question is how female mortality would look if deaths due to neoplasms could be avoided. The honest answer, of course, is that we don’t know. But assuming independence of the underlying risks we can do some calculations.

Let me start with the simpler approach assuming constant risks in each age group. We compute cause-specific rates by dividing deaths of a given cause into person-years of exposure, which is equivalent to multiplying the overall rate by the ratio of deaths of a given cause to the total. Here we want deaths for causes other than neoplasms. I will use the subscript d for deleted:

```. gen Rd = (D - Di)/D

. gen md = m * Rd
```
```> b41 <- mutate(b41, Rd = (D - Di)/D,
+   md = m * Rd)
```

Then we construct a survival function in the usual way, but treating this hazard as if it was the only one operating:

```. gen Hd = sum(n * md)

. gen Sd = 1

. replace Sd = exp(-Hd[_n-1]) in 2/-1
```
```> b41 <- mutate(b41, Hd = cumsum(n * md),
+   Sd = edit.na(exp(-lag(Hd)), 1)) # Sd = 1
```

That’s it. Really. We take a hazard that represents a subset of causes and build a life table from it. If the hazard is constant within each age group we can also compute person-years lived at each age quite easily

```. gen Pd = (Sd-Sd[_n+1])/md
(1 missing value generated)

. replace Pd = Sd/md in -1

. quietly sum Pd

. di r(sum)
82.391779
```
```> b41 <- mutate(b41,
+   Pd =  edit.na((Sd - lead(Sd))/md, tail(Sd/md, 1)))
> sum(b41\$Pd)
 82.39178
```

So if neoplasm deaths could be avoided, female life expectancy at birth would be 82.4 years, under the strong (and untestable) assumption of independence of the competing causes.

You might be interested to see the shape of the overall, cause-specific, and cause-deleted hazards:

```. gen agem = age + n/2
(1 missing value generated)

. gen mi = m - md // rate for neoplasms

. line m md mi agem, xtitle(age) yscale(log) ///
>   lp(solid dash solid) ///
>   title("Hazards for U.S. Females 1991") ///
>   legend(order(1 "All Causes" 2 "No Neoplasms" 3 "Neoplasms") ///
>   ring(0) pos(5) cols(1))

. graph export usf91neo.png, width(500) replace
file usf91neo.png saved as PNG format
``` ```> bx <- mutate(b41, agem = age + n/2, mi = m - md)[-nrow(b41), ]
> library(ggplot2)
> ggplot(bx, aes(agem, m)) + geom_line() + scale_y_log10() +
+   geom_line(aes(agem, md), linetype="dashed") +
+   geom_line(aes(agem, mi)) + ggtitle("Hazards for U.S. Females 1991")
> ggsave("usf91neor.png", width=500/72, height=400/72, dpi=72)
``` ### Cause-Deleted Life Tables (II)

Let us review the alternative approach used in the textbook. The authors use Chiang’s method, which assumes proportionality of cause-specific hazards within an age group (weaker than the constant risk assumption). We compute the conditional probability of surviving an age group after deleting a cause as the overall probability raised to Rd, and then calculate the survival function as a cumulative product

```. gen pd = (1 - q)^Rd

. gen ld = 100000

. replace ld = ld[_n-1] * pd[_n-1] in 2/-1
```
```> b41 <- mutate(b41, pd = (1 - q)^Rd,
+   ld = 100000 * cumprod(c(1, pd[-length(pd)])))
```

The cause-deleted survival functions computed by the two methods differ by less than 0.1% at all ages except the last two, where the differences are 0.2% and 0.5%, respectively.

To compute time lived we need the nax factors. The text uses the Taylor series method for ages 0, 1, 5, and 80, and a quadratic interpolation formula from 10 to 75. Let’s do that:

```. gen dd = ld - ld[_n+1] // deaths after cause-deletion
(1 missing value generated)

. replace dd = ld in -1

. gen qd = dd/ld

. gen ad = a/Rd in -1
(18 missing values generated)

. // Equation 4.8:
. replace ad = n + Rd * (q/qd) * (a - n) ///
>   if age < 10 | age==80

. // Equation 4.6:
. replace ad =( (-5/24) * dd[_n - 1] + 2.5*dd + (5/24) * dd[_n + 1] )/dd ///
>   if age >= 10 & age <= 75
```
```> b41 <- mutate(b41,
+   dd = edit.na(ld - lead(ld), tail(ld,1)),
+   qd = dd/ld,
+   ad = ifelse(age < 10 | age == 80,
+     n + Rd * (q/qd) * (a - n),
+       ifelse(age >= 10 & age <= 75,
+       ((-5/24) * lag(dd) + 2.5 * dd + (5/24) * lead(dd) )/dd,
+       a/Rd)))
```

We can then compute time lived and life expectancy:

```. gen Ld = dd * ad + (ld - dd) * n
(1 missing value generated)

. replace Ld = ld * ad in -1

. quietly summarize Ld

. di r(sum)/100000
82.457483
```
```> b41 <- mutate(b41,
+   Ld = edit.na(dd * ad + (ld - dd) * n,  tail(ld * ad, 1)))
> summarize(b41, e0 = sum(Ld)/100000)
e0
1 82.45748
```

We get 82.46 years, in agreement with the text, and very close to the 82.39 computed using the simpler approach.

We now compute life expectancy at every age and print the “official” cause-deleted life table.

```. gen Td = r(sum) - sum(Ld) + Ld

. gen ed = Td/ld

. format %8.6f Rd

. format %5.2f ed

. format %8.0fc lx ld

. list age Rd lx a ld ad ed

┌────────────────────────────────────────────────────────────┐
│ age         Rd        lx       a        ld      ad      ed │
├────────────────────────────────────────────────────────────┤
1. │   0   0.996002   100,000   0.152   100,000   0.152   82.46 │
2. │   1   0.913222    99,217   1.605    99,220   1.605   82.10 │
3. │   5   0.835985    99,050   2.275    99,068   2.275   78.23 │
4. │  10   0.862047    98,959   2.843    98,992   2.875   73.29 │
5. │  15   0.919595    98,870   2.657    98,915   2.653   68.34 │
├────────────────────────────────────────────────────────────┤
6. │  20   0.905618    98,637   2.547    98,700   2.548   63.48 │
7. │  25   0.868125    98,379   2.550    98,467   2.577   58.63 │
8. │  30   0.795927    98,070   2.616    98,198   2.585   53.78 │
9. │  35   0.706327    97,653   2.677    97,866   2.582   48.96 │
10. │  40   0.616676    97,083   2.685    97,462   2.637   44.15 │
├────────────────────────────────────────────────────────────┤
11. │  45   0.562189    96,289   2.681    96,969   2.672   39.36 │
12. │  50   0.540143    95,008   2.655    96,242   2.695   34.64 │
13. │  55   0.541050    93,018   2.647    95,148   2.703   30.00 │
14. │  60   0.574919    89,882   2.646    93,399   2.695   25.51 │
15. │  65   0.619308    85,249   2.631    90,600   2.696   21.22 │
├────────────────────────────────────────────────────────────┤
16. │  70   0.680059    78,711   2.628    86,231   2.686   17.16 │
17. │  75   0.745246    69,618   2.618    79,325   2.676   13.42 │
18. │  80   0.816833    57,486   2.570    68,776   2.637   10.07 │
19. │  85   0.897875    41,756   6.539    52,969   7.283    7.28 │
└────────────────────────────────────────────────────────────┘
```
```> b41 <- mutate(b41, Td = sum(Ld) - cumsum(Ld) + Ld, ed = Td/ld)
> select(b41, Rd, lx, a, ld, ad, ed)
Rd     lx     a        ld        ad       ed
1  0.9960020 100000 0.152 100000.00 0.1520133 82.45748
2  0.9132218  99217 1.605  99220.12 1.6051750 82.10441
3  0.8359853  99050 2.275  99067.59 2.2752054 78.22835
4  0.8620470  98959 2.843  98991.50 2.8753672 73.28673
5  0.9195954  98870 2.657  98914.75 2.6526438 68.34137
6  0.9056184  98637 2.547  98700.37 2.5482565 63.48405
7  0.8681251  98379 2.550  98466.54 2.5765317 58.62875
8  0.7959270  98070 2.616  98197.99 2.5847700 53.78204
9  0.7063274  97653 2.677  97865.51 2.5824632 48.95597
10 0.6166763  97083 2.685  97461.68 2.6369136 44.14812
11 0.5621885  96289 2.681  96969.36 2.6723714 39.35888
12 0.5401434  95008 2.655  96241.98 2.6945692 34.63615
13 0.5410497  93018 2.647  95147.84 2.7030360 30.00346
14 0.5749188  89882 2.646  93398.60 2.6949490 25.51476
15 0.6193082  85249 2.631  90599.70 2.6959050 21.21973
16 0.6800589  78711 2.628  86231.38 2.6864399 17.15811
17 0.7452457  69618 2.618  79324.76 2.6757623 13.41813
18 0.8168329  57486 2.570  68775.62 2.6367747 10.06584
19 0.8978751  41756 6.539  52969.13 7.2827502  7.28275
```

How much would a woman’s life expectancy at 40 change if neoplasm deaths were eliminated, assuming again independence of the underlying risks?

