Here’s a quick run showing how to fit Coale’s model of marital
fertility using Poisson regression. This is the original example in
Brostrom (1985), Demography, 22:625-631. File brostrom.dat
in the datasets section has the births, exposure, and the five-year
standards na
and va
in comma-separated
format.
. clear . import delimited using https://grodri.github.io/datasets/brostrom.dat, clear (encoding automatically selected: ISO-8859-9) (5 vars, 6 obs)
> br <- read.csv("https://grodri.github.io/datasets/brostrom.dat", header=TRUE)
We treat births as Poisson and the log of exposure time times natural fertility as an offset:
. gen os = log(expo * na) . poisson births va, offset(os) Iteration 0: log likelihood = -22.290463 Iteration 1: log likelihood = -19.308136 Iteration 2: log likelihood = -19.303228 Iteration 3: log likelihood = -19.303228 Poisson regression Number of obs = 6 LR chi2(1) = 19.05 Prob > chi2 = 0.0000 Log likelihood = -19.303228 Pseudo R2 = 0.3304 ─────────────┬──────────────────────────────────────────────────────────────── births │ Coefficient Std. err. z P>|z| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── va │ .3584115 .0827943 4.33 0.000 .1961376 .5206854 _cons │ -.0274852 .0599075 -0.46 0.646 -.1449018 .0899315 os │ 1 (offset) ─────────────┴────────────────────────────────────────────────────────────────
> pr <- glm(births ~ va + offset(log(exposure * na)), data=br, family=poisson) > pr Call: glm(formula = births ~ va + offset(log(exposure * na)), family = poisson, data = br) Coefficients: (Intercept) va -0.02749 0.35841 Degrees of Freedom: 5 Total (i.e. Null); 4 Residual Null Deviance: 20.19 Residual Deviance: 1.146 AIC: 42.61
We find that log M = -0.027 (so M = 0.973) and m = 0.358, indicating substantial control of fertility. Brostrom obtained log M = -0.026 and m = 0.361 using GLIM. Stata and R usually go an extra iteration.
To check model fit we can compare observed and fitted rates or use a formal chi-squared test:
. estat gof Deviance goodness-of-fit = 1.146498 Prob > chi2(4) = 0.8868 Pearson goodness-of-fit = 1.133861 Prob > chi2(4) = 0.8889 . gen am = age + 2.5 . gen obs = births/expo . predict fv (option n assumed; predicted number of events) . gen fit = fv/expo . scatter obs am || line fit am, title(Brostrom's Example) /// > legend(off) xtitle(age) ytitle(marital fertility) . graph export brostrom.png, width(500) replace file brostrom.png saved as PNG format
> library(dplyr) > library(ggplot2) > br <- mutate(br, obs = births/exposure, fit = fitted(pr)/exposure) > ggplot(br, aes(age, obs)) + geom_point() + geom_line(aes(age,fit)) > ggsave("brostromr.png", width=500/72, height=400/72, dpi=72)
The model fits pretty well, although it slightly overestimates fertility in the youngest age group, 20-24. For comparison, here are the estimates one would obtain using OLS as in the textbook (page 206).
. gen y = log(obs/na) . reg y va, noheader ─────────────┬──────────────────────────────────────────────────────────────── y │ Coefficient Std. err. t P>|t| [95% conf. interval] ─────────────┼──────────────────────────────────────────────────────────────── va │ .3930404 .0391997 10.03 0.001 .2842045 .5018764 _cons │ -.0288292 .0404857 -0.71 0.516 -.1412356 .0835772 ─────────────┴────────────────────────────────────────────────────────────────
> lm(log(obs/na) ~ va, data=br) Call: lm(formula = log(obs/na) ~ va, data = br) Coefficients: (Intercept) va -0.02883 0.39304
The results are similar, with logM = -0.29 and m = 0.393.
A useful diagnostic plot is log(f(a)/n(a)) versus v(a); under the model that relationship should be a straight line, which is then estimated by OLS or Poisson maximum likelihood. For an example see Box 9.3 in the textbook.