Here’s a quick run showing how to fit Coale’s model of marital
fertility using Poisson regression. This is the original example in
Brostrom (1985), Demography, 22:625-631. File brostrom.dat
in the datasets section has the births, exposure, and the five-year
standards na and va in comma-separated
format.
. clear . import delimited using https://grodri.github.io/datasets/brostrom.dat, clear (encoding automatically selected: ISO-8859-9) (5 vars, 6 obs)
> br <- read.csv("https://grodri.github.io/datasets/brostrom.dat", header=TRUE)
We treat births as Poisson and the log of exposure time times natural fertility as an offset:
. gen os = log(expo * na)
. poisson births va, offset(os)
Iteration 0: log likelihood = -22.290463
Iteration 1: log likelihood = -19.308136
Iteration 2: log likelihood = -19.303228
Iteration 3: log likelihood = -19.303228
Poisson regression Number of obs = 6
LR chi2(1) = 19.05
Prob > chi2 = 0.0000
Log likelihood = -19.303228 Pseudo R2 = 0.3304
─────────────┬────────────────────────────────────────────────────────────────
births │ Coefficient Std. err. z P>|z| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
va │ .3584115 .0827943 4.33 0.000 .1961376 .5206854
_cons │ -.0274852 .0599075 -0.46 0.646 -.1449018 .0899315
os │ 1 (offset)
─────────────┴────────────────────────────────────────────────────────────────
> pr <- glm(births ~ va + offset(log(exposure * na)), data=br, family=poisson)
> pr
Call: glm(formula = births ~ va + offset(log(exposure * na)), family = poisson,
data = br)
Coefficients:
(Intercept) va
-0.02749 0.35841
Degrees of Freedom: 5 Total (i.e. Null); 4 Residual
Null Deviance: 20.19
Residual Deviance: 1.146 AIC: 42.61
We find that log M = -0.027 (so M = 0.973) and m = 0.358, indicating substantial control of fertility. Brostrom obtained log M = -0.026 and m = 0.361 using GLIM. Stata and R usually go an extra iteration.
To check model fit we can compare observed and fitted rates or use a formal chi-squared test:
. estat gof
Deviance goodness-of-fit = 1.146498
Prob > chi2(4) = 0.8868
Pearson goodness-of-fit = 1.133861
Prob > chi2(4) = 0.8889
. gen am = age + 2.5
. gen obs = births/expo
. predict fv
(option n assumed; predicted number of events)
. gen fit = fv/expo
. scatter obs am || line fit am, title(Brostrom's Example) ///
> legend(off) xtitle(age) ytitle(marital fertility)
. graph export brostrom.png, width(500) replace
file brostrom.png saved as PNG format

> library(dplyr)
> library(ggplot2)
> br <- mutate(br, obs = births/exposure, fit = fitted(pr)/exposure)
> ggplot(br, aes(age, obs)) + geom_point() + geom_line(aes(age,fit))
> ggsave("brostromr.png", width=500/72, height=400/72, dpi=72)

The model fits pretty well, although it slightly overestimates fertility in the youngest age group, 20-24. For comparison, here are the estimates one would obtain using OLS as in the textbook (page 206).
. gen y = log(obs/na)
. reg y va, noheader
─────────────┬────────────────────────────────────────────────────────────────
y │ Coefficient Std. err. t P>|t| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
va │ .3930404 .0391997 10.03 0.001 .2842045 .5018764
_cons │ -.0288292 .0404857 -0.71 0.516 -.1412356 .0835772
─────────────┴────────────────────────────────────────────────────────────────
> lm(log(obs/na) ~ va, data=br) Call: lm(formula = log(obs/na) ~ va, data = br) Coefficients: (Intercept) va -0.02883 0.39304
The results are similar, with logM = -0.29 and m = 0.393.
A useful diagnostic plot is log(f(a)/n(a)) versus v(a); under the model that relationship should be a straight line, which is then estimated by OLS or Poisson maximum likelihood. For an example see Box 9.3 in the textbook.