Germán Rodríguez
Demographic Methods Princeton University

Coale’s Marital Fertility Model

Here’s a quick run showing how to fit Coale’s model of marital fertility using Poisson regression. This is the original example in Brostrom (1985), Demography, 22:625-631. File brostrom.dat in the datasets section has the births, exposure, and the five-year standards na and va in comma-separated format.

. clear

. import delimited using https://grodri.github.io/datasets/brostrom.dat, clear
(encoding automatically selected: ISO-8859-9)
(5 vars, 6 obs)
> br <- read.csv("https://grodri.github.io/datasets/brostrom.dat", header=TRUE)

We treat births as Poisson and the log of exposure time times natural fertility as an offset:

. gen os = log(expo * na)

. poisson births va, offset(os)

Iteration 0:   log likelihood = -22.290463  
Iteration 1:   log likelihood = -19.308136  
Iteration 2:   log likelihood = -19.303228  
Iteration 3:   log likelihood = -19.303228  

Poisson regression                                      Number of obs =      6
                                                        LR chi2(1)    =  19.05
                                                        Prob > chi2   = 0.0000
Log likelihood = -19.303228                             Pseudo R2     = 0.3304

─────────────┬────────────────────────────────────────────────────────────────
      births │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
          va │   .3584115   .0827943     4.33   0.000     .1961376    .5206854
       _cons │  -.0274852   .0599075    -0.46   0.646    -.1449018    .0899315
          os │          1  (offset)
─────────────┴────────────────────────────────────────────────────────────────
> pr <- glm(births ~ va + offset(log(exposure * na)), data=br, family=poisson)
> pr

Call:  glm(formula = births ~ va + offset(log(exposure * na)), family = poisson, 
    data = br)

Coefficients:
(Intercept)           va  
   -0.02749      0.35841  

Degrees of Freedom: 5 Total (i.e. Null);  4 Residual
Null Deviance:	    20.19 
Residual Deviance: 1.146 	AIC: 42.61

We find that log M = -0.027 (so M = 0.973) and m = 0.358, indicating substantial control of fertility. Brostrom obtained log M = -0.026 and m = 0.361 using GLIM. Stata and R usually go an extra iteration.

To check model fit we can compare observed and fitted rates or use a formal chi-squared test:

. estat gof

         Deviance goodness-of-fit =  1.146498
         Prob > chi2(4)           =    0.8868

         Pearson goodness-of-fit  =  1.133861
         Prob > chi2(4)           =    0.8889

. gen am = age + 2.5

. gen obs = births/expo

. predict fv
(option n assumed; predicted number of events)

. gen fit = fv/expo

. scatter obs am || line fit am, title(Brostrom's Example) ///
>   legend(off) xtitle(age) ytitle(marital fertility)

. graph export brostrom.png, width(500) replace
file brostrom.png saved as PNG format

> library(dplyr)
> library(ggplot2)
> br <- mutate(br, obs = births/exposure, fit = fitted(pr)/exposure)
> ggplot(br, aes(age, obs)) + geom_point() + geom_line(aes(age,fit))
> ggsave("brostromr.png", width=500/72, height=400/72, dpi=72)

The model fits pretty well, although it slightly overestimates fertility in the youngest age group, 20-24. For comparison, here are the estimates one would obtain using OLS as in the textbook (page 206).

. gen y = log(obs/na)

. reg y va, noheader
─────────────┬────────────────────────────────────────────────────────────────
           y │ Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
          va │   .3930404   .0391997    10.03   0.001     .2842045    .5018764
       _cons │  -.0288292   .0404857    -0.71   0.516    -.1412356    .0835772
─────────────┴────────────────────────────────────────────────────────────────
> lm(log(obs/na) ~ va, data=br)

Call:
lm(formula = log(obs/na) ~ va, data = br)

Coefficients:
(Intercept)           va  
   -0.02883      0.39304  

The results are similar, with logM = -0.29 and m = 0.393.

A useful diagnostic plot is log(f(a)/n(a)) versus v(a); under the model that relationship should be a straight line, which is then estimated by OLS or Poisson maximum likelihood. For an example see Box 9.3 in the textbook.