4. Poisson Models for Counts

This unit illustrates the use of Poisson regression for modeling count data. We will be using the poisson command, often followed by estat gof to compute the model’s deviance, which we can use as a goodness of fit test with both individual and grouped data. An alternative way to fit these models is to use the glm command to fit generalized linear models in the Poisson family with link log. An advantage of that command is that it reports the deviance and Pearson’s chi-squared statistics. There’s also an option to adjust standard errors for extra-Poisson variation. We will illustrate its use in the context of models for overdispersed count data.

4.3 A Poisson Model for Counts

We will use the data from Fiji on children ever born that appear on Table 4.1 of the lecture notes. The data are available in the datasets section in both plain text and Stata formats. We will read the Stata file:

. use https://grodri.github.io/datasets/ceb, clear
(Children Ever Born Data, Fiji, 1976)

. list in 1/6

     ┌────────────────────────────────────────────────────┐
     │ i   dur     res            educ   mean    var    n │
     ├────────────────────────────────────────────────────┤
  1. │ 1   0-4    Suva            None     .5   1.14    8 │
  2. │ 2   0-4    Suva   Lower primary   1.14    .73   21 │
  3. │ 3   0-4    Suva   Upper primary     .9    .67   42 │
  4. │ 4   0-4    Suva      Secondary+    .73    .48   51 │
  5. │ 5   0-4   Urban            None   1.17   1.06   12 │
     ├────────────────────────────────────────────────────┤
  6. │ 6   0-4   Urban   Lower primary    .85   1.59   27 │
     └────────────────────────────────────────────────────┘

The file has 70 observations, one for each cell in the table. Each observation has a sequence number, numeric codes for marriage duration, residence and education, the mean and variance of children ever born, and the number of women in the cell.

Mean-Variance Relationship

We start by doing Figure 4.1, plotting the cell variances versus the cell means using a log-log-scale for cell with at least 20 cases. Because Stata has an option to use log scales we don’t need to take logs ourselves:

. twoway (scatter var mean if n > 20) ///
>   (function y=x, range (.7 7.8)) ///  
>   , xscale(log) yscale(log) legend(off) ///
>     xtitle(Mean CEB) ytitle(Variance of CEB) ///
>     title("Figure 4.1. Mean-Variance Relationship") ///
>     subtitle("Children Ever Born in Fiji, 1976")   

. graph export c4fig1.png, width(500) replace
file c4fig1.png saved as PNG format

Clearly the variance increases with the mean. Most of the points lie below the 45 degree line, indicating that the variance is not exactly equal to the mean. Still, the assumption of proportionality brings us much closer to the data than the assumption of constant variance.

Response, Offset and Predictors

The dataset does not have information about the number of children ever born (CEB) to each woman, but it turns out that we can still model the mean by working with the cell totals and introducing the log of the number of women in the cell as an offset.

If the number of CEB to one woman in a given cell is a Poisson random variable with mean (and variance) μ, then the number born to all n women in that cell is a Poisson r.v. with mean (and variance) nμ. The log of the expected sum is log (n) + log (μ), and consists of a known offset plus the quantity we are interested in modeling. See the notes for further details

We therefore start by computing the outcome, the total CEB in each cell, and the offset.

. gen y = round( mean * n, 1)

. gen os = log(n)

The Null Model

We are ready to fit the null model, which has an offset but no predictors.

. poisson y, offset(os)

Iteration 0:   log likelihood =  -2080.664  
Iteration 1:   log likelihood =  -2080.664  

Poisson regression                                     Number of obs =      70
                                                       LR chi2(0)    =   -0.00
                                                       Prob > chi2   =       .
Log likelihood = -2080.664                             Pseudo R2     = -0.0000

─────────────┬────────────────────────────────────────────────────────────────
           y │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
       _cons │   1.376346   .0097119   141.72   0.000     1.357311    1.395381
          os │          1  (offset)
─────────────┴────────────────────────────────────────────────────────────────

. di exp(_b[_cons])
3.9604033

. quietly sum mean [fw=n]

. di r(mean)
3.9604968

. estat gof

         Deviance goodness-of-fit =  3731.851
         Prob > chi2(69)          =    0.0000

         Pearson goodness-of-fit  =  3375.349
         Prob > chi2(69)          =    0.0000

The constant is the log of the mean number of children ever born. Exponentiating we see that the estimated mean is almost four children per woman. The estimate coincides with the sample mean, as we verified by averaging the cell means with the number of women as a frequency weight.

The deviance of 3,732 on 69 d.f. gives a clear indication that the model doesn’t fit the data. The hypothesis that the expected number of CEB is the same for all women regardless of marriage duration, residence and education, is soundly rejected,

In rate models the offset usually represents the log of exposure, and Stata lets us specify it directly using the offset() option with the name of the variable representing the offset, or using the exposure() option with the name of the variable representing exposure, in which case Stata takes the log.

One-Factor Models

Next we fit the three one-factor models, starting with residence:

. poisson y i.res, offset(os)

Iteration 0:   log likelihood = -2051.3779  
Iteration 1:   log likelihood = -2044.3868  
Iteration 2:   log likelihood = -2044.3778  
Iteration 3:   log likelihood = -2044.3778  

Poisson regression                                      Number of obs =     70
                                                        LR chi2(2)    =  72.57
                                                        Prob > chi2   = 0.0000
Log likelihood = -2044.3778                             Pseudo R2     = 0.0174

─────────────┬────────────────────────────────────────────────────────────────
           y │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
         res │
      Urban  │   .1442896    .032448     4.45   0.000     .0806926    .2078866
      Rural  │   .2280596   .0278321     8.19   0.000     .1735097    .2826095
             │
       _cons │   1.204598   .0249922    48.20   0.000     1.155614    1.253581
          os │          1  (offset)
─────────────┴────────────────────────────────────────────────────────────────

. di exp(_b[2.res]), exp(_b[3.res])
1.1552186 1.2561602

. estat gof

         Deviance goodness-of-fit =  3659.279
         Prob > chi2(67)          =    0.0000

         Pearson goodness-of-fit  =  3304.611
         Prob > chi2(67)          =    0.0000

The estimates show that women in urban and rural areas have on average 16 and 26% more children than women in Suva. The model chi-squared of 73 on 2 d.f. tells us that this model is a significant improvement over the null. The deviance, still in the thousands, tells us that this model is far from fitting the data.

Now for education

. poisson y i.educ, offset(os)

Iteration 0:   log likelihood = -1588.3352  
Iteration 1:   log likelihood = -1545.4751  
Iteration 2:   log likelihood = -1545.2371  
Iteration 3:   log likelihood = -1545.2371  

Poisson regression                                     Number of obs =      70
                                                       LR chi2(3)    = 1070.85
                                                       Prob > chi2   =  0.0000
Log likelihood = -1545.2371                            Pseudo R2     =  0.2573

───────────────┬────────────────────────────────────────────────────────────────
             y │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
───────────────┼────────────────────────────────────────────────────────────────
          educ │
Lower primary  │  -.2117869   .0216769    -9.77   0.000    -.2542729   -.1693008
Upper primary  │  -.6160532   .0288581   -21.35   0.000    -.6726141   -.5594922
   Secondary+  │  -1.224676   .0514108   -23.82   0.000     -1.32544   -1.123913
               │
         _cons │   1.647278   .0146932   112.11   0.000      1.61848    1.676076
            os │          1  (offset)
───────────────┴────────────────────────────────────────────────────────────────

. mata exp(st_matrix("e(b)"))
                 1             2             3             4             5
    ┌───────────────────────────────────────────────────────────────────────┐
  1 │            1   .8091371376   .5400718104   .2938527957   5.192824803  │
    └───────────────────────────────────────────────────────────────────────┘

. estat gof 

         Deviance goodness-of-fit =  2660.997
         Prob > chi2(66)          =    0.0000

         Pearson goodness-of-fit  =  2426.918
         Prob > chi2(66)          =    0.0000

The estimates show that the number of CEB declines substantially with education. Women with secondary education or more have 71% fewer children than women with no education (or only 29% as many). The educational differential is highly significant, but this model doesn’t fit the data.

Finally, here’s duration:

. poisson y i.dur, offset(os)

Iteration 0:   log likelihood =  -315.2481  
Iteration 1:   log likelihood = -297.80021  
Iteration 2:   log likelihood = -297.77426  
Iteration 3:   log likelihood = -297.77426  

Poisson regression                                     Number of obs =      70
                                                       LR chi2(5)    = 3565.78
                                                       Prob > chi2   =  0.0000
Log likelihood = -297.77426                            Pseudo R2     =  0.8569

─────────────┬────────────────────────────────────────────────────────────────
           y │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
         dur │
        5-9  │   1.044886   .0523975    19.94   0.000     .9421893    1.147584
      10-14  │   1.444947   .0502397    28.76   0.000     1.346479    1.543416
      15-19  │   1.706756   .0497474    34.31   0.000     1.609253     1.80426
      20-24  │   1.877474   .0496492    37.81   0.000     1.780164    1.974785
        25+  │   2.078855    .047507    43.76   0.000     1.985743    2.171967
             │
       _cons │  -.1036046   .0441511    -2.35   0.019    -.1901391     -.01707
          os │          1  (offset)
─────────────┴────────────────────────────────────────────────────────────────

. estat gof 

         Deviance goodness-of-fit =  166.0717
         Prob > chi2(64)          =    0.0000

         Pearson goodness-of-fit  =  156.3573
         Prob > chi2(64)          =    0.0000

Not surprisingly, the number of CEB is much higher for women who have been married longer. This is by far the most important predictor of CEB, with a chi-squared of 3,566 on just 5 d.f. In fact, a demographer wouldn’t even have looked at models that did not include a control for duration of marriage. It’s nice to see that Poisson regression can uncover the obvious :) Note that this model still doesn’t fit the data.

The deviances given in this section are pretty close to the deviances in Table 4.3 of the notes. You will notice small differences due to the use of different rounding procedures. In the notes we multiplied the mean CEB by the number of women and retained a few decimals. Here we rounded the total number of CEB to the nearest integer. If you omit the rounding you will reproduce the results in the notes exactly.

Two-Factor Models

We now consider models that take two of the three factors into account. Following the notes we consider only models that include duration of marriage, an essential control when we study cumulative fertility. This leaves two models with main effects of two factors, and another two models that add one interaction.

Because we are only interested in deviances I will run the estimation commands quietly.

So here are the additive models

. quietly poisson y i.dur i.res , offset(os)

. quietly estat gof

. di "Deviance = ", r(chi2_d),"on", r(df),"d.f. p-value =",r(p_d)
Deviance =  120.68035 on 62 d.f. p-value = .00001189

. quietly poisson y i.dur i.educ, offset(os)

. quietly estat gof

. di "Deviance = ", r(chi2_d),"on", r(df),"d.f. p-value =",r(p_d)
Deviance =  100.19163 on 61 d.f. p-value = .00116423

And here are the models with one interaction

. quietly poisson y i.dur#i.res , offset(os)

. estat gof

         Deviance goodness-of-fit =  108.8965
         Prob > chi2(52)          =    0.0000

         Pearson goodness-of-fit  =  104.9353
         Prob > chi2(52)          =    0.0000

. quietly poisson y i.dur#i.educ, offset(os)

. estat gof 

         Deviance goodness-of-fit =  84.53043
         Prob > chi2(46)          =    0.0005

         Pearson goodness-of-fit  =  84.69985
         Prob > chi2(46)          =    0.0004

The best fit so far is the model that includes duration and education, but it exhibits significant lack of fit with a chi-squared of 84.5 on 46 d.f.

The Three-Factor Additive Model

We are now ready to look at models that include all three factors. We start with the additive model.

. poisson y i.dur i.res  i.educ, offset(os)

Iteration 0:   log likelihood = -623.59688  
Iteration 1:   log likelihood = -252.64903  
Iteration 2:   log likelihood = -250.07248  
Iteration 3:   log likelihood = -250.07108  
Iteration 4:   log likelihood = -250.07108  

Poisson regression                                     Number of obs =      70
                                                       LR chi2(10)   = 3661.19
                                                       Prob > chi2   =  0.0000
Log likelihood = -250.07108                            Pseudo R2     =  0.8798

───────────────┬────────────────────────────────────────────────────────────────
             y │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
───────────────┼────────────────────────────────────────────────────────────────
           dur │
          5-9  │   .9969348   .0527437    18.90   0.000     .8935591    1.100311
        10-14  │   1.369395   .0510688    26.81   0.000     1.269302    1.469488
        15-19  │   1.613757   .0511949    31.52   0.000     1.513417    1.714097
        20-24  │   1.784911   .0512138    34.85   0.000     1.684534    1.885288
          25+  │   1.976405   .0500341    39.50   0.000      1.87834     2.07447
               │
           res │
        Urban  │   .1124186   .0324963     3.46   0.001      .048727    .1761102
        Rural  │   .1516602   .0283292     5.35   0.000      .096136    .2071845
               │
          educ │
Lower primary  │   .0229728   .0226563     1.01   0.311    -.0214327    .0673783
Upper primary  │  -.1012738   .0309871    -3.27   0.001    -.1620073   -.0405402
   Secondary+  │  -.3101495   .0552107    -5.62   0.000    -.4183605   -.2019386
               │
         _cons │  -.1170972   .0549118    -2.13   0.033    -.2247222   -.0094721
            os │          1  (offset)
───────────────┴────────────────────────────────────────────────────────────────

. estat gof

         Deviance goodness-of-fit =   70.6653
         Prob > chi2(59)          =    0.1421

         Pearson goodness-of-fit  =  71.53348
         Prob > chi2(59)          =    0.1269

This model passes the goodness of fit hurdle, with a deviance of 70.67 on 59 d.f. and a corresponding P-value of 0.14, so we have no evidence against this model.

To exponentiate the parameter estimates we can reissue the poisson command with the irr option, which is short for incidence-rate ratios.

. poisson, irr

Poisson regression                                     Number of obs =      70
                                                       LR chi2(10)   = 3661.19
                                                       Prob > chi2   =  0.0000
Log likelihood = -250.07108                            Pseudo R2     =  0.8798

───────────────┬────────────────────────────────────────────────────────────────
             y │        IRR   Std. err.      z    P>|z|     [95% conf. interval]
───────────────┼────────────────────────────────────────────────────────────────
           dur │
          5-9  │   2.709963   .1429334    18.90   0.000     2.443812    3.005099
        10-14  │   3.932972   .2008521    26.81   0.000     3.558369     4.34701
        15-19  │   5.021644   .2570824    31.52   0.000     4.542226    5.551663
        20-24  │    5.95905   .3051855    34.85   0.000     5.389938    6.588254
          25+  │   7.216753   .3610835    39.50   0.000     6.542636    7.960327
               │
           res │
        Urban  │   1.118981   .0363628     3.46   0.001     1.049934    1.192569
        Rural  │   1.163765   .0329685     5.35   0.000     1.100909    1.230209
               │
          educ │
Lower primary  │   1.023239   .0231828     1.01   0.311     .9787954      1.0697
Upper primary  │   .9036856   .0280026    -3.27   0.001      .850435    .9602706
   Secondary+  │   .7333373    .040488    -5.62   0.000     .6581249    .8171451
               │
         _cons │   .8894988   .0488439    -2.13   0.033     .7987381    .9905726
            os │          1  (offset)
───────────────┴────────────────────────────────────────────────────────────────
Note: _cons estimates baseline incidence rate.

Briefly, the estimates indicate that the number of CEB increases rapidly with marital duration; in each category of residence and education women married 15-19 years have five times as many children as those married less than five years. Women who live in urban and rural areas have 12% and 16% more children than women who live in Suva and have the same marriage duration and education. Finally, more educated women have fewer children, as women with secondary or more education have on average 27% fewer children than women with no education who live in the same type of place of residence and have been married just as long.

Three-Factor Models with Interactions

We now put the additive model to some “stress tests” by considering all possible interactions. The code below runs the models and estat gof in a loop using quietly to save space, printing just the model, deviance, d.f. and p-value for each model.

. local heading "Model {col 37} Deviance {col 48} d.f. {col 57}p-value"

. foreach model in  "i.dur i.res##i.educ" "i.dur##i.res i.educ"  ///
>    "i.dur##i.educ i.res" "(i.dur i.res)##i.educ" "(i.dur i.educ)##i.res" ///
>    "i.dur##(i.res i.educ)" "i.dur##(i.res i.educ) i.res#i.educ" {
  2.     if "`heading'" != "" di "`heading'"
  3.     local heading = ""
  4.     quietly poisson y `model', offset(os)
  5.     quietly estat gof
  6.     di "`model'", _column(37) %8.2f r(chi2_d), ///
>       _column(47) %5.0f r(df), _column(57) %6.4f r(p_d)
  7. }
Model                                Deviance   d.f.    p-value
i.dur i.res##i.educ                    59.92     53     0.2391
i.dur##i.res i.educ                    57.13     49     0.1986
i.dur##i.educ i.res                    54.80     44     0.1274
(i.dur i.res)##i.educ                  44.52     38     0.2163
(i.dur i.educ)##i.res                  44.31     43     0.4162
i.dur##(i.res i.educ)                  42.65     34     0.1467
i.dur##(i.res i.educ) i.res#i.educ     30.86     28     0.3235

These calculations complete Table 4.3 in the notes. I reported the deviances for consistency with the notes, but could just as well have reported likelihood ratio tests comparing each of these models to the additive model. Make sure you know how to use the output to test, for example, whether we need to add a duration by education interaction. It should be clear from the list of deviances that we don’t need to add any of these terms. We conclude that the additive model does a fine job indeed.

It’s important to note that the need for interactions depends exactly on what’s being modeled. Here we used the log link, so all effects are relative. In this scale no interactions are needed. If we used the identity link we would be modeling the actual number of children ever born, and all effects would be absolute. In that scale we would need, at the very least, interactions with duration of marriage. See the notes for further discussion.

Updated fall 2022