Germán Rodríguez
Generalized Linear Models Princeton University

B.5 Poisson Errors and Link Log

Let us now apply the general theory to the Poisson case, with emphasis on the log link function.

B.5.1 The Poisson Distribution

A Poisson random variable has probability distribution function

\[\tag{B.20}f_i(y_i) = \frac{ e^{-\mu_i} \mu_i^{y_i}} {y_i!}\]

for \( y_i = 0, 1, 2, \ldots \). The moments are

\[ E(Y_i) = \mbox{var}(Y_i) = \mu_i. \]

Let us verify that this distribution belongs to the exponential family as defined by Nelder and Wedderburn (1972). Taking logs we find

\[ \log f_i(y_i) = y_i \log(\mu_i) - \mu_i - \log(y_i!). \]

Looking at the coefficient of \( y_i \) we see immediately that the canonical parameter is

\[\tag{B.21}\theta_i = \log(\mu_i),\]

and therefore that the canonical link is the log. Solving for \( \mu_i \) we obtain the inverse link

\[ \mu_i = e^{\theta_i}, \]

and we see that we can write the second term in the p.d.f. as

\[ b(\theta_i) = e^{\theta_i}. \]

The last remaining term is a function of \( y_i \) only, so we identify

\[ c(y_i,\phi) = -\log(y_i!). \]

Finally, note that we can take \( a_i(\phi) = \phi \) and \( \phi=1 \), just as we did in the binomial case.
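As a quick numerical check of this decomposition (a Python sketch assuming SciPy; the choice of \( \mu \) and the range of \( y \) values are arbitrary), the exponential-family form \( \theta_i y_i - b(\theta_i) + c(y_i,\phi) \) should reproduce the Poisson log-probabilities exactly:

```python
# Check that theta*y - b(theta) + c(y) reproduces the Poisson log-pmf,
# with theta = log(mu), b(theta) = exp(theta), c(y) = -log(y!).
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

mu = 3.7                      # arbitrary illustrative mean
y = np.arange(10)             # a few counts to check

theta = np.log(mu)            # canonical parameter (B.21)
b = np.exp(theta)             # cumulant function b(theta)
c = -gammaln(y + 1)           # c(y, phi) = -log(y!)

assert np.allclose(theta * y - b + c, poisson.logpmf(y, mu))
```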

Let us verify the mean and variance. Differentiating the cumulant function \( b(\theta_i) \) we have

\[ b'(\theta_i) = e^{\theta_i} = \mu_i, \]

and differentiating again we have

\[ v_i = a_i(\phi) b''(\theta_i) = e^{\theta_i} = \mu_i. \]

Note that the mean equals the variance.
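The same derivatives can be checked symbolically; here is a small sketch using SymPy, offered as an illustrative aside rather than part of the derivation:

```python
# Differentiate the cumulant function b(theta) = exp(theta) twice:
# both derivatives equal exp(theta) = mu, so the mean equals the variance.
import sympy as sp

theta = sp.symbols('theta')
b = sp.exp(theta)
print(sp.diff(b, theta))      # exp(theta), the mean
print(sp.diff(b, theta, 2))   # exp(theta), the variance (with a_i(phi) = 1)
```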

B.5.2 Fisher Scoring in Log-linear Models

We now consider the Fisher scoring algorithm for Poisson regression models with canonical link, where we model

\[\tag{B.22}\eta_i = \log(\mu_i).\]

The derivative of the link is easily seen to be

\[ \frac{d\eta_i}{d\mu_i} = \frac{1}{\mu_i}. \]

Thus, the working dependent variable has the form

\[\tag{B.23}z_i = \eta_i + \frac{y_i - \mu_i} {\mu_i}.\]

The iterative weight is

\[\tag{B.24}w_i = 1 / \left[ b''(\theta_i) \left(\frac{d\eta_i}{d\mu_i}\right)^2 \right] = 1 / \left[ \mu_i \frac{1}{\mu_i^2} \right], \]

and simplifies to

\[\tag{B.25}w_i = \mu_i.\]

Note again that the weight is inversely proportional to the variance of the working dependent variable.
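To make the algorithm concrete, here is a minimal Python sketch of Fisher scoring (IRLS) for a log-linear Poisson model built from (B.22)–(B.25); each step is a weighted least-squares regression of the working dependent variable on \( \boldsymbol{X} \). The function name irls_poisson, the simulated data, and the stopping rule are illustrative assumptions rather than anything prescribed in the text.

```python
# IRLS / Fisher scoring sketch for Poisson regression with log link.
import numpy as np

def irls_poisson(X, y, n_iter=25, tol=1e-10):
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                      # linear predictor (B.22)
        mu = np.exp(eta)                    # inverse link
        z = eta + (y - mu) / mu             # working dependent variable (B.23)
        w = mu                              # iterative weights (B.25)
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta

# Illustration with simulated data: estimates should be close to (0.5, 0.8).
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x))
print(irls_poisson(X, y))
```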

B.5.3 The Poisson Deviance

Let \( \hat{\mu_i} \) denote the m.l.e. of \( \mu_i \) under the model of interest and let \( \tilde{\mu_i} = y_i \) denote the m.l.e. under the saturated model. From first principles, the deviance is

\[\tag{B.26}D = 2 \sum [ y_i \log(y_i) - y_i - \log(y_i!) - y_i \log(\hat{\mu_i}) + \hat{\mu_i} + \log(y_i!) ]. \]

Note that the terms on \( y_i! \) cancel out. Collecting terms on \( y_i \) we have

\[\tag{B.27}D = 2 \sum [ y_i \log(\frac{y_i}{\hat{\mu_i}}) - (y_i - \hat{\mu_i})].\]
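As a small computational aside, the following Python helper (the name poisson_deviance and its handling of zero counts are assumptions for illustration) evaluates (B.27), using the usual convention that \( y_i \log(y_i/\hat{\mu_i}) \) is taken to be zero when \( y_i = 0 \):

```python
# Poisson deviance as in (B.27), with 0*log(0) treated as 0.
import numpy as np

def poisson_deviance(y, mu_hat):
    term = np.zeros_like(mu_hat, dtype=float)
    pos = y > 0
    term[pos] = y[pos] * np.log(y[pos] / mu_hat[pos])
    return 2.0 * np.sum(term - (y - mu_hat))
```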

The similarity of the Poisson and Binomial deviances should not go unnoticed. Note that the first term in the Poisson deviance has the form

\[ 2 \sum o_i \log(\frac{o_i}{e_i}), \]

which is identical in form to the Binomial deviance. The second term is usually zero. To see why, note that for a canonical link the score is

\[ \frac{\partial \log L}{\partial \boldsymbol{\beta}} = \boldsymbol{X}'(\boldsymbol{y}-\boldsymbol{\mu}), \]

and setting this to zero leads to the estimating equations

\[ \boldsymbol{X}'\boldsymbol{y} = \boldsymbol{X}'\hat{\boldsymbol{\mu}}. \]

In words, maximum likelihood estimation for Poisson log-linear models, and more generally for any generalized linear model with canonical link, requires equating certain functions of the m.l.e.’s (namely \( \boldsymbol{X}'\hat{\boldsymbol{\mu}} \)) to the same functions of the data (namely \( \boldsymbol{X}'\boldsymbol{y} \)). If the model has a constant, one column of \( \boldsymbol{X} \) will consist of ones and therefore one of the estimating equations will be

\[ \sum y_i = \sum \hat{\mu_i} \quad\mbox{or}\quad \sum (y_i-\hat{\mu_i})=0, \]

so the last term in the Poisson deviance is zero. This result is the basis of an alternative algorithm for computing the m.l.e.’s known as “iterative proportional fitting”; see Bishop et al. (1975) for a description.
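Continuing the IRLS sketch above (the same illustrative X, y and irls_poisson), this property is easy to verify numerically:

```python
# With an intercept in the model, X'y = X'mu_hat at the m.l.e.,
# so the raw residuals sum to (numerically) zero.
mu_hat = np.exp(X @ irls_poisson(X, y))
print(np.allclose(X.T @ y, X.T @ mu_hat))   # True, up to the convergence tolerance
print(np.sum(y - mu_hat))                   # approximately 0
```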

The Poisson deviance has an asymptotic chi-squared distribution as \( n \rightarrow \infty \) with the number of parameters \( p \) remaining fixed, and can be used as a goodness-of-fit test. Differences between Poisson deviances for nested models (i.e. minus twice the log of the likelihood ratio criterion) have asymptotic chi-squared distributions under the usual regularity conditions.
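For illustration only, and continuing the sketches above (poisson_deviance, y, mu_hat and X are the assumed objects defined earlier), the deviance can be referred to a chi-squared distribution on \( n - p \) degrees of freedom:

```python
# Goodness-of-fit check: compare the model deviance with chi-squared(n - p).
from scipy.stats import chi2

D = poisson_deviance(y, mu_hat)
p_value = chi2.sf(D, df=len(y) - X.shape[1])
print(D, p_value)
```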
