An important practical feature of generalized linear models is that they can all be fit to data using the same algorithm, a form of iteratively re-weighted least squares (IRLS). In this section we describe the algorithm.
Given a trial estimate of the parameters \( \hat{\boldsymbol{\beta}} \), we calculate the estimated linear predictor \( \hat{\eta}_i = \boldsymbol{x}_i'\hat{\boldsymbol{\beta}} \) and use that to obtain the fitted values \( \hat{\mu}_i = g^{-1}(\hat{\eta}_i) \). Using these quantities, we calculate the working dependent variable
\[\tag{B.6}z_i = \hat{\eta}_i + (y_i - \hat{\mu}_i) \frac{\mbox{d}\eta_i}{\mbox{d}\mu_i},\]where the rightmost term is the derivative of the link function evaluated at the trial estimate.
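As a concrete illustration, here is a minimal Python sketch of (B.6); the names `working_response` and `d_eta_d_mu` are ours, not the text's, and the link derivative is supplied by the caller since it depends on the family and link in use.

```python
import numpy as np

def working_response(y, eta_hat, mu_hat, d_eta_d_mu):
    """Working dependent variable z_i of (B.6).

    d_eta_d_mu(mu) should return the derivative of the link,
    d(eta)/d(mu), evaluated at the current fitted values.
    """
    return eta_hat + (y - mu_hat) * d_eta_d_mu(mu_hat)

# Example for the log link, where eta = log(mu) and d(eta)/d(mu) = 1/mu:
y = np.array([2.0, 5.0, 9.0])
mu_hat = np.array([3.0, 4.0, 8.0])
z = working_response(y, np.log(mu_hat), mu_hat, lambda mu: 1.0 / mu)
```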
Next we calculate the iterative weights
\[\tag{B.7}w_i = \frac{p_i}{b''(\theta_i) \left( \frac{\mbox{d}\eta_i}{\mbox{d}\mu_i} \right)^2},\]where \( b''(\theta_i) \) is the second derivative of \( b(\theta_i) \) evaluated at the trial estimate and we have assumed that \( a_i(\phi) \) has the usual form \( \phi/p_i \). This weight is inversely proportional to the variance of the working dependent variable \( z_i \) given the current estimates of the parameters, with proportionality factor \( \phi \).
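A matching sketch of (B.7), again with illustrative names: `b_double_prime` plays the role of \( b''(\theta_i) \) written as a function of \( \mu_i \), and `p` holds the prior weights \( p_i \).

```python
def iterative_weights(mu_hat, d_eta_d_mu, b_double_prime, p=1.0):
    """Iterative weights w_i of (B.7), assuming a_i(phi) = phi / p_i.

    Note that phi itself does not appear: as discussed below, it
    cancels out of the weighted least-squares step.
    """
    return p / (b_double_prime(mu_hat) * d_eta_d_mu(mu_hat) ** 2)
```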
Finally, we obtain an improved estimate of \( \boldsymbol{\beta} \) by regressing the working dependent variable \( z_i \) on the predictors \( \boldsymbol{x}_i \) using the weights \( w_i \), i.e. we calculate the weighted least-squares estimate
\[\tag{B.8}\hat{\boldsymbol{\beta}} = (\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X})^{-1} \boldsymbol{X}'\boldsymbol{W} \boldsymbol{z},\]where \( \boldsymbol{X} \) is the model matrix, \( \boldsymbol{W} \) is a diagonal matrix of weights with entries \( w_i \) given by (B.7), and \( \boldsymbol{z} \) is a response vector with entries \( z_i \) given by (B.6).
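In code, one update (B.8) can be computed without forming \( \boldsymbol{W} \) explicitly: scaling the rows of \( \boldsymbol{X} \) and the entries of \( \boldsymbol{z} \) by \( \sqrt{w_i} \) turns it into an ordinary least-squares problem with the same solution. This is a standard numerical device, not something the text prescribes.

```python
import numpy as np

def wls_step(X, z, w):
    """Weighted least-squares estimate (B.8): (X'WX)^{-1} X'W z."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
    return beta
```

Solving the rescaled least-squares problem avoids constructing the diagonal matrix and inverting \( \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} \) directly, which is both cheaper and numerically safer.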
The procedure is repeated until successive estimates change by less than a specified small amount. McCullagh and Nelder (1989) prove that this algorithm is equivalent to Fisher scoring and leads to maximum likelihood estimates. These authors consider the case of general \( a_i(\phi) \) and include \( \phi \) in their expression for the iterative weight; in other words, they use \( w^*_i = \phi w_i \), where \( w_i \) is the weight used here. The proportionality factor \( \phi \) cancels out when we calculate the weighted least-squares estimate using (B.8), so the estimator is exactly the same. I prefer to show \( \phi \) explicitly rather than include it in \( \boldsymbol{W} \).
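Putting (B.6)-(B.8) together, a minimal IRLS loop might look as follows. This is only a sketch under the assumptions of this section (\( a_i(\phi) = \phi/p_i \) with \( p_i = 1 \)); the callables for the link, its inverse, its derivative, and the variance function \( b''(\theta_i) \) are supplied by the caller, and the starting values follow the recipe given at the end of this section.

```python
import numpy as np

def irls(X, y, link, inv_link, d_eta_d_mu, variance, tol=1e-8, max_iter=25):
    """Fit a GLM by iteratively reweighted least squares (Fisher scoring).

    variance(mu) is b''(theta) written as a function of mu; prior
    weights p_i are taken to be 1.  Stops when successive estimates
    change by less than tol.
    """
    mu = y.astype(float)          # starting values: mu = y ...
    eta = link(mu)                # ... and eta = g(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        deriv = d_eta_d_mu(mu)
        z = eta + (y - mu) * deriv              # working response (B.6)
        w = 1.0 / (variance(mu) * deriv ** 2)   # iterative weights (B.7)
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], z * sw,
                                       rcond=None)  # WLS step (B.8)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
        eta = X @ beta
        mu = inv_link(eta)
    return beta
```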
Example: For normal data with the identity link we have \( \eta_i = \mu_i \), so the derivative is \( \mbox{d}\eta_i/\mbox{d}\mu_i = 1 \) and the working dependent variable is \( y_i \) itself. Since, in addition, \( b''(\theta_i) = 1 \) and \( p_i = 1 \), the weights are constant and no iteration is required. \( \Box \)

In Sections B.4 and B.5 we derive the working dependent variable and the iterative weights required for binomial data with link logit and for Poisson data with link log. In both cases iteration will usually be necessary.
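As a quick numerical check of the normal-identity example, the hypothetical `irls` sketch above reproduces ordinary least squares, converging immediately because \( z_i = y_i \) and \( w_i = 1 \):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)

identity = lambda mu: mu
ones = lambda mu: np.ones_like(mu)
beta = irls(X, y, link=identity, inv_link=identity,
            d_eta_d_mu=ones, variance=ones)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta, beta_ols)   # identical to OLS, no iteration needed
```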
Starting values may be obtained by applying the link to the data, i.e. we take \( \hat{\mu}_i = y_i \) and \( \hat{\eta}_i = g(\hat{\mu}_i) \). Sometimes this requires a few adjustments, for example to avoid taking the log of zero, and we will discuss these at the appropriate time.
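A hedged sketch of this recipe; the zero-avoiding adjustment shown here is only one common possibility, not necessarily the one adopted later in the text.

```python
import numpy as np

def starting_values(y, link, adjust=None):
    """Starting values: mu = y (optionally adjusted), eta = g(mu)."""
    mu = adjust(y) if adjust is not None else y.astype(float)
    return mu, link(mu)

# For a log link, nudge zero counts away from zero before taking logs.
y = np.array([0.0, 1.0, 4.0, 7.0])
mu0, eta0 = starting_values(y, np.log, adjust=lambda y: np.maximum(y, 0.5))
```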