Germán Rodríguez
Multilevel Models Princeton University

Random-Effects Logit Models

We will illustrate random intercept logit models using data from Lillard and Panis (2000) on 1060 births to 501 mothers. The outcome of interest is whether the birth was delivered in a hospital or elsewhere. The predictors include the log of income `loginc`, the distance to the nearest hospital `distance`, and two indicators of mothers’s education: `dropout` for less than high school and `college` for college graduates, so high school or some college is the reference cell.

First we read the data from the course website

```. infile hosp loginc distance dropout college mother ///
>     using https://grodri.github.io/datasets/hospital.dat, clear
```
```> hosp <- read.table("https://grodri.github.io/datasets/hospital.dat",
```

Fitting the Model

To fit a model with a woman-level random effect we can use `xtlogit`we use `glmer()` in the `lme4` package

```. xtlogit hosp loginc distance dropout college, i(mother) re

Fitting comparison model:

Iteration 0:   log likelihood = -644.95401
Iteration 1:   log likelihood = -541.44886
Iteration 2:   log likelihood =  -537.4711
Iteration 3:   log likelihood = -537.45771
Iteration 4:   log likelihood = -537.45771

Fitting full model:

tau =  0.0     log likelihood = -537.45771
tau =  0.1     log likelihood = -534.03315
tau =  0.2     log likelihood = -530.99872
tau =  0.3     log likelihood = -528.53741
tau =  0.4     log likelihood = -526.88308
tau =  0.5     log likelihood = -526.38822
tau =  0.6     log likelihood = -527.67382

Iteration 0:   log likelihood =  -526.3876
Iteration 1:   log likelihood = -522.68442
Iteration 2:   log likelihood = -522.65042
Iteration 3:   log likelihood = -522.65042

Random-effects logistic regression                   Number of obs    =  1,060
Group variable: mother                               Number of groups =    501

Random effects u_i ~ Gaussian                        Obs per group:
min =      1
avg =    2.1
max =     10

Integration method: mvaghermite                      Integration pts. =     12

Wald chi2(4)     = 110.06
Log likelihood = -522.65042                          Prob > chi2      = 0.0000

─────────────┬────────────────────────────────────────────────────────────────
hosp │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
loginc │   .5622009   .0727497     7.73   0.000     .4196141    .7047876
distance │  -.0765915   .0323473    -2.37   0.018    -.1399911    -.013192
dropout │  -1.997753   .2556249    -7.82   0.000    -2.498769   -1.496737
college │    1.03363   .3884851     2.66   0.008     .2722135    1.795047
_cons │   -3.36984   .4794505    -7.03   0.000    -4.309546   -2.430134
─────────────┼────────────────────────────────────────────────────────────────
/lnsig2u │   .4372018   .3161192                     -.1823805    1.056784
─────────────┼────────────────────────────────────────────────────────────────
sigma_u │   1.244335   .1966791                       .912844    1.696203
rho │   .3200274   .0687907                      .2020988    .4665343
─────────────┴────────────────────────────────────────────────────────────────
LR test of rho=0: chibar2(01) = 29.61                  Prob >= chibar2 = 0.000

. estimates store xt
```

By default Stata uses Gauss-Hermite adaptive quadrature using the mean and variance with 12 integration points.

The same model can be fit using `melogit`, which by default uses only 7 integration points. We can change the number to 12 to get better correspondence.

```. melogit hosp loginc distance dropout college || mother:, intpoints(12)

Fitting fixed-effects model:

Iteration 0:   log likelihood = -539.11554
Iteration 1:   log likelihood = -537.46251
Iteration 2:   log likelihood = -537.45771
Iteration 3:   log likelihood = -537.45771

Refining starting values:

Grid node 0:   log likelihood =  -526.3876

Fitting full model:

Iteration 0:   log likelihood =  -526.3876
Iteration 1:   log likelihood = -522.74579
Iteration 2:   log likelihood = -522.65053
Iteration 3:   log likelihood = -522.65042
Iteration 4:   log likelihood = -522.65042

Mixed-effects logistic regression               Number of obs     =      1,060
Group variable: mother                          Number of groups  =        501

Obs per group:
min =          1
avg =        2.1
max =         10

Integration method: mvaghermite                 Integration pts.  =         12

Wald chi2(4)      =     110.06
Log likelihood = -522.65042                     Prob > chi2       =     0.0000
─────────────┬────────────────────────────────────────────────────────────────
hosp │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
loginc │   .5622029   .0727497     7.73   0.000      .419616    .7047898
distance │  -.0765916   .0323474    -2.37   0.018    -.1399914   -.0131919
dropout │  -1.997762   .2556281    -7.82   0.000    -2.498784    -1.49674
college │   1.033635   .3884878     2.66   0.008     .2722127    1.795057
_cons │  -3.369853   .4794516    -7.03   0.000    -4.309561   -2.430145
─────────────┼────────────────────────────────────────────────────────────────
mother       │
var(_cons)│   1.548407   .4894986                      .8332867    2.877238
─────────────┴────────────────────────────────────────────────────────────────
LR test vs. logistic model: chibar2(01) = 29.61       Prob >= chibar2 = 0.0000

. estimates store me

. estimates table xt me

─────────────┬──────────────────────────
Variable │     xt           me
─────────────┼──────────────────────────
hosp         │
distance │ -.07659154   -.07659161
dropout │ -1.9977529   -1.9977619
college │  1.0336304    1.0336349
_cons │   -3.36984    -3.369853
─────────────┼──────────────────────────
/lnsig2u │   .4372018
var( │
_cons[mot~r])│               1.5484071
─────────────┴──────────────────────────
```

Note that `xtlogit` reports the logged variance (and the standard deviation) whereas `melogit` reports the variance, but the results are equivalent. The other estimates are all very close.

```> library(lme4)
> pql <- glmer(hosp ~ loginc + distance + dropout + college +
+     (1 | mother), data = hosp, family=binomial)
> summary(pql, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial  ( logit )
Formula: hosp ~ loginc + distance + dropout + college + (1 | mother)
Data: hosp

AIC      BIC   logLik deviance df.resid
1061.2   1091.0   -524.6   1049.2     1054

Scaled residuals:
Min      1Q  Median      3Q     Max
-2.6758 -0.4606 -0.2779  0.5255  4.4536

Random effects:
Groups Name        Variance Std.Dev.
mother (Intercept) 1.251    1.118
Number of obs: 1060, groups:  mother, 501

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.29448    0.46666  -7.060 1.67e-12 ***
loginc       0.55040    0.07094   7.759 8.59e-15 ***
distance    -0.07742    0.03169  -2.443  0.01456 *
dropout     -1.94727    0.24439  -7.968 1.61e-15 ***
college      1.02322    0.37259   2.746  0.00603 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

By default R uses the PQL approximation, which is not always accurate. Let us specify 12-point adaptive Gaussian quadrature instead, and then compare the estimates:

```> agq <- glmer(hosp ~ loginc + distance + dropout + college +
+   (1 | mother), data = hosp, family=binomial, nAGQ = 12)
> summary(agq, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 12) [glmerMod]
Family: binomial  ( logit )
Formula: hosp ~ loginc + distance + dropout + college + (1 | mother)
Data: hosp

AIC      BIC   logLik deviance df.resid
1057.3   1087.1   -522.7   1045.3     1054

Scaled residuals:
Min      1Q  Median      3Q     Max
-2.8085 -0.4497 -0.2659  0.4938  4.5728

Random effects:
Groups Name        Variance Std.Dev.
mother (Intercept) 1.548    1.244
Number of obs: 1060, groups:  mother, 501

Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.36986    0.47945  -7.029 2.09e-12 ***
loginc       0.56220    0.07275   7.728 1.09e-14 ***
distance    -0.07659    0.03235  -2.368   0.0179 *
dropout     -1.99775    0.25563  -7.815 5.49e-15 ***
college      1.03365    0.38849   2.661   0.0078 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> cbind(fixef(pql), fixef(agq))
[,1]       [,2]
(Intercept) -3.29448256 -3.3698604
distance    -0.07741503 -0.0765922
dropout     -1.94726835 -1.9977473
college      1.02321857  1.0336453
> c(unlist(VarCorr(pql)), unlist(VarCorr(agq)))
mother   mother
1.250605 1.548383
```

We see that the fixed effects are similar, but PQL underestimates the variance at the mother level. By the way a simple quadrature check is to increase the number of quadrature points and verify that all estimates remain essentially unchanged.

Interpreting the Estimates

The estimated fixed effects may be exponentiated and interpreted as odds ratios in the usual way. For example the coefficient of college, which exponentiated is 2.81, means that the odds of hospital delivery for a college graduate are 2.81 times those of a high school graduate with the same observed and unobserved characteristics, including income, distance to the hospital, and the mother-specific residual.

The standard deviation of the random effect may be interpreted exactly the same way, by treating it as the coefficient of a standard normal random effect. In this case the standard deviation of 1.24 exponentiated becomes 3.47, so the odds of hospital delivery for a woman whose unobserved characteristics put her one standard deviation above the mean, are three-and-a-half times those of an average woman with the same observed characteristics, including income, education, and distance to the hospital.

Intra-class correlation

Stata’s `xtlogit` reports the intra-class correlation “rho” in the latent scale as 0.32. We can verify this result We can compute the intra-class correlation in the latent scale using the estimated variance at the mother level and recalling that the level-1 variance is p2/3:

```. estimates restore xt
(results xt are active now)

. scalar v2 = exp(_b[/lnsig2u])

. di v2/(v2 + _pi^2/3)
.32002744
```
```> v <- unlist(VarCorr(agq))
> v / (v + pi^2/3)
mother
0.3200294
```

The intraclass correlation of 0.32 reflects the correlation between the latent propensity to deliver in a hospital for two births of the same mother. It also means that about one third of the variance in the latent propensity to deliver in a hospital, beyond that explained by income, education and distance to the hospital, can be attributed to other characteristics of the mothers.

We can also calculate the manifest correlation at the median linear predictor, which requires “integrating out” the random effect. The `xtrho` command can calculate this for us at the median or other quantiles. This command is only available after `xtlogit`, `xtprobit` or `xtcloglog`. Fortunately our current estimate is from `xtlogit`. We can do this by first calculating the median linear predictor and then computing the average probability of one and two hospital deliveries using the `gauher()` function to integrate out the random effect. The function can be sourced directly from this website as shown below.

```. xtrho

Measures of intra-class manifest association in random-effects logit
Evaluated at median linear predictor

Measure          │    Estimate     [95% Conf.Interval]
─────────────────┼────────────────────────────────────
Marginal prob.   │     .239252     .217793     .268467
Joint prob.      │     .093807     .067841     .132652
Odds ratio       │     2.72847     1.90755     4.28414
Pearson's r      │     .200894     .119788     .308453
Yule's Q         │     .463586     .312136     .621509
```
```> X <- model.matrix(agq)
> xb <- X %*% fixef(agq)
> md <- median(xb)
> source("https://grodri.github.io/multilevel/gauher.R")
> ghi <- function(f, gq = gauher(12)) {
+     sum(f(gq\$z) * gq\$w)
+ }
> m1  <- ghi(function(z) plogis(md + sqrt(v)*z));   m1  # margin
[1] 0.2392533
> m11 <- ghi(function(z) plogis(md + sqrt(v)*z)^2); m11 # joint
[1] 0.09381011
> M <- matrix(c(m11, m1 - m11, m1 - m11, 1 - 2*m1 + m11), 2, 2)
> or <- M[1,1]*M[2,2]/(M[1,2]*M[2,1]) ; or # odds ratio
[1] 2.72868
> r  <- (m11 - m1^2)/(m1 * (1-m1))    ; r  # Pearson's r
[1] 0.2009107
> Q  <- (or - 1)/(or + 1)             ; Q  # Yule's Q
[1] 0.4636171
```

The probability of one birth delivered in a hospital is 23.9% and the probability of two is 9.38%. The code builds a 2x2 table from the joint and marginal distributions and then computes three correlation measures.

We see that the correlation between the actual place of delivery for two births of the same woman, if she has median characteristics, is equivalent to a Pearson’s r of 0.20, a Yule’s Q of 0.46, or perhaps in more familiar terms an odds ratio of 2.73. This means that the odds of hospital delivery for a woman who delivered a previous birth at a hospital are 2.73 times those of a comparable woman who delivered the previous birth elsewhere.

Population Average Effects

Earlier we noted that the effect of education implies that the odds of hospital delivery for a college graduate are 2.8 times those of a high school graduate with the same observed and unobserved characteristics. This is a subject specific effect, comparing the odds of hospital delivery for essentially the same woman under two different education scenarios. It is also a conditional effect given a fixed value of the random effect.

We can also compute a population average effect by averaging or “integrating out” the random effect. Essentially this entails computing the effect at different values of the random effect and averaging, and it can be computed by numerical integration or by simulation. Let us find the effect of education at the mean distance of 3.74 and the mean log income of 5.88 using Gauss-Hermite integration with the built-in function `_gauss_hermite_nodes()`.

```. egen first = tag(mother)

. quietly sum dist if first

. scalar mdist = r(mean)

. quietly sum loginc if first

3.7377246 5.882525

. scalar xb = _b[_cons] + _b[loginc] * mloginc + _b[dist] * mdist

. scalar sigma = exp(_b[/lnsig2u]/2)

. scalar bcollege = _b[college]

. mata:
───────────────────────────────────────────────── mata (type end to exit) ──────
:   xb = st_numscalar("xb")

:   sigma = st_numscalar("sigma")

:   b = st_numscalar("bcollege")

:   gh = _gauss_hermite_nodes(12)'   // transpose

:   gh[,1] = gh[, 1] :* sqrt(2)      // change of variables

:   gh[,2] = gh[, 2] :/ sqrt(pi())   // to standard normal

:   p1 = sum( invlogit(xb :+ b :+ gh[,1] :* sigma) :* gh[,2])

:   p0 = sum( invlogit(xb      :+ gh[,1] :* sigma) :* gh[,2])

:   (p1/(1-p1))/(p0/(1-p0))
2.21217867

: end
────────────────────────────────────────────────────────────────────────────────
```
```> library(dplyr)
1 5.987951 3.918396
> b <- fixef(agq)
> p1 <- ghi(function(z) plogis(xb + b["college"] + sqrt(v)*z))
> p0 <- ghi(function(z) plogis(xb +                sqrt(v)*z))
> (p1/(1 - p1)) / (p0/(1-p0))
[1] 2.212816
```

We get an odds ratio of 2.21. This is smaller in magnitude than the subject-specific odds ratio of 2.81.

Other Approaches to Estimation

You will find on this website analyses of the same data using three Bayesian methods:

• Gibbs sampling with winBUGS, and Jags

• Hamiltonian MCMC using Stan, and

• Metropolis-Hastings + Gibbs using Stata’s bayesmh.

The last entry includes a comparison of estimates with all four methods.

Last updated 12 April 2018

