We will illustrate random intercept logit models using data from
Lillard and Panis (2000) on 1060 births to 501 mothers. The outcome of
interest is whether the birth was delivered in a hospital or elsewhere.
The predictors include the log of income loginc, the
distance to the nearest hospital distance, and two
indicators of the mother's education: dropout for less than
high school and college for college graduates, so high
school or some college is the reference cell.
First we read the data from the course website
. infile hosp loginc distance dropout college mother ///
>     using https://grodri.github.io/datasets/hospital.dat, clear
(1,060 observations read)
> hosp <- read.table("https://grodri.github.io/datasets/hospital.dat",
+ header = FALSE)
> names(hosp) <- c("hosp","loginc","distance","dropout","college","mother")
To fit a model with a woman-level random effect we can use xtlogit in Stata, or glmer() in the lme4 package in R.
. xtlogit hosp loginc distance dropout college, i(mother) re
Fitting comparison model:
Iteration 0: log likelihood = -644.95401
Iteration 1: log likelihood = -541.44886
Iteration 2: log likelihood = -537.4711
Iteration 3: log likelihood = -537.45771
Iteration 4: log likelihood = -537.45771
Fitting full model:
tau = 0.0 log likelihood = -537.45771
tau = 0.1 log likelihood = -534.03315
tau = 0.2 log likelihood = -530.99872
tau = 0.3 log likelihood = -528.53741
tau = 0.4 log likelihood = -526.88308
tau = 0.5 log likelihood = -526.38822
tau = 0.6 log likelihood = -527.67382
Iteration 0: log likelihood = -526.3876
Iteration 1: log likelihood = -522.68442
Iteration 2: log likelihood = -522.65042
Iteration 3: log likelihood = -522.65042
Random-effects logistic regression Number of obs = 1,060
Group variable: mother Number of groups = 501
Random effects u_i ~ Gaussian Obs per group:
min = 1
avg = 2.1
max = 10
Integration method: mvaghermite Integration pts. = 12
Wald chi2(4) = 110.06
Log likelihood = -522.65042 Prob > chi2 = 0.0000
─────────────┬────────────────────────────────────────────────────────────────
hosp │ Coefficient Std. err. z P>|z| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
loginc │ .5622009 .0727497 7.73 0.000 .4196141 .7047876
distance │ -.0765915 .0323473 -2.37 0.018 -.1399911 -.013192
dropout │ -1.997753 .2556249 -7.82 0.000 -2.498769 -1.496737
college │ 1.03363 .3884851 2.66 0.008 .2722135 1.795047
_cons │ -3.36984 .4794505 -7.03 0.000 -4.309546 -2.430134
─────────────┼────────────────────────────────────────────────────────────────
/lnsig2u │ .4372018 .3161192 -.1823805 1.056784
─────────────┼────────────────────────────────────────────────────────────────
sigma_u │ 1.244335 .1966791 .912844 1.696203
rho │ .3200274 .0687907 .2020988 .4665343
─────────────┴────────────────────────────────────────────────────────────────
LR test of rho=0: chibar2(01) = 29.61 Prob >= chibar2 = 0.000
. estimates store xt
By default xtlogit uses adaptive Gauss-Hermite quadrature based on the mean and variance, with 12 integration points.
The same model can be fit using melogit, which by default uses only 7 integration points; we raise the number to 12 to get better correspondence.
. melogit hosp loginc distance dropout college || mother:, intpoints(12)
Fitting fixed-effects model:
Iteration 0: log likelihood = -539.11554
Iteration 1: log likelihood = -537.46251
Iteration 2: log likelihood = -537.45771
Iteration 3: log likelihood = -537.45771
Refining starting values:
Grid node 0: log likelihood = -526.3876
Fitting full model:
Iteration 0: log likelihood = -526.3876
Iteration 1: log likelihood = -522.74579
Iteration 2: log likelihood = -522.65053
Iteration 3: log likelihood = -522.65042
Iteration 4: log likelihood = -522.65042
Mixed-effects logistic regression Number of obs = 1,060
Group variable: mother Number of groups = 501
Obs per group:
min = 1
avg = 2.1
max = 10
Integration method: mvaghermite Integration pts. = 12
Wald chi2(4) = 110.06
Log likelihood = -522.65042 Prob > chi2 = 0.0000
─────────────┬────────────────────────────────────────────────────────────────
hosp │ Coefficient Std. err. z P>|z| [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
loginc │ .5622029 .0727497 7.73 0.000 .419616 .7047898
distance │ -.0765916 .0323474 -2.37 0.018 -.1399914 -.0131919
dropout │ -1.997762 .2556281 -7.82 0.000 -2.498784 -1.49674
college │ 1.033635 .3884878 2.66 0.008 .2722127 1.795057
_cons │ -3.369853 .4794516 -7.03 0.000 -4.309561 -2.430145
─────────────┼────────────────────────────────────────────────────────────────
mother │
var(_cons)│ 1.548407 .4894986 .8332867 2.877238
─────────────┴────────────────────────────────────────────────────────────────
LR test vs. logistic model: chibar2(01) = 29.61 Prob >= chibar2 = 0.0000
. estimates store me
. estimates table xt me
─────────────┬──────────────────────────
Variable │ xt me
─────────────┼──────────────────────────
hosp │
loginc │ .56220088 .56220292
distance │ -.07659154 -.07659161
dropout │ -1.9977529 -1.9977619
college │ 1.0336304 1.0336349
_cons │ -3.36984 -3.369853
─────────────┼──────────────────────────
/lnsig2u │ .4372018
var( │
_cons[mot~r])│ 1.5484071
─────────────┴──────────────────────────
Note that xtlogit reports the logged
variance (and the standard deviation) whereas melogit
reports the variance, but the results are equivalent. The other
estimates are all very close.
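As a quick check of that correspondence we can move between the two parameterizations by hand. A minimal sketch in R, plugging in the values reported above:

lnsig2u <- 0.4372018   # /lnsig2u as reported by xtlogit
exp(lnsig2u)           # 1.548407, the var(_cons) reported by melogit
sqrt(exp(lnsig2u))     # 1.244335, the sigma_u reported by xtlogit

Turning to R, we fit the same model using glmer() in the lme4 package, starting with its default settings.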
> library(lme4)
> pql <- glmer(hosp ~ loginc + distance + dropout + college +
+ (1 | mother), data = hosp, family=binomial)
> summary(pql, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: hosp ~ loginc + distance + dropout + college + (1 | mother)
Data: hosp
AIC BIC logLik deviance df.resid
1061.2 1091.0 -524.6 1049.2 1054
Scaled residuals:
Min 1Q Median 3Q Max
-2.6758 -0.4606 -0.2779 0.5255 4.4536
Random effects:
Groups Name Variance Std.Dev.
mother (Intercept) 1.251 1.118
Number of obs: 1060, groups: mother, 501
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.29448 0.46666 -7.060 1.67e-12 ***
loginc 0.55040 0.07094 7.759 8.59e-15 ***
distance -0.07742 0.03169 -2.443 0.01456 *
dropout -1.94727 0.24439 -7.968 1.61e-15 ***
college 1.02322 0.37259 2.746 0.00603 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The fit above uses glmer()'s default, the Laplace approximation, which is not always accurate. Let us specify 12-point adaptive Gauss-Hermite quadrature instead, and then compare the estimates:
> agq <- glmer(hosp ~ loginc + distance + dropout + college +
+ (1 | mother), data = hosp, family=binomial, nAGQ = 12)
> summary(agq, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive
Gauss-Hermite Quadrature, nAGQ = 12) [glmerMod]
Family: binomial ( logit )
Formula: hosp ~ loginc + distance + dropout + college + (1 | mother)
Data: hosp
AIC BIC logLik deviance df.resid
1057.3 1087.1 -522.7 1045.3 1054
Scaled residuals:
Min 1Q Median 3Q Max
-2.8085 -0.4497 -0.2659 0.4938 4.5728
Random effects:
Groups Name Variance Std.Dev.
mother (Intercept) 1.548 1.244
Number of obs: 1060, groups: mother, 501
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.36986 0.47945 -7.029 2.09e-12 ***
loginc 0.56220 0.07275 7.728 1.09e-14 ***
distance -0.07659 0.03235 -2.368 0.0179 *
dropout -1.99775 0.25563 -7.815 5.49e-15 ***
college 1.03365 0.38849 2.661 0.0078 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> cbind(fixef(pql), fixef(agq))
[,1] [,2]
(Intercept) -3.29448256 -3.3698604
loginc 0.55039538 0.5622039
distance -0.07741503 -0.0765922
dropout -1.94726835 -1.9977473
college 1.02321857 1.0336453
> c(unlist(VarCorr(pql)), unlist(VarCorr(agq)))
mother mother
1.250605 1.548383
We see that the fixed effects are similar, but the Laplace approximation underestimates the variance at the mother level. By the way, a simple quadrature check is to increase the number of quadrature points and verify that all estimates remain essentially unchanged, as sketched below.
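A minimal sketch of that check, assuming lme4's update() method is used to refit and introducing agq20 as a hypothetical object name:

agq20 <- update(agq, nAGQ = 20)       # refit with 20 quadrature points
max(abs(fixef(agq20) - fixef(agq)))   # should be negligibly small
unlist(VarCorr(agq20))                # should remain close to 1.548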
The estimated fixed effects may be exponentiated and interpreted as odds ratios in the usual way. For example the coefficient of college, which exponentiated is 2.81, means that the odds of hospital delivery for a college graduate are 2.81 times those of a high school graduate with the same observed and unobserved characteristics, including income, distance to the hospital, and the mother-specific residual.
The standard deviation of the random effect may be interpreted in exactly the same way, by treating it as the coefficient of a standard normal random effect. In this case the standard deviation of 1.24 exponentiated becomes 3.47, so the odds of hospital delivery for a woman whose unobserved characteristics put her one standard deviation above the mean are three-and-a-half times those of an average woman with the same observed characteristics, including income, education, and distance to the hospital.
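Both figures follow from exponentiating the adaptive quadrature estimates. A minimal sketch in R, using the agq fit from above:

exp(fixef(agq)["college"])        # about 2.81, subject-specific odds ratio
exp(sqrt(unlist(VarCorr(agq))))   # about 3.47, odds ratio for a one-SD shift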
Stata’s xtlogit reports the
intra-class correlation “rho” in the latent scale as 0.32. We
can verify this result by computing the intra-class correlation in
the latent scale from the estimated variance at the mother level,
recalling that the level-1 variance of the standard logistic
distribution is pi^2/3, so rho = sigma_u^2/(sigma_u^2 + pi^2/3):
. estimates restore xt
(results xt are active now)

. scalar v2 = exp(_b[/lnsig2u])

. di v2/(v2 + _pi^2/3)
.32002744
> v <- unlist(VarCorr(agq))
> v / (v + pi^2/3)
   mother 
0.3200294 
The intraclass correlation of 0.32 reflects the correlation between the latent propensity to deliver in a hospital for two births of the same mother. It also means that about one third of the variance in the latent propensity to deliver in a hospital, beyond that explained by income, education and distance to the hospital, can be attributed to other characteristics of the mothers.
We can also calculate the manifest correlation at the median
linear predictor, which requires "integrating out" the random effect.
In Stata the xtrho command can calculate this for us at the median or
other quantiles. The command is only available after xtlogit,
xtprobit or xtcloglog; fortunately our current estimate is from
xtlogit. In R we first calculate the median linear predictor and then
compute the average probability of one and two hospital deliveries,
using the gauher() function to integrate out the random effect. The
function can be sourced directly from this website as shown
below.
. xtrho

Measures of intra-class manifest association in random-effects logit
Evaluated at median linear predictor

         Measure │    Estimate   [95% Conf.Interval]
─────────────────┼────────────────────────────────────
  Marginal prob. │     .239252    .217793   .268467
     Joint prob. │     .093807    .067841   .132652
      Odds ratio │     2.72847    1.90755   4.28414
     Pearson's r │     .200894    .119788   .308453
        Yule's Q │     .463586    .312136   .621509
> X <- model.matrix(agq)
> xb <- X %*% fixef(agq)
> md <- median(xb)
> source("https://grodri.github.io/multilevel/gauher.R")
> ghi <- function(f, gq = gauher(12)) {
+ sum(f(gq$z) * gq$w)
+ }
> m1 <- ghi(function(z) plogis(md + sqrt(v)*z)); m1 # margin
[1] 0.2392533
> m11 <- ghi(function(z) plogis(md + sqrt(v)*z)^2); m11 # joint
[1] 0.09381011
> M <- matrix(c(m11, m1 - m11, m1 - m11, 1 - 2*m1 + m11), 2, 2)
> or <- M[1,1]*M[2,2]/(M[1,2]*M[2,1]) ; or # odds ratio
[1] 2.72868
> r <- (m11 - m1^2)/(m1 * (1-m1)) ; r # Pearson's r
[1] 0.2009107
> Q <- (or - 1)/(or + 1) ; Q # Yule's Q
[1] 0.4636171
The marginal probability that a birth is delivered in a hospital is 23.9%, and the joint probability that two births to the same mother are both delivered in a hospital is 9.38%. The code builds a 2x2 table from the joint and marginal distributions and then computes three measures of association.
We see that the correlation between the actual place of delivery for two births of the same woman, if she has median characteristics, is equivalent to a Pearson’s r of 0.20, a Yule’s Q of 0.46, or perhaps in more familiar terms an odds ratio of 2.73. This means that the odds of hospital delivery for a woman who delivered a previous birth at a hospital are 2.73 times those of a comparable woman who delivered the previous birth elsewhere.
Earlier we noted that the effect of education implies that the odds of hospital delivery for a college graduate are 2.8 times those of a high school graduate with the same observed and unobserved characteristics. This is a subject specific effect, comparing the odds of hospital delivery for essentially the same woman under two different education scenarios. It is also a conditional effect given a fixed value of the random effect.
We can also compute a population-average effect by averaging
or "integrating out" the random effect. Essentially this entails
computing the effect at different values of the random effect and
averaging, which can be done by numerical integration or by
simulation. Let us find the effect of education at the mean distance of
3.74 and the mean log income of 5.88, using Gauss-Hermite integration
with Mata's built-in function _gauss_hermite_nodes(). (The Stata code
averages the predictors over mothers, using one observation each,
while the R code further below averages over all births, so the means
differ slightly; the resulting odds ratios are nonetheless nearly
identical.)
. egen first = tag(mother)
. quietly sum dist if first
. scalar mdist = r(mean)
. quietly sum loginc if first
. scalar mloginc = r(mean)
. di mdist, mloginc
3.7377246 5.882525
. scalar xb = _b[_cons] + _b[loginc] * mloginc + _b[dist] * mdist
. scalar sigma = exp(_b[/lnsig2u]/2)
. scalar bcollege = _b[college]
. mata:
───────────────────────────────────────────────── mata (type end to exit) ──────
: xb = st_numscalar("xb")
: sigma = st_numscalar("sigma")
: b = st_numscalar("bcollege")
: gh = _gauss_hermite_nodes(12)' // transpose
: gh[,1] = gh[, 1] :* sqrt(2) // change of variables
: gh[,2] = gh[, 2] :/ sqrt(pi()) // to standard normal
: p1 = sum( invlogit(xb :+ b :+ gh[,1] :* sigma) :* gh[,2])
: p0 = sum( invlogit(xb :+ gh[,1] :* sigma) :* gh[,2])
: (p1/(1-p1))/(p0/(1-p0))
2.21217867
: end
────────────────────────────────────────────────────────────────────────────────
> library(dplyr)
> x <- summarize(hosp, loginc = mean(loginc), distance= mean(distance)); x
loginc distance
1 5.987951 3.918396
> b <- fixef(agq)
> xb <- as.numeric(b[1] + x["loginc"] * b["loginc"] + x["distance"]* b["distance"])
> p1 <- ghi(function(z) plogis(xb + b["college"] + sqrt(v)*z))
> p0 <- ghi(function(z) plogis(xb + sqrt(v)*z))
> (p1/(1 - p1)) / (p0/(1-p0))
[1] 2.212816
We get an odds ratio of 2.21, smaller in magnitude than the subject-specific odds ratio of 2.81. This is expected: in random-intercept logit models, population-average effects are attenuated towards one relative to subject-specific effects.
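As noted above, the population-average effect can also be obtained by simulation instead of quadrature. A minimal sketch, reusing xb, b and v from the R session above; the seed and number of draws are arbitrary:

set.seed(1)     # arbitrary seed for reproducibility
z <- rnorm(1e6) # simulated standard-normal random effects
p1 <- mean(plogis(xb + b["college"] + sqrt(v) * z))  # average probability, college
p0 <- mean(plogis(xb + sqrt(v) * z))                 # average probability, reference
(p1 / (1 - p1)) / (p0 / (1 - p0))                    # should be close to 2.21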
You will find on this website analyses of the same data using three Bayesian methods; the last of those analyses includes a comparison of estimates with all four methods.
Last updated 12 April 2018