We will illustrate random intercept logit models using data from
Lillard and Panis (2000) on 1060 births to 501 mothers. The outcome of
interest is whether the birth was delivered in a hospital or elsewhere.
The predictors include the log of income (loginc), the distance to the nearest hospital (distance), and two indicators of the mother’s education: dropout for less than high school and college for college graduates, so high school or some college is the reference cell.
First we read the data from the course website, in Stata and then in R:
. infile hosp loginc distance dropout college mother ///
>     using https://grodri.github.io/datasets/hospital.dat, clear
(1,060 observations read)
> hosp <- read.table("https://grodri.github.io/datasets/hospital.dat",
+     header = FALSE)
> names(hosp) <- c("hosp","loginc","distance","dropout","college","mother")
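As a quick sanity check, a minimal R sketch verifying the counts quoted above:

> nrow(hosp)                   # should be 1,060 births
> length(unique(hosp$mother))  # should be 501 mothers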
To fit a model with a woman-level random effect we can use xtlogit in Stata, or glmer() in the lme4 package in R.
. xtlogit hosp loginc distance dropout college, i(mother) re

Fitting comparison model:

Iteration 0:  log likelihood = -644.95401
Iteration 1:  log likelihood = -541.44886
Iteration 2:  log likelihood =  -537.4711
Iteration 3:  log likelihood = -537.45771
Iteration 4:  log likelihood = -537.45771

Fitting full model:

tau =  0.0     log likelihood = -537.45771
tau =  0.1     log likelihood = -534.03315
tau =  0.2     log likelihood = -530.99872
tau =  0.3     log likelihood = -528.53741
tau =  0.4     log likelihood = -526.88308
tau =  0.5     log likelihood = -526.38822
tau =  0.6     log likelihood = -527.67382

Iteration 0:  log likelihood =  -526.3876
Iteration 1:  log likelihood = -522.68442
Iteration 2:  log likelihood = -522.65042
Iteration 3:  log likelihood = -522.65042

Random-effects logistic regression              Number of obs     =      1,060
Group variable: mother                          Number of groups  =        501

Random effects u_i ~ Gaussian                   Obs per group:
                                                              min =          1
                                                              avg =        2.1
                                                              max =         10

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(4)      =     110.06
Log likelihood = -522.65042                     Prob > chi2       =     0.0000

─────────────┬────────────────────────────────────────────────────────────────
        hosp │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
      loginc │   .5622009   .0727497     7.73   0.000     .4196141    .7047876
    distance │  -.0765915   .0323473    -2.37   0.018    -.1399911    -.013192
     dropout │  -1.997753   .2556249    -7.82   0.000    -2.498769   -1.496737
     college │    1.03363   .3884851     2.66   0.008     .2722135    1.795047
       _cons │   -3.36984   .4794505    -7.03   0.000    -4.309546   -2.430134
─────────────┼────────────────────────────────────────────────────────────────
    /lnsig2u │   .4372018   .3161192                     -.1823805    1.056784
─────────────┼────────────────────────────────────────────────────────────────
     sigma_u │   1.244335   .1966791                       .912844    1.696203
         rho │   .3200274   .0687907                      .2020988    .4665343
─────────────┴────────────────────────────────────────────────────────────────
LR test of rho=0: chibar2(01) = 29.61                  Prob >= chibar2 = 0.000

. estimates store xt
By default Stata uses mean-and-variance adaptive Gauss-Hermite quadrature (the mvaghermite method shown in the output) with 12 integration points.
The same model can be fit using melogit, which by default uses only 7 integration points. We change the number to 12 to get a closer correspondence with xtlogit.
. melogit hosp loginc distance dropout college || mother:, intpoints(12)

Fitting fixed-effects model:

Iteration 0:  log likelihood = -539.11554
Iteration 1:  log likelihood = -537.46251
Iteration 2:  log likelihood = -537.45771
Iteration 3:  log likelihood = -537.45771

Refining starting values:

Grid node 0:  log likelihood =  -526.3876

Fitting full model:

Iteration 0:  log likelihood =  -526.3876
Iteration 1:  log likelihood = -522.74579
Iteration 2:  log likelihood = -522.65053
Iteration 3:  log likelihood = -522.65042
Iteration 4:  log likelihood = -522.65042

Mixed-effects logistic regression               Number of obs     =      1,060
Group variable: mother                          Number of groups  =        501

                                                Obs per group:
                                                              min =          1
                                                              avg =        2.1
                                                              max =         10

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(4)      =     110.06
Log likelihood = -522.65042                     Prob > chi2       =     0.0000

─────────────┬────────────────────────────────────────────────────────────────
        hosp │ Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
─────────────┼────────────────────────────────────────────────────────────────
      loginc │   .5622029   .0727497     7.73   0.000      .419616    .7047898
    distance │  -.0765916   .0323474    -2.37   0.018    -.1399914   -.0131919
     dropout │  -1.997762   .2556281    -7.82   0.000    -2.498784    -1.49674
     college │   1.033635   .3884878     2.66   0.008     .2722127    1.795057
       _cons │  -3.369853   .4794516    -7.03   0.000    -4.309561   -2.430145
─────────────┼────────────────────────────────────────────────────────────────
mother       │
  var(_cons) │   1.548407   .4894986                      .8332867    2.877238
─────────────┴────────────────────────────────────────────────────────────────
LR test vs. logistic model: chibar2(01) = 29.61       Prob >= chibar2 = 0.0000

. estimates store me

. estimates table xt me

─────────────┬──────────────────────────
    Variable │     xt           me
─────────────┼──────────────────────────
hosp         │
      loginc │  .56220088    .56220292
    distance │ -.07659154   -.07659161
     dropout │ -1.9977529   -1.9977619
     college │  1.0336304    1.0336349
       _cons │   -3.36984    -3.369853
─────────────┼──────────────────────────
    /lnsig2u │   .4372018
        var( │
_cons[mot~r])│                1.5484071
─────────────┴──────────────────────────
Note that xtlogit reports the logged variance (and the standard deviation) whereas melogit reports the variance, but the results are equivalent. The other estimates are all very close.
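Indeed, exponentiating the logged variance from xtlogit recovers the variance reported by melogit; a one-line check in R, using just the numbers above:

> exp(0.4372018)   # about 1.5484, matching var(_cons) from melogit up to rounding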
> library(lme4)
> pql <- glmer(hosp ~ loginc + distance + dropout + college +
+     (1 | mother), data = hosp, family = binomial)
> summary(pql, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: hosp ~ loginc + distance + dropout + college + (1 | mother)
   Data: hosp

     AIC      BIC   logLik deviance df.resid
  1061.2   1091.0   -524.6   1049.2     1054

Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.6758 -0.4606 -0.2779  0.5255  4.4536

Random effects:
 Groups Name        Variance Std.Dev.
 mother (Intercept) 1.251    1.118
Number of obs: 1060, groups:  mother, 501

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.29448    0.46666  -7.060 1.67e-12 ***
loginc       0.55040    0.07094   7.759 8.59e-15 ***
distance    -0.07742    0.03169  -2.443  0.01456 *
dropout     -1.94727    0.24439  -7.968 1.61e-15 ***
college      1.02322    0.37259   2.746  0.00603 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By default glmer() uses the Laplace approximation, as noted in the output above, which is not always accurate. Let us specify 12-point adaptive Gauss-Hermite quadrature instead, and then compare the estimates:
> agq <- glmer(hosp ~ loginc + distance + dropout + college +
+     (1 | mother), data = hosp, family = binomial, nAGQ = 12)
> summary(agq, corr = FALSE)
Generalized linear mixed model fit by maximum likelihood (Adaptive
  Gauss-Hermite Quadrature, nAGQ = 12) [glmerMod]
 Family: binomial  ( logit )
Formula: hosp ~ loginc + distance + dropout + college + (1 | mother)
   Data: hosp

     AIC      BIC   logLik deviance df.resid
  1057.3   1087.1   -522.7   1045.3     1054

Scaled residuals:
    Min      1Q  Median      3Q     Max
-2.8085 -0.4497 -0.2659  0.4938  4.5728

Random effects:
 Groups Name        Variance Std.Dev.
 mother (Intercept) 1.548    1.244
Number of obs: 1060, groups:  mother, 501

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.36986    0.47945  -7.029 2.09e-12 ***
loginc       0.56220    0.07275   7.728 1.09e-14 ***
distance    -0.07659    0.03235  -2.368   0.0179 *
dropout     -1.99775    0.25563  -7.815 5.49e-15 ***
college      1.03365    0.38849   2.661   0.0078 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> cbind(fixef(pql), fixef(agq))
                   [,1]       [,2]
(Intercept) -3.29448256 -3.3698604
loginc       0.55039538  0.5622039
distance    -0.07741503 -0.0765922
dropout     -1.94726835 -1.9977473
college      1.02321857  1.0336453
> c(unlist(VarCorr(pql)), unlist(VarCorr(agq)))
  mother   mother
1.250605 1.548383
We see that the fixed effects are similar, but the Laplace approximation underestimates the variance at the mother level. A simple check on the quadrature is to increase the number of integration points and verify that all estimates remain essentially unchanged, as in the sketch below.
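A minimal sketch of that check, assuming the agq fit above is still in memory (the choice of 20 points is arbitrary):

> agq20 <- glmer(hosp ~ loginc + distance + dropout + college +
+     (1 | mother), data = hosp, family = binomial, nAGQ = 20)
> max(abs(fixef(agq20) - fixef(agq)))              # should be negligible
> c(unlist(VarCorr(agq20)), unlist(VarCorr(agq)))  # variances should agree closely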
The estimated fixed effects may be exponentiated and interpreted as odds ratios in the usual way. For example, exponentiating the coefficient of college gives 2.81, which means that the odds of hospital delivery for a college graduate are 2.81 times those of a high school graduate with the same observed and unobserved characteristics: the same income, distance to the hospital, and mother-specific residual.
The standard deviation of the random effect may be interpreted in exactly the same way, by treating it as the coefficient of a standard normal random effect. In this case exponentiating the standard deviation of 1.24 gives 3.47, so the odds of hospital delivery for a woman whose unobserved characteristics put her one standard deviation above the mean are about three and a half times those of an average woman with the same observed characteristics, including income, education, and distance to the hospital.
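Both numbers are easy to reproduce in R; a minimal sketch, again assuming the agq fit is in memory:

> exp(fixef(agq)["college"])       # subject-specific odds ratio, about 2.81
> exp(sqrt(unlist(VarCorr(agq))))  # exp(sigma_u), about 3.47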
Stata’s xtlogit reports the intra-class correlation “rho” on the latent scale as 0.32. We can verify this result, in Stata and in R, using the estimated variance at the mother level and recalling that the level-1 variance of the standard logistic distribution is π²/3, so that rho = σᵤ²/(σᵤ² + π²/3):
. estimates restore xt
(results xt are active now)

. scalar v2 = exp(_b[/lnsig2u])

. di v2/(v2 + _pi^2/3)
.32002744
> v <- unlist(VarCorr(agq))
> v / (v + pi^2/3)
   mother
0.3200294
The intraclass correlation of 0.32 reflects the correlation between the latent propensity to deliver in a hospital for two births of the same mother. It also means that about one third of the variance in the latent propensity to deliver in a hospital, beyond that explained by income, education and distance to the hospital, can be attributed to other characteristics of the mothers.
We can also calculate the manifest correlation at the median linear predictor, which requires “integrating out” the random effect. In Stata the xtrho command can calculate this for us at the median or other quantiles. The command is only available after xtlogit, xtprobit or xtcloglog; fortunately our current estimate is from xtlogit. In R we proceed from first principles, calculating the median linear predictor and then computing the average probabilities of one and of two hospital deliveries, using the gauher() function to integrate out the random effect. The function can be sourced directly from this website, as shown below.
. xtrho

Measures of intra-class manifest association in random-effects logit
Evaluated at median linear predictor

         Measure │   Estimate    [95% Conf.Interval]
─────────────────┼────────────────────────────────────
  Marginal prob. │    .239252     .217793    .268467
     Joint prob. │    .093807     .067841    .132652
      Odds ratio │    2.72847     1.90755    4.28414
     Pearson's r │    .200894     .119788    .308453
        Yule's Q │    .463586     .312136    .621509
> X <- model.matrix(agq)
> xb <- X %*% fixef(agq)
> md <- median(xb)
> source("https://grodri.github.io/multilevel/gauher.R")
> ghi <- function(f, gq = gauher(12)) {
+     sum(f(gq$z) * gq$w)
+ }
> m1 <- ghi(function(z) plogis(md + sqrt(v)*z)); m1      # margin
[1] 0.2392533
> m11 <- ghi(function(z) plogis(md + sqrt(v)*z)^2); m11  # joint
[1] 0.09381011
> M <- matrix(c(m11, m1 - m11, m1 - m11, 1 - 2*m1 + m11), 2, 2)
> or <- M[1,1]*M[2,2]/(M[1,2]*M[2,1]); or  # odds ratio
[1] 2.72868
> r <- (m11 - m1^2)/(m1 * (1-m1)); r       # Pearson's r
[1] 0.2009107
> Q <- (or - 1)/(or + 1); Q                # Yule's Q
[1] 0.4636171
The marginal probability that a birth is delivered in a hospital is 23.9%, and the joint probability that two births of the same mother are both delivered in a hospital is 9.38%. The R code builds a 2x2 table from the joint and marginal probabilities and then computes three measures of association.
We see that the correlation between the actual place of delivery for two births of the same woman, if she has median characteristics, is equivalent to a Pearson’s r of 0.20, a Yule’s Q of 0.46, or perhaps in more familiar terms an odds ratio of 2.73. This means that the odds of hospital delivery for a woman who delivered a previous birth at a hospital are 2.73 times those of a comparable woman who delivered the previous birth elsewhere.
Earlier we noted that the effect of education implies that the odds of hospital delivery for a college graduate are 2.8 times those of a high school graduate with the same observed and unobserved characteristics. This is a subject specific effect, comparing the odds of hospital delivery for essentially the same woman under two different education scenarios. It is also a conditional effect given a fixed value of the random effect.
We can also compute a population-average effect by averaging or “integrating out” the random effect. Essentially this entails computing the effect at different values of the random effect and averaging, which can be done by numerical integration or by simulation. Let us find the effect of education at the mean distance of 3.74 and the mean log income of 5.88. In Stata we use Gauss-Hermite integration with the built-in Mata function _gauss_hermite_nodes(), computing the means over mothers using each mother’s first observation. In R we reuse the ghi() function defined above; there the means are taken over all births, which is why they differ slightly (5.99 and 3.92), but the resulting odds ratio is essentially the same.
. egen first = tag(mother)

. quietly sum dist if first

. scalar mdist = r(mean)

. quietly sum loginc if first

. scalar mloginc = r(mean)

. di mdist, mloginc
3.7377246 5.882525

. scalar xb = _b[_cons] + _b[loginc] * mloginc + _b[dist] * mdist

. scalar sigma = exp(_b[/lnsig2u]/2)

. scalar bcollege = _b[college]

. mata:
───────────────────────────────────────────────── mata (type end to exit) ─────
: xb = st_numscalar("xb")
: sigma = st_numscalar("sigma")
: b = st_numscalar("bcollege")
: gh = _gauss_hermite_nodes(12)'   // transpose
: gh[,1] = gh[, 1] :* sqrt(2)      // change of variables
: gh[,2] = gh[, 2] :/ sqrt(pi())   // to standard normal
: p1 = sum( invlogit(xb :+ b :+ gh[,1] :* sigma) :* gh[,2])
: p0 = sum( invlogit(xb :+ gh[,1] :* sigma) :* gh[,2])
: (p1/(1-p1))/(p0/(1-p0))
  2.21217867

: end
────────────────────────────────────────────────────────────────────────────────
> library(dplyr)
> x <- summarize(hosp, loginc = mean(loginc), distance = mean(distance)); x
    loginc distance
1 5.987951 3.918396
> b <- fixef(agq)
> xb <- as.numeric(b[1] + x["loginc"] * b["loginc"] + x["distance"] * b["distance"])
> p1 <- ghi(function(z) plogis(xb + b["college"] + sqrt(v)*z))
> p0 <- ghi(function(z) plogis(xb + sqrt(v)*z))
> (p1/(1 - p1)) / (p0/(1 - p0))
[1] 2.212816
We get an odds ratio of 2.21. This is smaller in magnitude than the subject-specific odds ratio of 2.81, as expected: averaging over the random effect in a logit model attenuates effects towards one.
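As noted above, the averaging can also be done by simulation. A minimal sketch in R, assuming xb, b and v from the code above (the seed and number of draws are arbitrary):

> set.seed(81)      # arbitrary seed, for reproducibility
> z <- rnorm(1e5)   # draws of the standardized random effect
> p1s <- mean(plogis(xb + b["college"] + sqrt(v) * z))
> p0s <- mean(plogis(xb + sqrt(v) * z))
> (p1s/(1 - p1s)) / (p0s/(1 - p0s))  # should be close to 2.21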
You will find on this website analyses of the same data using three Bayesian methods; the last of those pages includes a comparison of the estimates from all four methods.
Last updated 12 April 2018