Statistics and Population

Bayes

Home 2 Levels 3 Levels Logit Poisson Bayes MLwiN

Random-Effects Logits using Stan

We now illustrate the use of the latest Hamiltonian Monte Carlo (HMC) algorithm, the No-U Turn Sampler (NUTS) of Hoffman and Gelman (2011), as implemented in the package Stan, see mc-stan.org for detailed information. Stan is written in C++ and can be run from the command line, R, Stata or Python. Here we illustrate running it on the hospital data using the R-interface rstan.

A quick reminder of the data and model. We have information on hospital delivery (yes or no) for 1060 pregnancies of 501 women. Predictors of interest are the log of income, distance to the nearest hospital, and education, represented by indicators of high school dropouts and college graduates. This is a simple random-intercept logit model that can easily be fitted by maximum likelihood using Stata’s xtlogit or melogit, as well as R’s glmer() in the lme4 package. We find comparison with Bayesian estimates of interest.

Stan uses a language similar to Bugs, but is quite different under the hood. The model specification is translated into the C++ language, and the program is then compiled and run on your data. Sampling is very fast once the model has been compiled. The package is very well documented and installation proceeded uneventfully on my home and work machines. Once you have the necessary tools installed the process is fairly transparent.

Here’s the Stan program for the hospital data, a long string assigned to an R object named hosp_code

hosp_code <- '
data {
    int N; // number of obs (pregnancies)
    int M; // number of groups (women)
    int K; // number of predictors
    
    int y[N]; // outcome
    row_vector[K] x[N]; // predictors
    int g[N];    // map obs to groups (pregnancies to women)
}
parameters {
    real alpha;
    real a[M]; 
    vector[K] beta;
    real<lower=0,upper=10> sigma;  
}
model {
  alpha ~ normal(0,100);
  a ~ normal(0,sigma);
  beta ~ normal(0,100);
  for(n in 1:N) {
    y[n] ~ bernoulli(inv_logit( alpha + a[g[n]] + x[n]*beta));
  }
}
'

We start by specifying the data, which consist of an n-vector of outcomes (y), an n by k matrix of predictors (x), and an n-vector mapping pregnancies to women (g). The parameters are the constant (alpha), the woman-specific random effects (a), the coefficients of the k predictors (beta) and the standard deviation of the random effects (sigma) which is in (0,10).

The model provides non-informative normal priors for the fixed effects alpha and beta. Stan specifies normal distributions using the standard deviations, not the variance, nor the precision used by BUGS. The hyperprior for sigma is uniform(0,10) as is determined by the limits given. Finally the outcomes are Bernoulli with probability given by the inverse logit of the linear predictor.

To run the model in R we use the rstan package. First we create an object with the data using the same names as in the Stan code. The data themselves are in an R object called hosp.

> hosp_data <- list(N=nrow(hosp),M=501,K=4,y=hosp[,1],x=hosp[,2:5],g=hosp[,6])

We are now ready to call stan() to run the model. Here I specify 2 chains of 2000 observations each.

hfit <- stan(model_code=hosp_code, model_name="hospitals", data=hosp_data, iter=2000, chains=2)

TRANSLATING MODEL 'hospitals' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'hospitals' NOW.
...
SAMPLING FOR MODEL 'hospitals' NOW (CHAIN 1).
Iteration: 2000 / 2000 [100%]  (Sampling)
Elapsed Time: 58.065 seconds (Warm-up)
              24.373 seconds (Sampling)
              82.438 seconds (Total)

SAMPLING FOR MODEL 'hospitals' NOW (CHAIN 2).
Iteration: 2000 / 2000 [100%]  (Sampling)
Elapsed Time: 58.074 seconds (Warm-up)
              23.186 seconds (Sampling)
              81.26 seconds (Total)

Stan prints some useful information, including folder names I supressed, and provides feedback every 200 samples. As you can see, it took less than three minutes to run the two chains.

The code below produces a summary of results. The default print() method in rstan prints all the woman-specific random effects, so I used the pars parameter to select the coefficients of primary interest. The default also prints the 2.5, 25, 50, 75, and 97.5 percentiles but I trimmed the list to obtain the median and a 95% credible interval. Finally the default prints just one decimal place and I prefer a bit more. Results are very similar to the Gibbs sampling estimates in hospBUGS

> print(hfit, pars=c("alpha","beta[1]","beta[2]","beta[3]","beta[4]","sigma"), 
+       probs = c(0.025, 0.50, 0.975), digits_summary=3)
Inference for Stan model: hospitals.
2 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=2000.

          mean se_mean    sd   2.5%    50%  97.5% n_eff  Rhat
alpha   -3.488   0.022 0.503 -4.495 -3.473 -2.575   508 1.001
beta[1]  0.581   0.003 0.074  0.446  0.579  0.736   544 1.000
beta[2] -0.077   0.001 0.033 -0.141 -0.077 -0.011  2000 0.999
beta[3] -2.088   0.016 0.279 -2.668 -2.078 -1.576   291 1.001
beta[4]  1.094   0.013 0.424  0.267  1.080  1.942  1139 1.000
sigma    1.385   0.023 0.199  1.039  1.364  1.813    76 1.012

Samples were drawn using NUTS(diag_e) at Mon Mar 24 21:12:06 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Finally, here is the traditional trace plot for the same parameters. I stack them in one column with six rows, and exclude the warmup. Note that sigma exhibits slow mixing, also evident in the low effective sample size of 76.

> traceplot(hfit,c("alpha","beta[1]","beta[2]","beta[3]","beta[4]","sigma"),
    ncol=1,nrow=6,inc_warmup=F)

I realize the output would be better labelled if I had called the coefficients bcons, bloginc, bdistance, bdropout and bcollege, as I did with winBUGS, rather than alpha and beta, but I thought there was an advantage to keeping the code generic. In particular, the exact same Stan model (and compiled program) can be run on any two-level random-intercept logit model by changing just the data.