This section deals with regression models for discrete data with more than two response categories, where the assumption of a multinomial distribution is appropriate. We will consider multinomial logit models for nominal data and ordered logit models for ordinal data, with a brief mention of alternative-specific conditional logit models. We will also consider sequential logit models. (In line with the current syllabus we are skipping log-linear models for contingency tables, and hence their relationship with multinomial logit models.)
We start by reading the data on contraceptive choice by age among currently married women in El Salvador in 1985, found in Table 6.1 of the lecture notes. The data are in “long” format, with one row for each combination of predictor and response, showing the age group, the method chosen, and the number of cases. In R we reshape the data so that each method is a column, a layout that works better with the functions we will use.
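> library(haven)   # read_dta() for Stata files
> library(tidyr)   # pivot_wider() for reshaping
> cuselong <- read_dta("https://grodri.github.io/datasets/elsalvador1985.dta")
> cuse <- pivot_wider(cuselong, names_from=cuse, values_from=cases)  # one column per method
> names(cuse)[2:4] <- c("ster", "other", "none")  # give the method columns short names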
> cuse
# A tibble: 7 × 4
  ageg       ster other  none
  <dbl+lbl> <dbl> <dbl> <dbl>
1 1 [15-19]     3    61   232
2 2 [20-24]    80   137   400
3 3 [25-29]   216   131   301
4 4 [30-34]   268    76   203
5 5 [35-39]   197    50   188
6 6 [40-44]   150    24   164
7 7 [45-49]    91    10   183
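As a quick check on the reshaped data, here is a minimal base-R sketch that enters the counts printed above by hand (rather than downloading the file again), then computes the number of women in each age group and the distribution of method choice within each group. The object names `counts`, `totals` and `props` are hypothetical, chosen only for illustration.

```r
# Counts copied from the table above: rows are age groups, columns are methods
counts <- matrix(c(  3,  61, 232,
                    80, 137, 400,
                   216, 131, 301,
                   268,  76, 203,
                   197,  50, 188,
                   150,  24, 164,
                    91,  10, 183),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(age = c("15-19", "20-24", "25-29", "30-34",
                                         "35-39", "40-44", "45-49"),
                                 method = c("ster", "other", "none")))
# Number of women in each age group
totals <- rowSums(counts)
# Row proportions: distribution of method choice within each age group
props <- prop.table(counts, margin = 1)
totals
round(props, 3)
```

The totals add up to 3165 women, and each row of `props` sums to one.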
With only one predictor, age, this example offers limited scope for interpreting coefficients, but it lets us focus on the outcome and on the comparisons underlying each type of model.
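The comparisons can be made concrete with observed (empirical) logits. The following minimal sketch assumes that `none` is the baseline category for the multinomial logits, so it computes log(ster/none) and log(other/none) in each age group. For the sequential comparison it assumes the sequence "use versus no use", then "sterilization versus other methods among users". These are descriptive log-odds computed from the raw counts, not fitted model estimates. The names `mlogits`, `slogits` and `users` are hypothetical.

```r
# Counts copied from the table above: rows are age groups, columns are methods
counts <- matrix(c(  3,  61, 232,
                    80, 137, 400,
                   216, 131, 301,
                   268,  76, 203,
                   197,  50, 188,
                   150,  24, 164,
                    91,  10, 183),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(age = c("15-19", "20-24", "25-29", "30-34",
                                         "35-39", "40-44", "45-49"),
                                 method = c("ster", "other", "none")))
# Multinomial (baseline-category) logits: each method versus no method
mlogits <- cbind(ster_vs_none  = log(counts[, "ster"]  / counts[, "none"]),
                 other_vs_none = log(counts[, "other"] / counts[, "none"]))
# Sequential logits: (1) any method versus none; (2) among users, sterilization versus other
users <- counts[, "ster"] + counts[, "other"]
slogits <- cbind(use_vs_none   = log(users / counts[, "none"]),
                 ster_vs_other = log(counts[, "ster"] / counts[, "other"]))
round(mlogits, 3)
round(slogits, 3)
```

The logit of sterilization versus other methods is the difference between the two baseline logits. Choosing a different baseline therefore changes only how the multinomial logit model is parameterized, not the fitted probabilities.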
Updated fall 2022