Germán Rodríguez
Generalized Linear Models, Princeton University

Problem Set 4: Doctor Visits
Due Friday, November 18, 2016

Cameron and Trivedi (2009) have some interesting data on the number of office-based doctor visits by adults aged 25-64 based on the 2002 Medical Expenditure Panel Survey. We will use data for the most recent wave, available at https://grodri.github.io/datasets/docvis.dta.

[1] A Poisson Model

(a) Fit a Poisson regression model with the number of doctor visits (docvis) as the outcome. We will use the same predictors as Cameron and Trivedi, namely health insurance status (private), health status (chronic), gender (female) and income (income), but will add two indicators of ethnicity (black and hispanic). There are many more variables one could add, but we'll keep things simple.

(b) Interpret the coefficient of black and test its significance using a Wald test and a likelihood ratio test.
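As a reminder of the arithmetic behind the two tests, here is a minimal Python sketch. It assumes you already have a coefficient, its standard error, and the log-likelihoods of the models with and without the term; the function names and the numbers are made up for illustration and are not results from the docvis data.

```python
import math
from statistics import NormalDist

def wald_test(coef, se):
    """Wald z statistic and two-sided p-value from the standard normal."""
    z = coef / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

def lr_test_1df(loglik_full, loglik_reduced):
    """Likelihood ratio chi-squared for dropping one term, with its p-value.

    A chi-squared variable with 1 d.f. is a squared standard normal,
    so P(chi2 > x) = 2 * (1 - Phi(sqrt(x))).
    """
    chi2 = max(2 * (loglik_full - loglik_reduced), 0.0)
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p

# Hypothetical inputs, for illustration only
z, p_wald = wald_test(coef=0.196, se=0.1)
chi2, p_lr = lr_test_1df(loglik_full=-100.0, loglik_reduced=-101.9208)
```

The two tests are asymptotically equivalent, so with a large sample they usually lead to the same conclusion.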

(c) Compute a 95% confidence interval for the effect of private insurance and interpret this result in terms of doctor visits.
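Because the Poisson model uses a log link, an interval computed on the coefficient scale can be exponentiated to give an interval for the ratio of expected visits. A minimal sketch in Python follows; the helper name and the inputs are hypothetical, not estimates from these data.

```python
import math
from statistics import NormalDist

def rate_ratio_ci(coef, se, level=0.95):
    """Exponentiated coefficient with a Wald confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower, upper = coef - z * se, coef + z * se
    return math.exp(coef), math.exp(lower), math.exp(upper)

# Hypothetical coefficient and standard error, for illustration only
ratio, lower, upper = rate_ratio_ci(coef=0.0, se=0.1)
```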

(d) Compute the deviance and Pearson chi-squared statistics for this model. Does the model fit the data? Is there evidence of overdispersion?

(e) Predict the proportion expected to have exactly zero doctor visits and compare with the observed proportion. You will find the formula for Poisson probabilities in the notes. The probability of zero is simply e^(−μ).
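One way to organize this calculation, sketched in Python under the assumption that you have a fitted mean for each respondent: average e^(−μ) over the sample and compare with the share of observed zeros. The function names and the four data points are made up for illustration.

```python
import math

def poisson_zero_proportion(fitted_means):
    """Average Poisson probability of zero, exp(-mu), over all cases."""
    return sum(math.exp(-mu) for mu in fitted_means) / len(fitted_means)

def observed_zero_proportion(counts):
    """Share of observations that are exactly zero."""
    return sum(1 for y in counts if y == 0) / len(counts)

# Made-up fitted means and counts, for illustration only
fitted_means = [0.5, 1.0, 2.0, 4.0]
visits = [0, 0, 3, 7]
predicted = poisson_zero_proportion(fitted_means)
observed = observed_zero_proportion(visits)
```

Note that averaging the individual probabilities is not the same as evaluating e^(−μ) at the average fitted mean.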

[2] Poisson Overdispersion

(a) Suppose the variance is proportional to the mean rather than equal to the mean. Estimate the proportionality parameter using Pearson's chi-squared and use this estimate to correct the standard errors.
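The calculation can be sketched in a few lines of Python, assuming you have the observed counts, the fitted means, and the number of estimated parameters. The names and the toy numbers below are illustrative assumptions, not values from the docvis data.

```python
import math

def pearson_chi2(y, mu):
    """Pearson chi-squared: sum of (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

def dispersion(y, mu, n_params):
    """Pearson chi-squared divided by the residual degrees of freedom."""
    return pearson_chi2(y, mu) / (len(y) - n_params)

def scaled_se(se, phi):
    """Inflate Poisson standard errors by the square root of phi."""
    return [s * math.sqrt(phi) for s in se]

# Toy counts and fitted means, for illustration only
y = [0, 1, 4, 9]
mu = [1.0, 2.0, 2.0, 5.0]
phi = dispersion(y, mu, n_params=2)
adjusted = scaled_se([0.1, 0.2], phi)
```

The coefficients themselves do not change; only the standard errors are scaled.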

(b) What happens to the significance of the black coefficient once we allow for extra-Poisson variation? Could we test this coefficient using a likelihood ratio test? Explain.

(c) Compare the standard errors adjusted for over-dispersion with the robust or "sandwich" estimator of the standard errors.

[3] A Negative Binomial Model

(a) Fit a negative binomial regression model using the same outcome and predictors as in part 1.a. Comment on any remarkable changes in the coefficients.

(b) Interpret the coefficient of black and test its significance using a Wald test and a likelihood ratio test. Compare your results with parts 1.b and 2.b.

(c) Predict the percent of respondents with zero doctor visits according to this model and compare with part 1.e. You will find a formula for negative binomial probabilities in the addendum to the notes. The probability of zero is given by [β / (μ + β)]^α where α = β = 1/σ².
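The formula translates directly into code. The Python sketch below assumes you have a fitted mean μ for each respondent and the estimated variance σ² of the multiplicative random effect; the function name and test values are made up for illustration.

```python
import math

def negbin_zero_prob(mu, sigma2):
    """Negative binomial P(Y = 0) = [beta / (mu + beta)]^alpha,
    with alpha = beta = 1 / sigma2."""
    alpha = 1.0 / sigma2
    beta = alpha
    return (beta / (mu + beta)) ** alpha

# Illustrative values only
p_zero = negbin_zero_prob(mu=1.0, sigma2=1.0)           # (1/2)^1
p_near_poisson = negbin_zero_prob(mu=1.0, sigma2=1e-6)  # close to exp(-1)
```

As σ² approaches zero the probability approaches the Poisson value e^(−μ), which is a handy check on the calculation.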

(d) Interpret the estimate of σ² in this model and test its significance, noting carefully the distribution of the criterion.

(e) Use predicted values from this model to divide the sample into twenty groups of about equal size, compute the mean and variance of docvis in each group, and plot these values. Superimpose curves representing the over-dispersed Poisson and negative binomial variance functions and comment.
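The grouping step and the two variance functions can be sketched in Python as follows. This assumes the over-dispersed Poisson variance is φμ and the negative binomial variance is μ(1 + σ²μ); the function names and the tiny data set are hypothetical, and the plotting itself is left out.

```python
import statistics

def group_mean_variance(y, predicted, n_groups=20):
    """Sort cases by predicted value, cut into n_groups of about equal
    size, and return the (mean, variance) of y within each group."""
    order = sorted(range(len(y)), key=lambda i: predicted[i])
    n = len(order)
    results = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        values = [y[i] for i in idx]
        results.append((statistics.mean(values), statistics.variance(values)))
    return results

def overdispersed_poisson_variance(mu, phi):
    """Variance proportional to the mean."""
    return phi * mu

def negbin_variance(mu, sigma2):
    """Variance quadratic in the mean."""
    return mu * (1 + sigma2 * mu)

# Tiny made-up example with two groups, for illustration only
groups = group_mean_variance([1, 3, 2, 6], [0.1, 0.2, 0.9, 0.8], n_groups=2)
```

Each group needs at least two observations for the sample variance to be defined, which is not a concern with a sample of this size and twenty groups.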

[4] A Zero-Inflated Poisson Model

(a) Try a zero-inflated Poisson model with the same predictors as in part 1.a in both the Poisson and inflate equations.

(b) Predict the proportion of respondents with zero doctor visits according to this model and compare with 1.e and 3.c. (Don't forget that there are two ways of having an outcome of zero in this model.)
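The two sources of zeros combine as π + (1 − π)e^(−μ), where π is the probability of an "always zero" outcome. A minimal Python sketch follows, assuming the inflate equation is a logit model so that π is the inverse logit of its linear predictor; the names and inputs are illustrative only.

```python
import math

def inv_logit(eta):
    """Inverse logit, mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def zip_zero_prob(mu, pi):
    """Zero-inflated Poisson P(Y = 0): an 'always zero' with
    probability pi, or a Poisson zero with probability (1 - pi)."""
    return pi + (1.0 - pi) * math.exp(-mu)

# Hypothetical linear predictor and mean, for illustration only
pi = inv_logit(0.0)                  # 0.5
p_zero = zip_zero_prob(mu=1.0, pi=pi)
```

As in the Poisson case, the predicted proportion for the sample is the average of these probabilities over all respondents.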

(c) Interpret the coefficients of black in the two equations. Is the effect related to whether blacks visit the doctor at all? To how often they visit?

[5] Model Selection

Considering the results obtained so far and bearing in mind parsimony and goodness of fit, which of the models used here provides the best description of the data? Make sure you provide a clear justification of your choice.

Posted November 9, 2016