Our class project will analyze data on marriage dissolution in the U.S. based on a longitudinal survey. We will conduct the analysis in three parts, starting with a basic proportional hazards model, adding time-varying covariates, and then considering multiple-spell data.
The dataset comes from Lillard and Panis (2003),
aML Multilevel Multiprocess Statistical Software, Version 2.0.
EconWare, Los Angeles, California.
I have created a Stata system file available as
http://data.princeton.edu/pop509/divorce3.dta
which includes the following variables:
Name | Description |
---|---|
id | Unique respondent's id |
spans | See explanation below |
weight | Sampling weight (will ignore) |
marnum | Marriage number (1..6) |
censor | Censoring indicator (1=censored,0=divorced) |
lower | Lower bound for time of event |
upper | Upper bound for time of event |
hiseduc | Husband's education (in years of schooling) |
hereduc | Wife's education (in years of schooling) |
heblack | Indicator for whether the husband is African American |
sheblack | Indicator for whether the wife is African American |
age | Age of husband (at marriage) |
agediff | Age difference between husband and wife |
These vars are repeated for n=1,2,...,17: | |
timen | Time at which this span ends |
numkidn | Number of children during this span |
An unusual feature of this dataset is that it has lower and upper bounds rather than a single time variable. Censored cases are marriages that ended by widowhood or were intact as of the date of last interview; the time is known exactly for these cases and the lower and upper bounds are equal. For marriages that ended in divorce the dataset records a range of possible dates such that the divorce is known to have occurred at some time between the lower and upper bounds. For simplicity we will assume that divorces occurred at the mid-point of the interval.
Note also that we have multiple marriages per respondent. The dataset has
3,371 respondents with a total of 4,238 marriages. The respondent with the
most marriages had six. The variable marnum
is the marriage number
for the survey respondent (which may be the husband or the wife).
For the analyses in parts 1 and 2 we will focus on first marriages only,
but in part 3 we will consider all marriages together in a multiple spell model.
Another interesting feature of the dataset is that we have data on fertility, recording the dates of birth of each child within each marriage. This will allow us to explore the extent to which the divorce rate varies with number of children. The data are available in wide format with a provision for up to 17 children (the maximum in the sample). For example respondent with id 9 has been married once only, had a child at duration 3.734 years and was interviewed at duration 10.546. This respondent has two fertility spans: (from 0) up to 3.734 with 0 children and (from 3.734) up to 10.546 with one child.
(a) Explore the determinants of divorce using a proportional hazards model
for first marriages only. I recommend that you use the same specification
as Lillard and Panis. They model ethnicity effects using two dummy variables:
heblack
and mixedrace
, which is defined as
heblack != sheblack
.
They consider husband's education using two dummy variables,
dropout
for hiseduc < 12
and
college
for hiseduc ≥ 16
.
They also consider the age difference between the spouses using dummies for
heolder
when agediff > 10
and
sheolder
when agediff < -10
.
Make sure in your exploration you describe the effects of husband's education,
couple's ethnicity and age difference, and test their significance.
(b) Use Schoenfeld residuals to explore whether the effects of the
significant predictors can be considered proportional.
Explore further the proportionality of the effect of heblack
introducing an interaction with marriage duration.
(c) Estimate the survival function for white couples where the husband has high school education. Note what proportion of marriages survives 20 years, and what proportion eventually dissolve. Compare these results with appropriate estimates for black couples with the same education. The idea here is to translate hazard ratios into more familiar probabilities.
(a) Examine the effect of children on marriage stability by introducing a linear term on number of children as a time-varying covariate, interpret the estimate and test its significance. You should find that the hazard of divorce is lower among couples with (more) children, but note the potential endogeneity of number of children. (One solution to this problem requires joint modeling of fertility and marital disruption using simultaneous hazard models, as in Lillard, 1993).
(b) Estimate the survival function for white couples where the husband has high school education assuming they never have children. How would you translate the effect of having children into survival probabilities? The idea again is to translate hazard ratios into more familiar probabilities, but this is a bit trickier with time-varying covariates.
To do this analysis you will need to create a separate record each time a couple has a child. Our discussion of time-varying covariates in class should come handy.
(a) Fit the model to all marriages adding dummy variables to represent second marriages and third or higher order marriages, ignoring for now the fact that some respondents contribute multiple observations. Interpret and test the significance of the dummy variables for marriage order.
(b) Fit a shared frailty model using a random effect to account for possible correlation between the durations of a respondent's marriages. Test the significance of the random effect and interpret the estimate of its variance in terms of a correlation coefficient and in terms of a regression coefficient. Note any changes in the coefficients of marriage order after introducing the shared frailty term.
(c) How does the risk of divorce for second marriages compares with the risk for first marriages? In your answer try to distinguish the risk for average first and second marriages and the risk for the average couple. It would be useful to translate your answers into survival probabilities.