We now consider models involving two factors with discrete levels. We illustrate using the sample data with both social setting and family planning effort grouped into categories. Key issues involve the concepts of main effects and interactions.
Table 2.13 shows the CBR decline in our 20 countries classified according to two criteria: social setting, with categories low (under 70), medium (70–79) and high (80+), and family planning effort, with categories weak (0–4), moderate (5–14) and strong (15+). In our example both setting and effort are factors with three levels. Note that there are no countries with strong programs in low social settings.
Table 2.13. CBR Decline by Levels of Social Setting and Levels of Family Planning Effort
Setting | Effort: Weak | Effort: Moderate | Effort: Strong |
Low | 1, 0, 7 | 21, 13, 4, 7 | – |
Medium | 10, 6, 2 | 0 | 25 |
High | 9 | 11 | 29, 29, 40, 21, 22, 29 |
We will modify our notation to reflect the two-way layout of the data. Let \( n_{ij} \) denote the number of observations in the \( (i,j) \)-th cell of the table, i.e. in row \( i \) and column \( j \), and let \( y_{ijk} \) denote the response of the \( k \)-th unit in that cell, for \( k=1, \ldots, n_{ij} \). In our example \( y_{ijk} \) is the CBR decline of the \( k \)-th country in the \( i \)-th category of setting and the \( j \)-th category of effort.
We assume that the expected response in cell \( (i,j) \), which we denote \( \mu_{ij} \), follows the additive model
\[\tag{2.19}\mu_{ij} = \mu + \alpha_i + \beta_j.\]
In this formulation \( \mu \) represents a baseline value, \( \alpha_i \) represents the effect of the \( i \)-th level of the row factor and \( \beta_j \) represents the effect of the \( j \)-th level of the column factor. Before we proceed further we must note that the model is not identified as stated. You could add a constant \( \delta \) to each of the \( \alpha_i \)’s (or to each of the \( \beta_j \)’s) and subtract it from \( \mu \) without altering any of the expected responses. Clearly we need two constraints to identify the model.
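The identification problem can be spelled out in one line. This is only a restatement of the argument above, with \( \delta \) standing for an arbitrary constant:
\[ (\mu - \delta) + (\alpha_i + \delta) + \beta_j = \mu + \alpha_i + \beta_j \quad \mbox{for every } i \mbox{ and } j. \]
The shifted parameters give exactly the same expected responses, so the data cannot tell the two sets apart. The same argument applies with \( \delta \) added to each \( \beta_j \), which is why one constraint is needed for each factor.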
Our preferred approach relies on the reference cell method, and sets to zero the effects for the first cell (in the top-left corner of the table), so that \( \alpha_1 = \beta_1 = 0 \). The best way to understand the meaning of the remaining parameters is to study Table 2.14, showing the expected response for each combination of levels of row and column factors having three levels each.
Table 2.14. The Two-Factor Additive Model
Row | Column 1 | Column 2 | Column 3 |
1 | \(\mu\) | \(\mu+\beta_2\) | \(\mu+\beta_3\) |
2 | \(\mu+\alpha_2\) | \(\mu+\alpha_2+\beta_2\) | \(\mu+\alpha_2+\beta_3\) |
3 | \(\mu+\alpha_3\) | \(\mu+\alpha_3+\beta_2\) | \(\mu+\alpha_3+\beta_3\) |
In this formulation of the model \( \mu \) represents the expected response in the reference cell, \( \alpha_i \) represents the effect of level \( i \) of the row factor (compared to level 1) for any fixed level of the column factor, and \( \beta_j \) represents the effect of level \( j \) of the column factor (compared to level 1) for any fixed level of the row factor.
Note that the model is additive, in the sense that the effect of each factor is the same at all levels of the other factor. To see this point consider moving from the first to the second row. The response increases by \( \alpha_2 \) if we move down the first column, but also if we move down the second or third columns.
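As a short check of this claim, write \( \mu_{ij} \) for the expected response in row \( i \) and column \( j \) of Table 2.14 and subtract the first row from the second:
\[ \mu_{2j} - \mu_{1j} = (\mu + \alpha_2 + \beta_j) - (\mu + \beta_j) = \alpha_2, \quad j = 1, 2, 3, \]
where \( \beta_1 = 0 \) under the reference cell constraints. The difference does not involve \( j \), which is precisely what additivity means.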
The model in Equation 2.19 is a special case of the general linear model, where the model matrix \( \boldsymbol{X} \) has a column of ones representing the constant, and two sets of dummy or indicator variables representing the levels of the row and column factors, respectively. This matrix is not of full column rank because the row (as well as the column) dummies add to the constant. Clearly we need two constraints and we choose to drop the dummy variables corresponding to the first row and to the first column. Table 2.15 shows the resulting parameter estimates, standard errors and \( t \)-ratios for our example.
Table 2.15. Parameter Estimates for Two-Factor Additive Model of CBR Decline by Social Setting and Family Planning Effort
Parameter | Level | Symbol | Estimate | Std. Error | \(t\)-ratio |
Baseline | low/weak | \(\mu\) | 5.379 | 3.105 | 1.73 |
Setting | medium | \(\alpha_2\) | \(-1.681\) | 3.855 | \(-0.44\) |
Setting | high | \(\alpha_3\) | 2.388 | 4.457 | 0.54 |
Effort | moderate | \(\beta_2\) | 3.836 | 3.575 | 1.07 |
Effort | strong | \(\beta_3\) | 20.672 | 4.339 | 4.76 |
Thus, we expect a CBR decline of 5.4% in countries with low setting and weak programs. In countries with medium or high social setting we expect CBR declines of 1.7 percentage points less and 2.4 percentage points more, respectively, than in countries with low setting and the same level of effort. Finally, in countries with moderate or strong programs we expect CBR declines of 3.8 and 20.7 percentage points more than in countries with weak programs and the same level of social setting.
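The fit can be sketched in a few lines of NumPy. This is a minimal illustration rather than the software used to produce the table: the data are typed in from Table 2.13, the levels of each factor are coded 0, 1, 2, and all variable names are our own.

```python
import numpy as np

# CBR declines from Table 2.13, keyed by (setting, effort); levels coded 0, 1, 2
cells = {
    (0, 0): [1, 0, 7], (0, 1): [21, 13, 4, 7],
    (1, 0): [10, 6, 2], (1, 1): [0], (1, 2): [25],
    (2, 0): [9], (2, 1): [11], (2, 2): [29, 29, 40, 21, 22, 29],
}
setting = np.array([i for (i, j), ys in cells.items() for _ in ys])
effort = np.array([j for (i, j), ys in cells.items() for _ in ys])
y = np.array([v for ys in cells.values() for v in ys], dtype=float)


def dummies(codes, levels):
    """Return one 0/1 indicator column per level of a factor."""
    return (codes[:, None] == np.arange(levels)).astype(float)


ones = np.ones((len(y), 1))
R, C = dummies(setting, 3), dummies(effort, 3)

# With all dummies the matrix has 7 columns but the row dummies and the
# column dummies each add up to the constant, so it is rank deficient
X_full = np.hstack([ones, R, C])
rank_full = np.linalg.matrix_rank(X_full)

# Reference cell method: drop the dummies for the first row and first column
X = np.hstack([ones, R[:, 1:], C[:, 1:]])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
rss = float(resid @ resid)
df = len(y) - X.shape[1]
cov = rss / df * np.linalg.inv(X.T @ X)  # estimated variance-covariance matrix
se = np.sqrt(np.diag(cov))

print("rank with all dummies:", rank_full, "of", X_full.shape[1], "columns")
for name, est, s in zip(["mu", "alpha2", "alpha3", "beta2", "beta3"], b, se):
    print(f"{name:7s} {est:8.3f} {s:7.3f} {est / s:6.2f}")
print(f"RSS = {rss:.1f} on {df} d.f.")
```

The printed estimates, standard errors and \( t \)-ratios should agree with Table 2.15 up to rounding.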
It appears from a cursory examination of the \( t \)-ratios in Table 2.15 that the only significant effect is the difference between strong and weak programs. Bear in mind, however, that the table only shows the comparisons that are explicit in the chosen parameterization. In this example it turns out that the difference between strong and moderate programs is also significant. (This test can be calculated from the variance-covariance matrix of the estimates, or by fitting the model with strong programs as the reference cell, so the moderate-strong comparison becomes one of the parameters.) Questions of significance for factors with more than two levels are best addressed by using the \( F \)-test discussed below.
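Both routes to the strong versus moderate comparison can be sketched as follows. The code is illustrative only: the data come from Table 2.13, the variable names are ours, and the critical value 2.131 is the two-sided 5% point of Student's \( t \) with 15 d.f.

```python
import numpy as np

# CBR declines from Table 2.13, keyed by (setting, effort); levels coded 0, 1, 2
cells = {
    (0, 0): [1, 0, 7], (0, 1): [21, 13, 4, 7],
    (1, 0): [10, 6, 2], (1, 1): [0], (1, 2): [25],
    (2, 0): [9], (2, 1): [11], (2, 2): [29, 29, 40, 21, 22, 29],
}
setting = np.array([i for (i, j), ys in cells.items() for _ in ys])
effort = np.array([j for (i, j), ys in cells.items() for _ in ys])
y = np.array([v for ys in cells.values() for v in ys], dtype=float)
one = np.ones(len(y))

# Additive model with low setting and weak effort as the reference cell
X = np.column_stack(
    [one, setting == 1, setting == 2, effort == 1, effort == 2]
).astype(float)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = float(np.sum((y - X @ b) ** 2))
df = len(y) - X.shape[1]
cov = rss / df * np.linalg.inv(X.T @ X)

# Route 1: contrast beta3 - beta2 using the variance-covariance matrix
c = np.array([0, 0, 0, -1, 1], dtype=float)
diff = float(c @ b)
se_diff = float(np.sqrt(c @ cov @ c))
t_ratio = diff / se_diff

# Route 2: refit with strong programs as the reference level of effort;
# the coefficient of the moderate dummy is then moderate minus strong
X_ref = np.column_stack(
    [one, setting == 1, setting == 2, effort == 0, effort == 1]
).astype(float)
b_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)

T_CRIT = 2.131  # two-sided 5% critical value, Student's t with 15 d.f.
print(f"strong - moderate = {diff:.3f}, s.e. = {se_diff:.3f}, t = {t_ratio:.2f}")
print("moderate - strong from refit:", round(float(b_ref[4]), 3))
print("significant at 5%:", abs(t_ratio) > T_CRIT)
```

The two routes give the same estimate with opposite signs, since only the choice of reference level differs.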
Fitting the two-factor additive model results in a residual sum of squares of 574.4 on 15 d.f., and represents an improvement over the null model of 2075.8 at the expense of four d.f. We can further decompose this gain as an improvement of 1193.8 on 2 d.f. due to social setting (from Section 2.6) and a gain of 882.0, also on 2 d.f., due to effort given setting. These calculations are set out in Table 2.16, which also shows the corresponding mean squares and \( F \)-ratios.
Table 2.16. Hierarchical Anova for Two-Factor Additive Model of CBR Decline by Social Setting and Family Planning Effort
Source of variation | Sum of squares | Degrees of freedom | Mean square | \(F\)-ratio |
Setting | 1193.8 | 2 | 596.9 | 15.6 |
Effort\(|\)Setting | 882.0 | 2 | 441.0 | 11.5 |
Residual | 574.4 | 15 | 38.3 | |
Total | 2650.2 | 19 | | |
We can combine the sum of squares for setting and for effort given setting to construct a test for the overall significance of the regression. This results in an \( F \)-ratio of 13.6 on four and 15 d.f., and is highly significant. The second of the \( F \)-ratios shown in Table 2.16, which is 11.5 on two and 15 d.f., is a test for the net effect of family planning effort after accounting for social setting, and is highly significant. (The first of the \( F \)-ratios in the table, 15.6 on two and 15 d.f., is not in common use but is shown for completeness; it can be interpreted as an alternative test for the gross effect of setting, which combines the same numerator as the test in the previous section with a more refined denominator that takes into account the effect of effort.)
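The hierarchical anova can be reproduced by fitting a sequence of nested models and differencing their residual sums of squares. The sketch below assumes the data of Table 2.13 and uses names of our own choosing, such as the helper `rss_of`.

```python
import numpy as np

# CBR declines from Table 2.13, keyed by (setting, effort); levels coded 0, 1, 2
cells = {
    (0, 0): [1, 0, 7], (0, 1): [21, 13, 4, 7],
    (1, 0): [10, 6, 2], (1, 1): [0], (1, 2): [25],
    (2, 0): [9], (2, 1): [11], (2, 2): [29, 29, 40, 21, 22, 29],
}
setting = np.array([i for (i, j), ys in cells.items() for _ in ys])
effort = np.array([j for (i, j), ys in cells.items() for _ in ys])
y = np.array([v for ys in cells.values() for v in ys], dtype=float)


def rss_of(columns):
    """Residual sum of squares from a least squares fit on the given columns."""
    X = np.column_stack(columns).astype(float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))


one = np.ones(len(y))
S = [setting == 1, setting == 2]  # setting dummies, low as reference
E = [effort == 1, effort == 2]    # effort dummies, weak as reference

rss_null = rss_of([one])
rss_setting = rss_of([one] + S)
rss_additive = rss_of([one] + S + E)

ss_setting = rss_null - rss_setting
ss_effort_given_setting = rss_setting - rss_additive
ms_resid = rss_additive / 15

f_setting = (ss_setting / 2) / ms_resid
f_effort = (ss_effort_given_setting / 2) / ms_resid
f_overall = ((rss_null - rss_additive) / 4) / ms_resid

print(f"Setting          {ss_setting:8.1f}  F = {f_setting:.1f}")
print(f"Effort | Setting {ss_effort_given_setting:8.1f}  F = {f_effort:.1f}")
print(f"Residual         {rss_additive:8.1f}")
print(f"Total            {rss_null:8.1f}  overall F = {f_overall:.1f}")
```

The output should match Table 2.16 and the overall \( F \)-ratio quoted above, up to rounding.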
There is an alternative decomposition of the regression sum of squares into an improvement of 2040.0 on two d.f. due to effort and a further gain of 35.8 on two d.f. due to setting given effort. The latter can be contrasted with the error sum of squares of 574.4 on 15 d.f. to obtain a test of the net effect of setting given effort. This test would address the question of whether socio-economic conditions have an effect on fertility decline after we have accounted for family planning effort.
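The alternative decomposition follows from entering the factors in the opposite order. The sketch below is again an illustration based on the data of Table 2.13, with our own variable names. The \( F \)-ratio for setting given effort is not quoted in the text; it is simply what the sums of squares above imply, and with these data it comes out well below one.

```python
import numpy as np

# CBR declines from Table 2.13, keyed by (setting, effort); levels coded 0, 1, 2
cells = {
    (0, 0): [1, 0, 7], (0, 1): [21, 13, 4, 7],
    (1, 0): [10, 6, 2], (1, 1): [0], (1, 2): [25],
    (2, 0): [9], (2, 1): [11], (2, 2): [29, 29, 40, 21, 22, 29],
}
setting = np.array([i for (i, j), ys in cells.items() for _ in ys])
effort = np.array([j for (i, j), ys in cells.items() for _ in ys])
y = np.array([v for ys in cells.values() for v in ys], dtype=float)


def rss_of(columns):
    """Residual sum of squares from a least squares fit on the given columns."""
    X = np.column_stack(columns).astype(float)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))


one = np.ones(len(y))
S = [setting == 1, setting == 2]
E = [effort == 1, effort == 2]

# Enter effort first, then setting
rss_null = rss_of([one])
rss_effort = rss_of([one] + E)
rss_additive = rss_of([one] + E + S)

ss_effort = rss_null - rss_effort
ss_setting_given_effort = rss_effort - rss_additive
f_setting_net = (ss_setting_given_effort / 2) / (rss_additive / 15)

print(f"Effort           {ss_effort:8.1f} on 2 d.f.")
print(f"Setting | Effort {ss_setting_given_effort:8.1f} on 2 d.f.")
print(f"F for setting given effort = {f_setting_net:.2f} on 2 and 15 d.f.")
```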
The sums of squares described above can be turned into proportions of variance explained using the now-familiar calculations. For example the two factors together explain 2075.8 out of 2650.2, or 78.3% of the variation in CBR decline.
The square root of this proportion, 0.885 in the example, is the multiple correlation ratio; it is analogous to (and in fact is often called) the multiple correlation coefficient. We use the word ‘ratio’ to emphasize the categorical nature of the predictors and to note that it generalizes to more than one factor the correlation ratio introduced in Section 2.4.
We can also calculate the proportion of variance explained by one of the factors out of the amount left unexplained by the other. In our example effort explained 882.0 out of the 1456.4 that setting had left unexplained (the total of 2650.2 less the 1193.8 due to setting), or 60.6%. The square root of this proportion, 0.778, is called the partial correlation ratio, and can be interpreted as a measure of correlation between a discrete factor and a continuous variable after adjustment for another factor.
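The arithmetic behind these ratios is short enough to check directly. The snippet below takes the sums of squares quoted in Table 2.16 as inputs; the variable names are ours.

```python
import math

# Sums of squares quoted in Table 2.16
tss = 2650.2                     # total
ss_setting = 1193.8              # setting
ss_effort_given_setting = 882.0  # effort given setting
rss_additive = 574.4             # residual of the additive model

# Multiple correlation ratio: both factors together
r2 = (tss - rss_additive) / tss
multiple_ratio = math.sqrt(r2)

# Partial correlation ratio: effort out of what setting left unexplained
left_by_setting = tss - ss_setting
partial_r2 = ss_effort_given_setting / left_by_setting
partial_ratio = math.sqrt(partial_r2)

print(f"explained by both factors: {r2:.3f}, ratio {multiple_ratio:.3f}")
print(f"left unexplained by setting: {left_by_setting:.1f}")
print(f"effort given setting: {partial_r2:.3f}, ratio {partial_ratio:.3f}")
```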
Parameter estimates from the additive model can be translated into fitted means using Equation 2.19 evaluated at the estimates. The body of Table 2.17 shows these values for our illustrative example. Note that we are able to estimate the expected CBR decline for strong programs in low social settings although there is no country in our dataset with that particular combination of attributes. Such extrapolation relies on the additive nature of the model and should be interpreted with caution. Comparison of observed and fitted values can yield useful insights into the adequacy of the model, a topic that will be pursued in more detail when we discuss regression diagnostics later in this chapter.
Table 2.17. Fitted Means Based on Two-Factor Additive Model of CBR Decline by Social Setting and Family Planning Effort
Setting | Effort: Weak | Effort: Moderate | Effort: Strong | All |
Low | 5.38 | 9.22 | 26.05 | 13.77 |
Medium | 3.70 | 7.54 | 24.37 | 12.08 |
High | 7.77 | 11.60 | 28.44 | 16.15 |
All | 5.91 | 9.75 | 26.59 | 14.30 |
Table 2.17 also shows column (and row) means, representing expected CBR declines by effort (and setting) after adjusting for the other factor. The column means are calculated as weighted averages of the cell means in each column, with weights given by the total number of countries in each category of setting. In symbols
\[ \hat{\mu}_{.j} = \sum_i n_{i.} \hat{\mu}_{ij}/ n, \]where we have used a dot as a subscript placeholder so \( n_{i.} \) is the number of observations in row \( i \), \( n \) is the total number of observations, and \( \hat{\mu}_{.j} \) is the estimated mean for column \( j \).
The resulting estimates may be interpreted as standardized means; they estimate the CBR decline that would be expected at each level of effort if those countries had the same distribution of social setting as the total sample. (The column means can also be calculated by using the fitted model to predict CBR decline for each observation with the dummies representing social setting held fixed at their sample averages and all other terms kept as observed. This construction helps reinforce their interpretation in terms of predicted CBR decline at various levels of effort adjusted for setting.)
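The standardization can be sketched as follows, taking the estimates of Table 2.15 and the cell counts of Table 2.13 as given. The array names are our own, and small discrepancies in the last digit relative to Table 2.17 reflect the rounding of the inputs.

```python
import numpy as np

# Estimates from Table 2.15 (reference cell: low setting, weak effort)
mu = 5.379
alpha = np.array([0.0, -1.681, 2.388])  # setting: low, medium, high
beta = np.array([0.0, 3.836, 20.672])   # effort: weak, moderate, strong

# Number of countries in each cell of Table 2.13 (rows: setting, columns: effort)
counts = np.array([[3, 4, 0],
                   [3, 1, 1],
                   [1, 1, 6]])

# Fitted cell means under the additive model: mu + alpha_i + beta_j
fitted = mu + alpha[:, None] + beta[None, :]

n = counts.sum()
n_row = counts.sum(axis=1)  # countries per setting category
n_col = counts.sum(axis=0)  # countries per effort category

# Column means weight each row by the number of countries in that setting;
# row means weight each column by the number of countries at that effort level
col_means = n_row @ fitted / n
row_means = fitted @ n_col / n
overall = n_row @ fitted @ n_col / n**2

print(np.round(fitted, 2))
print("adjusted means by effort: ", np.round(col_means, 2))
print("adjusted means by setting:", np.round(row_means, 2))
print("overall:", round(float(overall), 2))
```

Note that the empty cell (strong programs in low settings) still receives a fitted value, which is the extrapolation discussed above.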
Table 2.18. CBR Decline by Family Planning Effort Before and After Adjustment for Social Setting
Effort | CBR Decline: Unadjusted | CBR Decline: Adjusted |
Weak | 5.00 | 5.91 |
Moderate | 9.33 | 9.75 |
Strong | 27.86 | 26.59 |
Standardized means may be useful in presenting the results of a regression analysis to a non-technical audience, as done in Table 2.18. The column labelled unadjusted shows the observed mean CBR decline by level of effort. The difference of 23 points between strong and weak programs may be due to program effort, but could also reflect differences in social setting. The column labelled adjusted corrects for compositional differences in social setting using the additive model. The difference of 21 points may be interpreted as an effect of program effort net of social setting.
Multiple Classification Analysis (MCA), a technique that has enjoyed some popularity in Sociology, turns out to be just another name for the two factor additive model discussed in this section (and more generally, for multi-factor additive models). A nice feature of MCA, however, is a tradition of presenting the results of the analysis in a table that contains
the gross effect of each of the factors, calculated using a one-factor model under the ‘usual’ restrictions, together with the corresponding correlation ratios (called ‘eta’ coefficients), and
the net effect of each factor, calculated using a two-factor additive model under the ‘usual’ restrictions, together with the corresponding partial correlation ratios (unfortunately called ‘beta’ coefficients).
Table 2.19 shows a multiple classification analysis of the program effort data that follows directly from the results obtained so far. Estimates for the additive model under the usual restrictions can be obtained from Table 2.17 as differences between the row and column means and the overall mean.
Table 2.19. Multiple Classification Analysis of CBR Decline by Social Setting and Family Planning Effort
Factor | Category | Gross Effect | Eta | Net Effect | Beta |
Setting | Low | \(-6.73\) | | \(-0.54\) | |
Setting | Medium | \(-5.70\) | | \(-2.22\) | |
Setting | High | 9.45 | | 1.85 | |
Setting | (all) | | 0.67 | | 0.24 |
Effort | Weak | \(-9.30\) | | \(-8.39\) | |
Effort | Moderate | \(-4.97\) | | \(-4.55\) | |
Effort | Strong | 13.56 | | 12.29 | |
Effort | (all) | | 0.88 | | 0.78 |
Total | | 14.30 | | 14.30 | |
The overall expected decline in the CBR is 14.3%. The effects of low, medium and high setting are substantially reduced after adjustment for effort, an attenuation reflected in the reduction of the correlation ratio from 0.67 to 0.24. On the other hand, the effects of weak, moderate and strong programs are slightly reduced after adjustment for social setting, as can be seen from correlation ratios of 0.88 and 0.78 before and after adjustment. The analysis indicates that the effects of effort are more pronounced and more resilient to adjustment than the effects of social setting.
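The entries of the MCA table can be assembled from quantities already in hand. In the sketch below the gross effects and eta coefficients are computed from the data of Table 2.13, while the net effects and beta coefficients reuse the adjusted means of Table 2.17 and the sums of squares quoted earlier. All names are ours, and the last digit may differ from Table 2.19 because the inputs are rounded.

```python
import numpy as np

# CBR declines from Table 2.13, keyed by (setting, effort); levels coded 0, 1, 2
cells = {
    (0, 0): [1, 0, 7], (0, 1): [21, 13, 4, 7],
    (1, 0): [10, 6, 2], (1, 1): [0], (1, 2): [25],
    (2, 0): [9], (2, 1): [11], (2, 2): [29, 29, 40, 21, 22, 29],
}
setting = np.array([i for (i, j), ys in cells.items() for _ in ys])
effort = np.array([j for (i, j), ys in cells.items() for _ in ys])
y = np.array([v for ys in cells.values() for v in ys], dtype=float)
grand = y.mean()
tss = float(np.sum((y - grand) ** 2))


def gross(codes):
    """Observed group means expressed as deviations from the grand mean."""
    return np.array([y[codes == k].mean() - grand for k in range(3)])


def eta(codes):
    """Correlation ratio: square root of between-group SS over total SS."""
    between = sum((codes == k).sum() * (y[codes == k].mean() - grand) ** 2
                  for k in range(3))
    return float(np.sqrt(between / tss))


gross_setting, gross_effort = gross(setting), gross(effort)
eta_setting, eta_effort = eta(setting), eta(effort)

# Net effects: adjusted means (margins of Table 2.17) minus the overall mean
net_setting = np.array([13.77, 12.08, 16.15]) - grand
net_effort = np.array([5.91, 9.75, 26.59]) - grand

# Beta coefficients: partial correlation ratios from the quoted sums of squares
beta_setting = float(np.sqrt(35.8 / (2650.2 - 2040.0)))   # setting given effort
beta_effort = float(np.sqrt(882.0 / (2650.2 - 1193.8)))   # effort given setting

print("setting gross", np.round(gross_setting, 2), "eta", round(eta_setting, 2))
print("setting net  ", np.round(net_setting, 2), "beta", round(beta_setting, 2))
print("effort gross ", np.round(gross_effort, 2), "eta", round(eta_effort, 2))
print("effort net   ", np.round(net_effort, 2), "beta", round(beta_effort, 2))
```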
The analysis so far has rested on the assumption of additivity. We now consider a more general model for the effects of two discrete factors on a continuous response, one that allows the effect of each factor to vary with the level of the other:
\[\tag{2.20}\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}.\]In this formulation the first three terms should be familiar: \( \mu \) is a constant, and \( \alpha_i \) and \( \beta_j \) are the main effects of levels \( i \) of the row factor and \( j \) of the column factor.
The new term \( (\alpha\beta)_{ij} \) is an interaction effect. It represents the effect of the combination of levels \( i \) and \( j \) of the row and column factors. (The notation \( (\alpha\beta) \) should be understood as a single symbol, not a product; we could have used \( \gamma_{ij} \) to denote the interaction, but the notation \( (\alpha\beta)_{ij} \) is more suggestive and reminds us that the term involves a combined effect.)
One difficulty with the model as defined so far is that it is grossly overparameterized. If the row and column factors have \( R \) and \( C \) levels, respectively, we have only \( RC \) possible cells but have introduced \( 1+R+C+RC \) parameters. Our preferred solution is an extension of the reference cell method, and sets to zero all parameters involving the first row or the first column in the two-way layout, so that \( \alpha_1 = \beta_1 = (\alpha\beta)_{1j} = (\alpha\beta)_{i1} = 0 \). The best way to understand the meaning of the remaining parameters is to study Table 2.20, which shows the structure of the means in a three by three layout.
Table 2.20. The Two-Factor Model With Interactions
Row | Column 1 | Column 2 | Column 3 |
1 | \(\mu\) | \(\mu+\beta_2\) | \(\mu+\beta_3\) |
2 | \(\mu+\alpha_2\) | \(\mu + \alpha_2 + \beta_2 + (\alpha\beta)_{22}\) | \(\mu + \alpha_2 + \beta_3 + (\alpha\beta)_{23}\) |
3 | \(\mu + \alpha_3\) | \(\mu + \alpha_3 + \beta_2 + (\alpha\beta)_{32}\) | \(\mu + \alpha_3 + \beta_3 + (\alpha\beta)_{33}\) |
Here \( \mu \) is the expected response in the reference cell, just as before. The main effects are now more specialized: \( \alpha_i \) is the effect of level \( i \) of the row factor, compared to level one, when the column factor is at level one, and \( \beta_j \) is the effect of level \( j \) of the column factor, compared to level one, when the row factor is at level one. The interaction term \( (\alpha\beta)_{ij} \) is the additional effect of level \( i \) of the row factor, compared to level one, when the column factor is at level \( j \) rather than one. This term can also be interpreted as the additional effect of level \( j \) of the column factor, compared to level one, when the row factor is at level \( i \) rather than one.
The key feature of this model is that the effect of a factor now depends on the levels of the other. For example the effect of level two of the row factor, compared to level one, is \( \alpha_2 \) in the first column, \( \alpha_2+(\alpha\beta)_{22} \) in the second column, and \( \alpha_2+(\alpha\beta)_{23} \) in the third column.
The resulting model is a special case of the general linear model where the model matrix \( \boldsymbol{X} \) has a column of ones to represent the constant, a set of \( R-1 \) dummy variables representing the row effects, a set of \( C-1 \) dummy variables representing the column effects, and a set of \( (R-1)(C-1) \) dummy variables representing the interactions.
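A quick count confirms that, after imposing the reference cell constraints, the model has exactly as many free parameters as there are cells:
\[ 1 + (R-1) + (C-1) + (R-1)(C-1) = RC. \]
In a three by three layout this gives \( 1 + 2 + 2 + 4 = 9 \) parameters for nine cells, so the model with interactions places no restrictions on the cell means.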
The easiest way to calculate the interaction dummies is as products of the row and column dummies. If \( r_i \) takes the value one for observations in row \( i \) and zero otherwise, and \( c_j \) takes the value one for observations in column \( j \) and zero otherwise, then the product \( r_i c_j \) takes the value one for observations that are in row \( i \) and column \( j \), and is zero for all others.
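The product construction can be sketched with NumPy broadcasting. The example uses the row and column codes implied by Table 2.13 (levels coded 0, 1, 2, so index 0 corresponds to level one in the text); the array names are ours.

```python
import numpy as np

# Cell counts from Table 2.13 (rows: setting, columns: effort)
n_cells = {(0, 0): 3, (0, 1): 4, (1, 0): 3, (1, 1): 1, (1, 2): 1,
           (2, 0): 1, (2, 1): 1, (2, 2): 6}
row = np.array([i for (i, j), k in n_cells.items() for _ in range(k)])
col = np.array([j for (i, j), k in n_cells.items() for _ in range(k)])

# r[:, i] is one for observations in row i; c[:, j] is one for column j
r = (row[:, None] == np.arange(3)).astype(int)
c = (col[:, None] == np.arange(3)).astype(int)

# inter[:, i, j] = r_i * c_j, which is one only in cell (i, j)
inter = r[:, :, None] * c[:, None, :]

# Summing each product over observations recovers the cell counts
counts = inter.sum(axis=0)
print(counts)

# Under the reference cell constraints only products that avoid the first
# row and the first column are kept: (R-1)(C-1) = 4 columns here
X_inter = inter[:, 1:, 1:].reshape(len(row), -1)
print(X_inter.shape)
```

Each observation has a one in at most one of the retained product columns, and observations in the first row or first column have zeros throughout.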
In order to fit this model to the program effort data we need to introduce one additional constraint because the cell corresponding to strong programs in low settings is empty. As a result, we cannot distinguish \( \beta_3 \) from \( \beta_3+(\alpha\beta)_{23} \). A simple solution is to set \( (\alpha\beta)_{23}=0 \). This constraint is easily implemented by dropping the corresponding dummy, which would be \( r_2 c_3 \) in the above notation.
The final model has eight parameters: the constant, two setting effects, two effort effects, and three (rather than four) interaction terms.
Table 2.21. Anova for Two-Factor Model with Interaction Effect for CBR Decline by Social Setting and Family Planning Effort
Source of variation | Sum of squares | Degrees of freedom | Mean square | \(F\)-ratio |
Setting | 1193.8 | 2 | 596.9 | 15.5 |
Effort\(|\)Setting | 882.0 | 2 | 441.0 | 11.5 |
Interaction | 113.6 | 3 | 37.9 | 1.0 |
Residual | 460.8 | 12 | 38.4 | |
Total | 2650.2 | 19 | | |
Fitting the model gives an \( \mbox{RSS} \) of 460.8 on 12 d.f. Combining this result with the anova for the additive model leads to the hierarchical anova in Table 2.21. The \( F \)-ratio for the interaction is 1.0 on three and 12 d.f. and is clearly not significant. Thus, we have no evidence to contradict the assumption of additivity, and we proceed on the basis that the effect of effort is the same at all social settings. Calculation and interpretation of the parameter estimates is left as an exercise.
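The interaction fit and the test of additivity can be sketched as follows. As before, this is an illustration using the data of Table 2.13 with our own variable names. The dummy \( r_2 c_3 \) is omitted, as described above, because of the empty cell.

```python
import numpy as np

# CBR declines from Table 2.13, keyed by (setting, effort); levels coded 0, 1, 2
cells = {
    (0, 0): [1, 0, 7], (0, 1): [21, 13, 4, 7],
    (1, 0): [10, 6, 2], (1, 1): [0], (1, 2): [25],
    (2, 0): [9], (2, 1): [11], (2, 2): [29, 29, 40, 21, 22, 29],
}
setting = np.array([i for (i, j), ys in cells.items() for _ in ys])
effort = np.array([j for (i, j), ys in cells.items() for _ in ys])
y = np.array([v for ys in cells.values() for v in ys], dtype=float)

one = np.ones(len(y))
r2, r3 = (setting == 1).astype(float), (setting == 2).astype(float)
c2, c3 = (effort == 1).astype(float), (effort == 2).astype(float)


def rss_of(X):
    """Residual sum of squares from a least squares fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ b) ** 2))


X_add = np.column_stack([one, r2, r3, c2, c3])

# All four product dummies give 9 columns but only 8 cells have data
X_all = np.column_stack([X_add, r2 * c2, r2 * c3, r3 * c2, r3 * c3])
rank_all = np.linalg.matrix_rank(X_all)

# Drop r2*c3, i.e. set (alpha beta)_23 = 0, to obtain a full-rank matrix
X_int = np.column_stack([X_add, r2 * c2, r3 * c2, r3 * c3])
rank_int = np.linalg.matrix_rank(X_int)

rss_add, rss_int = rss_of(X_add), rss_of(X_int)
df_int = len(y) - X_int.shape[1]
ss_inter = rss_add - rss_int
f_inter = (ss_inter / 3) / (rss_int / df_int)

print(f"rank with all products: {rank_all}; after dropping r2*c3: {rank_int}")
print(f"RSS = {rss_int:.1f} on {df_int} d.f.")
print(f"Interaction SS = {ss_inter:.1f} on 3 d.f., F = {f_inter:.2f}")
```

Because the eight retained parameters reproduce the eight observed cell means exactly, the residual sum of squares is simply the pooled within-cell variation.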
In our analysis of CBR decline we treated social setting and family planning effort as continuous variates with linear effects in Sections 2.4 and 2.5, and as discrete factors in Sections 2.6 and 2.7.
The fundamental difference between the two approaches hinges on the assumption of linearity. When we treat a predictor as a continuous variate we assume a linear effect. If the assumption is reasonable we attain a parsimonious fit, but if it is not reasonable we are forced to introduce transformations or higher-order polynomial terms, resulting in models which are often harder to interpret.
A reasonable alternative in these cases is to model the predictor as a discrete factor, an approach that allows arbitrary changes in the response from one category to another. This approach has the advantage of a simpler and more direct interpretation, but by grouping the predictor into categories we are not making full use of the information in the data.
In our example we found that social setting explained 45% of the variation in CBR declines when treated as a variate and 45% when treated as a factor with three levels. Both approaches give the same result, suggesting that the assumption of linearity of setting effects is reasonable.
On the other hand, family planning effort explained 64% when treated as a variate and 77% when treated as a factor with three levels. The difference suggests that we might be better off grouping effort into three categories. The reason, of course, is that the effect of effort is non-linear: CBR decline changes little as we move from weak to moderate programs, but rises steeply for strong programs.