Statistics and Population

MLwiN

Home 2 Levels 3 Levels Logit Poisson Bayes MLwiN

A Model With Serial Correlation

Repeated measurements often exhibit serial correlation, with outcomes closer it time more highly correlated that outcomes further apart. Multilevel models have an exchangeable error structure, where all outcomes in a group are equally correlated, but may be adapted to accomodate serial correlation.

We will analyze the example in Goldstein (1995, Sections 6.4 and 6.5 starting on page 91) using the MLwiN macros described in Yang et al (2001) MLwiN Macros for advanced multilevel modelling, which you may download from MLwiN macros.

A Seasonal Growth Model

The data are available in a worksheet supplied with MLwiN, oxboys.ws2, containing height measurements on 26 boys measured on nine occassions between the ages of 11 and 14. The data have already been stacked in long format. The variables of interest are height and age, which is modelled using a fourth-degree polynomial. Variable occ indicates the occassions and seas is the month when measures were taken.

If you retrieve the worksheet you will note that it already has defined the two-level structure and has fitted a simple 2-level model with a fourth-degree polynomial on age. You may want to type fixed and random and compare the results with Yang et al (2001). I will start from scratch to document all steps. Make sure you use the actual location of MLwiN on your computer.

retrieve c:\program files\mlwin1.10\oxboys.ws2
response "height"
ident 1 "occ" 2 "id"
expla 1 "cons" "age" "age2" "age3" "age4"
setv 1 "cons"
setv 2 "cons" "age" "age2"

Note that only the constant, linear and square terms on age are random at the student level. You may want to fit this model and compare your results with Yang et al (2001). I will go ahead and add a seasonality term. As noted in Goldstein (1995, p.92), if the seasonal component has amplitude a and phase g we can write

a cos(t+g) = a₁cos(t) - a₂sin(t).

For this data the second coefficient is very close to zero and will be omitted.

note add the seasonal term:
calc c10 = 3.1416 * 'seas'/6
cos c10 c11
name c11 'cos'
expl 1 'cos'
batch 1
start
fixed 
random

These are my results, reproducing Table 6.4 in Goldstein (1995, p.93)

fixed 
  PARAMETER            ESTIMATE     S. ERROR(U)   PREV. ESTIMATE
cons                      148.9         1.54              148.9 
age                       6.189       0.3485              6.187 
age2                      2.167       0.4504              2.167 
age3                     0.3918       0.1565             0.3918 
age4                     -1.553       0.4423             -1.553 
cos                     -0.2367      0.06769            -0.2367 
 
random 
LEV.  PARAMETER       (NCONV)    ESTIMATE    S. ERROR(U)  PREV. ESTIM     CORR.
-------------------------------------------------------------------------------
 2    cons     /cons     ( 2)       61.51           17.1        61.61         1
 2    age      /cons     ( 2)       7.979           3.02        7.999     0.613
 2    age      /age      ( 2)       2.754         0.7794        2.758         1
 2    age2     /cons     ( 2)       1.349          1.414        1.356     0.213
 2    age2     /age      ( 1)      0.8776         0.3432       0.8788     0.654
 2    age2     /age2     ( 1)      0.6547         0.2272       0.6549         1
-------------------------------------------------------------------------------
 1    cons     /cons     ( 1)      0.1991        0.02254       0.1991

Adding Serial Correlation

We will now consider a model where level-1 residuals for the same individual are correlated and the magnitude of the correlation depends on the time difference between the measurements. Specifically we will assume that

cov(e_t,j,e_t-s,j) = s²_e exp{- g s}

So the correlation between residuals s time-units apart is exp{- g s} and decays exponentially with s. (The correlation between outcomes will also involve level-2 residuals.)

To fit this model make sure you have downloaded the TS macros, and have adjusted the MLwiN settings to use the right files. As usual I will provide a script instead of using the GUI. First I define the folder where the macros are located and the files to be used for pre and post estimation. The time-series macros require the time variable to be called T. The other commands are explained in comments:

note the time variable must be called T
name c2 "T"
note define the location of the pre and post macros
fpath c:\program files\mlwiN1.10\ts
prefile pre.ts
postfile post.ts
note turn on time series switch
set b10 1
note put power in c185 (we use 1)
join c185 1 c185
note put the starting value in c201 (we use 20)
join c201 20 c201
note turn off switches for other models
set b11 0
set b12 0
set b13 0
note set maximum number of iterations to 50
maxi 50

At this stage I fitted the model interactively by pressing the Start button. The procedure did four iterations. The results are not all shown in the Equation window, so you have to use random to look at the random parameters. To ensure that s1 has converged press More and use random a couple of times.

Here are my results, in general agreement with Table 6.5 in Goldstein (1995, p. 93).

fixed 
  PARAMETER            ESTIMATE     S. ERROR(U)   PREV. ESTIMATE
cons                      148.9        1.539              148.9 
t                         6.191       0.3509              6.191 
age2                      2.163       0.4494              2.163 
age3                     0.3863        0.169             0.3863 
age4                     -1.548       0.4294             -1.548 
cos                      -0.236      0.06733             -0.236 
 
random 
LEV.  PARAMETER       (NCONV)    ESTIMATE    S. ERROR(U)  PREV. ESTIM     CORR.
-------------------------------------------------------------------------------
 2    cons     /cons     ( 8)       61.48          17.07        61.48         1
 2    t        /cons     ( 8)        7.93          2.992         7.93     0.618
 2    t        /t        ( 6)        2.68         0.7655         2.68         1
 2    age2     /cons     ( 5)       1.479          1.402        1.479     0.249
 2    age2     /t        ( 5)      0.8525         0.3357       0.8525     0.687
 2    age2     /age2     ( 5)      0.5745         0.2284       0.5746         1
 2    t(0)     *         ( 5)      0.2346         0.0433       0.2346 
 2    s1       *         ( 4)       6.908          2.082         6.91 
-------------------------------------------------------------------------------
 1    cons     /cons     ( 5)      0.2346         0.0433       0.2346

The interesting parameter is 6.908. To see what this implies in terms of serial correlation we compute exp{- g s} for S=0.25, 0.50, 0.75 and 1, which corresponds to observations 3, 6, 9 and 12 months apart:

join c50 .25 .5 .75 1 c50
calc c51 = expo(-6.908 * c50)
print c51

The result is

print c51 

1

 N =      4       
   1  0.17782      
   2  0.031619     
   3  0.0056224    
   4  0.00099976

We see that the correlation is 0.18 for residuals three months apart, and declines to 0.03 for residuals six months apart.