Contingency tables are sufficient for analyzing associations in two-way tables. Three-way and higher tables have many more associations that may be explored, necessitating a flexible method for generating the expected frequencies for the different hypotheses. Loglinear regression models are one such approach. Loglinear regression models have the general form

ln(Fij) = μ + λi^X + λj^Y + λij^XY

in which the log of the expected cell frequency Fij is modeled as a sum of a constant plus effects for each variable and their interactions.
We will illustrate loglinear regression models for contingency tables using the acm dataset from Agresti. The acm file contains information about alcohol, cigarette, and marijuana use. Let's begin with a two-variable example looking at the relation between cigarette smoking and marijuana use. We will start off with a traditional contingency table analysis that tests the independence of cigarette and marijuana use.
Two-Variable Example
use http://www.philender.com/courses/data/acm, clear
describe
Contains data from http://www.gseis.ucla.edu/courses/data/acm.dta
obs: 8
vars: 4 28 Nov 2001 14:28
size: 72 (100.0% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
a byte %8.0g yn alcohol use
c byte %8.0g yn cigarette use
m byte %8.0g yn marijuana use
freq int %8.0g
-------------------------------------------------------------------------------
list
a c m freq
1. no no yes 2
2. no yes yes 3
3. no yes no 43
4. yes no yes 44
5. no no no 279
6. yes no no 456
7. yes yes no 538
8. yes yes yes 911
tabulate c m [fw=freq], exp all
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
+--------------------+
cigarette | marijuana use
use | no yes | Total
-----------+----------------------+----------
no | 735 46 | 781
| 451.6 329.4 | 781.0
-----------+----------------------+----------
yes | 581 914 | 1,495
| 864.4 630.6 | 1,495.0
-----------+----------------------+----------
Total | 1,316 960 | 2,276
| 1,316.0 960.0 | 2,276.0
Pearson chi2(1) = 642.0347 Pr = 0.000
likelihood-ratio chi2(1) = 751.8083 Pr = 0.000
Cramer's V = 0.5311
gamma = 0.9235 ASE = 0.012
Kendall's tau-b = 0.5311 ASE = 0.014
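The expected frequencies and both test statistics above follow directly from the marginal totals; here is a minimal sketch of the same arithmetic in Python (language chosen only for illustration; the computation is language-agnostic):

```python
from math import log

# Observed 2x2 table: rows = cigarette use (no, yes), columns = marijuana use (no, yes)
obs = [[735, 46],
       [581, 914]]

n = sum(sum(row) for row in obs)                      # 2276
row_tot = [sum(row) for row in obs]                   # 781, 1495
col_tot = [obs[0][j] + obs[1][j] for j in range(2)]   # 1316, 960

# Expected frequency under independence: E_ij = row_i * col_j / n
exp_freq = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]

# Pearson X^2 and likelihood ratio G^2
x2 = sum((obs[i][j] - exp_freq[i][j]) ** 2 / exp_freq[i][j]
         for i in range(2) for j in range(2))
g2 = 2 * sum(obs[i][j] * log(obs[i][j] / exp_freq[i][j])
             for i in range(2) for j in range(2))

print(round(x2, 4), round(g2, 4))  # 642.0347 751.8083, matching the Stata output
```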
ipf [fw=freq], fit(c+m) exp
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c
marginal model 2 varlist : m
unique varlist c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1
Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 751.8083 p-value = 0.000
Pearson Statistic X^2 = 642.0347 p-value = 0.000
c m Efreq Ofreq prob
1 1 630.57996 914 .27705622
1 2 864.42004 581 .37979791
2 1 329.42004 46 .1447364
2 2 451.57996 735 .19840947
ipf [fw=freq], fit(c*m) exp
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
unique varlist c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 0
Goodness of Fit Tests
---------------------
df = 0
Likelihood Ratio Statistic G^2 = 0.0000 p-value = .
Pearson Statistic X^2 = 0.0000 p-value = .
c m Efreq Ofreq prob
1 1 914 914 .40158172
1 2 581 581 .25527241
2 1 46 46 .0202109
2 2 735 735 .32293497
The last model, using c*m, perfectly reproduces the observed frequencies because
it uses information concerning both the joint and marginal distributions of cigarettes and
marijuana. It is equivalent to testing c+m+c*m and is called the saturated model since
it is fully deterministic. The loglinear model for independence looks like this:

ln(Fij) = μ + λi^C + λj^M

The loglinear model for the saturated case looks like this:

ln(Fij) = μ + λi^C + λj^M + λij^CM
When you get to models with three or more variables, the questions that can be asked are more interesting than with only two variables. If we add alcohol to our model then testing a+c+m is equivalent to running a 3-way contingency table test for independence. This is known as the mutual independence model. It isn't really very interesting and fortunately isn't all that common either.
The loglinear model for mutual independence would look something like this:

ln(Fijk) = μ + λi^A + λj^C + λk^M

A shorthand notation for this model would be (a,c,m).
Consider the following model:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λik^AM

This is the joint independence model, denoted (c,am), which asserts the two-way independence between cigarettes and the four combinations of levels of alcohol and marijuana.
Consider the following conditional independence model:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λjk^CM + λik^AM

This model accounts for the association between cigarettes and marijuana controlling for alcohol and for the association between alcohol and marijuana controlling for cigarettes. It specifies the conditional independence of alcohol and cigarettes while controlling for marijuana and can be denoted (cm,am).
A model that contains all of the two-way interactions is known as a homogeneous association model:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λij^AC + λik^AM + λjk^CM

In this model each pair of variables is conditionally dependent given the third variable; it is denoted (am,cm,ac). The saturated model, (acm), with all possible interactions, looks like this:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λij^AC + λik^AM + λjk^CM + λijk^ACM
Three-Variable Example
table c m [fw=freq], by(a)
----------------------
alcohol |
use and | marijuana
cigarette | use
use | no yes
----------+-----------
no |
no | 279 2
yes | 43 3
----------+-----------
yes |
no | 456 44
yes | 538 911
----------------------
ipf [fw=freq], fit(a+c+m) /* mutual independence model */
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a
marginal model 2 varlist : c
marginal model 3 varlist : m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 4
Goodness of Fit Tests
---------------------
df = 4
Likelihood Ratio Statistic G^2 = 1286.0199 p-value = 0.000
Pearson Statistic X^2 = 1411.3860 p-value = 0.000
ipf [fw=freq], fit(a*c+m)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a c
marginal model 2 varlist : m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3
Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 843.8267 p-value = 0.000
Pearson Statistic X^2 = 704.9071 p-value = 0.000
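For a joint independence model such as a*c+m, the maximum likelihood expected counts have a closed form, E(acm) = n(ac+) × n(++m) / N, so the G^2 above can be reproduced directly; a Python sketch:

```python
from math import log

# acm cell counts: key = (a, c, m), with 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}
n = sum(freq.values())  # 2276

# marginal totals n(ac+) and n(++m)
n_ac, n_m = {}, {}
for (a, c, m), f in freq.items():
    n_ac[a, c] = n_ac.get((a, c), 0) + f
    n_m[m] = n_m.get(m, 0) + f

# G^2 = 2 * sum O * ln(O / E) with E = n_ac * n_m / n
g2 = 2 * sum(f * log(f * n / (n_ac[a, c] * n_m[m]))
             for (a, c, m), f in freq.items())
print(round(g2, 4))  # 843.8267, matching the ipf output
```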
ipf [fw=freq], fit(a*m+c)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a m
marginal model 2 varlist : c
unique varlist a m c
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3
Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 939.5626 p-value = 0.000
Pearson Statistic X^2 = 824.1630 p-value = 0.000
ipf [fw=freq], fit(c*m+a)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
marginal model 2 varlist : a
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3
Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 534.2117 p-value = 0.000
Pearson Statistic X^2 = 505.5977 p-value = 0.000
ipf [fw=freq], fit(c*m+a*m)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
marginal model 2 varlist : a m
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2
Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 187.7543 p-value = 0.000
Pearson Statistic X^2 = 177.6149 p-value = 0.000
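The conditional independence model c*m+a*m also has closed-form expected counts, E(acm) = n(a+m) × n(+cm) / n(++m), since a and c are conditionally independent given m; a Python sketch reproducing the G^2 above:

```python
from math import log

# acm cell counts: key = (a, c, m), with 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}

# marginal totals n(a+m), n(+cm), n(++m)
n_am, n_cm, n_m = {}, {}, {}
for (a, c, m), f in freq.items():
    n_am[a, m] = n_am.get((a, m), 0) + f
    n_cm[c, m] = n_cm.get((c, m), 0) + f
    n_m[m] = n_m.get(m, 0) + f

# a and c conditionally independent given m: E = n_am * n_cm / n_m
g2 = 2 * sum(f * log(f * n_m[m] / (n_am[a, m] * n_cm[c, m]))
             for (a, c, m), f in freq.items())
print(round(g2, 4))  # 187.7543, matching the ipf output
```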
ipf [fw=freq], fit(c*a+m*a)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c a
marginal model 2 varlist : m a
unique varlist c a m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2
Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 497.3693 p-value = 0.000
Pearson Statistic X^2 = 443.7611 p-value = 0.000
ipf [fw=freq], fit(a*c+c*m)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a c
marginal model 2 varlist : c m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2
Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 92.0184 p-value = 0.000
Pearson Statistic X^2 = 80.8148 p-value = 0.000
ipf [fw=freq], fit(c*m+a*m+a*c) /* homogeneous association model */
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
marginal model 2 varlist : a m
marginal model 3 varlist : a c
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1
Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 0.3740 p-value = 0.541
Pearson Statistic X^2 = 0.4011 p-value = 0.527
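The homogeneous association model has no closed-form expected counts, which is where iterative proportional fitting (the "ipf" of the command name) comes in: the fitted table is repeatedly rescaled to match each observed margin in turn until it converges. A minimal Python implementation, for illustration only:

```python
from math import log

# acm cell counts: key = (a, c, m), with 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}

# the three marginal models c*m, a*m, a*c as projections of a cell index
margins = [lambda t: (t[1], t[2]),   # c m
           lambda t: (t[0], t[2]),   # a m
           lambda t: (t[0], t[1])]   # a c

def margin_totals(table, proj):
    out = {}
    for cell, v in table.items():
        out[proj(cell)] = out.get(proj(cell), 0) + v
    return out

targets = [margin_totals(freq, proj) for proj in margins]

# iterative proportional fitting: start flat, rescale to each observed margin
fit = {cell: 1.0 for cell in freq}
for _ in range(100):
    for proj, target in zip(margins, targets):
        current = margin_totals(fit, proj)
        for cell in fit:
            fit[cell] *= target[proj(cell)] / current[proj(cell)]

g2 = 2 * sum(f * log(f / fit[cell]) for cell, f in freq.items())
print(round(g2, 4))  # 0.374, matching the ipf output
```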
ipf [fw=freq], fit(a*c*m) /* fully saturated model */
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a c m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 0
Goodness of Fit Tests
---------------------
df = 0
Likelihood Ratio Statistic G^2 = 0.0000 p-value = .
Pearson Statistic X^2 = 0.0000 p-value = .
Let's collect all of the results together into a table.
model          df    G2        X2
a+c+m           4  1286.02   1411.39   mutual independence
a*c+m           3   843.83    704.91   joint independence
a*m+c           3   939.56    824.16   joint independence
c*m+a           3   534.21    505.60   joint independence
c*m+a*m         2   187.75    177.61   conditional independence
c*a+m*a         2   497.37    443.76   conditional independence
a*c+c*m         2    92.02     80.81   conditional independence
c*m+a*m+a*c     1     0.37      0.40   homogeneous association
a*c*m           0     0.00      0.00   fully saturated

It is possible to test λac = 0 in the (c*m+a*m+a*c) model. This is accomplished by computing the difference in the G2's, which is equivalent to a likelihood ratio test: here G2 = 187.75 - 0.37 = 187.38 with 2 - 1 = 1 degree of freedom.
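The numbers needed for the λac test are in the table; a Python sketch of the difference test (the chi-square(1) upper tail is computed with the error function, since chi-square with 1 df is the square of a standard normal):

```python
from math import erf, sqrt

# nested models from the table: (c*m+a*m) versus (c*m+a*m+a*c)
g2_reduced, df_reduced = 187.7543, 2
g2_full, df_full = 0.3740, 1

delta_g2 = g2_reduced - g2_full   # likelihood ratio statistic for lambda_ac = 0
delta_df = df_reduced - df_full   # 1

# chi-square(1) upper-tail probability: P(X > x) = 1 - erf(sqrt(x / 2))
p = 1 - erf(sqrt(delta_g2 / 2))

print(round(delta_g2, 2), delta_df)  # 187.38 1
```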
Using the Poisson Distribution
Loglinear regression models can also be estimated using the Poisson distribution. In this example we will use the glm command with family(poisson) and link(log) to estimate the models for (a+c+m), (a*c+c*m+a*m) and (c*m+a*m). We will be discussing generalized linear models, glm, later in the course.
glm freq a c m, family(pois) link(log) nolog
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 4
Scale parameter = 1
Deviance = 1286.019954 (1/df) Deviance = 321.505
Pearson = 1411.386007 (1/df) Pearson = 352.8465
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -667.5316914 AIC = 167.8829
BIC = 1277.702188
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | 1.785112 .0597594 29.87 0.000 1.667985 1.902238
c | .6493063 .0441509 14.71 0.000 .5627721 .7358406
m | -.3154188 .0424446 -7.43 0.000 -.3986087 -.2322289
_cons | 4.172538 .0649589 64.23 0.000 4.045221 4.299855
------------------------------------------------------------------------------
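The constant and coefficients of this fit can be checked against the marginal totals (assuming, as the interpretation below does, that the variables are coded 0 = no, 1 = yes); a Python sketch:

```python
from math import log

# acm cell counts: key = (a, c, m), assumed coded 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}
n = sum(freq.values())  # 2276

n_a0 = sum(f for (a, c, m), f in freq.items() if a == 0)  # 327
n_c0 = sum(f for (a, c, m), f in freq.items() if c == 0)  # 781
n_m0 = sum(f for (a, c, m), f in freq.items() if m == 0)  # 1316

# mutual independence expected count for the reference cell (0, 0, 0)
e000 = n_a0 * n_c0 * n_m0 / n ** 2
# coefficient for c: log of the ratio of the two cigarette margins
b_c = log((n - n_c0) / n_c0)

print(round(e000, 4), round(log(e000), 6), round(b_c, 6))
# 64.8799 4.172538 0.649306  (= exp(_cons), _cons, and the coef. for c)
```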
tabulate c m [fw=freq], cell
cigarette | marijuana use
use | no yes | Total
-----------+----------------------+----------
no | 735 46 | 781
| 32.29 2.02 | 34.31
-----------+----------------------+----------
yes | 581 914 | 1,495
| 25.53 40.16 | 65.69
-----------+----------------------+----------
Total | 1,316 960 | 2,276
| 57.82 42.18 | 100.00
prtesti 781 .3431 1495 .6569
Two-sample test of proportion x: Number of obs = 781
y: Number of obs = 1495
------------------------------------------------------------------------------
Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .3431 .0169877 .3098047 .3763953
y | .6569 .0122783 .6328349 .6809651
-------------+----------------------------------------------------------------
diff | -.3138 .0209604 -.3548817 -.2727183
| under Ho: .0219682 -14.28 0.000
------------------------------------------------------------------------------
Ho: proportion(x) - proportion(y) = diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
z = -14.284 z = -14.284 z = -14.284
P < z = 0.0000 P > |z| = 0.0000 P > z = 1.0000
The constant in the glm model is the log of the expected frequency for cell a=0, c=0, m=0; thus exp(4.172538) = 64.879909. The coefficient for c is the difference in the logs of the expected frequencies from c=0 to c=1 for a=0 and m=0. The test of the coefficient is the test of the difference in the marginal proportions for cigarette.

Next we will need to create three interaction terms, a*m, a*c and c*m. The response variable will be freq, the number of observations in each cell.
generate am = a*m
generate ac = a*c
generate cm = c*m
glm freq a c m ac cm am, family(pois) link(log)
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 1
Scale param = 1
Deviance = .3739858701 (1/df) Deviance = .3739859
Pearson = .4011005168 (1/df) Pearson = .4011005
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -24.70870712 AIC = 7.927177
BIC = -14.18210492
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | .487719 .0757672 6.44 0.000 .339218 .63622
c | -1.886669 .162697 -11.60 0.000 -2.205549 -1.567789
m | -5.309042 .475197 -11.17 0.000 -6.240411 -4.377673
ac | 2.054534 .1740643 11.80 0.000 1.713374 2.395694
cm | 2.847889 .1638394 17.38 0.000 2.52677 3.169009
am | 2.986014 .464678 6.43 0.000 2.075262 3.896767
_cons | 5.63342 .0597008 94.36 0.000 5.516409 5.750432
------------------------------------------------------------------------------
lrtest, saving(0)
glm freq a c m am cm, family(pois) link(log)
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 2
Scale param = 1
Deviance = 187.7543029 (1/df) Deviance = 93.87715
Pearson = 177.6148606 (1/df) Pearson = 88.80743
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -118.3988656 AIC = 31.09972
BIC = 175.2776537
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | 1.127186 .064122 17.58 0.000 1.001509 1.252862
c | -.2351197 .0555132 -4.24 0.000 -.3439236 -.1263159
m | -6.620924 .4737127 -13.98 0.000 -7.549384 -5.692464
am | 4.125088 .4529445 9.11 0.000 3.237333 5.012843
cm | 3.224309 .1609812 20.03 0.000 2.908792 3.539826
_cons | 5.19207 .060879 85.29 0.000 5.072749 5.311391
------------------------------------------------------------------------------
lrtest
Glm: likelihood-ratio test chi2(1) = 187.38
Prob > chi2 = 0.0000
Note that the Deviance and the Pearson are the same as the G2 and
χ2, respectively, in the table above.

Connection to Logistic Regression
We haven't begun our discussions of logistic regression yet, but this will serve as a demonstration that there is a relationship between logistic regression and loglinear regression models.
Logistic regression requires that we select one of the variables to be the response variable. Let's select m, marijuana.
use http://www.philender.com/courses/data/acm, clear
logit m a c ac [fw=freq]
Logit estimates Number of obs = 2276
LR chi2(3) = 843.83
Prob > chi2 = 0.0000
Log likelihood = -1127.7332 Pseudo R2 = 0.2723
------------------------------------------------------------------------------
m | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | -3.778783 1.402386 -2.69 0.007 -6.527409 -1.030157
c | -3.454498 .9857381 -3.50 0.000 -5.386509 -1.522487
ac | .5895107 .9423638 0.63 0.532 -1.257488 2.43651
_cons | 7.170455 1.441154 4.98 0.000 4.345845 9.995064
------------------------------------------------------------------------------
The likelihood ratio chi-square in this model is the same as in the a*c+m loglinear
regression model above.

hsbdemo Example

Here is an example derived from the hsbdemo dataset, in which write and read have been categorized by quartile and the data, along with female, contracted to frequencies.
clear
input f w r n
0 1 1 18
0 1 2 9
0 1 3 5
0 1 4 2
0 2 1 5
0 2 2 5
0 2 3 8
0 2 4 4
0 3 2 4
0 3 3 5
0 3 4 9
0 4 2 1
0 4 3 7
0 4 4 9
1 1 1 10
1 1 2 5
1 1 3 1
1 2 1 14
1 2 2 13
1 2 3 6
1 2 4 2
1 3 1 5
1 3 2 5
1 3 3 10
1 3 4 6
1 4 2 7
1 4 3 10
1 4 4 15
end
tab f w [fw=n], chi2 lr row
tab f r [fw=n], chi2 lr row
tab w r [fw=n], chi2 lr row
ipf [fw=n], fit(f+w+r) /* mutual independence */
ipf [fw=n], fit(f*w+r) /* joint independence */
ipf [fw=n], fit(f*r+w)
ipf [fw=n], fit(w*r+f)
ipf [fw=n], fit(f*w+f*r) /* conditional independence */
ipf [fw=n], fit(w*f+w*r)
ipf [fw=n], fit(r*w+r*f)
ipf [fw=n], fit(f*w+f*r+w*r) /* homogeneous association */
ipf [fw=n], fit(f*w*r) /* fully saturated */
summary of models
model df G2 X2
f+w+r 24 117.10 101.33 mutual independence
f*w+r 21 102.99 85.56 joint independence
f*r+w 21 115.44 102.94 joint independence
w*r+f 15 33.08 29.39 joint independence
f*w+f*r 18 101.32 84.14 conditional independence
w*f+w*r 12 18.96 16.24 conditional independence (n.s.)
r*w+r*f 12 31.42 28.03 conditional independence
f*w+f*r+w*r 9 6.53 5.67 homogeneous association (n.s.)
f*w*r 0 0.0 0.0 fully saturated
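As a check, the mutual independence line of the table can be reproduced the same way as before (assuming the four cells absent from the input are sampling zeros, which contribute nothing to G2); a Python sketch:

```python
from math import log

# hsbdemo frequencies: key = (f, w, r); the four absent cells are zero
freq = {
    (0,1,1): 18, (0,1,2): 9, (0,1,3): 5, (0,1,4): 2,
    (0,2,1): 5, (0,2,2): 5, (0,2,3): 8, (0,2,4): 4,
    (0,3,2): 4, (0,3,3): 5, (0,3,4): 9,
    (0,4,2): 1, (0,4,3): 7, (0,4,4): 9,
    (1,1,1): 10, (1,1,2): 5, (1,1,3): 1,
    (1,2,1): 14, (1,2,2): 13, (1,2,3): 6, (1,2,4): 2,
    (1,3,1): 5, (1,3,2): 5, (1,3,3): 10, (1,3,4): 6,
    (1,4,2): 7, (1,4,3): 10, (1,4,4): 15,
}
n = sum(freq.values())  # 200

# one-way marginal totals
n_f, n_w, n_r = {}, {}, {}
for (f, w, r), x in freq.items():
    n_f[f] = n_f.get(f, 0) + x
    n_w[w] = n_w.get(w, 0) + x
    n_r[r] = n_r.get(r, 0) + x

# mutual independence: E_fwr = n_f * n_w * n_r / n^2; zero cells add 0 to G^2
g2 = 2 * sum(x * log(x * n * n / (n_f[f] * n_w[w] * n_r[r]))
             for (f, w, r), x in freq.items())
print(round(g2, 1))  # 117.1, matching the f+w+r line of the table
```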
xi3: poisson n i.w*i.f i.w*i.r
fitstat, saving(m1)
xi3: poisson n i.w*i.f i.w*i.r i.f*i.r
fitstat, using(m1)
Measures of Fit for poisson of n
Current Saved Difference
Model: poisson poisson
N: 28 28 0
Log-Lik Intercept Only: -84.603 -84.603 0.000
Log-Lik Full Model: -52.752 -56.749 3.997
D: 105.504(6) 113.498(9) 7.994(3)
LR: 63.702(21) 55.707(18) 7.994(3)
Prob > LR: 0.000 0.000 0.046
McFadden's R2: 0.376 0.329 0.047
McFadden's Adj R2: 0.116 0.105 0.012
Maximum Likelihood R2: 0.897 0.863 0.034
Cragg & Uhler's R2: 0.899 0.865 0.034
AIC: 5.339 5.411 -0.071
AIC*n: 149.504 151.498 -1.994
BIC: 85.510 83.508 2.002
BIC': 6.275 4.272 2.002
Difference of 2.002 in BIC' provides positive support for saved model.
Note: p-value for difference in LR is only valid if models are nested.
Categorical Data Analysis Course
Phil Ender