Contingency tables are sufficient for analyzing associations in two-way tables. Three-way and higher tables have many more associations that may be explored, necessitating a flexible method for generating the expected frequencies for the different hypotheses. Loglinear regression models are one such approach. Loglinear regression models have the general form

ln(Fij) = μ + λi^X + λj^Y + λij^XY

in which the log of the expected cell frequency Fij is modeled as a sum of a constant plus effects for each variable and their interactions.
We will illustrate loglinear regression models for contingency tables using the acm dataset from Agresti. The acm file contains information about alcohol, cigarette, and marijuana use. Let's begin with a two-variable example looking at the relation between cigarette smoking and marijuana use. We will start off with a traditional contingency table analysis that tests the independence of cigarette and marijuana use.
Two-Variable Example
use http://www.philender.com/courses/data/acm, clear
describe
Contains data from http://www.gseis.ucla.edu/courses/data/acm.dta
obs: 8
vars: 4 28 Nov 2001 14:28
size: 72 (100.0% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
a byte %8.0g yn alcohol use
c byte %8.0g yn cigarette use
m byte %8.0g yn marijuana use
freq int %8.0g
-------------------------------------------------------------------------------
list
a c m freq
1. no no yes 2
2. no yes yes 3
3. no yes no 43
4. yes no yes 44
5. no no no 279
6. yes no no 456
7. yes yes no 538
8. yes yes yes 911
tabulate c m [fw=freq], exp all
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
+--------------------+
cigarette | marijuana use
use | no yes | Total
-----------+----------------------+----------
no | 735 46 | 781
| 451.6 329.4 | 781.0
-----------+----------------------+----------
yes | 581 914 | 1,495
| 864.4 630.6 | 1,495.0
-----------+----------------------+----------
Total | 1,316 960 | 2,276
| 1,316.0 960.0 | 2,276.0
Pearson chi2(1) = 642.0347 Pr = 0.000
likelihood-ratio chi2(1) = 751.8083 Pr = 0.000
Cramer's V = 0.5311
gamma = 0.9235 ASE = 0.012
Kendall's tau-b = 0.5311 ASE = 0.014
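The expected frequencies and both test statistics above follow directly from the marginal totals; here is a minimal sketch of the same arithmetic in Python (language chosen only for illustration; the computation is language-agnostic):

```python
from math import log

# Observed 2x2 table: rows = cigarette use (no, yes), columns = marijuana use (no, yes)
obs = [[735, 46],
       [581, 914]]

n = sum(sum(row) for row in obs)                      # 2276
row_tot = [sum(row) for row in obs]                   # 781, 1495
col_tot = [obs[0][j] + obs[1][j] for j in range(2)]   # 1316, 960

# Expected frequency under independence: E_ij = row_i * col_j / n
exp_freq = [[row_tot[i] * col_tot[j] / n for j in range(2)] for i in range(2)]

# Pearson X^2 and likelihood ratio G^2
x2 = sum((obs[i][j] - exp_freq[i][j]) ** 2 / exp_freq[i][j]
         for i in range(2) for j in range(2))
g2 = 2 * sum(obs[i][j] * log(obs[i][j] / exp_freq[i][j])
             for i in range(2) for j in range(2))

print(round(x2, 4), round(g2, 4))  # 642.0347 751.8083, matching the Stata output
```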
ipf [fw=freq], fit(c+m) exp
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c
marginal model 2 varlist : m
unique varlist c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1
Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 751.8083 p-value = 0.000
Pearson Statistic X^2 = 642.0347 p-value = 0.000
c m Efreq Ofreq prob
1 1 630.57996 914 .27705622
1 2 864.42004 581 .37979791
2 1 329.42004 46 .1447364
2 2 451.57996 735 .19840947
ipf [fw=freq], fit(c*m) exp
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
unique varlist c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 0
Goodness of Fit Tests
---------------------
df = 0
Likelihood Ratio Statistic G^2 = 0.0000 p-value = .
Pearson Statistic X^2 = 0.0000 p-value = .
c m Efreq Ofreq prob
1 1 914 914 .40158172
1 2 581 581 .25527241
2 1 46 46 .0202109
2 2 735 735 .32293497
The last model, using c*m, perfectly reproduces the observed frequencies because
it uses information concerning both the joint and marginal distributions of cigarettes and
marijuana. It is equivalent to testing c+m+c*m and is called the saturated model since
it is fully deterministic. The loglinear model for independence looks like this:

ln(Fij) = μ + λi^C + λj^M

The loglinear model for the saturated case looks like this:

ln(Fij) = μ + λi^C + λj^M + λij^CM
When you get to models with three or more variables, the questions that can be asked are more interesting than with only two variables. If we add alcohol to our model then testing a+c+m is equivalent to running a 3-way contingency table test for independence. This is known as the mutual independence model. It isn't really very interesting and fortunately isn't all that common either.
The loglinear model for mutual independence would look something like this:

ln(Fijk) = μ + λi^A + λj^C + λk^M

A shorthand notation for this model would be (a,c,m).
Consider the following model:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λik^AM

This is the joint independence model, denoted (c,am), which asserts the two-way independence between cigarettes and the four combinations of levels of alcohol and marijuana.
Consider the following conditional independence model:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λjk^CM + λik^AM

This model accounts for the association between cigarettes and marijuana controlling for alcohol and for the association between alcohol and marijuana controlling for cigarettes. It specifies the conditional independence of alcohol and cigarettes while controlling for marijuana and can be denoted (cm,am).
A model that contains all of the two-way interactions is known as a homogeneous association model:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λij^AC + λik^AM + λjk^CM

In this model each pair of variables is conditionally dependent given the third variable; it is denoted (am,cm,ac). The saturated model, (acm), with all possible interactions, looks like this:

ln(Fijk) = μ + λi^A + λj^C + λk^M + λij^AC + λik^AM + λjk^CM + λijk^ACM
Three-Variable Example
table c m [fw=freq], by(a)
----------------------
alcohol |
use and | marijuana
cigarette | use
use | no yes
----------+-----------
no |
no | 279 2
yes | 43 3
----------+-----------
yes |
no | 456 44
yes | 538 911
----------------------
ipf [fw=freq], fit(a+c+m) /* mutual independence model */
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a
marginal model 2 varlist : c
marginal model 3 varlist : m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 4
Goodness of Fit Tests
---------------------
df = 4
Likelihood Ratio Statistic G^2 = 1286.0199 p-value = 0.000
Pearson Statistic X^2 = 1411.3860 p-value = 0.000
ipf [fw=freq], fit(a*c+m)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a c
marginal model 2 varlist : m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3
Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 843.8267 p-value = 0.000
Pearson Statistic X^2 = 704.9071 p-value = 0.000
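For a joint independence model such as a*c+m, the maximum likelihood expected counts have a closed form, E(acm) = n(ac+) × n(++m) / N, so the G^2 above can be reproduced directly; a Python sketch:

```python
from math import log

# acm cell counts: key = (a, c, m), with 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}
n = sum(freq.values())  # 2276

# marginal totals n(ac+) and n(++m)
n_ac, n_m = {}, {}
for (a, c, m), f in freq.items():
    n_ac[a, c] = n_ac.get((a, c), 0) + f
    n_m[m] = n_m.get(m, 0) + f

# G^2 = 2 * sum O * ln(O / E) with E = n_ac * n_m / n
g2 = 2 * sum(f * log(f * n / (n_ac[a, c] * n_m[m]))
             for (a, c, m), f in freq.items())
print(round(g2, 4))  # 843.8267, matching the ipf output
```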
ipf [fw=freq], fit(a*m+c)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a m
marginal model 2 varlist : c
unique varlist a m c
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3
Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 939.5626 p-value = 0.000
Pearson Statistic X^2 = 824.1630 p-value = 0.000
ipf [fw=freq], fit(c*m+a)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
marginal model 2 varlist : a
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 3
Goodness of Fit Tests
---------------------
df = 3
Likelihood Ratio Statistic G^2 = 534.2117 p-value = 0.000
Pearson Statistic X^2 = 505.5977 p-value = 0.000
ipf [fw=freq], fit(c*m+a*m)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
marginal model 2 varlist : a m
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2
Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 187.7543 p-value = 0.000
Pearson Statistic X^2 = 177.6149 p-value = 0.000
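The conditional independence model c*m+a*m also has closed-form expected counts, E(acm) = n(a+m) × n(+cm) / n(++m), since a and c are conditionally independent given m; a Python sketch reproducing the G^2 above:

```python
from math import log

# acm cell counts: key = (a, c, m), with 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}

# marginal totals n(a+m), n(+cm), n(++m)
n_am, n_cm, n_m = {}, {}, {}
for (a, c, m), f in freq.items():
    n_am[a, m] = n_am.get((a, m), 0) + f
    n_cm[c, m] = n_cm.get((c, m), 0) + f
    n_m[m] = n_m.get(m, 0) + f

# a and c conditionally independent given m: E = n_am * n_cm / n_m
g2 = 2 * sum(f * log(f * n_m[m] / (n_am[a, m] * n_cm[c, m]))
             for (a, c, m), f in freq.items())
print(round(g2, 4))  # 187.7543, matching the ipf output
```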
ipf [fw=freq], fit(c*a+m*a)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c a
marginal model 2 varlist : m a
unique varlist c a m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2
Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 497.3693 p-value = 0.000
Pearson Statistic X^2 = 443.7611 p-value = 0.000
ipf [fw=freq], fit(a*c+c*m)
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a c
marginal model 2 varlist : c m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 2
Goodness of Fit Tests
---------------------
df = 2
Likelihood Ratio Statistic G^2 = 92.0184 p-value = 0.000
Pearson Statistic X^2 = 80.8148 p-value = 0.000
ipf [fw=freq], fit(c*m+a*m+a*c) /* homogeneous association model */
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : c m
marginal model 2 varlist : a m
marginal model 3 varlist : a c
unique varlist c m a
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 1
Goodness of Fit Tests
---------------------
df = 1
Likelihood Ratio Statistic G^2 = 0.3740 p-value = 0.541
Pearson Statistic X^2 = 0.4011 p-value = 0.527
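The homogeneous association model has no closed-form expected counts, which is where iterative proportional fitting (the "ipf" of the command name) comes in: the fitted table is repeatedly rescaled to match each observed margin in turn until it converges. A minimal Python implementation, for illustration only:

```python
from math import log

# acm cell counts: key = (a, c, m), with 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}

# the three marginal models c*m, a*m, a*c as projections of a cell index
margins = [lambda t: (t[1], t[2]),   # c m
           lambda t: (t[0], t[2]),   # a m
           lambda t: (t[0], t[1])]   # a c

def margin_totals(table, proj):
    out = {}
    for cell, v in table.items():
        out[proj(cell)] = out.get(proj(cell), 0) + v
    return out

targets = [margin_totals(freq, proj) for proj in margins]

# iterative proportional fitting: start flat, rescale to each observed margin
fit = {cell: 1.0 for cell in freq}
for _ in range(100):
    for proj, target in zip(margins, targets):
        current = margin_totals(fit, proj)
        for cell in fit:
            fit[cell] *= target[proj(cell)] / current[proj(cell)]

g2 = 2 * sum(f * log(f / fit[cell]) for cell, f in freq.items())
print(round(g2, 4))  # 0.374, matching the ipf output
```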
ipf [fw=freq], fit(a*c*m) /* fully saturated model */
Expansion of the various marginal models
----------------------------------------
marginal model 1 varlist : a c m
unique varlist a c m
-------------------------------------------------------------------
N.B. structural/sampling zeroes may lead to an incorrect df
Residual degrees of freedom = 0
Goodness of Fit Tests
---------------------
df = 0
Likelihood Ratio Statistic G^2 = 0.0000 p-value = .
Pearson Statistic X^2 = 0.0000 p-value = .
Let's collect all of the results together into a table.
model          df    G2        X2
a+c+m           4  1286.02   1411.39   mutual independence
a*c+m           3   843.83    704.91   joint independence
a*m+c           3   939.56    824.16   joint independence
c*m+a           3   534.21    505.60   joint independence
c*m+a*m         2   187.75    177.61   conditional independence
c*a+m*a         2   497.37    443.76   conditional independence
a*c+c*m         2    92.02     80.81   conditional independence
c*m+a*m+a*c     1     0.37      0.40   homogeneous association
a*c*m           0     0.00      0.00   fully saturated

It is possible to test λac = 0 in the (c*m+a*m+a*c) model. This is accomplished by computing the difference in the G2's, which is equivalent to a likelihood ratio test: here G2 = 187.75 - 0.37 = 187.38 with 2 - 1 = 1 degree of freedom.
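The numbers needed for the λac test are in the table; a Python sketch of the difference test (the chi-square(1) upper tail is computed with the error function, since chi-square with 1 df is the square of a standard normal):

```python
from math import erf, sqrt

# nested models from the table: (c*m+a*m) versus (c*m+a*m+a*c)
g2_reduced, df_reduced = 187.7543, 2
g2_full, df_full = 0.3740, 1

delta_g2 = g2_reduced - g2_full   # likelihood ratio statistic for lambda_ac = 0
delta_df = df_reduced - df_full   # 1

# chi-square(1) upper-tail probability: P(X > x) = 1 - erf(sqrt(x / 2))
p = 1 - erf(sqrt(delta_g2 / 2))

print(round(delta_g2, 2), delta_df)  # 187.38 1
```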
Using the Poisson Distribution
Loglinear regression models can also be estimated using the Poisson distribution. In this example we will use the glm command with family(poisson) and link(log) to estimate the models for (a+c+m), (a*c+c*m+a*m) and (c*m+a*m). We will be discussing generalized linear models, glm, later in the course.
glm freq a c m, family(pois) link(log) nolog
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 4
Scale parameter = 1
Deviance = 1286.019954 (1/df) Deviance = 321.505
Pearson = 1411.386007 (1/df) Pearson = 352.8465
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -667.5316914 AIC = 167.8829
BIC = 1277.702188
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | 1.785112 .0597594 29.87 0.000 1.667985 1.902238
c | .6493063 .0441509 14.71 0.000 .5627721 .7358406
m | -.3154188 .0424446 -7.43 0.000 -.3986087 -.2322289
_cons | 4.172538 .0649589 64.23 0.000 4.045221 4.299855
------------------------------------------------------------------------------
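The constant and coefficients of this fit can be checked against the marginal totals (assuming, as the interpretation below does, that the variables are coded 0 = no, 1 = yes); a Python sketch:

```python
from math import log

# acm cell counts: key = (a, c, m), assumed coded 0 = no, 1 = yes
freq = {(0,0,0): 279, (0,0,1): 2, (0,1,0): 43, (0,1,1): 3,
        (1,0,0): 456, (1,0,1): 44, (1,1,0): 538, (1,1,1): 911}
n = sum(freq.values())  # 2276

n_a0 = sum(f for (a, c, m), f in freq.items() if a == 0)  # 327
n_c0 = sum(f for (a, c, m), f in freq.items() if c == 0)  # 781
n_m0 = sum(f for (a, c, m), f in freq.items() if m == 0)  # 1316

# mutual independence expected count for the reference cell (0, 0, 0)
e000 = n_a0 * n_c0 * n_m0 / n ** 2
# coefficient for c: log of the ratio of the two cigarette margins
b_c = log((n - n_c0) / n_c0)

print(round(e000, 4), round(log(e000), 6), round(b_c, 6))
# 64.8799 4.172538 0.649306  (= exp(_cons), _cons, and the coef. for c)
```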
tabulate c m [fw=freq], cell
cigarette | marijuana use
use | no yes | Total
-----------+----------------------+----------
no | 735 46 | 781
| 32.29 2.02 | 34.31
-----------+----------------------+----------
yes | 581 914 | 1,495
| 25.53 40.16 | 65.69
-----------+----------------------+----------
Total | 1,316 960 | 2,276
| 57.82 42.18 | 100.00
prtesti 781 .3431 1495 .6569
Two-sample test of proportion x: Number of obs = 781
y: Number of obs = 1495
------------------------------------------------------------------------------
Variable | Mean Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .3431 .0169877 .3098047 .3763953
y | .6569 .0122783 .6328349 .6809651
-------------+----------------------------------------------------------------
diff | -.3138 .0209604 -.3548817 -.2727183
| under Ho: .0219682 -14.28 0.000
------------------------------------------------------------------------------
Ho: proportion(x) - proportion(y) = diff = 0
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
z = -14.284 z = -14.284 z = -14.284
P < z = 0.0000 P > |z| = 0.0000 P > z = 1.0000
The constant in the glm model is the log of the expected frequency for cell a=0, c=0, m=0; thus exp(4.172538) = 64.879909. The coefficient for c is the difference in the logs of the expected frequencies from c=0 to c=1 for a=0 and m=0. The test of the coefficient is the test of the difference in the marginal proportions for cigarette.

Next we will need to create three interaction terms, a*m, a*c and c*m. The response variable will be freq, the number of observations in each cell.
generate am = a*m
generate ac = a*c
generate cm = c*m
glm freq a c m ac cm am, family(pois) link(log)
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 1
Scale param = 1
Deviance = .3739858701 (1/df) Deviance = .3739859
Pearson = .4011005168 (1/df) Pearson = .4011005
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -24.70870712 AIC = 7.927177
BIC = -14.18210492
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | .487719 .0757672 6.44 0.000 .339218 .63622
c | -1.886669 .162697 -11.60 0.000 -2.205549 -1.567789
m | -5.309042 .475197 -11.17 0.000 -6.240411 -4.377673
ac | 2.054534 .1740643 11.80 0.000 1.713374 2.395694
cm | 2.847889 .1638394 17.38 0.000 2.52677 3.169009
am | 2.986014 .464678 6.43 0.000 2.075262 3.896767
_cons | 5.63342 .0597008 94.36 0.000 5.516409 5.750432
------------------------------------------------------------------------------
lrtest, saving(0)
glm freq a c m am cm, family(pois) link(log)
Generalized linear models No. of obs = 8
Optimization : ML: Newton-Raphson Residual df = 2
Scale param = 1
Deviance = 187.7543029 (1/df) Deviance = 93.87715
Pearson = 177.6148606 (1/df) Pearson = 88.80743
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -118.3988656 AIC = 31.09972
BIC = 175.2776537
------------------------------------------------------------------------------
freq | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | 1.127186 .064122 17.58 0.000 1.001509 1.252862
c | -.2351197 .0555132 -4.24 0.000 -.3439236 -.1263159
m | -6.620924 .4737127 -13.98 0.000 -7.549384 -5.692464
am | 4.125088 .4529445 9.11 0.000 3.237333 5.012843
cm | 3.224309 .1609812 20.03 0.000 2.908792 3.539826
_cons | 5.19207 .060879 85.29 0.000 5.072749 5.311391
------------------------------------------------------------------------------
lrtest
Glm: likelihood-ratio test chi2(1) = 187.38
Prob > chi2 = 0.0000
Note that the Deviance and the Pearson are the same as the G2 and
χ2, respectively, in the table above.

Connection to Logistic Regression
We haven't begun our discussions of logistic regression yet, but this will serve as a demonstration that there is a relationship between logistic regression and loglinear regression models.
Logistic regression requires that we select one of the variables to be the response variable. Let's select m, marijuana.
use http://www.philender.com/courses/data/acm, clear
logit m a c ac [fw=freq]
Logit estimates Number of obs = 2276
LR chi2(3) = 843.83
Prob > chi2 = 0.0000
Log likelihood = -1127.7332 Pseudo R2 = 0.2723
------------------------------------------------------------------------------
m | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
a | -3.778783 1.402386 -2.69 0.007 -6.527409 -1.030157
c | -3.454498 .9857381 -3.50 0.000 -5.386509 -1.522487
ac | .5895107 .9423638 0.63 0.532 -1.257488 2.43651
_cons | 7.170455 1.441154 4.98 0.000 4.345845 9.995064
------------------------------------------------------------------------------
The likelihood ratio chi-square in this model is the same as in the a*c+m loglinear
regression model above.

hsbdemo Example

Here is an example derived from the hsbdemo dataset, in which write and read have been categorized by quartile and the data, along with female, contracted to frequencies.
clear
input f w r n
0 1 1 18
0 1 2 9
0 1 3 5
0 1 4 2
0 2 1 5
0 2 2 5
0 2 3 8
0 2 4 4
0 3 2 4
0 3 3 5
0 3 4 9
0 4 2 1
0 4 3 7
0 4 4 9
1 1 1 10
1 1 2 5
1 1 3 1
1 2 1 14
1 2 2 13
1 2 3 6
1 2 4 2
1 3 1 5
1 3 2 5
1 3 3 10
1 3 4 6
1 4 2 7
1 4 3 10
1 4 4 15
end
tab f w [fw=n], chi2 lr row
tab f r [fw=n], chi2 lr row
tab w r [fw=n], chi2 lr row
ipf [fw=n], fit(f+w+r) /* mutual independence */
ipf [fw=n], fit(f*w+r) /* joint independence */
ipf [fw=n], fit(f*r+w)
ipf [fw=n], fit(w*r+f)
ipf [fw=n], fit(f*w+f*r) /* conditional independence */
ipf [fw=n], fit(w*f+w*r)
ipf [fw=n], fit(r*w+r*f)
ipf [fw=n], fit(f*w+f*r+w*r) /* homogeneous association */
ipf [fw=n], fit(f*w*r) /* fully saturated */
summary of models
model df G2 X2
f+w+r 24 117.10 101.33 mutual independence
f*w+r 21 102.99 85.56 joint independence
f*r+w 21 115.44 102.94 joint independence
w*r+f 15 33.08 29.39 joint independence
f*w+f*r 18 101.32 84.14 conditional independence
w*f+w*r 12 18.96 16.24 conditional independence (n.s.)
r*w+r*f 12 31.42 28.03 conditional independence
f*w+f*r+w*r 9 6.53 5.67 homogeneous association (n.s.)
f*w*r 0 0.0 0.0 fully saturated
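As a check, the mutual independence line of the table can be reproduced the same way as before (assuming the four cells absent from the input are sampling zeros, which contribute nothing to G2); a Python sketch:

```python
from math import log

# hsbdemo frequencies: key = (f, w, r); the four absent cells are zero
freq = {
    (0,1,1): 18, (0,1,2): 9, (0,1,3): 5, (0,1,4): 2,
    (0,2,1): 5, (0,2,2): 5, (0,2,3): 8, (0,2,4): 4,
    (0,3,2): 4, (0,3,3): 5, (0,3,4): 9,
    (0,4,2): 1, (0,4,3): 7, (0,4,4): 9,
    (1,1,1): 10, (1,1,2): 5, (1,1,3): 1,
    (1,2,1): 14, (1,2,2): 13, (1,2,3): 6, (1,2,4): 2,
    (1,3,1): 5, (1,3,2): 5, (1,3,3): 10, (1,3,4): 6,
    (1,4,2): 7, (1,4,3): 10, (1,4,4): 15,
}
n = sum(freq.values())  # 200

# one-way marginal totals
n_f, n_w, n_r = {}, {}, {}
for (f, w, r), x in freq.items():
    n_f[f] = n_f.get(f, 0) + x
    n_w[w] = n_w.get(w, 0) + x
    n_r[r] = n_r.get(r, 0) + x

# mutual independence: E_fwr = n_f * n_w * n_r / n^2; zero cells add 0 to G^2
g2 = 2 * sum(x * log(x * n * n / (n_f[f] * n_w[w] * n_r[r]))
             for (f, w, r), x in freq.items())
print(round(g2, 1))  # 117.1, matching the f+w+r line of the table
```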
xi3: poisson n i.w*i.f i.w*i.r
fitstat, saving(m1)
xi3: poisson n i.w*i.f i.w*i.r i.f*i.r
fitstat, using(m1)
Measures of Fit for poisson of n
Current Saved Difference
Model: poisson poisson
N: 28 28 0
Log-Lik Intercept Only: -84.603 -84.603 0.000
Log-Lik Full Model: -52.752 -56.749 3.997
D: 105.504(6) 113.498(9) 7.994(3)
LR: 63.702(21) 55.707(18) 7.994(3)
Prob > LR: 0.000 0.000 0.046
McFadden's R2: 0.376 0.329 0.047
McFadden's Adj R2: 0.116 0.105 0.012
Maximum Likelihood R2: 0.897 0.863 0.034
Cragg & Uhler's R2: 0.899 0.865 0.034
AIC: 5.339 5.411 -0.071
AIC*n: 149.504 151.498 -1.994
BIC: 85.510 83.508 2.002
BIC': 6.275 4.272 2.002
Difference of 2.002 in BIC' provides positive support for saved model.
Note: p-value for difference in LR is only valid if models are nested.
Categorical Data Analysis Course
Phil Ender