So Far...
Example Using hsbdemo
We will look at a model that uses write as the response variable and female and prog as predictors.
use http://www.philender.com/courses/data/hsbdemo, clear
tab1 female prog
-> tabulation of female
female | Freq. Percent Cum.
------------+-----------------------------------
male | 91 45.50 45.50
female | 109 54.50 100.00
------------+-----------------------------------
Total | 200 100.00
-> tabulation of prog
type of |
program | Freq. Percent Cum.
------------+-----------------------------------
general | 45 22.50 22.50
academic | 105 52.50 75.00
vocation | 50 25.00 100.00
------------+-----------------------------------
Total | 200 100.00
table prog female, cont(mean write sd write freq)
------------------------------
type of | female
program | male female
----------+-------------------
general | 49.14286 53.25
| 10.36478 8.205248
| 21 24
|
academic | 54.61702 57.58621
| 8.656622 7.115672
| 47 58
|
vocation | 41.82609 50.96296
| 8.003705 8.341193
| 23 27
------------------------------
/* model 1 -- no interaction */
anova write female prog
Number of obs = 200 R-squared = 0.2408
Root MSE = 8.32211 Adj R-squared = 0.2291
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 4304.40272 3 1434.80091 20.72 0.0000
|
female | 1128.70487 1 1128.70487 16.30 0.0001
prog | 3128.18888 2 1564.09444 22.58 0.0000
|
Residual | 13574.4723 196 69.2575116
-----------+----------------------------------------------------
Total | 17878.875 199 89.843593
regress write i.female i.prog
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 20.72
Model | 4304.40272 3 1434.80091 Prob > F = 0.0000
Residual | 13574.4723 196 69.2575116 R-squared = 0.2408
-------------+------------------------------ Adj R-squared = 0.2291
Total | 17878.875 199 89.843593 Root MSE = 8.3221
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female | 4.771211 1.181876 4.04 0.000 2.440385 7.102037
|
prog |
2 | 4.832929 1.482956 3.26 0.001 1.908331 7.757528
3 | -4.605141 1.710049 -2.69 0.008 -7.9776 -1.232683
|
_cons | 48.78869 1.391537 35.06 0.000 46.04438 51.533
------------------------------------------------------------------------------
test 1.female
( 1) 1.female = 0
F( 1, 196) = 16.30
Prob > F = 0.0001
testparm i.prog
( 1) 2.prog = 0
( 2) 3.prog = 0
F( 2, 196) = 22.58
Prob > F = 0.0000
/* model 2 -- interaction */
anova write female prog female#prog
Number of obs = 200 R-squared = 0.2590
Root MSE = 8.26386 Adj R-squared = 0.2399
Source | Partial SS df MS F Prob > F
------------+----------------------------------------------------
Model | 4630.36091 5 926.072182 13.56 0.0000
|
female | 1261.85329 1 1261.85329 18.48 0.0000
prog | 3274.35082 2 1637.17541 23.97 0.0000
female#prog | 325.958189 2 162.979094 2.39 0.0946
|
Residual | 13248.5141 194 68.2913097
------------+----------------------------------------------------
Total | 17878.875 199 89.843593
regress write i.female##i.prog
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 5, 194) = 13.56
Model | 4630.36091 5 926.072182 Prob > F = 0.0000
Residual | 13248.5141 194 68.2913097 R-squared = 0.2590
-------------+------------------------------ Adj R-squared = 0.2399
Total | 17878.875 199 89.843593 Root MSE = 8.2639
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female | 4.107143 2.469299 1.66 0.098 -.7629757 8.977261
|
prog |
2 | 5.474164 2.169095 2.52 0.012 1.196128 9.7522
3 | -7.31677 2.494224 -2.93 0.004 -12.23605 -2.397493
|
female#prog |
1 2 | -1.137957 2.954299 -0.39 0.701 -6.964625 4.68871
1 3 | 5.029733 3.40528 1.48 0.141 -1.686391 11.74586
|
_cons | 49.14286 1.803321 27.25 0.000 45.58623 52.69949
------------------------------------------------------------------------------
test 1.female#2.prog 1.female#3.prog
( 1) 1.female#2.prog = 0
( 2) 1.female#3.prog = 0
F( 2, 194) = 2.39
Prob > F = 0.0946
test 1.female
( 1) 1.female = 0
F( 1, 194) = 2.77
Prob > F = 0.0979
testparm i.prog
( 1) 2.prog = 0
( 2) 3.prog = 0
F( 2, 194) = 18.69
Prob > F = 0.0000
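Please note: With dummy coding, the test of the highest-order interaction is the
same as the one from anova. However, the tests of the main effects are not the
same as in anova, because each dummy-coded coefficient compares groups at the
reference level of the other factor rather than averaging over its levels. To
reproduce the anova main effects we need a different approach, such as the
anovalator program (findit anovalator).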
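As a possible built-in alternative, and assuming Stata 12 or later (these notes predate that release), the contrast command can be run after regress. By default it weights the cells equally, so its tests of female and prog should correspond to the anova main effects rather than to the dummy-coded coefficients. This is a sketch under that assumption, not the method used in the original notes.

```stata
* Refit the interaction model, then request joint tests for each term
use http://www.philender.com/courses/data/hsbdemo, clear
regress write i.female##i.prog

* Main effects and interaction, averaging over the levels of the other factor
contrast female prog female#prog
```

The F-ratios reported by contrast can be compared directly with the anova table for the interaction model above.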
anovalator female prog, main fratio
anovalator main-effect for female
chi2(1) = 18.477509 p-value = .00001719
scaled as F-ratio = 18.477509
anovalator main-effect for prog
chi2(2) = 47.946815 p-value = 3.877e-11
scaled as F-ratio = 23.973408
Some examples of 2x2 interactions
Consider the following 2x2 table of cell means and the regression results. The categorical predictors are A and B, each with two levels. The first example will not have a significant interaction effect.
| | A0 | A1 |
|---|---|---|
| B0 | 50.02 | 55.09 |
| B1 | 54.61 | 60.09 |
regress y1 a##b
Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 3, 36) = 185.27
Model | 507.825013 3 169.275004 Prob > F = 0.0000
Residual | 32.8925799 36 .913682774 R-squared = 0.9392
-------------+------------------------------ Adj R-squared = 0.9341
Total | 540.717593 39 13.8645537 Root MSE = .95587
------------------------------------------------------------------------------
y1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.a | 5.068859 .427477 11.86 0.000 4.201896 5.935823
1.b | 4.582972 .427477 10.72 0.000 3.716009 5.449936
|
a#b |
1 1 | .410215 .6045437 0.68 0.502 -.8158565 1.636286
|
_cons | 50.02439 .3022719 165.49 0.000 49.41135 50.63743
------------------------------------------------------------------------------
Let's interpret this regression table.
Next, let's try an example in which the interaction term is significant and positive.
| | A0 | A1 |
|---|---|---|
| B0 | 50.25 | 54.73 |
| B1 | 55.10 | 65.08 |
regress y2 a##b
Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 3, 36) = 753.03
Model | 1175.55141 3 391.850469 Prob > F = 0.0000
Residual | 18.7332456 36 .520367934 R-squared = 0.9843
-------------+------------------------------ Adj R-squared = 0.9830
Total | 1194.28465 39 30.6226834 Root MSE = .72137
------------------------------------------------------------------------------
y2 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.a | 4.485349 .3226044 13.90 0.000 3.831077 5.139621
1.b | 4.848378 .3226044 15.03 0.000 4.194106 5.50265
|
a#b |
1 1 | 5.494711 .4562315 12.04 0.000 4.569431 6.419991
|
_cons | 50.24988 .2281157 220.28 0.000 49.78724 50.71252
------------------------------------------------------------------------------
Let's interpret this regression table for y2.
Finally, we will run a model in which the interaction coefficient is negative and statistically significant.
| | A0 | A1 |
|---|---|---|
| B0 | 50.33 | 55.21 |
| B1 | 54.76 | 55.43 |
regress y3 a##b
Source | SS df MS Number of obs = 40
-------------+------------------------------ F( 3, 36) = 33.39
Model | 175.137941 3 58.3793136 Prob > F = 0.0000
Residual | 62.938775 36 1.74829931 R-squared = 0.7356
-------------+------------------------------ Adj R-squared = 0.7136
Total | 238.076716 39 6.10453118 Root MSE = 1.3222
------------------------------------------------------------------------------
y3 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.a | 4.879437 .5913204 8.25 0.000 3.680183 6.07869
1.b | 4.427636 .5913204 7.49 0.000 3.228383 5.626889
|
a#b |
1 1 | -4.213134 .8362534 -5.04 0.000 -5.909134 -2.517133
|
_cons | 50.33252 .4181267 120.38 0.000 49.48452 51.18052
------------------------------------------------------------------------------
Let's interpret this regression table for y3.
We can see from the tables and regression results that the constant is the A0, B0 cell mean, and that the coefficient for the interaction term is the amount added to or subtracted from the sum of the constant and the coefficients for A and B to obtain the A1, B1 cell mean.
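The arithmetic can be checked directly from the three regression tables. The sketch below copies the coefficients from the output above and rebuilds the A1, B1 cell mean for each example, first without and then with the interaction term. The scalar names are illustrative only.

```stata
* Additive part: _cons + 1.a + 1.b (what A1,B1 would be with no interaction)
scalar add_y1 = 50.02439 + 5.068859 + 4.582972
scalar add_y2 = 50.24988 + 4.485349 + 4.848378
scalar add_y3 = 50.33252 + 4.879437 + 4.427636

* Adding the 1.a#1.b coefficient gives the A1,B1 cell mean
scalar cell_y1 = add_y1 + .410215
scalar cell_y2 = add_y2 + 5.494711
scalar cell_y3 = add_y3 - 4.213134

display "y1: additive " %6.2f add_y1 "   A1,B1 mean " %6.2f cell_y1
display "y2: additive " %6.2f add_y2 "   A1,B1 mean " %6.2f cell_y2
display "y3: additive " %6.2f add_y3 "   A1,B1 mean " %6.2f cell_y3
```

The three A1, B1 means come out to 60.09, 65.08 and 55.43, matching the tables of cell means. The additive parts are all near 59.6, so the differences among the three examples are carried almost entirely by the interaction coefficient.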
Linear Statistical Models Course
Phil Ender, 24sep10, 18dec99