Consider the following 4-group design:

Level   a1    a2    a3    a4   Total
         1     2     5    10
         3     3     6    10
         2     4     4     9
         2     3     5    11
Mean   2.0   3.0   5.0  10.0   5.0
Dummy Coding
Dummy coded variables are also known as indicator variables.
input y grp d1 d2 d3
1 1 1 0 0
3 1 1 0 0
2 1 1 0 0
2 1 1 0 0
2 2 0 1 0
3 2 0 1 0
4 2 0 1 0
3 2 0 1 0
5 3 0 0 1
6 3 0 0 1
4 3 0 0 1
5 3 0 0 1
10 4 0 0 0
10 4 0 0 0
9 4 0 0 0
11 4 0 0 0
end
tabstat y, by(grp)
Summary for variables: y
by categories of: grp
grp | mean
---------+----------
1 | 2
2 | 3
3 | 5
4 | 10
---------+----------
Total | 5
--------------------
regress y d1 d2 d3
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
---------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
d1 | -8 .5773503 -13.856 0.000 -9.257938 -6.742062
d2 | -7 .5773503 -12.124 0.000 -8.257938 -5.742062
d3 | -5 .5773503 -8.660 0.000 -6.257938 -3.742062
_cons | 10 .4082483 24.495 0.000 9.110503 10.8895
------------------------------------------------------------------------------
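Reading the coefficients: with group 4 coded 0 on every dummy, _cons is the mean of group 4, and each dummy coefficient is that group's mean minus the group 4 mean (for example, 2 - 10 = -8 for d1). The following is a minimal sketch, not part of the original notes, that builds the dummies from grp instead of typing them in and then recovers a group mean with lincom. The variable names match those used above; the data are re-entered so the block runs on its own.

```stata
* Sketch: derive the dummy codes from grp (group 4 is the reference).
clear
input y grp
1 1
3 1
2 1
2 1
2 2
3 2
4 2
3 2
5 3
6 3
4 3
5 3
10 4
10 4
9 4
11 4
end

* A logical expression evaluates to 1 when true and 0 when false.
generate byte d1 = (grp == 1)
generate byte d2 = (grp == 2)
generate byte d3 = (grp == 3)

regress y d1 d2 d3

* The intercept is the mean of the reference group (group 4).
display _b[_cons]

* Intercept plus a dummy coefficient recovers that group's mean (group 1 here).
lincom _cons + d1
```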
Factor-variable notation, introduced in Stata 11, generates dummy-coded
indicators automatically in most estimation commands.
regress y i.grp
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152 3 50.6666667 Prob > F = 0.0000
Residual | 8 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grp |
2 | 1 .5773503 1.73 0.109 -.2579382 2.257938
3 | 3 .5773503 5.20 0.000 1.742062 4.257938
4 | 8 .5773503 13.86 0.000 6.742062 9.257938
|
_cons | 2 .4082483 4.90 0.000 1.110503 2.889497
------------------------------------------------------------------------------
/* change reference group to grp 4 */
regress y ib4.grp
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152 3 50.6666667 Prob > F = 0.0000
Residual | 8 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grp |
1 | -8 .5773503 -13.86 0.000 -9.257938 -6.742062
2 | -7 .5773503 -12.12 0.000 -8.257938 -5.742062
3 | -5 .5773503 -8.66 0.000 -6.257938 -3.742062
|
_cons | 10 .4082483 24.49 0.000 9.110503 10.8895
------------------------------------------------------------------------------
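With factor-variable notation the group means and other comparisons can be recovered after estimation without creating any new variables. The sketch below is an illustration added to these notes, assuming the same data: margins reproduces the four group means, lincom compares two non-reference groups, and test gives the joint test of the three group coefficients.

```stata
* Sketch: post-estimation commands after a factor-variable regression.
clear
input y grp
1 1
3 1
2 1
2 1
2 2
3 2
4 2
3 2
5 3
6 3
4 3
5 3
10 4
10 4
9 4
11 4
end

regress y ib4.grp

* Predicted value at each level of grp, i.e. the four group means.
margins grp

* Difference between two non-reference groups (group 1 minus group 2).
lincom 1.grp - 2.grp

* Joint test of the three group coefficients; matches the overall F.
test 1.grp 2.grp 3.grp
```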
/* anova treats all predictors as categorical unless otherwise indicated */
anova y grp
Number of obs = 16 R-squared = 0.9500
Root MSE = .816497 Adj R-squared = 0.9375
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 152.00 3 50.6666667 76.00 0.0000
|
grp | 152.00 3 50.6666667 76.00 0.0000
|
Residual | 8.00 12 .666666667
-----------+----------------------------------------------------
Total | 160.00 15 10.6666667
/* regress with no arguments redisplays the anova results as a regression table */
regress
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 3, 12) = 76.00
Model | 152 3 50.6666667 Prob > F = 0.0000
Residual | 8 12 .666666667 R-squared = 0.9500
-------------+------------------------------ Adj R-squared = 0.9375
Total | 160 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grp |
2 | 1 .5773503 1.73 0.109 -.2579382 2.257938
3 | 3 .5773503 5.20 0.000 1.742062 4.257938
4 | 8 .5773503 13.86 0.000 6.742062 9.257938
|
_cons | 2 .4082483 4.90 0.000 1.110503 2.889497
------------------------------------------------------------------------------
Effect Coding
Effect coding is sometimes known as deviation coding.
input y grp e1 e2 e3
1 1 1 0 0
3 1 1 0 0
2 1 1 0 0
2 1 1 0 0
2 2 0 1 0
3 2 0 1 0
4 2 0 1 0
3 2 0 1 0
5 3 0 0 1
6 3 0 0 1
4 3 0 0 1
5 3 0 0 1
10 4 -1 -1 -1
10 4 -1 -1 -1
9 4 -1 -1 -1
11 4 -1 -1 -1
end
regress y e1 e2 e3
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
---------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
e1 | -3 .3535534 -8.485 0.000 -3.770327 -2.229673
e2 | -2 .3535534 -5.657 0.000 -2.770327 -1.229673
e3 | 0 .3535534 0.000 1.000 -.7703266 .7703266
_cons | 5 .2041241 24.495 0.000 4.555252 5.444748
------------------------------------------------------------------------------
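Reading the coefficients: with equal group sizes, _cons is the grand mean (5), and each coefficient is that group's mean minus the grand mean (2 - 5 = -3 for e1). Group 4 has no coefficient of its own; its effect is minus the sum of the other three. The sketch below, an addition to these notes using the same data, builds the effect codes from grp and recovers the group 4 effect and mean with lincom.

```stata
* Sketch: derive effect codes from grp (group 4 is coded -1 throughout).
clear
input y grp
1 1
3 1
2 1
2 1
2 2
3 2
4 2
3 2
5 3
6 3
4 3
5 3
10 4
10 4
9 4
11 4
end

* Each code is 1 for its own group, -1 for group 4, and 0 otherwise.
generate e1 = (grp == 1) - (grp == 4)
generate e2 = (grp == 2) - (grp == 4)
generate e3 = (grp == 3) - (grp == 4)

regress y e1 e2 e3

* Effect for group 4: minus the sum of the other three effects.
lincom -e1 - e2 - e3

* Mean of group 4: grand mean plus the group 4 effect.
lincom _cons - e1 - e2 - e3
```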
test e1 e2 e3
( 1) e1 = 0
( 2) e2 = 0
( 3) e3 = 0
F( 3, 12) = 76.00
Prob > F = 0.0000
Orthogonal Coding
Example Using Orthogonal Coding
input y grp x1 x2 x3
1 1 1 1 1
3 1 1 1 1
2 1 1 1 1
2 1 1 1 1
2 2 -1 1 1
3 2 -1 1 1
4 2 -1 1 1
3 2 -1 1 1
5 3 0 -2 1
6 3 0 -2 1
4 3 0 -2 1
5 3 0 -2 1
10 4 0 0 -3
10 4 0 0 -3
9 4 0 0 -3
11 4 0 0 -3
end
table grp, contents(freq mean y sd y)
----------------------------------------------
grp | Freq. mean(y) sd(y)
----------+-----------------------------------
1 | 4 2 .8164966
2 | 4 3 .8164966
3 | 4 5 .8164966
4 | 4 10 .8164966
----------------------------------------------
corr x1 x2 x3
(obs=16)
| x1 x2 x3
-------------+---------------------------
x1 | 1.0000
x2 | 0.0000 1.0000
x3 | 0.0000 0.0000 1.0000
Anova
anova y grp
Number of obs = 16 R-squared = 0.9500
Root MSE = .816497 Adj R-squared = 0.9375
Source | Partial SS df MS F Prob > F
-----------+----------------------------------------------------
Model | 152.00 3 50.6666667 76.00 0.0000
|
grp | 152.00 3 50.6666667 76.00 0.0000
|
Residual | 8.00 12 .666666667
-----------+----------------------------------------------------
Total | 160.00 15 10.6666667
Regression Analysis Using Orthogonal Coding
regress y x1 x2 x3
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 3, 12) = 76.00
Model | 152.00 3 50.6666667 Prob > F = 0.0000
Residual | 8.00 12 .666666667 R-squared = 0.9500
---------+------------------------------ Adj R-squared = 0.9375
Total | 160.00 15 10.6666667 Root MSE = .8165
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
---------+--------------------------------------------------------------------
x1 | -.5 .2886751 -1.732 0.109 -1.128969 .1289691
x2 | -.8333333 .1666667 -5.000 0.000 -1.196469 -.4701979
x3 | -1.666667 .1178511 -14.142 0.000 -1.923442 -1.409891
_cons | 5 .2041241 24.495 0.000 4.555252 5.444748
------------------------------------------------------------------------------
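Reading the coefficients: x1 compares group 1 with group 2, x2 compares groups 1 and 2 with group 3, and x3 compares groups 1 through 3 with group 4. Each coefficient equals the contrast applied to the group means divided by the sum of the squared codes, e.g. (2 - 3)/2 = -.5 for x1 and (2 + 3 + 5 - 3*10)/12 = -1.667 for x3. The sketch below, added for illustration with the same data, generates the codes from grp rather than typing them in. The codes are uncorrelated here because the group sizes are equal.

```stata
* Sketch: build the orthogonal codes from grp.
clear
input y grp
1 1
3 1
2 1
2 1
2 2
3 2
4 2
3 2
5 3
6 3
4 3
5 3
10 4
10 4
9 4
11 4
end

* x1: group 1 vs group 2
generate x1 = (grp == 1) - (grp == 2)
* x2: groups 1 and 2 vs group 3
generate x2 = (grp <= 2) - 2*(grp == 3)
* x3: groups 1 through 3 vs group 4
generate x3 = (grp <= 3) - 3*(grp == 4)

* Check that the codes are uncorrelated.
corr x1 x2 x3

regress y x1 x2 x3
```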
test x1 x2 x3
( 1) x1 = 0
( 2) x2 = 0
( 3) x3 = 0
F( 3, 12) = 76.00
Prob > F = 0.0000
Orthogonal Coding Schema
Grp  X1  X2  X3  X4  X5  X6  X7  X8  X9
  1   1   1   1   1   1   1   1   1   1
  2  -1   1   1   1   1   1   1   1   1
  3   0  -2   1   1   1   1   1   1   1
  4   0   0  -3   1   1   1   1   1   1
  5   0   0   0  -4   1   1   1   1   1
  6   0   0   0   0  -5   1   1   1   1
  7   0   0   0   0   0  -6   1   1   1
  8   0   0   0   0   0   0  -7   1   1
  9   0   0   0   0   0   0   0  -8   1
 10   0   0   0   0   0   0   0   0  -9
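The schema follows a simple rule: code Xj is 1 for groups 1 through j, -j for group j+1, and 0 for all later groups. As a sketch of that rule (not part of the original notes), the loop below regenerates the 10-group schema, one observation per group. The names grp and x1 through x9 are chosen to match the schema and are otherwise arbitrary.

```stata
* Sketch: generate the orthogonal coding schema for 10 groups.
clear
set obs 10
generate grp = _n

* Code j: 1 for groups up to j, -j for group j+1, 0 for later groups.
forvalues j = 1/9 {
    generate x`j' = cond(grp <= `j', 1, cond(grp == `j' + 1, -`j', 0))
}

list, noobs clean
```

For a design with k groups, only the first k-1 codes and the first k rows are needed.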
Linear Statistical Models Course
Phil Ender, 17sep10, 21Feb02, 17Mar98