Dichotomous Variables
Interpreting Coefficients
Level      a1     a2    Total
            1      5
            3      6
            2      4
            2      5
            2     10
            3     10
            4      9
            3     11
Mean      2.5    7.5     5.0
Example Using Dummy Coding
input y grp x1 x2 x3 x4 onetwo
1 1 1 0 1 326 1
3 1 1 0 1 326 1
2 1 1 0 1 326 1
2 1 1 0 1 326 1
2 1 1 0 1 326 1
3 1 1 0 1 326 1
4 1 1 0 1 326 1
3 1 1 0 1 326 1
5 2 0 1 -1 -11814 2
6 2 0 1 -1 -11814 2
4 2 0 1 -1 -11814 2
5 2 0 1 -1 -11814 2
10 2 0 1 -1 -11814 2
10 2 0 1 -1 -11814 2
9 2 0 1 -1 -11814 2
11 2 0 1 -1 -11814 2
end
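Before running the regressions, it may help to confirm what each predictor encodes. The lines below are a minimal sketch, assuming the data above have just been read in. The definitions of x1, x2, and x3 in terms of grp are inferred from the data listing rather than stated in these notes.

```stata
* Group means of y, for comparison with the coefficients that follow
tabulate grp, summarize(y)

* Each coding is a function of grp; assert stops with an error if any row disagrees
* x1: dummy, 1 for group 1 and 0 for group 2
assert x1 == (grp == 1)
* x2: dummy, 1 for group 2 and 0 for group 1
assert x2 == (grp == 2)
* x3: 1/-1 coding
assert x3 == cond(grp == 1, 1, -1)
* onetwo: the same 1/2 coding as grp
assert onetwo == grp
```

x4 uses two arbitrary values (326 and -11814), presumably to show that any two distinct codes carry the same group information.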
regress y grp, beta
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 1, 14) = 23.33
Model | 100.00 1 100.00 Prob > F = 0.0003
Residual | 60.00 14 4.28571429 R-squared = 0.6250
---------+------------------------------ Adj R-squared = 0.5982
Total | 160.00 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
grp | 5 1.035098 4.830 0.000 .7905694
_cons | -2.5 1.636634 -1.528 0.149 .
------------------------------------------------------------------------------
regress y x1, beta
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 1, 14) = 23.33
Model | 100.00 1 100.00 Prob > F = 0.0003
Residual | 60.00 14 4.28571429 R-squared = 0.6250
---------+------------------------------ Adj R-squared = 0.5982
Total | 160.00 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
x1 | -5 1.035098 -4.830 0.000 -.7905694
_cons | 7.5 .7319251 10.247 0.000 .
------------------------------------------------------------------------------
regress y x2, beta
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 1, 14) = 23.33
Model | 100.00 1 100.00 Prob > F = 0.0003
Residual | 60.00 14 4.28571429 R-squared = 0.6250
---------+------------------------------ Adj R-squared = 0.5982
Total | 160.00 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
x2 | 5 1.035098 4.830 0.000 .7905694
_cons | 2.5 .7319251 3.416 0.004 .
------------------------------------------------------------------------------
regress y x3, beta
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 1, 14) = 23.33
Model | 100.00 1 100.00 Prob > F = 0.0003
Residual | 60.00 14 4.28571429 R-squared = 0.6250
---------+------------------------------ Adj R-squared = 0.5982
Total | 160.00 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
x3 | -2.5 .5175492 -4.830 0.000 -.7905694
_cons | 5 .5175492 9.661 0.000 .
------------------------------------------------------------------------------
regress y x4, beta
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 1, 14) = 23.33
Model | 100.00 1 100.00 Prob > F = 0.0003
Residual | 60.00 14 4.28571429 R-squared = 0.6250
---------+------------------------------ Adj R-squared = 0.5982
Total | 160.00 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
x4 | -.0004119 .0000853 -4.830 0.000 -.7905694
_cons | 2.634267 .7125415 3.697 0.002 .
------------------------------------------------------------------------------
regress y x1 x2, beta
Source | SS df MS Number of obs = 16
---------+------------------------------ F( 1, 14) = 23.33
Model | 100.00 1 100.00 Prob > F = 0.0003
Residual | 60.00 14 4.28571429 R-squared = 0.6250
---------+------------------------------ Adj R-squared = 0.5982
Total | 160.00 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
x1 | -5 1.035098 -4.830 0.000 -.7905694
x2 | (dropped)
_cons | 7.5 .7319251 10.247 0.000 .
------------------------------------------------------------------------------
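The pattern in the coefficients above can be worked out by hand. With a single predictor that takes only two values, the fitted line passes through both group means, so the slope and constant follow directly from the codes. The notation here is an assumption made for illustration and is not in the original notes: c_1 and c_2 are the codes assigned to groups a1 and a2, and \bar{y}_1 = 2.5 and \bar{y}_2 = 7.5 are the group means.

```latex
% c_1, c_2: codes for groups a1 and a2 (illustrative notation)
% \bar{y}_1 = 2.5, \bar{y}_2 = 7.5: group means from the table of means
\begin{align*}
b_1 &= \frac{\bar{y}_2 - \bar{y}_1}{c_2 - c_1}, &
b_0 &= \bar{y}_1 - b_1 c_1 \\[4pt]
\text{grp } (1, 2):\quad b_1 &= \frac{5}{1} = 5, & b_0 &= 2.5 - 5(1) = -2.5 \\
\text{x1 } (1, 0):\quad b_1 &= \frac{5}{-1} = -5, & b_0 &= 2.5 + 5(1) = 7.5 \\
\text{x2 } (0, 1):\quad b_1 &= \frac{5}{1} = 5, & b_0 &= 2.5 - 5(0) = 2.5 \\
\text{x3 } (1, -1):\quad b_1 &= \frac{5}{-2} = -2.5, & b_0 &= 2.5 + 2.5(1) = 5 \\
\text{x4 } (326, -11814):\quad b_1 &= \frac{5}{-12140} \approx -0.0004119, &
b_0 &\approx 2.5 + 0.0004119(326) \approx 2.6343
\end{align*}
```

Read this way, the constant under 0/1 coding is the mean of the group coded 0, and the coefficient is the difference between the group means. Under 1/-1 coding, with equal group sizes, the constant is the grand mean (5.0) and the coefficient is half the difference between the group means. In every case the model fit (F, R-squared) is identical, and the standardized coefficient has magnitude sqrt(0.6250) = .7905694 with the sign of b_1. When x1 and x2 are entered together, x2 is dropped because x2 = 1 - x1, so it adds no information beyond x1 and the constant.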
Why not just use 1's and 2's? Why all this 0/1 or 1/-1 coding?
regress y onetwo
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 1, 14) = 23.33
Model | 100 1 100 Prob > F = 0.0003
Residual | 60 14 4.28571429 R-squared = 0.6250
-------------+------------------------------ Adj R-squared = 0.5982
Total | 160 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
onetwo | 5 1.035098 4.83 0.000 2.779935 7.220065
_cons | -2.5 1.636634 -1.53 0.149 -6.010231 1.010231
------------------------------------------------------------------------------
As you can see, the coefficient for the groups is the same as for the dummy variable x2. However, the constant is not as informative: it is the predicted value of y when onetwo equals zero, that is, the mean of a group that does not, in fact, exist. In this respect, dummy coding is much more informative.
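To see what the constant means under 1/2 coding, the predicted values can be computed from the stored coefficients. This is a minimal sketch, assuming the regression of y on onetwo can be rerun on the data above. The last two lines should reproduce the group means (2.5 and 7.5) up to rounding.

```stata
quietly regress y onetwo

* Predicted value at onetwo = 0: the constant, for a group with no observations
display _b[_cons]

* Predicted values at the two codes actually used
display _b[_cons] + 1*_b[onetwo]
display _b[_cons] + 2*_b[onetwo]
```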
Automatic Dummy Coding
Factor variables, introduced in Stata 11, create the dummy variables automatically. They also make it easy to change the reference group.
regress y i.grp
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 1, 14) = 23.33
Model | 100 1 100 Prob > F = 0.0003
Residual | 60 14 4.28571429 R-squared = 0.6250
-------------+------------------------------ Adj R-squared = 0.5982
Total | 160 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
2.grp | 5 1.035098 4.83 0.000 2.779935 7.220065
_cons | 2.5 .7319251 3.42 0.004 .9301769 4.069823
------------------------------------------------------------------------------
/* changing the reference grp */
regress y ib2.grp
Source | SS df MS Number of obs = 16
-------------+------------------------------ F( 1, 14) = 23.33
Model | 100 1 100 Prob > F = 0.0003
Residual | 60 14 4.28571429 R-squared = 0.6250
-------------+------------------------------ Adj R-squared = 0.5982
Total | 160 15 10.6666667 Root MSE = 2.0702
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.grp | -5 1.035098 -4.83 0.000 -7.220065 -2.779935
_cons | 7.5 .7319251 10.25 0.000 5.930177 9.069823
------------------------------------------------------------------------------
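Whichever reference group is chosen, the group means can be recovered from a factor-variable model with margins. This is a minimal sketch, assuming Stata 11 or later and the data above. Because grp is the only predictor, the predictive margins should equal the group means, 2.5 and 7.5.

```stata
quietly regress y i.grp

* Predicted mean of y at each level of grp
margins grp
```

Rerunning the same two lines with ib2.grp in place of i.grp should give the same margins, since changing the reference group changes only the parameterization and not the fitted values.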
Linear Statistical Models Course
Phil Ender, 17sep10, 11Feb99