Dichotomous Variables

Linear Statistical Models: Regression

Updated for Stata 11

A categorical variable with two levels.

Observations can be classed into two groups: male/female, group 1/group 2, true/false, yes/no, etc.

Any coding system that uses two different values will work: 1/0, 1/-1, or even 1/2 (see below).

1/0 coding is called dummy coding.
1/-1 coding is called effect coding.
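
As a rough sketch, the two codings might be built in Stata from a grouping variable coded 1/2 along these lines. The four-row toy dataset and the variable names dummy and effect are made up for illustration; they are not part of the example data used later.

```stata
* Toy data: four observations, two per group (illustration only)
clear
set obs 4
generate grp = cond(_n <= 2, 1, 2)

* Dummy coding: 1 for group 2, 0 for group 1 (group 1 is the reference)
generate dummy = (grp == 2)

* Effect coding: 1 for group 1, -1 for group 2
generate effect = cond(grp == 1, 1, -1)

list grp dummy effect
```

If grp could contain missing values, add "if !missing(grp)" to each generate, since an expression such as (grp == 2) evaluates to 0 rather than missing when grp is missing.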

Interpreting Coefficients

Dummy Coding

Constant -- mean of group coded zero, i.e., the reference group
Regression coefficient -- difference in means of group coded one and group coded zero (reference group)

Effect Coding

Constant -- grand mean (with unequal group sizes, the unweighted mean of the two group means)
Regression coefficient -- difference between the mean of the group coded one and the grand mean
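
These interpretations follow from the fitted equation, because with a single two-valued predictor the fitted values are the two group means. A short derivation, using symbols introduced here only for illustration: b_0 and b_1 are the constant and the regression coefficient, \bar{y}_0 and \bar{y}_1 are the means of the groups coded 0 and 1, and \bar{y}_{+} and \bar{y}_{-} are the means of the groups coded 1 and -1.

```latex
\[
\begin{aligned}
&\text{Dummy coding, } x \in \{0, 1\}: \\
&\qquad \hat{y} = b_0 + b_1 x \\
&\qquad x = 0:\ \bar{y}_0 = b_0 \\
&\qquad x = 1:\ \bar{y}_1 = b_0 + b_1
   \;\Rightarrow\; b_1 = \bar{y}_1 - \bar{y}_0 \\[6pt]
&\text{Effect coding, } x \in \{1, -1\}: \\
&\qquad x = 1:\ \bar{y}_{+} = b_0 + b_1 \\
&\qquad x = -1:\ \bar{y}_{-} = b_0 - b_1 \\
&\qquad b_0 = \tfrac{1}{2}\,(\bar{y}_{+} + \bar{y}_{-}), \qquad
   b_1 = \tfrac{1}{2}\,(\bar{y}_{+} - \bar{y}_{-}) = \bar{y}_{+} - b_0
\end{aligned}
\]
```

With group means of 2.5 and 7.5, as in the design below, dummy coding with a1 as the reference group gives b_0 = 2.5 and b_1 = 5, while effect coding with a1 coded 1 gives b_0 = 5 and b_1 = -2.5.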

Consider the Following Two Group Design:

Level    a1                 a2                    Total
         1 3 2 2 2 3 4 3    5 6 4 5 10 10 9 11
Mean     2.5                7.5                   5.0

Example Using Dummy Coding

input y  grp x1 x2 x3 x4 onetwo
 1   1  1  0   1   326   1
 3   1  1  0   1   326   1
 2   1  1  0   1   326   1
 2   1  1  0   1   326   1
 2   1  1  0   1   326   1
 3   1  1  0   1   326   1
 4   1  1  0   1   326   1
 3   1  1  0   1   326   1
 5   2  0  1  -1 -11814  2
 6   2  0  1  -1 -11814  2
 4   2  0  1  -1 -11814  2
 5   2  0  1  -1 -11814  2
10   2  0  1  -1 -11814  2
10   2  0  1  -1 -11814  2
 9   2  0  1  -1 -11814  2
11   2  0  1  -1 -11814  2
end
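
Before running any regressions, it may help to confirm the group means shown in the design table. A minimal sketch, assuming the data from the input block above are in memory:

```stata
* Assumes the data from the input block above are in memory

* Mean, standard deviation, and frequency of y within each level of grp
tabulate grp, summarize(y)

* Overall mean of y
summarize y
```

The two group means and the overall mean reported here are the quantities that the constants and coefficients in the regressions below are built from.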

regress y grp, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
     grp |          5   1.035098      4.830   0.000                   .7905694
   _cons |       -2.5   1.636634     -1.528   0.149                          .
------------------------------------------------------------------------------

regress y x1, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x1 |         -5   1.035098     -4.830   0.000                  -.7905694
   _cons |        7.5   .7319251     10.247   0.000                          .
------------------------------------------------------------------------------

regress y x2, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x2 |          5   1.035098      4.830   0.000                   .7905694
   _cons |        2.5   .7319251      3.416   0.004                          .
------------------------------------------------------------------------------

regress y x3, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x3 |       -2.5   .5175492     -4.830   0.000                  -.7905694
   _cons |          5   .5175492      9.661   0.000                          .
------------------------------------------------------------------------------

regress y x4, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x4 |  -.0004119   .0000853     -4.830   0.000                  -.7905694
   _cons |   2.634267   .7125415      3.697   0.002                          .
------------------------------------------------------------------------------
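
The x4 results show that any two distinct codes work: the coefficient is the difference in group means divided by the difference in codes, and the constant is whatever makes the fitted line pass through both group means. A small sketch of that arithmetic, using the means from the design table and the x4 codes from the input block:

```stata
* Slope: difference in group means divided by difference in codes
display (2.5 - 7.5) / (326 - (-11814))

* Constant: mean of a1 minus the slope times the code used for a1
display 2.5 - ((2.5 - 7.5) / (326 - (-11814))) * 326
```

The two displayed values should agree, up to rounding, with the x4 coefficient and constant in the output above.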

regress y x1 x2, beta

  Source |       SS       df       MS                  Number of obs =      16
---------+------------------------------               F(  1,    14) =   23.33
   Model |      100.00     1      100.00               Prob > F      =  0.0003
Residual |       60.00    14  4.28571429               R-squared     =  0.6250
---------+------------------------------               Adj R-squared =  0.5982
   Total |      160.00    15  10.6666667               Root MSE      =  2.0702

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|                       Beta
---------+--------------------------------------------------------------------
      x1 |         -5   1.035098     -4.830   0.000                  -.7905694
      x2 |  (dropped)
   _cons |        7.5   .7319251     10.247   0.000                          .
------------------------------------------------------------------------------
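
x2 is dropped because x1 and x2 always sum to one, which duplicates the constant term, so the three cannot all be estimated together. A sketch of one way to check this and to keep both dummies, assuming the example data are in memory:

```stata
* Assumes the example data are in memory

* x1 and x2 sum to 1 in every row; assert stops with an error if not
assert x1 + x2 == 1

* Without a constant both dummies can stay in the model;
* each coefficient is then the mean of its own group
regress y x1 x2, noconstant
```

In the no-constant model the coefficients are the group means themselves (2.5 for x1 and 7.5 for x2). Note that the R-squared Stata reports for a no-constant model is computed differently and is not comparable to the R-squared in the models above.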

Why not just use 1's and 2's? Why all this 0/1 or 1/-1 coding?

regress y onetwo

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  1,    14) =   23.33
       Model |         100     1         100           Prob > F      =  0.0003
    Residual |          60    14  4.28571429           R-squared     =  0.6250
-------------+------------------------------           Adj R-squared =  0.5982
       Total |         160    15  10.6666667           Root MSE      =  2.0702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      onetwo |          5   1.035098     4.83   0.000     2.779935    7.220065
       _cons |       -2.5   1.636634    -1.53   0.149    -6.010231    1.010231
------------------------------------------------------------------------------

As you can see, the coefficient for the groups is the same as for dummy coding. The constant, however, is less informative: it is the predicted value when onetwo equals zero, that is, the mean of a group coded zero, and no such group exists. In this respect, dummy coding is much more informative.
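
The group means can still be recovered from the 1/2 model as linear combinations of the estimates. A sketch, assuming the example data are in memory; the variable name onetwo01 is made up for illustration:

```stata
* Assumes the example data are in memory
regress y onetwo

* Predicted mean for each group: constant + coefficient * code
lincom _cons + 1*onetwo
lincom _cons + 2*onetwo

* Alternatively, shift the codes to 0/1 so that the constant
* becomes the mean of the first group
generate onetwo01 = onetwo - 1
regress y onetwo01
```

The two lincom estimates correspond to the means of a1 and a2, and the regression on the shifted variable is ordinary dummy coding with a1 as the reference group.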

Automatic Dummy Coding

Stata introduced factor variables in Stata 11, which allow for the automatic coding of dummy variables. It is also easy to change the reference group when using factor variables.

regress y i.grp

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  1,    14) =   23.33
       Model |         100     1         100           Prob > F      =  0.0003
    Residual |          60    14  4.28571429           R-squared     =  0.6250
-------------+------------------------------           Adj R-squared =  0.5982
       Total |         160    15  10.6666667           Root MSE      =  2.0702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       2.grp |          5   1.035098     4.83   0.000     2.779935    7.220065
       _cons |        2.5   .7319251     3.42   0.004     .9301769    4.069823
------------------------------------------------------------------------------

/* changing the reference grp */

regress y ib2.grp

      Source |       SS       df       MS              Number of obs =      16
-------------+------------------------------           F(  1,    14) =   23.33
       Model |         100     1         100           Prob > F      =  0.0003
    Residual |          60    14  4.28571429           R-squared     =  0.6250
-------------+------------------------------           Adj R-squared =  0.5982
       Total |         160    15  10.6666667           Root MSE      =  2.0702

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       1.grp |         -5   1.035098    -4.83   0.000    -7.220065   -2.779935
       _cons |        7.5   .7319251    10.25   0.000     5.930177    9.069823
------------------------------------------------------------------------------
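
With factor variables the group means can also be requested directly instead of being pieced together from the constant and the coefficient. A sketch of two ways this might be done, assuming the example data are in memory:

```stata
* Assumes the example data are in memory
regress y i.grp

* Predicted mean of y at each level of grp
margins grp

* Cell-means form: no base level (ibn.) and no constant,
* so each coefficient is the mean of its own group
regress y ibn.grp, noconstant
```

Both approaches report one estimate per group, which here are the means of a1 and a2 from the design table.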

Phil Ender, 17sep10, 11Feb99