Applied Categorical & Nonnormal Data Analysis

Interpreting Probit Coefficients

As we discussed in the previous unit, probit analysis is based on the cumulative normal probability distribution. The coefficients of the probit model are effects on the z-score scale, that is, on the inverse cumulative normal transformation of the probability that the response variable equals one. Here is a table of some z-scores and their associated cumulative probabilities:

Z-score   Prob
 -2.0    .0228
 -1.0    .1587
 -0.5    .3085
  0.0    .5000
  0.5    .6915
  1.0    .8413
  2.0    .9772
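
The probabilities in this table are values of the standard normal cumulative distribution function. As a minimal sketch, they can be reproduced in Stata with the normal() function. This assumes a release in which normal() is available; older releases name the same function norm(), as in the commands used in these notes.

```stata
* Print each z-score next to its cumulative standard normal probability
foreach z of numlist -2 -1 -0.5 0 0.5 1 2 {
    display %5.1f `z' "    " %6.4f normal(`z')
}
```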

Consider an intercept-only model using the honors dataset, which we encountered earlier.

use http://www.gseis.ucla.edu/courses/data/honors, clear

probit honors, nolog

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -115.64441                       Pseudo R2       =    -0.0000

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   -.628006   .0952758    -6.59   0.000    -.8147431   -.4412689
------------------------------------------------------------------------------

predict p0

The constant can be interpreted as a predicted z-score of -.628006. We could look this z-score up in a table, or we could use Stata's norm function to find the probability associated with it. We can also find the empirical probability of being in honors using the summarize command, and then compare both of these to the predicted probability from the predict command.

display norm(-.628006)

.265

summarize p0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          p0 |       200        .265           0       .265       .265

tablist p0

  +-------------+
  |   p0   Freq |
  |-------------|
  | .265    200 |
  +-------------+
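
In an intercept-only model, the link between the constant and the sample proportion can be checked directly from the stored estimates. The sketch below is an illustration rather than part of the original session. It assumes that honors is coded 0/1 and that normal() and invnormal() are available (older releases use norm() and invnorm()).

```stata
* Run after: probit honors, nolog
* Probability implied by the estimated constant
display normal(_b[_cons])
* The mean of a 0/1 variable is the sample proportion in honors
summarize honors
* z-score implied by that sample proportion
display invnormal(r(mean))
```

If the coding assumption holds, the first and second results should both match the predicted probability of .265, and the last should be close to the estimated constant.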

Next, we will add female to the model.

probit honors female, nolog

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =       3.94
                                                  Prob > chi2     =     0.0473
Log likelihood =  -113.6769                       Pseudo R2       =     0.0170

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .3848753   .1952923     1.97   0.049     .0021095     .767641
       _cons |  -.8494977   .1501507    -5.66   0.000    -1.143788   -.5552078
------------------------------------------------------------------------------

predict xb, xb

tablist female xb

  +---------------------------+
  | female          xb   Freq |
  |---------------------------|
  | female   -.4646225    109 |
  |   male   -.8494977     91 |
  +---------------------------+

predict p1

tablist female p1


  +--------------------------+
  | female         p1   Freq |
  |--------------------------|
  | female   .3211009    109 |
  |   male   .1978022     91 |
  +--------------------------+

Thus, males have the probability associated with a predicted z-score of -.8494977, and females have a z-score .3848753 higher; that is, being female increases the predicted z-score by .3848753, to -.4646224. The corresponding predicted probabilities are .1978022 for males and .3211009 for females.

display norm(-.8494977)

.1978022

display -.8494977+.3848753

-.4646224

display norm(-.4646224)

.32110094
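
The same calculation can be made from the stored coefficients instead of retyped values. This is a minimal sketch, assuming female is coded 0 for males and 1 for females and that normal() is available (norm() in older releases).

```stata
* Run after: probit honors female, nolog
* Predicted probability for males (female = 0)
display normal(_b[_cons])
* Predicted probability for females (female = 1)
display normal(_b[_cons] + _b[female])
* Difference in predicted probabilities, females minus males
display normal(_b[_cons] + _b[female]) - normal(_b[_cons])
```

With the values shown above, the difference is .3211009 - .1978022 = .1232987, so the effect of being female is about 12 percentage points on the probability scale.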

Finally, we will center math on 50 and use it as an interval predictor in the model.

generate math50 = math - 50

probit honors math50, nolog

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      64.91
                                                  Prob > chi2     =     0.0000
Log likelihood = -83.191478                       Pseudo R2       =     0.2806

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      math50 |   .0969714   .0138825     6.99   0.000     .0697622    .1241806
       _cons |  -1.107924   .1399694    -7.92   0.000    -1.382258   -.8335886
------------------------------------------------------------------------------

predict p2

sort math

list math p2 if math==50 | math==51

     +-----------------+
     | math         p2 |
     |-----------------|
 81. |   50   .1339474 |
 82. |   50   .1339474 |
 83. |   50   .1339474 |
 84. |   50   .1339474 |
 85. |   50   .1339474 |
     |-----------------|
 86. |   50   .1339474 |
 87. |   50   .1339474 |
 88. |   51   .1560197 |
 89. |   51   .1560197 |
 90. |   51   .1560197 |
     |-----------------|
 91. |   51   .1560197 |
 92. |   51   .1560197 |
 93. |   51   .1560197 |
 94. |   51   .1560197 |
 95. |   51   .1560197 |
     +-----------------+

Now the constant is the predicted z-score when math equals 50 and the coefficient tells us how much the z-score will increase for each one-unit increase in the math score. Thus, a math score of 51 yields a predicted z-score of -1.0109526.

display norm(-1.107924)

.13394732

display -1.107924+.0969714

-1.0109526

display norm(-1.0109526)

.15601956
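
Because the normal curve is nonlinear, the same .0969714 increase in the z-score corresponds to different changes in probability at different math scores. The sketch below uses the coefficients reported above. The comparison scores of 60 and 61 are chosen here only for illustration, and normal() is assumed to be available (norm() in older releases).

```stata
* Change in predicted probability from math = 50 to math = 51
display normal(-1.107924 + .0969714*1) - normal(-1.107924)
* Change in predicted probability from math = 60 to math = 61
display normal(-1.107924 + .0969714*11) - normal(-1.107924 + .0969714*10)
```

The first difference equals .15601956 - .13394732 = .02207224 from the results above. The second is larger because the predicted z-scores near math = 60 are closer to zero, where the cumulative normal curve is steepest.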

We can verify that these same predicted probabilities are found when using math untransformed.

probit honors math, nolog

Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      64.91
                                                  Prob > chi2     =     0.0000
Log likelihood = -83.191478                       Pseudo R2       =     0.2806

------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   .0969714   .0138825     6.99   0.000     .0697622    .1241806
       _cons |  -5.956492    .787668    -7.56   0.000    -7.500293   -4.412692
------------------------------------------------------------------------------

prvalue, x(math=50)  /* from Long and Freese */
 
probit: Predictions for honors

  Pr(y=1|x):          0.1339   95% ci: (0.0834,0.2023)
  Pr(y=0|x):          0.8661   95% ci: (0.7977,0.9166)

    math
x=    50

prvalue, x(math=51)
 
probit: Predictions for honors

  Pr(y=1|x):          0.1560   95% ci: (0.1021,0.2259)
  Pr(y=0|x):          0.8440   95% ci: (0.7741,0.8979)

    math
x=    51
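
The two constants are linked by the centering: the predicted z-score at math = 50 in the uncentered model should match the constant of the centered model up to rounding. The sketch below checks this by hand and then requests the same predicted probabilities with margins. It assumes a release of Stata that includes margins and normal(); prvalue, used above, is a user-written command from Long and Freese.

```stata
* Predicted z-score at math = 50 from the uncentered coefficients
display -5.956492 + 50*.0969714
* Corresponding predicted probability
display normal(-5.956492 + 50*.0969714)
* Run after: probit honors math, nolog
* Predicted probabilities at math = 50 and math = 51
margins, at(math=(50 51))
```

The hand calculation gives about -1.107922, which agrees with the centered constant of -1.107924 to five decimal places.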

Note that although it is possible to interpret the probit coefficients as changes in z-scores, we end up converting the z-scores to probabilities. So, in the end, it's probably better to focus on the probabilities and/or the changes in probability when interpreting your probit model.
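
One way to report changes in probability directly is with marginal effects. The following is a sketch only, assuming a release of Stata that provides the margins command; it is not part of the original session.

```stata
* Run after: probit honors math, nolog
* Average marginal effect of math on Pr(honors = 1) over the sample
margins, dydx(math)
* Marginal effect of math evaluated at math = 50
margins, dydx(math) at(math=50)
```

These are instantaneous rates of change in the probability, so they will be close to, but not identical to, the one-unit differences computed by hand.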

Categorical Data Analysis Course

Phil Ender