As we discussed in the previous unit, probit analysis is based on the cumulative normal probability distribution. The coefficients of the probit model are effects on the z-score, that is, on the inverse cumulative normal function of the probability that the response variable equals one. Here is a table of some z-scores and their associated probabilities:
Z-score    Prob
  -2.0    .0228
  -1.0    .1587
  -0.5    .3085
   0.0    .5000
   0.5    .6915
   1.0    .8413
   2.0    .9772

Consider an intercept-only model using the honors dataset, which we encountered earlier.
use http://www.gseis.ucla.edu/courses/data/honors, clear
probit honors, nolog
Probit estimates                                  Number of obs   =        200
                                                  LR chi2(0)      =      -0.00
                                                  Prob > chi2     =          .
Log likelihood = -115.64441                       Pseudo R2       =    -0.0000
------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   -.628006   .0952758    -6.59   0.000    -.8147431   -.4412689
------------------------------------------------------------------------------
predict p0
The constant can be interpreted as a predicted z-score of -.628006. We could look this z-score up in
a table, or we could use Stata's norm function (named normal in current versions of Stata) to find
the probability associated with it. We can also find the empirical probability of being in honors
using the summarize command, and then compare both of these to the predicted probability from the
predict command.
display norm(-.628006)
.265
summarize p0
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          p0 |       200        .265            0       .265       .265
tablist p0
+-------------+
|   p0   Freq |
|-------------|
| .265    200 |
+-------------+
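The conversion also runs in the other direction: the constant of an intercept-only probit model is the z-score whose cumulative normal probability equals the sample proportion. The lines below are a minimal sketch of that round trip, assuming a version of Stata in which the norm and invnorm functions go by their newer names, normal() and invnormal(). The numbers are copied from the output above.

```stata
* Convert the predicted z-score (the constant) to a probability.
display normal(-.628006)

* Go the other way: find the z-score whose cumulative probability is .265.
display invnormal(.265)
```

The first line should display approximately .265 and the second approximately -.628.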
Next, we will add female to the model.
probit honors female, nolog
Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =       3.94
                                                  Prob > chi2     =     0.0473
Log likelihood = -113.6769                        Pseudo R2       =     0.0170
------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   .3848753   .1952923     1.97   0.049     .0021095     .767641
       _cons |  -.8494977   .1501507    -5.66   0.000    -1.143788   -.5552078
------------------------------------------------------------------------------
predict xb, xb
tablist female xb
+---------------------------+
| female          xb   Freq |
|---------------------------|
| female   -.4646225    109 |
|   male   -.8494977     91 |
+---------------------------+
predict p1
tablist female p1
+--------------------------+
| female         p1   Freq |
|--------------------------|
| female   .3211009    109 |
|   male   .1978022     91 |
+--------------------------+
Thus, males would have the probability associated with a predicted z-score of -.8494977, and
females would have a z-score .3848753 higher; that is, being female increases the predicted
z-score by .3848753.
display norm(-.8494977)
.1978022
display -.8494977+.3848753
-.4646224
display norm(-.4646224)
.32110094
Finally, we will center math on 50 and use it as an interval predictor in the model.
generate math50 = math - 50
probit honors math50, nolog
Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      64.91
                                                  Prob > chi2     =     0.0000
Log likelihood = -83.191478                       Pseudo R2       =     0.2806
------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      math50 |   .0969714   .0138825     6.99   0.000     .0697622    .1241806
       _cons |  -1.107924   .1399694    -7.92   0.000    -1.382258   -.8335886
------------------------------------------------------------------------------
predict p2
sort math
list math p2 if math==50 | math==51
     +-----------------+
     | math         p2 |
     |-----------------|
 81. |   50   .1339474 |
 82. |   50   .1339474 |
 83. |   50   .1339474 |
 84. |   50   .1339474 |
 85. |   50   .1339474 |
     |-----------------|
 86. |   50   .1339474 |
 87. |   50   .1339474 |
 88. |   51   .1560197 |
 89. |   51   .1560197 |
 90. |   51   .1560197 |
     |-----------------|
 91. |   51   .1560197 |
 92. |   51   .1560197 |
 93. |   51   .1560197 |
 94. |   51   .1560197 |
 95. |   51   .1560197 |
     +-----------------+
Now the constant is the predicted z-score when math equals 50 and the coefficient tells us
how much the z-score will increase for each one-unit increase in the math score. Thus, a math score
of 51 yields a predicted z-score of -1.0109526.
display norm(-1.107924)
.13394732
display -1.107924+.0969714
-1.0109526
display norm(-1.0109526)
.15601956
We can verify that these same predicted probabilities are found when using math untransformed.
probit honors math, nolog
Probit estimates                                  Number of obs   =        200
                                                  LR chi2(1)      =      64.91
                                                  Prob > chi2     =     0.0000
Log likelihood = -83.191478                       Pseudo R2       =     0.2806
------------------------------------------------------------------------------
      honors |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        math |   .0969714   .0138825     6.99   0.000     .0697622    .1241806
       _cons |  -5.956492    .787668    -7.56   0.000    -7.500293   -4.412692
------------------------------------------------------------------------------
prvalue, x(math=50) /* from Long and Freese */
probit: Predictions for honors
Pr(y=1|x): 0.1339 95% ci: (0.0834,0.2023)
Pr(y=0|x): 0.8661 95% ci: (0.7977,0.9166)
math
x= 50
prvalue, x(math=51)
probit: Predictions for honors
Pr(y=1|x): 0.1560 95% ci: (0.1021,0.2259)
Pr(y=0|x): 0.8440 95% ci: (0.7741,0.8979)
math
x= 51
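The agreement between the centered and uncentered models can also be checked by hand, since centering changes only the constant. The following is a sketch using coefficients copied from the two outputs above. The scalar names are illustrative, and normal() is assumed to be available as the newer name for norm. The commented margins line is an assumption about your setup: it needs Stata 11 or later and the uncentered model as the most recent estimation.

```stata
* Coefficients copied from the two models above (scalar names are illustrative).
scalar b_math   = .0969714
* Constant from: probit honors math
scalar cons_raw = -5.956492
* Constant from: probit honors math50
scalar cons_c50 = -1.107924

* Linear predictor (z-score) at math = 50 under each parameterization.
display cons_raw + b_math*50
display cons_c50 + b_math*0

* Both z-scores map to the same predicted probability, up to rounding.
display normal(cons_raw + b_math*50)
display normal(cons_c50)

* With the uncentered model still in memory, the built-in margins command
* is one possible alternative to the user-written prvalue:
*   margins, at(math=(50 51))
```

The two z-scores differ only in the sixth decimal place because the displayed coefficients are rounded, and both probabilities should come out near the .1339 reported by prvalue.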
Note that although it is possible to interpret the probit coefficients as changes in z-scores, we
end up converting the z-scores to probabilities. So, in the end, it's probably better to focus on the
probabilities and/or the changes in probability when interpreting your probit model.
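To illustrate working with changes in probability, the sketch below computes them directly from the coefficients reported above. It assumes normal() is available as the newer name for norm. The scalar names and the starting value of math = 60 are illustrative choices, not part of the original analysis.

```stata
* Coefficients copied from: probit honors female
scalar f0 = -.8494977
scalar f1 = .3848753

* Change in Pr(honors = 1) for females relative to males.
display normal(f0 + f1) - normal(f0)

* Coefficients copied from: probit honors math50
scalar m0 = -1.107924
scalar m1 = .0969714

* Change in Pr(honors = 1) when math goes from 50 to 51.
display normal(m0 + m1) - normal(m0)

* The same one-unit increase starting from math = 60
* (60 is an illustrative value, not one used in the text above).
display normal(m0 + m1*11) - normal(m0 + m1*10)
```

The first difference should come out near .123 (.3211009 minus .1978022) and the second near .022 (.1560 minus .1339). The third is larger even though the change in z-score is the same, because the cumulative normal curve is steeper near its center.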
Categorical Data Analysis Course
Phil Ender