Classical Regression vs Logistic Regression
Different Assumptions
Logistic Regression Assumptions
Logit
Note: I would like to thank John Napier (1550-1617), lord of Merchiston (near Edinburgh), for developing the idea of logarithms.
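The logit, or log odds, is the transform at the heart of logistic regression. A quick illustration in Python (not part of the original Stata session):

```python
import math

def logit(p):
    """Log odds: ln(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def invlogit(x):
    """Inverse logit (logistic function): maps any real x back into (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(logit(0.5))             # 0.0 -- even odds
print(invlogit(logit(0.8)))   # recovers 0.8 (up to floating point)
```

The logit maps probabilities in (0, 1) onto the whole real line, which is what lets us model them with a linear predictor.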
About Logistic Regression
Interpreting Logistic Coefficients
Interpreting Odds Ratios
Example Dataset
input apt gender admit
8 1 1
7 1 0
5 1 1
3 1 0
3 1 0
5 1 1
7 1 1
8 1 1
5 1 1
5 1 1
4 0 0
7 0 1
3 0 1
2 0 0
4 0 0
2 0 0
3 0 0
4 0 1
3 0 0
2 0 0
end
Example 1: Categorical Independent Variable
logit admit i.gender
Iteration 0: log likelihood = -13.862944
Iteration 1: log likelihood = -12.222013
Iteration 2: log likelihood = -12.217286
Iteration 3: log likelihood = -12.217286
Logistic regression Number of obs = 20
LR chi2(1) = 3.29
Prob > chi2 = 0.0696
Log likelihood = -12.217286 Pseudo R2 = 0.1187
------------------------------------------------------------------------------
admit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.gender | 1.694596 .9759001 1.74 0.082 -.2181333 3.607325
_cons | -.8472978 .6900656 -1.23 0.220 -2.199801 .5052058
------------------------------------------------------------------------------
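Because gender is the only predictor, these estimates can be recovered by hand from the 2x2 cross-tabulation of the data. A Python sketch of that arithmetic, with the counts taken from the 20 observations above:

```python
import math

# Cross-tabulate admit by gender from the example dataset:
# gender = 1: 7 admitted, 3 not;  gender = 0: 3 admitted, 7 not.
odds_g1 = 7 / 3   # odds of admission when gender == 1
odds_g0 = 3 / 7   # odds of admission when gender == 0

# _cons is the log odds for the reference group (gender == 0);
# the 1.gender coefficient is the log of the odds ratio.
cons       = math.log(odds_g0)            # ~ -0.8473, matching _cons
coef       = math.log(odds_g1 / odds_g0)  # ~  1.6946, matching 1.gender
odds_ratio = math.exp(coef)               # ~  5.4444, matching the , or output
print(round(cons, 6), round(coef, 6), round(odds_ratio, 6))
```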
logit admit i.gender, or
Logistic regression Number of obs = 20
LR chi2(1) = 3.29
Prob > chi2 = 0.0696
Log likelihood = -12.217286 Pseudo R2 = 0.1187
------------------------------------------------------------------------------
admit | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.gender | 5.444444 5.313233 1.74 0.082 .8040182 36.86729
------------------------------------------------------------------------------
Example 2: Continuous Independent Variable
logit admit apt
Iteration 0: log likelihood = -13.862944
Iteration 1: log likelihood = -9.6278718
Iteration 2: log likelihood = -9.3197603
Iteration 3: log likelihood = -9.3029734
Iteration 4: log likelihood = -9.3028914
Logit estimates Number of obs = 20
LR chi2(1) = 9.12
Prob > chi2 = 0.0025
Log likelihood = -9.3028914 Pseudo R2 = 0.3289
------------------------------------------------------------------------------
admit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
apt | .9455112 .422872 2.236 0.025 .1166974 1.774325
_cons | -4.095248 1.83403 -2.233 0.026 -7.689881 -.5006154
------------------------------------------------------------------------------
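Two useful ways to read these estimates: exponentiating the apt coefficient gives the multiplicative change in the odds of admission per one-point increase in apt, and the inverse logit of the linear predictor gives a fitted probability. A Python sketch using the coefficients printed above (apt = 5 is just an illustrative value):

```python
import math

b_apt, b_cons = 0.9455112, -4.095248   # coefficients from the output above

# Each one-point increase in apt multiplies the odds by exp(b_apt):
odds_ratio = math.exp(b_apt)           # ~2.574, matching the , or replay

# Fitted probability at an illustrative apt score of 5:
xb = b_cons + b_apt * 5                # linear predictor (log odds)
p  = 1 / (1 + math.exp(-xb))           # inverse logit, ~0.65
print(round(odds_ratio, 6), round(p, 3))
```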
logit, or
Logit estimates Number of obs = 20
LR chi2(1) = 9.12
Prob > chi2 = 0.0025
Log likelihood = -9.3028914 Pseudo R2 = 0.3289
------------------------------------------------------------------------------
admit | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
---------+--------------------------------------------------------------------
apt | 2.574129 1.088527 2.236 0.025 1.123779 5.8963
------------------------------------------------------------------------------
Example 3: Categorical & Continuous Independent Variables
logit admit i.gender apt
Iteration 0: log likelihood = -13.862944
Iteration 1: log likelihood = -9.3188454
Iteration 2: log likelihood = -9.2822992
Iteration 3: log likelihood = -9.2820991
Iteration 4: log likelihood = -9.2820991
Logistic regression Number of obs = 20
LR chi2(2) = 9.16
Prob > chi2 = 0.0102
Log likelihood = -9.2820991 Pseudo R2 = 0.3304
------------------------------------------------------------------------------
admit | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.gender | .2671938 1.300911 0.21 0.837 -2.282545 2.816932
apt | .8982803 .4713918 1.91 0.057 -.0256307 1.822191
_cons | -4.028764 1.838393 -2.19 0.028 -7.631949 -.4255801
------------------------------------------------------------------------------
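The header statistics can be reproduced from the iteration log: the LR chi-square compares the final log likelihood with the intercept-only log likelihood at iteration 0, and the pseudo R2 (McFadden's) is one minus their ratio. In Python:

```python
ll_null = -13.862944   # iteration 0: intercept-only log likelihood
ll_full = -9.2820991   # final log likelihood for i.gender + apt

lr_chi2   = 2 * (ll_full - ll_null)   # ~9.16, df = 2 (two predictors)
pseudo_r2 = 1 - ll_full / ll_null     # McFadden's pseudo R2, ~0.3304
print(round(lr_chi2, 2), round(pseudo_r2, 4))
```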
logit, or
Logistic regression Number of obs = 20
LR chi2(2) = 9.16
Prob > chi2 = 0.0102
Log likelihood = -9.2820991 Pseudo R2 = 0.3304
------------------------------------------------------------------------------
admit | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.gender | 1.306294 1.699372 0.21 0.837 .1020242 16.72547
apt | 2.455377 1.157445 1.91 0.057 .974695 6.185398
------------------------------------------------------------------------------
Example 4: Honors Composition using HSB Dataset
use http://www.philender.com/courses/data/hsbdemo, clear
tabulate honors
honcomp | Freq. Percent Cum.
------------+-----------------------------------
0 | 147 73.50 73.50
1 | 53 26.50 100.00
------------+-----------------------------------
Total | 200 100.00
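The intercept-only log likelihood that starts the iteration log below can be computed directly from this marginal table, since the null model assigns every case the same fitted probability, 53/200 = .265. A Python check:

```python
import math

n1, n0 = 53, 147            # honors composition: 1 vs 0, from the tabulate above
p = n1 / (n1 + n0)          # marginal probability, 0.265

# Intercept-only (null) log likelihood: sum of log p over the 1s
# plus log(1 - p) over the 0s.
ll_null = n1 * math.log(p) + n0 * math.log(1 - p)
print(round(ll_null, 5))    # matches iteration 0 of the logit run below
```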
logit honors female i.ses read math
Iteration 0: log likelihood = -115.64441
Iteration 1: log likelihood = -75.969526
Iteration 2: log likelihood = -72.051616
Iteration 3: log likelihood = -71.994777
Iteration 4: log likelihood = -71.994756
Iteration 5: log likelihood = -71.994756
Logistic regression Number of obs = 200
LR chi2(5) = 87.30
Prob > chi2 = 0.0000
Log likelihood = -71.994756 Pseudo R2 = 0.3774
------------------------------------------------------------------------------
honors | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | 1.145726 .4513589 2.54 0.011 .2610792 2.030374
|
ses |
2 | -1.040402 .5791511 -1.80 0.072 -2.175517 .094713
3 | .0541296 .5945439 0.09 0.927 -1.111155 1.219414
|
read | .0687277 .0287044 2.39 0.017 .0124681 .1249873
math | .1358904 .0336875 4.03 0.000 .0698642 .2019166
_cons | -12.55332 1.838493 -6.83 0.000 -16.1567 -8.949939
------------------------------------------------------------------------------
testparm i.ses
( 1) [honors]2.ses = 0
( 2) [honors]3.ses = 0
chi2( 2) = 6.13
Prob > chi2 = 0.0466
logit, or
Logistic regression Number of obs = 200
LR chi2(5) = 87.30
Prob > chi2 = 0.0000
Log likelihood = -71.994756 Pseudo R2 = 0.3774
------------------------------------------------------------------------------
honors | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | 3.144724 1.419399 2.54 0.011 1.29833 7.616934
|
ses |
2 | .3533128 .2046215 -1.80 0.072 .1135494 1.099343
3 | 1.055621 .6276133 0.09 0.927 .3291785 3.385203
|
read | 1.071144 .0307466 2.39 0.017 1.012546 1.133134
math | 1.145556 .0385909 4.03 0.000 1.072363 1.223746
------------------------------------------------------------------------------
fitstat /* available from J. Scott Long via the Internet */
Measures of Fit for logit of honors
Log-Lik Intercept Only: -115.644 Log-Lik Full Model: -71.995
D(193): 143.990 LR(5): 87.299
Prob > LR: 0.000
McFadden's R2: 0.377 McFadden's Adj R2: 0.317
ML (Cox-Snell) R2: 0.354 Cragg-Uhler(Nagelkerke) R2: 0.516
McKelvey & Zavoina's R2: 0.549 Efron's R2: 0.404
Variance of y*: 7.296 Variance of error: 3.290
Count R2: 0.830 Adj Count R2: 0.358
AIC: 0.790 AIC*n: 157.990
BIC: -878.586 BIC': -60.808
BIC used by Stata: 175.779 AIC used by Stata: 155.990
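Several of these measures are simple functions of the two log likelihoods. A Python sketch of McFadden's R2 and the AIC and BIC as Stata computes them (k = 6 is my count of the estimated parameters: female, the two ses indicators, read, math, and the constant):

```python
import math

ll_null, ll_full = -115.64441, -71.994756   # log likelihoods from the output above
n, k = 200, 6                               # observations; assumed parameter count

mcfadden  = 1 - ll_full / ll_null           # ~0.377
aic_stata = -2 * ll_full + 2 * k            # ~155.990
bic_stata = -2 * ll_full + k * math.log(n)  # ~175.779
print(round(mcfadden, 3), round(aic_stata, 3), round(bic_stata, 3))
```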
lfit
Logistic model for honors, goodness-of-fit test
number of observations = 200
number of covariate patterns = 189
Pearson chi2(183) = 166.48
Prob > chi2 = 0.8040
lfit, group(10)
Logistic model for honors, goodness-of-fit test
(Table collapsed on quantiles of estimated probabilities)
number of observations = 200
number of groups = 10
Hosmer-Lemeshow chi2(8) = 12.91
Prob > chi2 = 0.1151
lstat
Logistic model for honors
-------- True --------
Classified | D ~D | Total
-----------+--------------------------+-----------
+ | 31 12 | 43
- | 22 135 | 157
-----------+--------------------------+-----------
Total | 53 147 | 200
Classified + if predicted Pr(D) >= .5
True D defined as honors != 0
--------------------------------------------------
Sensitivity Pr( +| D) 58.49%
Specificity Pr( -|~D) 91.84%
Positive predictive value Pr( D| +) 72.09%
Negative predictive value Pr(~D| -) 85.99%
--------------------------------------------------
False + rate for true ~D Pr( +|~D) 8.16%
False - rate for true D Pr( -| D) 41.51%
False + rate for classified + Pr(~D| +) 27.91%
False - rate for classified - Pr( D| -) 14.01%
--------------------------------------------------
Correctly classified 83.00%
--------------------------------------------------
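All of the rates in this table follow from the four cell counts. A Python check:

```python
# Cell counts from the lstat classification table above:
tp, fp = 31, 12     # classified +, split by true D / ~D
fn, tn = 22, 135    # classified -, split by true D / ~D
n = tp + fp + fn + tn

sensitivity = tp / (tp + fn)    # Pr(+| D),  ~58.49%
specificity = tn / (tn + fp)    # Pr(-|~D),  ~91.84%
ppv = tp / (tp + fp)            # Pr( D|+),  ~72.09%
npv = tn / (tn + fn)            # Pr(~D|-),  ~85.99%
correct = (tp + tn) / n         # correctly classified, ~83.00%
print([round(100 * x, 2) for x in (sensitivity, specificity, ppv, npv, correct)])
```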
Linear Statistical Models Course
Phil Ender, 17sep10, 20dec00