In many instances a count variable contains more zeros than a standard count model predicts, because some of the zeros are generated by a different process than the remaining counts. In the data on doctoral publications, for example, many scientists are actively involved in research and publication, but some hold jobs in which research and publishing are not required or even possible.
We will illustrate zero-inflated count models using Long's data on doctoral publications.
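The two-process idea can be written down directly. The following is a minimal sketch of the zero-inflated Poisson probability function. It assumes a mixing probability `pi` for the always-zero group and a Poisson mean `mu` for everyone else. The function name and the numeric values are illustrative only and are not estimates from the data.

```python
import math

def zip_pmf(y, mu, pi):
    """P(Y = y) under a zero-inflated Poisson.

    pi: probability of belonging to the always-zero group (illustrative name)
    mu: Poisson mean for the group that can produce positive counts
    """
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        # Zeros come from both processes: structural zeros plus Poisson zeros
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Illustrative values only, not estimates from the couart data
mu, pi = 2.0, 0.25
p_zero_plain = math.exp(-mu)        # zero probability under a plain Poisson
p_zero_zip = zip_pmf(0, mu, pi)     # zero probability under the mixture
total = sum(zip_pmf(y, mu, pi) for y in range(60))

print("P(0) plain Poisson:", round(p_zero_plain, 4))
print("P(0) zero-inflated:", round(p_zero_zip, 4))
print("probabilities sum to about:", round(total, 6))
```

With any `pi` greater than zero the mixture puts more mass on zero than the plain Poisson with the same `mu`, which is the sense in which the zeros are "inflated".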
Zero-inflated Poisson
use http://www.gseis.ucla.edu/courses/data/couart
describe
Contains data from http://www.gseis.ucla.edu/courses/data/couart.dta
obs: 915 Scientific Productivity of Bioc
vars: 7 18 Oct 2001 22:21
size: 18,300 (99.7% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
fem byte %9.0g sexlbl Sex: 1=female, 0=male.
ment float %9.0g Article by mentor in last 3 yrs
phd float %9.0g Prestige of PhD department.
mar byte %9.0g marlbl Married: 1=yes, 0=no.
kid5 byte %9.0g Number of children <= 5.
art byte %9.0g Articles in last 3 yrs of PhD.
lnart float %9.0g Log of art + .5.
-------------------------------------------------------------------------------
summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
fem | 915 .4601093 .4986788 0 1
ment | 915 8.767212 9.483915 0 76.99998
phd | 915 3.103109 .9842491 .755 4.62
mar | 915 .6622951 .473186 0 1
kid5 | 915 .495082 .76488 0 3
art | 915 1.692896 1.926069 0 19
lnart | 915 .4399161 .8566493 -.6931472 2.970414
poisson art fem mar kid5 phd ment
Poisson regression Number of obs = 915
LR chi2(5) = 183.03
Prob > chi2 = 0.0000
Log likelihood = -1651.0563 Pseudo R2 = 0.0525
------------------------------------------------------------------------------
art | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fem | -.2245942 .0546138 -4.11 0.000 -.3316352 -.1175532
mar | .1552434 .0613747 2.53 0.011 .0349512 .2755356
kid5 | -.1848827 .0401272 -4.61 0.000 -.2635305 -.1062349
phd | .0128226 .0263972 0.49 0.627 -.038915 .0645601
ment | .0255427 .0020061 12.73 0.000 .0216109 .0294746
_cons | .3046168 .1029822 2.96 0.003 .1027755 .5064581
------------------------------------------------------------------------------
quietly fitstat, saving(0)
zip art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment) vuong
Zero-inflated poisson regression Number of obs = 915
Nonzero obs = 640
Zero obs = 275
Inflation model = logit LR chi2(5) = 78.56
Log likelihood = -1604.773 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
art | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
art |
fem | -.2091446 .0634047 -3.30 0.001 -.3334155 -.0848737
mar | .103751 .071111 1.46 0.145 -.035624 .243126
kid5 | -.1433196 .0474293 -3.02 0.003 -.2362793 -.0503599
phd | -.0061662 .0310086 -0.20 0.842 -.066942 .0546096
ment | .0180977 .0022948 7.89 0.000 .0135999 .0225955
_cons | .6408391 .1213072 5.28 0.000 .4030814 .8785967
-------------+----------------------------------------------------------------
inflate |
fem | .1097465 .2800813 0.39 0.695 -.4392028 .6586958
mar | -.3540108 .3176103 -1.11 0.265 -.9765156 .2684941
kid5 | .2171001 .196481 1.10 0.269 -.1679956 .6021958
phd | .0012702 .1452639 0.01 0.993 -.2834418 .2859821
ment | -.134111 .0452462 -2.96 0.003 -.2227918 -.0454302
_cons | -.5770618 .5093853 -1.13 0.257 -1.575439 .421315
------------------------------------------------------------------------------
Vuong Test of Zip vs. Poisson: Std. Normal = 4.18 Pr> Z = 0.0000
The vuong option requests a Vuong test of zip versus poisson. The statistic
is significantly positive (z = 4.18), which in this case favors zip.
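To see how the two equations in the zip output work together, here is a rough sketch that turns the coefficients above into predictions. It assumes the usual parameterization, in which the inflate equation is a logit for membership in the always-zero group and the count equation is a log-linear Poisson mean. The covariate profile (an unmarried-or-married choice, phd = 3, and so on) is hypothetical and chosen only for illustration.

```python
import math

# Coefficients copied from the zip output above
count = {"fem": -.2091446, "mar": .103751, "kid5": -.1433196,
         "phd": -.0061662, "ment": .0180977, "_cons": .6408391}
inflate = {"fem": .1097465, "mar": -.3540108, "kid5": .2171001,
           "phd": .0012702, "ment": -.134111, "_cons": -.5770618}

def linear_index(coefs, x):
    """Constant plus the sum of coefficient * covariate value."""
    return coefs["_cons"] + sum(coefs[name] * value for name, value in x.items())

def predict(x):
    # Logit: probability of being in the always-zero group
    p_always_zero = 1.0 / (1.0 + math.exp(-linear_index(inflate, x)))
    # Poisson mean for those who are not always zero
    mu = math.exp(linear_index(count, x))
    # Overall probability of observing zero articles
    p_zero = p_always_zero + (1.0 - p_always_zero) * math.exp(-mu)
    # Overall expected number of articles
    expected = (1.0 - p_always_zero) * mu
    return p_always_zero, mu, p_zero, expected

# Hypothetical profile: married man, no young children, phd prestige 3
base = {"fem": 0, "mar": 1, "kid5": 0, "phd": 3, "ment": 0}
more = dict(base, ment=10)  # same profile, mentor with 10 articles

for label, x in (("ment = 0 ", base), ("ment = 10", more)):
    print(label, [round(v, 3) for v in predict(x)])
```

Because the ment coefficient is negative in the inflate equation and positive in the count equation, a more productive mentor both lowers the chance of being an always-zero case and raises the expected count among the rest.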
fitstat, using(0) force
Measures of Fit for zip of art
Warning: Current model estimated by zip, but saved model estimated by poisson
Current Saved Difference
Model: zip poisson
N: 915 915 0
Log-Lik Intercept Only: -1679.391 -1742.573 63.182
Log-Lik Full Model: -1604.773 -1651.056 46.283
D: 3209.546(903) 3302.113(909) 92.567(6)
LR: 149.236(10) 183.034(5) 33.798(5)
Prob > LR: 0.000 0.000 0.000
McFadden's R2: 0.044 0.053 -0.008
McFadden's Adj R2: 0.037 0.049 -0.012
Maximum Likelihood R2: 0.150 0.181 -0.031
Cragg & Uhler's R2: 0.154 0.185 -0.031
AIC: 3.534 3.622 -0.088
AIC*n: 3233.546 3314.113 -80.567
BIC: -2947.943 -2896.289 -51.653
BIC': -81.047 -148.940 67.892
Note: p-value for difference in LR is only valid if models are nested.
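Several rows of the fitstat table follow from the log likelihoods alone. The sketch below assumes the standard definitions (deviance D = -2 lnL and AIC*n = -2 lnL + 2k, with k the number of estimated parameters); these reproduce the values in the table, but they are written here from those definitions rather than taken from the fitstat source.

```python
n = 915

# Log likelihoods copied from the poisson and zip output above
loglik = {"zip": -1604.773, "poisson": -1651.0563}

# Parameter counts: zip has 6 count + 6 inflate coefficients,
# poisson has 5 slopes + 1 constant
n_params = {"zip": 12, "poisson": 6}

def deviance(ll):
    return -2.0 * ll

def aic_times_n(ll, k):
    return -2.0 * ll + 2.0 * k

for model in ("zip", "poisson"):
    d = deviance(loglik[model])
    a = aic_times_n(loglik[model], n_params[model])
    print(model, "D =", round(d, 3), "AIC*n =", round(a, 3),
          "AIC =", round(a / n, 3))

aic_diff = (aic_times_n(loglik["zip"], n_params["zip"])
            - aic_times_n(loglik["poisson"], n_params["poisson"]))
print("AIC*n difference (zip - poisson):", round(aic_diff, 3))
```

A negative difference means the current model (zip) has the smaller AIC, even though it spends six more parameters than poisson.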
Zero-inflated Negative Binomial
nbreg art fem mar kid5 phd ment
Negative binomial regression Number of obs = 915
LR chi2(5) = 97.96
Prob > chi2 = 0.0000
Log likelihood = -1560.9583 Pseudo R2 = 0.0304
------------------------------------------------------------------------------
art | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
fem | -.2164184 .0726724 -2.98 0.003 -.3588537 -.0739832
mar | .1504895 .0821063 1.83 0.067 -.0104359 .3114148
kid5 | -.1764152 .0530598 -3.32 0.001 -.2804105 -.07242
phd | .0152712 .0360396 0.42 0.672 -.0553652 .0859075
ment | .0290823 .0034701 8.38 0.000 .0222811 .0358836
_cons | .256144 .1385604 1.85 0.065 -.0154294 .5277174
-------------+----------------------------------------------------------------
/lnalpha | -.8173044 .1199372 -1.052377 -.5822318
-------------+----------------------------------------------------------------
alpha | .4416205 .0529667 .3491069 .5586502
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) = 180.20 Prob>=chibar2 = 0.000
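The last lines of the nbreg output can be checked by hand. This sketch assumes the usual negative binomial variance function, Var(y) = mu + alpha*mu^2, and treats the chibar2(01) statistic as a 50:50 mixture of chi2(0) and chi2(1), so the chi2(1) upper tail is halved. The helper names are illustrative.

```python
import math

# Values copied from the nbreg and poisson output above
lnalpha = -0.8173044
ll_nbreg = -1560.9583
ll_poisson = -1651.0563

# alpha is reported on the log scale and then exponentiated
alpha = math.exp(lnalpha)

# Likelihood-ratio statistic for alpha = 0 (negative binomial vs Poisson)
lr = 2.0 * (ll_nbreg - ll_poisson)

# chi2(1) upper tail is erfc(sqrt(x / 2)); chibar2(01) halves it
p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))

def nb_variance(mu, alpha):
    """Variance implied by the negative binomial for a given mean."""
    return mu + alpha * mu ** 2

print("alpha:", round(alpha, 7))
print("LR statistic:", round(lr, 2))
print("p-value:", p_value)
print("variance at mu = 2:", round(nb_variance(2.0, alpha), 3))
```

With alpha near 0.44, the implied variance at a mean of 2 articles is well above 2, which is the overdispersion that the Poisson model cannot accommodate.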
quietly fitstat, saving(0)
zinb art fem mar kid5 phd ment, inflate(fem mar kid5 phd ment) vuong zip
Zero-inflated negative binomial regression Number of obs = 915
Nonzero obs = 640
Zero obs = 275
Inflation model = logit LR chi2(5) = 67.97
Log likelihood = -1549.991 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
art | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
art |
fem | -.1955068 .0755926 -2.59 0.010 -.3436655 -.0473481
mar | .0975826 .084452 1.16 0.248 -.0679402 .2631054
kid5 | -.1517325 .054206 -2.80 0.005 -.2579744 -.0454906
phd | -.0007001 .0362696 -0.02 0.985 -.0717872 .0703869
ment | .0247862 .0034924 7.10 0.000 .0179412 .0316312
_cons | .4167466 .1435962 2.90 0.004 .1353032 .69819
-------------+----------------------------------------------------------------
inflate |
fem | .6359327 .8489175 0.75 0.454 -1.027915 2.299781
mar | -1.499469 .93867 -1.60 0.110 -3.339228 .3402907
kid5 | .6284274 .4427825 1.42 0.156 -.2394104 1.496265
phd | -.0377153 .3080086 -0.12 0.903 -.641401 .5659705
ment | -.8822932 .3162277 -2.79 0.005 -1.502088 -.2624984
_cons | -.1916864 1.322821 -0.14 0.885 -2.784368 2.400995
-------------+----------------------------------------------------------------
/lnalpha | -.9763565 .1354679 -7.21 0.000 -1.241869 -.7108443
-------------+----------------------------------------------------------------
alpha | .376681 .0510282 .288844 .4912293
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) = 109.56 Pr>=chibar2 = 0.0000
Vuong Test of Zinb vs. Neg. Bin: Std. Normal = 2.24 Pr> Z = 0.0125
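The two test lines at the foot of the zinb output can be reproduced from numbers already shown. The sketch below assumes that the likelihood-ratio test of alpha=0 compares the zinb log likelihood with the zip log likelihood reported earlier, and that the Vuong p-value is the one-sided upper tail of a standard normal. Function names are illustrative.

```python
import math

# Log likelihoods copied from the zinb and zip output above
ll_zinb = -1549.991
ll_zip = -1604.773

# LR test of alpha = 0: zinb against zip with the same covariates
lr = 2.0 * (ll_zinb - ll_zip)
p_lr = 0.5 * math.erfc(math.sqrt(lr / 2.0))  # chibar2(01): half the chi2(1) tail

def upper_tail(z):
    """One-sided upper-tail probability of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Vuong statistic reported for zinb vs. nbreg
p_vuong = upper_tail(2.24)

print("LR statistic:", round(lr, 2))
print("LR p-value:", p_lr)
print("Vuong p-value:", round(p_vuong, 4))
```

The Vuong statistic is reported to two decimals, so a p-value recomputed from it may differ from the printed one in the last digit.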
fitstat, using(0) force
Measures of Fit for zinb of art
Warning: Current model estimated by zinb, but saved model estimated by nbreg
Current Saved Difference
Model: zinb nbreg
N: 915 915 0
Log-Lik Intercept Only: -1609.937 -1609.937 -0.000
Log-Lik Full Model: -1549.991 -1560.958 10.967
D: 3099.982(902) 3121.917(908) 21.935(6)
LR: 119.892(11) 97.957(5) 21.935(6)
Prob > LR: 0.000 0.000 0.001
McFadden's R2: 0.037 0.030 0.007
McFadden's Adj R2: 0.029 0.026 0.003
Maximum Likelihood R2: 0.123 0.102 0.021
Cragg & Uhler's R2: 0.127 0.105 0.022
AIC: 3.416 3.427 -0.011
AIC*n: 3125.982 3135.917 -9.935
BIC: -3050.688 -3069.666 18.979
BIC': -44.884 -63.862 18.979
Difference of 18.979 in BIC' provides very strong support for saved model.
Note: p-value for difference in LR is only valid if models are nested.
We have included the vuong and zip options. The zip option requests a likelihood-ratio
test comparing zinb with zip, which is a test of alpha = 0. The result
(chibar2(01) = 109.56) indicates that zinb is the better choice.
The vuong option requests a test of the zinb versus nbreg
models. In general, Vuong tests that are significantly positive support the zero-inflated model,
while those that are significantly negative favor the non-zero-inflated model. The Vuong test
above (z = 2.24, p = 0.0125) supports the use of a zero-inflated approach.
Let's try again and see if we can improve our model by removing some non-significant variables.
zinb art fem mar kid5 ment, inflate(ment) vuong
Zero-inflated negative binomial regression Number of obs = 915
Nonzero obs = 640
Zero obs = 275
Inflation model = logit LR chi2(4) = 71.91
Log likelihood = -1553.273 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
art | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
art |
fem | -.2119365 .0719188 -2.95 0.003 -.3528948 -.0709782
mar | .1389895 .0807376 1.72 0.085 -.0192532 .2972323
kid5 | -.1676594 .0524524 -3.20 0.001 -.2704641 -.0648546
ment | .024431 .0034497 7.08 0.000 .0176696 .0311923
_cons | .4101993 .0863877 4.75 0.000 .2408825 .5795161
-------------+----------------------------------------------------------------
inflate |
ment | -.6096804 .2456692 -2.48 0.013 -1.091183 -.1281775
_cons | -.8053801 .3520712 -2.29 0.022 -1.495427 -.1153333
-------------+----------------------------------------------------------------
/lnalpha | -1.003111 .1427915 -7.03 0.000 -1.282977 -.7232447
-------------+----------------------------------------------------------------
alpha | .3667368 .0523669 .2772108 .4851755
------------------------------------------------------------------------------
Vuong Test of Zinb vs. Neg. Bin: Std. Normal = 1.88 Pr> Z = 0.0299
fitstat, using(0) force
Measures of Fit for zinb of art
Warning: Current model estimated by zinb, but saved model estimated by nbreg
Current Saved Difference
Model: zinb nbreg
N: 915 915 0
Log-Lik Intercept Only: -1609.937 -1609.937 -0.000
Log-Lik Full Model: -1553.273 -1560.958 7.686
D: 3106.545(907) 3121.917(908) 15.371(1)
LR: 113.328(6) 97.957(5) 15.371(1)
Prob > LR: 0.000 0.000 0.000
McFadden's R2: 0.035 0.030 0.005
McFadden's Adj R2: 0.030 0.026 0.004
Maximum Likelihood R2: 0.116 0.102 0.015
Cragg & Uhler's R2: 0.120 0.105 0.015
AIC: 3.413 3.427 -0.015
AIC*n: 3122.545 3135.917 -13.371
BIC: -3078.219 -3069.666 -8.552
BIC': -72.415 -63.862 -8.552
Difference of 8.552 in BIC' provides strong support for current model.
Note: p-value for difference in LR is only valid if models are nested.
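The BIC' rows and the verbal summary can be reproduced from the LR rows. This sketch assumes the definition BIC' = -LR + df*ln(N) and a strength-of-evidence scale with cut points at 2, 6 and 10, which matches the wording fitstat prints here; both are stated as assumptions rather than taken from the fitstat source.

```python
import math

n = 915

# (LR chi-square, degrees of freedom) copied from the fitstat tables above
models = {
    "zinb full": (119.892, 11),
    "zinb reduced": (113.328, 6),
    "nbreg": (97.957, 5),
}

def bic_prime(lr_chi2, df, n_obs):
    return -lr_chi2 + df * math.log(n_obs)

def strength(diff):
    """Verbal label for an absolute BIC' difference (assumed cut points)."""
    d = abs(diff)
    if d < 2:
        return "weak"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"

bic = {name: bic_prime(lr, df, n) for name, (lr, df) in models.items()}
for name in ("zinb full", "zinb reduced"):
    diff = bic[name] - bic["nbreg"]
    favored = name if diff < 0 else "nbreg"
    print(name, "vs nbreg:", round(diff, 3), "->",
          strength(diff), "support for", favored)
```

The smaller BIC' is preferred, so the full zinb loses to nbreg because of its 11 parameters, while the reduced zinb, with only one more parameter than nbreg, is favored.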