Generalized Linear Models
Most students are introduced to linear models through either multiple regression or analysis of variance. With these methods the expected value of the response variable is modeled statistically, that is, it is expressed as a linear combination of the explanatory variables. With categorical and count response variables, the expected response cannot be modeled directly as a linear function of the predictors. The nonlinearity is handled through functions that transform the expected value of the categorical or count variable into a linear function of the explanatory variables. Such transformations are referred to as link functions.
For example, in the analysis of count data the expected frequencies must be nonnegative. To ensure that the predicted values satisfy this constraint, the log link is used to transform the expected value of the response variable. This loglinear transformation serves two purposes: it ensures that the fitted values are appropriate for count data, and it allows the linear predictor, and hence the regression parameters, to range over the entire real line.
Different types of response variables use different link functions: the logit and probit link functions work with binomial response variables, while the log link function works with both Poisson and negative binomial response variables. Growing out of the work of Nelder & Wedderburn (1972) and McCullagh & Nelder (1989), generalized linear models provide a unified framework that can be applied to all of these 'linear' models.
Generalized linear models take the form:

    g(y) = b0 + b1*x1 + b2*x2 + ... + bk*xk,  with y ~ F

where g() is the link function applied to the expected value of the response, F is the distribution family of the response, and the b's are the regression coefficients. For example, OLS regression is a generalized linear model with the Gaussian family and the identity link, g(y) = y. You might recognize this example more easily if it were rewritten as follows:

    y = b0 + b1*x1 + b2*x2 + ... + bk*xk + e,  with e ~ N(0, s^2)

Another example is Poisson regression, in which the distribution family is Poisson, i.e., y ~ Poisson, and the link function is the natural log, i.e., g(y) = ln(y). The glm model would then be written as:

    ln(y) = b0 + b1*x1 + b2*x2 + ... + bk*xk
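To see why the log link is the natural choice for counts, the sketch below (in Python, with made-up coefficients) shows that the linear predictor is unrestricted while the fitted count is always positive:

```python
import math

# Made-up coefficients for a Poisson model with a log link
b0, b1 = 0.5, -0.3

for x in [0.0, 5.0, 10.0]:
    eta = b0 + b1 * x    # linear predictor: unrestricted, may be negative
    mu = math.exp(eta)   # inverse of the log link: fitted count, always > 0
    print(f"x={x}: eta={eta:.2f}, mu={mu:.4f}")
```

Even when the linear predictor is negative (here at x = 5 and x = 10), the fitted count stays above zero.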
Here are examples of distributions and link functions for some common estimation procedures:

    type of estimation         distribution family    link function
    OLS regression             gaussian               identity
    logistic regression        binomial               logit
    probit regression          binomial               probit
    cloglog regression         binomial               cloglog
    poisson regression         poisson                log
    neg binomial regression    neg binomial           log
Stata's glm command supports the following combinations of distribution family and link function:

                          iden  log  logit  probit  cloglog  nbinom  power  opower  loglog  logc
    gaussian                X    X                                     X
    inverse gaussian        X    X                                     X
    binomial                X    X     X      X        X              X      X       X      X
    poisson                 X    X                                     X
    negative binomial       X    X                             X      X
    gamma                   X    X                                     X

An OLS regression would look like this using regress and glm:
Examples
use http://www.gseis.ucla.edu/courses/data/hsb2
generate hon = write>=60
regress write read math female
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 72.52
Model | 9405.34864 3 3135.11621 Prob > F = 0.0000
Residual | 8473.52636 196 43.2322773 R-squared = 0.5261
-------------+------------------------------ Adj R-squared = 0.5188
Total | 17878.875 199 89.843593 Root MSE = 6.5751
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .3252389 .0607348 5.36 0.000 .2054613 .4450166
math | .3974826 .0664037 5.99 0.000 .266525 .5284401
female | 5.44337 .9349987 5.82 0.000 3.59942 7.287319
_cons | 11.89566 2.862845 4.16 0.000 6.249728 17.5416
------------------------------------------------------------------------------
glm write read math female, link(iden) fam(gauss) nolog
Generalized linear models No. of obs = 200
Optimization : ML: Newton-Raphson Residual df = 196
Scale parameter = 43.23228
Deviance = 8473.526357 (1/df) Deviance = 43.23228
Pearson = 8473.526357 (1/df) Pearson = 43.23228
Variance function: V(u) = 1 [Gaussian]
Link function : g(u) = u [Identity]
Standard errors : OIM
Log likelihood = -658.4261736 AIC = 6.624262
BIC = 7435.056153
------------------------------------------------------------------------------
write | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .3252389 .0607348 5.36 0.000 .2062009 .444277
math | .3974826 .0664037 5.99 0.000 .2673336 .5276315
female | 5.44337 .9349987 5.82 0.000 3.610806 7.275934
_cons | 11.89566 2.862845 4.16 0.000 6.28459 17.50674
------------------------------------------------------------------------------
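regress and glm with the identity link and Gaussian family report identical coefficients (the standard errors and confidence intervals differ slightly because glm uses z rather than t statistics). With the identity link, a fitted value is simply the linear predictor itself; here is a quick check in Python using the reported estimates and made-up covariate values:

```python
# Coefficients reported by both regress and glm above
b_cons, b_read, b_math, b_female = 11.89566, .3252389, .3974826, 5.44337

# Hypothetical student (values not taken from the data set)
read, math_score, female = 50, 55, 1

# Identity link: the fitted write score is the linear predictor itself
write_hat = b_cons + b_read * read + b_math * math_score + b_female * female
print(round(write_hat, 4))  # 55.4625
```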
logit hon read math female, nolog
Logit estimates Number of obs = 200
LR chi2(3) = 80.87
Prob > chi2 = 0.0000
Log likelihood = -75.209827 Pseudo R2 = 0.3496
------------------------------------------------------------------------------
hon | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0752424 .027577 2.73 0.006 .0211924 .1292924
math | .1317117 .0324607 4.06 0.000 .06809 .1953335
female | 1.154801 .4340856 2.66 0.008 .304009 2.005593
_cons | -13.12749 1.850769 -7.09 0.000 -16.75493 -9.50005
------------------------------------------------------------------------------
logit, or
Logit estimates Number of obs = 200
LR chi2(3) = 80.87
Prob > chi2 = 0.0000
Log likelihood = -75.209827 Pseudo R2 = 0.3496
------------------------------------------------------------------------------
hon | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.078145 .0297321 2.73 0.006 1.021419 1.138023
math | 1.140779 .0370305 4.06 0.000 1.070462 1.215716
female | 3.173393 1.377524 2.66 0.008 1.355281 7.430502
------------------------------------------------------------------------------
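The odds ratios reported by logit, or are simply the exponentiated coefficients from the logit output above; a quick check in Python:

```python
import math

# Logit coefficients reported above
coefs = {"read": .0752424, "math": .1317117, "female": 1.154801}

for name, b in coefs.items():
    print(f"{name}: OR = {math.exp(b):.4f}")
# read: OR = 1.0781, math: OR = 1.1408, female: OR = 3.1734
```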
glm hon read math female, link(logit) fam(bin) nolog
Generalized linear models No. of obs = 200
Optimization : ML: Newton-Raphson Residual df = 196
Scale parameter = 1
Deviance = 150.4196543 (1/df) Deviance = .7674472
Pearson = 164.2509104 (1/df) Pearson = .8380148
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(u/(1-u)) [Logit]
Standard errors : OIM
Log likelihood = -75.20982717 AIC = .7920983
BIC = -888.0505495
------------------------------------------------------------------------------
hon | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0752424 .0275779 2.73 0.006 .0211906 .1292941
math | .1317117 .0324623 4.06 0.000 .0680869 .1953366
female | 1.154801 .4341012 2.66 0.008 .3039785 2.005624
_cons | -13.12749 1.850893 -7.09 0.000 -16.75517 -9.499808
------------------------------------------------------------------------------
glm, eform
Generalized linear models No. of obs = 200
Optimization : ML: Newton-Raphson Residual df = 196
Scale parameter = 1
Deviance = 150.4196543 (1/df) Deviance = .7674472
Pearson = 164.2509104 (1/df) Pearson = .8380148
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = ln(u/(1-u)) [Logit]
Standard errors : OIM
Log likelihood = -75.20982717 AIC = .7920983
BIC = -888.0505495
------------------------------------------------------------------------------
hon | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | 1.078145 .029733 2.73 0.006 1.021417 1.138025
math | 1.140779 .0370323 4.06 0.000 1.070458 1.21572
female | 3.173393 1.377573 2.66 0.008 1.35524 7.430728
------------------------------------------------------------------------------
probit hon read math female, nolog
Probit estimates Number of obs = 200
LR chi2(3) = 81.80
Prob > chi2 = 0.0000
Log likelihood = -74.745943 Pseudo R2 = 0.3537
------------------------------------------------------------------------------
hon | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0473262 .0157561 3.00 0.003 .0164449 .0782076
math | .0735256 .0173216 4.24 0.000 .0395759 .1074754
female | .6824682 .2447275 2.79 0.005 .2028112 1.162125
_cons | -7.663304 .9921289 -7.72 0.000 -9.607841 -5.718767
------------------------------------------------------------------------------
glm hon read math female, link(probit) fam(bin) nolog
Generalized linear models No. of obs = 200
Optimization : ML: Newton-Raphson Residual df = 196
Scale parameter = 1
Deviance = 149.4918859 (1/df) Deviance = .7627137
Pearson = 160.9679286 (1/df) Pearson = .8212649
Variance function: V(u) = u*(1-u) [Bernoulli]
Link function : g(u) = invnorm(u) [Probit]
Standard errors : OIM
Log likelihood = -74.74594294 AIC = .7874594
BIC = -888.978318
------------------------------------------------------------------------------
hon | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .0473262 .0157561 3.00 0.003 .0164448 .0782077
math | .0735256 .0173217 4.24 0.000 .0395758 .1074755
female | .6824681 .2447281 2.79 0.005 .2028098 1.162126
_cons | -7.663303 .9921345 -7.72 0.000 -9.607851 -5.718755
------------------------------------------------------------------------------
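The probit link g(u) = invnorm(u) is the inverse of the standard normal CDF, so a fitted probability comes from pushing the linear predictor back through the normal CDF. A sketch using Python's standard library, with the probit coefficients reported above and made-up covariate values:

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal

# Probit coefficients reported above; hypothetical student values
xb = -7.663304 + .0473262 * 52 + .0735256 * 54 + .6824682 * 1

p = nd.cdf(xb)     # fitted probability of honors
assert 0 < p < 1
# applying the link to the fitted probability recovers the linear predictor
assert abs(nd.inv_cdf(p) - xb) < 1e-6
print(round(p, 3))
```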
use http://www.gseis.ucla.edu/courses/data/lahigh, clear
poisson daysabs langnce gender, nolog
Poisson regression Number of obs = 316
LR chi2(2) = 171.50
Prob > chi2 = 0.0000
Log likelihood = -1549.8567 Pseudo R2 = 0.0524
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | -.01467 .0012934 -11.34 0.000 -.0172051 -.0121349
gender | -.4093528 .0482192 -8.49 0.000 -.5038606 -.3148449
_cons | 2.646977 .0697764 37.94 0.000 2.510217 2.783736
------------------------------------------------------------------------------
poisson, irr
Poisson regression Number of obs = 316
LR chi2(2) = 171.50
Prob > chi2 = 0.0000
Log likelihood = -1549.8567 Pseudo R2 = 0.0524
------------------------------------------------------------------------------
daysabs | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | .9854371 .0012746 -11.34 0.000 .982942 .9879384
gender | .6640799 .0320214 -8.49 0.000 .6041936 .7299021
------------------------------------------------------------------------------
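As with odds ratios in the logit model, the incidence rate ratios (IRR) are just the exponentiated Poisson coefficients:

```python
import math

# Poisson coefficients reported above
coefs = {"langnce": -.01467, "gender": -.4093528}

for name, b in coefs.items():
    print(f"{name}: IRR = {math.exp(b):.4f}")
# langnce: IRR = 0.9854, gender: IRR = 0.6641
```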
glm daysabs langnce gender, link(log) fam(poisson) nolog
Generalized linear models No. of obs = 316
Optimization : ML: Newton-Raphson Residual df = 313
Scale parameter = 1
Deviance = 2238.317597 (1/df) Deviance = 7.151174
Pearson = 2752.913231 (1/df) Pearson = 8.79525
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -1549.85665 AIC = 9.828207
BIC = 436.7702841
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | -.01467 .0012934 -11.34 0.000 -.0172051 -.0121349
gender | -.4093528 .0482192 -8.49 0.000 -.5038606 -.3148449
_cons | 2.646977 .0697764 37.94 0.000 2.510217 2.783736
------------------------------------------------------------------------------
glm, eform
Generalized linear models No. of obs = 316
Optimization : ML: Newton-Raphson Residual df = 313
Scale parameter = 1
Deviance = 2238.317597 (1/df) Deviance = 7.151174
Pearson = 2752.913231 (1/df) Pearson = 8.79525
Variance function: V(u) = u [Poisson]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -1549.85665 AIC = 9.828207
BIC = 436.7702841
------------------------------------------------------------------------------
daysabs | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | .9854371 .0012746 -11.34 0.000 .982942 .9879384
gender | .6640799 .0320214 -8.49 0.000 .6041936 .7299021
------------------------------------------------------------------------------
nbreg daysabs langnce gender, nolog
Negative binomial regression Number of obs = 316
LR chi2(2) = 20.63
Prob > chi2 = 0.0000
Log likelihood = -880.9274 Pseudo R2 = 0.0116
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | -.0156493 .0039485 -3.96 0.000 -.0233882 -.0079104
gender | -.4312069 .1396913 -3.09 0.002 -.7049968 -.1574169
_cons | 2.70344 .2292762 11.79 0.000 2.254067 3.152813
-------------+----------------------------------------------------------------
/lnalpha | .25394 .095509 .0667457 .4411342
-------------+----------------------------------------------------------------
alpha | 1.289094 .1231201 1.069024 1.554469
------------------------------------------------------------------------------
Likelihood ratio test of alpha=0: chibar2(01) = 1337.86 Prob>=chibar2 = 0.000
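nbreg estimates the overdispersion parameter on the log scale; the alpha row is exp(/lnalpha), and its confidence limits are the exponentiated limits of /lnalpha:

```python
import math

# /lnalpha estimate and confidence limits reported above
lnalpha, lo, hi = .25394, .0667457, .4411342

alpha = math.exp(lnalpha)
print(f"alpha = {alpha:.4f} [{math.exp(lo):.4f}, {math.exp(hi):.4f}]")
# alpha = 1.2891 [1.0690, 1.5545]
```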
glm daysabs langnce gender, link(log) fam(nbin) nolog
Generalized linear models No. of obs = 316
Optimization : ML: Newton-Raphson Residual df = 313
Scale parameter = 1
Deviance = 425.603464 (1/df) Deviance = 1.359755
Pearson = 415.6288036 (1/df) Pearson = 1.327888
Variance function: V(u) = u+(1)u^2 [Neg. Binomial]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -884.4953535 AIC = 5.617059
BIC = -1375.943849
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | -.0156357 .0035438 -4.41 0.000 -.0225814 -.0086899
gender | -.4307736 .1253082 -3.44 0.001 -.6763732 -.185174
_cons | 2.702606 .2052709 13.17 0.000 2.300282 3.104929
------------------------------------------------------------------------------
glm, eform
Generalized linear models No. of obs = 316
Optimization : ML: Newton-Raphson Residual df = 313
Scale parameter = 1
Deviance = 425.603464 (1/df) Deviance = 1.359755
Pearson = 415.6288036 (1/df) Pearson = 1.327888
Variance function: V(u) = u+(1)u^2 [Neg. Binomial]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -884.4953535 AIC = 5.617059
BIC = -1375.943849
------------------------------------------------------------------------------
daysabs | IRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | .9844859 .0034888 -4.41 0.000 .9776716 .9913477
gender | .650006 .0814511 -3.44 0.001 .5084577 .8309596
------------------------------------------------------------------------------
glm daysabs langnce gender, fam(gamma) link(log) nolog
Generalized linear models No. of obs = 316
Optimization : ML: Newton-Raphson Residual df = 313
Scale parameter = 1.583724
Deviance = 251.8270233 (1/df) Deviance = .8045592
Pearson = 495.7055497 (1/df) Pearson = 1.583724
Variance function: V(u) = u^2 [Gamma]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -856.2487643 AIC = 5.438283
BIC = -1549.72029
------------------------------------------------------------------------------
daysabs | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | -.0156852 .0040626 -3.86 0.000 -.0236478 -.0077226
gender | -.4326492 .1443719 -3.00 0.003 -.7156129 -.1496854
_cons | 2.705757 .2383799 11.35 0.000 2.238541 3.172973
------------------------------------------------------------------------------
glm, eform
Generalized linear models No. of obs = 316
Optimization : ML: Newton-Raphson Residual df = 313
Scale parameter = 1.583724
Deviance = 251.8270233 (1/df) Deviance = .8045592
Pearson = 495.7055497 (1/df) Pearson = 1.583724
Variance function: V(u) = u^2 [Gamma]
Link function : g(u) = ln(u) [Log]
Standard errors : OIM
Log likelihood = -856.2487643 AIC = 5.438283
BIC = -1549.72029
------------------------------------------------------------------------------
daysabs | ExpB Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
langnce | .9844372 .0039994 -3.86 0.000 .9766296 .9923071
gender | .6487881 .0936668 -3.00 0.003 .4888924 .8609788
------------------------------------------------------------------------------
Categorical Data Analysis Course
Phil Ender