Applied Categorical & Nonnormal Data Analysis

Relative Risk for Non-rare Events

To a certain extent, relative risk is a more intuitive concept than the odds ratio. For rare events (probability less than .1), the odds ratio and the relative risk are nearly equal. But what about situations in which events are not rare? We will present two methods of obtaining relative risk: binary regression with a log link, and Poisson regression with robust standard errors. Each is shown with one of Stata's estimation commands along with its equivalent glm command.
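The reason the two measures agree only for rare events can be made explicit. If p1 and p0 are the event probabilities in the two groups, then OR = RR * (1 - p0) / (1 - p1), and the correction factor is close to 1 only when both probabilities are small. The Python sketch below checks this identity. The probabilities are illustrative, not taken from the honors data, and the helper names are our own.

```python
def risk_ratio(p1: float, p0: float) -> float:
    """Ratio of event probabilities, group 1 versus group 0."""
    return p1 / p0


def odds_ratio(p1: float, p0: float) -> float:
    """Ratio of event odds, group 1 versus group 0."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Illustrative probabilities: the same risk ratio (2.0) at a low and a high baseline risk.
for p0, p1 in [(0.02, 0.04), (0.20, 0.40)]:
    rr = risk_ratio(p1, p0)
    or_ = odds_ratio(p1, p0)
    # OR = RR * (1 - p0) / (1 - p1); the factor is near 1 only for rare events
    factor = (1 - p0) / (1 - p1)
    print(f"p0={p0:.2f} p1={p1:.2f}  RR={rr:.3f}  OR={or_:.3f}  factor={factor:.3f}")
```

With a baseline risk of 0.02, the odds ratio (about 2.04) stays close to the risk ratio of 2. With a baseline risk of 0.20, the odds ratio rises to about 2.67.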

Acknowledgements: Thanks to numerous contributors to Statalist and to Karla Lindquist of UCSF.

use http://www.gseis.ucla.edu/courses/data/honors

tabulate female honors, nolabel

           |        honors
    female |         0          1 |     Total
-----------+----------------------+----------
         0 |        73         18 |        91      
         1 |        74         35 |       109 
-----------+----------------------+----------
     Total |       147         53 |       200   

display "odds ratio = " (73*35)/(18*74)

odds ratio = 1.9181682

display "relative risk = " (35/109)/(18/91)

relative risk = 1.6233435
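The Python function below repeats this arithmetic and spells out which cell goes where. It is a small sketch, and `two_by_two_ratios` is our own hypothetical helper. The odds ratio compares the odds of honors (honors/no honors) between females and males, and the relative risk compares the proportions with honors.

```python
# Cell counts from the tabulate output above, keyed by (female, honors).
counts = {
    (0, 0): 73, (0, 1): 18,   # female == 0
    (1, 0): 74, (1, 1): 35,   # female == 1
}


def two_by_two_ratios(c: dict) -> tuple[float, float]:
    """Return (odds ratio, relative risk) for honors == 1, female versus male."""
    a0, b0 = c[(0, 1)], c[(0, 0)]   # males: honors, no honors
    a1, b1 = c[(1, 1)], c[(1, 0)]   # females: honors, no honors
    odds_ratio = (a1 / b1) / (a0 / b0)
    risk_ratio = (a1 / (a1 + b1)) / (a0 / (a0 + b0))
    return odds_ratio, risk_ratio


or_hat, rr_hat = two_by_two_ratios(counts)
print(f"odds ratio = {or_hat:.7f}")
print(f"relative risk = {rr_hat:.7f}")
```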

First we will compute the odds ratio using logistic regression, followed by the equivalent glm command (binomial family, logit link). With glm, the eform option displays exponentiated coefficients, which under the logit link are odds ratios.

logit honors female, or nolog

Logit estimates                                   Number of obs   =        200
                                                  LR chi2(1)      =       3.94
                                                  Prob > chi2     =     0.0473
Log likelihood =  -113.6769                       Pseudo R2       =     0.0170

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
------------------------------------------------------------------------------

glm honors female, fam(binom) link(logit) eform nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       198
                                                   Scale parameter =         1
Deviance         =  227.3538087                    (1/df) Deviance =  1.148252
Pearson          =          200                    (1/df) Pearson  =  1.010101

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   = -113.6769044                    AIC             =  1.156769
BIC              = -821.7130299

------------------------------------------------------------------------------
      honors | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.918168   .6400451     1.95   0.051     .9973827    3.689024
------------------------------------------------------------------------------
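female is the only predictor and it is binary, so this logit model is saturated. In a saturated model the standard error of ln(OR) equals the closed-form Woolf formula sqrt(1/a + 1/b + 1/c + 1/d). The standard error shown next to the odds ratio is the delta-method value OR * SE(ln OR), and the confidence interval is exp(ln OR +/- 1.96 SE). The Python cross-check below assumes these formulas. `odds_ratio_woolf` is a hypothetical helper of ours and does not come from any library. It should reproduce the 1.918168, .6400451 and (.9973827, 3.689024) shown above.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a0: int, b0: int, a1: int, b1: int, level: float = 0.95):
    """Odds ratio with Woolf (log-scale) standard error and Wald confidence interval.

    a = events (honors == 1), b = non-events; 0 = reference group, 1 = comparison group.
    """
    log_or = math.log((a1 / b1) / (a0 / b0))
    se_log = math.sqrt(1 / a0 + 1 / b0 + 1 / a1 + 1 / b1)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    or_hat = math.exp(log_or)
    se_or = or_hat * se_log            # delta-method SE on the odds-ratio scale
    ci = (math.exp(log_or - z * se_log), math.exp(log_or + z * se_log))
    return or_hat, se_or, ci


or_hat, se_or, ci = odds_ratio_woolf(a0=18, b0=73, a1=35, b1=74)
print(f"OR = {or_hat:.6f}, SE = {se_or:.7f}, 95% CI = ({ci[0]:.7f}, {ci[1]:.6f})")
```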

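Next, we will compute the risk ratio using binary regression (binreg with the rr option). Binary regression is notorious for poor convergence. Under the log link the fitted probability is exp(xb), so the estimates must keep every fitted probability below 1. The equivalent glm command keeps the binomial family but switches to the log link.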

binreg honors female, rr nolog

Residual df  =       198                                No. of obs =       200
Pearson X2   =       200                                Deviance   =  227.3538
Dispersion   =  1.010101                                Dispersion =  1.148252

Bernoulli distribution, log link
------------------------------------------------------------------------------
             |                 EIM
      honors | Risk Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.623344   .4105603     1.92   0.055     .9888554    2.664944
------------------------------------------------------------------------------

glm honors female, fam(binom) link(log) eform nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       198
                                                   Scale parameter =         1
Deviance         =  227.3538087                    (1/df) Deviance =  1.148252
Pearson          =  199.9999895                    (1/df) Pearson  =  1.010101

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -113.6769044                    AIC             =  1.156769
BIC              = -821.7130299

------------------------------------------------------------------------------
      honors | Risk Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.623343   .4105603     1.92   0.055     .9888551    2.664944
------------------------------------------------------------------------------
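This log-binomial model is also saturated. The standard error of ln(RR) therefore matches the closed-form Katz formula sqrt(1/a1 - 1/n1 + 1/a0 - 1/n0), where a is the number of events and n the group size. The Python sketch below assumes that formula together with the same delta-method and Wald-interval conventions used for the odds ratio. `risk_ratio_katz` is a hypothetical helper name. It should reproduce the 1.623343, .4105603 and (.9888551, 2.664944) reported by binreg and glm.

```python
import math
from statistics import NormalDist


def risk_ratio_katz(a0: int, n0: int, a1: int, n1: int, level: float = 0.95):
    """Risk ratio with Katz (log-scale) standard error and Wald confidence interval.

    a = number of events, n = group size; 0 = reference group, 1 = comparison group.
    """
    log_rr = math.log((a1 / n1) / (a0 / n0))
    se_log = math.sqrt(1 / a1 - 1 / n1 + 1 / a0 - 1 / n0)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rr_hat = math.exp(log_rr)
    se_rr = rr_hat * se_log            # delta-method SE on the risk-ratio scale
    ci = (math.exp(log_rr - z * se_log), math.exp(log_rr + z * se_log))
    return rr_hat, se_rr, ci


rr_hat, se_rr, ci = risk_ratio_katz(a0=18, n0=91, a1=35, n1=109)
print(f"RR = {rr_hat:.6f}, SE = {se_rr:.7f}, 95% CI = ({ci[0]:.7f}, {ci[1]:.6f})")
```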

Next, we will use Poisson regression to accomplish the same thing. To obtain reasonable standard errors, we need to include the robust option with poisson. The Poisson variance function V(u) = u is larger than the Bernoulli variance u*(1-u), so the model-based standard errors would be too large. For glm, we change the family from binomial to poisson while leaving the link at log.

This use of Poisson regression to obtain relative risk comes from an article by Guangyong Zou (A Modified Poisson Regression Approach to Prospective Studies with Binary Data. Am J Epidemiol 2004; 159(7):702-6). What makes this "modified Poisson" approach interesting is that each observation is a single 0/1 event, not the count variable usually modeled with Poisson regression.

poisson honors female, irr robust nolog

Poisson regression                                Number of obs   =        200
                                                  Wald chi2(1)    =       3.65
                                                  Prob > chi2     =     0.0560
Log pseudo-likelihood = -121.92877                Pseudo R2       =     0.0118

------------------------------------------------------------------------------
             |               Robust
      honors |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.623344   .4115907     1.91   0.056      .987626    2.668261
------------------------------------------------------------------------------

glm honors female, fam(poisson) link(log) robust eform nolog

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       198
                                                   Scale parameter =         1
Deviance         =  137.8575464                    (1/df) Deviance =  .6962502
Pearson          =  146.9999999                    (1/df) Pearson  =  .7424242

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : Sandwich

Log pseudo-likelihood = -121.9287732               AIC             =  1.239288
BIC                   =-911.2092922

------------------------------------------------------------------------------
             |               Robust
      honors |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.623344   .4115907     1.91   0.056      .987626    2.668261
------------------------------------------------------------------------------
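The Python sketch below checks the sandwich calculation. It fits the same modified Poisson model (one predictor) by Newton-Raphson and computes the robust variance A^-1 B A^-1. Here A is the Poisson information matrix and B is the sum of squared residuals times z z'. The result is multiplied by the small-sample factor n/(n-1), which is what reproduces the output above. `modified_poisson` is our own illustrative helper, not a library routine, and it handles a single covariate only. For this saturated model the estimate equals the 2x2 relative risk, and the robust standard error is the binreg value (.4105603) inflated by sqrt(200/199).

```python
import math
from statistics import NormalDist


def modified_poisson(y, x, max_iter=50, tol=1e-10):
    """Modified Poisson regression of a 0/1 outcome y on one covariate x.

    Fits log E[y] = b0 + b1*x by Newton-Raphson and returns
    (risk ratio exp(b1), robust SE of the risk ratio, 95% Wald CI).
    """
    n = len(y)
    b0, b1 = math.log(sum(y) / n), 0.0       # start from the intercept-only fit

    def accumulate(b0, b1):
        """Score (u), information (i) and sandwich 'meat' (m) sums at (b0, b1)."""
        s = dict.fromkeys(["u0", "u1", "i00", "i01", "i11", "m00", "m01", "m11"], 0.0)
        for yi, xi in zip(y, x):
            mu = math.exp(b0 + b1 * xi)
            r = yi - mu
            s["u0"] += r
            s["u1"] += r * xi
            s["i00"] += mu
            s["i01"] += mu * xi
            s["i11"] += mu * xi * xi
            s["m00"] += r * r
            s["m01"] += r * r * xi
            s["m11"] += r * r * xi * xi
        return s

    for _ in range(max_iter):
        s = accumulate(b0, b1)
        det = s["i00"] * s["i11"] - s["i01"] ** 2
        # Newton step: solve (information) * step = score for the 2x2 system
        step0 = (s["i11"] * s["u0"] - s["i01"] * s["u1"]) / det
        step1 = (s["i00"] * s["u1"] - s["i01"] * s["u0"]) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break

    # Sandwich A^-1 B A^-1 evaluated at the final estimates
    s = accumulate(b0, b1)
    det = s["i00"] * s["i11"] - s["i01"] ** 2
    a_inv = [[s["i11"] / det, -s["i01"] / det],
             [-s["i01"] / det, s["i00"] / det]]
    meat = [[s["m00"], s["m01"]], [s["m01"], s["m11"]]]
    a_inv_meat = [[sum(a_inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)]
    var_b1 = sum(a_inv_meat[1][k] * a_inv[k][1] for k in range(2)) * n / (n - 1)

    se_log = math.sqrt(var_b1)
    z = NormalDist().inv_cdf(0.975)
    rr = math.exp(b1)
    ci = (math.exp(b1 - z * se_log), math.exp(b1 + z * se_log))
    return rr, rr * se_log, ci


# Rebuild the 200 observations from the tabulate output (female x honors).
honors = [0] * 73 + [1] * 18 + [0] * 74 + [1] * 35
female = [0] * 91 + [1] * 109

rr, se_rr, ci = modified_poisson(honors, female)
print(f"IRR = {rr:.6f}, robust SE = {se_rr:.7f}, 95% CI = ({ci[0]:.6f}, {ci[1]:.6f})")
```

Using A^-1 alone, without the meat, would give the model-based Poisson standard error on the log scale, sqrt(1/35 + 1/18), about 0.290. The robust value is about 0.254, and that gap is why the robust option matters here.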

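We will conclude with an example that includes several continuous predictors. This is the type of model that often fails to converge with binary (log-binomial) regression. In the first run below, the log-binomial glm reaches iteration 50 without converging and reports degenerate standard errors. The modified Poisson model converges in three iterations.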
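The convergence trouble has a simple arithmetic source. Under the log link the fitted risk is exp(xb), and a continuous predictor with a per-unit risk ratio r multiplies the risk by r to the power of the change in x. Over a wide predictor range the fitted risk can therefore be pushed past 1. The log-binomial estimate is then forced toward that boundary, which is a common reason the iterations stall. The Python sketch below shows only this arithmetic. The baseline risk of 0.10 and per-unit risk ratio of 1.06 are hypothetical values and do not come from the models in this section.

```python
import math


def fitted_risk(baseline_risk: float, rr_per_unit: float, delta_x: float) -> float:
    """Fitted risk under a log link after moving delta_x units from the baseline."""
    # log(risk) is linear in x, so the per-unit risk ratio compounds: r ** delta_x
    return baseline_risk * rr_per_unit ** delta_x


def max_delta_x(baseline_risk: float, rr_per_unit: float) -> float:
    """Largest increase in x that keeps the fitted risk at or below 1."""
    return math.log(1 / baseline_risk) / math.log(rr_per_unit)


# Hypothetical values, chosen only to illustrate the constraint.
p0, r = 0.10, 1.06
print(f"risk after +20 units: {fitted_risk(p0, r, 20):.3f}")
print(f"risk after +40 units: {fitted_risk(p0, r, 40):.3f}")   # exceeds 1
print(f"max admissible increase: {max_delta_x(p0, r):.1f} units")
```

A log-binomial fit must keep every observation inside this bound. The Poisson working model imposes no such constraint, but as a result its fitted values can exceed 1.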

glm honors female lang math science ses, fam(binom) link(log) eform

Iteration 0:   log likelihood = -143.93712  (not concave)
Iteration 1:   log likelihood = -138.17315  (not concave)

*** output deleted ***

Iteration 49:  log likelihood = -137.87877  (not concave)
Iteration 50:  log likelihood = -137.87877  (not concave)
convergence not achieved

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       194
                                                   Scale parameter =         1
Deviance         =  275.7575387                    (1/df) Deviance =  1.421431
Pearson          =  300000129.1                    (1/df) Pearson  =   1546392

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : OIM

Log likelihood   = -137.8787694                    AIC             =  1.438788
BIC              = -752.1160304

------------------------------------------------------------------------------
      honors | Risk Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.045888   .1405277     0.33   0.738     .8037407    1.360988
        lang |   1.024339   2.74e-09        .   0.000     1.024339    1.024339
        math |   1.032622   3.65e-09        .   0.000     1.032622    1.032622
     science |   1.022528   6.65e-09        .   0.000     1.022528    1.022528
         ses |   1.004854          .        .       .            .           .
------------------------------------------------------------------------------

glm honors female lang math science ses, fam(poisson) link(log) robust eform

Iteration 0:   log pseudo-likelihood = -99.421803  
Iteration 1:   log pseudo-likelihood = -96.828386  
Iteration 2:   log pseudo-likelihood = -96.820156  
Iteration 3:   log pseudo-likelihood = -96.820155  

Generalized linear models                          No. of obs      =       200
Optimization     : ML: Newton-Raphson              Residual df     =       194
                                                   Scale parameter =         1
Deviance         =  87.64030952                    (1/df) Deviance =  .4517542
Pearson          =  110.2542715                    (1/df) Pearson  =   .568321

Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
Standard errors  : Sandwich

Log pseudo-likelihood = -96.82015476               AIC             =  1.028202
BIC                   =-940.2332596

------------------------------------------------------------------------------
             |               Robust
      honors |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   2.012414   .4427827     3.18   0.001     1.307468    3.097444
        lang |   1.032885   .0139128     2.40   0.016     1.005973    1.060516
        math |   1.059176   .0183383     3.32   0.001     1.023837    1.095736
     science |   1.030488   .0179599     1.72   0.085     .9958814    1.066297
         ses |   1.058971    .157016     0.39   0.699     .7919076    1.416099
------------------------------------------------------------------------------

Categorical Data Analysis Course

Phil Ender