In all of the examples so far, the observations have been independent. But what if the observations are matched or repeated? You might think that it would be possible to include dummy-coded variables to indicate the matching. For example, if you had 56 matched pairs you could include 55 dummy variables to account for the non-independence, along with whatever covariates you wanted to have in the model. However, logistic regression has problems when the degrees of freedom used by the model come close to the total degrees of freedom available; with 56 pairs there are only 112 observations, and the 55 dummies alone would use about half of them. In a situation such as this, the conditional logistic model is recommended.
Conditional logistic regression, also known as fixed effects logistic regression, is designed to work with matched subjects or repeated measures. Stata's clogit command will work with 1:1 matching, 1:k matching and repeated measures models. The repeated measures models are also called panel models or cross-sectional time-series models.
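As a sketch of why no dummy variables are needed (standard notation, not taken from these notes): write x_ij for the covariates of member j in matched set i, and alpha_i for that set's own intercept. Conditioning on each set containing exactly one case gives

```latex
\[
P(\text{member } j \text{ is the case} \mid \text{one case in set } i)
= \frac{\exp(\alpha_i + x_{ij}'\beta)}{\sum_{k=1}^{n_i} \exp(\alpha_i + x_{ik}'\beta)}
= \frac{\exp(x_{ij}'\beta)}{\sum_{k=1}^{n_i} \exp(x_{ik}'\beta)}
\]
```

The set-specific intercept cancels, so only beta has to be estimated. With 1:1 matching n_i = 2; with 1:k matching n_i = k + 1.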
Example 1: 1-1 Matching
This example is adapted from Hosmer & Lemeshow (2000). Mothers of low birth weight babies were matched by age with mothers of normal weight babies. Low birth weight is defined as less than 2500 grams. The variable pairid indicates which mothers were matched.
use http://www.gseis.ucla.edu/courses/data/lbwt11, clear
describe
Contains data from http://www.gseis.ucla.edu/courses/data/lbwt11.dta
obs: 112
vars: 9 10 Feb 2001 12:40
size: 4,480 (99.9% of memory free)
-------------------------------------------------------------------------------
1. pairid float %9.0g
2. lbwt float %9.0g low brth wt < 2500g
3. age float %9.0g mother's age
4. lastwt float %9.0g last weight
5. race float %9.0g rl 1 wht 2 blk 3 oth
6. smoke float %9.0g smoke during pregnancy
7. ptd float %9.0g previous preterm delivery
8. ht float %9.0g hypertension
9. ui float %9.0g uterine irritability
-------------------------------------------------------------------------------
summarize
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
pairid | 112 28.5 16.23587 1 56
lbwt | 112 .5 .5022472 0 1
age | 112 22.50893 4.341286 14 35
lastwt | 112 127.1696 30.46986 80 241
race | 112 2.026786 .9050392 1 3
smoke | 112 .4107143 .4941746 0 1
ptd | 112 .2232143 .4182723 0 1
ht | 112 .0892857 .2864373 0 1
ui | 112 .1785714 .3847144 0 1
tab1 smoke ptd ht ui
-> tabulation of smoke
smoke |
during |
pregnancy | Freq. Percent Cum.
------------+-----------------------------------
0 | 66 58.93 58.93
1 | 46 41.07 100.00
------------+-----------------------------------
Total | 112 100.00
-> tabulation of ptd
previous |
preterm |
delivery | Freq. Percent Cum.
------------+-----------------------------------
0 | 87 77.68 77.68
1 | 25 22.32 100.00
------------+-----------------------------------
Total | 112 100.00
-> tabulation of ht
hypertensio |
n | Freq. Percent Cum.
------------+-----------------------------------
0 | 102 91.07 91.07
1 | 10 8.93 100.00
------------+-----------------------------------
Total | 112 100.00
-> tabulation of ui
uterine |
irritabilit |
y | Freq. Percent Cum.
------------+-----------------------------------
0 | 92 82.14 82.14
1 | 20 17.86 100.00
------------+-----------------------------------
Total | 112 100.00
tab race, gen(race)
1 wht 2 blk |
3 oth | Freq. Percent Cum.
------------+-----------------------------------
white | 44 39.29 39.29
black | 21 18.75 58.04
other | 47 41.96 100.00
------------+-----------------------------------
Total | 112 100.00
tabulate lbwt smoke, lrchi2 exp
+--------------------+
| Key |
|--------------------|
| frequency |
| expected frequency |
+--------------------+
| smoke during
low brth | pregnancy
wt < 2500g | 0 1 | Total
-----------+----------------------+----------
0 | 40 16 | 56
| 33.0 23.0 | 56.0
-----------+----------------------+----------
1 | 26 30 | 56
| 33.0 23.0 | 56.0
-----------+----------------------+----------
Total | 66 46 | 112
| 66.0 46.0 | 112.0
likelihood-ratio chi2(1) = 7.3216 Pr = 0.007
logit lbwt smoke, nolog /* does not take into account the matched pairs */
Logit estimates Number of obs = 112
LR chi2(1) = 7.32
Prob > chi2 = 0.0068
Log likelihood = -73.971688 Pseudo R2 = 0.0472
------------------------------------------------------------------------------
lbwt | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoke | 1.059392 .3991154 2.65 0.008 .2771398 1.841643
_cons | -.4307829 .2519156 -1.71 0.087 -.9245285 .0629626
------------------------------------------------------------------------------
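For comparison, the dummy-variable approach described in the introduction could be tried along the following lines. This is only a sketch: it assumes Stata 11 or later, where the factor-variable prefix i. creates the pair indicators, and it is shown to illustrate the problem, not as a recommended analysis.

```stata
* Sketch only: unconditional logit with one indicator per matched pair.
* i.pairid expands to 55 indicators (56 pairs, one omitted as the base).
* With only two observations per pair the pair effects are poorly
* estimated, which is the situation the conditional model is meant for.
logit lbwt smoke i.pairid, nolog
```

With so many parameters relative to observations, the coefficient for smoke from this model is known to be biased away from zero, so it should not be read as a better-adjusted estimate.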
clogit lbwt smoke, group(pairid) nolog
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(1) = 6.79
Prob > chi2 = 0.0091
Log likelihood = -35.419282 Pseudo R2 = 0.0875
------------------------------------------------------------------------------
lbwt | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoke | 1.011601 .4128614 2.45 0.014 .2024075 1.820794
------------------------------------------------------------------------------
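The coefficient above is on the log-odds scale. A short sketch of converting it to an odds ratio, using the estimate printed in the output; the or option asks clogit to report odds ratios directly.

```stata
* Odds ratio for smoking from the conditional model: exp(1.011601) is about 2.75
display exp(1.011601)

* Refit the same model with coefficients reported as odds ratios
clogit lbwt smoke, group(pairid) or nolog
```

Within a matched pair, the odds of a low birth weight baby are roughly 2.75 times higher for a mother who smoked.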
predict p1
(option pc1 assumed; conditional probability for single outcome within group)
list pairid lbwt smoke p1, sep(2)
+----------------------------------+
| pairid lbwt smoke p1 |
|----------------------------------|
1. | 1 0 0 .2666667 |
2. | 1 1 1 .7333333 |
|----------------------------------|
3. | 2 0 0 .5 |
4. | 2 1 0 .5 |
|----------------------------------|
5. | 3 0 0 .5 |
6. | 3 1 0 .5 |
|----------------------------------|
7. | 4 0 0 .2666667 |
8. | 4 1 1 .7333333 |
|----------------------------------|
9. | 5 0 1 .5 |
10. | 5 1 1 .5 |
|----------------------------------|
11. | 6 0 0 .2666667 |
12. | 6 1 1 .7333333 |
|----------------------------------|
13. | 7 0 0 .5 |
14. | 7 1 0 .5 |
|----------------------------------|
15. | 8 0 0 .5 |
16. | 8 1 0 .5 |
|----------------------------------|
17. | 9 0 1 .7333333 |
18. | 9 1 0 .2666667 |
|----------------------------------|
/* manual computation of the probabilities */
display exp(1*1.011601)/(exp(1*1.011601)+exp(0*1.011601))
.73333335
display exp(0*1.011601)/(exp(1*1.011601)+exp(0*1.011601))
.26666665
display exp(1*1.011601)/(exp(1*1.011601)+exp(1*1.011601))
.5
display exp(0*1.011601)/(exp(0*1.011601)+exp(0*1.011601))
.5
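The four display commands above are instances of one rule: within a pair, each member's conditional probability is exp(xb) divided by the sum of exp(xb) over the pair. The sketch below applies that rule to every observation. It assumes the single-predictor clogit model is still the most recent estimation, that p1 has already been created by predict, and that egen's total() function is available (it was named sum() in older releases). The names xb1, denom and p1_manual are chosen here for illustration.

```stata
* Linear predictor from the last clogit (a conditional model has no intercept)
predict double xb1, xb

* Sum of exp(xb) within each matched pair
bysort pairid: egen double denom = total(exp(xb1))

* Conditional probability that this member of the pair is the case
generate double p1_manual = exp(xb1)/denom

* Should agree with predict's default (pc1) up to float precision
assert reldif(p1, p1_manual) < 1e-5
```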
clogit lbwt lastwt smoke race2 race3 ptd ht ui, group(pairid) nolog
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(7) = 26.04
Prob > chi2 = 0.0005
Log likelihood = -25.794271 Pseudo R2 = 0.3355
------------------------------------------------------------------------------
lbwt | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lastwt | -.0183757 .0100806 -1.82 0.068 -.0381333 .0013819
smoke | 1.400656 .6278396 2.23 0.026 .1701131 2.631199
race2 | .5713643 .6896449 0.83 0.407 -.7803149 1.923044
race3 | -.0253148 .6992044 -0.04 0.971 -1.39573 1.345101
ptd | 1.808009 .7886502 2.29 0.022 .2622829 3.353735
ht | 2.361152 1.086128 2.17 0.030 .2323797 4.489924
ui | 1.401929 .6961585 2.01 0.044 .0374836 2.766375
------------------------------------------------------------------------------
test race2 race3
( 1) race2 = 0.0
( 2) race3 = 0.0
chi2( 2) = 0.88
Prob > chi2 = 0.6436
estimates store M1
clogit lbwt lastwt smoke ptd ht ui, group(pairid) nolog
Conditional (fixed-effects) logistic regression Number of obs = 112
LR chi2(5) = 25.16
Prob > chi2 = 0.0001
Log likelihood = -26.236872 Pseudo R2 = 0.3241
------------------------------------------------------------------------------
lbwt | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lastwt | -.0150834 .0081465 -1.85 0.064 -.0310503 .0008834
smoke | 1.479564 .5620191 2.63 0.008 .3780272 2.581102
ptd | 1.670594 .7468062 2.24 0.025 .206881 3.134308
ht | 2.329361 1.002549 2.32 0.020 .3644009 4.294322
ui | 1.344895 .693843 1.94 0.053 -.0150127 2.704802
------------------------------------------------------------------------------
lrtest M1
likelihood-ratio test LR chi2(2) = 0.89
(Assumption: . nested in M1) Prob > chi2 = 0.6424
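The likelihood-ratio statistic can be checked by hand from the two log likelihoods printed above. A minimal sketch using display and the chi2tail() function:

```stata
* LR statistic: twice the difference in log likelihoods (full minus reduced)
display 2*(-25.794271 - (-26.236872))

* p-value from the upper tail of a chi-squared with 2 df (race2 and race3 dropped)
display chi2tail(2, 2*(-25.794271 - (-26.236872)))
```

The results, about 0.89 and 0.64, agree with the lrtest output.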
listcoef
clogit (N=112): Factor Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
lbwt | b z P>|z| e^b
-------------+------------------------------------
lastwt | -0.01508 -1.852 0.064 0.9850
smoke | 1.47956 2.633 0.008 4.3910
ptd | 1.67059 2.237 0.025 5.3153
ht | 2.32936 2.323 0.020 10.2714
ui | 1.34489 1.938 0.053 3.8378
--------------------------------------------------
listcoef, percent
clogit (N=112): Percentage Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
lbwt | b z P>|z| %
-------------+------------------------------------
lastwt | -0.01508 -1.852 0.064 -1.5
smoke | 1.47956 2.633 0.008 339.1
ptd | 1.67059 2.237 0.025 431.5
ht | 2.32936 2.323 0.020 927.1
ui | 1.34489 1.938 0.053 283.8
--------------------------------------------------
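The percentage column is 100*(exp(b) - 1). The sketch below reproduces one entry with display and then rescales lastwt, whose one-unit effect looks small, to a 10-unit change. The choice of 10 units is only illustrative, and lincom is assumed to be run while the five-predictor clogit model is still the current estimation.

```stata
* Percentage change in the odds for smoke: 100*(exp(b) - 1), about 339.1
display 100*(exp(1.479564) - 1)

* Odds ratio for a 10-unit increase in lastwt rather than a 1-unit increase
display exp(10*(-.0150834))

* The same odds ratio with a confidence interval, from the current results
lincom 10*lastwt, or
```

A 10-unit increase in lastwt multiplies the odds of low birth weight by about 0.86, which is easier to read than the per-unit factor of 0.985.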
predict p
list pairid lbwt p, sep(2)
+--------------------------+
| pairid lbwt p |
|--------------------------|
1. | 1 0 .0250138 |
2. | 1 1 .9749862 |
|--------------------------|
3. | 2 0 .2519053 |
4. | 2 1 .7480947 |
|--------------------------|
5. | 3 0 .6289979 |
6. | 3 1 .3710021 |
|--------------------------|
7. | 4 0 .0164993 |
8. | 4 1 .9835007 |
|--------------------------|
9. | 5 0 .4548728 |
10. | 5 1 .5451272 |
|--------------------------|
11. | 6 0 .2019775 |
12. | 6 1 .7980225 |
|--------------------------|
13. | 7 0 .5263715 |
14. | 7 1 .4736285 |
|--------------------------|
15. | 8 0 .1210587 |
16. | 8 1 .8789413 |
|--------------------------|
17. | 9 0 .9005696 |
18. | 9 1 .0994304 |
|--------------------------|
19. | 10 0 .4939925 |
20. | 10 1 .5060075 |
|--------------------------|
21. | 11 0 .004564 |
22. | 11 1 .995436 |
|--------------------------|
23. | 12 0 .4511353 |
24. | 12 1 .5488647 |
|--------------------------|
25. | 13 0 .2950889 |
26. | 13 1 .7049111 |
|--------------------------|
27. | 14 0 .578796 |
28. | 14 1 .4212039 |
|--------------------------|
29. | 15 0 .2663816 |
30. | 15 1 .7336184 |
|--------------------------|
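Because the two predicted probabilities in a pair sum to 1, one simple way to summarize the listing is to count the pairs in which the case received the larger value. A sketch, assuming p from the predict command above is in memory; case_higher is a name made up for this illustration.

```stata
* Within each pair, sort so the control (lbwt == 0) is first and the case second
bysort pairid (lbwt): generate byte case_higher = p[2] > p[1] if _n == 2

* Number and share of pairs in which the case has the larger predicted probability
tabulate case_higher
```

This is only a descriptive check on the fitted model, computed on the same data used to estimate it.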
prchange
prchange does not work for last model estimated.
prtab lastwt
prtab does not work for the last type of model estimated.
Example 2: 1-M Matching
Here is another example from Hosmer and Lemeshow, this one involving 1-M matching. In this case, three controls are matched with each diagnosed case of breast cancer.
use http://www.gseis.ucla.edu/courses/data/bbdm13, clear
describe
Contains data from bbdm13.dta
obs: 200
vars: 15 5 Nov 2001 19:37
size: 10,400 (96.1% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
str float %9.0g stratum
obs float %9.0g
agmt float %9.0g age at interview
fndx float %9.0g final diagnosis
chk float %9.0g regular check-ups
agp1 float %9.0g age at 1st preg
agmn float %9.0g age at menarche
nlv float %9.0g number stillbirths
liv float %9.0g number live births
wt float %9.0g wt at interview
mst float %9.0g marital status
mar byte %8.0g married
mod byte %8.0g div or sep
wid byte %8.0g widowed
nvmr byte %8.0g never married
-------------------------------------------------------------------------------
summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
str | 200 25.5 14.46708 1 50
obs | 200 2.5 1.12084 1 4
agmt | 200 46.185 10.29323 27 68
fndx | 200 .25 .4340993 0 1
chk | 200 1.405 .4921239 1 2
agp1 | 178 23.57865 4.05847 14 40
agmn | 200 12.95 1.744338 8 17
nlv | 178 .5168539 .9638946 0 7
liv | 178 2.853933 1.544449 0 11
wt | 200 143.715 31.92994 80 280
mst | 200 1.655 1.234339 1 5
mar | 200 .725 .4476348 0 1
mod | 200 .13 .3371474 0 1
wid | 200 .085 .2795815 0 1
nvmr | 200 .06 .2380828 0 1
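Before fitting the model it can be worth confirming the 1:3 structure that the summary statistics suggest (50 strata, 4 women each, a quarter of them cases). A sketch; ncases and nobs are illustrative names, and egen's total() function is assumed to be available.

```stata
* Each stratum should hold 4 women, exactly one of them a case (fndx == 1)
bysort str: egen byte ncases = total(fndx)
bysort str: generate byte nobs = _N
assert ncases == 1 & nobs == 4

* agp1, nlv and liv are observed for only 178 of the 200 women
count if missing(agp1)
```

Any model that includes agp1, nlv or liv would drop the women with missing values, leaving some strata incomplete.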
clogit fndx chk agmn wt mod wid nvmr, group(str) nolog
Conditional (fixed-effects) logistic regression Number of obs = 200
LR chi2(6) = 48.20
Prob > chi2 = 0.0000
Log likelihood = -45.214824 Pseudo R2 = 0.3477
------------------------------------------------------------------------------
fndx | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
chk | -1.121849 .4474471 -2.51 0.012 -1.998829 -.2448688
agmn | .3561333 .1291722 2.76 0.006 .1029605 .6093061
wt | -.0283565 .0099776 -2.84 0.004 -.0479122 -.0088009
mod | -.2030472 .6472909 -0.31 0.754 -1.471714 1.06562
wid | -.4915826 .8173094 -0.60 0.548 -2.09348 1.110314
nvmr | 1.472195 .7582064 1.94 0.052 -.0138621 2.958252
------------------------------------------------------------------------------
listcoef
clogit (N=200): Factor Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
fndx | b z P>|z| e^b
-------------+------------------------------------
chk | -1.12185 -2.507 0.012 0.3257
agmn | 0.35613 2.757 0.006 1.4278
wt | -0.02836 -2.842 0.004 0.9720
mod | -0.20305 -0.314 0.754 0.8162
wid | -0.49158 -0.601 0.548 0.6117
nvmr | 1.47220 1.942 0.052 4.3588
--------------------------------------------------
listcoef, percent
clogit (N=200): Percentage Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
fndx | b z P>|z| %
-------------+------------------------------------
chk | -1.12185 -2.507 0.012 -67.4
agmn | 0.35613 2.757 0.006 42.8
wt | -0.02836 -2.842 0.004 -2.8
mod | -0.20305 -0.314 0.754 -18.4
wid | -0.49158 -0.601 0.548 -38.8
nvmr | 1.47220 1.942 0.052 335.9
--------------------------------------------------
test mod wid nvmr
( 1) mod = 0.0
( 2) wid = 0.0
( 3) nvmr = 0.0
chi2( 3) = 4.99
Prob > chi2 = 0.1724
xtlogit fndx chk agmn wt mod wid nvmr, i(str) fe nolog
Conditional fixed-effects logistic regression Number of obs = 200
Group variable (i): str Number of groups = 50
Obs per group: min = 4
avg = 4.0
max = 4
LR chi2(6) = 48.20
Log likelihood = -45.214824 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
fndx | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
chk | -1.121849 .4474471 -2.51 0.012 -1.998829 -.2448688
agmn | .3561333 .1291722 2.76 0.006 .1029605 .6093061
wt | -.0283565 .0099776 -2.84 0.004 -.0479122 -.0088009
mod | -.2030472 .6472909 -0.31 0.754 -1.471714 1.06562
wid | -.4915826 .8173094 -0.60 0.548 -2.09348 1.110314
nvmr | 1.472195 .7582064 1.94 0.052 -.0138621 2.958252
------------------------------------------------------------------------------
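The xtlogit results reproduce the clogit results exactly. In more recent Stata releases the group variable is usually declared once with xtset instead of being passed through the i() option; a sketch assuming such a release:

```stata
* Declare str as the panel (group) identifier, then fit the fixed-effects model
xtset str
xtlogit fndx chk agmn wt mod wid nvmr, fe nolog
```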
clogit fndx chk agmn wt nvmr, group(str) nolog
Conditional (fixed-effects) logistic regression Number of obs = 200
LR chi2(4) = 47.75
Prob > chi2 = 0.0000
Log likelihood = -45.439011 Pseudo R2 = 0.3445
------------------------------------------------------------------------------
fndx | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
chk | -1.161303 .4469763 -2.60 0.009 -2.037361 -.285246
agmn | .3592472 .1278849 2.81 0.005 .1085973 .609897
wt | -.0282355 .0099785 -2.83 0.005 -.047793 -.0086781
nvmr | 1.593384 .7360284 2.16 0.030 .1507946 3.035973
------------------------------------------------------------------------------
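The Wald test above covered all three marital status indicators, while this reduced model drops only mod and wid. A likelihood-ratio comparison of the two fitted models could be sketched as follows, following the estimates store and lrtest pattern from Example 1; the name full is chosen here for illustration.

```stata
* Full model, stored under the illustrative name full
quietly clogit fndx chk agmn wt mod wid nvmr, group(str)
estimates store full

* Reduced model without mod and wid, then the LR test against the stored model
quietly clogit fndx chk agmn wt nvmr, group(str)
lrtest full

* The same statistic from the printed log likelihoods, about 0.45 on 2 df
display 2*(-45.214824 - (-45.439011))
```

The reduced model is fitted last, so it remains the current estimation for the commands that follow.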
listcoef
clogit (N=200): Factor Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
fndx | b z P>|z| e^b
-------------+------------------------------------
chk | -1.16130 -2.598 0.009 0.3131
agmn | 0.35925 2.809 0.005 1.4323
wt | -0.02824 -2.830 0.005 0.9722
nvmr | 1.59338 2.165 0.030 4.9204
--------------------------------------------------
listcoef, percent
clogit (N=200): Percentage Change in Odds
Odds of: 1 vs 0
--------------------------------------------------
fndx | b z P>|z| %
-------------+------------------------------------
chk | -1.16130 -2.598 0.009 -68.7
agmn | 0.35925 2.809 0.005 43.2
wt | -0.02824 -2.830 0.005 -2.8
nvmr | 1.59338 2.165 0.030 392.0
--------------------------------------------------
Categorical Data Analysis Course
Phil Ender