Latent Class Analysis is a type of latent variable analysis in which both the observed indicator variables and the latent (unobserved) variable are categorical. More formally, latent class analysis is a statistical method for finding subtypes of related cases (latent classes) from multivariate categorical data. In a sense, latent class analysis is like cluster analysis in that it attempts to find groups or classes of observations that are similar to one another.
MI Example
Four diagnostic criteria binary indicators
use http://www.gseis.ucla.edu/courses/data/rindskopf2a
list, sep(0) noobs
+-------------------------------+
| pat   y1   y2   y3   y4   wt2 |
|-------------------------------|
|   1    1    1    1    1    24 |
|   2    0    1    1    1     5 |
|   3    1    0    1    1     4 |
|   4    0    0    1    1     3 |
|   5    1    1    0    1     3 |
|   6    0    1    0    1     5 |
|   7    1    0    0    1     2 |
|   8    0    0    0    1     7 |
|   9    1    1    1    0     0 |
|  10    0    1    1    0     0 |
|  11    1    0    1    0     0 |
|  12    0    0    1    0     1 |
|  13    1    1    0    0     0 |
|  14    0    1    0    0     7 |
|  15    1    0    0    0     0 |
|  16    0    0    0    0    33 |
+-------------------------------+
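The wt2 column holds the number of patients observed with each response pattern, so its sum is the sample size that gllamm later reports as the number of level 2 units. A quick check, assuming the wide data from the use command above are still in memory:

```stata
* wt2 is the frequency of each response pattern
quietly summarize wt2
* r(sum) holds the total of wt2, i.e. the number of patients
display r(sum)
```

For the listing above the total comes to 94.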
reshape long y, i(pat) j(var)
list in 1/8, sep(4)
     +---------------------+
     | pat   var   y   wt2 |
     |---------------------|
  1. |   1     1   1    24 |
  2. |   1     2   1    24 |
  3. |   1     3   1    24 |
  4. |   1     4   1    24 |
     |---------------------|
  5. |   2     1   0     5 |
  6. |   2     2   1     5 |
  7. |   2     3   1     5 |
  8. |   2     4   1     5 |
     +---------------------+
for num 1/4: generate vX = var==X
list in 1/8, sep(4)
     +-----------------------------------------+
     | pat   var   y   wt2   v1   v2   v3   v4 |
     |-----------------------------------------|
  1. |   1     1   1    24    1    0    0    0 |
  2. |   1     2   1    24    0    1    0    0 |
  3. |   1     3   1    24    0    0    1    0 |
  4. |   1     4   1    24    0    0    0    1 |
     |-----------------------------------------|
  5. |   2     1   0     5    1    0    0    0 |
  6. |   2     2   1     5    0    1    0    0 |
  7. |   2     3   1     5    0    0    1    0 |
  8. |   2     4   1     5    0    0    0    1 |
     +-----------------------------------------+
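The for command used above is older Stata syntax. In more recent releases a forvalues loop is the usual way to write the same step. The following is a sketch of an equivalent, meant as a replacement for the for line rather than something to run in addition to it, since v1-v4 would already exist:

```stata
* build one 0/1 indicator per item (alternative to the for command)
forvalues i = 1/4 {
    generate v`i' = var == `i'
}
```

Another option is tabulate var, generate(v), which creates v1-v4 as a by-product of the tabulation.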
Now that the data are organized the way we want, we can begin the latent class analysis. The
option nip(2) indicates that we want two latent classes.
eq v1: v1
eq v2: v2
eq v3: v3
eq v4: v4
gllamm y, i(pat) ip(fn) nrf(4) eqs(v1 v2 v3 v4) we(wt) nip(2) l(logit) f(binom) nocons
number of level 1 units = 376
number of level 2 units = 94
Condition Number = 4452.1073
gllamm model
log likelihood = -180.69771
No fixed effects
Probabilities and locations of random effects
------------------------------------------------------------------------------
***level 2 (pat)
loc1: -17.584, 1.1903
var(1): 87.492802
loc2: -1.4173, 1.3333
cov(2,1): 12.818057
var(2): 1.8778983
loc3: -3.5875, 1.5708
cov(3,1): 24.038823
cov(3,2): 3.5217869
var(3): 6.6047149
loc4: -1.4143, 16.845
cov(4,1): 85.093401
cov(4,2): 12.466535
cov(4,3): 23.379583
var(4): 82.759801
prob: 0.5422, 0.4578
log odds parameters
class 1
_cons: .16913132 (.22601687)
------------------------------------------------------------------------------
display 94*0.5422
50.9668
display 94*0.4578
43.0332
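The two class probabilities are not estimated directly. The output reports a single log odds parameter for class 1 (_cons = .16913132). Assuming the last class serves as the reference category, as the output suggests, the inverse logit of that value recovers the probabilities:

```stata
* probability of class 1 from its log odds parameter (about .5422)
display invlogit(.16913132)
* probability of class 2 is the complement (about .4578)
display 1 - invlogit(.16913132)
```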
/* reorganizing the output manually */
           class1     class2
loc1:     -17.584     1.1903
loc2:     -1.4173     1.3333
loc3:     -3.5875     1.5708
loc4:     -1.4143     16.845
prob:      0.5422     0.4578
ecount:   50.9668    43.0332
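Class 2 is the class most likely to have an MI, since all four of its locations are positive (a positive location means a better-than-even chance of a positive indicator). Its class probability is .4578, for an expected count of 43.0332.
If you wish to see the latent class coefficients along with their standard errors, try this code: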
mat b=e(b)
mat V = e(V)
ereturn post b V,
ereturn display
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
z2_1_1 v1 | -17.58414 952.4511 -0.02 0.985 -1884.354 1849.186
-------------+----------------------------------------------------------------
z2_2_1 v2 | -1.417265 .3881472 -3.65 0.000 -2.178019 -.6565103
-------------+----------------------------------------------------------------
z2_3_1 v3 | -3.587496 1.008792 -3.56 0.000 -5.564692 -1.610299
-------------+----------------------------------------------------------------
z2_4_1 v4 | -1.414294 .4120723 -3.43 0.001 -2.221941 -.6066475
-------------+----------------------------------------------------------------
p2_1 _cons | .1691313 .2260169 0.75 0.454 -.2738536 .6121162
-------------+----------------------------------------------------------------
z2_1_2 v1 | 1.190308 .4176254 2.85 0.004 .3717775 2.008839
-------------+----------------------------------------------------------------
z2_2_2 v2 | 1.33327 .3892167 3.43 0.001 .5704188 2.09612
-------------+----------------------------------------------------------------
z2_3_2 v3 | 1.570822 .4738385 3.32 0.001 .6421153 2.499528
-------------+----------------------------------------------------------------
z2_4_2 v4 | 16.84528 701.791 0.02 0.981 -1358.64 1392.33
------------------------------------------------------------------------------
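The very large standard errors for v1 in class 1 and v4 in class 2 are what one would typically expect when a conditional probability is estimated at or near 0 or 1: the logit heads toward minus or plus infinity and the likelihood is nearly flat in that direction. Converting those two coefficients back to probabilities makes this visible:

```stata
* Pr(y1 = 1 | class 1): essentially 0
display invlogit(-17.58414)
* Pr(y4 = 1 | class 2): essentially 1
display invlogit(16.84528)
```

The z tests for these two coefficients are therefore not very informative.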
We can obtain the predicted probability for each pattern of being in a class,
Pr(c=1|yj) and Pr(c=2|yj), using the gllapred command.
gllapred prob, p
merge pat using rindskopf2a, keep(y1 y2 y3 y4)
list pat y1 y2 y3 y4 wt2 prob1 prob2 if var==1, sep(0) noobs
+-------------------------------------------------------+
| pat   y1   y2   y3   y4   wt2       prob1       prob2 |
|-------------------------------------------------------|
|   1    1    1    1    1    24   5.589e-11           1 |
|   2    0    1    1    1     5   .00789839   .99210161 |
|   3    1    0    1    1     4   8.748e-10           1 |
|   4    0    0    1    1     3   .11079636   .88920364 |
|   5    1    1    0    1     3   9.718e-09   .99999999 |
|   6    0    1    0    1     5   .58057906   .41942094 |
|   7    1    0    0    1     2   1.521e-07   .99999985 |
|   8    0    0    0    1     7   .95587857   .04412143 |
|   9    1    1    1    0     0   .00473495   .99526505 |
|  10    0    1    1    0     0   .99999852   1.476e-06 |
|  11    1    0    1    0     0   .06929933   .93070067 |
|  12    0    0    1    0     1   .99999991   9.428e-08 |
|  13    1    1    0    0     0   .45271194   .54728806 |
|  14    0    1    0    0     7   .99999999   8.487e-09 |
|  15    1    0    0    0     0   .92829674   .07170326 |
|  16    0    0    0    0    33           1   5.423e-10 |
+-------------------------------------------------------+
We can classify the observations into the latent classes based upon which class has the larger probability.
generate class2 = prob2>prob1
list pat y1 y2 y3 y4 wt2 class2 if var==1, sep(0) noobs
+----------------------------------------+
| pat   y1   y2   y3   y4   wt2   class2 |
|----------------------------------------|
|   1    1    1    1    1    24        1 |
|   2    0    1    1    1     5        1 |
|   3    1    0    1    1     4        1 |
|   4    0    0    1    1     3        1 |
|   5    1    1    0    1     3        1 |
|   6    0    1    0    1     5        0 |
|   7    1    0    0    1     2        1 |
|   8    0    0    0    1     7        0 |
|   9    1    1    1    0     0        1 |
|  10    0    1    1    0     0        0 |
|  11    1    0    1    0     0        1 |
|  12    0    0    1    0     1        0 |
|  13    1    1    0    0     0        1 |
|  14    0    1    0    0     7        0 |
|  15    1    0    0    0     0        0 |
|  16    0    0    0    0    33        0 |
+----------------------------------------+
We can also use gllapred to compute the conditional response probabilities, in particular Pr(yij=1|c=2), also known as the sensitivity.
generate e1=1.1903
generate e2=1.3333
generate e3=1.5706
generate e4=16.845
gllapred cprob, mu us(e)
list v1-v4 cprob in 1/4, noobs
+-------------------------------+
| v1   v2   v3   v4       cprob |
|-------------------------------|
|  1    0    0    0   .76679471 |
|  0    1    0    0   .79138597 |
|  0    0    1    0   .82786913 |
|  0    0    0    1   .99999995 |
+-------------------------------+
Next, we will use gllapred to obtain the expected counts for each of the patterns.
gllapred l, ll
(ll will be stored in l)
log-likelihood:-180.69771
generate count = e(N)*exp(l)
list pat y1 y2 y3 y4 wt2 count if var==1, sep(0) noobs
+------------------------------------------+
| pat   y1   y2   y3   y4   wt2      count |
|------------------------------------------|
|   1    1    1    1    1    24   21.62042 |
|   2    0    1    1    1     5   6.627714 |
|   3    1    0    1    1     4   5.699445 |
|   4    0    0    1    1     3   1.949337 |
|   5    1    1    0    1     3    4.49433 |
|   6    0    1    0    1     5   3.258896 |
|   7    1    0    0    1     2   1.184768 |
|   8    0    0    0    1     7   8.166566 |
|   9    1    1    1    0     0          . |
|  10    0    1    1    0     0          . |
|  11    1    0    1    0     0          . |
|  12    0    0    1    0     1   .8884496 |
|  13    1    1    0    0     0          . |
|  14    0    1    0    0     7   7.783092 |
|  15    1    0    0    0     0          . |
|  16    0    0    0    0    33   32.11164 |
+------------------------------------------+
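The posterior probabilities and expected counts can also be reproduced by hand from the estimates, which shows roughly what gllapred is computing. The following is a sketch, assuming the four indicators are independent within a class and using the rounded locations and class probabilities from the output above. The matrix name LOC and the variable names lik1, lik2, post2, and expcount are made up for this illustration. It reloads the original wide data, so anything needed from the long data should be saved first.

```stata
use http://www.gseis.ucla.edu/courses/data/rindskopf2a, clear

* rows = items 1-4, columns = classes 1-2 (rounded locations from the output)
matrix LOC = (-17.584, 1.1903 \ -1.4173, 1.3333 \ -3.5875, 1.5708 \ -1.4143, 16.845)

* start each class likelihood at its class probability
generate double lik1 = 0.5422
generate double lik2 = 0.4578

forvalues c = 1/2 {
    forvalues j = 1/4 {
        * Pr(y_j = 1 | class c) is the inverse logit of the location
        local p = invlogit(LOC[`j', `c'])
        * multiply in p for a positive response, 1 - p for a negative one
        quietly replace lik`c' = lik`c' * cond(y`j' == 1, `p', 1 - `p')
    }
}

* Bayes rule: posterior probability of class 2 for each pattern
generate double post2 = lik2 / (lik1 + lik2)
* expected count = sample size times the marginal probability of the pattern
generate double expcount = 94 * (lik1 + lik2)

list pat y1 y2 y3 y4 wt2 post2 expcount, sep(0) noobs
```

Because the inputs are rounded, the results should agree with the gllapred values only approximately. This version also gives expected counts for the patterns with wt2 = 0, which are missing in the listing above.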
Categorical Data Analysis Course
Phil Ender