Applied Categorical & Nonnormal Data Analysis

Latent Class Analysis Example

Latent class analysis (LCA) is a type of latent variable analysis in which the observed indicator variables are categorical and the latent (unobserved) variable is also categorical. More formally, latent class analysis is a statistical method for finding subtypes of related cases (latent classes) from multivariate categorical data. In this sense it resembles cluster analysis: it attempts to find groups or classes of observations that are similar to one another.
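
For binary indicators, the usual latent class model can be written as a finite mixture. The sketch below assumes that the indicators are independent within a class; the symbols \pi_c (prevalence of class c) and p_{jc} (probability that indicator j is present in class c) are notation introduced here for illustration and do not appear in the original notes.

```latex
\Pr(y_1,\dots,y_J) \;=\; \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} p_{jc}^{\,y_j}\,(1-p_{jc})^{\,1-y_j}
```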

MI Example

Four binary diagnostic indicators for myocardial infarction (MI):

y1 - presence of a Q-wave in the EKG (qwave)
y2 - presence of classical clinical history (hist)
y3 - presence of flipped LDH (ldh)
y4 - presence of CPK-MB (cpk)

The more indicators present, the greater the likelihood of an MI.

use http://www.gseis.ucla.edu/courses/data/rindskopf2a

list, sep(0) noobs

  +-------------------------------+
  | pat   y1   y2   y3   y4   wt2 |
  |-------------------------------|
  |   1    1    1    1    1    24 |
  |   2    0    1    1    1     5 |
  |   3    1    0    1    1     4 |
  |   4    0    0    1    1     3 |
  |   5    1    1    0    1     3 |
  |   6    0    1    0    1     5 |
  |   7    1    0    0    1     2 |
  |   8    0    0    0    1     7 |
  |   9    1    1    1    0     0 |
  |  10    0    1    1    0     0 |
  |  11    1    0    1    0     0 |
  |  12    0    0    1    0     1 |
  |  13    1    1    0    0     0 |
  |  14    0    1    0    0     7 |
  |  15    1    0    0    0     0 |
  |  16    0    0    0    0    33 |
  +-------------------------------+
 
reshape long y, i(pat) j(var)
 
list in 1/8, sep(4)

     +---------------------+
     | pat   var   y   wt2 |
     |---------------------|
  1. |   1     1   1    24 |
  2. |   1     2   1    24 |
  3. |   1     3   1    24 |
  4. |   1     4   1    24 |
     |---------------------|
  5. |   2     1   0     5 |
  6. |   2     2   1     5 |
  7. |   2     3   1     5 |
  8. |   2     4   1     5 |
     +---------------------+
  
forvalues k = 1/4 {
    generate v`k' = var == `k'
}
 
list in 1/8, sep(4)
 
     +-----------------------------------------+
     | pat   var   y   wt2   v1   v2   v3   v4 |
     |-----------------------------------------|
  1. |   1     1   1    24    1    0    0    0 |
  2. |   1     2   1    24    0    1    0    0 |
  3. |   1     3   1    24    0    0    1    0 |
  4. |   1     4   1    24    0    0    0    1 |
     |-----------------------------------------|
  5. |   2     1   0     5    1    0    0    0 |
  6. |   2     2   1     5    0    1    0    0 |
  7. |   2     3   1     5    0    0    1    0 |
  8. |   2     4   1     5    0    0    0    1 |
     +-----------------------------------------+

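Now that the data are organized the way we want, we can begin the latent class analysis. Each eq command defines an equation for one of the four indicator dummies, and nrf(4) together with eqs(v1 v2 v3 v4) gives each indicator its own class-specific location. The option nip(2) indicates that we want two latent classes, and we(wt) tells gllamm to use wt2 as the frequency weight for the level 2 units (the response patterns).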

eq v1: v1
eq v2: v2
eq v3: v3
eq v4: v4
 
gllamm y, i(pat) ip(fn) nrf(4) eqs(v1 v2 v3 v4) we(wt) nip(2) l(logit) f(binom) nocons
  
number of level 1 units = 376
number of level 2 units = 94
 
Condition Number = 4452.1073
 
gllamm model
log likelihood = -180.69771
No fixed effects
  
Probabilities and locations of random effects
------------------------------------------------------------------------------
 
***level 2 (pat)
 
    loc1: -17.584, 1.1903
  var(1): 87.492802
 
    loc2: -1.4173, 1.3333
cov(2,1): 12.818057
  var(2): 1.8778983
 
    loc3: -3.5875, 1.5708
cov(3,1): 24.038823
cov(3,2): 3.5217869
  var(3): 6.6047149
 
    loc4: -1.4143, 16.845
cov(4,1): 85.093401
cov(4,2): 12.466535
cov(4,3): 23.379583
  var(4): 82.759801
    prob: 0.5422, 0.4578
 
    log odds parameters
    class 1
    _cons: .16913132 (.22601687)
------------------------------------------------------------------------------
 
display 94*0.5422
50.9668

display 94*0.4578
43.0332
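
The class probabilities can also be recovered from the log odds parameter reported for class 1. The lines below are a small check, assuming only Stata's built-in invlogit() function; the scalar names prev1 and prev2 are made up for this illustration.

```stata
* Class prevalences implied by the class 1 log odds (_cons = .16913132)
scalar prev1 = invlogit(.16913132)   // Pr(c = 1)
scalar prev2 = 1 - prev1             // Pr(c = 2)

* Expected class sizes among the 94 patients
display "class 1: " %6.4f prev1 "   expected count: " %7.3f (94*prev1)
display "class 2: " %6.4f prev2 "   expected count: " %7.3f (94*prev2)
```

The results agree with prob: 0.5422, 0.4578 to four decimals; the expected counts differ slightly from 50.9668 and 43.0332 only because those were computed from the rounded probabilities.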
 
/*  reorganizing the output manually  */
 
         class1   class2
 loc1:  -17.584    1.1903
 loc2:   -1.4173   1.3333
 loc3:   -3.5875   1.5708
 loc4:   -1.4143  16.845
 prob:    0.5422   0.4578
ecount:  50.9668  43.0332

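Class 2 is the MI class: all four of its locations are positive, so each indicator is more likely to be present than absent in that class. Its estimated prevalence is .4578, which gives an expected count of 43.0332 of the 94 patients.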
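
To see why class 2 is read as the MI class, it helps to put the locations on the probability scale. The following is a sketch that copies the rounded locations from the output above into a matrix and applies invlogit() to each; the matrix names loc and pr are chosen here for illustration.

```stata
* Locations (logits) copied from the gllamm output: rows = indicators, columns = classes
matrix loc = (-17.584, 1.1903 \ -1.4173, 1.3333 \ -3.5875, 1.5708 \ -1.4143, 16.845)

* Convert each logit to Pr(y_j = 1 | class)
matrix pr = J(4, 2, .)
forvalues j = 1/4 {
    forvalues c = 1/2 {
        matrix pr[`j', `c'] = invlogit(loc[`j', `c'])
    }
}
matrix rownames pr = y1 y2 y3 y4
matrix colnames pr = class1 class2
matrix list pr, format(%6.4f)
```

In class 2 every indicator has a probability above .75 of being present, while in class 1 every probability is below .20.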

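If you wish to see the latent class coefficients along with their standard errors, try the code below. Because ereturn post replaces the estimation results stored by gllamm, it is safest to run it after the gllapred commands that follow, or to refit the model afterward.

mat b = e(b)
mat V = e(V)
ereturn post b V
ereturn display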

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
z2_1_1    v1 |  -17.58414   952.4511    -0.02   0.985    -1884.354    1849.186
-------------+----------------------------------------------------------------
z2_2_1    v2 |  -1.417265   .3881472    -3.65   0.000    -2.178019   -.6565103
-------------+----------------------------------------------------------------
z2_3_1    v3 |  -3.587496   1.008792    -3.56   0.000    -5.564692   -1.610299
-------------+----------------------------------------------------------------
z2_4_1    v4 |  -1.414294   .4120723    -3.43   0.001    -2.221941   -.6066475
-------------+----------------------------------------------------------------
p2_1   _cons |   .1691313   .2260169     0.75   0.454    -.2738536    .6121162
-------------+----------------------------------------------------------------
z2_1_2    v1 |   1.190308   .4176254     2.85   0.004     .3717775    2.008839
-------------+----------------------------------------------------------------
z2_2_2    v2 |    1.33327   .3892167     3.43   0.001     .5704188     2.09612
-------------+----------------------------------------------------------------
z2_3_2    v3 |   1.570822   .4738385     3.32   0.001     .6421153    2.499528
-------------+----------------------------------------------------------------
z2_4_2    v4 |   16.84528    701.791     0.02   0.981     -1358.64     1392.33
------------------------------------------------------------------------------
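
The coefficients and their confidence limits are on the logit scale, so they can be moved to the probability scale with invlogit(). The lines below are a rough illustration using values copied from the table. They also show that the two coefficients with very large standard errors (z2_1_1 and z2_4_2) correspond to probabilities that are essentially 0 and 1, which is why their Wald intervals are not informative.

```stata
* z2_2_1 (v2, class 1): point estimate and 95% limits as probabilities
display %6.4f invlogit(-1.417265)
display %6.4f invlogit(-2.178019) "  to  " %6.4f invlogit(-.6565103)

* Boundary estimates: z2_1_1 (v1, class 1) and z2_4_2 (v4, class 2)
display %12.10f invlogit(-17.58414)
display %12.10f invlogit(16.84528)
```

For v2 in class 1 the interval runs from roughly .10 to .34 around a point estimate of about .20.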

We can obtain, for each response pattern, the posterior probability of belonging to each class, Pr(c=1|y_j) and Pr(c=2|y_j), using the gllapred command.

gllapred prob, p
 
merge m:1 pat using http://www.gseis.ucla.edu/courses/data/rindskopf2a, keepusing(y1 y2 y3 y4)
 
list pat y1 y2 y3 y4 wt2 prob1 prob2 if var==1, sep(0) noobs
 
  +-------------------------------------------------------+
  | pat   y1   y2   y3   y4   wt2       prob1       prob2 |
  |-------------------------------------------------------|
  |   1    1    1    1    1    24   5.589e-11           1 |
  |   2    0    1    1    1     5   .00789839   .99210161 |
  |   3    1    0    1    1     4   8.748e-10           1 |
  |   4    0    0    1    1     3   .11079636   .88920364 |
  |   5    1    1    0    1     3   9.718e-09   .99999999 |
  |   6    0    1    0    1     5   .58057906   .41942094 |
  |   7    1    0    0    1     2   1.521e-07   .99999985 |
  |   8    0    0    0    1     7   .95587857   .04412143 |
  |   9    1    1    1    0     0   .00473495   .99526505 |
  |  10    0    1    1    0     0   .99999852   1.476e-06 |
  |  11    1    0    1    0     0   .06929933   .93070067 |
  |  12    0    0    1    0     1   .99999991   9.428e-08 |
  |  13    1    1    0    0     0   .45271194   .54728806 |
  |  14    0    1    0    0     7   .99999999   8.487e-09 |
  |  15    1    0    0    0     0   .92829674   .07170326 |
  |  16    0    0    0    0    33           1   5.423e-10 |
  +-------------------------------------------------------+
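
These posterior probabilities follow from Bayes' rule. As a check, the sketch below reproduces the values for pattern 6 (y1=0, y2=1, y3=0, y4=1) by hand, using the rounded estimates from the gllamm output and assuming the indicators are independent within a class. The scalar names are made up for this illustration.

```stata
* Class prevalences copied from the gllamm output
scalar pi1 = .5422
scalar pi2 = .4578

* Probability of pattern 6 within each class
scalar f1 = (1 - invlogit(-17.584)) * invlogit(-1.4173) * (1 - invlogit(-3.5875)) * invlogit(-1.4143)
scalar f2 = (1 - invlogit(1.1903)) * invlogit(1.3333) * (1 - invlogit(1.5708)) * invlogit(16.845)

* Bayes' rule: prior times likelihood, divided by the sum over classes
scalar post1 = pi1*f1 / (pi1*f1 + pi2*f2)
display "Pr(c=1 | pattern 6) = " %6.4f post1
display "Pr(c=2 | pattern 6) = " %6.4f (1 - post1)
```

The result agrees with the prob1 and prob2 values listed for pattern 6 to about three decimals; the small difference comes from using rounded parameter values.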

We can classify each response pattern into the latent class for which it has the larger posterior probability.

generate class2 = prob2>prob1
 
list pat y1 y2 y3 y4 wt2 class2 if var==1, sep(0) noobs
 
  +----------------------------------------+
  | pat   y1   y2   y3   y4   wt2   class2 |
  |----------------------------------------|
  |   1    1    1    1    1    24        1 |
  |   2    0    1    1    1     5        1 |
  |   3    1    0    1    1     4        1 |
  |   4    0    0    1    1     3        1 |
  |   5    1    1    0    1     3        1 |
  |   6    0    1    0    1     5        0 |
  |   7    1    0    0    1     2        1 |
  |   8    0    0    0    1     7        0 |
  |   9    1    1    1    0     0        1 |
  |  10    0    1    1    0     0        0 |
  |  11    1    0    1    0     0        1 |
  |  12    0    0    1    0     1        0 |
  |  13    1    1    0    0     0        1 |
  |  14    0    1    0    0     7        0 |
  |  15    1    0    0    0     0        0 |
  |  16    0    0    0    0    33        0 |
  +----------------------------------------+
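
Summing the observed frequencies by assigned class shows how many patients this rule places in each class. The lines below simply add the wt2 values from the listing above.

```stata
* Patients in patterns assigned to class 2 (patterns 1, 2, 3, 4, 5, 7)
display 24 + 5 + 4 + 3 + 3 + 2

* Patients in patterns assigned to class 1 (patterns 6, 8, 12, 14, 16)
display 5 + 7 + 1 + 7 + 33
```

This gives 41 and 53 patients, compared with the expected counts of 43.0332 and 50.9668; the two sets of numbers need not agree, because assignment to the most probable class ignores the uncertainty in patterns such as 6. With the data in memory, the same totals should come from tabulate class2 [fweight = wt2] if var == 1 & wt2 > 0.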

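We can also use gllapred to compute the conditional response probabilities, in particular Pr(y_ij=1|c=2), also known as the sensitivity. The variables e1 through e4 hold the class 2 locations from the output above; note that e3 is entered as 1.5706 where the output reports 1.5708, which affects the third probability only in the fifth decimal place.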

generate e1=1.1903
generate e2=1.3333
generate e3=1.5706
generate e4=16.845
 
gllapred cprob, mu us(e)
 
li v1-v4 cprob in 1/4, noobs
 
  +-------------------------------+
  | v1   v2   v3   v4       cprob |
  |-------------------------------|
  |  1    0    0    0   .76679471 |
  |  0    1    0    0   .79138597 |
  |  0    0    1    0   .82786913 |
  |  0    0    0    1   .99999995 |
  +-------------------------------+
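
By the same logic, the specificity of each indicator is Pr(y_ij=0|c=1), the probability that the indicator is absent in the non-MI class. The lines below are a minimal sketch that computes it directly from the rounded class 1 locations with invlogit(), rather than through gllapred.

```stata
* Specificity = 1 - Pr(y_j = 1 | class 1), using the class 1 locations from the output
display "y1 (qwave): " %6.4f (1 - invlogit(-17.584))
display "y2 (hist):  " %6.4f (1 - invlogit(-1.4173))
display "y3 (ldh):   " %6.4f (1 - invlogit(-3.5875))
display "y4 (cpk):   " %6.4f (1 - invlogit(-1.4143))
```

All four values are above .80, with the Q-wave essentially never present in class 1.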

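Next, we will use gllapred to obtain the expected count for each pattern. The ll option stores the log-likelihood of each response pattern, so exp(l) is the model-implied probability of the pattern, which is then multiplied by the number of patients. Patterns that were never observed (wt2 = 0) show a missing count in the listing.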

gllapred l, ll
(ll will be stored in l)
log-likelihood:-180.69771
 
generate count = e(N)*exp(l)
 
list pat y1 y2 y3 y4 wt2 count if var==1, sep(0) noobs
 
  +------------------------------------------+
  | pat   y1   y2   y3   y4   wt2      count |
  |------------------------------------------|
  |   1    1    1    1    1    24   21.62042 |
  |   2    0    1    1    1     5   6.627714 |
  |   3    1    0    1    1     4   5.699445 |
  |   4    0    0    1    1     3   1.949337 |
  |   5    1    1    0    1     3    4.49433 |
  |   6    0    1    0    1     5   3.258896 |
  |   7    1    0    0    1     2   1.184768 |
  |   8    0    0    0    1     7   8.166566 |
  |   9    1    1    1    0     0          . |
  |  10    0    1    1    0     0          . |
  |  11    1    0    1    0     0          . |
  |  12    0    0    1    0     1   .8884496 |
  |  13    1    1    0    0     0          . |
  |  14    0    1    0    0     7   7.783092 |
  |  15    1    0    0    0     0          . |
  |  16    0    0    0    0    33   32.11164 |
  +------------------------------------------+
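
An expected count can be verified by hand from the estimates. The sketch below does this for pattern 16 (all four indicators absent), using the rounded locations and class probabilities from the gllamm output and N = 94; the scalar names are made up for this illustration.

```stata
* Probability of pattern 16 (y1 = y2 = y3 = y4 = 0) within each class
scalar g1 = (1 - invlogit(-17.584)) * (1 - invlogit(-1.4173)) * (1 - invlogit(-3.5875)) * (1 - invlogit(-1.4143))
scalar g2 = (1 - invlogit(1.1903)) * (1 - invlogit(1.3333)) * (1 - invlogit(1.5708)) * (1 - invlogit(16.845))

* Mix over the two classes and scale by the number of patients
scalar cnt16 = 94 * (.5422*g1 + .4578*g2)
display %8.3f cnt16
```

The result is close to the 32.11164 listed for pattern 16, against an observed frequency of 33. Almost all of it comes from class 1, since class 2 patients almost never lack CPK-MB.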

Categorical Data Analysis Course

Phil Ender