Multivariate Analysis
Classification of Observations


In addition to discriminating between groups, discriminant analysis allows for the classification of observations into groups. Classification can be performed using the observed response variables or, more commonly, the values of the linear discriminant functions. We will begin by looking at classification using observed variables.

Using observed variables

Let xbar1 be the mean vector for group 1 and let xbarj be the mean vector for group j on the observed variables X.

Let x1 be a vector of the observed scores for subject 1 and let xi be a vector of observed scores for subject i.

Let d11 be the difference vector for subject 1 using the mean vector for group 1, that is, d11 = x1 - xbar1. In general, let dij = xi - xbarj be the difference vector for subject i using the mean vector for group j.

Using the pooled within covariance matrix, Cw

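Cw = (S1 + S2 + ... + Sk)/(N-k), where Sj is the sums of squares and cross-products matrix for group j, N is the total number of subjects, and k is the number of groups.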

For each subject and each group compute the quantity

χ²ij = dij * Cw⁻¹ * dij'

Thus, if there are three groups there will be three values of χ²ij computed for each subject. The subject will then be classified into the group with the smallest χ²ij.
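
As a minimal sketch of this rule, the loop below computes the three distances for subject 2 of the hsb2 example (write = 41, math = 44). The group means and Cw are copied from the example later in these notes; the loop and the matrix names d and xc are illustrative and are not part of the original session.

```stata
/* group means on write and math, and the pooled within covariance matrix */
matrix xb1 = (51.33, 50.02)
matrix xb2 = (56.26, 56.73)
matrix xb3 = (46.76, 46.42)
matrix Cw  = (74.635417, 37.430998 \ 37.430998, 68.34361)

/* observed scores for subject 2 */
matrix x2 = (41, 44)

/* distance of subject 2 from each group mean */
forvalues j = 1/3 {
    matrix d  = x2 - xb`j'
    matrix xc = d*syminv(Cw)*d'
    display "group `j': " %6.2f xc[1,1]
}
```

The three values should come out near 1.44, 3.64 and .45, as in the table for the example, so subject 2 is classified into group 3.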

Using a separate covariance matrix for each group, Cj

Cj = Sj/(nj - 1), where j is the group identifier and nj is the number of subjects in group j.

Let xi, xbarj and dij be defined as above.

For each subject and each group compute the quantity

χ'²ij = dij * Cj⁻¹ * dij' + ln|Cj|

Again, the subject will then be classified into the group with the smallest χ'²ij.
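
A minimal sketch of the separate-covariance rule, using subject 3 of the hsb2 example (write = 64, math = 70). The means and covariance matrices are copied from the example later in these notes; the loop and the names d and q are illustrative assumptions.

```stata
/* group means and separate group covariance matrices */
matrix xb1 = (51.33, 50.02)
matrix xb2 = (56.26, 56.73)
matrix xb3 = (46.76, 46.42)
matrix C1  = (88.318182, 25.083333 \ 25.083333, 55.385859)
matrix C2  = (63.096703, 42.511538 \ 42.511538, 76.216667)
matrix C3  = (86.839184, 37.73551 \ 37.73551, 63.26898)

/* observed scores for subject 3 */
matrix x3 = (64, 70)

/* quadratic form plus the log determinant of each group's covariance matrix */
forvalues j = 1/3 {
    matrix d = x3 - xb`j'
    matrix q = d*syminv(C`j')*d'
    display "group `j': " %6.2f q[1,1] + ln(det(C`j'))
}
```

Note that it is the natural log of the determinant that is added, not the determinant itself. The results should agree with the row for subject 3 in the separate-covariance table of the example (15.74, 10.32, 17.26), which puts subject 3 in group 2.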

Taking prior probabilities into account

Let xi, xbarj and dij be defined as above.

Let pj = nj/N be the prior probability for group j; for example, p1 = n1/N is the prior probability for group 1. For each subject and each group compute the quantity

χ''²ij = χ'²ij - 2 ln pj

The subject will then be classified into the group with the smallest χ''²ij.
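
The prior term is a constant for each group, so it can be looked at on its own. The sketch below, which is illustrative and not part of the original session, displays -2 ln pj for the three hsb2 group sizes used in the example (45, 105 and 50 out of 200).

```stata
/* prior-probability term for each group */
display %6.4f -2*ln(45/200)
display %6.4f -2*ln(105/200)
display %6.4f -2*ln(50/200)
```

These come to about 2.9833, 1.2887 and 2.7726. The largest group receives the smallest addition, so taking priors into account shifts classification toward the larger groups. For subject 1 in the example, adding these terms to 10.12, 11.77 and 9.89 gives the 13.10, 13.06 and 12.66 shown in the prior-probability table.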

Example using hsb2 with variables write and math

/* mean vectors for each group */

mat xb1 = (51.33, 50.02)  /* n1 = 45  */
mat xb2 = (56.26, 56.73)  /* n2 = 105 */
mat xb3 = (46.76, 46.42)  /* n3 = 50  */

/* pooled within covariance matrix */
mat list Cw

symmetric Cw[2,2]
           write       math
write  74.635417
 math  37.430998   68.34361

/* separate group covariance matrices */

mat list C1

symmetric C1[2,2]
           write       math
write  88.318182
 math  25.083333  55.385859

mat list C2

symmetric C2[2,2]
           write       math
write  63.096703
 math  42.511538  76.216667

mat list C3

symmetric C3[2,2]
           write       math
write  86.839184
 math   37.73551   63.26898
 
/* determinants of covariance matrices */
scalar det1 = det(C1)
scalar det2 = det(C2)
scalar det3 = det(C3)

display det1 "  " det2 "  " det3

4262.4047  3001.7895  4070.2578

/* prior probabilities */
scalar p1 = 45/200
scalar p2 = 105/200
scalar p3 = 50/200

display p1 "  " p2 "  " p3

.225  .525  .25

/* scores for subjects 1, 2 & 3 */
mat x1 =  (52, 41)            /* actual group membership -- group 1 */
mat x2 =  (41, 44)            /* actual group membership -- group 3 */
mat x3 =  (64, 70)            /* actual group membership -- group 2 */

/* difference vectors for subject 1 from each group */
mat d11 = x1-xb1
mat d12 = x1-xb2
mat d13 = x1-xb3

/* chi-square for subject 1 */
mat xc11 = d11*syminv(Cw)*d11'
mat xc12 = d12*syminv(Cw)*d12'
mat xc13 = d13*syminv(Cw)*d13'

symmetric xc11[1,1]
           r1
r1  1.7718562

symmetric xc12[1,1]
           r1
r1  3.9707944

symmetric xc13[1,1]
           r1
r1  1.6744838

/* classify subject 1 into group 3 */

/* chi-square prime for subject 1 */
mat xcp11 = d11*syminv(C1)*d11' + ln(det1)
mat xcp12 = d12*syminv(C2)*d12' + ln(det2)
mat xcp13 = d13*syminv(C3)*d13' + ln(det3)

symmetric xcp11[1,1]
           c1
r1  10.12

symmetric xcp12[1,1]
           c1
r1  11.77

symmetric xcp13[1,1]
           c1
r1   9.89

/* classify subject 1 into group 3 */


/* chi-square double prime for subject 1 */
mat xcpp11 = xcp11 - 2*ln(p1)
mat xcpp12 = xcp12 - 2*ln(p2)
mat xcpp13 = xcp13 - 2*ln(p3)

symmetric xcpp11[1,1]
           c1
r1  13.10

symmetric xcpp12[1,1]
           c1
r1  13.06

symmetric xcpp13[1,1]
           c1
r1  12.66

/* classify subject 1 into group 3 */

/* table for all three subjects */

using pooled within covariance matrix
S  Grp1     Grp2     Grp3   Class as
1  1.77     3.97     1.67     3
2  1.44     3.64      .45     3
3  5.89     2.58     8.48     2

using separate group covariance matrices
S  Grp1     Grp2     Grp3   Class as
1  10.12    11.77    9.89     3
2   9.76    11.82    8.69     3
3  15.74    10.32   17.26     2

using prior probabilities
S  Grp1     Grp2     Grp3   Class as
1  13.10    13.06    12.66    3
2  12.75    13.11    11.47    3
3  18.72    11.61    20.03    2

Since there are often more observed variables than discriminant functions, it is usually more efficient to do the classification using the discriminant function scores. The computations are exactly the same as with observed variables.

In this small example, with two observed variables and two discriminant functions, there is no particular saving from using the discriminant scores from all of the dimensions.

Using discriminant function scores

candisc write math in 1/200, group(prog) notable nomeans nostruct

Canonical linear discriminant analysis

      |                                 | Like- 
      | Canon.   Eigen-     Variance    | lihood
  Fcn | Corr.    value   Prop.   Cumul. | Ratio     F      df1    df2  Prob>F
  ----+---------------------------------+------------------------------------
    1 | 0.5038  .340237  0.9882  0.9882 | 0.7431  15.683     4    392  0.0000 e
    2 | 0.0636  .004058  0.0118  1.0000 | 0.9960  .79952     1    197  0.3723 e
  ---------------------------------------------------------------------------
  Ho: this and smaller canon. corr. are zero;                     e = exact F

Standardized canonical discriminant function coefficients

                 | function1  function2 
    -------------+----------------------
           write |  .4198644  -1.096543 
            math |  .7138331   .9322744

/* display raw coefficients */
mat lis e(L_unstd)

e(L_unstd)[3,2]
        function1   function2
write   .04860004  -.12692679
 math   .08634709   .11277032
_cons  -7.1106094   .76176766

/* generate the discriminant function scores */
generate f1 =  -7.1106 +  0.0486*write +  0.0863*math
generate f2 =    .7618 -  0.1269*write +  0.1128*math

/* mean vector for each function and group */
mat fb1 = (-0.2965,  -0.1128)
mat fb2 = ( 0.5222,   0.0191)
mat fb3 = (-0.8298,   0.0615)

/* discriminant function scores for each subject */
mat f1 = (-1.0451, -1.2122)
mat f2 = (-1.3208,   .5221)
mat f3 = ( 2.0408,   .5362)
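
The scores listed above for subject 1 can be reproduced by hand from the rounded raw coefficients. This is only an illustrative check, using subject 1's observed scores (write = 52, math = 41) from the earlier example.

```stata
/* function 1 and function 2 scores for subject 1 */
display %8.4f -7.1106 + 0.0486*52 + 0.0863*41
display %8.4f   .7618 - 0.1269*52 + 0.1128*41
```

These give -1.0451 and -1.2122, the two elements of the score vector for subject 1.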

/* table for all three subjects using discriminant scores */

using pooled within covariance matrix
differs from using observed variables only by rounding error
S  Grp1     Grp2     Grp3   Class as
1  1.77     3.97     1.67     3
2  1.45     3.65      .45     3
3  5.88     2.57     8.47     2
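
In this example the pooled-covariance distances on the discriminant scores agree with plain squared Euclidean distances between the score vector and the group mean vectors, which is what one would expect if the scores are scaled to have an identity pooled within covariance matrix. The sketch below checks this for subject 1; the loop and the names d and xc are illustrative assumptions.

```stata
/* group means and subject 1 scores on the two discriminant functions */
matrix fb1 = (-0.2965, -0.1128)
matrix fb2 = ( 0.5222,  0.0191)
matrix fb3 = (-0.8298,  0.0615)
matrix f1  = (-1.0451, -1.2122)

/* squared Euclidean distance from each group mean */
forvalues j = 1/3 {
    matrix d  = f1 - fb`j'
    matrix xc = d*d'
    display "group `j': " %6.2f xc[1,1]
}
```

The results, about 1.77, 3.97 and 1.67, match the row for subject 1 in the table above.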

using separate group covariance matrices
S  Grp1     Grp2     Grp3   Class as
1   1.90     3.55    1.67     3
2   1.55     3.62     .48     3
3   7.50     2.10    9.03     2

using prior probabilities
S  Grp1     Grp2     Grp3   Class as
1   4.89     4.84     4.44    3
2   4.54     4.91     3.25    3
3  10.49     3.39    11.80    2

One advantage of using discriminant function scores is that you may want to use only the scores from the significant dimensions. In our example only the first dimension is statistically significant. Since we are using fewer scores, this approach can be considered to be classification with reduced dimensionality.

Using only the significant discriminant function scores

/* mean vector for each group on the single function */
mat fb1 = (-0.2965)
mat fb2 = ( 0.5222)
mat fb3 = (-0.8298)

/* discriminant function scores for each subject */
mat f1 = (-1.0451)
mat f2 = (-1.3208)
mat f3 = ( 2.0408)

/* table for all three subjects using discriminant scores */

using pooled within variance matrix
S  Grp1     Grp2     Grp3   Class as
1   .56     2.46      .046    3
2  1.05     3.40      .241    3
3  5.46     2.31     8.240    2
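
With a single function the pooled distance reduces to a squared difference between the subject's score and each group mean, assuming the pooled within variance of the function is 1. An illustrative check for subject 1:

```stata
/* squared difference between subject 1's score and each group mean */
display %6.3f (-1.0451 - (-0.2965))^2
display %6.3f (-1.0451 -   0.5222 )^2
display %6.3f (-1.0451 - (-0.8298))^2
```

These come to about .560, 2.456 and .046, in line with the first row of the table above.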

using separate group variance matrices
S  Grp1     Grp2     Grp3   Class as
1  1.51     3.36     1.040    3
2  2.09     4.24     1.236    3
3  7.40     3.22     9.287    2

using prior probabilities
S  Grp1     Grp2     Grp3   Class as
1  4.49     4.65     3.81     3
2  5.08     5.53     4.01     3
3 10.38     4.51    12.06     2

Classifying with unknown group membership

The real utility of classification comes when you have scores on individuals with unknown group membership. Using the hsb2 dataset, we will create three new cases and then classify them into groups using candisc and predict.

use http://www.gseis.ucla.edu/courses/data/hsb2, clear

set obs 203

replace read =40 in 201
replace write=40 in 201
replace math =40 in 201

replace read =50 in 202
replace write=50 in 202
replace math =50 in 202

replace read =60 in 203
replace write=60 in 203
replace math =60 in 203

list read write math prog in 201/203, clean

       read   write   math   prog  
201.     40      40     40      .  
202.     50      50     50      .  
203.     60      60     60      .  

candisc read write math in 1/200, group(prog) notable nostruct

Canonical linear discriminant analysis

      |                                 | Like- 
      | Canon.   Eigen-     Variance    | lihood
  Fcn | Corr.    value   Prop.   Cumul. | Ratio     F      df1    df2  Prob>F
  ----+---------------------------------+------------------------------------
    1 | 0.5125  .356283  0.9874  0.9874 | 0.7340   10.87     6    390  0.0000 e
    2 | 0.0672  .004543  0.0126  1.0000 | 0.9955  .44518     2    196  0.6414 e
  ---------------------------------------------------------------------------
  Ho: this and smaller canon. corr. are zero;                     e = exact F

Standardized canonical discriminant function coefficients

                 | function1  function2 
    -------------+----------------------
            read |  .2728524   .4097932 
           write |  .3310784  -1.183414 
            math |  .5815538    .655658 

Group means on canonical variables

            prog | function1  function2 
    -------------+----------------------
         general | -.3120021  -.1190423 
        academic |  .5358515   .0196809 
        vocation | -.8444861   .0658081 


predict class, classification

list read write math prog class in 201/203, clean

       read   write   math   prog   class  
201.     40      40     40      .       3  
202.     50      50     50      .       1  
203.     60      60     60      .       2 
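
Beyond the predicted group, it can be useful to see how strongly each new case is assigned. The sketch below assumes the candisc results and the variable class from the run above are still in memory, and that the pr statistic of predict is available after candisc; the variable names pr1, pr2 and pr3 are illustrative.

```stata
/* posterior probability of membership in each of the three groups */
predict pr1 pr2 pr3, pr

list read write math class pr1 pr2 pr3 in 201/203, clean
```

Each case is classified into the group with the largest posterior probability. If proportional priors are wanted, as in the prior-probability section above, the priors(proportional) option of candisc should apply; the run above does not specify it.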


Phil Ender, 29jul07, 30oct05