In addition to discriminating between groups, discriminant analysis allows for the classification of observations into groups. Classification can be performed using the observed response variables or, more commonly, the values of the linear discriminant functions. We will begin by looking at classification using observed variables.
Using observed variables
Let xbar1 be the mean vector for group 1 and let xbarj be the mean vector for group j on the observed variables X.
Let x1 be the vector of observed scores for subject 1 and let xi be the vector of observed scores for subject i.
Let d11 be the difference vector for subject 1 using the mean vector for group 1, that is, d11 = x1 - xbar1. In general, let dij = xi - xbarj be the difference vector for subject i using the mean vector for group j.
Using the pooled within covariance matrix, Cw
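Cw = (S1 + S2 + ... + Sk)/(N-k), where Sj is the sums of squares and cross-products matrix for group j, N is the total number of subjects and k is the number of groups.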
For each subject and each group compute the quantity
$\chi^2_{ij} = d_{ij} C_w^{-1} d_{ij}'$
Thus, if there are three groups, three values of $\chi^2_{ij}$ will be computed for each subject. The subject will then be classified into the group with the smallest $\chi^2_{ij}$.
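As a rough cross-check of the rule above, here is a minimal Python sketch (Python is not part of the original Stata session) that computes the three pooled-covariance values for subject 1, using the rounded group means and Cw from the hsb2 example in these notes. The helper name quad_form is hypothetical, and the 2x2 inverse is written out by hand so that no matrix library is assumed.

```python
# Pooled-covariance classification for one subject with two variables.

def quad_form(d, c):
    """Return d * inverse(c) * d' for a 2-vector d and a symmetric 2x2 matrix c."""
    (c11, c12), (_, c22) = c
    det = c11 * c22 - c12 * c12
    a, b = d
    return (a * a * c22 - 2 * a * b * c12 + b * b * c11) / det

# Rounded values copied from the hsb2 example (write, math).
means = {1: (51.33, 50.02), 2: (56.26, 56.73), 3: (46.76, 46.42)}
cw = ((74.635417, 37.430998), (37.430998, 68.34361))
x1 = (52, 41)  # subject 1

chi2 = {}
for group, xbar in means.items():
    d = (x1[0] - xbar[0], x1[1] - xbar[1])  # difference vector d1j
    chi2[group] = quad_form(d, cw)

# Classify into the group with the smallest value.
predicted = min(chi2, key=chi2.get)
print(chi2, predicted)
```

With these rounded inputs the three values come out near 1.77, 3.97 and 1.67, so subject 1 is assigned to group 3.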
Using a separate covariance matrix for each group, Cj
Cj = Sj/(nj - 1), where j is the group identifier.
Let xi, xbarj and dij be defined as above.
For each subject and each group compute the quantity
$\chi'^2_{ij} = d_{ij} C_j^{-1} d_{ij}' + \ln|C_j|$
Again, the subject will then be classified into the group with the smallest $\chi'^2_{ij}$.
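The separate-covariance rule can be sketched the same way. This Python fragment is only an illustration under the same assumptions as before: it uses the rounded means and the C1, C2 and C3 matrices from the hsb2 example in these notes, and the helper names quad_form and det2 are hypothetical.

```python
import math

def det2(c):
    """Determinant of a 2x2 matrix given as nested tuples."""
    return c[0][0] * c[1][1] - c[0][1] * c[1][0]

def quad_form(d, c):
    """Return d * inverse(c) * d' for a 2-vector d and a symmetric 2x2 matrix c."""
    a, b = d
    return (a * a * c[1][1] - 2 * a * b * c[0][1] + b * b * c[0][0]) / det2(c)

# Rounded values copied from the hsb2 example (write, math).
means = {1: (51.33, 50.02), 2: (56.26, 56.73), 3: (46.76, 46.42)}
covs = {
    1: ((88.318182, 25.083333), (25.083333, 55.385859)),
    2: ((63.096703, 42.511538), (42.511538, 76.216667)),
    3: ((86.839184, 37.73551), (37.73551, 63.26898)),
}
x1 = (52, 41)  # subject 1

chi2_prime = {}
for group, xbar in means.items():
    d = (x1[0] - xbar[0], x1[1] - xbar[1])
    c = covs[group]
    # Quadratic form plus the natural log of the determinant of Cj.
    chi2_prime[group] = quad_form(d, c) + math.log(det2(c))

predicted = min(chi2_prime, key=chi2_prime.get)
print(chi2_prime, predicted)
```

Note that the log of the determinant is added, not the determinant itself. With the log, the values are roughly 10.12, 11.77 and 9.89, and subject 1 is again assigned to group 3.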
Taking prior probabilities into account
Let xi, xbarj and dij be defined as above.
Let p1 = n1/N be the prior probability for group 1 and let pj = nj/N be the prior probability for group j. For each subject and each group compute the quantity
$\chi''^2_{ij} = \chi'^2_{ij} - 2\ln p_j$
The subject will then be classified into the group with the smallest $\chi''^2_{ij}$.
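The prior-probability adjustment is a small arithmetic step, sketched here in Python for subject 1. The starting values are the rounded separate-covariance results from the hsb2 example in these notes (10.12, 11.77, 9.89), and the group sizes are 45, 105 and 50. The variable names are illustrative only.

```python
import math

# Group sizes from the hsb2 example; priors are pj = nj / N.
n = {1: 45, 2: 105, 3: 50}
total = sum(n.values())

# Separate-covariance values for subject 1, rounded as in the example tables.
chi2_prime = {1: 10.12, 2: 11.77, 3: 9.89}

# Subtract twice the log prior: small groups are penalised more.
chi2_dprime = {g: chi2_prime[g] - 2 * math.log(n[g] / total) for g in n}

predicted = min(chi2_dprime, key=chi2_dprime.get)
print(chi2_dprime, predicted)
```

Because ln pj is negative, the adjustment adds a larger amount for groups with small priors. Here it yields about 13.10, 13.06 and 12.66, so group 3 still has the smallest value.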
Example using hsb2 with variables write and math
/* mean vectors for each group */
mat xb1 = (51.33, 50.02) /* n1 = 45 */
mat xb2 = (56.26, 56.73) /* n2 = 105 */
mat xb3 = (46.76, 46.42) /* n3 = 50 */
/* pooled within covariance matrix */
mat list Cw
symmetric Cw[2,2]
           write       math
write  74.635417
 math  37.430998   68.34361
/* separate group covariance matrices */
mat list C1
symmetric C1[2,2]
           write       math
write  88.318182
 math  25.083333  55.385859
mat list C2
symmetric C2[2,2]
           write       math
write  63.096703
 math  42.511538  76.216667
mat list C3
symmetric C3[2,2]
           write       math
write  86.839184
 math   37.73551   63.26898
/* determinants of covariance matrices */
scalar det1 = det(C1)
scalar det2 = det(C2)
scalar det3 = det(C3)
display det1 " " det2 " " det3
4262.4047 3001.7895 4070.2578
/* prior probabilities */
scalar p1 = 45/200
scalar p2 = 105/200
scalar p3 = 50/200
display p1 " " p2 " " p3
.225 .525 .25
/* scores for subjects 1, 2 and 3 */
mat x1 = (52, 41) /* actual group membership -- group 1 */
mat x2 = (41, 44) /* actual group membership -- group 3 */
mat x3 = (64, 70) /* actual group membership -- group 2 */
/* difference vectors for subject 1 from each group */
mat d11 = x1-xb1
mat d12 = x1-xb2
mat d13 = x1-xb3
/* chi-square for subject 1 */
mat xc11 = d11*syminv(Cw)*d11'
mat xc12 = d12*syminv(Cw)*d12'
mat xc13 = d13*syminv(Cw)*d13'
mat list xc11
symmetric xc11[1,1]
           r1
r1  1.7718562
mat list xc12
symmetric xc12[1,1]
           r1
r1  3.9707944
mat list xc13
symmetric xc13[1,1]
           r1
r1  1.6744838
/* classify subject 1 into group 3 (smallest value) */
/* chi-square prime for subject 1 */
mat xcp11 = d11*syminv(C1)*d11' + ln(det1)
mat xcp12 = d12*syminv(C2)*d12' + ln(det2)
mat xcp13 = d13*syminv(C3)*d13' + ln(det3)
mat list xcp11
symmetric xcp11[1,1]
       c1
r1  10.12
mat list xcp12
symmetric xcp12[1,1]
       c1
r1  11.77
mat list xcp13
symmetric xcp13[1,1]
       c1
r1   9.89
/* values shown rounded to two decimals; classify subject 1 into group 3 */
/* chi-square double prime for subject 1 */
mat xcpp11 = xcp11 - 2*ln(p1)
mat xcpp12 = xcp12 - 2*ln(p2)
mat xcpp13 = xcp13 - 2*ln(p3)
mat list xcpp11
symmetric xcpp11[1,1]
       c1
r1  13.10
mat list xcpp12
symmetric xcpp12[1,1]
       c1
r1  13.06
mat list xcpp13
symmetric xcpp13[1,1]
       c1
r1  12.66
/* values shown rounded to two decimals; classify subject 1 into group 3 */
/* table for all three subjects */
using pooled within covariance matrix
S     Grp1    Grp2    Grp3   Class as
1     1.77    3.97    1.67      3
2     1.44    3.64     .45      3
3     5.89    2.58    8.48      2
using separate group covariance matrices
S     Grp1    Grp2    Grp3   Class as
1    10.12   11.77    9.89      3
2     9.76   11.82    8.69      3
3    15.74   10.32   17.26      2
using prior probabilities (with separate group covariance matrices)
S     Grp1    Grp2    Grp3   Class as
1    13.10   13.06   12.66      3
2    12.75   13.11   11.47      3
3    18.72   11.61   20.03      2
Since there are often more observed variables than discriminant functions, it is usually
more efficient to do the classification using the discriminant function scores. The
computations are exactly the same as with observed variables.
In this small example, with two observed variables and two discriminant functions, there is no particular saving from using the discriminant scores from all of the dimensions.
Using discriminant function scores
candisc write math in 1/200, group(prog) notable nomeans nostruct
Canonical linear discriminant analysis
    |                                 | Like-
    | Canon.   Eigen-     Variance    | lihood
Fcn | Corr.    value   Prop.   Cumul. | Ratio       F     df1   df2  Prob>F
----+---------------------------------+------------------------------------
  1 | 0.5038  .340237  0.9882  0.9882 | 0.7431   15.683     4   392  0.0000 e
  2 | 0.0636  .004058  0.0118  1.0000 | 0.9960   .79952     1   197  0.3723 e
---------------------------------------------------------------------------
Ho: this and smaller canon. corr. are zero; e = exact F
Standardized canonical discriminant function coefficients
             | function1   function2
-------------+----------------------
       write |  .4198644   -1.096543
        math |  .7138331    .9322744
/* display raw coefficients */
mat list e(L_unstd)
e(L_unstd)[3,2]
        function1   function2
write   .04860004  -.12692679
 math   .08634709   .11277032
_cons  -7.1106094   .76176766
/* generate the discriminant function scores */
generate f1 = -7.1106 + 0.0486*write + 0.0863*math
generate f2 = .7618 - 0.1269*write + 0.1128*math
/* mean vector for each function and group */
mat fb1 = (-0.2965, -0.1128)
mat fb2 = ( 0.5222, 0.0191)
mat fb3 = (-0.8298, 0.0615)
/* discriminant function scores for each subject */
mat f1 = (-1.0451, -1.2122)
mat f2 = (-1.3208, .5221)
mat f3 = ( 2.0408, .5362)
/* table for all three subjects using discriminant scores */
using pooled within covariance matrix
differs from using observed variables only by rounding error
S     Grp1    Grp2    Grp3   Class as
1     1.77    3.97    1.67      3
2     1.45    3.65     .45      3
3     5.88    2.57    8.47      2
using separate group covariance matrices
S     Grp1    Grp2    Grp3   Class as
1     1.90    3.55    1.67      3
2     1.55    3.62     .48      3
3     7.50    2.10    9.03      2
using prior probabilities
S     Grp1    Grp2    Grp3   Class as
1     4.89    4.84    4.44      3
2     4.54    4.91    3.25      3
3    10.49    3.39   11.80      2
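The pooled-covariance table for the discriminant scores can be approximated with very little arithmetic. The Python sketch below assumes that the canonical scores have an identity pooled within-group covariance matrix (the usual scaling for canonical discriminant functions, and consistent with the table values), so the quadratic form reduces to a squared Euclidean distance. It uses the rounded raw coefficients and group means listed above; the function and variable names are illustrative.

```python
# Rounded raw coefficients from e(L_unstd): (constant, write, math).
coef = {
    "f1": (-7.1106, 0.0486, 0.0863),
    "f2": (0.7618, -0.1269, 0.1128),
}
# Group means on the two discriminant functions.
group_means = {1: (-0.2965, -0.1128), 2: (0.5222, 0.0191), 3: (-0.8298, 0.0615)}
# Observed (write, math) scores for the three subjects.
subjects = {1: (52, 41), 2: (41, 44), 3: (64, 70)}

def disc_scores(write, math_score):
    """Return the (f1, f2) discriminant scores for one subject."""
    return tuple(
        cons + b_write * write + b_math * math_score
        for cons, b_write, b_math in (coef["f1"], coef["f2"])
    )

scores = {s: disc_scores(*xy) for s, xy in subjects.items()}

# Assumption: pooled within covariance of the scores is the identity,
# so d * inverse(Cw) * d' is just the sum of squared differences.
dist = {
    s: {g: sum((f - m) ** 2 for f, m in zip(fs, mean))
        for g, mean in group_means.items()}
    for s, fs in scores.items()
}
predicted = {s: min(d, key=d.get) for s, d in dist.items()}
print(scores, dist, predicted)
```

Under that assumption the distances agree with the pooled table to within rounding, and the three subjects are assigned to groups 3, 3 and 2.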
One advantage of using discriminant function scores is that you can use only
the scores from the statistically significant dimensions. In our example only the
first dimension is statistically significant. Since fewer scores are used, this approach
can be considered classification with reduced dimensionality.
Using only the significant discriminant function scores
/* mean for each group on the single function */
mat fb1 = (-0.2965)
mat fb2 = ( 0.5222)
mat fb3 = (-0.8298)
/* discriminant function score for each subject */
mat f1 = (-1.0451)
mat f2 = (-1.3208)
mat f3 = ( 2.0408)
/* table for all three subjects using discriminant scores */
using pooled within variance matrix
S     Grp1    Grp2    Grp3   Class as
1      .56    2.46    .046      3
2     1.05    3.40    .241      3
3     5.46    2.31   8.240      2
using separate group variance matrices
S     Grp1    Grp2    Grp3   Class as
1     1.51    3.36   1.040      3
2     2.09    4.24   1.236      3
3     7.40    3.22   9.287      2
using prior probabilities
S     Grp1    Grp2    Grp3   Class as
1     4.49    4.65    3.81      3
2     5.08    5.53    4.01      3
3    10.38    4.51   12.06      2
Classifying with unknown group membership
The real utility of classification comes when you have scores for individuals whose group membership is unknown. Using the hsb2 dataset, we will create three new cases and then classify them into groups using candisc and predict.
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
set obs 203
replace read =40 in 201
replace write=40 in 201
replace math =40 in 201
replace read =50 in 202
replace write=50 in 202
replace math =50 in 202
replace read =60 in 203
replace write=60 in 203
replace math =60 in 203
list read write math prog in 201/203, clean
      read   write   math   prog
201.    40      40     40      .
202.    50      50     50      .
203.    60      60     60      .
candisc read write math in 1/200, group(prog) notable nostruct
Canonical linear discriminant analysis
    |                                 | Like-
    | Canon.   Eigen-     Variance    | lihood
Fcn | Corr.    value   Prop.   Cumul. | Ratio       F     df1   df2  Prob>F
----+---------------------------------+------------------------------------
  1 | 0.5125  .356283  0.9874  0.9874 | 0.7340    10.87     6   390  0.0000 e
  2 | 0.0672  .004543  0.0126  1.0000 | 0.9955   .44518     2   196  0.6414 e
---------------------------------------------------------------------------
Ho: this and smaller canon. corr. are zero; e = exact F
Standardized canonical discriminant function coefficients
             | function1   function2
-------------+----------------------
        read |  .2728524    .4097932
       write |  .3310784   -1.183414
        math |  .5815538     .655658
Group means on canonical variables
        prog | function1   function2
-------------+----------------------
     general | -.3120021   -.1190423
    academic |  .5358515    .0196809
    vocation | -.8444861    .0658081
predict class, classification
list read write math prog class in 201/203, clean
      read   write   math   prog   class
201.    40      40     40      .       3
202.    50      50     50      .       1
203.    60      60     60      .       2
Phil Ender, 29jul07, 30oct05