## Multivariate Analysis: Canonical Correlation Analysis

According to Webster

ca•non•i•cal   | kə'nänikəl |
reduced to the simplest or clearest schema possible.

From the Hacker's Dictionary
The usual or standard state or manner of something. This word has a somewhat more technical meaning in mathematics. Two formulas such as 9 + x and x + 9 are said to be equivalent because they mean the same thing, but the second one is in 'canonical form' because it is written in the usual way, with the highest power of x first.

In the beginning...

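Consider two sets of variables: $Y_1, Y_2, \dots, Y_p$ and $X_1, X_2, \dots, X_q$.

Construct the linear combinations z and w, one from each set, with weights u for the Y's and weights v for the X's, such that $r_{zw}$ is a maximum.

The XY Matrix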

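Consider a matrix XY made up of p Y's and q X's.

Partitioning the Covariance Matrix

Let S be the XY covariance matrix, thus,

$$S = \begin{bmatrix} S_{yy} & S_{yx} \\ S_{xy} & S_{xx} \end{bmatrix}$$

and, with n the number of observations and $\mathbf{1}$ a column of n ones,

$$(n-1)\,S = XY'XY - \frac{1}{n}\,(XY'\mathbf{1})(\mathbf{1}'XY)$$

Thus, the sum of squared deviation scores can be obtained without transforming the raw scores.

Criteria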

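Choose u and v such that $r_{zw}$ is a maximum.

Now let

$$A = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$$

Let $\mu_i^2$ = eigenvalues of A

Let u = eigenvectors of A

Next let

$$B = S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$$

A and B will have the same nonzero eigenvalues.

Let v = eigenvectors of B.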
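Why A and B share eigenvalues can be seen in one line. The sketch below assumes the forms $A = S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$ and $B = S_{xx}^{-1} S_{xy} S_{yy}^{-1} S_{yx}$, which are the ones implied by the computational steps later in these notes; $v^*$ is a name introduced here only for illustration.

$$
\begin{aligned}
A u &= \mu^2 u, \qquad v^* = S_{xx}^{-1} S_{xy}\, u \\
B v^* &= S_{xx}^{-1} S_{xy}\,\bigl(S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}\bigr)\, u = S_{xx}^{-1} S_{xy}\,(A u) = \mu^2\, v^*
\end{aligned}
$$

So every eigenvector u of A with $\mu^2 \neq 0$ maps to an eigenvector of B with the same eigenvalue, which is why v can be obtained from u rather than from a second eigen-decomposition.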

Computing v

Let

$$v_i = \frac{1}{\mu_i}\, S_{xx}^{-1} S_{xy}\, u_i$$

Canonical Correlation

Eigenvalues of A are canonical correlations squared, therefore

$$R_{c_i} = \sqrt{\mu_i^2} = \mu_i$$

Computational Notes

Matrix A is not symmetric so we will need to go through some additional steps in order to get the eigenvalues and eigenvectors using the symeigen command.

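```
/* assumes the partitions Syy, Syx, Sxy and Sxx already exist as Stata matrices */
matrix C = Syx*syminv(Sxx)*Sxy
matrix F = cholesky(syminv(Syy))   /* F*F' = inverse of Syy */
matrix D = F'*C*F                  /* symmetric, same eigenvalues as A */
matrix symeigen W L = D            /* L has eigenvalues  of A */
matrix U = F*W                     /* U has eigenvectors of A */
```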

Remember, the elements of L are the $\mu_i^2$.
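For readers without Stata at hand, the same five steps can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are synthetic, the Y's are placed first in the combined matrix, and A is taken to be inv(Syy)·Syx·inv(Sxx)·Sxy, the form implied by steps 1 and 2. The variable names mirror the Stata matrices.

```python
import numpy as np

# Synthetic data, for illustration only (not the timm dataset).
rng = np.random.default_rng(0)
n, p, q = 200, 2, 3
Y = rng.normal(size=(n, p))
X = rng.normal(size=(n, q))
X[:, 0] += Y[:, 0]                 # build in some dependence between the sets

# Covariance matrix of the combined XY matrix (Y's first), then partition it.
S = np.cov(np.hstack([Y, X]), rowvar=False)
Syy, Syx = S[:p, :p], S[:p, p:]
Sxy, Sxx = S[p:, :p], S[p:, p:]

C = Syx @ np.linalg.inv(Sxx) @ Sxy            # step 1
F = np.linalg.cholesky(np.linalg.inv(Syy))    # step 2: F @ F.T equals inv(Syy)
D = F.T @ C @ F                               # step 3: symmetric
L, W = np.linalg.eigh(D)                      # step 4: eigenvalues come back ascending
order = np.argsort(L)[::-1]                   # largest eigenvalue first
L, W = L[order], W[:, order]
U = F @ W                                     # step 5: eigenvectors of A

A = np.linalg.inv(Syy) @ C                    # the non-symmetric matrix itself
print("squared canonical correlations:", L)
print("canonical correlations:", np.sqrt(L))
```

The detour through D works because D and A are similar matrices; a side effect of scaling by F is that the columns of U satisfy U'·Syy·U = I, so the resulting Y-side variates have unit variance.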

Different Eigenvalues

Each canonical correlation has an eigenvalue related to Wilks' Lambda:

$$\lambda_i = \frac{\mu_i^2}{1 - \mu_i^2}, \qquad \Lambda = \prod_i \frac{1}{1 + \lambda_i}$$

Tests of Significance
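The four multivariate test statistics in the Stata output further down are all simple functions of these eigenvalues. The sketch below assumes that the eigenvalue attached to each canonical correlation is mu² / (1 - mu²), which is consistent with the value Stata reports for Roy's largest root; the correlations are copied from that output, so the results agree with it only up to rounding.

```python
import math

canonical_corrs = [0.7165, 0.4906, 0.2668]    # copied from the Stata example

mu2 = [r * r for r in canonical_corrs]        # eigenvalues of A (squared correlations)
lam = [m / (1 - m) for m in mu2]              # the "different" eigenvalues

wilks = math.prod(1 / (1 + l) for l in lam)   # same as the product of (1 - mu2)
pillai = sum(mu2)                             # Pillai's trace
lawley_hotelling = sum(lam)                   # Lawley-Hotelling trace
roy = max(lam)                                # Roy's largest root

print(f"Wilks' lambda          {wilks:.4f}")
print(f"Pillai's trace         {pillai:.4f}")
print(f"Lawley-Hotelling trace {lawley_hotelling:.4f}")
print(f"Roy's largest root     {roy:.4f}")
```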

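Wilks' Lambdas

For the test of canonical correlations k through the last,

$$\Lambda_k = \prod_{i \ge k} (1 - \mu_i^2)$$

Compute m = n - 3/2 - (p+q)/2 once, where n is the number of observations.

The following are repeated with one being subtracted from p and q until either is equal to one. Thus,

```
First time     p=3    q=5
Second time    p=2    q=4
Third time     p=1    q=3
```

$$s = \sqrt{\frac{p^2 q^2 - 4}{p^2 + q^2 - 5}}$$

df1 = pq

df2 = ms - pq/2 + 1

Rao's F Approximation

$$F = \frac{1 - \Lambda^{1/s}}{\Lambda^{1/s}} \cdot \frac{df_2}{df_1}$$

with df1 and df2 degrees of freedom.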
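As a check on the bookkeeping, the sequence of tests can be sketched in a few lines of Python. The function name `rao_f` is hypothetical, and the formula used for s, sqrt((p²q² - 4)/(p² + q² - 5)), is the standard one for Rao's approximation and is assumed here; the degenerate case where its denominator is not positive is set to s = 1 by convention. Input values are the rounded canonical correlations and n = 37 from the Stata example.

```python
import math

def rao_f(corrs, n, p, q):
    """Rao's F approximation for each sequential Wilks' lambda test.

    Returns one (lambda, df1, df2, F) tuple per test: correlations 1..last,
    then 2..last, and so on.
    """
    m = n - 1.5 - (p + q) / 2                  # computed once, from the full p and q
    results = []
    for k in range(len(corrs)):
        pk, qk = p - k, q - k                  # subtract one from each on every pass
        lam = math.prod(1 - r * r for r in corrs[k:])
        denom = pk**2 + qk**2 - 5
        s = math.sqrt((pk**2 * qk**2 - 4) / denom) if denom > 0 else 1.0
        df1 = pk * qk
        df2 = m * s - df1 / 2 + 1
        root = lam ** (1 / s)
        f = (1 - root) / root * df2 / df1
        results.append((lam, df1, df2, f))
    return results

tests = rao_f([0.7165, 0.4906, 0.2668], n=37, p=3, q=5)
for lam, df1, df2, f in tests:
    print(f"lambda={lam:.4f}  df1={df1}  df2={df2:.4f}  F={f:.4f}")
```

The degrees of freedom should match the Stata output (15 and 80.4576, 8 and 60, 3 and 31), and the F values should land close to the reported 2.5381, 1.4308 and 0.7918, with small differences due to the four-decimal rounding of the correlations.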

Canonical Redundancy Coefficients

A measure of association between two sets of variables.

This measure is asymmetric:

$R^2_{x \cdot y}$ is the redundancy of set X given set Y

$R^2_{y \cdot x}$ is the redundancy of set Y given set X.

Canonical Redundancy Note

$R_c^2$ is an estimate of the shared variance of two linear combinations of variables and not of the variance of the variables themselves. Thus, even when $R_c^2$ is high, the redundancy of Y, X, or both may be very low.

Although it is always possible to compute both $R^2_{x \cdot y}$ and $R^2_{y \cdot x}$, it is not always the case that both redundancy measures are meaningful. For example, when the Y variables are true dependent variables, $R^2_{y \cdot x}$ is useful while $R^2_{x \cdot y}$ does not make sense.
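A small numerical sketch makes the gap between the squared canonical correlation and the redundancy concrete. It treats the two unlabeled three-column tables in the Stata output further down as the loadings of each variable on its own set's canonical variates; that reading is an assumption, although it reproduces the canred figures. The function name `redundancy` is hypothetical.

```python
# Own-set loadings copied from the Stata example (rows = variables,
# columns = canonical variates 1..3). Interpreting these tables as
# own-set loadings is an assumption.
u_loadings = [
    [0.4341, -0.5606, 0.7052],    # apt
    [0.9967, 0.0102, -0.0803],    # ppvt
    [0.3976, 0.6906, 0.6041],     # rpmt
]
v_loadings = [
    [0.6320, 0.3122, 0.4004],     # n
    [0.3879, 0.1630, 0.4521],     # s
    [0.6588, 0.6406, 0.2112],     # ns
    [0.9422, 0.1697, 0.0814],     # na
    [0.8371, 0.0675, 0.4906],     # ss
]
canonical_corrs = [0.7165, 0.4906, 0.2668]

def redundancy(loadings, corrs):
    """Return (own-variate variance proportion, redundancy) per canonical variate."""
    n_vars = len(loadings)
    out = []
    for j, r in enumerate(corrs):
        own = sum(row[j] ** 2 for row in loadings) / n_vars   # mean squared loading
        out.append((own, own * r ** 2))                       # weight times Rc squared
    return out

u_red = redundancy(u_loadings, canonical_corrs)
v_red = redundancy(v_loadings, canonical_corrs)
total_u = sum(red for _, red in u_red)   # redundancy of the first set given the second
total_v = sum(red for _, red in v_red)   # redundancy of the second set given the first

for j, (own, red) in enumerate(u_red, start=1):
    print(f"u variate {j}: own={own:.4f}  redundancy={red:.4f}")
print(f"total redundancy of the first set: {total_u:.4f}")
```

For the first pair of variates the squared canonical correlation is about 0.51, yet the first set's total redundancy comes out at roughly 0.31, which is the situation the note above warns about.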

Redundancy

Each redundancy is a weighted sum of the squared canonical correlations, where each weight is the proportion of the aggregate variance of the variables in the set accounted for by the corresponding canonical variate of that set.

What Canonical Correlation Analysis Does...

Best

Questions concerning the number and nature of mutually independent relations (dimensions) between two sets of variables.

Mediocre

Questions concerning the degree of overlap or redundancy between two sets of variables.

Not Very Well

Questions concerning the similarity between two within-set correlation or covariance matrices.

Stata Example

Stata completely rewrote its canonical correlation procedure, canon, in Stata 9.

```
use http://www.philender.com/courses/data/timm, clear

canon (apt ppvt rpmt) (n s ns na ss), test(1 2 3)

Linear combinations for canonical correlations         Number of obs =      37
------------------------------------------------------------------------------
|      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u1           |
apt |   .0032264   .0082904     0.39   0.699    -.0135873      .02004
ppvt |   .0762248   .0152914     4.98   0.000     .0452124    .1072372
rpmt |   .0141323   .0588196     0.24   0.811    -.1051594    .1334239
-------------+----------------------------------------------------------------
v1           |
n |  -.0071509    .082705    -0.09   0.932    -.1748843    .1605826
s |  -.0756585   .0465969    -1.62   0.113    -.1701612    .0188443
ns |  -.0353218   .0450604    -0.78   0.438    -.1267084    .0560649
na |   .1579155    .051318     3.08   0.004     .0538377    .2619933
ss |   .0435563   .0544885     0.80   0.429    -.0669515     .154064
-------------+----------------------------------------------------------------
u2           |
apt |   -.033176   .0151245    -2.19   0.035      -.06385    -.002502
ppvt |  -.0016715   .0278969    -0.06   0.953     -.058249     .054906
rpmt |   .2756473   .1073075     2.57   0.014     .0580176     .493277
-------------+----------------------------------------------------------------
v2           |
n |   .1599911   .1508828     1.06   0.296    -.1460134    .4659955
s |    .042124   .0850089     0.50   0.623    -.1302821    .2145302
ns |   .2381605   .0822059     2.90   0.006     .0714393    .4048817
na |  -.0594188    .093622    -0.63   0.530    -.2492931    .1304555
ss |  -.1823911    .099406    -1.83   0.075    -.3839959    .0192137
-------------+----------------------------------------------------------------
u3           |
apt |   .0358057    .030764     1.16   0.252    -.0265865    .0981979
ppvt |  -.0482553   .0567435    -0.85   0.401    -.1633363    .0668258
rpmt |   .2104353   .2182681     0.96   0.341     -.232233    .6531035
-------------+----------------------------------------------------------------
v3           |
n |   .0992871   .3069021     0.32   0.748    -.5231393    .7217135
s |   .1746239   .1729119     1.01   0.319    -.1760577    .5253054
ns |  -.0100806   .1672103    -0.06   0.952    -.3491988    .3290376
na |  -.2290303   .1904313    -1.20   0.237    -.6152428    .1571822
ss |   .2019493   .2021962     1.00   0.325    -.2081236    .6120222
------------------------------------------------------------------------------
(Standard errors estimated conditionally)
Canonical correlations:
0.7165  0.4906  0.2668

----------------------------------------------------------------------------
Tests of significance of all canonical correlations

Statistic      df1      df2            F     Prob>F
Wilks' lambda     .343169       15  80.4576       2.5381     0.0039 a
Pillai's trace     .825289       15       93       2.3529     0.0066 a
Lawley-Hotelling trace     1.44876       15       83       2.6722     0.0023 a
Roy's largest root     1.05512        5       31       6.5417     0.0003 u
----------------------------------------------------------------------------
Test of significance of canonical correlations 1-3

Statistic      df1      df2            F     Prob>F
Wilks' lambda     .343169       15  80.4576       2.5381     0.0039 a
----------------------------------------------------------------------------
Test of significance of canonical correlations 2-3

Statistic      df1      df2            F     Prob>F
Wilks' lambda     .705252        8       60       1.4308     0.2025 e
----------------------------------------------------------------------------
Test of significance of canonical correlation 3

Statistic      df1      df2            F     Prob>F
Wilks' lambda      .92883        3       31       0.7918     0.5078 e
----------------------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F

canon, stdcoef

Canonical correlation analysis                         Number of obs =      37

Standardized coefficients for the first variable set

|        1         2         3
-------------+------------------------------
apt |   0.0713   -0.7332    0.7913
ppvt |   0.9548   -0.0209   -0.6044
rpmt |   0.0437    0.8531    0.6513
--------------------------------------------

Standardized coefficients for the second variable set

|        1         2         3
-------------+------------------------------
n |  -0.0211    0.4719    0.2928
s |  -0.3835    0.2135    0.8850
ns |  -0.2244    1.5132   -0.0640
na |   1.1438   -0.4304   -1.6589
ss |   0.2774   -1.1618    1.2864
--------------------------------------------

Canonical correlations:
0.7165  0.4906  0.2668

estat correlations

Correlations for variable list 1

|      apt      ppvt      rpmt
-------------+------------------------------
apt |   1.0000
ppvt |   0.3703    1.0000
rpmt |   0.2114    0.3548    1.0000
--------------------------------------------

Correlations for variable list 2

|        n         s        ns        na        ss
-------------+--------------------------------------------------
n |   1.0000
s |   0.4007    1.0000
ns |   0.5370    0.3523    1.0000
na |   0.6481    0.6478    0.7136    1.0000
ss |   0.6704    0.4252    0.7695    0.7951    1.0000
----------------------------------------------------------------

Correlations between variable lists 1 and 2

|      apt      ppvt      rpmt
-------------+------------------------------
n |   0.1860    0.4444    0.3504
s |   0.1609    0.2682    0.2386
ns |   0.0685    0.4692    0.4388
na |   0.2617    0.6720    0.3390
ss |   0.3341    0.5876    0.3404
--------------------------------------------

|        1         2         3
-------------+------------------------------
apt |   0.4341   -0.5606    0.7052
ppvt |   0.9967    0.0102   -0.0803
rpmt |   0.3976    0.6906    0.6041
--------------------------------------------

|        1         2         3
-------------+------------------------------
n |   0.6320    0.3122    0.4004
s |   0.3879    0.1630    0.4521
ns |   0.6588    0.6406    0.2112
na |   0.9422    0.1697    0.0814
ss |   0.8371    0.0675    0.4906
--------------------------------------------

Correlation between variable list 1 and canonical variates from list 2

|        1         2         3
-------------+------------------------------
apt |   0.3111   -0.2750    0.1881
ppvt |   0.7142    0.0050   -0.0214
rpmt |   0.2849    0.3388    0.1612
--------------------------------------------

Correlation between variable list 2 and canonical variates from list 1

|        1         2         3
-------------+------------------------------
n |   0.4529    0.1532    0.1068
s |   0.2780    0.0800    0.1206
ns |   0.4721    0.3143    0.0563
na |   0.6751    0.0833    0.0217
ss |   0.5998    0.0331    0.1309
--------------------------------------------

canred 1  /* findit canred */

Canonical redundancy analysis for canonical correlation 1

Canonical correlation coefficient          0.7165
Squared canonical correlation coefficient  0.5134

own    opposite
Proportion of standardized variance  variate   variate
of u variables with ...    0.4467    0.2293
of v variables with ...    0.5145    0.2641

canred 2  /* findit canred */

Canonical redundancy analysis for canonical correlation 2

Canonical correlation coefficient          0.4906
Squared canonical correlation coefficient  0.2407

own    opposite
Proportion of standardized variance  variate   variate
of u variables with ...    0.2638    0.0635
of v variables with ...    0.1136    0.0273
```


Phil Ender, 2may05, 29Jan98