In this unit we will explore some properties of principal components.
We will start with the variables read and write, which correlate about .6. There are two eigenvalues, which are the variances of the two principal components. The larger variance is about four times the smaller.
use http://www.gseis.ucla.edu/courses/data/hsb2
corr read write
(obs=200)
             |     read    write
-------------+------------------
        read |   1.0000
       write |   0.5968   1.0000
scatter read write
pca read write
(obs=200)
            (principal components; 2 components retained)
 Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1           1.59678       1.19355        0.7984        0.7984
     2           0.40322             .        0.2016        1.0000
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711
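The eigenvalues and eigenvectors in this output can be checked by hand. The following is a sketch, assuming pca is analyzing the 2 x 2 correlation matrix of the two variables (which is consistent with the eigenvalues summing to 2). The symbols r (the correlation), R (the correlation matrix), lambda (an eigenvalue) and v (an eigenvector) are notation introduced here, not part of the original notes.

```latex
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\det(R - \lambda I) = (1 - \lambda)^2 - r^2 = 0
\;\Longrightarrow\; \lambda_1 = 1 + r, \quad \lambda_2 = 1 - r \quad (r > 0)

% With r = 0.5968 from the corr output:
\lambda_1 = 1.5968, \qquad \lambda_2 = 0.4032, \qquad
\frac{\lambda_1}{\lambda_2} \approx 3.96

% Unit-length eigenvectors, for any r > 0:
v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
v_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad
\tfrac{1}{\sqrt{2}} \approx 0.70711
```

Under this assumption only the eigenvalues depend on r. The eigenvector entries are always plus or minus 0.70711 when two variables are analyzed.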
predict f1 f2
(based on unrotated principal components)
               Scoring Coefficients
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711
tabstat f1 f2, stat(mean sd var) col(stat)
    variable |      mean        sd  variance
-------------+------------------------------
          f1 |  3.56e-09  1.263636  1.596776
          f2 |  2.44e-09  .6349988  .4032235
--------------------------------------------
corr f1 f2
(obs=200)
             |       f1       f2
-------------+------------------
          f1 |   1.0000
          f2 |   0.0000   1.0000
scatter f1 f2, xline(0) yline(0)
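The tabstat and corr results follow from the scoring coefficients. Here is a short derivation, assuming predict applies the coefficients to the standardized variables (which is consistent with the score means of essentially zero). The names z_read and z_write for the standardized variables, and r for their correlation, are notation introduced for this sketch.

```latex
f_1 = \tfrac{1}{\sqrt{2}}\,(z_{read} + z_{write}), \qquad
f_2 = \tfrac{1}{\sqrt{2}}\,(z_{read} - z_{write})

\operatorname{Var}(f_1) = \tfrac{1}{2}\,(1 + 1 + 2r) = 1 + r, \qquad
\operatorname{Var}(f_2) = \tfrac{1}{2}\,(1 + 1 - 2r) = 1 - r

\operatorname{Cov}(f_1, f_2)
= \tfrac{1}{2}\,\bigl(\operatorname{Var}(z_{read}) - \operatorname{Var}(z_{write})\bigr)
= \tfrac{1}{2}\,(1 - 1) = 0
```

With r = 0.5968 the two variances are 1.5968 and 0.4032, matching the eigenvalues reported by pca and the variance column of tabstat.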

Next, we will create a random normal variable rnorm. Its correlation with read is close to zero,
and the two eigenvalues are very nearly equal.
generate rnorm = invnorm(uniform())
corr read rnorm
(obs=200)
             |     read    rnorm
-------------+------------------
        read |   1.0000
       rnorm |  -0.0539   1.0000
scatter read rnorm
pca read rnorm
(obs=200)
            (principal components; 2 components retained)
 Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1           1.05392       0.10785        0.5270        0.5270
     2           0.94608             .        0.4730        1.0000
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |  -0.70711    0.70711
       rnorm |   0.70711    0.70711
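The signs in the first eigenvector differ from the read and write analysis because the correlation here is negative. As a sketch, assuming pca works from the 2 x 2 correlation matrix with off-diagonal entry r (notation introduced here, not in the original notes):

```latex
\lambda_{\max} = 1 + |r|, \qquad \lambda_{\min} = 1 - |r|

% With r = -0.0539:
\lambda_{\max} = 1.0539, \qquad \lambda_{\min} = 0.9461

% For r < 0 the larger eigenvalue belongs to the contrast direction:
\begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}
\tfrac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix}
= (1 - r)\,\tfrac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ 1 \end{pmatrix},
\qquad 1 - r = 1 + |r|
```

An eigenvector is determined only up to sign, so (0.70711, -0.70711) would describe the same first component as the (-0.70711, 0.70711) shown in the output.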
Now we will create a variable that is highly correlated with read and call it read2.
The correlation is about .92, and almost all of the variance falls in the first principal
component.
generate read2 = read + 15*uniform()
corr read read2
(obs=200)
             |     read    read2
-------------+------------------
        read |   1.0000
       read2 |   0.9231   1.0000
scatter read read2
pca read read2
(obs=200)
            (principal components; 2 components retained)
 Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1           1.92305       1.84611        0.9615        0.9615
     2           0.07695             .        0.0385        1.0000
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       read2 |   0.70711   -0.70711
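The size of this correlation can be anticipated from the way read2 was built. This is a rough calculation, assuming the uniform noise is independent of read. It uses the sample standard deviation of read (10.25294, from the summarize output later in this unit) in place of a population value. U is shorthand introduced here for the uniform(0,1) draw, and r for the observed correlation.

```latex
\operatorname{Var}(15\,U) = 15^2 \cdot \tfrac{1}{12} = 18.75

\operatorname{Corr}(read,\ read2)
= \frac{\operatorname{Var}(read)}
       {\sqrt{\operatorname{Var}(read)}\,\sqrt{\operatorname{Var}(read) + 18.75}}
= \frac{10.25294}{\sqrt{105.12 + 18.75}} \approx 0.92

% Share of the total variance (2) carried by the larger eigenvalue, r = 0.9231:
\frac{1 + r}{2} = \frac{1.9231}{2} \approx 0.96
```

The realized correlation of 0.9231 differs slightly from this approximation because it depends on the particular random draws.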
Finally, we will do two linear transformations of our original variables read and write.
The first transformation will create deviation scores and the second transformation will create
standard scores. Note that the eigenvalues and eigenvectors are the same in each case, because neither transformation changes the correlation between the two variables.
summarize read
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
        read |     200       52.23    10.25294        28         76
generate dread = read-r(mean)
summarize write
    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |     200      52.775    9.478586        31         67
generate dwrite = write-r(mean)
egen zread=std(read)
egen zwrite=std(write)
pca read write
(obs=200)
            (principal components; 2 components retained)
 Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1           1.59678       1.19355        0.7984        0.7984
     2           0.40322             .        0.2016        1.0000
               Eigenvectors
    Variable |      1          2
-------------+---------------------
        read |   0.70711    0.70711
       write |   0.70711   -0.70711
pca dread dwrite
(obs=200)
            (principal components; 2 components retained)
 Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1           1.59678       1.19355        0.7984        0.7984
     2           0.40322             .        0.2016        1.0000
               Eigenvectors
    Variable |      1          2
-------------+---------------------
       dread |   0.70711    0.70711
      dwrite |   0.70711   -0.70711
pca zread zwrite
(obs=200)
            (principal components; 2 components retained)
 Component    Eigenvalue    Difference    Proportion    Cumulative
------------------------------------------------------------------
     1           1.59678       1.19355        0.7984        0.7984
     2           0.40322             .        0.2016        1.0000
               Eigenvectors
    Variable |      1          2
-------------+---------------------
       zread |   0.70711    0.70711
      zwrite |   0.70711   -0.70711
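One way to see why the three analyses agree is to write out what a linear transformation does to a correlation. In this sketch X and Y stand for any two variables, a and c are arbitrary constants, and b and d are positive constants (all names introduced here for illustration). It also assumes, as the eigenvalues summing to 2 suggest, that pca is working from the correlation matrix.

```latex
\operatorname{Corr}(a + bX,\ c + dY)
= \frac{bd\,\operatorname{Cov}(X, Y)}{b\,\sigma_X \cdot d\,\sigma_Y}
= \operatorname{Corr}(X, Y), \qquad b > 0,\ d > 0

% Deviation scores:  a = -\bar{X},        b = 1
% Standard scores:   a = -\bar{X}/s_X,    b = 1/s_X
```

Both transformations used above are special cases, so the correlation stays at 0.5968 and the eigenvalues and eigenvectors are unchanged. An analysis based on the covariance matrix would not be invariant to the rescaling in the standard-score case.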
Phil Ender, 25may02; 29jan98