Path Analysis Background
Some Definitions
Path Analysis Assumptions
Consider the Model

Decomposition of correlations:
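Each correlation can be decomposed into one or more of the following four types of effects: direct effects (DE), indirect effects (IE), spurious effects due to common causes (S), and unanalyzed effects due to correlated causes (U).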
Effects relating variables 1 and 2:

Path Tracing to Reproduce Correlations
A. If variable j sends a path to variable k, which in turn sends a path to variable i, either in two steps or through other intervening variables, simply trace back from i, through k to j. Multiply path coefficients as you go. If more than one distinct compound path exists going back to variable k, treat each separately.
B. A double-headed arrow, representing the correlation between two exogenous variables, can be traversed only once during any compound path. Note that a traverse of a double-headed correlation arrow always results in a change of direction. Tracing a correlation path results in a multiplication of the compound path by the correlation coefficient.
C. All the legitimate compound paths in the path diagram must be traced and values multiplied to determine the magnitude and sign of the compound effects.
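Rules A through C can be sketched as a short routine. The Python below is only an illustration and is not part of the original notes: the function names backward_chains and trace_correlation are hypothetical, and the coefficients are the ones estimated for the four-variable example later in these notes (1 = ses, 2 = iq, 3 = am, 4 = gpa). A compound path is built from two chains traced backward from i and from j that either meet at a common variable or are joined by a single double-headed arrow.

```python
def backward_chains(node, paths):
    """Yield (chain, product) for every chain traced backward from node.

    paths maps (effect, cause) -> path coefficient. The chain holding only
    the node itself (product 1.0) is included.
    """
    stack = [((node,), 1.0)]
    while stack:
        chain, product = stack.pop()
        yield chain, product
        for (effect, cause), coef in paths.items():
            if effect == chain[-1] and cause not in chain:
                stack.append((chain + (cause,), product * coef))


def trace_correlation(i, j, paths, corrs=None):
    """Sum every legitimate compound path between variables i and j."""
    corrs = corrs or {}
    total = 0.0
    for chain_i, prod_i in backward_chains(i, paths):
        for chain_j, prod_j in backward_chains(j, paths):
            top_i, top_j = chain_i[-1], chain_j[-1]
            shared = set(chain_i) & set(chain_j)
            if top_i == top_j and shared == {top_i}:
                # The two chains meet at exactly one variable.
                total += prod_i * prod_j
            elif not shared:
                # Rule B: cross one double-headed arrow at the turning point.
                r = corrs.get((top_i, top_j), corrs.get((top_j, top_i)))
                if r is not None:
                    total += prod_i * prod_j * r
    return total


# Path coefficients keyed as (effect, cause), e.g. (2, 1) is P21.
paths = {(2, 1): .300, (3, 1): .398, (3, 2): .041,
         (4, 1): .009, (4, 2): .501, (4, 3): .416}
r14 = trace_correlation(4, 1, paths)
r34 = trace_correlation(3, 4, paths)
print(round(r14, 3), round(r34, 3))
```

Correlated exogenous variables are passed through the optional corrs argument, for example corrs={(1, 2): -.125}.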
With the following variables:
1 - ses
2 - iq
3 - am
4 - gpa
use http://www.philender.com/courses/data/ped788, clear
corr ses iq am gpa
(obs=300)
         |      ses       iq       am      gpa
---------+------------------------------------
     ses |   1.0000
      iq |   0.3000   1.0000
      am |   0.4100   0.1600   1.0000
     gpa |   0.3300   0.5700   0.5000   1.0000
regress iq ses, beta
Source | SS df MS Number of obs = 300
---------+------------------------------ F( 1, 298) = 29.47
Model | 26.9099995 1 26.9099995 Prob > F = 0.0000
Residual | 272.089997 298 .91305368 R-squared = 0.0900
---------+------------------------------ Adj R-squared = 0.0869
Total | 298.999996 299 .999999987 Root MSE = .95554
------------------------------------------------------------------------------
iq | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
ses | .3 .0552602 5.429 0.000 .3
_cons | -7.10e-09 .055168 0.000 1.000 .
------------------------------------------------------------------------------
display sqrt(1-.09)
.9539392
regress am ses iq, beta
Source | SS df MS Number of obs = 300
---------+------------------------------ F( 2, 297) = 30.33
Model | 50.7117145 2 25.3558572 Prob > F = 0.0000
Residual | 248.288282 297 .835987481 R-squared = 0.1696
---------+------------------------------ Adj R-squared = 0.1640
Total | 298.999996 299 .999999988 Root MSE = .91432
------------------------------------------------------------------------------
am | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
ses | .3978022 .0554298 7.177 0.000 .3978022
iq | .0406593 .0554298 0.734 0.464 .0406593
_cons | -8.73e-09 .0527885 0.000 1.000 .
------------------------------------------------------------------------------
display sqrt(1-.1696)
.91126286
regress gpa am ses iq, beta
Source | SS df MS Number of obs = 300
---------+------------------------------ F( 3, 296) = 97.28
Model | 148.445584 3 49.4818613 Prob > F = 0.0000
Residual | 150.554414 296 .508629776 R-squared = 0.4965
---------+------------------------------ Adj R-squared = 0.4914
Total | 298.999998 299 .999999992 Root MSE = .71318
------------------------------------------------------------------------------
gpa | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
am | .4161263 .0452609 9.194 0.000 .4161263
ses | .0091893 .046835 0.196 0.845 .0091893
iq | .500663 .0432751 11.569 0.000 .500663
_cons | 1.23e-09 .0411756 0.000 1.000 .
------------------------------------------------------------------------------
display sqrt(1-.4965)
.70957734
Estimated path coefficients from multiple regression analyses:
P21 = .300
P31 = .398
P32 = .041
P41 = .009
P42 = .501
P43 = .416
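Standardized (beta) coefficients can also be recovered directly from the correlation matrix by solving the normal equations Rxx b = rxy, with R-squared equal to the sum of each beta times the corresponding rxy. The Python sketch below illustrates this under that assumption; it is not how the notes obtained the estimates (those come from regress), and the helper names solve and path_coefficients are hypothetical.

```python
import math

names = ["ses", "iq", "am", "gpa"]
# Correlation matrix copied from the corr output above.
R = [[1.00, 0.30, 0.41, 0.33],
     [0.30, 1.00, 0.16, 0.57],
     [0.41, 0.16, 1.00, 0.50],
     [0.33, 0.57, 0.50, 1.00]]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[k][n] / m[k][k] for k in range(n)]


def path_coefficients(outcome, predictors):
    """Return ({predictor: beta}, residual path sqrt(1 - R2))."""
    y = names.index(outcome)
    xs = [names.index(p) for p in predictors]
    rxx = [[R[a][b] for b in xs] for a in xs]
    rxy = [R[a][y] for a in xs]
    betas = solve(rxx, rxy)
    r2 = sum(beta * r for beta, r in zip(betas, rxy))
    return dict(zip(predictors, betas)), math.sqrt(1 - r2)


am_paths, am_resid = path_coefficients("am", ["ses", "iq"])
gpa_paths, gpa_resid = path_coefficients("gpa", ["am", "ses", "iq"])
print({k: round(v, 3) for k, v in am_paths.items()}, round(am_resid, 3))
print({k: round(v, 3) for k, v in gpa_paths.items()}, round(gpa_resid, 3))
```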
Path Analysis: Example 1: Just Identified Model

Compare actual and reproduced correlations: Model 1
To test whether the model fits the data, compare actual correlations to reproduced correlations based on paths in the model. We denote actual correlations by r and reproduced correlations by r*. The actual correlations are in brackets below.
r*12 = P21
       DE
     = .300 [.300]
r*13 = P31 + P32P21
       DE    IE
     = .398 + (.041)(.3) = .410 [.410]
r*14 = P41 + P42P21 + P43P31 + P43P32P21
       DE    IE       IE       IE
     = .009+(.501)(.30)+(.416)(.398)+(.416)(.041)(.30) = .330 [.330]
r*23 = P31P21 + P32
       S        DE
     = (.398)(.30) + .041 = .160 [.160]
r*24 = P41P21 + P42 + P43P31P21 + P43P32
       S        DE    S           IE
     = (.009)(.30)+(.501)+(.416)(.398)(.30)+(.416)(.041) = .570 [.570]
r*34 = P41P31 + P41P21P32 + P42P21P31 + P42P32 + P43
       S        S           S           S        DE
     = (.009)(.398)+(.009)(.30)(.041)+(.501)(.30)(.398)+(.501)(.041)+.416 = .500 [.500]
Note:
This is not a very interesting example because the reproduced and original correlations will be the same -- this model has all possible paths among the variables (i.e., no paths deleted).
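As a worked illustration of the decomposition, the sketch below splits r*24 (iq and gpa) into its direct, indirect, and spurious parts using the Model 1 coefficients. The variable names are hypothetical, and treating DE + IE as the total effect is the usual convention rather than something stated in these notes.

```python
# Model 1 path coefficients (1 = ses, 2 = iq, 3 = am, 4 = gpa).
P21, P31, P32 = .300, .398, .041
P41, P42, P43 = .009, .501, .416

direct = P42                            # iq -> gpa
indirect = P43 * P32                    # iq -> am -> gpa
spurious = P41 * P21 + P43 * P31 * P21  # paths through the common cause ses

total_effect = direct + indirect        # causal part of the correlation
reproduced = total_effect + spurious    # should match the actual r24 of .570

print("DE = %.3f  IE = %.3f  S = %.3f" % (direct, indirect, spurious))
print("total effect = %.3f  r*24 = %.3f" % (total_effect, reproduced))
```

Only the direct and indirect parts describe the effect of iq on gpa; the spurious part arises because ses influences both variables.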
Path Analysis: Model 2
regress am ses, beta
Source | SS df MS Number of obs = 300
---------+------------------------------ F( 1, 298) = 60.22
Model | 50.2619001 1 50.2619001 Prob > F = 0.0000
Residual | 248.738096 298 .834691598 R-squared = 0.1681
---------+------------------------------ Adj R-squared = 0.1653
Total | 298.999996 299 .999999988 Root MSE = .91361
------------------------------------------------------------------------------
am | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
ses | .41 .0528357 7.760 0.000 .41
_cons | -9.02e-09 .0527476 0.000 1.000 .
------------------------------------------------------------------------------
display sqrt(1-.1681)
.91208552
regress gpa iq am, beta
Source | SS df MS Number of obs = 300
---------+------------------------------ F( 2, 297) = 146.38
Model | 148.426003 2 74.2130017 Prob > F = 0.0000
Residual | 150.573994 297 .506983146 R-squared = 0.4964
---------+------------------------------ Adj R-squared = 0.4930
Total | 298.999998 299 .999999992 Root MSE = .71203
------------------------------------------------------------------------------
gpa | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
iq | .5028736 .041715 12.055 0.000 .5028736
am | .4195402 .041715 10.057 0.000 .4195402
_cons | 1.16e-09 .0411089 0.000 1.000 .
------------------------------------------------------------------------------
display sqrt(1-.4964)
.7096478

Reproduced correlations: Model 2
r*12 = r12
       U
     = .30 [.30]
r*13 = P31
       DE
     = .410 [.410]
r*14 = P42r12 + P43P31
       U        IE
     = (.503)(.30)+(.420)(.410) = .323 [.330]
r*23 = P31r12
       U
     = (.410)(.30) = .123 [.160]
r*24 = P42 + P43P31r12
       DE    U
     = (.503)+(.420)(.410)(.30) = .555 [.570]
r*34 = P42P31r12 + P43
       U           DE
     = (.503)(.410)(.30)+(.420) = .482 [.50]
Terman Data Set
Variables:
1 - Parents education
2 - Father's occupation
3 - Parents attitude
4 - IQ
5 - Achievement
6 - Education level
7 - Occupation
8 - Income
Terman Model 1

Terman Model 2

Reproduced Correlations: Terman Model 2
r*14 = P41 + P43r13
       DE    U
r*15 = P54P41 + P54P43r13
       IE       U
r*16 = P61
       DE
r*17 = P75P54P41 + P75P54P43r13 + P76P61
       IE          U              IE
r*18 = P87P75P54P41 + P87P75P54P43r13 + P87P76P61
       IE             U                 IE
r*34 = P41r13 + P43
       U        DE
r*35 = P54P41r13 + P54P43
       U           IE
r*36 = P61r13
       U
r*37 = P75P54P41r13 + P75P54P43 + P76P61r13
       U              IE          U
r*38 = P87P75P54P41r13 + P87P75P54P43 + P87P76P61r13
       U                 IE             U
r*45 = P54
       DE
r*46 = P61P41 + P61P43r13
       S        U
r*47 = P75P54 + P76P61P41 + P76P61P43r13
       IE       S           U
r*48 = P87P75P54 + P87P76P61P41 + P87P76P61P43r13
       IE          S              U
r*56 = P61P54P41 + P61P54P43r13
       S           U
r*57 = P75 + P76P61P54P41 + P76P61P54P43r13
       DE    S              U
r*58 = P87P75 + P87P76P61P54P41 + P87P76P61P54P43r13
       IE       S                 U
r*67 = P75P61P54P41 + P75P61P54P43r13 + P76
       S              U                 DE
r*68 = P87P75P61P54P41 + P87P75P61P54P43r13 + P87P76
       S                 U                    IE
r*78 = P87
       DE
Reproduced and actual correlations: Terman Model 2
r*13 = .03   [ .03]
r*14 = .16   [ .16]
r*15 = -.016 [ .07]   possible mismatch
r*16 = .31   [ .31]
r*17 = .10   [ .08]
r*18 = .04   [ .06]
r*34 = .08   [ .08]
r*35 = -.008 [.003]
r*36 = .01   [ .14]   mismatch
r*37 = .003  [ .09]
r*38 = .001  [ .08]
r*45 = -.10  [-.10]
r*46 = .05   [ .10]
r*47 = .02   [ .08]   possible mismatch
r*48 = .008  [ .09]   possible mismatch
r*56 = -.005 [ .06]   possible mismatch
r*57 = -.001 [ .02]
r*58 = .04   [-.01]
r*67 = .32   [ .32]
r*68 = .13   [ .20]   possible mismatch
r*78 = .41   [ .41]
Terman Model 3

Estimated Equations: Terman Model 3
z'6 = P61 z1
z'7 = P76 z6
z'8 = P87 z7
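The three equations above form a simple causal chain (1 to 6 to 7 to 8), so each reproduced correlation is just the product of the path coefficients along the segment of the chain joining the two variables. The Python sketch below assumes the coefficient values reported for this model in these notes (P61 = .31, P76 = .32, P87 = .41); the names chain, step, and chain_correlation are hypothetical.

```python
# Causal chain for Terman Model 3 and the coefficient on each link.
chain = [1, 6, 7, 8]
step = {(1, 6): .31,   # P61
        (6, 7): .32,   # P76
        (7, 8): .41}   # P87


def chain_correlation(a, b):
    """Multiply the path coefficients on the links between a and b."""
    lo, hi = sorted((chain.index(a), chain.index(b)))
    product = 1.0
    for k in range(lo, hi):
        product *= step[(chain[k], chain[k + 1])]
    return product


for a, b in [(1, 6), (1, 7), (1, 8), (6, 7), (6, 8), (7, 8)]:
    print("r*%d%d = %.2f" % (a, b, chain_correlation(a, b)))
```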
Reproduced and actual correlations: Terman Model 3
r*16 = P61
       DE
     = .31 [.31]
r*17 = P76 P61
       IE
     = (.32)(.31) = .10 [.08]
r*18 = P87 P76 P61
       IE
     = (.41)(.32)(.31) = .04 [.06]
r*67 = P76
       DE
     = .32 [.32]
r*68 = P87 P76
       IE
     = (.41)(.32) = .13 [.20] (possible mismatch)
r*78 = P87
       DE
     = .41 [.41]
Example Using Stata
We will use the hsbdemo dataset. For purposes of this example, ses will be treated as continuous even though it is categorical. In this example, ses and female will be exogenous while read and write will be endogenous. Here is our just-identified model.
use http://www.philender.com/courses/data/hsbdemo, clear
corr ses female read write
(obs=200)
         |      ses   female     read    write
---------+------------------------------------
     ses |   1.0000
  female |  -0.1250   1.0000
    read |   0.2933  -0.0531   1.0000
   write |   0.2075   0.2565   0.5968   1.0000
regress read ses female, beta
Source | SS df MS Number of obs = 200
---------+------------------------------ F( 2, 197) = 9.30
Model | 1805.58553 2 902.792765 Prob > F = 0.0001
Residual | 19113.8345 197 97.0245405 R-squared = 0.0863
---------+------------------------------ Adj R-squared = 0.0770
Total | 20919.42 199 105.122714 Root MSE = 9.8501
------------------------------------------------------------------------------
read | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
ses | 4.122699 .9716753 4.243 0.000 .2912371
female | -.3425006 1.40975 -0.243 0.808 -.0166765
_cons | 43.94452 2.333705 18.83 0.000 .
------------------------------------------------------------------------------
display sqrt(1 - .08632)
.9558661
regress write read ses female, beta
Source | SS df MS Number of obs = 200
---------+------------------------------ F( 3, 196) = 52.17
Model | 7937.69723 3 2645.89908 Prob > F = 0.0000
Residual | 9941.17777 196 50.7202947 R-squared = 0.4440
---------+------------------------------ Adj R-squared = 0.4355
Total | 17878.875 199 89.843593 Root MSE = 7.1218
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| Beta
---------+--------------------------------------------------------------------
read | .5470064 .051513 10.619 0.000 .591694
ses | .9296443 .7339381 1.267 0.207 .0710373
female | 5.634919 1.01943 5.528 0.000 .2967813
_cons | 19.2234 2.823373 6.81 0.000 .
------------------------------------------------------------------------------
display sqrt(1 - .444)
.74565408
Let's say that you would rather not run each regression separately and compute the residual path at each stage. Here is a convenience command that you can use if you have Stata 7.
A Shortcut
/* user-written program -- findit pathreg */
pathreg (read ses female)(write ses read female)
------------------------------------------------------------------------------
read | Coef. Std. Err. t P>|t| Beta
-------------+----------------------------------------------------------------
ses | 4.122699 .9716753 4.24 0.000 .2912371
female | -.3425006 1.40975 -0.24 0.808 -.0166765
_cons | 43.94452 2.333705 18.83 0.000 .
------------------------------------------------------------------------------
n = 200 R2 = 0.0863 sqrt(1 - R2) = 0.9559
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| Beta
-------------+----------------------------------------------------------------
ses | .9296443 .7339381 1.27 0.207 .0710373
read | .5470064 .051513 10.62 0.000 .591694
female | 5.634919 1.01943 5.53 0.000 .2967813
_cons | 19.2234 2.823373 6.81 0.000 .
------------------------------------------------------------------------------
n = 200 R2 = 0.4440 sqrt(1 - R2) = 0.7457

Reproducing Correlations
r*12 = r12
       U
     = -.125 [-.125]
r*13 = P31 + P32r12
       DE    U
     = .29 + (-.02)(-.125) = .29 [.29]
r*14 = P41 + P42r12 + P43P31 + P43P32r12
       DE    U        IE       U
     = .07 + (.3)(-.125) + (.59)(.29) + (.59)(-.02)(-.125) = .21 [.21]
r*23 = P32 + P31r12
       DE    U
     = -.02 + (.29)(-.125) = -.06 [-.05]
r*24 = P42 + P43P32 + P43P31r12 + P41r12
       DE    IE       U           U
     = .3 +(.59)(-.02)+(.59)(.29)(-.125)+(.07)(-.125) = .26 [.26]
r*34 = P43 + P42P32 + P41P31 + P42r12P31 + P41r12P32
       DE    IE       IE       S           S
     = .59 +(.3)(-.02)+(.07)(.29)+(.3)(-.125)(.29)+(.07)(-.125)(-.02) = .59 [.60]
Overidentified Model
Now let's look at an overidentified model.
pathreg (read ses)(write read female)
------------------------------------------------------------------------------
read | Coef. Std. Err. t P>|t| Beta
-------------+----------------------------------------------------------------
ses | 4.15221 .9617596 4.32 0.000 .2933218
_cons | 43.69721 2.095003 20.86 0.000 .
------------------------------------------------------------------------------
n = 200 R2 = 0.0860 sqrt(1 - R2) = 0.9560
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| Beta
-------------+----------------------------------------------------------------
read | .5658869 .0493849 11.46 0.000 .6121169
female | 5.486894 1.014261 5.41 0.000 .2889851
_cons | 20.22837 2.713756 7.45 0.000 .
------------------------------------------------------------------------------
n = 200 R2 = 0.4394 sqrt(1 - R2) = 0.7487

Reproducing Correlations
r*12 = r12
       U
     = -.125 [-.125]
r*13 = P31
       DE
     = .29 [.29]
r*14 = P42r12 + P43P31
       U        IE
     = (.29)(-.125) + (.61)(.29) = .14 [.21]
r*23 = P31r12
       U
     = (.29)(-.125) = -.04 [-.05]
r*24 = P42 + P43P31r12
       DE    U
     = .29 +(.61)(.29)(-.125) = .27 [.26]
r*34 = P43 + P42r12P31
       DE    S
     = .61 + (.29)(-.125)(.29) = .60 [.60]
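The same reproduced correlations can be obtained without listing each compound path, using the recursive rule r*ij = sum over the direct causes k of variable i of Pik times r*kj. The Python sketch below applies that rule to the overidentified model and compares the result with the actual correlations. It is an illustration only: the function name implied_correlations is hypothetical, the coefficients are the pathreg betas rounded to four decimals, and the .05 cutoff for flagging a discrepancy is an arbitrary assumption, not a rule from these notes.

```python
order = [1, 2, 3, 4]             # 1 = ses, 2 = female, 3 = read, 4 = write
exog_corr = {(1, 2): -.125}      # correlation between the exogenous variables
paths = {3: {1: .2933},          # read  <- ses
         4: {2: .2890, 3: .6121}}  # write <- female, read
actual = {(1, 2): -.1250, (1, 3): .2933, (1, 4): .2075,
          (2, 3): -.0531, (2, 4): .2565, (3, 4): .5968}


def implied_correlations(order, exog_corr, paths):
    """Build model-implied correlations in causal order."""
    r = {(v, v): 1.0 for v in order}
    exogenous = [v for v in order if v not in paths]
    for a in exogenous:              # uncorrelated unless stated otherwise
        for b in exogenous:
            if a != b:
                r[(a, b)] = 0.0
    for (a, b), value in exog_corr.items():
        r[(a, b)] = r[(b, a)] = value
    for pos, i in enumerate(order):
        if i not in paths:
            continue
        for j in order[:pos]:
            value = sum(coef * r[(k, j)] for k, coef in paths[i].items())
            r[(i, j)] = r[(j, i)] = value
    return r


implied = implied_correlations(order, exog_corr, paths)
CUTOFF = .05                     # assumed threshold, for illustration only
flagged = []
for pair, r_actual in sorted(actual.items()):
    diff = r_actual - implied[pair]
    if abs(diff) > CUTOFF:
        flagged.append(pair)
    print("r*%d%d = %6.3f [%6.3f] diff = %6.3f" % (pair + (implied[pair], r_actual, diff)))
print("flagged:", flagged)
```

With these inputs only the ses and write correlation exceeds the assumed cutoff, which points to the deleted path from ses to write as the main source of misfit.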