Multivariate multiple regression is a logical extension of multiple regression to more than one response (dependent) variable. Multivariate regression produces the same coefficients and standard errors as one would obtain from separate OLS regressions of each response on the same predictors. In addition, because it is a joint estimator, multivariate regression also estimates the covariances between the equations' residuals. This makes it possible to test hypotheses about coefficients across equations.
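The matrix formula for multivariate regression is virtually identical to the OLS formula, B = (X'X)^-1 X'Y. The only change is that Y is a matrix of response variables (one column per response) rather than a vector, so B is also a matrix, with one column of coefficients per equation.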
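As a minimal sketch of why the joint formula gives the same coefficients as separate regressions, the Python code below (not Stata) computes B = (X'X)^-1 X'Y for two responses at once and compares each column with a separate OLS fit of that response. The data values and the function names `multivariate_ols` and `simple_ols` are hypothetical, chosen only for illustration; a single predictor plus an intercept keeps X'X a 2x2 matrix that can be inverted by hand.

```python
# Hypothetical data: an intercept plus one predictor x, and two responses y1, y2
x = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [[2.1, 1.0], [3.9, 1.8], [6.2, 3.1], [7.8, 3.9], [10.1, 5.2]]  # rows = obs, cols = y1, y2

def multivariate_ols(x, Y):
    """B = (X'X)^-1 X'Y for X = [1, x]; row 0 holds intercepts, row 1 slopes."""
    n, m = len(x), len(Y[0])
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]   # inverse of the 2x2 X'X
    XtY = [[sum(row[j] for row in Y) for j in range(m)],
           [sum(xv * row[j] for xv, row in zip(x, Y)) for j in range(m)]]
    return [[inv[r][0] * XtY[0][j] + inv[r][1] * XtY[1][j] for j in range(m)]
            for r in range(2)]

def simple_ols(x, y):
    """Ordinary least squares for one response: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

B = multivariate_ols(x, Y)
for j in range(len(Y[0])):
    b0, b1 = simple_ols(x, [row[j] for row in Y])
    print(f"y{j + 1}: joint ({B[0][j]:.4f}, {B[1][j]:.4f})  separate ({b0:.4f}, {b1:.4f})")
```

Each column of B matches its separate fit; what the joint estimator adds is the residual covariance matrix between equations.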
Stata Example
use http://www.gseis.ucla.edu/courses/data/hsb2
xi: regress read female i.prog
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 14.45
Model | 3789.28412 3 1263.09471 Prob > F = 0.0000
Residual | 17130.1359 196 87.3986524 R-squared = 0.1811
-------------+------------------------------ Adj R-squared = 0.1686
Total | 20919.42 199 105.122714 Root MSE = 9.3487
------------------------------------------------------------------------------
read | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -1.208582 1.327672 -0.91 0.364 -3.826939 1.409774
_Iprog_2 | 6.42937 1.665893 3.86 0.000 3.143993 9.714746
_Iprog_3 | -3.547498 1.921001 -1.85 0.066 -7.335983 .2409862
_cons | 50.40013 1.563197 32.24 0.000 47.31729 53.48298
------------------------------------------------------------------------------
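The header statistics of `regress` follow directly from the ANOVA table: F = MS(Model)/MS(Residual), R-squared = SS(Model)/SS(Total), adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), and Root MSE = sqrt(MS(Residual)). A small Python check using only the numbers printed for the read model (the same RMSE and F appear for read in the mvreg header further down):

```python
import math

# Values copied from the ANOVA table of `xi: regress read female i.prog`
ss_model, df_model = 3789.28412, 3
ss_resid, df_resid = 17130.1359, 196
ss_total, df_total = 20919.42, 199

ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
f_stat = ms_model / ms_resid                          # F( 3, 196)
r2 = ss_model / ss_total                              # R-squared
adj_r2 = 1 - (1 - r2) * df_total / df_resid          # Adj R-squared
root_mse = math.sqrt(ms_resid)                        # Root MSE

print(f"F = {f_stat:.2f}, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}, Root MSE = {root_mse:.4f}")
```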
xi: regress write female i.prog
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 20.72
Model | 4304.40272 3 1434.80091 Prob > F = 0.0000
Residual | 13574.4723 196 69.2575116 R-squared = 0.2408
-------------+------------------------------ Adj R-squared = 0.2291
Total | 17878.875 199 89.843593 Root MSE = 8.3221
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | 4.771211 1.181876 4.04 0.000 2.440385 7.102037
_Iprog_2 | 4.832929 1.482956 3.26 0.001 1.908331 7.757528
_Iprog_3 | -4.605141 1.710049 -2.69 0.008 -7.9776 -1.232683
_cons | 48.78869 1.391537 35.06 0.000 46.04438 51.533
------------------------------------------------------------------------------
xi: regress math female i.prog
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 19.56
Model | 4024.61221 3 1341.5374 Prob > F = 0.0000
Residual | 13441.1828 196 68.5774632 R-squared = 0.2304
-------------+------------------------------ Adj R-squared = 0.2186
Total | 17465.795 199 87.7678141 Root MSE = 8.2812
------------------------------------------------------------------------------
math | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
female | -.6737673 1.176059 -0.57 0.567 -2.993122 1.645587
_Iprog_2 | 6.723945 1.475657 4.56 0.000 3.81374 9.634149
_Iprog_3 | -3.59773 1.701633 -2.11 0.036 -6.953591 -.2418702
_cons | 50.38156 1.384689 36.38 0.000 47.65076 53.11237
------------------------------------------------------------------------------
xi: mvreg read write math = female i.prog
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Equation Obs Parms RMSE "R-sq" F P
----------------------------------------------------------------------
read 200 4 9.348725 0.1811 14.45211 0.0000
write 200 4 8.32211 0.2408 20.7169 0.0000
math 200 4 8.281151 0.2304 19.56237 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read |
female | -1.208582 1.327672 -0.91 0.364 -3.826939 1.409774
_Iprog_2 | 6.42937 1.665893 3.86 0.000 3.143993 9.714746
_Iprog_3 | -3.547498 1.921001 -1.85 0.066 -7.335983 .2409862
_cons | 50.40013 1.563197 32.24 0.000 47.31729 53.48298
-------------+----------------------------------------------------------------
write |
female | 4.771211 1.181876 4.04 0.000 2.440385 7.102037
_Iprog_2 | 4.832929 1.482956 3.26 0.001 1.908331 7.757528
_Iprog_3 | -4.605141 1.710049 -2.69 0.008 -7.9776 -1.232683
_cons | 48.78869 1.391537 35.06 0.000 46.04438 51.533
-------------+----------------------------------------------------------------
math |
female | -.6737673 1.176059 -0.57 0.567 -2.993122 1.645587
_Iprog_2 | 6.723945 1.475657 4.56 0.000 3.81374 9.634149
_Iprog_3 | -3.59773 1.701633 -2.11 0.036 -6.953591 -.2418702
_cons | 50.38156 1.384689 36.38 0.000 47.65076 53.11237
------------------------------------------------------------------------------
test female
( 1) [read]female = 0.0
( 2) [write]female = 0.0
( 3) [math]female = 0.0
F( 3, 196) = 11.63
Prob > F = 0.0000
test _Iprog_2 _Iprog_3
( 1) [read]_Iprog_2 = 0.0
( 2) [write]_Iprog_2 = 0.0
( 3) [math]_Iprog_2 = 0.0
( 4) [read]_Iprog_3 = 0.0
( 5) [write]_Iprog_3 = 0.0
( 6) [math]_Iprog_3 = 0.0
F( 6, 196) = 11.83
Prob > F = 0.0000
The same model can be run with the manova command to obtain the multivariate tests.
manova read write math = female prog
Number of obs = 200
W = Wilks' lambda L = Lawley-Hotelling trace
P = Pillai's trace R = Roy's largest root
Source | Statistic df F(df1, df2) = F Prob>F
-----------+--------------------------------------------------
Model | W 0.6231 3 9.0 472.3 11.26 0.0000 a
| P 0.4170 9.0 588.0 10.55 0.0000 a
| L 0.5406 9.0 578.0 11.57 0.0000 a
| R 0.3642 3.0 196.0 23.79 0.0000 u
|--------------------------------------------------
Residual | 196
-----------+--------------------------------------------------
female | W 0.8489 1 3.0 194.0 11.51 0.0000 e
| P 0.1511 3.0 194.0 11.51 0.0000 e
| L 0.1780 3.0 194.0 11.51 0.0000 e
| R 0.1780 3.0 194.0 11.51 0.0000 e
|--------------------------------------------------
prog | W 0.7329 2 6.0 388.0 10.87 0.0000 e
| P 0.2686 6.0 390.0 10.08 0.0000 a
| L 0.3623 6.0 386.0 11.65 0.0000 a
| R 0.3564 3.0 195.0 23.16 0.0000 u
|--------------------------------------------------
Residual | 196
-----------+--------------------------------------------------
Total | 199
--------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F
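The e/a/u flags reflect how each statistic is converted to F. When the hypothesis has one degree of freedom (female), all four statistics depend on a single eigenvalue, here about 0.1780: W = 1/(1 + 0.1780) = 0.8489, P = 0.1780/(1 + 0.1780) = 0.1511, and L = R = 0.1780, which is why the four female rows share one exact F. The Python sketch below uses the standard textbook transformations (exact Rao F for Wilks' lambda with 1 or 2 hypothesis df, Pillai's approximation, and the upper bound for Roy's largest root, taken as the largest eigenvalue). That Stata uses exactly these expressions is an assumption on our part, although they reproduce the printed values; p = 3 responses and 196 residual df come from the table above.

```python
import math

def wilks_exact_f(lam, q, p, df_e):
    """Exact F for Wilks' lambda; only valid when the hypothesis has q = 1 or 2 df."""
    if q == 1:
        df1, df2 = p, df_e - p + 1
        return (1 - lam) / lam * df2 / df1, df1, df2
    if q == 2:
        root = math.sqrt(lam)
        df1, df2 = 2 * p, 2 * (df_e - p + 1)
        return (1 - root) / root * df2 / df1, df1, df2
    raise ValueError("exact transformation requires q = 1 or 2")

def pillai_approx_f(v, q, p, df_e):
    """Approximate F for Pillai's trace V."""
    s = min(p, q)
    m = (abs(p - q) - 1) / 2
    n = (df_e - p - 1) / 2
    df1, df2 = s * (2 * m + s + 1), s * (2 * n + s + 1)
    return (2 * n + s + 1) / (2 * m + s + 1) * v / (s - v), df1, df2

def roy_upper_f(root, q, p, df_e):
    """Upper-bound F for Roy's largest root (largest eigenvalue of E^-1 H)."""
    r = max(p, q)
    df1, df2 = r, df_e - r + q
    return root * df2 / df1, df1, df2

P, DF_E = 3, 196   # three responses (read, write, math); residual df
rows = [
    ("female W", wilks_exact_f(0.8489, 1, P, DF_E)),
    ("prog   W", wilks_exact_f(0.7329, 2, P, DF_E)),
    ("prog   P", pillai_approx_f(0.2686, 2, P, DF_E)),
    ("Model  P", pillai_approx_f(0.4170, 3, P, DF_E)),
    ("prog   R", roy_upper_f(0.3564, 2, P, DF_E)),
    ("Model  R", roy_upper_f(0.3642, 3, P, DF_E)),
]
for label, (f, df1, df2) in rows:
    print(f"{label}: F({df1:g}, {df2:g}) = {f:.2f}")
```

The same functions with 194 residual df reproduce the female*prog rows of the factorial manova in Example 2 (for instance, Wilks 0.9321 gives F(6, 384) = 2.29).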
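Example 2
Next, we will run an mvreg that is equivalent to a factorial multivariate analysis of variance. Using xi3 (a user-written extension of xi) with effect coding (the e. prefix) ensures that the main effects are estimated correctly in the presence of the female by prog interaction.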
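To see what effect coding changes, here is a one-way Python sketch on hypothetical data with unequal group sizes. We assume that xi3's e. prefix uses the usual deviation coding, in which the omitted level is coded -1 in every column; the helper names are ours. With this coding the constant is the unweighted mean of the group means and each coefficient is a group mean minus that constant. With 0/1 dummies the constant would instead be the mean of the omitted group, and in an interaction model a "main effect" dummy would be a simple effect at the reference level of the other factor.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def effect_code(level, levels):
    """Effect-code one value: the first (omitted) level is -1 in every column."""
    if level == levels[0]:
        return [-1.0] * (len(levels) - 1)
    return [1.0 if level == lv else 0.0 for lv in levels[1:]]

# Hypothetical one-way data with unequal group sizes (not from hsb2)
groups = {1: [48.0, 52.0, 50.0], 2: [58.0, 56.0], 3: [45.0, 47.0, 46.0, 42.0]}
levels = sorted(groups)
rows = [([1.0] + effect_code(g, levels), y) for g in levels for y in groups[g]]

# Least squares via the normal equations X'X b = X'y
k = len(levels)
XtX = [[sum(x[i] * x[j] for x, _ in rows) for j in range(k)] for i in range(k)]
Xty = [sum(x[i] * y for x, y in rows) for i in range(k)]
coef = solve(XtX, Xty)  # [constant, effect of level 2, effect of level 3]

means = [sum(groups[g]) / len(groups[g]) for g in levels]
print("constant:", round(coef[0], 4), "| mean of group means:", round(sum(means) / k, 4))
print("effects:", [round(c, 4) for c in coef[1:]], "| group mean - constant:",
      [round(m - coef[0], 4) for m in means[1:]])
```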
xi3: mvreg read write math = e.female*e.prog
e.female _Ifemale_0-1 (naturally coded; _Ifemale_0 omitted)
e.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Equation Obs Parms RMSE "R-sq" F P
----------------------------------------------------------------------
read 200 6 9.301994 0.1976 9.553455 0.0000
write 200 6 8.263856 0.2590 13.56062 0.0000
math 200 6 8.32305 0.2306 11.62587 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read |
_Ifemale_1 | -.8645308 .7076023 -1.22 0.223 -2.260112 .5310502
_Iprog_2 | 5.410261 .8822916 6.13 0.000 3.670146 7.150376
_Iprog_3 | -4.603099 1.039838 -4.43 0.000 -6.653939 -2.55226
_Ife1Xpr2 | .7607157 .8822916 0.86 0.390 -.9793993 2.500831
_Ife1Xpr3 | 1.371777 1.039838 1.32 0.189 -.6790623 3.422617
_cons | 50.76252 .7076023 71.74 0.000 49.36694 52.1581
-------------+----------------------------------------------------------------
write |
_Ifemale_1 | 2.702201 .6286312 4.30 0.000 1.462372 3.94203
_Iprog_2 | 4.870758 .7838244 6.21 0.000 3.324847 6.41667
_Iprog_3 | -4.836331 .9237884 -5.24 0.000 -6.658289 -3.014373
_Ife1Xpr2 | -1.217608 .7838244 -1.55 0.122 -2.763519 .3283035
_Ife1Xpr3 | 1.866237 .9237884 2.02 0.045 .0442793 3.688195
_cons | 51.23086 .6286312 81.50 0.000 49.99103 52.47068
-------------+----------------------------------------------------------------
math |
_Ifemale_1 | -.323731 .633134 -0.51 0.610 -1.572441 .9249787
_Iprog_2 | 5.684064 .7894389 7.20 0.000 4.127079 7.241049
_Iprog_3 | -4.63014 .9304055 -4.98 0.000 -6.465149 -2.795132
_Ife1Xpr2 | -.0332022 .7894389 -0.04 0.966 -1.590187 1.523783
_Ife1Xpr3 | -.1327907 .9304055 -0.14 0.887 -1.967799 1.702218
_cons | 51.08666 .633134 80.69 0.000 49.83795 52.33537
------------------------------------------------------------------------------
/* using manova (multivariate analysis of variance) */
manova read write math = female prog female*prog
Number of obs = 200
W = Wilks' lambda L = Lawley-Hotelling trace
P = Pillai's trace R = Roy's largest root
Source | Statistic df F(df1, df2) = F Prob>F
------------+--------------------------------------------------
Model | W 0.5808 5 15.0 530.4 7.69 0.0000 a
| P 0.4796 15.0 582.0 7.38 0.0000 a
| L 0.6206 15.0 572.0 7.89 0.0000 a
| R 0.3762 5.0 194.0 14.59 0.0000 u
|--------------------------------------------------
Residual | 194
------------+--------------------------------------------------
female | W 0.8238 1 3.0 192.0 13.69 0.0000 e
| P 0.1762 3.0 192.0 13.69 0.0000 e
| L 0.2139 3.0 192.0 13.69 0.0000 e
| R 0.2139 3.0 192.0 13.69 0.0000 e
|--------------------------------------------------
prog | W 0.7305 2 6.0 384.0 10.88 0.0000 e
| P 0.2712 6.0 386.0 10.09 0.0000 a
| L 0.3666 6.0 382.0 11.67 0.0000 a
| R 0.3602 3.0 193.0 23.17 0.0000 u
|--------------------------------------------------
female*prog | W 0.9321 2 6.0 384.0 2.29 0.0347 e
| P 0.0691 6.0 386.0 2.30 0.0338 a
| L 0.0716 6.0 382.0 2.28 0.0356 a
| R 0.0381 3.0 193.0 2.45 0.0646 u
|--------------------------------------------------
Residual | 194
------------+--------------------------------------------------
Total | 199
---------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F
Example 3
Here is another example of multivariate regression. By including the corr option we can see how highly the residuals of the two equations are correlated. We also get the Breusch-Pagan test of independence of the residuals.
mvreg math science = read write, corr
Equation Obs Parms RMSE "R-sq" F P
----------------------------------------------------------------------
math 200 3 6.555315 0.5153 104.7222 0.0000
science 200 3 7.340989 0.4558 82.49331 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
math |
read | .4169486 .0564838 7.38 0.000 .3055581 .5283391
write | .3411219 .0610982 5.58 0.000 .2206314 .4616124
_cons | 12.86507 2.82162 4.56 0.000 7.30061 18.42952
-------------+----------------------------------------------------------------
science |
read | .4345423 .0632535 6.87 0.000 .3098013 .5592832
write | .3153468 .068421 4.61 0.000 .1804151 .4502784
_cons | 12.51143 3.159799 3.96 0.000 6.280058 18.7428
------------------------------------------------------------------------------
Correlation matrix of residuals:
math science
math 1.0000
science 0.2849 1.0000
Breusch-Pagan test of independence: chi2(1) = 16.230, Pr = 0.0001
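The Breusch-Pagan test of independence is a Lagrange multiplier statistic computed from the residual correlation matrix: n times the sum of the squared correlations below the diagonal, referred to a chi-square with M(M - 1)/2 degrees of freedom for M equations. A Python sketch reproducing the statistic above and the one reported by sureg later in these notes (r = 0.1773); the p-value helper only covers the two-equation, 1-df case, and small differences in the last digit come from the correlations being rounded to four decimals.

```python
import math

def breusch_pagan_independence(n, corr):
    """LM statistic n * sum_{i<j} r_ij^2 and its chi-square degrees of freedom."""
    m = len(corr)
    stat = n * sum(corr[i][j] ** 2 for i in range(m) for j in range(i))
    return stat, m * (m - 1) // 2

def chi2_sf_df1(x):
    """P(X > x) for a chi-square with 1 df, using P = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

# Residual correlations as printed (rounded to four decimals), n = 200
for label, r in [("mvreg math/science", 0.2849), ("sureg write/science", 0.1773)]:
    stat, df = breusch_pagan_independence(200, [[1.0, r], [r, 1.0]])
    print(f"{label}: chi2({df}) = {stat:.3f}, Pr = {chi2_sf_df1(stat):.4f}")
```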
The command test read tests whether the coefficient for read is zero in both equations. A more interesting test might be whether the coefficient for read is the same in each equation, that is, whether the effect of read is the same for math as it is for science.
test read
( 1) [math]read = 0
( 2) [science]read = 0
F( 2, 197) = 39.61
Prob > F = 0.0000
test [math=science]: read
( 1) [math]read - [science]read = 0
F( 1, 197) = 0.06
Prob > F = 0.8067
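These cross-equation tests are possible because mvreg estimates the full covariance matrix of all coefficients, Var(vec B) = Sigma (kron) (X'X)^-1, where Sigma is the residual covariance matrix. Two coefficients on the same predictor in different equations therefore have covariance r * se1 * se2, with r the residual correlation. The Python sketch below reproduces both F statistics from the printed estimates, standard errors, and residual correlation; since the inputs are rounded, agreement is to about two decimals.

```python
# Estimates and standard errors for `read` copied from the mvreg output above
b_math, se_math = 0.4169486, 0.0564838
b_sci, se_sci = 0.4345423, 0.0632535
r = 0.2849  # residual correlation between the math and science equations

# Both estimates share the same (X'X)^-1 element, so Cov = r * se_math * se_sci
cov = r * se_math * se_sci

# test [math=science]: read  -> one restriction, Wald F = t^2
diff = b_math - b_sci
var_diff = se_math ** 2 + se_sci ** 2 - 2 * cov
f_equal = diff ** 2 / var_diff

# test read  -> two restrictions, F = b' V^-1 b / 2 with a 2x2 covariance V
a, c, d = se_math ** 2, cov, se_sci ** 2
det = a * d - c * c
quad = (d * b_math ** 2 - 2 * c * b_math * b_sci + a * b_sci ** 2) / det
f_joint = quad / 2

print(f"F(1, 197) = {f_equal:.2f}; F(2, 197) = {f_joint:.2f}")
```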
Seemingly Unrelated Regression
Seemingly unrelated regression (SUR) estimates multiple models simultaneously while accounting for the correlation between their errors that arises because the models involve the same observations. When the errors are correlated and the equations contain different predictors, this yields more efficient coefficient estimates than separate OLS regressions. By including the corr option with sureg we can also obtain an estimate of the correlation between the errors of the two models. Note that both the coefficient estimates and their standard errors from sureg differ from those of the separate OLS regressions. The bottom of the sureg output provides a Breusch-Pagan test of whether the residuals from the two equations are independent; in this case the residuals are not independent (chi2(1) = 6.290, Pr = 0.0121).
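A minimal sketch of the two-step feasible GLS idea behind SUR, in plain Python on hypothetical data; the function names are ours, and this is not a reproduction of sureg's internals. It fits each equation by OLS, estimates the 2x2 residual covariance Sigma from the residuals (dividing by n), and then solves the stacked GLS normal equations X'(Sigma^-1 kron I)X b = X'(Sigma^-1 kron I)y.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def cross(Xa, Xb):
    """Return Xa' Xb for data matrices stored as lists of rows."""
    return [[sum(ra[i] * rb[j] for ra, rb in zip(Xa, Xb)) for j in range(len(Xb[0]))]
            for i in range(len(Xa[0]))]

def cross_vec(X, y):
    """Return X' y."""
    return [sum(row[i] * v for row, v in zip(X, y)) for i in range(len(X[0]))]

def ols(X, y):
    return solve(cross(X, X), cross_vec(X, y))

def residuals(X, y, coefs):
    return [v - sum(x * c for x, c in zip(row, coefs)) for row, v in zip(X, y)]

def sur(X1, y1, X2, y2):
    """Two-step feasible GLS for two equations on the same n observations."""
    n = len(y1)
    o1, o2 = ols(X1, y1), ols(X2, y2)                 # step 1: equation-by-equation OLS
    e1, e2 = residuals(X1, y1, o1), residuals(X2, y2, o2)
    s11 = sum(u * u for u in e1) / n                  # step 2: residual covariance Sigma
    s22 = sum(u * u for u in e2) / n
    s12 = sum(u * v for u, v in zip(e1, e2)) / n
    det = s11 * s22 - s12 * s12
    w11, w12, w22 = s22 / det, -s12 / det, s11 / det  # elements of Sigma^-1
    k1 = len(X1[0])
    A11, A12, A22 = cross(X1, X1), cross(X1, X2), cross(X2, X2)
    # step 3: stacked GLS normal equations with block-diagonal X
    A = [[w11 * a for a in A11[i]] + [w12 * a for a in A12[i]] for i in range(k1)]
    A += [[w12 * A12[j][i] for j in range(k1)] + [w22 * a for a in A22[i]]
          for i in range(len(X2[0]))]
    rhs = [w11 * a + w12 * b for a, b in zip(cross_vec(X1, y1), cross_vec(X1, y2))]
    rhs += [w12 * a + w22 * b for a, b in zip(cross_vec(X2, y1), cross_vec(X2, y2))]
    beta = solve(A, rhs)
    return (beta[:k1], beta[k1:]), (o1, o2)

# Hypothetical data for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 9.0]
y1 = [3.1, 5.2, 6.8, 9.5, 10.9, 13.2, 15.1, 16.8]
y2 = [1.5, 2.9, 2.1, 4.2, 3.8, 5.9, 5.1, 7.2]

X1 = [[1.0, v] for v in x]                 # equation 1: y1 on x
X2 = [[1.0, v] for v in z]                 # equation 2: y2 on z (different regressor)
(g1, g2), (o1, o2) = sur(X1, y1, X2, y2)
print("different regressors  SUR:", [round(c, 4) for c in g1 + g2],
      " OLS:", [round(c, 4) for c in o1 + o2])

Xs = [[1.0, u, v] for u, v in zip(x, z)]   # both equations use the same regressors
(h1, h2), (p1, p2) = sur(Xs, y1, Xs, y2)
print("identical regressors  SUR:", [round(c, 4) for c in h1 + h2],
      " OLS:", [round(c, 4) for c in p1 + p2])
```

With identical regressors in every equation, the GLS solution collapses algebraically to equation-by-equation OLS, which is why mvreg and separate regress runs agree. The SUR estimates can differ from OLS only when the regressor sets differ, as in the sureg examples here.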
use http://www.gseis.ucla.edu/courses/data/hsb2
xi: regress write read female i.prog
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 4, 195) = 43.58
Model | 8438.77721 4 2109.6943 Prob > F = 0.0000
Residual | 9440.09779 195 48.4107579 R-squared = 0.4720
-------------+------------------------------ Adj R-squared = 0.4612
Total | 17878.875 199 89.843593 Root MSE = 6.9578
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .4912748 .0531607 9.24 0.000 .3864311 .5961185
female | 5.364957 .9902058 5.42 0.000 3.412069 7.317845
_Iprog_2 | 1.674342 1.286089 1.30 0.194 -.8620872 4.210771
_Iprog_3 | -2.862345 1.442088 -1.98 0.049 -5.706437 -.0182527
_cons | 24.02837 2.920993 8.23 0.000 18.26758 29.78916
------------------------------------------------------------------------------
test read
( 1) read = 0.0
F( 1, 195) = 85.40
Prob > F = 0.0000
test _Iprog_2 _Iprog_3
( 1) _Iprog_2 = 0.0
( 2) _Iprog_3 = 0.0
F( 2, 195) = 6.02
Prob > F = 0.0029
xi: regress science math female i.prog
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 4, 195) = 36.25
Model | 8318.90574 4 2079.72643 Prob > F = 0.0000
Residual | 11188.5943 195 57.3774065 R-squared = 0.4264
-------------+------------------------------ Adj R-squared = 0.4147
Total | 19507.50 199 98.0276382 Root MSE = 7.5748
------------------------------------------------------------------------------
science | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
math | .6954811 .0653359 10.64 0.000 .5666254 .8243368
female | -2.113129 1.076644 -1.96 0.051 -4.236491 .0102329
_Iprog_2 | -3.271645 1.41948 -2.30 0.022 -6.071149 -.4721421
_Iprog_3 | -2.705079 1.574137 -1.72 0.087 -5.809598 .3994395
_cons | 18.78194 3.526991 5.33 0.000 11.82599 25.73788
------------------------------------------------------------------------------
test math
( 1) math = 0.0
F( 1, 195) = 113.31
Prob > F = 0.0000
test _Iprog_2 _Iprog_3
( 1) _Iprog_2 = 0.0
( 2) _Iprog_3 = 0.0
F( 2, 195) = 2.84
Prob > F = 0.0611
xi: sureg (write read female i.prog) (science math female i.prog), corr small
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Seemingly unrelated regression
----------------------------------------------------------------------
Equation Obs Parms RMSE "R-sq" F-Stat P
----------------------------------------------------------------------
write 200 4 6.970941 0.4700 41.20 0.0000
science 200 4 7.587139 0.4246 33.52 0.0000
----------------------------------------------------------------------
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
write |
read | .4456005 .051933 8.58 0.000 .3434969 .5477041
female | 5.309756 .9777063 5.43 0.000 3.387521 7.23199
_Iprog_2 | 1.967999 1.26896 1.55 0.122 -.5268599 4.462858
_Iprog_3 | -3.024374 1.42369 -2.12 0.034 -5.823442 -.2253068
_cons | 26.33036 2.858429 9.21 0.000 20.71051 31.95022
-------------+----------------------------------------------------------------
science |
math | .6433571 .063827 10.08 0.000 .5178691 .7688452
female | -2.148248 1.063082 -2.02 0.044 -4.238337 -.0581596
_Iprog_2 | -2.921167 1.400201 -2.09 0.038 -5.674053 -.1682801
_Iprog_3 | -2.892607 1.553968 -1.86 0.063 -5.947811 .162596
_cons | 21.40802 3.450343 6.20 0.000 14.62442 28.19162
------------------------------------------------------------------------------
Correlation matrix of residuals:
write science
write 1.0000
science 0.1773 1.0000
Breusch-Pagan test of independence: chi2(1) = 6.290, Pr = 0.0121
test math
( 1) [science]math = 0.0
F( 1, 393) = 30.93
Prob > F = 0.0000
test read
( 1) [write]read = 0.0
F( 1, 393) = 31.75
Prob > F = 0.0000
test _Iprog_2 _Iprog_3
( 1) [write]_Iprog_2 = 0.0
( 2) [science]_Iprog_2 = 0.0
( 3) [write]_Iprog_3 = 0.0
( 4) [science]_Iprog_3 = 0.0
F( 2, 393) = 8.31
Prob > F = 0.0003
Second Example
The ultimate in seemingly unrelated regression occurs when there are equations with no variables in common.
xi: regress socst i.prog write
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 46.50
Model | 9537.34999 3 3179.11666 Prob > F = 0.0000
Residual | 13398.845 196 68.3614541 R-squared = 0.4158
-------------+------------------------------ Adj R-squared = 0.4069
Total | 22936.195 199 115.257261 Root MSE = 8.2681
------------------------------------------------------------------------------
socst | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iprog_2 | 3.302177 1.510935 2.19 0.030 .3223989 6.281954
_Iprog_3 | -2.985748 1.727315 -1.73 0.085 -6.392257 .4207608
write | .5672562 .0681868 8.32 0.000 .4327823 .7017301
_cons | 21.48085 3.710919 5.79 0.000 14.16239 28.7993
------------------------------------------------------------------------------
test _Iprog_2 _Iprog_3
( 1) _Iprog_2 = 0.0
( 2) _Iprog_3 = 0.0
F( 2, 196) = 8.40
Prob > F = 0.0003
regress science math read
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 2, 197) = 90.27
Model | 9328.73944 2 4664.36972 Prob > F = 0.0000
Residual | 10178.7606 197 51.6688353 R-squared = 0.4782
-------------+------------------------------ Adj R-squared = 0.4729
Total | 19507.50 199 98.0276382 Root MSE = 7.1881
------------------------------------------------------------------------------
science | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
math | .4017207 .0725922 5.53 0.000 .2585632 .5448782
read | .3654205 .0663299 5.51 0.000 .2346128 .4962282
_cons | 11.6155 3.054262 3.80 0.000 5.592255 17.63875
------------------------------------------------------------------------------
xi: sureg (socst i.prog write) (science math read), small
i.prog _Iprog_1-3 (naturally coded; _Iprog_1 omitted)
Seemingly unrelated regression
----------------------------------------------------------------------
Equation Obs Parms RMSE "R-sq" F-Stat P
----------------------------------------------------------------------
socst 200 3 8.268303 0.4158 47.67 0.0000
science 200 2 7.188272 0.4782 92.77 0.0000
----------------------------------------------------------------------
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
socst |
_Iprog_2 | 3.177816 1.495345 2.13 0.034 .2379403 6.117692
_Iprog_3 | -3.030467 1.709478 -1.77 0.077 -6.391332 .3303987
write | .5720854 .0674874 8.48 0.000 .4394039 .7047669
_cons | 21.30246 3.672902 5.80 0.000 14.08146 28.52345
-------------+----------------------------------------------------------------
science |
math | .4005705 .0720278 5.56 0.000 .2589625 .5421784
read | .3708297 .0658133 5.63 0.000 .2414395 .5002198
_cons | 11.39354 3.030858 3.76 0.000 5.434813 17.35226
------------------------------------------------------------------------------
test _Iprog_2 _Iprog_3
( 1) [socst]_Iprog_2 = 0.0
( 2) [socst]_Iprog_3 = 0.0
F( 2, 393) = 8.31
Prob > F = 0.0003
test write=read
( 1) [socst]write - [science]read = 0.0
F( 1, 393) = 4.54
Prob > F = 0.0338
Phil Ender, 23apr05, 21may02