Stata Multiple Regression Session
use http://www.gseis.ucla.edu/courses/data/hsb2, clear
describe
Contains data from http://www.philender.com/courses/data/hsbdemo
  obs:           200                          highschool and beyond (200 cases)
 vars:            11                          21 Jun 2000 08:54
 size:         9,600 (98.0% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              float  %9.0g
female          float  %9.0g       fl
race            float  %12.0g      rl
ses             float  %9.0g       sl
schtyp          float  %9.0g       scl        type of school
prog            float  %9.0g       sel        type of program
read            float  %9.0g                  reading score
write           float  %9.0g                  writing score
math            float  %9.0g                  math score
science         float  %9.0g                  science score
socst           float  %9.0g                  social studies score
-------------------------------------------------------------------------------
Sorted by:
summarize write read math female
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       write |       200      52.775    9.478586         31         67
        read |       200       52.23    10.25294         28         76
        math |       200      52.645    9.368448         33         75
      female |       200        .545    .4992205          0          1
stem write
Stem-and-leaf plot for write (writing score)
3* | 1111
3t | 3333
3f | 55
3s | 66777
3. | 899999
4* | 0001111111111
4t | 223
4f | 4444444444445
4s | 66666666677
4. | 99999999999
5* | 00
5t | 2222222222222223
5f | 44444444444444444555
5s | 777777777777
5. | 9999999999999999999999999
6* | 00001111
6t | 2222222222222222223333
6f | 5555555555555555
6s | 7777777
stem read, lines(2)
Stem-and-leaf plot for read (reading score)
2. | 8
3* | 1444444
3. | 56667799999999
4* | 112222222222222334444444444444
4. | 5567777777777777777777777777778
5* | 0000000000000000002222222222222234
5. | 555555555555577777777777777
6* | 00000000013333333333333333
6. | 555555555688888888888
7* | 1133333
7. | 66
stem math, lines(2)
Stem-and-leaf plot for math (math score)
3* | 3
3. | 5788999999
4* | 00000000001111111222222233333334444
4. | 5555555566666666777888889999999999
5* | 00000001111111122222233333334444444444
5. | 555556666666777777777777788888899
6* | 00000111111122223333344444
6. | 555666677899
7* | 011112223
7. | 55
kdbox write, normal mean /* findit kdbox */
kdbox read, normal mean
kdbox math, normal mean
[graphs omitted]
/* shortcut for the 3 kdensity graphs */
foreach var of varlist write read math {
    kdbox `var', normal mean
    more
}
[graphs omitted]
foreach var of varlist write read math {
    pnorm `var'
    more
    qnorm `var'
    more
}
[graphs omitted]
graph matrix read math female write, half
[graph omitted]
correlate write read math female
(obs=200)
             |    write     read     math   female
-------------+------------------------------------
       write |   1.0000
        read |   0.5968   1.0000
        math |   0.6174   0.6623   1.0000
      female |   0.2565  -0.0531  -0.0293   1.0000
pcorr write read math female
(obs=200)
Partial correlation of write with
    Variable |    Corr.     Sig.
-------------+------------------
        read |   0.3573    0.000
        math |   0.3931    0.000
      female |   0.3840    0.000
regress write read math female
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   72.52
       Model |  9405.34864     3  3135.11621           Prob > F      =  0.0000
    Residual |  8473.52636   196  43.2322773           R-squared     =  0.5261
-------------+------------------------------           Adj R-squared =  0.5188
       Total |   17878.875   199   89.843593           Root MSE      =  6.5751
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .3252389   .0607348     5.36   0.000     .2054613    .4450166
        math |   .3974826   .0664037     5.99   0.000      .266525    .5284401
      female |    5.44337   .9349987     5.82   0.000      3.59942    7.287319
       _cons |   11.89566   2.862845     4.16   0.000     6.249728     17.5416
------------------------------------------------------------------------------
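/* The summary statistics in the regression header can be rebuilt from the sums of squares that regress leaves behind in e(). This is only a sketch; it assumes the regress command above is still the most recent estimation command. The four values should agree with R-squared, Adj R-squared, F and Root MSE in the table. */

```stata
* R-squared = Model SS / Total SS
display "R-squared     = " %6.4f (e(mss)/(e(mss) + e(rss)))
* Adjusted R-squared = 1 - (Residual MS / Total MS)
display "Adj R-squared = " %6.4f (1 - (e(rss)/e(df_r))/((e(mss) + e(rss))/(e(N) - 1)))
* F = Model MS / Residual MS
display "F             = " %6.2f ((e(mss)/e(df_m))/(e(rss)/e(df_r)))
* Root MSE = square root of the Residual MS
display "Root MSE      = " %6.4f (sqrt(e(rss)/e(df_r)))
```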
predict e, resid
predict rstu, rstu
predict p
graph twoway scatter rstu p, yline(-2.5 2.5) ylabel(-3(1)3) jitter(2)
rvfplot2, rstu yline(2.5 -2.5) jitter(2) /* findit rvfplot2 */
[graphs omitted]
rvpplot2 read, rstu yline(0 -2.5 2.5) jitter(2) /* findit rvpplot2 */
rvpplot math, yline(0 -2.5 2.5) jitter(2)
rvpplot female, yline(0 -2.5 2.5) jitter(2)
[graphs omitted]
graph twoway scatter rstu read, yline(0 -2.5 2.5) ylabel(-3(1)3) jitter(2)
[graph omitted]
avplot read
avplot math
avplot female
[graphs omitted]
kdensity e, normal
graph twoway scatter write p, jitter(2)
graph twoway (scatter write p, jitter(2)) (lfit write p)
graph twoway scatter rstu id, yline(0)
indexplot rstu, scatter /* findit indexplot */
[graphs omitted]
list id write rstu if abs(rstu)>=2.5
id write rstu
31. 126 31 -2.697508
198. 187 41 -2.72472
lvr2plot, ylabel xlabel
dfbeta
list id write rstu DFread if abs(DFread)>2/sqrt(e(N))
id write rstu DFread
169. 150 41 -1.113306 .1435211
172. 141 44 -1.092409 -.1484074
190. 170 62 1.636351 -.1785097
194. 103 52 -1.564255 -.2235134
196. 86 33 -2.276461 .2035398
198. 3 65 2.106786 .2715756
199. 62 65 2.00872 .2973564
200. 126 31 -2.697508 .3477473
list id write rstu DFmath if abs(DFmath)>2/sqrt(e(N))
id write rstu DFmath
166. 24 62 1.074772 .1493585
167. 189 59 1.047505 .1459866
175. 32 67 1.107803 .1665842
189. 83 62 1.871348 -.197515
190. 170 62 1.636351 .1939547
193. 200 54 -1.52912 -.202688
195. 50 59 2.194752 -.2067871
196. 86 33 -2.276461 -.1484884
197. 133 31 -2.026189 .2327446
198. 3 65 2.106786 -.2397425
199. 62 65 2.00872 -.2541649
200. 126 31 -2.697508 -.2931431
list id write rstu DFfemale if abs(DFfemale)>2/sqrt(e(N))
id write rstu DFfemale
178. 85 39 -2.073712 .1599997
184. 18 33 -2.262443 .1778462
185. 81 43 -1.982814 .1469716
187. 60 65 2.210802 -.1678427
188. 16 31 -2.114106 .1683168
191. 187 41 -2.72472 -.1817335
195. 50 59 2.194752 -.1720256
196. 86 33 -2.276461 .186213
197. 133 31 -2.026189 .1588308
198. 3 65 2.106786 -.1553218
199. 62 65 2.00872 -.146781
200. 126 31 -2.697508 .2246109
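/* The cutoff 2/sqrt(e(N)) used in the three lists above works out to about 0.1414 for N = 200. The sketch below shows the cutoff and counts the flagged cases. It also shows one way to get the DFread, DFmath and DFfemale names used in this session when dfbeta did not create them; more recent Stata releases are understood to name the dfbeta variables _dfbeta_1, _dfbeta_2, and so on, so treat the naming remark as an assumption about your version. */

```stata
* Show the DFBETA cutoff implied by the current estimation sample
display 2/sqrt(e(N))

* If DFread etc. do not exist after -dfbeta-, create them one at a time.
* (Skip these three lines if the variables are already in memory.)
predict DFread, dfbeta(read)
predict DFmath, dfbeta(math)
predict DFfemale, dfbeta(female)

* Count the cases beyond the cutoff; missing values are excluded explicitly
* because abs(.) is treated as larger than any number.
foreach v of varlist DFread DFmath DFfemale {
    count if abs(`v') > 2/sqrt(e(N)) & !missing(`v')
}
```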
/* alternate code */
sort DFread
list id write DFread in 1/10
list id write DFread in -10/l
sort DFmath
list id write DFmath in 1/10
list id write DFmath in -10/l
sort DFfemale
list id write DFfemale in 1/10
list id write DFfemale in -10/l
indexplot leverage, scatter
predict lev, leverage
sort lev
list id write lev in -10/l
id write lev
191. 103 52 .0376407
192. 164 36 .0378285
193. 34 61 .0378289
194. 33 65 .037994
195. 161 62 .037994
196. 19 46 .0387017
197. 200 54 .0389156
198. 143 63 .0417192
199. 61 63 .0425231
200. 167 49 .0752208
indexplot cooksd, scatter
predict d, cooksd
sort d
list id write rstu lev d in -10/l
id write rstu lev d
191. 187 41 -2.72472 .0107086 .0194529
192. 117 49 1.634066 .028638 .0195144
193. 200 54 -1.52912 .0389156 .0235088
194. 103 52 -1.564255 .0376407 .023751
195. 50 59 2.194752 .0200704 .0241933
196. 86 33 -2.276461 .018896 .0244312
197. 133 31 -2.026189 .0242461 .0251059
198. 3 65 2.106786 .0285327 .032029
199. 62 65 2.00872 .0335684 .0345036
200. 126 31 -2.697508 .0280834 .0509327
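/* Instead of listing the ten largest values, cases can be flagged against conventional rule-of-thumb cutoffs. The cutoffs below, (2k+2)/n for leverage and 4/n for Cook's D, are common guidelines and are not part of the original session; with k = 3 predictors and n = 200 they come to .04 and .02. A sketch, assuming lev and d were created by the predict commands above and the regression is still the active estimation: */

```stata
* Rule-of-thumb cutoffs (assumed guidelines, not from the original session)
display "leverage cutoff = " ((2*e(df_m) + 2)/e(N))
display "Cooks D cutoff  = " (4/e(N))

* List cases that exceed either cutoff
list id write rstu lev d if (lev > (2*e(df_m) + 2)/e(N) | d > 4/e(N)) & !missing(lev, d)
```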
vif
    Variable |       VIF       1/VIF
-------------+----------------------
        read |      1.78    0.560251
        math |      1.78    0.561351
      female |      1.00    0.997122
-------------+----------------------
    Mean VIF |      1.52
collin read math female /* available from ATS via the Internet */
Collinearity Diagnostics
                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
      read      1.78    1.34    0.5603      0.4397
      math      1.78    1.33    0.5614      0.4386
    female      1.00    1.00    0.9971      0.0029
----------------------------------------------------
  Mean VIF      1.52
                           Cond
        Eigenval          Index
---------------------------------
    1     1.6674          1.0000
    2     0.9953          1.2943
    3     0.3373          2.2234
---------------------------------
 Condition Number         2.2234
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix) 0.5598
linktest
      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =  116.16
       Model |  9674.70222     2  4837.35111           Prob > F      =  0.0000
    Residual |  8204.17278   197  41.6455471           R-squared     =  0.5411
-------------+------------------------------           Adj R-squared =  0.5365
       Total |   17878.875   199   89.843593           Root MSE      =  6.4533
------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        _hat |   3.306865   .9095168     3.64   0.000     1.513226    5.100504
      _hatsq |  -.0215942    .008491    -2.54   0.012    -.0383392   -.0048492
       _cons |  -60.58511   24.08436    -2.52   0.013    -108.0814   -13.08885
------------------------------------------------------------------------------
ovtest
Ramsey RESET test using powers of the fitted values of write
Ho: model has no omitted variables
F(3, 193) = 3.06
Prob > F = 0.0295
hettest
Cook-Weisberg test for heteroskedasticity using fitted values of write
Ho: Constant variance
chi2(1) = 6.64
Prob > chi2 = 0.0100
whitetst /* Downloaded from Stata (STB 55, sg137) via the Internet */
White's general test statistic : 15.17126 Chi-sq( 8) P-value = .0559
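/* In Stata 9 and later the postestimation diagnostics used in this session are also available as estat subcommands, and White's test is built in. A sketch of the equivalents, assuming the same regression is the active estimation; the output layout may differ somewhat from the listings above. */

```stata
* Refit the model so that the estat commands have an estimation to work on
regress write read math female

estat vif              // variance inflation factors (was: vif)
estat ovtest           // Ramsey RESET test (was: ovtest)
estat hettest          // Breusch-Pagan / Cook-Weisberg test (was: hettest)
estat imtest, white    // White's general test (was: whitetst)
```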
Linear Statistical Models Course
Phil Ender, 5feb04; 13jan00