Stata is an easy-to-use statistical software package. Stata is command driven and is in some respects similar to SAS, although much easier to learn. This Computer Module is designed to introduce students to Stata and to allow them to begin analyzing data statistically.
Students have two choices when it comes to using Stata: they can use the interactive desktop version of Stata in one of the campus computer labs, or they can purchase their own copy of Stata for around $100.
View Useful Stata Commands.
View ATS Stata Class Notes.
input id x
13 34
17 21
14 25
9 33
18 40
12 33
4 44
11 41
17 21
end
sort id
list
+---------+
| id x |
|---------|
1. | 13 34 |
2. | 17 21 |
3. | 14 25 |
4. | 9 33 |
5. | 18 40 |
|---------|
6. | 12 33 |
7. | 4 44 |
8. | 11 41 |
9. | 17 21 |
+---------+
use http://www.philender.com/courses/data/hsb2, clear
describe
Contains data from http://www.gseis.ucla.edu/courses/data/hsb2.dta
obs: 200 highschool and beyond (200
cases)
vars: 11 21 Jun 2000 08:54
size: 9,600 (98.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id float %9.0g
female float %9.0g fl
race float %12.0g rl
ses float %9.0g sl
schtyp float %9.0g scl type of school
prog float %9.0g sel type of program
read float %9.0g reading score
write float %9.0g writing score
math float %9.0g math score
science float %9.0g science score
socst float %9.0g social studies score
-------------------------------------------------------------------------------
list
Observation 1
id 70 female male race white
ses low schtyp public prog general
read 57 write 52 math 41
science 47 socst 57
Observation 2
id 121 female female race white
ses middle schtyp public prog vocation
read 68 write 59 math 53
science 63 socst 61
Observation 3
id 86 female male race white
ses high schtyp public prog general
read 44 write 33 math 54
science 58 socst 31
Observation 4
id 141 female male race white
ses high schtyp public prog vocation
read 63 write 44 math 47
science 53 socst 56
Observation 5
id 172 female male race white
ses middle schtyp public prog academic
read 47 write 52 math 57
science 53 socst 61
...
list id female race ses prog read in 1/20
+--------------------------------------------------------+
| id female race ses prog read |
|--------------------------------------------------------|
1. | 70 male white low general 57 |
2. | 121 female white middle vocation 68 |
3. | 86 male white high general 44 |
4. | 141 male white high vocation 63 |
5. | 172 male white middle academic 47 |
|--------------------------------------------------------|
6. | 113 male white middle academic 44 |
7. | 50 male african-amer middle general 50 |
8. | 11 male hispanic middle academic 34 |
9. | 84 male white middle general 63 |
10. | 48 male african-amer middle academic 57 |
|--------------------------------------------------------|
11. | 75 male white middle vocation 60 |
12. | 60 male white middle academic 57 |
13. | 95 male white high academic 73 |
14. | 104 male white high academic 54 |
15. | 38 male african-amer low academic 45 |
|--------------------------------------------------------|
16. | 115 male white low general 42 |
17. | 76 male white high academic 47 |
18. | 195 male white middle general 57 |
19. | 114 male white high academic 68 |
20. | 85 male white middle general 55 |
+--------------------------------------------------------+
list id female race ses prog read in 1/20, clean
id female race ses prog read
1. 70 male white low general 57
2. 121 female white middle vocation 68
3. 86 male white high general 44
4. 141 male white high vocation 63
5. 172 male white middle academic 47
6. 113 male white middle academic 44
7. 50 male african-amer middle general 50
8. 11 male hispanic middle academic 34
9. 84 male white middle general 63
10. 48 male african-amer middle academic 57
11. 75 male white middle vocation 60
12. 60 male white middle academic 57
13. 95 male white high academic 73
14. 104 male white high academic 54
15. 38 male african-amer low academic 45
16. 115 male white low general 42
17. 76 male white high academic 47
18. 195 male white middle general 57
19. 114 male white high academic 68
20. 85 male white middle general 55
list id female race ses prog read in 1/20, clean nolabel
id female race ses prog read
1. 70 0 4 1 1 57
2. 121 1 4 2 3 68
3. 86 0 4 3 1 44
4. 141 0 4 3 3 63
5. 172 0 4 2 2 47
6. 113 0 4 2 2 44
7. 50 0 3 2 1 50
8. 11 0 1 2 2 34
9. 84 0 4 2 1 63
10. 48 0 3 2 2 57
11. 75 0 4 2 3 60
12. 60 0 4 2 2 57
13. 95 0 4 3 2 73
14. 104 0 4 3 2 54
15. 38 0 3 1 2 45
16. 115 0 4 1 1 42
17. 76 0 4 3 2 47
18. 195 0 4 2 1 57
19. 114 0 4 3 2 68
20. 85 0 4 2 1 55
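The nolabel option shows the numeric codes stored behind the value labels. As a small sketch, the mapping between codes and labels can be inspected with label list; the label names fl and rl are the ones reported in the describe output above.

```stata
* Show the code-to-label mapping for every value label in memory
label list

* Show only the value labels attached to female and race
* (fl and rl are the value-label names reported by describe)
label list fl rl
```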
summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
id | 200 100.5 57.87918 1 200
female | 200 .545 .4992205 0 1
race | 200 3.43 1.039472 1 4
ses | 200 2.055 .7242914 1 3
schtyp | 200 1.16 .367526 1 2
prog | 200 2.025 .6904772 1 3
read | 200 52.23 10.25294 28 76
write | 200 52.775 9.478586 31 67
math | 200 52.645 9.368448 33 75
science | 200 51.85 9.900891 26 74
socst | 200 52.405 10.73579 26 71
summarize write
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
write | 200 52.775 9.478586 31 67
histogram write
histogram write, start(30) width(5) normal
tabulate write
writing |
score | Freq. Percent Cum.
------------+-----------------------------------
31 | 4 2.00 2.00
33 | 4 2.00 4.00
35 | 2 1.00 5.00
36 | 2 1.00 6.00
37 | 3 1.50 7.50
38 | 1 0.50 8.00
39 | 5 2.50 10.50
40 | 3 1.50 12.00
41 | 10 5.00 17.00
42 | 2 1.00 18.00
43 | 1 0.50 18.50
44 | 12 6.00 24.50
45 | 1 0.50 25.00
46 | 9 4.50 29.50
47 | 2 1.00 30.50
49 | 11 5.50 36.00
50 | 2 1.00 37.00
52 | 15 7.50 44.50
53 | 1 0.50 45.00
54 | 17 8.50 53.50
55 | 3 1.50 55.00
57 | 12 6.00 61.00
59 | 25 12.50 73.50
60 | 4 2.00 75.50
61 | 4 2.00 77.50
62 | 18 9.00 86.50
63 | 4 2.00 88.50
65 | 16 8.00 96.50
67 | 7 3.50 100.00
------------+-----------------------------------
Total | 200 100.00
sort prog
by prog: summarize write
_______________________________________________________________________________
-> prog = general
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
write | 45 51.33333 9.397775 31 67
_______________________________________________________________________________
-> prog = academic
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
write | 105 56.25714 7.943343 33 67
_______________________________________________________________________________
-> prog = vocation
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
write | 50 46.76 9.318754 31 67
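Sorting and then using by can be combined into one step. A minimal sketch, assuming the hsb2 data are still in memory:

```stata
* bysort sorts by prog and then runs the command once per group
bysort prog: summarize write

* tabstat puts the same group statistics into a single table
tabstat write, by(prog) statistics(n mean sd min max)
```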
summarize write, detail
writing score
-------------------------------------------------------------
Percentiles Smallest
1% 31 31
5% 35.5 31
10% 39 31 Obs 200
25% 45.5 31 Sum of Wgt. 200
50% 54 Mean 52.775
Largest Std. Dev. 9.478586
75% 60 67
90% 65 67 Variance 89.84359
95% 65 67 Skewness -.4784158
99% 67 67 Kurtosis 2.238527
stem write
Stem-and-leaf plot for write (writing score)
3* | 1111
3t | 3333
3f | 55
3s | 66777
3. | 899999
4* | 0001111111111
4t | 223
4f | 4444444444445
4s | 66666666677
4. | 99999999999
5* | 00
5t | 2222222222222223
5f | 44444444444444444555
5s | 777777777777
5. | 9999999999999999999999999
6* | 00001111
6t | 2222222222222222223333
6f | 5555555555555555
6s | 7777777
graph box write
graph box write, over(prog)

use http://www.philender.com/courses/data/hsb2, clear
tabulate prog
type of |
program | Freq. Percent Cum.
------------+-----------------------------------
general | 45 22.50 22.50
academic | 105 52.50 75.00
vocation | 50 25.00 100.00
------------+-----------------------------------
Total | 200 100.00
summarize write if prog==1
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
write | 45 51.33333 9.397775 31 67
Example 2
You will have to clear and reload the data after this example.
keep if prog==1
(155 observations deleted)
tabulate prog
type of |
program | Freq. Percent Cum.
------------+-----------------------------------
general | 45 100.00 100.00
------------+-----------------------------------
Total | 45 100.00
summarize write
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
write | 45 51.33333 9.397775 31 67
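keep permanently drops observations from the copy of the data in memory, which is why the data must be reloaded afterwards. One alternative, sketched here, is to wrap the subsetting in preserve and restore:

```stata
use http://www.philender.com/courses/data/hsb2, clear

* Take a snapshot of the data currently in memory
preserve

* Work with the general program only (prog==1)
keep if prog==1
summarize write

* Bring back all of the observations without reloading the file
restore
count
```

After restore, count should again report the full 200 observations.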
use http://www.philender.com/courses/data/hsb2, clear
scatter write read
scatter write read, jitter(2)
scatter write read, jitter(2) msym(Oh)
twoway (scatter write read, jitter(2) msym(Oh))(lfit write read)
correlate write read math female
(obs=200)
| write read math female
-------------+------------------------------------
write | 1.0000
read | 0.5968 1.0000
math | 0.6174 0.6623 1.0000
female | 0.2565 -0.0531 -0.0293 1.0000
sort female
by female: correlate write read math
-------------------------------------------------------------------------------
-> female = male
(obs=91)
| write read math
-------------+---------------------------
write | 1.0000
read | 0.6485 1.0000
math | 0.6268 0.6085 1.0000
-------------------------------------------------------------------------------
-> female = female
(obs=109)
| write read math
-------------+---------------------------
write | 1.0000
read | 0.6209 1.0000
math | 0.6749 0.7111 1.0000
regress write read
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 109.52
Model | 6367.42127 1 6367.42127 Prob > F = 0.0000
Residual | 11511.4537 198 58.1386552 R-squared = 0.3561
-------------+------------------------------ Adj R-squared = 0.3529
Total | 17878.875 199 89.843593 Root MSE = 7.6249
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .5517051 .0527178 10.47 0.000 .4477445 .6556656
_cons | 23.95944 2.805744 8.54 0.000 18.42647 29.49242
------------------------------------------------------------------------------
regress write read female
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 2, 197) = 77.21
Model | 7856.32118 2 3928.16059 Prob > F = 0.0000
Residual | 10022.5538 197 50.8759077 R-squared = 0.4394
-------------+------------------------------ Adj R-squared = 0.4337
Total | 17878.875 199 89.843593 Root MSE = 7.1327
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
read | .5658869 .0493849 11.46 0.000 .468496 .6632778
female | 5.486894 1.014261 5.41 0.000 3.48669 7.487098
_cons | 20.22837 2.713756 7.45 0.000 14.87663 25.58011
------------------------------------------------------------------------------
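The R-squared in the regression header is the model sum of squares divided by the total sum of squares, and the Root MSE is the square root of the residual mean square. As a quick check, using the values printed for regress write read, display can serve as a calculator:

```stata
* R-squared = Model SS / Total SS for regress write read
display 6367.42127/17878.875

* Root MSE = square root of the residual mean square
display sqrt(58.1386552)
```

Up to rounding, these should agree with the reported R-squared of 0.3561 and Root MSE of 7.6249.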
use http://www.philender.com/courses/data/hsb2, clear
ttest write, by(female)
Two-sample t test with equal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
male | 91 50.12088 1.080274 10.30516 47.97473 52.26703
female | 109 54.99083 .7790686 8.133715 53.44658 56.53507
---------+--------------------------------------------------------------------
combined | 200 52.775 .6702372 9.478586 51.45332 54.09668
---------+--------------------------------------------------------------------
diff | -4.869947 1.304191 -7.441835 -2.298059
------------------------------------------------------------------------------
diff = mean(male) - mean(female) t = -3.7341
Ho: diff = 0 degrees of freedom = 198
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.0001 Pr(|T| > |t|) = 0.0002 Pr(T > t) = 0.9999
Example 2: Dependent t-test
ttest write = read
Paired t test
------------------------------------------------------------------------------
Variable | Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
write | 200 52.775 .6702372 9.478586 51.45332 54.09668
read | 200 52.23 .7249921 10.25294 50.80035 53.65965
---------+--------------------------------------------------------------------
diff | 200 .545 .6283822 8.886666 -.6941424 1.784142
------------------------------------------------------------------------------
mean(diff) = mean(write - read) t = 0.8673
Ho: mean(diff) = 0 degrees of freedom = 199
Ha: mean(diff) < 0 Ha: mean(diff) != 0 Ha: mean(diff) > 0
Pr(T < t) = 0.8066 Pr(|T| > |t|) = 0.3868 Pr(T > t) = 0.1934
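A paired t test is equivalent to a one-sample t test on the within-student differences. A short sketch, where diff is a new variable name chosen here only for illustration:

```stata
* Difference between the two scores for each student
generate diff = write - read

* One-sample test that the mean difference is zero;
* this should reproduce the paired result for write and read
ttest diff == 0
```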
tabulate prog female, all
type of | female
program | male female | Total
-----------+----------------------+----------
general | 21 24 | 45
academic | 47 58 | 105
vocation | 23 27 | 50
-----------+----------------------+----------
Total | 91 109 | 200
Pearson chi2(2) = 0.0528 Pr = 0.974
likelihood-ratio chi2(2) = 0.0528 Pr = 0.974
Cramér's V = 0.0162
gamma = 0.0066 ASE = 0.122
Kendall's tau-b = 0.0036 ASE = 0.067
use http://www.philender.com/courses/data/missing, clear
describe
Contains data from missing.dta
obs: 15
vars: 2 14 Jul 2006 17:56
size: 180 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
mt float %9.0g
final float %9.0g
-------------------------------------------------------------------------------
Sorted by:
list, clean
mt final
1. 43 48
2. . 41
3. 41 44
4. 40 44
5. 38 43
6. 46 42
7. 41 40
8. 48 .
9. 42 45
10. 41 40
11. 43 46
12. . 45
13. 44 48
14. 39 42
15. 40 45
generate total = mt + final
(3 missing values generated)
list, clean
mt final total
1. 43 48 91
2. . 41 .
3. 41 44 85
4. 40 44 84
5. 38 43 81
6. 46 42 88
7. 41 40 81
8. 48 . .
9. 42 45 87
10. 41 40 81
11. 43 46 89
12. . 45 .
13. 44 48 92
14. 39 42 81
15. 40 45 85
summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
mt | 13 42 2.798809 38 48
final | 14 43.78571 2.607049 40 48
total | 12 85.41667 4.010403 81 92
correlate
(obs=12)
| mt final total
-------------+---------------------------
mt | 1.0000
final | 0.3263 1.0000
total | 0.7755 0.8498 1.0000
pwcorr, obs
| mt final total
-------------+---------------------------
mt | 1.0000
| 13
|
final | 0.3263 1.0000
| 12 14
|
total | 0.7755 0.8498 1.0000
| 12 12 12
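generate returns a missing value whenever either mt or final is missing, which is why total has only 12 nonmissing observations. A sketch of two related tools, where total2 is a hypothetical variable name introduced for illustration:

```stata
* Count the observations with a missing value on mt or final
count if missing(mt, final)

* egen's rowtotal() treats missing values as zero,
* so total2 is nonmissing for every observation
egen total2 = rowtotal(mt final)
list, clean
```

The count should match the 3 missing values reported by generate. Whether treating a missing score as zero is appropriate depends on the analysis; it is shown here only to contrast with the behavior of generate.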
log using mylog1.log
[ a bunch of Stata commands ]
log close
type mylog1.log
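By default, log using refuses to overwrite a log file that already exists. A minimal sketch of the options commonly added to handle that case:

```stata
* text requests a plain-text log; replace overwrites an existing mylog1.log
log using mylog1.log, text replace

summarize

log close

* append adds to the end of the existing log instead of overwriting it
log using mylog1.log, text append
log close
```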
Variable Name   Variable Label         Value Labels
CASENUM         Case number            Possible range = 100 to 3000
MATHTYPE        Level of math class    1-N/A
                                       2-Low
                                       3-Average
                                       4-High
                                       5-Algebra
                                       6-Honors Algebra
LUNCH2          School lunch           1-Yes
                                       2-No
TOTALC          Total accuracy score   Possible range = 0 to 25
use http://www.philender.com/courses/data/clean, clear
describe
Contains data from http://www.philender.com/courses/data/clean.dta
obs: 199
vars: 4 14 Jul 2006 17:54
size: 3,980 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id float %9.0g
mathtype float %9.0g
lunch2 float %9.0g
totalc float %9.0g
-------------------------------------------------------------------------------
Sorted by:
list, clean
id mathtype lunch2 totalc
1. 884 2 1 7
2. 885 2 1 11
3. 886 2 1 13
4. 887 2 1 14
5. 888 2 1 6
...
195. 756 6 1 18
196. 757 6 1 14
197. 758 6 1 17
198. 761 6 1 15
199. 299 6 2 23
summarize id totalc, detail
id
-------------------------------------------------------------
Percentiles Smallest
1% 106 102
5% 148 106
10% 176 108 Obs 199
25% 472 141 Sum of Wgt. 199
50% 755 Mean 819.4774
Largest Std. Dev. 519.5533
75% 1068 2126
90% 1158 2133 Variance 269935.6
95% 2037 3121 Skewness 1.481881
99% 3121 3123 Kurtosis 6.65515
totalc
-------------------------------------------------------------
Percentiles Smallest
1% 1 0
5% 3 1
10% 5 1 Obs 199
25% 9 1 Sum of Wgt. 199
50% 14 Mean 14.1407
Largest Std. Dev. 7.793438
75% 19 24
90% 22 25 Variance 60.73768
95% 24 33 Skewness 2.549864
99% 33 77 Kurtosis 22.51892
tab1 mathtype lunch2 totalc
-> tabulation of mathtype
mathtype | Freq. Percent Cum.
------------+-----------------------------------
2 | 44 22.11 22.11
3 | 44 22.11 44.22
4 | 44 22.11 66.33
5 | 43 21.61 87.94
6 | 21 10.55 98.49
8 | 1 0.50 98.99
9 | 2 1.01 100.00
------------+-----------------------------------
Total | 199 100.00
-> tabulation of lunch2
lunch2 | Freq. Percent Cum.
------------+-----------------------------------
0 | 1 0.50 0.50
1 | 110 55.28 55.78
2 | 87 43.72 99.50
3 | 1 0.50 100.00
------------+-----------------------------------
Total | 199 100.00
-> tabulation of totalc
totalc | Freq. Percent Cum.
------------+-----------------------------------
0 | 1 0.50 0.50
1 | 3 1.51 2.01
2 | 2 1.01 3.02
3 | 5 2.51 5.53
4 | 3 1.51 7.04
5 | 8 4.02 11.06
6 | 10 5.03 16.08
7 | 8 4.02 20.10
8 | 7 3.52 23.62
9 | 7 3.52 27.14
10 | 11 5.53 32.66
11 | 13 6.53 39.20
12 | 6 3.02 42.21
13 | 9 4.52 46.73
14 | 8 4.02 50.75
15 | 10 5.03 55.78
16 | 13 6.53 62.31
17 | 10 5.03 67.34
18 | 10 5.03 72.36
19 | 12 6.03 78.39
20 | 9 4.52 82.91
21 | 5 2.51 85.43
22 | 10 5.03 90.45
23 | 9 4.52 94.97
24 | 7 3.52 98.49
25 | 1 0.50 98.99
33 | 1 0.50 99.50
77 | 1 0.50 100.00
------------+-----------------------------------
Total | 199 100.00
sort id
list if id == id[_n+1], clean
id mathtype lunch2 totalc
155. 1101 4 1 16
list if id == 1101, clean
id mathtype lunch2 totalc
155. 1101 4 1 16
156. 1101 4 2 12
list if id>3000 | id<100, clean
id mathtype lunch2 totalc
198. 3121 5 2 21
199. 3123 5 2 22
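The checks above can also be written more directly. A sketch, assuming the clean data are in memory and taking the valid values from the codebook:

```stata
* Report how many id values occur more than once, then list them
duplicates report id
duplicates list id

* Values outside the codebook ranges
list if !inlist(mathtype, 1, 2, 3, 4, 5, 6), clean
list if !inlist(lunch2, 1, 2), clean

* Missing values, if any, would also be listed here because
* Stata treats missing as larger than any number
list if totalc < 0 | totalc > 25, clean
```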
Phil Ender, 25Sep00