Introduction to Research Design and Statistics

Effect Size


Statistical significance is certainly important, but it is not necessarily the most important consideration in evaluating research results. Statistical significance tells us only how likely results at least as extreme as those observed would be if chance alone were operating. Once we have established statistical significance, our concern should be with the effect size. Effect size is an indicator of how strong, or how important, our results are.

One common method of indicating effect size is to express the difference in means in units of standard deviations (not standard errors, as in the t-test). One approach uses the standard deviation of the control group (Glass' delta), but more commonly the pooled standard deviation (Cohen's d) is used. We will compute the pooled standard deviation as follows:

     sp = sqrt(((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2))
Thus, the effect size can be calculated as
     d = (mean1 - mean2)/sp
The sign of d shows only the direction of the difference, so you can ignore it when judging the size of the effect. (Technically, we are computing Hedges' g, but except for very small samples Hedges' g and Cohen's d are virtually equal.)

Cohen gives the following very rough guidelines for interpreting the effect size d:

     small    d = 0.2
     medium   d = 0.5
     large    d = 0.8

Here is an example using the hsb2 dataset, looking at the effect size of female (male versus female) on the variable write.
female |         N      mean        sd  variance
-------+----------------------------------------
  male |        91  50.12088  10.30516  106.1963
female |       109  54.99083  8.133715  66.15732
-------+----------------------------------------
The pooled standard deviation is sp = 9.18 and the effect size is d = -4.87/9.18 = -0.53, which is a medium effect.

Here are the details of the computation:

     sp = sqrt(((91-1)*106.2 + (109-1)*66.16)/(91+109-2))

     sp = sqrt((9558 + 7145.28)/198)

     sp = sqrt(16703.28/198)

     sp = sqrt(84.36)

     sp = 9.18

     d = (50.12 - 54.99)/9.18 = -4.87/9.18 = -0.53
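The hand computation above can be cross-checked with a few lines of code. This is only a sketch in Python rather than Stata; the function names pooled_sd and effect_size_d are made up for illustration, and the inputs are the unrounded values from the summary table, so intermediate figures may differ slightly from the rounded ones shown above.

```python
from math import sqrt

def pooled_sd(n1, var1, n2, var2):
    """Pooled standard deviation from group sizes and variances."""
    return sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

def effect_size_d(mean1, mean2, sp):
    """Difference in means expressed in pooled-standard-deviation units."""
    return (mean1 - mean2) / sp

# Summary statistics for write by female, copied from the table above
# (male first, female second).
sp = pooled_sd(91, 106.1963, 109, 66.15732)
d = effect_size_d(50.12088, 54.99083, sp)

print(f"sp = {sp:.2f}")
print(f"d  = {d:.2f}")
```

To two decimal places the results should agree with the values worked out by hand.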
Let's try again by comparing the mean of the academic group versus the others. We will use Stata to do much of the computation.
generate academic = prog==2

ttest read, by(academic)
 
Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      95    47.88421    .9429248    9.190494    46.01201    49.75641
       1 |     105     56.1619     .935769    9.588779    54.30624    58.01757
---------+--------------------------------------------------------------------
combined |     200       52.23    .7249921    10.25294    50.80035    53.65965
---------+--------------------------------------------------------------------
    diff |           -8.277694     1.33128                 -10.903   -5.652387
------------------------------------------------------------------------------
Degrees of freedom: 198

                      Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
       t =  -6.2178                t =  -6.2178              t =  -6.2178
   P < t =   0.0000          P > |t| =   0.0000          P > t =   1.0000
 
tabstat read, stat(n mean sd var) by(academic)

Summary for variables: read
     by categories of: academic 

academic |         N      mean        sd  variance
---------+----------------------------------------
       0 |        95  47.88421  9.190494  84.46517
       1 |       105   56.1619  9.588779  91.94469
---------+----------------------------------------
   Total |       200     52.23  10.25294  105.1227
--------------------------------------------------
 
display sqrt(((95-1)*84.47+(105-1)*91.94)/(95+105-2))
9.401789

display (47.88-56.16)/9.4
-.88085106
In this example, d = -0.88 which is a large effect size.
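Because the equal-variance t statistic divides the same mean difference by sp*sqrt(1/n1 + 1/n2), d can also be recovered from t as d = t*sqrt(1/n1 + 1/n2). The sketch below (Python, not Stata; the variable names are illustrative) checks that both routes give about the same value for the academic example, using figures copied from the output above.

```python
from math import sqrt

# Values copied from the ttest and tabstat output above.
n0, n1 = 95, 105
mean0, mean1 = 47.88421, 56.1619
var0, var1 = 84.46517, 91.94469
t = -6.2178

# Direct route: difference in means over the pooled standard deviation.
sp = sqrt(((n0 - 1) * var0 + (n1 - 1) * var1) / (n0 + n1 - 2))
d_direct = (mean0 - mean1) / sp

# Route via t: t = diff / (sp * sqrt(1/n0 + 1/n1)),
# so d = t * sqrt(1/n0 + 1/n1).
d_from_t = t * sqrt(1 / n0 + 1 / n1)

print(f"d (direct) = {d_direct:.3f}")
print(f"d (from t) = {d_from_t:.3f}")
```

This shortcut applies only to the pooled-variance t-test shown here, not to an unequal-variance version.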

Here is some more detailed information on effect size.

Effect size can also be thought of as the percentile standing of the average treatment (or experimental) participant relative to the average untreated (or control) participant. A d of 0.0 indicates that the mean of the treatment group is at the 50th percentile of the control group. A d of 0.8 indicates that the mean of the treatment group is at the 79th percentile of the control group. A d of 1.7 indicates that the mean of the treatment group is at the 95.5th percentile of the control group.

Effect sizes can also be interpreted in terms of the percent of nonoverlap of the treatment group's scores with those of the untreated group. A d of 0.0 indicates that the distribution of scores for the treatment group overlaps completely with the distribution of scores for the control group; that is, there is 0% nonoverlap. A d of 0.8 indicates a nonoverlap of 47.4% in the two distributions. A d of 1.7 indicates a nonoverlap of 75.4% in the two distributions.

Cohen's  | Effect    Percentile   Percent
Standard | Size d    Standing     Nonoverlap
---------+----------------------------------
         |   2.0       97.7         81.1%
         |   1.9       97.1         79.4%
         |   1.8       96.4         77.4%
         |   1.7       95.5         75.4%
         |   1.6       94.5         73.1%
         |   1.5       93.3         70.7%
         |   1.4       91.9         68.1%
         |   1.3       90           65.3%
         |   1.2       88           62.2%
         |   1.1       86           58.9%
         |   1.0       84           55.4%
         |   0.9       82           51.6%
   large |   0.8       79           47.4%
         |   0.7       76           43.0%
         |   0.6       73           38.2%
  medium |   0.5       69           33.0%
         |   0.4       66           27.4%
         |   0.3       62           21.3%
   small |   0.2       58           14.7%
         |   0.1       54            7.7%
         |   0.0       50            0%
---------+----------------------------------
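
The percentile and nonoverlap figures can be approximated from the normal curve. The sketch below is in Python and rests on assumptions not stated in the text: both groups are taken to be normally distributed with equal standard deviations, the percentile standing is computed as the normal CDF at d, and the nonoverlap is computed with Cohen's U1 formula, which appears to match the tabled values. The function names are made up for illustration.

```python
from statistics import NormalDist

def percentile_standing(d):
    """Percentile of the control distribution at the treatment-group mean."""
    return 100 * NormalDist().cdf(d)

def percent_nonoverlap(d):
    """Cohen's U1: percent of the combined area not shared by the two curves."""
    p = NormalDist().cdf(abs(d) / 2)
    return 100 * (2 * p - 1) / p

# Print a few rows to compare with the tabled values.
for d in (0.2, 0.5, 0.8, 1.7):
    print(f"d = {d:.1f}  percentile = {percentile_standing(d):.1f}  "
          f"nonoverlap = {percent_nonoverlap(d):.1f}%")
```

If the scores are far from normal, or the group variances differ substantially, these figures are only rough guides.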



Phil Ender, 12Nov03