Introduction to Research Design and Statistics

Measuring Association


Scatterplots

Plotting Two Variables Simultaneously

Scatterplots plot two variables simultaneously, that is, they display the joint distribution (also called a bivariate distribution) of two variables. A scatterplot that appears as a random circular cluster of points indicates little association between the variables, which are then said to have a low degree of correlation (a correlation close to zero). As the degree of association, and thus the correlation, increases, the circular cluster becomes more elliptical. The narrower the ellipse, the higher the correlation between the two variables and the better one variable can be predicted from the other.

If the main axis of the ellipse slopes up to the right, the association is positive; if it slopes down to the right, the association is negative. With positive correlations, higher values on one variable are associated with higher values on the other variable. When the correlation is negative, higher values on one variable are associated with lower values on the other.

Selected Scatter Plots
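
The original plots are not reproduced here. As a rough stand-in, the sketch below simulates two standard normal variables with a chosen population correlation and plots them. The variable names x and y, the seed, the sample size, and the value of rho are all illustrative assumptions, not part of the course data.

```stata
* Simulate 500 points with population correlation rho and plot them.
* All names and values here are illustrative assumptions.
clear
set seed 2718
set obs 500
local rho = 0.8
generate x = rnormal()
* y shares the proportion rho^2 of its variance with x
generate y = `rho'*x + sqrt(1 - `rho'^2)*rnormal()
correlate x y
scatter y x
```

Changing rho to a value near 0 should give a roughly circular cloud, and a negative value should tilt the ellipse down to the right.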

Pearson Product Moment Correlation Coefficient

Also known as the Pearson correlation coefficient, or just the correlation coefficient.

Correlation coefficients can take on any value from -1 to +1, with ±1 representing a perfect linear relationship between the variables. A correlation of zero represents no linear relationship between the variables.

A rule of thumb for interpreting the absolute value of a correlation coefficient:

 Corr     Interpretation
 0 to .1  trivial
.1 to .3  small
.3 to .5  moderate
.5 to .7  large
.7 to .9  very large

Correlations are interpreted by squaring the value of the correlation coefficient. The squared value represents the proportion of the variance of one variable that is shared with the other variable; in other words, it is the proportion of the variance of one variable that can be predicted from the other variable.

The squared correlation coefficient, r², is known as the coefficient of determination. The proportion of variance that cannot be predicted or accounted for by the other variable is 1 - r², which is also known as the coefficient of alienation.
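
As a small worked example, the lines below square the correlation of 0.5968 between read and write that appears in the Stata output later in these notes. The local macro name r is just an illustrative choice.

```stata
* r between read and write, copied from the correlate output in these notes
local r = 0.5968
* proportion of variance shared by the two variables (about 0.356)
display "r^2     = " %6.4f (`r'^2)
* proportion of variance not accounted for (about 0.644)
display "1 - r^2 = " %6.4f (1 - `r'^2)
```

So reading scores account for roughly 36 percent of the variance in writing scores, and about 64 percent is left unexplained.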

Percent of Variance Accounted For

Correlation and Sample Size

Correlation coefficients computed from small samples are unreliable. The following table gives the recommended sample size for detecting various correlations with power = 0.80 at alpha = 0.05.

corr   n
.10   617              
.20   153              
.30    68             
.40    37             
.50    22             
.60    15              
.70    10              
.80     7              
.90     5 
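
Sample sizes of this kind can be approximated with Fisher's z transformation. The sketch below assumes a one-sided test, which is not stated in these notes but comes close to the tabled values for small and moderate correlations. For larger correlations the approximation can differ from the table by a case or two, so treat it as a rough check rather than the method used to build the table.

```stata
* Approximate n needed to detect a population correlation r,
* assuming a one-sided test with alpha = .05 and power = .80.
* Fisher z approximation: n = ((z_alpha + z_power) / atanh(r))^2 + 3
foreach r in .1 .2 .3 .4 .5 .6 .7 .8 .9 {
    local n = ((invnormal(0.95) + invnormal(0.80)) / atanh(`r'))^2 + 3
    display %4.2f `r' "   " %5.0f ceil(`n')
}
```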

Covariance

Consider the variance as being the covariance of a variable with itself.
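
The formula that belonged here is not preserved in these notes. The standard sample covariance, written in notation assumed for this sketch, is:

```latex
s_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1},
\qquad
s_{xx} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = s_x^2
```

Setting Y equal to X turns the covariance into the variance, which is the sense in which the variance is the covariance of a variable with itself.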

The Sample Correlation Coefficient

In deviation score form.
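
The original formula is missing from these notes. The usual textbook form, with lowercase x and y assumed here to denote deviations from the means, is:

```latex
r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}},
\qquad
x = X - \bar{X}, \quad y = Y - \bar{Y}
```

Dividing the numerator and denominator by n - 1 shows that r is the covariance divided by the product of the two standard deviations.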

Calculator Computational Formula
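
The formula itself is not preserved here. The raw-score version commonly given for hand calculation, in notation assumed for this sketch, is:

```latex
r = \frac{n \sum XY - \sum X \sum Y}
         {\sqrt{\left[ n \sum X^2 - \left( \sum X \right)^2 \right]
                \left[ n \sum Y^2 - \left( \sum Y \right)^2 \right]}}
```

It is algebraically equivalent to the deviation score form but needs only the sums of X, Y, XY, X², and Y².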

Sources of Misleading Correlation Coefficients

  • Restriction of Range
  • Extreme Groups
  • Combining Groups
  • Outliers
  • Curvilinearity

    Restriction of Range
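
A minimal simulated illustration, assuming a population correlation of 0.6 between two made-up variables x and y (the seed, sample size, and cutoff are arbitrary choices): keeping only cases with high x values shrinks the observed correlation.

```stata
* Simulated data with a population correlation of 0.6 (illustrative values)
clear
set seed 12345
set obs 1000
generate x = rnormal()
generate y = 0.6*x + sqrt(1 - 0.6^2)*rnormal()
* correlation over the full range of x
correlate x y
scalar r_full = r(rho)
* correlation when only cases with x above 1 are kept
correlate x y if x > 1
scalar r_restricted = r(rho)
display "full range: " %6.3f r_full "   restricted: " %6.3f r_restricted
```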

    Extreme Groups

    Combining Groups

    Outliers
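
A small made-up data set can show how much a single case matters. In the sketch below, the first five points have a strong negative correlation, and adding one extreme point reverses the sign. The numbers are invented for illustration only.

```stata
* Five points with a negative relationship plus one extreme point
clear
input x y
1 5
2 3
3 4
4 2
5 1
20 20
end
* first five cases only: r = -0.90
correlate x y in 1/5
* all six cases, including the outlier: r is about +0.92
correlate x y
```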

    Curvilinearity
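
The correlation coefficient measures only linear association. In the invented example below, y is completely determined by x, yet the correlation is zero because the relationship is a symmetric curve.

```stata
* y is an exact function of x, but the relationship is not linear
clear
set obs 21
generate x = _n - 11
generate y = x^2
* x runs from -10 to 10, so the linear correlation is zero
correlate x y
scatter y x
```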

    Discuss Correlation & Causation

    Of course, just because two variables are correlated does not mean that they are causally related. Often a third variable that is not included in the analysis, a lurking variable, is responsible for (causes) both of the first two variables. A lurking variable is a variable that loiters in the background and affects both of the original variables.
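
A simulated sketch of a lurking variable, using made-up variables x, y, and z: x and y are each built from z plus independent noise, so neither causes the other, yet they correlate at about 0.5.

```stata
* z is the lurking variable; x and y depend on z but not on each other
clear
set seed 31415
set obs 1000
generate z = rnormal()
generate x = z + rnormal()
generate y = z + rnormal()
* the population correlation between x and y is 0.5 by construction
correlate x y
```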

    Stata Examples

    use http://www.philender.com/courses/data/hsb2, clear
    
    correlate  female read write math science socst, cov
    (obs=200)
    
             |   female     read    write     math  science    socst
    ---------+------------------------------------------------------
      female |  .249221
        read | -.271709  105.123
       write |  1.21369  57.9967  89.8436
        math | -.137211  63.6147  54.8293  87.7678
     science | -.631407  63.9693  53.5339  58.5043  98.0276
       socst |  .280678  68.4089  61.5438  54.7626  49.4379  115.257
       
    correlate  female read write math science socst
    (obs=200)
    
             |   female     read    write     math  science    socst
    ---------+------------------------------------------------------
      female |   1.0000
        read |  -0.0531   1.0000
       write |   0.2565   0.5968   1.0000
        math |  -0.0293   0.6623   0.6174   1.0000
     science |  -0.1277   0.6302   0.5704   0.6307   1.0000
       socst |   0.0524   0.6215   0.6048   0.5445   0.4651   1.0000   
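
The two matrices above are linked: each correlation is the covariance divided by the product of the two standard deviations. As a quick check, using the values printed in the covariance matrix for read and write:

```stata
* cov(read, write) / sqrt(var(read) * var(write)), values from the output above
* the result rounds to 0.5968, matching the correlation matrix
display %6.4f 57.9967 / sqrt(105.123 * 89.8436)
```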
       
    scatter write read   
    
    
    
    
    
    scatter write read, jitter(2)  
    
    
    
    twoway scatter write read, jitter(2) || lfit write read
    
    
    
    use http://www.philender.com/courses/data/hsb1, clear  /* contains missing data */
    
    pwcorr read write math science socst, obs star(.05)
    
              |     read    write     math  science    socst
    ----------+---------------------------------------------
         read |   1.0000 
              |      200
              |
        write |   0.5968*  1.0000 
              |      200      200
              |
         math |   0.6623*  0.6174*  1.0000 
              |      200      200      200
              |
      science |   0.6171*  0.5671*  0.6166*  1.0000 
              |      195      195      195      195
              |
        socst |   0.6215*  0.6048*  0.5445*  0.4529*  1.0000 
              |      200      200      200      195      200
    



    Phil Ender, 15Jan98