Ed231A
Multivariate Analysis

Introduction


Introduction to Education 231A

Multivariate Analysis

Instructor: Phil Ender

  • Email: ender@ucla.edu
  • Moore Hall 3030
  • (310) 206-3195

    Textbook:

  • Computer-Aided Multivariate Analysis (4th Edition)
    by Afifi, Clark and May
    Publisher: Chapman & Hall/CRC
    Year: 2004
    ISBN 1-58488-308-1

    You can view textbook examples for this book using several different statistical software packages at the ATS website: Afifi, Clark & May -- Textbook Examples.

    Topics Covered by Afifi et al. vs. Lecture

    Textbook                          Lecture
                                      matrix algebra
    simple linear regression          simple linear regression
    multiple linear regression        multiple linear regression
                                      multivariate multiple regression
                                      Hotelling's T-squared
                                      multivariate analysis of variance
    canonical correlation             canonical correlation
    discriminant analysis             discriminant analysis
    logistic regression               probit regression
    survival analysis
    principal components analysis     principal components analysis
    factor analysis                   factor analysis
    cluster analysis                  cluster analysis
    log-linear analysis

    Course Organization

  • No exams
  • 10 Computer Assignments
  • Programming using either Stata, SAS or R
  • Note: Class will meet on the Wednesday before Thanksgiving

    Electronic Support

    Multivariate Course Webpage

  • http://www.philender.com/courses/multivariate/
  • Syllabus
  • Lecture Notes
  • Help Sheets
  • Computer Assignments
  • ed231a_583244200_ender

    Lecture Notes

  • Lecture notes will be used in class.
  • Lecture notes will be available on the Multivariate Course website.

    About Assignments

  • Write your own programs
  • Make programs general
  • Include comments & labels

    Computers Running Stata

  • 16 Macs in Moore Hall*
  • 20 Macs in GSE&IS Building*
  • Macs & PCs in CLICC Labs in Powell Library
  • PCs in Social Sciences Computing Lab**

    *May Require Technology Fee
    **Social Science students only

    Relative Course Difficulty

    Let's get started...

    What makes a model multivariate?

  • Is multiple regression multivariate?
  • The Afifi, Clark & May view of multivariate.

    Every model has a left-hand side (lhs) and a right-hand side (rhs).

    lhs variables are response variables (the so-called dependent or outcome variables).
    rhs variables are predictor or explanatory variables (also known as independent variables).

    Here are two univariate models.

    And here are two multivariate models.

    For the purposes of this class, "multivariate" will be taken to mean models with multiple lhs variables.

    The concept of right hand side and left hand side equivalence.
    There are times when rhs variables and lhs variables can be exchanged and the two models yield the same results.

    Examples:
    /* multivariate anova -- female is a rhs variable */
    manova read write math = female
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                      female | W   0.8501      1     3.0   196.0    11.52 0.0000 e
                             | P   0.1499            3.0   196.0    11.52 0.0000 e
                             | L   0.1763            3.0   196.0    11.52 0.0000 e
                             | R   0.1763            3.0   196.0    11.52 0.0000 e
                             |--------------------------------------------------
                    Residual |               198
                  -----------+--------------------------------------------------
                       Total |               199
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
    
    /* OLS regression -- female is a lhs variable */
    /* in SAS: model female = read write math     */
    regress female read write math
    
          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  3,   196) =   11.52
           Model |  7.43351627     3  2.47783876           Prob > F      =  0.0000
        Residual |  42.1614837   196  .215109611           R-squared     =  0.1499
    -------------+------------------------------           Adj R-squared =  0.1369
           Total |      49.595   199  .249221106           Root MSE      =   .4638
    
    ------------------------------------------------------------------------------
          female |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |  -.0112975   .0045153    -2.50   0.013    -.0202023   -.0023926
           write |   .0270844   .0046522     5.82   0.000     .0179095    .0362593
            math |  -.0102947   .0050408    -2.04   0.042     -.020236   -.0003535
           _cons |   .2476519   .2099033     1.18   0.239    -.1663071     .661611
    ------------------------------------------------------------------------------
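
The two runs above agree because, with a two-level grouping variable, Wilks' lambda from the manova equals 1 minus the R-squared from regressing the group indicator on the responses (0.8501 = 1 - 0.1499 in the output), so both yield the same F. The following is a minimal Python sketch of that identity, not a reproduction of the Stata computations: the data are invented for illustration (they are not the course data set), the helper names are hypothetical, and Python is used only because it needs no statistical package.

```python
# Hypothetical toy data: two scores (y1, y2) and a 0/1 group indicator g.
y1 = [52.0, 47.0, 60.0, 41.0, 55.0, 63.0, 58.0, 49.0]
y2 = [50.0, 44.0, 57.0, 46.0, 61.0, 65.0, 54.0, 59.0]
g = [0, 0, 0, 0, 1, 1, 1, 1]
n, p = len(g), 2  # n observations, p response variables


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def within(u, v):
    """Pooled within-group cross-products, using g to define the groups."""
    total = 0.0
    for level in (0, 1):
        uu = [a for a, k in zip(u, g) if k == level]
        vv = [b for b, k in zip(v, g) if k == level]
        total += cross(uu, vv)
    return total


# Total (T) and within-group (W) SSCP matrices; both are symmetric 2x2.
t11, t12, t22 = cross(y1, y1), cross(y1, y2), cross(y2, y2)
w11, w12, w22 = within(y1, y1), within(y1, y2), within(y2, y2)
det_t = t11 * t22 - t12 * t12
det_w = w11 * w22 - w12 * w12

# manova side (g on the rhs): Wilks' lambda = |W| / |T|.
wilks = det_w / det_t

# regress side (g on the lhs): R^2 = s' T^{-1} s / SS_g,
# where s holds the cross-products of g with y1 and y2.
s1, s2 = cross(g, y1), cross(g, y2)
quad = (s1 * (t22 * s1 - t12 * s2) + s2 * (t11 * s2 - t12 * s1)) / det_t
r2 = quad / cross(g, g)

# Exact F for a two-group comparison, computed from each side.
f_manova = (1 - wilks) / wilks * (n - p - 1) / p
f_regress = r2 / (1 - r2) * (n - p - 1) / p

print(round(wilks, 6), round(1 - r2, 6))
print(round(f_manova, 6), round(f_regress, 6))
```

Each printed pair should agree up to floating-point rounding, mirroring the matching F values in the two Stata tables.
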

    The role of matrix algebra in multivariate analysis.

    Matrix algebra gives us a concise and elegant way in which to represent multivariate models. If you are intimidated by it, please realize that the alternatives to matrix representation are worse.
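
As a rough illustration of that conciseness, here is a multiple regression model written first observation by observation and then in matrix form, followed by its multivariate counterpart. The notation is the common textbook convention and is an assumption here, not taken from the lecture: n observations, k predictors, p response variables.

```latex
\begin{align*}
\text{univariate, scalar form:} \quad
  & y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \varepsilon_i,
    \qquad i = 1, \dots, n \\
\text{univariate, matrix form:} \quad
  & \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},
    \qquad \mathbf{y}: n \times 1,\;
           \mathbf{X}: n \times (k+1),\;
           \boldsymbol{\beta}: (k+1) \times 1 \\
\text{multivariate, matrix form:} \quad
  & \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},
    \qquad \mathbf{Y}: n \times p,\;
           \mathbf{B}: (k+1) \times p,\;
           \mathbf{E}: n \times p
\end{align*}
```

Moving from one response variable to p of them changes the vectors y, beta and epsilon into matrices with p columns, while the form of the equation stays the same.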

    Consider this univariate multiple regression model

    Contrast it with this multivariate multiple regression model

    Some Examples of Multivariate Generalization of Univariate Models

    These examples are in stat package pseudo-code

    Classifying Multivariate Models

    I. Testing effects; discriminating among groups

    II. Simplification of variable structure; determining dimensionality; rank reduction

    III. Other

    Some Multivariate Analogs to Univariate Procedures

    To be a well-behaved multivariate analog, a multivariate procedure with one response variable should yield results equivalent to those of the univariate procedure.

    Examples:

    ttest write, by(female)
    
    Two-sample t test with equal variances
    
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        male |      91    50.12088    1.080274    10.30516    47.97473    52.26703
      female |     109    54.99083    .7790686    8.133715    53.44658    56.53507
    ---------+--------------------------------------------------------------------
    combined |     200      52.775    .6702372    9.478586    51.45332    54.09668
    ---------+--------------------------------------------------------------------
        diff |           -4.869947    1.304191               -7.441835   -2.298059
    ------------------------------------------------------------------------------
    Degrees of freedom: 198
    
                      Ho: mean(male) - mean(female) = diff = 0
    
         Ha: diff < 0               Ha: diff != 0              Ha: diff > 0
           t =  -3.7341                t =  -3.7341              t =  -3.7341
       P < t =   0.0001          P > |t| =   0.0002          P > t =   0.9999
    
    hotel write, by(female) notable
    
    2-group Hotelling's T-squared = 13.943308
    F test statistic: ((200-1-1)/(200-2)(1)) x 13.943308 = 13.943308
    
    H0: Vectors of means are equal for the two groups
                  F(1,198) =   13.9433
           Prob > F(1,198) =    0.0002
    
    display sqrt(r(T2))
    3.7340739
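
The square root of Hotelling's T-squared reproduces the magnitude of the t statistic (3.7341) because, with a single response variable, the pooled covariance matrix is 1x1 and T-squared reduces algebraically to t squared. A minimal Python sketch of that reduction follows; the scores are invented for illustration (not the course data), and the helper names are hypothetical.

```python
import math

# Hypothetical scores for two groups (not the course data).
males = [41.0, 52.0, 47.0, 60.0, 44.0]
females = [55.0, 63.0, 58.0, 49.0, 61.0, 57.0]


def mean(v):
    return sum(v) / len(v)


def ss(v):
    """Sum of squared deviations from the mean."""
    m = mean(v)
    return sum((x - m) ** 2 for x in v)


n1, n2 = len(males), len(females)
diff = mean(males) - mean(females)

# Pooled variance, as in the equal-variance two-sample t test.
pooled_var = (ss(males) + ss(females)) / (n1 + n2 - 2)

# Univariate: t = diff / SE(diff).
t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# Multivariate form with p = 1: T^2 = (n1*n2/(n1+n2)) * d' S^{-1} d,
# where S is the pooled covariance matrix (here a 1x1 "matrix").
t2 = (n1 * n2 / (n1 + n2)) * diff * (1 / pooled_var) * diff

print(round(t * t, 6), round(t2, 6))
```

The two printed values should agree up to floating-point rounding. The sign of t is lost in T-squared, which is why the Stata display above shows 3.7341 rather than -3.7341.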
    
    anova write prog
    
                               Number of obs =     200     R-squared     =  0.1776
                               Root MSE      = 8.63918     Adj R-squared =  0.1693
    
                      Source |  Partial SS    df       MS           F     Prob > F
                  -----------+----------------------------------------------------
                       Model |  3175.69786     2  1587.84893      21.27     0.0000
                             |
                        prog |  3175.69786     2  1587.84893      21.27     0.0000
                             |
                    Residual |  14703.1771   197   74.635417   
                  -----------+----------------------------------------------------
                       Total |   17878.875   199   89.843593   
    
    manova write = prog
    
                               Number of obs =     200
    
                               W = Wilks' lambda      L = Lawley-Hotelling trace
                               P = Pillai's trace     R = Roy's largest root
    
                      Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
                  -----------+--------------------------------------------------
                        prog | W   0.8224      2     2.0   197.0    21.27 0.0000 e
                             | P   0.1776            2.0   197.0    21.27 0.0000 e
                             | L   0.2160            2.0   197.0    21.27 0.0000 e
                             | R   0.2160            2.0   197.0    21.27 0.0000 e
                             |--------------------------------------------------
                    Residual |               197
                  -----------+--------------------------------------------------
                       Total |               199
                  --------------------------------------------------------------
                               e = exact, a = approximate, u = upper bound on F
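
The anova and manova runs above report the same F (21.27) because, with one response variable, the SSCP matrices are 1x1 and Wilks' lambda reduces to residual SS over total SS, that is, 1 - R-squared (0.8224 = 1 - 0.1776). A minimal Python sketch under that reading follows; the three groups and their scores are invented for illustration and are not the course data.

```python
# Hypothetical scores in three program groups (not the course data).
groups = {
    "general": [44.0, 51.0, 47.0, 55.0],
    "academic": [58.0, 62.0, 54.0, 65.0, 60.0],
    "vocation": [41.0, 46.0, 50.0, 43.0],
}

scores = [x for grp in groups.values() for x in grp]
n, k = len(scores), len(groups)
grand = sum(scores) / n

# Total, residual (within-group) and model (between-group) sums of squares.
ss_total = sum((x - grand) ** 2 for x in scores)
ss_resid = sum(
    (x - sum(grp) / len(grp)) ** 2 for grp in groups.values() for x in grp
)
ss_model = ss_total - ss_resid

# anova: F = MS_model / MS_residual.
f_anova = (ss_model / (k - 1)) / (ss_resid / (n - k))

# manova with one response: Wilks' lambda = |W| / |T| = SS_resid / SS_total.
wilks = ss_resid / ss_total
f_manova = (1 - wilks) / wilks * (n - k) / (k - 1)

print(round(f_anova, 6), round(f_manova, 6))
```

Both F values should match up to floating-point rounding, as they do in the Stata tables.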
    
    regress write read female
    
          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  2,   197) =   77.21
           Model |  7856.32118     2  3928.16059           Prob > F      =  0.0000
        Residual |  10022.5538   197  50.8759077           R-squared     =  0.4394
    -------------+------------------------------           Adj R-squared =  0.4337
           Total |   17878.875   199   89.843593           Root MSE      =  7.1327
    
    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
          female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
           _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
    ------------------------------------------------------------------------------
    
    display sqrt(.4394192130387506) /* multiple correlation */
    
    .66288703
    
    mvreg write = read female
    
    Equation          Obs  Parms        RMSE    "R-sq"          F        P
    ----------------------------------------------------------------------
    write             200      3    7.132735    0.4394   77.21062   0.0000
    
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    write        |
            read |   .5658869   .0493849    11.46   0.000      .468496    .6632778
          female |   5.486894   1.014261     5.41   0.000      3.48669    7.487098
           _cons |   20.22837   2.713756     7.45   0.000     14.87663    25.58011
    ------------------------------------------------------------------------------
    
    
    canon (write) (read female)
    
    Linear combinations for canonical correlation 1        Number of obs =     200
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    u            |
           write |    .105501   .0084684    12.46   0.000     .0888016    .1222004
    -------------+----------------------------------------------------------------
    v            |
            read |    .090063   .0078598    11.46   0.000     .0745639    .1055622
          female |   .8732598   .1614235     5.41   0.000     .5549397     1.19158
    ------------------------------------------------------------------------------
                                         (Standard errors estimated conditionally)
    Canonical correlations:
      0.6629
      
    display .66288703^2  /* canonical correlation squared */
      
    .43941921
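
The regress, mvreg and canon runs above agree because, with a single lhs variable, the canonical correlation is the multiple correlation: the best linear combination of the rhs variables is the regression prediction. A minimal Python sketch follows, computing the multiple correlation from fitted values and the canonical correlation from the (here 1x1) eigenvalue problem. The data and helper names are hypothetical, not the course data set.

```python
import math

# Hypothetical data (not the course data): one response, two predictors.
y = [52.0, 59.0, 33.0, 44.0, 61.0, 57.0, 46.0, 65.0]
x1 = [57.0, 68.0, 44.0, 63.0, 47.0, 50.0, 42.0, 73.0]
x2 = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Cross-product blocks: S_xx (2x2, symmetric), s_xy (2x1), s_yy (scalar).
a, b, d = cross(x1, x1), cross(x1, x2), cross(x2, x2)
c1, c2 = cross(x1, y), cross(x2, y)
syy = cross(y, y)
det = a * d - b * b

# Regression side: slopes from the normal equations, then fitted values.
b1 = (d * c1 - b * c2) / det
b2 = (a * c2 - b * c1) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
fitted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]

# Multiple correlation: the correlation between y and its fitted values.
multiple_r = cross(y, fitted) / math.sqrt(syy * cross(fitted, fitted))

# Canonical side: with one lhs variable, S_yy^{-1} S_yx S_xx^{-1} S_xy is
# 1x1, so its only eigenvalue is the squared canonical correlation.
eigenvalue = (c1 * b1 + c2 * b2) / syy
canonical_r = math.sqrt(eigenvalue)

print(round(multiple_r, 6), round(canonical_r, 6))
```

The two printed correlations should agree up to floating-point rounding, just as 0.6629 appears in both the regression and canonical correlation output above.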



    Phil Ender, 12jul07, 30sep05, 24jan05