Linear Statistical Models: Regression

Multiple Linear Regression


Example

use http://www.philender.com/courses/data/hsbdemo, clear

regress write read math science

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   57.30
       Model |  8353.98999     3  2784.66333           Prob > F      =  0.0000
    Residual |  9524.88501   196  48.5963521           R-squared     =  0.4673
-------------+------------------------------           Adj R-squared =  0.4591
       Total |   17878.875   199   89.843593           Root MSE      =  6.9711

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .2356606   .0691053     3.41   0.001     .0993751    .3719461
        math |   .3194791   .0756752     4.22   0.000     .1702369    .4687213
     science |   .2016571   .0690962     2.92   0.004     .0653896    .3379246
       _cons |   13.19155   3.068867     4.30   0.000     7.139308    19.24378
------------------------------------------------------------------------------
Interpretation

_cons = 13.19155 -- The predicted value of write when all of the predictors equal zero.

_b[read] = .2356606 -- For every one unit increase in read, the predicted value for write increases by .2356606 when all other variables in the model are held constant.

_b[math] = .3194791 -- For every one unit increase in math, the predicted value for write increases by .3194791 when all other variables in the model are held constant.

_b[science] = .2016571 -- For every one unit increase in science, the predicted value for write increases by .2016571 when all other variables in the model are held constant.
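
For example, using these estimates, the predicted writing score for a hypothetical student with read = 50, math = 50 and science = 50 is

    13.19155 + .2356606(50) + .3194791(50) + .2016571(50) ≈ 51.03

The same value can be computed in Stata from the stored coefficients:

display _b[_cons] + _b[read]*50 + _b[math]*50 + _b[science]*50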

Conditional Expectation

In the multiple regression model we can write the conditional expectation as E(y | x1, x2), which indicates that we are interested in the effect of variable x1 on the expected value of y while holding the variable x2 constant.

Regression Equation
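
In standard notation the regression equation for k predictors is

    y = a + b1x1 + b2x2 + ... + bkxk + e

where a is the intercept, the bj are the raw regression coefficients, and e is the error term.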

Prediction Equation
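
Dropping the error term gives the prediction equation for the fitted values:

    y' = a + b1x1 + b2x2 + ... + bkxk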

The Two Predictor Case
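
With two predictors the prediction equation is

    y' = a + b1x1 + b2x2

and the formulas in the remainder of this section are stated for this case.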

Squared Multiple Correlation

  • When r12 = 0

  • When r12 does not equal 0
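
    In standard notation the two cases work out as follows. When the predictors are uncorrelated, the squared multiple correlation is simply the sum of the squared zero-order correlations,

        R2y.12 = r2y1 + r2y2

    and when r12 ≠ 0, the overlap between the predictors must be removed,

        R2y.12 = (r2y1 + r2y2 - 2 ry1 ry2 r12)/(1 - r12^2)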

    Regression Coefficients
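
    A standard result gives the raw coefficients for the two predictor case in terms of the zero-order correlations and the standard deviations:

        b1 = [(ry1 - ry2 r12)/(1 - r12^2)](sy/s1)

        b2 = [(ry2 - ry1 r12)/(1 - r12^2)](sy/s2)

    where sy, s1 and s2 are the standard deviations of y, x1 and x2.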

    Sums of Squares
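
    The total sum of squares partitions according to R2:

        SSmodel = R2y.12 × SStotal        SSresidual = (1 - R2y.12) × SStotal

    In the output above, .4673 × 17878.875 ≈ 8354, which reproduces the Model SS of 8353.99 up to rounding in R2.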

    Raw Regression Coefficient vs Standardized Regression Coefficient

    b vs β

  • Use b with raw scores.
  • Use β with standard scores.
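
    The two kinds of coefficients differ only by a rescaling:

        βj = bj(sj/sy)        bj = βj(sy/sj)

    where sj is the standard deviation of predictor j and sy is the standard deviation of y.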

    Note

  • When r12 = 0 then β1 = ry1 & β2 = ry2

    Prediction Equation in Standardized Form
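
    With every variable in standard score form the intercept drops out, leaving

        zy' = β1z1 + β2z2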

    Beta Coefficients
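
    For the two predictor case the beta coefficients are

        β1 = (ry1 - ry2 r12)/(1 - r12^2)

        β2 = (ry2 - ry1 r12)/(1 - r12^2)

    When r12 = 0 these reduce to β1 = ry1 and β2 = ry2, as in the Note above.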

    More on Betas

    More on Squared Multiple Correlations
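
    A standard identity writes R2 as a weighted sum of the betas and the zero-order correlations:

        R2y.12 = β1ry1 + β2ry2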

    Even More Squared Multiple Correlation

    Variance of Estimate/Standard Error of Estimate

    The variance of estimate is also called the mean square error in the ANOVA summary table of the regression analysis.

    The standard error of estimate gives an indication of how far, on the average, observations fall from the regression line.
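
    In symbols, the variance of estimate is

        s2y.12 = SSresidual/(N - k - 1)

    and the standard error of estimate is its square root. In the output above, 9524.88501/196 = 48.5963521, the Residual MS, and its square root, 6.9711, is the Root MSE.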

    Testing the Model

    The Overall F-test

  • Tests whether R2 equals zero.
  • Tests the significance of the regression equation as a whole.
  • Tests that all b's are simultaneously zero.
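
    The F-ratio is the model mean square divided by the residual mean square. From the output above,

        F = 2784.66333/48.5963521 ≈ 57.30

    which is the F(3, 196) that Stata reports.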

    Interpreting Regression Coefficients

  • The regression coefficient for variable j indicates how much change there will be in the predicted score when there is a one-unit change in variable j with all of the other variables in the model held constant.

    Interpreting Standardized Regression Coefficients (Betas)

  • The standardized regression coefficient for variable j indicates how many standard deviations the predicted score will change when there is a one standard deviation change in variable j with all of the other variables in the model held constant.
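
    In Stata the standardized coefficients can be obtained directly with the beta option:

    regress write read math science, beta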

    Tests of Regression Coefficients

  • Tests b1 = 0 when all other variables in the equation are held constant.
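
    The test statistic is t = bj/se(bj) with N - k - 1 degrees of freedom. For read in the output above, .2356606/.0691053 ≈ 3.41, the reported t. In Stata:

    display _b[read]/_se[read]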

    About Tests of Regression Coefficients

  • Tests a single coefficient with all the others in the regression equation held constant.
  • The larger the r12 the larger the standard error of b, and thus, the lower the power of the t-test.
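
    This can be seen in the standard formula for the standard error of a raw coefficient in the two predictor case:

        se(b1) = sy.12/sqrt[Σ(x1 - x̄1)^2 (1 - r12^2)]

    where sy.12 is the standard error of estimate. As r12 approaches 1, the (1 - r12^2) term shrinks and the standard error grows without bound.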

    Note:

  • R2y.123 = R2y.321
  • When independent variables are correlated, the incremental proportion of variance accounted for by a single variable depends on, among other things, when the variable enters into the regression equation.
  • Further, assuming all variables are positively correlated, the later the entry point of the variable, the smaller the incremental proportion of variance accounted for.
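
    One way to see this in Stata is to enter the variables one at a time and compare the successive values of R-squared:

    regress write read
    regress write read math
    regress write read math science

    Entering the variables in the reverse order changes each variable's increment but not the final R-squared.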

    Comparing Variables

  • Due to different scales of measurement, b's generally cannot be used to compare variables.
  • Care must be taken in using β's to compare variables, since β's are affected by, among other things, the variability of the variables with which they are associated.

    Interpreting R2

    R2 has several interpretations:

  • R2 is the proportion of variance accounted for by the whole model.
  • R2 is the ratio of the model sum of squares to the total sum of squares.
  • R2 is a transformation of the F-ratio for the whole model.
                       R2/k
             F = ----------------
                 (1 - R2)/(N-k-1)
  • R2 is the Pearson correlation squared between the response variable and the predicted value.
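
    Each of these can be checked against the output above: R2 = 8353.98999/17878.875 ≈ .4673, the ratio of the Model SS to the Total SS, and F = (.4673/3)/((1 - .4673)/196) ≈ 57.3, matching the reported F(3, 196) = 57.30.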


    Linear Statistical Models Course

    Phil Ender, 29Jan98