Centering a variable involves subtracting the mean from each of the scores, that is, creating deviation scores. Centering can be done in two ways: 1) centering using the grand mean and 2) centering using group means, which is also known as context centering.
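The two kinds of deviation scores can be written out as follows. The notation is an assumption for illustration and does not come from the handout: i indexes individuals, j indexes groups, and the superscripts c and gc simply label the centered versions.

```latex
% Grand-mean centering: subtract the overall mean from every score
\[
x^{c}_{i} = x_{i} - \bar{x}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i}
\]

% Group-mean (context) centering: subtract the mean of the score's own group j
\[
x^{gc}_{ij} = x_{ij} - \bar{x}_{j}, \qquad \bar{x}_{j} = \frac{1}{n_{j}}\sum_{i=1}^{n_{j}} x_{ij}
\]
```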
Centering using the grand mean
We will illustrate issues surrounding centering using the hsbdemo dataset. We will begin by interpreting the constant in simple linear regression.
use http://www.philender.com/courses/data/hsbdemo, clear
summarize socst
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
socst | 200 52.405 10.73579 26 71
generate csocst = socst - r(mean)
regress write socst
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 114.19
Model | 6539.6427 1 6539.6427 Prob > F = 0.0000
Residual | 11339.2323 198 57.26885 R-squared = 0.3658
-------------+------------------------------ Adj R-squared = 0.3626
Total | 17878.875 199 89.843593 Root MSE = 7.5676
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
socst | .5339693 .0499688 10.69 0.000 .4354301 .6325086
_cons | 24.79234 2.672728 9.28 0.000 19.52167 30.063
------------------------------------------------------------------------------
regress write csocst
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 114.19
Model | 6539.64271 1 6539.64271 Prob > F = 0.0000
Residual | 11339.2323 198 57.2688499 R-squared = 0.3658
-------------+------------------------------ Adj R-squared = 0.3626
Total | 17878.875 199 89.843593 Root MSE = 7.5676
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
csocst | .5339693 .0499688 10.69 0.000 .4354301 .6325086
_cons | 52.775 .5351114 98.62 0.000 51.71975 53.83025
------------------------------------------------------------------------------
summarize write
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
write | 200 52.775 9.478586 31 67
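The constant in the centered model equals the mean of write, as the summarize output shows. A short worked derivation, using the coefficients reported above, makes the reason explicit. The symbols b_0, b_1 and b_0^c are labels introduced here for the uncentered constant, the slope, and the centered constant.

```latex
% Rewrite the uncentered prediction in terms of the centered predictor
\[
\hat{y} = b_0 + b_1 x = (b_0 + b_1\bar{x}) + b_1 (x - \bar{x})
\]

% The centered constant is the prediction at x = \bar{x}
\[
b_0^{c} = b_0 + b_1\bar{x} = 24.79234 + 0.5339693 \times 52.405 \approx 52.775 = \bar{y}
\]
```

The slope is untouched; only the point at which the constant is evaluated moves from socst = 0 (outside the observed range of 26 to 71) to the mean of socst.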
Now, let's examine a model that includes an interaction.
regress write i.female##c.socst
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 49.26
Model | 7685.43528 3 2561.81176 Prob > F = 0.0000
Residual | 10193.4397 196 52.0073455 R-squared = 0.4299
-------------+------------------------------ Adj R-squared = 0.4211
Total | 17878.875 199 89.843593 Root MSE = 7.2116
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female | 15.00001 5.09795 2.94 0.004 4.946132 25.05389
socst | .6247968 .0670709 9.32 0.000 .4925236 .7570701
|
female#|
c.socst |
1 | -.2047288 .0953726 -2.15 0.033 -.3928171 -.0166405
|
_cons | 17.7619 3.554993 5.00 0.000 10.75095 24.77284
------------------------------------------------------------------------------
regress write i.female##c.csocst
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 49.26
Model | 7685.43527 3 2561.81176 Prob > F = 0.0000
Residual | 10193.4397 196 52.0073456 R-squared = 0.4299
-------------+------------------------------ Adj R-squared = 0.4211
Total | 17878.875 199 89.843593 Root MSE = 7.2116
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.female | 4.271196 1.025448 4.17 0.000 2.248868 6.293523
csocst | .6247968 .0670709 9.32 0.000 .4925236 .7570701
|
female#|
c.csocst |
1 | -.2047288 .0953726 -2.15 0.033 -.3928171 -.0166405
|
_cons | 50.50437 .7571024 66.71 0.000 49.01126 51.99749
------------------------------------------------------------------------------
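The two interaction models are the same model with different reference points: in the uncentered fit, 1.female and _cons are evaluated at socst = 0, and in the centered fit at the mean of socst. As a minimal sketch, the centered values can be recovered from the uncentered fit with lincom. The local macro name m is an arbitrary choice made here.

```stata
* Sketch: recover the centered-model coefficients from the uncentered fit
use http://www.philender.com/courses/data/hsbdemo, clear

* store the grand mean of socst in a local macro (name m is arbitrary)
quietly summarize socst
local m = r(mean)

regress write i.female##c.socst

* female - male difference in predicted write at the mean of socst
lincom _b[1.female] + `m'*_b[1.female#c.socst]

* predicted write for males at the mean of socst
lincom _b[_cons] + `m'*_b[socst]
```

By hand, 15.00001 - .2047288*52.405 is about 4.27 and 17.7619 + .6247968*52.405 is about 50.50, which agree with the 1.female and _cons rows of the centered table. The lincom results should match those rows.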
generate fxss = female*socst
generate fxcs = female*csocst
collin female socst fxss
Collinearity Diagnostics
SQRT Cond R-
Variable VIF VIF Tolerance Eigenval Index Squared
------------------------------------------------------------------------
female 24.78 4.98 0.0403 2.0054 1.0000 0.9597
socst 1.98 1.41 0.5041 0.9752 1.4340 0.4959
fxss 26.27 5.13 0.0381 0.0194 10.1638 0.9619
------------------------------------------------------------------------
Mean VIF 17.68 Condition Number 10.1638
Determinant of correlation matrix 0.0380
Cond
Eigenval Index
---------------------------------
1 3.4495 1.0000
2 0.5122 2.5950
3 0.0325 10.2981
4 0.0057 24.5654
---------------------------------
Condition Number 24.5654
Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
Det(correlation matrix) 0.0380
collin female csocst fxcs
Collinearity Diagnostics
SQRT Cond R-
Variable VIF VIF Tolerance Eigenval Index Squared
------------------------------------------------------------------------
female 1.00 1.00 0.9972 1.7089 1.0000 0.0028
csocst 1.98 1.41 0.5041 0.9950 1.3105 0.4959
fxcs 1.98 1.41 0.5049 0.2961 2.4024 0.4951
------------------------------------------------------------------------
Mean VIF 1.66 Condition Number 2.4024
Determinant of correlation matrix 0.5035
Cond
Eigenval Index
---------------------------------
1 1.7849 1.0000
2 1.6574 1.0378
3 0.2996 2.4410
4 0.2581 2.6295
---------------------------------
Condition Number 2.6295
Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
Det(correlation matrix) 0.5035
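The collin command used above is a user-written program and may not be installed. As a hedged alternative, the built-in estat vif postestimation command, together with a simple correlation matrix, should show the same pattern: the product term is highly correlated with female before centering and much less so afterward.

```stata
* Sketch: compare collinearity before and after grand-mean centering
use http://www.philender.com/courses/data/hsbdemo, clear

quietly summarize socst
generate csocst = socst - r(mean)
generate fxss = female*socst      // product with the raw score
generate fxcs = female*csocst     // product with the centered score

* pairwise correlations among the predictors
correlate female socst fxss
correlate female csocst fxcs

* variance inflation factors from the built-in postestimation command
quietly regress write female socst fxss
estat vif
quietly regress write female csocst fxcs
estat vif
```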
Next, let's examine a polynomial regression.
summarize write
Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
write | 200 52.775 9.478586 31 67
generate cwrite = write - r(mean)
generate write2 = write^2
generate cwrite2 = cwrite^2
regress math write
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 122.00
Model | 6658.72246 1 6658.72246 Prob > F = 0.0000
Residual | 10807.0725 198 54.5811744 R-squared = 0.3812
-------------+------------------------------ Adj R-squared = 0.3781
Total | 17465.795 199 87.7678141 Root MSE = 7.3879
------------------------------------------------------------------------------
math | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
write | .6102747 .0552524 11.05 0.000 .501316 .7192334
_cons | 20.43775 2.962373 6.90 0.000 14.5959 26.2796
------------------------------------------------------------------------------
regress math cwrite
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 122.00
Model | 6658.72254 1 6658.72254 Prob > F = 0.0000
Residual | 10807.0725 198 54.581174 R-squared = 0.3812
-------------+------------------------------ Adj R-squared = 0.3781
Total | 17465.795 199 87.7678141 Root MSE = 7.3879
------------------------------------------------------------------------------
math | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cwrite | .6102747 .0552524 11.05 0.000 .501316 .7192334
_cons | 52.645 .5224039 100.77 0.000 51.61481 53.67519
------------------------------------------------------------------------------
regress math c.write##c.write
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 2, 197) = 70.23
Model | 7269.48676 2 3634.74338 Prob > F = 0.0000
Residual | 10196.3082 197 51.7579098 R-squared = 0.4162
-------------+------------------------------ Adj R-squared = 0.4103
Total | 17465.795 199 87.7678141 Root MSE = 7.1943
------------------------------------------------------------------------------
math | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
write | -1.35518 .5746805 -2.36 0.019 -2.488496 -.221865
|
c.write#|
c.write | .0194548 .0056634 3.44 0.001 .0082861 .0306235
|
_cons | 68.23992 14.21137 4.80 0.000 40.21397 96.26587
------------------------------------------------------------------------------
regress math c.cwrite##c.cwrite
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 2, 197) = 70.23
Model | 7269.48677 2 3634.74339 Prob > F = 0.0000
Residual | 10196.3082 197 51.7579098 R-squared = 0.4162
-------------+------------------------------ Adj R-squared = 0.4103
Total | 17465.795 199 87.7678141 Root MSE = 7.1943
------------------------------------------------------------------------------
math | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cwrite | .6982757 .0595918 11.72 0.000 .580756 .8157955
|
c.cwrite#|
c.cwrite | .0194548 .0056634 3.44 0.001 .0082861 .0306235
|
_cons | 50.90585 .7177094 70.93 0.000 49.49047 52.32123
------------------------------------------------------------------------------
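The quadratic coefficient is identical in the two polynomial fits, while the linear coefficient and the constant change. Substituting write = cwrite + 52.775 into the uncentered equation shows where the centered values come from. The symbols w, c and b_0, b_1, b_2 are labels introduced here for write, cwrite and the uncentered coefficients.

```latex
% Expand the uncentered quadratic around the mean \bar{w} = 52.775
\[
b_0 + b_1 w + b_2 w^2
  = \left(b_0 + b_1\bar{w} + b_2\bar{w}^2\right)
  + \left(b_1 + 2 b_2 \bar{w}\right) c
  + b_2 c^2 , \qquad c = w - \bar{w}
\]

% Linear coefficient of the centered model
\[
b_1 + 2 b_2 \bar{w} = -1.35518 + 2(0.0194548)(52.775) \approx 0.6983
\]

% Constant of the centered model
\[
b_0 + b_1\bar{w} + b_2\bar{w}^2
  = 68.23992 - 1.35518(52.775) + 0.0194548(52.775)^2 \approx 50.906
\]
```

In the uncentered fit, -1.355 is the slope of the curve at write = 0, far below the observed minimum of 31. In the centered fit, .698 is the slope at the mean of write.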
collin write write2
Collinearity Diagnostics
SQRT Cond R-
Variable VIF VIF Tolerance Eigenval Index Squared
------------------------------------------------------------------------
write 114.08 10.68 0.0088 1.9956 1.0000 0.9912
write2 114.08 10.68 0.0088 0.0044 21.3149 0.9912
------------------------------------------------------------------------
Mean VIF 114.08 Condition Number 21.3149
Determinant of correlation matrix 0.0088
Cond
Eigenval Index
---------------------------------
1 2.9482 1.0000
2 0.0516 7.5593
3 0.0002 128.1167
---------------------------------
Condition Number 128.1167
Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
Det(correlation matrix) 0.0088
collin cwrite cwrite2
Collinearity Diagnostics
SQRT Cond R-
Variable VIF VIF Tolerance Eigenval Index Squared
------------------------------------------------------------------------
cwrite 1.23 1.11 0.8152 1.4299 1.0000 0.1848
cwrite2 1.23 1.11 0.8152 0.5701 1.5837 0.1848
------------------------------------------------------------------------
Mean VIF 1.23 Condition Number 1.5837
Determinant of correlation matrix 0.8152
Cond
Eigenval Index
---------------------------------
1 1.7409 1.0000
2 1.0000 1.3194
3 0.2591 2.5922
---------------------------------
Condition Number 2.5922
Eigenvalues & Cond Index computed from scaled raw sscp (w/ intercept)
Det(correlation matrix) 0.8152
Centering scores is a technique that is recommended by some (Aiken & West, 1991; Bryk & Raudenbush, 1991) and viewed as unnecessary by others (Kromrey & Foster-Johnson, 1998; Pedhazur, 1997). Katrichis (1992) views centering negatively, arguing that it produces systematically biased estimates of main effects. The arguments in favor of centering revolve primarily around 1) the greater ease of interpreting the coefficients and 2) reducing collinearity. As to reducing collinearity, modern statistical packages have sufficient numerical accuracy to estimate parameters for product and power variables without centering. As the outputs above show, grand-mean centering changes neither the fit of the model nor the highest-order coefficient; it changes only what the lower-order coefficients and the constant refer to.
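The regression outputs earlier in this section report identical R-squared values for the raw and grand-mean-centered models. One way to check that the two fits are the same model, sketched here with variable names (yhat_raw, yhat_centered, diff) chosen only for illustration, is to compare their fitted values directly.

```stata
* Sketch: fitted values from the raw and centered interaction models
use http://www.philender.com/courses/data/hsbdemo, clear

quietly summarize socst
generate csocst = socst - r(mean)

quietly regress write i.female##c.socst
predict yhat_raw            // linear prediction from the raw model

quietly regress write i.female##c.csocst
predict yhat_centered       // linear prediction from the centered model

* absolute difference between the two sets of predictions
generate diff = abs(yhat_raw - yhat_centered)
summarize diff
```

The differences should be zero up to floating-point rounding.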
Centering using group means
In this section we will center the socst variable using the group means for males and females.
egen grmean = mean(socst), by(female)
generate grcss = socst - grmean
tabstat write, by(female) stat(n mean)
Summary for variables: write
by categories of: female
female | N mean
-------+--------------------
male | 91 50.12088
female | 109 54.99083
-------+--------------------
Total | 200 52.775
----------------------------
regress write socst
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 114.19
Model | 6539.6427 1 6539.6427 Prob > F = 0.0000
Residual | 11339.2323 198 57.26885 R-squared = 0.3658
-------------+------------------------------ Adj R-squared = 0.3626
Total | 17878.875 199 89.843593 Root MSE = 7.5676
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
socst | .5339693 .0499688 10.69 0.000 .4354301 .6325086
_cons | 24.79234 2.672728 9.28 0.000 19.52167 30.063
------------------------------------------------------------------------------
regress write grcss
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 1, 198) = 106.93
Model | 6269.57313 1 6269.57313 Prob > F = 0.0000
Residual | 11609.3019 198 58.6328377 R-squared = 0.3507
-------------+------------------------------ Adj R-squared = 0.3474
Total | 17878.875 199 89.843593 Root MSE = 7.6572
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grcss | .5235458 .0506298 10.34 0.000 .423703 .6233886
_cons | 52.775 .5414464 97.47 0.000 51.70726 53.84274
------------------------------------------------------------------------------
In this model, an individual who scores at their group mean on socst (grcss = 0) has a predicted write score of 52.775. For each one-point increase in grcss, the predicted write score increases by .52.
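The constant is again the grand mean of write because grcss, like csocst, has a mean of zero in the full sample; unlike csocst, it also has a mean of zero within each group. A minimal sketch for checking this:

```stata
* Sketch: group-mean-centered scores average zero within and across groups
use http://www.philender.com/courses/data/hsbdemo, clear

egen grmean = mean(socst), by(female)
generate grcss = socst - grmean

* group means of the raw and the group-centered scores
tabstat socst grcss, by(female) stat(n mean)

* overall mean of the group-centered scores
summarize grcss
```

Up to rounding, the grcss means should be zero for males, for females, and in total.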
regress write grcss female
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 2, 197) = 70.30
Model | 7445.78654 2 3722.89327 Prob > F = 0.0000
Residual | 10433.0885 197 52.9598399 R-squared = 0.4165
-------------+------------------------------ Adj R-squared = 0.4105
Total | 17878.875 199 89.843593 Root MSE = 7.2774
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grcss | .5235458 .0481182 10.88 0.000 .428653 .6184386
female | 4.869946 1.033367 4.71 0.000 2.832065 6.907826
_cons | 50.12088 .7628737 65.70 0.000 48.61643 51.62533
------------------------------------------------------------------------------
Here the predicted score for males at their group mean is 50.12. For females at their group mean, the predicted score is 54.99 (50.12 + 4.87). The slope of the regression line is the same for males and females, .52.
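These predictions coincide with the raw group means of write reported by tabstat. Because grcss averages zero within each group, the constant and the female coefficient reduce to the male mean and the difference between the group means. The subscript labels below are introduced here for illustration.

```latex
% Constant: predicted write for males at grcss = 0
\[
b_0 = \bar{y}_{\text{male}} = 50.12088
\]

% Female coefficient: difference between the group means of write
\[
b_{\text{female}} = \bar{y}_{\text{female}} - \bar{y}_{\text{male}}
  = 54.99083 - 50.12088 = 4.86995
\]
```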
regress write c.grcss##i.female
Source | SS df MS Number of obs = 200
-------------+------------------------------ F( 3, 196) = 49.26
Model | 7685.43528 3 2561.81176 Prob > F = 0.0000
Residual | 10193.4397 196 52.0073455 R-squared = 0.4299
-------------+------------------------------ Adj R-squared = 0.4211
Total | 17878.875 199 89.843593 Root MSE = 7.2116
------------------------------------------------------------------------------
write | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
grcss | .6247968 .0670709 9.32 0.000 .4925236 .7570701
1.female | 4.869946 1.024032 4.76 0.000 2.85041 6.889481
|
female#|
c.grcss |
1 | -.2047288 .0953726 -2.15 0.033 -.3928171 -.0166405
|
_cons | 50.12088 .7559823 66.30 0.000 48.62998 51.61178
------------------------------------------------------------------------------
Again, the predicted score for males at their group mean is 50.12, and the predicted score for females at their group mean is 54.99 (50.12 + 4.87). The slope for males is .62 and the slope for females is .42 (.62 - .20).
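The group-specific slopes can also be obtained directly, with standard errors, rather than by adding coefficients by hand. The following is a sketch using the built-in margins and lincom commands, not part of the original handout.

```stata
* Sketch: slope of grcss separately for males and females
use http://www.philender.com/courses/data/hsbdemo, clear

egen grmean = mean(socst), by(female)
generate grcss = socst - grmean

quietly regress write c.grcss##i.female

* slope of grcss at each level of female
margins female, dydx(grcss)

* slope for females as a linear combination of coefficients
lincom _b[grcss] + _b[1.female#c.grcss]
```

Both commands should give a slope of about .42 for females (.6247968 - .2047288); margins additionally reports the male slope of about .62.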
Linear Statistical Models Course
Phil Ender, 18feb02, 22dec00