### Collinearity Issues

First Thoughts

Many students and researchers are familiar with collinearity issues through the study of OLS regression. But concerns about collinearity are common to many types of statistical models including categorical and count models. Here are some first thoughts on the matter:

• Certainly, modern statistical software packages are capable of analyzing data with correlated independent variables.
• However, problems can arise when two or more predictor variables are highly intercorrelated.
• There is no consensus about the meaning of collinearity:
  • Is it any degree of correlation? or
  • Is it a matter of a high degree of intercorrelation?
  • What constitutes a high degree of intercorrelation?

Simple Collinearity

• When two variables are highly correlated.
• Can be detected by looking at the zero order correlations.
• Usually signaled by correlations in the .9's.

Multicollinearity

• Involves combinations of more than two variables.
• Variables that are uncorrelated are said to be orthogonal.
• Computation of regression coefficients involves inverting a matrix. If one variable is a perfect linear combination of two or more other variables then the inverse cannot be computed and the matrix is said to be singular.
• Example: sat total = sat verbal + sat math
• In matrix terms, a linear dependency exists when a row (or column) of a matrix can be obtained as a linear combination of other rows (or columns).
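
The SAT example can be checked numerically. The following is a minimal sketch in Python, not part of the original notes: the five pairs of scores are made up for illustration, and `determinant` and `cross_products` are hypothetical helper names. It builds the cross-product matrix X'X (without an intercept column, for brevity) and shows that the determinant is exactly zero once sat total is included, so the matrix has no inverse.

```python
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination, using exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no usable pivot: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def cross_products(columns):
    """X'X for a list of equal-length data columns."""
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in columns]
            for ci in columns]


# Made-up scores for five students (illustrative only, not real data).
sat_verbal = [520, 610, 480, 700, 560]
sat_math = [540, 590, 510, 680, 620]
sat_total = [v + m for v, m in zip(sat_verbal, sat_math)]

det_two = determinant(cross_products([sat_verbal, sat_math]))
det_three = determinant(cross_products([sat_verbal, sat_math, sat_total]))

print("verbal, math: determinant is nonzero ->", det_two != 0)
print("verbal, math, total: determinant is zero ->", det_three == 0)
```

Exact fractions are used so that the zero determinant is not blurred by floating-point rounding.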

Common Indicators of Collinearity

• VIF -- variance inflation factor
  • VIF values are large
  • individual VIF greater than 10 should be inspected
  • average VIF greater than 6
• tolerance
  • tolerance values are small, close to zero
  • tolerance less than .1
  • tolerance = 1/VIF

Other Indicators of Collinearity

• Condition index -- large values
• Condition number -- large values
• Eigenvalues -- small values, close to zero
• Determinant of correlation matrix -- very small, close to zero
• Diagonal of R⁻¹ (inverse of the correlation matrix) -- large values indicate collinearity; values close to one are good. These diagonal elements are the VIFs.
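
As a rough illustration of how these indicators relate, the sketch below (Python, standard library only) inverts a small correlation matrix and reads the VIFs off the diagonal of the inverse, then takes tolerance as 1/VIF. The correlation values and the helper name `invert` are assumptions chosen for illustration; they are not output from any of the data sets used in these notes.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix on the right.
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


# Illustrative correlations among three predictors; x2 and x3 correlate at .85.
R = [[1.00, 0.20, 0.20],
     [0.20, 1.00, 0.85],
     [0.20, 0.85, 1.00]]

R_inv = invert(R)
vif = [R_inv[i][i] for i in range(len(R))]   # diagonal of the inverse
tolerance = [1.0 / v for v in vif]           # tolerance = 1/VIF

for name, v, t in zip(["x1", "x2", "x3"], vif, tolerance):
    print(f"{name}: VIF = {v:.2f}  tolerance = {t:.4f}")
```

Even with a correlation of .85 between two predictors, the VIFs in this sketch stay well under 10, which suggests the usual cutoffs flag only fairly severe collinearity.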

Effects of Collinearity

• Imprecise estimates of regression coefficients.
• Slight fluctuations in correlation may lead to large differences in regression coefficients.
• Adding or dropping cases may lead to large differences in regression coefficients.
• Increases the standard errors of the coefficients, thus reducing the power of significance tests.

Checking for Collinearity in Stata

• Use the vif command (estat vif in current versions of Stata) after the regress command.
• Also, use the collin program, which can be downloaded from UCLA ATS over the Internet.

Stata Example Using -collin-

Most statistical software packages have options associated with their regression programs that are designed to check for collinearity problems. But since collinearity is a property of the set of predictor variables, it is not necessary to run regression in order to check for high collinearity. The -collin- command (findit collin) will compute a number of collinearity diagnostics.

```
use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

collin female schtyp read write math science socst

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
    female      1.25    1.12    0.8027      0.1973
    schtyp      1.02    1.01    0.9819      0.0181
     write      2.52    1.59    0.3962      0.6038
      math      2.28    1.51    0.4378      0.5622
   science      2.12    1.46    0.4717      0.5283
     socst      1.91    1.38    0.5224      0.4776
----------------------------------------------------
  Mean VIF      1.94

                          Cond
        Eigenval          Index
---------------------------------
    1     3.4004          1.0000
    2     1.1347          1.7311
    3     0.9782          1.8644
    4     0.5229          2.5502
    5     0.3577          3.0831
    6     0.3299          3.2104
    7     0.2762          3.5087
---------------------------------
Condition Number         3.5087
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix)    0.0643

use http://www.philender.com/courses/data/lahigh, clear

collin mathnce langnce mathpr langpr

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
   mathnce     24.20    4.92    0.0413      0.9587
   langnce     28.31    5.32    0.0353      0.9647
    mathpr     25.02    5.00    0.0400      0.9600
    langpr     29.09    5.39    0.0344      0.9656
----------------------------------------------------
  Mean VIF     26.65

                          Cond
        Eigenval          Index
---------------------------------
    1     3.3643          1.0000
    2     0.5926          2.3827
    3     0.0287         10.8179
    4     0.0143         15.3294
---------------------------------
Condition Number        15.3294
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix)    0.0008

collin mathnce langnce

  Collinearity Diagnostics

                        SQRT                   R-
  Variable      VIF     VIF    Tolerance    Squared
----------------------------------------------------
   mathnce      1.90    1.38    0.5256      0.4744
   langnce      1.90    1.38    0.5256      0.4744
----------------------------------------------------
  Mean VIF      1.90

                          Cond
        Eigenval          Index
---------------------------------
    1     1.6888          1.0000
    2     0.3112          2.3295
---------------------------------
Condition Number         2.3295
Eigenvalues & Cond Index computed from deviation sscp (no intercept)
Det(correlation matrix)    0.5256
```
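
With only two predictors, every diagnostic in the -collin- output has a simple closed form in terms of the correlation r between them. The Python sketch below is an illustration, not part of -collin- itself. It assumes r = 0.6888 for mathnce and langnce, a value inferred here from the reported eigenvalue of 1.6888 rather than stated in the output, and `two_predictor_diagnostics` is a hypothetical helper name.

```python
import math


def two_predictor_diagnostics(r):
    """Collinearity diagnostics for two predictors that correlate at r."""
    r_squared = r * r              # R-squared of one predictor on the other
    tolerance = 1.0 - r_squared    # tolerance = 1 - R-squared
    vif = 1.0 / tolerance          # VIF = 1 / tolerance
    # Eigenvalues of the 2 x 2 correlation matrix [[1, r], [r, 1]].
    largest, smallest = 1.0 + abs(r), 1.0 - abs(r)
    return {
        "vif": vif,
        "sqrt_vif": math.sqrt(vif),
        "tolerance": tolerance,
        "r_squared": r_squared,
        "eigenvalues": (largest, smallest),
        "condition_number": math.sqrt(largest / smallest),
        "determinant": 1.0 - r_squared,   # det of [[1, r], [r, 1]]
    }


# Assumed correlation, inferred from the eigenvalues in the output above.
diag = two_predictor_diagnostics(0.6888)
for key, value in diag.items():
    print(key, value)
```

Rounded to the precision Stata reports, these values should agree with the last -collin- run above. The sketch also shows why tolerance and the determinant coincide in the two-predictor case.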

Computational Examples

The following computational examples show some of the effects of high collinearity on standardized regression coefficients.

Example A

```
      1    2    3    Y
1     -   .20  .20  .50
2          -   .10  .50
3               -   .50
Y                    -

R2 = .56373   Det = .918

     Beta    Std Err     F
1   .34314   .07001   24.025
2   .39216   .06894   32.360
3   .39216   .06894   32.360
```

Example B

```
      1    2    3    Y
1     -   .20  .20  .50
2          -   .85  .50
3               -   .50
Y                    -

R2 = .43079   Det = .2655

     Beta    Std Err     F
1   .40960   .07872   27.073
2   .22599   .14642    2.382
3   .22599   .14642    2.382
```

Example C

```
      1    2    3    Y
1     -   .20  .20  .50
2          -   .10  .50
3               -   .52
Y                    -

R2 = .57983

     Beta    Std Err     F
1   .33922   .06870   24.378
2   .39085   .06765   33.376
3   .41307   .06765   37.279
```

Example D

```
      1    2    3    Y
1     -   .20  .20  .50
2          -   .85  .50
3               -   .52
Y                    -

R2 = .44128

     Beta    Std Err     F
1   .40734   .07799   27.277
2   .16497   .14507    1.293
3   .29831   .14507    4.229
```
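
The four examples can be reproduced from the correlations alone, since the standardized coefficients solve R b = r, where R holds the correlations among the predictors and r holds their correlations with Y. The Python sketch below is one way to do this and is not the program used to produce the tables. The helper names are hypothetical, and the sample size is an assumption: the notes do not state N, but N = 100 appears to reproduce the reported standard errors.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_from_correlations(r12, r13, r23, r_y, n=100):
    """Standardized betas, R-squared, and standard errors from correlations.

    n = 100 is an assumption; the sample size is not given in the notes.
    """
    R = [[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]]
    k = len(r_y)
    betas = solve(R, r_y)
    r_squared = sum(b * r for b, r in zip(betas, r_y))
    std_errs = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        vif_i = solve(R, unit)[i]  # i-th diagonal element of the inverse of R
        std_errs.append(math.sqrt((1.0 - r_squared) / (n - k - 1) * vif_i))
    return betas, r_squared, std_errs


# (r12, r13, r23, correlations of predictors 1-3 with Y) for each example.
examples = {
    "A": (0.20, 0.20, 0.10, [0.50, 0.50, 0.50]),
    "B": (0.20, 0.20, 0.85, [0.50, 0.50, 0.50]),
    "C": (0.20, 0.20, 0.10, [0.50, 0.50, 0.52]),
    "D": (0.20, 0.20, 0.85, [0.50, 0.50, 0.52]),
}
results = {name: fit_from_correlations(*args) for name, args in examples.items()}

for name, (betas, r_squared, std_errs) in results.items():
    print(f"Example {name}: R2 = {r_squared:.5f}")
    for i, (b, se) in enumerate(zip(betas, std_errs), start=1):
        print(f"  {i}  beta = {b:.5f}  se = {se:.5f}  F = {(b / se) ** 2:.3f}")
```

Comparing A with C and B with D shows the point of the examples: the same .02 change in one correlation with Y barely moves the coefficients when predictors 2 and 3 correlate at .10, but shifts them substantially when they correlate at .85.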

Remedies

• Delete variables - may cause specification errors.