Variance Inflation Factor (VIF)

By Prachi Sinha and Paul Sorene. Edited by Vanessa Kintu

Regression analysis is a statistical method used to estimate the relationship between a dependent variable and one or more independent variables. When a regression model contains two or more independent variables, a variance inflation factor (VIF) can be calculated for each of them.

VIF measures how severe multicollinearity is in a regression model.

Multicollinearity occurs when the independent variables in a regression model are correlated with one another, and the variance inflation factor is built on this correlation. Multicollinearity can produce misleading results because it inflates the variance of the estimated regression coefficients, making those estimates less precise.

VIF estimates the magnitude of this inflated variance. It indicates how much the variance of a regression coefficient increases because the model's independent variables overlap with, or are correlated with, one another.
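For ordinary least squares, this inflation can be written out explicitly. Under the standard OLS assumptions, the sampling variance of the estimated coefficient on independent variable i is the variance it would have with no multicollinearity, multiplied by a factor that is exactly the VIF:

```latex
\operatorname{Var}(\hat{\beta}_i)
  = \frac{\sigma^2}{\sum_{t=1}^{n} (x_{ti} - \bar{x}_i)^2}
    \cdot \frac{1}{1 - R_i^2}
```

Here σ² is the error variance, x_{ti} is observation t of variable i, and R_i^2 is the R² from regressing variable i on the other independent variables. When R_i^2 = 0 the second factor equals 1, and as R_i^2 approaches 1 it grows without bound.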

What is VIF used for?

In statistics, variance measures variability in a data set, that is, how widely the data are spread around the mean (the average). The wider the spread, the larger the variance. In regression analysis, VIF is used to estimate how much the variance of an estimated coefficient is inflated by correlation among the independent variables in the analysis.

VIF Formula

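VIF is expressed as a number with a minimum value of 1, which indicates that an independent variable is uncorrelated with the others. A VIF of 1.5 means that the variance of the coefficient is 50% higher than would be expected if there were no multicollinearity between the independent variables. As a general rule of thumb, a VIF greater than 5 indicates that the independent variable is highly correlated with the other independent variables.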
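As a rough illustration of this arithmetic, the sketch below converts a VIF into the percentage increase in variance, the corresponding factor on the standard error (the square root of the VIF, since the standard error is the square root of the variance), and the auxiliary R² implied by VIF = 1/(1 − R²). It then applies the rule-of-thumb threshold of 5. The function names are hypothetical and chosen only for this example.

```python
import math

THRESHOLD = 5.0  # rule-of-thumb cutoff quoted in the text


def variance_increase_pct(vif: float) -> float:
    """Percentage by which the coefficient's variance exceeds the no-multicollinearity case."""
    return (vif - 1.0) * 100.0


def standard_error_factor(vif: float) -> float:
    """Factor by which the coefficient's standard error is inflated (sqrt of VIF)."""
    return math.sqrt(vif)


def implied_r2(vif: float) -> float:
    """R^2 of the auxiliary regression implied by a VIF, from VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / vif


for vif in (1.0, 1.5, 5.0, 10.0):
    flag = "high" if vif > THRESHOLD else "acceptable"
    print(
        f"VIF={vif:5.1f}  variance +{variance_increase_pct(vif):4.0f}%  "
        f"SE x{standard_error_factor(vif):.2f}  R^2={implied_r2(vif):.2f}  ({flag})"
    )
```

With these definitions, a VIF of 1.5 gives a 50% variance increase, matching the example in the text. The threshold of 5 corresponds to 80% of a predictor's variance being explained by the other predictors.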

Note that a higher VIF means the regression results are less reliable, because multicollinearity makes the coefficient estimates imprecise. When the independent variables are correlated, a small change in the data can lead to large changes in the estimated coefficients.
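This instability can be seen in a small simulation. The sketch below assumes NumPy and a hypothetical data-generating model y = 1 + 2·x1 + 3·x2 + noise, in which x1 and x2 are standard normal with correlation rho. It refits the regression on many simulated samples and reports how much the estimated coefficient on x1 varies from sample to sample. The function name and parameters are illustrative only.

```python
import numpy as np


def coef_spread(rho: float, n: int = 100, reps: int = 1000, seed: int = 0) -> float:
    """Std. dev. of the estimated x1 coefficient across repeated simulated samples.

    Hypothetical model: y = 1 + 2*x1 + 3*x2 + noise, with x1 and x2 standard
    normal and correlated at rho, and noise standard normal.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = []
    for _ in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(size=n)
        design = np.column_stack([np.ones(n), X])  # add intercept column
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates.append(beta[1])  # coefficient on x1
    return float(np.std(estimates))


for rho in (0.0, 0.9, 0.99):
    print(f"rho={rho}: sd of x1 coefficient = {coef_spread(rho):.3f}")
```

As the correlation rises, the spread of the x1 estimate grows. In this two-predictor setup it should grow roughly in proportion to the square root of the VIF, 1/sqrt(1 − rho²).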

VIF is calculated by the following formula:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

Here, $R_i^2$ is the unadjusted coefficient of determination obtained by regressing independent variable i on the remaining independent variables. The reciprocal of VIF is called the tolerance, $\text{Tolerance}_i = 1 - R_i^2 = 1/\mathrm{VIF}_i$, so either quantity can be calculated from the other.
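The formula can be applied directly by running one auxiliary regression per independent variable. The sketch below uses NumPy least squares rather than a particular statistics package. It regresses each column of a predictor matrix on the remaining columns, computes the unadjusted R_i^2, and returns both the VIF and the tolerance. Each auxiliary regression includes an intercept, which is an assumption matching the usual practice of fitting the main model with a constant. The function name and the synthetic data are hypothetical. Libraries such as statsmodels also provide a ready-made variance_inflation_factor function.

```python
import numpy as np


def vif_and_tolerance(X: np.ndarray) -> list[tuple[float, float]]:
    """Return (VIF_i, Tolerance_i) for each column of X.

    X holds only the independent variables, one column per variable.
    Column i is regressed, with an intercept, on all other columns; the
    unadjusted R_i^2 of that auxiliary regression gives
    Tolerance_i = 1 - R_i^2 and VIF_i = 1 / Tolerance_i.
    """
    n, k = X.shape
    results = []
    for i in range(k):
        target = X[:, i]
        others = np.delete(X, i, axis=1)                # remaining independent variables
        design = np.column_stack([np.ones(n), others])  # add intercept column
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ beta
        ss_res = float(residuals @ residuals)
        ss_tot = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot                      # unadjusted R_i^2
        tolerance = 1.0 - r2
        results.append((1.0 / tolerance, tolerance))
    return results


# Hypothetical synthetic data: x2 is largely driven by x1, x3 is unrelated.
rng = np.random.default_rng(42)
n_obs = 500
x1 = rng.normal(size=n_obs)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n_obs)
x3 = rng.normal(size=n_obs)
X = np.column_stack([x1, x2, x3])

for name, (vif, tol) in zip(["x1", "x2", "x3"], vif_and_tolerance(X)):
    print(f"{name}: VIF = {vif:.2f}, tolerance = {tol:.3f}")
```

With this data, x1 and x2 should both show a VIF of roughly 10, because their population correlation is about 0.95 (a squared correlation of 0.9). The unrelated x3 should stay close to 1.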