Today's Question:
What are the consequences of heteroscedasticity and multicollinearity in regression? What are the possible remedies?
To my understanding, a collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others. Here "variability" could be quantified by the variance or any other measure of statistical dispersion. In regression, this means the variance of the errors (residuals) is not constant across the values of the predictors. In simple terms, heteroscedasticity means unequal scatter.
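To make "unequal scatter" concrete, here is a minimal NumPy sketch on simulated data. The model y = 2 + 3x and the noise level 0.5x are illustrative assumptions, not data from this post. The sketch fits a straight line and compares the residual variance for small and large x.

```python
import numpy as np

# Illustrative simulation (assumed model): y = 2 + 3x + noise,
# where the noise standard deviation grows with x, so the scatter is unequal.
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 10, size=n)
noise_sd = 0.5 * x                       # spread increases with x
y = 2 + 3 * x + rng.normal(0, noise_sd)  # one noise draw per observation

# Fit ordinary least squares: y ~ b0 + b1 * x
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Compare residual variance for small x versus large x
low = residuals[x < 5.5]
high = residuals[x >= 5.5]
variance_ratio = high.var() / low.var()
print(f"residual variance, x < 5.5:  {low.var():.2f}")
print(f"residual variance, x >= 5.5: {high.var():.2f}")
print(f"ratio: {variance_ratio:.1f}")
```

With constant error variance the ratio would be close to 1; here it comes out well above 1 because the noise was built to grow with x.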
Multicollinearity is a state of very high intercorrelations or inter-associations among the independent variables.
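A quick numerical view of multicollinearity is the correlation matrix of the predictors. The sketch below uses simulated, hypothetical predictors x1, x2 and x3, where x2 is deliberately built as x1 plus a little noise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Hypothetical predictors: x2 is almost a copy of x1, x3 is independent
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)  # nearly collinear with x1
x3 = rng.normal(0, 1, n)

# np.corrcoef treats each row as one variable
corr = np.corrcoef(np.vstack([x1, x2, x3]))
print(np.round(corr, 2))
```

The x1-x2 entry should be close to 1, while the entries involving x3 stay near 0. Pairwise correlations can miss collinearity that involves three or more predictors at once, which is one reason to also look at variance inflation factors.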
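We can detect multicollinearity and heteroscedasticity with plots (for example, residuals against fitted values, or a scatterplot matrix of the predictors) or with statistical tests and diagnostics.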
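As one example of the statistical route, the sketch below computes two common diagnostics with plain NumPy: Koenker's studentized Breusch-Pagan statistic for heteroscedasticity and the variance inflation factor (VIF) for multicollinearity. The helper names and the simulated data are assumptions for illustration; libraries such as statsmodels provide tested implementations of both.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals of an OLS fit; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def r_squared(X, y):
    resid = ols_residuals(X, y)
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

def breusch_pagan_lm(X, y):
    """Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing
    squared residuals on the regressors. Under constant variance it is
    approximately chi-square with (columns of X minus 1) degrees of freedom."""
    e2 = ols_residuals(X, y) ** 2
    return len(y) * r_squared(X, e2)

def vif(predictors):
    """VIF for each column: 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the other columns plus an intercept."""
    n, k = predictors.shape
    out = []
    for j in range(k):
        others = np.delete(predictors, j, axis=1)
        Xj = np.column_stack([np.ones(n), others])
        out.append(1 / (1 - r_squared(Xj, predictors[:, j])))
    return np.array(out)

rng = np.random.default_rng(2)
n = 1000

# Heteroscedastic example: noise standard deviation grows with x
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, 0.5 * x)
X = np.column_stack([np.ones(n), x])
lm = breusch_pagan_lm(X, y)
print(f"Breusch-Pagan LM = {lm:.1f} (5% critical value for 1 df is 3.84)")

# Multicollinear example: x2 is nearly a copy of x1
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)
x3 = rng.normal(0, 1, n)
vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF:", np.round(vifs, 1))
```

A Breusch-Pagan statistic far above the chi-square critical value suggests non-constant variance. For VIF, a common rule of thumb treats values above 5 or 10 as a sign of problematic multicollinearity.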
Heteroscedasticity and multicollinearity cause several problems:
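- Standard errors become unreliable, so significance tests and confidence intervals can be misleading. Multicollinearity inflates the standard errors of the coefficients and makes the estimates unstable; heteroscedasticity biases the usual OLS standard errors (the coefficient estimates stay unbiased but are no longer the most efficient).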
- Multicollinearity makes it difficult to assess the importance of an individual independent variable in explaining the variation in the dependent variable, because correlated predictors share much of the same information.
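The standard-error inflation caused by multicollinearity shows up clearly in a small simulation. This sketch (simulated data; the helper name ols_fit is hypothetical) fits the same model twice, once with an independent second predictor and once with a near-copy of x1, and compares the conventional OLS standard error of the x1 coefficient.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients and conventional standard errors sqrt(diag(s^2 (X'X)^-1))."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    return beta, np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(0, 1, n)
x2_indep = rng.normal(0, 1, n)          # unrelated to x1
x2_collin = x1 + rng.normal(0, 0.1, n)  # nearly a copy of x1

se_b1 = {}
for label, x2 in [("independent", x2_indep), ("collinear", x2_collin)]:
    # Same true coefficients in both cases
    y = 1 + 2 * x1 + 2 * x2 + rng.normal(0, 1, n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta, se = ols_fit(X, y)
    se_b1[label] = se[1]
    print(f"{label:12s} b1 = {beta[1]:.2f}, SE(b1) = {se[1]:.3f}")
```

The collinear fit should report a much larger SE(b1), roughly by the square root of the VIF, even though the true coefficient is the same in both runs.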
How to solve them (Remedies):
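- Identify the source of the non-constant variance to resolve the problem. It is often a variable with a large range of values (for example, population size or income).
- Rebuild the model with a redefined predictor, for example a rate or per-capita value instead of a raw count.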
- Remove one of each pair of highly correlated predictors from the model, since they carry largely redundant information.
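A sketch of two of these remedies on simulated data, assuming that "rebuilding with a new predictor" means redefining a raw count as a per-capita rate. It compares residual scatter (above versus below median population) before and after the redefinition, and the VIF of x1 before and after dropping its near-duplicate x2. All variable names, helpers and data-generating values are hypothetical.

```python
import numpy as np

def ols_resid(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def split_variance_ratio(resid, driver):
    """Residual variance above vs. below the median of the suspected source."""
    cut = np.median(driver)
    return resid[driver >= cut].var() / resid[driver < cut].var()

def vif_first(predictors):
    """VIF of the first column given the remaining columns plus an intercept."""
    n = predictors.shape[0]
    target = predictors[:, 0]
    r = ols_resid(np.column_stack([np.ones(n), predictors[:, 1:]]), target)
    r2 = 1 - r @ r / np.sum((target - target.mean()) ** 2)
    return 1 / (1 - r2)

rng = np.random.default_rng(4)
n = 1000

# Remedy 1: redefine a raw count as a per-capita rate
population = rng.uniform(1, 100, n)           # hypothetical sizes (large range)
z = rng.normal(0, 1, n)                       # hypothetical covariate
rate = 1.0 + 0.1 * z + rng.normal(0, 0.2, n)  # per-capita rate, constant spread
count = population * rate                     # raw count: spread grows with size

raw_resid = ols_resid(np.column_stack([np.ones(n), population, z]), count)
rate_resid = ols_resid(np.column_stack([np.ones(n), z]), count / population)
ratio_raw = split_variance_ratio(raw_resid, population)
ratio_rate = split_variance_ratio(rate_resid, population)
print(f"variance ratio, raw count: {ratio_raw:.1f}, per-capita: {ratio_rate:.2f}")

# Remedy 2: drop one predictor of a highly correlated pair
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.1, n)  # near-duplicate of x1
x3 = rng.normal(0, 1, n)
vif_before = vif_first(np.column_stack([x1, x2, x3]))
vif_after = vif_first(np.column_stack([x1, x3]))  # x2 removed
print(f"VIF of x1: {vif_before:.1f} before, {vif_after:.2f} after dropping x2")
```

Under these assumptions the per-capita model's residual variance ratio should sit near 1, and the VIF of x1 should fall from far above 10 to about 1 once x2 is removed.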
I have attached sample plots of heteroscedasticity and multicollinearity.
Multicollinearity example with a matrix plot:
Heteroscedasticity Example:
Thanks! and Happy DataCrushing <3