Robust Regression Diagnostics of Influential Observations in Linear Regression Model

Affiliation(s)

^{1} Department of Statistics, Ladoke Akintola University of Technology, Ogbomoso, Nigeria.

^{2} Department of Mathematics and Statistics, Lagos State Polytechnic, Ikorodu, Lagos, Nigeria.

ABSTRACT

In regression analysis, data sets often contain unusual observations called outliers. Detecting these observations is an important part of model building, because each must be diagnosed to determine whether or not it is influential. Several influence statistics have been proposed, including Cook's Distance, the Welsch-Kuh distance and DFBETAS. This paper proposes computing these influence statistics from the robust estimators MM, least trimmed squares (LTS) and S, and considers them as alternatives to the same statistics based on the robust M estimator and on ordinary least squares (OLS). The statistics based on these estimators were applied to three data sets, and the root mean square error (RMSE) was used as the criterion for comparing the estimators. In general, the influence measures were most efficient when based on the M or MM robust estimators.
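As a point of reference for the influence statistics named above, the following is a minimal sketch of the classical OLS-based Cook's Distance and Welsch-Kuh distance (DFFITS), using their standard textbook formulas. It does not reproduce the robust MM, LTS, S or M variants compared in the paper. The function name `ols_influence` and the small synthetic data set are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def ols_influence(X, y):
    """Cook's distance and DFFITS for an OLS fit (hypothetical helper).

    X is an (n, p) design matrix that already contains the intercept column.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # OLS coefficients and residuals
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Leverages: diagonal of the hat matrix H = X (X'X)^{-1} X'
    xtx_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # Residual variance from the full fit
    s2 = resid @ resid / (n - p)

    # Cook's distance: D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2)
    cooks = resid**2 * h / (p * s2 * (1.0 - h) ** 2)

    # Leave-one-out variance, then externally studentized residuals
    s2_loo = ((n - p) * s2 - resid**2 / (1.0 - h)) / (n - p - 1)
    t = resid / np.sqrt(s2_loo * (1.0 - h))

    # Welsch-Kuh distance: DFFITS_i = t_i sqrt(h_i / (1 - h_i))
    dffits = t * np.sqrt(h / (1.0 - h))
    return cooks, dffits


# Synthetic data (illustrative only): a near-linear trend plus one planted
# high-leverage point that lies far below the trend.
x = np.append(np.arange(10.0), 20.0)
y = 2.0 + 3.0 * x + 0.1 * (-1.0) ** np.arange(11)
y[-1] = 10.0
X = np.column_stack([np.ones_like(x), x])

cooks, dffits = ols_influence(X, y)
print("largest Cook's distance at index", int(np.argmax(cooks)))
print("largest |DFFITS| at index", int(np.argmax(np.abs(dffits))))
```

A common rule of thumb flags observations with |DFFITS| above 2·sqrt(p/n) for closer inspection; the cutoffs used in the paper itself are not stated in this abstract.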


Cite this paper

Ayinde, K., Lukman, A. and Arowolo, O. (2015) Robust Regression Diagnostics of Influential Observations in Linear Regression Model. *Open Journal of Statistics*, **5**, 273-283. doi: 10.4236/ojs.2015.54029.

