Identification of High Leverage Points in Linear Functional Relationship Model

Abu Sayed Md. Al Mamun; A.H.M. R. Imon; A. G. Hussin; Y. Z. Zubairi; Sohel Rana

doi:10.18187/pjsor.v16i3.2620


Abstract

In a standard linear regression model, the explanatory variables, X, are considered fixed and hence assumed to be free from errors. In reality, however, they are variables and can consequently be subject to errors. The regression literature draws a clear distinction between outliers in the Y-space (errors) and outliers in the X-space. The latter are popularly known as high leverage points. If the explanatory variables are subject to gross errors or exhibit any unusual pattern, we call the corresponding observations outliers in the X-space, or high leverage points. High leverage points often exert too much influence and can consequently lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, and mask and/or swamp outliers. Although a good deal of work has been done on the identification of high leverage points in the linear regression model, it remains a new and unsolved problem in the linear functional relationship model. In this paper, we suggest a procedure for the identification of high leverage points based on the deletion of a group of observations. The usefulness of the proposed method for the detection of multiple high leverage points is studied using well-known data and Monte Carlo simulations.
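As background for the notion of leverage used above, the following is a minimal sketch of the classical hat-matrix diagnostic for ordinary linear regression (Hoaglin and Welsch, 1978). It is not the group-deletion procedure proposed in the paper, whose details are not given in the abstract. The function names, the toy data, and the 2p/n cutoff are illustrative assumptions.

```python
import numpy as np

def hat_leverages(X):
    """Return the diagonal of the hat matrix H = Z (Z'Z)^{-1} Z',
    where Z is X with an intercept column prepended."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    Z = np.column_stack([np.ones(X.shape[0]), X])
    # With the reduced QR factorization Z = QR, H = Q Q',
    # so each leverage is the squared norm of a row of Q.
    Q, _ = np.linalg.qr(Z)
    return np.sum(Q ** 2, axis=1), Z.shape[1]

def flag_high_leverage(X, multiplier=2.0):
    """Flag observations whose leverage exceeds multiplier * p / n.
    The 2p/n rule is one common convention, assumed here for illustration."""
    h, p = hat_leverages(X)
    cutoff = multiplier * p / len(h)
    return h, cutoff, h > cutoff

# Hypothetical toy data: nine regular x-values and one far from the rest.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
h, cutoff, flags = flag_high_leverage(x)
print(np.round(h, 3), cutoff, np.flatnonzero(flags))
```

Single-case leverages of this kind can fail when several unusual points sit close together, because each one hides the others (masking). That failure is the motivation the abstract gives for a diagnostic built on deleting a group of observations.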

Keywords

Errors in variable; Leverages; Masking; Swamping; Monte Carlo simulation

This work is licensed under a Creative Commons Attribution 4.0 International License.

Authors who publish with this journal agree to the following License

CC BY: This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.

Author Biographies

Abu Sayed Md. Al Mamun, University of Rajshahi

Associate Professor, Department of Statistics,

University of Rajshahi, Rajshahi-6205, Bangladesh.

A.H.M. R. Imon, Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, USA.

Professor, Department of Mathematical Sciences, Ball State University, Muncie, IN 47306, USA.

A. G. Hussin, Faculty of Science and Defence Technology, National Defence University of Malaysia, Kuala Lumpur, Malaysia.

Professor, Faculty of Science and Defence Technology, National Defence University of Malaysia,
Kuala Lumpur, Malaysia.

Y. Z. Zubairi, Mathematics Division, Centre for Foundation Studies in Science, University of Malaya, Kuala Lumpur, Malaysia.

Associate Professor, Mathematics Division, Centre for Foundation Studies in Science, University of Malaya,
Kuala Lumpur, Malaysia.

Sohel Rana, Department of Applied Statistics, East West University, Dhaka, Bangladesh.

Associate Professor, Department of Applied Statistics, East West University, Dhaka, Bangladesh.

How to Cite

Md. Al Mamun, A. S., Imon, A. R., Hussin, A. G., Zubairi, Y. Z., & Rana, S. (2020). Identification of High Leverage Points in Linear Functional Relationship Model. Pakistan Journal of Statistics and Operation Research, 16(3), 491-500. https://doi.org/10.18187/pjsor.v16i3.2620
