In a standard linear regression model the explanatory variables, X, are treated as fixed and hence assumed to be free from error. In reality they are variables and can consequently be subject to errors. The regression literature draws a clear distinction between outliers in the Y-space (that is, in the errors) and outliers in the X-space. The latter are commonly known as high leverage points. If the explanatory variables are subject to gross errors or show any unusual pattern, we call the affected observations outliers in the X-space, or high leverage points. High leverage points often exert too much influence and can consequently lead to misleading conclusions about the fit of a regression model, cause multicollinearity problems, and mask and/or swamp outliers. Although a good deal of work has been done on identifying high leverage points in the linear regression model, this is still a new and unsolved problem in the linear functional relationship model. In this paper, we suggest a procedure for identifying high leverage points based on the deletion of a group of observations. The usefulness of the proposed method for detecting multiple high leverage points is studied using some well-known data sets and Monte Carlo simulations.
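For readers unfamiliar with the term, the classical notion of leverage in ordinary linear regression can be illustrated with a minimal sketch. This is not the group-deletion procedure proposed in the paper, and it does not address the linear functional relationship model. It assumes a simple regression with an intercept and one explanatory variable, where the leverage of observation i is 1/n + (x_i - mean)^2 / Sxx, and it uses the common 2p/n rule of thumb (Hoaglin and Welsch, 1978) as the cutoff. The function names and the toy data are hypothetical.

```python
def leverages(x):
    """Hat-matrix diagonals for simple linear regression with an intercept."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    # h_ii = 1/n + (x_i - mean)^2 / Sxx for the one-regressor case
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]


def high_leverage_points(x, p=2):
    """Indices whose leverage exceeds the 2p/n rule of thumb.

    p is the number of fitted parameters (intercept and slope here).
    """
    n = len(x)
    cutoff = 2.0 * p / n
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Toy data (made up for illustration): the last x value lies far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 30.0]
flagged = high_leverage_points(x)
print(flagged)
```

A single-case measure like this one can miss several high leverage points that occur together, which is the masking problem that motivates group-deletion approaches.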
This work is licensed under a Creative Commons Attribution 4.0 International License.