Abstract

Cluster analysis groups data so that samples within the same group are similar. A common problem when working with multivariate data is that some observations, known as outliers, differ significantly from most of the others. Outliers can substantially affect data analysis and model performance, which makes their detection crucial in many domains. This study investigates an outlier detection method based on multiple linear regression for grouped multivariate data. The performance of the proposed method is compared with two existing approaches: the Caroni and Billor (2007) method and the Hardin and Rocke (2004) method. For uncontaminated data, the proposed method shows a high percentage of detected outliers as the number of variables and the sample size increase, indicating its effectiveness in outlier identification. For contaminated data, the proposed method consistently outperforms both the Caroni and Billor method and the Hardin and Rocke method in terms of accuracy and precision. These findings highlight the effectiveness of the proposed method for outlier detection in grouped multivariate data. The study contributes to existing knowledge of outlier detection approaches and provides insight into their performance under different data conditions, which researchers and practitioners can draw on when selecting outlier detection methods for their applications.
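The abstract does not spell out the procedure, so the following is only an illustrative sketch of one generic way regression residuals can be used to flag outliers within groups. It is not the authors' method, nor either of the comparison methods. The function name `flag_outliers_by_group`, the cut-off of 3, and the choice to regress each variable on all the others are assumptions made for illustration.

```python
import numpy as np


def flag_outliers_by_group(X, labels, threshold=3.0):
    """Flag rows with a large standardized regression residual in any variable.

    Hypothetical helper (not from the paper): within each group, every variable
    is regressed on the remaining variables by ordinary least squares, and a
    row is flagged when any of its residuals, divided by the residual standard
    error, exceeds `threshold` in absolute value.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    flags = np.zeros(X.shape[0], dtype=bool)

    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)   # rows belonging to group g
        Xg = X[idx]
        n, p = Xg.shape
        for j in range(p):
            y = Xg[:, j]
            # Design matrix: intercept plus all variables except the j-th.
            A = np.column_stack([np.ones(n), np.delete(Xg, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            # Residual standard error; guard against zero degrees of freedom.
            dof = max(n - A.shape[1], 1)
            scale = np.sqrt(resid @ resid / dof)
            if scale > 0:
                flags[idx] |= np.abs(resid) / scale > threshold
    return flags
```

Calling `flag_outliers_by_group(X, labels)` with an `(n, p)` data array and a length-`n` array of group labels returns a boolean mask over the rows. Plain least-squares residuals are themselves sensitive to outliers, so this sketch should be read as a baseline for the idea rather than a robust detector.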

Keywords

Multivariate data; Multiple linear regression; Outliers; Cluster analysis

Article Details

Author Biography

Wuttichai Srisodaphol, Khon Kaen University

Department of Statistics

How to Cite

Phuttisen, S., & Srisodaphol, W. (2024). Detection of Outliers Method in Grouped Multivariate Data: A Method Based on Multiple Linear Regression. Pakistan Journal of Statistics and Operation Research, 20(3), 445-453. https://doi.org/10.18187/pjsor.v20i3.4575