Abstract
Cluster analysis groups data so that samples within the same group are similar. A common problem when working with multivariate data is the presence of outliers, observations that differ significantly from most of the other data. Outliers can strongly affect data analysis and model performance, which makes their detection crucial in many domains. This study investigates an outlier detection method based on multiple linear regression for grouped multivariate data. It compares the performance of the proposed method with two existing approaches: the Caroni and Billor (2007) method and the Hardin and Rocke (2004) method. For uncontaminated data, the proposed method shows a high percentage of detected outliers as the number of variables and the sample size increase, indicating its effectiveness in outlier identification. For contaminated data, the proposed method consistently outperforms both the Caroni and Billor method and the Hardin and Rocke method in terms of accuracy and precision. These findings highlight the effectiveness of the proposed method for outlier detection in grouped multivariate data. The study adds to existing knowledge of outlier detection approaches and provides insight into their performance under different data conditions. Researchers and practitioners can draw on these findings when selecting outlier detection methods for their applications.
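The abstract does not specify the proposed procedure, so the following is only a minimal sketch of the general idea of regression-based outlier flagging within groups, together with the accuracy and precision measures used for comparison. Everything in it is an assumption made for illustration and not the authors' method: the function names `flag_outliers_by_group` and `accuracy_precision`, the choice to regress each variable on the others within a group, the robust residual scale based on the median absolute deviation, and the cutoff of 3.

```python
import numpy as np


def flag_outliers_by_group(X, groups, threshold=3.0):
    """Flag rows with an extreme regression residual inside their own group.

    Illustrative assumption: within each group, every variable is regressed
    on the remaining variables by ordinary least squares, and a row is
    flagged when any of its robustly scaled residuals exceeds `threshold`.
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    flags = np.zeros(len(X), dtype=bool)
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        Xg = X[idx]
        for j in range(Xg.shape[1]):
            y = Xg[:, j]
            # Design matrix: intercept plus all other variables.
            A = np.column_stack([np.ones(len(idx)), np.delete(Xg, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            # Robust scale from the median absolute deviation (MAD).
            centered = np.abs(resid - np.median(resid))
            scale = 1.4826 * np.median(centered)
            if scale == 0:
                continue
            flags[idx] |= centered / scale > threshold
    return flags


def accuracy_precision(flags, truth):
    """Accuracy and precision of outlier flags against known labels."""
    flags = np.asarray(flags, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(flags & truth)
    fp = np.sum(flags & ~truth)
    accuracy = np.mean(flags == truth)
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return float(accuracy), float(precision)


# Simulated example: two groups of 50 rows, one planted outlier in row 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
groups = np.repeat([0, 1], 50)
X[0, 0] += 15.0
flags = flag_outliers_by_group(X, groups)
truth = np.zeros(100, dtype=bool)
truth[0] = True
acc, prec = accuracy_precision(flags, truth)
```

In a simulation study of the kind summarized above, such flags would be compared with the known contamination labels across many replications. The cutoff and the scale estimate chosen here directly affect how many clean observations are flagged by mistake.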
Article Details
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following license:
CC BY: This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.