Abstract
Outlier detection in univariate circular data can be carried out with a clustering algorithm. Clustering requires a similarity measure to assign observations to groups; for circular data, this measure can be obtained from the circular distance between each pair of angular observations. In this paper, a clustering-based procedure for outlier detection in univariate circular biological data is developed with different similarity distance measures, and its performance is investigated. Three circular similarity distance measures are considered within a single-linkage clustering algorithm. Two of them, the Satari distance and the Di distance, are found to have similar formulas for univariate circular data and are therefore treated together as SL-Satari/Di. The aim of this study is to develop the proposed clustering-based procedure and demonstrate its effectiveness in detecting outliers under different similarity distance measures. Accordingly, SL-Satari/Di is compared with a second similarity measure, SL-Chang, at a given cutting rule. The results show that the clustering-based procedure using the single-linkage algorithm is an applicable and promising approach for outlier detection in univariate circular data, particularly biological data, under either distance. The results also indicate that, under certain data conditions, the SL-Satari/Di distance appears to outperform the SL-Chang distance.
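The general idea described in the abstract (circular distance, single-linkage clustering, a cutting rule) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the distance formulas or the cutting rule, so the arc-length distance, the `1 - cos` distance, the `mean + k * sd` cut height, and every function name below are assumptions chosen for illustration.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def arc_distance(a, b):
    """Shorter arc between two angles in radians, in [0, pi] (assumed measure)."""
    diff = abs(a - b) % (2 * math.pi)
    return math.pi - abs(math.pi - diff)


def cosine_distance(a, b):
    """1 - cos of the angular difference, in [0, 2] (assumed measure)."""
    return 1.0 - math.cos(a - b)


def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def single_linkage_merges(angles, dist):
    """Return the n-1 single-linkage merges as (height, i, j), lowest first."""
    n = len(angles)
    edges = sorted((dist(angles[i], angles[j]), i, j)
                   for i, j in combinations(range(n), 2))
    parent = list(range(n))
    merges = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # this edge joins two clusters, so it is a merge
            parent[ri] = rj
            merges.append((d, i, j))
    return merges


def flag_outliers(angles, dist, cut_height=None, k=3.0):
    """Indices outside the largest cluster after cutting the tree.

    If cut_height is None, the hypothetical rule mean + k * sd of the
    merge heights is used (needs at least three observations).
    """
    merges = single_linkage_merges(angles, dist)
    if cut_height is None:
        heights = [d for d, _, _ in merges]
        cut_height = mean(heights) + k * stdev(heights)
    parent = list(range(len(angles)))
    for d, i, j in merges:
        if d <= cut_height:  # keep only merges at or below the cut
            parent[_find(parent, i)] = _find(parent, j)
    groups = {}
    for idx in range(len(angles)):
        groups.setdefault(_find(parent, idx), []).append(idx)
    main = set(max(groups.values(), key=len))
    return [i for i in range(len(angles)) if i not in main]


# Illustrative data only: seven clustered directions and one far away.
angles = [math.radians(x) for x in (10, 12, 15, 18, 20, 22, 25, 190)]
print(flag_outliers(angles, arc_distance, cut_height=math.radians(30)))
print(flag_outliers(angles, cosine_distance, cut_height=0.5))
```

The cut height is left as a parameter because the choice of cutting rule drives which observations are flagged. With very few observations, a `mean + 3 * sd` rule may flag nothing, so the example passes an explicit cut height.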
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following License
CC BY: This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
Funding data
- Ministry of Higher Education, Malaysia. Grant number: FRGS/1/2019/STG06/UMP/02/6
- Universiti Malaysia Pahang. Grant numbers: RDU190363; RDU1901168; PGRS210328
References
- Abuzaid, A. H. (2012). Analysis of Mother’s Day celebration via circular statistics. The Philippine Statistician, 61(2), 39–52.
- Abuzaid, A. H. (2013). On the Influential Points in the Functional Circular Relationship Models. Pakistan Journal of Statistics and Operation Research, 9(3), 333–342.
- Abuzaid, A. H. (2020). Identifying density-based local outliers in medical multivariate circular data. Statistics in Medicine, 1–6.
- Abuzaid, A. H., Hussin, A. G., Rambli, A., & Mohamed, I. (2012). Statistics for a New Test of Discordance in Circular Data. Communications in Statistics - Simulation and Computation, 41, 1882–1890.
- Abuzaid, A. H., Mohamed, I. B., & Hussin, A. G. (2009). A New Test of Discordancy in Circular Data. Communications in Statistics - Simulation and Computation, 38(4), 682–691.
- Ahmed, H. I. E. S., Abuzaid, A. H., & Awar, I. I. Al. (2019). Detection of Outliers in Circular Data using Kernel Density Function. Life Sciences: An International Journal (LSIJ), 1(1), 1–11.
- Alkasadi, N. A., Abuzaid, A. H. M., Ibrahim, S., & Yusoff, M. I. (2018). Outliers Detection in Multiple Circular Regression Model via DFBETAc Statistic. International Journal of Applied Engineering Research, 13(11), 9083–9090.
- Chang-chien, S., Hung, W., & Yang, M.-S. (2012). On mean shift-based clustering for circular data. Soft Comput, 16, 1043–1060.
- Collett, D. (1980). Outliers in Circular Data. Journal of the Royal Statistical Society, 29(1), 50–57.
- Di, N. F. M., & Satari, S. Z. (2017). The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model. AIP Conference Proceedings, 1842.
- Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge University Press.
- Hung, W. L., Chang-Chien, S. J., & Yang, M. S. (2012). Self-updating clustering algorithm for estimating the parameters in mixtures of von Mises distributions. Journal of Applied Statistics, 39(10), 2259–2274.
- Jammalamadaka, S. R., & Sengupta, A. (2001). Topics in Circular Statistics. World Scientific Publishing Co. Pte. Ltd.
- Johnson, R., & Wichern, D. (2014). Applied Multivariate Statistical Analysis (Sixth). Pearson.
- Klutchnikoff, N., Poterie, A., & Rouviere, L. (2021). Statistical analysis of a hierarchical clustering algorithm with outliers. HAL Open Science.
- Mahmood, E. A., Rana, S., Hussin, A. G., & Midi, H. (2017). Adjusting Outliers in Univariate Circular Data. Pertanika J. Sci. & Technol., 25(4), 1147–1158.
- Ott, L., Pang, L., Ramos, F. T., & Chawla, S. (2014). On integrated clustering and outlier detection. Advances in Neural Information Processing Systems, 1359–1367.
- Rambli, A. (2015). A half-circular distribution and outlier detection procedures in directional data. PhD Thesis. University of Malaya.
- Satari, S. Z. (2015). Parameter Estimation and Outlier Detection for Some Types of Circular Model. PhD Thesis. University of Malaya.
- Satari, S. Z., Muhammad Di, N. F., Zubairi, Y. Z., & Hussin, A. G. (2021). Comparative Study of Clustering-Based Outliers Detection Methods in Circular-Circular Regression Model. Sains Malaysiana, 50(6), 1787–1798.
- Sebert, D. M., Montgomery, D. C., & Rollier, D. A. (1998). A clustering algorithm for identifying multiple outliers in linear regression. Computational Statistics and Data Analysis, 27(4), 461–484.
- Zulkipli, N. S., Satari, S. Z., & Yusoff, W. N. S. W. (2020). Descriptive analysis of circular data with outliers using Python programming language. Data Analytics and Applied Mathematics (DAAM), 01(01), 31–36.