Abstract

Outliers in univariate circular data can be detected using a clustering algorithm. Clustering requires a similarity measure to assign observations to groups; for circular data, this measure can be obtained by calculating the circular distance between each pair of angular observations. In this paper, a clustering-based procedure for outlier detection in univariate circular biological data is developed using different similarity distance measures, and its performance is investigated. Three circular similarity distance measures are considered for the outlier detection procedure with the single-linkage clustering algorithm. However, two of them, the Satari distance and the Di distance, are found to share a similar formula for univariate circular data and are therefore treated together as SL-Satari/Di. The aim of this study is to develop the proposed clustering-based procedure and to demonstrate its effectiveness in detecting outliers under different similarity distance measures. Accordingly, SL-Satari/Di is compared with another similarity measure, SL-Chang, at a certain cutting rule. The clustering-based procedure using the single-linkage algorithm with different similarity distances is found to be an applicable and promising approach for outlier detection in univariate circular data, particularly biological data. The results also show that, under certain data conditions, the SL-Satari/Di distance appears to outperform the SL-Chang distance.
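The general idea described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not state the formulas of the Satari/Di or Chang distances, nor the cutting rule, so everything specific here is an assumption. The arc-length form π − |π − |θi − θj|| and the cosine form 1 − cos(θi − θj) are used as stand-in circular distances, the cut height is taken as the mean merge height plus k standard deviations with k = 2, and observations outside the largest cluster are flagged. The function names `circular_distance`, `cosine_distance`, `single_linkage_edges` and `detect_outliers` are hypothetical. The sketch relies on the standard fact that single-linkage merge heights equal the edge lengths of a minimum spanning tree.

```python
import math
from statistics import mean, pstdev


def circular_distance(a, b):
    """Assumed stand-in: shorter arc between two angles (radians), in [0, pi]."""
    diff = abs(a - b) % (2 * math.pi)
    return math.pi - abs(math.pi - diff)


def cosine_distance(a, b):
    """Assumed stand-in: 1 - cos of the angular difference, in [0, 2]."""
    return 1.0 - math.cos(a - b)


def single_linkage_edges(angles, dist):
    """Prim's algorithm: minimum spanning tree edges as (height, i, j).

    The edge lengths are exactly the single-linkage merge heights.
    """
    # best[j] = (shortest distance from j to the tree so far, tree vertex)
    best = {j: (dist(angles[0], angles[j]), 0) for j in range(1, len(angles))}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        height, i = best.pop(j)
        edges.append((height, i, j))
        for k in best:
            d = dist(angles[j], angles[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def detect_outliers(angles, dist, k=2.0):
    """Flag indices outside the largest cluster after cutting the tree.

    Hypothetical cutting rule: cut = mean(heights) + k * std(heights).
    Assumes at least two observations.
    """
    edges = single_linkage_edges(angles, dist)
    heights = [h for h, _, _ in edges]
    cut = mean(heights) + k * pstdev(heights)

    # Union-find: join points linked by edges no longer than the cut height.
    parent = list(range(len(angles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for height, i, j in edges:
        if height <= cut:
            parent[find(i)] = find(j)

    clusters = {}
    for idx in range(len(angles)):
        clusters.setdefault(find(idx), []).append(idx)
    main = max(clusters.values(), key=len)
    return sorted(idx for idx in range(len(angles)) if idx not in main)


# Illustrative data only (degrees): a tight group plus one distant angle.
angles = [math.radians(d) for d in (10, 15, 20, 25, 30, 35, 40, 200)]
print(detect_outliers(angles, circular_distance))
print(detect_outliers(angles, cosine_distance))
```

On this toy sample both stand-in distances isolate the observation at 200°. With real data, the choice of distance and of k changes which merge heights exceed the cut, which is the kind of sensitivity the comparison in the abstract concerns.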

Keywords

Similarity measure; Circular distance; Circular data; Outliers; Clustering algorithm

Article Details

Author Biographies

Nur Syahirah Zulkipli, ORCID: 0000-0003-2354-8725

Centre for Mathematical Sciences, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Malaysia

Siti Zanariah Satari

Centre for Mathematical Sciences, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Malaysia

Wan Nur Syahidah Wan Yusoff

Centre for Mathematical Sciences, College of Computing and Applied Sciences, Universiti Malaysia Pahang, Malaysia

How to Cite
Zulkipli, N. S., Satari, S. Z., & Wan Yusoff, W. N. S. (2022). The Effect of Different Similarity Distance Measures in Detecting Outliers Using Single-Linkage Clustering Algorithm for Univariate Circular Biological Data. Pakistan Journal of Statistics and Operation Research, 18(3), 561-573. https://doi.org/10.18187/pjsor.v18i3.3982

References

  1. Abuzaid, A. H. (2012). Analysis of Mother’s Day celebration via circular statistics. The Philippine Statistician, 61(2), 39–52.
  2. Abuzaid, A. H. (2013). On the Influential Points in the Functional Circular Relationship Models. Pakistan Journal of Statistics and Operation Research, 9(3), 333–342. DOI: https://doi.org/10.18187/pjsor.v9i3.595
  3. Abuzaid, A. H. (2020). Identifying density-based local outliers in medical multivariate circular data. Statistics in Medicine, 1–6. DOI: https://doi.org/10.1002/sim.8576
  4. Abuzaid, A. H., Hussin, A. G., Rambli, A., & Mohamed, I. (2012). Statistics for a New Test of Discordance in Circular Data. Communications in Statistics - Simulation and Computation, 41, 1882–1890. DOI: https://doi.org/10.1080/03610918.2011.624239
  5. Abuzaid, A. H., Mohamed, I. B., & Hussin, A. G. (2009). A New Test of Discordancy in Circular Data. Communications in Statistics - Simulation and Computation, 38(4), 682–691. DOI: https://doi.org/10.1080/03610910802627048
  6. Ahmed, H. I. E. S., Abuzaid, A. H., & Awar, I. I. Al. (2019). Detection of Outliers in Circular Data using Kernel Density Function. Life Sciences: An International Journal (LSIJ), 1(1), 1–11.
  7. Alkasadi, N. A., Abuzaid, A. H. M., Ibrahim, S., & Yusoff, M. I. (2018). Outliers Detection in Multiple Circular Regression Model via DFBETAc Statistic. International Journal of Applied Engineering Research, 13(11), 9083–9090.
  8. Chang-Chien, S., Hung, W., & Yang, M.-S. (2012). On mean shift-based clustering for circular data. Soft Computing, 16, 1043–1060. DOI: https://doi.org/10.1007/s00500-012-0802-z
  9. Collett, D. (1980). Outliers in Circular Data. Journal of the Royal Statistical Society, 29(1), 50–57. DOI: https://doi.org/10.2307/2346410
  10. Di, N. F. M., & Satari, S. Z. (2017). The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model. AIP Conference Proceedings, 1842. DOI: https://doi.org/10.1063/1.4982854
  11. Fisher, N. I. (1993). Statistical Analysis of Circular Data. Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511564345
  12. Hung, W. L., Chang-Chien, S. J., & Yang, M. S. (2012). Self-updating clustering algorithm for estimating the parameters in mixtures of von Mises distributions. Journal of Applied Statistics, 39(10), 2259–2274. DOI: https://doi.org/10.1080/02664763.2012.706268
  13. Jammalamadaka, S. R., & Sengupta, A. (2001). Topics in Circular Statistics. World Scientific Publishing Co. Pte. Ltd. DOI: https://doi.org/10.1142/4031
  14. Johnson, R., & Wichern, D. (2014). Applied Multivariate Statistical Analysis (Sixth). Pearson. DOI: https://doi.org/10.1002/9781118445112.stat02623
  15. Klutchnikoff, N., Poterie, A., & Rouviere, L. (2021). Statistical analysis of a hierarchical clustering algorithm with outliers. HAL Open Science. DOI: https://doi.org/10.1016/j.jmva.2022.105075
  16. Mahmood, E. A., Rana, S., Hussin, A. G., & Midi, H. (2017). Adjusting Outliers in Univariate Circular Data. Pertanika J. Sci. & Technol., 25(4), 1147–1158.
  17. Ott, L., Pang, L., Ramos, F. T., & Chawla, S. (2014). On integrated clustering and outlier detection. Advances in Neural Information Processing Systems, 1359–1367.
  18. Rambli, A. (2015). A half-circular distribution and outlier detection procedures in directional data. PhD Thesis. University of Malaya.
  19. Satari, S. Z. (2015). Parameter Estimation and Outlier Detection for Some Types of Circular Model. PhD Thesis. University of Malaya.
  20. Satari, S. Z., Muhammad Di, N. F., Zubairi, Y. Z., & Hussin, A. G. (2021). Comparative Study of Clustering-Based Outliers Detection Methods in Circular-Circular Regression Model. Sains Malaysiana, 50(6), 1787–1798. DOI: https://doi.org/10.17576/jsm-2021-5006-24
  21. Sebert, D. M., Montgomery, D. C., & Rollier, D. A. (1998). A clustering algorithm for identifying multiple outliers in linear regression. Computational Statistics and Data Analysis, 27(4), 461–484. DOI: https://doi.org/10.1016/S0167-9473(98)00021-8
  22. Zulkipli, N. S., Satari, S. Z., & Yusoff, W. N. S. W. (2020). Descriptive analysis of circular data with outliers using Python programming language. Data Analytics and Applied Mathematics (DAAM), 01(01), 31–36. DOI: https://doi.org/10.15282/daam.v1i01.5085