Abstract

Panel data models are commonly applied to numerical response variables, whereas literature on forecasting categorical variables in a panel data structure remains scarce. Such forecasts matter because they can inform government policy. This study aimed to forecast multiclass (categorical) variables in a panel data structure. The proposed forecasting models were the autoregressive multinomial logit and autoregressive C5.0. To make the two models usable for forecasting, autoregressive effects and fixed predictor variables such as location, time, strata, and month of observation were added. The autoregressive effect was assumed to be a fixed effect and treated as a dummy variable. The data were the land condition categories recorded in the Area Sampling Frame (ASF) survey conducted by BPS-Statistics Indonesia. Both models were evaluated on classification and forecasting performance. Classification performance was assessed by dividing the dataset into 75% training data for modeling and 25% test data for validation, repeated 200 times. The autoregressive C5.0 reached a classification accuracy of 86.48%, compared with 83.97% for the autoregressive multinomial logit. Forecasting performance was compared by splitting the data into training and test sets according to the time sequence. Forecasting performance was worse than classification performance: autoregressive C5.0 had an accuracy of 77.43%, while autoregressive multinomial logit had 77.77%.
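
The strategy described in the abstract, adding the previous period's category as a dummy-coded predictor and then splitting the panel by time for the forecasting evaluation, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy panel, the category labels, and the helper names (`add_autoregressive_dummies`, `split_by_time`, `accuracy`) are all assumptions made for the example, and it assumes each location is observed in consecutive months with no gaps. No model is fitted here; the resulting rows would be passed to a multinomial logit or C5.0 learner.

```python
from collections import defaultdict

# Hypothetical toy panel: (location, month, observed category).
# The labels are illustrative placeholders, not the ASF survey's codes.
PANEL = [
    ("seg1", 1, "vegetative"), ("seg1", 2, "generative"), ("seg1", 3, "harvest"),
    ("seg2", 1, "harvest"), ("seg2", 2, "vegetative"), ("seg2", 3, "generative"),
]
CATEGORIES = ["vegetative", "generative", "harvest"]


def add_autoregressive_dummies(panel, categories):
    """Attach the previous period's category, one-hot encoded, to each row."""
    by_location = defaultdict(list)
    for location, month, category in panel:
        by_location[location].append((month, category))

    rows = []
    for location, series in by_location.items():
        series.sort()  # order each location's observations by time
        # Pair every observation with its predecessor; the first period of
        # each location is dropped because it has no lagged value.
        for (_, prev_cat), (month, cat) in zip(series, series[1:]):
            dummies = [1 if prev_cat == c else 0 for c in categories]
            rows.append({"location": location, "month": month,
                         "lag_dummies": dummies, "target": cat})
    return rows


def split_by_time(rows, first_test_month):
    """Forecasting-style split: train on earlier months, test on later ones."""
    train = [r for r in rows if r["month"] < first_test_month]
    test = [r for r in rows if r["month"] >= first_test_month]
    return train, test


def accuracy(y_true, y_pred):
    """Share of observations whose predicted category equals the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


rows = add_autoregressive_dummies(PANEL, CATEGORIES)
train, test = split_by_time(rows, first_test_month=3)
```

Splitting on the time index rather than at random keeps every test observation later than the training data, which is what separates the forecasting evaluation from the repeated 75%/25% classification evaluation.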

Keywords

C5.0; Forecasting; Multiclass; Multinomial Logit; Rice Growth Phases

Article Details

How to Cite
Ardiansyah, M., Wijayanto, H., Kurnia, A., & Djuraidah, A. (2023). Multiclass Forecasting on Panel Data Using Autoregressive Multinomial Logit and C5.0 Decision Tree. Pakistan Journal of Statistics and Operation Research, 19(1), 145-154. https://doi.org/10.18187/pjsor.v19i1.4053
