One of the essential problems in data mining is the removal of negligible variables from the data set. This paper proposes a hybrid approach that uses rough set theory based algorithms to compute attribute reducts from the data set and then uses those reducts to raise the classification accuracy of three learning methods: multinomial logistic regression, support vector machines, and random forest, evaluated with 5-fold cross-validation. The performance of the hybrid approach is measured with the relevant statistics. The results show that the hybrid approach is effective, improving accuracy by 6-12% for all three learning methods.
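The reduct computation at the heart of this approach can be illustrated with a small sketch. The code below implements the greedy QuickReduct heuristic, one common rough-set method that adds attributes until the dependency degree (the fraction of objects in the positive region) matches that of the full attribute set; it is an assumption for illustration only, and the paper's actual reduction algorithms may differ.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into indiscernibility blocks: rows that
    agree on all the given attributes fall in the same block."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Dependency degree: fraction of rows in the positive region,
    i.e. rows whose block is consistent on the decision label."""
    if not attrs:
        return 0.0
    pos = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)

def quick_reduct(rows, labels):
    """Greedily add the attribute that most increases the dependency
    degree until it matches that of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct, gamma = [], 0.0
    while gamma < target:
        best, best_gamma = None, gamma
        for a in all_attrs:
            if a in reduct:
                continue
            g = dependency(rows, labels, reduct + [a])
            if g > best_gamma:
                best, best_gamma = a, g
        if best is None:  # no attribute improves further: stop
            break
        reduct.append(best)
        gamma = best_gamma
    return reduct

# Example: attribute 0 alone determines the label, so it forms a reduct.
# quick_reduct([(0,0,1),(0,1,1),(1,0,0),(1,1,0)], [0,0,1,1]) -> [0]
```

In the hybrid pipeline described above, the columns named by such a reduct would then be passed to the three classifiers in place of the full attribute set.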
This work is licensed under a Creative Commons Attribution 4.0 International License.