Synthetic Minority Oversampling Technique for Efforts to Improve Imbalanced Data in Classification of Lettuce Plant Diseases
Abstract
In this study, we classify lettuce plant diseases. The disease data are images that have been converted to .csv format for classification and divided into several classes or categories. We then determine the features from the rows and columns of the dataset: each line in the CSV file represents one image, and each column represents one feature. Each line is then labeled with the class or category to which its image belongs, giving a dataset that is ready to be processed with machine learning. While processing the dataset, however, we found that the data were imbalanced. We therefore applied the Synthetic Minority Over-sampling Technique (SMOTE) to overcome the imbalance, so that the data could be classified using several algorithms to find the one with the best accuracy.
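The oversampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified, pure-Python version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority-class neighbours). The function names `smote` and `balance`, the toy feature rows, and the class names "healthy" and "diseased" are all assumptions made for illustration. In practice, a maintained library such as imbalanced-learn would likely be used instead.

```python
import math
import random
from collections import Counter


def smote(samples, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(samples) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [s for j, s in enumerate(samples) if j != i]
        others.sort(key=lambda s: math.dist(base, s))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(rows, labels, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        if count < target:
            minority = [r for r, y in zip(rows, labels) if y == cls]
            new = smote(minority, target - count, k=k, seed=seed)
            out_rows.extend(new)
            out_labels.extend([cls] * len(new))
    return out_rows, out_labels


# Hypothetical toy data: one row per image, one column per feature.
rows = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.2],
        [5.0, 5.0], [5.5, 5.2]]
labels = ["healthy"] * 6 + ["diseased"] * 2  # assumed class names

bal_rows, bal_labels = balance(rows, labels, k=3)
print(Counter(bal_labels))
```

After balancing, both classes have the same number of rows, and every synthetic row lies on a line segment between two original minority rows. Oversampling of this kind is normally applied to the training split only, so that synthetic rows do not leak into the data used to measure accuracy.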
Copyright (c) 2023 Nurliana Nasution, Feldiansyah Feldiansyah, Ahmad Zamsuri, Mhd Arief Hasan
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium. Users may read, download, copy, distribute, search, or link to the full text of articles in this journal without asking permission, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. Anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.