An Integrated Machine Learning and Deep Learning Approach for Multiclass Flood Risk Classification with Feature Selection and Imbalanced Data Handling
Abstract
Floods are hydrometeorological disasters that occur frequently in tropical regions such as Indonesia and can severely affect infrastructure, the economy, and public health. This study builds and compares 21 artificial intelligence models, comprising 15 Machine Learning algorithms and 6 Deep Learning architectures, for classifying flood risk levels from multivariate tabular data. The dataset contains 22 environmental and social variables, and the classification target has four classes: Low, Moderate, High, and Very High. To improve data quality, features were selected with the LASSO method and the classes were balanced with the SMOTEENN technique. The evaluation showed that the C4.5, MLP, Random Forest, and Logistic Regression models achieved the highest accuracy (>94%), followed by deep learning models such as BiLSTM, CNN, and BiGRU with competitive accuracy (≥90%). Confusion matrix analysis confirmed that predictions were consistent and evenly distributed across classes, especially for the decision tree and deep neural network models. These results underline the importance of choosing a model that matches the characteristics of the data in order to obtain the best predictions. The pipeline developed in this study is expected to serve as a basis for more accurate and adaptive AI-based early warning systems for flood risk mitigation.
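The evaluation described in the abstract, overall accuracy combined with a per-class confusion matrix over the four risk levels, can be illustrated with a minimal sketch. It uses only the Python standard library; the function names and the sample labels are hypothetical and are not taken from the study's code or results.

```python
# The four risk classes named in the abstract, in increasing order of severity.
CLASSES = ["Low", "Moderate", "High", "Very High"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Build a square matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correct predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix, classes=CLASSES):
    """Recall per class: correct predictions divided by true samples of that class."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(classes, matrix))
    }


# Hypothetical labels for illustration only; these are NOT results from the study.
y_true = ["Low", "Low", "Moderate", "Moderate", "High", "High", "Very High", "Very High"]
y_pred = ["Low", "Low", "Moderate", "High", "High", "High", "Very High", "High"]

cm = confusion_matrix(y_true, y_pred)
print(cm)
print(accuracy(cm))
print(per_class_recall(cm))
```

Reading the matrix row by row shows which true class is confused with which predicted class, something a single accuracy figure hides. In this toy data, one Moderate and one Very High sample are both predicted as High.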
Copyright (c) 2025 Yuda Irawan, Refni Wahyuni, Herianto

This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium. Users may read, download, copy, distribute, search, or link to the full text of articles in this journal without asking permission, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. Anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.