Lumpy Skin Disease Classification on Imbalanced Data: A Comprehensive Approach Using Machine and Ensemble Learning

Authors

  • Shahla U. Umar, Department of Software, College of Computer Science and Information Technology, University of Kirkuk, Kirkuk, Iraq

Keywords:

Machine learning, Imbalanced data, Classification, SMOTE, Lumpy Skin Disease

Abstract

Lumpy Skin Disease (LSD) is a major veterinary problem that requires accurate, prompt diagnosis and classification to enable effective treatment. However, medical datasets often suffer from class imbalance, missing values, outliers, and high dimensionality, which makes it difficult to build effective diagnostic tools. This study presents a hybrid machine learning (ML) model designed to address these data complexities. We combine the Synthetic Minority Over-Sampling Technique (SMOTE), which handles class imbalance, with ensemble learning, specifically Bagging, to improve classification performance. Three well-known ML algorithms, Decision Trees (DT), Logistic Regression (LR), and Naive Bayes (NB), are evaluated alongside the Bagging model. Performance was assessed using accuracy, precision, recall, F-score, and the Matthews Correlation Coefficient (MCC). The Bagging ensemble outperformed the other models, achieving the highest accuracy (89.5%) as well as the highest precision and recall. Moreover, SMOTE reduced the bias caused by class imbalance, which made model training more accurate. These results highlight the value of combining SMOTE with ensemble learning to produce reliable and accurate diagnostic tools. The approach could substantially benefit LSD treatment and has strong potential to enhance the diagnosis of other multifactorial disorders in veterinary medicine.
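To make the two less familiar ingredients of the abstract concrete, the following is a minimal pure-Python sketch of SMOTE-style oversampling and of the five reported metrics for a binary task. It is an illustration under stated assumptions, not the study's implementation: the function names `smote` and `binary_metrics`, the toy two-feature points, and the choice of `k` are all hypothetical, and the classifiers and the Bagging ensemble are omitted.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (illustrative sketch).

    Each synthetic point is placed at a random position on the segment
    between a randomly chosen minority point and one of its k nearest
    minority neighbours. Assumes at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC uses all four confusion-matrix cells, so it stays informative
    # when one class dominates; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical toy data: six majority points, three minority points.
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (0.3, 0.2), (0.4, 0.4), (0.5, 0.1)]
minority = [(2.0, 2.0), (2.2, 2.1), (2.1, 2.4)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, len(majority) - len(minority), k=2)
balanced_minority = minority + new_points

# Score a hypothetical set of predictions against true labels.
scores = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In a practical pipeline, oversampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data. Library implementations such as `imblearn.over_sampling.SMOTE`, `sklearn.ensemble.BaggingClassifier`, and `sklearn.metrics.matthews_corrcoef` would typically replace these hand-written helpers.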

Published

2025-07-31

How to Cite

Umar, S. U. (2025). Lumpy Skin Disease Classification on Imbalanced Data: A Comprehensive Approach Using Machine and Ensemble Learning. Vital Annex: International Journal of Novel Research in Advanced Sciences (2751-756X), 4(7), 263–270. Retrieved from https://journals.innoscie.com/index.php/ijnras/article/view/102

Section

Articles
