Comparative Analysis of Machine Learning Algorithms for Diabetes Prediction with Feature and Hyperparameter Optimization
Abstract:
Background: Diabetes is a chronic disease with increasing global prevalence, making early detection essential. Machine learning has shown strong potential in improving prediction accuracy; however, robust validation and systematic optimization are still required.
Aims: This study compares several machine learning methods for diabetes prediction using a reproducible and methodologically sound framework.
Methods: The Pima Indian Diabetes dataset (768 samples, 8 clinical features) was used. Six algorithms were evaluated: Logistic Regression, K-Nearest Neighbors, Decision Tree, Random Forest, Support Vector Machine, and Gradient Boosting. Hyperparameters were tuned with GridSearchCV, and the models were validated using stratified 5-fold cross-validation. Performance was assessed using accuracy, precision, recall, F1-score, and AUC-ROC.
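The tuning and validation procedure described in the Methods can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: it uses a synthetic dataset from `make_classification` as a stand-in with the same shape as the Pima data (768 samples, 8 features), shows only Random Forest, and the parameter grid and random seeds are assumptions chosen for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Assumption: synthetic stand-in with the same shape as the Pima dataset (768 x 8).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)

# Stratified 5-fold cross-validation keeps the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Assumption: a deliberately small, hypothetical grid; the study's grid is not given.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

metrics = ["accuracy", "precision", "recall", "f1", "roc_auc"]
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring=metrics,
    refit="roc_auc",  # with several metrics, refit must name the one used for selection
    cv=cv,
)
search.fit(X, y)

# Report mean and standard deviation across folds for the selected configuration.
best = search.best_index_
print("best parameters:", search.best_params_)
for metric in metrics:
    mean = search.cv_results_[f"mean_test_{metric}"][best]
    std = search.cv_results_[f"std_test_{metric}"][best]
    print(f"{metric}: {mean:.3f} +/- {std:.3f}")
```

The same loop could be repeated for the other five classifiers by swapping the estimator and its grid. Note that scores taken from the same folds used for tuning tend to be optimistic; a nested cross-validation or a held-out test set would give a less biased estimate.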
Results: Ensemble methods outperformed traditional models. Random Forest achieved the highest performance, with an accuracy of 98% ± 1.8% and an AUC-ROC of 0.999 ± 0.02, followed by Gradient Boosting at 91% ± 2.1%. Logistic Regression and KNN performed worse, with accuracies of 79% ± 2.3% and 77% ± 2.5%, respectively. Feature importance analysis identified glucose level, BMI, and age as the most influential predictors.
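A feature importance ranking of the kind reported above could be obtained along these lines. The abstract does not state which importance measure was used, so this sketch assumes the impurity-based importances exposed by a fitted Random Forest. The data is again a synthetic stand-in with generic feature names, so the ranking it prints says nothing about the clinical features themselves.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumption: synthetic stand-in (768 x 8); with the real data these would be
# the eight clinical columns of the Pima dataset.
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical column names

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
ranked = sorted(
    zip(names, model.feature_importances_), key=lambda pair: pair[1], reverse=True
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` is a common alternative when that is a concern.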
Conclusion: The study demonstrates that ensemble methods combined with hyperparameter optimization and robust validation significantly improve diabetes prediction performance and can support clinical decision-making.
Keywords: Algorithm, Decision tree, Diabetes, Random forest, KNN
Copyright (c) 2026 Fikri Fakhar Rahmadhan, Fikri Haikal, Muhammad Arif, Muhammad Agung Insani

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.