Application of K-Means, Random Forest, and Linear Regression to Improve the Accuracy of Kaggle E-Commerce Shopping Behavior Analysis
Abstract:
Background: Analyzing customer shopping behavior is essential for improving e-commerce marketing strategies. The use of machine learning enables the identification of purchasing patterns and enhances predictive accuracy in understanding customer preferences.
Aims: This study aims to improve model accuracy and classification performance in analyzing customer shopping behavior using the shopping_behavior_updated.csv dataset from Kaggle. The scope includes the application of both supervised and unsupervised learning techniques for segmentation, classification, and prediction tasks.
Methods: Three machine learning algorithms were applied: K-Means for customer segmentation, Random Forest Classifier for product category prediction, and Linear Regression for estimating purchase amounts. The research involved systematic preprocessing steps, including data cleaning, encoding, scaling, and feature engineering to enhance data quality and model interpretability.
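The encoding and scaling steps named above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the column values (genders, ages) are hypothetical toy data, and the helper names one_hot and standardize are invented for this example.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors (categories sorted)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def standardize(values):
    """Rescale numbers to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy rows, NOT taken from the Kaggle dataset.
genders = ["Male", "Female", "Female", "Male"]
ages = [20, 30, 40, 50]

gender_categories, gender_encoded = one_hot(genders)
ages_scaled = standardize(ages)

# One numeric feature row per customer: indicator columns plus scaled age.
features = [enc + [age] for enc, age in zip(gender_encoded, ages_scaled)]
```

Scaling matters most for K-Means, which relies on Euclidean distances: without it, a feature measured on a large numeric range would dominate the cluster assignment.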
Results: The optimized Random Forest model achieved 100% accuracy. K-Means clustering produced five customer segments with an inertia of 41,928.17 and a silhouette score of 0.065, a value close to zero that indicates substantial overlap between clusters. The Linear Regression model performed poorly, with an R² of -0.02, meaning it predicted purchase amounts slightly worse than always predicting their mean.
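To make the reported metrics easier to read, here is a minimal sketch of how R² and the silhouette score are defined, written in plain Python on hypothetical toy data. The function names and sample values are assumptions for illustration; the study's own figures were presumably computed with a library implementation on the full dataset.

```python
from math import dist
from statistics import mean


def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def silhouette_score(points, labels):
    """Mean over points of (b - a) / max(a, b), where a is the mean distance to
    the point's own cluster and b the mean distance to the nearest other cluster."""
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        a = mean(own)
        b = min(
            mean([dist(p, q) for q, lab in zip(points, labels) if lab == other])
            for other in set(labels) if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy data, NOT taken from the study.
y = [10, 20, 30, 40]
r2_mean_baseline = r_squared(y, [25, 25, 25, 25])   # predicting the mean
r2_reversed = r_squared(y, [40, 30, 20, 10])        # worse than the mean

pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
sil_separated = silhouette_score(pts, [0, 0, 1, 1])  # tight, distant clusters
sil_mixed = silhouette_score(pts, [0, 1, 0, 1])      # clusters cut across groups
```

On this toy data the mean baseline gives an R² of exactly 0 and the reversed predictions give a negative value, while the silhouette score is close to 1 for the well-separated labeling and negative for the mixed one.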
Conclusion: The findings indicate that integrating supervised and unsupervised learning methods is effective in identifying customer purchasing patterns and can contribute to improving e-commerce marketing strategies, although not all predictive models yield optimal performance.
Keywords: Customer Shopping Behavior, K-Means Clustering, Linear Regression, Machine Learning, Random Forest
Copyright (c) 2026 Azzahra Risa Putri, Anggraini Dwi Olivia, Virha Charoline Josica

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.