The Use of the K-Means Algorithm in Analyzing E-Commerce Consumer Segmentation: A Case Study of the Online Retail Dataset (UK)
Abstract
This study analyzes consumer segmentation on e-commerce platforms using the K-Means algorithm as the primary clustering method. Using the Online Retail (UK) dataset, which contains transaction records from a UK-based online retail company, the research focuses on identifying behavioral patterns among consumers. Key variables, including purchase frequency, total transaction value, and recency (or visit time), are used to form clusters that represent different types of consumer behavior. K-Means is applied after a series of preprocessing steps (data cleaning, feature selection, and normalization) intended to make the clustering results reliable. Once the clusters are formed, each consumer group is analyzed to determine its characteristics, purchasing tendencies, and potential value to the business. The segmentation results give businesses a basis for developing targeted marketing strategies and personalized service offerings. By understanding the preferences and behaviors within each cluster, companies can optimize promotional efforts, improve customer retention, and enhance the overall user experience. The findings indicate that data-driven segmentation with K-Means is an effective approach for gaining actionable insights into consumer behavior, thereby supporting more strategic decision-making in the e-commerce environment.
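The pipeline described above (recency, frequency, and monetary features, followed by normalization and K-Means) can be sketched roughly as follows. This is a minimal illustration rather than the study's implementation: the toy transactions, the snapshot date, the helper names (`rfm_features`, `standardize`, `kmeans`), and the choice of k = 2 are all assumptions made for the example. It uses only the Python standard library; on the real dataset, pandas and scikit-learn would be the more usual tools.

```python
import random
from datetime import date
from statistics import mean, pstdev

# Hypothetical toy transactions: (customer_id, invoice_date, amount).
TRANSACTIONS = [
    ("C1", date(2011, 12, 1), 120.0),
    ("C1", date(2011, 12, 5), 80.0),
    ("C1", date(2011, 11, 20), 150.0),
    ("C2", date(2011, 12, 3), 200.0),
    ("C2", date(2011, 12, 7), 180.0),
    ("C2", date(2011, 11, 28), 220.0),
    ("C3", date(2011, 3, 10), 15.0),
    ("C4", date(2011, 2, 14), 20.0),
]
SNAPSHOT = date(2011, 12, 10)  # assumed reference date for recency


def rfm_features(transactions, snapshot):
    """Return {customer: [recency_days, frequency, monetary]}."""
    features = {}
    for customer, day, amount in transactions:
        recency = (snapshot - day).days
        row = features.setdefault(customer, [recency, 0, 0.0])
        row[0] = min(row[0], recency)  # days since the most recent purchase
        row[1] += 1                    # number of transactions
        row[2] += amount               # total spend
    return features


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


features = rfm_features(TRANSACTIONS, SNAPSHOT)
customers = sorted(features)
labels = kmeans(standardize([features[c] for c in customers]), k=2)
segments = dict(zip(customers, labels))
```

In this toy data the recent, frequent, high-spend customers (C1, C2) end up in one segment and the lapsed, single-purchase customers (C3, C4) in the other. The numeric cluster ids themselves are arbitrary, so any interpretation should rest on each cluster's feature profile rather than on its label.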
Copyright (c) 2025 Ardo Kusdaryanto, Christoporus Dimas Wijanarko, Paskalis Dwi Widyantara Usat, Ary Prabowo

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
This is an open-access article distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium. Users may read, download, copy, distribute, search, or link to the full texts of articles in this journal without asking prior permission, provided that they give appropriate credit, provide a link to the license, and indicate if changes were made. Anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.












