Implementing TF-IDF and Logistic Regression for Sentiment Analysis of YouTube Comments on the iPhone 16

Andi Riswawan

doi:10.36378/jtos.v7i2.4753

Andi Riswawan STMIK IKMI CIREBON


Keywords: Sentiment Analysis, TF-IDF, Logistic Regression, YouTube, iPhone 16

Abstract

Sentiment analysis of user opinions on social media is an important tool for understanding public perception of technology products. This study classifies and analyzes public sentiment in YouTube comments about the iPhone 16 using Term Frequency-Inverse Document Frequency (TF-IDF) features and the Logistic Regression algorithm. The data were collected from product review videos on the GadgetIn channel using web scraping. Preprocessing consisted of converting characters to lowercase (case folding), removing common words that carry no sentiment (stopword removal), and reducing words to their root forms (stemming). The TF-IDF features were used as input to a Logistic Regression model that classifies each comment into one of three sentiment categories toward the discussed topic: positive (supportive), neutral, or negative. The model was evaluated using accuracy, precision, recall, and F1-score. Based on these metrics, the model performed reasonably well in classifying user opinions, and the findings indicate that it handles high-dimensional sentiment data stably and accurately. This research contributes to the development of text-based sentiment classification systems for technology review analysis.
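The pipeline described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation. The toy comments, the stopword list, the smoothed IDF variant, the learning rate, and the epoch count are all assumptions made for illustration. Stemming is omitted because the abstract does not name a stemmer.

```python
import math
import re
from collections import Counter

LABELS = ["negative", "neutral", "positive"]
STOPWORDS = {"the", "is", "a", "this", "it", "and"}  # hypothetical list

def preprocess(text):
    """Case folding, punctuation stripping, stopword removal (no stemming)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def fit_tfidf(docs):
    """Learn vocabulary and smoothed IDF weights from tokenised documents."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + len(docs)) / (1 + df[t])) + 1 for t in vocab}
    return vocab, idf

def transform(doc, vocab, idf):
    """L2-normalised TF-IDF vector; out-of-vocabulary tokens are ignored."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    return [v / sum(e) for v in e]

def scores(x, W, b):
    return [sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(len(W))]

def train(X, y, n_classes, lr=0.5, epochs=300):
    """Multinomial logistic regression fitted by batch gradient descent."""
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, label in zip(X, y):
            p = softmax(scores(x, W, b))
            for k in range(n_classes):
                err = p[k] - (1.0 if k == label else 0.0)
                gb[k] += err
                for j in range(d):
                    gW[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * gb[k] / n
            for j in range(d):
                W[k][j] -= lr * gW[k][j] / n
    return W, b

def predict(x, W, b):
    s = scores(x, W, b)
    return max(range(len(s)), key=s.__getitem__)

def precision_recall_f1(y_true, y_pred, cls):
    """Per-class precision, recall, and F1-score."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Invented toy comments (0 = negative, 1 = neutral, 2 = positive).
data = [
    ("The camera on this iPhone is amazing", 2),
    ("I love the new design", 2),
    ("Battery life is terrible", 0),
    ("Too expensive and disappointing", 0),
    ("It ships in September", 1),
    ("The box includes a cable", 1),
]
docs = [preprocess(text) for text, _ in data]
labels = [label for _, label in data]
vocab, idf = fit_tfidf(docs)
X = [transform(d, vocab, idf) for d in docs]
W, b = train(X, labels, n_classes=len(LABELS))

train_preds = [predict(x, W, b) for x in X]
accuracy = sum(p == t for p, t in zip(train_preds, labels)) / len(labels)
new_vec = transform(preprocess("Amazing camera!"), vocab, idf)
print(accuracy, LABELS[predict(new_vec, W, b)])
```

With real data the metrics would be computed on a held-out test split rather than on the training comments, and a library implementation such as scikit-learn would normally replace the hand-written TF-IDF and gradient descent shown here.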

