Wenping Wang
Adv. Artif. Intell. Mach. Learn., 3 (3):1259-1273
Wenping Wang: Individual Researcher
DOI: 10.54364/AAIML.2023.1174
Article History: Received on: 26-May-23, Accepted on: 25-Jul-23, Published on: 02-Aug-23
Corresponding Author: Wenping Wang
Email: wenpingw@alumni.cmu.edu
Citation: Wenping Wang, Jin Han, Chen Liang, Tong Chen, Chengze Fan, Jingxian Huang (2023). Sentiment Analysis: A Systematic Case Study with Yelp Scores. Adv. Artif. Intell. Mach. Learn., 3 (3):1259-1273
Sentiment analysis is a classic, well-defined task in machine learning and natural language processing (NLP). Over the years, both machine learning as a whole and NLP in particular have made substantial progress. Because commercial applications are heavily constrained by cost, throughput and latency, we ask how much additional accuracy complex, high-latency models actually deliver compared with simple, low-latency models that can be deployed on embedded devices and in high-throughput scenarios. In this article, we use the Yelp Review dataset as a test bench. Predicting Yelp overall ratings from user review text and other related features, we experiment with a range of existing machine learning algorithms, from simple logistic regression to deep models built on BERT embeddings. We also ensemble the aforementioned models into a single predictor to see whether combining them improves performance. Among all the models, a simple TF-IDF baseline with an MLP ensemble reaches higher accuracy than pure MLP models. This suggests that, with a well-chosen vectorizer and careful data processing, a production system can prioritize throughput and latency by using small models instead of relying on heavy, multi-layer MLPs.
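To make the "simple, low-latency model" and "ensemble" ideas concrete, here is a minimal, dependency-free sketch of a TF-IDF plus multinomial logistic regression rating predictor, together with a soft-voting function that averages class probabilities from several models. This is not the authors' code. The whitespace tokenizer, the smoothed IDF formula, the per-example SGD training loop, the soft-voting scheme (the abstract does not say how its ensemble combines models), the `other_probs` stand-in for a second model, and the toy reviews are all assumptions for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real pipeline would clean text further.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1, computed over the training documents.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

def tfidf(doc, idf):
    # Sparse TF-IDF vector (term -> weight), L2-normalized; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def predict_proba(W, b, x):
    # Softmax over linear class scores; subtracting the max score keeps exp() numerically stable.
    scores = [b[c] + sum(W[c][t] * v for t, v in x.items()) for c in range(len(b))]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_logreg(X, y, n_classes, epochs=200, lr=0.5):
    # Multinomial logistic regression fitted by per-example SGD on the cross-entropy loss.
    W = [Counter() for _ in range(n_classes)]  # sparse weights, one Counter per class
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = predict_proba(W, b, x)
            for c in range(n_classes):
                g = p[c] - (1.0 if c == label else 0.0)  # gradient of the loss w.r.t. score c
                b[c] -= lr * g
                for t, v in x.items():
                    W[c][t] -= lr * g * v
    return W, b

def soft_vote(prob_lists, weights=None):
    # Weighted average of per-model class-probability vectors; returns (argmax class, averaged probs).
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    n = len(prob_lists[0])
    avg = [sum(w * p[c] for w, p in zip(weights, prob_lists)) / total for c in range(n)]
    return max(range(n), key=avg.__getitem__), avg

# Hypothetical toy reviews with star ratings 1-5 (class index = stars - 1).
reviews = [
    ("terrible food and rude staff", 1),
    ("bland meal and slow service", 2),
    ("okay food average service", 3),
    ("good food friendly staff", 4),
    ("amazing food excellent service", 5),
]
idf = fit_idf([text for text, _ in reviews])
X = [tfidf(text, idf) for text, _ in reviews]
y = [stars - 1 for _, stars in reviews]
W, b = train_logreg(X, y, n_classes=5)

# Ensemble the logistic regression with a stand-in probability vector from a second model.
lr_probs = predict_proba(W, b, tfidf("rude staff and terrible food", idf))
other_probs = [0.6, 0.2, 0.1, 0.05, 0.05]  # hypothetical output of, e.g., an MLP
predicted_stars = soft_vote([lr_probs, other_probs])[0] + 1
print(predicted_stars)  # prints 1
```

Soft voting is used here only because it is the simplest way to merge models that each output a probability per rating; weighted voting or stacking are equally plausible readings of "ensemble", and the `weights` argument lets a stronger model count for more.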