Abstract:
Consumer review sites, social media and micro-blogs carry a wealth of
information on the general perspective, experience and feedback that consumers
have on products. When there is a high volume of product reviews, it can be
challenging for product developers to sift through them and make decisions based on
consumers’ sentiments. Sentiment Analysis, a branch of Artificial Intelligence,
provides data that helps businesses understand customers’ desires and
track how brands and goods are perceived. When performing Sentiment
Analysis, feature extraction converts raw text input into a format compatible with
machine learning models. A strong feature set is necessary to achieve high
prediction and object classification accuracy. Identifying an optimal feature set
combination is critical for increasing the overall performance of data
classification. In this research, we tackle this problem by identifying an optimal
feature extraction technique for product review Sentiment Analysis using a
feature-level analysis. N-gram, Part of Speech (POS) and lexicon-based features from
Stanford CoreNLP, TextBlob and SentiWordNet are examined in different
combinations. Multinomial Naïve Bayes (MNB), unsupervised Lexicon and MNB +
unsupervised Lexicon ensemble classifiers were modeled to classify the
reviews into positive, neutral and negative classes, thereby identifying the
optimal feature combination. We explored the optimal feature extraction technique
using real product review datasets for two products: a car make and model,
the “Nissan Sentra”, and a mobile phone, the “Samsung Galaxy A12”. For the
MNB and MNB + Lexicon ensemble classifiers, the optimal feature extraction
technique was a combination of N-gram, Part of Speech and TextBlob features,
while for the unsupervised Lexicon classifier it was a combination of N-gram,
Part of Speech and VADER features.
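Combining feature groups and classifying reviews with Multinomial Naïve Bayes can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes whitespace tokenization and uses a tiny invented polarity lexicon (`TOY_LEXICON`) in place of TextBlob, SentiWordNet, Stanford CoreNLP or VADER scores. It omits POS features and trains a from-scratch MNB with Laplace smoothing on a few made-up reviews (`TRAIN`), which are not taken from the paper's datasets. Lexicon hits are encoded as non-negative counts (`lex:pos`, `lex:neg`) because MNB models count data.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for real lexicon scores (TextBlob, VADER, ...).
TOY_LEXICON = {"great": 1, "love": 1, "smooth": 1, "bad": -1, "poor": -1, "noisy": -1}

# Invented illustrative reviews; not from the Nissan Sentra / Samsung Galaxy A12 data.
TRAIN = [
    ("great car love the smooth ride", "positive"),
    ("love this phone great screen", "positive"),
    ("bad brakes and noisy engine", "negative"),
    ("poor battery bad camera", "negative"),
    ("the car is blue", "neutral"),
    ("the phone has a case", "neutral"),
]


def ngram_features(tokens, n_max=2):
    """Count every n-gram of length 1..n_max, e.g. 'ng:great' or 'ng:great_car'."""
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats["ng:" + "_".join(tokens[i:i + n])] += 1
    return feats


def lexicon_features(tokens):
    """Non-negative counts of positive and negative lexicon hits."""
    feats = Counter()
    for tok in tokens:
        score = TOY_LEXICON.get(tok, 0)
        if score > 0:
            feats["lex:pos"] += 1
        elif score < 0:
            feats["lex:neg"] += 1
    return feats


def extract(text):
    """Merge the feature groups into one sparse count vector (prefixes keep keys distinct)."""
    tokens = text.lower().split()
    return ngram_features(tokens) + lexicon_features(tokens)


class MultinomialNB:
    """Multinomial Naive Bayes over sparse count dicts with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        class_docs = Counter(labels)
        self.feat_counts = {c: Counter() for c in self.classes}
        for feats, c in zip(docs, labels):
            self.feat_counts[c].update(feats)  # add this document's counts to its class
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.feat_counts[c])
        self.totals = {c: sum(self.feat_counts[c].values()) for c in self.classes}
        self.log_prior = {c: math.log(class_docs[c] / len(labels)) for c in self.classes}
        return self

    def predict(self, feats):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            denom = self.totals[c] + self.alpha * v
            score = self.log_prior[c]
            for f, cnt in feats.items():
                if f not in self.vocab:
                    continue  # features never seen in training are ignored
                score += cnt * math.log((self.feat_counts[c][f] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def train_demo():
    docs = [extract(text) for text, _ in TRAIN]
    labels = [label for _, label in TRAIN]
    return MultinomialNB().fit(docs, labels)


if __name__ == "__main__":
    model = train_demo()
    print(model.predict(extract("great smooth phone")))  # positive
    print(model.predict(extract("bad noisy camera")))    # negative
    print(model.predict(extract("the phone is blue")))   # neutral
```

POS features could be added the same way, as another prefixed group of tag counts (for example from a tagger such as NLTK's `pos_tag`), so that the classifier sees N-gram, POS and lexicon features in a single sparse vector and different combinations can be compared by toggling groups on and off.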