Business competition among companies has increased sharply in recent years. To remain in business, companies must give priority to customer satisfaction, which ultimately fosters customer loyalty. Customer loyalty analysis is critically important for retaining current customers and attracting new ones. The proposed study presents an efficient approach to determining customers' loyalty to and satisfaction with a product, using machine intelligence and sentiment analysis on a large dataset of product reviews obtained from Amazon. A novel feature selection method is applied to improve performance on large datasets. This method is based on Dynamic Mutual Information (DMI), which selects only the important features and thereby reduces the dimensionality problems of very large datasets. Text preprocessing is performed first and includes stopword removal, tokenization, and lemmatization. SentiWordNet, together with the Intelligent SVM technique, is used for aspect-level sentiment analysis to categorize customer reviews into three classes of loyalty.
Introduction
Recent advances in internet facilities have revolutionized and digitally transformed modern society. Recent innovations in technology have improved people's lives by providing online banking, education, and buying and selling facilities for different products and services [1]. Online sales have increased as people shop online instead of going to a mart or store. The pandemic has changed shopping habits to a surprising degree. The e-commerce industry is now growing rapidly, with many online shopping websites such as Amazon, eBay, Ali Express, and many more. One of the leading e-commerce websites, Amazon, has had more than 4.7 billion trades in the past year with more than 400 million active customers [2]. Such e-commerce sites make shopping more convenient for both users and sellers. However, these processes also present some difficulties. One of the main issues users face is selecting trustworthy sellers that offer the best products. There is a need for a platform through which users can find the best products according to their preferences. This can be achieved by providing customer reviews to users on social networking sites [3, 4]. These reviews and feedback can help new customers make better product selections.
Literature Review
The techniques available in the literature fall into two groups: one works with subjective reviews and the other works with objective reviews. The proposed study uses subjective reviews and extracts sentiment scores from the SentiWordNet dictionary. The polarity of aspect-level reviews is calculated as Positive, Negative, or Neutral. The proposed technique uses an intelligent SVM algorithm to extract the overall customer loyalty level toward a product. An accuracy of 98.7% is achieved for aspect-level sentiment analysis [18]. Document-level sentiment sorting is performed on a movie reviews dataset to analyze the sentiment levels of users for different movies [26]. Table 1 shows an overview of similar works identified in the literature.
Research Methodology
In the proposed research, the sentiment score of user reviews for products is evaluated in several steps. In the first step, customer review data for products is obtained from Amazon. DMI calculates the entropy of two variables, measures how relevant the variables are to each other, and assigns a score. The SentiWordNet library is used to calculate the polarity scores of the selected features to obtain the sentiment level of each aspect. An aggregate score of the polarities is calculated to identify the overall customer loyalty toward the product. A support vector machine (SVM) algorithm classifies the reviews by constructing a hyperplane that separates the classes; the better the hyperplane separates them, the better the classification. An advantage of SVM is that it can handle outliers efficiently.
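The entropy-based relevance score described above can be illustrated with a minimal sketch. It computes plain mutual information between a binary term-occurrence feature and the class label, using I(X; Y) = H(X) + H(Y) - H(X, Y). This is an assumption about the underlying quantity, not the authors' DMI procedure; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(feature, labels):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equal-length sequences."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


# Toy data (hypothetical): 1 if the term occurs in the review, else 0.
labels = ["pos", "pos", "neg", "neg"]
has_great = [1, 1, 0, 0]  # occurrence is perfectly aligned with the label
has_phone = [1, 0, 1, 0]  # occurrence is independent of the label

score_great = mutual_information(has_great, labels)
score_phone = mutual_information(has_phone, labels)
```

In this toy case the term aligned with the label receives a score of 1 bit and the independent term a score of 0, which is the behavior a relevance filter relies on.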
Polarity Analysis of Reviews
In the fourth step, POS tagging is applied to generate tags for individual terms. In the fifth step, DMI is used to produce a further reduced set of features in order to improve final performance. Entropy is calculated, and only those terms whose values exceed the threshold of 0.05 are retained. In the sixth step, SentiWordNet calculates the polarity score of only the terms that pass this filter, so that only important terms are used in sentiment analysis. In the seventh step, the sentence token score per word is computed. The sentence token scores of the important tokens are used to calculate the aspect-level sentiment score, as shown in equation (4).
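Steps five to seven can be sketched as follows, under stated assumptions. Only the 0.05 threshold comes from the text. The relevance values, the small lexicon standing in for SentiWordNet, and the choice of averaging positive minus negative scores are all hypothetical, and the sketch is not a reproduction of equation (4).

```python
THRESHOLD = 0.05  # cut-off stated in the text

# Hypothetical relevance scores per term (e.g. from a mutual-information step).
relevance = {"battery": 0.21, "great": 0.34, "the": 0.01, "slow": 0.12}

# Hypothetical (positive, negative) polarity pairs standing in for SentiWordNet.
lexicon = {"great": (0.75, 0.0), "slow": (0.0, 0.5), "battery": (0.0, 0.0)}


def select_terms(scores, threshold=THRESHOLD):
    """Keep only the terms whose score is strictly above the threshold."""
    return {term for term, score in scores.items() if score > threshold}


def sentence_score(tokens, selected, polarity):
    """Mean of (positive - negative) over the selected tokens of a sentence."""
    kept = [t for t in tokens if t in selected and t in polarity]
    if not kept:
        return 0.0  # no informative token: treat the sentence as neutral
    return sum(polarity[t][0] - polarity[t][1] for t in kept) / len(kept)


selected = select_terms(relevance)
score = sentence_score(["the", "battery", "great"], selected, lexicon)
```

Here "the" falls below the threshold and is ignored, so the sentence score is driven only by "battery" and "great".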
In step eight, an accumulative score of the sentence words is calculated using equation (1), which gives the aggregate score. The review-level customer loyalty score is then calculated to classify the review into the positive, negative, or neutral class. The scores for the review shown in Table 5 are obtained through the SentiWordNet dictionary. The last three rows of Table 5 show the percentage of review characteristics in each class; the review is classified as positive because that class has the highest percentage of the three.
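The review-level decision described above, picking the class with the highest percentage, can be sketched as below. The aggregation by simple summation and the per-term (positive, negative, neutral) triples are assumptions for illustration; they are not taken from equation (1) or from Table 5.

```python
def classify_review(term_scores):
    """Return (label, percentages) from per-term (pos, neg, neutral) triples."""
    totals = {"positive": 0.0, "negative": 0.0, "neutral": 0.0}
    for pos, neg, neu in term_scores:
        totals["positive"] += pos
        totals["negative"] += neg
        totals["neutral"] += neu
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No scored terms: fall back to the neutral class.
        return "neutral", {name: 0.0 for name in totals}
    percentages = {name: 100.0 * value / grand_total for name, value in totals.items()}
    label = max(percentages, key=percentages.get)
    return label, percentages


# Hypothetical triples for three terms of one review.
review = [(0.75, 0.0, 0.25), (0.5, 0.25, 0.25), (0.25, 0.0, 0.75)]
label, percentages = classify_review(review)
```

For these hypothetical values the positive share is the largest of the three, so the review is labeled positive.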
Experiments and Results
The results of the proposed methodology are produced in RapidMiner. SentiWordNet and SVM algorithms are used to generate results and predict customer loyalty. Input reviews are obtained from Amazon and are parsed, tokenized, and lemmatized. A total of 5000 reviews about two different Samsung mobile phones are obtained. After preprocessing, features are extracted using a mutual information scheme in which important features are selected using entropy. SentiWordNet 3.0 is used for PoS tagging and polarity score calculation. In this section, the sentiment analysis process and customer loyalty prediction are explained for a single review in Tables 4 and 5 for easy understanding. The aspect-level sentiment score is calculated, and this score is aggregated to obtain the overall sentiment level of the customers. The percentage of positive, negative, and neutral reviews among 1000 reviews is calculated by using the following formula and presented in Figure 4.
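The formula itself does not appear in this text. A plausible reading, given here as an assumption and not as the authors' exact expression, is the share of each class among the evaluated reviews, where the symbols N_c (number of reviews assigned to class c) and N (total number of reviews evaluated) are introduced only for illustration:

```latex
\text{Percentage}_c = \frac{N_c}{N} \times 100,
\qquad c \in \{\text{positive}, \text{negative}, \text{neutral}\}
```

Under this reading, the three percentages sum to 100 because every review is assigned to exactly one class.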