Large Language Models (LLMs): Sentiment Analysis of Customer Product Reviews

Natural Language Processing Techniques, Data Augmentation, Data Balancing and Fine-Tuning

E Neuburg
11 min read · Feb 23, 2024

1. Introduction

Nowadays, with digital marketing and social media, companies deal with a huge volume of customer reviews, articles, and comments. Different tools can be used to cope with this enormous increase in data volume, subjectivity, and heterogeneity. This article, however, concentrates on an important business approach known as Sentiment Analysis, using a Large Language Model (LLM) to explore the effectiveness of a company’s marketing campaigns and the product reviews left by customers.

First of all, the questions are:

a) What is Sentiment Analysis?
b) Is Sentiment Analysis a good tool for your company to analyze product reviews?
c) If yes, how can you use the huge amount of data left by customers (i.e. comments and product reviews) effectively?

2. What is Sentiment Analysis?

Sentiment analysis, as a part of Natural Language Processing (NLP), can be used to identify and extract subjective information from text (left on companies’ websites or social media), such as customer opinions, attitudes, appraisals, and emotions, in order to determine whether a customer review is positive, negative, or neutral. The main idea of Sentiment Analysis is to use computational methods to train models that automatically classify digital text the way a human would.

Use Cases

  • Product Analysis
    Sentiment Analysis helps you learn more about your product: use the feedback to spot advantages and drawbacks, improve the service, and even discover potential new features.
  • Customer Analysis / Employee Feedback Analysis
    Use review data to identify gaps and improve the customer experience.
  • Market Research
    Analyze and research market trends, gathering valuable insights about current products and future trends.
  • Social Media Monitoring
    Analyze the success of the company’s media performance, customer acceptance of a product, or the emergence of new trends.

Approaches

  • Rule-based / Lexicon-based
    The score that marks a text as positive, negative, or neutral is based on predefined lexicons. This approach scans the text, detects and classifies the words in a sentence, calculates a score for each word, and sums these scores up. The final score determines the sentiment of the sentence (see the sketch after this list).
  • Machine Learning
    This technique involves training a model to detect relationships and patterns in the data: algorithms are trained on labelled data and then used to detect the sentiment of unseen data.
    Some of the algorithms traditionally used are Naïve Bayes, Support Vector Machines, Logistic Regression, Decision Trees, K-nearest neighbours, etc.
  • Hybrid
    A combination of the rule-based and Machine Learning approaches.
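
To make the rule-based idea more concrete, here is a minimal sketch of lexicon-based scoring in plain Python; the tiny lexicon and the thresholds are illustrative assumptions, not taken from any particular library:

# Minimal lexicon-based scoring (illustrative word scores)
LEXICON = {"great": 2, "good": 1, "bad": -1, "poor": -2, "terrible": -3}

def lexicon_sentiment(text):
    # Score every known word and sum the scores for the sentence
    score = sum(LEXICON.get(word, 0) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_sentiment("the tablet is great and the price is good"))  # positive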

In this project, we won’t apply the “traditional” Machine Learning approaches. Instead, we will use a pre-trained Large Language Model and fine-tune it.

3. Dataset

For this project, we will use the Kaggle dataset Consumer Reviews of Amazon Products. The dataset includes basic product information, the rating, the review text, the category, a timestamp, etc. for each product.

For the sentiment analysis, we will mainly concentrate on the columns “rating” and “reviews text”.

The first step is cleaning the review texts. But don’t worry, I will not bore you with tedious cleaning steps, as the data is already quite clean. So, at this point, I will only convert the most common emoticons into their text descriptions:

:‑) “Happy face or smiley”
:-)) “Very happy”
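
A minimal sketch of this conversion step, assuming the reviews are already loaded into a pandas DataFrame called data with the raw text in a column named “text” (both names are my own choice):

# Map common emoticons to short text descriptions (illustrative subset)
EMOTICONS = {
    ":-)": "happy face or smiley",
    ":-))": "very happy",
}

def replace_emoticons(text):
    # Replace longer emoticons first so ":-))" is not partially matched by ":-)"
    for emoticon in sorted(EMOTICONS, key=len, reverse=True):
        text = text.replace(emoticon, EMOTICONS[emoticon])
    return text

data["text"] = data["text"].apply(replace_emoticons)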

The second step is a deep dive into the rating scores. The chart below clearly shows how many reviews we have per rating. It is noteworthy that the data contains far more positive reviews (4–5) than negative ones (1–3).

In the next step, we convert the rating score into a sentiment label, grouping ratings of 4–5 as positive and the rest as negative.
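
A one-line sketch of this mapping, again assuming a pandas DataFrame data with a numeric “rating” column and encoding positive as 1 and negative as 0:

# Ratings 4-5 become positive (1), ratings 1-3 become negative (0)
data["label"] = (data["rating"] >= 4).astype(int)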

The pie chart demonstrates even more clearly how imbalanced our dataset is. We will address this challenge in the sections on Data Augmentation and Data Balancing.

4. Sentiment Analysis using pre-trained Large Language Model (LLM)

A Large Language Model (LLM) is a type of artificial intelligence (AI), in particular a deep learning model, that is trained on a huge dataset to perform various NLP tasks. Notably, LLMs are first pre-trained and then fine-tuned for downstream tasks such as text summarization, generation, classification, question answering, translation, etc.

In addition, the architecture of LLMs is based on the transformer model. A transformer is built up of multiple transformer blocks called layers (e.g. feed-forward, attention, or normalization layers). The major point here is that all these layers together enable the model to detect patterns and explore the relationships between tokens more deeply.

As a starting point for the sentiment analysis, we will use a pre-trained model from Hugging Face called SiEBERT — English-Language Sentiment Classification. This model is a fine-tuned checkpoint of RoBERTa-large (Liu et al. 2019) and enables us to differentiate between positive and negative sentiment. Moreover, SiEBERT outperforms DistilBERT SST-2 by about 15 percentage points on average (78.1 vs 93.2).

The easiest way to use the model for sentiment analysis is through a pipeline:

from transformers import pipeline

roberta_classifier = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english", truncation=True)

And then get predictions for each review on the test set:

def get_predictions(data_test, classifier):
    """
    Create a new column with predictions.
    Input:
        data_test: DataFrame with the review text in column "text"
        classifier: Hugging Face sentiment-analysis pipeline
    Output:
        data_test: DataFrame with the raw and converted predictions
    """
    # Run the classifier on every review text
    data_test['roberta_sentiment'] = data_test['text'].apply(lambda x: classifier(x))
    # Map the pipeline output to the numeric labels used in the dataset
    data_test.loc[:, 'predicted'] = data_test["roberta_sentiment"].apply(convert_predictions)
    return data_test
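
The helper convert_predictions is not shown above; a hypothetical version could look like the following, assuming the pipeline returns a list with one dictionary whose “label” key is either “POSITIVE” or “NEGATIVE”, and that we encode positive as 1 and negative as 0:

def convert_predictions(prediction):
    # prediction looks like [{'label': 'POSITIVE', 'score': 0.99}]
    return 1 if prediction[0]['label'] == 'POSITIVE' else 0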

So, let us take these results as our baseline! Our next steps will be to fine-tune the existing model and see if we can improve the outcomes.

5. Fine-tuning

The advantage of using pre-trained models is that it reduces computational costs. Remember that RoBERTa-large was trained on 160GB of text; if you don’t want to spend many thousands of dollars on training, use a pre-trained model!

However, if you have a specific task or dataset, a good way forward is to take a pre-trained model and then fine-tune it for your task.

In this project we will fine-tune the model using two techniques:

1. We will do random oversampling using the Back-Translation technique (Section 6.3).

2. We will integrate a weighted loss (Section 7.1) into the Trainer.

In the next sections we will go deeply into the concepts behind data augmentation and working with an imbalanced dataset. But if you are more interested in the results, feel free to jump to Section 8, and to see the implementation, please check out the code on GitHub.

6. Data Augmentation in NLP

In real NLP projects, it is common to deal with an imbalanced dataset or not enough data, and our sentiment analysis project is no exception.

The obvious solution is to collect more data. Unfortunately, this approach is often either not possible or too time-consuming and expensive.

And so, data augmentation comes into play!

Data augmentation is an approach for synthetically increasing the training set by generating modified data points from existing data.

Data augmentation methods

6.1 EDA (Easy Data Augmentation)

  • synonym replacement
    For a given sentence, choose n words which are not stop words and replace each of these words with a synonym.

Original:  In the previous section, we discussed the core concept behind Random Forest.
Augmented: In the previous chapter, we discussed the core idea behind Random Forest.

  • random insertion
    Using this approach, randomly select n words, find a synonym for each of them, and then insert these synonyms at random positions in the sentence.

Original:  In the previous section, we discussed the core concept behind Random Forest.
Augmented: In the previous section, we discussed chapter the core concept behind Random Forest idea.

  • random swap
    In this case, randomly choose n words in the sentence and swap their positions.

Original:  In the previous section, we discussed the core concept behind Random Forest.
Augmented: In the previous concept, we discussed the core section behind Random Forest.

  • random deletion
    Select each word in the sentence with probability p and remove it from the sentence.

Original:  In the previous section, we discussed the core concept behind Random Forest.
Augmented: In the section, we discussed the concept behind Random Forest.

Please note that to find synonyms you can use a thesaurus or Word Embeddings (defining similarity based on k-nearest neighbours (KNN) and cosine similarity).
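
To illustrate two of these operations, here is a minimal sketch of random swap and random deletion in plain Python; the function names and parameters are my own and not taken from a specific EDA library:

import random

def random_swap(sentence, n=1):
    # Swap the positions of n randomly chosen pairs of words
    words = sentence.split()
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

def random_deletion(sentence, p=0.2):
    # Remove each word with probability p, keeping the original sentence if everything is deleted
    words = [word for word in sentence.split() if random.random() > p]
    return " ".join(words) if words else sentence

print(random_swap("In the previous section, we discussed the core concept behind Random Forest."))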

6.2. Text Generation

Generating a whole sentence or making the sentence longer (adding a few words), while at the same time ensuring that the sentence stays similar to the original.

6.3. Back translation

In this approach, we translate the original text into some other language and then back into the original language.

As you can see, the back-translated sentence is quite similar to the original sentence; there are only small differences between the two.

Since there is a lot of training data for the English language, in this project we will use the Back-Translation technique (using Hugging Face 🤗 Transformers) to create additional data for the training.

The Hugging Face model hub hosts plenty of pre-trained models for different languages.

We will use the approach described in NLP Data Augmentation using 🤗 Transformers.

So, first of all, we translate the Amazon reviews from English to German using Google’s T5 model and then back from German to English using Bert2Bert.
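
The back_translation function below relies on three objects defined elsewhere in the project: the English-to-German pipeline, the German-to-English model, and its tokenizer. A possible setup, assuming a T5 pipeline for English→German and the google/bert2bert_L-24_wmt_de_en checkpoint for German→English (the exact model choices are assumptions based on the linked article), could look like this:

from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

# English -> German translation pipeline based on T5
translation_en_to_de = pipeline("translation_en_to_de", model="t5-base")

# German -> English Bert2Bert model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "google/bert2bert_L-24_wmt_de_en", pad_token="<pad>", eos_token="</s>", bos_token="<s>"
)
model_de_to_en = AutoModelForSeq2SeqLM.from_pretrained("google/bert2bert_L-24_wmt_de_en")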

def back_translation(input_text):
    """
    Translate the text from English to German
    and then from German back to English.
    Input:
        input_text: review text (str)
    Output:
        augmented_review: back-translated review text (str)
    """
    # English -> German
    review_en_to_de = translation_en_to_de(input_text)
    text_en_to_de = review_en_to_de[0]['translation_text']
    # German -> English
    input_ids = tokenizer(text_en_to_de, return_tensors="pt", add_special_tokens=False, max_length=512, truncation=True).input_ids
    output_ids = model_de_to_en.generate(input_ids)[0]
    augmented_review = tokenizer.decode(output_ids, skip_special_tokens=True)
    return augmented_review

7. Working with Imbalanced Dataset

In this project we will test two techniques for dealing with an imbalanced dataset:

1. Oversampling of the minority class through Back Translation.

2. A weighted loss integrated into a customized Trainer.

7.1 Data Balancing

Bias in the training dataset can have a strong impact on the results of machine learning algorithms. An imbalance in the data can lead to good performance on the majority class and a high misclassification rate on the minority class, even though a good outcome for the minority class is often the more important one.

The two major approaches to resampling the training data are undersampling and oversampling.

Random Undersampling

Random undersampling means deleting data points of the majority class from the training data. The downside of this approach is that the deletion can result in losing important information.

Oversampling

With random oversampling, data points of the minority class are duplicated in the training data. However, this may cause the model to overfit.

One way to do oversampling is to randomly select data points with replacement and add them to the training set. In this project, however, instead of duplicating the negative examples, we will back-translate randomly selected reviews and add them to the training set. We repeat this process until the number of examples from the minority class equals the number from the majority class.

import pandas as pd
from sklearn.utils import shuffle

def create_data_samples(data):
    """
    Create new samples for the training set.
    Input:
        data: pandas DataFrame
    Output:
        data_sampled: pandas DataFrame with additional samples
    """
    # Number of samples needed to balance the classes (label 1 = positive, label 0 = negative)
    count_labels = data["label"].value_counts()
    n = count_labels[1] - count_labels[0]
    # Oversampling for the negative class via back translation
    data_temp = data[data.label == 0].sample(n=n, replace=True, random_state=1)
    data_temp.loc[:, 'samples'] = data_temp["text"].apply(back_translation)
    data_temp = data_temp.drop('text', axis=1)
    data_temp.rename(columns={'samples': "text"}, inplace=True)
    # Add the new samples to the original data and shuffle
    data_sampled = pd.concat([data_temp, data], ignore_index=True)
    data_sampled = shuffle(data_sampled, random_state=0)
    return data_sampled

7.2 Weighted Loss

To limit the impact of the majority class during training, we will integrate a weighted loss function and thereby force the model to learn more from the minority class. The main idea behind this approach is to increase the weights for the minority class. This increases its contribution to the loss and, as a result, the model pays more attention to these samples.

import numpy as np
import torch
from torch import nn
from sklearn.utils import class_weight
from transformers import Trainer

# Compute balanced class weights from the training labels
CLASS_WEIGHTS = class_weight.compute_class_weight('balanced', classes=np.unique(train["label"]), y=train["label"])
# "mps" = Apple Silicon GPU; change to "cuda" or "cpu" as appropriate
CLASS_WEIGHTS = torch.tensor(CLASS_WEIGHTS, dtype=torch.float, device="mps")

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        # forward pass
        outputs = model(**inputs)
        logits = outputs.get('logits')
        # compute custom loss with the class weights
        loss_fct = nn.CrossEntropyLoss(weight=CLASS_WEIGHTS, reduction='mean')
        loss = loss_fct(logits.view(-1, self.model.config.num_labels), labels.view(-1))
        return (loss, outputs) if return_outputs else loss

8. Results

Since we are dealing with an imbalanced dataset, we will use the F1-score to evaluate the models. The reason is that we want to pay more attention to the performance on the minority class.
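
As a minimal sketch, the per-class F1-scores could be computed with scikit-learn as follows, assuming the true and predicted labels are stored in the columns “label” and “predicted” as in the earlier snippets:

from sklearn.metrics import f1_score

# F1-score per class: index 0 = negative, index 1 = positive
f1_per_class = f1_score(data_test["label"], data_test["predicted"], average=None)
print(f"negative: {f1_per_class[0]:.2f}, positive: {f1_per_class[1]:.2f}")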

The table below shows that the model with the weighted loss outperformed the baseline model on the negative sentiment by 0.03 and on the positive sentiment by 0.01.

Turning to the results of Random Oversampling with Back Translation, we observe a drop in the F1-score for negative reviews of 0.09 compared to the weighted loss and of 0.06 compared to the baseline.

9. Conclusion and Recommendation

In this project, our goal was to use sentiment analysis with an LLM to address companies’ needs and to show methods for analyzing the huge number of online customer reviews of your product, brand, or service. Along the way, we have:

1. explored the advantages of pre-trained Large Language Models (LLMs) for Sentiment Analysis.

2. dived deeper into different Data Augmentation methods in NLP as well as Data Balancing techniques.

3. gained a profound understanding of how to fine-tune pre-trained models and tested two techniques for working with an imbalanced dataset: Random Oversampling with Back Translation and a Weighted Loss.

In conclusion, this Sentiment Analysis project has shown that, although Random Oversampling with Back Translation did not outperform the baseline, the Weighted Loss turned out to be an effective approach, delivering not only better performance but also lower computational costs. In particular, the weighted loss does not require creating additional data, so we could avoid training on an enlarged dataset, which saved us time.

Recommendation

As you have seen in this project, Sentiment Analysis is a useful tool that I recommend companies use in order to:

1) pinpoint customer sentiment towards a certain product, brand, or service.

2) detect how to raise customer satisfaction.

3) track social media for customer references to a product, brand, or service.

4) effectively improve marketing campaigns: by understanding how customers respond to their campaigns, marketers can make better decisions, increase ROI, and improve the customer experience.

For more information about this project check out my code on GitHub.
