Recurrent Neural Network for Twitter Sentiment Analysis
This project involves the implementation and application of a recurrent
neural network with embedding and advanced attentional mechanisms for
sentiment analysis, trained on and applied to tweets. Sentiment analysis, as the
term implies, involves analyzing the content of a given text to determine the
emotional undertone or sentiment most appropriate to it. This practice is helpful
for a variety of purposes, chief among them understanding and learning from
public perception and public opinion, whether to gauge the public stance towards
a particular topic, company, or service; to inform research and development; to
better manage and monitor social spaces (for instance, by uprooting hateful,
inciting, or inflammatory content); or even to better understand the general social
and political climate, which has a direct bearing on the stock and forex markets
and can thus inform decisions related thereto. Indeed, with the unrelenting
digitization of the public sphere, sentiment analysis, especially as applied to
large-scale social media platforms like X/Twitter, has become pivotal for
companies everywhere seeking to manage the public reception of their products
and services, improve them, and aptly meet customers' expectations, thus greatly
aiding their pursuit of growth and quality. To that end, I endeavoured to develop a
recurrent neural network with various attentional and optimization mechanisms
and train it on a large dataset of tweets to learn to distinguish between different
sentiments and identify the correct sentiment for each individual tweet.
The dataset presented here was taken from Kaggle, where it can be quickly
accessed. It comprises approximately 27,500 tweets, each labelled in advance
with the sentiment appropriate to it. The sentiment labels are simply "positive",
"negative", or "neutral". Each row in the data includes the tweet's text content
and a corresponding sentiment label. The goal, as mentioned, is to develop a
neural network that learns from this dataset to correctly identify the sentiment or
emotion that best fits a given post.
You can view each column and its description in the table below:
Variable        Description
textID          Unique identifier for each tweet
text            Raw text of the tweet
selected_text   Most relevant or informative part of the tweet for deciding sentiment
sentiment       Sentiment label corresponding to the tweet (neutral, positive, or negative)
Prior to model development, the dataset was quickly inspected and
cleaned before being thoroughly preprocessed and prepared for training.
Preparatory steps included removing hyperlinks, hashtags, stop words, emojis,
and emoticons, as well as lemmatization. The tweets were then tokenized and
padded so they could be fed appropriately to the model. A recurrent neural
network was then developed and trained for sentiment analysis. It was also
endowed with several attentional capabilities to facilitate the task. As such, this
network roughly consisted of the following: an embedding layer for word
embedding; a mask layer specialized in masking sentiment or emotional words,
adding more emphasis to them during training; a bidirectional Long Short-Term
Memory (LSTM) layer to learn context and semantic dependencies in the data; a
self-attention layer to place added importance on the most relevant parts of the
text; and finally a dense layer with 3 units for classification. Each layer fed
forward to the next before terminating at the classification layer, which identifies
the sentiment appropriate to a given tweet. Finally, the network was tested on a
separate testing set for a final evaluation, yielding considerably favorable results.
Overall, the project is comprised of 3 sections:
1) Reading and Inspecting the Data
2) Data Preparation and Preprocessing
3) Model Development and Evaluation
Install Required Modules
In [ ]: !pip install numpy pandas matplotlib seaborn nltk scikit-learn tensorflow
Importing Python Modules
In [ ]: #import necessary modules
import os
import re
import math
import random
import string
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from tensorflow.keras import layers, optimizers, regularizers
from tensorflow.keras.callbacks import EarlyStopping, Callback
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
import warnings
warnings.simplefilter('ignore')
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
#Adjust pandas data display settings
pd.set_option('display.max_colwidth', 100)
#Set plotting context and style
sns.set_context('notebook')
sns.set_style('white')
%matplotlib inline
Random Seed
In [ ]: #Set random seed for reproducible results
rs = 121
#set global random seed to libraries used
random.seed(rs)
np.random.seed(rs)
tf.random.set_seed(rs)
Defining Helper Functions for Data Analysis and Visualization
In [3]: #Defining a function to compute and report error scores
def error_scores(ytest, ypred, model_accuracy, classes):
    error_metrics = {
        'Accuracy': model_accuracy,
        'Precision': precision_score(ytest, ypred, average=None),
        'Recall': recall_score(ytest, ypred, average=None),
        'F1 score': f1_score(ytest, ypred, average=None),
    }
    return pd.DataFrame(error_metrics, index=classes).apply(lambda x: round(x, 2))

#Define function to plot the confusion matrix using a heatmap
def plot_cm(cm, labels):
    plt.figure(figsize=(10,7))
    hmap = sns.heatmap(cm, annot=True, fmt='g', cmap='Blues',
                       xticklabels=labels, yticklabels=labels)
    hmap.set_xlabel('Predicted Value', fontsize=13)
    hmap.set_ylabel('Truth Value', fontsize=13)
    plt.tight_layout()

#Define custom function to visualize model training history
def plot_training_history(run_histories: list, metrics: list = None, title='Training History'):
    #If no specific metrics are given, infer them from the first history object
    if not metrics:
        metrics = [key for key in run_histories[0].history.keys() if 'val_' not in key]
    else:
        metrics = [metric.lower() for metric in metrics]
    #Set up the number of rows and columns for the subplots
    n_metrics = len(metrics)
    n_cols = min(3, n_metrics)  #Limit to a max of 3 columns for better readability
    n_rows = math.ceil(n_metrics / n_cols)
    #Set up colors to use (training/validation pairs per run)
    colors = ['steelblue', 'red', 'skyblue', 'orange', 'indigo', 'green', 'DarkRed']
    #Ensure loss is plotted first
    if 'loss' in metrics:
        metrics.remove('loss')
        metrics.insert(0, 'loss')
    #Initialize the figure and axes
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(7.5 * n_cols, 5 * n_rows))
    axes = axes.flatten() if n_metrics > 1 else [axes]
    #Loop over each metric and create separate subplots
    for i, metric in enumerate(metrics):
        #Initialize starting epoch
        epoch_start = 0
        for j, history in enumerate(run_histories):
            epochs_range = range(epoch_start, epoch_start + len(history.epoch))
            #Plot training and validation metrics for each run history
            axes[i].plot(epochs_range, history.history[metric],
                         color=colors[(2 * j) % len(colors)], label=f'Training (run {j + 1})')
            if f'val_{metric}' in history.history:
                axes[i].plot(epochs_range, history.history.get(f'val_{metric}'),
                             color=colors[(2 * j + 1) % len(colors)], label=f'Validation (run {j + 1})')
            #Update the epoch start for the next run
            epoch_start += len(history.epoch)
        #Set the titles, labels, and legends
        axes[i].set(title=f'{metric.capitalize()} over Epochs', xlabel='Epoch', ylabel=metric.capitalize())
        axes[i].legend(loc='best')
    #Remove any extra subplots if the grid is larger than the number of metrics
    for k in range(i + 1, n_rows * n_cols):
        fig.delaxes(axes[k])
    fig.suptitle(title, fontsize=16, y=(0.95) if n_rows > 1 else 0.98)
    plt.show()

#Define custom function to decode tokens, returning them to raw text
def decode_tokens(indexed_tokens, idx2word_dict):
    return ' '.join([idx2word_dict[index] for index in indexed_tokens if index != 0])
Part One: Reading and Inspecting the Data
Loading and reading the dataset
In [4]: #Access and read data into dataframe
df = pd.read_csv('Tweets.csv')
#Report total count
print(f'Total number of tweets: {df.shape[0]:,}')
Total number of tweets: 27,481
Inspecting the data
Previewing the data
In [5]: #Show a random sample of 10 tweets
df.sample(10)
Out[5]: (random sample of 10 tweets; textID column abridged, long text truncated by pandas)

text                                                                      selected_text                                  sentiment
D= indeed                                                                 D= indeed                                      neutral
they are terrible little beast but if the garden is small you can...     e terrible                                     negative
I really wish i would hear from josh                                      I really wish i would hear from josh           neutral
You Got Twitter! Yayy ****                                                Yayy                                           positive
installed the iNav iBlue v2 Theme...gives a fresh feel http://twitpic... fresh                                          positive
_B if you`re in leeds you can have one from me                            if you`re in leeds you can have one from me    neutral
One more thing 'Shattered' is an amazing song by O.A.R.                   amazing                                        positive
hiya! did you get a picture of your converse?? GET YOUR CONVERSE OUT!     hiya! did you get a picture of your conver...  neutral
Happy Mama`s day to all mothers                                           Happy                                          positive
Will try to make it there at 6:30pm                                       Will try to make it there at 6:30pm            neutral
Checking number of entries and data type per column
In [6]: #Inspect columns, data types, number of non-null entries
df.info()
RangeIndex: 27481 entries, 0 to 27480
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   textID         27481 non-null  object
 1   text           27480 non-null  object
 2   selected_text  27480 non-null  object
 3   sentiment      27481 non-null  object
dtypes: object(4)
memory usage: 858.9+ KB
Descriptive Statistics
In [7]: #get statistical overview
df.describe().T
Out[7]:
               count   unique  top                                  freq
textID         27481   27481   cb774db0d1                           1
text           27480   27480   I`d have responded, if I were going  1
selected_text  27480   22463   good                                 199
sentiment      27481   3       neutral                              11118
Drop empty rows
In [8]: #Drop null entries
df = df.dropna(ignore_index=True)
#report number of empty rows after dropping
print('Number of empty rows:', df.isnull().sum().sum())
Number of empty rows: 0
Now having gleaned a general overview of the data, we can proceed to part two:
preparing and preprocessing the data to make it ready for model development and
training.
Part Two: Data Preparation and Preprocessing
In this section, I will start preparing the data and performing text preprocessing on the
tweets column in anticipation of modelling the data. First, I will start with the target
variable, sentiment, performing label encoding to give numeric labels to the sentiment
classes and make them viable for analysis. Then, I will turn to the text data (tweets),
performing text preprocessing to cut through the clutter and filter the text to its most
informative elements. This process will consist of the following:
1. Lowercasing and removing whitespaces
2. Removing hyperlinks
3. Removing mentions and hashtags
4. Removing punctuation
5. Removing stop words
6. Lemmatizing the text, particularly nouns, reducing each to its dictionary root.
7. Removing emojis, emoticons, and symbols.
8. Text tokenization, converting text sequences to numeric sequences, whereby words
are represented as unique numeric tokens.
9. Sequence padding, ensuring all sequences are of the same size.
This should help make the analysis concise and focused on what's most relevant in the
text and thus facilitate sentiment analysis and classification. Finally, having preprocessed
the text thoroughly, I will identify the predictor and target variables and perform data
splitting. I shall start with label encoding.
Label Encoding
For label encoding, I will assign 0 to neutral, 1 to positive sentiments, and 2 to negative
ones.
In [9]: #Perform label encoding on the target class
classes = {'neutral': 0, 'positive': 1, 'negative': 2}
#Replace string labels with numeric labels
df['sentiment'] = df['sentiment'].replace(classes)
Class Distribution
In [10]: #Examine Class Distribution
print('Class Distribution (in %):\n')
print(df['sentiment'].value_counts(normalize=True).apply(lambda x: f'{x*100:.2f}%'))

#Visualizing the class distribution using count plot
plt.figure(figsize=(10,7))
ax = sns.countplot(x=df['sentiment'], hue=df['sentiment'], order=[2,0,1], hue_order=[2,0,1])
ax.set_title('Class Distribution', fontsize=20, pad=14)
ax.set_xlabel('Sentiment', fontsize=14)
ax.set_xticklabels(['negative', 'neutral', 'positive'])
ax.legend(labels=['negative', 'neutral', 'positive'])
plt.show()
Class Distribution (in %):

sentiment
0    40.45%
1    31.23%
2    28.32%
Name: proportion, dtype: object
Text Preprocessing
Now I will proceed to deal with the text data in particular. To facilitate the process, I will
create a custom function that handles most of the text processing steps in one go and
apply it to the dataset. This function, preprocess_text, should normalize the text,
remove hyperlinks, mentions, hashtags, punctuation, and stop words, and lastly
lemmatize the text. I will also create a second function, remove_emojis , that will
identify and remove all emojis, emoticons and symbols from the data. This should output
a completely clean version of the text. Finally, I will perform text tokenization and
padding.
In [11]: #Download required NLTK resources (needed for the stop words list and lemmatizer)
import nltk
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)

#Instantiate nltk's lemmatizer and stop words' list
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

#Preprocessing function
def preprocess_text(text):
    text = text.lower().strip()  #lowercase and remove whitespaces
    text = re.sub(r"http\S+|www\S+|https\S+", '', text, flags=re.MULTILINE)  #remove hyperlinks
    text = re.sub(r'@\w+|#\w+', '', text)  #remove mentions and hashtags
    text = re.sub(f'[{re.escape(string.punctuation)}]', '', text)  #remove punctuation
    text = text.split()  #split string to list
    text = [word for word in text if word not in stop_words]  #remove stop words
    text = [lemmatizer.lemmatize(word) for word in text]  #lemmatize text (nouns by default)
    return ' '.join(text)

#Define a function to remove emojis and emoticons from text
def remove_emojis(text):
    #Regex pattern to match emojis and emoticons
    emoji_pattern = re.compile(
        "["
        u"\U0001F600-\U0001F64F"  #emoticons
        u"\U0001F300-\U0001F5FF"  #symbols & pictographs
        u"\U0001F680-\U0001F6FF"  #transport & map symbols
        u"\U0001F700-\U0001F77F"  #alchemical symbols
        u"\U0001F1E0-\U0001F1FF"  #flags (iOS)
        "]+", flags=re.UNICODE)
    return emoji_pattern.sub(r'', text)  #remove emojis

#Preprocess all tweets and save into new column
df['text_preprocessed'] = df['text'].apply(preprocess_text).replace('', np.nan)
#Drop empty rows
df = df.dropna(ignore_index=True)
#Apply second function to remove all symbols and emojis
df['text_preprocessed'] = df['text_preprocessed'].apply(remove_emojis)
#preview a sample after preprocessing
print('Sample of preprocessed text:\n')
df['text_preprocessed'].sample(5)
Sample of preprocessed text:

Out[11]:
21759    want new moon ahh im going crazy
5950     wow god whole ui sooo much snappier responsive tweetdeck tweet feel like ims lol
6753     public bathroom
16155    lunch w jason deli stepped dog poo
21194    say impossible plurk work system administrator closed access firew...
Name: text_preprocessed, dtype: object
Text tokenization
In [12]: #Instantiate tokenizer
tokenizer = Tokenizer()
#tokenize text corpus
tokenizer.fit_on_texts(df['text_preprocessed'])
#convert the text into sequences of word indices
df['text_preprocessed'] = tokenizer.texts_to_sequences(df['text_preprocessed'])
#get token indices and report vocabulary size
word2idx = tokenizer.word_index
idx2word = {idx: word for word, idx in word2idx.items()}
vocab_size = len(word2idx) + 1
print('vocabulary size:', vocab_size)
vocabulary size: 25541
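As a quick sanity check, the fitted tokenizer can be round-tripped on a short made-up phrase (the indices in the comment below are illustrative only; the actual values depend on the fitted vocabulary):

In [ ]: #Illustrative round-trip through the fitted tokenizer (example phrase is mine)
sample_seq = tokenizer.texts_to_sequences(['good morning everyone'])
print(sample_seq)  #e.g., [[12, 305, 210]] - exact indices vary with the corpus
print([idx2word[idx] for idx in sample_seq[0]])  #['good', 'morning', 'everyone']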
Sequence padding
In [13]: #Get maximum sequence length
max_seq_len = max([len(seq) for seq in df['text_preprocessed']])
#apply padding
df['text_preprocessed'] = list(pad_sequences(df['text_preprocessed'], padding='post', maxlen=max_seq_len))
#Preview data sample
df.sample(5)
Out[13]: (random sample of 5 rows; the new text_preprocessed column now holds fixed-length, post-padded token sequences such as [34, 198, 6438, 84, 0, 0, ..., 0], shown alongside the original textID, text, selected_text, and sentiment columns)

Data Selection
In [14]: #Data Selection
#Identify predictor and target variables
X_data = df['text_preprocessed']
y_data = df['sentiment'].values
Stratified Data Splitting
In [16]: #Obtain training, validation, and testing sets (70% training / 20% validation / 10% testing)
#first split 70/30
X_train, X_temp, y_train, y_temp = train_test_split(X_data, y_data, train_size=0.7, stratify=y_data, random_state=rs)
#second split 67/33
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.33, stratify=y_temp, random_state=rs)
#Convert data to tensors
X_train = tf.convert_to_tensor(X_train.tolist(), dtype=tf.int32)
X_val = tf.convert_to_tensor(X_val.tolist(), dtype=tf.int32)
X_test = tf.convert_to_tensor(X_test.tolist(), dtype=tf.int32)
#Check the sizes of the training, validation and testing sets
print(f'Number of training samples: {X_train.shape[0]:,}')
print(f'Number of validation samples: {X_val.shape[0]:,}')
print(f'Number of testing samples: {X_test.shape[0]:,}')
Number of training samples: 19,187
Number of validation samples: 5,509
Number of testing samples: 2,714
Now data preprocessing is complete. I will now proceed to model development and
training...
Part Three: Model Development and Evaluation
In this section, I will develop, train, and evaluate a recurrent neural network for the
present task of sentiment analysis. First, I will attempt to establish a baseline for
performance by training a simple classification model, particularly a logistic regression
model. Then, I will proceed to building the network and compare the results. This network
should roughly consist of an embedding layer, a sentiment mask layer for attention to
sentimental or sentiment-adjacent terms, a bidirectional LSTM layer, a trainable
self-attention layer, and finally a classification layer, as well as optimization and dropout
layers in between (see full architecture below). What makes this network particularly
special is the attentional capabilities built into it, making it especially suited to the
task of sentiment analysis. With that being said, I shall proceed with establishing a
performance baseline for the network to come.
Establishing a Performance Baseline
In [17]: #Instantiate a logistic regression object
LR = LogisticRegression(max_iter=500, random_state=rs)
#fit the model
model = LR.fit(X_train, y_train)
#generate predictions
y_pred = model.predict(X_test)
#report error scores
print('Logistic Regression classification results:')
error_scores(y_test, y_pred, accuracy_score(y_test, y_pred), classes=classes)
Logistic Regression classification results:
Out[17]: (error-score table for the baseline: Accuracy, Precision, Recall, and F1 score per class; overall accuracy of roughly 0.40, with recall and F1 scores below 0.1 for the positive and negative classes)
As seen from the results of the baseline logistic regression model, not much was gleaned
from the data, with an overall accuracy score of 0.40. This is strikingly so for the positive
and negative sentiments, with modest precision scores and recall and F1 scores below
0.1 for both! This should be expected, for such simple machine learning models are not
particularly made for dealing with text data or natural language processing (NLP), unlike
neural networks with specialized layers. Nonetheless, I will use this as a starting point, a
baseline against which I will compare the performance of the neural network to be built,
which will give us a rough idea of how it fares relative to traditional machine learning
models like this one.
Model Design and Architecture
Now we turn to model development. In this part, I will elaborate on the design of the
network and the preparatory steps required for building it. The chief purpose of this
network is to learn appropriate embeddings by modeling a large collection of tweets
and their corresponding sentiment labels, so that it becomes able to perform sentiment
analysis and thus capable of identifying the sentiments most appropriate to any given
tweet, especially in the case of positive and negative sentiments. As such, overall it will
perform word embeddings, focusing more heavily on sentiment embeddings; attend to
the local patterns and word combinations most predictive; learn context and semantic
dependencies between words in a sentence; learn to attend to the most
relevant/predictive details; and finally classify tweets based on what has been learned,
labeling each tweet as either positive, negative, or neutral. To that end, this network will
primarily consist of the following layers:
(1) Embedding layer: a word embedding layer for transforming word tokens into
(300-dimensional) dense vectors that represent semantic relationships in the input
word sequences fed to it. This layer will utilize GloVe's pre-trained embedding weights
for a start, but will remain trainable.

(2) Gaussian Noise layer: this noise layer, enabled only during training, will introduce
slight perturbations or noise into the learned word representations to help the model
generalize better and reduce overfitting on the training data, forcing the network to
rely on more stable and robust patterns.

(3) Sentiment Mask layer: this masking layer will take in the embeddings from the
earlier layer and the input word tokens and apply additional emphasis to sentiment
terms, increasing their respective weights by a certain factor (written out formally
after this list). To perform masking appropriately, I used a dictionary of sentimental
or sentiment-laden terms (collected from EmoLex), and tokenized it using the same
tokenizer used for the current text corpus to allow direct comparisons based on this
particular dataset.

(4) Self-Attention layer: a scaled dot-product attention layer with trainable weights
(see the formula after this list) to help selectively focus on the most relevant words
relative to their sequences overall, even if far apart in time, assigning added weight
to the terms deemed relevant or important for predicting sentiment. Such
self-attention layers are particularly good for short and noisy data like tweets, with
their slang and abrupt transitions. Further, placed after the sentiment mask layer and
before the convolutional layers, this layer would, on the one hand, quickly spot crucial
sentiment-laden tokens emphasized by the mask layer before it and add further
emphasis to them before structural modeling dilutes them later on, while, on the
other hand, feeding cleaner, better-curated inputs to the later convolutional layers
for improved feature extraction.

(5) Convolutional 1D and Max Pooling layers: two parallel convolutional 1D layers
with kernel sizes of 3 and 4, followed by max pooling layers, for n-gram processing to
detect and extract local n-gram patterns from the sequences (similar to how 2D
convolutions in convolutional neural networks scan for local spatial features in
images), which would help capture the word combinations most predictive or important
in a given sequence (instead of relying on singular terms). The first convolutional layer
will use a kernel size of 3 for extracting 3-gram features (e.g., "very good movie"),
while the second will use a kernel size of 4 for extracting 4-gram features (e.g., "not
my best day"), which should accordingly help the network better perform sentiment
analysis using the local patterns detected. The outputs from these layers will then be
combined using a Concatenate layer and reshaped for subsequent processing by the
LSTM layer.

(6) Bidirectional Long Short-Term Memory (LSTM) layer: a specialized variant of a
recurrent layer designed to capture temporal or semantic dependencies within the text
sequences (sentences) and establish long-term context. This layer will also be
bidirectional, meaning that it will encode past and future contexts, scanning the
sequences forwards and backwards, which should yield a richer context for
understanding the text before producing sentiment classifications.

(7) Global Average Pooling 1D layer: this layer compresses the sequence output
from the bidirectional LSTM layer into a fixed-size vector by averaging across all time
steps. This vector would serve as a distilled summary of the most relevant features
across the input.

(8) Dense layer for classification: a final dense layer with 3 units and softmax
activation for multi-class classification, producing probability scores for each
sentiment category (neutral, positive, negative).

In addition to these layers, the network will be peppered with dropout layers
and regularization techniques throughout to further prevent overfitting and
promote generalizability.
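To make the two attentional operations above concrete (the notation here is mine, but it mirrors the layer implementations further below): given token embeddings $e_i$, a sentiment lexicon $S$, and a weighing factor $\lambda$, the sentiment mask layer computes

$$\tilde{e}_i = e_i \cdot \left(1 + \lambda \cdot \mathbf{1}[w_i \in S]\right),$$

and the self-attention layer applies standard scaled dot-product attention to the masked embedding matrix $X$:

$$Q = XW_Q,\quad K = XW_K,\quad V = XW_V,\qquad \mathrm{Attention}(Q,K,V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,$$

where $d_k$ is the key dimension and $W_Q$, $W_K$, $W_V$ are trainable weight matrices.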
Pre-training Preparations
Now, before training, a few preparations are imperative. As briefly mentioned, in order to
give the embedding layer a head start and promote better learning, I will use Stanford's
GloVe (Global Vectors for Word Representation) pretrained embeddings instead of
learning word semantic representations from scratch. GloVe's embeddings have been
trained on a very large text corpus of approximately 840 billion tokens and thus
already capture a lot of general language understanding and semantic relationships
between words. This will take off a lot of the heavy lifting of learning word meanings
from scratch. Nonetheless, given that with tweets we are not dealing with
straightforward, traditional English but with a lot of slang, I will make this layer trainable
all the same.

Another prerequisite for training, particularly for the sentiment mask layer, is to prepare a
lexicon of emotional and sentiment-bearing words that the layer can use for sentiment
masking. For this task, I will utilize the National Research Council Canada (NRC)'s
Word-Emotion Association Lexicon (or EmoLex). The NRC Emotion Lexicon consists of a list
of English words and their associations with the 8 basic emotion categories and two
overarching sentiment categories (positive and negative). I will extract all the terms
from the lexicon, cross-check them against the current text corpus, and tokenize the
terms found in that corpus with the same tokenizer used for it, thus ensuring
consistency between the emotional lexicon and the dataset's lexicon. I will now begin
with building and tokenizing the sentiment lexicon to be used by the sentiment mask
layer, and then proceed to prepare the weights matrix for the embedding layer.
Preparing a Sentiment Lexicon for Sentiment Masking (Using EmoLex)
In [18]: #Load NRC Emotion Lexicon into a set of emotional words
emotional_words_set = set()
with open("NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt", "r") as f:
    for line in f:
        word, emotion, association = line.strip().split("\t")
        if int(association) == 1:
            emotional_words_set.add(word)
#sort and obtain final list
emotional_words_lst = list(sorted(emotional_words_set))
#Convert emotional words to their indices (if found in the corpus vocabulary)
sentiments_vocab_indices = [word2idx[word] for word in emotional_words_lst if word in word2idx]
#Preview a sample
print(np.random.choice(emotional_words_lst, 10))
['peculiarity' 'coursing' 'suggest' 'cautiously' 'overload' 'larceny'
'caution' 'armament' 'philanthropist' 'prevention']
Preparing Embeddings Weight Matrix using GloVe
In [19]: #Define embeddings dimensions
embedding_dims = 300
#Create embeddings matrix using GloVe
#build embeddings index from the GloVe text file
embeddings_index = {}
with open('glove.840B.300d.txt', encoding='utf8') as f:
for line in f:
values = line.split()
word = values[0]
vector_values = values[1:]
if len(vector_values) > embedding_dims:
vector_values = vector_values[-embedding_dims:]
coefs = np.asarray(vector_values, dtype='float32')
embeddings_index[word] = coefs
#Create embedding matrix
embedding_matrix = np.zeros((vocab_size, embedding_dims))
for word, idx in word2idx.items():
embedding_vector = embeddings_index.get(word)
if embedding_vector is not None:
embedding_matrix[idx] = embedding_vector
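Before moving on, a quick optional check (not part of the original pipeline, just a sanity check) can gauge how much of the tweet vocabulary GloVe actually covers; words missing from GloVe keep their zero rows in the matrix and are learned from scratch:

In [ ]: #Optional sanity check: fraction of the corpus vocabulary covered by GloVe
covered = sum(1 for word in word2idx if word in embeddings_index)
print(f'GloVe coverage: {covered:,}/{len(word2idx):,} words ({covered / len(word2idx):.1%})')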
Model Development
Now with these preparatory steps complete, I will now start to build the model. Given
some of the model layers are custom layers, not readily available with Keras, I will start
with building the custom layers necessary for the task. First, I will build a custom
sentiment mask layer by subclassing from Keras' Layer class, then I will build a trainable
self-attention layer also by inheriting from Keras' Layer class. Thereafter, I will build the
full network as a Keras model class using all the layers discussed.
Now, it's worth mentioning that a lot of tuning and optimization was performed in
advance in order to determine the architecture of the present network, which informed a
lot of the choices made regarding its layers and number of units, etc.
Custom Sentiment Mask Layer
In [20]: #Define sentiment mask layer
class SentimentMaskLayer(layers.Layer):
    def __init__(self, sentiments_vocab_indices, sentiment_weighing_factor=1.5, **kwargs):
        super(SentimentMaskLayer, self).__init__(**kwargs)
        #Initialize parameters
        self.sentiment_vocab_tensor = tf.constant([idx for idx in sentiments_vocab_indices], dtype=tf.int32)
        self.sentiment_weighing_factor = sentiment_weighing_factor
    def call(self, inputs):
        #Inputs
        embedding_outputs, text_tokens = inputs
        #Compare all tokens with sentiment words
        sentiment_matches = tf.reduce_any(tf.equal(tf.expand_dims(text_tokens, -1), self.sentiment_vocab_tensor), axis=-1)
        #Apply the sentiment weighting factor where matches are found
        sentiment_mask = tf.cast(sentiment_matches, tf.float32) * self.sentiment_weighing_factor
        sentiment_mask = tf.expand_dims(sentiment_mask, -1)  #Shape: (batch, seq_len, 1)
        #Assign weight importances: Multiply by (1 + sentiment_mask)
        return tf.cast(embedding_outputs * (1.0 + sentiment_mask), dtype=tf.float32)
Custom Self-Attention Layer
In [21]: #Define self-attention layer
class SelfAttentionLayer(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
    def build(self, input_shape):
        #Initialize weight matrices for query, key, and value
        dims = input_shape[-1]  #input dimensions
        self.WQ = self.add_weight(shape=(dims, dims), initializer='glorot_uniform', trainable=True, name='WQ')
        self.WK = self.add_weight(shape=(dims, dims), initializer='glorot_uniform', trainable=True, name='WK')
        self.WV = self.add_weight(shape=(dims, dims), initializer='glorot_uniform', trainable=True, name='WV')
        super().build(input_shape)
    def call(self, inputs):
        #Compute query, key, and value matrices
        Q = tf.matmul(inputs, self.WQ)
        K = tf.matmul(inputs, self.WK)
        V = tf.matmul(inputs, self.WV)
        #Compute key matrix dimensions for scaling attention scores
        d_k = tf.cast(tf.shape(K)[-1], tf.float32)
        #Compute attention scores
        attention_scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)
        #Compute attention weights
        attention_weights = tf.nn.softmax(attention_scores, axis=-1)
        #Multiply attention weights with value matrix to get attention output
        attention_output = tf.matmul(attention_weights, V)  #shape: (batch, seq_len, dims)
        return attention_output
Recurrent Neural Network Model
In [ ]: #Create Keras model subclass to build a RNN model
class RNN_Network(tf.keras.Model):
    def __init__(self, output_dims, embedding_input=50000, embedding_dims=300, LSTM_units=128,
                 Conv1D_filters=128, sentiments_vocab_indices=None, sentiment_weighing_factor=1.5, **kwargs):
        super().__init__(name='RNN_Network', **kwargs)
        #Define model layers
        #Embedding layer and dropout
        self.Embedding_layer = layers.Embedding(input_dim=embedding_input, output_dim=embedding_dims,
                                                weights=[embedding_matrix], mask_zero=True,
                                                trainable=True, name='Embedding_layer')
        self.Dropout1 = layers.Dropout(0.5, name='Dropout_layer1')
        #Gaussian noise layer
        self.Noise_layer = layers.GaussianNoise(0.08, name='Gaussian_Noise_layer')
        #Sentiment mask and dropout
        self.SentimentMaskLayer = SentimentMaskLayer(sentiments_vocab_indices, sentiment_weighing_factor,
                                                     name='Sentiment_Mask_layer')
        self.Dropout2 = layers.Dropout(0.5, name='Dropout_layer2')
        #Self-Attention layer and dropout
        self.Attention_layer = SelfAttentionLayer(name='Self-Attention_layer')
        self.Dropout3 = layers.Dropout(0.5, name='Dropout_layer3')
        #Convolutional and pooling layers (for varying n-grams processing)
        self.Conv1D_layer1 = layers.Conv1D(filters=Conv1D_filters, kernel_size=3, activation='relu', name='Conv1D_layer1')
        self.MaxPool_layer1 = layers.MaxPooling1D(name='MaxPool_layer1')
        self.Conv1D_layer2 = layers.Conv1D(filters=Conv1D_filters, kernel_size=4, activation='relu', name='Conv1D_layer2')
        self.MaxPool_layer2 = layers.MaxPooling1D(name='MaxPool_layer2')
        #Concatenation and Reshaping layers
        self.Concatenate_layer = layers.Concatenate(axis=-1, name='Concatenate_layer')
        self.Reshape_layer = layers.Reshape((1, -1), name='Reshape_layer')
        #Bidirectional LSTM layer and spatial dropout
        self.Bidirectional_LSTM_layer = layers.Bidirectional(
            layers.LSTM(units=LSTM_units, activation='tanh', return_sequences=True,
                        kernel_regularizer=regularizers.l2(0.001), name='LSTM_layer'),
            name='Bidirectional_LSTM_layer')
        self.SpatialDropout = layers.SpatialDropout1D(0.5, name='SpatialDropout_layer')
        #Global Average Pooling layer
        self.GlobalAvgPool_layer = layers.GlobalAveragePooling1D(name='GlobalAvgPool_layer')
        #Final classification layer
        self.Classification_layer = layers.Dense(output_dims, activation='softmax', name='Classification_layer')
    def call(self, inputs, training=None):
        text_tokens = inputs
        #Text Embedding
        text_embeddings = self.Embedding_layer(inputs)
        text_embeddings = self.Dropout1(text_embeddings)
        #Apply Gaussian Noise (training only)
        if training:
            text_embeddings = self.Noise_layer(text_embeddings, training=training)
        #Sentiment Masking
        embeddings_masked = self.SentimentMaskLayer([text_embeddings, text_tokens])
        embeddings_masked = self.Dropout2(embeddings_masked)
        #Self-attention
        attention_output = self.Attention_layer(embeddings_masked)
        attention_output = self.Dropout3(attention_output)
        #Convolutional layers for n-grams
        threegrams_features = self.Conv1D_layer1(attention_output)
        threegrams_features = self.MaxPool_layer1(threegrams_features)
        fourgrams_features = self.Conv1D_layer2(attention_output)
        fourgrams_features = self.MaxPool_layer2(fourgrams_features)
        #Merge and reshape features
        merged_features = self.Concatenate_layer([threegrams_features, fourgrams_features])
        features_reshaped = self.Reshape_layer(merged_features)
        #Bidirectional LSTM
        bi_LSTM_output = self.Bidirectional_LSTM_layer(features_reshaped)
        bi_LSTM_output = self.SpatialDropout(bi_LSTM_output)
        #Global average pooling
        global_avg_output = self.GlobalAvgPool_layer(bi_LSTM_output)
        #Final classification
        final_outputs = self.Classification_layer(global_avg_output)
        return final_outputs
Instantiating the RNN Model
In [23]: #Build RNN network using model subclass
RNN_model = RNN_Network(embedding_input=vocab_size,
LSTM_units=128,
Conv1D_filters=128,
output_dims=len(np.unique(y_data)),
sentiments_vocab_indices=sentiments_vocab_indices,
sentiment_weighing_factor=2.0)
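Since subclassed Keras models build their weights lazily on the first call, a quick forward pass on a couple of samples (a small sketch, not part of the original flow) will build the network so its layers and parameter counts can be inspected:

In [ ]: #Build the model with a dummy forward pass, then inspect the architecture
_ = RNN_model(X_train[:2])
RNN_model.summary()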
Now the model architecture is complete. I will proceed to implementing the necessary
training configurations and then train the model.
Training configurations
To optimize the model during training, I will use the Adam (Adaptive Moment Estimation)
optimizer and set a low learning rate (lr=0.0002) for better training stability (and
epsilon=1e-6 for numeric stability). Given that the current task is multi-class
classification, I will use sparse categorical cross-entropy as the loss function to train the
model with. Further, I will implement and use a custom learning rate scheduler to monitor
the training process and reduce the learning rate when necessary. This custom
scheduler will basically be the same as Keras' ReduceLROnPlateau class, except I will
extend it with additional parameters for better control over its behavior. Finally, I will use
early stopping to stop training when the model is no longer learning new information or
has reached convergence.
Model Compilation
In [24]: #Compile the model
RNN_model.compile(optimizer=optimizers.Adam(learning_rate=0.0002, epsilon=1e-6),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
Learning Rate Schedule and Early Stopping
In [25]: #Build a custom Adaptive Learning Rate for the optimizer
class AdaptiveLearningRate(Callback):
    '''
    Custom learning rate scheduler that implements an adaptive learning rate for
    the optimizer during model training.
    '''
    def __init__(self, metric='val_loss', higher_is_better=False, patience=5, decrease_factor=0.5,
                 min_lr=1e-6, min_delta=0.0, start_from_epoch=0, use_absolute_best=False, verbose=1):
        super(AdaptiveLearningRate, self).__init__()
        self.metric = metric
        self.higher_is_better = higher_is_better
        self.patience = patience
        self.decrease_factor = decrease_factor
        self.min_lr = min_lr
        self.min_delta = min_delta
        self.last_best_score = -np.inf if higher_is_better else np.inf
        self.use_absolute_best = use_absolute_best
        self.start_from_epoch = max(start_from_epoch - 1, 0)  #to keep up with zero-based epoch indexing
        self.verbose = verbose
        self.wait = 0
    def on_epoch_end(self, epoch, logs=None):
        #Get current metric value (loss or any other metric)
        current_score = logs.get(self.metric)
        if current_score is None:
            return
        if epoch >= self.start_from_epoch:
            #Check for improvement
            improvement = (
                ((current_score - self.last_best_score) > self.min_delta)
                if self.higher_is_better
                else ((self.last_best_score - current_score) > self.min_delta)
            )
            if improvement:
                self.last_best_score = current_score
                self.wait = 0  #Reset wait since improvement happened
            else:
                self.wait += 1
                #Check if patience is exceeded, reduce the learning rate
                if self.wait >= self.patience:
                    #Reduce learning rate by a decrease factor and ensure it doesn't fall below the minimum
                    current_lr = float(tf.keras.backend.get_value(self.model.optimizer.learning_rate))
                    new_lr = max(current_lr * self.decrease_factor, self.min_lr)
                    self.model.optimizer.learning_rate.assign(new_lr)  #Set new learning rate
                    if self.verbose > 0:
                        print(f"\nEpoch {epoch + 1}: Learning rate reduced to {new_lr:.2e}.")
                    if not self.use_absolute_best:
                        #uses last score instead of the absolute best as the new reference
                        self.last_best_score = current_score
                    self.wait = 0  #Reset wait after learning rate adjustment

#Adaptive learning rate scheduler
lr_scheduler = AdaptiveLearningRate(metric='val_loss', patience=2, decrease_factor=0.5, min_lr=1e-6)
#Define early stopping criterion
early_stop = EarlyStopping(monitor='val_loss', min_delta=0.01, patience=5, start_from_epoch=5, restore_best_weights=True)
Model Training and Evaluation
Proceeding finally to model training, I will train the model for 30 epochs using the
training and validation sets, and set a smaller batch size of 16 to promote better
generalizability. Tweets generally tend to be short and highly diverse with very limited
context, so decreasing the batch size should help the model capture more varied
micro-contexts per epoch and adapt its parameters more flexibly, which should lead to
better generalization.
Model Training
In [26]: #Fit the model (30 training epochs)
RNN_run_history = RNN_model.fit(X_train, y_train,
epochs=30,
batch_size=16,
validation_data=(X_val, y_val),
callbacks=[lr_scheduler, early_stop])
#Visualize run history
plot_training_history([RNN_run_history], metrics=['loss', 'accuracy'])
Epoch 1/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 70ms/step - accuracy: 0.5086 - loss: 75.1875 - val_accuracy: 0.6611 - val_loss: 40.7153
Epoch 2/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.6282 - loss: 34.4441 - val_accuracy: 0.6820 - val_loss: 20.2937
Epoch 3/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 89s 75ms/step - accuracy: 0.6580 - loss: 17.4985 - val_accuracy: 0.6876 - val_loss: 11.0231
Epoch 4/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 85s 70ms/step - accuracy: 0.6723 - loss: 9.7114 - val_accuracy: 0.6978 - val_loss: 6.5765
Epoch 5/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 86s 71ms/step - accuracy: 0.6833 - loss: 5.9265 - val_accuracy: 0.7043 - val_loss: 4.3114
Epoch 6/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 86s 71ms/step - accuracy: 0.6988 - loss: 3.9622 - val_accuracy: 0.7028 - val_loss: 3.0798
Epoch 7/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.7008 - loss: 2.8679 - val_accuracy: 0.7061 - val_loss: 2.3397
Epoch 8/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.7078 - loss: 2.2121 - val_accuracy: 0.7137 - val_loss: 1.8879
Epoch 9/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.7127 - loss: 1.7970 - val_accuracy: 0.7163 - val_loss: 1.5883
Epoch 10/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 83s 70ms/step - accuracy: 0.7188 - loss: 1.5306 - val_accuracy: 0.7194 - val_loss: 1.3994
Epoch 11/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 85s 70ms/step - accuracy: 0.7226 - loss: 1.3551 - val_accuracy: 0.7232 - val_loss: 1.2601
Epoch 12/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 85s 71ms/step - accuracy: 0.7286 - loss: 1.2246 - val_accuracy: 0.7235 - val_loss: 1.1831
Epoch 13/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 85s 71ms/step - accuracy: 0.7304 - loss: 1.1374 - val_accuracy: 0.7195 - val_loss: 1.1237
Epoch 14/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 83s 69ms/step - accuracy: 0.7395 - loss: 1.0640 - val_accuracy: 0.7212 - val_loss: 1.0687
Epoch 15/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 86s 72ms/step - accuracy: 0.7428 - loss: 1.0164 - val_accuracy: 0.7245 - val_loss: 1.0350
Epoch 16/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.7459 - loss: 0.9778 - val_accuracy: 0.7237 - val_loss: 1.0189
Epoch 17/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 85s 71ms/step - accuracy: 0.7509 - loss: 0.9390 - val_accuracy: 0.7195 - val_loss: 1.0019
Epoch 18/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.7551 - loss: 0.9127 - val_accuracy: 0.7197 - val_loss: 0.9896
Epoch 19/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 83s 69ms/step - accuracy: 0.7606 - loss: 0.8841 - val_accuracy: 0.7201 - val_loss: 0.9694
Epoch 20/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 84s 70ms/step - accuracy: 0.7623 - loss: 0.8572 - val_accuracy: 0.7201 - val_loss: 0.9635
Epoch 21/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 90s 75ms/step - accuracy: 0.7674 - loss: 0.8393 - val_accuracy: 0.7228 - val_loss: 0.9474
Epoch 22/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 86s 72ms/step - accuracy: 0.7641 - loss: 0.8267 - val_accuracy: 0.7170 - val_loss: 0.9465
Epoch 23/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 72ms/step - accuracy: 0.7639 - loss: 0.8139 - val_accuracy: 0.7137 - val_loss: 0.9299
Epoch 24/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 89s 74ms/step - accuracy: 0.7733 - loss: 0.7936 - val_accuracy: 0.7201 - val_loss: 0.9362
Epoch 25/30
Epoch 25: Learning rate reduced to …
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 88s 74ms/step - accuracy: 0.7712 - loss: 0.7889 - val_accuracy: 0.7127 - val_loss: 0.9228
Epoch 26/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 72ms/step - accuracy: 0.7790 - loss: 0.7637 - val_accuracy: 0.7174 - val_loss: 0.9240
Epoch 27/30
Epoch 27: Learning rate reduced to …
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 73ms/step - accuracy: 0.7819 - loss: 0.7502 - val_accuracy: 0.7163 - val_loss: 0.9203
Epoch 28/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 72ms/step - accuracy: 0.7826 - loss: 0.7354 - val_accuracy: 0.7157 - val_loss: 0.9075
Epoch 29/30
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 72ms/step - accuracy: 0.7810 - loss: 0.7304 - val_accuracy: 0.7139 - val_loss: 0.9084
Epoch 30/30
Epoch 30: Learning rate reduced to …
1200/1200 ━━━━━━━━━━━━━━━━━━━━ 87s 72ms/step - accuracy: 0.7851 - loss: 0.7216 - val_accuracy: 0.7141 - val_loss: 0.9034
Model Evaluation
In [27]: #Evaluate the model on the testing set
loss, accuracy = RNN_model.evaluate(X_test, y_test, verbose=0)
#Get class predictions
y_pred = RNN_model.predict(X_test, verbose=0).argmax(axis=-1)
#Report results
print('Model Evaluation Results:\n')
error_scores(y_test, y_pred, accuracy, classes=classes)
Model Evaluation Results:

Out[27]: (error-score table: Accuracy, Precision, Recall, and F1 score per class; overall accuracy of 0.72, with recall and F1 scores around 0.7-0.75 for the positive and negative classes)
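The plot_cm helper defined earlier can also be used here to visualize where the remaining misclassifications occur (a short sketch, assuming confusion_matrix was imported alongside the other sklearn metrics):

In [ ]: #Visualize the confusion matrix for the test-set predictions
cm = confusion_matrix(y_test, y_pred)
plot_cm(cm, labels=list(classes.keys()))
plt.show()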
As illustrated, the model's training and performance on testing were satisfactory. Indeed,
accuracy improved significantly from the baseline, rising from 0.40 to an overall
accuracy score of 0.72. Further, and more importantly, the model successfully learned to
differentiate positive and negative sentiments from each other and from neutral ones.
We can see this most strikingly in the recall and F1 scores: whereas they hardly went
above 0.1 at baseline for positive and negative sentiments, with the current network
they shot up to ~0.7-0.75. This shows that the network learned appropriate embeddings,
carving out a satisfactory representation space for the words and their semantic
relationships and dependencies. This makes perfect sense, of course, as these types of
networks are specialized for such a task, using a dedicated embedding layer for word
representation, sentiment emphasis and self-attention layers for identifying emotional
and otherwise relevant terms, convolutional layers for local pattern and feature
extraction, and an LSTM layer for context awareness and learning temporal/semantic
dependencies. None of these capabilities are available to the simple logistic regression
model, hence the stark difference in performance. Now we can use the model to
generate sentiment predictions on a random sample of data from the testing set.
Generating Sentiment Predictions from a Random Sample
In [33]: #Extracting a random sample from the dataset
random_indices = tf.constant(np.random.choice(len(X_test), size=10, replace=False))
X_sample = tf.gather(X_test, random_indices)
#Generate sentiment predictions using the model
predicted_sentiment = RNN_model.predict(X_sample).argmax(axis=-1)
#Decode tokens of selected sample
X_sample_raw = [decode_tokens(row, idx2word) for row in X_sample.numpy()]
#Create a dataframe
X_sample_df = pd.DataFrame({'Tweet (preprocessed)': X_sample_raw,
                            'Predicted Sentiment': predicted_sentiment})
#Map numeric labels back to sentiment names
X_sample_df['Predicted Sentiment'] = X_sample_df['Predicted Sentiment'].map({v: k for k, v in classes.items()})
#Display results
X_sample_df
1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 35ms/step
Out[33]:

   Tweet (preprocessed)                                                        Predicted Sentiment
0  going bed good night everyone love say good morning sweet dream            positive
1  haha best thing office birthday hey                                         positive
2  sanjaya forever hahahahahaha                                                neutral
3  kel17 ehh carnt stand hot weather                                           negative
4  quick answer put bleach wash start middle dont laugh                        neutral
5  lol girl video disturbed love hk theme cute                                 positive
6  nobody home tonight except alone sigh oh wish fun though                    negative
7  fun though                                                                  positive
8  repeat final second game 3 dalden please official anyway happy mother day  positive
9  hello hows life side screen                                                 neutral

Summary
In summary, this project set out to perform sentiment analysis on Twitter data using
deep learning and NLP techniques. An advanced recurrent neural network was
developed for the task, incorporating sophisticated sentiment detection and attentional
capabilities, and used to classify tweets into positive, neutral, or negative. As
demonstrated by the training history and evaluation metrics, the model achieved a
relatively strong performance, with good accuracy and generalization to the test set. The
network was also tested on individual tweets, providing real-time sentiment
classification capability, and was able to make generally reasonable and consistent
sentiment predictions across a variety of tweets, indicating that it effectively learned
from the training data. Thus, the project successfully leveraged deep learning to develop
a functional sentiment analysis model capable of classifying tweet sentiments with
considerable accuracy.
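As a closing illustration, the full pipeline can be applied end-to-end to a brand-new raw tweet in just a few lines (a minimal sketch reusing the objects defined above; the example tweet is made up):

In [ ]: #Minimal sketch: classify a new, raw tweet end-to-end
idx2label = {v: k for k, v in classes.items()}
new_tweet = 'I absolutely love this new update, great job!'
cleaned = remove_emojis(preprocess_text(new_tweet))
seq = pad_sequences(tokenizer.texts_to_sequences([cleaned]), padding='post', maxlen=max_seq_len)
pred = RNN_model.predict(tf.convert_to_tensor(seq, dtype=tf.int32), verbose=0).argmax(axis=-1)[0]
print(f'Predicted sentiment: {idx2label[pred]}')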