Sentiment analysis using Supervised Deep Learning model | Devpost

Inspiration

How a machine processes data is a question I have had since childhood, but due to a lack of knowledge and resources, I could not explore this part of technology. I had only a rough idea about machine learning before enrolling in Ignition Hacks. I thought this competition would expand my tech stack and help me learn more about machine learning and deep learning. Machine learning will soon play a vital role in computer science, and having it as a skill will help me face a competitive world.

What it does

Sentiment Analysis is a program that interprets a sentence given by the user and tells us whether that sentence is positive or negative. It first applies pre-processing to remove inconsistencies from the text. A machine learning model, trained on the provided data set, then predicts the most likely sentiment.

How I built it

  1. Pre-processing. Before applying machine learning algorithms to the data, we need to make sure the text is free from ambiguity and noise. For example, in the sentence "Hello Adam!! Are you feeling good today??", punctuation such as "!" and "?" tells us nothing about the sentiment of the sentence but creates ambiguity in our program, so it must be removed. Stopwords such as "say" and "me" also play no role in deciding sentiment, so they are removed as well. Pre-processing is essential because it strips unnecessary data and cleans the text, reducing inconsistencies. To convert every line into its processed form, I wrote a function that uses bs4 to remove HTML tags and the contractions library to expand contractions in the text.
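A minimal sketch of this cleaning step, using only the standard library (the actual project uses bs4 and the contractions package; the tiny stopword list here is illustrative, not the one the project used):

```python
import re

# Small illustrative stopword list; a real pipeline would use a fuller one (e.g. NLTK's).
STOPWORDS = {"say", "me", "is", "are", "you", "a", "the"}

def preprocess(text: str) -> str:
    """Lowercase, strip HTML tags and punctuation, and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)   # crude HTML tag removal (bs4 does this robustly)
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation and digits
    tokens = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(tokens)

print(preprocess("Hello Adam!! Are you feeling good today??"))
# hello adam feeling good today
```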

  2. Creating the tokenizer. After pre-processing, I tokenized the data using Keras's built-in Tokenizer. Individual words are called tokens, and splitting text into tokens is called tokenization. These tokens help the NLP model understand context: tokenization lets us interpret the meaning of the text by analyzing the sequence of words. For example, the text "It is raining" can be tokenized into 'It,' 'is,' and 'raining.'
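The core of what Keras's Tokenizer does can be sketched in plain Python (this toy class only mirrors the word-index behavior; the real Tokenizer also handles filters, OOV tokens, and counts):

```python
class SimpleTokenizer:
    """Toy re-implementation of the core of Keras's Tokenizer:
    builds a word -> integer index and turns texts into index sequences."""

    def __init__(self):
        self.word_index = {}

    def fit_on_texts(self, texts):
        for text in texts:
            for word in text.lower().split():
                if word not in self.word_index:
                    # Keras reserves index 0 for padding, so indices start at 1.
                    self.word_index[word] = len(self.word_index) + 1

    def texts_to_sequences(self, texts):
        return [[self.word_index[w] for w in t.lower().split() if w in self.word_index]
                for t in texts]

tok = SimpleTokenizer()
tok.fit_on_texts(["it is raining", "it is sunny"])
print(tok.texts_to_sequences(["it is raining"]))  # [[1, 2, 3]]
```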

  3. Embedding data. The purpose of the embedding layer is to map each token index to a dense vector of a fixed size. Compared with sparse one-hot encodings, these dense, learned vectors let the model capture similarity between words, which makes training more efficient.
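Conceptually, an embedding layer is a trainable lookup table from token indices to dense vectors. A stdlib-only sketch with made-up random vectors (Keras's Embedding layer learns these values during training):

```python
import random

random.seed(0)
EMBED_DIM = 4  # illustrative; real models often use 50-300 dimensions

# One dense vector per token index (index 0 reserved for padding).
vocab = {"it": 1, "is": 2, "raining": 3}
embedding_table = {i: [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
                   for i in range(len(vocab) + 1)}

def embed(sequence):
    """Replace each token index with its dense vector."""
    return [embedding_table[i] for i in sequence]

vectors = embed([1, 2, 3])  # the tokenized "it is raining"
print(len(vectors), len(vectors[0]))  # 3 4
```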

  4. Creation of a neural network using Keras. We build a neural network that processes all of the collected data and predicts the output. The model uses an LSTM; LSTMs are a special kind of RNN, capable of learning long-term dependencies.
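The long-term memory of an LSTM comes from its gating equations. One cell step for scalar input and state, sketched in plain Python with untrained placeholder weights (Keras's LSTM layer implements the vectorized, trained version of exactly these equations):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input/state. w holds the weights for the
    forget (f), input (i), candidate (g), and output (o) gates."""
    f = sigmoid(w["f_x"] * x + w["f_h"] * h_prev + w["f_b"])    # forget gate
    i = sigmoid(w["i_x"] * x + w["i_h"] * h_prev + w["i_b"])    # input gate
    g = math.tanh(w["g_x"] * x + w["g_h"] * h_prev + w["g_b"])  # candidate values
    o = sigmoid(w["o_x"] * x + w["o_h"] * h_prev + w["o_b"])    # output gate
    c = f * c_prev + i * g   # cell state: old memory kept + new information added
    h = o * math.tanh(c)     # hidden state exposed to the next layer
    return h, c

# Illustrative constant weights, not learned values.
w = {k: 0.5 for k in ["f_x", "f_h", "f_b", "i_x", "i_h", "i_b",
                      "g_x", "g_h", "g_b", "o_x", "o_h", "o_b"]}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 1.0]:   # a tiny input sequence
    h, c = lstm_step(x, h, c, w)
print(h, c)
```

The forget gate is what lets the cell carry information across many time steps, which is why LSTMs handle long-term dependencies better than plain RNNs.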

Challenges I ran into

I faced some issues while building the program, but I researched the problems and learned how to overcome them. When I tested the data file, I realized it was large, and Google Colab threw a runtime error, so I switched to the desktop version of Visual Studio Code, which solved the problem. While using the Universal Sentence Encoder in Visual Studio Code, I learned that Google had not released a Windows version of tensorflow_text, so I switched to a different encoder that does the same job.

Accomplishments that I'm proud of

I am delighted that I could learn and complete an entirely new project in two days.

What I learned

I learned about machine learning algorithms, including supervised and unsupervised learning, and how to train a machine learning model. This project gave me insight into NLP and its application in various areas of technology.

What's next for Sentiment analysis using the Supervised Deep Learning model

I would explore new models like ensemble stacking methods to improve accuracy. The model currently uses neural networks, and I want to try NN variants such as 1D CNNs and bidirectional LSTMs, as well as other time-series and NLP models, e.g., Hidden Markov Models, for better prediction. The TF and glove.6B sentence encoders were a bit slow for 600,000 tuples, so I want to try distributed computing frameworks like Hadoop for faster pre-processing.
