<p>In this post, I am going to download tweets about Bitcoin from the last two weeks and perform sentiment analysis on them to gather market intelligence. What do people think about Bitcoin?</p>
<!-- /wp:paragraph -->
<!-- wp:heading {"textColor":"secondary"} -->
<h2 class="has-secondary-color has-text-color" id="what-is-sentiment-analysis">What is Sentiment Analysis?</h2>
<!-- /wp:heading -->
<!-- wp:paragraph -->
<p>To do this, I will use Natural Language Processing (NLP) to gain insights into my data. One of the most common forms of NLP analysis is sentiment analysis, which converts a text into a score that estimates its sentiment, for example on a scale from negative to positive. There are several models we can use to perform sentiment analysis, but they all serve the same purpose.</p>
<!-- /wp:paragraph -->
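<p>To make "converting a text into a score" concrete, here is a deliberately tiny lexicon-based scorer. It is a sketch for illustration only, far simpler than the libraries used later in this post: the word list and weights in <code>LEXICON</code> are made up. Each known word carries a weight between -1 (negative) and 1 (positive), and a tweet's score is the average weight of the known words it contains.</p>

```python
# Toy lexicon: the words and weights are illustrative assumptions,
# not taken from any real sentiment library.
LEXICON = {
    "good": 1.0, "great": 1.0, "bullish": 1.0, "moon": 0.5,
    "bad": -1.0, "crash": -1.0, "bearish": -1.0, "scam": -1.0,
}


def sentiment_score(text: str) -> float:
    """Average the lexicon weight of every known word; 0.0 if none match."""
    # Lowercase each word and strip trailing/leading punctuation.
    words = [w.strip(".,!?").lower() for w in text.split()]
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0


print(sentiment_score("Bitcoin looks bullish, great week!"))  # 1.0
print(sentiment_score("Another crash. What a scam"))          # -1.0
```

<p>Real models handle negation, intensifiers and context, which a plain word average cannot, but the output has the same shape: one number per text.</p>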
<!-- wp:paragraph -->
<p>One of the most common use cases of sentiment analysis is estimating market demand for a certain product, ideally catching a trend just as it begins. In finance, this is one of the most sought-after ML applications.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>The project will be following these steps:</p>
<!-- /wp:paragraph -->
<!-- wp:list {"ordered":true} -->
<ol><li>Download data from Twitter</li><li>Preprocess the data</li><li>Perform sentiment analysis</li><li>Analyze results</li></ol>
<!-- /wp:list -->

<!-- wp:heading {"textColor":"secondary"} -->
<h2 class="has-secondary-color has-text-color" id="1-download-data-from-twitter">1. Download data from Twitter</h2>
<!-- /wp:heading -->
<!-- wp:paragraph -->
<p>Several libraries can download data from Twitter without going through its metered API, and therefore without a limit on the volume of data I can scrape. One of the most common is <strong>twint</strong>; however, it has not been working reliably since Twitter's latest updates.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>As a valid and simpler alternative, I will be using <strong>snscrape</strong>.</p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<p>After installing the library with pip, I need to declare the search parameters. Because I may want to run it on more queries (for example, to gauge the sentiment around the top 10 billionaires), I want a control panel that gives instructions to the program.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>To that end, I will use <code>movie_dict</code> as a variable that stores all the instructions for multiple searches. For each search, a CSV file is created with all the data I was able to scrape from Twitter:</p>
<!-- /wp:paragraph -->
<!-- wp:code {"backgroundColor":"primary"} -->
<pre class="wp-block-code has-primary-background-color has-background"><code>import snscrape.modules.twitter as sntwitter</code></pre>
<!-- /wp:code -->

<!-- wp:paragraph -->
<p>This code is an improved version of the <a href="https://medium.com/dataseries/how-to-scrape-millions-of-tweets-using-snscrape-195ee3594721">standard code used to run a query</a> that filters the tweets you wish to download from Twitter. You can use it to download not just one query, but a list of queries.</p>
<!-- /wp:paragraph -->
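<p>The control-panel idea can be sketched as follows. This is a minimal illustration under stated assumptions, not my actual script: the <code>control_panel</code> dictionary, its keys (<code>keyword</code>, <code>days_back</code>, <code>max_tweets</code>) and the helpers <code>build_query</code> and <code>run_searches</code> are hypothetical. The scraping call is passed in as a <code>fetch</code> function so the loop can be tried without network access. Each query uses Twitter's <code>since:</code> and <code>until:</code> search operators to restrict results to the last N days, and every entry produces one CSV file.</p>

```python
import csv
import itertools
import os
from datetime import date, timedelta

# Hypothetical control panel: one entry per search. The keys are
# illustrative assumptions, not the structure of movie_dict.
control_panel = {
    "bitcoin": {"keyword": "bitcoin", "days_back": 14, "max_tweets": 1000},
    "ethereum": {"keyword": "ethereum", "days_back": 14, "max_tweets": 1000},
}


def build_query(keyword, days_back, today=None):
    """Build a Twitter search string using the since:/until: date operators."""
    today = today or date.today()
    since = today - timedelta(days=days_back)
    return f"{keyword} since:{since.isoformat()} until:{today.isoformat()}"


def run_searches(panel, fetch, out_dir=".", today=None):
    """Run every search in the panel and write one CSV per entry.

    fetch(query) must return an iterable of (date, text) pairs.
    Returns a dict mapping each CSV path to the number of rows written.
    """
    written = {}
    for name, opts in panel.items():
        query = build_query(opts["keyword"], opts["days_back"], today)
        # islice stops a possibly very long stream at max_tweets
        rows = itertools.islice(fetch(query), opts["max_tweets"])
        path = os.path.join(out_dir, f"{name}.csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["date", "text"])
            count = 0
            for row in rows:
                writer.writerow(row)
                count += 1
        written[path] = count
    return written
```

<p>With snscrape installed, <code>fetch</code> could be a small generator over <code>sntwitter.TwitterSearchScraper(query).get_items()</code> that yields each tweet's date and text (check your snscrape version for the name of the text attribute). Passing the scraper in as a function also makes it easy to swap in another library if snscrape stops working, as happened with twint.</p>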
<!-- wp:heading {"textColor":"secondary"} -->
<h2 class="has-secondary-color has-text-color" id="2-preprocess-the-data">2. Preprocess the data</h2>
<!-- /wp:heading -->
<!-- wp:paragraph -->
<p>Now that a csv file has been created for every query in my control panel, let us look at the raw data of a single query:</p>
<!-- /wp:paragraph -->
<!-- wp:code {"backgroundColor":"primary"} -->
<pre class="wp-block-code has-primary-background-color has-background"><code>import pandas as pd
df</code></pre>
<!-- /wp:code -->
<!-- wp:paragraph -->
<p>Because some rows may be null after importing the dataset, I drop them and reset the index. I also apply a small preprocessing snippet; preprocessing is a step you can customize to your needs. In this case, because I only want to remove links and non-ASCII characters, I use the following two functions:</p>
<!-- /wp:paragraph -->
<!-- wp:code {"backgroundColor":"primary"} -->
<pre class="wp-block-code has-primary-background-color has-background"><code># get rid of links and hashtags
df</code></pre>
<!-- /wp:code -->
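<p>Since only part of the preprocessing code is visible above, here is a minimal standard-library sketch of the same two steps: dropping empty rows, then stripping links and non-ASCII characters. It assumes the tweets arrive as a plain list of strings rather than a DataFrame column, and the function names and the URL pattern are my own choices for illustration. Unlike the original comment, it leaves hashtags untouched.</p>

```python
import re

# Matches http(s) links and bare www. links up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")


def remove_links(text):
    """Delete every URL-looking token from the text."""
    return URL_PATTERN.sub("", text)


def remove_non_ascii(text):
    """Drop any character outside the ASCII range (emoji, accents, ...)."""
    return text.encode("ascii", "ignore").decode("ascii")


def clean_tweets(tweets):
    """Skip None/empty tweets, clean the rest, and collapse extra whitespace."""
    cleaned = []
    for text in tweets:
        if not text:  # None or "" plays the role of a null row
            continue
        text = " ".join(remove_non_ascii(remove_links(text)).split())
        if text:  # a tweet that was only a link ends up empty
            cleaned.append(text)
    return cleaned


print(clean_tweets(["Bitcoin to the moon 🚀 https://t.co/abc123 now!", None]))
# ['Bitcoin to the moon now!']
```

<p>With pandas, the same idea maps onto <code>df.dropna().reset_index(drop=True)</code> followed by <code>.apply()</code> on the text column with the two cleaning functions.</p>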
<!-- wp:paragraph -->
<p>This is a screenshot of the dataframe after preprocessing:</p>
<p>I am now going to apply sentiment analysis to the cleaned data. There are many libraries you can use for this task, such as <strong>transformers</strong>, <strong>textblob</strong>, and <strong>spacy</strong>. For this tutorial I am going to use the latest version of spaCy and its extension <a href="https://spacy.io/universe/project/spacy-textblob" target="_blank" rel="noreferrer noopener">spacytextblob</a>.</p>
<!-- /wp:paragraph -->
<!-- wp:paragraph -->
<p>To install it, I will need to run the following commands and restart the notebook:</p>
<p>Before analyzing the content of the tweets, we are going to preprocess the data further. There are several preprocessing strategies; in this post, we are going to:</p>
<!-- wp:list -->
<ul><li>Lemmatize each word</li><li>Delete extra characters</li><li>Remove stop words</li></ul>
<!-- /wp:list -->
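<p>The three steps listed above can be sketched in a few lines. This is not the function used in this post; it is a minimal, dependency-free illustration in which <code>STOP_WORDS</code> is a small hypothetical sample and the lemmatizer is passed in as a parameter. When NLTK and its WordNet data are installed, <code>WordNetLemmatizer().lemmatize</code> can be plugged in as that parameter; the default leaves words unchanged.</p>

```python
import re

# Tiny illustrative stop-word list; in practice you would use
# nltk.corpus.stopwords.words("english") instead.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it"}


def preprocess(text, lemmatize=lambda word: word, stop_words=STOP_WORDS):
    """Lowercase, delete extra characters, drop stop words, then lemmatize.

    `lemmatize` is any word -> lemma function (identity by default).
    """
    # Replace everything except lowercase letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = text.split()
    return [lemmatize(tok) for tok in tokens if tok not in stop_words]


print(preprocess("The prices are rising!!! #BTC"))
# ['prices', 'rising', 'btc']
```

<p>Keeping the lemmatizer as a parameter keeps the cleaning logic testable without downloading any corpora.</p>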
<p>I am using my own function to perform this cleaning. Similar preprocessing functions are widely available, so if you want to try other code, perhaps simpler or limited to a single preprocessing step, you can easily find one online:</p>
<!-- wp:code {"backgroundColor":"primary"} -->
<pre class="wp-block-code has-primary-background-color has-background"><code>import re
import nltk
# WordNet data for the lemmatizer, plus the stop-word lists
nltk.download('wordnet')
nltk.download('stopwords')
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer, PorterStemmer
from nltk.corpus import stopwords
# one shared lemmatizer and stemmer instance for the whole dataset
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

# adding a counter to check the progress of the algorithm while it runs</code></pre>
<!-- /wp:code -->
<p>There are several ways to analyze the results of the sentiment analysis. One common practice is to separate the tweets with negative sentiment from those with positive sentiment and extract the most common words in each group.</p>
<p>First of all, let us see how many positive and negative tweets we have inferred from our data, to get a general idea of the public's opinion about Bitcoin:</p>
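<p>Both analyses, counting tweets per sentiment and finding the most common words in each group, can be sketched with <code>collections.Counter</code>. The sketch assumes each tweet has already been reduced to a list of tokens plus a polarity score in [-1, 1], the range TextBlob uses and spacytextblob exposes. The <code>threshold</code> parameter, which treats scores close to zero as neutral, and the helper names <code>label</code> and <code>summarize</code> are hypothetical additions for illustration.</p>

```python
from collections import Counter


def label(polarity, threshold=0.0):
    """Map a polarity score in [-1, 1] to a sentiment label."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def summarize(scored_tweets, top_n=3, threshold=0.0):
    """scored_tweets: list of (tokens, polarity) pairs.

    Returns the number of tweets per label and the top_n most
    common words among the positive and the negative tweets.
    """
    counts = Counter(label(p, threshold) for _, p in scored_tweets)
    words = {"positive": Counter(), "negative": Counter()}
    for tokens, polarity in scored_tweets:
        lab = label(polarity, threshold)
        if lab in words:  # neutral tweets are counted but not mined for words
            words[lab].update(tokens)
    top = {lab: [w for w, _ in c.most_common(top_n)] for lab, c in words.items()}
    return counts, top
```

<p>The search keyword itself tends to top both word lists, so you may want to add it to the stop words before comparing the groups.</p>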