
shubhamsidhu/Senitment-analysis-twitter


Sentiment Analysis for the Pandemic Situation due to the recent outbreak of SARS-nCoV2

Description of the project

December 2019 marked the serious outbreak of a new SARS-type virus, the coronavirus known as nCoV2 (SARS-CoV-2). It caused panic and loss all over the world, killing hundreds of thousands of people and triggering a severe economic breakdown from which recovery will take some time. The situation has also produced a flood of misinformation across the web. Epidemic outbreaks are a major concern in many developing and underdeveloped countries, and even developed countries find them difficult to deal with. Many innocent lives are lost to disease and many economies are ruined in the process.

In these situations, social networking sites such as Twitter can serve as an important data source for raising awareness of health problems. Twitter data is highly unstructured, distributed, and large in volume, but with appropriate data-gathering algorithms it can be transformed for meaningful use. We propose an epidemic surveillance system that applies spatial, temporal, and text mining to Twitter data. Real-time analysis results are presented as disease surveillance maps, timelines, and distributions of different diseases. We also include symptoms and treatments, along with overall disease activity timelines. The system can be useful for early prediction of epidemic outbreaks and for monitoring the distribution of affected people along with useful treatments. This in turn can enable a faster response to epidemics and better preparation for them, which can be very useful for both patients and the doctors in charge.

This research focuses on generating sentiment labels (positive, negative, and neutral) to see which kinds of labels are associated with the data.
The sentiment analyser is modelled on Twitter data, and the software built on it will be a real-time sentiment analyser. The research gives major attention to pre-processing of the data, generation of different statistical features, and analysis of different machine learning techniques.

In this analysis, a Support Vector Machine with a linear kernel on TF-IDF Vectorizer features achieved an accuracy of 87%. With the Count Vectorizer, the Extra Trees Classifier performed best, with an accuracy of 85.5% and precision, recall, and F1-score of 86%, 86%, and 85% respectively. With n-gram features (unigrams and bigrams), the linear SVM achieved an accuracy of 85%. We conclude that, for real-time sentiment analysis of tweets downloaded from the Twitter API, the linear Support Vector Machine outperformed the other machine learning algorithms used. Beyond the analysis of how the modelling and feature extraction techniques affect the results, the tweets are also analysed with various data visualisation methods to give a better view of the modelling results.
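The three feature and model combinations described above can be sketched with scikit-learn roughly as follows. This is a minimal illustration, not the project's actual code: the `clean_tweet` helper, the toy tweets and labels, the hyperparameters, and the choice of TF-IDF weighting for the unigram and bigram variant are all assumptions made for the example.

```python
import re

from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def clean_tweet(text):
    """Hypothetical pre-processing: lowercase, strip URLs, mentions, non-letters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop links
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, '#'
    return " ".join(text.split())              # collapse whitespace


# Tiny made-up corpus for illustration only (not the project's dataset).
tweets = [
    "So grateful to the doctors and nurses, stay strong everyone!",
    "Great news, recoveries are rising in our city",
    "Lockdown again, this is terrible and exhausting",
    "I hate how much misinformation is spreading, awful",
    "The health ministry will hold a briefing at 5 pm",
    "New case numbers were published today",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

cleaned = [clean_tweet(t) for t in tweets]

# One pipeline per feature/model combination named in the text.
models = {
    "tfidf + linear SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "counts + extra trees": make_pipeline(
        CountVectorizer(),
        ExtraTreesClassifier(n_estimators=100, random_state=0),
    ),
    # Assumption: TF-IDF weighting over unigrams and bigrams.
    "uni+bigrams + linear SVM": make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)), LinearSVC()
    ),
}

new_tweet = clean_tweet("Stay home and stay safe! https://t.co/xyz")
predictions = {}
for name, model in models.items():
    model.fit(cleaned, labels)
    predictions[name] = model.predict([new_tweet])[0]

for name, label in predictions.items():
    print(f"{name}: {label}")
```

On a real dataset, each pipeline would be fitted on a training split and scored on held-out tweets (for example with `sklearn.metrics.classification_report`) to obtain accuracy, precision, recall, and F1-score figures comparable to those reported above.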

About

Sentiment Analysis and Prediction from twitter API.
