In this project, we applied Latent Dirichlet Allocation (LDA), an unsupervised machine learning algorithm, to a corpus of more than 2,200 BBC News articles. The articles cover a wide variety of topics, including law, government, sports, entertainment, and technology, so we built an LDA model capable of grouping the articles by topic.
LDA is a generative probabilistic model, which means it attempts to model the joint distribution of inputs and outputs based on latent variables. This is in contrast to discriminative models, which attempt to learn how inputs map to outputs.
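The generative view can be made concrete with a minimal sketch of how LDA assumes a document is produced. The two topics, their word probabilities, the six-word vocabulary, and the `alpha` values below are hypothetical and chosen only for illustration; none of them come from the BBC dataset. The code samples a document from fixed topics and does not fit a model.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, vocabulary, alpha, length, rng):
    """Generate one document following LDA's generative story."""
    # 1. Draw this document's topic mixture (the latent variable theta).
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # 2. Pick a topic for this word position according to theta.
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        # 3. Pick a word from that topic's distribution over the vocabulary.
        word = rng.choices(vocabulary, weights=topic_word[topic])[0]
        words.append(word)
    return theta, words


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
vocabulary = ["match", "goal", "team", "software", "phone", "chip"]
# Hypothetical topics: row 0 is sports-like, row 1 is technology-like.
topic_word = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],
]
theta, words = generate_document(
    topic_word, vocabulary, alpha=[0.5, 0.5], length=8, rng=rng
)
print("topic mixture:", theta)
print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: given only the words, it infers the topic mixtures and the per-topic word distributions that most plausibly generated them.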
You can use LDA for a variety of tasks, from clustering customers based on product purchases to automatic harmonic analysis in music. However, it is most commonly associated with topic modeling in text corpora. In that setting, observations are referred to as documents, the feature set as the vocabulary, a feature as a word, and the resulting categories as topics.
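To illustrate this terminology, the sketch below turns a few toy sentences into the document-term count matrix that LDA typically takes as input. The sentences and the whitespace tokenizer are assumptions made for illustration; the notebook's actual preprocessing is not reproduced here.

```python
from collections import Counter

# Each observation is a "document" (toy sentences, not from the BBC dataset).
documents = [
    "the team won the match",
    "the new phone has a faster chip",
    "the match ended with a late goal",
]

# Naive whitespace tokenization; a real pipeline would also remove
# stop words, punctuation, and so on.
tokenized = [doc.split() for doc in documents]

# The feature set is the "vocabulary"; each feature is a "word".
vocabulary = sorted({word for doc in tokenized for word in doc})
word_index = {word: i for i, word in enumerate(vocabulary)}


def to_count_vector(tokens):
    """Count how often each vocabulary word occurs in one document."""
    counts = Counter(tokens)
    return [counts.get(word, 0) for word in vocabulary]


# One row per document, one column per vocabulary word.
doc_term_matrix = [to_count_vector(tokens) for tokens in tokenized]

for doc, row in zip(documents, doc_term_matrix):
    print(row, "<-", doc)
```

An LDA implementation would take a matrix like `doc_term_matrix` and return, for each document, a distribution over topics.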
Project Files:
News Articles.zip - This file contains the dataset used for this project. It includes 2200 news articles grouped into categories.
Topic Modeling on News Articles - This is the PowerPoint presentation for the project. It includes EDA plots made with Seaborn and Matplotlib, and a chart of results for the implemented algorithms.
Topic Modeling on News Articles.ipynb - This notebook includes the feature descriptions, exploratory data analysis, data preprocessing, and the LDA implementation.
We refined the topic classification. The dataset came with 5 major topics, and using the LDA model we split these further into 10 major sub-categories, which makes choosing a topic more reliable. For these 10 sub-categories we achieved a coherence score of 60%.
Scope:
Topic modelling applications cover a range of use cases. A few real-world examples are annotation, eDiscovery, content recommendation, search engine optimization, and word sense disambiguation. This project provides an approach to using topic modelling for classifying documents, and the resulting topics can then be used in supervised learning models to make topic recommendations. Furthermore, we can use both kinds of information to build an NLP model in the future.