Use this Google Doc to add your results and conclusions: https://docs.google.com/document/d/1g6U86VO9ffv9M2yU8uz2INr77U9YFSxMQseBHqiryoQ/edit
Topic modelling is a technique for identifying the topics that a given text or article contains. The method works as follows:
- First, take the text or document, do the basic pre-processing, and identify the different groups, or topics, that can be found in it.
- Next, group the sentences that belong to each topic together.
So first you identify the topics, then you collect the content pertaining to each topic.
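The "basic pre-processing" step can be sketched as below. This is only a minimal illustration using the Python standard library: the stop-word list, the function names `preprocess` and `bag_of_words`, and the two sample documents are assumptions made for the example, not part of any particular library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return one word-count Counter per document."""
    return [Counter(preprocess(d)) for d in docs]


# Hypothetical sample documents.
docs = [
    "The cat sat on the mat.",
    "Stocks and bonds are traded in the market.",
]
bows = bag_of_words(docs)
```

The resulting word counts are the kind of input that topic models such as LDA or LSA usually start from.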
Given a set of documents, you need to identify all the topics they discuss. Then, given a new article, you should be able to classify it, and also classify the existing documents into topics or groups. The number of groups K has to be specified explicitly, but the goal here is to choose K based on some parameters rather than set it manually.
The most common method is LDA (Latent Dirichlet Allocation), which assumes that documents exhibit multiple topics.
Latent: the topics cannot be seen directly but are inferred from observed characteristics (the words).
Dirichlet: a probability distribution over vectors of proportions whose components sum to 1.
Allocation: the words of each document are allocated to topics.
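To make the three words concrete, here is a rough sketch of the generative story LDA assumes, written with the standard library only. The two topics in `TOPICS`, the value `alpha=0.5`, and the function names are hypothetical choices for illustration. Real LDA runs this story in reverse, inferring the hidden topic mix from the observed words.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, n_words, rng):
    """Generate one document: pick a topic mix, then a topic and a word per position."""
    n_topics = len(topic_word)
    # Latent: the per-document topic proportions, never observed directly.
    theta = sample_dirichlet([alpha] * n_topics, rng)
    words = []
    for _ in range(n_words):
        # Allocation: each word position is assigned to one topic.
        z = rng.choices(range(n_topics), weights=theta)[0]
        dist = topic_word[z]
        words.append(rng.choices(list(dist), weights=list(dist.values()))[0])
    return theta, words


# Hypothetical topic-word distributions (each sums to 1).
TOPICS = [
    {"goal": 0.5, "team": 0.3, "match": 0.2},
    {"stock": 0.5, "market": 0.3, "price": 0.2},
]

rng = random.Random(0)
theta, words = generate_document(TOPICS, alpha=0.5, n_words=8, rng=rng)
```

Here `theta` is a vector of topic proportions that sums to 1, which is the property the Dirichlet distribution provides.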
Visualizing the topic model: https://github.com/bmabey/pyLDAvis
Latent semantic analysis
Step-by-Step implementation of LSA
- https://mccormickml.com/2016/03/25/lsa-for-text-classification-tutorial/
- https://www.datascienceassn.org/sites/default/files/users/user1/lsa_presentation_final.pdf
- https://www.datacamp.com/community/tutorials/discovering-hidden-topics-python
This is a walkthrough guide on tuning the alpha and beta hyperparameters, and on coherence
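The notes do not say which coherence measure is meant. As one possibility, the sketch below computes the UMass coherence of a topic's top words from document co-occurrence counts. The function name `umass_coherence` and the toy corpus are assumptions for illustration. A tuning loop could fit models with different K, alpha, and beta values and keep the setting whose topics score highest.

```python
import math


def umass_coherence(top_words, docs):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered word pairs.

    top_words: a topic's top words, most probable first.
    docs: tokenised documents (lists of words).
    """
    doc_sets = [set(d) for d in docs]

    def doc_freq(*words):
        # Number of documents containing all the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / d_l)
    return score


# Hypothetical tokenised corpus.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "vet"],
    ["stock", "market", "price"],
    ["market", "price", "trade"],
]

coherent = umass_coherence(["dog", "pet"], docs)     # words that co-occur
mixed = umass_coherence(["dog", "market"], docs)     # words that never co-occur
```

Words that appear together in the same documents give a higher score than words that never co-occur, so `coherent` is larger than `mixed` here.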