Z by HP Unlocked Challenge 2 - Topic Modeling - Special thanks to Nick Wan and Hunter Kempf for helping create this challenge! Watch the video tutorial here: https://youtu.be/mkOIb0EulkE
Summarize the main topics presented in the articles into the most relevant topic groups with text and visuals. This can be done using NLP tools especially Latent Dirichlet Allocation (LDA) and can provide insight into the relevant content within the articles without having to read through all of them.
Unlocked is an action-packed interactive film made for data scientists. Sharpen your skills and solve the data driven mystery here: https://www.hp.com/us-en/workstations/industries/data-science/unlocked-challenge.html
The Data is pretty straightforward and consists of text files within the challenge2-articles folder. Each text file will follow the format challenge2-articleXXX.txt where X is a number. There should be 144 total articles to summarize.
Feel free to follow along with the jupyter notebook or investigate and create your own topic model.
- https://www.machinelearningplus.com/nlp/topic-modeling-visualization-how-to-present-results-lda-models/
- https://www.tutorialexample.com/implement-lda-model-using-gensim-a-beginner-guide-gensim-tutorial/#:~:text=We%20can%20use%20gensim%20LdaModel%20to%20create%20a,example%3A%20from%20gensim.models%20import%20LdaModel%20num_topics%20%3D%2010
pyLDAvis Tutorials:
- https://neptune.ai/blog/pyldavis-topic-modelling-exploration-tool-that-every-nlp-data-scientist-should-know
- https://nbviewer.org/github/bmabey/pyLDAvis/blob/master/notebooks/pyLDAvis_overview.ipynb
Research Papers:
- http://vis.stanford.edu/files/2012-Termite-AVI.pdf
- https://nlp.stanford.edu/events/illvi2014/papers/sievert-illvi2014.pdf
Inspired by: https://nbviewer.org/github/bmabey/pyLDAvis/blob/master/notebooks/sklearn.ipynb