Topic modeling for the programming languages literature.
You can run our tool!
The analysis directory holds the R scripts we used to generate
figures for the paper.
The lda directory holds the Python and bash scripts we used to run
David Blei's LDA-C. Outputs are written to the out directory.
The sessions directory is the (not quite finished) analysis of
session data for POPL.
The www directory holds the website frontend and backend.
You'll need David Blei's LDA-C, compiled, with the lda executable on your PATH. You'll also need the Python library nltk, with the stopwords and wordnet corpora downloaded.
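As a quick sanity check before running the scripts, something like the following sketch can report which of these prerequisites are missing. It is not part of this repository: the function name missing_prerequisites and its which parameter are hypothetical, and it assumes the nltk data lives under the standard corpora/stopwords and corpora/wordnet resource names.

```python
import importlib.util
import shutil


def missing_prerequisites(which=shutil.which):
    """Return a list of human-readable descriptions of missing prerequisites.

    `which` is a hypothetical hook (defaulting to shutil.which) so the
    PATH lookup can be swapped out, for example in a test.
    """
    missing = []

    # The scripts expect the LDA-C executable to be callable as `lda`.
    if which("lda") is None:
        missing.append("lda executable (LDA-C) not found on PATH")

    # Check for nltk without importing it, so this runs even if it is absent.
    if importlib.util.find_spec("nltk") is None:
        missing.append("Python package nltk is not installed")
        return missing

    import nltk

    # Assumption: both resources are stored under nltk's `corpora/` namespace.
    for corpus in ("stopwords", "wordnet"):
        try:
            nltk.data.find("corpora/" + corpus)
        except LookupError:
            missing.append(
                "nltk corpus %r not downloaded (try nltk.download(%r))"
                % (corpus, corpus)
            )
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    for problem in problems:
        print("MISSING:", problem)
    if not problems:
        print("All prerequisites found.")
```

If a corpus is reported missing, running nltk.download("stopwords") or nltk.download("wordnet") from a Python prompt should fetch it.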
To do the R analysis, you'll need R with ggplot2 installed.