httrees

httrees is a Python module for hierarchical topic modeling. It implements an algorithm that constructs a topic hierarchy tree through successive application of flat topic models. It also contains several text vectorizer implementations, including support for fine-tuning deep word embeddings.

This project was started in 2021 as part of CS410 at the University of Illinois Urbana-Champaign.

Dependencies

httrees requires:

  • NumPy
  • SciPy
  • Pandas
  • Gensim

scikit-learn is not strictly required, but httrees is intended to be used alongside sklearn flat clustering models. Any clustering model that follows the sklearn API is compatible.
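To make the idea of building a hierarchy from sklearn-style flat models concrete, here is a minimal sketch. It is not the httrees implementation or its API: `Node`, `ThresholdSplitter`, and `build_tree` are hypothetical names, and the sketch assumes the flat model exposes `fit_predict(X)` the way sklearn clusterers do.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: holds the document indices assigned to it."""
    indices: list
    children: list = field(default_factory=list)


class ThresholdSplitter:
    """Toy flat model (an assumption for illustration, not part of httrees).

    It only mimics the sklearn clustering interface by offering fit_predict.
    """

    def fit_predict(self, X):
        # Assign label 1 to rows whose first feature exceeds the mean, else 0.
        mean = sum(row[0] for row in X) / len(X)
        return [int(row[0] > mean) for row in X]


def build_tree(X, make_model, indices=None, depth=0, max_depth=2, min_size=2):
    """Apply a fresh flat model to each node's documents, recursively."""
    if indices is None:
        indices = list(range(len(X)))
    node = Node(indices)
    if depth >= max_depth or len(indices) < min_size:
        return node

    labels = make_model().fit_predict([X[i] for i in indices])

    # Group the original document indices by the flat cluster label.
    groups = {}
    for i, label in zip(indices, labels):
        groups.setdefault(label, []).append(i)
    if len(groups) < 2:
        return node  # the flat model did not split this node any further

    for label in sorted(groups):
        child = build_tree(X, make_model, groups[label],
                           depth + 1, max_depth, min_size)
        node.children.append(child)
    return node


# Eight one-dimensional "documents" forming four tight pairs.
X = [[0.0], [0.1], [1.0], [1.1], [5.0], [5.1], [9.0], [9.1]]
tree = build_tree(X, ThresholdSplitter)
```

Because the sketch only relies on `fit_predict`, a factory such as `lambda: KMeans(n_clusters=2)` from scikit-learn could be passed in place of `ThresholdSplitter`.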

Installation

httrees can be installed from git:

pip install git+https://github.com/bllguo/CourseProject

Documentation and Usage

An example use case, along with a written overview of the implementation, can be found in IPython notebook form here. Both are also available at this page.

An example of fine-tuning embeddings can be found in this notebook and on this page.

A video demo is available at this link.