
Releases: kairosfuture/BERTopic

v1.0.7

24 Dec 12:51
874469c

  1. Introduced an automatic HDBSCAN cluster selection mode (choosing between leaf and eom) and automatic min-samples selection
  2. Fixed dependencies
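The release notes do not say how the automatic selection works. One plausible reading is a small search over HDBSCAN's `cluster_selection_method` (`"eom"` or `"leaf"`) and `min_samples`, keeping the best-scoring pair. The sketch below assumes that reading; `auto_select_hdbscan_params` and its candidate grid are hypothetical, and the scoring criterion is left to the caller because the notes do not name one.

```python
from itertools import product
from typing import Callable, Iterable, Optional, Tuple


def auto_select_hdbscan_params(
    score: Callable[[str, int], float],
    min_samples_grid: Iterable[int] = (1, 5, 10, 15),
    methods: Iterable[str] = ("eom", "leaf"),
) -> Tuple[str, int, float]:
    """Hypothetical helper: try every (cluster_selection_method, min_samples)
    pair and return the one with the highest score, plus that score."""
    best: Optional[Tuple[str, int, float]] = None
    for method, min_samples in product(methods, min_samples_grid):
        s = score(method, min_samples)
        # Strict ">" keeps the first candidate on ties.
        if best is None or s > best[2]:
            best = (method, min_samples, s)
    if best is None:
        raise ValueError("the search grid is empty")
    return best


# One possible scorer (an assumption, not taken from the release), using the
# hdbscan package's approximate DBCV score:
#
#   import hdbscan
#   def dbcv_score(method, min_samples):
#       model = hdbscan.HDBSCAN(min_cluster_size=10, min_samples=min_samples,
#                               cluster_selection_method=method,
#                               gen_min_span_tree=True).fit(embeddings)
#       return model.relative_validity_
```

The commented scorer relies on `relative_validity_`, which `hdbscan` only computes when `gen_min_span_tree=True`; any other cluster-quality score could be plugged in instead.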

v1.0.6

30 Nov 17:27
1011b29

Reduce with embeddings

Two topic reduction methods are introduced:

  1. Reducing by clustering topic vectors, where each topic vector is the average of its documents' vectors. The topic vectors are clustered with GMM, and this method works only when the clustering method is GMM.
  2. Reducing with HDBSCAN_flat. This method uses the hierarchical structure of HDBSCAN and searches for an epsilon value that yields the desired number of topics.
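Both reduction methods can be sketched in plain Python under stated assumptions. `topic_vectors` averages document vectors per topic, skipping the outlier topic -1 (an assumption based on the v1.0.4 merge rule); the GMM grouping of those vectors is only indicated in a comment. `find_epsilon` illustrates the epsilon search behind HDBSCAN_flat as a bisection, assuming the number of clusters never increases as epsilon grows. `gap_cluster_count` is a toy 1-D stand-in for refitting HDBSCAN. All function names are hypothetical, not the fork's API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Mapping, Sequence


def topic_vectors(doc_vectors: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> Dict[int, List[float]]:
    """Average the document vectors of each topic; topic -1 is skipped."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for vec, label in zip(doc_vectors, labels):
        if label == -1:
            continue
        acc = sums.setdefault(label, [0.0] * len(vec))
        sums[label] = [a + v for a, v in zip(acc, vec)]
        counts[label] += 1
    return {t: [s / counts[t] for s in sums[t]] for t in sums}


# Method 1 (assumption): group the topic vectors, e.g. with scikit-learn's
# GaussianMixture(n_components=k).fit_predict(vectors), then remap documents.
def remap_labels(labels: Sequence[int], topic_to_group: Mapping[int, int]) -> List[int]:
    """Map each document's topic to its merged topic; -1 stays -1."""
    return [label if label == -1 else topic_to_group[label] for label in labels]


def find_epsilon(count_clusters: Callable[[float], int], n_topics: int,
                 hi: float = 1.0, iters: int = 60, max_doublings: int = 60) -> float:
    """Method 2 sketch: smallest eps (to bisection precision) such that
    count_clusters(eps) <= n_topics, assuming the count is non-increasing in eps."""
    if n_topics < 1:
        raise ValueError("n_topics must be positive")
    lo = 0.0
    for _ in range(max_doublings):
        if count_clusters(hi) <= n_topics:
            break
        lo, hi = hi, hi * 2  # hi still gives too many clusters: widen the bracket
    else:
        raise ValueError("no epsilon found that yields few enough clusters")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if count_clusters(mid) > n_topics:
            lo = mid
        else:
            hi = mid
    return hi


def gap_cluster_count(points: Sequence[float]) -> Callable[[float], int]:
    """Toy stand-in: 1-D points separated by gaps <= eps share a cluster."""
    pts = sorted(points)
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    return lambda eps: 1 + sum(g > eps for g in gaps)
```

With the real `hdbscan` package, `count_clusters(eps)` would correspond to fitting with `cluster_selection_epsilon=eps` and counting the non-outlier labels; bisection keeps the number of refits logarithmic in the required precision.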

Full Changelog: v1.0.5...v1.0.6

v1.0.5

01 Apr 09:59
b4423b2

Keyword selection with new c-TF-IDF, custom embedding model and MMR

  • Updated c-TF-IDF

  • Added a custom embedding model parameter; updated MMR

  • Updated the mmr_keywords() method so it can be used on a single topic

  • Moved the embedding model from a BERTopic class parameter to an mmr_keywords() method parameter

  • Fixed the condition for calculating probabilities
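Maximal marginal relevance (MMR) picks keywords one at a time, trading relevance to the topic against similarity to the keywords already chosen. The sketch below assumes the topic and word embeddings have already been computed with whatever embedding model is passed to mmr_keywords(); `mmr_select` and its `diversity` weight are illustrative names, not the fork's actual signature.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(topic_embedding: Sequence[float],
               word_embeddings: Dict[str, Sequence[float]],
               top_n: int = 10,
               diversity: float = 0.5) -> List[str]:
    """Greedy MMR: score = (1 - diversity) * relevance - diversity * redundancy."""
    relevance = {w: cosine(topic_embedding, e) for w, e in word_embeddings.items()}
    candidates = list(word_embeddings)
    selected: List[str] = []
    while candidates and len(selected) < top_n:
        def mmr_score(word: str) -> float:
            # Highest similarity to any keyword already selected (0 for the first pick).
            redundancy = max(
                (cosine(word_embeddings[word], word_embeddings[s]) for s in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance[word] - diversity * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `diversity=0.0` this reduces to ranking words by similarity to the topic; raising it pushes near-duplicate keywords down the list.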

v1.0.4

10 Mar 18:47
3515184

New clustering method GM and blocked outlier merging

  • Introduced the clustering_method parameter and implemented the GM method

  • Fixed a topic index mismatch that occurred when the clustering method did not produce an outlier class

  • Blocked merging of topic -1: the outlier class can no longer be merged into another topic, and no topic can be merged into it

  • Fixed the topic number of the reduce method for the GMM clustering method

  • Fixed the reduce_topic topic number: the number of topics can no longer be reduced to 0 or 1

  • Fixed an HDBSCAN probabilities bug that occurred when HDBSCAN found only the -1 class
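The two guards in this release, no merging into or out of topic -1 and no reduction to 0 or 1 topics, can be expressed as small checks. `check_nr_topics` and `merge_topic` are hypothetical helpers written for illustration; the fork's own reduce_topic code may enforce these rules differently.

```python
from typing import List, Sequence

OUTLIER_TOPIC = -1


def check_nr_topics(nr_topics: int) -> None:
    """Hypothetical guard: a reduction target of 0 or 1 topics is rejected."""
    if nr_topics < 2:
        raise ValueError(f"nr_topics must be at least 2, got {nr_topics}")


def merge_topic(labels: Sequence[int], source: int, target: int) -> List[int]:
    """Hypothetical helper: move every document of `source` into `target`.

    The outlier topic can neither be merged into another topic nor receive one.
    """
    if OUTLIER_TOPIC in (source, target):
        raise ValueError("the outlier topic -1 cannot merge or be merged into")
    return [target if label == source else label for label in labels]
```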

v1.0.4-beta

09 Mar 18:00

Pre-release
reduce_topic topic number fix, topic number cannot be reduced to 0 or 1 …

v1.0.3

22 Feb 18:02
549db48

Outlier probability calculation and UMAP seed

  • Added a UMAP seed
  • Added outlier topic probability calculation
  • Fixed typos in _append_outlier()

v1.0.2

09 Feb 14:11
ad9350d

[DA-2218] Don't split phrases (#3)

* Modified the documents-type check utility; it now accepts only a list of lists of strings.

* Phrases are no longer split.

* CountVectorizer is fixed in BERTopic

* eps is now a float and cannot be negative.

* Removed a redundant eps check.

* Renamed the unclear variable topic to topic_docs.

* Replaced a lambda with an identity function defined with def.

* Fixed typo
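The input and parameter rules in this pull request can be sketched as follows. `check_documents_type`, `check_eps`, `identity` and `is_picklable` are hypothetical names; accepting integer `eps` values and converting them to float is an assumption, and the reason suggested here for replacing the lambda (a module-level def can be pickled, a lambda cannot) is an inference, not something the notes state.

```python
import pickle
from typing import Any, List


def check_documents_type(documents: Any) -> None:
    """Accept only a list of lists of strings (e.g. pre-extracted phrases)."""
    if not isinstance(documents, list) or not all(
        isinstance(doc, list) and all(isinstance(tok, str) for tok in doc)
        for doc in documents
    ):
        raise TypeError("documents must be a list of lists of strings")


def check_eps(eps: Any) -> float:
    """Return eps as a non-negative float (ints are converted: an assumption)."""
    if isinstance(eps, bool) or not isinstance(eps, (int, float)):
        raise TypeError(f"eps must be a number, got {type(eps).__name__}")
    eps = float(eps)
    if eps < 0:
        raise ValueError(f"eps cannot be negative, got {eps}")
    return eps


def identity(tokens: List[str]) -> List[str]:
    """Return the document's phrases unchanged, so they are never split."""
    return tokens


def is_picklable(obj: Any) -> bool:
    """True if pickle can serialize obj (a lambda cannot be looked up by name)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

Passing `analyzer=identity` to scikit-learn's `CountVectorizer` makes each inner string a single feature, so a phrase such as "new york" is counted as one term rather than two words; `analyzer=lambda x: x` behaves the same but would block pickling of the fitted model.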

Co-authored-by: zafercavdar <[email protected]>