Releases · kairosfuture/BERTopic
v1.0.7
v1.0.6
Reduce with embeddings
Two topic reduction methods are introduced:
- Reducing by clustering the topic vectors, which are created by averaging document vectors. GMM is used for the clustering, so this reduction method works only if the clustering method is GMM.
- Reducing with HDBSCAN_flat. This method uses the hierarchical structure of HDBSCAN and searches for an epsilon value that yields the desired number of topics.
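The first method's topic vectors can be sketched as follows — a minimal, pure-Python illustration of averaging document embeddings per topic (the function and variable names are hypothetical, not the fork's actual API):

```python
def topic_vectors(doc_embeddings, topic_labels):
    """Average document embeddings per topic, skipping outliers (label -1)."""
    sums, counts = {}, {}
    for emb, label in zip(doc_embeddings, topic_labels):
        if label == -1:
            continue  # outlier documents do not contribute to any topic vector
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, v in enumerate(emb):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    # divide each accumulated sum by the topic's document count
    return {t: [v / counts[t] for v in s] for t, s in sums.items()}
```

These per-topic vectors are what the GMM then clusters to merge similar topics.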
What's Changed
- Embeddings based topic reduction + get rid of ST by @dopc and @zafercavdar in #13
Full Changelog: v1.0.5...v1.0.6
v1.0.5
Keyword selection with new c-TF-IDF, custom embedding model and MMR
- c-TF-IDF update
- Custom embedding model parameter; MMR update
- Updated the mmr_keywords() method for single-topic usage
- Made the embedding model a parameter of the mmr_keywords() method instead of the BERTopic class
- Fixed the calculate-probabilities condition
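MMR (Maximal Marginal Relevance) keyword selection, as used by mmr_keywords(), picks each keyword by trading off its relevance to the topic against its similarity to keywords already chosen. A minimal sketch in plain Python — the signature and names here are illustrative assumptions, not the fork's actual API:

```python
from math import sqrt

def cosine(a, b):
    # cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)

def mmr_keywords(topic_emb, word_embs, words, top_n=5, diversity=0.5):
    # relevance of every candidate word to the topic embedding
    relevance = [cosine(topic_emb, w) for w in word_embs]
    # seed with the single most relevant word
    selected = [max(range(len(words)), key=lambda i: relevance[i])]
    candidates = [i for i in range(len(words)) if i != selected[0]]
    while candidates and len(selected) < top_n:
        def score(c):
            # MMR: relevance minus similarity to the words picked so far
            max_sim = max(cosine(word_embs[c], word_embs[s]) for s in selected)
            return (1 - diversity) * relevance[c] - diversity * max_sim
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [words[i] for i in selected]
```

A higher `diversity` value penalizes near-duplicate keywords more strongly, which is why MMR yields less redundant topic representations than taking the top c-TF-IDF words alone.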
v1.0.4
New clustering method GM & block outlier merge
- clustering_method parameter introduced; GM method implemented
- Topic index mismatch fixes; the mismatch occurred when the clustering method does not produce an outlier class
- Merging with -1 blocked: the outlier class can no longer merge or be merged into
- Topic number of the reduce method fixed for the GMM clustering method
- reduce_topic topic number fix: the topic count can no longer be reduced to 0 or 1
- Fixed HDBSCAN probabilities bug when it finds only the -1 class
v1.0.4-beta
reduce_topic topic number fix: the topic count cannot be reduced to 0 or 1 …
v1.0.3
v1.0.2
[DA-2218] Don't split phrases (#3)
- Modified the check-documents type utility; it now only accepts a list of lists of strings.
- Phrases are no longer split.
- Fixed CountVectorizer in BERTopic.
- eps becomes a float and cannot be negative.
- Removed the redundant eps check.
- Renamed the unclear variable topic to topic_docs.
- Replaced a lambda with an identity def.
- Fixed a typo.

Co-authored-by: zafercavdar <[email protected]>