Tags: kairosfuture/BERTopic

HDBSCAN parameter search (#14)
* Changed eom to leaf in HDBSCAN.
* Added log for leaf method.
* Implemented grid search to get a better HDBSCAN model.
* Improved itertools product usage.
* Improved nr_topics info usage.
* Minor fixes
* Fixed versions
Co-authored-by: Zafer Çavdar <[email protected]>
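
The grid search mentioned in this entry can be illustrated with a minimal sketch built on `itertools.product`. This is not the repository's implementation: `grid_search`, `toy_score`, and the grid values are hypothetical, and the scorer is a stand-in for whatever criterion the real code uses to rank fitted HDBSCAN models. The parameter names are assumed to match the `hdbscan.HDBSCAN` constructor arguments.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) over the Cartesian product of the grid.

    Hypothetical helper: score_fn receives one parameter dict and returns a
    number where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative grid only; the values are assumptions, not taken from the commit.
param_grid = {
    "min_cluster_size": [5, 10, 20],
    "min_samples": [1, 5],
    "cluster_selection_method": ["leaf"],
}


def toy_score(params):
    # Stand-in scorer. A real scorer would fit a model with these parameters
    # and return a clustering quality measure.
    return -abs(params["min_cluster_size"] - 10) - params["min_samples"]


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

Passing the scorer in as a function keeps the search loop independent of the clustering library, so the same loop could be reused with a different model.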

Embeddings based topic reduction + get rid of ST (#13)
* Moved reduce_topics method.
* Introduced reduce_with_gmm and reduce_with_hdbscan methods.
* Fixed mapped_topics problem.
* Fixed bug of reduce_gmm's mapping.
* Round probabilities.
* Fixed round probabilities.
* Fixed another problem of round probabilities.
* Fixed calculate_probabilities flag.
* Fixed hdbscan_reduce's mapping change.
* Fixed deepcopy method's effect on gmm's mapping.
* Fixed calculate_probabilities.
* Deleting prev. topic mapping parameter.
* Fixed deleting prev. topic mapping parameter.
* Fixed numpy where usage.
* Fixed numpy indexing error.
* Fixed some unnecessary controls, shortened some parts and removed manually added round() operations, following @zafercavdar's suggestions.
* Removed another unnecessary control statement.
* Removed sentence transformers dependency
* Added setup cfg
* Fixed init py
Co-authored-by: Zafer Çavdar <[email protected]>

[DA-2380] Keyword selection with new c-TF-IDF, custom embedding model and MMR (#7)
* c-TF-IDF update
* custom embedding model parameter, mmr update
* updated mmr_keywords() method for one topic usage
* updated embedding model as mmr_keywords() method parameter instead of bertopic class parameter
* Fixed calculate probabilities condition
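
For readers unfamiliar with MMR (maximal marginal relevance), the following is a minimal, dependency-free sketch of the general technique for picking keywords that are relevant to a topic but not redundant with each other. It is not the repository's `mmr_keywords()` method: `mmr_select`, its parameters, and the toy 2-D vectors are hypothetical and exist only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(topic_vec, word_vecs, words, top_n=2, diversity=0.5):
    """Hypothetical helper: pick top_n words by maximal marginal relevance.

    diversity=0 ranks purely by similarity to the topic; higher values
    penalize words that resemble the ones already selected.
    """
    if top_n <= 0 or not words:
        return []
    relevance = [cosine(vec, topic_vec) for vec in word_vecs]
    # Start with the single most relevant word.
    selected = [max(range(len(words)), key=relevance.__getitem__)]
    candidates = [i for i in range(len(words)) if i != selected[0]]
    while candidates and len(selected) < top_n:
        def mmr_score(i):
            redundancy = max(cosine(word_vecs[i], word_vecs[j]) for j in selected)
            return (1 - diversity) * relevance[i] - diversity * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [words[i] for i in selected]


# Toy embeddings, invented for this example.
words = ["cluster", "clusters", "topic"]
word_vecs = [[1.0, 0.1], [1.0, 0.12], [0.6, 0.8]]
topic_vec = [1.0, 0.0]

diverse = mmr_select(topic_vec, word_vecs, words, top_n=2, diversity=0.7)
greedy = mmr_select(topic_vec, word_vecs, words, top_n=2, diversity=0.0)
print(diverse, greedy)
```

With a high `diversity` the near-duplicate "clusters" is skipped in favor of "topic"; with `diversity=0.0` the selection falls back to plain relevance ranking.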

New clustering method GM & block outlier merge (#5)
* clustering_method parameter introduced, GM method implemented
* topic index mismatch fixes, it was happening when clustering method does not produce outlier class
* -1 merge blocked, outlier class cannot merge and cannot be merged into
* topic number of reduce method is fixed for GMM clustering method
* raise fix as @zafercavdar suggests
  Co-authored-by: Zafer Çavdar <[email protected]>
* yet another raise fix @zafercavdar suggests
  Co-authored-by: Zafer Çavdar <[email protected]>
* gm -> gmm name update
* reduce_topic topic number fix, topic number cannot be reduced to 0 or 1 anymore
* nr_topics min value fix as @zafercavdar suggests
  Co-authored-by: Zafer Çavdar <[email protected]>
* Fixed HDBSCAN probs bug when it finds only -1 class
* removed todo about b81f912
Co-authored-by: Zafer Çavdar <[email protected]>

reduce_topic topic number fix, topic number cannot be reduced to 0 or 1 anymore

[DA-2218] Don't split phrases (#3)
* Check documents type utility modified. Now it only accepts list of list of strings.
* Phrases are no longer split.
* CountVectorizer is fixed in BERTopic
* eps becomes float and cannot be negative.
* redundant eps check is gone
* Unclear variable name topic becomes topic_docs
* Replaced lambda with identity def
* Fixed typo
Co-authored-by: zafercavdar <[email protected]>