You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
ToCount is a lightweight and extensible Python library for estimating token counts from text inputs using both rule-based and machine learning methods. Designed for flexibility, speed, and accuracy, ToCount provides a unified interface for different estimation strategies, making it ideal for tasks like prompt analysis, token budgeting, and optimizing interactions with token-based systems.
ℹ️ The training and testing dataset is taken from Lmsys-chat-1m [1] and Wildchat [2].
Usage
>>> from tocount import estimate_text_tokens, TextEstimator
>>> estimate_text_tokens("How are you?", estimator=TextEstimator.RULE_BASED.UNIVERSAL)
4
Issues & bug reports
Just fill an issue and describe it. We'll check it ASAP! or send an email to [email protected].
Please complete the issue template
You can also join our discord server
References
1- Zheng, Lianmin, et al. "Lmsys-chat-1m: A large-scale real-world llm conversation dataset." International Conference on Learning Representations (ICLR) 2024 Spotlights.
2- Zhao, Wenting, et al. "Wildchat: 1m chatgpt interaction logs in the wild." International Conference on Learning Representations (ICLR) 2024 Spotlights.
Show your support
Star this repo
Give a ⭐️ if this project helped you!
Donate to our project
If you do like our project and we hope that you do, can you please support us? Our project is not and is never going to be working for profit. We need the money just so we can continue doing what we do ;-) .