This is a Telegram bot that transcribes voice messages using either whisper.cpp or qwen-asr. You send a voice message, it replies with the text. It's built on top of botlib and follows the same philosophy: simple C code, one thread per request, minimal dependencies. We suggest using the Qwen3_ASR model and the linked Pure C inference implementation. Inference is much faster, and the quality very good.
The bot handles downloading audio, converting it to WAV, running the selected backend as an external process, and streaming the result back to Telegram as it's transcribed.
The bot uses botlib's thread-per-request model, but with a twist: ASR is CPU-heavy, so running many transcriptions in parallel usually hurts latency/throughput on small servers. Threads wait their turn using a mutex. The queue length is tracked with a C11 atomic, and if too many requests pile up, the bot tells users to retry later.
There's also a small optimization: when the queue is short, it uses the better model (MODEL_BEST). When the queue gets longer, it switches to the faster model (MODEL_FAST) to clear backlog faster.
The transcription is streamed back to Telegram by editing the message as new text arrives. If the transcription is very long, it automatically continues in a new message (never tested in practice, so far...).
- libcurl and libsqlite3 (for botlib)
- ffmpeg (for audio conversion)
- one ASR backend binary:
whisper.cpp(whisper-cli), orqwen_asrfrom antirez
-
Build both ASR backends (or at least the one you want to use), and download their models.
-
Edit
whisperbot.cand select exactly one backend:
// #define USE_WHISPER
#define USE_QWEN3_ASRThen set backend-specific paths in the matching block:
#define ASR_BIN_PATH "..."
#define MODEL_FAST "..."
#define MODEL_BEST "..."-
Create your bot with @BotFather and save the API key in
apikey.txt. -
Build and run:
make
./whisperbot
Use --verbose to see what's happening, --debug for even more output.
Everything is at the top of whisperbot.c:
#define MAX_QUEUE 10 // Max pending requests before rejecting
#define MAX_SECONDS 300 // Max audio duration (5 minutes)
#define MSG_LIMIT 4000 // Telegram message length limit
#define TIMEOUT 600 // Kill ASR process after 10 minutes
#define QUEUE_THRESHOLD_FAST 3 // Use MODEL_FAST when queue >= thisWhisper-only knobs in the USE_WHISPER block:
#define SHORT_AUDIO_THRESHOLD 1.5
#define DEFAULT_LANG "it"SHORT_AUDIO_THRESHOLD and DEFAULT_LANG are used only by Whisper.
- Backend and model paths are hardcoded (edit and recompile).
- No persistence: if you restart the bot, queued requests are lost.