A lightweight, multi-modal Generative AI chatbot that runs concurrently on Telegram and Discord, with an optional Gradio local debugging UI.
This system uses a decoupled architecture, separating platform integrations from core AI orchestration logic for easier scaling, maintenance, and extension.
- Retrieval-Augmented Generation (RAG) — Query internal company documents using local embeddings
- Vision Inference — Describe and tag uploaded images using local vision models
- Context Memory — Maintain short conversational continuity
- SQLite Caching — Store query hashes for faster repeated responses
- Source Snippets — Show which document was used to generate a response
- Conversation Summarization — `/summarize` condenses recent chat or image interactions
- Platform-Aware Session Management — User history remains isolated between Telegram and Discord
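The SQLite caching feature described above can be sketched as a small store keyed by a normalized query hash. This is an illustrative sketch, not the project's actual `cache_manager.py`; the normalization scheme and schema are assumptions:

```python
import hashlib
import sqlite3

def make_query_hash(query: str) -> str:
    # Case- and whitespace-insensitive, so "Hello  World" and "hello world"
    # share one cache entry (the normalization scheme is an assumption).
    normalized = " ".join(query.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class CacheManager:
    """Minimal SQLite response cache: get() returns None on a miss."""

    def __init__(self, db_path: str = "cache.db"):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (query_hash TEXT PRIMARY KEY, answer TEXT)"
        )

    def get(self, query_hash: str):
        row = self._conn.execute(
            "SELECT answer FROM cache WHERE query_hash = ?", (query_hash,)
        ).fetchone()
        return row[0] if row else None

    def set(self, query_hash: str, answer: str) -> None:
        self._conn.execute(
            "INSERT OR REPLACE INTO cache (query_hash, answer) VALUES (?, ?)",
            (query_hash, answer),
        )
        self._conn.commit()
```

Hashing the normalized query (rather than the raw string) means trivially different phrasings of the same question still hit the cache.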
```mermaid
graph TD
    Client_TG[Telegram Client] --> Bot_TG[telegram_app.py]
    Client_DC[Discord Client] --> Bot_DC[discord_app.py]
    Client_UI[Gradio UI] --> Bot_UI[local_ui.py]
    Bot_TG --> Handler[core/message_handler.py]
    Bot_DC --> Handler
    Bot_UI --> Handler
    Handler --> State[State Manager<br>In-Memory Deque]
    Handler --> Cache[Cache Manager<br>SQLite]
    Handler --> RAG[RAG Engine]
    Handler --> Vision[Vision Engine]
    RAG --> Embed[sentence-transformers<br>all-MiniLM-L6-v2]
    RAG --> DB[(SQLite Vector DB)]
    RAG --> LLM[Ollama API<br>llama3]
    Vision --> VLM[Ollama API<br>llava]
```
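The in-memory deque behind the State Manager can be sketched as follows. Keying history on the `(platform, user_id)` pair is one way to get the platform-isolated sessions described above; the real `state_manager.py` may differ:

```python
from collections import defaultdict, deque

class StateManager:
    """Keeps a short rolling history per (platform, user) pair.

    Keying on the platform as well as the user ID is what keeps Telegram
    and Discord sessions isolated. maxlen=3 matches the "last 3
    interactions" context window.
    """

    def __init__(self, max_turns: int = 3):
        # deque(maxlen=...) silently drops the oldest entry when full
        self._history = defaultdict(lambda: deque(maxlen=max_turns))

    def add_to_history(self, platform: str, user_id: str, entry: str) -> None:
        self._history[(platform, user_id)].append(entry)

    def get_history(self, platform: str, user_id: str) -> list:
        return list(self._history[(platform, user_id)])
```

Because the deque lives in process memory, history resets on restart; that is a deliberate trade-off for short conversational continuity without another database table.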
```mermaid
sequenceDiagram
    autonumber
    actor User
    participant Platform as Discord/Telegram App
    participant Brain as Core Message Handler
    participant Cache as Cache Manager
    participant State as State Manager
    participant RAG as RAG Engine
    participant Vision as Vision Engine
    participant DB as SQLite DB
    participant Ollama as Local LLM (Ollama)

    User->>Platform: Sends /ask or Uploads Image

    alt Text Query (/ask)
        Platform->>Brain: handle_ask_command(user_id, query)
        Brain->>Cache: get(query_hash)
        alt Cache Hit
            Cache-->>Brain: Cached response
        else Cache Miss
            Cache-->>Brain: None
            Brain->>State: get_history(user_id)
            State-->>Brain: Last 3 interactions
            Brain->>RAG: retrieve(query)
            RAG->>DB: Vector similarity search
            DB-->>RAG: Top-K context chunks
            Brain->>RAG: generate_answer(query, context, history)
            RAG->>Ollama: POST /api/generate
            Ollama-->>RAG: Generated text
            RAG-->>Brain: Final answer
            Brain->>Cache: set(query_hash, answer)
        end
        Brain->>State: add_to_history(query + answer)
        Brain-->>Platform: Final response
    else Image Upload
        Platform->>Platform: Download image to temp folder
        Platform->>Brain: handle_image_upload(user_id, image_path)
        Brain->>State: add_to_history("[Uploaded Image]")
        Brain->>Vision: describe_image(image_path)
        Vision->>Ollama: POST /api/generate (llava)
        Ollama-->>Vision: Caption + tags
        Vision-->>Brain: Vision output
        Brain->>State: add_to_history(vision output)
        Brain-->>Platform: Final response
        Platform->>Platform: Delete temp file
    end

    Platform-->>User: Deliver final message
```
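The `POST /api/generate` steps in the diagram correspond to Ollama's local HTTP API, which listens on port 11434 by default. A minimal stdlib sketch of that call, using `stream=False` so Ollama returns a single JSON object rather than a chunk stream (the project's engines may wrap this differently):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(model: str, prompt: str) -> dict:
    # stream=False asks Ollama for one complete JSON response
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a local Ollama model and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The same endpoint serves both text (`model="llama3"`) and vision (`model="llava"`, plus base64 image data) requests, which is why both engines in the diagram point at Ollama.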
- `llama3` or `phi3` via Ollama — Used for conversational generation, summarization, and RAG answering
- `llava` via Ollama — Used for image captioning and semantic tagging
- `all-MiniLM-L6-v2` via `sentence-transformers` — Used for local semantic retrieval
- SQLite for:
  - vector storage
  - response caching
  - metadata
- `python-telegram-bot`, `discord.py`, and `gradio` for the Telegram, Discord, and local debugging front ends
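Since SQLite has no native vector type, one common approach (assumed here, not confirmed from `rag_engine.py`) is to store each embedding as a packed-float BLOB and score candidates with cosine similarity in Python:

```python
import math
import sqlite3
import struct

def pack(vec):
    """Serialize a float vector into a BLOB column (little-endian float32)."""
    return struct.pack(f"<{len(vec)}f", *vec)

def unpack(blob):
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(conn, query_vec, k=3):
    """Brute-force scan: fine for a few thousand chunks, no index needed."""
    rows = conn.execute("SELECT chunk, embedding FROM chunks").fetchall()
    scored = sorted(
        ((cosine(query_vec, unpack(blob)), chunk) for chunk, blob in rows),
        reverse=True,
    )
    return [chunk for _, chunk in scored[:k]]
```

In the real pipeline the query vector would come from `all-MiniLM-L6-v2` (384 dimensions); a linear scan like this keeps the stack dependency-free at the cost of not scaling to large corpora.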
```
project_root/
├── bots/
│   ├── telegram_app.py    (Handles Telegram API & routing)
│   └── discord_app.py     (Handles Discord API & routing)
├── core/
│   ├── message_handler.py (The central brain routing to RAG/Vision)
│   ├── rag_engine.py      (SQLite, Embeddings, Ollama Text)
│   ├── vision_engine.py   (Image downloading, Ollama Llava)
│   ├── cache_manager.py   (SQLite caching)
│   ├── state_manager.py   (User history tracking)
│   ├── cache.db           (Response cache database)
│   └── rag.db             (Vector retrieval database)
├── data/                  (Your markdown/text files)
├── main.py                (Starts both bots concurrently)
├── local_ui.py            (Gradio UI for debugging)
├── requirements.txt
└── testing.md
```
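`main.py` starts both bots concurrently. One way to do that (a sketch; the project may instead use threads) is a single asyncio event loop running both platform clients side by side:

```python
import asyncio

async def run_telegram():
    # Stand-in for telegram_app's long-polling loop; a real bot would
    # await python-telegram-bot's updater here.
    return "telegram"

async def run_discord():
    # Stand-in for discord.py's client.start(TOKEN)
    return "discord"

async def main():
    # gather() runs both loops concurrently on one event loop; if either
    # raises, the exception propagates instead of one bot dying silently.
    return await asyncio.gather(run_telegram(), run_discord())

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both `python-telegram-bot` and `discord.py` are asyncio-based, which is what makes sharing one event loop like this practical.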
Make sure you have:
- Python 3.10+
- Ollama installed locally
Pull required models:

```bash
ollama pull llama3
ollama pull llava
```

Using `uv`:

```bash
uv venv
source .venv/bin/activate
uv pip install -r requirements.txt
```

Windows:

```bash
.venv\Scripts\activate
```

Using `venv`:

```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Windows:

```bash
venv\Scripts\activate
```

Create a `.env` file in the project root:
```
TELEGRAM_TOKEN=your_telegram_bot_token_here
DISCORD_TOKEN=your_discord_bot_token_here
```

Never commit `.env` to GitHub.
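Once the `.env` values have been loaded into the environment (typically via `python-dotenv` or your shell), the apps can read the tokens with a fail-fast check. The helper below is hypothetical; only the variable names come from the `.env` template above:

```python
import os

def load_tokens() -> dict:
    """Read both bot tokens, raising early if either is missing."""
    tokens = {}
    for name in ("TELEGRAM_TOKEN", "DISCORD_TOKEN"):
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(f"{name} is not set; check your .env file")
        tokens[name] = value
    return tokens
```

Failing at startup with a named variable beats the cryptic authentication error you would otherwise get from the platform client.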
- Open Telegram and search for @BotFather
- Start a chat and run:
/newbot
- Follow the setup steps to create your bot
- Copy the generated token
- Paste it into your `.env` file
- Open the Discord Developer Portal
- Create a New Application
- Open the Bot tab
- Generate your bot token
- Add the token to your `.env` file
Enable the following setting inside Bot → Privileged Gateway Intents:
Message Content Intent
Without this enabled, the bot cannot read /ask commands.
Open OAuth2 → URL Generator
Under Scopes, select:

- `bot`
- `applications.commands`

Then under Bot Permissions, enable:

- View Channels
- Send Messages
- Read Message History
- Attach Files
- Embed Links
Discord will automatically generate an invite URL.
Open that URL in your browser and add the bot to your server.
Run both bots:

```bash
python main.py
```

or, with uv:

```bash
uv run python main.py
```

Run the Gradio debugging UI:

```bash
python local_ui.py
```

or:

```bash
uv run python local_ui.py
```

Access locally at:
http://127.0.0.1:7860
| Command | Platform | Description |
|---|---|---|
| `/ask <query>` | Telegram + Discord | Queries internal documents using RAG and keeps last 3 interactions |
| `/summarize` | Telegram + Discord | Summarizes recent conversation history |
| Send Image | Telegram | Generates caption + 3 semantic tags |
| `/image` | Discord | Attach image for caption + tags |
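`/summarize` condenses the rolling history into a single prompt for the text model. A hypothetical sketch of that prompt assembly (the actual wording used by `message_handler.py` is unknown):

```python
def build_summary_prompt(history: list[str]) -> str:
    """Join recent interactions into one summarization request for llama3."""
    transcript = "\n".join(f"- {turn}" for turn in history)
    return (
        "Summarize the following recent conversation in 2-3 sentences, "
        "keeping any image descriptions:\n" + transcript
    )
```

Because image uploads are written into the same history (as `"[Uploaded Image]"` plus the vision output), the summary naturally covers both chat and image interactions.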
A complete testing checklist is available in `testing.md`. This includes:
- Local Gradio testing
- Telegram bot validation
- Discord bot validation
- Cache verification
- Memory verification
- Vision testing


