Build your own witty, tool-using, Indic-language-capable AI agent inspired by Grok — using only Python, LangChain, and llama.cpp.
Features:
- ReAct-style agent with tools (web search + Python REPL)
- Runs efficiently on CPU (or GPU) via llama.cpp GGUF models
- Fine-tuning support for Indic languages (Hindi, Marathi, etc.)
- Deployable on AWS EC2 Mumbai region (low-latency for Indian users)
- Total core logic < 400 lines
Perfect for Mumbai/Bengaluru devs who want local-first AI without spending ₹50k/month on cloud GPUs.
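The ReAct tool loop at the heart of the agent can be sketched in plain Python. The stub model and toy tools below are illustrative only — the repo wires a llama.cpp-backed model and real web-search/REPL tools into this same Thought → Action → Observation pattern:

```python
import re

# Toy tools standing in for the repo's web-search and Python-REPL tools.
def web_search(query: str) -> str:
    return f"Top result for '{query}' (stubbed)"

def python_repl(code: str) -> str:
    return repr(eval(code))  # the real agent sandboxes this, of course

TOOLS = {"web_search": web_search, "python_repl": python_repl}

def stub_llm(prompt: str) -> str:
    # Stand-in for the llama.cpp-backed model: asks for one tool call,
    # then answers. A real model generates these lines token by token.
    if "Observation:" in prompt:
        return "Final Answer: 4"
    return "Thought: I should compute this.\nAction: python_repl[2 + 2]"

def react_agent(question: str, llm=stub_llm, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(prompt)
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", reply)
        if not match:
            break
        tool, arg = match.group(1), match.group(2)
        # Feed the tool result back so the model can reason over it.
        prompt += f"{reply}\nObservation: {TOOLS[tool](arg)}\n"
    return "Agent gave up."

print(react_agent("What is 2 + 2?"))  # -> 4
```

Swapping `stub_llm` for a real completion function (e.g. one backed by a GGUF model) is all that changes in the full agent; the loop itself stays this small.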
Quick Start

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download Llama 3.1 8B GGUF (Q5_K_M recommended):

  ```bash
  huggingface-cli download TheBloke/Llama-3.1-8B-GGUF llama-3.1-8b.Q5_K_M.gguf --local-dir ./models
  ```

- Run a simple chat test:

  ```bash
  python inference_test.py
  ```

- Run the full FastAPI server:

  ```bash
  uvicorn app:app --reload --port 8000
  ```

Then visit http://localhost:8000/docs and try the /chat endpoint.
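From Python, the /chat endpoint can be called with nothing beyond the standard library. The JSON field names below (`message`, `session_id`, `reply`) are assumptions for illustration — check the generated schema at /docs for the actual shape:

```python
import json
import urllib.request

# Hypothetical request body -- field names are assumptions; confirm at /docs.
payload = {"message": "Namaste! What's the weather in Mumbai?", "session_id": "demo-1"}

req = urllib.request.Request(
    "http://localhost:8000/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# To actually send it (server must be running):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["reply"])
```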
Indic Fine-Tuning (Optional)

```bash
python fine_tune_indic.py
```

Requires a GPU, or patience (takes ~4–12 hours on a t4g.medium EC2 instance).

Deploy on AWS Mumbai (ap-south-1)

```bash
bash deploy_ec2.sh
```

See deploy_ec2.sh for full instructions.
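To sanity-check a fine-tuning run time before launching one, it helps to count optimizer steps. The helper and all numbers below are illustrative assumptions (dataset size, batch settings, epochs), not the repo's actual configuration:

```python
def training_steps(dataset_size: int, per_device_batch: int,
                   grad_accum: int, epochs: int) -> int:
    """Optimizer steps for a full fine-tune run (illustrative helper,
    not part of the repo)."""
    effective_batch = per_device_batch * grad_accum
    return (dataset_size // effective_batch) * epochs

# E.g. a hypothetical 50k-example Hindi/Marathi instruction set:
steps = training_steps(50_000, per_device_batch=4, grad_accum=8, epochs=3)
print(steps)  # -> 4686
```

At a few seconds per step on CPU-class hardware, a run of this size lands in the same hours-long ballpark as the estimate above; fewer epochs or a smaller dataset shortens it proportionally.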