A brief explanation of the agentic training system we built.
(Find the notebook here)
Instead of manually adjusting training parameters, we let a small language model (Qwen3-0.6B) make those decisions for us.
Traditional approach:
```python
# Human watches metrics and manually adjusts
for epoch in range(100):
    train_model()
    if loss_not_improving:
        learning_rate *= 0.5  # Human decision
```

Our agentic approach:
```python
# AI watches metrics and automatically adjusts
for epoch in range(100):
    train_model()
    decision = ai_brain.analyze_training(metrics)  # AI decision
    apply_decision(decision)
```

What it does:
- Loads Qwen3-0.6B language model
- Takes training metrics as input
- Returns decisions about what to do next
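For reference, loading the model through Hugging Face transformers looks roughly like this. A minimal sketch, assuming the `Qwen/Qwen3-0.6B` checkpoint; the notebook's exact loading code may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint name; the notebook may pin a different one.
MODEL_NAME = "Qwen/Qwen3-0.6B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype="auto")
```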
Core method:
```python
def think_about_training(self, train_acc, test_acc, epoch):
    question = f"""Current status:
- Training accuracy: {train_acc:.1f}%
- Test accuracy: {test_acc:.1f}%
What should I do? Choose: continue, reduce_learning_rate,
increase_learning_rate, or stop_training"""
    ai_response = self.model.generate(question)
    return parse_decision(ai_response)
```

Smart fallback: If the LLM fails, it uses simple rules:
- Big train/test gap → reduce learning rate
- Slow learning → increase learning rate
- Great performance → stop early
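A minimal sketch of what such a fallback could look like. The thresholds below are illustrative, not the notebook's exact values:

```python
def fallback_decision(train_acc, test_acc):
    """Rule-based backup used when the LLM response can't be parsed."""
    gap = train_acc - test_acc
    if test_acc >= 95.0:   # great performance -> stop early
        return "stop_training", "test accuracy is high enough"
    if gap > 15.0:         # big train/test gap -> likely overfitting
        return "reduce_learning_rate", f"gap of {gap:.1f}% suggests overfitting"
    if train_acc < 50.0:   # slow learning -> speed things up
        return "increase_learning_rate", "accuracy is climbing too slowly"
    return "continue", "metrics look healthy"
```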
Every 5 epochs, the AI analyzes and decides:
```python
for epoch in range(max_epochs):
    # Regular training
    train_loss, train_acc = train_one_epoch()
    test_loss, test_acc = test_model()

    # AI decision every 5 epochs
    if epoch % 5 == 0:
        action, reason = ai_brain.think_about_training(train_acc, test_acc, epoch)
        if action == "reduce_learning_rate":
            current_lr *= 0.7
            update_optimizer(current_lr)
        elif action == "increase_learning_rate":
            current_lr *= 1.3
            update_optimizer(current_lr)
        elif action == "stop_training":
            break
```

Four design choices keep the decisions reliable:
1. Simple Decision Space
- Only 4 possible actions: continue, reduce LR, increase LR, stop
- Easy for a small LLM to handle reliably
2. Clear Context
- Gives the AI just the essential metrics: train accuracy, test accuracy, epoch
- No overwhelming data dumps
3. Pattern Matching
- Extracts decisions from the AI's response using keyword detection (see the sketch after this list)
- Works even if the response isn't perfectly formatted
4. Robust Fallbacks
- Rule-based backup when LLM fails
- Never breaks, always makes some decision
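Keyword detection can be as simple as scanning the generated text for the first action name it contains. A rough sketch, assuming the four-action vocabulary above:

```python
ACTIONS = ["reduce_learning_rate", "increase_learning_rate",
           "stop_training", "continue"]  # "continue" last: it's the safe default

def parse_decision(ai_response: str) -> str:
    """Return the first known action mentioned in the response."""
    text = ai_response.lower()
    for action in ACTIONS:
        # Match both "reduce_learning_rate" and "reduce learning rate"
        if action in text or action.replace("_", " ") in text:
            return action
    return "continue"  # nothing recognized -> keep training (or call the rule-based fallback)
```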
Lightweight Intelligence:
- Uses a 600M-parameter model (fits in Colab)
- Fast inference (milliseconds per decision)
- No need for massive compute
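Under the hood, the `self.model.generate(question)` call wraps the standard transformers generation API. A rough sketch, assuming the tokenizer/model pair loaded earlier; not the notebook's exact code:

```python
def generate(self, prompt, max_new_tokens=64):
    """One short completion is enough for a single-phrase decision."""
    inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
    output_ids = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return self.tokenizer.decode(new_tokens, skip_special_tokens=True)
```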
Real-time Adaptation:
- Adjusts learning rate mid-training
- Responds to overfitting immediately
- Can stop early when optimal
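Changing the learning rate mid-training is straightforward in PyTorch; `update_optimizer` can be as small as the sketch below (shown taking the optimizer explicitly, though the notebook may capture it from the enclosing scope):

```python
def update_optimizer(optimizer, new_lr):
    """Apply a new learning rate to every parameter group in-place."""
    for param_group in optimizer.param_groups:
        param_group["lr"] = new_lr
```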
Educational Design:
- Every decision is explained
- Shows AI reasoning process
- Visualizes all changes over time
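One simple way to get that visibility is to log every decision as it happens and plot the resulting learning-rate schedule afterwards. An illustrative sketch using matplotlib (names here are hypothetical, not the notebook's):

```python
import matplotlib.pyplot as plt

decision_log = []  # filled during training: (epoch, action, reason, lr)

def record_decision(epoch, action, reason, lr):
    print(f"Epoch {epoch}: {action} ({reason})")
    decision_log.append((epoch, action, reason, lr))

def plot_learning_rate(log):
    epochs = [entry[0] for entry in log]
    lrs = [entry[3] for entry in log]
    plt.step(epochs, lrs, where="post")  # LR holds steady between decisions
    plt.xlabel("Epoch")
    plt.ylabel("Learning rate")
    plt.title("AI-driven learning-rate schedule")
    plt.show()
```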
A sample of the AI's decisions during one run:

```
Epoch 15:
  Train: 92%, Test: 76% → Gap = 16%
  AI thinks: "Big accuracy gap suggests overfitting. Should reduce learning rate."
  Action: reduce_learning_rate
  LR: 0.01 → 0.007

Epoch 20:
  Train: 89%, Test: 85% → Gap = 4%
  AI thinks: "Gap reduced, good generalization now."
  Action: continue
```
Compared to manual tuning:
- ⚡ Faster - No human bottleneck
- 🎯 Better results - the AI can spot patterns humans miss
- 🔄 Consistent - Same logic every time
- 😌 Effortless - Set and forget
Compared to traditional AutoML:
- 🧠 Interpretable - See AI reasoning
- 🛠️ Customizable - Easy to modify decisions
- 💰 Cheap - Small model, low compute
- 📚 Educational - Learn how it works
200 lines of code that:
- Load a small language model
- Feed it training metrics every few epochs
- Let it decide what to do next
- Apply its decisions automatically
- Show you everything that happened
Result: Better training with zero human intervention.
The magic isn't in complexity - it's in letting the AI handle the boring optimization work while humans focus on the interesting problems.