Skip to content

unverciftci/AI_Trainer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

How Our AI Trainer Code Works

A brief explanation of the agentic training system we built.

(Find the notebook here)


🎯 The Core Concept

Instead of manually adjusting training parameters, we let a small language model (Qwen3-0.6B) make those decisions for us.

Traditional approach:

# Human watches metrics and manually adjusts
for epoch in range(100):
    train_model()
    if loss_not_improving:
        learning_rate *= 0.5  # Human decision

Our agentic approach:

# AI watches metrics and automatically adjusts
for epoch in range(100):
    train_model()
    decision = ai_brain.analyze_training(metrics)  # AI decision
    apply_decision(decision)

🧠 The AI Brain (SimpleAIBrain class)

What it does:

  • Loads Qwen3-0.6B language model
  • Takes training metrics as input
  • Returns decisions about what to do next

Core method:

def think_about_training(self, train_acc, test_acc, epoch):
    question = f"""Current status:
    - Training accuracy: {train_acc:.1f}%
    - Test accuracy: {test_acc:.1f}%
    
    What should I do? Choose: continue, reduce_learning_rate, 
    increase_learning_rate, or stop_training"""
    
    ai_response = self.model.generate(question)
    return parse_decision(ai_response)

Smart fallback: If the LLM fails, it uses simple rules:

  • Big train/test gap → reduce learning rate
  • Slow learning → increase learning rate
  • Great performance → stop early

🔄 The Training Loop

Every 5 epochs, the AI analyzes and decides:

for epoch in range(max_epochs):
    # Regular training
    train_loss, train_acc = train_one_epoch()
    test_loss, test_acc = test_model()
    
    # AI decision every 5 epochs
    if epoch % 5 == 0:
        action, reason = ai_brain.think_about_training(train_acc, test_acc, epoch)
        
        if action == "reduce_learning_rate":
            current_lr *= 0.7
            update_optimizer(current_lr)
        elif action == "increase_learning_rate":
            current_lr *= 1.3
            update_optimizer(current_lr)
        elif action == "stop_training":
            break

🎮 What Makes It Work

1. Simple Decision Space

  • Only 4 possible actions: continue, reduce LR, increase LR, stop
  • Easy for small LLM to handle reliably

2. Clear Context

  • Gives AI just the essential metrics: train accuracy, test accuracy, epoch
  • No overwhelming data dumps

3. Pattern Matching

  • Extracts decisions from AI response using keyword detection
  • Works even if AI response isn't perfectly formatted

4. Robust Fallbacks

  • Rule-based backup when LLM fails
  • Never breaks, always makes some decision

💡 Key Innovations

Lightweight Intelligence:

  • Uses 600M parameter model (fits in Colab)
  • Fast inference (milliseconds per decision)
  • No need for massive compute

Real-time Adaptation:

  • Adjusts learning rate mid-training
  • Responds to overfitting immediately
  • Can stop early when optimal

Educational Design:

  • Every decision is explained
  • Shows AI reasoning process
  • Visualizes all changes over time

📊 Example Decision Flow

Epoch 15:
Train: 92%, Test: 76% → Gap = 16%

AI thinks: "Big accuracy gap suggests overfitting. 
           Should reduce learning rate."

Action: reduce_learning_rate
LR: 0.01 → 0.007

Epoch 20:  
Train: 89%, Test: 85% → Gap = 4%

AI thinks: "Gap reduced, good generalization now."

Action: continue

🚀 Why It's Better

Compared to manual tuning:

  • Faster - No human bottleneck
  • 🎯 Better results - AI spots patterns humans miss
  • 🔄 Consistent - Same logic every time
  • 😌 Effortless - Set and forget

Compared to traditional AutoML:

  • 🧠 Interpretable - See AI reasoning
  • 🛠️ Customizable - Easy to modify decisions
  • 💰 Cheap - Small model, low compute
  • 📚 Educational - Learn how it works

🎯 The Bottom Line

200 lines of code that:

  1. Load a small language model
  2. Feed it training metrics every few epochs
  3. Let it decide what to do next
  4. Apply its decisions automatically
  5. Show you everything that happened

Result: Better training with zero human intervention.

The magic isn't in complexity - it's in letting AI handle the boring optimization work while humans focus on the interesting problems.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors