LocoOperator-4B is a 4B-parameter code exploration agent distilled from Qwen3-Coder-Next. Designed as a local subagent for Claude Code-style agent loops: fast codebase navigation at zero API cost.
- News & Updates
- Introduction
- Key Features
- Architecture
- Performance
- Quick Start
- Analysis Pipeline
- Project Structure
- Known Limitations
- License
- Acknowledgments
- [2026-02-24] LocoOperator-4B-GGUF released for local deployment.
- [2026-02-23] LocoOperator-4B model card and evaluation analysis released.
LocoOperator-4B is a tool-calling agent model trained via knowledge distillation from Qwen3-Coder-Next inference traces. It specializes in multi-turn codebase exploration: reading files, searching code, and navigating project structures within a Claude Code-style agent loop.
| LocoOperator-4B | |
|---|---|
| Base Model | Qwen3-4B-Instruct-2507 |
| Teacher Model | Qwen3-Coder-Next |
| Training Method | Full-parameter SFT (distillation) |
| Training Data | 170,356 multi-turn conversation samples |
| Max Sequence Length | 16,384 tokens |
| Training Hardware | 4x NVIDIA H200 141GB SXM5 |
| Training Time | ~25 hours |
| Framework | MS-SWIFT |
- Tool-Calling Agent: generates structured `<tool_call>` JSON for Read, Grep, Glob, Bash, Write, Edit, and Task (subagent delegation)
- 100% JSON Validity: every tool call is valid JSON with all required arguments, outperforming the teacher model (87.6%)
- Local Deployment: GGUF quantized, runs on Mac Studio via llama.cpp at zero API cost
- Lightweight Explorer: 4B parameters, optimized for fast codebase search and navigation
- Multi-Turn: handles conversation depths of 3-33 messages with consistent tool-calling behavior
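As an illustration of the wire format, here is a minimal sketch of extracting and parsing a `<tool_call>` block from model output. The sample text and file path are hypothetical; the tag-wrapped-JSON shape follows the Qwen tool-calling convention of a JSON object with `name` and `arguments` fields:

```python
import json
import re

# Hypothetical model output: prose followed by a Qwen-style <tool_call> block.
raw_output = (
    'Let me check the entry point first.\n'
    '<tool_call>\n'
    '{"name": "Read", "arguments": {"file_path": "src/main.py"}}\n'
    '</tool_call>'
)

def extract_tool_calls(text):
    """Pull every <tool_call>...</tool_call> span and parse it as JSON."""
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

calls = extract_tool_calls(raw_output)
print(calls[0]["name"])        # Read
print(calls[0]["arguments"])   # {'file_path': 'src/main.py'}
```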
LocoOperator-4B operates as a subagent (explorer) within a two-tier agent system:
The main agent handles decision-making and code generation while delegating codebase exploration to LocoOperator-4B, keeping API costs low and latency minimal.
Evaluated on 65 multi-turn conversation samples from diverse open-source projects (scipy, fastapi, arrow, attrs, gevent, gunicorn, etc.), with labels generated by Qwen3-Coder-Next.
| Metric | Score |
|---|---|
| Tool Call Presence Alignment | 100% (65/65) |
| First Tool Type Match | 65.6% (40/61) |
| JSON Validity | 100% (76/76) |
| Argument Syntax Correctness | 100% (76/76) |
The model perfectly learned when to use tools vs. when to respond with text (100% presence alignment). Tool type mismatches occur between semantically similar tools (e.g. Grep vs Read): different, but often equally valid, exploration strategies.
| Tool | LocoOperator-4B | Qwen3-Coder-Next | Delta |
|---|---|---|---|
| Read | 22 | 32 | -10 |
| Bash | 22 | 17 | +5 |
| Grep | 14 | 18 | -4 |
| Glob | 9 | 11 | -2 |
| Task | 7 | 7 | 0 |
| Write | 2 | 3 | -1 |
| Total | 76 | 89 | -13 |
| Model | JSON Valid | Argument Syntax Valid |
|---|---|---|
| LocoOperator-4B | 76/76 (100%) | 76/76 (100%) |
| Qwen3-Coder-Next (teacher) | 89/89 (100%) | 78/89 (87.6%) |
LocoOperator-4B achieves perfect structured output. The teacher model has 11 tool calls with missing required arguments (empty `arguments: {}`).
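The teacher's failure mode can be illustrated with a small checker. This is a sketch, not the actual evaluation harness; the required-argument table below is a hypothetical subset for illustration:

```python
# Hypothetical required-argument table; the evaluation's real schema
# is not published in this README.
REQUIRED_ARGS = {"Read": ["file_path"], "Grep": ["pattern"], "Glob": ["pattern"]}

def argument_syntax_valid(call):
    """A call fails if its arguments dict omits a required key (e.g. {})."""
    required = REQUIRED_ARGS.get(call["name"], [])
    args = call.get("arguments") or {}
    return all(k in args for k in required)

good = {"name": "Read", "arguments": {"file_path": "setup.py"}}
bad = {"name": "Read", "arguments": {}}   # the teacher's failure mode
print(argument_syntax_valid(good))  # True
print(argument_syntax_valid(bad))   # False
```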
- Claude Code: `npm install -g @anthropic-ai/claude-code`
- llama.cpp: build from source or `brew install llama.cpp`
- uv: `curl -LsSf https://astral.sh/uv/install.sh | sh`
- OpenRouter API key: https://openrouter.ai/keys
```bash
# Download the GGUF model
# (replace with actual model path)

# Start the server
./llama-server \
  -m LocoOperator-4B.gguf \
  --ctx-size 51200 \
  --host 0.0.0.0 \
  --port 8080
```

| Parameter | Value | Rationale |
|---|---|---|
| Context size | 50K | Covers multi-turn exploration with room for tool outputs |
| Max turns | 10 | Sufficient for focused codebase exploration tasks |
| Temperature | 0.7 | Balanced between determinism and exploration |
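llama-server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so the parameters above can be exercised directly. A minimal standard-library sketch, assuming the server was launched with the flags shown (the host, port, and prompt are illustrative; llama-server ignores the `model` field but the API schema requires it):

```python
import json
import urllib.request

# Request payload for llama-server's OpenAI-compatible chat endpoint.
payload = {
    "model": "LocoOperator-4B",
    "temperature": 0.7,  # matches the recommended setting above
    "messages": [
        {"role": "user", "content": "List the files in the repository root."}
    ],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once llama-server is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["choices"][0]["message"]["content"])
```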
This repo includes a hybrid analysis pipeline that combines LocoOperator-4B (local) with a cloud LLM via OpenRouter, orchestrated through claude -p.
```
claude -p (sonnet) ──> proxy (9091) ──> OpenRouter Qwen3-Coder-Next
  └── subagent (haiku) ──> proxy (9091) ──> local llama-server (8080)
```
The main agent runs as sonnet (cloud), and when it spawns subagents (Task tool) they default to haiku, which the proxy routes to the local 4B model. If the local model hits context limits or exceeds 10 turns, the proxy automatically falls back to OpenRouter.
The proxy (`scripts/proxy.py`) handles:
- Anthropic Messages API → OpenAI Chat Completions format conversion for the local model
- Parsing `<tool_call>` text output from the local model back into Anthropic `tool_use` blocks
- Automatic fallback to OpenRouter on context overflow
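The second step can be sketched as follows. This is not the actual `scripts/proxy.py` code, just an illustration of the conversion, using the Anthropic `tool_use` content block shape (`type`/`id`/`name`/`input`) with a synthetic id:

```python
import json
import re
import uuid

def to_anthropic_tool_use(model_text):
    """Convert <tool_call> JSON text emitted by the local model
    into Anthropic-style tool_use content blocks."""
    blocks = []
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    for raw in pattern.findall(model_text):
        call = json.loads(raw)
        blocks.append({
            "type": "tool_use",
            "id": f"toolu_{uuid.uuid4().hex[:12]}",  # synthetic id
            "name": call["name"],
            "input": call.get("arguments", {}),
        })
    return blocks

text = '<tool_call>{"name": "Grep", "arguments": {"pattern": "def main"}}</tool_call>'
blocks = to_anthropic_tool_use(text)
print(blocks[0]["name"])  # Grep
```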
```bash
# Clone the repository
git clone https://github.com/LocoreMind/LocoOperator.git
cd LocoOperator

# Install Python dependencies
uv sync

# Configure your OpenRouter API key
cp .env.example .env
# Edit .env and set OPENROUTER_API_KEY
```

`.claude/settings.local.json` is auto-generated on first run from your `.env` key. No need to create it manually.
Place your GGUF model at `models/LocoOperator-4B-GGUF/LocoOperator-4B.gguf`.
Place target projects under `data/repos/`.
```bash
./scripts/test_single.sh tqdm "How does tqdm detect if running in a Jupyter notebook?"
```

```bash
./scripts/analyze.sh tqdm
```

This reads queries from `data/queries/tqdm-queries.txt` and saves results to `data/outputs/tqdm/`.
- Clone a project into `data/repos/`:

  ```bash
  git clone --depth 1 https://github.com/user/repo data/repos/repo
  rm -rf data/repos/repo/.git
  ```

- Create a queries file at `data/queries/repo-queries.txt` (tab-separated `id\tquery`)
- Run: `./scripts/analyze.sh repo`
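The tab-separated queries format can be parsed in a few lines. This is a sketch for reference, not part of the repo's scripts, and the sample queries are made up:

```python
def parse_query_lines(lines):
    """Parse tab-separated id\tquery lines, skipping blanks."""
    queries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        qid, query = line.split("\t", 1)  # query itself may contain tabs
        queries.append((qid, query))
    return queries

sample = [
    "q1\tHow does tqdm detect Jupyter?",
    "",
    "q2\tWhere is the progress bar rendered?",
]
print(parse_query_lines(sample))
```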
```
LocoOperator/
├── .env.example                 # OpenRouter key template
├── pyproject.toml
├── models/LocoOperator-4B-GGUF/
│   └── LocoOperator-4B.gguf
├── examples/                    # model inference examples
│   ├── quick_start.py
│   └── codebase_analysis_example.py
├── data/
│   ├── repos/                   # target projects to analyze
│   ├── queries/tqdm-queries.txt # analysis queries (tab-separated: id\tquery)
│   └── outputs/                 # analysis results
├── prompts/
│   └── analyze_query.txt        # prompt template
└── scripts/
    ├── proxy.py                 # hybrid routing proxy
    ├── setup.sh                 # auto-generates .claude/settings.local.json
    ├── start_services.sh        # auto-starts llama-server + proxy
    ├── analyze.sh               # batch analysis runner
    └── test_single.sh           # single query test
```
Click to expand full training configuration
| Parameter | Value |
|---|---|
| Base model | Qwen3-4B-Instruct-2507 |
| Teacher model | Qwen3-Coder-Next |
| Method | Full-parameter SFT |
| Training data | 170,356 samples |
| Hardware | 4x NVIDIA H200 141GB SXM5 |
| Parallelism | DDP (no DeepSpeed) |
| Precision | BF16 |
| Epochs | 1 |
| Batch size | 2/GPU, gradient accumulation 4 (effective batch 32) |
| Learning rate | 2e-5, warmup ratio 0.03 |
| Max sequence length | 16,384 tokens |
| Template | qwen3_nothinking |
| Framework | MS-SWIFT |
| Training time | ~25 hours |
| Checkpoint | Step 2524 |
- First-tool-type match is 65.6%: the model sometimes picks a different (but not necessarily wrong) tool than the teacher
- Tends to under-generate parallel tool calls compared to the teacher (76 vs 89 total calls across 65 samples)
- Preference for Bash over Read may indicate the model defaults to shell commands where file reads would be more appropriate
- Evaluated on 65 samples only; larger-scale evaluation needed
This project is licensed under the MIT License - see the LICENSE file for details.

