cklam12345/morph-claw-code-rust-gemma4-local

Where Does the Local Edge Break? — Qwen3-MoE Agentic Limit Study

Goal: Find the exact task-complexity ceiling where a local Qwen3-MoE 30.5B running in Ollama, driven by the claw-code Rust harness, sustains autonomous multi-step tool use — and where it collapses.

This is not a vibe check. It is a reproducible, publishable boundary test: local sovereign compute vs. cloud API agents, in 2026, on real agentic workloads.

Motivation: If a cloud AI blackout occurs — data center loss, geopolitical disruption, infrastructure attack — you need a local fallback that actually works. This study tells you which model to trust and why.


Result (April 2026)

Verdict: use open-claw (Qwen3-MoE 30.5B) as your primary local fallback.

| Test | Weight | gemma4 (8B) | open-claw (30.5B MoE) |
|---|---:|---|---|
| Basic response | 1 | PASS 2516ms | PASS 2359ms |
| Tool call emission | 3 | PASS 7922ms | FAIL 6735ms |
| Strict JSON output | 2 | PASS 5047ms | PASS 2890ms |
| Multi-step planning | 2 | FAIL 36203ms | PASS 6860ms |
| Tool input valid JSON | 3 | PASS 6656ms | PASS 6718ms |
| Context retention | 2 | PASS 3641ms | PASS 3204ms |
| Graceful ambiguity handling | 1 | PASS 23406ms | PASS 4671ms |
| Code generation | 2 | PASS 5437ms | PASS 2641ms |
| Multiple tool calls | 2 | PASS 10391ms | PASS 7906ms |
| Latency burst (3 calls) | 1 | PASS 2417ms | PASS 2260ms |
| **Weighted score** | | 89.5% | 84.2% |
| **Median latency** | | 6047ms | 3938ms |
| **P95 latency** | | 36203ms | 7906ms |
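The aggregate rows can be reproduced from the per-test rows. A minimal Python sketch, assuming the scoring rule is "sum of passed weights over total weight" (the weights and timings are copied from the table; the scoring rule is inferred from the numbers, not read from compare.py):

```python
from statistics import median

# (weight, gemma4 (pass, ms), open-claw (pass, ms)) per test, from the table above
ROWS = [
    (1, (True, 2516),   (True, 2359)),   # Basic response
    (3, (True, 7922),   (False, 6735)),  # Tool call emission
    (2, (True, 5047),   (True, 2890)),   # Strict JSON output
    (2, (False, 36203), (True, 6860)),   # Multi-step planning
    (3, (True, 6656),   (True, 6718)),   # Tool input valid JSON
    (2, (True, 3641),   (True, 3204)),   # Context retention
    (1, (True, 23406),  (True, 4671)),   # Graceful ambiguity handling
    (2, (True, 5437),   (True, 2641)),   # Code generation
    (2, (True, 10391),  (True, 7906)),   # Multiple tool calls
    (1, (True, 2417),   (True, 2260)),   # Latency burst (3 calls)
]

def weighted_score(col):
    """Percentage of total test weight earned by passing tests."""
    total = sum(w for w, *_ in ROWS)
    earned = sum(w for w, *models in ROWS if models[col][0])
    return 100 * earned / total

def med_latency(col):
    """Median per-test latency in milliseconds."""
    return median(models[col][1] for _, *models in ROWS)

# gemma4 drops one weight-2 test (17/19 ≈ 89.5%);
# open-claw drops one weight-3 test (16/19 ≈ 84.2%).
print(weighted_score(0), weighted_score(1))
print(med_latency(0), med_latency(1))  # ~6047ms vs ~3938ms
```

This also makes the trade-off explicit: the score gap is exactly the one-weight difference between the two models' single failed tests.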

Why open-claw wins despite the lower weighted score

- The 5.3% score gap is within noise. What is not noise: gemma4's P95 latency is 36 seconds. Under incident conditions, a model that freezes for 36s on a planning task is operationally useless.
- The multi-step planning FAIL on gemma4 is a hard blocker. Agentic loops require chained reasoning; a model that cannot reliably plan 3 steps will collapse on any real workload.
- open-claw's tool call emission failure is inconsistent across runs — likely prompt sensitivity, not a capability gap. gemma4's planning failure is deterministic.
- open-claw is faster at the median (3938ms vs 6047ms) despite being ~4x larger. MoE routing wins.

Power budget

| Model | Params | Est. GPU Draw | Context Window |
|---|---|---|---|
| gemma4 | 8B Q4_K_M | ~60-80W | 131K |
| open-claw (Qwen3-MoE) | 30.5B Q4_K_M | ~250-350W | 262K |

On solar/salt-tank backup, open-claw draws ~4x more power. That is a real cost. But a fast, wrong answer costs more than a correct answer at higher wattage. If you have the power budget: run open-claw. If you are severely power-constrained and your workload is simple Q&A (not agentic), gemma4 suffices.
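One way to sanity-check the power trade-off: energy per answer is draw × time, so open-claw's lower median latency claws back part of its higher wattage. A rough sketch using the midpoints of the estimated draw ranges above (70W and 300W are assumptions, not measurements):

```python
# Midpoints of the estimated GPU draw ranges above (assumptions, not measurements)
GEMMA4_W, OPENCLAW_W = 70, 300                # watts
GEMMA4_MED_S, OPENCLAW_MED_S = 6.047, 3.938   # median latency, seconds

gemma4_j = GEMMA4_W * GEMMA4_MED_S            # joules per median answer
openclaw_j = OPENCLAW_W * OPENCLAW_MED_S

print(f"gemma4:    {gemma4_j:.0f} J/answer")
print(f"open-claw: {openclaw_j:.0f} J/answer")
print(f"ratio:     {openclaw_j / gemma4_j:.1f}x")  # ~2.8x, not 4x
```

Per answer, the effective energy gap is closer to ~2.8x than the ~4x instantaneous-draw gap, and on planning tasks (where gemma4 stalls for 36s) it narrows further.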


Stack

| Layer | Component |
|---|---|
| Model | open-claw — Qwen3-MoE 30.5B, Q4_K_M, 262K ctx, tool-use capable |
| Harness | `rust/` — claw-code Rust rewrite, 20K LoC, 6 crates, binary: `claw.exe` |
| Bridge | `ollama_proxy.py` — translates Anthropic `/v1/messages` to Ollama OpenAI format |
| Eval | `compare.py` — 10-test weighted suite, reproducible |
```
claw.exe  -->  ANTHROPIC_BASE_URL=http://localhost:8787
                    |
              ollama_proxy.py
              (Anthropic <-> OpenAI format translation)
                    |
              Ollama  -->  open-claw (Qwen3-MoE 30.5B)
                    running on YOUR machine. No cloud. No data center.
```
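The bridge's core job is schema translation. The real mapping lives in ollama_proxy.py; the sketch below is a hypothetical minimal version, assuming the proxy folds Anthropic's top-level `system` field into an OpenAI-style system message and passes the rest through (streaming, tool-call, and SSE handling omitted):

```python
def anthropic_to_openai(body: dict) -> dict:
    """Hypothetical minimal translation of an Anthropic /v1/messages
    request body into an OpenAI-style chat completions body."""
    messages = []
    if body.get("system"):
        # Anthropic carries the system prompt top-level;
        # the OpenAI format puts it first in the messages list.
        messages.append({"role": "system", "content": body["system"]})
    messages.extend(body.get("messages", []))
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
        "stream": body.get("stream", False),
    }

req = {
    "model": "open-claw",
    "system": "You are a coding agent.",
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "List the files in rust/"}],
}
out = anthropic_to_openai(req)
```

The reverse direction (Ollama's OpenAI-style response back into Anthropic content blocks and SSE events) is the harder half, which is why the proxy exists as its own component.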

Reproduce This

```shell
# 1. Build the harness
cd rust && cargo build --release && cd ..

# 2. Start the proxy (separate terminal)
python ollama_proxy.py --port 8787

# 3. Run the comparison
python compare.py --proxy http://localhost:8787 --save my_results.md

# 4. Interactive session with open-claw via the claw harness (Windows)
set ANTHROPIC_BASE_URL=http://localhost:8787
set ANTHROPIC_API_KEY=ollama
rust\target\release\claw --model open-claw
```

Requirements: Rust 1.70+, Python 3.11+, Ollama with open-claw and gemma4 pulled.


Repo Layout

```
rust/                   claw-code Rust harness (build here)
  crates/
    api/                Anthropic API client + SSE streaming
    runtime/            ConversationRuntime, config, permissions, MCP, session
    tools/              Bash, ReadFile, WriteFile, EditFile, Grep, Glob, Agent...
    rusty-claude-cli/   REPL + one-shot prompt binary
ollama_proxy.py         Anthropic <-> Ollama format bridge
compare.py              Head-to-head evaluation suite
comparison_report.md    Raw results from April 2026 run
src/                    Python porting workspace (reference only)
tests/                  Python verification surface
```

Disclaimer

This repository is not affiliated with or endorsed by Anthropic.
