# Fleet Benchmarks — Edge Performance on Jetson Orin Nano 8GB

## Hardware

- SoC: Jetson Orin Nano (6× ARM Cortex-A78AE)
- RAM: 7619 MB unified (CPU+GPU)
- GPU: 1024 CUDA cores (shared memory)
- Storage: 2 TB NVMe (1.7 TB free)
- OS: Linux 5.15.148-tegra (aarch64)
- Compiler: GCC with -O2
## Methodology

Each benchmark compiles a standalone C program linked against flux_vm.c, runs 10,000 iterations, and measures elapsed time with clock(). Note that clock() reports CPU time, not wall-clock time, so the figures below exclude any time the process spends descheduled.
## Iteration 1 Results (2026-04-11)

### Raw C Performance (Baseline)

| Test | Time | Notes |
|------|------|-------|
| int_arith_100M | 0.211s | 474 Mops/s |
| float_arith_50M | 0.085s | 588 Mops/s |
| branch_100M | 0.199s | 502 Mops/s |
| fib(1000) ×100K | 0.082s | Native Fibonacci |
### FLUX VM Performance (Switch Dispatch, -O2)

| Test | Throughput | Notes |
|------|-----------|-------|
| NOP ×1K ×10K | 147 Mops/s | Pure dispatch overhead |
| IADD ×100 ×10K | 148 Mops/s | Arithmetic dispatch |
| MIXED ×200 ×10K | 273 Mops/s | MOVI16+IADD (warm cache) |
| ADD ×2K ×10K | 440 Mops/s | Format E hot path |
| MOVI+ADD ×2K ×10K | 360 Mops/s | Realistic mixed workload |
| CONF_ADD ×2K ×10K | 379 Mops/s | Confidence tracking |
### fence-0x44: Abstraction Cost Analysis

| Layer | Speed | Overhead vs Native |
|-------|-------|--------------------|
| Raw C (int_arith) | 474 Mops/s | baseline |
| FLUX VM (ADD, Format E) | 440 Mops/s | 1.08× (8% overhead) |
| FLUX VM (MOVI+ADD mixed) | 360 Mops/s | 1.32× (32% overhead) |
| FLUX VM (CONF_ADD) | 379 Mops/s | 1.25× (25% overhead) |
### Key Findings

- VM dispatch is nearly free for hot paths: 440 Mops/s vs 474 Mops/s native (8% cost).
- Confidence tracking costs only 14% extra: CONF_ADD (379 Mops/s) vs ADD (440 Mops/s). Think Tank's OPTIONAL decision validated.
- Mixed workloads (MOVI+ADD) are the real cost: 32% overhead, due to instruction fetch/decode variety.
- The "fence price" is ~1.3× for realistic agent workloads, well worth it for portability, A2A, and confidence.
## Model Effectiveness (DeepInfra API Calls)

### Iteration 1 Model Results

| Model | Task | Output | Time | Tokens | Quality |
|-------|------|--------|------|--------|---------|
| Hermes-405B | Edge agent scenarios | 3.7K chars | 39.7s | 917 | ★★★★★ Concrete, practical |
| Hermes-405B | FLUX ecosystem gaps | TIMEOUT | >120s | — | ✗ Unreliable (2nd timeout) |
| Nemotron-120B | VM optimization patterns | 6.9K chars | 8.6s | 2440 | ★★★★★ C code, measurable |
| Step-3.5-Flash | HAV VM internals | 6.7K chars | 19.7s | 1698 | ★★☆☆☆ Thinking tokens, not terms |
| phi-4 | Edge robotics opcodes | 3.3K chars | — | — | ★★★★☆ Good proposals |
### Model Routing Rules (Learned)

- **Hermes-405B**: Best for creative/narrative tasks. Max 1 call per batch (expensive, can time out). Use for synthesis, not analysis.
- **Nemotron-120B**: Best for practical engineering with code. Fast (8.6s), cheap, high quality. Use heavily.
- **Step-3.5-Flash**: Produces thinking tokens, not direct output. Skip for term generation. May work for reasoning chains.
- **Seed-2.0-mini**: REMOVED from DeepInfra (404). Was best for devil's advocate.
- **Qwen3.5-397B**: Times out consistently. Expensive and unreliable. Skip for now.
- **phi-4**: Good for technical proposals. Reliable. Use for opcode/hardware design tasks.
## Process Improvements (Iteration 1 → 2)

1. Use Nemotron-120B first for any code-generation task (fast, cheap, excellent quality).
2. Run Hermes-405B last in batches (expensive, prone to timeouts); use it only when the quality ceiling matters.
3. Skip Step-3.5-Flash for HAV: it thinks instead of generating.
4. Pre-verify model availability before crafting prompts (Seed-2.0-mini wasted effort).
5. Benchmark after every VM code change and document the delta.
6. Look up opcodes before constructing bytecode; time was wasted on wrong opcodes (0x10 ≠ IADD).
These benchmarks are the pulse of a living system. The fleet is not a collection of libraries — it's an organism where every vessel contributes signals. The fence at 0x44 (the abstraction boundary between native code and bytecode) is not a wall but a membrane: it lets useful things through while keeping the system sovereign. The 14% confidence overhead is the cost of self-awareness — the fleet knows what it knows and how well it knows it. This is the post-SaaS era: software doesn't just run, it metabolizes, adapts, and remembers.
## About

Standardized benchmarking suite for fleet vessel performance.