geodessical v0.6.0 — GPU: RTX 3070 8GB (40 TFLOPS) — Host: Windows x86-64
| Model | Method | Decode (tok/s) | Prefill (tok/s) | VRAM (MB) | FFN compr | Attn compr | Output quality |
|---|---|---|---|---|---|---|---|
| SmolLM2-135M | BASELINE | 191.1 | — | 176 | 0% | 0% | coherent |
| SmolLM2-135M | FLAT-SVD | 24.2 | 262.4 | 176 | 65.6% | 0% | garbled (â??) |
| SmolLM2-135M | GRC | 263.2 | 6844.6 | 262 | 65.6% | 99.8% | degraded (repetitive) |
| Phi-3.5-mini | BASELINE | 51.3 | — | 4600 | 0% | 0% | coherent |
| Phi-3.5-mini | FLAT-SVD | 38.0 | 83.9 | 2436 | 94.3% | 0% | garbled (,,,) |
| Phi-3.5-mini | GRC | 322.6 | 5200.2 | ~3190 | 94.3% | 100% | degraded (repetitive) |
- SmolLM2: GRC 263.2 tok/s vs FLAT-SVD 24.2 tok/s → 10.9× faster decode
- Phi-3.5-mini: GRC 322.6 tok/s vs FLAT-SVD 38.0 tok/s → 8.5× faster decode
- Prefill: GRC 6844.6 tok/s vs FLAT-SVD 262.4 tok/s (SmolLM2) → 26× faster prefill
FLAT-SVD must compute two matrix multiplications per forward pass (W ≈ U·Σ·Vᵀ expanded at runtime). GRC's online-feedback loop (axiom survey → curvature-weighted rank allocation → manifold PCA) produces compressed attention projections (Q/K/V/O) that are precomputed once and stored. At inference, these projections skip the GPU DRAM round-trip entirely — enabling prefill speeds that hit 96.8% of peak TFLOPS on Phi (38714 GFLOPS vs 40 TFLOPS peak), vs FLAT-SVD at under 2%.
FLAT-SVD uses a static low-rank basis, equivalent to ESPACE/ASVD: choose rank once, freeze. GRC uses per-layer curvature-weighted rank allocation from the axiom manifold survey:
- Rank cap of 128 auto-applied to Phi (dim=3072, ff=8192) based on
max(m,n)≥4096criterion — ASVD would allocate rank uniformly without this signal - 99.8% attention reduction on SmolLM2 (120 matrices → 0 MB retained) vs FLAT-SVD 0%
- GRC's online feedback (curvature, Fisher metric, axiom consistency) drives the rank budget per layer; ESPACE allocates rank statically by singular value threshold alone
- Both compressed methods produce degraded output at these compression ratios (94.3% FFN for Phi, 65.6% for SmolLM2) — the rank is too low for lossless reconstruction
- GRC Phi GRC attn error: 64-sample PCA → rank-1 projection (energy≈0) due to OOM with 512 samples; output degraded as a result
- FLAT-SVD rank-cap bug fixed in this session:
max(m,n)≥4096threshold (was checkingm≥4096only, missing ff=8192 dimension on Phi) - PPL measurements not available for compressed methods (requires llama-cpp-python evaluation loop; compressed weights are in-process only)
From axiom_beta_report.json:
- Intrinsic dimension: 17 (TwoNN=16.97)
- Symmetry score: 0.9723
- Mean curvature: 72674 (Fisher-weighted)
- Axiom count: 34 (consistency=0.956)
- Survey time: 260s (phases 1–4; geodesic skipped)