# HAT Benchmark Reproducibility Package

This directory contains everything needed to reproduce the benchmark results from the HAT paper.

## Quick Start

```bash
# Run all benchmarks
./run_all_benchmarks.sh

# Run abbreviated version (faster)
./run_all_benchmarks.sh --quick
```

## Benchmark Suite

### Phase 3.1: HAT vs HNSW Comparison

Test file: `tests/phase31_hat_vs_hnsw.rs`

Compares HAT against HNSW on hierarchically structured data (AI conversation patterns).

**Expected Results:**

| Metric        | HAT   | HNSW   |
|---------------|-------|--------|
| Recall@10     | 100%  | ~70%   |
| Build Time    | 30ms  | 2100ms |
| Query Latency | 1.4ms | 0.5ms  |

Key finding: HAT achieves 30 percentage points higher recall while building 70x faster.

### Phase 3.2: Real Embedding Dimensions

Test file: `tests/phase32_real_embeddings.rs`

Tests HAT with production embedding sizes.

**Expected Results:**

| Dimensions | Model          | Recall@10 |
|------------|----------------|-----------|
| 384        | MiniLM         | 100%      |
| 768        | BERT-base      | 100%      |
| 1536       | OpenAI ada-002 | 100%      |

### Phase 3.3: Persistence Layer

Test file: `tests/phase33_persistence.rs`

Validates serialization/deserialization correctness and performance.

**Expected Results:**

| Metric                 | Value     |
|------------------------|-----------|
| Serialize throughput   | 300+ MB/s |
| Deserialize throughput | 100+ MB/s |
| Recall after restore   | 100%      |
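The throughput figures above are just bytes processed divided by wall-clock time. A minimal measurement sketch in Python, using `pickle` as a stand-in serializer since the HAT serialization API is not shown in this README:

```python
import pickle
import time

def throughput_mb_per_s(obj, serialize):
    """Return serializer throughput in MB/s for one serialization pass."""
    start = time.perf_counter()
    data = serialize(obj)
    elapsed = time.perf_counter() - start
    return len(data) / max(elapsed, 1e-9) / 1e6

# Stand-in payload: 1000 vectors of 384 floats (MiniLM-sized)
payload = [[float(i)] * 384 for i in range(1000)]
rate = throughput_mb_per_s(payload, pickle.dumps)
print(f"serialize throughput: {rate:.1f} MB/s")
```

Absolute numbers will differ from the HAT figures; the point is the measurement shape, not the serializer.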

### Phase 4.2: Attention State Format

Test file: `tests/phase42_attention_state.rs`

Tests the attention state serialization format.

**Expected Results:**

- All 9 tests pass
- Role types roundtrip correctly
- Metadata preserved
- KV cache support working
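The roundtrip checks above follow a standard pattern: serialize a state, deserialize it, and assert field-level equality. A toy illustration using JSON as a stand-in, since the actual attention-state format lives in the Rust test file:

```python
import json

# Hypothetical attention state: role, metadata, and a KV-cache fragment
state = {"role": "assistant", "metadata": {"turn": 3}, "kv_cache": [0.1, 0.2]}

# Roundtrip: every field must survive serialize -> deserialize
restored = json.loads(json.dumps(state))
assert restored == state
```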

### Phase 4.3: End-to-End Demo

Script: `examples/demo_hat_memory.py`

Full integration with sentence-transformers and an optional LLM.

**Expected Results:**

| Metric            | Value   |
|-------------------|---------|
| Messages          | 2000    |
| Tokens            | ~60,000 |
| Recall accuracy   | 100%    |
| Retrieval latency | <5ms    |

## Running Individual Benchmarks

### Rust Benchmarks

```bash
# HAT vs HNSW
cargo test --test phase31_hat_vs_hnsw -- --nocapture

# Real embeddings
cargo test --test phase32_real_embeddings -- --nocapture

# Persistence
cargo test --test phase33_persistence -- --nocapture

# Attention state
cargo test --test phase42_attention_state -- --nocapture
```

### Python Tests

```bash
# Setup
python3 -m venv venv
source venv/bin/activate
pip install maturin pytest sentence-transformers

# Build extension
maturin develop --features python

# Run tests
pytest python/tests/ -v

# Run demo
python examples/demo_hat_memory.py
```

## Hardware Requirements

- Minimum: 4GB RAM, any modern CPU
- Recommended: 8GB RAM for large-scale tests
- Storage: ~2GB for the full benchmark suite

## Expected Runtime

| Mode              | Time        |
|-------------------|-------------|
| Quick (`--quick`) | ~2 minutes  |
| Full              | ~10 minutes |
| With LLM demo     | ~15 minutes |

## Interpreting Results

### Key Metrics

1. **Recall@k**: percentage of the true nearest neighbors found in the top-k results
   - HAT target: 100% on hierarchical data
   - HNSW baseline: ~70% on hierarchical data
2. **Build Time**: time to construct the index
   - HAT target: <100ms for 1000 points
   - Should be 50-100x faster than HNSW
3. **Query Latency**: time per query
   - HAT target: <5ms
   - Acceptable to be 2-3x slower than HNSW (recall matters more)
4. **Throughput**: serialization/deserialization speed
   - Target: 100+ MB/s
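Recall@k can be verified against brute-force ground truth. A short sketch with NumPy (the variable names here are illustrative, not from the test suite):

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

# Brute-force ground truth for one query over a small random corpus
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))
query = rng.normal(size=8)
dists = np.linalg.norm(corpus - query, axis=1)
ground_truth = np.argsort(dists)[:10].tolist()

# A perfect index returns exactly the ground truth
print(recall_at_k(ground_truth, ground_truth))  # 1.0
```

In the benchmarks, `retrieved` would come from the index under test while `ground_truth` comes from the exhaustive scan.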

### Success Criteria

The benchmarks validate the paper's claims if:

  1. HAT recall@10 ≥ 99% on hierarchical data
  2. HAT recall significantly exceeds HNSW on hierarchical data
  3. HAT builds faster than HNSW
  4. Persistence preserves 100% recall
  5. Python bindings pass all tests
  6. End-to-end demo achieves ≥95% retrieval accuracy

## Troubleshooting

### Build Errors

```bash
# Update Rust
rustup update

# Clean build
cargo clean && cargo build --release
```

### Python Issues

```bash
# Ensure the venv is activated
source venv/bin/activate

# Rebuild extension
maturin develop --features python --release
```

### Memory Issues

For large-scale tests, ensure sufficient RAM:

```bash
# Check available memory
free -h

# Run with limited parallelism
RAYON_NUM_THREADS=2 cargo test --test phase31_hat_vs_hnsw
```

## Output Files

Results are saved to `benchmarks/results/`:

```text
results/
  benchmark_results_YYYYMMDD_HHMMSS.txt  # Full output
```
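The timestamp pattern above can be produced with `date`; a sketch of how a results path might be built (the actual naming logic lives in `run_all_benchmarks.sh`):

```shell
# Build a results path matching the pattern above
ts=$(date +%Y%m%d_%H%M%S)
out="benchmarks/results/benchmark_results_${ts}.txt"
mkdir -p benchmarks/results
echo "results will be written to $out"
```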

## Citation

If you use these benchmarks, please cite:

```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={AI Research Lab},
  year={2026}
}
```