# HAT Benchmark Reproducibility Package

This directory contains everything needed to reproduce the benchmark results from the HAT paper.

## Quick Start

```bash
# Run all benchmarks
./run_all_benchmarks.sh

# Run abbreviated version (faster)
./run_all_benchmarks.sh --quick
```

## Benchmark Suite

### Phase 3.1: HAT vs HNSW Comparison

Test file: `tests/phase31_hat_vs_hnsw.rs`

Compares HAT against HNSW on hierarchically structured data (AI conversation patterns).

**Expected Results:**

| Metric        | HAT   | HNSW   |
|---------------|-------|--------|
| Recall@10     | 100%  | ~70%   |
| Build Time    | 30ms  | 2100ms |
| Query Latency | 1.4ms | 0.5ms  |

Key finding: HAT achieves 30 percentage points higher recall while building 70x faster.

### Phase 3.2: Real Embedding Dimensions

Test file: `tests/phase32_real_embeddings.rs`

Tests HAT with production embedding sizes.

**Expected Results:**

| Dimensions | Model          | Recall@10 |
|------------|----------------|-----------|
| 384        | MiniLM         | 100%      |
| 768        | BERT-base      | 100%      |
| 1536       | OpenAI ada-002 | 100%      |

### Phase 3.3: Persistence Layer

Test file: `tests/phase33_persistence.rs`

Validates serialization/deserialization correctness and performance.

**Expected Results:**

| Metric                 | Value     |
|------------------------|-----------|
| Serialize throughput   | 300+ MB/s |
| Deserialize throughput | 100+ MB/s |
| Recall after restore   | 100%      |
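The throughput figures above are just bytes processed divided by wall-clock time. A minimal measurement sketch in Python, using `pickle` as a stand-in serializer since the HAT serialization API is not shown in this README:

```python
import pickle
import time

def throughput_mb_per_s(obj, serialize):
    """Return serializer throughput in MB/s for one serialization pass."""
    start = time.perf_counter()
    data = serialize(obj)
    elapsed = time.perf_counter() - start
    return len(data) / max(elapsed, 1e-9) / 1e6

# Stand-in payload: 1000 vectors of 384 floats (MiniLM-sized)
payload = [[float(i)] * 384 for i in range(1000)]
rate = throughput_mb_per_s(payload, pickle.dumps)
print(f"serialize throughput: {rate:.1f} MB/s")
```

Absolute numbers will differ from the HAT figures; the point is the measurement shape, not the serializer.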

### Phase 4.2: Attention State Format

Test file: `tests/phase42_attention_state.rs`

Tests the attention state serialization format.

**Expected Results:**

- All 9 tests pass
- Role types roundtrip correctly
- Metadata preserved
- KV cache support working
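The roundtrip checks above follow a standard pattern: serialize a state, deserialize it, and assert field-level equality. A toy illustration using JSON as a stand-in, since the actual attention-state format lives in the Rust test file:

```python
import json

# Hypothetical attention state: role, metadata, and a KV-cache fragment
state = {"role": "assistant", "metadata": {"turn": 3}, "kv_cache": [0.1, 0.2]}

# Roundtrip: every field must survive serialize -> deserialize
restored = json.loads(json.dumps(state))
assert restored == state
```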

### Phase 4.3: End-to-End Demo

Script: `examples/demo_hat_memory.py`

Full integration with sentence-transformers and an optional LLM.

**Expected Results:**

| Metric            | Value   |
|-------------------|---------|
| Messages          | 2000    |
| Tokens            | ~60,000 |
| Recall accuracy   | 100%    |
| Retrieval latency | <5ms    |

## Running Individual Benchmarks

### Rust Benchmarks

```bash
# HAT vs HNSW
cargo test --test phase31_hat_vs_hnsw -- --nocapture

# Real embeddings
cargo test --test phase32_real_embeddings -- --nocapture

# Persistence
cargo test --test phase33_persistence -- --nocapture

# Attention state
cargo test --test phase42_attention_state -- --nocapture
```

### Python Tests

```bash
# Setup
python3 -m venv venv
source venv/bin/activate
pip install maturin pytest sentence-transformers

# Build extension
maturin develop --features python

# Run tests
pytest python/tests/ -v

# Run demo
python examples/demo_hat_memory.py
```

## Hardware Requirements

- Minimum: 4GB RAM, any modern CPU
- Recommended: 8GB RAM for large-scale tests
- Storage: ~2GB for the full benchmark suite

## Expected Runtime

| Mode              | Time        |
|-------------------|-------------|
| Quick (`--quick`) | ~2 minutes  |
| Full              | ~10 minutes |
| With LLM demo     | ~15 minutes |

## Interpreting Results

### Key Metrics

1. **Recall@k**: percentage of the true nearest neighbors found in the top-k results
   - HAT target: 100% on hierarchical data
   - HNSW baseline: ~70% on hierarchical data
2. **Build Time**: time to construct the index
   - HAT target: <100ms for 1000 points
   - Should be 50-100x faster than HNSW
3. **Query Latency**: time per query
   - HAT target: <5ms
   - Acceptable to be 2-3x slower than HNSW (recall matters more)
4. **Throughput**: serialization/deserialization speed
   - Target: 100+ MB/s
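Recall@k can be verified against brute-force ground truth. A short sketch with NumPy (the variable names here are illustrative, not from the test suite):

```python
import numpy as np

def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    return len(set(retrieved[:k]) & set(ground_truth[:k])) / k

# Brute-force ground truth for one query over a small random corpus
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 8))
query = rng.normal(size=8)
dists = np.linalg.norm(corpus - query, axis=1)
ground_truth = np.argsort(dists)[:10].tolist()

# A perfect index returns exactly the ground truth
print(recall_at_k(ground_truth, ground_truth))  # 1.0
```

In the benchmarks, `retrieved` would come from the index under test while `ground_truth` comes from the exhaustive scan.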

### Success Criteria

The benchmarks validate the paper's claims if:

  1. HAT recall@10 ≥ 99% on hierarchical data
  2. HAT recall significantly exceeds HNSW on hierarchical data
  3. HAT builds faster than HNSW
  4. Persistence preserves 100% recall
  5. Python bindings pass all tests
  6. End-to-end demo achieves ≥95% retrieval accuracy

## Troubleshooting

### Build Errors

```bash
# Update Rust
rustup update

# Clean build
cargo clean && cargo build --release
```

### Python Issues

```bash
# Ensure the venv is activated
source venv/bin/activate

# Rebuild extension
maturin develop --features python --release
```

### Memory Issues

For large-scale tests, ensure sufficient RAM:

```bash
# Check available memory
free -h

# Run with limited parallelism
RAYON_NUM_THREADS=2 cargo test --test phase31_hat_vs_hnsw
```

## Output Files

Results are saved to `benchmarks/results/`:

```text
results/
  benchmark_results_YYYYMMDD_HHMMSS.txt  # Full output
```
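The timestamp pattern above can be produced with `date`; a sketch of how a results path might be built (the actual naming logic lives in `run_all_benchmarks.sh`):

```shell
# Build a results path matching the pattern above
ts=$(date +%Y%m%d_%H%M%S)
out="benchmarks/results/benchmark_results_${ts}.txt"
mkdir -p benchmarks/results
echo "results will be written to $out"
```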

## Citation

If you use these benchmarks, please cite:

```bibtex
@article{hat2026,
  title={Hierarchical Attention Tree: Extending LLM Context Through Structural Memory},
  author={AI Research Lab},
  year={2026}
}
```