HP-Tree: A Breakthrough Multi-Dimensional Index Structure

The HP-Tree (Homogeneity-Partitioned Tree) is a fundamentally new class of tree-structured index that resolves the B+ Tree's 50-year-old unidimensional limitation. By maintaining hierarchical per-subtree dimensional statistics (DimStats) at every inner node, the HP-Tree transforms the B+ Tree's unidimensional routing structure into a multi-dimensional pruning hierarchy --- enabling sub-linear filtering on arbitrary dimensions, constant-time range aggregation, and workload-adaptive leaf packing, all within a single dynamically updatable tree.

Authors: Sutirtha Chakraborty, Kingshuk Basak

Breakthrough Performance

Benchmarked against tlx::btree_multimap v0.6.1 (a production-quality, cache-optimised C++ B+ Tree) on 10 million records across 4 data distributions and 25 query types:

Metric	Result
HP-Tree wins	66 / 100 query-distribution cells
Maximum HP-Tree speedup	4,716x (Range Aggregation, Skewed)
Geometric mean speedup	3.00x across all 100 cells
B+ Tree's best win	2.80x (Dense Hyperbox, Skewed)
B+ Tree's median win margin	Just 1.19x

When HP-Tree Wins, It Wins Big

Query Type	Speedup Range	Why
Range Aggregation (Q7)	170--4,716x	O(1) subtree shortcut vs O(N) scan
Moving Window (Q14)	4--570x	Per-subtree DimStats pruning (rolling 3-month)
Dim Filter (Q5)	1.1--284x	Hierarchical DimStats exclusion
Ad-Hoc Drill (Q15)	2.3--70x	Multi-level subtree pruning
Group-By Agg (Q12)	1.1--61x	DimStats-accelerated grouped sums
Point Lookup (Q2)	1.8--2.5x	Cache-efficient node layout
Inserts (Q9)	1.8--3.2x	Sequential-append fast path
Deletes (Q10)	2.0--3.6x	Lean leaf design

When B+ Tree Wins, the Margin Is Thin

Query Type	B+ Advantage	Explanation
Full Scan (Q8)	1.6--1.9x	43% more leaves due to 70% fill (tunable)
Bulk Load (Q1)	1.1--1.3x	One-time DimStats construction cost
Narrow/Wide Range (Q3/Q4)	1.2--1.9x	Tighter leaf packing
Hypercube/Hyperbox (Q11/Q25)	1.5--2.8x	Bounding-box over-approximation on wide data
Correlated Sub (Q13)	1.05x	Arithmetic-bound (effective tie)

In 23 of the 34 B+ wins, the margin is less than 1.4x. The B+ Tree never exceeds 1.9x on structurally comparable queries.

Key Innovations

Hierarchical Per-Subtree DimStats --- Min, max, sum, and count vectors at every inner node. A single DimStats check at level $h$ can prune $B^h$ records. Unlike column-store zone maps (single granularity), HP-Tree DimStats operate at every level of the tree hierarchy.
4-Gate Pruning Cascade --- Key-range box test → DimStats exclusion → Full-containment fast-path → Adaptive fallback. Ensures the HP-Tree is never asymptotically slower than the B+ Tree.
Adaptive Pruning-Viability Probe --- Detects when DimStats descent yields no benefit and falls back to cache-friendly leaf-chain walk, avoiding overhead on high-selectivity queries.
Lean Leaf Design --- No per-leaf range metadata. Min/max derived in O(1) from sorted keys. Saves ~1.4 MB for a 10M-record tree.
Sequential-Append Fast Path --- Bypasses O(log N) descent for monotonically increasing keys. 2x faster inserts on time-series data.
Workload-Adaptive Leaf Packing --- Configurable fill factor via WorkloadProfile:
- ANALYTICAL → 70% fill (best overall: 66/100 wins)
- SCAN_HEAVY → 95% fill
- WRITE_HEAVY → 70% fill
- BALANCED → 84% fill

Repository Structure

HP-TREE/
├── README.md                       # This file
├── CLAUDE.md                       # Build instructions and project notes
├── HP_TREE_Research_Article.md     # Full research article (pandoc-compatible)
├── cpp/                            # C++17 header-only implementation
│   ├── include/
│   │   ├── hp_tree.hpp             # Core HPTree class (~1,050 lines)
│   │   ├── hp_tree_common.hpp      # Config, WorkloadProfile, schema, predicates
│   │   ├── hp_tree_node.hpp        # LeafNode, InnerNode definitions
│   │   ├── hp_tree_iterator.hpp    # Forward iterator with predicate filtering
│   │   └── hp_tree_stats.hpp       # Statistics collector
│   └── CMakeLists.txt
├── benchmark_cpp/                  # C++ benchmark harness
│   ├── src/
│   │   ├── hp_runner.cpp           # HP-Tree benchmark (25 queries)
│   │   └── bplus_runner.cpp        # B+ Tree benchmark (25 queries)
│   ├── python/
│   │   └── generate_datasets.py    # Dataset generator (4 distributions)
│   └── results/                    # Benchmark result JSON files
└── .gitignore

Quick Start

Build & Run Benchmarks

cd benchmark_cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)

# Generate datasets (if not present)
python3 python/generate_datasets.py --records 10000000

# Run benchmarks
./build/hp_benchmark
./build/bplus_benchmark

Requires C++17. Tested with Apple Clang 17+ and GCC 10+.

Render Research Article to PDF

pandoc HP_TREE_Research_Article.md -o HP_TREE_Research_Article.pdf --pdf-engine=xelatex

Requires pandoc and a LaTeX distribution (e.g., TeX Live or MacTeX).

C++ API Overview

#include "hp_tree.hpp"
using namespace hptree;

HPTreeConfig cfg;
cfg.workload_profile = WorkloadProfile::ANALYTICAL;
cfg.single_threaded  = true;

auto schema = make_default_sales_schema();
HPTree tree(cfg, schema);

// Bulk load
std::vector<Record> records = /* ... */;
tree.bulk_load(records);

// Point lookup
auto results = tree.search(key);

// Predicate search (year = 2022)
PredicateSet ps;
ps.predicates.push_back(Predicate::eq(0, schema.dimensions[0].encode(2022)));
auto filtered = tree.predicate_search(ps);

// Range aggregation — O(1) via DimStats
auto agg = tree.aggregate_dim(5);  // SUM(price)

// Iterator
for (auto it = tree.begin(); it.valid(); it.next()) {
    auto* rec = it.current();
}

Complexity Summary

Operation	B+ Tree	HP-Tree (worst)	HP-Tree (best)
Point Lookup	O(log N)	O(log N)	—
Dim Filter	O(N)	O(N)	O(I·D + R)
Range Aggregation	O(N)	O(N)	O(log N)
Group-By Agg	O(N)	O(N)	O(I)
Single Insert	O(log N)	O(log N + h·D)	O(h·D) (append)
Space overhead	—	—	0.16% of record storage

Composite Key Schema

The benchmark uses a 7-dimension, 56-bit retail sales schema:

Dimension	Bits	Type	Domain
Year	8	Linear	2000–2255
Month	4	Linear	1–12
Day	5	Linear	1–28
State	5	Dictionary	15 US states
Product	5	Dictionary	9 categories
Price	19	Linear (×100)	$0.00–$5,242.87
Version	10	Linear (×100)	0.00–10.23

License

This project is provided for academic and research purposes.

Citation

If you use this work in your research, please cite:

Chakraborty, S. and Basak, K. (2026). "HP-Tree: A Homogeneity-Partitioned
Multi-Dimensional Index Structure with Per-Subtree Aggregate Pruning."

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HP-Tree: A Breakthrough Multi-Dimensional Index Structure

Breakthrough Performance

When HP-Tree Wins, It Wins Big

When B+ Tree Wins, the Margin Is Thin

Key Innovations

Repository Structure

Quick Start

Build & Run Benchmarks

Render Research Article to PDF

C++ API Overview

Complexity Summary

Composite Key Schema

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
benchmark		benchmark
benchmark_cpp		benchmark_cpp
cpp		cpp
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
HP_TREE_Research_Article.md		HP_TREE_Research_Article.md
HP_TREE_Research_Article.pdf		HP_TREE_Research_Article.pdf
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

HP-Tree: A Breakthrough Multi-Dimensional Index Structure

Breakthrough Performance

When HP-Tree Wins, It Wins Big

When B+ Tree Wins, the Margin Is Thin

Key Innovations

Repository Structure

Quick Start

Build & Run Benchmarks

Render Research Article to PDF

C++ API Overview

Complexity Summary

Composite Key Schema

License

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Languages

Packages