The HP-Tree (Homogeneity-Partitioned Tree) is a fundamentally new class of tree-structured index that resolves the B+ Tree's 50-year-old unidimensional limitation. By maintaining hierarchical per-subtree dimensional statistics (DimStats) at every inner node, the HP-Tree transforms the B+ Tree's unidimensional routing structure into a multi-dimensional pruning hierarchy --- enabling sub-linear filtering on arbitrary dimensions, constant-time range aggregation, and workload-adaptive leaf packing, all within a single dynamically updatable tree.
Authors: Sutirtha Chakraborty, Kingshuk Basak
Benchmarked against tlx::btree_multimap v0.6.1 (a production-quality, cache-optimised C++ B+ Tree) on 10 million records across 4 data distributions and 25 query types:
| Metric | Result |
|---|---|
| HP-Tree wins | 66 / 100 query-distribution cells |
| Maximum HP-Tree speedup | 4,716x (Range Aggregation, Skewed) |
| Geometric mean speedup | 3.00x across all 100 cells |
| B+ Tree's best win | 2.80x (Dense Hyperbox, Skewed) |
| B+ Tree's median win margin | Just 1.19x |
| Query Type | Speedup Range | Why |
|---|---|---|
| Range Aggregation (Q7) | 170--4,716x | O(1) subtree shortcut vs O(N) scan |
| Moving Window (Q14) | 4--570x | Per-subtree DimStats pruning (rolling 3-month) |
| Dim Filter (Q5) | 1.1--284x | Hierarchical DimStats exclusion |
| Ad-Hoc Drill (Q15) | 2.3--70x | Multi-level subtree pruning |
| Group-By Agg (Q12) | 1.1--61x | DimStats-accelerated grouped sums |
| Point Lookup (Q2) | 1.8--2.5x | Cache-efficient node layout |
| Inserts (Q9) | 1.8--3.2x | Sequential-append fast path |
| Deletes (Q10) | 2.0--3.6x | Lean leaf design |
| Query Type | B+ Advantage | Explanation |
|---|---|---|
| Full Scan (Q8) | 1.6--1.9x | 43% more leaves due to 70% fill (tunable) |
| Bulk Load (Q1) | 1.1--1.3x | One-time DimStats construction cost |
| Narrow/Wide Range (Q3/Q4) | 1.2--1.9x | Tighter leaf packing |
| Hypercube/Hyperbox (Q11/Q25) | 1.5--2.8x | Bounding-box over-approximation on wide data |
| Correlated Sub (Q13) | 1.05x | Arithmetic-bound (effective tie) |
In 23 of the 34 B+ wins, the margin is less than 1.4x. The B+ Tree never exceeds 1.9x on structurally comparable queries.
-
Hierarchical Per-Subtree DimStats --- Min, max, sum, and count vectors at every inner node. A single DimStats check at level
$h$ can prune$B^h$ records. Unlike column-store zone maps (single granularity), HP-Tree DimStats operate at every level of the tree hierarchy. -
4-Gate Pruning Cascade --- Key-range box test → DimStats exclusion → Full-containment fast-path → Adaptive fallback. Ensures the HP-Tree is never asymptotically slower than the B+ Tree.
-
Adaptive Pruning-Viability Probe --- Detects when DimStats descent yields no benefit and falls back to cache-friendly leaf-chain walk, avoiding overhead on high-selectivity queries.
-
Lean Leaf Design --- No per-leaf range metadata. Min/max derived in O(1) from sorted keys. Saves ~1.4 MB for a 10M-record tree.
-
Sequential-Append Fast Path --- Bypasses O(log N) descent for monotonically increasing keys. 2x faster inserts on time-series data.
-
Workload-Adaptive Leaf Packing --- Configurable fill factor via
WorkloadProfile:-
ANALYTICAL→ 70% fill (best overall: 66/100 wins) -
SCAN_HEAVY→ 95% fill -
WRITE_HEAVY→ 70% fill -
BALANCED→ 84% fill
-
HP-TREE/
├── README.md # This file
├── CLAUDE.md # Build instructions and project notes
├── HP_TREE_Research_Article.md # Full research article (pandoc-compatible)
├── cpp/ # C++17 header-only implementation
│ ├── include/
│ │ ├── hp_tree.hpp # Core HPTree class (~1,050 lines)
│ │ ├── hp_tree_common.hpp # Config, WorkloadProfile, schema, predicates
│ │ ├── hp_tree_node.hpp # LeafNode, InnerNode definitions
│ │ ├── hp_tree_iterator.hpp # Forward iterator with predicate filtering
│ │ └── hp_tree_stats.hpp # Statistics collector
│ └── CMakeLists.txt
├── benchmark_cpp/ # C++ benchmark harness
│ ├── src/
│ │ ├── hp_runner.cpp # HP-Tree benchmark (25 queries)
│ │ └── bplus_runner.cpp # B+ Tree benchmark (25 queries)
│ ├── python/
│ │ └── generate_datasets.py # Dataset generator (4 distributions)
│ └── results/ # Benchmark result JSON files
└── .gitignore
cd benchmark_cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j$(nproc)
# Generate datasets (if not present)
python3 python/generate_datasets.py --records 10000000
# Run benchmarks
./build/hp_benchmark
./build/bplus_benchmarkRequires C++17. Tested with Apple Clang 17+ and GCC 10+.
pandoc HP_TREE_Research_Article.md -o HP_TREE_Research_Article.pdf --pdf-engine=xelatexRequires pandoc and a LaTeX distribution (e.g., TeX Live or MacTeX).
#include "hp_tree.hpp"
using namespace hptree;
HPTreeConfig cfg;
cfg.workload_profile = WorkloadProfile::ANALYTICAL;
cfg.single_threaded = true;
auto schema = make_default_sales_schema();
HPTree tree(cfg, schema);
// Bulk load
std::vector<Record> records = /* ... */;
tree.bulk_load(records);
// Point lookup
auto results = tree.search(key);
// Predicate search (year = 2022)
PredicateSet ps;
ps.predicates.push_back(Predicate::eq(0, schema.dimensions[0].encode(2022)));
auto filtered = tree.predicate_search(ps);
// Range aggregation — O(1) via DimStats
auto agg = tree.aggregate_dim(5); // SUM(price)
// Iterator
for (auto it = tree.begin(); it.valid(); it.next()) {
auto* rec = it.current();
}| Operation | B+ Tree | HP-Tree (worst) | HP-Tree (best) |
|---|---|---|---|
| Point Lookup | O(log N) | O(log N) | — |
| Dim Filter | O(N) | O(N) | O(I·D + R) |
| Range Aggregation | O(N) | O(N) | O(log N) |
| Group-By Agg | O(N) | O(N) | O(I) |
| Single Insert | O(log N) | O(log N + h·D) | O(h·D) (append) |
| Space overhead | — | — | 0.16% of record storage |
The benchmark uses a 7-dimension, 56-bit retail sales schema:
| Dimension | Bits | Type | Domain |
|---|---|---|---|
| Year | 8 | Linear | 2000–2255 |
| Month | 4 | Linear | 1–12 |
| Day | 5 | Linear | 1–28 |
| State | 5 | Dictionary | 15 US states |
| Product | 5 | Dictionary | 9 categories |
| Price | 19 | Linear (×100) | $0.00–$5,242.87 |
| Version | 10 | Linear (×100) | 0.00–10.23 |
This project is provided for academic and research purposes.
If you use this work in your research, please cite:
Chakraborty, S. and Basak, K. (2026). "HP-Tree: A Homogeneity-Partitioned
Multi-Dimensional Index Structure with Per-Subtree Aggregate Pruning."