This is the official implementation of the paper Hi-PNG: Efficient Interval-Filtering ANNS via Hierarchical Interval Partition Navigating Graph.
- C++ 17
- Python
- OpenMP
# python requirements
pip install -r requirements.txtOur experiment use CMake to compile, and use pybind11 to bind C++ code into Python package. To achieve this, u can use the code as follows:
mkdir -p build && cd build
cmake ..
make -jBenchmark datasets in this paper are from ann-benchmark, SIFT1M, GIST1M, GloVe, MNIST, DEEP1M, DBpedia-OpenAI and real-world data UCF-Crime, S&P 500. Data is embedded into vector space for similarity search. Intervals for SIFT1M, GIST1M, GloVe, MNIST, DEEP1M, UCF-Crime, and DBpedia-OpenAI are generated from a distribution in fvecs, ivecs, fbin and ibin.
You don't need to worry about any reproducibility issues with the data, as you can use the create_dataset.py script to generate the dataset.
mkdir -p data
python create_dataset.py -h
python create_dataset.py --dataset sift-128-euclidean \
--train_distr uniform --train_left 0 --train_right 1000 \
--test_distr uniform --test_left 0 --test_right 1000 \
--k 100 --num_threads 48All experimental sections are corresponding to a code file in experiments.
| 🔬 Experiment | ✍ Code |
|---|---|
| Overall Experiment | overall.py |
| Impact of Factors (1) threads (2) interval distribution (3) query interval fraction |
(1) impact_thread.py(2) overall.py(3) impact_rho.py |
| Impact of Parameters (1) max iteration (2) overlap threshold |
(1) overall.py(2) overall.py |
| Effect of Partition Point Selection | balance.py |
@inproceedings{10.1145/3711896.3736997,
author = {Yang, Ming and Cai, Yuzheng and Zheng, Weiguo},
title = {Hi-PNG: Efficient Interval-Filtering ANNS via Hierarchical Interval Partition Navigating Graph},
year = {2025},
isbn = {9798400714542},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3711896.3736997},
doi = {10.1145/3711896.3736997},
abstract = {Approximate nearest neighbor search (ANNS) is widely used to retrieve similar vectors from high-dimensional data. However, many real-world applications require additional interval filtering based on numerical constraints, such as stock price ranges. In this paper, we introduce interval-filtering ANNS (IF-ANNS), a novel yet general retrieval task where both base and query vectors are associated with numerical intervals. The goal is to retrieve nearest neighbors whose intervals are fully contained within the query interval. To efficiently address this problem, we propose Hierarchical Interval Partitioning Navigating Graph (Hi-PNG), a framework that hierarchically partitions the interval space to efficiently locate the search space, eliminating unnecessary computations and achieving significant speedups. We provide a theoretical analysis of the search space, demonstrating the superiority of our approach. Extensive experiments on eight datasets, including commonly used benchmarks and real-world stock price data, show that Hi-PNG outperforms four state-of-the-art graph-based ANNS baselines, achieving up to 15\texttimes{} acceleration while maintaining high precision. These results highlight Hi-PNG's effectiveness in solving the IF-ANNS problem. All codes and data generation are available at https://github.com/PUITAR/Hi-PNG.git.},
booktitle = {Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2},
pages = {3518–3529},
numpages = {12},
keywords = {approximate nearest neighbor search, hierarchical interval partition, interval-filtering},
location = {Toronto ON, Canada},
series = {KDD '25}
}