High-performance FISHDBC clustering in Rust.
Chutoro implements the FISHDBC algorithm — a scalable, density-based clustering method that discovers clusters of arbitrary shape without requiring the user to specify the number of clusters or a distance scale. It builds on the lineage of DBSCAN and HDBSCAN* while replacing the expensive all-pairs distance matrix with an approximate HNSW graph, making it practical for large and high-dimensional datasets.
- No parameter guessing: FISHDBC extracts an optimal flat
clustering from a density hierarchy — no need to choose
epsorkup front. - Bring your own metric: implement the
DataSourcetrait with any distance function, and chutoro handles the rest. - Rust-safe parallelism: the CPU backend uses Rayon for parallel
HNSW insertion and MST construction, with zero
unsafein user-facing code. - Extensible by design: feature-gated backends (CPU today, GPU planned) and a builder API that stays stable as the engine evolves.
[dependencies]
chutoro-core = "0.1.0"use chutoro_core::{
ChutoroBuilder, DataSource, DataSourceError, ExecutionStrategy,
};
/// Each row is one point in 2-D space.
struct Points(Vec<[f32; 2]>);
impl DataSource for Points {
fn len(&self) -> usize { self.0.len() }
fn name(&self) -> &str { "points-2d" }
fn distance(&self, i: usize, j: usize)
-> Result<f32, DataSourceError>
{
let a = self.0.get(i)
.ok_or(DataSourceError::OutOfBounds { index: i })?;
let b = self.0.get(j)
.ok_or(DataSourceError::OutOfBounds { index: j })?;
Ok(a.iter()
.zip(b)
.map(|(x, y)| (x - y).powi(2))
.sum::<f32>()
.sqrt())
}
}
let chutoro = ChutoroBuilder::new()
.with_min_cluster_size(3)
.with_execution_strategy(ExecutionStrategy::CpuOnly)
.build()?;
let result = chutoro.run(&Points(vec![
[0.0, 0.0], [0.1, 0.1], [0.2, 0.0], // cluster A
[5.0, 5.0], [5.1, 4.9], [5.0, 5.2], // cluster B
]))?;
println!("Found {} clusters", result.cluster_count());
# Ok::<(), chutoro_core::ChutoroError>(())- Four-stage CPU pipeline: HNSW construction, candidate edge harvest, Kruskal MST, and stability-based hierarchy extraction (design document).
- Built-in Euclidean and cosine distance helpers with input validation and pre-computed norm caching (users' guide § distance helpers).
- Property-tested (proptest suites) and formally verified — Kani harnesses for HNSW graph invariants and distance-metric properties; Verus proofs for edge harvest ordering and extraction (developers' guide § Verus proofs).
- Criterion benchmarks with synthetic data generators (Gaussian blobs, ring/Swiss-roll manifolds, text mutation, MNIST download) (developers' guide § benchmarks).
- Optional
metricscrate integration for distance-cache telemetry (users' guide § feature flags). - CLI tool (
chutoro-cli) and bundled data-source providers: dense vectors via Parquet/Arrow (chutoro-providers-dense) and text via Levenshtein distance (chutoro-providers-text).
- Users' guide — full API documentation and usage patterns
- Developers' guide — contributing, benchmarks, and formal verification
- Design document — architecture and literature survey
- Roadmap — planned features and progress
ISC — see LICENSE for details.
Contributions welcome! Please see AGENTS.md for guidelines.