The Innovation Game - Latest posts (https://forum.tig.foundation)

Protocol Update 0.0.5 Summary

This update introduces three changes to the TIG protocol:

  1. Tracks are now randomly assigned per benchmark
  2. Addition of a new challenge: Job Shop Scheduling (JSS)
  3. Updates to several existing challenges, including new Knapsack tracks, an upgraded SAT track, and a change in the Neural Net Optimiser challenge.

1. Randomising Tracks

Benchmarkers will no longer choose which track to benchmark. Instead, tracks will be randomly assigned. Per benchmark, benchmarkers will commit to:

  • an algorithm; and
  • track settings: a mapping from track id to the hyperparameters, number of bundles, and fuel budget for that track.

For example, in the VRP challenge, with algorithm hgs_v1:

{
    "track_settings": {
        "n_nodes=600": {
            "num_bundles": 4,
            "hyperparameters": "{\"exploration_level\": 3}",
            "fuel_budget": 123456
        },
        "n_nodes=700": {
            "num_bundles": 4,
            "hyperparameters": "{\"exploration_level\": 3}",
            "fuel_budget": 123456
        },
        ...
    },
    "algorithm_id": "c002_a089"
}

This change aims to restore the signalling of the best algorithm per challenge, rather than per track.

Expected Effects on Innovators

As a result of this change, all algorithms submitted to a given challenge will be tested across all tracks. This is expected to incentivise “meta-algorithms”: either general-purpose algorithms that perform well across tracks, or composite algorithms that identify the track and apply appropriate subroutines.

2. New Challenge: Job Shop Scheduling

This update introduces a new Job Shop Scheduling (JSS) challenge based on the Flexible Job Shop Scheduling Problem (FJSP), a well-studied and practically relevant class of scheduling problems.

In this challenge, a set of jobs must be processed on a set of machines. Each job consists of a sequence of operations that must be completed in order. For each operation, there may be one or more machines capable of processing it, and the processing time can depend on the chosen machine. The objective is to minimize the makespan, i.e. the time at which the last job finishes.
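The objective can be sketched as a simple evaluator over a dispatch order. The tuple layout below is a hypothetical illustration, not the challenge's actual interface:

```python
from collections import defaultdict

def makespan(dispatch_order):
    """Makespan of a schedule given as a dispatch order of
    (job, machine, duration) tuples; operations of the same job
    must appear in their required sequence."""
    machine_free = defaultdict(int)  # earliest idle time per machine
    job_ready = defaultdict(int)     # earliest start time of each job's next op
    finish = 0
    for job, machine, duration in dispatch_order:
        start = max(machine_free[machine], job_ready[job])
        end = start + duration
        machine_free[machine] = end
        job_ready[job] = end
        finish = max(finish, end)    # time at which the last job finishes
    return finish

# Two jobs, two machines: job 0 runs m0 then m1; job 1 runs m1 then m0.
print(makespan([(0, 0, 3), (1, 1, 2), (0, 1, 2), (1, 0, 4)]))  # -> 7
```

An algorithm then searches over machine assignments and dispatch orders to minimise this value.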

Each track represents a different scheduling environment generated from the same underlying instance generator, spanning from structured flow-shop settings to highly flexible job-shop variants. The track parameters are: the number of jobs, n; the number of machines, m; the number of operation types, o; and the flow type.

The tracks available to benchmark will be:

n=50,m=30,o=30,flow=flow_shop
n=50,m=30,o=30,flow=h_flow_shop
n=50,m=30,o=30,flow=job_shop
n=50,m=30,o=30,flow=fjsp_med
n=50,m=30,o=30,flow=fjsp_high

Track summaries

  • Flow Shop: All jobs follow the same stage order; the main challenge is sequencing under a fixed production flow.
  • Hybrid Flow Shop: A flow-shop structure with parallel machine options at stages; emphasizes balancing load while maintaining a fixed flow.
  • Job Shop: Jobs can have different routes through the shop; emphasizes complex sequencing across heterogeneous job pathways.
  • FJSP (Medium): Flexible job shop instances with moderate flexibility; requires jointly deciding machine assignment and operation sequencing.
  • FJSP (High): Highly flexible, highly combinatorial instances; strongly rewards robust assignment + sequencing strategies that generalize across varied structures.

3. Challenge Updates

New Knapsack Tracks

The Knapsack challenge is transitioning from standard QKP instances to Team Formation QKP instances, based on recent work by Hochbaum et al. [1]. This introduces more realistic, structurally rich instances. The following tracks will be featured on the protocol:

n_items=1000, budget=5
n_items=1000, budget=10
n_items=1000, budget=25
n_items=5000, budget=10
n_items=5000, budget=25

SAT Challenge Track Update

SAT track n_vars=100000, ratio=410 is being increased in difficulty to ratio=420, moving closer to the phase transition ratio of 4.267 where the hardest instances exist.

Neural Net Challenge Update

The weights in the neural net optimiser challenge are now being made visible to the optimizer step function. This change enables optimizers to adapt updates based on the actual parameter landscape, bringing the challenge closer to real-world training dynamics.

https://forum.tig.foundation/t/protocol-update-0-0-5-summary/83#post_1 Thu, 05 Feb 2026 12:00:45 +0000 forum.tig.foundation-post-130
Knapsack Instances Update

We are updating the Knapsack instance class from Standard QKP to Team Formation QKP. This post includes a detailed description of the Team Formation QKP instance class [1], which we will use across all knapsack tracks. The goal of this update is to introduce more realistic, real-world instances that will incentivise the development of algorithms with more practical industry relevance.

Team Formation

The Team Formation instances are derived from the team formation benchmark introduced by Hochbaum et al.[2] In the original team formation problem, the goal is to select a team of experts that maximizes collaboration utility while satisfying additional constraints on required skills.

The collaboration utility between two experts i and j is defined as the Jaccard similarity:

J(i,j) = \frac{|P_i \cap P_j|}{|P_i \cup P_j|},

where P_i denotes the set of projects on which expert i has worked.
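As a minimal sketch, the utility can be computed directly from two project sets:

```python
def jaccard(p_i, p_j):
    """Collaboration utility J(i, j): Jaccard similarity of project sets."""
    p_i, p_j = set(p_i), set(p_j)
    union = p_i | p_j
    return len(p_i & p_j) / len(union) if union else 0.0

print(jaccard({1, 2, 3}, {2, 3, 4}))  # -> 0.5 (2 shared out of 4 distinct)
```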

The synthetic team formation instances are formed by assigning projects to experts using a lognormal distribution with mean 4 and standard deviation 1.

Each instance is defined by the number of experts n and number of projects p:

n \in \{1000, 2000, 4000, 6000, 8000, 10{,}000\},
p \in \{30000, 70000\}.

The project universe is first partitioned into subsets whose sizes are drawn sequentially from a lognormal distribution; the final subset contains any remaining projects. Each participant i is assigned a target number n_i of projects P_i, drawn independently from the same lognormal distribution.

Participant i then selects one subset uniformly at random. If the subset size is at least n_i, then n_i projects are sampled uniformly from it; otherwise, the entire subset is assigned and the remaining projects are drawn uniformly from the rest of the project universe.

Pairwise utilities are computed as Jaccard similarities:

p_{ij} = \frac{|P_i \cap P_j|}{|P_i \cup P_j|}, \quad \text{for } i < j,

with p_{ij} = p_{ji}. If two experts share no projects, then p_{ij} = 0.

In team formation instances, the linear profits are zero: p_i = 0 \ \forall i.
This procedure yields a sparse, weighted, undirected graph whose edge weights capture collaboration strength derived from project co-membership.

A weight w_i is assigned to each expert i:

w_i \sim \mathrm{UD}[1,10].

The capacity of the knapsack is defined as a budget fraction of the total weight:

C = b \sum_{i=1}^n w_i \quad \text{where } b \in \{0.025, 0.05, 0.1, 0.25, 0.5, 0.75\}.
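Putting the pieces together, the generation procedure might be sketched as below. The lognormal parameters are assumed to apply in log space, and all function and variable names are illustrative rather than taken from the actual generator:

```python
import random

def generate_instance(n, p, b, mu=4.0, sigma=1.0, seed=0):
    """Sketch of the Team Formation QKP generator described above."""
    rng = random.Random(seed)
    universe = list(range(p))

    # Partition the project universe into subsets with lognormal sizes;
    # the final subset keeps any remaining projects.
    subsets, i = [], 0
    while i < p:
        size = max(1, round(rng.lognormvariate(mu, sigma)))
        subsets.append(universe[i:i + size])
        i += size

    # Each expert draws a target n_i and samples projects from one random
    # subset, topping up from the rest of the universe if it is too small.
    projects = []
    for _ in range(n):
        n_i = max(1, round(rng.lognormvariate(mu, sigma)))
        subset = rng.choice(subsets)
        if len(subset) >= n_i:
            chosen = set(rng.sample(subset, n_i))
        else:
            chosen = set(subset)
            rest = [q for q in universe if q not in chosen]
            chosen |= set(rng.sample(rest, min(n_i - len(chosen), len(rest))))
        projects.append(chosen)

    # Pairwise utilities: Jaccard similarities (sparse; only nonzero stored).
    utility = {}
    for i in range(n):
        for j in range(i + 1, n):
            inter = len(projects[i] & projects[j])
            if inter:
                utility[(i, j)] = inter / len(projects[i] | projects[j])

    # Expert weights ~ UD[1, 10]; capacity is a budget fraction b of total weight.
    weights = [rng.randint(1, 10) for _ in range(n)]
    capacity = b * sum(weights)
    return projects, utility, weights, capacity
```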

Tracks

The following tracks will form the initial setup:

n_items=1000, budget=5
n_items=1000, budget=10
n_items=1000, budget=25
n_items=5000, budget=10
n_items=5000, budget=25

  1. D.S. Hochbaum et al. A fast and effective breakpoints heuristic algorithm for the quadratic knapsack problem. European Journal of Operational Research (2024). ↩︎

  2. Dorit S. Hochbaum, Zhihao Liu, and Olivier Goldschmidt. A Breakpoints Based Method for the Maximum Diversity and Dispersion Problems. SIAM Conference on Applied and Computational Discrete Algorithms. ↩︎

https://forum.tig.foundation/t/knapsack-instances-update/82#post_1 Fri, 06 Feb 2026 09:25:23 +0000 forum.tig.foundation-post-129
Sigma Update Part II

Part II completes the Sigma update, aligning TIG with industry and academic practice. This update implements the following changes:

  1. The Definition of a Solution
  2. Benchmark Averaging
  3. Challenge Tracks
  4. Two-Tier Solution Verification
  5. Benchmarker Access to Fuel

These changes will make the protocol fairer and simpler, whilst incentivising higher value innovation.

Part I is already live.


The Definition of a Solution

Current: In TIG, a solution is a nonce that solves a challenge instance to at least a benchmarker-specified quality threshold, while satisfying all instance constraints (e.g., in Vehicle Routing, vehicles must arrive no later than each customer’s due time; in the Quadratic Knapsack Problem, total weight must not exceed capacity).

Sigma Update Part II: Any nonce whose output satisfies the instance constraints is a solution. Quality no longer determines whether something counts as a solution—it determines how well it performs. Benchmarkers still have strong incentives to find high-quality solutions via the reward mechanism, which now uses Benchmark Averaging.

Benchmark Averaging

When committing to a benchmark, a benchmarker chooses:

  • an algorithm,
  • the hyperparameters,
  • a number of bundles (of nonces),
  • a challenge Track, and
  • a fuel limit.

Each bundle in the benchmark outputs a \text{bundle\_quality} and, once verified, is plotted on the Challenge Track Plot at the coordinate (Track, \text{bundle\_quality}).

The details:

  • The parameter Track controls the type of the challenge instances (see the Challenge Tracks section below).

  • A bundle is defined as a subset of nonces in a benchmark. Each track has a bundle parameter \eta\in \mathbb{N} which is defined as the number of nonces in a bundle. When committing to a benchmark, a benchmarker will commit to a track and a \text{num\_bundles}, and will then compute \eta \cdot \text{num\_bundles} nonces.

  • The quality of each bundle in the benchmark, \text{bundle\_quality}, is defined as the average of the qualities of the solutions in it.

  • After a benchmark has been verified, each of its bundles is plotted on the Challenge Track Plot at the point (Track, \text{bundle\_quality}).

  • If any nonce in a benchmark fails to yield a valid solution, the entire benchmark fails (the benchmarker reports the failure and the benchmark is dropped).

  • Since benchmarkers no longer commit to a solution quality, they must configure hyperparameters or fuel (e.g., run-time) to target higher-quality solutions.

  • Each challenge defines a minimum quality threshold; if any solution quality is below that threshold, the benchmark will be dropped.
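A minimal sketch of the bundle-averaging and drop rules above, assuming per-nonce qualities arrive as a flat list of \eta \cdot \text{num\_bundles} floats:

```python
def bundle_qualities(solution_qualities, eta, min_quality):
    """Split a benchmark's per-nonce qualities into bundles of eta nonces
    and average each bundle; return None (benchmark dropped) if any
    quality falls below the challenge's minimum threshold."""
    assert len(solution_qualities) % eta == 0, "expects eta * num_bundles nonces"
    if any(q < min_quality for q in solution_qualities):
        return None  # benchmark dropped
    return [
        sum(solution_qualities[i:i + eta]) / eta
        for i in range(0, len(solution_qualities), eta)
    ]

# Two bundles of eta=3 nonces each:
qs = bundle_qualities([0.9, 1.0, 0.8, 0.7, 0.9, 0.8], eta=3, min_quality=0.5)
print([round(q, 3) for q in qs])  # -> [0.9, 0.8]
```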

This approach better matches how performance is assessed in industry and academia, reduces variance versus baselines (curbing overly greedy methods), and simplifies the protocol by removing the separate “reliability” concept.

Challenge Tracks

The concept of difficulty parameters no longer exists in the TIG protocol, with Challenge Tracks and Benchmark Averaging replacing it. Benchmarkers are now required to commit to a particular challenge Track for each benchmark.

  • Each challenge will define a fixed list of allowed Track ids.

  • From this list, benchmarkers commit to benchmark a Track.

This prevents compute from being spread too thin over similar challenge instances, letting benchmarkers focus compute on obtaining higher-quality solutions. This aligns with academic benchmarking: for example, the Track could determine the size of the challenge instance. Tracks also let TIG define a range of sub-challenges with different instance-generation procedures, encouraging innovation in interesting problem features rather than simply changing the problem size.

Rewarding Tracks

Benchmarker rewards are no longer determined by the Pareto frontier. Tracks are rewarded as follows: within each track, bundles are ranked by \text{bundle\_quality}, and the top p bundles are defined as the qualifying bundles. A benchmarker's number of qualifying bundles is then the metric used in their challenge factor calculation.
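A sketch of the ranking rule, with illustrative track names and data layout:

```python
from collections import defaultdict

def qualifying_counts(plotted_bundles, p):
    """plotted_bundles: (track, benchmarker, bundle_quality) points.
    Within each track, rank bundles by quality and mark the top p as
    qualifying; return each benchmarker's number of qualifying bundles."""
    by_track = defaultdict(list)
    for track, who, quality in plotted_bundles:
        by_track[track].append((quality, who))
    counts = defaultdict(int)
    for bundles in by_track.values():
        bundles.sort(reverse=True)     # best quality first
        for _, who in bundles[:p]:     # top p qualify
            counts[who] += 1
    return dict(counts)

points = [("n=600", "alice", 0.9), ("n=600", "bob", 0.8),
          ("n=600", "carol", 0.7), ("n=700", "bob", 0.95)]
print(qualifying_counts(points, p=2))  # -> {'alice': 1, 'bob': 2}
```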

Two-Tier Solution Verification

This applies only to challenges that use a baseline algorithm.

The goal is to introduce more performant baselines, reducing variance and yielding a stronger comparative metric. Historically, baselines had to be extremely fast due to proof-of-work verification constraints protecting the network from DDOS attacks. With Sigma, verification is split into two tiers:

Tier 1 — Proof-of-work verification
Confirm the submitted solution is at least as good as the cheap baseline (e.g., pure greedy).

Tier 2 — Quality Measurement
Compute the solution’s quality using a sophisticated baseline.

Hence, when verifying a nonce, we carry out the following steps:

  1. Generate instance.
  2. Check that the solution satisfies the instance constraints:
    • If invalid → verification fails.
  3. Run cheap baseline algorithm (pure greedy).
    • If solution is not better than the cheap baseline → verification fails.
  4. Run sophisticated baseline algorithm (previous well-performing TIG algorithm).
    • Compute solution quality, check it matches the quality claimed by the benchmarker.
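The two-tier flow can be sketched as below. Every challenge-specific callable is a hypothetical stand-in, and the tier-2 measure is shown as quality relative to the strong baseline, which is one plausible reading rather than the protocol's confirmed definition:

```python
def verify_nonce(seed, solution, *, generate, is_valid, quality,
                 cheap_baseline, strong_baseline, claimed_quality):
    """Sketch of two-tier verification; all callables are stand-ins."""
    instance = generate(seed)                     # 1. generate instance
    if not is_valid(instance, solution):          # 2. constraint check
        return False
    greedy = cheap_baseline(instance)             # 3. tier 1: proof-of-work check
    if quality(instance, solution) <= quality(instance, greedy):
        return False
    strong = strong_baseline(instance)            # 4. tier 2: quality measurement
    measured = quality(instance, solution) / quality(instance, strong)
    return measured == claimed_quality            #    must match the claim

# Toy maximisation challenge: the "instance" is the seed and a solution's
# quality is just its value.
ok = verify_nonce(
    seed=0, solution=5, claimed_quality=0.5,
    generate=lambda s: s,
    is_valid=lambda inst, sol: sol >= 0,
    quality=lambda inst, sol: sol,
    cheap_baseline=lambda inst: 1,
    strong_baseline=lambda inst: 10,
)
print(ok)  # -> True
```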

Benchmarker Access to Fuel

In TIG, an algorithm’s fuel is used as a proxy for the compute it consumes. When starting a benchmark, benchmarkers now pre-commit to a fuel budget. If, while solving a nonce, an algorithm exceeds this budget, it is terminated and the current saved solution is returned instead — see Sigma update part 1.

Giving benchmarkers direct control over fuel lets them roughly control how long their algorithms will run. In addition, the maximum fuel limit is increasing, allowing benchmarkers to target higher-quality solutions when they are willing to spend more compute.

https://forum.tig.foundation/t/sigma-update-part-ii/78#post_1 Mon, 24 Nov 2025 16:16:55 +0000 forum.tig.foundation-post-125
Advance Submission: Vector Search

review_alex.pdf (155.7 KB)

https://forum.tig.foundation/t/advance-submission-vector-search/67#post_6 Fri, 19 Sep 2025 01:59:04 +0000 forum.tig.foundation-post-119
The Sigma Update

The Sigma update is the next step in the evolution of the protocol. It incorporates many of the improvements we’ve identified internally as well as recommendations from the community.

For Benchmarkers → A reworked deposit system:

  • TIG’s deposits are now integrated solely within the Optimizable Proof-of-Work mechanism, and are no longer part of cutoff.
  • Deposits are split into self-deposit and delegated-deposit, giving benchmarkers more control and strategic options.

For Innovators → A more realistic and flexible framework to develop algorithms.

  • Innovators now control algorithm inputs through hyperparameters.
  • Innovators manage an algorithm’s output in case it runs out of fuel or the nonce crashes.

In addition, this update will include the introduction of minimum and maximum frontiers, as well as the hiding of baseline-related fields and verify_solution from algorithms. These major enhancements will bring TIG in line with real-world development.

The following post outlines each of these six key changes in more detail:

  1. Cutoff.
  2. Separating Deposit Factors.
  3. Hyperparameter Inputs.
  4. Saved Solution Output.
  5. Minimum and Maximum Frontiers.
  6. Hiding Baseline Fields and Verify Solution.

1. Cutoff

Cutoff will no longer be linked to deposit. This prevents the protocol from becoming proof of stake and lowers the barrier to entry for benchmarkers.

Each benchmarker i is attributed a cutoff, denoted \text{cutoff}_{i}. Benchmarkers are capped by their \text{cutoff}_{i}; that is, benchmarker i can’t have more than \text{cutoff}_{i} qualifiers for challenge x. A benchmarker’s cutoff is recalculated every block:

\text{cutoff}_{i} = 1.2 \times \min_{x} \left( \text{num\_solutions}_{i,x} \right)

where \text{num\_solutions}_{i,x} is the number of active solutions benchmarker i has for challenge x.
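The recalculation can be sketched as follows; the challenge names are illustrative:

```python
def cutoff(num_solutions):
    """num_solutions: active-solution counts per challenge for one
    benchmarker. The cutoff is 1.2x the weakest challenge's count."""
    return 1.2 * min(num_solutions.values())

print(cutoff({"satisfiability": 50, "vehicle_routing": 40, "knapsack": 45}))
# -> 48.0
```

Taking the minimum over challenges means a benchmarker cannot boost their cap by concentrating on a single challenge.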

2. Separating Deposit Factors

This update will introduce two separate deposit factors: a self-deposit factor and a delegated-deposit factor. Both self‑deposit and delegated‑deposit will now feed into the influence metric, so benchmarkers must balance their deposits as well as their challenge output.

Because the deposit required is determined by the market rather than synthetically fixed, there’s no mandated minimum deposit. Moreover, since we weight the deposit factors as well as cap them, we ensure that Proof of Work remains the dominant anti-Sybil mechanism.

Details on how deposit factors are derived from TIG deposits are explained in detail in this forum post.

3. Hyperparameter Inputs

This update redefines how hyperparameters are handled in innovator submissions. Hyperparameters are tuning parameters for a given algorithm and will now be set as inputs for TIG algorithms.

Currently, innovators often submit multiple versions of essentially the same algorithm, one tuned to return high-quality solutions, say, and another optimised for speed, even when the only difference is the chosen values of the hyperparameters. This duplicates code and allows competing innovators to resubmit another’s algorithm with merely optimised hyperparameter choices rather than genuine innovative improvements.

Details of the Update

  • Innovators can now submit a single algorithm with variable hyperparameters. The base algorithm itself remains the same, but hyperparameters (e.g. a stopping condition such as the maximum number of iterations) are now variable.
  • Benchmarkers, when running the algorithm on a challenge, choose the hyperparameter input values.
  • All chosen hyperparameters are fully transparent to other benchmarkers.
  • Innovators may publish recommended hyperparameter values or ranges for particular problem-size regimes, but the ultimate choice lies with the benchmarker.

Benefits

  • Reduces unnecessary duplication in the system.
  • Encourages focus on genuine innovation, not trivial resubmissions.

4. Saved Solution Output

This update will introduce a save_solution function which innovators can call any number of times during the algorithm. If a nonce runs out of fuel or crashes, the last saved solution will be used as the solution, rather than returning nothing.

Currently, if an algorithm runs out of fuel before returning a full solution, it returns no value at all. This wastes partial progress.

With the new mechanism:

  • Innovators can save their best-known solution as frequently as wished during the runtime of their algorithm.
  • If evaluation halts prematurely, verify_solution is called on the last saved solution, rather than returning a non-solution for crashing or running out of fuel.

Innovators will need to incorporate this function into their algorithms to save their best-known solution periodically.

Formalisation

Assume the challenge is a maximisation problem and let f(s) denote the objective function. An algorithm generates a sequence of candidate solutions (s_1, s_2, \dots, s_t). At each time t chosen by the innovator, the save_solution function keeps track of the current best-known solution:

s^*_t = \arg\max_{1 \leq j \leq t} f(s_j).

If execution stops at time T, due to running out of fuel or the program crashing, the output solution is s^*_T. verify_solution is then called on s^*_T.
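These semantics can be sketched for a maximisation objective f; the class and names below are illustrative, not the tig-challenge API:

```python
class SavedSolution:
    """Sketch of save_solution semantics: keep the best-known candidate
    so far; on fuel exhaustion or a crash, the runtime returns the last
    saved best rather than nothing."""
    def __init__(self, objective):
        self.f = objective
        self.best = None

    def save_solution(self, s):
        if self.best is None or self.f(s) > self.f(self.best):
            self.best = s

saver = SavedSolution(objective=sum)
for candidate in ([1, 2], [5, 5], [3, 3]):   # s_1, s_2, s_3
    saver.save_solution(candidate)
print(saver.best)  # -> [5, 5], the argmax of f over saved candidates
```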

5. Minimum and Maximum Frontiers

We’re updating the current fixed minimum and maximum difficulty points to dynamic difficulty frontiers. These frontiers will be defined for each challenge based on testing results and algorithmic performance data.

  • A minimum frontier: Prevents trivial or spam submissions. A solution must exceed a minimal level of computational effort or improvement relative to a given threshold dependent on the difficulty parameters.
  • A maximum frontier: Prevents gaming by filtering out instances where the baseline algorithm performed exceptionally poorly.

Following the implementation of difficulty frontiers and the save-solution capability, sub-instances will be removed from all challenges where they are currently used. An upcoming protocol update will focus on mitigating variance in the baseline algorithm.

6. Hiding Baseline Fields and Verify Solution.

This update will hide baseline related fields and verify_solution from algorithms. This is achieved by making those fields/functions private to the tig-challenge crate, i.e. inaccessible from tig-algorithms.

For TIG challenges which include a better_than_baseline parameter, this update will remove the better_than_baseline parameter as an input for algorithms.

In both literature and real-world applications, algorithms tend not to have access to the optimal solution or to any metric indicating how far from optimal they are. The better_than_baseline input allowed algorithms to shortcut complete exploration of the search space S, artificially reducing realism, and could have unintentionally biased results toward algorithms that leveraged it as a tuning crutch.

This update will impact innovators who used better_than_baseline as a stopping condition. The goal is for hyperparameters and fuel to now dictate the stopping condition of an algorithm, aligning more with real world use cases.

https://forum.tig.foundation/t/the-sigma-update/72#post_1 Fri, 10 Oct 2025 15:18:40 +0000 forum.tig.foundation-post-118
Advance Submission: Vector Search

The Vector Search Challenge

The TIG vector search challenge has Innovators build algorithms that receive a database and query set of vectors. For each query vector, the algorithm must return a candidate database vector, such that the mean distance between the queries and the candidates is within some threshold. The database and query vectors are sampled from a mixture of Gaussians. Each instance of the TIG challenge is a new sampling of the database and query set.

Stat-filter High-Level Algorithm Flow

The Stat-filter algorithm has two main components: first, a quantization is performed on all the vectors; then, a MAD filter using the L2 norms of the bit-quantized vectors is applied before the search phase begins.

  • Quantization.
    First, the per-dimension min/max of the vectors is identified, and a data shift is applied: let \text{overall}\_\min be the minimum across all dimensions of all database vectors; if it is negative, a uniform shift is applied (every component of every vector has \text{overall}\_\min subtracted from it), so all data becomes non-negative. Using the per-dimension min/max, the vectors are then quantized: each component is binned into one of either 4 or 16 bins, corresponding to 2-bit or 4-bit quantization. By default 4-bit quantization is used, unless STATFILT_BIT_MODE=2, in which case 2-bit quantization is used.

  • MAD filter.
    First, the L2 norms of the quantized data are calculated. Note that MAD is computed on the quantized norms, not the original norms. The MAD creates a length-based filter that rejects database vectors that are unlikely to be close to a query, based on their quantized L2 norms. Practically: the quantized database norms are sorted and their median is found; the absolute deviation between each norm and the median is computed and put into a list, and the median of this list is the value \text{mad}. A threshold is constructed via the formula

\text{threshold} = \text{scale} \times \text{mad} \times \text{MAD\_SCALE\_NORMAL}

For each query vector q and database vector p if

|\|q\|_2-\|p\|_2|>\text{threshold},

then, when performing a search with q, we will skip p. The two scaling terms are:

  • \text{MAD\_SCALE\_NORMAL} is set to 1.4826. This constant comes from the property that for a normal distribution X \sim N(\mu,\sigma^2), if one defines \text{MAD} = \text{median}(|X - \text{median}(X)|), then \sigma \approx 1.4826 \times \text{MAD}. In this submission, X is the quantized L2 norm of a vector (which, in high dimensions, is approximately normal). Thus, the threshold effectively measures “how many standard deviations away from the median” a norm lies. This filtering is justified by the reverse triangle inequality.

  • The parameter \text{scale} adapts selectivity: for few queries (≤700) it is 0.20, giving fast but aggressive filtering (~80–90% removed), while for many queries (≥2500) it is 1.00, giving broader coverage (~50–60% removed) to preserve accuracy.
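The gate can be sketched in scalar form; the actual submission runs on the GPU and will differ in detail:

```python
import statistics

MAD_SCALE_NORMAL = 1.4826  # sigma ~ 1.4826 * MAD for a normal distribution

def mad_filter(db_norms, query_norm, scale):
    """Indices of database vectors whose quantized L2 norm is within
    scale * mad * MAD_SCALE_NORMAL of the query's norm; the rest are
    skipped during search."""
    med = statistics.median(db_norms)
    mad = statistics.median(abs(x - med) for x in db_norms)
    threshold = scale * mad * MAD_SCALE_NORMAL
    return [i for i, x in enumerate(db_norms)
            if abs(query_norm - x) <= threshold]

# Norm 30 is a far outlier and norm 10 falls just outside the gate:
print(mad_filter([10, 11, 12, 13, 30], query_norm=12, scale=1.0))  # -> [1, 2, 3]
```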

Brute Force Search. For each query, a GPU block computes the quantized dot product between it and non-filtered database vectors using bit-sliced operations. Each bit plane is compared using AND operations, then popcount tallies the matches. These counts are weighted by powers of 2 and summed to reconstruct the approximate dot product in quantized space. Each thread tracks its own top-K candidates, then all threads combine results to find the final top-K for exact reranking.
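The bit-sliced reconstruction can be illustrated in scalar form: AND each pair of bit planes, popcount the matches, and weight by powers of two. This is an illustration of the technique, not the submission's kernel:

```python
def pack_planes(vec, bits):
    """Pack one quantized vector into bit planes: plane b holds bit b of
    every component, as a single integer bitmask over dimensions."""
    planes = []
    for b in range(bits):
        mask = 0
        for k, v in enumerate(vec):
            if (v >> b) & 1:
                mask |= 1 << k
        planes.append(mask)
    return planes

def bitsliced_dot(a_planes, b_planes):
    """Reconstruct the dot product of two quantized vectors from their
    bit planes via AND + popcount, weighted by powers of two."""
    total = 0
    for i, pa in enumerate(a_planes):
        for j, pb in enumerate(b_planes):
            total += (1 << (i + j)) * bin(pa & pb).count("1")
    return total

a, b = [3, 1, 2], [2, 3, 1]  # 2-bit quantized components
print(bitsliced_dot(pack_planes(a, 2), pack_planes(b, 2)))  # -> 11 (= 3*2+1*3+2*1)
```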

Performance

Here we evaluate the performance of the submission by testing it against other vector search algorithms. We stick to vector search algorithms that have a low build time. We test the algorithms on the fashion-mnist-784-euclidean dataset.


Figure: two bar plots comparing stat_filter against IVF-Flat, IVF-PQ, and brute force: total time taken (left) and recall, i.e. the proportion of candidate neighbours that were the true nearest neighbour (right). The submitted algorithm stat_filter achieves 0.95 recall whilst being the fastest algorithm. We used the provided SOTA repo to produce these results. We also point to the evidence form, which shows stat_filter's performance against improved_search, a previously high-performing merged TIG algorithm.

Claim of Novelty

According to the Advance Evidence Form, the claim of novelty relies on the following:

  1. The method itself is entirely new,

  2. The method represents a new combination of prior art,

  3. The method applies known techniques in a novel way that produces a distinct technical effect.

As described above in the Stat-filter High-Level Algorithm Flow, the submission combines bit quantization with a Median Absolute Deviation (MAD) gating layer to eliminate low-probability candidate vectors before the costly distance computation stage of an Approximate Nearest Neighbour (ANN) pipeline. In the search phase, the method then applies an optimized brute-force approach, where a quantized dot product is used to efficiently search for approximate nearest neighbours. Within the ANN literature on filtering, this approach clearly belongs to the pre-filtering category: for each query, the database is filtered first, and search is performed only on the reduced set.

We now assess this method against the three points of novelty, noting that only one of these needs to be true in order to be considered “novel” (novelty being a requirement to be eligible for Advance rewards):

  1. Entirely new method
    Neither MAD filtering nor bit quantization is new in itself, and bit quantization methods for ANN are well established [1]. To the best of our knowledge, however, MAD filtering has not been applied in this specific way to ANN search. Similar tasks for MAD include outlier detection [2] and robust data scaling [3]. General filtering in ANN is also not new, with common examples including:
  • Metadata filters: restricting search to subsets of vectors based on attached scalar values [4]

  • IVF methods: using cluster centroids for coarse partitioning (IVF-flat) or quantization (IVF-PQ) [5]

  • Locality-Sensitive Hashing (LSH): bucketing vectors by random hash functions [6]

  2. New combination of prior art
    To the best of our knowledge, there are no prior examples of combining bit-quantized MAD L2-norm filtering with bit-sliced similarity evaluation in this way for ANN search. However, using bit-quantized vectors for distance calculations is not, in itself, a new combination.

  3. Novel technical effect
    The specific way these techniques are combined produces tangible technical benefits. MAD L2-norm filtering becomes significantly faster because it operates on quantized values, which in turn reduces the number of candidate vectors considered in the distance search. Since bit-sliced similarity itself requires bit quantization, the pipeline is tightly coupled, leading to improved efficiency overall. However, as advised by the submitter, when running the method on heavy-tailed distributions (like the SIFT dataset), STATFILT_MAD_SCALE must be 0; any non-zero value will cause severe recall degradation. Hence, when STATFILT_MAD_SCALE=0, the technical effect of the submission is attributable only to the quantization and bit-slicing, not the pre-filter.

The submitted evidence form reflects this analysis. It acknowledges that the method does not meet criterion (1) but makes a case that it qualifies for novelty under criteria (2) and (3).

Claim of inventiveness

Having established novelty, we now turn to inventiveness (or “non-obviousness”). The evidence form from Granite Labs argues that the submission is non-obvious because a person of ordinary skill in the art (POSITA) would not naturally expect that combining a robust statistical measure from outside the ANN field with coarse quantization would yield state-of-the-art results.

Most existing ANN filters are based on mean statistics or clustering. In this context, the use of the Median Absolute Deviation (MAD) as a pre-filter is unconventional. MAD is typically used in outlier detection or robust scaling, but its application here to reject improbable candidates in a high-dimensional ANN setting is not an obvious choice. The method achieves two outcomes:

  • predictable speedups from filtering and low-precision computation, and

  • unexpectedly high recall, even after pre-filtering and quantized similarity evaluation.

The latter result strengthens the inventiveness claim. A POSITA might anticipate efficiency gains but would reasonably doubt whether accuracy could be preserved under such coarse approximations. The fact that the method demonstrates both efficiency and strong recall suggests that its effectiveness is not obvious from prior art.

Summary

The submission combines two well-known mathematical methods but applies them jointly to ANN search in a way that, to the best of our knowledge, has not previously been documented. The combination achieves clear technical benefits, as shown both in the evidence form and in our performance evaluations. Importantly, this method should not be directly compared to ANN algorithms that rely on pre-built or learned structures, since those approaches achieve faster query times at the cost of substantially higher build times.



  1. Weaviate Documentation. Binary Quantization. Available at: https://docs.weaviate.io/weaviate/concepts/vector-quantization#binary-quantization. ↩︎

  2. Zilliz. How do I detect and handle outlier embeddings? Available at: https://zilliz.com/ai-faq/how-do-i-detect-and-handle-outlier-embeddings. ↩︎

  3. Kalinin, A. A., Arevalo, J., Serrano, E., Vulliard, L., Tsang, H., Bornholdt, M., Muñoz, Á. F., Sivagurunathan, S., Rajwa, B., Carpenter, A. E., et al. (2025). A versatile information retrieval framework for evaluating profile strength and similarity. Nature Communications, 16(1), 5181. ↩︎

  4. Lin, Y., Zhang, K., He, Z., Jing, Y., & Wang, X. S. (2025). Survey of Filtered Approximate Nearest Neighbor Search over the Vector-Scalar Hybrid Data. arXiv preprint arXiv:2505.06501. ↩︎

  5. Jégou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128. ↩︎

  6. Gionis, A., Indyk, P., & Motwani, R. (1999). Similarity search in high dimensions via hashing. In Proceedings of VLDB (Vol. 99, pp. 518–529). ↩︎

https://forum.tig.foundation/t/advance-submission-vector-search/67#post_5 Sat, 13 Sep 2025 12:08:53 +0000 forum.tig.foundation-post-115
Advance Submission: Vector Search Thanks to @Aoibheann for opening the discussion, and to @Jake_Logos for weighing in; your two perspectives, one focused on practical overhead, the other on operational reality, capture exactly the gap we set out to close.

The core idea behind Stat_Filter is simple and unified: do most of the similarity work in the compressed domain, then verify a tiny shortlist exactly. Concretely, we use a bit‑sliced, low‑bit comparator, the same mechanism we originally prototyped at 1–2 bits and have now expanded to 2–4 bits per dimension, to score candidates with GPU bitwise ops + popcount across bit‑planes. We optionally apply a robust MAD gate when the distribution helps (and disable it when it hurts, e.g., on heavy‑tailed SIFT‑1M). Finally, we run a small FP16 exact re‑rank over the shortlist to return the true nearest neighbor. There are two tunable knobs, (1) K (the shortlist size) and (2) MAD (gate aggressiveness), which let practitioners move along the speed/recall curve without changing the method. This is incredibly useful by itself, reducing context switching and cognitive overhead.

For those focused on theoretical foundations and implications, the contribution is a clean, analyzable pipeline: rate-distortion in the quantizer; bit‑plane weighted scoring that’s monotone with respect to the underlying metric in expectation; and a correctness‑preserving exact re‑rank that bounds error while keeping the computational budget in O(K) per query. We also address the details that usually get hand‑waved: per‑dimension range selection (e.g., ~80% caps on heavy‑tailed dims), dataset‑aware tunable gating policies (MAD off for SIFT‑1M, moderate MAD for Fashion‑60K), and explicit reporting of “reported” vs “actual” recall when an answer key contains label noise.
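To make the pipeline concrete, here is a minimal NumPy sketch of the compressed‑domain screening plus exact re‑rank idea (illustrative only, not our production GPU kernel; function names and parameter values are ours for this sketch): vectors are scalar‑quantized to a few bits per dimension, scored with XOR + popcount over packed bit‑planes, and a small shortlist is re‑ranked exactly.

```python
import numpy as np

def quantize(X, lo, hi, bits=2):
    """Scalar-quantize each dimension into 2**bits levels over [lo, hi]."""
    t = (X - lo) / np.maximum(hi - lo, 1e-12)
    return np.clip((t * (1 << bits)).astype(np.int64), 0, (1 << bits) - 1)

def bitplanes(codes, bits=2):
    """Split integer codes into packed uint8 bit-planes for XOR + popcount scoring."""
    return [np.packbits(((codes >> p) & 1).astype(np.uint8), axis=1)
            for p in range(bits)]

def search(db, q, K=8, bits=2):
    """Compressed-domain shortlist, then a tiny exact re-rank; returns the NN index."""
    lo, hi = db.min(axis=0), db.max(axis=0)
    dbp = bitplanes(quantize(db, lo, hi, bits), bits)
    qp = bitplanes(quantize(q[None, :], lo, hi, bits), bits)
    # approximate score: per-plane XOR + popcount, weighted by bit significance
    score = sum((1 << p) * np.unpackbits(dbp[p] ^ qp[p], axis=1).sum(axis=1)
                for p in range(bits))
    cand = np.argpartition(score, K)[:K]                   # shortlist of K candidates
    exact = np.linalg.norm(db[cand] - q[None, :], axis=1)  # small exact pass
    return int(cand[np.argmin(exact)])
```

On a GPU the per-plane XOR + popcount becomes native bitwise instructions; the NumPy version above only mirrors the arithmetic.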

With a focus on industry and production applications, the meaning is straightforward: we move the cost out of index rebuilds (often run on 64–128 CPUs for flashy benchmark numbers, but typically performed as single‑CPU builds “in the wild”) and into fast, predictable GPU work. There’s essentially no heavy index to build or warm: you can ingest and serve immediately; the latency envelope is dominated by lightweight bit‑ops and a tiny exact pass; and the operational surface area shrinks significantly. We also evaluated in a way that mirrors how teams actually deploy: single‑CPU builds, one GPU, and results reported as total time (build + search), not just search‑only. That fairness constraint makes the numbers believable and portable.

This raises the question: do classic IVF/HNSW/PQ approaches still have a place? Absolutely; we’re not suggesting this is a replacement for everything. If your data is largely static and your batches are huge, amortization can work in favor of IVF et al. But modern production systems increasingly live in the small‑to‑mid batch, continuously updating regime. That’s the operating point Stat_Filter targets, and it’s where we believe replacing most FLOPs with bit‑ops and re‑ranking only a handful of candidates is a very good, and quite novel, trade. In practice, this can sit in front of existing stacks as a drop‑in prefilter, or stand on its own when rebuild windows are the bottleneck.

This qualifies as an Advance because it’s not a clever micro‑optimization of a known index; it’s a completely different cost model for ANN that remains accurate by construction (thanks to exact re‑rank), hardware‑efficient (with bit‑sliced GPU scoring), and operationally honest (using single‑CPU‑build totals reflecting real-world dynamics). It generalizes across datasets with clear, reproducible policies, and it opens room for hybrid designs that combine compressed‑domain screening with light partitioning or graphs where appropriate.

In short, Stat_Filter makes vector search less brittle and more deployable in the places real systems actually live in 2025 and beyond. That is the kind of algorithmic shift an Advance should recognize. We welcome careful scrutiny of the code, the settings, and the fairness constraints, and we are confident the community will find this a solid foundation that others can build on and commercialize.

Looking forward to hearing from the community and happy to discuss further.

Inventor,
Granite Labs LLC

]]>
https://forum.tig.foundation/t/advance-submission-vector-search/67#post_3 Wed, 10 Sep 2025 15:50:28 +0000 forum.tig.foundation-post-112
Update to the TIG Reward Mechanism TIG operates on an Optimizable Proof of Work (OPoW) concept in which benchmarkers are rewarded in proportion to their Proof of Work, whilst being incentivized to balance different factors evenly during the PoW process.

The forthcoming Sigma Update protocol redesign adjusts how those rewards are calculated. Both self‑deposit and delegated‑deposit now feed into the influence metric, so benchmarkers must balance their deposits as well as their challenge output. Because the required deposit is determined by the market rather than synthetically fixed, there is no mandated minimum stake. Moreover, since we weight the deposit factors as well as cap them, we protect against the protocol becoming proof of stake. The proposed mechanism is an added defence against Sybil attacks.

Benchmarker Rewards

Benchmarker rewards are in essence calculated in the same way as before: benchmarkers are rewarded for the amount of computational work they do and for how well that work is balanced. The total rewards distributed for proof‑of‑work done by benchmarkers is a fraction \text{BM%} of the total block reward:

\text{opow_reward_pool} = \text{block_reward} \cdot \text{BM%}

Each benchmarker i receives \text{benchmarker_reward}_i from a block (before delegator deductions), where

\text{benchmarker_reward}_i = \text{opow_reward_pool} \cdot \text{influence}_i

The term \text{influence}_i represents how much proof‑of‑work benchmarker i has performed and how well they have balanced their factors. Before giving the explicit formula for influence, we detail the factors used to calculate it. There are two types of factors: challenge factors and deposit factors. Factors are recalculated each block.


Challenge Factors

Challenge factors have been decoupled from reliability; each factor is now based solely on the number of qualifying solutions a benchmarker obtains. Challenge factors are derived from individual challenges. For each challenge x, benchmarker i has an associated challenge factor calculated from their qualifying solutions:

\text{challenge_factor}_{i,x} = \frac{\text{num_qualifiers}_{i,x}}{\text{total_qualifiers}_x}

Here \text{num_qualifiers}_{i,x} is the number of qualifiers benchmarker i has for challenge x in the current block, and \text{total_qualifiers}_x = \sum_i \text{num_qualifiers}_{i,x}. If there are currently n challenges, then each benchmarker has n challenge factors—one for each challenge.
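As an illustrative sketch (function name hypothetical), the per-challenge factors are just shares of the qualifier counts:

```python
def challenge_factors(qualifiers):
    """Map each benchmarker's qualifier count for one challenge to its share
    of the total qualifiers for that challenge."""
    total = sum(qualifiers.values())
    return {i: q / total for i, q in qualifiers.items()}
```

For example, a benchmarker holding 30 of 100 qualifiers for a challenge has a challenge factor of 0.3 for that challenge.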


Deposit Factors

Deposit factors are derived from TIG deposits. Each benchmarker i has

  • a self deposit factor \text{self_deposit_factor}_i, determined by their own deposit, and
  • a delegated deposit factor \text{delegated_deposit_factor}_i, determined by deposits delegated to them.

They are calculated as follows:

\text{self_deposit_factor}_i = \min\!\Biggl\{ \langle\text{challenge_factor}_i\rangle \times 1.2,\; \frac{\text{self_deposit}_i}{\text{total_deposit}} \Biggr\}
\text{delegated_deposit_factor}_i = \min\!\Biggl\{ \langle\text{challenge_factor}_i\rangle \times 1.2,\; \frac{\text{delegated_deposit}_i}{\text{total_delegated_deposit}} \Biggr\}

Note that \langle\text{challenge_factor}_i\rangle =\frac{1}{n}\sum_x \text{challenge_factor}_{i,x} is the average of the challenge factors and measures how actively a benchmarker is participating in the current block. The \min function caps a benchmarker’s deposit factor based on their challenge performance; this stops the protocol from becoming proof of stake.

Here

  • \text{self_deposit}_i is the amount of TIG that benchmarker i has deposited,
  • \text{delegated_deposit}_i is the amount of TIG delegated to benchmarker i through the delegated deposit mechanism, and
  • \text{total_deposit} and \text{total_delegated_deposit} are the total deposits and delegated deposits of all benchmarkers whose cutoff is non‑zero.
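A minimal sketch of the capping rule (function name hypothetical); the same \min form applies to either deposit type:

```python
def deposit_factor(avg_challenge_factor, deposit, total_deposit):
    """Deposit share, capped at 1.2x the benchmarker's average challenge factor."""
    return min(avg_challenge_factor * 1.2, deposit / total_deposit)
```

A benchmarker with a large deposit but little proof-of-work is capped by the first term, so stake alone cannot buy influence.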

Influence

To calculate the influence of a benchmarker i, the following steps are performed:

  1. Collect the set of factors for benchmarker i : \{f_j ~:~ f_j ~\text{is a factor for benchmarker }i\}. Let \hat f_i be the vector of these factors.
  2. Assign each factor f_j a weight w_j. The weights are normalised and are the same for all benchmarkers.
  3. Compute the weighted mean \langle \hat f_i \rangle and variance \sigma^2_i of the factors of benchmarker i. Set \mathcal{S}_i=\frac{\sigma_i^2}{\langle \hat f_i \rangle (1-\langle \hat f_i \rangle )}.
  4. We then set the influence of benchmarker i to be:
\text{influence}_i \propto \langle \hat f_i \rangle \exp\Big\{ -k \mathcal{S}_i \Big\}

Note: In the above formula:

  • k is a constant currently set to 1.5.
  • The exponential term is bounded in [0,1].
  • As \mathcal{S}_i increases, the benchmarker is penalized for their imbalance; with no variation, the exponential term takes the value 1 and the benchmarker is not penalized.
  • The term \mathcal{S}_i is bounded in [0,1]. For a fixed mean \langle \hat f_i \rangle, the maximum of \sigma_i^2 is \langle \hat f_i \rangle(1-\langle \hat f_i \rangle), hence \mathcal{S}_i measures the spread of the data relative to its mean.
  • Weighting lets us:
    • Onboard new challenges smoothly.
    • Weight Deposit Factors differently from Challenge Factors.
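Putting the steps together, here is a small NumPy sketch of the influence computation (the factor values and weights in the usage example are illustrative; only k = 1.5 is taken from the update):

```python
import numpy as np

def influence_scores(factors, weights, k=1.5):
    """factors: (num_benchmarkers, num_factors) array; weights shared by all."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # step 2: normalised weights
    mean = factors @ w                                # step 3: weighted mean
    var = ((factors - mean[:, None]) ** 2) @ w        # step 3: weighted variance
    S = var / np.maximum(mean * (1.0 - mean), 1e-12)  # imbalance S_i in [0, 1]
    raw = mean * np.exp(-k * S)                       # step 4: penalised mean
    return raw / raw.sum()                            # influences sum to 1
```

With equal weighted means, a benchmarker with zero variance across factors receives the larger influence, reflecting the balance incentive.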

Figure 1 – Influence vs deposits for six benchmarkers with equal qualifiers.
Five maintain deposits of 100 while one varies from 0 to 100. The plot shows convergence to equal influence (16.67 %) when all benchmarkers become identical. Deposit weight = 0.1 highlights the protocol’s performance‑focused reward mechanism.

]]>
https://forum.tig.foundation/t/update-to-the-tig-reward-mechanism/68#post_1 Fri, 10 Oct 2025 14:59:15 +0000 forum.tig.foundation-post-111
Advance Submission: Vector Search I think a lot of people might not realize how huge this advance submission actually is. Most of the current SOTA vector search methods (HNSW, IVF, PQ, etc.) are only really fast once you’ve built a heavy index. This is fine if your data never changes, but in real life data is constantly being updated. Every rebuild costs time and money, and that’s where most systems get bogged down.

The thing that is different with stat_filter is that this approach basically skips the heavy index step and still manages to be both fast and accurate. It shifts the expensive math into lightweight GPU operations and then double-checks a tiny shortlist to make sure nothing is missed. The end result is that you get near-perfect accuracy but at speeds that are tens or even hundreds of times faster in the situations that matter most, when your dataset is changing all the time.

That has huge implications for real-world systems such as recommendation engines, fraud detection, search, even retrieval-augmented generation for LLMs. This is not just a minor improvement, it is a massive efficiency improvement. At scale, this can translate directly into massive cost savings and faster performance for services people rely on every day.

While stat_filter currently won’t topple IVF/HNSW, it has certainly found a niche that will be very attractive for someone.

The guys at Granite Labs have done a fantastic job with this vector_search algorithm, one that truly embodies what TIG is all about. This is proper innovation this! :slight_smile:

In my opinion this deserves advance rewards: it’s not just a simple improvement, it’s an absolute game-changer and sets the foundations for changing the vector_search landscape.

]]>
https://forum.tig.foundation/t/advance-submission-vector-search/67#post_2 Sun, 07 Sep 2025 08:02:27 +0000 forum.tig.foundation-post-110
Advance Submission: Vector Search Advance Rewards Submission

We are delighted to share that our first Advance submission for Vector Search has been made public today!

The algorithm, created by a research team at Granite Labs LLC, is now open for the community to review.

Please see here: Advance Evidence Form.
The code submission that embodies the method described above: Code Submission.

You can compare this submission against state-of-the-art (SOTA) algorithms using Granite Labs LLC’s TIG-SOTA-reproduce repo.

You are invited to explore the evidence and code and prepare to cast your vote in the token-weighted vote on the submission’s eligibility for Advance rewards. Voting will open at the beginning of the next round (round 83) and remain open until the end of that round.

Vector Search

Vector search is a powerful method for finding similar items within a massive dataset. It works by converting data, whether it’s a document, an image, or even a molecule, into a numerical representation known as a vector.

Vectors capture the most important attributes of an item, and items with shared traits will have vectors that are numerically close to one another. Vector search, also known as Approximate Nearest Neighbor (ANN) search, is the process of efficiently finding the “closest” vectors in a dataset to a given “query” vector.

You likely interact with vector search algorithms daily without realizing it. When Netflix recommends a particular series, it’s using vector search. It converts your history into a vector of your viewing preferences, then matches it against similar shows.

Why It Matters

Vector search is a fundamental technology that powers countless applications, especially in the age of big data and AI. It’s crucial for:

  • Navigating Huge Datasets: It excels at sifting through millions or even billions of data points to find relevant results quickly.
  • Handling Complex Data: It can efficiently search high-dimensional data, like the complex vectors that represent images or text.
  • Powering Real-Time Applications: Its speed makes it ideal for recommendation engines, search systems, fraud detection, and Retrieval-Augmented Generation (RAG), which helps Large Language Models (LLMs) access more relevant and up-to-date information.

Applications

The Vector Search challenge hosted on TIG focuses on an important area: vector search on dynamic datasets. Most real-world data isn’t static; it’s constantly changing. New content is created, old content is updated, and some is deleted. For vector search to be effective in these scenarios, the system must handle these continuous updates in real time without degrading search quality or speed.

This is essential for applications such as:

  • Online Data Collection: Such as in autonomous navigation systems that constantly process new sensor data.
  • Real-Time Machine Learning: Where ML models are continuously updated, changing the vector representations of data.
  • Dynamic Robotics: For tasks like motion planning in ever-changing environments.
  • Molecular Similarity Search: Accelerating drug discovery by rapidly finding compounds with similar molecular structures.

Vector search underpins critical production systems, many of which run continuously 365 days a year. Improvements in efficiency compound into substantial savings in compute, energy, and infrastructure costs.

This submission represents a potential step forward in one of the most fundamental problems in computer science. Your participation in reviewing, discussing, and voting will help determine whether it qualifies for Advance Rewards.

]]>
https://forum.tig.foundation/t/advance-submission-vector-search/67#post_1 Fri, 05 Sep 2025 16:58:29 +0000 forum.tig.foundation-post-109
Upcoming Challenge: Neural Network Gradient Descent Here’s a quick update on the data generation procedure for the upcoming Neural Network Gradient Descent challenge. The data is still distributed as a Gaussian process; however, this is now done only approximately, via Random Fourier Features (RFFs). This change was made for memory efficiency.

The dataset is generated by adding white noise to a smooth random function, constructed as

y_i = f(\mathbf{x}_i) + \xi_i, \quad f(\mathbf{x}) = \mathbf{a} \cdot \boldsymbol{\phi}(\mathbf{x}), \quad \mathbf{a} \sim \mathcal{N}(\mathbf{0}_K, \mathbf{I}_K),

where the inputs \mathbf{x}_i are sampled uniformly at random from the unit hypercube. The basis \boldsymbol{\phi}(\mathbf{x}) is built using RFFs, with K randomised basis functions.

The dataset \mathcal{D} is then split into training \mathcal{D}_{\text{train}}, validation \mathcal{D}_{\text{val}}, and test \mathcal{D}_{\text{test}} sets of sizes N_{\text{train}}, N_{\text{val}}, N_{\text{test}}.

To ensure the underlying functions are infinitely differentiable, we use RBF kernels (see Rasmussen & Williams, Gaussian Processes for Machine Learning, 2006). We approximate the RBF kernel using Random Fourier Features (Rahimi & Recht, Random Features for Large-Scale Kernel Machines, NeurIPS 2007).

The features are constructed as

\boldsymbol{\phi}(\mathbf{x}) = \sqrt{\frac{2}{K}} \left[ \cos{(\boldsymbol{\omega}_1 \cdot \mathbf{x} + b_1)}, \ldots, \cos{(\boldsymbol{\omega}_K \cdot \mathbf{x} + b_K)} \right],

where

  • \boldsymbol{\omega}_j \sim \mathcal{N}(0, \tfrac{1}{\ell^2} I) are frequencies drawn from a Gaussian distribution determined by the kernel length scale \ell, and

  • b_j \sim \text{Uniform}(0, 2\pi) are random phase shifts.

  • The parameter \ell controls the length scale of fluctuations in f.
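The construction above can be sketched as follows (a minimal NumPy version with illustrative parameter values; the challenge’s actual generator may differ in details):

```python
import numpy as np

def rff_dataset(n, dim, K, ell, noise_std, rng):
    """Approximate a GP sample with an RBF kernel via Random Fourier Features."""
    omega = rng.normal(0.0, 1.0 / ell, size=(K, dim))  # frequencies ~ N(0, I/ell^2)
    b = rng.uniform(0.0, 2.0 * np.pi, size=K)          # random phase shifts
    a = rng.normal(size=K)                             # coefficients ~ N(0_K, I_K)
    X = rng.uniform(size=(n, dim))                     # inputs in the unit hypercube
    phi = np.sqrt(2.0 / K) * np.cos(X @ omega.T + b)   # RFF basis phi(x)
    y = phi @ a + noise_std * rng.normal(size=n)       # smooth f(x) + white noise
    return X, y
```

Splitting the resulting (X, y) into training, validation, and test sets then proceeds as described above.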

]]>
https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_7 Tue, 02 Sep 2025 08:08:51 +0000 forum.tig.foundation-post-108
[SOTA Comparison] Vehicle Routing Our Vehicle Routing Challenge Evaluator provides a streamlined framework for benchmarking TIG Vehicle Routing algorithms against state-of-the-art (SOTA) methods on key academic datasets.

The goal of this thread is to encourage community involvement in shaping our evaluation framework. We invite you to contribute by engaging with this post – read our rationale for the current approach, share your feedback, suggest new benchmark datasets, propose complementary comparison metrics, or recommend additional SOTA algorithms to include.

The remainder of this post explains the rationale behind our current selection of datasets, benchmark SOTA algorithms, and comparison metrics, and outlines the features we are planning to add in future updates.

Datasets

The evaluation suite currently supports one primary benchmark dataset for the Vehicle Routing with Time Windows Problem (VRPTW). All instances and their corresponding best known solution (BKS) are sourced from CVRPLIB.

Currently supported benchmark dataset:

  • Homberger–Gehring Instances[1]: An extended VRPTW benchmark consisting of 300 instances: 10 instances for each of the 6 Solomon class types at customer sizes of 200, 400, 600, 800, and 1,000.

Benchmark instance types, defined by Solomon (1987)[2], are categorized according to customer distribution patterns: random (R), clustered (C), or random-clustered (RC), and the tightness of time windows: tight (1) or loose (2). The instances in TIG align most closely with the RC1 category.

SOTA Benchmark Methods

Rationale for Selected Method

To provide a relevant and rigorous comparison, we benchmark TIG algorithms against the SOTA method “Hybrid Genetic Algorithm with Adaptive Diversity Management” (HGSADC). The results for HGSADC are sourced from the paper by Vidal et al. (2013)[3].

Vidal’s Hybrid Genetic Search (HGS) is widely regarded as one of the most effective algorithms for solving a wide range of Vehicle Routing Problem (VRP) variants. The version included in this evaluator, HGSADC, is tailored for VRPTW. Its performance in terms of both solution quality and runtime is a benchmark milestone we aim to surpass.

Comparison Metrics

Similar to how we compare solution quality in our knapsack evaluator, we evaluate algorithm performance using the Relative Percentage Deviation (RPD) from the Best Known Solution (BKS). It is calculated as:

RPD(\%)=\frac{\text{solution_distance} - \text{BKS}}{\text{BKS}}\times100

This metric focuses on total distance, as this is the main objective function we aim to minimize, and does not consider the number of routes.
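As a minimal sketch (helper name hypothetical), using the BKS as the reference denominator:

```python
def rpd(solution_distance, bks):
    """Relative Percentage Deviation of a solution's total distance from the BKS."""
    return (solution_distance - bks) / bks * 100.0
```

For example, a solution of total distance 105 against a BKS of 100 gives an RPD of 5%.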

The evaluator currently includes two visualisations:

  • A graph tracking the average RPD of the top-earning TIG algorithm by round against the SOTA algorithm HGSADC for each benchmark dataset type.
  • A graph comparing the average RPD by instance size across benchmark instance types for the top-performing TIG algorithms alongside HGSADC.

Future Work

We are continually enhancing the evaluation suite, with several key improvements planned:

  • Support for additional datasets, including the original Solomon VRPTW instances.
  • Precise runtime comparisons to better evaluate the speed and efficiency of TIG algorithms relative to SOTA methods.
  • Inclusion of more SOTA comparison algorithms.

  1. Gehring, H. and Homberger, J., 1999, May. A parallel hybrid evolutionary metaheuristic for the vehicle routing problem with time windows. In Proceedings of EUROGEN99 (Vol. 2, pp. 57-64). Springer Berlin. ↩︎

  2. Solomon, M.M., 1987. Algorithms for the vehicle routing and scheduling problems with time window constraints. Operations Research, 35(2), pp.254–265. ↩︎

  3. Vidal, T., Crainic, T.G., Gendreau, M. and Prins, C., 2013. A hybrid genetic algorithm with adaptive diversity management for a large class of vehicle routing problems with time-windows. Computers & Operations Research, 40(1), pp.475–489. ↩︎

]]>
https://forum.tig.foundation/t/sota-comparison-vehicle-routing/65#post_1 Fri, 15 Aug 2025 12:33:44 +0000 forum.tig.foundation-post-106
[SOTA Comparison] Knapsack Our Knapsack Challenge Evaluator provides a streamlined framework for benchmarking TIG Knapsack algorithms against state-of-the-art (SOTA) methods on key academic datasets.

The goal of this thread is to encourage community involvement in shaping our evaluation framework. We invite you to contribute by engaging with this post – read our rationale for the current approach, share your feedback, suggest new benchmark datasets, propose additional comparison metrics, or recommend new SOTA algorithms to include.

The remainder of this post explains the rationale behind our current selection of datasets, benchmark SOTA algorithms, and comparison metrics, and outlines the features we are planning to add in future updates.

Datasets

The evaluation suite uses several benchmark datasets for the Quadratic Knapsack Problem (QKP). All instances and their corresponding optimal (or best-known) objective function values (OFV) are sourced from benchmark-instances-for-qkp and results-for-qkp-benchmark-instances.

These instances are Standard QKP instances, generated based on the procedure proposed by Gallo et al. [1], which has been a standard in the QKP literature for decades (e.g., Caprara et al., 1999[2]; Pisinger et al., 2007[3]; Chen and Hao, 2017[4]). The generation process follows these key steps:

  • Linear profits (p_i) and quadratic profits (p_{ij}=p_{ji}) are non-zero with a given density, d. Non-zero values are drawn uniformly from [1,100].
  • The graph density d is varied across d\in\{0.25,0.5,0.75,1.0\}.
  • Item weights w_i are drawn uniformly from [1,50].
  • The knapsack capacity C is selected randomly from the interval [50,\sum_{i=1}^n​w_i].
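The generation steps above can be sketched as follows (a rough illustration only; the actual instances come from the cited benchmark repositories, and details such as the exact density handling are simplified here):

```python
import numpy as np

def generate_qkp(n, density, rng):
    """Sketch of the Gallo et al. style QKP generation procedure described above."""
    # symmetric profit matrix: diagonal = linear profits, off-diagonal = quadratic
    P = np.zeros((n, n), dtype=int)
    mask = rng.random((n, n)) < density          # non-zero with probability `density`
    vals = rng.integers(1, 101, size=(n, n))     # values uniform in [1, 100]
    P[mask] = vals[mask]
    P = np.triu(P)
    P = P + P.T - np.diag(np.diag(P))            # enforce p_ij = p_ji
    w = rng.integers(1, 51, size=n)              # item weights uniform in [1, 50]
    C = int(rng.integers(50, w.sum() + 1))       # capacity in [50, sum of weights]
    return P, w, C
```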

The knapsack evaluator currently supports the following collections:

  • Standard QKP: 100 classic instances widely used in the literature, with sizes ranging from 100 to 300 items.
  • QKP Group II: 80 larger instances with 1,000 to 2,000 items.
  • QKP Group III: 40 large-scale instances with 5,000 to 6,000 items.
  • Large QKP: A newer set of 144 instances with sizes from 500 to 10,000 items. Unlike the others, each graph in this collection has multiple capacity constraints, defined as a fraction \gamma of the total item weight: C=\lfloor\gamma\sum_{j=1}^n w_j\rfloor, where \gamma\in\{0.025,0.05,0.1,0.25,0.5,0.75\}.

SOTA Benchmark Methods

Rationale for Selected Methods

To provide a relevant and rigorous comparison, we benchmark TIG algorithms against a set of SOTA methods featured in the recent computational study by Hochbaum et al. (2025)[5].

The primary algorithm for comparison, QKBP, was chosen because its design philosophy aligns with the TIG ecosystem: it prioritizes achieving high-quality solutions with exceptional speed. The authors note that QKBP “consistently delivers high quality solutions regardless of instance size, density, or budget… in significantly faster running times than all leading algorithms.” This focus on both speed and quality makes it an excellent benchmark for our purposes. The other algorithms are included to provide the same comprehensive context as the original paper.

The following table details the SOTA comparison algorithms:

Name | Abbreviation | Reference
Breakpoints Algorithm | QKBP | Hochbaum et al. (2025)
Relative Greedy Heuristic | RG | Julstrom (2005)
Iterated Hyperplane Exploration Approach | IHEA | Chen and Hao (2017)
Gurobi-based Approach | Gurobi | www.gurobi.com
Hexaly-based Approach | Hexaly | www.hexaly.com

Key Benchmarks

  • IHEA: A highly effective algorithm that consistently finds near-optimal solutions. However, this accuracy comes at a significant computational cost, running approximately 100x slower than QKBP.
  • QKBP (2025): A recently published, peer-reviewed algorithm that produces high-quality solutions with significantly faster runtimes than previous SOTA methods.

The current top-performing TIG algorithm demonstrates superior solution quality compared to QKBP across a wide range of benchmark instances while operating at a similarly fast runtime. Therefore, comparison against this SOTA algorithm provides the most direct and fair “apples-to-apples” comparison.

Comparison Metrics

We evaluate algorithm performance using the Relative Percentage Deviation (RPD) from the best known Objective Function Value (OFV); in this case, the OFV is the knapsack value. RPD is a standard metric in the QKP literature [4:1][5:1] for comparing solution quality. It is calculated as:

RPD(\%)=\frac{\text{BKS} - \text{knapsack_value}}{\text{BKS}}\times100
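As a minimal sketch (helper name hypothetical), using the conventional maximisation form in which deviation is measured as the shortfall below the BKS:

```python
def rpd_max(knapsack_value, bks):
    """RPD for a maximisation objective: shortfall below the best known value."""
    return (bks - knapsack_value) / bks * 100.0
```

For example, a knapsack value of 95 against a BKS of 100 gives an RPD of 5%.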

The primary visualization in the evaluator is a graph tracking the average RPD of the top-earning TIG algorithms by round against the SOTA algorithms for each benchmark dataset.

Future Work

We are continually enhancing the evaluation suite, with several key improvements on the horizon.

To strengthen our comparison benchmarks, we plan to include more challenging and structurally diverse datasets. As Schauer (2016)[6] highlights, the standard Gallo et al.[1:1] procedure can sometimes yield instances that are relatively easy for simple greedy heuristics, particularly as instance size grows. To address this, we will add support for instance types that avoid this property, including:

  • Dispersion-QKP Instances[3:1]
  • Densest-k Subgraph (DKS) Instances[3:2]
  • Hidden Clique Instances[6:1]

Runtime is a critical performance factor in QKP research. We aim to introduce precise runtime comparisons to quantify the computational speed-up and efficiency of TIG algorithms relative to SOTA methods. For example, while QKBP (2025)[5:2] delivers slightly lower solution quality than IHEA (2017)[4:2], it achieves results in a fraction of the time, reflecting a deliberate trade-off between quality and speed. Accurately capturing this balance is essential for fair evaluation.

Finally, we plan to add richer performance visualizations to help innovators more effectively create, test, and compare algorithmic behavior.


  1. Giorgio Gallo, Peter L. Hammer, and Bruno Simeone. “Quadratic knapsack problems”. In: Combinatorial Optimization (1980), pp. 132–149. ↩︎ ↩︎

  2. Caprara, A., Letchford, A.N., and Salazar-González, J.J. (1999). European Journal of Operational Research, 123(2), pp. 222–231. ↩︎

  3. D. Pisinger, A.B. Rasmussen & R. Sandvik (2007). Solution of large quadratic knapsack problems through aggressive reduction. INFORMS J. Comput., 19, pp.280–290. ↩︎ ↩︎ ↩︎

  4. Chen, Y. and Hao, J.K., 2017. An iterated “hyperplane exploration” approach for the quadratic knapsack problem. Computers & Operations Research, 77, pp.226–239. ↩︎ ↩︎ ↩︎

  5. Hochbaum, D.S., Baumann, P., Goldschmidt, O. and Zhang, Y., 2025. A fast and effective breakpoints heuristic algorithm for the quadratic knapsack problem. European Journal of Operational Research, 323(2), pp.425–440. ↩︎ ↩︎ ↩︎

  6. J. Schauer (2016). Asymptotic behavior of the quadratic knapsack problem. Eur. J. Oper. Res., 255, pp.357–363. ↩︎ ↩︎

]]>
https://forum.tig.foundation/t/sota-comparison-knapsack/64#post_1 Fri, 15 Aug 2025 12:33:49 +0000 forum.tig.foundation-post-105
Upcoming Challenge: Hypergraph Partitioning Many thanks to @Tasuku for his thorough review and suggestions. Below is a summary of the resulting updates to the Balanced Hypergraph Partitioning challenge.

  1. Addition of Noise via a Random Hypergraph
    We will integrate noise by combining our structured hypergraph with a purely random hypergraph, following the approach outlined by Kaminski et al. [1]. Here, the parameter \xi controls the proportion of each node’s degree attributed to random (background) hyperedges.

    To smoothly incorporate noise within our generation process, we adjust the level-weight vector (see random instance generation details in our previous update), achieving the same effect as directly splitting each node’s degree.

    Specifically, let:

    \mathbf{p}_s = (p_1, p_2, \dots, p_L), \quad \text{ where }\sum_{\ell=1}^L p_\ell = 1

    represent the structured hypergraph’s level-weight vector, and let:

    \mathbf{p}_r = (1,0,\dots,0).

    represent a purely random hypergraph, where hyperedges are uniformly chosen over all nodes.

    We define the combined level‐weight vector \mathbf{p} as:

    \mathbf{p} = \xi\,\mathbf{p}_r + (1-\xi)\,\mathbf{p}_s.

    By sampling a fraction \xi of hyperedges according to \mathbf{p}_r and a fraction (1-\xi) according to \mathbf{p}_s, we ensure control over the proportion of random hyperedges, while preserving \sum_{\ell=1}^{L} p_{\ell} = 1.

    We propose to initially add 20% noise (\xi =0.2), aligning with Kaminski et al. [1:1]. This value can be adjusted if necessary.

  2. Varying the Level-Weight Vector \mathbf{p}
    Instead of using a fixed multi-peak vector \mathbf{p}, we will now sample it from a narrow distribution derived from multiple HyperLap runs on our reference real-world hypergraph. This adjustment increases challenge difficulty and more effectively conceals the underlying hierarchy, as recommended.

    Due to variations from random sampling and normalization effects, the net effect of \xi = 0.2 in our chosen implementation results in noise ranging between 17% and 25%.

  3. Minimum Part Size for \mathbf{k=64} Parts
    To ensure sufficient hypergraph size for a 64-way partition, and to align with Gottesbüren et al. [2], who partition medium-sized hypergraphs with at least 7,500 nodes into up to 128 parts (~58 nodes per part), we have set our minimum difficulty to num_hyperedges = 4000. Given that the number of nodes is roughly 92% of the number of hyperedges, this translates to approximately 3,680 nodes (~58 nodes per part).

    Most state-of-the-art partitioners employ a coarsening-partitioning-uncoarsening approach, typically coarsening down to hypergraphs of approximately 160k nodes before partitioning. As the challenge evolves, we anticipate scaling our minimum difficulty toward this standard.

  4. Baseline Algorithm
    Considering speed and stability remain the most critical characteristics, the current greedy baseline algorithm will remain unchanged for now. Concerns about infeasibility with arbitrary node weights do not apply here since we maintain uniform node weights (w[v]=0).
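The noise-mixing rule from point 1 above is a one-line convex combination; here is a minimal sketch (the p_s values in the test are hypothetical):

```python
import numpy as np

def mixed_level_weights(p_s, xi):
    """Convex combination of random (p_r) and structured (p_s) level-weight vectors."""
    p_s = np.asarray(p_s, dtype=float)
    p_r = np.zeros_like(p_s)
    p_r[0] = 1.0                       # p_r = (1, 0, ..., 0): uniform hyperedges
    return xi * p_r + (1.0 - xi) * p_s
```

Because both inputs sum to 1, the combined vector does too, so the target noise fraction \xi is respected exactly at the level-weight stage.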


  1. Kamiński, B., Pralat, P., & Theberge, F. (2023). Hypergraph Artificial Benchmark for Community Detection (h–ABCD). Journal of Complex Networks, 11. doi:10.1093/comnet/cnad028

  2. Gottesbüren, L., Heuer, T., Maas, N., Sanders, P., & Schlag, S. (2024). Scalable high-quality hypergraph partitioning. ACM Transactions on Algorithms, 20(1), 1–54. doi:10.1145/3626527

https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_7 Fri, 20 Jun 2025 16:12:23 +0000 forum.tig.foundation-post-95
Upcoming Challenge: Hypergraph Partitioning Hi, I’m attaching my report on the hypergraph challenge.
main.pdf (352.6 KB)

https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_6 Wed, 28 May 2025 05:04:55 +0000 forum.tig.foundation-post-92
Upcoming Challenge: Hypergraph Partitioning Hypergraph Partitioning Challenge Design Update

Random Instance Generation

Motivation

To ensure that the challenge drives progress on practically relevant problems, our synthetic instances must resemble the hypergraphs most frequently encountered in the literature as proxies for real-world workloads. We therefore aim to match key structural properties of hypergraphs from the SuiteSparse Matrix Collection [1] (used as a data source for evaluating hypergraph partitioners by Trifunovic and Knottenbelt [2]).

Methodology

We generate each instance with HyperLap [3], a parallelizable, hierarchical hypergraph extension of the Fast Chung-Lu (FCL) model [4], which preserves an input hypergraph’s degree distribution by sampling hyperedges according to expected node degrees. HyperLap extends this model by employing a hierarchical multilevel partitioning scheme designed to reproduce the heavy-tailed, community-centric overlap patterns observed in real-world hypergraphs. HyperLap also maps naturally onto GPUs, which was a major motivation for this choice of hypergraph generator.

Fixed Parameters

  • Nodes vs. hyperedges. The number of hyperedges is one of the difficulty parameters, num_hyperedges. To reflect the sparsity pattern observed in SuiteSparse matrices, the number of nodes is chosen to be approximately equal to num_hyperedges. In practice, the generation method produces a node count around 92% of the hyperedge count.
  • Uniform weights and costs. All node weights and hyperedge costs are one, w[v]=c[e]=1, so the objective simplifies to minimising the connectivity metric, now defined as \sum_{e} (\lambda_e-1).
  • Node degree and hyperedge size distribution. Node degrees and hyperedge sizes are sampled from truncated power-law distributions. This method effectively captures the heavy-tailed distributions commonly observed in real-world hypergraphs.

Given:

  • an array of node degrees;
  • an array of desired hyperedge sizes; and
  • a level‑weight vector \mathbf{p}=(p_1,\dots,p_L) satisfying \sum_{\ell}p_\ell=1,

the generator proceeds in two stages:

  1. Hierarchical layout:
    For num_hyperedges, we create L=\lfloor\log_2\text{num_hyperedges}\rfloor levels. Each level \ell contains exactly 2^{\ell-1} nested groups of equal size: if two nodes share a group at level \ell, they also share a group at every lower level.

  2. Edge construction:
    For every desired hyperedge size s we
    i) sample a level \ell with probability proportional to p_\ell;
    ii) choose one of the 2^{\ell-1} groups at that level uniformly at random; and
    iii) repeatedly sample nodes within the chosen group proportional to their node degree until s distinct nodes are selected.

This method ensures the preservation of both the input degree sequence and hyperedge size distribution while introducing realistic community structures and overlaps.
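The two-stage procedure above can be sketched as follows. This is a toy illustration of the scheme, not the actual HyperLap implementation: the nested groups are realised as contiguous slices of a fixed node permutation (so a group at level \ell is contained in a group at every lower level), and weighted sampling without replacement stands in for the "repeatedly sample until s distinct nodes" step:

```python
import numpy as np

def generate_hyperedges(degrees, edge_sizes, p, rng):
    """Toy two-stage generator: sample a level with probability p_l,
    pick one of its 2^(l-1) equal-size groups uniformly, then draw
    distinct nodes within that group proportionally to their degree."""
    n = len(degrees)
    L = len(p)
    order = rng.permutation(n)   # one fixed layout; slices nest across levels
    edges = []
    for s in edge_sizes:
        level = rng.choice(L, p=p) + 1            # 1-based level
        n_groups = 2 ** (level - 1)
        g = rng.integers(n_groups)                # uniform group at this level
        group = order[g * n // n_groups:(g + 1) * n // n_groups]
        w = degrees[group].astype(float)
        w /= w.sum()                              # degree-proportional weights
        size = min(s, len(group))
        edge = rng.choice(group, size=size, replace=False, p=w)
        edges.append(set(edge.tolist()))
    return edges

rng = np.random.default_rng(0)
degrees = rng.integers(1, 10, size=64)            # hypothetical degree sequence
edges = generate_hyperedges(degrees, edge_sizes=[3, 5, 4],
                            p=[0.1, 0.4, 0.3, 0.2], rng=rng)
```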

Interpreting the Level-Weight Vector

The level-weight vector \mathbf{p} dictates the resolution at which hyperedges form: low levels span large node sets, high levels cover tiny groups. Shifting probability mass therefore tunes how strongly hyperedges overlap and how “visible” the community structure is:

  1. Low-level weight dominance (p_1{+}p_2 \approx 1). Edges are sampled from the coarsest groups, so overlaps are largely accidental and community structure is weak; these instances would be hard for partitioners to solve due to their lack of structure.
  2. High-level weight dominance (p_{L}, p_{L-1}, \ldots \gg p_{1}, p_{2}, \ldots). Edges stay inside very small groups, yielding tight micro-communities; coarse partitioning would be relatively easy.
  3. Multi-dominant low levels (p_2{+}p_3{+}p_4{+}p_5\approx 1). Weight spread over the first few refinements produces a realistic, multiscale hierarchy with heavy-tailed overlaps.
  4. One-level dominance. Nearly all mass on a single level k produces block-diagonal structure at one known scale - easy to partition if k is exploited.

Using HyperLap+ (which automatically fits optimal level-weight vectors to observed hypergraph data) we observed two dominant regimes in SuiteSparse hypergraphs: a single-level dominant pattern and a multi-dominant low-level pattern. We adopt the latter pattern due to its richer overlap structure, posing greater challenges for partitioning algorithms.

Baseline Calculation

Our chosen baseline is a greedy bipartition algorithm. The algorithm recursively partitions the node set into two subsets until the desired number of partitions is reached. Before applying any greedy refinements, the algorithm begins with an initial bipartition seeded by the two level-1 groups produced by the hypergraph generation method. Utilizing this initial partition capitalizes on the inherent structure of the node set, providing an effective starting point that results in a more stable baseline partition. Each subsequent bipartitioning step proceeds as follows:

  1. Determine target sizes. Given a current subset of nodes, calculate how many nodes should go to the left and right parts (e.g., if we aim for 2^d total parts, each subdivision targets two parts of prescribed sizes).
  2. Sort nodes by degree. For the nodes in the current subset, compute their degrees (the number of hyperedges to which each node belongs). Sort them in descending order so that higher-degree nodes are placed first.
  3. Place nodes greedily. Initialize two arrays (one per part) to track hyperedges already “activated” (those with at least one node assigned) within each part. For each node in sorted order:
  • Count how many of its hyperedges are activated in the left part and how many in the right.
  • If one side has a strictly higher overlap, assign the node to that side (provided it has not reached its target size). If overlaps are equal, assign to the part with fewer nodes. If one part has already reached capacity, the node is assigned to the other part by default.
  • Continue assigning nodes until one part reaches its target size, then assign any remaining nodes to the other part.
  4. Recursive subdivision. After producing each bipartition, recursively apply the same procedure to each newly formed part until the desired number of parts (e.g., 64) is reached.

Finally, the connectivity metric of the complete multiway partition is computed, giving the baseline_value. Although this local greedy strategy does not capture global interactions perfectly, it remains computationally efficient, intuitive, and serves as a solid performance benchmark for more sophisticated methods.
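A single bipartition step of this greedy baseline can be sketched as below. This is a simplified illustration (hypothetical signatures): it omits the level-1 group seeding and the recursion of step 4, and represents the hypergraph as a map from node to the set of hyperedge ids it belongs to:

```python
def greedy_bipartition(nodes, node_edges, target_left):
    """One greedy bipartition step (steps 1-3 above): place nodes in
    descending-degree order on the side where more of their hyperedges
    are already 'activated', respecting the target part sizes."""
    nodes = sorted(nodes, key=lambda v: len(node_edges[v]), reverse=True)
    target_right = len(nodes) - target_left
    left, right = [], []
    active_l, active_r = set(), set()
    for v in nodes:
        ol = len(node_edges[v] & active_l)   # overlap with left part
        og = len(node_edges[v] & active_r)   # overlap with right part
        prefer_left = ol > og or (ol == og and len(left) <= len(right))
        if prefer_left and len(left) < target_left:
            left.append(v)
            active_l |= node_edges[v]
        elif len(right) < target_right:
            right.append(v)
            active_r |= node_edges[v]
        else:                                # right part already full
            left.append(v)
            active_l |= node_edges[v]
    return left, right

# two disjoint hyperedges -> the greedy pass recovers the two communities
node_edges = {0: {0}, 1: {0}, 2: {1}, 3: {1}}
left, right = greedy_bipartition([0, 1, 2, 3], node_edges, target_left=2)
```

Recursion to 2^d parts (step 4) would re-apply this function to each returned half.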

Expert Review

We are collaborating with a leading expert in combinatorial optimisation, machine learning, submodular optimisation, and graph and hypergraph algorithms to review and refine the design of our upcoming Hypergraph Partitioning Challenge. This consultation is ongoing, and any adjustments or enhancements resulting from this expert review will be shared and implemented once finalized.


  1. Kolodziej, S., et al. (2019). The SuiteSparse Matrix Collection Website Interface. Journal of Open Source Software, 4, 1244. doi:10.21105/joss.01244

  2. Trifunovic, A., & Knottenbelt, W. (2008). Parallel multilevel algorithms for hypergraph partitioning. Journal of Parallel and Distributed Computing, 68, 563–581.

  3. Lee, G., Choe, M., & Shin, K. (2021). How Do Hyperedges Overlap in Real-World Hypergraphs? – Patterns, Measures, and Generators. arXiv:2101.07480 [cs.SI].

  4. Chung, F., & Lu, L. (2002). The average distances in random graphs with given expected degrees. PNAS, 99(25), 15879–15882.

https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_5 Fri, 16 May 2025 14:14:21 +0000 forum.tig.foundation-post-90
Vector Search Data Generation Vector search underpins a wide range of real-world applications—from semantic web search and product recommendations to near-duplicate image detection and anomaly monitoring. In machine learning, for instance, raw text is first embedded as dense vectors through methods such as Word2Vec, GloVe, or transformer-based sentence encoders; this embedding step is a foundational building block used by every modern large-language-model pipeline to retrieve contextually similar words, sentences, or documents.

In that spirit, the synthetic data generation for the Vector Search challenge has been redesigned. We are aligning the challenge with real-world scenarios: rather than following a uniform distribution, the data now forms Gaussian clusters. This introduces meaningful complexity, and demands more sophisticated algorithms for efficient solutions.

Size of the Problem

To guide our decision on the size of the problem, we use the following table of real-world datasets:

| Dataset | Dimension | # Base | # Query |
| --- | --- | --- | --- |
| UQ-V | 256 | 1,000,000 | 10,000 |
| Msong | 420 | 992,272 | 200 |
| Audio | 192 | 53,387 | 200 |
| SIFT1M | 128 | 1,000,000 | 10,000 |
| GIST1M | 960 | 1,000,000 | 1,000 |
| Crawl | 300 | 1,989,995 | 10,000 |
| GloVe | 100 | 1,183,514 | 10,000 |
| Enron | 1,369 | 94,987 | 200 |
  • We will keep the dimension fixed at \text{dim}=250, a moderate dimension size. This choice is reasonable because techniques such as dimensionality reduction are typically employed to handle larger dimensions; by selecting a moderate dimension size, we eliminate the need for this step.

  • The number of database vectors will scale with the number of queries. We keep the ratio fixed at \frac{\# \text{Database}}{\# \text{Queries}}= 100, in line with real-world data. In particular, we set \#\text{Database}= \lfloor 100 \cdot \#\text{Queries} \rfloor. We keep \# \text{Queries} as the difficulty parameter controlled by the Benchmarkers.

Distribution of the Data

Currently, the data is generated uniformly in the [0,1]^{\text{dim}} hypercube. This is not realistic, as real-world data is typically clustered - benchmarks usually employ Gaussian clusters (see the references [1] and [2]). Therefore, we extend the domain to [-1,1]^{\text{dim}} to allow vector directions to vary. We then sample data using Gaussian clusters.

The Number of Clusters and their Size

To guide our decision on the size of the clusters, we use the following frequently benchmarked datasets:

| Dataset | Domain | # Points | # Classes | Points per Cluster (Ratio) |
| --- | --- | --- | --- | --- |
| MNIST | Handwritten digits | 70,000 | 10 | 7,000 |
| CIFAR-100 | Tiny images (fine-grained) | 60,000 | 100 | 600 |
| SVHN | House number digits (real-world) | 99,289 | 10 | 9,928 |
| ImageNet-1k | Natural images | 1,281,167 | 1,000 | ~1,281 |
| Tiny ImageNet | Subset of ImageNet | 100,000 | 200 | 500 |
| VGGFace2 | Face recognition | 3,310,000 | 9,131 | ~363 |
| Google Speech Commands v2 | Audio classification | 105,829 | 35 | ~3,023 |
| Wikipedia (small category sets) | Sentences by topic | ~30,000 | 13 | ~2,308 |

The number of clusters will scale with the size of the database, such that \frac{\# \text{Database}}{\# \text{Clusters}}=r_{\text{clust}}=700. This ratio corresponds to medium-grained clustering, which is sufficiently challenging to reflect real-world scenarios. We set the number of clusters to be \#\text{Clusters}= \lfloor(1 + \delta) \cdot \frac{\#\text{Database}}{r_{\text{clust}}} \rfloor for \delta \sim \text{Unif}[-0.05,0.05]. This introduces some noise into the number of clusters.

Importantly, clusters will not contain a uniform number of points, as this would be unrealistic. For example, in the MNIST dataset (digits), there is a mild imbalance: digits like “1” and “0” are more frequent than “5” or “9”. In the Wikipedia articles dataset (topic-based), the imbalance is more extreme: common topics like “sports” vastly outnumber niche ones like “theology”. Let K be the number of points in a cluster. For each cluster i, we sample K_i from a log-normal distribution, that is

K_i\sim e^{\mathcal{N}(\mu,\sigma^2)} \quad i=1,\ldots,\#\text{Clusters}.

This distribution is skewed, resulting in many small clusters, some medium-sized clusters, and only a few large clusters. The question becomes how to choose \mu and \sigma^2: we note that

\mathbf{E}[K]=e^{\mu+\frac{\sigma^2}{2}}, \qquad \text{and} \qquad \textbf{Var}[K]= (\mathbf{E}[K])^2(e^{\sigma^2}-1) .

So that the standard deviation of K is

\mathbf{E}[K]\sqrt{e^{\sigma^2}-1}.

We pick \sigma^2=0.2, so that the standard deviation scales with the mean as

\mathbf{E}[K]\sqrt{e^{0.2}-1} \approx 0.47\,\mathbf{E}[K].

Setting \mu= \log (r_{\text{clust}})-\frac{\sigma^2}{2}, gives \mathbf{E}[K]=r_{\text{clust}}, as desired. When sampling a database or query point, we sample from cluster i with probability

\frac{K_i}{\sum_{j=1}^{\# \text{Clusters}} K_j}.

The following is an example of 60 cluster sizes when sampled in this manner:
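A short sketch of this sampling scheme (illustrative only, with a hypothetical database size; the function name and signature are not from the protocol):

```python
import numpy as np

def sample_cluster_sizes(n_database, r_clust=700, sigma2=0.2, rng=None):
    """Draw #Clusters = floor((1 + delta) * n_database / r_clust) with
    delta ~ Unif[-0.05, 0.05], then log-normal sizes K_i ~ exp(N(mu, sigma^2))
    with mu = log(r_clust) - sigma^2 / 2, so that E[K] = r_clust."""
    rng = rng or np.random.default_rng()
    delta = rng.uniform(-0.05, 0.05)
    n_clusters = int((1 + delta) * n_database / r_clust)
    mu = np.log(r_clust) - sigma2 / 2.0
    k = np.exp(rng.normal(mu, np.sqrt(sigma2), size=n_clusters))
    probs = k / k.sum()    # P(point drawn from cluster i) = K_i / sum_j K_j
    return k, probs

k, probs = sample_cluster_sizes(42_000, rng=np.random.default_rng(1))
```

With n_database = 42,000 and r_clust = 700 this yields roughly 60 clusters, matching the example above.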

The Location and Shape of the Clusters

The cluster vectors will be sampled from truncated multivariate Gaussians \mathcal{N}_{[-1,1]^{\text{dim}}}(\mathbf{\bar{\mu}},\Sigma), such that the samples stay within the region [-1,1]^{\text{dim}}. The mean, \mathbf{\bar{\mu}}, of each cluster will be sampled uniformly from the [-1,1]^{\text{dim}} hypercube. Real-world data is inherently anisotropic, hence we choose

\Sigma = \begin{pmatrix} \sigma^2_1 & 0 & \cdots & 0 \\ 0 & \sigma^2_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma^2_{\text{dim}} \end{pmatrix}

so that the eigenvalues \sigma^2_i vary. Sampling from the truncated multivariate Gaussian is cheap since the components of the Gaussian are independent. To sample x from \mathcal{N}_{[-1,1]^{\text{dim}}}(\mathbf{\bar{\mu}},\Sigma) we sample each component as x_j from \mathcal{N}_{[-1,1]}(\mathbf{\bar{\mu}}_j,\sigma^2_j) directly.

We now choose the variances, \Sigma, so that some discrepancy is kept between the shape of the clusters. We proceed in the following way: for any cluster k\in \{1,\ldots,\#\text{Clusters}\}, we set a mean deviation, \sigma(k), and a range, \epsilon(k), and then sample \sigma_i(k)\sim \text{Unif}[\sigma(k)-\epsilon(k),\sigma(k)+\epsilon(k)].

For each k, \sigma(k) is sampled uniformly in the interval [1,1.1] and \epsilon(k) in the interval [0, 0.05]. This gives a cluster overlap of approximately 8% (that is, 8% of vectors are closer to a cluster mean which is not their own).
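Because the covariance is diagonal, each coordinate can be sampled independently; a per-component rejection loop is one simple way to realise the truncation (a sketch under these assumptions, not the protocol's reference sampler):

```python
import numpy as np

def sample_truncated_cluster(mean, sigmas, n_points, rng):
    """Sample n_points from N(mean, diag(sigmas^2)) truncated to
    [-1, 1]^dim by resampling only the out-of-range coordinates."""
    dim = len(mean)
    x = rng.normal(mean, sigmas, size=(n_points, dim))
    bad = np.abs(x) > 1.0
    while bad.any():       # independence lets us redraw coordinates alone
        x[bad] = rng.normal(np.broadcast_to(mean, x.shape)[bad],
                            np.broadcast_to(sigmas, x.shape)[bad])
        bad = np.abs(x) > 1.0
    return x

rng = np.random.default_rng(0)
mean = rng.uniform(-1.0, 1.0, size=8)        # cluster mean in [-1,1]^dim
base = rng.uniform(1.0, 1.1)                 # sigma(k)
eps = rng.uniform(0.0, 0.05)                 # epsilon(k)
sigmas = rng.uniform(base - eps, base + eps, size=8)
pts = sample_truncated_cluster(mean, sigmas, n_points=500, rng=rng)
```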

Streaming Query Vectors

Currently, all query vectors are received simultaneously, meaning the assignment of nearest neighbours can be optimised globally. However, in many real-world applications - such as real-time recommendation systems or online search - queries arrive sequentially in a stream. This streaming setting introduces new challenges, as decisions must be made incrementally without full knowledge of future queries. We encourage discussion on whether to extend the challenge to incorporate this streaming setting.


  1. Shimomura, L. C., Oyamada, R. S., Vieira, M. R., & Kaster, D. S. (2021). A survey on graph-based methods for similarity searches in metric spaces. Information Systems, 95, 101507.

  2. Wang, M., Xu, X., Yue, Q., & Wang, Y. (2021). A comprehensive survey and experimental comparison of graph-based approximate nearest neighbor search. arXiv preprint, arXiv:2101.12631.

https://forum.tig.foundation/t/vector-search-data-generation/54#post_1 Wed, 11 Jun 2025 16:10:26 +0000 forum.tig.foundation-post-89
Upcoming Challenge: Neural Network Gradient Descent Those are great suggestions, thanks for the ideas and the reference! Once we have some real metrics of the initial challenge runs on testnet or beyond, it would be great to discuss these ideas more in depth. In particular, we are keen to extend to training CNN/RNN/transformer architectures after the MLP case. More real-world setups like the LLM training setup you mentioned will be relevant in those cases.

https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_6 Tue, 06 May 2025 10:05:16 +0000 forum.tig.foundation-post-88
Upcoming Challenge: Neural Network Gradient Descent Thank you for your reply, that all makes sense and I really appreciate the clarification on what’s available to the optimizer and why.

Regarding the BatchNorm question, I was thinking about some of the work exploring adaptive optimizers that adjust learning rates based on feature statistics rather than just gradients. I am not sure if this is exactly relevant, but this is the sort of thing I came across https://arxiv.org/abs/1805.11604

I am glad that generalization is the main metric as ideally I would focus on designs that try to stay robust across different seeds and datasets. It’s also good that training cost is not directly scored, however I also agree about the compute time being important too.

And on the last point, totally understand the value of keeping things clean and comparable early on. I do think it would be exciting to explore things like dynamic patience or learning-based batch selection in a future version, especially since some recent LLM training setups are trending that way. I think this would align more with real-world LLM setups, even though I realise this may be a little further down the line.

Thank you for taking the time to reply.

https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_5 Thu, 01 May 2025 12:43:15 +0000 forum.tig.foundation-post-87
Upcoming Challenge: Neural Network Gradient Descent Hi Jake,

Glad to hear about your interest; we do aim to introduce more structured challenges like this one in the near future. As for your comments/questions:

  1. Are there any limits on what the optimizer is allowed to store between steps? For instance, can it keep track of previous gradients or running averages over time as part of its internal state?

There are no limits apart from the fact that the optimiser state is constructed and updated inside the innovator-submitted functions, which have a fixed input and output signature. This means that your optimiser algorithm and state updates only have access to those variables at every iteration (e.g. previous training loss and validation loss, previous gradients and their moving averages, etc.). Indeed, keeping track of running gradient statistics is one of the key ingredients of modern adaptive optimisers like Adam.

  2. Is the optimizer expected to work only with the gradients and parameters it receives directly, or is it allowed to make use of other information from the model, like BatchNorm statistics?

Related to the answer for (1), yes, you are right: the optimiser algorithm submission can only use the variables that are given for each of the functions. There is no way to access the model directly within the currently proposed framework. This is a great point to think about, though, as it means we (the challenge designers) should provide as many relevant variables as possible that may be needed for constructing novel, powerful optimisers. BatchNorm statistics have not been considered but may very well be powerful for designing optimisers; did you have any particular examples/reference papers where you have seen this before?

  3. When designing optimizers, is the main focus on generalization to unseen data, or is the rate of convergence during training, for example reaching low validation loss quickly, also a key factor in how submissions are evaluated?

Great question: so far, the rate of convergence is not explicitly considered for ranking optimiser algorithms. However, if optimisers take much longer on average, this will penalise benchmarkers in terms of compute time. We purely evaluate optimisers based on test loss, the main argument being that in many use cases of deep learning, people tend to push for better performance even when the compute requirements for training grow very large (e.g. self-driving cars, LLMs, etc.). Of course, many engineering innovations are very valuable for making training less costly, but those tend to involve tangential aspects like quantization, parallelization, etc.

  4. Looking ahead, would there be interest in allowing optimizers to influence other aspects of the training process, such as how batches are sampled or when early stopping is triggered?

Very good point! We thought about this quite a bit, since it is known that aspects like batch size and early-stopping patience affect the final results significantly for a given optimiser. We are definitely open to suggestions; so far we have started with the simplest approach of keeping all aspects not directly related to the optimiser fixed, to provide a reasonably fair comparison playground.

https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_4 Thu, 01 May 2025 08:11:26 +0000 forum.tig.foundation-post-86
Upcoming Challenge: Neural Network Gradient Descent This sounds like a really interesting and well thought out challenge. I do have some questions though if I may?

  1. Are there any limits on what the optimizer is allowed to store between steps? For instance, can it keep track of previous gradients or running averages over time as part of its internal state?

  2. Is the optimizer expected to work only with the gradients and parameters it receives directly, or is it allowed to make use of other information from the model, like BatchNorm statistics?

  3. When designing optimizers, is the main focus on generalization to unseen data, or is the rate of convergence during training, for example reaching low validation loss quickly, also a key factor in how submissions are evaluated?

  4. Looking ahead, would there be interest in allowing optimizers to influence other aspects of the training process, such as how batches are sampled or when early stopping is triggered?

Really looking forward to getting involved in this challenge and look forward to your reply :slight_smile:

https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_3 Fri, 25 Apr 2025 17:50:49 +0000 forum.tig.foundation-post-85
Upcoming Challenge: Neural Network Gradient Descent Challenge description and formulation

To focus innovation on the optimiser component of the training algorithm, the Neural Network Gradient Descent challenge has a different overall structure from current TIG challenges in two key aspects:

  • The optimiser algorithm submission does not output a ‘solution’, in the sense that it is embedded inside a ‘parent’ algorithm called the training loop that uses the optimiser iteratively to compute the solution.

  • The optimiser algorithm is run many times for a single challenge instance, with inputs and outputs constrained to the structure of the training loop and determined by intermediate states.

High-level structure

A challenge instance is defined by three main components:

  • A dataset \mathcal{D} = \{\mathbf{x}_i, y_i\}_i^N that is randomly generated by adding white noise \xi_i \sim \mathcal{N}(0,\sigma_{\text{data}}^2) to a random smooth function f: \mathbb{R}^D \to \mathbb{R} drawn from a Gaussian process (GP) with some chosen kernel function k(\mathbf{x}, \mathbf{x}')
y_i = f(\mathbf{x}_i) + \xi_i \quad \text{with} \quad f(\cdot) \sim \mathcal{GP}(0, k(\cdot, \cdot)),

evaluated at uniform random locations in the hypercube \mathbf{x}_i \in [-1, 1]^D, and the dataset \mathcal{D} is furthermore split into train \mathcal{D}_{\text{train}}, validation \mathcal{D}_{\text{val}} and test \mathcal{D}_{\text{test}} sets of sizes N_{\text{train}}, N_{\text{val}} and N_{\text{test}}.

  • A standard MLP architecture \hat{f}_{\mathbf{w}}: \mathbb{R}^D \to \mathbb{R} with randomly initialized parameters \mathbf{w} where its hidden layers all share the same specified width and contain ReLU-BatchNorm activation functions.

  • A mean squared error (MSE) loss function between some set of targets \{y_i\} and MLP outputs \{\hat{f}_{\mathbf{w}}(\mathbf{x}_i)\}

\mathcal{L}(\mathbf{w}; \mathcal{D}) = \frac{1}{N} \sum_{i=1}^{N} \| y_i - \hat{f}_\mathbf{w} (\mathbf{x}_i) \|^2,

which is used during train, validation and test evaluations.

Details of the random instance generation are given in Section 3 of the detailed write-up attached. The train set is then divided into B batches \mathcal{D}_{\text{train}} \to \{\mathcal{D}_{\text{batch }b}\}_1^B of size N_{\text{batch}} that are then fed into the training loop, where each loop iteration (epoch) consists of:

  • Sampling batches in random order, where for every batch:

    • Compute the parameter location \tilde{\mathbf{w}} (which can be different to current \mathbf{w} like in Nesterov momentum) at which we evaluate the training loss gradients
    • Compute the regression loss \mathcal{L} and its gradients \mathbf{g} = \nabla_\mathbf{w} \mathcal{L}(\tilde{\mathbf{w}}; \mathcal{D}_{\text{batch}}) using a forward-and-backward pass through the MLP
    • Run one optimiser step to transform \mathbf{g} into parameter updates \mathbf{u}
    • Apply updates \mathbf{w} \to \mathbf{w} + \mathbf{u}

    Note that multiple gradient steps are applied on different subsets (batches) of \mathcal{D}_{\text{train}} per epoch, hence the term ‘stochastic’ gradient descent.

  • Evaluating the validation loss with an MLP forward pass \mathcal{L}(\mathbf{w}; \mathcal{D}_{\text{val}}).

  • Repeating the above steps for multiple epochs until either the maximum number of epochs has been reached or the validation loss has not improved for some chosen number of “patience” epochs, which is a standard early stopping criterion.

Furthermore, the final two layers of the MLP are frozen at initialisation to ensure asymmetry of the challenge instance for solution verification (see Section 4 of the detailed write-up attached). The overall challenge structure is depicted in the schematic figure below, and the detailed structure of this standard training loop is given in Algorithm 1 of the detailed write-up attached. The data generation, MLP construction and training loop components are deterministic conditioned on a random seed associated with the current challenge instance, which allows method verification reproducibility.
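The loop above can be sketched as a generic skeleton with a pluggable optimiser step. This is a simplified illustration with hypothetical function signatures (the real training loop, including the lookahead location \tilde{\mathbf{w}}, is specified in Algorithm 1 of the write-up), demonstrated on a toy one-parameter regression:

```python
import numpy as np

def train(params, batches, val_data, loss_grad, val_loss,
          optimiser_step, opt_state, max_epochs=100, patience=5, rng=None):
    """Simplified training loop: per epoch, visit batches in random order,
    take one optimiser step per batch, then early-stop when the validation
    loss has not improved for `patience` epochs."""
    rng = rng or np.random.default_rng(0)
    best, stale = np.inf, 0
    for _ in range(max_epochs):
        for b in rng.permutation(len(batches)):
            g = loss_grad(params, batches[b])        # forward + backward pass
            u, opt_state = optimiser_step(g, params, opt_state)
            params = params + u                      # apply parameter update
        v = val_loss(params, val_data)
        if v < best:
            best, stale = v, 0
        else:
            stale += 1
            if stale >= patience:                    # early stopping
                break
    return params

# toy example: fit the mean of noisy targets with a plain SGD step u = -lr * g
rng = np.random.default_rng(0)
data = 3.0 + 0.1 * rng.normal(size=100)
batches = np.split(data, 10)
sgd = lambda g, w, s: (-0.1 * g, s)                  # stateless optimiser
w = train(0.0, batches, data,
          loss_grad=lambda w, b: 2.0 * (w - b.mean()),
          val_loss=lambda w, d: float(((w - d) ** 2).mean()),
          optimiser_step=sgd, opt_state=None)
```

A submitted optimiser would replace `sgd`, carrying its own state (e.g. gradient moving averages) through `opt_state`.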

Goal

A valid attempt at a challenge instance is an optimiser that, when run in the training loop, outputs MLP parameters whose test error \mathcal{L}(\mathbf{w}; \mathcal{D}_{\text{test}}) is lower than a dataset-dependent baseline threshold \epsilon_{*}^2, computed without the computationally expensive MLP training loop:

\epsilon_{*}^2 > \mathcal{L}_{\text{MSE}}(\mathbf{w} ; \mathcal{D}_{\text{test}}) = \frac{1}{N_{\text{test}}} \sum_{i=1}^{N_{\text{test}}} \| y_i - \hat{f}_\mathbf{w} (\mathbf{x}_i) \|^2,

which formally specifies the solution criterion for \mathbf{w}. We propose a simple empirical expression for \epsilon_{*} in Section 4 of the detailed write-up attached that maintains computational asymmetry of the solution criterion when combined with freezing the final two MLP layers parameters. This represents MLPs that successfully approximate the random function f(\cdot) using only a finite set of noisy observations \{y_i\} at \{\mathbf{x}_i\}.

Submission constraints

This challenge requires innovators to design components of a gradient descent iteration loop of the structure Algorithm 1 of the detailed write-up attached that can successfully optimise MLP parameters to perform regression on a dataset assessed by holdout test set performance. In particular, the optimiser step also has access to current MLP parameters, allowing for optimiser-inherent regularization techniques like weight decay. In order to isolate contributions from optimiser innovation as much as possible, all training loop aspects outside of the optimiser algorithm are preserved across all challenge instances. The backpropagation backbone for computing gradients in particular is identical across all challenge instances, as modifying this would result in moving away from gradient descent.

Optimiser hyperparameters

Note that innovators must specify not only the optimiser algorithm, but also hyperparameters such as the learning rate. These hyperparameters should generally depend on challenge difficulty parameters (see Section 5 of the detailed write-up attached), since gradient descent optimisers are known to be sensitive to hyperparameter choices. Additional innovator rewards can be awarded for improvements solely in hyperparameter selection for existing optimiser algorithms.

Further details

Given the increased level of complexity of this challenge, we leave the remaining technical details and specifications in the technical write-up attached to this post. We encourage the reader to go through the remaining details, as these will complete the specification of the full challenge pipeline. With this first public release of the technical challenge specification, we hope to gain valuable feedback on the challenge design from the community even before live challenge runs on testnet, and are open to modifying design details necessary for aligning this challenge with the target goals set out in our introductory post of the Neural Network Gradient Descent challenge.

Detailed technical write-up (1.1 MB)

https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_2 Tue, 22 Apr 2025 10:54:18 +0000 forum.tig.foundation-post-84
Realigning Benchmark Incentives: Sub-Instance Averaging and New Difficulty Ranges Introduction

Typically, the quality of a solution is measured by its deviation from the optimal solution. In situations where the optimal solution is unknown, the baseline algorithm establishes a reference value, ideally set at a fixed percentage (e.g., 20%) below the optimal. By measuring the improvement of a Benchmarker’s solution relative to this baseline, we can gauge its proximity to the optimal solution and, hence, its overall quality. It is therefore crucial that our baseline algorithms remain stable, exhibiting minimal variance in their deviation from the optimal across all instances. Achieving such consistency is challenging due to the inherent variability among instances and their differing suitability to the baseline algorithm.

When the baseline’s gap to the optimum varies widely across instances, picking a high better_than_baseline can be profitable even for greedy algorithms that get lucky and find ‘easy’ instances where the baseline algorithm performed poorly. The goal of this note is to realign incentives by:

  1. Sub‑instance averaging: Bundle M i.i.d. sub‑instances into each instance and judge the root‑mean‑square (RMS) improvement.
  2. Updating Difficulty Range: Choosing the range of better_than_baseline, i.e., the max and min difficulty, so that a large fraction of instances remain solvable.

Theory

In this section, we consider Benchmarkers solving a TIG challenge with difficulty [N, \beta], where

\beta = \frac{\texttt{better_than_baseline}}{1000}.

In the protocol, Benchmarkers may increment the better_than_baseline parameter in integer steps.

Consider a random instance i of a challenge. Let

  • B_i be the solution found by the baseline algorithm,
  • J_i be the optimal solution obtained via an exact algorithm, and
  • Y_i be the solution provided by the Benchmarker.

Recall that for the Knapsack challenge (QKP), a maximization problem, a Benchmarker’s solution qualifies if

Y_i \geq B_i\left(1+\beta\right),~~~\text{i.e.,}~~ \frac{Y_i}{B_i}\geq 1+\beta.

Thus, for a given instance i and a specified \beta, the probability that a solution exists is

\mathbb{P}(\text{Solution Exists}_\beta)= \mathbb{P}\left(\beta\leq \frac{J_i}{B_i}-1\right).

Similarly, for the Vehicle Routing problem, which is a minimization problem, a Benchmarker’s solution qualifies if

\frac{Y_i}{B_i} \leq 1 - \beta,~~~\text{i.e.,}~~ \beta \leq 1-\frac{Y_i}{B_i}.

Hence, the probability that a solution exists for a given instance i is

\mathbb{P}(\text{Solution Exists}_\beta)= \mathbb{P}\left(\beta \leq 1-\frac{J_i}{B_i}\right).
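These solvability probabilities can be estimated empirically. Below is a minimal sketch for the minimization case, using an assumed toy Gaussian model for the ratio J_i/B_i (the function name and distribution parameters are illustrative, not solver data):

```python
import random

def solvable_fraction(beta, ratios):
    """Fraction of instances solvable at difficulty beta (minimization).

    `ratios` holds J_i / B_i (optimum over baseline), so instance i is
    solvable iff beta <= 1 - J_i / B_i.
    """
    return sum(1 for r in ratios if beta <= 1 - r) / len(ratios)

# Toy model (illustrative only): the gap 1 - J_i/B_i averages ~0.20,
# with some instance-to-instance noise.
random.seed(0)
ratios = [min(1.0, random.gauss(0.80, 0.03)) for _ in range(100_000)]

print(solvable_fraction(0.15, ratios))  # well below the typical gap: most instances solvable
print(solvable_fraction(0.25, ratios))  # above the typical gap: only a lucky tail remains
```

With these toy parameters, raising beta past the mean gap leaves only the small tail of instances where the baseline happened to perform poorly, which is exactly the regime the note describes.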

In an ideal scenario, we would employ a baseline solver with zero variance, ensuring that the distance to the optimal solution remains fixed (e.g., 20%) for every instance (solid black in Fig. 1). In such a case, if a Benchmarker selects a \beta greater than 0.2, no instances would be solvable. In practice, however, the ratio of our baseline solver to the optimum exhibits some variance. For example, if a Benchmarker increases \beta to 0.25, approximately 3.3% of instances remain solvable (grey dotted line in Figure 1), and currently the larger rewards tied to the higher \beta more than offset the reduced reliability from the lower chance of encountering a solvable instance.


Figure 1: Probability of an instance being solvable with current variance in the baseline solver (vehicle routing, N=200)

Variance Reduction via Sub-Instance Averaging

To address this variability, we propose the introduction of sub-instances. Under this approach, each benchmark instance comprises M i.i.d. sub-instances, each generated with the same parameters as the original instance, thereby redefining an instance as a collection of M sub-instances. For an instance to count as solved, the average improvement (better_than_baseline) across all sub-instances must exceed the chosen difficulty.

Old Approach without Averaging
Consider a minimization problem (such as vehicle routing) with a fixed challenge size. Suppose a Benchmarker selects a difficulty parameter, \beta, and for each challenge instance, i, the condition

\beta\leq \frac{B_i - Y_i}{B_i} \implies \frac{Y_i}{B_i}\leq 1-\beta

must be satisfied. If we consider the ratio \frac{Y_i}{J_i}, an indicator of the quality of the Benchmarker’s solution, this implies that

\frac{Y_i}{J_i}=\frac{Y_i}{B_i}\frac{B_i}{J_i} \leq (1-\beta) \frac{B_i}{J_i} \quad (1)

Thus, the ratio \frac{Y_i}{J_i} remains small if 1-\beta is small, with the stability of this claim hinging on the variance of \frac{B_i}{J_i}, i.e., \text{Var}\Big[\frac{B_i}{J_i}\Big].

New Approach with Averaging
First, consider the following definitions:

\mathbf{\frac{Y}{B}} = \begin{pmatrix} \frac{Y_1}{B_1}\\ \vdots \\ \frac{Y_M}{B_M} \end{pmatrix},\quad \left(\mathbf{\frac{Y}{B}}\right)_{RMS} = \sqrt{ \frac{1}{M}\sum_{i=1}^M \Big(\frac{Y_i}{B_i}\Big)^2}

where \mathbf{\frac{Y}{B}}, representing an instance, is a vector of length M with one entry per sub-instance.

Now, let us require that for an instance composed of M sub-instances, the following condition holds:

\left(\mathbf{\frac{Y}{B}}\right)_{RMS} = \sqrt{ \frac{1}{M}\sum_{i=1}^M \Big(\frac{Y_i}{B_i}\Big)^2} \leq 1-\beta. \quad (2)

This allows us to assess the average performance of the Benchmarker’s solution to each sub-instance \frac{Y_i}{J_i}:

\frac{1}{M}\sum_{i=1}^M \frac{Y_i}{J_i} = \frac{1}{M}\sum_{i=1}^M \frac{Y_i}{B_i}\frac{B_i}{J_i} \leq \frac{1}{M}\sqrt{\sum_{i=1}^M\left(\frac{Y_i}{B_i}\right)^2}\sqrt{\sum_{i=1}^M\left(\frac{B_i}{J_i}\right)^2} \quad \text{(by Cauchy-Schwarz)}
\leq (1-\beta) \sqrt{\frac{1}{M}\sum_{i=1}^M\left(\frac{B_i}{J_i}\right)^2} \quad \text{(by equation (2))}
= (1-\beta) \left(\mathbf{\frac{B}{J}}\right)_{RMS}. \quad (3)

The right-hand side of equation (3) is considerably more stable than that of equation (1), since

\text{Var}\left[\left(\mathbf{\frac{B}{J}}\right)_{RMS}\right] \ll\text{Var}\left[\frac{B_i}{J_i}\right].
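A quick simulation illustrates this inequality. The sketch below uses an assumed toy distribution for B_i/J_i (the values 1.25 and 0.05 are made up for illustration, not measured solver data):

```python
import math
import random

def rms(xs):
    """Root-mean-square of a list of values."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def variance(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

random.seed(1)
# Toy model (illustrative): each ratio B_i / J_i is noisy around 1.25.
single = [random.gauss(1.25, 0.05) for _ in range(50_000)]

M = 10  # sub-instances bundled into one instance
bundled = [rms([random.gauss(1.25, 0.05) for _ in range(M)])
           for _ in range(5_000)]

print(variance(single), variance(bundled))  # the RMS of a bundle varies ~M times less
```

Bundling M i.i.d. sub-instances shrinks the variance of the RMS ratio by roughly a factor of M, which is the stabilising effect exploited here.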

This enhanced stability is illustrated in the following plot (Figure 2):


Figure 2: Probability of an instance being solvable with reduced variance by introduction of sub-instances shown in green (vehicle routing, N=200)

Introducing a Maximum Difficulty

Even with the improved variance from averaging, Benchmarkers might still select a better_than_baseline value that once again puts them in a regime with a low probability of solvable instances. To mitigate this, we propose imposing a maximum difficulty cap.

The cap is chosen so that a desired fraction of instances, e.g. 95%, remain solvable. For Vehicle Routing with N=200, this would translate to setting the maximum difficulty to \beta_{max} = 0.188.
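Operationally, the cap is just a quantile of the empirical gap distribution. A minimal sketch (the gap values below are hypothetical; the 0.188 figure in the text comes from measured VRP N=200 data, not from these numbers):

```python
def max_difficulty(gaps, target_fraction=0.95):
    """Pick beta_max so that `target_fraction` of instances stay solvable.

    `gaps` holds the per-instance solvable gap (1 - J_i/B_i for a
    minimization problem): instance i is solvable at difficulty beta iff
    beta <= gaps[i], so beta_max is the (1 - target_fraction) quantile.
    """
    ordered = sorted(gaps)
    cut = round((1 - target_fraction) * len(ordered))
    return ordered[cut]

# Hypothetical per-instance gaps for illustration.
gaps = [0.17, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.30]
beta_max = max_difficulty(gaps, target_fraction=0.90)  # -> 0.19
```

With these ten sample gaps, beta_max = 0.19 leaves 9 of 10 instances (90%) solvable, matching the requested fraction.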

The following plot illustrates the impact of the cap on both the original and the new averaging-based approaches. The advantage of the sub-instance approach combined with the cap, compared to merely applying a cap within the original setup, is that it reduces the variance from the optimal solution for each instance, in addition to ensuring a given fraction of instances are solvable. This ensures that innovation is encouraged closer to the optimal solution.


Figure 3: Plot of the solvability of an instance with a maximum difficulty cap set so that 95% of instances are solvable (vehicle routing, N=200)

Potential Downsides

Introducing a maximum difficulty means that some instances or sub-instances will no longer be solved optimally, as the cap restricts the rewards for the maximum achievable solution in some cases. However, as the protocol matures and the algorithms developed within TIG consistently approach or surpass current state-of-the-art (SOTA) performance, there will be opportunities to reassess and raise the cap.

New Range for Difficulty Parameters

Vehicle Routing

The new choice for the difficulty range of better_than_baseline for Vehicle Routing is therefore

\texttt{better_than_baseline} \in [15,200]

where \texttt{better_than_baseline} = \frac{B_i-Y_i}{B_i}\times1000, since Vehicle Routing is a minimization problem.
The min difficulty for num_nodes is set to 100.

Knapsack

For Knapsack, the performance of the baseline solver is such that the new difficulty range of better_than_baseline would be [1,8]. Currently, each unit jump in better_than_baseline represents a 0.1% improvement over the baseline. This would lead to abrupt changes in difficulty as this range is too coarse. To provide a smoother progression, we now let one unit correspond to a 0.01% improvement, matching the granularity adopted in recent work on quadratic knapsack problems (e.g., Fennich et al. [1]).
(This change is only for the better_than_baseline difficulty parameter for the knapsack challenge.)

The revised choice for the interval of better_than_baseline for Knapsack is therefore

\texttt{better_than_baseline} \in [10,80]

with

\texttt{better_than_baseline} = \frac{Y_i-B_i}{B_i}\times10000

The min difficulty for num_items is set to 100.
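The two unit conventions can be sketched side by side (function names are mine; the VRP formula uses (B_i − Y_i)/B_i since it is a minimization problem):

```python
def vrp_better_than_baseline(Y, B):
    """VRP (minimization): improvement over the baseline in 0.1% units."""
    return (B - Y) / B * 1000

def knapsack_better_than_baseline(Y, B):
    """Knapsack (maximization): improvement in the new, finer 0.01% units."""
    return (Y - B) / B * 10000

# A knapsack value 0.35% above the baseline is ~35 units on the new scale;
# on the old 0.1% scale it would sit awkwardly between 3 and 4.
print(knapsack_better_than_baseline(10035, 10000))  # ~35 units
print(vrp_better_than_baseline(900, 1000))          # ~100 units (10% shorter route)
```

The tenfold finer Knapsack granularity turns the coarse [1, 8] range into [10, 80], giving a smoother difficulty progression.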


  1. Fennich, Eliass & Djeumou Fomeni, Franklin & Coelho, Leandro. (2024). A novel dynamic programming heuristic for the quadratic knapsack problem. European Journal of Operational Research. 319. 10.1016/j.ejor.2024.06.034. ↩︎

https://forum.tig.foundation/t/realigning-benchmark-incentives-sub-instance-averaging-and-new-difficulty-ranges/53#post_1 Fri, 25 Apr 2025 18:55:09 +0000 forum.tig.foundation-post-83
Vehicle Routing Challenge Update Thanks, @Jake_Logos and @syebastian, for your comments. It’s great to get feedback on challenge design.

Regarding the fixed parameters—such as grid size and depot due time—we were advised that using a fixed or variable grid size is a relatively minor design decision. In the Set X instances, a fixed grid was chosen primarily for numerical stability; it helps avoid excessively large numbers that can cause precision issues in exact solvers. Additionally, since some algorithms are sensitive to input scale (for example, distances in meters versus kilometers), keeping instance objective values within a similar order of magnitude minimizes calibration issues. This influenced our decision to use a fixed grid over a variable one, meaning that the depot due time now depends solely on the furthest customer from the depot. The difference should only be in the scale of the distances and depot due time, without altering instance difficulty.

We also acknowledge the concern that innovators might hard-code optimizations, leading to overly narrow algorithm designs. We will be monitoring the evolution of algorithms within TIG and consult with an expert on design and parameter specifications if innovation tends in an unfavourable direction.

Our instances are designed to align closely with established benchmarks (Solomon, Homberger–Gehring, and the new CVRP Set X benchmark design), and our analysis indicates that algorithm performance on our randomly generated instances is comparable to these standard benchmarks. Our long-term aim is to periodically compare TIG algorithms against SOTA algorithms on standard benchmarks to ensure we continue to incentivise and reward the development of genuinely innovative solvers.

I hope this addresses your questions/concerns.

https://forum.tig.foundation/t/vehicle-routing-challenge-update/35#post_5 Tue, 01 Apr 2025 13:52:20 +0000 forum.tig.foundation-post-81
Vehicle Routing Challenge Update Given the proposed instance generation method presented here (fixed grid size, integer-rounded distances, specific time window distribution parameters, etc.), how do you plan to ensure that submitted solvers don’t become overly specialized or optimized towards these exact parameters or scenarios? This is something syebastian has raised and I think it’s an important question to address. Have you considered additional testing methodologies, such as hidden validation instances with slightly varying parameters, periodic updates to the instance-generation parameters, or explicit checks on solver robustness across different benchmarks, to encourage and reward the development of genuinely innovative solvers?

https://forum.tig.foundation/t/vehicle-routing-challenge-update/35#post_4 Mon, 31 Mar 2025 18:11:24 +0000 forum.tig.foundation-post-80
Vehicle Routing Challenge Update Good job, @Aoibheann and @vidalt. The first thing that comes to mind: aren’t you afraid that fixing many parameters like grid size, due time, etc. will encourage innovators to hardcode some optimizations, leading to problems with the development of very narrow, limited solvers, or even basic trouble comparing algorithms on the Solomon or Homberger and Gehring benchmarks?

https://forum.tig.foundation/t/vehicle-routing-challenge-update/35#post_3 Fri, 28 Mar 2025 01:01:13 +0000 forum.tig.foundation-post-79
Vehicle Routing Challenge Update Vehicle Routing Update: Finalised Version

At TIG, we are dedicated to ensuring that our challenges align with real-world benchmarks with the aim of incentivising state-of-the-art innovation. As such, in collaboration with a world-leading expert in this field, @vidalt, we have refined our challenge instance generation process, and are pleased to share our final design.

Instance Generation

To align with almost all existing VRPTW benchmark instances, distances are two-dimensional Euclidean. Both the depot and customers are positioned at integer coordinates within a [0, 1000] × [0, 1000] grid. As described in the paper “New Benchmark Instances for CVRP” by Uchoa et al. [1], instances can be characterised by several attributes: the number of customers, depot positioning, customer positioning, demand distribution, and average route size. Below, we detail the choices made for these attributes for our instances.

Instance Attributes

Depot Positioning

Central (C) – the depot is positioned in the center of the grid, point (500,500).

Customer Positioning

Random-Clustered (RC) – half of the customers are positioned in clusters while the other half are randomly placed across the grid. Every customer, as well as the depot, is located at a unique point on the grid.

This positioning process differs slightly from that described in [1:1] to ensure a faster and consistent instance generation. First, a number S—representing the cluster seeds—is selected from a uniform discrete distribution UD[3,8]. These S seeds are then randomly positioned on the grid. Instead of using an exponential decay mechanism to attract the remaining N/2−S customers to clusters as in [1:2], each customer now has a 50% chance of being assigned to a cluster. If selected, the customer randomly picks one of the cluster seeds and its position is generated using a truncated normal distribution centered on the seed’s coordinates, with a standard deviation of 60. This ensures that the customer is placed close to the seed while still adhering to the grid boundaries. The chosen standard deviation was determined experimentally to best replicate the clustering behavior achieved by the exponential decay attraction in [1:3].
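The positioning procedure above can be sketched as follows. This is a minimal illustration, not the challenge's actual generator; names like `generate_customers` and the uniqueness handling are my assumptions:

```python
import random

GRID = 1000
SIGMA = 60  # experimentally chosen cluster spread (from the text above)

def truncated_grid_normal(mu, rng):
    """Sample normal(mu, SIGMA), rounded, resampling until it is on-grid."""
    while True:
        x = round(rng.gauss(mu, SIGMA))
        if 0 <= x <= GRID:
            return x

def generate_customers(n, rng):
    """Random-Clustered customer positions: a sketch of the scheme above."""
    n_seeds = rng.randint(3, 8)                  # S ~ UD[3, 8] cluster seeds
    seeds = set()
    while len(seeds) < n_seeds:                  # seeds at unique grid points
        seeds.add((rng.randint(0, GRID), rng.randint(0, GRID)))
    seeds = sorted(seeds)
    customers = list(seeds)
    while len(customers) < n:
        if rng.random() < 0.5:                   # 50% chance: join a cluster
            sx, sy = rng.choice(seeds)
            p = (truncated_grid_normal(sx, rng), truncated_grid_normal(sy, rng))
        else:                                    # otherwise uniform on the grid
            p = (rng.randint(0, GRID), rng.randint(0, GRID))
        if p not in customers:                   # every position must be unique
            customers.append(p)
    return customers

pts = generate_customers(200, random.Random(42))
```

Resampling the truncated normal keeps clustered customers near their seed while respecting the grid boundaries, as described above.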

Demand Distribution

Small Values, Large Variance – demands from UD[1,35].

Average Route Size

The average route size, r_{\text{av}} , is chosen to be in the range ~11-12 customers per route, in line with the average route size chosen across instances in [1:4].

r_{\text{av}} = \frac{\small\text{capacity}}{\small\text{avg\_demand}} = \small\frac{200}{17.5} = 11.43

Time Window Generation

Depot Due Time

Based on experimentation and a detailed comparison with depot due times in the Solomon and Homberger instances [2], the following formula was selected for the depot due time:

l_0 = d_{0i_F} + (s_{i} + d_{\text{av}})\times r_{\text{av}},

where l_0 is the depot due time, d_{0i_F} is the direct distance between the depot and the furthest customer i_F , s_{i} is the service time, with s_i =10 \forall i , and d_{\text{av}} is the average distance between customers.

The average customer distance, d_{\text{av}}, is derived from the mean distance between two points in a square of side length grid_size/2. As the depot is centered, we compute the average in a quarter of the grid using the formula for uniformly random points in a square [3][4].

d_{\text{av}} = \frac{\small\text{grid\_size}}{2} \times 0.5214 = 260.7.

Therefore, the depot due time is given by

l_0 = d_{0i_F} + 3094.1.
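Putting the formula together in code (a sketch; the function name and the two-customer example are mine):

```python
import math

GRID = 1000
SERVICE_TIME = 10           # s_i = 10 for all customers
R_AV = 200 / 17.5           # average route size from the capacity section
D_AV = GRID / 2 * 0.5214    # mean distance of two uniform points in a square of side GRID/2

def depot_due_time(depot, customers):
    """l_0 = d(depot, furthest customer) + (s_i + d_av) * r_av."""
    d_far = max(math.dist(depot, c) for c in customers)
    return d_far + (SERVICE_TIME + D_AV) * R_AV

l0 = depot_due_time((500, 500), [(0, 0), (900, 500)])
# furthest customer is (0, 0), at distance ~707.1 from the depot
```

Note that `math.dist` requires Python 3.8+; the constant term evaluates to roughly 3094, matching the figure in the text.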

Customer’s Due Times

The due times of the time windows are determined differently based on whether customer locations are randomly distributed or clustered.

Initially, every customer is assigned a due time drawn uniformly from the interval,

[d_{0i}, l_0 - d_{0i} - s_i],

where d_{0i} is the distance from the depot to customer i , l_0 is the depot’s due time, and s_i is the service time for customer i .

This method ensures that each time window is feasible, allowing a vehicle to leave the depot, reach the customer, perform the service, and return on time.

Clustered Customers:
For clustered customers, the due times are adjusted to be closer to their respective seed customer’s due time. This is achieved by recalculating each clustered customer’s due time as the average of its original due time and that of its seed, while still respecting the original bounds.

Ready Times

The depot’s ready time is set to zero. The ready times for a subset of customers are defined as their due time minus a time window width drawn from a uniform distribution UD[10,60]. The density parameter, currently set at 50%, determines the proportion of customers with non-zero ready times. This approach creates variability in the time window widths, yielding both tight and loose windows.
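The due-time and ready-time scheme above can be sketched as one function (an illustration; the signature and clamping at zero are my assumptions, and the seed-averaging step for clustered customers is omitted):

```python
import random

def customer_time_windows(distances, l0, service_time=10, density=0.5,
                          rng=random):
    """Sketch of the time-window scheme described above.

    `distances[i]` is the depot-to-customer distance d_0i. Each due time
    is drawn from [d_0i, l0 - d_0i - s_i], so a vehicle can always leave
    the depot, serve the customer, and return on time. A `density`
    fraction of customers then get a nonzero ready time: due - UD[10, 60].
    """
    windows = []
    for d0i in distances:
        due = rng.uniform(d0i, l0 - d0i - service_time)
        if rng.random() < density:
            ready = max(0.0, due - rng.randint(10, 60))
        else:
            ready = 0.0
        windows.append((ready, due))
    return windows

random.seed(7)
tw = customer_time_windows([100.0, 250.0, 400.0], l0=3500.0)
```

Every generated window stays inside the feasible interval by construction, so no instance is rendered unsolvable by its time windows alone.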

Decisions on Conventions

  1. Euclidean distances will be rounded to the nearest integer, as chosen in [1:5].
  2. The number of routes in a valid solution for each instance will not be fixed; instead, an upper bound will be set based on the baseline solution.

  1. Eduardo Uchoa, Diego Pecin, Artur Pessoa, Marcus Poggi, Thibaut Vidal, Anand Subramanian. New benchmark instances for the Capacitated Vehicle Routing Problem. European Journal of Operational Research, 257(3):845–858, 2017. ↩︎ ↩︎ ↩︎ ↩︎ ↩︎ ↩︎

  2. SINTEF. Vehicle Routing Problem with Time Windows (VRPTW). VRPTW, Accessed: 2024-12-05. ↩︎

  3. Eric W. Weisstein. Square Line Picking. MathWorld–A Wolfram Web Resource. Square Line Picking -- from Wolfram MathWorld, Accessed: 2024-12-05. ↩︎

  4. Mean line segment length. Wikipedia. Mean line segment length - Wikipedia, Accessed: 2024-12-05. ↩︎

https://forum.tig.foundation/t/vehicle-routing-challenge-update/35#post_2 Thu, 27 Mar 2025 09:26:42 +0000 forum.tig.foundation-post-78
Enhanced Block Reward Function Hi Haver,

Glad to have you involved in the conversation. That is a great question!

If a benchmarker has two benchmarks within the frontier’s difficulty band, we expect the band to be narrow enough that the solution ratio between the benchmarks remains relatively constant. This stability implies that their profitability will not drop significantly, and the failsafe should not be triggered.

Ultimately, the primary objective is to achieve qualifying solutions, so targeting a slightly easier difficulty will not provide a meaningful advantage.

https://forum.tig.foundation/t/enhanced-block-reward-function/46#post_4 Fri, 21 Mar 2025 09:03:04 +0000 forum.tig.foundation-post-76
Enhanced Block Reward Function Am I right in thinking that benchmarkers who work at higher difficulty levels, closer to “scaled”, in order to produce qualified solutions (since the nonce/solution count will be higher) end up with a worse ratio? As a result, their profitability is lower than at lower difficulty. Moreover, such a benchmarker can end up with no reward for a solution that fails to meet the failsafe threshold, even if it is a valid solution on the most difficult frontier.

https://forum.tig.foundation/t/enhanced-block-reward-function/46#post_3 Thu, 20 Mar 2025 05:01:33 +0000 forum.tig.foundation-post-75
Enhanced Block Reward Function A Revised Enhancement of the Block Reward Function.

Before diving in, I’d like to introduce myself—I’m Daniel, a new member of TIG Labs.

We are simplifying our proposed revision to the block reward function and invite discussion on this new design.

Overall Motivation

Our motivation remains the same: Benchmarkers are currently incentivized to maximize the number of qualified solutions. However, this creates a bias toward greedy algorithms, misaligning innovation in TIG with real-world adoption.

Motivation for Revising Our Proposal

The previous proposal had a fundamental flaw. By using weighted percentiles based on the number of nonces, greedy algorithms, which naturally generate far more nonces, could disproportionately skew the percentiles. As a result, greedy algorithms could achieve high reliability scores, undermining the mechanism’s core purpose: to favor sophisticated algorithms over greedy ones.

This revised proposal addresses the flaw by eliminating the use of a global metric (i.e., percentiles). Instead, each Benchmarker’s score will be calculated solely based on their own benchmarks.

The New Proposal

A Benchmarker’s reward across all challenges is an increasing function of their reward factors. Reward factors are calculated using all qualified benchmarks over the recent 120 blocks.

For a challenge x, and Benchmarker i, denote:

  • S^i_x total number of solutions,
  • V^i_x total number of nonces,
  • f_x^i total number of qualified solutions,

in their qualified benchmarks for challenge x. Also denote their solution ratio as

r_x^i= \frac{S^i_x}{V^i_x},

and their reward factor as

\hat f_x^i := \frac{f^i_x (r_x^i)^{\alpha_x}}{\sum_j f^j_x (r^j_x)^{\alpha_x}}.

The exponent \alpha_x controls sensitivity to greediness. Initially, we set \alpha_x = 1 for all x. The logic for adopting this expression of \hat f_x^i is that

\text{reward factor} \propto \text{(qualified solutions)} \times \text{(solution ratio)}^\alpha.

This encourages Benchmarkers to adopt algorithms that generate many qualified solutions while also favoring those with a higher solution ratio.

Example
| Benchmarker | Solutions S_x^i | Nonces V_x^i | Solution Ratio r_x^i | Qualified Solutions f_x^i | Reward Factor \hat f_x^i |
|---|---|---|---|---|---|
| Alice | 100 | 200 | 0.50 | 80 | 0.625 |
| Bob | 300 | 3000 | 0.10 | 240 | 0.375 |
| Totals | 400 | 3200 | | 320 | 1 |

We see from the table that even though Bob has 3× more qualified solutions than Alice, his low solution ratio (0.10) reduces his reward factor significantly.
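The calculation behind the table can be sketched in a few lines (the function and variable names are mine, not protocol identifiers):

```python
def reward_factors(benchmarkers, alpha=1.0):
    """Normalized reward factors: f_i * (S_i / V_i)**alpha over the sum."""
    scores = {name: f * (s / v) ** alpha
              for name, (s, v, f) in benchmarkers.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

# (solutions S, nonces V, qualified solutions f) per benchmarker
factors = reward_factors({"Alice": (100, 200, 80), "Bob": (300, 3000, 240)})
print(factors)  # Alice ~0.625, Bob ~0.375, matching the table above
```

Raising `alpha` above 1 would penalise low solution ratios even more strongly, which is the tuning knob the proposal reserves for the future.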

Effect on the Frontier.

We expect that Benchmarkers will begin to adopt less greedy algorithms, but since fewer solutions will be found, the frontier will initially recede. As players create better algorithms with good solution ratios, the frontier will again grow.

A Failsafe Feature

In addition to the incentive structure favoring less greedy algorithms, we introduce a failsafe criterion to ensure that benchmarks maintain a minimum level of reliability.

For a benchmark to qualify in a given block, its solution ratio must be at least 10% of the previous block’s average solution ratio. That is, if the average solution ratio of all qualified benchmarks in the previous block was r, then a benchmark can only qualify if its solution ratio is at least 0.1 \times r.

Benchmarkers may still submit benchmarks with a lower solution ratio, but these will not qualify for rewards in that block.
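The failsafe reduces to a one-line check (a sketch; the function name is mine):

```python
def qualifies(solution_ratio, prev_block_avg_ratio, threshold=0.1):
    """Failsafe: a benchmark qualifies only if its solution ratio is at
    least `threshold` times the previous block's average ratio."""
    return solution_ratio >= threshold * prev_block_avg_ratio

assert qualifies(0.05, 0.40)        # 0.05 >= 0.1 * 0.40 = 0.04
assert not qualifies(0.03, 0.40)    # 0.03 <  0.04
```

Because the threshold is relative to the previous block, it tracks the network's current reliability level rather than pinning an absolute ratio.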

Like the exponent parameter \alpha_x introduced earlier, the 10% threshold is subject to future tuning.

This mechanism prevents a potential cyclic equilibrium, where:

  1. Sophisticated algorithms push the frontier down by favoring high-quality solutions over sheer quantity.
  2. Once the frontier is sufficiently low, greedy algorithms regain an advantage by exploiting their sheer volume of nonces.
  3. This cycle could repeat indefinitely, preventing stable progress.

By enforcing a minimum reliability threshold, we prevent greedy strategies from dominating the system once the frontier has receded.

https://forum.tig.foundation/t/enhanced-block-reward-function/46#post_2 Fri, 14 Mar 2025 16:51:24 +0000 forum.tig.foundation-post-73
Thoughts and Expectations on TIG’s Future Alongside typical marketing/advertising we are pursuing multiple avenues including sponsoring online coding competitions and hackathons, collaborating with content creators in the science and technology space, building relationships with research institutions directly, and setting up an ambassador program to incentivise the community to onboard innovators in their network.

If you have any alternative ideas though please let us know!

https://forum.tig.foundation/t/thoughts-and-expectations-on-tig-s-future/42#post_4 Fri, 21 Feb 2025 22:21:52 +0000 forum.tig.foundation-post-70
Thoughts and Expectations on TIG’s Future
steven:

Onboarding as many innovators as possible (again, which will rely on network effect of the community alongside marketing done by the foundation)

How do you think we can do this?
I think we should follow social media algorithms carefully.
We can use our Twitter page more efficiently and effectively. I think this will increase interest and engagement.

Or let’s wait for Elon Musk to tweet about TIG. ahahahahha

https://forum.tig.foundation/t/thoughts-and-expectations-on-tig-s-future/42#post_3 Fri, 21 Feb 2025 07:40:44 +0000 forum.tig.foundation-post-69
Upcoming Challenge: Hypergraph Partitioning Thank you for the response Aoibheann. I am glad to hear that you are actively working to address the issue with greedy algorithms. I did read that the reliability factor was going to be the base of the project soon, so hopefully all new challenges will adopt this.

I do understand where you are coming from about the baseline and it does make sense that it needs to be efficient and consistent, especially considering the instance verification and Sybil defense. I didn’t give that a thought in my response to be honest.

That said, I do still feel there’s an important balance to strike. Even with the new reward system and fuel limits encouraging better approaches, there’s always the risk that “innovators” focus on just beating a weak baseline rather than pushing toward truly high-quality solutions.

Would it make sense to benchmark the baseline against something stronger (like KaHyPar or a multi-level solver) to make sure the better_than_baseline factor actually challenges people to innovate? If the baseline is too weak, even with the reward function changes, we might not be making full use of the computational resources now available.

I look forward to seeing how this all plays out :slight_smile:

https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_4 Thu, 20 Feb 2025 17:01:20 +0000 forum.tig.foundation-post-67
Upcoming Challenge: Hypergraph Partitioning Hi Jake,

Thanks for contributing to our forum discussion! We’re also very excited about the Hypergraph Partitioning Challenge and want to design it as best as we can, so we really appreciate any and all feedback.

In case you weren’t aware, the team has been working on a new “Enhanced Block Reward Function” to counter the dominance of greedy algorithms across all challenges. We believe this issue stems from the current reward function prioritising the number of solutions found over the reliability of securing a solution for each instance attempted. You can find more details in our latest post here. We believe that once this mechanism is implemented, coupled with increased fuel limits, it will drive meaningful innovation by focusing more on solution quality and discouraging greedy approaches. This, in turn, will allow the better_than_baseline factor to more effectively incentivise higher solution quality.

The role of the baseline algorithm is to establish a performance target to be exceeded by the better_than_baseline factor for each randomly generated instance, creating a clear metric for what qualifies as a valid solution from one instance to the next. This is necessary because the randomness of our instance generation means we do not know the optimal solution in advance, so we need an alternative reference. We need the baseline to be efficient since instance generation is part of the solution verification process; if generating the instance took as long as solving it, we’d lose the asymmetry and compromise our Sybil-defense. Additionally, the baseline should offer a performance metric that is consistent and stable across instances—e.g., consistently about 20% below the best-known partition from a SOTA solver like KaHyPar. These two factors—efficiency and consistency—should be the only considerations for choosing our baseline solver.

Incentivising high-quality algorithms while ensuring that greedy approaches do not dominate is vital. To address these issues, we are implementing additional measures within the reward function and fuel limits. If our new updates or proposals don’t have the desired effect, we will continue to work on this issue until it is resolved.

Thanks again for your comments—we really appreciate them and hope this response clears up why the baseline isn’t a factor in incentivising higher solution quality and more innovative algorithms.

Thanks!

https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_3 Thu, 20 Feb 2025 16:14:09 +0000 forum.tig.foundation-post-66
Upcoming Challenge: Neural Network Gradient Descent We are excited to announce our upcoming challenge for TIG:
\large{ \textbf{Neural Network Gradient Descent}}

The recent surge in Artificial Intelligence (AI) has been largely driven by deep learning, a movement resulting from the combination of large datasets, highly parallelized compute, and rich neural network architectures with automatic differentiation. Central to deep learning progress is Stochastic Gradient Descent (SGD) and related algorithms for neural network training [1] [2] [3] [4], which apply some form of the first-order gradient in an iterative manner to the network parameters. In particular, the Adam optimiser [4] is currently one of the most cited papers of the decade, accumulating over 190 thousand citations.

To encourage the discovery of novel first-order gradient descent optimization algorithms capable of effectively optimizing deep neural networks, we introduce the Neural Network Gradient Descent challenge at TIG. This challenge aims to emulate the deep learning setting by training multilayer perceptrons (MLPs), the simplest foundational neural network architecture [5], on a nonlinear regression task, resulting in loss functions with typical deep learning characteristics such as:

  • Very large dimensionality, with moderate MLP sizes easily reaching O(10^6) parameters.

  • Many local minima and saddle points, due to the overparameterization of neural networks, which is excessive from the classical perspective of statistics.

  • The phenomenon of overfitting where many training loss minima correspond to networks with poor performance on unseen data points.

Notably, the challenge has been specifically designed to develop optimisers with an inductive bias for training neural networks that generalize well beyond training data as seen in SGD [6] [7]. This is nontrivial, as many advanced gradient descent optimisers like Adam have been empirically observed to find minima that exhibit slightly worse generalization than vanilla SGD [8] despite outperforming in terms of training convergence [9]. No definite theoretical understanding has yet been found to explain these observations, which form a unique facet of deep learning optimisation that distinguishes it from traditional optimisation problems focused solely on finding the lowest loss.

On a practical level, the exploration for more efficient optimisers derived from MLP training may contribute to:

  • Democratizing AGI as improvements in neural network training can reduce the barriers to entry due to hardware costs and specialized training infrastructure.

  • Improving state-of-the-art models as MLP motifs pervade larger architectures, e.g. transformer blocks [10], and thus findings here may inspire meaningful changes to the training of larger neural network architectures.

  • Reducing cost and energy usage as training neural networks requires many iterations of gradient descent, with the latest state-of-the-art models costing hundreds of millions of dollars to train; even small efficiency gains in the optimisation loop will therefore have a large impact.

  • Understanding SGD dynamics as MLPs have been toy models for understanding gradient descent [11].

The Neural Network Gradient Descent challenge requires innovators to design novel deep learning optimisers in the form of a gradient descent iteration function that is restricted to a particular structure set by the challenge. The detailed challenge specification and design is ready, and we will soon share a more accessible version of this technical write-up!

https://forum.tig.foundation/t/upcoming-challenge-neural-network-gradient-descent/47#post_1 Thu, 20 Feb 2025 15:05:29 +0000 forum.tig.foundation-post-65
Enhanced Block Reward Function We are excited to announce a proposal to revise the block reward function. We encourage discussion of this new design.

\textbf{Motivation}

Currently, TIG’s OPoW mechanism determines the block reward for a benchmarker i with a novel \href{https://docs.tig.foundation/opow#benchmarker-influence}{influence} function that monotonically increases with the benchmarker’s factors:
\begin{equation} \mathrm{reward}_i \propto \mathrm{influence}(f_{1,i}, f_{2,i}, \dots, f_{n,i}). \end{equation}

In particular, for factors based on challenges, the factor is calculated using the number of qualifying solutions found by benchmarker i out of all benchmarkers B over the recent 120 blocks:
\begin{equation} f_i = \frac{\mathrm{num\_qualifiers}_i}{\sum_{b \in B} \mathrm{num\_qualifiers}_b}. \end{equation}

Thus, in order to increase their rewards, benchmarkers are incentivised to maximise their solution rate. However, this introduces an inherent bias in algorithm adoption by benchmarkers. The current structure favors exploitative algorithms over those that balance exploration and exploitation.

Specifically, algorithms that prioritize exploitation (e.g. greedy algorithms) tend to have lower startup and execution costs, allowing for higher parallelisation and resulting in a more stable and likely higher solution rate. Current low fuel limits and the execution overhead of the WASM virtual machine exacerbate this exploitation advantage, although these issues will be addressed by future updates that increase fuel limits and execute algorithms on native hardware.

This bias misaligns innovation in TIG with what the real-world adopts: valuable algorithms are ones that effectively balance the exploration-exploitation trade-off for discovering high-quality solutions within limited computational budgets, as pure exploitation often leads to sub-optimal quality solutions.

In the next section, we outline our proposed changes to address this bias.

\textbf{Reward function design}

To address the exploitation bias, we propose an enhanced factor calculation that incorporates a benchmarker reliability score \mathcal{R}:

\begin{equation} f_i = \frac{\mathcal{R}_i \times \mathrm{num\_qualifiers}_i}{\sum_{b \in B} \mathcal{R}_b \times \mathrm{num\_qualifiers}_b}. \end{equation}

The reliability score \mathcal{R}_i for benchmarker i is the weighted average of their qualifying benchmarks’ reliability scores, where a benchmark’s reliability score is based on that benchmark’s ratio of solutions to attempts. See the next section for details.

To enable a configurable limit on the maximum relative advantage that can be gained through reliability, \mathcal{R} is bounded to the range [\mathcal{R}_{\min}, \mathcal{R}_{\max}]. The constant \frac{\mathcal{R}_{\max}}{\mathcal{R}_{\min}} = C determines the strength of the direct incentive for benchmarkers to adopt algorithms that achieve higher ratios of solutions to attempts, a characteristic we expect to be associated with more explorative algorithms that invest computational resources in finding higher-quality solutions.

With the revised calculation we account for the aspects we care about:

  • \textbf{Quality}: This refers to how close solutions are to the optimal value. The new direct incentive for a high solutions-to-attempts ratio means the difficulty frontier for each challenge better represents the true quality achievable with current algorithms and computational resources in the network.
  • \textbf{Speed}: This refers to the speed at which solutions are produced. While this is already naturally incentivized through higher solution rates leading to greater rewards, the previous mechanism lacked sufficient emphasis on solution quality. The revised calculation creates a better balance between these two aspects.

\textbf{Calculating} \mathcal{R}

For each challenge, a benchmarker’s reliability score is calculated through the following process:

  1. For each qualifying benchmark k, calculate its solution ratio r_k:
    \begin{equation} r_k = \frac{num\_solutions_k}{num\_nonces_k}, \end{equation}
    where num\_nonces_k represents the number of challenge instances the benchmarker attempted to solve for that benchmark.

  2. Determine the solution ratios a and b at a lower and an upper weighted percentile respectively, using num\_nonces as weights. A weighted percentile is the solution ratio, in sorted order, at which the cumulative num\_nonces weight reaches p per cent of the total. This ensures benchmarks with more attempts have proportionally greater impact on the percentile boundaries.

  3. For each benchmark k, calculate {w}_k:
    \begin{equation} {w}_k \equiv f(r_k) := \begin{cases} \mathcal{R}_{\min}, \ r_k < a \\ \mathcal{R}_{\min} + \frac{\mathcal{R}_{\max} - \mathcal{R}_{\min}}{b-a} (r_k-a), \ a \leq r_k < b \\ \mathcal{R}_{\max}, \ r_k \ge b \end{cases} \end{equation}
    Hence, the solution ratios a and b are defined such that f(a) \equiv \mathcal{R}_{\min} and f(b) \equiv \mathcal{R}_{\max}.

  4. For each benchmarker i, compute their overall reliability score \mathcal{R}_i as the num\_nonces-weighted average of the scores {w}_k over their qualifying benchmarks K_i:
    \begin{equation} \mathcal{R}_i = \frac{\sum_{k \in K_i} num\_nonces_k \times {w}_k}{\sum_{k \in K_i} num\_nonces_k}. \end{equation}
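The four steps above can be sketched end-to-end as follows. This is an illustrative sketch of the proposal only, not the protocol implementation; all function and variable names are assumptions.

```python
# Illustrative sketch of the proposed reliability calculation; not the
# protocol implementation, and all names are assumptions.

R_MIN, R_MAX = 0.5, 1.5      # proposed bounds, so R_MAX / R_MIN = C = 3
P_LOW, P_HIGH = 0.25, 0.75   # 25th and 75th weighted percentiles

def weighted_percentile(ratios, weights, p):
    """Solution ratio at which cumulative weight reaches p of the total."""
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    total, cum = sum(weights), 0.0
    for i in order:
        cum += weights[i]
        if cum >= p * total:
            return ratios[i]
    return ratios[order[-1]]

def reliability(benchmarks):
    """benchmarks: [(num_solutions, num_nonces), ...] per qualifying benchmark."""
    ratios = [s / n for s, n in benchmarks]   # r_k for each benchmark k
    weights = [n for _, n in benchmarks]      # num_nonces_k
    a = weighted_percentile(ratios, weights, P_LOW)
    b = weighted_percentile(ratios, weights, P_HIGH)

    def w(r):  # bounded piece-wise linear mapping f(r_k)
        if r >= b:
            return R_MAX
        if r < a:
            return R_MIN
        return R_MIN + (R_MAX - R_MIN) * (r - a) / (b - a)

    # num_nonces-weighted average over the benchmarker's benchmarks
    return sum(n * w(r) for r, n in zip(ratios, weights)) / sum(weights)
```

By construction the result always lies in [\mathcal{R}_{\min}, \mathcal{R}_{\max}], so the relative advantage from reliability is capped at C.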

\textbf{Parameter Selection}

We propose an initial setting of \mathcal{R}_{\max}=1.5 and \mathcal{R}_{\min}=0.5 such that \frac{\mathcal{R}_{\max}}{\mathcal{R}_{\min}} = 3 to provide a meaningful incentive for innovation of algorithms with improved reliability, and use of 25th and 75th percentiles for robustness against manipulation (see next section).
These settings can be adjusted in the future based on observations of the network and/or expert feedback.

The block reward weighting is visualised here:


Bounded piece-wise linear reward function with range f(r) \in [\mathcal{R}_{\min}, \mathcal{R}_{\max}] and \frac{\mathcal{R}_{\max}}{\mathcal{R}_{\min}} = 3. \mathcal{R}_{\min} and \mathcal{R}_{\max} are attained at the 25th and 75th weighted percentiles respectively. \mathcal{R}_{\mathrm{med.}} is the weighted median for the challenge. The red reference points in the figure are from an arbitrary distribution of r_k, for illustrative purposes.

\textbf{Discussion}

The proposed enhanced calculation has two key advantages:

  1. \textbf{Dynamic Adaptation:}
    As algorithms improve, the use of percentiles ensures the calculation automatically adjusts, maintaining consistent incentives for higher reliability.

  2. \textbf{Robustness Against Manipulation:}
    The choice of 25th and 75th weighted percentiles provides resistance against manipulation strategies such as the submission of many small benchmarks. Additionally, the piece-wise linear function ensures smooth transitions, discouraging threshold-gaming behaviour.

]]>
https://forum.tig.foundation/t/enhanced-block-reward-function/46#post_1 Thu, 20 Feb 2025 13:05:13 +0000 forum.tig.foundation-post-64
About the Protocol Enhancements category (Replace this first paragraph with a brief description of your new category. This guidance will appear in the category selection area, so try to keep it below 200 characters.)

Use the following paragraphs for a longer description, or to establish category guidelines or rules:

  • Why should people use this category? What is it for?

  • How exactly is this different than the other categories we already have?

  • What should topics in this category generally contain?

  • Do we need this category? Can we merge with another category, or subcategory?

]]>
https://forum.tig.foundation/t/about-the-protocol-enhancements-category/45#post_1 Wed, 19 Feb 2025 23:32:06 +0000 forum.tig.foundation-post-63
Thoughts and Expectations on TIG’s Future Hey there,

Thanks for getting in touch here and sharing your thoughts.

From my perspective, I think we should be focusing on a couple of areas in particular:

  1. Making TIG easier to understand and getting more exposure in the wider crypto ecosystem (which relies on the community, cannot just be done by the foundation)

  2. Onboarding as many innovators as possible (again, which will rely on network effect of the community alongside marketing done by the foundation)

Obviously we can’t comment on price, but thinking about growth of the protocol, I think if those two elements are executed well, the protocol and ecosystem could grow very significantly very quickly.

From your perspective as a miner, what do you want to see?

]]>
https://forum.tig.foundation/t/thoughts-and-expectations-on-tig-s-future/42#post_2 Tue, 18 Feb 2025 22:00:32 +0000 forum.tig.foundation-post-62
Upcoming Challenge: Hypergraph Partitioning I’m personally looking forward to the new Hypergraph Partitioning Challenge, as hypergraphs are a crucial part of problem-solving in real-world applications. However, I would like the team to consider some suggestions regarding the baseline algorithm choice and its impact on the competitive nature of TIG.

Historically, many challenges in TIG have been dominated by greedy algorithms, which are often “optimized” for speed rather than solution quality. Given that solution quality and reliability are becoming central to the project, I believe it is important to establish a more complex and meaningful baseline algorithm, rather than relying on a greedy approach that may find solutions quickly but fails to capture the deeper structure of the problem. This is particularly concerning for large-scale and highly connected hypergraphs, where greedy methods tend to produce less reliable solutions.

To ensure that solution quality is a priority, I propose that the baseline algorithm should go beyond simple greedy bipartitioning. Instead, it could incorporate one of the following approaches:

  • Multi-Level Partitioning
  • Simulated Annealing (SA) or other refinement techniques
  • Spectral Partitioning

These are just initial suggestions, and I would be interested in hearing further input from the team.

With fuel limits increasing from 2 billion to 10 billion, we now have the ability to implement more complex and computationally feasible algorithms. This presents an opportunity to move beyond simple greedy approaches and encourage meaningful innovation rather than just incremental “tweaks” to existing heuristics.

Additionally, in real-world applications, purely greedy approaches are rarely used for hypergraph partitioning, as they tend to ignore global structure and fail to produce high-quality solutions. Aligning the baseline algorithm with more robust techniques would better reflect real-world problem-solving and support the project’s long-term goals.

I hope the team considers implementing a baseline solver that evaluates not just the number of solutions found, but also solution quality and reliability. This would create a more competitive and innovative challenge that pushes participants to develop truly novel approaches.

]]>
https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_2 Tue, 18 Feb 2025 18:51:50 +0000 forum.tig.foundation-post-54
Upcoming Challenge: Hypergraph Partitioning We are excited to introduce our upcoming challenge for TIG:
\large{ \textbf{Hypergraph Partitioning}}

Impact: Practical and/or Scientific

Hypergraphs are a powerful tool for representing complex networks in which relationships may involve more than two elements simultaneously. Hypergraph partitioning refers to dividing such a network into a specified number of groups that are roughly equal in size while keeping as many related items together as possible. Although the problem is computationally challenging (NP-hard), it has broad applications across numerous fields:

  • Parallel Computing & Load Balancing: By intelligently distributing tasks across processors, hypergraph partitioning minimizes communication overhead and enhances overall computational efficiency [1][2][3][4][5].
  • Distributed Neural Network Training: It enables the partitioning of compute graphs across multiple GPUs or servers, significantly accelerating the training of deep learning models [6][7].
  • VLSI & Circuit Design: By effectively grouping circuit components, it optimizes chip layouts and reduces interconnect complexity, leading to faster and more efficient designs [8][9].
  • Social Networks & Community Detection: Capturing multi-way relationships, hypergraph partitioning reveals hidden community structures and provides deeper insights into group dynamics [10].
  • Bioinformatics & Computational Biology: It facilitates the clustering of proteins, genes, and genomic regions to identify functional modules, thereby aiding discovery in biological research [11].
  • Machine Learning & Data Mining: By effectively modeling higher-order interactions, it improves data clustering and feature selection, enhancing analytical outcomes [12].
  • Other Applications: From optimizing database sharding and segmenting GIS regions to modularizing software systems, hypergraph partitioning transforms large-scale challenges into more tractable problems [1:1][7:1][4:1].

In the rapidly evolving field of Decentralized Physical Infrastructure Networks (DePIN) — which leverage blockchain technology and distributed nodes to manage physical assets — hypergraph partitioning plays an especially important role. By accurately modeling complex interactions, it can effectively group related tasks and resources across scenarios such as decentralized compute/storage, blockchain data sharding, IoT networks, or supply chain logistics [13]. This grouping helps minimize cross-node communication and balances workloads, ultimately enhancing the scalability and performance of these decentralized systems [14].

Problem Description and Formulation

The Hypergraph Partitioning challenge at TIG can be formulated as follows:

Goal: Divide a hypergraph into a specified number of balanced partitions while minimizing the number of cut hyperedges.

Consider a hypergraph H = (V, N), with:

  • A set of vertices V, where each vertex v \in V has a weight w[v].
  • A set of hyperedges (nets) N, where each hyperedge n \in N is a subset of V and has a cost c[n].

Constraints (for a partition of V into K parts V_1, \dots, V_K):

  • Each vertex belongs to exactly one part, i.e., V_k \cap V_l = \emptyset for all 1 \leq k < l \leq K and \bigcup_{k=1}^{K} V_k = V.
  • Every part must contain at least one vertex, i.e., V_k \neq \emptyset.
  • The total weight of each part, W_k, must not exceed the average part weight W_{\text{avg}} by more than an allowed imbalance tolerance:
W_k \leq W_{\text{avg}}(1+\epsilon),

where W_k = \sum_{v \in V_k} w[v], W_{\text{avg}} = \frac{\sum_{k=1}^{K} W_k}{K}, and \epsilon is the allowed imbalance tolerance.

Objective:

\text{Minimize:}\quad \sum_{n \in N} c[n]\bigl(\lambda_n - 1\bigr),

where the connectivity, \lambda_n, is the number of parts that hyperedge n spans.
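The objective and constraints above can be evaluated directly. Below is a minimal sketch (an assumption of ours, not TIG's implementation) of scoring a multiway partition: the connectivity objective and the balance check.

```python
# A minimal sketch (not TIG's implementation) of scoring a multiway
# partition: the connectivity objective and the balance check.

def connectivity_cost(hyperedges, costs, part_of):
    """Sum of c[n] * (lambda_n - 1); lambda_n is the number of distinct
    parts spanned by hyperedge n. part_of[v] is vertex v's part id."""
    total = 0
    for n, edge in enumerate(hyperedges):
        spanned = {part_of[v] for v in edge}
        total += costs[n] * (len(spanned) - 1)
    return total

def is_balanced(weights, part_of, num_parts, epsilon):
    """Check V_k is nonempty and W_k <= W_avg * (1 + epsilon) for every part."""
    part_weight = [0.0] * num_parts
    for v, k in enumerate(part_of):
        part_weight[k] += weights[v]
    w_avg = sum(weights) / num_parts
    return all(0 < w <= w_avg * (1 + epsilon) for w in part_weight)
```

Note that a hyperedge fully contained in one part contributes zero, so an uncut partition has cost 0.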

Baseline Calculation

The baseline algorithm provides a reference performance metric for each instance. It is crucial that this baseline is both stable (e.g., consistently within 20% of the best solution) and efficient.

Our chosen baseline is a greedy bipartition algorithm. The algorithm repeatedly partitions the vertex set into two parts until reaching the desired number of partitions. Each bipartition step proceeds as follows:

  1. Determine target sizes. Given a current subset of vertices, calculate how many vertices should go to the left and right parts (e.g., if we aim for 2^d total parts, each subdivision targets two parts of prescribed sizes).
  2. Sort vertices by degree. For the vertices in the current subset, compute their degrees (the number of hyperedges to which each vertex belongs). Sort them in descending order so that higher-degree vertices are placed first.
  3. Place vertices greedily. Initialize two Boolean arrays (one per part) to track which hyperedges already have at least one vertex assigned to that part. For each vertex in sorted order:
  • Count how many of its hyperedges are “activated” in the left part and how many in the right.
  • If the left side has higher overlap, assign the vertex to the left part (unless it is at capacity); similarly, assign it to the right side if that overlap is higher. In case of a tie or if a part is filled, assign the vertex to the other part. Continue until one part reaches its target size, then assign any remaining vertices to the other part.
  4. Recursive Subdivision. After producing a bipartition, apply the same procedure to each newly formed part until the desired number of parts (e.g., 64) is reached.

Finally, the connectivity of the complete multiway partition is computed, giving the baseline_value. Although this local greedy strategy does not capture global interactions perfectly, it is fast, straightforward, and serves as a reasonable performance benchmark for more sophisticated methods.
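A single bipartition step of the greedy baseline described above can be sketched as follows. This is illustrative only; the real baseline's exact tie-breaking and capacity handling may differ.

```python
# A rough sketch of one greedy bipartition step as described above;
# illustrative only, the real baseline's tie-breaking may differ.

def greedy_bipartition(vertices, hyperedges, left_target):
    """hyperedges[v] is the set of hyperedge ids containing vertex v."""
    right_target = len(vertices) - left_target
    # Step 2: sort by degree, highest-degree vertices first
    order = sorted(vertices, key=lambda v: len(hyperedges[v]), reverse=True)
    left, right = [], []
    active_left, active_right = set(), set()  # hyperedges touched per part
    for v in order:
        # Step 3: overlap with hyperedges already active on each side
        overlap_l = len(hyperedges[v] & active_left)
        overlap_r = len(hyperedges[v] & active_right)
        prefer_left = overlap_l > overlap_r
        # Ties go right; a full part forces the vertex to the other part
        if (prefer_left and len(left) < left_target) or len(right) >= right_target:
            left.append(v)
            active_left |= hyperedges[v]
        else:
            right.append(v)
            active_right |= hyperedges[v]
    return left, right
```

Applying this recursively to each resulting part (step 4) yields the full multiway partition whose connectivity gives the baseline_value.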

Random Instance Generation

Importance

The instance generation process aims to produce scenarios that closely resemble established academic benchmark instances. By mirroring real-world conditions, we ensure that the challenge motivates algorithmic advances relevant to practical applications.

Generation

We plan to generate synthetic hypergraphs that mimic the characteristics of academic benchmark datasets. Although our approach is still under development (with expert consultation planned), a promising method involves the degree-corrected hypergraph stochastic block model (DCHSBM) by Chodrow et al. [15]. This model naturally produces realistic clustering, varied node degrees, and heavy-tailed edge sizes. Ultimately, we want hypergraph instances whose sizes and structures reflect those found in widely used sparse matrix collections, such as the SuiteSparse Matrix Collection [16], as referenced in [3:1].

Choice of Parameters

We are considering two approaches for setting the number of hyperedges. In one approach, the number of hyperedges equals the number of vertices, reflecting the sparse matrices in the SuiteSparse Matrix Collection [16:1]. Alternatively, we may draw the number of hyperedges from a Poisson distribution, whose mean is derived from a product of node degree parameters and a cluster-based affinity function, as in [15:1].

For the TIG challenge, we are currently considering fixing the number of parts at 64. The number of clusters will be decided as part of the generation. We will set all weights and costs to unity (i.e., w[v] = c[n] = 1), thereby reducing the objective to minimizing the total connectivity \sum_{n \in N} (\lambda_n - 1).

Difficulty Parameters

We define the following difficulty parameters that influence the Pareto frontier for qualifying solutions:

  1. num_vertices: The number of vertices in the hypergraph instance. As num_vertices grows, the solution space expands significantly, making it harder to find or approximate an optimal partition. This tests an algorithm’s ability to scale efficiently while maintaining good solution quality.
  2. better_than_baseline: The required proportional improvement over the baseline solution cost (measured by partition connectivity). A stricter threshold demands higher-quality optimization, pushing algorithms to deliver more refined solutions.

Our Challenge

At TIG, the baseline connectivity is determined by the greedy bipartition approach. The challenge is to develop algorithms that improve upon this baseline by at least the specified factor (better_than_baseline).


  1. Devine, K.D., Boman, E.G., Heaphy, R.T., Bisseling, R.H., & Catalyurek, U.V. (2006). Parallel hypergraph partitioning for scientific computing. Proceedings 20th IEEE International Parallel & Distributed Processing Symposium. ↩︎ ↩︎

  2. Aykanat, C., Cambazoglu, B., & Uçar, B. (2008). Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices. Journal of Parallel and Distributed Computing, 68, 609–625. ↩︎

  3. Trifunovic, A., & Knottenbelt, W. (2008). Parallel multilevel algorithms for hypergraph partitioning. J. Parallel Distrib. Comput., 68, 563–581. ↩︎ ↩︎

  4. Gottesbüren, L., & Hamann, M. (2022). Deterministic Parallel Hypergraph Partitioning. In Euro-Par 2022: Parallel Processing (pp. 301–316). Springer International Publishing. ↩︎ ↩︎

  5. Schlag, S., Heuer, T., Gottesbüren, L., Akhremtsev, Y., Schulz, C., & Sanders, P. (2023). High-Quality Hypergraph Partitioning. ACM J. Exp. Algorithmics, 27(1.9), 39. ↩︎

  6. Zheng, D., Song, X., Yang, C., LaSalle, D., & Karypis, G. (2022). Distributed Hybrid CPU and GPU Training for Graph Neural Networks on Billion-Scale Heterogeneous Graphs. In Proceedings (pp. 4582–4591). ↩︎

  7. Catalyurek, U., Devine, K., Fonseca Faraj, M., Gottesbüren, L., Heuer, T., Meyerhenke, H., Sanders, P., Schlag, S., Schulz, C., & Seemaier, D. (2022). More Recent Advances in (Hyper)Graph Partitioning. ↩︎ ↩︎

  8. Papa, D., & Markov, I. (2007). Hypergraph Partitioning and Clustering. In Handbook of Approximation Algorithms and Metaheuristics. ↩︎

  9. Karypis, G., Aggarwal, R., Kumar, V., & Shekhar, S. (1999). Multilevel hypergraph partitioning: applications in VLSI domain. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 7(1), 69–79. ↩︎

  10. Zhang, C., Cheng, W., Li, F., & Wang, X. (2024). Hypergraph-Based Influence Maximization in Online Social Networks. Mathematics, 12(17), 2769. ↩︎

  11. Wang, S., Cui, H., Qu, Y., & Yijia, Z. (2025). Multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork embedding for protein complex identification. Briefings in Bioinformatics, 26. ↩︎

  12. Zhou, D., Huang, J., & Schölkopf, B. (2006). Learning with Hypergraphs: Clustering, Classification, and Embedding. In Advances in Neural Information Processing Systems 19 (2006), 1601–1608. ↩︎

  13. Qu C, Tao M, Yuan R. A Hypergraph-Based Blockchain Model and Application in Internet of Things-Enabled Smart Homes. Sensors (Basel). 2018 Aug 24;18(9):2784. doi: 10.3390/s18092784. PMID: 30149523; PMCID: PMC6164253. ↩︎

  14. K. Kumar et al. “SWORD: workload-aware data placement and replica selection for cloud data management systems”. In: The VLDB Journal 23 (Dec. 2014), pp. 845–870. doi: 10.1007/s00778-014-0362-1. ↩︎

  15. Chodrow, P.S., Veldt, N., & Benson, A.R. (2021). Generative hypergraph clustering: From blockmodels to modularity. Science Advances, 7. ↩︎ ↩︎

  16. Kolodziej, S., Mahmoudi Aznaveh, M., Bullock, M., David, J., Davis, T., Henderson, M., Hu, Y., & Sandstrom, R. (2019). The SuiteSparse Matrix Collection Website Interface. Journal of Open Source Software, 4, 1244. ↩︎ ↩︎

]]>
https://forum.tig.foundation/t/upcoming-challenge-hypergraph-partitioning/43#post_1 Mon, 17 Feb 2025 19:13:55 +0000 forum.tig.foundation-post-52
Thoughts and Expectations on TIG’s Future Hello everyone,

I have been involved with TIG for about two months now, actively mining and contributing to the ecosystem. However, with recent changes and market fluctuations, I’m curious to hear the community’s thoughts on TIG’s future.

I’m hopeful that the price will return to the $1-$2 range, and I’d love to discuss what needs to happen for us to reach that point. Mining rewards, liquidity, and ecosystem sustainability are key factors, and I’m interested in hearing different perspectives on these aspects.

What are your thoughts on TIG’s future? Can it grow stronger with the team’s decisions and community support?

Looking forward to your insights!

Thanks!

]]>
https://forum.tig.foundation/t/thoughts-and-expectations-on-tig-s-future/42#post_1 Sun, 16 Feb 2025 11:25:02 +0000 forum.tig.foundation-post-51
Breakthrough Submission: Vehicle Routing Hello vidalt.
First, thank you for taking the time for your research. Addressing the statement about novelty, similar components might have existed in other works, as many people have worked on solving this problem for years, but nevertheless, there are different variations with different results, and the version I presented hasn’t been seen anywhere else. Moreover, it’s quite obvious why the variants proposed before me didn’t achieve success. The version with global parameters doesn’t introduce significant quality improvement, while the version with probabilities for each pair has scalability issues. I presented an idea that can handle both small and large tasks.

Second, regarding the results, I may have misunderstood what you meant, because Enhanced CW consistently shows a gap from 0% to 1% (not 1–2% as in your response), and for some instances goes beyond this range. But that only indicates the implementation is incomplete, not that the idea lacks quality. If you look at the implementation, the exploration process is completely randomized: no feedback, no reinforcement of reliable parameters, and no non-linear randomization. The local search itself can also be improved and better adapted to this idea. For example, one of my implementations found optimal values for small sets of nodes; another was more balanced across any set of nodes. The potential here is enormous, and everything comes down to having enough time to search for the right neighbourhood-parameter exploration. But whatever implementations may exist in the future, everything is built around one concept, and it is this concept that deserves the breakthrough title.

]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_8 Mon, 20 Jan 2025 19:23:51 +0000 forum.tig.foundation-post-47
Breakthrough Submission: Vehicle Routing Dear syebastian,

Thank you for your prompt response and for providing additional benchmarking results. I previously prepared the report included in John’s previous answer, so I am taking the opportunity to reply directly here to ensure a more direct discussion.

To clarify, my evaluation was conducted following peer-review standards, focusing on both novelty and performance relative to the current state-of-the-art (SOTA) published research and open-source algorithms. With many open-source implementations available for the CVRP, a breakthrough should, in my opinion, demonstrate improvements over this SOTA.

On Novelty. I have carefully understood the parameter adaptation mechanism you proposed. As highlighted in my earlier report, “ant-colony optimization” metaheuristics, popular in Operations Research in the 2000s, inherently include similar mechanisms for search parameter adaptation. For example, in [1], the method maintains a matrix \xi[i,j] of attractiveness for each possible (i,j) customer pair. The construction of solutions uses a probabilistic variant of the savings algorithm, where merging (i,j) pairs with higher \xi[i,j] values is more likely. The parameters \xi[i,j] evolve through the search, with those leading to better solutions being reinforced while others gradually diminish. Your proposed mechanism shares very close conceptual similarities with these approaches, though adaptation is restricted to fewer parameters. Overall, while there are differences in implementation, the core idea of adaptive parameters guiding solution construction in the savings algorithm is similar.

On Performance. Thank you for benchmarking the algorithm on the smaller instances of Set X. The error gaps of 1 to 2% you report (on the smaller instances) are noteworthy. Still, this might fall short of a breakthrough in this domain, in my view. This is in line with the comment of Aoibheann stating:

Whether these features rise to the level of a “breakthrough” depends on how effectively they outperform classical heuristics and whether the synergy can be demonstrated to be a true advance versus a recognizable extension of known methods

As noted in my previous report, current SOTA methods often achieve error gaps below 0.2% on the complete X set (that includes larger and harder instances), and multiple open-source methods achieving 1-2% gap are already available. While the results you presented are promising, especially in the short time frame dedicated to this development, getting closer to SOTA performance would strengthen your claims. Yet, this might require more extensive method redesign rather than simple fine-tuning, as stated.

Overall, I greatly appreciate the time and effort you have dedicated to this benchmarking. It is great to see all the efforts and dedication put into better solving the CVRP in this project!

[1] Reimann, M., Doerner, K. F., & Hartl, R. F. (2004). D-Ants: Savings based ants divide and conquer the vehicle routing problem. Computers & Operations Research, 31(4), 563–591.

]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_7 Mon, 20 Jan 2025 18:10:30 +0000 forum.tig.foundation-post-46
Breakthrough Submission: Vehicle Routing First of all, I want to thank everyone who shared their research results in this thread. Special thanks to Aoibheann for the very detailed research and for highlighting the factors that indicate the novelty of the method. Indeed, the heart of this approach is assigning a parameter to each node and searching for an effective set of parameters to obtain the optimal value. In this light, while John’s research mentions this, for some reason it overlooks it as a novel aspect of this method compared to other similar ones. I agree that benchmark A is not the most reliable, and that to get an objective picture we should ideally have results for benchmark X (Uchoa). I want to remind you that a breakthrough primarily describes the novelty of the method, while leaving room for various implementation variations and, consequently, for bringing the result to the desired gap. At the moment, I can provide results for benchmark X for instances in the range [100, 256], as the work on the algorithm was focused on this range. Please note that 1) the algorithm has not been submitted yet (it will be submitted at the end of the round), and 2) there is still room for improvements, and achieving optimal results is quite possible using the novelty of this method.


Please note that the algorithm finds optimal and near-optimal values, within the desired 0.2% gap. With enough time and dedication, it can be polished to achieve a much lower average gap, especially considering I only spent a few days on this (enhanced cw). Regardless of what changes might be made in the future, the core algorithm remains the same, which is further proof of why it deserves to be considered a breakthrough.

]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_6 Sat, 18 Jan 2025 15:57:04 +0000 forum.tig.foundation-post-44
Breakthrough Submission: Vehicle Routing Assessment – UAI c002 b001 Adaptive Savings Algorithm for the CVRP: Breakthrough_Discovery_Assessment_CVRP.pdf (151.5 KB)

@syebastian

]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_5 Sat, 18 Jan 2025 15:15:42 +0000 forum.tig.foundation-post-43
Vehicle Routing Challenge Update Vehicle Routing Challenge Update: Introducing Time Windows

Challenge Formulation

The Vehicle Routing Problem with Time Windows (VRPTW) involves determining a set of cost-effective routes for a fleet of identical vehicles operating from a single depot to serve a geographically dispersed set of customers. Each vehicle has a fixed capacity and each customer has a known demand for goods and a defined time window during which service must begin. If a vehicle arrives before this time window, it must wait; if it arrives after, service is considered infeasible. The primary objective is to minimise the total distance the fleet must travel to deliver goods to all customers and return to the depot, such that:

  • Each customer is visited by exactly one vehicle.

  • The total demand serviced by each vehicle does not exceed its capacity.

  • Each vehicle starts and ends its route at the depot.

  • Service at each customer commences within the customer’s defined time window.

  • The number of vehicles utilised is less than a set fleet size.

  • Vehicles wait if they arrive early, and service durations are accounted for within the schedule.
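The constraints above can be checked per route in a single pass. Below is a hedged sketch of such a feasibility check (the data layout and names are our assumptions, not part of the challenge specification).

```python
# Sketch of checking one route against the constraints listed above
# (capacity and time windows); the data layout here is an assumption.

def route_feasible(route, depot, demand, capacity, dist, ready, due, service):
    """route: customer indices in visiting order, depot excluded.
    ready[c]/due[c] bound the time window; service[c] is service duration."""
    if sum(demand[c] for c in route) > capacity:
        return False                      # capacity constraint violated
    time, prev = 0.0, depot
    for c in route:
        time += dist[prev][c]             # travel to the next customer
        time = max(time, ready[c])        # wait if arriving early
        if time > due[c]:
            return False                  # arriving late is infeasible
        time += service[c]
        prev = c
    return True
```

A full solution would additionally require every customer to appear on exactly one route and the number of routes to stay within the fleet size.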

Note: In the typical problem formulation of the VRPTW, the primary objectives are to minimise both the number of vehicles utilised and the total distance travelled, while strictly adhering to the stated constraints. The DIMACS [1] implementation challenge adopts the convention (already used in several recent works[2][3]) of only minimising the total distance while enforcing a max fleet size.


In TIG, it is essential for a challenge to be asymmetric - challenging to solve yet straightforward to verify. Many optimisation challenges are not inherently asymmetric but can be transformed into an asymmetric decision problem by specifying a baseline value and requiring solvers to exceed that value.

Baseline Algorithm

To ensure TIG’s resistance to Sybil attacks, it is essential to use a baseline solver that is both simple and computationally efficient. While a standard greedy algorithm is typically employed as a baseline, we are currently considering Solomon’s I1 insertion heuristic [4], contributed by a member of our community.

Note: Testing to verify the runtime of this algorithm is ongoing.

Several versions of baseline solvers, including heuristic and metaheuristic approaches, were developed and tested by a community member before they decided on Solomon’s I1 insertion heuristic [4:1]. This suggestion was based on multiple factors, including computational efficiency, simplicity, and resistance to Sybil attacks.

Testing on Solomon and Homberger-Gehring benchmark instances [5] (200- and 400-customer cases) revealed that Solomon’s I1 produces solutions approximately 10% worse than best-known values on average. Attempts to enhance performance through techniques like 2-opt, r-opt, or swapping yielded marginal improvements that did not justify the added computational cost.

Solomon’s I1 heuristic is a widely recognised constructive approach for solving the VRPTW. It incrementally constructs routes by inserting customers into positions that minimise an insertion cost, while satisfying vehicle capacity and time window constraints.

Algorithm Steps

  1. Initialisation: Start with an empty route and select an initial customer to serve.

  2. Insertion Process: For each unrouted customer:

    • Evaluate all feasible positions in the current route.

    • Compute the insertion cost at each position, considering both distance and time adjustments.

    • Select the position that minimises the cost while maintaining feasibility (capacity and time window constraints).

  3. Route Completion: Insert the chosen customer into the route. Repeat until no more customers can be added.

  4. Open a New Route: If unrouted customers remain, open a new route and repeat the process until all customers are routed.

Insertion Cost Criteria

The insertion cost is computed using two levels of cost functions:

First-Level Cost ( C_1 )

The first-level cost prioritises minimising added distance and time adjustments. For a candidate customer u inserted between two consecutive nodes i_{p-1} and i_p , the cost function is:

C_1(i_{p-1}, u, i_p) = a_1 \bigl[d(i_{p-1}, u) + d(u, i_p) - \mu d(i_{p-1}, i_p)\bigr] + a_2 \bigl(b_{j,u} - b_j\bigr),

where:

  • d(x, y) is the distance between nodes x and y ,

  • b_j is the service start time at i_p before insertion, and b_{j,u} is the new service start time at i_p after inserting u ,

  • a_1, a_2, \mu are parameters that balance distance and time impacts.

Second-Level Cost ( C_2 )

The second-level cost adjusts for the proximity of the candidate customer to the depot and is defined as:

C_2(i_{p-1}, u, i_p) = \lambda d(0, u) - C_1(i_{p-1}, u, i_p),

where \lambda weights the depot’s distance influence.
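The two cost levels can be sketched in Python. This is an illustrative sketch, not the TIG implementation: `b_j_new` stands for the pushed-back service start at i_p after insertion, and the default weights (a_1 = 1, a_2 = 0, \mu = \lambda = 1) are placeholders rather than tuned values.

```python
import math

# Sketch of Solomon's I1 insertion costs; parameter defaults are placeholders.
def c1(d, u, i_prev, i_next, b_j_new, b_j, a1=1.0, a2=0.0, mu=1.0):
    """First-level cost: extra distance plus the time push-back at i_p."""
    detour = d(i_prev, u) + d(u, i_next) - mu * d(i_prev, i_next)
    return a1 * detour + a2 * (b_j_new - b_j)

def c2(d, u, i_prev, i_next, b_j_new, b_j, lam=1.0, **kw):
    """Second-level cost: favours customers far from the depot (node 0)."""
    return lam * d(0, u) - c1(d, u, i_prev, i_next, b_j_new, b_j, **kw)

# Tiny example: depot at (0, 0); evaluate inserting u = 3 between nodes 1 and 2.
pts = {0: (0, 0), 1: (10, 0), 2: (20, 0), 3: (15, 5)}
d = lambda a, b: math.dist(pts[a], pts[b])
# c1(d, 3, 1, 2, 0, 0) is the pure detour here (about 4.142), since a2 = 0.
```

At each step the heuristic picks, among unrouted customers, the one maximising C_2 at its best feasible position (the position minimising C_1).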

Why Solomon’s I1?

Solomon’s I1 heuristic has the following strengths:

  • It produces competitive solutions within approximately 10% of best-known values.

  • It is easy to implement without requiring complex data structures or preprocessing steps.

  • It is computationally efficient, even as problem instances scale.

  • It explicitly handles time feasibility, yielding well-structured routes.

  • It has adjustable cost parameters allowing for different insertion strategies.

Compared to alternative baselines, such as simple greedy heuristics or the Clarke & Wright savings heuristic (adapted for time windows), Solomon’s I1 achieves a strong balance of solution quality, feasibility, and implementation simplicity. Its broad acceptance in VRPTW research makes it a reliable benchmark for evaluating more sophisticated algorithms. However, if runtime proves to be excessively long, we may need to consider reverting to a simpler baseline algorithm to maintain computational efficiency.

Challenge Generation

Importance

The instance generation process is designed to produce scenarios that closely mirror established academic benchmark instances. It is crucial that these generated instances accurately reflect real-world conditions, as the game incentivises algorithms specifically optimised for these scenarios.

VRPTW

The following instance generation process was proposed by a member of the community to replicate the benchmark instances of Homberger and Gehring[6]. It incorporates both random and clustered customer distributions, coupled with tight time window constraints and low vehicle capacity, characteristics defined as RC1 instances in Solomon’s seminal paper on the VRPTW[4:2] and later utilised by Homberger and Gehring[6:1]. These instances serve as critical benchmarks for evaluating algorithms addressing the VRPTW.

While benchmark instances typically vary in customer distribution (random, clustered, or mixed), time window tightness (loose or tight), and vehicle capacity (small or large), the current instance generation is designed to produce the RC1 type. This was suggested by a member of the community as it reflects more realistic and operationally challenging conditions, where tighter time constraints on vehicle arrivals require algorithms to achieve a delicate balance between route optimisation and precise scheduling. Such instances emphasise the need for greater accuracy in sequencing decisions, pushing algorithms to deliver both efficient routing and careful time management.

Dataset Generation

Grid and Depot Initialisation

  • A grid of size \frac{N}{2} \times \frac{N}{2} is initialised, where N is the total number of customers.

  • The depot is located at the centre of the grid: (\frac{N}{4}, \frac{N}{4}) .

Difference: In contrast to the original CVRP challenge, which fixed the grid size at 500 \times 500 with the depot centred at (250, 250), this approach adopts a variable grid size. This mimics the majority of the Homberger instances (for N > 200)[6:2][5:1].

Note: The paper by Uchoa[7] on new benchmark instances for the CVRP mentions they generate instances on a fixed 1000 \times 1000 grid for customer instances ranging from 100–1000. This method could be implemented if beneficial.

Customer (Node) Position Assignment

Customers are divided evenly into two groups: one composed of clustered customers and the other of random customers.

To generate clustered locations, we begin by selecting k seed points at random within the grid, which define the centres of our clusters. We then determine the positions of the remaining \frac{N}{2} - k customers based on their proximity to these seed points. Candidate positions are generated at random and accepted with a probability governed by an exponential decay function, ensuring that locations nearer to a seed point are more likely to be chosen. This acceptance probability is given by:

P(\text{candidate}) = \sum_{i=1}^{k} \exp\Bigl(-\frac{\text{distance}(\text{candidate}, \text{seed}_i)}{\lambda}\Bigr)

where k is the total number of seed points, \text{distance}(\text{candidate}, \text{seed}_i) is the Euclidean distance from the candidate location to the i-th seed, and \lambda is a scaling factor that controls the decay rate.

The remaining \frac{N}{2} customers are placed randomly within a \tfrac{N}{2} \times \tfrac{N}{2} coordinate space, ensuring that their locations are independently distributed and not influenced by seed points.

Note: Uchoa[7:1] samples the number of seed customers uniformly from [3,8] for instances of up to 1000 customers, and chose \lambda = 40 based on experimental results: smaller values create overly dense and isolated clusters, while larger values produce sparse, overlapping clusters. For the Homberger instances, it was observed that these were most easily replicated with 8 “seed” customers and \lambda = 6 . (The exact choice for these parameters remains to be decided.)
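The seed-based placement above can be sketched as follows. This is a minimal Python sketch assuming the tentative k = 8 seeds and \lambda = 6; the grid size, function name, and treatment of seeds as the first k customers are illustrative assumptions.

```python
import math
import random

# Illustrative sketch of the seed-based clustered placement described above.
def generate_clustered(n_clustered, k=8, lam=6.0, grid=300.0, rng=None):
    rng = rng or random.Random(0)
    seeds = [(rng.uniform(0, grid), rng.uniform(0, grid)) for _ in range(k)]
    points = list(seeds)  # assume the k seeds count towards the total
    while len(points) < n_clustered:
        cand = (rng.uniform(0, grid), rng.uniform(0, grid))
        # Acceptance weight decays exponentially with distance to each seed.
        p = sum(math.exp(-math.dist(cand, s) / lam) for s in seeds)
        if rng.random() < p:
            points.append(cand)
    return points
```

Note that the summed acceptance weight can exceed 1 very close to a seed, which simply guarantees acceptance there.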

Demand Generation

Customer demands are randomly assigned integer values between 1 and 50. The depot’s demand is fixed at 0.

Note: Random sampling within the range [1,50] was employed in an effort to replicate the Homberger benchmark instances[6:3], although the specific sampling methodology was not explicitly detailed in their paper.

Distance Matrix Construction

A symmetric distance matrix is built, where each entry represents the Euclidean distance between two nodes. Distances are calculated using:

\text{distance} = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}

Distances are rounded to the nearest integer.
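A plain-Python sketch of this construction, using the nearest-integer rounding stated above:

```python
import math

# Symmetric Euclidean distance matrix, rounded to the nearest integer.
def distance_matrix(coords):
    n = len(coords)
    return [[round(math.dist(coords[i], coords[j])) for j in range(n)]
            for i in range(n)]

# Example with three points lying on 3-4-5 triangles:
# distance_matrix([(0, 0), (3, 4), (6, 8)])
#   -> [[0, 5, 10], [5, 0, 5], [10, 5, 0]]
```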

Note: Uchoa’s work[7:2] identifies two common conventions for handling distances in the VRP literature: exact methods typically round values to the nearest integer, whereas heuristic methods generally avoid rounding. To mitigate any drawbacks associated with rounding, Uchoa’s instances were designed on a [0,1000]\times[0,1000] grid rather than the more traditional [0,100]\times[0,100] scale. This choice results in optimal solutions whose magnitudes are typically between 10^4 and 10^5, an advantageous range for exact methods with a precision of 10^{-6}. As a consequence, the “plateau effect” and the artificial performance boost sometimes observed in exact methods are substantially reduced.

Some contemporary solvers, such as OR-Tools and VROOM, require integer distance inputs. This necessitates a preprocessing step that scales distances (e.g., by a factor of 10 or 100) when a certain precision is required; after obtaining a solution, the results are rescaled to their original units. Other solvers, like jsprit, accept floating-point inputs without any specific precision handling.

This rounding convention, originally proposed in Solomon (1987)[4:3], is defined as:

d_{ij} = \frac{\left\lfloor 10e_{ij}\right\rfloor}{10}

where d_{ij} is the final distance output, e_{ij} is the Euclidean distance between customers i and j , and \left\lfloor10e_{ij}\right\rfloor is the distance input into the solver.
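This convention amounts to truncating the Euclidean distance to one decimal place, as in this one-line sketch:

```python
import math

# Solomon (1987) convention: floor(10 * e_ij) / 10, i.e. truncate to 0.1 units.
def solomon_distance(e_ij):
    return math.floor(10 * e_ij) / 10

# solomon_distance(math.sqrt(2)) -> 1.4
```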

Depot Due Time

The approach for establishing the depot due time wasn’t described by Solomon[4:4]. From observations of the instances, a member of our community has proposed the following formula:

l_0 = d_{0i_F} + (s_{\text{av}} + d_{\text{av}})\times n_{\text{av}},

where

n_{\text{av}} = \frac{\text{capacity}}{\text{average customer demand}}.

Here l_0 is the depot due time (the latest time of arrival at the depot), d_{0i_F} is the direct distance between the depot and the furthest customer i_F , s_{\text{av}} is the average service time, d_{\text{av}} = \frac{N}{4} \times 0.5214 is the average distance between customers, N is the number of customers and n_{\text{av}} is the average number of customers per route.

The formula for d_{\text{av}} comes from the average distance between two points inside a square with side length \frac{N}{4} . Since the depot is always centred, the average distance inside a quarter of the grid was computed analytically using the formula for the average distance between two uniformly random points inside a square[8][9].

Note: This formula does not account for the clustering of certain customers, which could influence the average distance between them. However, this is likely acceptable, as clustering generally results in shorter distances between customers along a given route. The average number of customers served per route is estimated based on the ratio of average demand to vehicle capacity, which, under the current parameters, is approximately 8 customers per route. Additionally, the distance to the furthest customer is used as an offset to ensure that depot time constraints are sufficiently flexible. This prevents overly restrictive limits on the solution space and range of feasible routes.
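The proposed formula can be sketched directly. Argument names are illustrative; the constant 0.5214 is the mean-distance coefficient for a unit square cited above.

```python
# Sketch of the proposed depot due time formula: l_0 = d_0iF + (s_av + d_av) * n_av.
def depot_due_time(d_furthest, s_av, n_customers, capacity, avg_demand):
    d_av = (n_customers / 4) * 0.5214  # average distance between customers
    n_av = capacity / avg_demand       # average number of customers per route
    return d_furthest + (s_av + d_av) * n_av
```

For example, with N = 400, capacity 200, and average demand 25, each route serves about 8 customers, and the due time is the furthest-customer offset plus 8 average service-plus-travel legs.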

Time Window Assignment

Time window due times are assigned differently depending on whether customer locations are randomly distributed or clustered.

Randomly Distributed Customers:
Time window assignments are determined by their distance from the depot. Specifically, each randomly distributed customer i is assigned a due time drawn uniformly from the interval:

[d_{0i}, l_0 - d_{0i} - s_i]

where d_{0i} is the distance from the depot to customer i , l_0 is the depot’s due time, and s_i is the service time for customer i . This method ensures that each time window is feasible, allowing a vehicle to leave the depot, reach the customer, perform the service, and return on time.

Clustered Customers:
Time window due times are first allocated to a set of seed customers using the same distance-based approach as above. Once these seed due times are established, due times for the remaining clustered customers are assigned by adding a random offset, uniformly drawn from a predetermined range (currently [20, 180]), to the seed customer’s due time.

Note: Assigning time windows based on seed customers enables quick assignment for clustered customers without requiring an initial optimisation step, making generation more efficient. This seed-based method reflects realistic delivery schedules by clustering time windows geographically and temporally. The offset range of [20, 180] is an initial suggestion and could be altered.
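Both assignment rules can be sketched as follows (illustrative function names; `rng` is any `random.Random` instance):

```python
import random

# Randomly distributed customers: due time uniform over a feasible interval.
def random_due_time(d_0i, l_0, s_i, rng):
    # [d_0i, l_0 - d_0i - s_i]: leave the depot, serve, and return before l_0.
    return rng.uniform(d_0i, l_0 - d_0i - s_i)

# Clustered customers: seed due time plus a random offset from [20, 180].
def clustered_due_time(seed_due, rng, lo=20.0, hi=180.0):
    return seed_due + rng.uniform(lo, hi)
```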

Density

A subset of customers is assigned non-zero ready times to introduce variability. These non-zero ready times are set 30 units prior to the customer’s due time. The proportion of such customers is determined by the density parameter, currently set to 50%.

Note: It was noted by a member of our community that lower density values increase the number of feasible solutions, thereby adding to the problem’s complexity. However, excessively low densities may deviate from real-world conditions, where such scenarios are less common. The chosen 50% density strikes a balance between computational complexity and practical realism, though higher values, such as 75%, could also be considered for certain applications. It is worth noting that most instances with unknown optimal solutions tend to exhibit densities less than 50%.

TIG Challenge Summary

For this challenge, the two difficulty parameters are N, the number of customers, and better_than_baseline, which defines the performance threshold for solution acceptance.

The baseline value is determined using Solomon’s I1 insertion heuristic that iteratively inserts customers into routes based on a cost function that balances distance and time constraints. The routes are built one by one until all customers are served. The goal is to produce a solution better than the baseline’s total distance by at least the specified factor (better_than_baseline), while ensuring all VRPTW constraints are satisfied.
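One plausible reading of the acceptance rule, sketched in Python. The exact protocol check is not specified here, so the comparison direction and factor handling below are assumptions, not the TIG implementation.

```python
# Hypothetical acceptance check: the submitted solution's total distance must
# beat the baseline distance by at least the better_than_baseline factor.
def is_accepted(total_distance, baseline_distance, better_than_baseline):
    return total_distance <= (1 - better_than_baseline) * baseline_distance

# is_accepted(90.0, 100.0, 0.05) -> True
# is_accepted(97.0, 100.0, 0.05) -> False
```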


Code Implementation

While the Rust implementation for this challenge update is not yet complete, it will mirror the logic used in this Python implementation. We thank a member of our community for contributing the original code, which is attached below with slight modifications.
solver_vrptw.py (21.9 KB)
generator_vrptw.py (6.0 KB)


  1. DIMACS. “Vehicle Routing Problem with Time Windows (VRPTW) Challenge.” Accessed: 2025-01-15. http://dimacs.rutgers.edu/programs/challenge/vrp/vrptw/ ↩︎

  2. Roberto Baldacci, Aristide Mingozzi, Roberto Roberti, Recent exact algorithms for solving the vehicle routing problem under capacity and time window constraints, European Journal of Operational Research, Volume 218, Issue 1, 2012, Pages 1-6, ISSN 0377-2217. ↩︎

  3. Diego Pecin, Claudio Contardo, Guy Desaulniers, and Eduardo Uchoa. 2017. New Enhancements for the Exact Solution of the Vehicle Routing Problem with Time Windows. INFORMS J. on Computing 29, 3 (Summer 2017), 489–502. ↩︎

  4. Marius M. Solomon. Algorithms for the Vehicle Routing and Scheduling Problems with Time Window Constraints. Operations Research, 35(2):254–265, 1987. ↩︎ ↩︎ ↩︎ ↩︎ ↩︎

  5. SINTEF. Vehicle Routing Problem with Time Windows (VRPTW). VRPTW, Accessed: 2024-12-05. ↩︎ ↩︎

  6. J. Homberger and H. Gehring. Two Evolutionary Metaheuristics for the Vehicle Routing Problem with Time Windows. Infor, 37:297–318, 1999. ↩︎ ↩︎ ↩︎ ↩︎

  7. Eduardo Uchoa, Diego Pecin, Artur Pessoa, Marcus Poggi, Thibaut Vidal, Anand Subramanian. New benchmark instances for the Capacitated Vehicle Routing Problem. European Journal of Operational Research, 257(3):845–858, 2017. ↩︎ ↩︎ ↩︎

  8. Eric W. Weisstein. Square Line Picking. MathWorld–A Wolfram Web Resource. Square Line Picking -- from Wolfram MathWorld, Accessed: 2024-12-05. ↩︎

  9. Mean line segment length. Wikipedia. Mean line segment length - Wikipedia, Accessed: 2024-12-05. ↩︎

https://forum.tig.foundation/t/vehicle-routing-challenge-update/35#post_1 Thu, 16 Jan 2025 17:44:11 +0000 forum.tig.foundation-post-40
Breakthrough Submission: Vehicle Routing

Thoughts on the Algorithmic Submission for a Capacitated Vehicle Routing Problem (CVRP) Solver

High-Level Algorithm Flow

The solver operates as a multi-run, iterative metaheuristic:

  1. Parametrized Clarke–Wright Construction: A modified Clarke–Wright savings algorithm, where each customer has a unique parameter influencing route formation, constructs an initial solution.
  2. Parameter Perturbation: Each customer’s parameter is slightly adjusted to explore the solution space.
  3. Solution Improvement: Local search, employing 2-Opt and inter-route swaps, optimizes the solution built with the new parameters.
  4. Parameter Acceptance: Uses a Boltzmann-like rule akin to Simulated Annealing to accept or reject the new parameter set, guiding the search in parameter space.

Two distinct runs are performed, starting with node parameters set to 1.0 and 2.0 respectively; the solver retains the best global solution across both runs.


Detailed Logic and Code Steps

Initialization and Feasibility Checks

  • Feasibility Heuristic: A quick estimate checks if the instance is overly constrained. If infeasible, the solver aborts. (This is a TIG-specific optimization.)
  • Parameter Initialization: Runs are executed twice, initializing all node parameters to 1.0 in the first run, and 2.0 in the second.

Constructing a Route via Savings

  1. Create Initial Savings List:
    Following the Clarke–Wright logic, for each pair of distinct customers (i, j) , the code stores a potential “merge” record if the distance \text{dist}(i, j) is below half the maximum pairwise distance (a filtering step).

  2. Recompute and Sort Savings:
    A classical Clarke–Wright-like formula is used, but weighted by per-node parameters. In essence:

    \text{Savings}(i,j) = (\text{params}[i] + \text{params}[j])\bigl(d_{0i} + d_{j0} - d_{ij}\bigr)

    The list is then sorted in descending order of \text{Savings}(i, j) .

  3. Create a Solution:
    Each customer begins in its own route. Merges are attempted if capacity is not violated and the customers belong to different routes. Routes are finally wrapped with depot 0 at both ends.
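The parameterised savings computation in steps 1-2 can be sketched as follows (a minimal Python sketch; `dist` is a distance matrix with the depot at index 0, and the names are illustrative, not taken from the submitted Rust code):

```python
# Per-node parameterised Clarke-Wright savings, sorted in descending order.
def parametric_savings(dist, params):
    sav = []
    n = len(dist)
    for i in range(1, n):
        for j in range(i + 1, n):
            s = (params[i] + params[j]) * (dist[0][i] + dist[j][0] - dist[i][j])
            sav.append((s, i, j))
    sav.sort(reverse=True)  # most promising merges first
    return sav

# With all parameters equal, this reduces to a scaled classical savings list;
# distinct per-node parameters reorder which merges are attempted first.
```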

Local Search (Postprocessing)

  • Intra-route 2-Opt: Eliminates crossing edges to shorten routes.
  • Inter-route Swaps: Enumerates beneficial swaps between routes, sorting them by descending improvement. A best-first aggregator applies each feasible swap without conflicts.

Parameter Tuning and Acceptance

  1. Neighbor Generation: Slightly perturb each node’s parameter within \pm 0.05k , clamping to [1,2] . Increase k if the search stalls.

  2. Rebuild + Postprocess: Re-sort the savings list using updated parameters, rebuild routes, then apply local search.

  3. Acceptance Criteria: If the new solution is cheaper, accept it. Otherwise, accept with probability \exp(-\Delta / \beta) , where \beta \approx 0.005 \times (\text{current cost}) . This step is analogous to simulated annealing.[1]

  4. Updating the Best Solution: If accepted and better than the current best, save it. Each run terminates after 200 iterations or if an early stop is triggered.
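The acceptance criterion in step 3 can be sketched in Python, mirroring the description above (a sketch assuming \beta = 0.005 × current cost; the submitted solver itself is in Rust):

```python
import math
import random

# Boltzmann-style acceptance applied to a new parameter set, not to a route.
def accept(delta, current_cost, rng, scale=0.005):
    if delta <= 0:  # the rebuilt solution is no worse: always accept
        return True
    beta = scale * current_cost  # "temperature" proportional to current cost
    return rng.random() < math.exp(-delta / beta)
```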

Multiple Start Values and Global Best

Once each run (starting at parameter 1.0 and 2.0) is complete, the best solution of the two is returned as the global best.


Relevant Literature and Similar Approaches

Clarke–Wright Savings Heuristic

Clarke and Wright[2] introduced the Savings heuristic, a foundation for numerous VRP approaches. Many variants add a shape parameter (\lambda) or employ multi-parameter expansions to accommodate factors such as demand. However, they typically apply one or two \lambda-like parameters globally[3][4][5][6][7]. In contrast, this solver assigns distinct parameters to individual nodes.

Randomised Savings / GRASP-Style Approaches

Randomised Savings methods[8][9] and GRASP-like (Greedy Randomised Adaptive Search Procedure) heuristics[10][11] typically inject randomness directly into the merge selection process. Here, the code indirectly randomises merges by altering per-node parameters.
GRASP can be thought of as a multi-start local search, in which each initial solution is generated using one greedy randomised heuristic and improved by local search - similar to the logic followed in this code.

Simulated Annealing or Metropolis Acceptance

Simulated Annealing[12] has been widely applied to VRPs, often accepting worse solutions probabilistically to escape local minima[13][14]. The current solver follows a similar principle but applies it at the parameter level: the entire solution is reconstructed using new node parameters and optimised before deciding on acceptance. Deterministic annealing variants, as in [15], use a similar acceptance formula based on the cost.

Perturbation to Explore Solution Space

A key strategy in the literature is to perturb problem data so that conventional construction heuristics, like Clarke & Wright, can avoid becoming trapped in suboptimal configurations. For instance, [16] proposed a hybrid GA that lightly randomises customer coordinates, akin to using node-specific multipliers rather than altering real-space coordinates. Both approaches rely on small stochastic shifts to steer solutions away from local optima, aligning with the “noising” concept[17][18] of perturbing instance data to broaden exploration. While [16:1] modifies coordinates and this code adjusts per-node parameters, both exploit the principle that shifting input data can yield significantly different (and often better) routes.

Iterated Local Search / Multi-start

Iterated Local Search for CVRP repeatedly perturbs a locally optimal solution, applies local search to find a new local optimum, and then uses an acceptance criterion to decide whether to move to the new solution or stay with the previous one, ultimately aiming to explore a wider solution space and find better solutions.
This solver’s repeated parameter perturbations, route reconstruction, local search and acceptance reflect the core ideas of Iterated Local Search (ILS)[19] or multi-start heuristics, albeit with a parameter-space focus rather than direct route manipulations.

Aggregator-Based Multi-Route Swaps

The best-first aggregator for inter-route swaps is reminiscent of Large Neighborhood Search (LNS)[20] and Variable Neighborhood Search (VNS)[21], which also seek to find larger-scale improvements. Filtering the savings list by distance (only merges below half the max pairwise distance) parallels large-scale VRP strategies that emphasize “promising” edges[18:1].

Main papers of interest

The paper by Pichpibul & Kawtummachai [22] enhances the classical Clarke–Wright (CW) heuristic by randomly reordering its savings list through a two-phase probabilistic selection. By avoiding the purely greedy ordering, it escapes local minima more effectively. A separate route post-improvement step (consisting of simple move and swap operations, both intra- and inter-route) further refines solutions.

The paper by Morgan et al. [16:2] perturbs the traditional CW heuristic by altering customer coordinates before applying conventional heuristics. Conceptually, this mirrors per-node parameter tuning, since both methods inject randomness into the underlying savings calculations. By repeating these slight variations, solutions are diversified and local optima avoided, an idea closely related to “noising” and multi-start frameworks.


Potential Novel and Inventive Features

  1. Per-Node Parameters: A separate parameter for each node offers fine-grained control, surpassing typical global-parameter expansions.
  2. Parameter-Space SA Acceptance: The solver accepts a newly rebuilt solution based on a Boltzmann-like probability, but at the parameter level.
  3. Global Aggregator Local Search: Ranks all feasible inter-route swaps in descending order of improvement, applying multiple high-gain moves in one pass.
  4. Integrated Approach: Combining node-wise parameters, local search, and SA-style acceptance in a unified scheme. Its effectiveness hinges on how well these components synergize to produce competitive CVRP solutions relative to existing methods.

Summary

This research presents a new solver for the Capacitated Vehicle Routing Problem (CVRP) that combines a constructive heuristic (modified Clarke–Wright) with a metaheuristic framework (parameter-space simulated annealing) that incorporates local search. The solver iteratively perturbs parameters, reconstructs the solution using the constructive heuristic, applies local search (2-Opt and inter-route swaps) to improve the reconstructed solution, and then uses a Simulated Annealing-like acceptance criterion on the parameters to guide the search.

The main potentially novel and inventive innovations include:

  1. Node-Specific Parametrization: Leverages classical savings but extends it with per-node parameters instead of a single global parameter. Each node’s parameter is perturbed to diversify route construction.
  2. Acceptance in Parameter Space: A simulated annealing-like acceptance criterion operates on the parameters, not directly on the route. This parameter-space acceptance is not typically found in traditional route-based Simulated Annealing.
  3. Aggregator-Based Local Search: The solver performs both intra-route 2-Opt and an aggregator-based multi-route swap that batches and ranks potential inter-route exchanges in descending order of cost savings, enabling multiple high-gain moves in a single pass.

Whether these features rise to the level of a “breakthrough” depends on how effectively they outperform classical heuristics and whether the synergy can be demonstrated to be a true advance versus a recognizable extension of known methods. However, the code’s node-based parameter tuning, aggregator local search, and cost-scaling acceptance represent an interesting and possibly novel integration within the CVRP heuristic toolbox.



  1. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, 220(4598):671–680, 1983. ↩︎

  2. G. Clarke and J. W. Wright, “Scheduling of vehicles from a central depot to a number of delivery points,” Operations Research, 12(4):568–581, 1964. ↩︎

  3. T. J. Gaskell, “Bases for Vehicle Fleet Scheduling,” Journal of the Operational Research Society, 18(3):281–295, 1967. ↩︎

  4. P. C. Yellow, “A Computational Modification to the Savings Method of Vehicle Scheduling,” Journal of the Operational Research Society, 21(2):281–283, 1970. ↩︎

  5. H. Paessens, “The savings algorithm for the vehicle routing problem,” European Journal of Operational Research, 34(3):336–344, 1988. ↩︎

  6. İ. K. Altınel and T. Öncan, “A new enhancement of the Clarke and Wright savings heuristic for the capacitated vehicle routing problem,” Journal of the Operational Research Society, 56(8):954–961, 2005. ↩︎

  7. T. Doyuran and B. Çatay, “A robust enhancement to the Clarke–Wright savings algorithm,” Journal of the Operational Research Society, 62(1):223–231, 2011. ↩︎

  8. A. A. Juan, J. Faulin, R. Ruiz, B. Barrios, and S. Caballé, “The SR-GCWS hybrid algorithm for solving the capacitated vehicle routing problem,” Applied Soft Computing, 10(1):215–224, 2010. ↩︎

  9. T. Pichpibul and R. Kawtummachai, “An improved Clarke and Wright savings algorithm for the capacitated vehicle routing problem,” ScienceAsia, 38(3):307–318, 2012. ↩︎

  10. P. Festa and M. G. C. Resende, “Hybrid GRASP heuristics,” in Foundations of Computational Intelligence Volume 3: Global Optimization, pp. 75–100. Springer, 2009. ↩︎

  11. A. Layeb, M. Ammi, and S. Chikhi, “A GRASP Algorithm Based on New Randomized Heuristic for Vehicle Routing Problem,” Journal of Computing and Information Technology, 21:37–48, 2013. ↩︎

  12. S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, “Optimization by simulated annealing,” Science, 220(4598):671–680, 1983. ↩︎

  13. I. H. Osman, “Metastrategy simulated annealing and tabu search algorithms for the vehicle routing problem,” Annals of Operations Research, 41(4):421–451, 1993. ↩︎

  14. M. Gendreau, A. Hertz, and G. Laporte, “A tabu search heuristic for the vehicle routing problem,” Management Science, 40(10):1276–1290, 1994. ↩︎

  15. M. Baranwal, P. M. Parekh, L. Marla, S. M. Salapaka, and C. L. Beck, “Vehicle routing problem with time windows: A deterministic annealing approach,” in 2016 American Control Conference (ACC), pp. 790–795, IEEE, 2016. ↩︎

  16. M. Morgan and C. Mumford, “Capacitated vehicle routing: perturbing the landscape to fool an algorithm,” in Proc. IEEE Congress on Evolutionary Computation (CEC), pp. 2271–2277, 2005. ↩︎ ↩︎ ↩︎

  17. R. H. Storer, S. D. Wu, and R. Vaccari, “New search spaces for sequencing problems with application to job shop scheduling,” Management Science, 38(10):1495–1509, 1992. ↩︎

  18. I. Charon and O. Hudry, “The noising method: a new method for combinatorial optimization,” Operations Research Letters, 14(3):133–137, 1993. ↩︎ ↩︎

  19. H. R. Lourenço, O. Martin, and T. Stützle, “Iterated local search,” in Handbook of Metaheuristics, F. W. Glover and G. Kochenberger (eds), pp. 321–353, Springer, 2003. ↩︎

  20. S. Ropke and D. Pisinger, “An adaptive large neighborhood search heuristic for the pickup and delivery problem with time windows,” Transportation Science, 40(4):455–472, 2006. ↩︎

  21. N. Mladenović and P. Hansen, “Variable neighborhood search,” Computers & Operations Research, 24(11):1097–1100, 1997. ↩︎

  22. T. Pichpibul and R. Kawtummachai, “New enhancement for Clarke-Wright savings algorithm to optimize the capacitated vehicle routing problem,” European Journal of Scientific Research, 78.1 (2012), pp. 119–134. ↩︎

https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_4 Wed, 15 Jan 2025 15:47:14 +0000 forum.tig.foundation-post-39
Breakthrough Submission: Vehicle Routing

Seems like an output of ChatGPT

EDIT: Just to be clear, these points are already considered in the evidence, which explains the difference between previous studies and this new technique.
I will just highlight some points that are misinterpreted in this post.

The post lumps “parametric reweighting” into a standard local search tweak, but it misses that the solver uses node-specific parameters, not a single or handful of global parameters. This difference is non-trivial: each node’s merging preference is adaptively tuned, which shapes the entire route structure in ways not seen in typical “one-parameter” expansions of Clarke–Wright. The post basically calls it “a small ‘neighbor generation’ step,” but it’s actually the core engine for reconstructing or merging routes in each iteration.

Then, by calling it “not a full-blown SA framework,” the post glosses over the synergy between parameter acceptance and immediate local-search improvements. Even though there may not be an explicit “temperature schedule,” the exponential acceptance of new parameter sets is integrated within solution reconstruction cycles - this dynamic interplay is key. The post stops at “it’s reminiscent of SA,” ignoring how the method re-invokes savings merges with the updated parameters and then systematically applies multi-route local search. That cyclical process isn’t a standard, off-the-shelf pattern.

After this, his post acknowledges local search operators are standard (which is true), but it fails to mention that the solver aggregates all possible swaps across routes, ranks them, and applies them in a “best-improvement-first” pass that prevents conflicting moves in one iteration. This aggregator approach can drastically reduce the number of partial or contradictory local improvements, effectively making the local search global in scope for each iteration. It’s not just “standard local search thrown in.”

The post suggests a typical multi-start or multi-iteration pattern (“tries multiple initial solutions and picks the best”). But the real approach does more than just random restarts: it continuously updates the node parameters, re-scores merges, and re-applies local search. Dismissing it as “multi-start structure” overlooks how deeply the constructive and improvement phases are integrated.

In conclusion, almost any new heuristic can be described as a combination of “building blocks”. Novelty in combinatorial optimization often arises from how those building blocks are interconnected (e.g., a cyclical parameter-reconstruction approach integrated with multi-route local search). The post doesn’t engage with the possibility that the synergy, or the architecture itself, is what is distinct.

]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_3 Mon, 13 Jan 2025 00:34:38 +0000 forum.tig.foundation-post-38
Breakthrough Submission: Vehicle Routing Viewed from a high level, this solver is essentially stitching together several well-known VRP heuristics and improvement techniques into one pipeline, rather than inventing an entirely new paradigm. Here are the relevant parts that show it is a combination of established ideas:

Savings-based construction
The code uses a structure reminiscent of the classic Clarke & Wright savings algorithm. You can see this in the creation of an initial “savings list” (the pairwise combinations of nodes and their associated savings values), followed by merging routes whenever a merge is capacity-feasible and beneficial. This is a direct nod to the standard savings approach to the VRP.
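For reference, the classic saving for a customer pair (i, j) is s(i, j) = d(0, i) + d(0, j) − d(i, j), with node 0 as the depot. A minimal sketch of building the sorted savings list (not the submission’s code):

```rust
/// Build the Clarke & Wright savings list: one entry per customer pair,
/// sorted so that the most beneficial merge candidates come first.
fn savings_list(d: &[Vec<f64>]) -> Vec<(f64, usize, usize)> {
    let n = d.len();
    let mut list = Vec::new();
    for i in 1..n {
        for j in (i + 1)..n {
            // Cost saved by serving i and j on one route instead of two
            // separate out-and-back trips from the depot.
            list.push((d[0][i] + d[0][j] - d[i][j], i, j));
        }
    }
    list.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
    list
}
```

The construction phase then walks this list, merging routes whenever the merge is capacity-feasible.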

Parametric reweighting of savings
There is a mechanism for perturbing the parameters params[i] and params[j] and recalculating the savings. This is effectively a small “neighbor generation” step (similar to those in local search or metaheuristics such as Simulated Annealing or Iterated Local Search), where the perturbed parameters influence the construction/merge process.
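A deterministic toy version of such a neighbor-generation step (the clamped range and step are illustrative assumptions; the real solver presumably perturbs randomly):

```rust
/// Produce a neighboring parameter vector by nudging one entry and
/// keeping it within a plausible range; the perturbed vector then feeds
/// the savings recomputation.
fn perturb(params: &[f64], idx: usize, delta: f64) -> Vec<f64> {
    let mut neighbor = params.to_vec();
    neighbor[idx] = (neighbor[idx] + delta).clamp(0.0, 2.0);
    neighbor
}
```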

Acceptance criterion using probability
The portion:

else if rng.gen::<f32>() < (-delta / scaling_factor).exp() {
    current_params = neighbor_params;
}

is reminiscent of the Metropolis criterion in Simulated Annealing. Although not a full-blown SA framework (there’s no explicit temperature schedule here), it’s clearly adopting a “sometimes accept worse moves” approach from known metaheuristics.
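The quoted rule can be restated as a standalone function, with `u` standing in for `rng.gen::<f32>()`; the scaling factor plays the role of a fixed temperature in the Metropolis criterion:

```rust
/// Accept a worse move (delta > 0) with probability
/// exp(-delta / scaling_factor) — the Metropolis rule with a fixed
/// "temperature". An improving move (delta <= 0) is always accepted,
/// since exp of a non-negative argument is at least 1.
fn accept_worse(delta: f32, scaling_factor: f32, u: f32) -> bool {
    u < (-delta / scaling_factor).exp()
}
```

With no schedule decreasing `scaling_factor` over time, this is a constant-temperature Metropolis acceptance rather than full Simulated Annealing, as noted above.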

Local Search (2-Opt + Inter-Route Swaps)
After building or perturbing solutions, the code uses classical VRP local-improvement heuristics:

  • 2-Opt (intra-route improvement).
  • Swap moves (inter-route improvement).

Both of these are standard, well-established VRP improvement operators.
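For reference, the 2-Opt move named above removes two edges from a route and reconnects it by reversing the intermediate segment; a minimal sketch:

```rust
/// Standard intra-route 2-Opt move: removing edges (a, a+1) and
/// (b, b+1) and reconnecting is equivalent to reversing the segment
/// at positions a+1..=b.
fn two_opt_move(route: &mut Vec<usize>, a: usize, b: usize) {
    route[a + 1..=b].reverse();
}
```

A 2-Opt pass applies such moves whenever the two new edges are shorter than the two removed ones.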

Iterative multi-start structure
The solver tries multiple initial solutions (different seeds or initial values), then picks and refines whichever turned out best. That multi-start pattern is yet another well-known strategy in heuristic design.
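The multi-start pattern in miniature, with `solve` as a stand-in closure for a full construct-and-improve run from one seed:

```rust
/// Run the heuristic once per seed and keep the lowest cost found —
/// the basic multi-start skeleton referred to above.
fn multi_start<F: Fn(u64) -> f64>(seeds: &[u64], solve: F) -> f64 {
    seeds.iter().map(|&s| solve(s)).fold(f64::INFINITY, f64::min)
}
```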

Putting these observations together, it’s fair to say:

  • This algorithm is not “new” in the sense of introducing a novel VRP algorithm.
  • It is a hybrid combining different established techniques (Savings Algorithm + local search + partial acceptance of worse solutions, etc.) in one framework.
]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_2 Mon, 13 Jan 2025 00:31:05 +0000 forum.tig.foundation-post-37
Breakthrough Submission: Vehicle Routing As some of you may have noticed, our first Breakthrough Rewards submission, for Vehicle Routing, was made public today.

Please see here: Breakthrough Evidence

There are two code submissions that embody the method described above:

Code 1

Code 2

You are invited to examine the evidence and code and to prepare for the token-weighted vote on the submission’s eligibility for breakthrough rewards, which will open at the beginning of the next round (round 50). Voting closes at the end of round 50.

]]>
https://forum.tig.foundation/t/breakthrough-submission-vehicle-routing/33#post_1 Sun, 12 Jan 2025 18:55:56 +0000 forum.tig.foundation-post-35