Simmo Saan activity

Simmo Saan approved merge request !1068: Fix Ali's surname at SoSy-Lab / Benchmarking / Formal-Methods Tools

2026-02-16T17:57:05Z

Simmo Saan commented on issue #45 at SoSy-Lab / Benchmarking / Competition Scripts

2026-02-04T15:26:43Z

Fixed by 2ed0ad70.

Simmo Saan closed issue #45: Unqualified tools are included in results at SoSy-Lab / Benchmarking / Competition Scripts

2026-02-04T15:26:42Z

Simmo Saan approved merge request !1034: Update Goblint techniques for SV-COMP 2026 at SoSy-Lab / Benchmarking / Formal-Methods Tools

2026-01-28T10:11:59Z

Simmo Saan opened merge request !1034: Update Goblint techniques for SV-COMP 2026 at SoSy-Lab / Benchmarking / Formal-Methods Tools

2026-01-28T10:11:30Z

Simmo Saan pushed new project branch goblint-svcomp26-techniques at SoSy-Lab / Benchmarking / Formal-Methods Tools

2026-01-28T10:09:03Z

Simmo Saan (434a0056) at 28 Jan 10:09

Update Goblint techniques for SV-COMP 2026

Simmo Saan opened issue #45: Unqualified tools are included in results at SoSy-Lab / Benchmarking / Competition Scripts

2026-01-19T11:44:25Z

Simmo Saan commented on merge request !228 at SoSy-Lab / Benchmarking / Competition Scripts

2026-01-12T10:59:22Z

Putting aside the interestingness to potential viewers, which everyone will probably never agree on, there is just the pragmatic reason that fully log-scale is less bunched up at the very low end. The height for 1s–1000s is decreased then, but not in a way that causes other lines to be more bunched up than before.

I suppose a lot of which scale shows the most just happens to depend on the practical characteristics of the tools. For example, the lines for most tools turn quite vertical and don't move much on the x-axis between 800s–900s. Whereas they happen to move a lot between 0s–1s. If there was competition in the 800s–900s range, or if there wasn't in the 0s–1s range, then things might be different.

I now wondered, whether things were different 15 years ago, as in there wasn't any competition below <1s (e.g. because none of the tools were so fast). But actually the quantile plot in the SV-COMP 2012 report is not that different in this regard:

Click to expand

Simmo Saan commented on merge request !228 at SoSy-Lab / Benchmarking / Competition Scripts

2025-12-30T15:28:11Z

I noticed that f0cf9d20 changes the quantile plots back to having linear scale <1s, as opposed to a fully log-scale plot. I know it was this way in the past with gnuplot, but... I have previously thought about suggesting a fully log-scale plot, so I was pleasantly surprised to see that on the results-page-to-be. With <1s squashed to a thin strip at the bottom, it's impossible to differentiate tools at the very low end.

The competition <1s is no less interesting than the competition >100s, so it's a shame that the plots squash it to be more-or-less irrelevant. Quantile plots rank the tools at all possible time limits, why is 500s more interesting than 0.5s? It entirely depends on what use case someone is considering the tools for. The squashing has a bias against tools which aim to be very fast as if nobody should be interested in that time range and no tool developer should aim to improve there.

Of course a fully log-scale plot can't go all the way to 0s, but neither do the tools we currently have (and if someone makes a tool that can solve tasks in 0.01s, we should show this!). BenchExec's quantile plots automatically use the minimum value in data for the log scale lower bound and that has been fine.

Or if a fully log-scale quantile plot really isn't reasonable, then I would at least vouch for making the height between 0s–1s equally large as the other heights (e.g. between 1s–10s).

For example, here's the symlog-style quantile plot for SV-COMP 2024 Overall:

Click to expand

Simmo Saan pushed to project branch website-sorted at SoSy-Lab / Benchmarking / Formal-Methods Tools

2025-12-29T15:38:52Z

Simmo Saan (287400c9) at 29 Dec 15:38

Add REUSE for .gitignore

Simmo Saan opened merge request !1018: Consistently sort everything on website at SoSy-Lab / Benchmarking / Formal-Methods Tools

2025-12-29T15:32:43Z

Mainly this sorts the "Input Languages" page which was completely unsorted before. Additionally this sorts other pages by str.lower (like "Tools" already were) because some techniques (like k-Induction) and frameworks (like cvc5) are lowercase.

Finally this adds the generated HTML to .gitignore.

Simmo Saan pushed new project branch website-sorted at SoSy-Lab / Benchmarking / Formal-Methods Tools

2025-12-29T15:28:46Z

Simmo Saan (d5ffde00) at 29 Dec 15:28

Add generated website HTML to .gitignore

... and 1 more commit

Simmo Saan closed merge request !1723: Draft: Invalid task definition procedure for SV-COMP 2026 at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T16:41:21Z

Following the invalidation procedures from past years (https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029 and !1606), I've now done it for this year. This required some fixes to the fingerprinter: competition-scripts!213.

This is not yet final and will have to be updated after remaining SV-COMP 2026 MRs are merged, but I wanted to already try out the procedure. Currently it also includes all the fingerprint CSV files for reference. Eventually the git history should be cleaned up to merge this into some svcomp26 final tag preparation branch (only adding a single set file).

The clean version is copied to !1732 for merging. This MR remains for reference.

TODO

Fix path beginnings in set file.
Add SV-LIB: most tasks were touched after deadline, so this may have to be done more manually.
Document entire procedure, including how and why some invalidations were overridden.
Redo after remaining SV-COMP 2026 MRs.

Procedure

1. Backported changes which should not invalidate tasks

I created a new branch (svcomp26-freeze-for-invalid-taskdefs) based on the svcomp26-freeze tag. To that branch, I backported the following changes made after svcomp26-freeze:

!1724. (Unnecessary in the end because all SV-LIB tasks are kept valid.) It should not invalidate tasks because it touched all SV-LIB task definitions, but not the actual programs nor the intended expected verdicts. This is also necessary for BenchExec (and the task fingerprinter based on it, competition-scripts!213) to run on the tasks at all.

Since there are other SV-LIB task changes after freezing, cherry-picking 25c9d260 would not work cleanly. Therefore, to backport I instead replicated that fix using the shell command from that commit message:
```
find ./sv-lib -type f -name '*.yml' -exec sed -i 's/expected_verdict: correct/expected_verdict: true/g; s/expected_verdict: incorrect/expected_verdict: false/g' {} +
```
!1693. (Unnecessary in the end because all SV-LIB tasks are kept valid.) It should not invalidate tasks because it touched all SV-LIB programs, but only regarding the witness options, not the actual program contents.

To backport, I cherry-picked 04f63794 and 77104c39.
!1698. It should not invalidate tasks because it only added documentation to the Verifier class used by most (if not all) Java tasks.

To backport, I cherry-picked ddaef1c2.

2. Generated task fingerprints

Using the fixed task fingerprinter (competition-scripts!213), I generated task fingerprint CSV files both on the backports branch (svcomp26-freeze-for-invalid-taskdefs) and this branch (svcomp26-invalid-taskdefs, based latest main). On both branches, I generated fingerprints using benchmark definitions of two tools:

cpachecker, because it participates in all C and SV-LIB categories. Using this shell command in sv-benchmarks:
```
PYTHONPATH=~/dev/benchexec python3 ../bench-defs/scripts/execute_runs/fingerprinter.py ../bench-defs/benchmark-defs/cpachecker.xml --no-container
```
This creates a cpachecker.fingerprints.csv file.

gdart, because it participates in all Java categories. Using this shell command in sv-benchmarks:

PYTHONPATH=~/dev/benchexec python3 ../bench-defs/scripts/execute_runs/fingerprinter.py ../bench-defs/benchmark-defs/gdart.xml --no-container

This creates a gdart.fingerprints.csv file.

The specific tools don't matter (nor are needed to generate fingerprints), just benchmark definitions covering all categories are needed. These commands are based on https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029, adapted to the corresponding paths on my system.

3. Generated `Invalid-TaskDefs.set`

This consists of the following substeps:

Merged svcomp26-freeze-for-invalid-taskdefs into svcomp26-invalid-taskdefs. This causes merge conflicts on the fingerprint CSV files. I resolved this by keeping the ones from svcomp26-freeze-for-invalid-taskdefs as cpachecker.freeze-fingerprints.csv and gdart.freeze-fingerprints.csv, and the ones from svcomp26-invalid-taskdefs with their original names.

Generated Invalid-TaskDefs.set using the following two shell commands:

diff cpachecker.freeze-fingerprints.csv cpachecker.fingerprints.csv | grep "^\(<\|>\)" | cut -f 1 | sed -e "s/> //" -e "s/< //" | sort -u > Invalid-TaskDefs.set
diff gdart.freeze-fingerprints.csv gdart.fingerprints.csv | grep "^\(<\|>\)" | cut -f 1 | sed -e "s/> //" -e "s/< //" | sort -u >> Invalid-TaskDefs.set

These commands are based on https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029, adapted to the filenames here.

Fixed relative paths (removing the prefix ../bench-defs/sv-benchmarks/) in Invalid-TaskDefs.set and sorted (again after appending cpachecker and gdart ones) using the following shell command:
```
sed -e 's|^../bench-defs/sv-benchmarks/||' Invalid-TaskDefs.set | sort | sponge Invalid-TaskDefs.set
```

4. Manually excluded more changes which should not invalidate tasks

These exclusions could not have been done by the earlier backporting:

!1681 fixed a class constructor in a file used under input_files: for all java/objects/ tasks. However, not all of them ever invoke that constructor.

Thus, I manually removed unaffected java/objects/ tasks from Invalid-TaskDefs.set, leaving only the affected ones also listed in !1717.
As discussed in !1723 (comment 2962396630), all SV-LIB tasks are kept valid.

Thus, instead of backporting !1687, I manually removed all sv-lib/ tasks from Invalid-TaskDefs.set.

5. Cross-referenced fingerprints with git

Alternatively, I generated Invalid-TaskDefs-git.set by relying on git diff using the following shell command:

git diff svcomp26-freeze-for-invalid-taskdefs svcomp26-invalid-taskdefs --name-only --no-renames | sed -nr 's/(\/Main)?\.(yml|i|c|java|svlib)$/.yml/p' | sort -u > Invalid-TaskDefs-git.set

It uses --no-renames as per !1723 (comment 2932694891) and attributes all file changes to the corresponding .yml file.

The cross-check using

diff Invalid-TaskDefs.set Invalid-TaskDefs-git.set

outputs

211a212
> .gitlab-ci.yml
233,237c234
< java/objects/objects01.yml
< java/objects/objects02.yml
< java/objects/objects05.yml
< java/objects/objects08.yml
< java/objects/objects09.yml
---
> java/objects/classes/svcomp/objects/A.yml
238a236,304
> sv-lib/c-translated/loop-acceleration/array3.yml
> sv-lib/c-translated/loop-acceleration/const_1-2_abstracted.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.1.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.2.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.yml
> sv-lib/c-translated/loop-invgen/fragtest_simple.yml
> sv-lib/c-translated/loop-invgen/heapsort.yml
> sv-lib/c-translated/loop-invgen/large_const.yml
> sv-lib/c-translated/loop-invgen/nest-if3.yml
> sv-lib/c-translated/loop-lit/bh2017-ex1-poly.yml
> sv-lib/c-translated/loop-lit/gcnr2008.yml
> sv-lib/c-translated/loops/count_up_down-1.1.yml
> sv-lib/c-translated/loops/count_up_down-1.2.yml
> sv-lib/c-translated/loops/count_up_down-1.yml
> sv-lib/c-translated/loops-crafted-1/in-de20.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.yml
> sv-lib/c-translated/loops-crafted-1/loopv3.yml
> sv-lib/c-translated/loops/sum01-1.yml
> sv-lib/c-translated/loops/sum01_bug02_sum01_bug02_base.case.yml
> sv-lib/c-translated/loops/sum_array-1.yml
> sv-lib/c-translated/loops/sum_array-2-1.yml
> sv-lib/c-translated/loops/sum_array-2-2.yml
> sv-lib/c-translated/loops/sum_array-2.yml
> sv-lib/c-translated/loops/terminator_03-1.yml
> sv-lib/c-translated/loop-zilu/benchmark01_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark06_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark12_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark13_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark15_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark16_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark21_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark22_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark24_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark27_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark28_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark29_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark30_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark31_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark36_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark38_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark39_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark40_polynomial.yml
> sv-lib/c-translated/loop-zilu/benchmark41_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark42_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark44_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark45_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark46_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark47_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark48_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark49_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark50_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark53_polynomial.yml

which is in line with the changes of !1681 and !1687 mentioned above.

Simmo Saan commented on merge request !1723 at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T15:58:48Z

Since there were no strong opinions either way, but a preference towards keeping all SV-LIB tasks valid, I've updated this MR and !1732 (merged) to just remove sv-lib/ tasks from the invalidated set. None of them were touched after svcomp26-rc.1.

Simmo Saan pushed to project branch svcomp26-invalid-taskdefs-clean at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T15:56:58Z

Simmo Saan (456cd964) at 23 Dec 15:56

Exclude all SV-LIB tasks from invalidation

Simmo Saan pushed to project branch svcomp26-invalid-taskdefs at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T15:51:23Z

Simmo Saan (7bddb475) at 23 Dec 15:51

Exclude all SV-LIB tasks from invalidation

Simmo Saan commented on merge request !1732 at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T09:39:27Z

There is also the undecided matter in !1723 (comment 2962396630). Currently, this invalidates some (but not all) SV-LIB tasks but this is open to decision.

Simmo Saan opened merge request !1732: Invalid task definitions for SV-COMP 2026 at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T09:34:50Z

This simply copies the Invalid-TaskDefs.set file from !1723 to have a simple and clean git history which can easily be reverted after SV-COMP 2026. See !1723 for all the details.

Simmo Saan pushed new project branch svcomp26-invalid-taskdefs-clean at SoSy-Lab / Benchmarking / SV-Benchmarks

2025-12-23T09:31:40Z

Simmo Saan (fb907c2d) at 23 Dec 09:31

Add Invalid-TaskDefs.set for SV-COMP 2026

Simmo Saan activity

Simmo Saan approved merge request !1068: Fix Ali's surname at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan commented on issue #45 at SoSy-Lab / Benchmarking / Competition Scripts

Simmo Saan closed issue #45: Unqualified tools are included in results at SoSy-Lab / Benchmarking / Competition Scripts

Simmo Saan approved merge request !1034: Update Goblint techniques for SV-COMP 2026 at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan opened merge request !1034: Update Goblint techniques for SV-COMP 2026 at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan pushed new project branch goblint-svcomp26-techniques at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan opened issue #45: Unqualified tools are included in results at SoSy-Lab / Benchmarking / Competition Scripts

Simmo Saan commented on merge request !228 at SoSy-Lab / Benchmarking / Competition Scripts

Simmo Saan commented on merge request !228 at SoSy-Lab / Benchmarking / Competition Scripts

Simmo Saan pushed to project branch website-sorted at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan opened merge request !1018: Consistently sort everything on website at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan pushed new project branch website-sorted at SoSy-Lab / Benchmarking / Formal-Methods Tools

Simmo Saan closed merge request !1723: Draft: Invalid task definition procedure for SV-COMP 2026 at SoSy-Lab / Benchmarking / SV-Benchmarks

TODO

Procedure

1. Backported changes which should not invalidate tasks

2. Generated task fingerprints

3. Generated Invalid-TaskDefs.set

4. Manually excluded more changes which should not invalidate tasks

5. Cross-referenced fingerprints with git

Simmo Saan commented on merge request !1723 at SoSy-Lab / Benchmarking / SV-Benchmarks

Simmo Saan pushed to project branch svcomp26-invalid-taskdefs-clean at SoSy-Lab / Benchmarking / SV-Benchmarks

Simmo Saan pushed to project branch svcomp26-invalid-taskdefs at SoSy-Lab / Benchmarking / SV-Benchmarks

Simmo Saan commented on merge request !1732 at SoSy-Lab / Benchmarking / SV-Benchmarks

Simmo Saan opened merge request !1732: Invalid task definitions for SV-COMP 2026 at SoSy-Lab / Benchmarking / SV-Benchmarks

Simmo Saan pushed new project branch svcomp26-invalid-taskdefs-clean at SoSy-Lab / Benchmarking / SV-Benchmarks

3. Generated `Invalid-TaskDefs.set`