Simmo Saan activity https://gitlab.com/sim642 2026-02-16T17:57:05Z tag:gitlab.com,2026-02-16:5109598235 Simmo Saan approved merge request !1068: Fix Ali's surname at SoSy-Lab / Benchmarking / Formal-Methods Tools 2026-02-16T17:57:05Z sim642 Simmo Saan
tag:gitlab.com,2026-02-04:5067780267 Simmo Saan commented on issue #45 at SoSy-Lab / Benchmarking / Competition Scripts 2026-02-04T15:26:43Z sim642 Simmo Saan

Fixed by 2ed0ad70.

tag:gitlab.com,2026-02-04:5067780157 Simmo Saan closed issue #45: Unqualified tools are included in results at SoSy-Lab / Benchmarking / Competition Scripts 2026-02-04T15:26:42Z sim642 Simmo Saan tag:gitlab.com,2026-01-28:5041601544 Simmo Saan approved merge request !1034: Update Goblint techniques for SV-COMP 2026 at SoSy-Lab / Benchmarking / Formal-Methods Tools 2026-01-28T10:11:59Z sim642 Simmo Saan
tag:gitlab.com,2026-01-28:5041599478 Simmo Saan opened merge request !1034: Update Goblint techniques for SV-COMP 2026 at SoSy-Lab / Benchmarking / Formal-Methods Tools 2026-01-28T10:11:30Z sim642 Simmo Saan
tag:gitlab.com,2026-01-28:5041588853 Simmo Saan pushed new project branch goblint-svcomp26-techniques at SoSy-Lab / Benchmarking / Formal-Methods Tools 2026-01-28T10:09:03Z sim642 Simmo Saan

Simmo Saan (434a0056) at 28 Jan 10:09

Update Goblint techniques for SV-COMP 2026

tag:gitlab.com,2026-01-19:5008966267 Simmo Saan opened issue #45: Unqualified tools are included in results at SoSy-Lab / Benchmarking / Competition Scripts 2026-01-19T11:44:25Z sim642 Simmo Saan tag:gitlab.com,2026-01-12:4983559291 Simmo Saan commented on merge request !228 at SoSy-Lab / Benchmarking / Competition Scripts 2026-01-12T10:59:22Z sim642 Simmo Saan

Putting aside the interestingness to potential viewers, which everyone will probably never agree on, there is just the pragmatic reason that fully log-scale is less bunched up at the very low end. The height for 1s–1000s is decreased then, but not in a way that causes other lines to be more bunched up than before.

I suppose a lot of which scale shows the most just happens to depend on the practical characteristics of the tools. For example, the lines for most tools turn quite vertical and don't move much on the x-axis between 800s–900s. Whereas they happen to move a lot between 0s–1s. If there was competition in the 800s–900s range, or if there wasn't in the 0s–1s range, then things might be different.

I now wondered, whether things were different 15 years ago, as in there wasn't any competition below <1s (e.g. because none of the tools were so fast). But actually the quantile plot in the SV-COMP 2012 report is not that different in this regard:

Click to expand image

Clicking around the results of early years, I even came across SV-COMP 2014 HeapManipulation which was particularly crowded <1s:

Click to expand image

Anyway, don't let this delay any SV-COMP activity. I've voiced my thoughts and this could be discussed again in the future. It's not like the current plots do an injustice to Goblint.

tag:gitlab.com,2025-12-30:4951682884 Simmo Saan commented on merge request !228 at SoSy-Lab / Benchmarking / Competition Scripts 2025-12-30T15:28:11Z sim642 Simmo Saan

I noticed that f0cf9d20 changes the quantile plots back to having linear scale <1s, as opposed to a fully log-scale plot. I know it was this way in the past with gnuplot, but... I have previously thought about suggesting a fully log-scale plot, so I was pleasantly surprised to see that on the results-page-to-be. With <1s squashed to a thin strip at the bottom, it's impossible to differentiate tools at the very low end.

The competition <1s is no less interesting than the competition >100s, so it's a shame that the plots squash it to be more-or-less irrelevant. Quantile plots rank the tools at all possible time limits, why is 500s more interesting than 0.5s? It entirely depends on what use case someone is considering the tools for. The squashing has a bias against tools which aim to be very fast as if nobody should be interested in that time range and no tool developer should aim to improve there.

Of course a fully log-scale plot can't go all the way to 0s, but neither do the tools we currently have (and if someone makes a tool that can solve tasks in 0.01s, we should show this!). BenchExec's quantile plots automatically use the minimum value in data for the log scale lower bound and that has been fine.

Or if a fully log-scale quantile plot really isn't reasonable, then I would at least vouch for making the height between 0s–1s equally large as the other heights (e.g. between 1s–10s).

For example, here's the symlog-style quantile plot for SV-COMP 2024 Overall:

Click to expand image
And here's a fully log-scale version (ignore the stylistic differences):
Click to expand image
tag:gitlab.com,2025-12-29:4949102033 Simmo Saan pushed to project branch website-sorted at SoSy-Lab / Benchmarking / Formal-Methods Tools 2025-12-29T15:38:52Z sim642 Simmo Saan

Simmo Saan (287400c9) at 29 Dec 15:38

Add REUSE for .gitignore

tag:gitlab.com,2025-12-29:4949089100 Simmo Saan opened merge request !1018: Consistently sort everything on website at SoSy-Lab / Benchmarking / Formal-Methods Tools 2025-12-29T15:32:43Z sim642 Simmo Saan

Mainly this sorts the "Input Languages" page which was completely unsorted before. Additionally this sorts other pages by str.lower (like "Tools" already were) because some techniques (like k-Induction) and frameworks (like cvc5) are lowercase.

Finally this adds the generated HTML to .gitignore.

tag:gitlab.com,2025-12-29:4949080812 Simmo Saan pushed new project branch website-sorted at SoSy-Lab / Benchmarking / Formal-Methods Tools 2025-12-29T15:28:46Z sim642 Simmo Saan

Simmo Saan (d5ffde00) at 29 Dec 15:28

Add generated website HTML to .gitignore

... and 1 more commit

tag:gitlab.com,2025-12-23:4939481601 Simmo Saan closed merge request !1723: Draft: Invalid task definition procedure for SV-COMP 2026 at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T16:41:21Z sim642 Simmo Saan

Following the invalidation procedures from past years (https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029 and !1606), I've now done it for this year. This required some fixes to the fingerprinter: competition-scripts!213.

This is not yet final and will have to be updated after remaining SV-COMP 2026 MRs are merged, but I wanted to already try out the procedure. Currently it also includes all the fingerprint CSV files for reference. Eventually the git history should be cleaned up to merge this into some svcomp26 final tag preparation branch (only adding a single set file).

The clean version is copied to !1732 for merging. This MR remains for reference.

TODO

  • Fix path beginnings in set file.
  • Add SV-LIB: most tasks were touched after deadline, so this may have to be done more manually.
  • Document entire procedure, including how and why some invalidations were overridden.
  • Redo after remaining SV-COMP 2026 MRs.

Procedure

1. Backported changes which should not invalidate tasks

I created a new branch (svcomp26-freeze-for-invalid-taskdefs) based on the svcomp26-freeze tag. To that branch, I backported the following changes made after svcomp26-freeze:

  1. !1724. (Unnecessary in the end because all SV-LIB tasks are kept valid.) It should not invalidate tasks because it touched all SV-LIB task definitions, but not the actual programs nor the intended expected verdicts. This is also necessary for BenchExec (and the task fingerprinter based on it, competition-scripts!213) to run on the tasks at all.

    Since there are other SV-LIB task changes after freezing, cherry-picking 25c9d260 would not work cleanly. Therefore, to backport I instead replicated that fix using the shell command from that commit message:

    find ./sv-lib -type f -name '*.yml' -exec sed -i 's/expected_verdict: correct/expected_verdict: true/g; s/expected_verdict: incorrect/expected_verdict: false/g' {} +
  2. !1693. (Unnecessary in the end because all SV-LIB tasks are kept valid.) It should not invalidate tasks because it touched all SV-LIB programs, but only regarding the witness options, not the actual program contents.

    To backport, I cherry-picked 04f63794 and 77104c39.

  3. !1698. It should not invalidate tasks because it only added documentation to the Verifier class used by most (if not all) Java tasks.

    To backport, I cherry-picked ddaef1c2.

2. Generated task fingerprints

Using the fixed task fingerprinter (competition-scripts!213), I generated task fingerprint CSV files both on the backports branch (svcomp26-freeze-for-invalid-taskdefs) and this branch (svcomp26-invalid-taskdefs, based latest main). On both branches, I generated fingerprints using benchmark definitions of two tools:

  1. cpachecker, because it participates in all C and SV-LIB categories. Using this shell command in sv-benchmarks:

    PYTHONPATH=~/dev/benchexec python3 ../bench-defs/scripts/execute_runs/fingerprinter.py ../bench-defs/benchmark-defs/cpachecker.xml --no-container

    This creates a cpachecker.fingerprints.csv file.

  2. gdart, because it participates in all Java categories. Using this shell command in sv-benchmarks:

    PYTHONPATH=~/dev/benchexec python3 ../bench-defs/scripts/execute_runs/fingerprinter.py ../bench-defs/benchmark-defs/gdart.xml --no-container

    This creates a gdart.fingerprints.csv file.

The specific tools don't matter (nor are needed to generate fingerprints), just benchmark definitions covering all categories are needed. These commands are based on https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029, adapted to the corresponding paths on my system.

3. Generated Invalid-TaskDefs.set

This consists of the following substeps:

  1. Merged svcomp26-freeze-for-invalid-taskdefs into svcomp26-invalid-taskdefs. This causes merge conflicts on the fingerprint CSV files. I resolved this by keeping the ones from svcomp26-freeze-for-invalid-taskdefs as cpachecker.freeze-fingerprints.csv and gdart.freeze-fingerprints.csv, and the ones from svcomp26-invalid-taskdefs with their original names.

  2. Generated Invalid-TaskDefs.set using the following two shell commands:

    diff cpachecker.freeze-fingerprints.csv cpachecker.fingerprints.csv | grep "^\(<\|>\)" | cut -f 1 | sed -e "s/> //" -e "s/< //" | sort -u > Invalid-TaskDefs.set
    diff gdart.freeze-fingerprints.csv gdart.fingerprints.csv | grep "^\(<\|>\)" | cut -f 1 | sed -e "s/> //" -e "s/< //" | sort -u >> Invalid-TaskDefs.set

    These commands are based on https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029, adapted to the filenames here.

  3. Fixed relative paths (removing the prefix ../bench-defs/sv-benchmarks/) in Invalid-TaskDefs.set and sorted (again after appending cpachecker and gdart ones) using the following shell command:

    sed -e 's|^../bench-defs/sv-benchmarks/||' Invalid-TaskDefs.set | sort | sponge Invalid-TaskDefs.set

4. Manually excluded more changes which should not invalidate tasks

These exclusions could not have been done by the earlier backporting:

  1. !1681 fixed a class constructor in a file used under input_files: for all java/objects/ tasks. However, not all of them ever invoke that constructor.

    Thus, I manually removed unaffected java/objects/ tasks from Invalid-TaskDefs.set, leaving only the affected ones also listed in !1717.

  2. As discussed in !1723 (comment 2962396630), all SV-LIB tasks are kept valid.

    Thus, instead of backporting !1687, I manually removed all sv-lib/ tasks from Invalid-TaskDefs.set.

5. Cross-referenced fingerprints with git

Alternatively, I generated Invalid-TaskDefs-git.set by relying on git diff using the following shell command:

git diff svcomp26-freeze-for-invalid-taskdefs svcomp26-invalid-taskdefs --name-only --no-renames | sed -nr 's/(\/Main)?\.(yml|i|c|java|svlib)$/.yml/p' | sort -u > Invalid-TaskDefs-git.set

It uses --no-renames as per !1723 (comment 2932694891) and attributes all file changes to the corresponding .yml file.

The cross-check using

diff Invalid-TaskDefs.set Invalid-TaskDefs-git.set

outputs

211a212
> .gitlab-ci.yml
233,237c234
< java/objects/objects01.yml
< java/objects/objects02.yml
< java/objects/objects05.yml
< java/objects/objects08.yml
< java/objects/objects09.yml
---
> java/objects/classes/svcomp/objects/A.yml
238a236,304
> sv-lib/c-translated/loop-acceleration/array3.yml
> sv-lib/c-translated/loop-acceleration/const_1-2_abstracted.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.1.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.2.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.yml
> sv-lib/c-translated/loop-invgen/fragtest_simple.yml
> sv-lib/c-translated/loop-invgen/heapsort.yml
> sv-lib/c-translated/loop-invgen/large_const.yml
> sv-lib/c-translated/loop-invgen/nest-if3.yml
> sv-lib/c-translated/loop-lit/bh2017-ex1-poly.yml
> sv-lib/c-translated/loop-lit/gcnr2008.yml
> sv-lib/c-translated/loops/count_up_down-1.1.yml
> sv-lib/c-translated/loops/count_up_down-1.2.yml
> sv-lib/c-translated/loops/count_up_down-1.yml
> sv-lib/c-translated/loops-crafted-1/in-de20.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.yml
> sv-lib/c-translated/loops-crafted-1/loopv3.yml
> sv-lib/c-translated/loops/sum01-1.yml
> sv-lib/c-translated/loops/sum01_bug02_sum01_bug02_base.case.yml
> sv-lib/c-translated/loops/sum_array-1.yml
> sv-lib/c-translated/loops/sum_array-2-1.yml
> sv-lib/c-translated/loops/sum_array-2-2.yml
> sv-lib/c-translated/loops/sum_array-2.yml
> sv-lib/c-translated/loops/terminator_03-1.yml
> sv-lib/c-translated/loop-zilu/benchmark01_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark06_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark12_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark13_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark15_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark16_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark21_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark22_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark24_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark27_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark28_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark29_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark30_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark31_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark36_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark38_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark39_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark40_polynomial.yml
> sv-lib/c-translated/loop-zilu/benchmark41_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark42_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark44_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark45_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark46_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark47_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark48_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark49_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark50_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark53_polynomial.yml

which is in line with the changes of !1681 and !1687 mentioned above.

tag:gitlab.com,2025-12-23:4939364144 Simmo Saan commented on merge request !1723 at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T15:58:48Z sim642 Simmo Saan

Since there were no strong opinions either way, but a preference towards keeping all SV-LIB tasks valid, I've updated this MR and !1732 (merged) to just remove sv-lib/ tasks from the invalidated set. None of them were touched after svcomp26-rc.1.

tag:gitlab.com,2025-12-23:4939359649 Simmo Saan pushed to project branch svcomp26-invalid-taskdefs-clean at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T15:56:58Z sim642 Simmo Saan

Simmo Saan (456cd964) at 23 Dec 15:56

Exclude all SV-LIB tasks from invalidation

tag:gitlab.com,2025-12-23:4939345139 Simmo Saan pushed to project branch svcomp26-invalid-taskdefs at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T15:51:23Z sim642 Simmo Saan

Simmo Saan (7bddb475) at 23 Dec 15:51

Exclude all SV-LIB tasks from invalidation

tag:gitlab.com,2025-12-23:4938239868 Simmo Saan commented on merge request !1732 at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T09:39:27Z sim642 Simmo Saan

There is also the undecided matter in !1723 (comment 2962396630). Currently, this invalidates some (but not all) SV-LIB tasks but this is open to decision.

tag:gitlab.com,2025-12-23:4938223203 Simmo Saan opened merge request !1732: Invalid task definitions for SV-COMP 2026 at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T09:34:50Z sim642 Simmo Saan

This simply copies the Invalid-TaskDefs.set file from !1723 to have a simple and clean git history which can easily be reverted after SV-COMP 2026. See !1723 for all the details.

tag:gitlab.com,2025-12-23:4938212254 Simmo Saan pushed new project branch svcomp26-invalid-taskdefs-clean at SoSy-Lab / Benchmarking / SV-Benchmarks 2025-12-23T09:31:40Z sim642 Simmo Saan

Simmo Saan (fb907c2d) at 23 Dec 09:31

Add Invalid-TaskDefs.set for SV-COMP 2026