Fixed by 2ed0ad70.
Simmo Saan (434a0056) at 28 Jan 10:09
Update Goblint techniques for SV-COMP 2026
Putting aside the interestingness to potential viewers, which everyone will probably never agree on, there is just the pragmatic reason that fully log-scale is less bunched up at the very low end. The height for 1s–1000s is decreased then, but not in a way that causes other lines to be more bunched up than before.
I suppose a lot of which scale shows the most just happens to depend on the practical characteristics of the tools. For example, the lines for most tools turn quite vertical and don't move much on the x-axis between 800s–900s. Whereas they happen to move a lot between 0s–1s. If there was competition in the 800s–900s range, or if there wasn't in the 0s–1s range, then things might be different.
I now wondered, whether things were different 15 years ago, as in there wasn't any competition below <1s (e.g. because none of the tools were so fast). But actually the quantile plot in the SV-COMP 2012 report is not that different in this regard:
Clicking around the results of early years, I even came across SV-COMP 2014 HeapManipulation which was particularly crowded <1s:
Anyway, don't let this delay any SV-COMP activity. I've voiced my thoughts and this could be discussed again in the future. It's not like the current plots do an injustice to Goblint.
I noticed that f0cf9d20 changes the quantile plots back to having linear scale <1s, as opposed to a fully log-scale plot. I know it was this way in the past with gnuplot, but... I have previously thought about suggesting a fully log-scale plot, so I was pleasantly surprised to see that on the results-page-to-be. With <1s squashed to a thin strip at the bottom, it's impossible to differentiate tools at the very low end.
The competition <1s is no less interesting than the competition >100s, so it's a shame that the plots squash it to be more-or-less irrelevant. Quantile plots rank the tools at all possible time limits, why is 500s more interesting than 0.5s? It entirely depends on what use case someone is considering the tools for. The squashing has a bias against tools which aim to be very fast as if nobody should be interested in that time range and no tool developer should aim to improve there.
Of course a fully log-scale plot can't go all the way to 0s, but neither do the tools we currently have (and if someone makes a tool that can solve tasks in 0.01s, we should show this!). BenchExec's quantile plots automatically use the minimum value in data for the log scale lower bound and that has been fine.
Or if a fully log-scale quantile plot really isn't reasonable, then I would at least vouch for making the height between 0s–1s equally large as the other heights (e.g. between 1s–10s).
For example, here's the symlog-style quantile plot for SV-COMP 2024 Overall:
And here's a fully log-scale version (ignore the stylistic differences):Simmo Saan (287400c9) at 29 Dec 15:38
Add REUSE for .gitignore
Mainly this sorts the "Input Languages" page which was completely unsorted before.
Additionally this sorts other pages by str.lower (like "Tools" already were) because some techniques (like k-Induction) and frameworks (like cvc5) are lowercase.
Finally this adds the generated HTML to .gitignore.
Simmo Saan (d5ffde00) at 29 Dec 15:28
Add generated website HTML to .gitignore
... and 1 more commit
Following the invalidation procedures from past years (https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029 and !1606), I've now done it for this year. This required some fixes to the fingerprinter: competition-scripts!213.
This is not yet final and will have to be updated after remaining SV-COMP 2026 MRs are merged, but I wanted to already try out the procedure.
Currently it also includes all the fingerprint CSV files for reference. Eventually the git history should be cleaned up to merge this into some svcomp26 final tag preparation branch (only adding a single set file).
The clean version is copied to !1732 for merging. This MR remains for reference.
I created a new branch (svcomp26-freeze-for-invalid-taskdefs) based on the svcomp26-freeze tag.
To that branch, I backported the following changes made after svcomp26-freeze:
!1724. (Unnecessary in the end because all SV-LIB tasks are kept valid.) It should not invalidate tasks because it touched all SV-LIB task definitions, but not the actual programs nor the intended expected verdicts. This is also necessary for BenchExec (and the task fingerprinter based on it, competition-scripts!213) to run on the tasks at all.
Since there are other SV-LIB task changes after freezing, cherry-picking 25c9d260 would not work cleanly. Therefore, to backport I instead replicated that fix using the shell command from that commit message:
find ./sv-lib -type f -name '*.yml' -exec sed -i 's/expected_verdict: correct/expected_verdict: true/g; s/expected_verdict: incorrect/expected_verdict: false/g' {} +
!1693. (Unnecessary in the end because all SV-LIB tasks are kept valid.) It should not invalidate tasks because it touched all SV-LIB programs, but only regarding the witness options, not the actual program contents.
!1698. It should not invalidate tasks because it only added documentation to the Verifier class used by most (if not all) Java tasks.
To backport, I cherry-picked ddaef1c2.
Using the fixed task fingerprinter (competition-scripts!213), I generated task fingerprint CSV files both on the backports branch (svcomp26-freeze-for-invalid-taskdefs) and this branch (svcomp26-invalid-taskdefs, based latest main).
On both branches, I generated fingerprints using benchmark definitions of two tools:
cpachecker, because it participates in all C and SV-LIB categories. Using this shell command in sv-benchmarks:
PYTHONPATH=~/dev/benchexec python3 ../bench-defs/scripts/execute_runs/fingerprinter.py ../bench-defs/benchmark-defs/cpachecker.xml --no-container
This creates a cpachecker.fingerprints.csv file.
gdart, because it participates in all Java categories. Using this shell command in sv-benchmarks:
PYTHONPATH=~/dev/benchexec python3 ../bench-defs/scripts/execute_runs/fingerprinter.py ../bench-defs/benchmark-defs/gdart.xml --no-container
This creates a gdart.fingerprints.csv file.
The specific tools don't matter (nor are needed to generate fingerprints), just benchmark definitions covering all categories are needed. These commands are based on https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029, adapted to the corresponding paths on my system.
Invalid-TaskDefs.set
This consists of the following substeps:
Merged svcomp26-freeze-for-invalid-taskdefs into svcomp26-invalid-taskdefs.
This causes merge conflicts on the fingerprint CSV files. I resolved this by keeping the ones from svcomp26-freeze-for-invalid-taskdefs as cpachecker.freeze-fingerprints.csv and gdart.freeze-fingerprints.csv, and the ones from svcomp26-invalid-taskdefs with their original names.
Generated Invalid-TaskDefs.set using the following two shell commands:
diff cpachecker.freeze-fingerprints.csv cpachecker.fingerprints.csv | grep "^\(<\|>\)" | cut -f 1 | sed -e "s/> //" -e "s/< //" | sort -u > Invalid-TaskDefs.set
diff gdart.freeze-fingerprints.csv gdart.fingerprints.csv | grep "^\(<\|>\)" | cut -f 1 | sed -e "s/> //" -e "s/< //" | sort -u >> Invalid-TaskDefs.set
These commands are based on https://gitlab.com/sosy-lab/sv-comp/sv-comp-2024/-/issues/91#note_1681023029, adapted to the filenames here.
Fixed relative paths (removing the prefix ../bench-defs/sv-benchmarks/) in Invalid-TaskDefs.set and sorted (again after appending cpachecker and gdart ones) using the following shell command:
sed -e 's|^../bench-defs/sv-benchmarks/||' Invalid-TaskDefs.set | sort | sponge Invalid-TaskDefs.set
These exclusions could not have been done by the earlier backporting:
!1681 fixed a class constructor in a file used under input_files: for all java/objects/ tasks. However, not all of them ever invoke that constructor.
Thus, I manually removed unaffected java/objects/ tasks from Invalid-TaskDefs.set, leaving only the affected ones also listed in !1717.
As discussed in !1723 (comment 2962396630), all SV-LIB tasks are kept valid.
Thus, instead of backporting !1687, I manually removed all sv-lib/ tasks from Invalid-TaskDefs.set.
Alternatively, I generated Invalid-TaskDefs-git.set by relying on git diff using the following shell command:
git diff svcomp26-freeze-for-invalid-taskdefs svcomp26-invalid-taskdefs --name-only --no-renames | sed -nr 's/(\/Main)?\.(yml|i|c|java|svlib)$/.yml/p' | sort -u > Invalid-TaskDefs-git.set
It uses --no-renames as per !1723 (comment 2932694891) and attributes all file changes to the corresponding .yml file.
The cross-check using
diff Invalid-TaskDefs.set Invalid-TaskDefs-git.set
outputs
211a212
> .gitlab-ci.yml
233,237c234
< java/objects/objects01.yml
< java/objects/objects02.yml
< java/objects/objects05.yml
< java/objects/objects08.yml
< java/objects/objects09.yml
---
> java/objects/classes/svcomp/objects/A.yml
238a236,304
> sv-lib/c-translated/loop-acceleration/array3.yml
> sv-lib/c-translated/loop-acceleration/const_1-2_abstracted.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.1.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.2.yml
> sv-lib/c-translated/loop-invariants/linear-inequality-inv-a.yml
> sv-lib/c-translated/loop-invgen/fragtest_simple.yml
> sv-lib/c-translated/loop-invgen/heapsort.yml
> sv-lib/c-translated/loop-invgen/large_const.yml
> sv-lib/c-translated/loop-invgen/nest-if3.yml
> sv-lib/c-translated/loop-lit/bh2017-ex1-poly.yml
> sv-lib/c-translated/loop-lit/gcnr2008.yml
> sv-lib/c-translated/loops/count_up_down-1.1.yml
> sv-lib/c-translated/loops/count_up_down-1.2.yml
> sv-lib/c-translated/loops/count_up_down-1.yml
> sv-lib/c-translated/loops-crafted-1/in-de20.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de31.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de32.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de41.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de42.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de61.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.1.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.2.yml
> sv-lib/c-translated/loops-crafted-1/in-de62.yml
> sv-lib/c-translated/loops-crafted-1/loopv3.yml
> sv-lib/c-translated/loops/sum01-1.yml
> sv-lib/c-translated/loops/sum01_bug02_sum01_bug02_base.case.yml
> sv-lib/c-translated/loops/sum_array-1.yml
> sv-lib/c-translated/loops/sum_array-2-1.yml
> sv-lib/c-translated/loops/sum_array-2-2.yml
> sv-lib/c-translated/loops/sum_array-2.yml
> sv-lib/c-translated/loops/terminator_03-1.yml
> sv-lib/c-translated/loop-zilu/benchmark01_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark06_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark12_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark13_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark15_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark16_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark21_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark22_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark24_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark27_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark28_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark29_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark30_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark31_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark36_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark38_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark39_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark40_polynomial.yml
> sv-lib/c-translated/loop-zilu/benchmark41_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark42_conjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark44_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark45_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark46_disjunctive.yml
> sv-lib/c-translated/loop-zilu/benchmark47_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark48_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark49_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark50_linear.yml
> sv-lib/c-translated/loop-zilu/benchmark53_polynomial.yml
which is in line with the changes of !1681 and !1687 mentioned above.
Since there were no strong opinions either way, but a preference towards keeping all SV-LIB tasks valid, I've updated this MR and !1732 (merged) to just remove sv-lib/ tasks from the invalidated set.
None of them were touched after svcomp26-rc.1.
Simmo Saan (456cd964) at 23 Dec 15:56
Exclude all SV-LIB tasks from invalidation
Simmo Saan (7bddb475) at 23 Dec 15:51
Exclude all SV-LIB tasks from invalidation
There is also the undecided matter in !1723 (comment 2962396630). Currently, this invalidates some (but not all) SV-LIB tasks but this is open to decision.
Simmo Saan (fb907c2d) at 23 Dec 09:31
Add Invalid-TaskDefs.set for SV-COMP 2026