
Refactor LSet to use C++ std::multimap #1301

Draft

BrentBaccala wants to merge 10 commits into Singular:spielwiese from BrentBaccala:LSet

Conversation

@BrentBaccala (Contributor)

Refactor LSet to use C++ std::multimap

This PR replaces the hand-coded array-based implementation of LSet (the lazy set data structure in Groebner basis computation) with a C++ std::multimap container wrapped in a custom writable_set class.

Key Changes

  • New data structure: Introduced writable_set<LObject, CompareLObject> in kernel/GBEngine/writable_set.h as a wrapper around std::multimap
  • Simplified API: Replaced manual array management (enterL, deleteInL, array indexing) with modern C++ container methods (push, pop, top, empty) and a few non-standard methods (would_be_top, pop_and_erase, reorder)
  • Improved comparator handling: Converted posInL* functions (position-finding functions) to compareL* functions that return -1/0/1 for use with the multimap's ordering
  • More abstract iteration: Use iterators instead of array indices to traverse LSet containers
  • Explicit initialization of strategy object: required (instead of memset 0), since LSet can no longer be zero-initialized
  • FIFO/LIFO support: Added sequence numbering to maintain FIFO or LIFO ordering for equal elements depending on the comparator (in the older code, some comparators were FIFO and some were LIFO)
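The real writable_set lives in kernel/GBEngine/writable_set.h; as a minimal sketch of the idea (names and details here are illustrative, not the PR's actual code), a sequence number layered over std::multimap can give FIFO or LIFO behavior for equal keys:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>

// Illustrative sketch: a multimap from element to insertion sequence
// number.  For equal keys, plain insertion lands after existing equal
// elements (FIFO), while inserting with a lower_bound hint lands before
// them (LIFO).
template <typename T, typename Compare = std::less<T>>
class writable_set {
  std::multimap<T, std::uint64_t, Compare> m_map;
  std::uint64_t m_seq = 0;
  bool m_fifo;

public:
  explicit writable_set(bool fifo = true) : m_fifo(fifo) {}

  bool empty() const { return m_map.empty(); }
  std::size_t size() const { return m_map.size(); }

  // Smallest element under the comparator, and its insertion sequence
  // number (exposed here only so the FIFO/LIFO behavior is observable).
  const T &top() const { return m_map.begin()->first; }
  std::uint64_t seq_of_top() const { return m_map.begin()->second; }

  void push(const T &value) {
    if (m_fifo)
      m_map.emplace(value, m_seq++);          // after existing equal keys
    else
      m_map.emplace_hint(m_map.lower_bound(value), value, m_seq++); // before
  }

  void pop() { m_map.erase(m_map.begin()); }
};
```

This relies on the standard guarantees that a multimap inserts an equivalent key at the upper bound of its equal range, and that a correct hint places it just before the hinted position.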

Files Modified

  • 13 files changed with 1,829 insertions and 1,335 deletions
  • Core changes in kernel/GBEngine/kutil.{cc,h}, kernel/GBEngine/kInline.h
  • Updates throughout GB engine: kstd1.cc, kstd2.cc, kstdfac.cc, khstd.cc, gr_kstd2.cc, sca.cc, kverify.cc
  • Module updates: customstd.cc, std_wrapper.cc

Benefits

  • More maintainable code using standard library containers
  • Faster insertion operation on large LSets
  • Cleaner separation between ordering logic and container management

The PR is split into two commits to make the first (and primary) commit easier to read. git doesn't format the second commit as well, but all the second commit does is remove the unused posInL* functions. The primary commit is a squashed version of a much longer commit chain from https://github.com/BrentBaccala/Singular/tree/Lqueue

All core testsuite tests (Short/ok_s.lst, Long/ok_l.lst, Old/universal.lst, Plural/short.lst, Plural/dmod.lst, Plural/long.lst, Buch/buch.lst) run and pass without modification.

Some of the debugging messages have changed, due to the difficulty in finding an element's index in the LSet now that we use a tree instead of an array.

@BrentBaccala (Contributor, Author)

Hans Schönemann suggested some benchmarks, and I asked Claude to generate a script to run them. Here are some results, 5 runs each on an AMD Ryzen 5 2500U:

Average Times Comparison

| Test | V1-spielwiese | V2-pr1301 | Comparison |
|---|---|---|---|
| cyclic6-ZZ-lp | 0.3073s (±0.2127) | 0.3379s (±0.2944) | V1 1.09x faster |
| cyclic6-ZZ-dp | 0.0638s (±0.0086) | 0.2124s (±0.1696) | V1 3.32x faster |
| cyclic6-hom-QQ-dp | 0.1752s (±0.0246) | 0.1485s (±0.0113) | V2 1.17x faster |
| newellp1 | 2523.0554s (±40.7906) | 2487.6538s (±19.9567) | similar |
| cyclic6-QQ-lp | 0.9288s (±0.4223) | 0.4438s (±0.0138) | V2 2.09x faster |
| katsura9-QQ-dp | 12.1965s (±4.7289) | 12.6604s (±4.4944) | similar |
| katsura9-ZZ-dp | 0.8516s (±0.1712) | 0.8214s (±0.1108) | similar |
| cyclic6-hom-ZZ-dp | 0.0613s (±0.0062) | 0.0673s (±0.0050) | V1 1.09x faster |
| cyclic6-QQ-dp | 0.2409s (±0.1053) | 0.1844s (±0.0469) | V2 1.30x faster |

The only runs where this PR shows a slowdown are the very fast ones. One possible explanation is that the older array-based code is faster for small L sets, while the tree-based code in this PR becomes more performant on larger L sets.
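To illustrate the trade-off being suggested (this is a generic sketch, not code from either branch): a sorted array and a multimap maintain exactly the same ordering, but the array pays O(n) element moves per insertion with a tiny constant factor, while the tree pays O(log n) comparisons plus a node allocation with a larger constant.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Insert into a sorted std::vector: binary search for the position, then
// shift the tail.  O(n) moves per insert, but contiguous memory keeps the
// constant factor tiny while n is small.
inline void sorted_vec_insert(std::vector<int> &v, int x) {
  v.insert(std::upper_bound(v.begin(), v.end(), x), x);
}

// Insert into a std::multimap: O(log n) comparisons plus one node
// allocation -- asymptotically cheaper, but with a larger constant.
inline void tree_insert(std::multimap<int, int> &m, int x) {
  m.emplace(x, 0);
}
```

Feeding the same values to both yields the same traversal order, which is why the two containers are interchangeable as far as the algorithm's correctness is concerned; only the cost profile differs.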

@BrentBaccala (Contributor, Author)

The patches are much easier to read with the patience diff algorithm, like this: `git log --diff-algorithm=patience -p`

@BrentBaccala (Contributor, Author)

BrentBaccala commented Nov 5, 2025

Here are some timings from an improved benchmarking script that compares different algorithms:

| Test | Singular-spielwiese | Singular-LSet |
|---|---|---|
| newellp1-modstd | 316.6912s (±9.3126) | 332.2769s (±5.5500) |
| newellp1-sba | 69.5517s (±6.6773) | 75.1145s (±6.2697) |
| newellp1-slimgb | 39.3158s (±6.2254) | 35.6399s (±4.8627) |
| newellp1-std | 2571.6555s (±32.7144) | 2577.7527s (±39.1309) |

Each time is the average of 5 runs on my AMD Ryzen 5 2500U.

There's not much change, but the PR paves the way for future improvements.

@hannes14 (Member)

hannes14 commented Nov 17, 2025 via email

BrentBaccala marked this pull request as draft November 18, 2025 18:01
@BrentBaccala (Contributor, Author)

BrentBaccala commented Nov 18, 2025 via email

@jankoboehm (Member)

Dear Brent, as you may have heard, Hans passed away at the end of 2025.

I still think you have uncovered a relevant issue. However, this goes quite deep into the internals. So the change can only be justified if there is a clear performance benefit. Once the tests are passing, we will carry out a careful analysis of correctness and performance beyond the standard test suites.

Still, thanks a lot for your work — and we will see how it turns out!

@jankoboehm (Member)

I also think it makes sense to focus on the original issue.

@BrentBaccala (Contributor, Author)

Dear Janko,

No, actually I did not know that Hans had passed away. I see the announcement now on the Singular website. I'm sorry to hear that. I don't know if there's anything I can do to assist you; please reach out if you think that I can.

About the pull request...

There are several issues here. One is code cleanup: refactoring so that we have comparator functions that compare two LObjects, instead of the current functions, each of which mixes a binary search in with the comparison. I've been thinking of opening a pull request just for that.
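Schematically, the cleanup looks like this (a generic sketch with a hypothetical stand-in for LObject, not Singular's actual code, where the real comparison involves monomial order, ecart, and so on):

```cpp
#include <vector>

// Hypothetical stand-in for an LObject; illustrative only.
struct LObj { int deg; };

// Old style (sketch): a position-finding function with the comparison
// inlined into a binary search over the array -- ordering logic and
// container logic are mixed together.
inline int posInL_style(const std::vector<LObj> &set, const LObj &p) {
  int lo = 0, hi = (int)set.size();
  while (lo < hi) {
    int mid = (lo + hi) / 2;
    if (set[mid].deg <= p.deg) lo = mid + 1;  // comparison buried here
    else hi = mid;
  }
  return lo;  // insertion index
}

// New style (sketch): a pure three-way comparator returning -1/0/1, with
// no knowledge of any container -- usable directly as a multimap ordering.
inline int compareL_style(const LObj &a, const LObj &b) {
  if (a.deg < b.deg) return -1;
  if (a.deg > b.deg) return 1;
  return 0;
}
```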

The other issue is actual performance improvement. I haven't achieved that yet. A big part of the reason I opened this PR originally was that I didn't want to wait until I had a massive patch touching a thousand lines of code, so converting to std::multimap seemed like a reasonable incremental step. But so far there's no actual improvement, and some regression, because the array operations are sometimes faster than the tree operations.

So, right now, I've got a side branch that just changes the LObject comparator functions, and I haven't looked at it for a while. Most of my focus is on trying to grapple with the performance issues. I've got a benchmark script and some tracing code that dumps all of the Gröbner basis calculations within a primdec calculation as SSI files. It's not totally disorganized, but it's not quite suitable as a PR just yet.

BrentBaccala and others added 10 commits March 23, 2026 22:06
Key Changes

- Introduced writable_set<LObject, CompareLObject> in kernel/GBEngine/writable_set.h as a wrapper around std::multimap
- Replaced manual array management (enterL, deleteInL, array indexing) with C++ container methods (push, pop, top, empty)
- Converted posInL* functions (position-finding functions) to compareL* functions that return -1/0/1 for use with the multimap's ordering
- Use iterators instead of array indices to traverse LSet containers
- Use explicit initialization of strategy object members (instead of memset 0) since LSet can no longer be zero-initialized
Two bugs fixed:

1. updateL() used std::iter_swap to move pure-power elements to the front
   of the L set. With the multimap-based LSet, iter_swap corrupts the
   sorted invariant, since the multimap expects elements to stay in order.
   Fix: remove both iter_swap calls and rely on the compareL10 comparator
   (which already gives pure-power elements priority). Add missing
   strat->L.reorder() call after updateL() in enterSMora.

2. In spielwiese, posInL11Ringls normalizes polynomials by negating those
   with negative leading coefficients (as a side effect of insertion).
   The LSet branch's compareL11Ringls comparator doesn't have this side
   effect. Without normalization, the nEqual() checks in redRiloc_Z and
   redRiloc make wrong lazy-reinsertion decisions, causing stdZtests.tst
   to hang on "Yue's Bug" (integer ring with mixed ws ordering).
   Fix: normalize leading coefficients in LSet::push() for non-global
   Z-ring orderings, and before comparison points in redRiloc/redRiloc_Z.

Test results: 184/184 (ringlocal, stdZtests, Short/ok_s.lst all pass).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
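The first bug above comes down to a language-level constraint: keys in a std::multimap are const, so an element cannot be swapped or mutated in place, and any "move" must be an erase-and-reinsert. As a sketch of the mechanism a reorder must use (illustrative, not the PR's actual reorder() code), C++17 node extraction does this without reallocating the node:

```cpp
#include <map>
#include <utility>

// Detach a node from the multimap, change its key, and reinsert it at
// whatever position the ordering now dictates.  While detached, the node
// handle makes the key mutable; inside the container it is const, which is
// why std::iter_swap on tree iterators is not an option.
template <typename K, typename V, typename C>
void move_to_new_key(std::multimap<K, V, C> &m,
                     typename std::multimap<K, V, C>::iterator it,
                     const K &new_key) {
  auto node = m.extract(it);   // O(1) detach; no allocation
  node.key() = new_key;
  m.insert(std::move(node));   // reinsert in sorted position
}
```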
@BrentBaccala (Contributor, Author)

An update on this PR:

It's still not ready to merge. I've progressed quite a bit past it and have gotten good performance gains (60-70%) on the large real-world polynomial systems that I'm trying to solve, but shifting LSet to use a tree introduces performance regressions on small Gröbner basis calculations, for which the original array implementation is faster.

I'm planning to handle this by further enhancing LSet so that it can use either data structure, and switch between them dynamically, but that's further refactoring. It will come. Right now, I'm focusing on refactoring S, which is starting to look like it's as challenging to handle as L, so that the whole bba algorithm can be parallelized.
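I haven't written the hybrid container yet; purely as an illustration of the "switch dynamically" idea (all names here are hypothetical), it could start life as a sorted vector and migrate one-way to a tree once it crosses a size threshold:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch: sorted-vector representation while small (cheap
// constants), std::multimap representation once large (cheap asymptotics).
class hybrid_lset {
  static constexpr std::size_t kThreshold = 64;  // crossover point (tunable)
  std::vector<int> vec;          // used while small, kept sorted
  std::multimap<int, int> tree;  // used once large
  bool use_tree = false;

public:
  void push(int x) {
    if (!use_tree && vec.size() >= kThreshold) {
      for (int v : vec) tree.emplace(v, 0);  // one-time migration
      vec.clear();
      use_tree = true;
    }
    if (use_tree)
      tree.emplace(x, 0);
    else
      vec.insert(std::upper_bound(vec.begin(), vec.end(), x), x);
  }

  // Smallest element in either representation.
  int top() const { return use_tree ? tree.begin()->first : vec.front(); }

  std::size_t size() const { return use_tree ? tree.size() : vec.size(); }
};
```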
