Refactor LSet to use C++ std::multimap #1301
BrentBaccala wants to merge 10 commits into Singular:spielwiese from
Conversation
Hans Schönemann suggested some benchmarks, and I asked Claude to generate a script to run them. Here are some results, 5 runs each on an AMD Ryzen 5 2500U.

[Chart: Average Times Comparison]
The only runs where this PR shows a slowdown are the very fast ones. One possible explanation is that the older array-based code is faster for small L sets, while the tree-based code in this PR becomes more performant for larger L sets.
The patches are much easier to read with the
Here are some timings from an improved benchmarking script that compares different algorithms:

| Test | Singular-spielwiese | Singular-LSet |
|------|---------------------|---------------|
| newellp1-modstd | 316.6912s (±9.3126) | 332.2769s (±5.5500) |
| newellp1-sba | 69.5517s (±6.6773) | 75.1145s (±6.2697) |
| newellp1-slimgb | 39.3158s (±6.2254) | 35.6399s (±4.8627) |
| newellp1-std | 2571.6555s (±32.7144) | 2577.7527s (±39.1309) |

Each time is the average of 5 runs on my AMD Ryzen 5 2500U. There's not much change, but the PR paves the way for future improvements.
Am Wed, Nov 05, 2025 at 08:13:45AM -0800 schrieb Brent Baccala:
BrentBaccala left a comment (Singular/Singular#1301)
Here's some timings from an improved benchmarking script that compares different algorithms:
| Test | Singular-spielwiese | Singular-LSet |
|------|----------|----------|
| newellp1-modstd | 316.6912s (±9.3126) | 332.2769s (±5.5500) |
| newellp1-sba | 69.5517s (±6.6773) | 75.1145s (±6.2697) |
| newellp1-slimgb | 39.3158s (±6.2254) | 35.6399s (±4.8627) |
| newellp1-std | 2571.6555s (±32.7144) | 2577.7527s (±39.1309) |
Each time is the average of 5 runs on my AMD Ryzen 5 2500U.
There's not much change, but the PR paves the way for future improvements.
Dear Brent Baccala,
thanks for your work with the LSet.
You are right about the main problem: the elements in the LSet are too large (they grew over time; I was not aware that an entry is now around 140 bytes), or rather, it takes too much time to move them around.
But I will not merge that PR:
- the tests Tst/New/ringlocal.tst and Tst/New/stdZtests.tst fail (they do not finish)
- no improvement in the timings (see above)
- std::multiset does not work (well) with omalloc
- it uses more memory: some (8 bytes for the pointer) cannot be avoided, but there are also vtables etc.
- it is difficult to access more than just the top element

I will try to make LSet an array of pointers.
Hans Schoenemann
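The array-of-pointers idea can be sketched in miniature (hypothetical names; this is only an illustration of why it helps, not code from Singular): if the L set stores pointers, sorting and insertion move 8-byte pointers instead of ~140-byte LObject entries.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in: a large entry comparable to an ~140-byte LObject.
struct BigEntry {
    int  degree;
    char payload[136];  // ballast representing the rest of the entry
};

// Sorting an array of pointers moves only the 8-byte pointers; the big
// entries themselves never move in memory.
void sortByDegree(std::vector<BigEntry *> &L)
{
    std::sort(L.begin(), L.end(),
              [](const BigEntry *a, const BigEntry *b)
              { return a->degree < b->degree; });
}
```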
Dear Hans,
Thank you for your reply and feedback.
Your decision is completely understandable, and I support it fully. Let me
elaborate on a few points:
- I didn't know some of the tests were failing. I'll look into them.
- I didn't consider how std::multiset would interact with omalloc. I'll
look into that.
- I do think we should shift (if possible) to iterators (of some kind).
Not being able to easily access more than the top element might end up
being a common situation if part of the L set is either on disk or on
another node in a cluster.
- I think shifting from "posInL" to "compareL" is a real improvement
that should be made (see below).
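To illustrate the posInL-to-compareL idea (all names and fields below are hypothetical simplifications, not Singular's actual code): once the comparison logic is a pure -1/0/1 comparator, one generic binary search can serve all orderings, instead of each posInL* variant embedding its own.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for an LObject, reduced to two comparison keys.
struct MiniLObject {
    int degree;  // simplified stand-in for the (weighted) degree
    int ecart;   // simplified stand-in for the ecart
};

// compareL-style comparator: returns -1/0/1 and knows nothing about the
// container it is used with.
int compareL(const MiniLObject &a, const MiniLObject &b)
{
    if (a.degree != b.degree) return (a.degree < b.degree) ? -1 : 1;
    if (a.ecart  != b.ecart)  return (a.ecart  < b.ecart)  ? -1 : 1;
    return 0;
}

// One generic position finder replaces the per-ordering binary searches:
// find the insertion point for o in the sorted set L.
std::size_t posInL(const std::vector<MiniLObject> &L, const MiniLObject &o)
{
    return std::lower_bound(L.begin(), L.end(), o,
               [](const MiniLObject &a, const MiniLObject &b)
               { return compareL(a, b) < 0; })
           - L.begin();
}
```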
I'm currently exploring "mathicgb". What's your opinion of this code? It
looks like we've got an F4 implementation there, as well as a newer
implementation of standard Buchberger and F5.
I'm trying to organize the library code to be more independent of the
choice of algorithm. It's unfortunate that we use the names "std" for the
original algorithm and "groebner" for a heuristic selection of algorithm.
I wonder even about changing "std" to be a synonym for "groebner" and
adding "bba" as the new name for "std". That might break compatibility too
much; what do you think? Failing that, I'm thinking that a lot of the
places we use "std" should be changed to use "groebner". What are your
thoughts of making that change throughout the LIB directory?
I'm also working to develop a protocol-independent test suite, by making
copies of a lot of our tests, removing "option(prot)", using "groebner"
instead of "std", and just checking their return values. This will be
useful in checking the correct operation of "mathicgb".
I've got a nice benchmark script (1100 lines of bash) written by Claude.
You can select different algorithms, different characteristics, different
numbers of variables for the cyclic and katsura tests. It also does the
newellp1 test. Should I put it in as a separate PR?
As far as the L set PR is concerned, I've improved it somewhat, so I'll
push those changes and flag it "draft" (which I just did). I'll look into
the concerns you've got about it, and perhaps we can revisit it at a later
date. I would especially recommend factoring out the LObject comparison
code and keeping it separate from the binary search algorithm (i.e.,
introducing "compareL" as the switchable function). I think it makes the
code a lot easier to understand and maintain. I think we should do the
same thing for the T set, as well. What about a PR to just make that
change, without shifting everything to standard containers?
Thank you.
agape
brent
Dear Brent, as you may have heard, Hans passed away at the end of 2025. I still think you have uncovered a relevant issue. However, this goes quite deep into the internals. So the change can only be justified if there is a clear performance benefit. Once the tests are passing, we will carry out a careful analysis of correctness and performance beyond the standard test suites. Still, thanks a lot for your work — and we will see how it turns out!
I also think it makes sense to focus on the original issue.
Dear Janko,

No, actually I did not know that Hans had passed away. I see the announcement now on the Singular website. I'm sorry to hear that. I don't know if there's anything I can do to assist you; please reach out if you think that I can.

About the pull request... There are several issues here. One is code cleanup: refactoring so that we have comparator functions that compare two LObjects, without every one of them having a binary search mixed in with the function (like we do now). I've been thinking of opening a pull request just for that.

The other issue is actual performance improvement. I haven't achieved that yet. A big part of the reason I opened this PR originally was that I didn't want to wait until I had a massive patch touching a thousand lines of code, so I thought that converting to std::multimap would be an incremental step. But there was no actual improvement, and some regression, because the array operations are sometimes faster than the tree operations.

So, right now, I've got a side branch that just changes the LObject comparator functions, and I haven't looked at it for a while. Most of my focus is on trying to grapple with the performance issues. I've got a benchmark script and some tracing code that dumps all of the Gröbner basis calculations within a primdec calculation as SSI files. It's not totally disorganized, but it's not quite suitable as a PR just yet.
Key Changes

- Introduced writable_set<LObject, CompareLObject> in kernel/GBEngine/writable_set.h as a wrapper around std::multimap
- Replaced manual array management (enterL, deleteInL, array indexing) with C++ container methods (push, pop, top, empty)
- Converted posInL* functions (position-finding functions) to compareL* functions that return -1/0/1 for use with the multimap's ordering
- Use iterators instead of array indices to traverse LSet containers
- Use explicit initialization of strategy object members (instead of memset to 0) since LSet can no longer be zero-initialized
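A cut-down sketch of what such a wrapper can look like (illustrative only: the real writable_set in kernel/GBEngine/writable_set.h wraps std::multimap and has additional methods; std::multiset is used here just to keep the sketch short):

```cpp
#include <set>

// Illustrative sketch only: a priority-queue-like facade over a sorted
// standard container, in the spirit of the writable_set described above.
template <class T, class Cmp>
class writable_set_sketch {
    std::multiset<T, Cmp> s_;
public:
    bool empty() const { return s_.empty(); }
    std::size_t size() const { return s_.size(); }
    const T &top() const { return *s_.begin(); }  // best element first
    void push(const T &x) { s_.insert(x); }       // O(log n) insertion
    T pop() {                                     // remove and return top
        T x = *s_.begin();
        s_.erase(s_.begin());
        return x;
    }
};
```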
Two bugs fixed:

1. updateL() used std::iter_swap to move pure-power elements to the front of the L set. With the multimap-based LSet, iter_swap corrupts the sorted invariant, since the container expects elements to stay in order. Fix: remove both iter_swap calls and rely on the compareL10 comparator (which already gives pure-power elements priority). Add the missing strat->L.reorder() call after updateL() in enterSMora.

2. In spielwiese, posInL11Ringls normalizes polynomials by negating those with negative leading coefficients (as a side effect of insertion). The LSet branch's compareL11Ringls comparator doesn't have this side effect. Without normalization, the nEqual() checks in redRiloc_Z and redRiloc make wrong lazy-reinsertion decisions, causing stdZtests.tst to hang on "Yue's Bug" (integer ring with mixed ws ordering). Fix: normalize leading coefficients in LSet::push() for non-global Z-ring orderings, and before comparison points in redRiloc/redRiloc_Z.

Test results: 184/184 (ringlocal, stdZtests, Short/ok_s.lst all pass).

Co-Authored-By: Claude Opus 4.6 <[email protected]>
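The first fix can be illustrated in miniature (hypothetical names and fields; this is not the PR's code): once the pure-power priority lives in the comparator, the sorted container keeps such elements at the front by itself, and there is nothing left for iter_swap to do; mutating elements in place would only corrupt the ordering invariant.

```cpp
#include <set>

// Hypothetical two-field stand-in for an LObject.
struct Item {
    bool purePower;  // pure-power elements get priority, as compareL10 does
    int  degree;
};

// Strict weak ordering: pure powers sort first, then lower degree.
struct PurePowerFirst {
    bool operator()(const Item &a, const Item &b) const {
        if (a.purePower != b.purePower) return a.purePower;  // true sorts first
        return a.degree < b.degree;
    }
};

using MiniLSet = std::multiset<Item, PurePowerFirst>;

// The top element is simply *begin(); no post-insertion swapping is needed.
Item topOf(const MiniLSet &L) { return *L.begin(); }
```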
An update on this PR: It's still not ready to merge. I've progressed quite a bit past it and have gotten good performance gains (60-70%) on the large real-world polynomial systems that I'm trying to solve, but shifting LSet to use a tree introduces performance regressions on small Gröbner basis calculations, for which the original array implementation is faster. I'm planning to handle this by further enhancing LSet so that it can use either data structure and switch between them dynamically, but that's further refactoring. It will come. Right now, I'm focusing on refactoring S, which is starting to look as challenging to handle as L, so that the whole bba algorithm can be parallelized.
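The dynamic-switching idea can be sketched as follows (purely illustrative; the class name, threshold, and interface are hypothetical, not code from this PR or its branches): stay with a sorted array while the set is small, and migrate to a tree once it grows.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical sketch of an L set that switches data structures dynamically:
// a sorted std::vector while small (cheap moves, cache-friendly), migrating
// to a std::multiset once it grows past Threshold.
template <class T, class Cmp, std::size_t Threshold = 64>
class HybridLSet {
    std::vector<T> small_;         // sorted; used while below the threshold
    std::multiset<T, Cmp> large_;  // used after the one-time migration
    bool useTree_ = false;
public:
    void push(const T &x) {
        if (!useTree_ && small_.size() >= Threshold) {
            large_.insert(small_.begin(), small_.end());  // migrate once
            small_.clear();
            useTree_ = true;
        }
        if (useTree_)
            large_.insert(x);
        else
            small_.insert(std::lower_bound(small_.begin(), small_.end(), x,
                                           Cmp{}), x);
    }
    const T &top() const { return useTree_ ? *large_.begin() : small_.front(); }
    std::size_t size() const { return useTree_ ? large_.size() : small_.size(); }
    bool usingTree() const { return useTree_; }
};
```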
Refactor LSet to use C++ std::multimap
This PR replaces the hand-coded, array-based implementation of `LSet` (the lazy set data structure in Gröbner basis computation) with a C++ `std::multimap` container wrapped in a custom `writable_set` class.

Key Changes
- `writable_set<LObject, CompareLObject>` in `kernel/GBEngine/writable_set.h` as a wrapper around `std::multimap`
- Replaced manual array management (`enterL`, `deleteInL`, array indexing) with modern C++ container methods (`push`, `pop`, `top`, `empty`) and a few non-standard methods (`would_be_top`, `pop_and_erase`, `reorder`)
- Converted `posInL*` functions (position-finding functions) to `compareL*` functions that return -1/0/1 for use with the multimap's ordering
- Use iterators instead of array indices to traverse `LSet` containers
- Use explicit initialization of strategy object members since `LSet` can no longer be zero-initialized

Files Modified
- `kernel/GBEngine/kutil.{cc,h}`, `kernel/GBEngine/kInline.h`
- `kstd1.cc`, `kstd2.cc`, `kstdfac.cc`, `khstd.cc`, `gr_kstd2.cc`, `sca.cc`, `kverify.cc`
- `customstd.cc`, `std_wrapper.cc`

Benefits
The PR is split into two commits to make the first (and primary) commit easier to read.
`git` doesn't format the second commit as well, but all the second commit does is remove the unused `posInL*` functions. The primary commit is a squashed version of a much longer commit chain from https://github.com/BrentBaccala/Singular/tree/Lqueue. All core testsuite tests (
`Short/ok_s.lst`, `Long/ok_l.lst`, `Old/universal.lst`, `Plural/short.lst`, `Plural/dmod.lst`, `Plural/long.lst`, `Buch/buch.lst`) run and pass without modification. Some of the debugging messages have changed, due to the difficulty of finding an element's index in the LSet now that we use a tree instead of an array.