Skip to content

test: make assert_return() critical by default#30049

Merged
yuwata merged 3 commits intosystemd:mainfrom
yuwata:assert-return-critical
Dec 23, 2023
Merged

test: make assert_return() critical by default#30049
yuwata merged 3 commits intosystemd:mainfrom
yuwata:assert-return-critical

Conversation

@yuwata
Copy link
Member

@yuwata yuwata commented Nov 16, 2023

No description provided.

@github-actions
Copy link

github-actions bot commented Nov 16, 2023

We had successfully released a new major release. We are no longer in a development freeze phase.
We will try our best to get back to your PR as soon as possible. Thank you for your patience.

@yuwata yuwata force-pushed the assert-return-critical branch 3 times, most recently from 82508a4 to 4677093 Compare November 16, 2023 03:58
src/basic/log.c Outdated
if (e && log_set_ratelimit_kmsg_from_string(e) < 0)
log_warning("Failed to parse log ratelimit kmsg boolean '%s'. Ignoring.", e);

e = getenv("SYSTEMD_LOG_ASSERT_RETURN_IS_CRITICAL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a bit uneasy about this switch given that it can start crashing systemd left and right :-) Another option would be a compile-time option that could be passed by the CI. (at least that's what NetworkManager did as far as I can remember)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like NM flipped it at runtime to judge from #22406 (comment). I still think it should be a compile-time switch though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was planning to introduce a build time option and explicitly enable it in the CI, since this has a very high potential of screwing up the whole system.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly. This switch here is harder to misuse and not as bad as many others. It seems reasonable to allow this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would actually be nice if it was possible to control it at runtime too because I can imagine scenarios where it would be useful to crash resolved (or any other program linked against libsystemd) without touching networkd but that runtime switch should be hidden behind a compile time switch probably :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not to use release mode build when you want to test other software, e.g. NetworkManager?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because assert_return isn't supposed to crash in release mode but I'd want it to crash on the CI. For example here's what NM did according to #22406 (comment)

NM redefines assert_return() to behave like g_return_if_fail()... which under normal conditions only prints a message.
But in NM-CI we run with G_DEBUG=fatal-warnings, which turns this into an abort().

(NM replaced those systemd parts so it's a hypothetical example)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look at NM and it seems the dhcpv6 client is still there so maybe it isn't a hypothetical example after all :-) Either way systemd crashes aren't that unusual to me so I don't feel strongly about this. It's just that I'm not sure what is going to happen with this option enabled everywhere if some external code linked against libsystemd starts crashing on CentOS CI once it pulls the latest version of Arch or something like that. I'll leave it to @mrc0mmand to decide :-)

Copy link
Contributor

@evverx evverx Nov 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly

It's kind of hard to set them all to get almost all the systemd components to crash at once.

not as bad as many others

Dunno. I turned this on globally at some point and rolled it back because even in the environment prepared to deal with all sorts of systemd crashes where PID1, journald, resolved, networkd and so on go down on a regular basis it was unbearable. I guess if weird packets are never sent or other weird things like spawning a gazillion of weird jobs are never done it can be somewhat tolerated probably. Either way I think it's a debug feature and it should be available in debug builds only by analogy with the other additional assertions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said it it was in release mode it would make it easier to set it outside of, say, various CIs but I'm not sure who would need it because to, for example, instrument systemd it would still be necessary to compile systemd separately. All in all I'm for the runtime option available in debug builds only but I don't feel strongly about this.

@yuwata yuwata force-pushed the assert-return-critical branch 2 times, most recently from 622cd66 to d3ba7bb Compare November 20, 2023 00:32
@yuwata yuwata added this to the v256 milestone Nov 20, 2023
Copy link
Member

@keszybz keszybz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I had some pending comments from a long time ago. I'll publish them, maybe they'll still be useful.

src/basic/log.c Outdated
if (e && log_set_ratelimit_kmsg_from_string(e) < 0)
log_warning("Failed to parse log ratelimit kmsg boolean '%s'. Ignoring.", e);

e = getenv("SYSTEMD_LOG_ASSERT_RETURN_IS_CRITICAL");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. We have hundreds of command-line switches which will cause the system to fail if used incorrectly. This switch here is harder to misuse and not as bad as many others. It seems reasonable to allow this.

@yuwata yuwata force-pushed the assert-return-critical branch from d3ba7bb to 6df248e Compare December 6, 2023 05:46
yuwata added a commit to yuwata/systemd that referenced this pull request Dec 6, 2023
@yuwata yuwata marked this pull request as ready for review December 6, 2023 05:46
@github-actions github-actions bot added the please-review PR is ready for (re-)review by a maintainer label Dec 6, 2023
@yuwata
Copy link
Member Author

yuwata commented Dec 6, 2023

Let's add env variable and command line option when necessary. PTAL.

@yuwata
Copy link
Member Author

yuwata commented Dec 22, 2023

Could anyone review this, and hopefully set the green label? There may be several rooms to improve our test suites more, but at least this should be a good starting point.

cc @mrc0mmand @evverx @keszybz

@mrc0mmand
Copy link
Member

Could anyone review this, and hopefully set the green label? There may be several rooms to improve our test suites more, but at least this should be a good starting point.

cc @mrc0mmand @evverx @keszybz

I'm a bit confused about the additional macros - would you mind extending the commit message(s) with a rationale behind them and when they should be used? I think I understand what the macros do, but not completely sure why we need them (compared to just making all assert_return() calls fail when -Dmode=developer is used).

yuwata and others added 3 commits December 24, 2023 01:52
These can be used to check if we trigger assert_return()
unexpectedly.

Co-authored-by: Frantisek Sumsal <[email protected]>
Several test cases intentionally trigger assert_return(). So, to avoid
the entire test fails, this introduces several macros that tentatively
make assert_return() not critical.
Triggering assert_return() should be a bug in general, and we should
really fix that.  But, previously, it is hard to notice such bug, as
it was not critical.
This is for making CI or our testing environment fail if we unexpectedly
trigger assert_return(). So, hopefully we can easily find such bugs.
@yuwata
Copy link
Member Author

yuwata commented Dec 23, 2023

Extended commit messages, moved the macros to src/shared/tests.h, and moved them from the first commit to the second commit. Total code change is the same as the previous version.

@mrc0mmand mrc0mmand added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed please-review PR is ready for (re-)review by a maintainer labels Dec 23, 2023
@mrc0mmand
Copy link
Member

Extended commit messages, moved the macros to src/shared/tests.h, and moved them from the first commit to the second commit. Total code change is the same as the previous version.

Thank you!

@yuwata yuwata merged commit b6c424a into systemd:main Dec 23, 2023
@yuwata yuwata deleted the assert-return-critical branch December 23, 2023 18:39
@github-actions github-actions bot removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants