JIRA: https://issues.redhat.com/browse/RHEL-99501
Omitted-fix: 8671bad873ebe ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
The omitted commit will likely see a follow up fix that will reintroduce/adapt the non-RT PROVE_RAW_LOCK_NESTING-relevant parts introduced by this patch.
Tested: SCHED_T1 regression test on kernel-{,rt-}debug
commit 893cdaaa3977be6afb3a7f756fbfd7be83f68d8c
Author: Wander Lairson Costa <[email protected]>
Date: Wed Jun 14 09:23:22 2023 -0300
In put_task_struct(), a spin_lock is indirectly acquired in the stock
kernel. When running the kernel in real-time (RT) configuration, the
operation is dispatched to a preemptible context call to ensure
guaranteed preemption. However, if PROVE_RAW_LOCK_NESTING is enabled
and __put_task_struct() is called while holding a raw_spinlock, lockdep
incorrectly reports an "Invalid lock context" in the stock kernel.
This false splat occurs because lockdep is unaware of the different
route taken under RT. To address this issue, override the inner wait
type to prevent the false lockdep splat.
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Sebastian Andrzej Siewior <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Čestmír Kalina [email protected]
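As a rough illustration of what "override the inner wait type" means, the sketch below models a lockdep-style wait-context check in plain userspace C. It is not the kernel implementation: the type and function names (wait_type, held_lock_model, wait_context_ok) are hypothetical, and the rule is reduced to "the tightest held lock limits what may be acquired next, unless an override annotation resets that limit".

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified wait types, ordered from most to least restrictive. */
enum wait_type {
	WAIT_SPIN = 1,	/* raw_spinlock_t: never sleeps, even on RT */
	WAIT_CONFIG,	/* spinlock_t: sleeps on RT only */
	WAIT_SLEEP,	/* mutex and friends: may always sleep */
};

/* One entry of a hypothetical held-lock stack. */
struct held_lock_model {
	enum wait_type inner;	/* what may be acquired while holding this */
	bool override;		/* true for a wait-type override annotation */
};

/*
 * Return true if a lock whose outer wait type is next_outer may be taken
 * with the given locks held (index 0 was acquired first).
 */
static bool wait_context_ok(const struct held_lock_model *held, size_t n,
			    enum wait_type next_outer)
{
	enum wait_type curr_inner = WAIT_SLEEP;	/* plain task context */

	for (size_t i = 0; i < n; i++) {
		if (held[i].override)
			curr_inner = held[i].inner;	/* annotation resets the limit */
		else if (held[i].inner < curr_inner)
			curr_inner = held[i].inner;	/* tightest held lock wins */
	}
	return next_outer <= curr_inner;
}
```

In this model, holding only a raw_spinlock (inner WAIT_SPIN) rejects a spinlock_t acquisition (outer WAIT_CONFIG), which corresponds to the splat described above. Pushing an override entry with inner WAIT_SLEEP before the acquisition makes the same check pass.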
Given the time constraints, I will ack this MR as it fixes the two issues and suggest the creation of a Jira Issue to investigate whether backporting commit 9c7f93a4 ("memcg: drain obj stock on cpu hotplug teardown") is necessary or not. In conversation with @ckalina it became clear that my initial concern was misplaced, coming from a context confusion.
Though this change is not wrong, I wonder why we haven't seen that problem reported in RHEL-9.8... Maybe this RHEL-9.8 commit, missing in RHEL-9.7, could be related to the problem?
9c7f93a405bf memcg: drain obj stock on cpu hotplug teardown
That adds extra locking for the RT case and is translated to a local_irq_save() in the stock kernel. I have the impression that could be enough to solve the problem observed.
printk: Avoid scheduling irq_work on suspend
JIRA: https://issues.redhat.com/browse/RHEL-141481
Upstream Status: RHEL9 only
Conflicts: Upstream suspend fixes are dependent on multiple new helpers and context-check mechanisms.
On certain machines with large CPU counts, as well as some large workstations with VROC enabled, sending said machine into S4/Hibernation sleep states and promptly waking it may cause a console hang and call trace [0].
To elaborate, in the snapshot/hibernation code, swsusp_save() --which is called during APIC shutdown-- invokes a pr_info() that wakes a console thread via an IPI with irq_work_queue(). In this context, sending an IPI while the APIC MSR is shut down causes an invalid/unchecked MSR write, resulting in the aforementioned trace.
Some architectures treat pending IPIs and work enqueued by irq_work as reasons not to suspend, yet irq_work is continually queued during the suspend procedure, because NBCON consoles wake printer threads via irq_work with every printk invocation.
Upstream, the workaround has been to avoid queueing irq_work once the console has begun its suspend process, including for deferred printing and klogd waiters. Additional checks during console flush use printk_get_console_flush_type() to determine the appropriate flushing method for the current system state and the currently registered consoles [1][2].
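The gating idea can be sketched in userspace C as follows. This is only a model under stated assumptions: the flag console_irqwork_blocked, the counter, and the model_* helpers are illustrative stand-ins, not the actual printk symbols or call paths.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real printk symbols. */
static bool console_irqwork_blocked;	/* set once console suspend begins */
static unsigned int irq_work_queued;	/* counts simulated irq_work_queue() calls */

static void model_console_suspend(void)
{
	console_irqwork_blocked = true;
}

static void model_console_resume(void)
{
	console_irqwork_blocked = false;
}

/* Called for each printk(); returns true if irq_work was queued. */
static bool model_wake_printer(void)
{
	if (console_irqwork_blocked)
		return false;	/* skip the IPI; the record is left for a later flush */
	irq_work_queued++;
	return true;
}
```

With this shape, a pr_info() issued between suspend and resume no longer queues irq_work, so no IPI is sent while the APIC is shut down.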
However, as it stands currently, the upstream series does not apply to CentOS Stream 9 or RHEL9, as a variety of mechanisms, helpers and guard clauses were introduced, creating considerable drift that cannot be resolved cleanly.
Therefore, perform the following in kernel/printk:
Previously, the printk series' "upstream" was synchronized with patches landing in v6.6-rt-stable of the linux-stable-rt trees, as the necessary printk patches had not made their way into the mainline kernel at the time of the tty/printk "rebase" in RHEL-9.5 (note: c9s had merged the RT patches at RHEL-9.3) [3][4].
As previously mentioned, the upstream series does not apply cleanly due to considerable drift. Therefore, a majority of the "plumbing" patches backported from upstream are considered RHEL-only for the purpose of this MR, and to comply with "complete commit history" rules, as multiple commits were folded into each other due to differences in function names, guard clauses, macros and other helpers.
[0] https://issues.redhat.com/browse/RHEL-120703
[1] https://lore.kernel.org/stable/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/?h=v6.6-rt
[4] !3895
Signed-off-by: Herton R. Krzesinski [email protected]
Signed-off-by: Derek Barbosa [email protected]
JIRA: https://issues.redhat.com/browse/RHEL-144727
commit 36f8e3087562bc3f55f584fb42d105dfdf6686f4
Author: Zqiang [email protected]
Date: Wed May 7 19:26:04 2025 +0800
rcu/nocb: Add Safe checks for access offloaded rdp
On kernels built with CONFIG_PROVE_RCU=y and CONFIG_PREEMPT_RT=y,
disabling BH does not change the SOFTIRQ bits in preempt_count(),
but instead changes current->softirq_disable_cnt. This results in
the following splat:
WARNING: suspicious RCU usage
kernel/rcu/tree_plugin.h:36 Unsafe read of RCU_NOCB offloaded state!
stack backtrace:
CPU: 0 UID: 0 PID: 22 Comm: rcuc/0
Call Trace:
[ 0.407907] <TASK>
[ 0.407910] dump_stack_lvl+0xbb/0xd0
[ 0.407917] dump_stack+0x14/0x20
[ 0.407920] lockdep_rcu_suspicious+0x133/0x210
[ 0.407932] rcu_rdp_is_offloaded+0x1c3/0x270
[ 0.407939] rcu_core+0x471/0x900
[ 0.407942] ? lockdep_hardirqs_on+0xd5/0x160
[ 0.407954] rcu_cpu_kthread+0x25f/0x870
[ 0.407959] ? __pfx_rcu_cpu_kthread+0x10/0x10
[ 0.407966] smpboot_thread_fn+0x34c/0xa50
[ 0.407970] ? trace_preempt_on+0x54/0x120
[ 0.407977] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 0.407982] kthread+0x40e/0x840
[ 0.407990] ? __pfx_kthread+0x10/0x10
[ 0.407994] ? rt_spin_unlock+0x4e/0xb0
[ 0.407997] ? rt_spin_unlock+0x4e/0xb0
[ 0.408000] ? __pfx_kthread+0x10/0x10
[ 0.408006] ? __pfx_kthread+0x10/0x10
[ 0.408011] ret_from_fork+0x40/0x70
[ 0.408013] ? __pfx_kthread+0x10/0x10
[ 0.408018] ret_from_fork_asm+0x1a/0x30
[ 0.408042] </TASK>
Currently, triggering an rdp offloaded state change requires the
corresponding rdp's CPU to go offline, at which point the rcuc
kthread is already parked. This means the rcuc kthread can safely
read the offloaded state of the rdp while its corresponding CPU is
online.
This commit therefore adds a softirq_count() check for
Preempt-RT kernels.
Suggested-by: Joel Fernandes <[email protected]>
Reviewed-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Zqiang <[email protected]>
Signed-off-by: Joel Fernandes <[email protected]>
Signed-off-by: Luis Claudio R. Goncalves [email protected]
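A minimal sketch of the changed safety condition, written as a userspace predicate. The struct and function names are hypothetical, and the other lockdep conditions that also make the read safe (held mutexes, nocb_lock, and so on) are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the context reading the offloaded state. */
struct read_ctx {
	bool preemptible;	/* what preemptible() would return */
	bool in_bh_disabled;	/* softirq_count() != 0 */
	bool rdp_is_local;	/* rdp == this_cpu_ptr(&rcu_data) */
};

/* Condition before the fix: relies on preempt_count() only. */
static bool safe_read_old(const struct read_ctx *c)
{
	return !c->preemptible && c->rdp_is_local;
}

/* With the added check: a BH-disabled section also counts as safe. */
static bool safe_read_new(const struct read_ctx *c)
{
	return (!c->preemptible || c->in_bh_disabled) && c->rdp_is_local;
}
```

An rcuc kthread on PREEMPT_RT that has disabled BH is still preemptible, so the old predicate rejects it and the new one accepts it, provided the rdp belongs to the current CPU.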
JIRA: https://issues.redhat.com/browse/RHEL-141497
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
On certain machines with large CPU counts, as well as some large workstations with VROC enabled, sending said machine into S4/Hibernation sleep states and promptly waking it may cause a console hang and call trace [0].
To elaborate, in the snapshot/hibernation code, swsusp_save() --which is called during APIC shutdown-- invokes a pr_info() that wakes a console thread via an IPI with irq_work_queue(). In this context, sending an IPI while the APIC MSR is shut down causes an invalid/unchecked MSR write, resulting in the aforementioned trace.
Some architectures treat pending IPIs and work enqueued by irq_work as reasons not to suspend, yet irq_work is continually queued during the suspend procedure, because NBCON consoles wake printer threads via irq_work with every printk invocation.
Upstream, the workaround has been to avoid queueing irq_work once the console has begun its suspend process, including for deferred printing and klogd waiters. Additional checks during console flush use printk_get_console_flush_type() to determine the appropriate flushing method for the current system state and the currently registered consoles [1][2].
Backport the necessary fixes to resolve behavior.
Also, address drift from 6.12-longterm.
[0] https://issues.redhat.com/browse/RHEL-120703
[1] https://lore.kernel.org/stable/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/
Signed-off-by: Derek Barbosa [email protected]
JIRA: https://issues.redhat.com/browse/RHEL-99501
Omitted-fix: 8671bad873ebe ("sched: Do not call __put_task_struct() on rt if pi_blocked_on is set")
The omitted commit will likely see a follow up fix that will reintroduce/adapt the non-RT PROVE_RAW_LOCK_NESTING-relevant parts introduced by this patch.
Tested: SCHED_T1 regression test on kernel-{,rt-}debug
commit 893cdaaa3977be6afb3a7f756fbfd7be83f68d8c
Author: Wander Lairson Costa <[email protected]>
Date: Wed Jun 14 09:23:22 2023 -0300
In put_task_struct(), a spin_lock is indirectly acquired in the stock
kernel. When running the kernel in real-time (RT) configuration, the
operation is dispatched to a preemptible context call to ensure
guaranteed preemption. However, if PROVE_RAW_LOCK_NESTING is enabled
and __put_task_struct() is called while holding a raw_spinlock, lockdep
incorrectly reports an "Invalid lock context" in the stock kernel.
This false splat occurs because lockdep is unaware of the different
route taken under RT. To address this issue, override the inner wait
type to prevent the false lockdep splat.
Suggested-by: Oleg Nesterov <[email protected]>
Suggested-by: Sebastian Andrzej Siewior <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Wander Lairson Costa <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Čestmír Kalina [email protected]
@ckalina In RHEL-40885, MR!7175 (currently marked as Release Pending) I slightly modified the code in put_task_struct(), so your MR won't apply. Would you mind rebasing your change on top of that?
JIRA: https://issues.redhat.com/browse/RHEL-125445
After RTLA BPF sample collection was introduced in RHEL 9.7, a regression appeared because the code incorrectly interprets the -T/--thread option as a threshold for thread latency only. The behavior expected by both the timerlat tracer and RTLA users is, despite the naming, to also stop at user-ret latency.
This merge request fixes that discrepancy by adding an additional stop condition to the timerlat BPF program.
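The intended stop logic can be sketched as a small userspace predicate. The enum, the function name, and the use of ">=" for the comparison are assumptions for illustration; this is not the code of the timerlat BPF program itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Context in which a timerlat sample was taken (names are illustrative). */
enum timerlat_context { CTX_IRQ = 0, CTX_THREAD = 1, CTX_USER_RET = 2 };

/* Return true if tracing should stop for this sample. A threshold of 0 means "not set". */
static bool should_stop(enum timerlat_context ctx, uint64_t latency_us,
			uint64_t irq_threshold_us, uint64_t thread_threshold_us)
{
	if (ctx == CTX_IRQ)
		return irq_threshold_us && latency_us >= irq_threshold_us;

	/* -T/--thread covers thread latency and also user-ret latency. */
	return thread_threshold_us && latency_us >= thread_threshold_us;
}
```

The regression corresponds to a version of this predicate that only applies the thread threshold when ctx is CTX_THREAD, so a user-ret sample above the threshold would not stop the trace.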
All submissions to CentOS Stream must reference a ticket in Red Hat Jira.
List tickets each on their own line of this description using the format "Resolves: RHEL-76229", "Related: RHEL-76229" or "Reverts: RHEL-76229", as appropriate.
Signed-off-by: Tomas Glozar [email protected]
JIRA: https://issues.redhat.com/browse/RHEL-125441
After RTLA BPF sample collection was introduced in RHEL 9.7, a regression appeared because the code incorrectly interprets the -T/--thread option as a threshold for thread latency only. The behavior expected by both the timerlat tracer and RTLA users is, despite the naming, to also stop at user-ret latency.
This merge request fixes that discrepancy by adding an additional stop condition to the timerlat BPF program.
Signed-off-by: Tomas Glozar [email protected]
genirq/manage: Reduce priority of forced secondary interrupt handler
JIRA: https://issues.redhat.com/browse/RHEL-102562
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
commit 51d0656959bcdb743232f9b530b4cca569e74e7f
Author: Lukas Wunner [email protected]
Date: Mon Oct 27 13:59:31 2025 +0100
genirq/manage: Reduce priority of forced secondary interrupt handler
Crystal reports that the PCIe Advanced Error Reporting driver gets stuck
in an infinite loop on PREEMPT_RT:
Both the primary interrupt handler aer_irq() as well as the secondary
handler aer_isr() are forced into threads with identical priority.
Crystal writes that on the ARM system in question, the primary handler
has to clear an error in the Root Error Status register...
"before the next error happens, or else the hardware will set the
Multiple ERR_COR Received bit. If that bit is set, then aer_isr()
can't rely on the Error Source Identification register, so it scans
through all devices looking for errors -- and for some reason, on
this system, accessing the AER registers (or any Config Space above
0x400, even though there are capabilities located there) generates
an Unsupported Request Error (but returns valid data). Since this
happens more than once, without aer_irq() preempting, it causes
another multi error and we get stuck in a loop."
The issue does not show on non-PREEMPT_RT because the primary handler
runs in hardirq context and thus can preempt the threaded secondary
handler, clear the Root Error Status register and prevent the secondary
handler from getting stuck.
Emulate the same behavior on PREEMPT_RT by assigning a lower default
priority to the secondary handler if the primary handler is forced into
a thread.
Reported-by: Crystal Wood <[email protected]>
Signed-off-by: Lukas Wunner <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Tested-by: Crystal Wood <[email protected]>
Reviewed-by: Sebastian Andrzej Siewior <[email protected]>
Link: https://patch.msgid.link/f6dcdb41be2694886b8dbf4fe7b3ab89e9d5114c.1761569303.git.lukas@wunner.de
Closes: https://lore.kernel.org/r/[email protected]/
Signed-off-by: Crystal Wood [email protected]
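The priority rule described in the commit message can be modeled as below. This is a sketch under assumptions: the value 100 for MAX_RT_PRIO, the default of MAX_RT_PRIO / 2 for IRQ threads, and the choice of "exactly one step lower" for the secondary handler are illustrative, and the function name is hypothetical.

```c
#include <stdbool.h>

#define MODEL_MAX_RT_PRIO 100	/* assumed value of the kernel's MAX_RT_PRIO */

/* Return the SCHED_FIFO priority an IRQ thread would get in this model. */
static int irq_thread_fifo_prio(bool is_secondary, bool primary_forced_threaded)
{
	int prio = MODEL_MAX_RT_PRIO / 2;	/* assumed default for IRQ threads */

	/* Only demote the secondary handler when the primary is itself a thread. */
	if (is_secondary && primary_forced_threaded)
		prio -= 1;
	return prio;
}
```

Because a higher SCHED_FIFO priority preempts a lower one, the forced-threaded primary handler in this model can always interrupt the secondary handler, which mirrors the hardirq-preempts-thread relationship on non-PREEMPT_RT kernels.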
JIRA: https://issues.redhat.com/browse/RHEL-113085
This patch series fixes a couple of bugs in the powerpc64 out-of-line (OOL) ftrace support for modules, and follows up with a patch to simplify the module .stubs allocation code.
The first two patches fix bugs introduced by commit eec37961a56a ("powerpc64/ftrace: Move ftrace sequence out of line"). The first, suggested by Naveen, ensures that a NOP'd ftrace call site has its ftrace_ops record updated correctly. The second patch corrects a loop in setup_ftrace_ool_stubs() to ensure all required stubs are reserved, not just the first. Together, these bugs lead to potential corruption of the OOL ftrace stubs area for livepatch modules.
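To illustrate the second bug class (a loop that only ever reserves the first stub), here is a userspace sketch. The struct and function names are hypothetical, and the buggy variant shows one plausible shape of such a mistake rather than the exact code that was fixed in setup_ftrace_ool_stubs().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one out-of-line ftrace stub slot. */
struct ool_stub_model {
	bool reserved;
};

/* Buggy shape: every iteration touches the first entry only. */
static void reserve_stubs_buggy(struct ool_stub_model *stubs, size_t num)
{
	for (size_t i = 0; i < num; i++)
		stubs->reserved = true;	/* same as stubs[0] each time */
}

/* Fixed shape: index by the loop counter so each entry is reserved. */
static void reserve_stubs_fixed(struct ool_stub_model *stubs, size_t num)
{
	for (size_t i = 0; i < num; i++)
		stubs[i].reserved = true;
}

/* Count how many slots ended up reserved. */
static size_t count_reserved(const struct ool_stub_model *stubs, size_t num)
{
	size_t n = 0;

	for (size_t i = 0; i < num; i++)
		n += stubs[i].reserved ? 1 : 0;
	return n;
}
```

Any slot left unreserved by the buggy loop could later be handed out twice, which is the kind of stub-area corruption the description refers to.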
Signed-off-by: Joe Lawrence [email protected]
JIRA: https://issues.redhat.com/browse/RHEL-117873
CVE: CVE-2025-38493
CVE: CVE-2025-39887
CVE: CVE-2025-39974
CVE: CVE-2025-21733
Rebase RTLA in RHEL 9.8 to upstream 6.17, with one fix from 6.18.
Signed-off-by: Tomas Glozar [email protected]
JIRA: https://issues.redhat.com/browse/RHEL-117874
CVE: CVE-2025-39887
CVE: CVE-2025-39974
Rebase RTLA in RHEL 10.2 to upstream 6.17, with one fix from 6.18.
Signed-off-by: Tomas Glozar [email protected]