Fix Shutdown deadlock in some scenarios#10044
Fix Shutdown deadlock in some scenarios#10044Roasbeef merged 2 commits intolightningnetwork:masterfrom
Conversation
There was a problem hiding this comment.
Summary of Changes
Hello @ziggie1984, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request resolves a critical shutdown deadlock within the chanbackup component, specifically when the backup process might not have fully initialized. My changes enhance the SubSwapper's lifecycle management by introducing atomic state tracking and robust startup/shutdown logic, ensuring operations are only performed on a fully initialized component. I've also updated the release notes to reflect this important fix.
Highlights
- Deadlock Fix: I've addressed a shutdown deadlock in the
chanbackupcomponent that could occur if the backuper had not fully started. This was achieved by refining theSubSwapper's lifecycle management. - Concurrency Control Improvement: I've replaced
sync.Oncewithatomic.Boolfor managing theSubSwapper'sstartedandstoppedstates. This provides more explicit, robust, and thread-safe control over the component's initialization and termination. - State Validation: I've added checks to prevent operations like
ManualUpdateon aSubSwapperthat has not been successfully started. Additionally, thestartedstate is now reset if the initialupdateBackupFile()call fails during startup, preventing a partially initialized state. - Documentation: I've updated the release notes to include an entry for this fix, detailing the resolution of the shutdown deadlock.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request fixes a potential deadlock in the chanbackup.SubSwapper by refactoring the start/stop logic from sync.Once to atomic.Bool. This allows for more robust state management, including resetting the started state on a failed start and checking if the service is running before performing a manual update. The changes look good and address the described issue. I've found one potential issue in the new Stop() implementation that could lead to a deadlock and suggested a fix.
adc4c55 to
3c468e0
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request fixes a potential shutdown deadlock by replacing sync.Once with atomic.Bool for managing the SubSwapper's lifecycle. This change allows for more robust state management, especially in startup failure scenarios. I've suggested one improvement to prevent a stopped SubSwapper from being restarted, which will make the state transitions more predictable.
| // Start starts the chanbackup.SubSwapper. | ||
| func (s *SubSwapper) Start() error { | ||
| var startErr error | ||
| s.started.Do(func() { |
There was a problem hiding this comment.
How did the sync.Once usage mask things here? Mainly that w/ the public API you can't tell if something has actually run once or not?
There was a problem hiding this comment.
yeah we have the call here:
Line 2801 in b815109
but if we never started the swapper (because we fail before) we will deadlock there and not shutdown cleanly.
starius
left a comment
There was a problem hiding this comment.
LGTM! 💾
Great fix! I added small proposal to improve code readability.
chanbackup/pubsub.go
Outdated
|
|
||
| // stopped tracks whether the SubSwapper has been stopped and ensures | ||
| // it can only be stopped once. | ||
| stopped atomic.Bool |
There was a problem hiding this comment.
Can we remove stopped and just use quit state as an indication of closeness? These two variables seem to have the same meaning.
3c468e0 to
d9f57f5
Compare
chanbackup/pubsub.go
Outdated
| return | ||
| } | ||
| // Ensure we only start the SubSwapper once. | ||
| if !s.started.CompareAndSwap(false, true) { |
There was a problem hiding this comment.
nit: an even simpler way to do this is to just:
if s.started.Swap(true) {
// already started
}
d9f57f5 to
fa165e8
Compare
chanbackup/pubsub.go
Outdated
| } | ||
|
|
||
| s.wg.Add(1) | ||
| go s.backupUpdater() |
There was a problem hiding this comment.
ready is used only by Start and backupUpdater.
We can make it local to Start and pass to backupUpdater as an argument.
fa165e8 to
a119e0a
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request aims to fix a deadlock that can occur during shutdown if the SubSwapper is not started correctly. The changes involve using an atomic.Bool to track the started state and adding checks to prevent operations on a non-started SubSwapper.
The core logic change in ManualUpdate correctly prevents the deadlock. The use of a ready channel in Start to wait for the backupUpdater goroutine is also a good improvement for robustness.
However, I've identified a critical race condition in the Start method's implementation. The check and set of the started flag are not atomic, which could lead to issues under concurrent calls. I've provided suggestions to address this.
The release note update is clear and appropriate.
1b87eac to
be2c0a0
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request aims to fix a shutdown deadlock in SubSwapper by introducing an isActive flag to prevent updates when the swapper is not running. This is a good approach, but the current implementation has a race condition where ManualUpdate can still block if Stop is called concurrently. I've provided a suggestion to fix this race condition.
Additionally, I've pointed out a misleading comment that was updated and suggested a correction to improve code clarity.
be2c0a0 to
354585e
Compare
In case LND fails to startup correctly and fail before we would start the
SubSwapperwe would deadlock when shutting down the server here:lnd/chanbackup/pubsub.go
Lines 206 to 207 in ff32e90
Happend while fixing: #10041