feat: auto-regenerate accounts on persistent failures #1942
SantiagoPittella wants to merge 2 commits into next
In theory, an alternative would be to separately monitor the genesis hash (or the latest known local chain tip commitment). This wouldn't catch a protocol upgrade, but in that situation, won't we remain broken even if we regenerate?
The thing is that, after a protocol upgrade, we sometimes restart the service but not the accounts, so the service continues with the old accounts.
JereSalo left a comment
I see that in run_counter_tracking_task we assign the counter_account variable. I wonder whether, after re-creating the accounts, it keeps pointing to the previous account and needs to be updated. Perhaps there's something I'm missing, though.
Not for this PR but maybe we should consider adding tests for these kinds of behavior (if you find it's worth it and not too complex).
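If the concern about a stale counter_account applies, one possible shape for a fix is to share the account identifier behind a lock so the tracking task always reads the current value. The following is only a sketch under assumptions: `AccountId`, `TrackingTask`, and `regenerate` are hypothetical names for illustration and are not taken from the PR.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical account identifier, used only for illustration.
#[derive(Clone, Debug, PartialEq)]
struct AccountId(String);

/// Hypothetical stand-in for the counter tracking task.
/// It holds a shared handle rather than a copy of the account id.
struct TrackingTask {
    counter_account: Arc<RwLock<AccountId>>,
}

impl TrackingTask {
    /// Reads the account id that is current at call time.
    fn current_account(&self) -> AccountId {
        self.counter_account.read().unwrap().clone()
    }
}

/// Hypothetical regeneration step: swaps in the fresh account id
/// so every holder of the shared handle observes it.
fn regenerate(shared: &Arc<RwLock<AccountId>>, fresh: AccountId) {
    *shared.write().unwrap() = fresh;
}
```

With this layout, a task created before regeneration sees the new account on its next read, at the cost of taking a lock on each access.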
        consecutive_failures,
        "re-sync ineffective, regenerating accounts from scratch"
    );
    last_regeneration = Some(Instant::now());
We could move this inside the Ok path of try_regenerate_accounts, so that if it fails with an error for some reason, we can try again after a short period instead of having to wait an hour.
Perhaps it was intentional, though, and we still want to wait upon failure, but I'll leave this here just in case.
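The suggestion above could look roughly like the sketch below, in which the timestamp is recorded only when regeneration succeeds. `RegenState` and `record_regeneration` are hypothetical names, and the real try_regenerate_accounts signature is not shown in this conversation, so its result is modelled here as a plain `Result<(), E>`.

```rust
use std::time::Instant;

/// Hypothetical holder for the monitor's regeneration state.
struct RegenState {
    last_regeneration: Option<Instant>,
}

/// Updates the timestamp only on success, so a failed attempt
/// does not start the rate-limit window.
fn record_regeneration<E>(state: &mut RegenState, result: &Result<(), E>, now: Instant) {
    if result.is_ok() {
        state.last_regeneration = Some(now);
    }
}
```

The trade-off raised in the comment still applies: if the network itself is down, retrying without a cooldown after each failure may produce the loop the rate limit was meant to prevent.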
closes #1930
When the counter increment task fails repeatedly and wallet re-sync from RPC is ineffective (e.g., after a network reset or protocol upgrade), the monitor now automatically creates fresh wallet and counter accounts, deploys them, and re-initializes the increment task. This triggers after 10 consecutive failures and is rate-limited to once per hour to avoid loops when the network itself is down.
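As a rough illustration of the trigger described above, the decision can be expressed as a pure function of the failure count and the time of the last regeneration. This is a minimal sketch, not the PR's code: the constant and function names are hypothetical, and only the values (10 failures, one hour) come from the description.

```rust
use std::time::{Duration, Instant};

/// Hypothetical constants mirroring the values in the description.
const MAX_CONSECUTIVE_FAILURES: u32 = 10;
const REGENERATION_COOLDOWN: Duration = Duration::from_secs(60 * 60);

/// Returns true when regeneration should be attempted: the failure
/// threshold has been reached and the cooldown has elapsed (or no
/// regeneration has happened yet).
fn should_regenerate(
    consecutive_failures: u32,
    last_regeneration: Option<Instant>,
    now: Instant,
) -> bool {
    if consecutive_failures < MAX_CONSECUTIVE_FAILURES {
        return false;
    }
    match last_regeneration {
        None => true,
        Some(last) => now.duration_since(last) >= REGENERATION_COOLDOWN,
    }
}
```

Passing `now` as a parameter instead of calling `Instant::now()` inside the function keeps the check easy to test without waiting for real time to pass.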