Monitor: Auto-regenerate accounts after persistent failures #1930

@SantiagoPittella

Description

Problem

The counter increment task in the network monitor currently re-syncs the wallet account from the RPC after some number of consecutive failures. However, if the accounts themselves are fundamentally outdated or corrupted (e.g. after a network reset or protocol upgrade), re-syncing doesn't help: the monitor keeps failing indefinitely and reports the NTX services as unhealthy.

This is a common issue in our current environment.

Proposed Solution

After N consecutive failures in which the re-sync also fails, the monitor should automatically:

  1. Redeploy the counter account via deploy_counter_account()
  2. Recreate and save the wallet and counter account files
  3. Re-initialize the increment task state via setup_increment_task()
  4. Reset failure counters and resume normal operation

Regeneration should be rate-limited (e.g. at most 1 regeneration attempt per hour) so the monitor doesn't redeploy in an infinite loop if the network itself is down.
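
A minimal sketch of how the failure threshold and the hourly bound could fit together, assuming the task can report each failed re-sync to a small policy object. The names `RegenerationPolicy`, `Recovery`, and `record_failed_resync` are hypothetical and do not exist in the monitor today; the threshold and cooldown are illustrative parameters. The calls to deploy_counter_account() and setup_increment_task appear only in comments because their signatures are not given here.

```rust
use std::time::{Duration, Instant};

/// What the increment task should do after a failed re-sync.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum Recovery {
    /// Below the failure threshold: keep retrying as today.
    Retry,
    /// Threshold reached and no recent regeneration: regenerate accounts.
    Regenerate,
    /// Threshold reached, but a regeneration ran within the cooldown window.
    CoolingDown,
}

/// Hypothetical policy tracking consecutive failures and the last regeneration.
struct RegenerationPolicy {
    max_failures: u32,
    cooldown: Duration,
    consecutive_failures: u32,
    last_regeneration: Option<Instant>,
}

impl RegenerationPolicy {
    fn new(max_failures: u32, cooldown: Duration) -> Self {
        Self {
            max_failures,
            cooldown,
            consecutive_failures: 0,
            last_regeneration: None,
        }
    }

    /// A successful increment clears the failure streak.
    fn record_success(&mut self) {
        self.consecutive_failures = 0;
    }

    /// Call when an increment failed and the re-sync failed as well.
    /// `now` is passed in so the policy can be tested without sleeping.
    fn record_failed_resync(&mut self, now: Instant) -> Recovery {
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        if self.consecutive_failures < self.max_failures {
            return Recovery::Retry;
        }

        let cooled_down = match self.last_regeneration {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.cooldown,
        };

        if cooled_down {
            // The caller would now redeploy via deploy_counter_account(),
            // save the wallet and counter account files, and re-run
            // setup_increment_task before resuming.
            self.last_regeneration = Some(now);
            self.consecutive_failures = 0;
            Recovery::Regenerate
        } else {
            Recovery::CoolingDown
        }
    }
}
```

With `RegenerationPolicy::new(3, Duration::from_secs(3600))`, the third consecutive failed re-sync yields `Recovery::Regenerate`, and further streaks within the same hour yield `Recovery::CoolingDown`, so the monitor keeps reporting the failure without redeploying again. Taking `now` as a parameter is a design choice to keep the bound deterministic under test.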

Metadata

Labels

monitor: Related to the network monitor
