What feature would you like to see added?
For very long runs or when clusters / computers shut down, it would be great to have some model checkpointing available.
Describe how this feature would improve the code
It would make the code more resilient to spontaneous events like shutdowns but would also be convenient in the rare case where someone forgets to save their models.
Describe how this feature would work/possible implementations
We already have model saving built into the code. We can add a checkpointing argument to the Gym, which calls this to a special directory called ckpts.
What feature would you like to see added?
For very long runs or when clusters / computers shut down, it would be great to have some model checkpointing available.
Describe how this feature would improve the code
It would make the code more resilient to spontaneous events like shutdowns but would also be convenient in the rare case where someone forgets to save their models.
Describe how this feature would work/possible implementations
We already have model saving built into the code. We can add a checkpointing argument to the
Gym, which calls this to a special directory calledckpts.