Saveable Generator: initial import #2937
Conversation
pplantinga
left a comment
I like the idea of saving the random generator state and this seems like a nice implementation to me. Am I understanding correctly that one could fix a recipe to restart the random seed correctly just by adding a SaveableGenerator object to the checkpointer without any arguments? Maybe we could consider a separate PR to convert many of our recipes.
speechbrain/utils/repro.py
Arguments
---------
generators : list, optional
    A list of generator objects. If not provided,
Sentence not complete; what happens if it is not provided?
Also, can you add an example to the docstring?
Also, I think the type should be dict or Mapping instead of list, and this should describe how the mapping should be defined, e.g. Mapping[str, Generator] or something like this. I think the torch generator is of the generic type:
https://docs.python.org/3/library/collections.abc.html#collections.abc.Generator
@pplantinga : Addressed... However, in reality, this takes a "generator-like object", only getting/setting the state is required - no need to implement the full interface.
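To illustrate the point above: a minimal sketch of such a "generator-like object", using Python's stdlib random module. Only get_state/set_state are needed; the method names mirror torch.Generator, but this class is illustrative and not the actual SpeechBrain interface.

```python
import random


class RandomModuleGenerator:
    """A minimal generator-like object wrapping the global `random` module.

    Only getting/setting the state is required for checkpointing;
    no other part of a generator interface is needed.
    """

    def get_state(self):
        return random.getstate()

    def set_state(self, state):
        random.setstate(state)


gen = RandomModuleGenerator()
saved = gen.get_state()
first = random.random()
gen.set_state(saved)             # rewind the RNG to the saved state
assert random.random() == first  # the same draw is reproduced
```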
import torch


def test_repro(tmpdir):
Some of our unit tests take a device parameter for testing on e.g. cuda. Perhaps we can do something similar here to ensure it works at least locally (I guess the CI is running on cpu).
@pplantinga : Done in a separate test. However, device support is currently limited given that:
- Torch has inconsistencies in how generators are handled
- Even for CUDA, default generators are no longer exposed as generators
- Some devices, like MPS, don't even expose RNG state
So for now, this feature covers only the most common use cases. Other devices can be added later by writing wrappers similar to the one I had for CUDA - if they support the functionality at all.
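For illustration, such a device wrapper could adapt a backend's module-level state functions into a generator-like object. This is a hypothetical sketch, not the wrapper from the PR; it is demonstrated with the stdlib random module, but the same shape would fit torch.cuda.get_rng_state / torch.cuda.set_rng_state.

```python
import random
from typing import Any, Callable


class StateFnGenerator:
    """Adapts a (get_state_fn, set_state_fn) pair into a generator-like object.

    A device backend that exposes module-level RNG state functions
    (e.g. torch.cuda.get_rng_state / torch.cuda.set_rng_state) could be
    wrapped this way so the checkpointer can treat it like a generator.
    """

    def __init__(self, get_state_fn: Callable[[], Any],
                 set_state_fn: Callable[[Any], None]):
        self.get_state_fn = get_state_fn
        self.set_state_fn = set_state_fn

    def get_state(self):
        return self.get_state_fn()

    def set_state(self, state):
        self.set_state_fn(state)


# Demonstrated with the stdlib random module, which has the same shape:
wrapped = StateFnGenerator(random.getstate, random.setstate)
state = wrapped.get_state()
a = random.random()
wrapped.set_state(state)  # restore the saved state
b = random.random()
assert a == b             # the sequence replays identically
```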
That is absolutely correct. Further customization may be needed if the recipes use custom generators, but for the base case that's all it takes.
What does this PR do?
This PR improves reproducibility for scenarios where training or inference can be highly non-deterministic, especially in cases where the selection of data, layers to train, etc. is done randomly and it is unreasonable to expect training to complete in a single run.
It provides an opt-in wrapper called SaveableGenerator that saves the state of the available global generators (or custom generators, if specified) together with the checkpoint.
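The behaviour can be sketched in pure Python: the RNG state is saved alongside the checkpoint and restored on resume, so the random sequence continues exactly where it left off. This mimics the idea only; the actual Checkpointer and SaveableGenerator APIs may differ.

```python
import os
import pickle
import random
import tempfile

ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")

random.seed(1234)
_ = [random.random() for _ in range(5)]   # some "training" draws

with open(ckpt_path, "wb") as f:          # checkpoint: save RNG state
    pickle.dump(random.getstate(), f)

# Draws the uninterrupted run would have produced after the checkpoint:
expected = [random.random() for _ in range(3)]

with open(ckpt_path, "rb") as f:          # "restart": restore RNG state
    random.setstate(pickle.load(f))

resumed = [random.random() for _ in range(3)]
assert resumed == expected                # resumption is reproducible
```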
Usage is entirely opt-in. There are no changes to existing recipes or libraries.
Before submitting
PR review
Reviewer checklist