
Saveable Generator: initial import #2937

Merged
pplantinga merged 5 commits into speechbrain:develop from flexthink:rand-repro
Jul 10, 2025
Conversation

@flexthink
Collaborator

What does this PR do?

This PR improves reproducibility in scenarios where training or inference can be highly non-deterministic, especially when the selection of data, layers to train, etc., is done randomly and it is unreasonable to expect training to complete in a single run.

It provides an opt-in wrapper called SaveableGenerator that saves the state of the available global generators (or custom generators, if specified) together with the checkpoint.

Sample usage:

generator: !new:model.custom_model.SaveableGenerator # <-- Include the wrapper

checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer
    checkpoints_dir: !ref <save_folder>
    recoverables:
        model: !ref <model>
        lr_scheduler: !ref <lr_annealing>
        counter: !ref <epoch_counter>
        generator: !ref <generator> # <-- Add to the checkpoint

Usage is entirely opt-in. There are no changes to existing recipes or libraries.
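For intuition, the save/restore round-trip at the heart of the wrapper can be sketched with Python's stdlib random module. This is only an analogy, not the PR's code: torch generators support the same pattern through their get_state()/set_state() methods.

```python
import random

# Illustrative sketch: checkpoint the RNG state, then restore it on
# recovery, so random draws resume exactly where they left off.
random.seed(42)
_ = [random.random() for _ in range(3)]   # draws made before the checkpoint

saved_state = random.getstate()           # captured alongside the checkpoint

after_save = [random.random() for _ in range(3)]

random.setstate(saved_state)              # restored when resuming the run
after_restore = [random.random() for _ in range(3)]

assert after_save == after_restore        # the random stream is reproduced
```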

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@flexthink flexthink assigned flexthink and pplantinga and unassigned flexthink Jun 12, 2025
@pplantinga pplantinga self-requested a review June 19, 2025 18:09
@pplantinga pplantinga removed their assignment Jun 19, 2025
Collaborator

@pplantinga pplantinga left a comment

I like the idea of saving the random generator state and this seems like a nice implementation to me. Am I understanding correctly that one could fix a recipe to restart the random seed correctly just by adding a SaveableGenerator object to the checkpointer without any arguments? Maybe we could consider a separate PR to convert many of our recipes.

Arguments
---------
generators : list, optional
A list of generator objects. If not provided,
Collaborator

Sentence not complete, what happens if not provided?

Collaborator

Also, can you add an example to the docstring?

Collaborator

Also, I think the type should be dict or Mapping instead of list, and this should describe how the mapping should be defined, e.g. Mapping[str, Generator] or something like this. I think the torch generator is of the generic type:

https://docs.python.org/3/library/collections.abc.html#collections.abc.Generator

Collaborator Author

@pplantinga : Addressed... However, in reality, this takes a "generator-like object": only getting and setting the state is required; there is no need to implement the full interface.
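To illustrate the duck typing (a hypothetical example, not code from the PR): any object that exposes get_state()/set_state() can be checkpointed this way, even if it is not a full Generator.

```python
import random

# Hypothetical "generator-like object": it is not a torch.Generator,
# but it exposes get_state()/set_state(), which is all that is needed
# to save and restore it with a checkpoint.
class DiceRoller:
    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def roll(self):
        return self._rng.randint(1, 6)

    def get_state(self):
        return self._rng.getstate()

    def set_state(self, state):
        self._rng.setstate(state)

roller = DiceRoller(seed=7)
roller.roll()
state = roller.get_state()   # saved with the checkpoint
a = roller.roll()
roller.set_state(state)      # restored on recovery
b = roller.roll()
assert a == b                # the same roll is reproduced
```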

import torch


def test_repro(tmpdir):
Collaborator

Some of our unit tests take a device parameter for testing on e.g. cuda. Perhaps we can do something similar here to ensure it works at least locally (I guess the CI is running on cpu).

Collaborator Author

@pplantinga : Done in a separate test. However, device support is currently limited, given that:

  • Torch has inconsistencies in how generators are handled
  • Even for CUDA, the default generators are no longer exposed as generator objects
  • Some devices, such as MPS, do not expose the RNG state at all

So for now, this feature covers only the most common use cases. Other devices can be added later by writing wrappers similar to the one I had for CUDA, if they support the functionality at all.
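A sketch of what such a device wrapper might look like (hypothetical, assuming torch is available; shown for the CPU default generator, since torch.cuda.get_rng_state / torch.cuda.set_rng_state follow the same shape on a CUDA machine):

```python
import torch

# Hypothetical wrapper sketch: adapts torch's functional RNG-state API
# to the get_state()/set_state() interface of a generator-like object.
# Shown for the CPU default generator; a CUDA variant would call
# torch.cuda.get_rng_state / torch.cuda.set_rng_state instead.
class CPUDefaultRNGWrapper:
    def get_state(self):
        return torch.get_rng_state()

    def set_state(self, state):
        torch.set_rng_state(state)

wrapper = CPUDefaultRNGWrapper()
state = wrapper.get_state()
a = torch.rand(3)
wrapper.set_state(state)      # restore the captured state
b = torch.rand(3)
assert torch.equal(a, b)      # same draws after restoring the state
```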

@flexthink flexthink self-assigned this Jun 23, 2025
@flexthink
Collaborator Author

> I like the idea of saving the random generator state and this seems like a nice implementation to me. Am I understanding correctly that one could fix a recipe to restart the random seed correctly just by adding a SaveableGenerator object to the checkpointer without any arguments? Maybe we could consider a separate PR to convert many of our recipes.

That is absolutely correct. Further customization may be needed if the recipes use custom generators, but for the base case that's all it takes.

Collaborator

@pplantinga pplantinga left a comment

This LGTM!

@pplantinga pplantinga merged commit 246a952 into speechbrain:develop Jul 10, 2025
5 checks passed