Merged
tjruwase requested changes on Feb 10, 2020
ac53563 to 9701f05
tjruwase approved these changes on Feb 10, 2020
kouml pushed a commit to kouml/DeepSpeed that referenced this pull request on Apr 3, 2020
jeffra added a commit that referenced this pull request on May 19, 2020
Co-authored-by: yuxionghe <[email protected]>
Co-authored-by: Jeff Rasley <[email protected]>
rraminen added a commit to rraminen/DeepSpeed that referenced this pull request on Nov 18, 2021
…: 3.6.13 not in '>=3.7' during cupy build. (deepspeedai#45)
delock referenced this pull request in delock/DeepSpeedSYCLSupport on Sep 21, 2022
liamcli pushed a commit to determined-ai/DeepSpeed that referenced this pull request on May 8, 2023
* Add SLURM launcher Signed-off-by: Dashiell Stander <[email protected]>
* Need to import SlurmRunner Signed-off-by: Dashiell Stander <[email protected]>
* Clean up the config JSON Signed-off-by: Dashiell Stander <[email protected]>
* Properly clean up json configs Signed-off-by: Dashiell Stander <[email protected]>
* runner Signed-off-by: Dashiell Stander <[email protected]>
* Switch to using an argument Signed-off-by: Dashiell Stander <[email protected]>
* Pre-commit Signed-off-by: Dashiell Stander <[email protected]>
* Prevent clean-up when using slurm, add in hostfile Signed-off-by: Dashiell Stander <[email protected]>
* Pass launcher in to autotuning jobs Signed-off-by: Dashiell Stander <[email protected]>
* Pass slurm comment in Signed-off-by: Dashiell Stander <[email protected]>
* Add a comment argument to DeepSpeed runner Signed-off-by: Dashiell Stander <[email protected]>
* Switch slurm_comment to just comment Signed-off-by: Dashiell Stander <[email protected]>
* Switch slurm_comment to just comment Signed-off-by: Dashiell Stander <[email protected]>
* Use SLURM --nodelist instead of --include Co-authored-by: Quentin Anthony <[email protected]> Signed-off-by: Dashiell Stander <[email protected]>
* Use SLURM --nodelist instead of --include Co-authored-by: Quentin Anthony <[email protected]> Signed-off-by: Dashiell Stander <[email protected]>
* Launcher args Signed-off-by: Dashiell Stander <[email protected]>
* Debug print statement... Signed-off-by: Dashiell Stander <[email protected]>
* Debug print statements... Signed-off-by: Dashiell Stander <[email protected]>
* Debug print statements... Signed-off-by: Dashiell Stander <[email protected]>
* Debug print statements... Signed-off-by: Dashiell Stander <[email protected]>
* Debug print statements... Signed-off-by: Dashiell Stander <[email protected]>
* user_config bug Signed-off-by: Dashiell Stander <[email protected]>
* user_config bug Signed-off-by: Dashiell Stander <[email protected]>
* Fix config dict
* Pydantic to dict Signed-off-by: Dashiell Stander <[email protected]>
* Pydantic to dict Signed-off-by: Dashiell Stander <[email protected]>
* Will it work now? Signed-off-by: Dashiell Stander <[email protected]>
* Just make it a dict immediately Signed-off-by: Dashiell Stander <[email protected]>
* Exclude unset things Signed-off-by: Dashiell Stander <[email protected]>
* Add dilation to pooling flops profiler Signed-off-by: Dashiell Stander <[email protected]>
* Adding return_indices... Signed-off-by: Dashiell Stander <[email protected]>
* Do cleanup with SLURM. Co-authored-by: Quentin Anthony <[email protected]>
* Do cleanup with SLURM. Co-authored-by: Quentin Anthony <[email protected]>
* Horrific hack to get metrics.json
* Push pipeline grad tail fix
* No longer hardcode path Signed-off-by: Dashiell Stander <[email protected]>
* Also pass in no_ssh_check Signed-off-by: Dashiell Stander <[email protected]>
* Also pass in no_ssh_check Signed-off-by: Dashiell Stander <[email protected]>
* Also pass in master_addr Signed-off-by: Dashiell Stander <[email protected]>
* Stop hardcoding number of steps.... Signed-off-by: Dashiell Stander <[email protected]>
* detailed flops breakdown Signed-off-by: Dashiell Stander <[email protected]>
* Fix autotuning reporting bug Signed-off-by: Dashiell Stander <[email protected]>
* Fix autotuning reporting bug Signed-off-by: Dashiell Stander <[email protected]>
* Actually off by a million, not a thousand Signed-off-by: Dashiell Stander <[email protected]>
* Clean up debugging stuff Signed-off-by: Dashiell Stander <[email protected]>
* Add JSRunner for summit launching on multiple nodes
* import JSRUN_LAUNCHER from constants
* Fix jsrun typo
* Update multinode_runner.py (deepspeedai#45)
* add CUDA_VISIBLE_DEVICES to jsrunner
---------
Signed-off-by: Dashiell Stander <[email protected]>
Signed-off-by: Dashiell Stander <[email protected]>
Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: Dashiell Stander <[email protected]>
Co-authored-by: Dashiell Stander <[email protected]>
Co-authored-by: Dashiell Stander <[email protected]>
Co-authored-by: Quentin TastyRice <[email protected]>
Co-authored-by: Dashiell Stander <[email protected]>
Co-authored-by: MLRichter <[email protected]>
Co-authored-by: Stella Biderman <[email protected]>
No description provided.