Add flops profiler tutorial #682
Conversation
)
(transformer): ParallelTransformer(
  12.61 M, 32.43% Params, 103.62 GMACs, 100.00% MACs, 4.4 ms, 13.22% time, 4.7e+01 TFLOPS,
  (layers): ModuleList(
Why are the time percent and TFLOPS 0 for ModuleList?
model = models.alexnet()
batch_size = 256
macs, params, steps = get_model_profile(model, # the PyTorch model to be profiled
macs, params = get_model_profile(model=model, # model
It would be good to show how the model is called without the profiling so that the input_res and input_constructor arguments are clearer. Maybe have:
if profile:
macs, params = get_model_profile
else:
output/loss = model(....)
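The pattern suggested above can be sketched with a toy stand-in model, so the contrast between "call the model directly" and "hand the profiler the model and its input shape" is concrete. This is a hypothetical illustration in plain Python, not the DeepSpeed API: `ToyLinear` and `count_macs` are invented names, and a real profiler derives the MAC count from hooks rather than a closed-form formula.

```python
# Hypothetical sketch of the profile-vs-run pattern above.
# `ToyLinear` and `count_macs` are illustrative stand-ins, not DeepSpeed names.

class ToyLinear:
    """A toy 'layer': y = W x, so one multiply-add per weight per sample."""
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features

    def __call__(self, batch):
        # Forward pass on `batch` samples (the values don't matter here).
        return [[0.0] * self.out_features for _ in range(batch)]

def count_macs(model, batch):
    # A linear layer does in_features * out_features multiply-adds per
    # sample; a real profiler derives this from the observed input shape.
    return model.in_features * model.out_features * batch

model = ToyLinear(in_features=1024, out_features=1000)
batch_size = 256

profile = True
if profile:
    # Profiling path: ask for the MAC count instead of outputs.
    macs = count_macs(model, batch_size)
else:
    # Normal path: just run the forward pass.
    output = model(batch_size)

print(macs)  # 1024 * 1000 * 256 = 262144000
```

Seeing both branches side by side makes it clear that the profiling call takes the same model object and only replaces the forward invocation, which is what the reviewer is asking the example to show.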
macs, params, steps = get_model_profile(
batch_size = 5
seq_len = 128
macs, params = get_model_profile(
It would be good to show how the model is called without the profiling so that the input_res and input_constructor arguments are clearer. Maybe have:
if profile:
macs, params = get_model_profile
else:
output/loss = model(....)
Made changes as suggested
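For models like BERT whose forward takes keyword inputs, a shape alone is not enough, which is why an input-constructor callable is involved. The sketch below illustrates that pattern in plain Python with invented names (`bert_like_model`, `input_constructor`, `profile_model`); it is not the DeepSpeed implementation, just the shape of the idea.

```python
# Hypothetical sketch of the input-constructor pattern: when a model's
# forward takes keyword inputs (as BERT does), a profiler cannot guess
# them from a plain shape, so the caller supplies a function that builds
# them. None of these names are the real DeepSpeed API.

def bert_like_model(input_ids, attention_mask):
    # Stand-in forward: return one "logit" per token.
    return [[0.0] * len(seq) for seq in input_ids]

def input_constructor(batch_size, seq_len):
    # Build the full kwargs dict the model can be called with.
    return {
        "input_ids": [[0] * seq_len for _ in range(batch_size)],
        "attention_mask": [[1] * seq_len for _ in range(batch_size)],
    }

def profile_model(model, input_constructor, batch_size, seq_len):
    # The profiler itself builds the inputs and runs the forward once.
    inputs = input_constructor(batch_size, seq_len)
    output = model(**inputs)
    return len(output), len(output[0])  # shape of the profiled forward

batch_size = 5
seq_len = 128
shape = profile_model(bert_like_model, input_constructor, batch_size, seq_len)
print(shape)  # (5, 128)
```

The design choice this illustrates: by accepting a constructor instead of concrete tensors, the profiler stays decoupled from how any particular model wants its inputs packaged.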
# Output:
# Number of multiply-adds: 21.74 GMACs
# Number of parameters: 109.48 M
Below is an example of this usage in a typical training workflow.
In this mode, is the profiler capturing only the forward, or forward backward and step? Can we make this more explicit?
The profiler only captures the forward pass. I clarified this in the README.
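The forward-only behavior can be made concrete with a toy counter: if the accumulator is attached as a forward hook, backward and optimizer work never touch it. This is a hypothetical sketch (the `Profiler` class and `training_step` function are invented for illustration, and the MAC numbers are arbitrary), not the DeepSpeed hook machinery.

```python
# Hypothetical sketch of forward-only profiling: the counter is wired
# into the forward path, so backward() and step() contribute nothing.
# Names and numbers here are illustrative, not from DeepSpeed.

class Profiler:
    def __init__(self):
        self.forward_macs = 0

    def forward_hook(self, macs):
        # Called once per forward pass; nothing hooks the backward.
        self.forward_macs += macs

def training_step(profiler):
    forward_macs = 1000               # work done in the forward pass
    profiler.forward_hook(forward_macs)
    backward_macs = 2 * forward_macs  # roughly 2x forward, but not hooked
    optimizer_flops = 10              # also not hooked
    return backward_macs + optimizer_flops

profiler = Profiler()
for _ in range(3):
    training_step(profiler)

# Only forward work accumulates: 3 steps x 1000 MACs.
print(profiler.forward_macs)  # 3000
```

A common rule of thumb is that the backward pass costs roughly twice the forward, so a forward-only number must be scaled accordingly when estimating full training cost.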
* Dist testing backend fixes, etc. (deepspeedai#708)
* set_batch_fn and remove old sanity check (deepspeedai#712)
* properly set engine.local_rank if it's set to -1
* Add executable permission to `ds_elastic` and `ds_report` in `bin`. (deepspeedai#711)
  * Automatic `ds_elastic` formatting (Co-authored-by: Jeff Rasley <[email protected]>)
* local rank of -1 means not set (deepspeedai#720)
* bump to 0.3.11
* [launcher] look ma, no more zombies (deepspeedai#714) (Co-authored-by: Jeff Rasley <[email protected]>)
* Improve starred expressions (deepspeedai#696)

  `deepspeed/profiling/flops_profiler/profiler.py` uses starred expressions that are no longer valid with [PEP 617](https://www.python.org/dev/peps/pep-0617/). The new Python parser is in 3.9, and this change allows DeepSpeed to run with the newest Python version. I have not checked all locations that have this issue; however, this change allows me to run simple examples.

  * Match style for "Improve starred expressions", although readability suffers. The style guide might need to be updated for this new use case of expressions. Python [Issue 40631](https://bugs.python.org/issue40631) includes more discussion of the change. (Co-authored-by: Cheng Li <[email protected]>)
* Fixed typo in Readme. (deepspeedai#737)
* 1bit_adam dependencies (deepspeedai#742)
* Clickable screenshots (deepspeedai#746)
  * Fix docstring
  * Make screenshots clickable for easier viewing
* Add flops profiler tutorial (deepspeedai#682)
  * work on flops profiler tutorial
  * update flops profiler tutorial
  * add flops profiler tutorial and fix names
  * fix trailing ws
  * fix names
  * remove multistep profiling and update docs
  * fix cases where functionals and submodules coexist in a parent module, update readme
  * fix typo
  * always invoke post hook function
  * fix module flops sum and update tests
  * update tutorial
* Only initialize distributed if required (deepspeedai#734)

Co-authored-by: Jeff Rasley <[email protected]>
Co-authored-by: Shaden Smith <[email protected]>
Co-authored-by: Jon Eyolfson <[email protected]>
Co-authored-by: Stas Bekman <[email protected]>
Co-authored-by: Cheng Li <[email protected]>
Co-authored-by: TheDudeFromCI <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Sean Naren <[email protected]>
For the inference module, can I add a performance analysis tutorial for Llama 2?
Added the flops profiler tutorial, configuration, and feature to the website. Also fixed names of some flops profiler function parameters.