Fix conflict between Tutel and top-2 gate in MoE layer #2053

awan-10 merged 6 commits into deepspeedai:master from yetiansh:master
Conversation
Thank you for the PR @yetiansh :) It looks good to me. Alex had added Tutel support, so let me tag him and ask for a quick review. @alexandremuzio - can you please review this real quick?

@yetiansh - can you please follow the guide here and update your PR? I see it's failing the format checks. https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md

Looks good to me. Thanks!
Thanks @alexandremuzio @awan-10. I've run the format checks and updated the PR.
Hi, is this PR still active? @awan-10 @alexandremuzio

Sorry for the delay in getting back @yetiansh. I approved this PR so tests can run. Will merge it as soon as tests pass. Thank you!
    logger.warning("Tutel optimization requested but not installed. "
                   "Proceeding without Tutel.")
elif use_tutel and TUTEL_INSTALLED and gate.k != 1:
    logger.warning(
Can we wrap this in an `if torch.distributed.get_rank() == 0:` check?

Yeah, it is possible. But I wonder whether we should also wrap the other warnings and infos, for example L480 and L482-483?
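For reference, a minimal sketch of what the suggested rank-0 guard could look like, assuming torch.distributed is initialized by the time the layer is constructed and using DeepSpeed's `deepspeed.utils` logger; the warning text is only illustrative:

```python
import torch.distributed as dist

from deepspeed.utils import logger

# Emit the warning only once, from rank 0, instead of once per process.
# Fall back to logging unconditionally if torch.distributed has not been
# initialized yet (an assumption made by this sketch).
if not dist.is_initialized() or dist.get_rank() == 0:
    logger.warning("To enable Tutel optimization, use a top-1 gate. "
                   "Proceeding without Tutel.")  # illustrative message
```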
Previously, using both Tutel optimization and top-2 gating in MoE model training would fail. If both Tutel and top-2 are enabled, MoELayer would try to unpack the top-2 gate's output here, which would fail because the top-2 gate does not produce that number of outputs. Fixed by checking the gate's top-k value (gate.k) when constructing MoELayer.
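For illustration, a minimal sketch of the kind of constructor-time check described above, assuming a gate object that exposes its top-k value as `gate.k` (as in the diff excerpt); the helper name `resolve_use_tutel` and the warning messages are hypothetical:

```python
from deepspeed.utils import logger

try:
    # Tutel is an optional dependency; fall back gracefully when it is absent.
    from tutel import moe as tutel_moe  # noqa: F401
    TUTEL_INSTALLED = True
except ImportError:
    TUTEL_INSTALLED = False


def resolve_use_tutel(use_tutel, gate):
    """Decide whether the MoE layer can actually use the Tutel fast path.

    Tutel's optimized dispatch only supports top-1 gating, so it must be
    disabled when the gate is configured with k != 1 (e.g. a top-2 gate).
    """
    if use_tutel and not TUTEL_INSTALLED:
        logger.warning("Tutel optimization requested but not installed. "
                       "Proceeding without Tutel.")
    elif use_tutel and TUTEL_INSTALLED and gate.k != 1:
        # This is the conflict fixed here: with a top-2 gate, the Tutel path
        # would try to unpack outputs the gate never produces.
        logger.warning("Tutel optimization requires a top-1 gate. "
                       "Proceeding without Tutel.")
    return use_tutel and TUTEL_INSTALLED and gate.k == 1
```

The layer would then take the Tutel code path only when this check returns True, and silently fall back to the standard top-k dispatch otherwise.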