Fix softmax dim of Residual MoE implementation in moe/layer.py#2110

Merged
yaozhewei merged 2 commits into deepspeedai:master from hero007feng:master
Jul 20, 2022
Conversation

@hero007feng
We found that coef has shape [batch_size, seq_len, 2], so the softmax should be applied along the last dimension to normalize the weights of the expert and MLP branches.
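A minimal numpy sketch of the fix described above (the actual layer uses PyTorch; the variable names `expert_out`, `mlp_out`, and `coef_logits` here are hypothetical stand-ins, not identifiers from moe/layer.py). The point is that the softmax must normalize over the last dimension of the [batch_size, seq_len, 2] coefficient tensor, so the two branch weights sum to 1 at every token position:

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

batch_size, seq_len, hidden = 2, 4, 8
rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two branches of a Residual MoE layer.
expert_out = rng.standard_normal((batch_size, seq_len, hidden))
mlp_out = rng.standard_normal((batch_size, seq_len, hidden))

# coef_logits has shape [batch_size, seq_len, 2]: one raw weight per branch
# per token.
coef_logits = rng.standard_normal((batch_size, seq_len, 2))

# The fix: softmax over the LAST dimension (axis=-1), so that the two branch
# weights sum to 1 for every (batch, seq) position. Normalizing over any
# other dimension would mix weights across tokens instead of across branches.
coef = softmax(coef_logits, axis=-1)

# Blend the two branches with the normalized per-token weights.
output = expert_out * coef[..., 0:1] + mlp_out * coef[..., 1:2]
```

With `axis=-1`, `coef.sum(axis=-1)` is 1 everywhere, which is what makes the combination a convex mixture of the expert and MLP outputs per token.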

@yaozhewei yaozhewei enabled auto-merge (squash) July 20, 2022 01:12
@yaozhewei yaozhewei merged commit b4513f6 into deepspeedai:master Jul 20, 2022

3 participants