PR #459 adds initial support for the GPT-OSS-20B model. Follow-up work is needed to improve the model's performance.
The MoE piece can be a new kernel and custom op modeled on the current one, but with support for the bias terms in the model weights.
This is a follow-up for #461
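
For reference, here is a minimal pure-Python sketch of what "MoE with bias" could compute, which might serve as a correctness baseline for the new kernel. It is not the existing kernel or custom op. All names (`moe_forward`, `router_w`, `expert_w`, `expert_b`, `top_k`) are hypothetical, and each expert is simplified to a single biased linear layer with top-k softmax gating, which is an assumption and not necessarily how GPT-OSS-20B structures its experts.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    # weight has shape [out][in], bias has shape [out].
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def moe_forward(x, router_w, router_b, expert_w, expert_b, top_k):
    # Hypothetical reference: route one token to its top_k experts and
    # combine their biased outputs using softmax-normalized gate weights.
    logits = linear(x, router_w, router_b)
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
    gates = softmax([logits[e] for e in top])
    out = [0.0] * len(expert_b[0])
    for g, e in zip(gates, top):
        # The bias term here is the piece the current op is described as lacking.
        y = linear(x, expert_w[e], expert_b[e])
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

Passing all-zero `expert_b` reduces this to the bias-free case, so the same reference can be used to compare a new kernel against the current one.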