GPT-OSS-20B follow-up - Add optimizations to attention and MoE code. #477

PR #459 adds initial support for the GPT-OSS-20B model. Some follow-up work is needed to improve the model's performance:

- [ ] Flex attention path - [reference here](https://arxiv.org/pdf/2506.07311);
- [ ] New MoE kernel that uses bias weights.

The MoE piece can be a new kernel and custom op, like the current one, but with support for the model's weight biases.
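As a rough reference for what "MoE with bias" could mean numerically, here is a minimal pure-Python sketch for a single token. It is not the existing kernel or the planned custom op. The function names, the `w1`/`b1`/`w2`/`b2` expert layout, the SiLU activation, and the softmax over only the selected router logits are all assumptions made for illustration. Something like this could serve as a correctness oracle when testing a new kernel.

```python
import math


def matvec(w, x, b):
    """Return w @ x + b for a row-major weight matrix w and bias vector b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def silu(v):
    """SiLU activation (an assumption; the real model may use a different one)."""
    return v / (1.0 + math.exp(-v))


def moe_forward(x, router_w, router_b, experts, top_k):
    """Reference MoE forward pass for one token vector x.

    experts is a list of dicts with keys w1, b1, w2, b2 (hypothetical layout).
    """
    logits = matvec(router_w, x, router_b)
    # Pick the top_k experts by router logit.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only (an assumption of this sketch).
    peak = max(logits[i] for i in chosen)
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    weights = [e / total for e in exps]

    out = [0.0] * len(experts[0]["b2"])
    for weight, idx in zip(weights, chosen):
        e = experts[idx]
        # Both projections add their bias; this is the part a bias-free kernel omits.
        hidden = [silu(h) for h in matvec(e["w1"], x, e["b1"])]
        y = matvec(e["w2"], hidden, e["b2"])
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out
```

Setting every `b1` and `b2` to zero reduces this to the bias-free computation, which gives a simple way to check a new kernel against the current one.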

This is a follow-up for #461 
