This is an annotated implementation of the Transformer architecture in NumPy, written to explain the main mechanisms of self-attention and multi-head attention to the students of the attention seminar.
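The two mechanisms can be sketched in a few lines of NumPy: scaled dot-product attention computes `softmax(QK^T / sqrt(d_k)) V`, and multi-head attention splits the model dimension into heads, attends in each head independently, then concatenates and projects. The function names and shapes below are illustrative, not this repo's actual API:

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    scores = Q @ np.swapaxes(K, -1, -2) / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def multihead_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (seq, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq, d_model = X.shape
    d_k = d_model // n_heads
    def project_and_split(W):
        # (seq, d_model) -> (n_heads, seq, d_k)
        return (X @ W).reshape(seq, n_heads, d_k).transpose(1, 0, 2)
    heads = attention(project_and_split(Wq),
                      project_and_split(Wk),
                      project_and_split(Wv))      # (n_heads, seq, d_k)
    concat = heads.transpose(1, 0, 2).reshape(seq, d_model)
    return concat @ Wo                            # final output projection

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                      # 5 tokens, d_model = 16
Wq, Wk, Wv, Wo = (rng.normal(size=(16, 16)) for _ in range(4))
out = multihead_attention(X, Wq, Wk, Wv, Wo, n_heads=4)
print(out.shape)  # (5, 16)
```

Each attention row is a probability distribution over the input tokens, so each output token is a weighted average of the value vectors.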
- TODO: add JAX/autograd