Fixed the missing token normalization for cross-attention computation by goutamyg · Pull Request #82 · apple/ml-cvnets

goutamyg · 2023-07-31T16:51:04Z

For a downstream task, I see better training convergence upon normalizing both x and x_prev during the computation of cross-attention here: https://github.com/apple/ml-cvnets/blob/main/cvnets/modules/transformer.py#L258

Currently, I am conducting model training with and without the proposed normalization of x_prev and will share the results for the two cases. In the meantime, if this change makes sense, kindly include it. Let me know if you need any related info.
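To illustrate the idea, here is a minimal, dependency-free sketch of a pre-norm cross-attention step in which both `x` and `x_prev` are normalized before attention. It is not the code from `cvnets/modules/transformer.py`. The function names (`layer_norm`, `cross_attention`, `cross_attn_block`) and the `normalize_prev` flag are hypothetical. Plain softmax dot-product attention stands in for the repository's attention module, and the normalization has no learned affine parameters.

```python
import math


def layer_norm(tokens, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance (no learned affine)."""
    out = []
    for t in tokens:
        mean = sum(t) / len(t)
        var = sum((v - mean) ** 2 for v in t) / len(t)
        out.append([(v - mean) / math.sqrt(var + eps) for v in t])
    return out


def cross_attention(queries, context):
    """Toy scaled dot-product cross-attention: each query attends over the context tokens."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in context]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append(
            [sum(w * k[d] for w, k in zip(weights, context)) for d in range(len(q))]
        )
    return out


def cross_attn_block(x, x_prev, normalize_prev=True):
    """Residual cross-attention block; `normalize_prev` toggles the proposed change (assumed name)."""
    x_norm = layer_norm(x)
    ctx = layer_norm(x_prev) if normalize_prev else x_prev
    attn = cross_attention(x_norm, ctx)
    # Residual connection around the attention output.
    return [[r + a for r, a in zip(res_t, attn_t)] for res_t, attn_t in zip(x, attn)]


x = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]
x_prev = [[100.0, 200.0, 300.0], [10.0, 20.0, 60.0]]
out = cross_attn_block(x, x_prev)
```

With `normalize_prev=True`, the attention scores no longer depend on the raw scale of `x_prev`, which is one plausible reason the change could help convergence. This sketch does not reproduce the downstream training results.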

Commit b5831e3: fixed the missing normalization of tokens for cross-attention computation

goutamyg wants to merge 1 commit into apple:main from goutamyg:fix_cross_attention_LinearAttnFFN
