kernel: added attention bench for profiling before optimization #360

Merged
guocuimi merged 3 commits into main from attn_bench
Jan 4, 2025

@guocuimi commented Jan 4, 2025


## attention_bench_sm80

### [0] NVIDIA GeForce RTX 4090

| batch_size | q_len | kv_len | n_heads | n_kv_heads | head_dim | HBWPeak | LoadEff | StoreEff | L1HitRate | L2HitRate | Samples | Samples |  CPU Time  |  Noise   | GPU Time  | Noise  | Samples | Batch GPU |
|------------|-------|--------|---------|------------|----------|---------|---------|----------|-----------|-----------|---------|---------|------------|----------|-----------|--------|---------|-----------|
|          1 |    64 |     64 |       2 |          2 |       64 |   7.35% |   0.00% |  100.00% |     0.00% |    76.67% |      2x |  54653x | 107.071 us | 1856.94% |  7.545 us | 10.23% |  95639x |  5.228 us |
|          1 |    64 |    128 |       2 |          2 |       64 |  11.91% |   0.00% |  100.00% |     0.00% |    68.22% |      2x |  40768x | 111.090 us | 1197.68% | 12.265 us |  7.78% |  49399x | 10.122 us |
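
For context when reading the `GPU Time` column, the rough sketch below estimates the arithmetic work and minimum memory traffic implied by each row's shape parameters. It is not part of the PR. The helper names `attention_flops` and `attention_bytes` are hypothetical, and the estimate assumes 2-byte (fp16/bf16) elements, no causal mask, and that each of Q, K, V, and the output is read or written exactly once. None of these assumptions is stated in the benchmark output.

```python
def attention_flops(batch_size, q_len, kv_len, n_heads, head_dim):
    """FLOPs for the QK^T and PV matmuls (2 FLOPs per multiply-add); softmax is ignored."""
    return 4 * batch_size * n_heads * q_len * kv_len * head_dim


def attention_bytes(batch_size, q_len, kv_len, n_heads, n_kv_heads, head_dim,
                    dtype_bytes=2):
    """Minimum bytes moved: read Q, K, V once and write the output once.

    dtype_bytes=2 is an assumption (fp16/bf16); the table does not state the dtype.
    """
    q_and_o = 2 * batch_size * q_len * n_heads * head_dim
    k_and_v = 2 * batch_size * kv_len * n_kv_heads * head_dim
    return (q_and_o + k_and_v) * dtype_bytes


# Shape parameters and GPU times copied from the two table rows above:
# (batch_size, q_len, kv_len, n_heads, n_kv_heads, head_dim, gpu_time_us)
rows = [
    (1, 64, 64, 2, 2, 64, 7.545),
    (1, 64, 128, 2, 2, 64, 12.265),
]

for batch_size, q_len, kv_len, n_heads, n_kv_heads, head_dim, gpu_time_us in rows:
    flops = attention_flops(batch_size, q_len, kv_len, n_heads, head_dim)
    nbytes = attention_bytes(batch_size, q_len, kv_len, n_heads, n_kv_heads, head_dim)
    seconds = gpu_time_us * 1e-6
    print(f"q_len={q_len} kv_len={kv_len}: "
          f"{flops / seconds / 1e9:.1f} GFLOP/s, "
          f"{nbytes / seconds / 1e9:.1f} GB/s (estimated)")
```

These estimates are lower bounds on traffic and will not necessarily match the profiler-derived `HBWPeak` column, which reflects measured bandwidth utilization rather than this idealized model.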

@guocuimi guocuimi merged commit b78389c into main Jan 4, 2025