Comparing changes

* Update benchmark data and kernel benchmarks
* Update benchmark config for qwen3.5-35B: change world_size and batch sizes
* fixed
* feat: update sgl_chunk_gdn test config
  - seq_len 4096->8192, beta dtype bfloat16->float32, cu_seqlens size 2->4
* feat(kernel_benchmark): add varlen mode test for sgl_causal_conv1d
  - Add generate_testcase_varlen() to simulate SGLang continuous batching scenario
  - Add run_test_varlen() for varlen mode performance testing
  - Update TestParam with seq_lens and num_cache_slots fields
  - Support x shape [dim, cu_seqlen] with query_start_loc and cache_indices
* bug fix and add benchdata
* fixed
* fixed
* fixed
* fixed
* fixed
* fixed
* Sync kernel benchmark files from upstream: sgl_causal_conv1d.py, sgl_causal_conv1d_update.py, sgl_chunk_gdn.py
* fixed
* fixed
* fixed
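
The varlen commit message above mentions a packed input of shape [dim, cu_seqlen] indexed by query_start_loc and cache_indices. As a rough illustration of that layout only, here is a minimal sketch in plain Python. The helper name `make_varlen_testcase`, its signature, and the use of nested lists in place of tensors are assumptions made for this example; it is not the repository's `generate_testcase_varlen()`, whose actual code is not shown here.

```python
import random
from itertools import accumulate


def make_varlen_testcase(seq_lens, dim, num_cache_slots, seed=0):
    """Hypothetical helper illustrating a packed varlen layout.

    Not the repository's generate_testcase_varlen(); nested lists
    stand in for tensors so the sketch has no dependencies.
    """
    if len(seq_lens) > num_cache_slots:
        raise ValueError("need at least one cache slot per sequence")
    rng = random.Random(seed)

    # Cumulative offsets: sequence i occupies columns
    # [query_start_loc[i], query_start_loc[i + 1]) of the packed input.
    query_start_loc = [0, *accumulate(seq_lens)]
    total_tokens = query_start_loc[-1]

    # Packed input of shape [dim, total_tokens]; all sequences are
    # concatenated along the token axis without padding.
    x = [[rng.uniform(-1.0, 1.0) for _ in range(total_tokens)]
         for _ in range(dim)]

    # Assumption: each sequence is assigned a distinct cache slot.
    cache_indices = rng.sample(range(num_cache_slots), len(seq_lens))
    return x, query_start_loc, cache_indices


x, query_start_loc, cache_indices = make_varlen_testcase(
    seq_lens=[3, 5, 2], dim=4, num_cache_slots=8
)
print(query_start_loc)  # [0, 3, 8, 10]
```

With this layout, sequences of different lengths share one buffer, and a kernel can locate sequence i from two adjacent entries of query_start_loc instead of relying on padding.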

Commits on Mar 2, 2026

Commits on Mar 18, 2026
