
Tags: embedl/flash-head


v0.1.9

Fix handling of lm head for Qwen3.5

v0.1.8

Add FlashHeadQwen3_5 support

v0.1.7

Simplify loading, remove duplicate indices calc, fix prefill path

- Remove clustering_config.json validation from _get_centroids (rely on
  safetensors contents directly)
- Auto-detect n_clusters from centroids tensor shape instead of requiring
  it as a parameter
- Infer vocab_size/hidden_size from weight shape instead of config metadata
- Return indices from _get_cluster_logits to avoid recomputing them in
  get_next_token (removes duplicate index_select + flatten + unique)
- Fix prefill regression: only use FlashHead for single-token decode
  (shape[0] == 1); let vLLM handle prefill natively via compiled path
- Fix sampling softmax to slice [:, -1, :] before temperature scaling
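
The changes listed above can be illustrated with a minimal, dependency-free sketch. It is not the package's code: the function names (`infer_dims`, `use_flash_head`, `last_token_probs`) are hypothetical, shapes are modelled as plain tuples and logits as nested lists, and the tensor layouts (lm head weight as `(vocab_size, hidden_size)`, centroids as `(n_clusters, hidden_size)`) are assumptions.

```python
import math


def infer_dims(weight_shape, centroids_shape):
    """Derive sizes from tensor shapes instead of config metadata.

    Assumed layouts (not confirmed by the commit message):
    lm head weight is (vocab_size, hidden_size),
    centroids are (n_clusters, hidden_size).
    """
    vocab_size, hidden_size = weight_shape
    n_clusters, centroid_dim = centroids_shape
    if centroid_dim != hidden_size:
        raise ValueError("centroids and lm head weight disagree on hidden size")
    return vocab_size, hidden_size, n_clusters


def use_flash_head(hidden_states_shape):
    """Take the FlashHead path only for single-token decode (shape[0] == 1).

    Anything else (prefill) is left to the framework's native path.
    """
    return hidden_states_shape[0] == 1


def last_token_probs(logits, temperature=1.0):
    """Softmax over the last position only, i.e. logits[:, -1, :].

    `logits` is a nested list indexed as [batch][position][vocab].
    The slice happens before temperature scaling.
    """
    probs = []
    for sequence in logits:
        last = sequence[-1]                       # slice the final position first
        scaled = [x / temperature for x in last]  # then apply temperature
        peak = max(scaled)                        # subtract max for stability
        exps = [math.exp(x - peak) for x in scaled]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs
```

For example, `use_flash_head((1, 4096))` is true for a decode step, while `use_flash_head((17, 4096))` is false, so a 17-token prefill would skip the FlashHead path.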

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

v0.1.6

Remove duplicate indices calc

v0.1.5

Bump version to 0.1.5

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

v0.1.4

Bump version to 0.1.4

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

v0.1.3

Bump version to 0.1.3, add HF collection link, remove Homepage URL

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

v0.1.2

Bump version to 0.1.2, fix README images for PyPI

Use absolute URLs for images so they render on PyPI.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

v0.1.1

Bump version to 0.1.1, add PyPI metadata and landing page

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

v0.1.0

Fix pypi-publish permissions: add contents read for release download

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>