
Tags: cachevector/comprexx

v0.3.0

bump version to 0.3.0

v0.2.0

New compression techniques:
- Unstructured pruning (magnitude/random, gradual cubic schedule)
- N:M sparsity (default 2:4, for Ampere sparse tensor cores)
- Weight-only INT4/INT8 quantization (group-wise)
- Low-rank decomposition via truncated SVD
- Operator fusion (Conv+BN) via torch.fx
- Weight clustering (k-means codebook)
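
For a concrete picture of the N:M pattern listed above: a minimal numpy sketch (an illustration, not comprexx's implementation; nm_prune is a made-up helper name) that keeps the n largest-magnitude weights in each group of m consecutive weights — the 2:4 layout Ampere sparse tensor cores accelerate:

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Keep the n largest-magnitude values in every group of m consecutive
    weights and zero the rest. Assumes the weight count is a multiple of m."""
    w = np.asarray(weights, dtype=float).copy()
    flat = w.reshape(-1, m)                       # one row per group of m
    # indices of the (m - n) smallest |w| in each group
    drop = np.argsort(np.abs(flat), axis=1)[:, : m - n]
    np.put_along_axis(flat, drop, 0.0, axis=1)    # zero them in place
    return flat.reshape(w.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.02, 0.01])
print(nm_prune(w))  # each group of 4 keeps only its two largest magnitudes
```

Unstructured magnitude pruning is the same idea without the group constraint: rank all weights by |w| and zero the smallest fraction.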

Plus per-layer sensitivity analysis (cx.analyze_sensitivity) for
picking exclude_layers and per-layer compression targets.
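
The idea behind such a sensitivity pass can be sketched in standalone form — prune one layer at a time at a fixed sparsity, measure the metric drop, then restore the layer. This is a hypothetical sketch of the technique, not cx.analyze_sensitivity's actual signature or mechanics:

```python
import numpy as np

def analyze_sensitivity(layers, evaluate, sparsity=0.5):
    """Per-layer sensitivity sketch (hypothetical helper, not the real API).
    `layers` maps name -> weight array; `evaluate` returns a quality metric
    where higher is better. Returns the metric drop caused by pruning each
    layer alone at the given sparsity."""
    baseline = evaluate(layers)
    report = {}
    for name, w in layers.items():
        original = w.copy()
        k = int(w.size * sparsity)
        if k:
            # magnitude threshold: the k-th smallest |w| in this layer
            thresh = np.partition(np.abs(w), k - 1, axis=None)[k - 1]
            w[np.abs(w) <= thresh] = 0.0          # prune in place
        report[name] = baseline - evaluate(layers)
        w[...] = original                         # restore the layer
    return report
```

Layers with small drops are good compression targets; layers with large drops are candidates for exclude_layers.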

163 tests passing.
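
The low-rank decomposition item above can be sketched with plain truncated SVD (a numpy illustration, not comprexx's code): an m-by-n weight matrix is replaced by the product of an m-by-r and an r-by-n factor, which cuts parameters and FLOPs whenever r(m + n) < mn:

```python
import numpy as np

def low_rank(w, rank):
    """Factor a weight matrix into two thinner matrices via truncated SVD:
    w (m x n) is approximated by a (m x r) @ b (r x n)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # scale the kept left singular vectors
    b = vt[:rank]
    return a, b

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4)) @ rng.standard_normal((4, 16))  # rank-4 matrix
a, b = low_rank(w, 4)
print(np.allclose(a @ b, w))  # an exactly rank-4 matrix is recovered exactly
```

For real layers the spectrum only decays rather than vanishing, so the choice of r trades accuracy against compression.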

v0.1.0

Initial release.

- Model analysis and profiling (cx.analyze)
- Structured pruning (L1/L2/random, global or per-layer scope)
- Post-training quantization (dynamic and static INT8)
- ONNX export with manifest and onnxruntime validation
- Recipe-driven pipelines (YAML)
- CLI: comprexx analyze, compress, export
- Accuracy guards with halt/warn actions
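
As a rough illustration of the INT8 idea in the list above, symmetric per-tensor weight quantization fits in a few lines (a minimal numpy sketch, not the library's implementation; real static PTQ additionally calibrates activation ranges, which this omits):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: choose a scale so the largest
    magnitude maps to 127, then round and clip into the int8 range."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0   # guard all-zero tensors
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to float for comparison against the original."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25], dtype=np.float32)
q, scale = quantize_int8(w)
print(q, np.max(np.abs(dequantize(q, scale) - w)))  # error bounded by scale/2
```

Group-wise quantization (as in the v0.2.0 INT4/INT8 feature) applies the same scheme per small group of weights instead of per tensor, which tightens the scale for outlier-heavy layers.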