Tags: cachevector/comprexx
v0.2.0

New compression techniques:
- Unstructured pruning (magnitude/random, gradual cubic schedule)
- N:M sparsity (default 2:4, for Ampere sparse tensor cores)
- Weight-only INT4/INT8 quantization (group-wise)
- Low-rank decomposition via truncated SVD
- Operator fusion (Conv+BN) via torch.fx
- Weight clustering (k-means codebook)

Plus per-layer sensitivity analysis (cx.analyze_sensitivity) for picking exclude_layers and per-layer compression targets. 163 tests passing.
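To make the first item concrete, magnitude pruning with a gradual cubic sparsity schedule (Zhu & Gupta style) can be sketched in plain NumPy. This is a minimal illustration of the technique, not comprexx's implementation; the function names and parameters here are hypothetical.

```python
import numpy as np

def cubic_sparsity(step, begin, end, s_init=0.0, s_final=0.9):
    """Gradual cubic schedule: sparsity ramps from s_init at `begin`
    to s_final at `end`, with most of the pruning happening early."""
    if step <= begin:
        return s_init
    if step >= end:
        return s_final
    frac = (step - begin) / (end - begin)
    return s_final + (s_init - s_final) * (1.0 - frac) ** 3

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured)."""
    k = int(round(sparsity * weights.size))
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

In a training loop, `cubic_sparsity` would be evaluated each step and the resulting target passed to `magnitude_prune` to recompute the mask.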
v0.1.0

Initial release.
- Model analysis and profiling (cx.analyze)
- Structured pruning (L1/L2/random, global or per-layer scope)
- Post-training quantization (dynamic and static INT8)
- ONNX export with manifest and onnxruntime validation
- Recipe-driven pipelines (YAML)
- CLI: comprexx analyze, compress, export
- Accuracy guards with halt/warn actions
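The post-training INT8 quantization listed above boils down to mapping float weights to 8-bit integers plus a scale. A minimal symmetric per-tensor sketch in NumPy (an illustration of the general technique, not comprexx's code):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization.
    Dequantized value is q * scale; round-trip error is at most scale/2."""
    scale = np.abs(w).max() / 127.0  # map the largest magnitude to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale
```

Static quantization additionally calibrates activation ranges on sample data; dynamic quantization computes activation scales at inference time, with weights quantized ahead of time as above.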