crushr is a deterministic archive system that preserves and exposes data truth under failure. It is designed for workflows where verifiable payload integrity, explicit metadata handling, and bounded degraded outcomes matter more than generic convenience or maximum compression ratio.
crushr is designed to preserve and expose the truth about data, especially under partial failure or corruption.
crushr prioritizes data integrity, explicit truth, and bounded failure behavior over maximum compression ratio.
It is not a convenience-first archive format. It is a system that defines what is known, what is degraded, and what must be refused.
- Verified data is never silently corrupted or misrepresented
- Unverifiable data is never presented as valid
- Degraded or partial results are explicitly labeled and structured
- Archive processing fails closed when required truth cannot be established
- Filesystem writes are constrained and cannot escape intended boundaries
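The last guarantee can be illustrated with a minimal sketch of path confinement. This is not crushr's implementation; `confine` and `UnsafePathError` are hypothetical names, and the sketch assumes the boundary is a single extraction root and that logical paths use POSIX separators.

```python
from pathlib import PurePosixPath


class UnsafePathError(ValueError):
    """Raised when a logical path could resolve outside the extraction root."""


def confine(root: str, logical_path: str) -> PurePosixPath:
    """Join logical_path under root, refusing anything that could escape it."""
    candidate = PurePosixPath(logical_path)
    # Refuse absolute paths outright: joining them would discard the root.
    if candidate.is_absolute():
        raise UnsafePathError(f"absolute path refused: {logical_path!r}")
    # Refuse parent-directory components instead of trying to normalize them.
    if ".." in candidate.parts:
        raise UnsafePathError(f"parent traversal refused: {logical_path!r}")
    # An empty path (or ".") names nothing that can be written.
    if not candidate.parts:
        raise UnsafePathError("empty path refused")
    return PurePosixPath(root) / candidate
```

The sketch refuses suspicious paths rather than rewriting them, in keeping with the fail-closed stance above. It is purely lexical and does not address symlinks that already exist inside the root.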
crushr enforces a strict separation between:
- Validation — structural correctness of archive components
- Verification — integrity correctness of data via cryptographic proof (BLAKE3)
No output is considered trustworthy without explicit verification.
All extraction and recovery results are classified into explicit trust classes:
- canonical: payload integrity is verified and required metadata is intact
- metadata_degraded: payload integrity is verified, but metadata or structure is incomplete
- recovered_named: payload integrity is verified and identity has been reconstructed within defined constraints
- recovered_anonymous: payload integrity is verified but no reliable identity remains
- unrecoverable: payload integrity cannot be proven to required standards
These classes reflect the separation of payload integrity from metadata integrity.
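As a rough illustration, the classes can be read as a decision over two independent questions: is the payload verified, and how much identity and metadata survived? The sketch below is an assumption about how those questions combine, not crushr's actual logic; the `Evidence` fields and the `identity` values `"stored"`, `"reconstructed"`, and `"none"` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class TrustClass(Enum):
    CANONICAL = "canonical"
    METADATA_DEGRADED = "metadata_degraded"
    RECOVERED_NAMED = "recovered_named"
    RECOVERED_ANONYMOUS = "recovered_anonymous"
    UNRECOVERABLE = "unrecoverable"


@dataclass(frozen=True)
class Evidence:
    # Hypothetical inputs; the real evidence model is not described here.
    payload_verified: bool
    metadata_intact: bool
    identity: str  # "stored", "reconstructed", or "none"


def classify(e: Evidence) -> TrustClass:
    # Payload integrity comes first: without it nothing else matters.
    if not e.payload_verified:
        return TrustClass.UNRECOVERABLE
    if e.metadata_intact and e.identity == "stored":
        return TrustClass.CANONICAL
    # Assumption: a stored identity with incomplete metadata is "degraded".
    if e.identity == "stored":
        return TrustClass.METADATA_DEGRADED
    if e.identity == "reconstructed":
        return TrustClass.RECOVERED_NAMED
    return TrustClass.RECOVERED_ANONYMOUS
```

Every class except `unrecoverable` requires a verified payload, so metadata loss can lower the class but never raises an unverified payload into a trusted one.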
- Strict / Default
  - Requires verified payload integrity and required metadata
  - Refuses output when canonical guarantees cannot be established
- Recover
  - Allows extraction of data that can be cryptographically verified
  - Produces classified output when metadata or structure is incomplete
  - Does not reconstruct, infer, or repair missing data
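The difference between the two modes can be sketched as a gate over already-classified entries. This is a hedged illustration only: `plan_extraction` and `StrictRefusal` are hypothetical names, and the sketch assumes strict mode refuses the whole operation as soon as any entry is not canonical.

```python
from __future__ import annotations


class StrictRefusal(Exception):
    """Raised when strict mode cannot establish canonical guarantees."""


def plan_extraction(mode: str, entries: dict[str, str]) -> dict[str, str]:
    """Return the entries (name -> trust class) that may be written."""
    if mode == "strict":
        # Strict mode is all-or-nothing in this sketch.
        bad = sorted(n for n, c in entries.items() if c != "canonical")
        if bad:
            raise StrictRefusal(f"non-canonical entries: {bad}")
        return dict(entries)
    if mode == "recover":
        # Recover mode writes anything verified and keeps its class attached;
        # unrecoverable entries are reported elsewhere but never written.
        return {n: c for n, c in entries.items() if c != "unrecoverable"}
    raise ValueError(f"unknown mode: {mode!r}")
```

Neither branch upgrades a class or fills in missing data; recover mode only widens which verified entries may be written.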
Most archive tooling assumes intact metadata, trustworthy structure, and clean success/failure outcomes.
crushr is built around the opposite assumption:
- data can be partially damaged
- structure can be incomplete
- metadata can be lost
- partial truth still has value
The system makes all outcomes explicit rather than assuming correctness.
The project provides:
- deterministic archive creation with crushr pack
- integrity and structure validation with crushr verify
- strict extraction with crushr extract
- recovery-aware extraction with crushr extract --recover
- archive introspection with crushr info (--list, --entry, --find, --propagation)
- build and identity details with crushr about
- shell completion generation with crushr completion
- man page generation with crushr man
crushr archives are identified by format markers, not by filename extension.
The canonical default extension is:
.crs
If no extension is supplied for pack -o, .crs is appended automatically.
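A minimal sketch of that defaulting rule follows. `resolve_output_path` is a hypothetical helper, and the sketch assumes "no extension" means the final path component has no suffix in the `pathlib` sense; crushr's exact rule may differ.

```python
from pathlib import PurePosixPath

DEFAULT_EXTENSION = ".crs"


def resolve_output_path(output: str) -> str:
    """Append the default extension only when the name has no suffix."""
    if PurePosixPath(output).suffix:
        # An explicit extension is respected, since archives are identified
        # by format markers rather than by filename.
        return output
    return output + DEFAULT_EXTENSION
```

Under this assumption a dot in a parent directory name (for example `out.d/backup`) does not count as an extension, because only the final component is inspected.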
If a path, file identity, or recovery outcome cannot be proven from surviving archive evidence, crushr does not invent certainty.
All output is classified into explicit trust classes rather than presented as uniformly valid.
Strict operations refuse when canonical guarantees cannot be met. Recovery is explicit.
crushr is designed for real Linux archival workflows. Other platforms are not allowed to redefine the core metadata model.
Archives are not opaque containers. Inspection, listing, and metadata visibility are first-class capabilities.
crushr extract is strict by default.
If strict canonical extraction cannot be completed, the command refuses and requires explicit recovery mode:
- crushr extract ... → strict canonical extraction only
- crushr extract --recover ... → recovery-aware extraction
Recovery-aware extraction separates output by trust class:
- canonical/
- metadata_degraded/
- recovered_named/
- _crushr_recovery/anonymous/
- _crushr_recovery/manifest.json
Recovery results are reported explicitly as:
- canonical
- metadata_degraded
- recovered_named
- recovered_anonymous
- unrecoverable
Anonymous recovered files follow a deterministic naming policy:
- high-confidence classification → file_<id>.<ext>
- medium-confidence classification → file_<id>.probable-<type>.bin
- low/unknown confidence → file_<id>.bin
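The naming policy can be sketched as a pure function of the classification result. `anonymous_name` and its parameters are hypothetical, the format of `<id>` is not specified here, and falling back to `.bin` when a confident classification lacks an extension or type is an assumption.

```python
def anonymous_name(file_id: str, confidence: str, ext: str = "", kind: str = "") -> str:
    """Build a deterministic name for a recovered file with no reliable identity."""
    if confidence == "high" and ext:
        return f"file_{file_id}.{ext}"
    if confidence == "medium" and kind:
        return f"file_{file_id}.probable-{kind}.bin"
    # Low or unknown confidence, or missing classification detail.
    return f"file_{file_id}.bin"
```

Because the name depends only on its inputs, repeated recovery runs over the same surviving evidence would produce the same filenames.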
The recovery manifest preserves structured classification and identity metadata for all recovered outputs.
crushr's foundational model is the separation of payload integrity from metadata integrity. Tar-style preservation is layered onto that foundation.
crushr pack supports explicit archive preservation contracts:
- --preservation full (default)
- --preservation basic
- --preservation payload-only
The selected preservation profile is recorded in archive metadata and shown by crushr info.
The full profile preserves the complete Linux-first metadata and entry-kind set currently supported.
The basic profile preserves regular files, directories, empty directories, symlinks, hard links, mode, mtime, and sparse semantics.
It intentionally omits:
- xattrs
- uid/gid
- uname/gname
- ACLs
- SELinux labels
- Linux capabilities
- FIFOs
- device nodes
The payload-only profile preserves only regular-file payload bytes plus the directories needed to reconstruct the logical tree.
It intentionally omits:
- symlink semantics
- hard link semantics
- mode
- mtime
- sparse semantics
- xattrs
- ownership
- ACLs
- SELinux labels
- Linux capabilities
- FIFOs
- device nodes
If a selected profile excludes an entry kind, crushr warns and omits it rather than fabricating an alternate representation.
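The warn-and-omit rule can be sketched as a filter over entry kinds. The table below is read off the profile descriptions above, with two assumptions: the kind names are hypothetical, and symlinks are treated as excluded under payload-only. Hard links are left out because this document does not say how payload-only stores them.

```python
from __future__ import annotations

# Assumed mapping of profile -> entry kinds that profile keeps.
PROFILE_ENTRY_KINDS = {
    "full": {"file", "dir", "symlink", "fifo", "char_device", "block_device"},
    "basic": {"file", "dir", "symlink"},
    "payload-only": {"file", "dir"},
}


def select_entries(profile: str, entries: list[tuple[str, str]]):
    """Split (path, kind) entries into kept entries and omission warnings."""
    allowed = PROFILE_ENTRY_KINDS[profile]
    kept, warnings = [], []
    for path, kind in entries:
        if kind in allowed:
            kept.append((path, kind))
        else:
            # Omit with a warning; never substitute another representation.
            warnings.append(f"omitted {path}: {kind} not preserved by profile {profile}")
    return kept, warnings
```

For example, a FIFO passed through the `basic` profile ends up in the warnings list, not in the kept list as a placeholder file.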
With --preservation full, crushr currently preserves:
- regular files
- directories
- empty directories
- symlinks and link targets
- hard links
- sparse files
- FIFOs
- char/block device nodes
- file mode / permissions
- modification time (mtime)
- extended attributes (xattrs)
- numeric ownership (uid/gid)
- optional ownership names (uname/gname) when available
- POSIX ACL metadata (system.posix_acl_access, system.posix_acl_default)
- SELinux label metadata (security.selinux)
- Linux file capability metadata (security.capability)
Where preservation or restoration cannot be applied due to platform or permission constraints, crushr degrades honestly and warns rather than silently pretending success.
crushr aims to support Linux-first archive fidelity suitable for serious tar-based workflows.
This is a staged goal. Full parity is not implied.
crushr archives are inspectable without extraction.
- crushr info provides archive-level introspection: structure, preservation profile, and declared metadata scope
- crushr info --list provides entry-level introspection: listing, classification, and attributes without extraction
- crushr info --entry <logical/path> provides exact-path truth for one entry without extraction
- crushr info --find <query> provides deterministic substring search over stable logical identities
- crushr info --propagation provides dependency and impact visibility for detected corruption paths
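As an illustration of what "deterministic substring search" can mean, the sketch below matches a query against logical paths and returns results in a fixed order. `find_entries` is a hypothetical helper; case-sensitive matching and sorted output are assumptions, not documented behavior of crushr info --find.

```python
from __future__ import annotations


def find_entries(logical_paths: list[str], query: str) -> list[str]:
    """Return logical paths containing query, in a stable sorted order."""
    # De-duplicate and sort so the result does not depend on input order.
    return sorted({p for p in logical_paths if query in p})
```

Sorting the result makes the output independent of the order in which entries happen to be read, which is one simple way to obtain reproducible results.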
Current behavior is fail-closed:
- if structure can be proven, it is reported
- if required metadata is missing, structure is not invented
- directory views are derived from stored logical paths
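Deriving directory views from stored logical paths can be sketched as follows. `directory_view` is a hypothetical helper and the `"."` label for the archive root is an assumption; the point is only that every directory shown is implied by a stored path and nothing else.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def directory_view(logical_paths: list[str]) -> dict[str, list[str]]:
    """Map each directory to its sorted child names, using only stored paths."""
    tree: dict[str, set[str]] = {}
    for raw in logical_paths:
        parts = PurePosixPath(raw).parts
        for depth, name in enumerate(parts):
            # The parent of the first component is the archive root (".").
            parent = "/".join(parts[:depth]) or "."
            tree.setdefault(parent, set()).add(name)
    return {d: sorted(children) for d, children in sorted(tree.items())}
```

A directory appears in the view only if some stored path passes through it, so missing metadata yields a smaller view, never an invented one.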
Generate shell completion scripts directly from the clap command model:
- crushr completion bash
- crushr completion zsh
- crushr completion fish
The command prints the completion script to stdout and does not write files.
Examples:
- Bash: crushr completion bash > /etc/bash_completion.d/crushr
- Zsh: crushr completion zsh > "${fpath[1]}/_crushr"
- Fish: crushr completion fish > ~/.config/fish/completions/crushr.fish
Generate man pages directly from the clap command model:
- crushr man
- crushr man --out-dir ./man
Example:
- crushr man --out-dir ./man
- man ./man/crushr.1
crushr publishes a self-assessed security and assurance set covering:
- threat model
- integrity guarantees
- verification semantics
- architectural invariants
- control and audit documents
crushr is designed in alignment with ISO/IEC 27001 control principles (self-assessed) for relevant controls.
This is not a certification claim.
Public material lives under docs/.
Primary entry points:
- docs/index.md: site landing page
- docs/why-crushr.md: positioning
- docs/whitepaper/index.md: technical whitepaper
- docs/reference/index.md: concise technical reference
- docs/chronicles/index.md: historical development
Canonical behavior and guarantees are defined by this README and the security documentation.
Internal planning and control material exists under:
- .ai/
- .ai/contracts/
These are not part of the public documentation surface.
- Stable product surface: info, extract, verify, pack, about, completion, man
- Bounded internal surface: workspace Rust crates/modules
- Experimental/lab-only surface: crushr lab and research tooling
crushr lab is an internal development harness used to evaluate potential features under controlled conditions.
It is not part of the user-facing product surface and is not covered by stability guarantees. Only behavior that demonstrates clear value and aligns with crushr’s guarantees is promoted into the canonical system.
Commands share a consistent presentation model:
- structured output
- consistent terminology
- deterministic summaries
- restrained motion for active operations
Script-oriented paths emit concise deterministic output suitable for automation.
- Data is collected
- Archives are created
- verify establishes integrity and strict-extraction viability
- extract returns canonical output when possible
- extract --recover returns classified output when required
- Results remain reproducible and verifiable
- complete Linux-first preservation semantics
- expand archive introspection
- deepen structural visibility
- begin compression benchmarking once semantics stabilize
- explore reproducible archive modes
- VERSION is the canonical version source
- update VERSION, then run sync scripts
- validate with version checks
Code is dual-licensed under MIT OR Apache-2.0.
Documentation and diagrams are licensed under CC-BY-4.0.
The repository follows REUSE compliance with SPDX metadata.