crushr

crushr is a deterministic archive system that preserves and exposes data truth under failure. It is designed for workflows where verifiable payload integrity, explicit metadata handling, and bounded degraded outcomes matter more than generic convenience or maximum compression ratio.

Intent

crushr is designed to preserve and expose the truth about data, especially under partial failure or corruption. It prioritizes data integrity, explicit truth, and bounded failure behavior over maximum compression ratio.

It is not a convenience-first archive format. It is a system that defines what is known, what is degraded, and what must be refused.

Guarantees

Verified data is never silently corrupted or misrepresented
Unverifiable data is never presented as valid
Degraded or partial results are explicitly labeled and structured
Archive processing fails closed when required truth cannot be established
Filesystem writes are constrained and cannot escape intended boundaries
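
The last guarantee can be illustrated with a minimal path-containment check. This is a sketch under stated assumptions, not crushr's implementation: `resolve_inside` and its arguments are hypothetical names, and the check shown (resolve the joined path, then confirm it still sits under the extraction root) is one common way to constrain writes.

```python
import os


def resolve_inside(root: str, entry_path: str) -> str:
    """Return the absolute write target for entry_path, or raise if it leaves root.

    Hypothetical helper for illustration; not part of crushr.
    """
    root_real = os.path.realpath(root)
    # Join first, then resolve ".." and symlinks so the check sees the real target.
    # An absolute entry_path replaces root in the join and is rejected below.
    target = os.path.realpath(os.path.join(root_real, entry_path))
    if os.path.commonpath([root_real, target]) != root_real:
        raise ValueError(f"refusing to write outside {root_real}: {entry_path!r}")
    return target
```

A caller would compute the target this way before opening any file for writing, and treat the `ValueError` as a refusal rather than trying to sanitize the path.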

Behavior

Validation vs Verification

crushr enforces a strict separation between:

Validation — structural correctness of archive components
Verification — integrity correctness of data via cryptographic proof (BLAKE3)

No output is considered trustworthy without explicit verification.
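
The separation can be sketched as two independent checks. Everything here is an illustrative assumption: the `Block` layout is hypothetical, and because Python's standard library has no BLAKE3, BLAKE2b with a 32-byte digest is swapped in as a stand-in hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Block:
    # Hypothetical record layout, not crushr's on-disk format.
    declared_len: int      # payload length recorded in the block header
    declared_digest: str   # hex digest recorded alongside the payload
    payload: bytes


def validate(block: Block) -> bool:
    """Structural check only: does the block match its own declared shape?"""
    return block.declared_len == len(block.payload) and len(block.declared_digest) == 64


def verify(block: Block) -> bool:
    """Integrity check: recompute the digest and compare it to the stored one."""
    # Stand-in hash: BLAKE2b here, where crushr uses BLAKE3.
    actual = hashlib.blake2b(block.payload, digest_size=32).hexdigest()
    return actual == block.declared_digest


def trustworthy(block: Block) -> bool:
    # Validation alone never makes output trustworthy; verification must also pass.
    return validate(block) and verify(block)
```

A payload with one changed byte and an unchanged length still passes `validate` but fails `verify`, which is why the two checks are kept apart.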

Output Classification

All extraction and recovery results are classified into explicit trust classes:

canonical — payload integrity is verified and required metadata is intact
metadata_degraded — payload integrity is verified, but metadata or structure is incomplete
recovered_named — payload integrity is verified and identity has been reconstructed within defined constraints
recovered_anonymous — payload integrity is verified but no reliable identity remains
unrecoverable — payload integrity cannot be proven to required standards

These classes reflect the separation of payload integrity from metadata integrity.
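
One way to read the class definitions is as a decision over three facts about an entry. The sketch below is an interpretation, not crushr's logic: the function name, its parameters, and the order in which conditions are tested are all assumptions.

```python
from enum import Enum


class TrustClass(Enum):
    CANONICAL = "canonical"
    METADATA_DEGRADED = "metadata_degraded"
    RECOVERED_NAMED = "recovered_named"
    RECOVERED_ANONYMOUS = "recovered_anonymous"
    UNRECOVERABLE = "unrecoverable"


def classify(payload_verified: bool, metadata_intact: bool, identity: str) -> TrustClass:
    """identity is a hypothetical tag: "original", "reconstructed", or "none"."""
    if identity not in ("original", "reconstructed", "none"):
        raise ValueError(f"unknown identity state: {identity!r}")
    # Payload integrity is decided first; nothing else can rescue an unverified payload.
    if not payload_verified:
        return TrustClass.UNRECOVERABLE
    if identity == "none":
        return TrustClass.RECOVERED_ANONYMOUS
    if identity == "reconstructed":
        return TrustClass.RECOVERED_NAMED
    # Identity is original: only the state of the metadata separates the last two classes.
    return TrustClass.CANONICAL if metadata_intact else TrustClass.METADATA_DEGRADED
```

Note that `metadata_intact` is only consulted after the payload has been verified, which mirrors the separation of payload integrity from metadata integrity.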

Processing Modes

Strict / Default
- Requires verified payload integrity and required metadata
- Refuses output when canonical guarantees cannot be established

Recover
- Allows extraction of data that can be cryptographically verified
- Produces classified output when metadata or structure is incomplete
- Does not reconstruct, infer, or repair missing data
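
The difference between the two modes can be sketched as a gate over already-classified entries. This is an illustration under assumptions: `plan_extraction` is a hypothetical name, classes are passed as plain strings, and the choice to refuse the whole run in strict mode follows the description above rather than any inspected source.

```python
def plan_extraction(mode: str, entry_classes: dict) -> dict:
    """Return the entries to emit (path -> trust class), or refuse.

    Hypothetical helper: entry_classes maps a logical path to its trust class.
    """
    if mode == "strict":
        # Strict mode fails closed: one non-canonical entry refuses the whole run.
        blocked = sorted(p for p, c in entry_classes.items() if c != "canonical")
        if blocked:
            raise RuntimeError(f"strict extraction refused; non-canonical entries: {blocked}")
        return dict(entry_classes)
    if mode == "recover":
        # Recover mode emits only verified data and keeps each entry's class label.
        return {p: c for p, c in entry_classes.items() if c != "unrecoverable"}
    raise ValueError(f"unknown mode: {mode!r}")
```

Nothing in the recover branch changes a class or fills in missing data; it only decides which already-verified entries are emitted.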

Why crushr exists

Most archive tooling assumes intact metadata, trustworthy structure, and clean success/failure outcomes.

crushr is built around the opposite assumption:

data can be partially damaged
structure can be incomplete
metadata can be lost
partial truth still has value

The system makes all outcomes explicit rather than assuming correctness.

What crushr is now

The project provides:

deterministic archive creation with crushr pack
integrity and structure validation with crushr verify
strict extraction with crushr extract
recovery-aware extraction with crushr extract --recover
archive introspection with crushr info (--list, --entry, --find, --propagation)
build and identity details with crushr about
shell completion generation with crushr completion
man page generation with crushr man

crushr archives are identified by format markers, not by filename extension.

The canonical default extension is:

.crs

If no extension is supplied for pack -o, .crs is appended automatically.
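
The default-extension rule can be sketched in a few lines. The helper name is hypothetical, and what crushr counts as "no extension" is an assumption here: the sketch treats a final path component without a suffix as having none.

```python
from pathlib import PurePosixPath

DEFAULT_EXT = ".crs"


def output_name(requested: str) -> str:
    """Append the default extension only when the requested name has none."""
    path = PurePosixPath(requested)
    if path.suffix == "":
        return str(path.with_name(path.name + DEFAULT_EXT))
    # Any extension the user supplied is kept as-is; identification relies on
    # format markers, not on the filename.
    return requested
```

For example, `output_name("backup")` gives `backup.crs`, while `output_name("backup.tar")` is returned unchanged.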

Core design principles

Prove, don't guess

If a path, file identity, or recovery outcome cannot be proven from surviving archive evidence, crushr does not invent certainty.

Separate trust classes explicitly

All output is classified into explicit trust classes rather than presented as uniformly valid.

Fail closed by default

Strict operations refuse when canonical guarantees cannot be met. Recovery is explicit.

Linux-first honesty

crushr is designed for real Linux archival workflows. Other platforms are not allowed to redefine the core metadata model.

Archives should be inspectable

Archives are not opaque containers. Inspection, listing, and metadata visibility are first-class capabilities.

Recovery model

crushr extract is strict by default.

If strict canonical extraction cannot be completed, the command refuses and requires explicit recovery mode:

crushr extract ... → strict canonical extraction only
crushr extract --recover ... → recovery-aware extraction

Recovery-aware extraction separates output by trust class:

canonical/
metadata_degraded/
recovered_named/
_crushr_recovery/anonymous/
_crushr_recovery/manifest.json
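
The layout above amounts to a fixed mapping from trust class to output directory. The following sketch assumes hypothetical names (`CLASS_DIRS`, `destination`) and assumes that an entry's name is simply appended under its class directory.

```python
from pathlib import PurePosixPath

# Directory per trust class, relative to the extraction root (taken from the layout above).
CLASS_DIRS = {
    "canonical": "canonical",
    "metadata_degraded": "metadata_degraded",
    "recovered_named": "recovered_named",
    "recovered_anonymous": "_crushr_recovery/anonymous",
}
MANIFEST_PATH = "_crushr_recovery/manifest.json"


def destination(out_root: str, trust_class: str, name: str) -> str:
    """Return where an entry of the given class would be written."""
    if trust_class not in CLASS_DIRS:
        # Unrecoverable (or unknown) classes have no output directory at all.
        raise ValueError(f"no output location for class {trust_class!r}")
    return str(PurePosixPath(out_root) / CLASS_DIRS[trust_class] / name)
```

Because the mapping is a constant table, the same entry and class always land in the same place, which keeps recovery output deterministic.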

Recovery results are reported explicitly as:

canonical
metadata_degraded
recovered_named
recovered_anonymous
unrecoverable

Anonymous recovered files follow a deterministic naming policy:

high-confidence classification → file_<id>.<ext>
medium-confidence classification → file_<id>.probable-<type>.bin
low/unknown confidence → file_<id>.bin
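
The naming policy can be written as a small pure function. This is a sketch: the function and parameter names are hypothetical, the format of `<id>` is not specified here and is passed through as-is, and falling back to `.bin` when a high- or medium-confidence result lacks an extension or type is an assumption.

```python
def anonymous_name(file_id: str, confidence: str, ext: str = "", kind: str = "") -> str:
    """Build the output name for an anonymous recovered file."""
    if confidence == "high" and ext:
        return f"file_{file_id}.{ext}"
    if confidence == "medium" and kind:
        return f"file_{file_id}.probable-{kind}.bin"
    # Low or unknown confidence (or missing details): claim nothing about the content.
    return f"file_{file_id}.bin"
```

With an illustrative id of `000042`, a high-confidence PNG becomes `file_000042.png`, a medium-confidence one becomes `file_000042.probable-png.bin`, and anything else becomes `file_000042.bin`.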

The recovery manifest preserves structured classification and identity metadata for all recovered outputs.

Linux-first preservation model

crushr's foundational model is the separation of payload integrity from metadata integrity. Tar-style preservation is layered onto that foundation.

Preservation profiles

crushr pack supports explicit archive preservation contracts:

--preservation full (default)
--preservation basic
--preservation payload-only

The selected preservation profile is recorded in archive metadata and shown by crushr info.

full

Preserves the complete Linux-first metadata and entry-kind set currently supported.

basic

Preserves regular files, directories, empty directories, symlinks, hard links, mode, mtime, and sparse semantics.

Intentionally omits:

xattrs
uid/gid
uname/gname
ACLs
SELinux labels
Linux capabilities
FIFOs
device nodes

payload-only

Preserves only regular-file payload bytes, plus the directories needed to reconstruct the logical tree.

Intentionally omits:

symlink semantics
hard link semantics
mode
mtime
sparse semantics
xattrs
ownership
ACLs
SELinux labels
Linux capabilities
FIFOs
device nodes

If a selected profile excludes an entry kind, crushr warns and omits it rather than fabricating an alternate representation.
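
The warn-and-omit rule can be sketched as a filter over entry kinds. The kind names and the `PROFILE_KINDS` table are assumptions derived from the profile descriptions above; hard links are left out of the model because they are a semantic of regular files rather than a separate entry kind.

```python
# Entry kinds each profile keeps (hypothetical kind names, derived from the lists above).
PROFILE_KINDS = {
    "full": {"regular", "directory", "symlink", "fifo", "char_device", "block_device"},
    "basic": {"regular", "directory", "symlink"},
    "payload-only": {"regular", "directory"},
}


def select_entries(profile: str, entries: list) -> tuple:
    """Split (path, kind) pairs into kept entries and warnings for omitted ones."""
    allowed = PROFILE_KINDS[profile]
    kept, warnings = [], []
    for path, kind in entries:
        if kind in allowed:
            kept.append((path, kind))
        else:
            # Omit rather than store the entry as something it is not.
            warnings.append(f"omitted {path}: {kind} is excluded by --preservation {profile}")
    return kept, warnings
```

An excluded FIFO, for instance, produces a warning and no archive entry; it is never stored as an empty regular file.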

Current Linux-first preservation scope

With --preservation full, crushr currently preserves:

regular files
directories
empty directories
symlinks and link targets
hard links
sparse files
FIFOs
char/block device nodes
file mode / permissions
modification time (mtime)
extended attributes (xattrs)
numeric ownership (uid / gid)
optional ownership names (uname / gname) when available
POSIX ACL metadata (system.posix_acl_access, system.posix_acl_default)
SELinux label metadata (security.selinux)
Linux file capability metadata (security.capability)

Where preservation or restoration cannot be applied due to platform or permission constraints, crushr degrades honestly and warns rather than silently pretending success.

Long-term preservation goal

crushr aims to support Linux-first archive fidelity suitable for serious tar-based workflows.

This is a staged goal. Full parity is not implied.

Archive introspection

crushr archives are inspectable without extraction.

crushr info provides archive-level introspection: structure, preservation profile, and declared metadata scope
crushr info --list provides entry-level introspection: listing, classification, and attributes without extraction
crushr info --entry <logical/path> provides exact-path truth for one entry without extraction
crushr info --find <query> provides deterministic substring search over stable logical identities
crushr info --propagation provides dependency and impact visibility for detected corruption paths

Current behavior is fail-closed:

if structure can be proven, it is reported
if required metadata is missing, structure is not invented
directory views are derived from stored logical paths

Shell completions

Generate shell completion scripts directly from the clap command model:

crushr completion bash
crushr completion zsh
crushr completion fish

The command prints the completion script to stdout and does not write files.

Examples:

Bash: crushr completion bash > /etc/bash_completion.d/crushr
Zsh: crushr completion zsh > "${fpath[1]}/_crushr"
Fish: crushr completion fish > ~/.config/fish/completions/crushr.fish

Man pages

Generate man pages directly from the clap command model:

crushr man
crushr man --out-dir ./man

Example:

crushr man --out-dir ./man
man ./man/crushr.1

Security and assurance

crushr publishes a self-assessed security and assurance set covering:

threat model
integrity guarantees
verification semantics
architectural invariants
control and audit documents

crushr's design is aligned with ISO/IEC 27001 control principles for the relevant controls. This alignment is self-assessed.

This is not a certification claim.

Documentation

Public material lives under docs/.

Primary entry points:

docs/index.md — site landing page
docs/why-crushr.md — positioning
docs/whitepaper/index.md — technical whitepaper
docs/reference/index.md — concise technical reference
docs/chronicles/index.md — historical development

Canonical behavior and guarantees are defined by this README and the security documentation.

Internal project control

Internal planning and control material exists under:

.ai/
.ai/contracts/

These are not part of the public documentation surface.

Product boundary

Stable product surface: info, extract, verify, pack, about, completion, man
Bounded internal surface: workspace Rust crates/modules
Experimental/lab-only surface: crushr lab and research tooling

crushr lab is an internal development harness used to evaluate potential features under controlled conditions.

It is not part of the user-facing product surface and is not covered by stability guarantees. Only behavior that demonstrates clear value and aligns with crushr’s guarantees is promoted into the canonical system.

CLI presentation

Commands share a consistent presentation model:

structured output
consistent terminology
deterministic summaries
restrained motion for active operations

Silent/scriptable mode

Script-oriented paths emit concise deterministic output suitable for automation.

Evidence-oriented workflow

Data is collected
Archives are created
verify establishes integrity and strict-extraction viability
extract returns canonical output when possible
extract --recover returns classified output when required
Results remain reproducible and verifiable

Roadmap direction

complete Linux-first preservation semantics
expand archive introspection
deepen structural visibility
begin compression benchmarking once semantics stabilize
explore reproducible archive modes

Product version governance

VERSION is the canonical version source
update VERSION, then run sync scripts
validate with version checks

License

Code is dual-licensed under MIT OR Apache-2.0.

Documentation and diagrams are licensed under CC-BY-4.0.

The repository is REUSE-compliant and carries SPDX metadata.
