# s3stor

s3stor is a command-line tool for backing up files to S3-compatible storage (e.g., AWS S3, Wasabi) with block-based deduplication, point-in-time snapshots, and efficient file management. Designed for reliability and multi-writer safety, it supports syncing files, creating snapshots (using the Volume Shadow Copy Service on Windows), listing and restoring files, and cleaning up unused data. It is well suited to backup scenarios that require data integrity and storage efficiency.
## Quick Start

```sh
# Install Go (https://go.dev/doc/install)
git clone <your-repo-url>
cd s3stor
go build -o s3stor

# Configure Wasabi (or other S3-compatible storage)
export S3_PROVIDER=wasabi
export S3_BUCKET=your-bucket-name
export S3_REGION=us-east-1
export S3_ENDPOINT=https://s3.us-east-1.wasabisys.com
export AWS_ACCESS_KEY_ID=your-wasabi-access-key
export AWS_SECRET_ACCESS_KEY=your-wasabi-secret-key
```
```sh
# Sync a file to S3
./s3stor sync test_out/file1.txt
# Output: Synced file1.txt (123 bytes)

# Create a snapshot
./s3stor snapshot test_out sn001 file1.txt
# Output: Snapshot sandow-sn001 created with 1 files

# List files in the snapshot
./s3stor ls sandow-sn001
# Output: Files in snapshot sandow-sn001 (created 2025-07-27T22:50:00Z by sandow):
# - file1.txt (123 bytes)

# Restore a file from the snapshot
./s3stor get sandow-sn001 file1.txt ./restore
# Output: File reconstructed to: ./restore/file1.txt

# Delete a file from the global catalog
./s3stor delete file1.txt
# Output: Deleted file: file1.txt
# Block cleanup completed: 0 blocks deleted
```

Jump to Usage for more examples or Architecture for how it works.
## Table of Contents

- Features
- Architecture
- Installation
- Configuration
- Usage
- Examples
- S3 Bucket Structure
- Locking Mechanism
- Troubleshooting
- Contributing
- License
## Features

- Block-Based Deduplication: Splits files into blocks, stores unique blocks by SHA-256 hash, and reuses them across files and snapshots to save storage.
- Point-in-Time Snapshots: Creates consistent backups using Volume Shadow Copy Service (VSS) on Windows, with independent file maps for each snapshot.
- Multi-Writer Safety: Uses S3-based locking to prevent conflicts when multiple instances (e.g., on different machines) access the same bucket.
- File Management:
  - `sync`: Upload files to S3 with deduplication, creating the global catalog if missing.
  - `ls`: List files in the global catalog or a snapshot, creating the global catalog if missing.
  - `get`: Restore files from snapshots or the global catalog.
  - `map`: Display block mappings for a file in the global catalog or a snapshot.
  - `snapshot`: Create snapshots of specified files.
  - `delete-snapshot`: Remove snapshots and their metadata.
  - `delete`: Remove files from the global catalog with safe block cleanup.
  - `cleanup-blocks`: Remove unreferenced blocks to reclaim storage.
- S3 Compatibility: Works with AWS S3, Wasabi, and other S3-compatible providers.
- Efficient Cleanup: Safely deletes unreferenced blocks only after checking all file maps (global and snapshot).
## Architecture

s3stor organizes data in an S3 bucket using a structured layout, with separate catalogs for global files and snapshots, deduplicated block storage, and a locking mechanism for concurrency.
- Global Catalog (`catalog.json`):
  - Stores metadata for files synced via `sync`.
  - Automatically created as an empty catalog (`[]`) on first `sync` or `ls` if not found.
  - Format: JSON array of entries:
    ```json
    [
      { "file_name": "file1.txt", "file_size": 123, "map_key": "maps/file1.txt.json" },
      { "file_name": "d001/f005.txt", "file_size": 456, "map_key": "maps/d001/f005.txt.json" }
    ]
    ```
    `map_key` points to a file map listing block hashes.
- File Maps (`maps/<file_name>.json`):
  - For each file in the global catalog, stores metadata and a list of SHA-256 block hashes:
    ```json
    { "file_name": "file1.txt", "file_size": 123, "block_size": 1048576, "blocks": ["a1b2c3d4...", "e5f6g7h8..."] }
    ```
  - Blocks are stored in `blocks/<hash>`.
- Snapshot Catalog (`<hostname>/snapshots/<snapshot_id>/catalog.json`):
  - Created by `snapshot`; stores metadata for files in a snapshot (e.g., `sandow/snapshots/sandow-sn001`).
  - Format: JSON object:
    ```json
    {
      "snapshot_id": "sandow-sn001",
      "timestamp": "2025-07-27T22:50:00Z",
      "computer_id": "sandow",
      "files": [
        { "file_name": "file1.txt", "file_size": 123, "map_key": "sandow/snapshots/sandow-sn001/maps/file1.txt.json" }
      ]
    }
    ```
  - Independent of the global catalog, with separate file maps.
- Snapshot File Maps (`<hostname>/snapshots/<snapshot_id>/maps/<file_name>.json`):
  - Similar to global file maps; lists block hashes for snapshot files.
  - Ensures snapshots are self-contained and unaffected by global catalog changes.
- Block Storage (`blocks/<hash>`):
  - Stores unique file blocks, identified by SHA-256 hashes.
  - Deduplication ensures identical blocks are stored only once and referenced by multiple file maps.
- Locks (`locks/global/<resource>.lock`, `locks/<hostname>/snapshots/<snapshot_id>/<resource>.lock`):
  - S3 objects used for concurrency control (e.g., `locks/global/catalog.lock`, `locks/global/file1.txt.lock`).
  - Prevent race conditions in multi-writer scenarios (e.g., multiple `s3stor` instances).
  - Automatically expire via an S3 lifecycle policy (1-day retention).
Command workflows:

- Sync:
  - Read the local file, split it into blocks, and compute SHA-256 hashes.
  - Upload new blocks to `blocks/<hash>` if not already present.
  - Create a file map (`maps/<file_name>.json`) listing the block hashes.
  - Create or update `catalog.json` with the file metadata.
- Snapshot:
  - Use VSS (Windows) for consistent file access.
  - Create a snapshot catalog (`<hostname>/snapshots/<snapshot_id>/catalog.json`).
  - Copy or create file maps in `<hostname>/snapshots/<snapshot_id>/maps/`.
  - Reuse existing blocks in `blocks/<hash>`.
- Delete:
  - Remove the file from `catalog.json` and delete its file map.
  - Clean up unreferenced blocks by checking all file maps (global and snapshot).
- Get:
  - Read the file map (from the global catalog or a snapshot) to get the block hashes.
  - Download the blocks from `blocks/<hash>`.
  - Reconstruct the file locally.
Deduplication:

- Files are split into fixed-size blocks (default: 1 MB).
- Each block's SHA-256 hash is computed, and the block is stored at `blocks/<hash>`.
- File maps reference these blocks, enabling deduplication across files and snapshots.
- Example: if `file1.txt` and `file2.txt` share a block, it is stored once at `blocks/a1b2c3d4...` and referenced by both file maps.
## Installation

- Install Go:
  - Download and install Go (version 1.16+): https://go.dev/doc/install.
- Clone the repository:
  ```sh
  git clone <your-repo-url>
  cd s3stor
  ```
- Build:
  ```sh
  go build -o s3stor
  ```
- Verify:
  ```sh
  ./s3stor
  # Output: Usage: go run main.go <sync|ls|get|map|snapshot|delete-snapshot|cleanup-blocks|delete> [args...]
  ```
## Configuration

s3stor uses environment variables for S3 configuration. Example for Wasabi:

```sh
export S3_PROVIDER=wasabi
export S3_BUCKET=your-bucket-name
export S3_REGION=us-east-1
export S3_ENDPOINT=https://s3.us-east-1.wasabisys.com
export AWS_ACCESS_KEY_ID=your-wasabi-access-key
export AWS_SECRET_ACCESS_KEY=your-wasabi-secret-key
```

Ensure your S3 credentials allow:

```json
{
  "Effect": "Allow",
  "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
  "Resource": ["arn:aws:s3:::your-bucket-name/*", "arn:aws:s3:::your-bucket-name"]
}
```

Set an S3 lifecycle policy to expire locks after 1 day:
```sh
aws s3api put-bucket-lifecycle-configuration --bucket your-bucket-name --lifecycle-configuration '{
  "Rules": [{
    "ID": "CleanLocks",
    "Status": "Enabled",
    "Filter": {"Prefix": "locks/"},
    "Expiration": {"Days": 1}
  }]
}'
```

## Usage

```sh
s3stor <command> [args...]
```

- `sync <file_or_dir>`:
  - Uploads files to S3 with deduplication, creating the global catalog if missing.
  - Example: `./s3stor sync test_out/file1.txt`
- `ls [snapshot_id]`:
  - Lists files in the global catalog (creating an empty catalog if missing) or in a specific snapshot.
  - Example: `./s3stor ls` or `./s3stor ls sandow-sn001`
- `get [<snapshot_id>] <file_name> <output_dir>`:
  - Restores a file from a snapshot (if `snapshot_id` is provided) or from the global catalog.
  - Example: `./s3stor get sandow-sn001 file1.txt ./restore` or `./s3stor get file1.txt ./restore`
- `map [<snapshot_id>] <file_name>`:
  - Displays block mappings for a file in the global catalog or a snapshot (if `snapshot_id` is provided).
  - Example: `./s3stor map file1.txt` or `./s3stor map sandow-sn001 file1.txt`
- `snapshot <source_dir> <snapshot_id> [file_names...]`:
  - Creates a snapshot of the specified files using VSS (Windows).
  - Example: `./s3stor snapshot test_out sn001 file1.txt`
- `delete-snapshot <snapshot_id>`:
  - Deletes a snapshot and its metadata, with block cleanup.
  - Example: `./s3stor delete-snapshot sandow-sn001`
- `cleanup-blocks`:
  - Removes unreferenced blocks after checking all file maps.
  - Example: `./s3stor cleanup-blocks`
- `delete <file_name>`:
  - Removes a file from the global catalog, with block cleanup.
  - Example: `./s3stor delete file1.txt`
## Examples

Upload a file and a directory to S3:

```sh
./s3stor sync test_out/file1.txt
# Output: Synced file1.txt (123 bytes)
./s3stor sync test_out/d001
# Output: Synced d001/f005.txt (456 bytes)
```

Create a snapshot of specific files:

```sh
./s3stor snapshot test_out sn001 file1.txt d001/f005.txt
# Output: Snapshot sandow-sn001 created with 2 files
```

List files in the global catalog (creates an empty catalog if none exists):

```sh
./s3stor ls
# Output: Files in global catalog:
# - file1.txt (123 bytes)
# - d001/f005.txt (456 bytes)
# If no catalog exists:
# Output: Files in global catalog:
# (none)
```

List files in a snapshot:

```sh
./s3stor ls sandow-sn001
# Output: Files in snapshot sandow-sn001 (created 2025-07-27T22:50:00Z by sandow):
# - file1.txt (123 bytes)
# - d001/f005.txt (456 bytes)
```

View block mappings for a file in the global catalog:

```sh
./s3stor map file1.txt
# Output: File Map for file1.txt:
# File Name: file1.txt
# File Size: 123 bytes
# Block Size: 1048576 bytes
# Blocks:
# 1: a1b2c3d4...
# 2: e5f6g7h8...
```

View block mappings for a file in a snapshot:

```sh
./s3stor map sandow-sn001 file1.txt
# Output: File Map for file1.txt:
# File Name: file1.txt
# File Size: 123 bytes
# Block Size: 1048576 bytes
# Blocks:
# 1: a1b2c3d4...
# 2: e5f6g7h8...
```

Restore a file from a snapshot:

```sh
./s3stor get sandow-sn001 file1.txt ./restore
# Output: File reconstructed to: ./restore/file1.txt
```

Restore a file from the global catalog:

```sh
./s3stor get file1.txt ./restore
# Output: File reconstructed to: ./restore/file1.txt
```

Remove a file from the global catalog:

```sh
./s3stor delete file1.txt
# Output: Deleted file: file1.txt
# Block cleanup completed: 0 blocks deleted
```

Remove a snapshot:

```sh
./s3stor delete-snapshot sandow-sn001
# Output: Snapshot sandow-sn001 deleted
```

Manually clean unreferenced blocks:

```sh
./s3stor cleanup-blocks
# Output: Block cleanup completed: 2 blocks deleted
```

## S3 Bucket Structure

After running the commands above, your bucket (`your-bucket-name`) will contain:
```
your-bucket-name/
├── catalog.json
├── maps/
│   ├── file1.txt.json
│   ├── d001/f005.txt.json
├── blocks/
│   ├── a1b2c3d4...
│   ├── e5f6g7h8...
├── <hostname>/
│   ├── snapshots/
│   │   ├── sandow-sn001/
│   │   │   ├── catalog.json
│   │   │   ├── maps/
│   │   │   │   ├── file1.txt.json
│   │   │   │   ├── d001/f005.txt.json
├── locks/
│   ├── global/
│   │   ├── catalog.lock
│   │   ├── file1.txt.lock
│   │   ├── cleanup.lock
│   ├── <hostname>/
│   │   ├── snapshots/
│   │   │   ├── sandow-sn001/
│   │   │   │   ├── file1.txt.lock
```
## Locking Mechanism

- Purpose: Ensures multi-writer safety (e.g., multiple `s3stor` instances on `sandow` or other machines).
- Implementation: S3 objects (`locks/global/<resource>.lock`, `locks/<hostname>/snapshots/<snapshot_id>/<resource>.lock`) act as mutexes.
  - Example: `locks/global/catalog.lock` for global catalog updates.
  - Example: `locks/sandow/snapshots/sandow-sn001/file1.txt.lock` for snapshot file operations.
- Acquisition:
  - Attempts to write the lock object with a unique owner (e.g., the hostname `sandow`).
  - Retries (default: 3 attempts) if the lock is held by another instance.
- Expiration: Locks expire after 1 day via the S3 lifecycle policy, preventing deadlocks.
- Commands using locks: `sync`, `delete`, `snapshot`, `delete-snapshot`, `cleanup-blocks`.
## Troubleshooting

- Snapshot Creates 0 Files:
  - Cause: Files not found in `source_dir`, VSS access denied, or lock conflicts.
  - Fix:
    - Verify the files exist: `ls test_out/file1.txt`.
    - Check VSS permissions (Windows): run as administrator.
    - List locks: `aws s3 ls s3://your-bucket-name/locks/`.
    - Remove stuck locks: `aws s3 rm s3://your-bucket-name/locks/global/file1.txt.lock`.
- File Not Found in Catalog:
  - Cause: File not synced, or already deleted.
  - Fix: Run `./s3stor ls` to check the catalog, then `sync` the file.
- Global Catalog Not Found:
  - Cause: No prior `sync` or `ls` command has been executed.
  - Fix: Run `./s3stor ls` or `./s3stor sync <file>` to create an empty catalog.
- Lock Acquisition Fails:
  - Cause: Another instance holds the lock.
  - Fix: Wait and retry, or increase `maxLockRetries` in the code (default: 3).
- S3 Permission Errors:
  - Cause: Insufficient IAM permissions.
  - Fix: Update the policy with the required actions (`PutObject`, `GetObject`, `DeleteObject`, `ListBucket`).
- Blocks Not Cleaned Up:
  - Cause: Eventual consistency in S3, or recent snapshot creation.
  - Fix: Retry `cleanup-blocks`, or add a delay (e.g., `time.Sleep(1 * time.Second)` in `deleteFile`).
## Contributing

- Fork the repository and submit pull requests.
- Report issues or suggest features via GitHub Issues.
- Enhance features:
  - Add `--dry-run` for `delete` and `cleanup-blocks`.
  - Support multiple file deletions: `./s3stor delete file1.txt file2.txt`.
  - Parallelize block cleanup for large buckets.
  - Add a man page: `man s3stor`.
## License

MIT License. See LICENSE for details.