mcat: cat on steroids, a drop-in cat replacement that understands Parquet, Avro, CSV, JSONL, and remote sources.

cat is everywhere, but it can't read Parquet or Avro. The existing tools (parquet-cli, avro-tools) are heavy Java dependencies that take ages to install. mcat is a single `pip install` (or `uv tool install`) that just works: all GNU cat flags, plus structured format support and remote sources out of the box.
```bash
# With uv (recommended)
uv tool install mcat

# Or with pip
pip install mcat

# With Homebrew
brew tap christyjacob4/tap
brew install mcat
```

mcat works exactly like cat; all the same flags work:
```bash
mcat file.txt        # Same as cat
mcat -n file.txt     # Number lines
mcat -b -s file.txt  # Number non-blank, squeeze blanks
mcat -A file.txt     # Show all (tabs, ends, non-printing)
echo "hello" | mcat  # Stdin passthrough
```

But it also understands structured data:
```bash
mcat data.parquet                               # Pretty table output
mcat data.parquet --format jsonl                # As JSON Lines
mcat data.csv                                   # CSV as table
mcat data.jsonl --head 10                       # First 10 records
mcat data.parquet --schema                      # Print schema only
mcat data.parquet --columns name,age            # Select columns
mcat data.parquet --grep "Smith"                # Rows where any column matches "Smith"
mcat data.csv --grep "^A" --columns name        # Names starting with A
mcat data.parquet --grep "2024" --format jsonl  # Rows mentioning 2024
mcat data.parquet --count                       # Row count (instant for Parquet)
mcat data.parquet --sample 10                   # Random 10 rows
mcat data.csv --sample 5 --format jsonl         # 5 random rows as JSONL
mcat data.parquet --detect                      # Print detected format
mcat data.parquet --sort age                    # Sort ascending by age
mcat data.parquet --sort -age                   # Sort descending by age
mcat data.csv --sort "region,-sales"            # Multi-column sort
mcat data.csv --sort name --head 10             # Sort + head
```

Comparing two structured files:
```bash
mcat --diff old.csv new.csv
mcat --diff prod.parquet staging.parquet --columns name,age
```

Column statistics (instant for Parquet; reads metadata only):
```bash
mcat --stats data.parquet
mcat --stats --columns age,salary data.parquet  # specific columns only
```

Transparent compression (gzip, zstd, bz2, lz4, xz all work):
```bash
mcat data.parquet.gz
mcat s3://bucket/logs.jsonl.zst --head 100
mcat data.csv.bz2 --stats
```

And remote sources (streaming, no full download):
```bash
mcat s3://bucket/data.parquet
mcat gs://bucket/data.parquet
mcat https://example.com/data.csv

# S3-compatible storage (MinIO, Cloudflare R2, Backblaze B2, DigitalOcean Spaces)
mcat --s3-endpoint https://play.min.io s3://mybucket/data.parquet
```

Format conversion with `--output`:
```bash
mcat data.parquet --format jsonl --output data.jsonl
mcat data.csv --format jsonl --output data.jsonl
```

Pager support for large output (respects `$PAGER`, defaults to `less -R`):
```bash
mcat large_data.parquet --pager         # view in pager
mcat data.csv --pager                   # page through CSV table
PAGER="more" mcat data.parquet --pager  # use 'more' instead of 'less'
```

| Flag | Short | Description |
|---|---|---|
| `--number` | `-n` | Number all output lines |
| `--number-nonblank` | `-b` | Number non-blank lines only |
| `--squeeze-blank` | `-s` | Squeeze multiple blank lines |
| `--show-all` | `-A` | Equivalent to `-vET` |
| `--show-ends` | `-E` | Display `$` at end of each line |
| `--show-tabs` | `-T` | Display TAB as `^I` |
| `--show-nonprinting` | `-v` | Use `^` and `M-` notation |
| | `-e` | Equivalent to `-vE` |
| | `-t` | Equivalent to `-vT` |
| `--format` | | Output format: `table`, `jsonl`, `csv`, or `raw` |
| `--head` | | Show first N rows |
| `--tail` | | Show last N rows |
| `--schema` | | Print schema only |
| `--columns` | | Comma-separated column names |
| `--grep` | | Filter rows where any column matches pattern (regex) |
| `--sample` | | Random sample of N rows |
| `--count` | `-c` | Print row count only |
| `--sort` | | Sort by column(s); prefix with `-` for descending |
| `--query` | | Filter with SQL WHERE clause (powered by DuckDB) |
| `--stats` | | Print column statistics summary |
| `--diff` | | Compare two structured files side by side |
| `--detect` | | Print detected format and exit |
| `--output` | `-o` | Write output to file instead of stdout |
| `--pager` | | Pipe output through pager (less/more) |
| `--s3-endpoint` | | Custom S3 endpoint URL (MinIO, R2, B2, Spaces) |
| `--version` | `-V` | Show version |
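For illustration, the `--sort` spec (comma-separated columns, `-` prefix for descending) could be handled as in the sketch below. This is hypothetical code, not mcat's actual implementation:

```python
from operator import itemgetter

def sort_records(records, spec):
    """Sort a list of dicts in place by a spec like 'region,-sales'."""
    # Apply keys right-to-left; Python's sort is stable, so the first
    # column in the spec ends up taking precedence.
    for col in reversed(spec.split(",")):
        descending = col.startswith("-")
        name = col.lstrip("-")
        records.sort(key=itemgetter(name), reverse=descending)
    return records

rows = [{"region": "EU", "sales": 5},
        {"region": "US", "sales": 9},
        {"region": "EU", "sales": 7}]
sort_records(rows, "region,-sales")
# rows is now sorted by region ascending, then sales descending
```

Relying on sort stability keeps each pass simple and avoids building a single composite key that has to negate non-numeric values.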
| Format | Extensions | Features |
|---|---|---|
| Parquet | `.parquet`, `.pq` | Stream row groups, schema inspect |
| Avro | `.avro` | Stream blocks |
| JSONL | `.jsonl`, `.ndjson` | Pretty-print each record |
| CSV | `.csv` | Table with headers |
| TSV | `.tsv` | Table with headers |
| Excel | `.xlsx`, `.xls` | First sheet |
| JSON | `.json` | Array of objects or single object |
Formats are detected by extension first, then by magic bytes (`PAR1`, `Obj\x01`) as a fallback.
Use `--format` to control output:

- `table` (default): Rich formatted table
- `jsonl`: one JSON object per line
- `csv`: CSV with headers
- `raw`: Python repr
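As a rough illustration of how the non-table formats map a list of records to text (`render` is a hypothetical helper; the real table output is produced by the Rich library):

```python
import csv
import io
import json

def render(records, fmt="jsonl"):
    """Render a list of dicts in one of the text output formats."""
    if fmt == "jsonl":
        return "\n".join(json.dumps(r) for r in records)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "raw":
        return repr(records)
    raise ValueError(f"unknown format: {fmt}")
```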
mcat uses zero-config auth: it piggybacks on credentials you've already set up for your cloud provider. No mcat-specific credential flags are needed.
```bash
aws configure  # one-time setup; works everywhere
mcat s3://my-bucket/data.parquet
```

All standard AWS auth methods work automatically: `~/.aws/credentials`, env vars (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`), named profiles (`AWS_PROFILE`), IAM roles, SSO, etc.
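That zero-config behavior comes from boto3's standard credential chain. A simplified, stdlib-only sketch of its first two stops (env vars, then the shared credentials file); the real chain continues through named profiles, SSO, and IAM roles:

```python
import configparser
import os

def find_aws_credentials(credfile="~/.aws/credentials", profile="default"):
    """Simplified sketch of credential lookup: env vars win, then the
    shared credentials file; None means 'keep walking the chain'."""
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"access_key": key, "secret_key": secret}
    config = configparser.ConfigParser()
    config.read(os.path.expanduser(credfile))
    if config.has_section(profile):
        return {"access_key": config[profile]["aws_access_key_id"],
                "secret_key": config[profile]["aws_secret_access_key"]}
    return None  # IAM roles, SSO, etc. in the real chain
```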
```bash
gcloud auth application-default login  # one-time setup
mcat gs://my-bucket/data.parquet
```

Also supports `GOOGLE_APPLICATION_CREDENTIALS` for service account keys.
```bash
# Set env vars once
export AZURE_STORAGE_ACCOUNT_NAME=myaccount
export AZURE_STORAGE_ACCOUNT_KEY=...
mcat az://mycontainer/data.parquet
```

Also works with `az login` and DefaultAzureCredential.
```bash
# Option 1: AWS_ENDPOINT_URL env var (recommended; official since boto3/botocore 1.29)
export AWS_ENDPOINT_URL=https://play.min.io
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
mcat s3://mybucket/data.parquet

# Option 2: Named profile in ~/.aws/config
# [profile minio]
# endpoint_url = https://play.min.io
# aws_access_key_id = minioadmin
# aws_secret_access_key = minioadmin
AWS_PROFILE=minio mcat s3://mybucket/data.parquet

# Option 3: Per-command --s3-endpoint override
mcat --s3-endpoint https://play.min.io s3://mybucket/data.parquet
```

License: MIT