A command-line tool that filters JSON, CSV, and log files. One tool, one syntax, any format.
Instead of juggling grep, awk, and jq with different syntax for each, flowfilter lets you write a single query that works across all three formats. The query language borrows from SQL — not because it talks to a database, but because most developers already know how WHERE and SELECT work.
You feed it a file (or pipe data via stdin), and write a query that references the field names in your data. That's the key idea — the field names in the query come directly from the fields in your JSON objects, CSV column headers, or parsed log fields.
Here's a JSON file with user records. Each record has fields called name, age, role, and active:
{"name": "Alice", "age": 32, "role": "admin", "active": true}
{"name": "Bob", "age": 24, "role": "user", "active": true}
{"name": "Charlie", "age": 45, "role": "admin", "active": false}
{"name": "Diana", "age": 28, "role": "user", "active": true}
{"name": "Eve", "age": 19, "role": "user", "active": false}To filter this, you reference those field names in a WHERE clause:
```shell
$ flowfilter 'WHERE age > 25 AND active = true' users.json
{"active":true,"age":32,"name":"Alice","role":"admin"}
{"active":true,"age":28,"name":"Diana","role":"user"}
```

age, active, name, role — those aren't keywords. They're the field names from the JSON above. If your data had a field called price, you'd write WHERE price > 100.
You can also pick which fields to include in the output with SELECT:
```shell
$ flowfilter 'WHERE age > 25 AND active = true SELECT name, age' users.json
{"age":32,"name":"Alice"}
{"age":28,"name":"Diana"}
```

Or just count the matches:
```shell
$ flowfilter 'WHERE role = "admin"' --count users.json
2
```

The exact same query style works on CSV files. Here, the field names come from the CSV column headers (product, price, quantity, category):
```csv
product,price,quantity,category
Widget A,29.99,100,electronics
Widget B,149.50,25,electronics
Gadget D,75.00,50,accessories
Service E,199.99,10,services
```

```shell
$ flowfilter 'WHERE price > 50' sales.csv
product,price,quantity,category
Widget B,149.5,25,electronics
Gadget D,75,50,accessories
Service E,199.99,10,services
```

And log files work too. flowfilter parses Apache/Nginx logs into fields like method, path, status, remote_host, etc., so you can query them by name:
```
192.168.1.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
10.0.0.1 - - [10/Oct/2024:13:57:01 +0000] "GET /favicon.ico HTTP/1.1" 404 0
192.168.1.1 - - [10/Oct/2024:13:58:22 +0000] "DELETE /api/users/5 HTTP/1.1" 500 128
```

```shell
$ flowfilter 'WHERE status >= 400 SELECT method, path, status' access.log
{"method":"GET","path":"/favicon.ico","status":404}
{"method":"DELETE","path":"/api/users/5","status":500}
```

You don't need to tell flowfilter what format your file is — it auto-detects from the content and file extension.
You'll need Rust 1.94 or later.

```shell
git clone https://github.com/dragonGR/flowfilter.git
cd flowfilter
cargo build --release
cp target/release/flowfilter ~/.local/bin/   # or wherever you keep binaries
```

The query language is intentionally simple. If you've written a SQL WHERE clause, you already know how to use it.
```shell
flowfilter 'WHERE field = value'
flowfilter 'WHERE age > 25'
flowfilter 'WHERE name = "Alice"'
flowfilter 'WHERE active = true'
```

| Operator | Example | Description |
|---|---|---|
| `=` | `WHERE status = "ok"` | Equal |
| `!=` | `WHERE status != "error"` | Not equal |
| `>` `<` `>=` `<=` | `WHERE age >= 18` | Numeric/string comparison |
| `LIKE` | `WHERE name LIKE "A%"` | Pattern match (`%` = any, `_` = single char) |
| `IN` | `WHERE status IN ("ok", "pending")` | Match any value in list |
| `IS NULL` | `WHERE email IS NULL` | Field is null or missing |
| `IS NOT NULL` | `WHERE email IS NOT NULL` | Field exists and isn't null |
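The LIKE wildcards above can be implemented with a short recursive matcher. This is an illustrative sketch, not flowfilter's actual code:

```rust
// Sketch of SQL-style LIKE matching: '%' matches any run of characters
// (including none), '_' matches exactly one character.
fn like_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();

    fn go(p: &[char], t: &[char]) -> bool {
        match p.first() {
            None => t.is_empty(),
            // '%' can absorb zero or more characters; try every split point.
            Some('%') => (0..=t.len()).any(|i| go(&p[1..], &t[i..])),
            // '_' consumes exactly one character.
            Some('_') => !t.is_empty() && go(&p[1..], &t[1..]),
            // Anything else must match literally.
            Some(c) => t.first() == Some(c) && go(&p[1..], &t[1..]),
        }
    }

    go(&p, &t)
}

fn main() {
    assert!(like_match("A%", "Alice"));
    assert!(like_match("_ob", "Bob"));
    assert!(!like_match("A%", "Bob"));
}
```

The backtracking on `%` is exponential in the worst case; a production matcher would compile the pattern to a regex or a DFA instead.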
```shell
# AND - both must be true
flowfilter 'WHERE age > 25 AND active = true'

# OR - either can be true
flowfilter 'WHERE role = "admin" OR role = "superuser"'

# NOT - negate a condition
flowfilter 'WHERE NOT status = "deleted"'

# Parentheses for grouping
flowfilter 'WHERE (age > 25 OR vip = true) AND active = true'
```

Use SELECT to pick specific fields from the output instead of getting the whole record back:
```shell
flowfilter 'WHERE age > 25 SELECT name, email'
flowfilter 'SELECT name, age'   # no filter, just project fields
```

Dot notation works for nested objects:
```shell
flowfilter 'WHERE user.address.city = "New York"'
flowfilter 'WHERE user.address.city = "NYC" SELECT user.name, user.address.zip'
```

Handles both newline-delimited JSON (one object per line) and JSON arrays. Auto-detected.
```shell
# NDJSON
cat records.jsonl | flowfilter 'WHERE status = "active"'

# JSON array
flowfilter 'WHERE id > 100' data.json
```

Auto-detects headers from the first row. Supports custom delimiters and headerless files.
```shell
# Standard CSV
flowfilter 'WHERE price > 50' products.csv

# Tab-separated
flowfilter 'WHERE score >= 90' --delimiter $'\t' results.tsv

# No header row (fields become col0, col1, col2...)
flowfilter 'WHERE col1 > 25' --no-header data.csv
```

Note: CSV fields are strings internally. When you compare a field to a number (`WHERE price > 50`), flowfilter automatically tries to parse the string as a number, so numeric comparisons behave the way you'd expect rather than falling into lexicographic string order.
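One plausible way to implement that coercion is to try parsing both sides as `f64` and fall back to string comparison when parsing fails. A sketch; flowfilter's actual parsing rules aren't shown in this README:

```rust
// Sketch of numeric coercion for CSV comparisons (assumption: this
// approximates flowfilter's behavior; its real implementation may differ).
// If both sides parse as f64, compare numerically; otherwise compare
// lexicographically as strings.
fn field_gt(field: &str, literal: &str) -> bool {
    match (field.parse::<f64>(), literal.parse::<f64>()) {
        (Ok(a), Ok(b)) => a > b,
        _ => field > literal, // string fallback
    }
}

fn main() {
    assert!(field_gt("149.50", "50"));  // numeric: 149.5 > 50
    assert!(!field_gt("29.99", "50"));  // numeric: 29.99 < 50
    // Without coercion, "9" > "50" would be true lexicographically:
    assert!(!field_gt("9", "50"));
    assert!(field_gt("b", "a"));        // non-numeric fields compare as strings
}
```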
Built-in patterns for Apache Combined and syslog formats. You can also supply your own regex.
```shell
# Apache/Nginx access logs (auto-detected)
flowfilter 'WHERE status >= 400' access.log

# Syslog
flowfilter 'WHERE program = "sshd"' /var/log/syslog

# Custom pattern with named capture groups
flowfilter 'WHERE level = "ERROR"' \
  --log-pattern '(?P<timestamp>\S+) (?P<level>\S+) (?P<message>.*)' \
  app.log
```

Log fields depend on the pattern. Apache logs give you remote_host, user, timestamp, method, path, status, and size. Syslog gives you timestamp, hostname, program, pid, and message.
```shell
# JSON output (default for JSON/CSV input)
flowfilter 'WHERE age > 25' data.json

# Pretty table
flowfilter 'WHERE age > 25' -o table data.json

# CSV output
flowfilter 'WHERE age > 25' -o csv data.json

# Raw (one value per line)
flowfilter 'WHERE age > 25' -o raw data.json
```

```
-f, --format <auto|json|csv|log>        Force input format (otherwise auto-detected)
-o, --output <auto|json|csv|table|raw>  Choose output format
    --delimiter <CHAR>                  CSV delimiter (default: comma)
    --no-header                         CSV has no header row
    --log-pattern <REGEX>               Custom log regex with named groups
-c, --count                             Just print how many records matched
    --first <N>                         Stop after N matches
    --last <N>                          Show only the last N matches
    --stats                             Print field statistics instead of records
    --no-color                          Disable colored output
```
The --stats flag gives you a quick overview of the matching data instead of dumping every record:

```shell
$ flowfilter 'WHERE active = true' --stats users.json
Field    Count  Nulls  Unique  Min    Max    Mean
name     3      0      3       -      -      -
age      3      0      3       24.0   32.0   28.0
role     3      0      2       -      -      -
active   3      0      1       -      -      -
```

Internally, flowfilter processes data as a streaming pipeline:
- Parse the query expression into an AST
- Read input one record at a time (constant memory usage)
- Evaluate the filter against each record
- Project selected fields (if using SELECT)
- Write matching records to stdout
This means you can pipe arbitrarily large files through it without worrying about memory. A 10GB log file uses the same amount of RAM as a 10KB one.
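The heart of that pipeline is a plain read-test-write loop. A minimal sketch in Rust, with a `matches` closure standing in for the compiled WHERE expression (names are illustrative, not flowfilter's internals):

```rust
use std::io::{BufRead, Write};

// Streaming filter loop: one record in, test it, maybe one record out.
// Only the current line is held in memory, so input size doesn't matter.
fn stream_filter<R: BufRead, W: Write>(
    input: R,
    mut output: W,
    matches: impl Fn(&str) -> bool,
) -> std::io::Result<u64> {
    let mut count = 0;
    for line in input.lines() {
        let line = line?;
        if matches(&line) {
            writeln!(output, "{line}")?;
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    let data = "a=1\na=2\na=3\n";
    let mut out = Vec::new();
    let n = stream_filter(data.as_bytes(), &mut out, |l| l.ends_with('2'))?;
    assert_eq!(n, 1);
    assert_eq!(String::from_utf8(out).unwrap(), "a=2\n");
    Ok(())
}
```

Because the loop never accumulates records, memory stays constant regardless of input size, which is exactly the property the paragraph above describes.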
The input format is auto-detected from file extensions and content sniffing, but you can always override it with --format.
flowfilter is written in Rust with zero-copy parsing where possible. It processes data in a single pass with no buffering (except for --last which uses a ring buffer, and table output which needs to compute column widths).
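The last-N buffering mentioned above amounts to a bounded deque. A sketch of one way to do it (not necessarily flowfilter's implementation):

```rust
use std::collections::VecDeque;

// Keep only the final n matches in bounded memory: push each match,
// evicting from the front once the buffer already holds n items.
fn keep_last<T>(items: impl Iterator<Item = T>, n: usize) -> VecDeque<T> {
    let mut buf = VecDeque::with_capacity(n);
    if n == 0 {
        return buf;
    }
    for item in items {
        if buf.len() == n {
            buf.pop_front();
        }
        buf.push_back(item);
    }
    buf
}

fn main() {
    let last = keep_last(1..=10, 3);
    assert_eq!(last, VecDeque::from([8, 9, 10]));
}
```

Memory is bounded by `n` no matter how many records stream through, so `--last` preserves the tool's constant-memory guarantee.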
On a typical machine, expect throughput in the hundreds of MB/s range for simple filters. The hand-written recursive descent parser adds negligible overhead compared to the I/O.
```shell
git clone https://github.com/dragonGR/flowfilter.git
cd flowfilter

# Run tests
cargo test

# Run clippy
cargo clippy

# Build release binary
cargo build --release

# Run benchmarks
cargo bench
```

MIT