mlr-binder

Turn Miller into a fluent Java API—shape CSV, JSON, TSV, and DKVP in batch jobs, services, and tests without hand-built shell strings or copy-pasted argv lists. Miller Java Binder wraps the real mlr process so you keep upstream Miller semantics while your JVM code stays readable and refactor-friendly.

Why?

Less string soup: Build invocations with Mlr chains, Flag, Verb, and Option instead of concatenating commands you cannot safely rename or review in the IDE.
Catch mistakes earlier: Method names and types encode Miller’s structure; only Miller DSL snippets (for example put / filter expressions) are still validated when mlr runs.
Fast integration: One Maven or Gradle dependency, mlr on PATH (or a path you configure), then Mlr.inDir(...).csv().sort(...).file(...).run()—no native JNI layer or extra services.
Maven Central: Published as io.github.bluedskim:mlr-binder (coordinates match https://repo1.maven.org/maven2/io/github/bluedskim/mlr-binder/0.2/).

Install

Requires Java 11+. Package: net.shed.mlrbinder.

Maven (pom.xml):

<dependency>
  <groupId>io.github.bluedskim</groupId>
  <artifactId>mlr-binder</artifactId>
  <version>0.2</version>
</dependency>

Gradle (Kotlin DSL):

dependencies {
    implementation("io.github.bluedskim:mlr-binder:0.2")
}

Gradle (Groovy):

dependencies {
    implementation 'io.github.bluedskim:mlr-binder:0.2'
}

Use cases

Data prep and ETL: Sort, cut, join, reshape, and aggregate large delimited or JSON files inside JVM pipelines while Miller stays the engine.
Backend and batch jobs: Run repeatable mlr workflows from schedulers or workers with working-directory and file helpers instead of brittle Runtime.exec strings.
Quality gates and fixtures: Mirror Miller in 10 minutes-style flows in tests (see the tutorial mapping section below) to lock behavior to known Miller releases.
Polyglot teams: Let analysts keep Miller expertise; let application teams call the same verbs from Java with a single shared dependency.

Supported Miller (`mlr`) version

This library is developed and tested against Miller mlr 6.17.0. Install that version on your PATH (or confirm behavior yourself if you use a different release).

Binder-only “sugar” on Mlr: Some Mlr methods bundle common Miller verb options into one call (for example uniqCountBy → uniq -c -g …). These Java names are not Miller CLI features; the child process is always standard mlr. Each sugar method documents the Miller CLI equivalent in its Javadoc, and the net.shed.mlrbinder package summary lists them in one table (see generated Javadoc or your IDE’s quick documentation for Mlr / the package).

Recommended style: Prefer a single Mlr chain: use Mlr’s global-flag chain methods (.icsv(), .from("…"), and so on) for global flags, and instance methods named like Miller verbs (.sort(…), .cat(), and so on; only filter / split use .filterVerb() / .splitVerb()). Use the flag(Flags…) + verb(Mlr.Verbs…) combination only when you need it.

Goals

How you invoke Miller: Express Miller behavior as much as possible with Java methods and objects instead of concatenating long command strings; assemble with Flag, Verb, Option, and related types.
Compile-time checks: Parts you build through the API (flag names, verb names, argument order, and so on) can be caught at the type and method-signature level. Miller DSL (for example put expressions) and field-name validity are still checked at runtime by mlr, so the compiler cannot catch everything.
Usable without deep Miller syntax knowledge: Following the recommended style above, global flags go on the Mlr chain, verbs use Mlr methods with the same names as verbs (exceptions: filter → .filterVerb(), split → .splitVerb()). There is also a Mlr.Verbs static factory for internals and advanced assembly. Pass the same arguments as Miller for fine-grained options via Flag, Objective, and Option. (Verb is just an argv fragment; avoid new Verb(...) in application code when you can.)
Execution: Internally, ProcessBuilder starts an mlr process with the assembled argument list. mlr must be installed and executable on PATH (or the path you configure).

Scope and limitations

Nearly all verbs: Factories per Miller verb live in net.shed.mlrbinder.verb.Verbs, and both Mlr verb-named instance methods and Mlr.Verbs delegate there. Global flags follow upstream reference-main-flag-list as static factories on Flags, and Mlr exposes matching chain methods (regenerate with python3 utils/gen_flags.py). Flags missing from the docs can be passed with Flags.raw("--name") / Flags.raw("--name", "value") or verb options via Flag.flag("..."). For forms like --mfrom / --mload where -- follows variadic arguments, use Mlr#mfrom / #mload.
Not a native binding: This runs external mlr, not Miller linked inside the JVM.
When errors surface: Some invalid combinations appear at run() / run(InputStreamReader) via exit code and stderr.

Testing

./gradlew :lib:test

Reports: lib/build/reports/tests/test/index.html, coverage: lib/build/jacocoHtml/index.html (Jacoco)

Miller in 10 minutes → Java

How mlr invocations from Miller in 10 minutes map to this library. All Java samples below use the recommended style: start with Mlr.inDir(…) (and then .csv() when you need --csv), or start from Mlr.withCsvPreset() (static: mlr + --csv before you set workDir / files). Chain global flags on Mlr, and use instance methods named like verbs (filter / split → .filterVerb / .splitVerb). To append another --csv on the same chain, call .csv() again. Verb options use SortFlags helpers f / n / nr, import static …Flag.flag with flag("-f").objective("…"), option / objective, and so on. Chaining several verbs automatically inserts then into the run() argv.

The Java snippets below assume the following imports in common (pick only what you need in real code):

import static net.shed.mlrbinder.Flag.flag;
import static net.shed.mlrbinder.Objective.objective;
import static net.shed.mlrbinder.SortFlags.f;
import static net.shed.mlrbinder.SortFlags.n;
import static net.shed.mlrbinder.SortFlags.nr;
import static net.shed.mlrbinder.verb.Option.option;

import net.shed.mlrbinder.Mlr;

For more patterns, see TenMinTutorialE2eTest, TenMinTutorialFormatsE2eTest (the tutorial’s CSV/JSON/DKVP/TSV file-format sections), and lib/src/test/resources/10min/.

I/O flags and `cat`

mlr --csv cat example.csv

Mlr.inDir(workingPath)
	.csv()
	.cat()
	.file("example.csv")
	.run();

mlr --icsv --opprint cat example.csv

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.cat()
	.file("example.csv")
	.run();

File formats (CSV, JSON, DKVP, TSV)

As in the tutorial’s “File formats” section, set input format with --csv / --json / --idkvp / --tsv, and so on.

mlr --csv cat shape.csv
mlr --json cat shape.json
mlr --idkvp --ocsv cat shape.dkvp
mlr --tsv cat shape.tsv

Mlr.inDir(workingPath).csv().cat().file("shape.csv").run();
Mlr.inDir(workingPath).jsonFlag().cat().file("shape.json").run();
Mlr.inDir(workingPath).idkvp().ocsv().cat().file("shape.dkvp").run();
Mlr.inDir(workingPath).tsv().cat().file("shape.tsv").run();

`head` / `tail` options

mlr --csv head -n 4 example.csv

Mlr.inDir(workingPath)
	.csv()
	.head(4)
	.file("example.csv")
	.run();

mlr --csv tail -n 4 example.csv

Mlr.inDir(workingPath)
	.csv()
	.tail(4)
	.file("example.csv")
	.run();

`sort`, `cut`

mlr --icsv --opprint sort -f shape -nr index example.csv

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.sort(f("shape"), nr("index"))
	.file("example.csv")
	.run();

mlr --icsv --opprint cut -o -f flag,shape example.csv

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.cut(
		option(flag("-o")),
		option(flag("-f").objective("flag,shape")))
	.file("example.csv")
	.run();

`filter` / `put` (DSL stays a plain string)

mlr --icsv --opprint filter '$color == "red"' example.csv

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.filterVerb(objective("$color == \"red\""))
	.file("example.csv")
	.run();

mlr --icsv --opprint put '$[[3]] = "NEW"' example.csv

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.put(objective("$[[3]] = \"NEW\""))
	.file("example.csv")
	.run();

Field names with spaces (`-nr` value is one token without shell quotes)

mlr --csv cat spaces.csv

Mlr.withCsvPreset()
	.workDir(workingPath)
	.cat()
	.file("spaces.csv")
	.run();

mlr --c2p sort -nr 'Total MWh' spaces.csv

Mlr.inDir(workingPath)
	.c2p()
	.sort(nr("Total MWh"))
	.file("spaces.csv")
	.run();

mlr --c2p put '${Total KWh} = ${Total MWh} * 1000' spaces.csv

Mlr.inDir(workingPath)
	.c2p()
	.put(objective("${Total KWh} = ${Total MWh} * 1000"))
	.file("spaces.csv")
	.run();

Multiple input files

mlr --csv cat data/a.csv data/b.csv

Mlr.inDir(workingPath)
	.csv()
	.cat()
	.file("data/a.csv")
	.file("data/b.csv")
	.run();

Chaining verbs with `then`

mlr --icsv --opprint sort -nr index then head -n 3 example.csv

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.sort(nr("index"))
	.head(3)
	.file("example.csv")
	.run();

The tutorial’s shell pipe (mlr … | mlr …) maps to two Mlr calls in Java. To capture the first stage’s stdout to a temp file, use redirectOutputFile; the second stage runs in that directory with files only (no stdin).

mlr --csv sort -nr index example.csv | mlr --icsv --opprint head -n 3

import java.nio.file.Files;
import java.nio.file.Path;

Path tmp = Files.createTempDirectory("mlr-pipe");
Path sorted = tmp.resolve("sorted.csv");
Mlr.inDir(workingPath)
	.csv()
	.sort(nr("index"))
	.file("example.csv")
	.redirectOutputFile(sorted.toFile())
	.run();

String pprintTop = Mlr.inDir(tmp.toString())
	.icsv()
	.opprint()
	.head(3)
	.file(sorted.getFileName().toString())
	.run();

`--from`

mlr --icsv --opprint --from example.csv sort -nr index then head -n 3

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.from("example.csv")
	.sort(nr("index"))
	.head(3)
	.run();

`--mfrom` / `--mload` (`--` after variadic arguments)

mlr --csv --mfrom a.csv b.csv -- cat

Mlr.inDir(workingPath)
	.csv()
	.mfrom("a.csv", "b.csv")
	.cat()
	.run();

Verbs with many options such as `stats1`

mlr --icsv --opprint --from example.csv stats1 -a count,min,mean,max -f quantity -g shape

Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.from("example.csv")
	.stats1(
		flag("-a").objective("count,min,mean,max"),
		flag("-f").objective("quantity"),
		flag("-g").objective("shape"))
	.run();

JSON input and output

mlr --ijson --ocsv cat example.json

Mlr.inDir(workingPath)
	.ijson()
	.ocsv()
	.cat()
	.file("example.json")
	.run();

In-place update `-I`

mlr -I --csv sort -f shape newfile.txt

Mlr.inDir(tmpDir)
	.inPlace()
	.csv()
	.sort(f("shape"))
	.file("newfile.txt")
	.run();

The tutorial’s put example mentioning $y2 is likely a documentation typo; in Miller you write squaring as $y**2.

Examples

import static net.shed.mlrbinder.SortFlags.n;
import static net.shed.mlrbinder.SortFlags.nr;

import net.shed.mlrbinder.Mlr;

// Recommended: global-flag chain + verb-named methods
String runResult = Mlr.inDir(workingPath)
	.csv()
	.sort(n("a"), nr("b"))
	.file("example.csv")
	.run();

// When starting from the CSV preset, use static Mlr.withCsvPreset()
String runResult2 = Mlr.withCsvPreset()
	.workDir(workingPath)
	.sort(n("a"), nr("b"))
	.file("example.csv")
	.run();

`Mlr`: global-flag chain + verb-named instance methods (recommended)

Mlr.withCsvPreset() — static entry: mlr + --csv (no working directory yet; then .workDir(…) / .file(…)). For a second --csv on the same chain, call .csv() again.
Global-flag chain (recommended): .icsv(), .opprint(), .csv() (appends --csv), .ocsv(), .ijson(), .jsonFlag() (--json), .idkvp(), .tsv(), .oxtab(), .ixtab(), .c2p(), .from("path"), .inPlace() (-I), and so on mirror flag(Flags…) but prefer the chain form.
Verbs (recommended): For each verb in Verbs, Mlr exposes a same-named instance method (for example .uniq(), .histogram(), .join(…)). Exceptions: filter → .filterVerb(…), split → .splitVerb(…). Convenience overloads: .head(n) / .tail(n), .cutFields / .cutOrdered / .cutExcept, .stats1("count", "qty"), .splitBy("shape"), .putQuiet(…). Use .verb(Mlr.Verbs.foo(…)) only when the patterns above are awkward.

For cut’s -o / -x / -f, use CutFlags and .cutOrdered / .cutFields, or .cut(option(…), option(…)). head/tail counts: .head(4) / .tail(4) or HeadTail.n(4). stats1’s -a/-f/-g: StatsFlags. put -q: .putQuiet(…). split -g: .splitBy("shape"). Shared grouping: MillerVerbOpts.groupBy("field") (for example head -g).

For SortFlags, keep using n("field") / nr("field") / f("field") and n() (bare -n, for head).

import static net.shed.mlrbinder.SortFlags.n;
import static net.shed.mlrbinder.SortFlags.nr;

String runResult = Mlr.withCsvPreset()
	.workDir(workingPath)
	.sort(n("a"), nr("b"))
	.file(new File("example.csv"))
	.run();

// Same tutorial flow
String top3 = Mlr.inDir(workingPath)
	.icsv()
	.opprint()
	.sort(nr("index"))
	.head(3)
	.file("example.csv")
	.run();

Miller filter / split verbs are invoked on the chain as .filterVerb(…) and .splitVerb(…).

import static net.shed.mlrbinder.Objective.objective;

Mlr.inDir(workingPath)
	.csv()
	.filterVerb(objective("$index > 1"))
	.file("example.csv")
	.run();

file(File) sets workingPath automatically when it is still unset: absolute file → parent directory; relative file → user.dir. Override anytime with workDir(String) or workingPath(String).

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
.github/workflows		.github/workflows
gradle/wrapper		gradle/wrapper
lib		lib
scripts		scripts
utils		utils
.gitattributes		.gitattributes
.gitignore		.gitignore
Doxyfile		Doxyfile
LICENSE		LICENSE
README.md		README.md
RELEASING.md		RELEASING.md
build.gradle		build.gradle
gradle.properties		gradle.properties
gradlew		gradlew
gradlew.bat		gradlew.bat
runDoxygen.sh		runDoxygen.sh
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

mlr-binder

Why?

Install

Use cases

Supported Miller (`mlr`) version

Goals

Scope and limitations

Testing

Miller in 10 minutes → Java

I/O flags and `cat`

File formats (CSV, JSON, DKVP, TSV)

`head` / `tail` options

`sort`, `cut`

`filter` / `put` (DSL stays a plain string)

Field names with spaces (`-nr` value is one token without shell quotes)

Multiple input files

Chaining verbs with `then`

`--from`

`--mfrom` / `--mload` (`--` after variadic arguments)

Verbs with many options such as `stats1`

JSON input and output

In-place update `-I`

Examples

`Mlr`: global-flag chain + verb-named instance methods (recommended)

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

mlr-binder

Why?

Install

Use cases

Supported Miller (mlr) version

Goals

Scope and limitations

Testing

Miller in 10 minutes → Java

I/O flags and cat

File formats (CSV, JSON, DKVP, TSV)

head / tail options

sort, cut

filter / put (DSL stays a plain string)

Field names with spaces (-nr value is one token without shell quotes)

Multiple input files

Chaining verbs with then

--from

--mfrom / --mload (-- after variadic arguments)

Verbs with many options such as stats1

JSON input and output

In-place update -I

Examples

Mlr: global-flag chain + verb-named instance methods (recommended)

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Supported Miller (`mlr`) version

I/O flags and `cat`

`head` / `tail` options

`sort`, `cut`

`filter` / `put` (DSL stays a plain string)

Field names with spaces (`-nr` value is one token without shell quotes)

Chaining verbs with `then`

`--from`

`--mfrom` / `--mload` (`--` after variadic arguments)

Verbs with many options such as `stats1`

In-place update `-I`

`Mlr`: global-flag chain + verb-named instance methods (recommended)

Packages