T (T-for-text) - a text processing language and utility

t is a concise language for manipulating text, replacing common usage patterns of Unix utilities like grep, sed, cut, awk, sort, and uniq.

Usage

t [<flags>] <programme> [<file> ...]

Example - top 20 most frequent words, lowercased

Using traditional Unix utilities:

tr -s '[:space:]' '\n' < file | tr A-Z a-z | sort | uniq -c | sort -rn | head -20

The equivalent in t would be:

t 'sfld:20' file

Going through the programme step by step gives us:

| Op | State | Description |
| --- | --- | --- |
| | [line, line, ...] | lines of input |
| s | [[word, word], [word], ...] | split each line into words |
| f | [word, word, word, ...] | flatten into single list |
| l | [word, word, word, ...] | lowercase each word |
| d | [[5, "the"], [3, "cat"], ...] | dedupe with counts |
| :20 | [[5, "the"], [3, "cat"], ...] | take first 20 |
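
To make the State column concrete, here is a rough Python model of the same pipeline. It is a sketch of the semantics described in the table, not of how t is implemented; the function name `top_words` is hypothetical, and the order of entries with equal counts is an assumption.

```python
from collections import Counter

def top_words(lines, limit=20):
    """Model of the t programme 'sfld:20' applied to a list of input lines."""
    split = [line.split() for line in lines]             # s: split each line into words
    flat = [word for words in split for word in words]   # f: flatten into a single list
    lowered = [word.lower() for word in flat]            # l: lowercase each word
    counts = Counter(lowered)                            # d: dedupe with counts,
    deduped = [[n, w] for w, n in counts.most_common()]  #    sorted by count descending
    return deduped[:limit]                               # :20 take the first 20

print(top_words(["The cat sat", "the CAT", "the end"], limit=2))
```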

Installation

curl -fsSL https://raw.githubusercontent.com/alecthomas/t/master/install.sh | sh

To install a specific version or to a custom directory:

curl -fsSL https://raw.githubusercontent.com/alecthomas/t/master/install.sh | sh -s v0.0.1
curl -fsSL https://raw.githubusercontent.com/alecthomas/t/master/install.sh | INSTALL_DIR=~/.local/bin sh

Data Model

By default, input is a flat stream of lines, with each input file's lines concatenated together: [line, line, ...].

Most operators apply to each element of the current array individually. For example, l (lowercase) on ["Hello", "World"] produces ["hello", "world"]—each element is lowercased independently.

Operators come in three kinds:

  • Transform (map): apply to each element. l on ["Hello", "World"] → ["hello", "world"]
  • Filter: keep or remove elements. /x/ on ["ax", "", "cx"] → ["ax", "cx"]
  • Reduce: collapse array to a value. # on ["a", "b", "c"] → 3
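
The three kinds correspond to familiar list operations. The Python lines below mirror the three examples in the list above; they only illustrate the shapes involved and make no claim about t's internals.

```python
import re

# Transform (map): one output element per input element, like `l`.
lowered = [e.lower() for e in ["Hello", "World"]]

# Filter: keep only the elements that match a regex, like `/x/`.
kept = [e for e in ["ax", "", "cx"] if re.search(r"x", e)]

# Reduce: collapse the whole array to a single value, like `#`.
count = len(["a", "b", "c"])

print(lowered, kept, count)
```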

Element-wise transforms (u, l, t, n, r, +) recurse through nested arrays automatically. Structural operators (d, #, o, O, selection) operate on the top-level array only—use @ to apply them at deeper levels.

Selection (0, :3, 0,2:5,8) is a reduce operator—it collapses the array to a subset. To apply selection within each element of a nested structure, use @ to descend first.

Use @ to descend into nested structures, ^ to ascend back up.

Type System

There are three types:

| Type | Description |
| --- | --- |
| array | ordered collection of values |
| string | text |
| number | numeric value (converted from string via n) |

Input is always an array of strings (lines). Operators like s create nested arrays, j joins them back. Numbers only exist after explicit conversion with n, and are used by numeric operators like +.

Split/Join Semantics

s and j are inverse operations—sj always returns the original value.

Arrays have a semantic "level" that determines how s splits and j joins:

| Array Level | s splits text into | j joins with |
| --- | --- | --- |
| file | lines | newline |
| line | words | space |
| word | chars | nothing |

s operates only on the direct text elements of an array—it does not recurse into nested arrays. To split at deeper levels, use @ to descend first.

j joins the elements of each nested array back into text, reversing the effect of s. It does not flatten—use @ to join at deeper levels.
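
The level table can be modelled as a pair of small functions keyed by level name. This is a naive sketch: the names `split`, `join`, and `JOIN_WITH` are hypothetical, and this model only round-trips when words are separated by single spaces. How t itself guarantees that sj returns the original value is not covered here.

```python
# Join delimiter for each semantic level, taken from the table above.
JOIN_WITH = {"file": "\n", "line": " ", "word": ""}

def split(text, level):
    """Split one text element according to the level of the array holding it."""
    if level == "file":
        return text.split("\n")   # file array: text -> lines
    if level == "line":
        return text.split()       # line array: text -> words (on whitespace)
    if level == "word":
        return list(text)         # word array: text -> characters
    raise ValueError(f"unknown level: {level}")

def join(parts, level):
    """Join parts back into text with the delimiter for that level."""
    return JOIN_WITH[level].join(parts)

print(join(split("hello world", "line"), "line"))
```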

Operators

Quick Reference

Structural

| Operator | Meaning |
| --- | --- |
| s | split natural |
| S<char> or S"<delim>" | split on delimiter |
| j | join natural (inverse of s) |
| J<char> or J"<delim>" | join with delimiter |
| f | flatten one level |

Transform

| Operator | Meaning |
| --- | --- |
| l | lowercase |
| L<selection> | lowercase selected |
| u | uppercase |
| U<selection> | uppercase selected |
| r[<selection>]/<old>/<new>/ | replace (regex), optionally in selected |
| n | to number |
| N<selection> | to number selected |
| t | trim whitespace |
| T<selection> | trim selected |

Filter

| Operator | Meaning |
| --- | --- |
| /<regex>/ | keep matching |
| !/<regex>/ | keep non-matching |
| m/<regex>/ | extract all matches |
| x | delete empty |

Reduce

| Operator | Meaning |
| --- | --- |
| <selection> | select elements (index, slice, or multi) |
| o | sort descending |
| O | sort ascending |
| g<selection> | group by |
| d | dedupe with counts |
| D<selection> | dedupe by selected field |
| # | count |
| + | sum |
| c | columnate |
| p<selection> | partition at indices |

Navigation

| Operator | Meaning |
| --- | --- |
| @ | descend |
| ^ | ascend |

Misc

| Operator | Meaning |
| --- | --- |
| ; | separator (no-op) |

Operator Details

s - Split

Splits text elements of the current array according to the array's semantic level:

  • file array → splits text into lines (on newlines)
  • line array → splits text into words (on whitespace)
  • word array → splits text into characters

Array elements are left unchanged—s does not recurse. Use @ to descend and split at deeper levels.

# Split lines into words (line array)
["hello world", "foo bar"]  →  [["hello", "world"], ["foo", "bar"]]

# Split words into chars (word array, after sj)
["hello", "world"]  →  [["h","e","l","l","o"], ["w","o","r","l","d"]]

S<delim> - Split on Delimiter

Splits on a custom delimiter. Use a single character directly, or quotes for multi-character delimiters:

  • S, splits on comma
  • S: splits on colon
  • S"::" splits on ::

# Split CSV
"a,b,c"  →  ["a", "b", "c"]   (with S,)

# Split on ::
"a::b::c"  →  ["a", "b", "c"]   (with S"::")

j - Join

The inverse of s—joins nested arrays back into text using the appropriate delimiter for the array level. sj always returns the original value.

# Join words back into lines (after s)
[["hello", "world"], ["foo", "bar"]]  →  ["hello world", "foo bar"]

# Join chars back into words (after s@s)
[["h","e","l","l","o"], ["w","o","r","l","d"]]  →  ["hello", "world"]

J<delim> - Join with Delimiter

Joins array elements with a custom delimiter:

  • J, joins with comma
  • J"\n" joins with newline

["a", "b", "c"]  →  "a,b,c"   (with J,)

f - Flatten

Flattens nested arrays by one level. Non-array elements are kept as-is.

[["a", "b"], ["c"]]  →  ["a", "b", "c"]
[["a", ["b", "c"]], ["d"]]  →  ["a", ["b", "c"], "d"]   (only one level)
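
A minimal Python sketch of one-level flattening, matching the two examples above. The function name `flatten_one_level` is hypothetical.

```python
def flatten_one_level(items):
    """Splice each nested list into the result; keep other elements as-is."""
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(item)   # unwrap exactly one level
        else:
            out.append(item)   # non-array elements pass through unchanged
    return out

print(flatten_one_level([["a", ["b", "c"]], ["d"]]))
```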

l - Lowercase

Converts all text to lowercase. Works recursively on arrays.

["Hello", "WORLD"]  →  ["hello", "world"]

L<selection> - Lowercase Selected

Lowercases only the elements at the specified indices:

["HELLO", "WORLD", "FOO"]  →  ["hello", "WORLD", "FOO"]   (with L0)
["HELLO", "WORLD", "FOO"]  →  ["hello", "world", "FOO"]   (with L:2)

u - Uppercase

Converts all text to uppercase. Works recursively on arrays.

["Hello", "world"]  →  ["HELLO", "WORLD"]

U<selection> - Uppercase Selected

Uppercases only the elements at the specified indices.

r[<selection>]/<old>/<new>/ - Replace (Regex)

Replaces matches of regex <old> with <new>. Recurses through nested arrays.

With an optional selection, applies replacement only to elements at the specified indices.

# Remove prefix
["ERROR: fail", "ERROR: crash"]  →  ["fail", "crash"]   (with r/ERROR: //)

# Replace pattern
["cat", "hat"]  →  ["dog", "hat"]   (with r/cat/dog/)

# Replace only in first element
["cat", "cat"]  →  ["dog", "cat"]   (with r0/cat/dog/)
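
The recursive and selected forms can be sketched in Python with `re.sub`. Two assumptions are baked in: every match in a string is replaced (the text above does not say whether r replaces all matches or only the first), and Python's regex dialect stands in for whatever dialect t uses. The function names are hypothetical.

```python
import re

def replace(value, old, new):
    """Model of r/<old>/<new>/: recurse through nested lists, substitute in strings."""
    if isinstance(value, list):
        return [replace(item, old, new) for item in value]
    return re.sub(old, new, value)  # assumption: all matches are replaced

def replace_selected(items, indices, old, new):
    """Model of r<selection>/<old>/<new>/: only touch elements at the given indices."""
    return [
        replace(item, old, new) if i in indices else item
        for i, item in enumerate(items)
    ]

print(replace(["ERROR: fail", "ERROR: crash"], r"ERROR: ", ""))
print(replace_selected(["cat", "cat"], {0}, r"cat", "dog"))
```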

n - To Number

Converts strings to numbers. Recurses through nested arrays. Non-numeric strings cause an error.

["42", "3.14", "100"]  →  [42, 3.14, 100]

N<selection> - To Number Selected

Converts only the elements at the specified indices to numbers.

t - Trim

Removes leading and trailing whitespace from each string. Recurses through nested arrays.

["  hello  ", "\tworld\n"]  →  ["hello", "world"]

T<selection> - Trim Selected

Trims only the elements at the specified indices.

/<regex>/ - Filter Keep

Keeps only elements matching the regex.

["apple", "banana", "apricot"]  →  ["apple", "apricot"]   (with /^a/)

!/<regex>/ - Filter Remove

Removes elements matching the regex (keeps non-matching).

["apple", "banana", "apricot"]  →  ["banana"]   (with !/^a/)

m/<regex>/ - Match All

Extracts all regex matches from each element, returning an array of matches per element. This is the equivalent of grep -o.

# Extract all IP addresses from each line
["192.168.1.1 to 10.0.0.1", "from 172.16.0.1"]  →  [["192.168.1.1", "10.0.0.1"], ["172.16.0.1"]]
   (with m/\d+\.\d+\.\d+\.\d+/)

# Extract all numbers
"price: $42, qty: 7"  →  ["42", "7"]   (with m/\d+/)

# Get first match only
m/pattern/@0

# Flatten all matches into single list
m/pattern/f
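
In Python terms, m behaves roughly like collecting every full match per line. The sketch below uses Python's `re` module, whose dialect may differ from the one t uses; the function name `match_all` is hypothetical.

```python
import re

def match_all(lines, pattern):
    """Return one list of full-match strings per input line, like m/<regex>/."""
    regex = re.compile(pattern)
    return [[m.group(0) for m in regex.finditer(line)] for line in lines]

lines = ["192.168.1.1 to 10.0.0.1", "from 172.16.0.1"]
print(match_all(lines, r"\d+\.\d+\.\d+\.\d+"))
```

Following m with f corresponds to flattening this per-line result into a single list of matches.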

x - Delete Empty

Removes empty strings and empty arrays from the current array.

["hello", "", "world", ""]  →  ["hello", "world"]

<selection> - Select

Selects elements by index, slice, or combination. See Selection for full syntax.

  • Single index returns the element itself
  • Multiple indices or slices return an array

Also works on strings, treating them as character arrays:

"hello"  →  "h"       (with 0)
"hello"  →  "olleh"   (with ::-1)

o - Sort Descending

Sorts the array in descending order. For arrays of arrays, sorts lexicographically (first element, then second, etc.).

[3, 1, 4, 1, 5]  →  [5, 4, 3, 1, 1]
[[2, "b"], [1, "a"], [2, "a"]]  →  [[2, "b"], [2, "a"], [1, "a"]]

O - Sort Ascending

Sorts the array in ascending order.

[3, 1, 4, 1, 5]  →  [1, 1, 3, 4, 5]
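
Python's list comparison is also lexicographic, so the examples for o and O can be reproduced with `sorted`. This is only an analogy: how t orders values of different types is not described here, and Python would raise an error when comparing a string with a number.

```python
numbers = [3, 1, 4, 1, 5]
pairs = [[2, "b"], [1, "a"], [2, "a"]]

descending = sorted(numbers, reverse=True)      # like o
ascending = sorted(numbers)                     # like O
pairs_descending = sorted(pairs, reverse=True)  # first element, then second

print(descending, ascending, pairs_descending)
```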

g<selection> - Group By

Groups elements by the value(s) at the specified selection. Produces [[key, [elements...]], ...].

# Group by first element
[["a", 1], ["b", 2], ["a", 3]]  →  [["a", [["a", 1], ["a", 3]]], ["b", [["b", 2]]]]   (with g0)

# Group by slice (composite key)
g0:2  →  key is [first, second] elements
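
A rough Python model of grouping, assuming groups appear in the order their key is first seen (consistent with the g0 example above, but not stated as a rule). The names `group_by` and `key_of` are hypothetical; a composite key such as g0:2 is modelled with a tuple because Python dictionary keys must be hashable.

```python
def group_by(rows, key_of):
    """Return [[key, [row, ...]], ...] with groups in first-seen key order."""
    groups = {}
    for row in rows:
        groups.setdefault(key_of(row), []).append(row)
    return [[key, members] for key, members in groups.items()]

rows = [["a", 1], ["b", 2], ["a", 3]]
print(group_by(rows, lambda row: row[0]))           # like g0
print(group_by(rows, lambda row: tuple(row[0:2])))  # like g0:2, tuple as composite key
```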

d - Dedupe with Counts

Removes duplicates and counts occurrences. Returns [[count, value], ...] sorted by count descending.

["a", "b", "a", "a", "b"]  →  [[3, "a"], [2, "b"]]

D<selection> - Dedupe by Field

Removes duplicates based on the value at the specified selection, counting occurrences. Returns [[count, element], ...] sorted by count descending.

# Dedupe by first element
[["a", 1], ["b", 2], ["a", 3]]  →  [[2, ["a", 1]], [1, ["b", 2]]]   (with D0)
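
Both d and D can be sketched with one Python helper that counts by key and keeps the first element seen for each key. Keeping the first element matches the D0 example above; the order among entries with equal counts is an assumption. The names `dedupe_by` and `key_of` are hypothetical.

```python
def dedupe_by(rows, key_of):
    """Return [[count, first_row_with_key], ...] sorted by count descending."""
    first_seen = {}
    counts = {}
    for row in rows:
        key = key_of(row)
        first_seen.setdefault(key, row)        # remember the first element per key
        counts[key] = counts.get(key, 0) + 1   # count every occurrence
    pairs = [[counts[key], row] for key, row in first_seen.items()]
    # sorted() is stable, so equal counts keep first-seen order (an assumption about t).
    return sorted(pairs, key=lambda pair: pair[0], reverse=True)

print(dedupe_by(["a", "b", "a", "a", "b"], lambda value: value))       # like d
print(dedupe_by([["a", 1], ["b", 2], ["a", 3]], lambda row: row[0]))   # like D0
```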

# - Count

Returns the number of elements in the array.

["a", "b", "c"]  →  3

+ - Sum

Sums all numeric values. Recurses through nested arrays. Strings are coerced to numbers (non-numeric strings contribute 0).

[1, 2, 3, 4]  →  10
[["1", "2"], ["3", "4"]]  →  10
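
The recursion and coercion rules can be modelled in a few lines of Python. The names `to_number` and `total` are hypothetical, and using `float` for parsing is an assumption about which strings count as numeric.

```python
def to_number(text):
    """Coerce a string to a number; non-numeric strings contribute 0."""
    try:
        return float(text)
    except ValueError:
        return 0

def total(value):
    """Model of +: recurse through nested lists and sum every leaf."""
    if isinstance(value, list):
        return sum(total(item) for item in value)
    if isinstance(value, str):
        return to_number(value)
    return value  # already a number

print(total([1, 2, 3, 4]))
print(total([["1", "2"], ["3", "4"]]))
```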

c - Columnate

Formats array of arrays as aligned columns (like column -t). Each column width is automatically determined by the widest element in that column.

[["name", "age"], ["alice", "30"], ["bob", "25"]]
→
name   age
alice  30
bob    25
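
A minimal Python sketch of the column-width rule. It assumes every row has the same number of string cells and that columns are separated by two spaces, which matches the output shown above; the function name `columnate` and the `gap` parameter are hypothetical.

```python
def columnate(rows, gap=2):
    """Pad each cell to its column's widest entry and join with `gap` spaces."""
    widths = [max(len(cell) for cell in column) for column in zip(*rows)]
    lines = []
    for row in rows:
        padded = [cell.ljust(width) for cell, width in zip(row, widths)]
        lines.append((" " * gap).join(padded).rstrip())  # drop trailing padding
    return "\n".join(lines)

print(columnate([["name", "age"], ["alice", "30"], ["bob", "25"]]))
```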

p<selection> - Partition

Splits an array or string at the specified indices. Each index becomes a split point.

# Split at index 2
["a", "b", "c", "d", "e"]  →  [["a", "b"], ["c", "d", "e"]]   (with p2)

# Split at multiple indices
["a", "b", "c", "d", "e"]  →  [["a"], ["b", "c"], ["d", "e"]]   (with p1,3)

# Chunk into groups of 2 (split at every 2nd index)
["a", "b", "c", "d", "e", "f"]  →  [["a", "b"], ["c", "d"], ["e", "f"]]   (with p::2)
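
The partition examples can be modelled in Python by cutting the sequence at each split point. One assumption is inferred from the p::2 example: a split point at index 0 (or at the end) does not produce an empty chunk. The function name `partition` is hypothetical; because it relies on slicing, the same code handles both lists and strings.

```python
def partition(seq, cut_points):
    """Split seq before each index in cut_points, ignoring cuts at 0 or past the end."""
    cuts = sorted(c for c in set(cut_points) if 0 < c < len(seq))
    bounds = [0] + cuts + [len(seq)]
    return [seq[start:end] for start, end in zip(bounds, bounds[1:])]

letters = ["a", "b", "c", "d", "e"]
print(partition(letters, [2]))                # like p2
print(partition(letters, [1, 3]))             # like p1,3
print(partition("abcdef", range(0, 6, 2)))    # like p::2 on a string
```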

Also works on strings:

"hello"  →  ["he", "llo"]   (with p2)
"abcdef"  →  ["ab", "cd", "ef"]   (with p::2)

@ - Descend

Descends one level into the data structure. Subsequent operations apply to each element of the current array, rather than the array itself.

# Without @: select first element of outer array
[["a", "b"], ["c", "d"]]  →  ["a", "b"]   (with 0)

# With @: select first element of EACH inner array
[["a", "b"], ["c", "d"]]  →  ["a", "c"]   (with @0)

Multiple @ descends multiple levels:

# @@0 operates on elements of elements of elements
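
One way to picture @ is as "apply the next operator one level deeper". The Python sketch below models that with an explicit depth argument; `at_depth` and `first` are hypothetical names, and t's actual focus tracking may work differently.

```python
def at_depth(value, depth, op):
    """Apply op to value itself (depth 0) or to each element `depth` levels down."""
    if depth == 0:
        return op(value)
    return [at_depth(item, depth - 1, op) for item in value]

def first(items):
    return items[0]

data = [["a", "b"], ["c", "d"]]
print(at_depth(data, 0, first))  # like 0:  first element of the outer array
print(at_depth(data, 1, first))  # like @0: first element of each inner array
```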

^ - Ascend

Ascends one level, undoing a previous @. Returns focus to the parent array.

# Split, descend, select first word, ascend
"hello world\nfoo bar"  →  ["hello", "foo"]   (with s@0^)

; - Separator

A no-op operator that does nothing. Useful for visually separating groups of operators in complex programmes.

# Without separator
s@0^do:10

# With separator for readability
s@0^;d;o;:10

Selection

Selection is a reduce operator—it collapses the array to a subset. Selecting a single element returns that element; selecting multiple returns an array:

| Syntax | Meaning | Result |
| --- | --- | --- |
| <n> | single index (0-based) | element |
| -<n> | negative index (from end) | element |
| <n>:<m> | slice (exclusive end) | array |
| <n>: | slice to end | array |
| :<m> | slice from start | array |
| <n>:<m>:<s> | slice with stride | array |
| <n>,<m>,<p> | select multiple | array |
| <n>,<m>:<p> | mixed index + slice | array |

To apply selection within each element of a nested structure, use @ to descend first:

# Select first 3 lines
t ':3' file

# Split lines into words, then select first word of each line
t 's@0' file

# Split on colon, select first and last fields of each line
t 'S:@0,-1' /etc/passwd

# Split into words, select 1st, 3rd, 4th of each line
t 's@0,2,3' file

# Reorder columns: last column first, then rest
t 's@-1,0:-1' file
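
The selection syntax in the table above maps closely onto Python indexing and slicing. The sketch below parses a selection string and applies it to a list or string; `parse_part` and `select` are hypothetical names, and out-of-range indices simply raise Python's `IndexError` here, which may not match what t does.

```python
def parse_part(part):
    """Turn '3' into an int and '1:5' or '::2' into a slice object."""
    if ":" in part:
        fields = [int(f) if f else None for f in part.split(":")]
        return slice(*fields)
    return int(part)

def select(seq, spec):
    """Apply a selection such as '0', '-1', ':3', '::-1' or '0,2:5,8' to seq."""
    parts = spec.split(",")
    if len(parts) == 1 and ":" not in spec:
        return seq[int(spec)]          # single index returns the element itself
    out = []
    for part in parts:
        key = parse_part(part)
        if isinstance(key, slice):
            out.extend(seq[key])       # slices contribute several elements
        else:
            out.append(seq[key])       # indices contribute one element
    # Strings are treated as character arrays, so rebuild a string at the end.
    return "".join(out) if isinstance(seq, str) else out

print(select(list("abcdefghi"), "0,2:5,8"))
print(select("hello", "::-1"))
```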

Grouping

g<selection> groups elements by the value(s) at the specified selection, producing [[key, [element, ...]], ...].

| Syntax | Meaning |
| --- | --- |
| g0 | group by first element |
| g-1 | group by last element |
| g1,2 | group by composite key (elements 1 and 2) |
| g0:3 | group by first three elements as key |

Examples:

# Group log lines by IP (first field)
t 'sg0' access.log
# → [["192.168.1.1", [[192.168.1.1, -, -, ...], ...]], ["10.0.0.5", [...]], ...]

# Group CSV rows by region (field 2)
t 'S,g2' sales.csv

# Group by composite key: IP (field 0) + status code (field 8)
t 'sg0,8' access.log

# Group by IP, show top 10 offenders with their actual requests
t 'sg0o:10' access.log

Aggregation & Cleaning

| Operator | Behavior | Example |
| --- | --- | --- |
| # | count: [a, b, c] → 3 | t '#' file (line count) |
| + | sum: [1, 2, 3] → 6 | t 'S,@1n+' data.csv (sum column 2) |
| t | trim whitespace (per element) | t 't' file (trim each line) |
| x | delete empty elements | t 'x' file (remove blank lines) |

Interactive Mode

Interactive mode shows a live preview of a programme's output as you type it. Press ^J to toggle between text and JSON modes.

$ t -i access.log
Loaded 124847 lines
t> s                     # live preview as you type
[[192.168.1.1, -, -, ...], [10.0.0.5, -, -, ...], ...]
t> s@8
["200", "404", "200", "500", ...]
t> s@8^d
[[98423, "200"], [1042, "404"], [89, "500"], ...]
t> s@8^do
[[98423, "200"], [1042, "404"], [89, "500"], ...]
t> s@8^do:10<Enter>      # enter commits

CLI Flags

| Flag | Meaning |
| --- | --- |
| -d <delim> | input delimiter (what s splits on) |
| -D <delim> | output delimiter (what j joins with) |
| -c | CSV mode (split/join handle quoted fields) |
| -e <prog> | explain |
| -p <prog> | parse tree |
| -i | interactive |
| -j | json output |

Rosetta Stone

Filtering

Lines with "fail" but not "expected":

grep fail file | grep -v expected
t '/fail/!/expected/' file

Error messages, deduped and sorted by frequency:

grep ERROR app.log | sed 's/.*ERROR: //' | sort | uniq -c | sort -rn
t '/ERROR/r/.*ERROR: //do' app.log

Field Selection

Select specific columns (1st, 3rd, 4th) from whitespace-delimited file:

awk '{print $1, $3, $4}' file
t 's@0,2,3' file

Extract username and shell from /etc/passwd:

awk -F: '{print $1, $7}' /etc/passwd
t 'S:@0,-1' /etc/passwd

Reorder CSV columns (swap first two, keep rest):

awk -F, -v OFS=, '{print $2, $1, $3}' file
t 'S,@1,0,2:J,' file

Colon-delimited: 5th field, lowercased, reversed:

cut -d: -f5 /etc/passwd | tr A-Z a-z | rev
t 'S:@4ls::-1j' /etc/passwd

Grouping

Group log lines by IP, see all requests from each:

# No simple Unix equivalent - requires awk with arrays
awk '{a[$1] = a[$1] ? a[$1] "\n" $0 : $0} END {for (k in a) print "==" k "==\n" a[k]}' access.log
t 'sg0' access.log

Group errors by error type, show all occurrences:

# Complex in traditional tools
t '/ERROR/r/.*ERROR: //sg0' app.log

Group CSV by category (column 3), extract values (column 2):

awk -F, '{a[$3] = a[$3] " " $2} END {for (k in a) print k, a[k]}' data.csv
t 'S,g2@1@1' data.csv

Frequency & Deduplication

Request counts by IP (first field of log):

awk '{print $1}' access.log | sort | uniq -c | sort -rn
t 's@0^do' access.log

HTTP status code distribution (9th field):

awk '{print $9}' access.log | sort | uniq -c | sort -rn
t 's@8^do' access.log

Most requested URLs (7th field), top 20:

awk '{print $7}' access.log | sort | uniq -c | sort -rn | head -20
t 's@6^do:20' access.log

Top 10 file extensions:

ls -1 | grep '\.' | rev | cut -d. -f1 | rev | sort | uniq -c | sort -rn | head -10
t '/\./S.@-1^do:10' filelist

CSV: value frequency in column 1:

cut -d, -f1 data.csv | sort | uniq -c | sort -rn
t 'S,@0^do' data.csv

CSV: unique values in column 3, sorted:

cut -d, -f3 data.csv | sort -u
t --csv 's@2^d@1^O' data.csv

Extract and count email domains:

grep -E '@' file | sed 's/.*@//' | sed 's/[^a-zA-Z0-9.-].*//' | sort | uniq -c | sort -rn
t '/@/S@@-1^do' file

Remove duplicate words within each line:

awk '{delete a; for(i=1;i<=NF;i++) if(!a[$i]++) printf "%s ", $i; print ""}' file
t 's@D0@1^J" "' file

Counting & Aggregation

Count lines (like wc -l):

wc -l < file
t '#' file

Count words (like wc -w):

wc -w < file
t 'sf#' file

Sum a column of numbers:

awk '{sum+=$1} END{print sum}' file
t 'n+' file

Sum column 2 of a CSV:

awk -F, '{sum+=$2} END{print sum}' data.csv
t 'S,@1n+' data.csv

Cleaning & Transformation

Remove blank lines:

sed '/^$/d' file
t 'x' file

Trim whitespace from each line:

sed 's/^[ \t]*//;s/[ \t]*$//' file
t 't' file

Reverse words within each line:

awk '{for(i=NF;i>=1;i--) printf "%s ", $i; print ""}' file
t 's@::-1j' file

Reverse each word's characters (hello world → olleh dlrow):

# Bash equivalent is ugly:
while IFS= read -r line; do echo "$line" | xargs -n1 | rev | xargs; done < file
t 's@s@::-1^j^j' file

Extraction

Extract all IP addresses from log file (like grep -o):

grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' access.log
t 'm/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/f' access.log

Extract all numbers from text:

grep -oE '[0-9]+' file
t 'm/[0-9]+/f' file

Extract email addresses:

grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file
t 'm/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/f' file

Slicing

Every 3rd line, starting from line 2:

awk 'NR%3==2' file
t '1::3' file
