Liger is a fast, composable concatenation tool that creates supermatrices from multiple FASTA alignments. It runs in two modes:
- Exact match (default): headers must match exactly across files, like FASconCAT and AMAS.
- Smart match (`-a alias.txt`): pass an alias list of clean output names that get matched to messy input headers via case-insensitive substring search. Underscores in aliases match spaces in headers, so `Mus_musculus` finds `AB123.1 Mus musculus COX1 gene, partial cds`. Longer aliases match first to prevent partial collisions. The alias list doubles as a rename map: input headers contain metadata, and the output gets clean names. Requires `-l` for a provenance TSV that records exactly which original header matched each alias.
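The smart-match rules can be sketched in Rust. This is a hypothetical illustration, not Liger's source. The function names `normalize_alias` and `match_aliases` are invented. The collision rule is also an assumption about how "longer aliases match first" works: aliases are tried longest first, and a header claimed by a longer alias is no longer offered to shorter ones. The `Mus spretus` header is made-up demo data.

```rust
// Hypothetical sketch of alias ("smart") matching; not Liger's actual code.

/// Normalize an alias for comparison: underscores become spaces, then lowercase.
fn normalize_alias(alias: &str) -> String {
    alias.replace('_', " ").to_lowercase()
}

/// Pair each alias with at most one header via case-insensitive substring search.
/// Aliases are tried longest first; a claimed header is not offered to shorter
/// aliases (assumed collision rule). Results keep the input alias order.
fn match_aliases<'a>(aliases: &[&'a str], headers: &[&'a str]) -> Vec<(&'a str, Option<&'a str>)> {
    let lowered: Vec<String> = headers.iter().map(|h| h.to_lowercase()).collect();
    let mut claimed = vec![false; headers.len()];
    let mut found: Vec<Option<&'a str>> = vec![None; aliases.len()];

    // Visit alias indices from longest to shortest alias (stable sort keeps ties in order).
    let mut order: Vec<usize> = (0..aliases.len()).collect();
    order.sort_by(|&a, &b| aliases[b].len().cmp(&aliases[a].len()));

    for ai in order {
        let needle = normalize_alias(aliases[ai]);
        let hit = (0..headers.len()).find(|&hi| !claimed[hi] && lowered[hi].contains(needle.as_str()));
        if let Some(hi) = hit {
            claimed[hi] = true;
            found[ai] = Some(headers[hi]);
        }
    }
    aliases.iter().copied().zip(found).collect()
}

fn main() {
    // Made-up headers: "Mus" alone would also match the M. musculus header if tried first.
    let aliases = ["Mus", "Mus_musculus"];
    let headers = [
        "AB123.1 Mus musculus COX1 gene, partial cds",
        "XY999.1 Mus spretus COX1",
    ];
    for (alias, header) in match_aliases(&aliases, &headers) {
        println!("{alias}\t{}", header.unwrap_or("MISSING"));
    }
}
```

Trying `Mus_musculus` first keeps the shorter `Mus` alias from grabbing the *M. musculus* header. Unmatched aliases come back as `None`, which corresponds to the `MISSING` cells in the provenance TSV.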
Liger auto-detects DNA vs. amino acid data per gene and adjusts the missing-data character and partition labels accordingly. The FASTA supermatrix goes to stdout and partition boundaries go to stderr, in RAxML/IQ-TREE format by default. NEXUS output bundles the matrix and the partitions into one file.
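Below is a sketch of per-gene type detection and the defaults it drives. Several parts are assumptions made for illustration: the 90% nucleotide threshold, the set of ignored symbols, and the choice of one missing character for the whole matrix. Only the default characters (`N`, `X`, `?`) and the `DNA`/`WAG` partition labels come from this README.

```rust
// Hypothetical sketch; the detection heuristic is an assumption, not Liger's rule.

#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Dna,
    Protein,
}

/// Classify one gene from its aligned sequences: DNA if at least 90% of the
/// informative characters (gaps, '?' and '.' excluded) are A, C, G, T, U or N.
fn detect_type(seqs: &[&str]) -> DataType {
    let (mut informative, mut nucleotide) = (0usize, 0usize);
    for c in seqs.iter().flat_map(|s| s.chars()) {
        let c = c.to_ascii_uppercase();
        if matches!(c, '-' | '?' | '.') {
            continue;
        }
        informative += 1;
        if matches!(c, 'A' | 'C' | 'G' | 'T' | 'U' | 'N') {
            nucleotide += 1;
        }
    }
    // Integer form of nucleotide / informative >= 0.9; an all-gap gene counts as DNA.
    if nucleotide * 10 >= informative * 9 {
        DataType::Dna
    } else {
        DataType::Protein
    }
}

/// Default missing-data character for the whole matrix (overridable with -m).
fn default_missing(types: &[DataType]) -> char {
    let dna = types.contains(&DataType::Dna);
    let protein = types.contains(&DataType::Protein);
    match (dna, protein) {
        (true, true) => '?',
        (false, true) => 'X',
        _ => 'N',
    }
}

/// Label written in front of each RAxML/IQ-TREE partition line.
fn partition_label(t: DataType) -> &'static str {
    match t {
        DataType::Dna => "DNA",
        DataType::Protein => "WAG",
    }
}

fn main() {
    let genes = [vec!["ATCG-ATCG", "ATCGAT-CG"], vec!["MKVLAAGIW", "MKVLSTGIW"]];
    let types: Vec<DataType> = genes.iter().map(|g| detect_type(g)).collect();
    for t in &types {
        println!("{t:?} -> {}", partition_label(*t));
    }
    println!("missing = {}", default_missing(&types));
}
```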
Liger is also available as the `concat` subcommand in `phylo`.
Liger was benchmarked against AMAS, FASconCAT-G, PhyKit, pxcat, and SeqKit across combinations of 10–1,000 taxa and 10–1,000 loci. It was the fastest tool at every scale tested while keeping memory usage moderate.
Requires Rust.
```
cargo install --git https://github.com/andrewbudge/Liger
```

To update to the latest version:

```
cargo install --force --git https://github.com/andrewbudge/Liger
```

The previous Go version (v1) is preserved in the `v1-go/` directory.
```
liger [FLAGS] [INPUT FASTA FILES]
```

- `-a, --alias`: alias list for smart matching (clean output names that map to messy input headers)
- `-l, --log`: provenance TSV output file (required with `-a`)
- `-f, --format`: output format, `fasta` (default) or `nexus` (also accepts `n` or `nex`)
- `-m, --missing`: override the missing-data character (default: `N` for DNA, `X` for amino acid, `?` for mixed)
- `-p, --partitions`: partition format, `raxml` (default, also used by IQ-TREE) or `nexus`
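At its core, concatenation is a per-taxon join. Each gene contributes either the taxon's aligned sequence or, when the taxon is absent, a run of the missing character (the default or the `-m` override) as long as that gene's alignment. Below is a minimal sketch with a hypothetical `Gene` type and `concatenate` function. It assumes every sequence in a gene is already aligned to the gene's length. The demo data reproduces the three-taxon supermatrix from the smart-match example in this README.

```rust
use std::collections::HashMap;

// Hypothetical sketch of supermatrix assembly; not Liger's actual code.

/// One input alignment: its length and the sequences keyed by output taxon name.
struct Gene<'a> {
    length: usize,
    seqs: HashMap<&'a str, &'a str>,
}

/// Build one row per taxon, padding genes the taxon lacks with `missing`.
fn concatenate(taxa: &[&str], genes: &[Gene<'_>], missing: char) -> Vec<(String, String)> {
    taxa.iter()
        .map(|&taxon| {
            let mut row = String::new();
            for gene in genes {
                match gene.seqs.get(taxon) {
                    Some(seq) => row.push_str(seq),
                    None => row.extend(std::iter::repeat(missing).take(gene.length)),
                }
            }
            (taxon.to_string(), row)
        })
        .collect()
}

fn main() {
    // Two 4-column genes; Rattus lacks gene2 and Xenopus lacks gene1.
    let gene1 = Gene {
        length: 4,
        seqs: HashMap::from([("Mus_musculus", "ATCG"), ("Rattus_rattus", "ATCG")]),
    };
    let gene2 = Gene {
        length: 4,
        seqs: HashMap::from([("Mus_musculus", "ATCG"), ("Xenopus_laevis", "ATCG")]),
    };
    let taxa = ["Mus_musculus", "Rattus_rattus", "Xenopus_laevis"];
    // Write the rows as FASTA to stdout.
    for (name, seq) in concatenate(&taxa, &[gene1, gene2], 'N') {
        println!(">{name}\n{seq}");
    }
}
```

Padding to the gene's full length keeps every row the same width, which is what lets the partition coordinates apply to all taxa.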
```
$ liger gene1.fasta gene2.fasta > supermatrix.fasta
DNA, gene1.fasta = 1-500
DNA, gene2.fasta = 501-1000
```

```
$ cat alias.txt
Mus_musculus
Rattus_rattus
Xenopus_laevis
$ liger -a alias.txt -l prov.tsv gene1.fasta gene2.fasta > supermatrix.fasta
DNA, gene1.fasta = 1-4
DNA, gene2.fasta = 5-8
$ cat supermatrix.fasta
>Mus_musculus
ATCGATCG
>Rattus_rattus
ATCGNNNN
>Xenopus_laevis
NNNNATCG
$ cat prov.tsv
alias.txt gene1.fasta gene2.fasta
Mus_musculus AB123.1 Mus musculus gene1 cds XM456.1 Mus musculus gene2 cds
Rattus_rattus AB124.1 Rattus rattus gene1 cds MISSING
Xenopus_laevis MISSING XM789.1 Xenopus laevis gene2 cds
```

```
$ liger -a alias.txt -l prov.tsv -f nexus gene1.fasta gene2.fasta
#NEXUS
BEGIN DATA;
DIMENSIONS NTAX=3 NCHAR=8;
FORMAT DATATYPE=DNA MISSING=N GAP=-;
MATRIX
Mus_musculus ATCGATCG
Rattus_rattus ATCGNNNN
Xenopus_laevis NNNNATCG
;
END;
BEGIN SETS;
CHARSET gene1.fasta = 1-4;
CHARSET gene2.fasta = 5-8;
END;
```

Partitions always go to stderr, so they can be piped or redirected separately from the matrix.
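Partition coordinates are 1-based and inclusive, and each gene starts one column after the previous gene ends (1-500, then 501-700). The sketch below renders both `-p` styles from per-gene lengths. The `Style` enum and `partition_lines` function are hypothetical names. The labels are assumed to come from per-gene type detection (`DNA` or `WAG`).

```rust
// Hypothetical sketch of partition output; not Liger's actual code.

#[derive(Clone, Copy)]
enum Style {
    Raxml, // "DNA, gene.fasta = 1-500" (also read by IQ-TREE)
    Nexus, // "CHARSET gene.fasta = 1-500;"
}

/// Each entry is (model label, file name, alignment length); lengths must be > 0.
fn partition_lines(genes: &[(&str, &str, usize)], style: Style) -> Vec<String> {
    let mut start = 1; // first column of the next gene, 1-based
    genes
        .iter()
        .map(|&(label, file, len)| {
            let end = start + len - 1; // inclusive end column
            let line = match style {
                Style::Raxml => format!("{label}, {file} = {start}-{end}"),
                Style::Nexus => format!("CHARSET {file} = {start}-{end};"),
            };
            start = end + 1;
            line
        })
        .collect()
}

fn main() {
    let genes: [(&str, &str, usize); 2] =
        [("DNA", "dna_gene.fasta", 500), ("WAG", "protein_gene.fasta", 200)];
    // Partition lines go to stderr, keeping stdout free for the matrix.
    for line in partition_lines(&genes, Style::Raxml) {
        eprintln!("{line}");
    }
    for line in partition_lines(&genes, Style::Nexus) {
        eprintln!("{line}");
    }
}
```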
```
# Default: RAxML/IQ-TREE format
$ liger dna_gene.fasta protein_gene.fasta > matrix.fasta
DNA, dna_gene.fasta = 1-500
WAG, protein_gene.fasta = 501-700

# NEXUS charset format
$ liger -p nexus dna_gene.fasta protein_gene.fasta > matrix.fasta
CHARSET dna_gene.fasta = 1-500;
CHARSET protein_gene.fasta = 501-700;
```

Liger v2 is a Rust rewrite of the original Go tool (v1). Development is assisted by Claude (Anthropic), which serves as a teaching aid and coding partner. The design, domain knowledge, and direction are the author's own.
Andrew Budge