Minipoa: A minimizer-based method for fast and memory-efficient partial order alignment.
Install via conda
conda install -c malab minipoa
Or, make from source:
# Git clone this repository
git clone https://github.com/NCl3-lhd/minipoa.git
cd minipoa && mkdir build && cd build
# AVX2(default)
cmake -DENABLE_AVX2=ON -DENABLE_AVX512=OFF -DENABLE_SSE2=OFF .. && make
# AVX512
cmake -DENABLE_AVX512=ON -DENABLE_AVX2=OFF -DENABLE_SSE2=OFF .. && make
# SSE2
cmake -DENABLE_SSE2=ON -DENABLE_AVX2=OFF -DENABLE_AVX512=OFF .. && make
Generate a consensus sequence (Sequencing mode):
minipoa data/mtDNA.fasta > cons.fasta
Perform multiple sequence alignment (MSA mode):
minipoa data/mtDNA.fasta -S -r1 -t thread > mtDNA.fasta
Output the sequence graph in GFA format:
minipoa data/mtDNA.fasta -S -r2 -t thread > mtDNA.gfa
View the full list of parameters:
minipoa -h
Minipoa supports input in FASTA, FASTQ, gzipped FASTA (.fa.gz), and gzipped FASTQ (.fq.gz) formats. It incrementally constructs an alignment graph from the input sequences. Optionally, an existing GFA file can be provided via -i to initialize the alignment graph.
minipoa input.fasta -i input.gfa -S -t thread -r2 > output.gfa
Minipoa provides three output modes, which can be selected using the -r parameter. Please note that the -r option is independent and is not affected by any other parameters.
- -r0: Output the consensus sequence in FASTA format (default)
- -r1: Output the multiple sequence alignment
- -r2: Output the sequence graph in GFA format
To accommodate different datasets and balance speed with accuracy, minipoa provides the following advanced command-line parameters for fine-tuning:
You can customize the dynamic programming scoring scheme using the following parameters:
- -M: Match score
- -X: Mismatch penalty
- -O: Gap open penalty
- -E: Gap extension penalty
minipoa input.fasta -M 2 -X -4 -O -4 -E -2 > output.fasta
-B : Enable the adaptive band strategy (static banding is enabled by default).
# Enable adaptive band strategy in Sequencing mode
minipoa input.fasta -B > output.fasta
The band length is calculated automatically from the -b and -f parameters and the query sequence length using the formula: len = b + 1 / f * query_length (that is, b + query_length / f).
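To get a feel for how -b and -f shape the band, the formula can be evaluated by hand before running minipoa. The helper below is a minimal sketch: band_len is a hypothetical function, not part of minipoa, the query length of 16000 is an illustrative value, and the use of integer division is an assumption about how the result is rounded.

```shell
# Hypothetical helper, not part of minipoa: estimate the band length
# from the README formula len = b + query_length / f.
band_len() {
  b=$1
  f=$2
  query_length=$3
  if [ "$f" -eq 0 ]; then
    # f = 0 disables banding entirely, so the formula does not apply.
    echo "full"
    return
  fi
  # Integer division is an assumption; minipoa may round differently.
  echo $(( b + query_length / f ))
}

# Example: -b 10 -f 100 with a 16000 bp query
band_len 10 100 16000   # prints 170
```

With these values the band covers roughly 1% of the query length plus the constant offset b, so a larger -f or a smaller -b narrows the band.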
# Narrow the bandwidth under adaptive band strategy
minipoa input.fasta -B -b 10 -f 100 > output.fasta
f = 0 : If you explicitly set f = 0, minipoa will not apply any banding strategy and will perform full dynamic programming across the entire matrix.
minipoa input.fasta -f 0 > output.fasta
-S : Enable anchor chain optimization. Please note that enabling this option will force-enable the adaptive band strategy to maintain alignment robustness.
-W : Adjust the window distance parameter for anchors.
- Recommended Range: Values between 500 and query_length / 24 are generally appropriate.
- Default Behavior: To prevent incorrect alignments caused by false-positive anchors, the software conservatively defaults to the maximum value in this range.
- Performance Tuning: If you want to further accelerate the alignment process, you can manually specify a smaller -W value. This will increase speed, though it may result in a very slight loss of accuracy.
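The recommended -W range can be worked out for a given query length before picking a value. The helper below is a rough sketch: w_range is a hypothetical function, not part of minipoa, the query length of 48000 is illustrative, and both the integer division and the clamping for short queries are assumptions.

```shell
# Hypothetical helper, not part of minipoa: print the recommended -W range
# (500 to query_length / 24) as "lower upper" for a given query length.
w_range() {
  query_length=$1
  lower=500
  # Integer division is an assumption about rounding.
  upper=$(( query_length / 24 ))
  # For short queries the upper bound can fall below 500; clamp it so the
  # range stays non-empty (this clamping is also an assumption).
  if [ "$upper" -lt "$lower" ]; then
    upper=$lower
  fi
  echo "$lower $upper"
}

# Example: a 48000 bp query
w_range 48000   # prints "500 2000"
```

Under the description above, the upper bound corresponds to the conservative default, and any smaller value down to 500 trades a little accuracy for speed.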
minipoa input.fasta -S -W 500 -t thread > output.fasta
For any questions or issues, please contact me at [email protected].