Skip to content

leotaku/aligntools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

34 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

aligntools

Tools for optimal alignment of arbitrary sequences and related tasks.

Software

This project provides the align tool, a CLI application that can align the contents of two files on a byte level or based on arbitrary delimiters. The (currently not configurable) scoring function is biased to prefer large gaps, which is generally what you want for simple text or binary analysis tasks. The author has been using the tool for reverse-engineering badly documented binary formats.

Zig source files containing the implementations of the required algorithms hirschberg_gotoh.zig and perfect_hash.zig might also be of interest.

Motivation

While there is a breadth of quality software packages for sequence alignment, these are generally focused solely on specific bioinformatics workloads.1 I was also unable to find a good implementation of the linear-space Hirschberg algorithm.2 Thirdly, this was an excuse for me to learn the Zig programming language.

Algorithms

  • Myers, Eugene W., and Webb Miller. ‘Optimal Alignments in Linear Space’. Bioinformatics 4, no. 1 (1 March 1988): 11–17. https://doi.org/10.1093/bioinformatics/4.1.11.
  • Belazzougui, Djamal, Fabiano C. Botelho, and Martin Dietzfelbinger. ‘Hash, Displace, and Compress’. In Algorithms - ESA 2009, edited by Amos Fiat and Peter Sanders, 5757:682–93. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. https://doi.org/10.1007/978-3-642-04128-0_61.

Using SIMD or an external library for accelerating the alignment scoring function is planned, but not implemented yet.

  • Farrar, Michael. ‘Striped Smith–Waterman Speeds Database Searches Six Times over Other SIMD Implementations’. Bioinformatics 23, no. 2 (15 January 2007): 156–61. https://doi.org/10.1093/bioinformatics/btl582.
  • Rognes, Torbjørn, and Erling Seeberg. ‘Six-Fold Speed-up of Smith–Waterman Sequence Database Searches Using Parallel Processing on Common Microprocessors’. Bioinformatics 16, no. 8 (1 August 2000): 699–706. https://doi.org/10.1093/bioinformatics/16.8.699.
  • Rahn, René, Stefan Budach, Pascal Costanza, Marcel Ehrhardt, Jonny Hancox, and Knut Reinert. ‘Generic Accelerated Sequence Alignment in SeqAn Using Vectorization and Multi-Threading’. Bioinformatics 34, no. 20 (15 October 2018): 3437–45. https://doi.org/10.1093/bioinformatics/bty380.

License

MIT © 2024-2025 Leo Gaskin

Footnotes

  1. SeqAn3 with its Alphabets feature is an exception to this, and will most likely be a better choice for most applications. Future versions of this package might use SeqAn3 routines for their scoring function.

  2. SeqAn3's align_pairwise algorithm advertises O(N²) space complexity when computing alignments, though I find it hard to believe that this is the best the library can offer. There might be an O(N) space aligorithm somewhere in there, making this project largely obsolete.

About

Tools for optimal alignment of arbitrary sequences and related tasks.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors