MUSIAL-v2.4

Latest

s-t-h released this 27 Sep 13:51

927e60c

v2.4.2 (Minor Update, 13.10.2025)

Update Gradle to version 9.1.0.
Improvements to the sequence task: Sequences can now be stored per sample, per feature, or both. Sequence export is now faster when no gaps need to be inserted (the non-aligned case).
Multiple minor bug fixes:
- Added check for missing alternative in SnpEff annotation.
- Handling of DP4 coverage annotations is now fixed.
- The temporary directory is now correctly deleted if no task is executed.
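
For context on the DP4 fix: DP4 is the VCF INFO field, as commonly emitted by bcftools, that holds four comma-separated counts of high-quality bases (ref-forward, ref-reverse, alt-forward, alt-reverse). The following is a minimal sketch of defensive parsing of such a value. The class `Dp4Sketch` and its methods are hypothetical names chosen for illustration and are not taken from the MUSIAL code base.

```java
import java.util.Optional;

/** Hypothetical helper (not MUSIAL code): defensive parsing of a DP4 value. */
final class Dp4Sketch {

    /** Counts in DP4 order: ref-forward, ref-reverse, alt-forward, alt-reverse. */
    record Dp4(int refForward, int refReverse, int altForward, int altReverse) {
        int total() {
            return refForward + refReverse + altForward + altReverse;
        }

        /** Fraction of counted bases supporting the alternative; 0.0 without coverage. */
        double altFraction() {
            int total = total();
            return total == 0 ? 0.0 : (double) (altForward + altReverse) / total;
        }
    }

    /** Returns an empty Optional instead of throwing if the value is missing or malformed. */
    static Optional<Dp4> parse(String value) {
        if (value == null) {
            return Optional.empty();
        }
        String[] parts = value.trim().split(",");
        if (parts.length != 4) {
            return Optional.empty();
        }
        int[] counts = new int[4];
        for (int i = 0; i < 4; i++) {
            try {
                counts[i] = Integer.parseInt(parts[i].trim());
            } catch (NumberFormatException e) {
                return Optional.empty();
            }
            if (counts[i] < 0) {
                return Optional.empty();
            }
        }
        return Optional.of(new Dp4(counts[0], counts[1], counts[2], counts[3]));
    }
}
```

Returning an `Optional` rather than throwing lets a caller skip or flag a single malformed record without aborting the whole run; whether MUSIAL takes this approach is not stated in these notes.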

v2.4.2 (Minor Update, 27.09.2025)

Performance and Memory Optimization: String operations were streamlined and a faster navigable map implementation (btreemap) was introduced, reducing both runtime and memory usage. For large-scale datasets, MUSIAL now uses a local cache (Ehcache) to back memory-heavy variant call processing.
Support for HDBSCAN* clustering of alleles and proteoforms (introduced in v2.4.1 via Tribuo) has been rolled back due to performance and usability constraints. This functionality is planned for a dedicated future workflow.
Handling of complex variant calls was refined to improve analysis accuracy, and support for additional file formats was added.
Data Export Enhancements: Storage-to-table exports were re-implemented using Tablesaw, providing a more flexible and performant tabular output. Additionally, sequence and variant profile exports per sample have been separated into a distinct task profile for improved usability and workflow clarity.
Codebase Restructuring: The software was reorganized into distinct packages to improve maintainability and the separation of responsibilities. The packages cover CLI parsing and validation, the genomics data model, task execution, task-independent operations on the model, and utility methods.
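
To illustrate why a navigable (sorted) map suits position-keyed variant data, as mentioned in the performance note above, here is a minimal sketch. The notes do not say which btreemap implementation is used, so the example substitutes the standard library's `java.util.TreeMap` as a stand-in; the class `VariantIndexSketch` and its methods are hypothetical and not part of MUSIAL.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch (not MUSIAL code): variants keyed by genomic position. */
final class VariantIndexSketch {

    // TreeMap is used here only as a stand-in for any NavigableMap implementation.
    private final NavigableMap<Integer, String> byPosition = new TreeMap<>();

    void put(int position, String alternative) {
        byPosition.put(position, alternative);
    }

    /** All variants with start <= position <= end, in ascending position order. */
    NavigableMap<Integer, String> inRange(int start, int end) {
        return byPosition.subMap(start, true, end, true);
    }

    /** Closest stored position at or before the query, or -1 if there is none. */
    int closestAtOrBefore(int position) {
        Integer key = byPosition.floorKey(position);
        return key == null ? -1 : key;
    }
}
```

Range queries such as `inRange` are what make a sorted map useful for extracting all variants that fall inside a feature's coordinates without scanning the full set.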

v2.4.1 (Minor Update, 23.05.2025)

Implementation of HDBSCAN* clustering of alleles and proteoforms of features with Tribuo: After the allele and proteoform sequences have been inferred per sample, they are clustered with the Tribuo library's HDBSCAN* algorithm to make the data easier to interpret. Samples that fall into the same cluster for a feature can be considered similar even if they do not carry exactly the same set of variants in that feature. Clustering uses the L1 distance on binary vectors over all available variants of the feature (position and alternative content). In particular, this means that clustering is not stable across different sets of variants.
The clustering results are used to generate informative names for alleles and proteoforms: these names have been adapted to be used in the different output formats.
Improved naming convention for output files.
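
The distance used for the clustering described above can be sketched as follows: each allele is encoded as a 0/1 vector over all variants of a feature, and the L1 distance between two such vectors counts the variants carried by exactly one of the two alleles. This is an illustrative sketch only; the class `BinaryL1Sketch`, its methods, and the `position:alternative` key format are assumptions made for the example and do not reflect MUSIAL's or Tribuo's actual API.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch (not MUSIAL code): L1 distance on binary variant vectors. */
final class BinaryL1Sketch {

    /** Encodes an allele as a 0/1 vector over the ordered list of all variant keys. */
    static int[] encode(List<String> allVariants, Set<String> carried) {
        int[] vector = new int[allVariants.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = carried.contains(allVariants.get(i)) ? 1 : 0;
        }
        return vector;
    }

    /** L1 (Manhattan) distance; for 0/1 vectors this is the number of differing entries. */
    static int l1(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length.");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            distance += Math.abs(a[i] - b[i]);
        }
        return distance;
    }
}
```

For example, with the variant keys `120:T`, `340:-`, `512:G`, and `800:A`, an allele carrying `120:T` and `512:G` and another carrying `120:T` and `800:A` are at distance 2. Because the vector layout is built from the variants present in a given analysis, the encoding of each allele depends on that variant set.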
