Releases: derijkp/scywalker

0.112.0

04 Jun 13:01

Major changes are:

  • Fixed important bug in merging known and novel isoquant transcripts that could sometimes cause novel transcript counts to be overinflated.
  • Validation fix: Allow empty value for whitelist (for running without whitelist)
  • Added functionality
    • sw select - added function exonlist

Full Changelog: 0.111.0...0.112.0
Full Changelog genomecomb: derijkp/genomecomb@0.111.0...0.112.0

0.111.0

18 Mar 12:10

Major changes are:

Support for joint analysis has been added using the -iso_joint option.
Isoquant has stricter requirements for detecting novel isoforms than for known ones, so novel isoforms can be missed in some samples. Joint analysis works by first analysing all samples separately, and then using the novel isoforms found in any of the samples as "known/reference" ones in a reanalysis (still per sample), making them more likely to be found (and counted) even in samples with less evidence. If you have many samples, the number of artefactual "novel" transcripts can become too large to use as a reference (performance-wise). With the option -iso_joint_min, only novel transcripts found in at least the given minimum number of samples are used as a reference in stage 2, which drastically reduces the number of artefacts used.
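
The selection step of the joint analysis can be illustrated with a minimal sketch. This is not scywalker's actual implementation: the function name `select_joint_reference`, the `min_samples` parameter and the transcript ids are hypothetical, chosen only to show the idea of keeping novel transcripts seen in a minimum number of samples.

```python
from collections import Counter


def select_joint_reference(novel_per_sample, min_samples=1):
    """Return the novel transcript ids found in at least min_samples samples.

    novel_per_sample maps a sample name to the novel transcript ids
    detected in that sample during the first (per-sample) analysis.
    """
    counts = Counter()
    for transcripts in novel_per_sample.values():
        # set() ensures a transcript is counted once per sample
        counts.update(set(transcripts))
    return sorted(t for t, n in counts.items() if n >= min_samples)


# Hypothetical stage 1 results for three samples
novel_per_sample = {
    "sample1": ["novel_a", "novel_b"],
    "sample2": ["novel_a", "novel_c"],
    "sample3": ["novel_a", "novel_b"],
}

# Transcripts kept as extra "known/reference" ones for the reanalysis
print(select_joint_reference(novel_per_sample, min_samples=2))
# prints ['novel_a', 'novel_b']
```

Raising the threshold shrinks the reference used in the second stage, at the cost of dropping novel transcripts that are genuinely present in only a few samples.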

The command sc_demultiplex was added for demultiplexing singlecell results into different samples, based on a demultiplexing file assigning each cell to a sample. (Such a file can be generated based on genomic variants or feature barcode sequencing.)
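
As a rough illustration of what demultiplexing by a cell-to-sample assignment means, here is a minimal sketch. It does not reflect the file formats or internals of sc_demultiplex: the function `demultiplex`, the dictionary layout and the barcodes are all hypothetical.

```python
from collections import defaultdict


def demultiplex(cell_counts, cell_to_sample):
    """Split per-cell results into per-sample results.

    cell_counts maps a cell barcode to some per-cell result (here a count);
    cell_to_sample maps a cell barcode to the sample it was assigned to.
    Cells without an assignment are returned separately.
    """
    per_sample = defaultdict(dict)
    unassigned = []
    for barcode, count in cell_counts.items():
        sample = cell_to_sample.get(barcode)
        if sample is None:
            unassigned.append(barcode)
        else:
            per_sample[sample][barcode] = count
    return dict(per_sample), unassigned


# Hypothetical pooled results and assignment table
cell_counts = {"AAAC": 120, "AAAG": 80, "AACT": 45, "TTTT": 3}
cell_to_sample = {"AAAC": "sampleA", "AAAG": "sampleA", "AACT": "sampleB"}

per_sample, unassigned = demultiplex(cell_counts, cell_to_sample)
print(per_sample)
print(unassigned)
```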

Some improvements in sc barcode detection were made:

  • find_barcodes handles the presence of multiple adapter matches in a read better (mapquality filter, location)
  • Support for v4 and 5 prime 10x barcodes has been added

An alternative cluster distribution method that limits the number of concurrently submitted jobs was added. When using -d slurm, all jobs are submitted immediately (in such a way that a job only runs once the jobs it depends on have finished). For large analyses this can submit thousands of jobs, causing problems on clusters that limit the number of submitted jobs. You can now combine the options -d and -dsubmit slurm to run (distributed) on a slurm cluster while limiting the number of concurrently submitted jobs to a given number. The disadvantage of this method is that the submitting command has to keep running until the entire command/pipeline is finished (as it manages the jobs partly by itself).
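
The difference between the two submission strategies can be sketched with a small simulation. This is an illustration only, not the scheduler used by scywalker or genomecomb: `run_throttled`, `max_submitted` and the job names are hypothetical, and the sketch simplifies by waiting for a whole batch to finish, whereas a real manager would likely refill free slots as individual jobs complete.

```python
def run_throttled(deps, max_submitted):
    """Simulate a manager that never has more than max_submitted jobs queued.

    deps maps each job name to the set of jobs it depends on.
    Returns the batches of jobs in the order they were submitted.
    """
    done = set()
    pending = dict(deps)
    batches = []
    while pending:
        # A job is ready once all of its dependencies have finished
        ready = sorted(job for job, needs in pending.items() if needs <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        batch = ready[:max_submitted]
        batches.append(batch)
        # Simulated wait: the manager has to stay alive until the batch is done
        for job in batch:
            del pending[job]
        done.update(batch)
    return batches


# Hypothetical pipeline: three mapping jobs, a merge, then a report
deps = {
    "map1": set(),
    "map2": set(),
    "map3": set(),
    "merge": {"map1", "map2", "map3"},
    "report": {"merge"},
}

print(run_throttled(deps, max_submitted=2))
# prints [['map1', 'map2'], ['map3'], ['merge'], ['report']]
```

Because the dependency bookkeeping lives in the managing loop rather than in the cluster's queue, the loop must keep running until the last job finishes, which mirrors the disadvantage described above.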

A custom set of regions for distribution can be added to a genome reference, and used by the option -distrreg g. This allows for optimization of parallel analysis by splitting up regions that take a longer time to process (due to e.g. size or gene richness).

Isoquant has been updated to version 3.6.3 (issue #20)

ubams are now supported as source data.

A basic singlecell report is generated even if some data is missing

The following issues are also solved in this release:

Full Changelog: 0.110.0...0.111.0

0.110.0

11 Jul 02:09

Major changes are:

Added support for analysis of PacBio data (-preset pacbio)

More input parameters are checked and give an error if wrong before actually starting the run (issue #12)

The option -sc_whitelist now allows shortcuts v3 and v2 for the typical 10x whitelists

The option -dmaxmem was added to limit the total memory used (requested) when running local distribution (-d number).

scywalker_makerefdir supports -groupchromosomes option for alternative grouping of chromosomes

sw tsv210x options added: -round (to create integer matrix), -remdups (to remove duplicate lines)

Various optimizations

  • sorting (using gnusort8) compresses temporary files using zstd (to avoid filling up /tmp when sorting huge files).
    It also uses a larger block size for speed
  • The -scratchdir option was added for when /tmp is (still) too small on your system
  • memory reservations were adjusted, e.g. minimap2 requests memory based on the index size
  • gzfiles and jobgzfile optimizations: use less globs/filesystem access, return values bsorted (per pattern)

Full Changelog: 0.109.0...0.110.0

0.109.0

07 May 15:37

Major changes are:

QoL improvements

  • added cg project_make for creating a basic experiment/project directory based on a samplesheet
  • Added option -samplesheet for creating a basic experiment/project directory based on a samplesheet (issue #3)
  • Improved help
  • Added cg error_report for easier checking of errors after a distributed run (issue #5 and #4)

Distribution and installation

  • include more dependencies in distro (from testing in minimal docker distro)
    • less, which, bzcat, bzip2, dot
    • extra libs for dirR
    • semi-static compiles of samtools, bgzip, tabix

cram support:

  • cram can be used via the option -aliformat cram

Optimization cg select summaries

  • added option -optim memory to minimize memory use when making summaries using cg select with the -g option
    (default memory is proportional to size of summary output)
  • used in scywalker for getting aggregate data from sc_gene and sc_isoforms

Optimizations (esp. for starting big runs)

  • file access optimizations (check for compressed only where specified, direct deps check, use cached data)
  • storage optimizations (shadowdirs)
  • jobs optimizations (integrate cleanup, merge jobs, hold_jid in runfile)

Other changes

  • Added extra cluster support (options -submitoptions, -dqueue, help, ...)
  • process_project option -jobsample now allows specification of number of cores to be used by one-job sample analysis

Full Changelog: v0.108.0...0.109.0

v0.108.0

21 Feb 23:39

adapt README.md: moved example/test run up, added cg tools