Ryan Wick's bioinformatics blog (https://rrwick.github.io) - a blog for miscellaneous bioinformatics stuff

ONT read QC strategies for assembly
2026-02-05 · https://rrwick.github.io/2026/02/05/read_qc_testing

My colleague Hugh Cottingham recently came to me with a question about pre-assembly ONT read QC. He asked how using Filtlong with --min_length 1000 --keep_percent 95 would compare to a simpler approach like just tossing out the reads with a qscore <20. My first thought was, it probably doesn’t matter much, but I haven’t tested it. But then I thought, why not test it?

ONT read QC is a big domain with lots to potentially investigate. To keep the scope suitably narrow for this post, I tested just a few simple read subsampling/QC methods for single-assembler whole-genome assembly.1 This doesn’t precisely answer Hugh’s original question, but it does shed light on the topic, and it gives me a chance to share some miscellaneous thoughts about pre-assembly ONT read QC.

Methods

I reused the read sets from my Autocycler paper. These are five different genomes, all sequenced with ONT and basecalled with the sup@v5.0.0 model. The pre-QC read sets were very deep, ranging from 364× to 1280×. Based on this old blog post, I decided to aim for a post-QC depth of 100×.

Here are the QC methods I tried:

  • none: just assembling the full read set. Not a QC method but a negative control.2
  • Rasusa: random sampling to 100× depth with Rasusa. This only reduces the depth, and all other read stats (e.g. length, quality) should remain similar. So also not a QC method, more of a depth-matched negative control.
  • Chopper+Rasusa: first Chopper (-q 20 -l 1000) to discard reads with a mean qscore <20 or a length <1 kbp, then Rasusa to randomly subsample to 100× depth.
  • Filtlong-defaults: Filtlong with --target_bases set to 100 times the genome size. This keeps the ‘best’ reads using Filtlong’s default scoring, which prefers longer and higher-qscore reads, so it will shift both the read length and read qscore distributions upward.
  • Filtlong-mean-qual: Filtlong with --target_bases set to 100 times the genome size and --length_weight 0 --window_q_weight 0. This makes Filtlong only care about mean qscore, i.e. it will keep the reads with the highest quality scores. This should greatly increase the read qscore distribution but shouldn’t have much effect on the read length distribution (only to the degree that length and mean qscore are correlated).
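The --target_bases value used above is just genome size multiplied by target depth. A trivial helper to make that concrete (my own illustration, not part of Filtlong or Rasusa):

```python
def target_bases(genome_size: int, depth: int = 100) -> int:
    """Bases to keep for a given target depth: depth × genome size."""
    return genome_size * depth

# The five genomes in this post total ~23.64 Mbp, so 100× across all of
# them is ~2.364 Gbp, matching the post-QC totals in the results table.
print(target_bases(23_640_000, 100))  # 2364000000
```

Rasusa does this calculation for you when given a coverage and genome size; for Filtlong you supply the product yourself.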

For each of the 25 read sets (5 genomes × 5 QC methods), I assembled the reads using 11 different long-read assemblers (Canu, Flye, hifiasm, LJA, metaMDBG, miniasm, Myloasm, NECAT, NextDenovo, Raven and wtdbg2) for a total of 275 attempted assemblies.3 I then quantified their accuracy using the script from the Autocycler paper, which counts sequence-level errors and structural errors. Sequence-level errors are substitutions and small indels. Structural errors include missing bases (e.g. a missing plasmid) and extra bases (e.g. duplicated sequence at the ends of circular contigs).

Results

If you’re interested in the full results table, here’s an Excel file with everything: read_qc_results.xlsx

Read stats

Read QC strategy | Read count | Total bases | Min read length | N50 read length | Max read length | Median read qscore
none | 2,956,843 | 16,884,738,340 | 5 | 9,514 | 946,320 | 21.97
Rasusa | 415,227 | 2,364,203,186 | 5 | 9,603 | 544,348 | 21.90
Chopper+Rasusa | 368,668 | 2,364,188,652 | 1,000 | 9,583 | 98,126 | 24.14
Filtlong-defaults | 90,593 | 2,364,261,194 | 10,515 | 27,455 | 128,070 | 24.05
Filtlong-mean-qual | 431,507 | 2,364,193,509 | 5 | 8,808 | 104,213 | 27.69

The values in the above table are totals across all five genomes. As intended, all read sets except for none contain about the same number of bases – the five genomes total 23.64 Mbp, so 100× read depth is 2.364 Gbp. Since Rasusa is just random subsampling, it didn’t change much about the read set other than reducing its depth.4 Chopper+Rasusa increased the min read length and median qscore. Filtlong-defaults increased the median qscore and greatly increased the min and N50 read lengths. And Filtlong-mean-qual greatly increased the median qscore.

Assembly resources

Read-QC assembly results - computational resources

As expected, the none read sets (which were very deep) took far more time and memory to assemble than the other read sets (all 100×). That’s one practical reason to do read QC.

Six of the 275 assemblies failed/crashed and did not produce a FASTA file: one Flye assembly (none) quit with a ‘No disjointigs were assembled’ error (sometimes happens when Flye is given excessively deep reads), and five LJA assemblies (four none and one Rasusa) crashed (LJA seems to struggle with very low-quality reads).

Assembly accuracy

Read-QC assembly results - accuracy

The none assemblies were least accurate, despite the assemblers having the most information to work with. This is consistent with this post which showed that too much depth can make assemblies worse.

For the other assemblies (all 100×), the number of sequence errors (left plot) correlates with the median read qscore. The worst sequence accuracy was from Rasusa (median read qscore = 21.90, median errors per assembly = 143.5) and the best sequence accuracy was from Filtlong-mean-qual (median read qscore = 27.69, median errors per assembly = 19).

The number of total errors (right plot) didn’t show as clear a pattern – structural errors were common with all QC methods.5 (If structural accuracy is important, you should be using Autocycler!) But Filtlong-defaults did have more structural errors because this QC approach removes small plasmids.

Discussion and conclusions

ONT read QC is a potentially huge topic, and this blog post has a narrow focus. So I’d like to explicitly state what I did not address here:

  • This only tested read QC in the domain of bacterial whole-genome assembly. There are obviously a lot of other things you can do with ONT reads, and they may need different QC strategies.
  • This tested the results from single assemblers (e.g. Flye). It did not look at consensus assembly, e.g. with Autocycler.
  • My read sets were all very deep, so read QC was used to bring them down to a more usable depth. I did not test cases where the full read set was already a good depth (e.g. 50–100×) or a less-than-ideal depth (e.g. 20–50×).
  • I only used a few tools/approaches that pass/fail on a whole-read basis. I did not look at any QC strategies that alter reads, e.g. trimming low quality bases, splitting chimeras, etc.

Given those limitations, here are the main conclusions I can draw from this mini-study:

  • If you have very deep reads, definitely do some sort of QC/subsampling to bring them down to a more reasonable depth (~50–100×) before assembly. This will not just save assembly time but probably also give you better assemblies.
  • Filtering by qscore definitely seems to help with sequence accuracy. This makes sense – fewer read errors mean fewer opportunities for errors to creep into the assembly. So, if you have very deep reads, I recommend including some sort of qscore-based filter.
  • Filtering by length can eliminate small plasmids. Three of the five genomes I used in this study had plasmids smaller than 10 kbp, but the Filtlong-defaults QC removed all reads below 10 kbp, essentially erasing these plasmids! Be very careful using Filtlong (or any QC tool that selects for longer reads) if small plasmids are important to you.

Stepping beyond this mini-study, here are some general thoughts on the topic based on my experience:

  • If your pre-QC reads are not very deep, you’ll need to be more conservative with your QC, so as to not reduce depth further. Higher pre-QC depth → more stringent QC. Lower pre-QC depth → more lenient QC. Filtlong’s --target_bases option can help with this.
  • Tossing out very short reads is probably a good idea. It can make assembly go faster by reducing the read count, especially important if the assembler contains any O(n²) algorithms. I usually use a 1 kbp threshold, since most plasmids are larger than this. If your read depth is low (so you want to preserve more depth) or you want to be extra-cautious around small plasmids (which are occasionally <1 kbp), then a 500 bp threshold might be better.
  • If you are working with a tough-to-assemble genome, then a QC strategy which prefers longer reads (like Filtlong-defaults used in this post) could be beneficial. Longer reads can span longer repeats, and this can help assemblers to produce a complete assembly. Just be aware that small plasmids may be lost!
  • Even when small plasmids are well represented in post-QC reads, long-read assemblers sometimes assemble them poorly or omit them entirely. So if small plasmids matter to you, I’d recommend checking out Plassembler.6
  • If you have samples from a multiplexed ONT run, it’s likely that they will vary in depth.7 If you then do qscore-based filtering to a target depth (as I recommend below), you’ll create a bias in your data: samples with higher pre-QC depth will have fewer post-QC errors. If this will matter for your analysis, then random subsampling (e.g. Rasusa) could be better.
  • Other QC tools do more than pass/fail whole reads. E.g. fastplong trims reads based on quality, removes adapters, splits chimeras and more. But assuming you basecalled/demultiplexed with Dorado, then most adapters and chimeras may already be taken care of. And based on this old study of mine, assemblers are pretty tolerant of untrimmed adapters and chimeras. So I suspect the extra QC steps done by fastplong probably don’t matter much for assembly, but that remains untested.

Considering all of the above, some loose recommendations for pre-assembly ONT read QC that I think would work well in most cases:

  • Do some light QC to toss out very short and very low-quality reads: chopper -q 10 -l 1000. This will clean up the worst reads and probably not reduce the overall depth by much.
  • If you’re going to assemble with Autocycler, then stop now! Leaving lots of depth is a good thing for Autocycler, which includes a random subsampling step in its pipeline. Read more here.
  • If you’re going to assemble with a single assembler (e.g. Flye), then run Filtlong with --length_weight 0 --window_q_weight 0 --target_bases "$genome_size"00 (i.e. 100 times the genome size).8 This will keep the best 100× of reads as judged only by mean quality (not length).
  • If your assembly didn’t reach completion (e.g. the chromosome was fragmented), then you could try Filtlong with default weights: --target_bases "$genome_size"00. This will increase the length of the post-QC reads and may help get a complete assembly. But again, be aware that you’re at risk of losing small plasmids.

Footnotes

  1. By ‘single-assembler’ I mean assembling with just one tool, like Flye or Canu. This is in contrast to consensus assembly with Autocycler which uses multiple assemblers. Read QC for Autocycler would have different priorities, which I briefly address at the end of this post. 

  2. Since the read sets used in this post came from a multiplexed ONT run, even the none reads effectively had a little bit of QC from the demultiplexing process. This is because very low-quality reads are more likely to end up in the ‘unclassified’ bin. 

  3. I used the same approach as I did in the Autocycler paper and my last post: running the assemblers via Autocycler helper using --min_depth_rel 0.1 to clean up any low-depth contigs. 

  4. The very long max-length reads in the none and Rasusa read sets are just junk, e.g. the GT dinucleotide repeated over and over. This is often the case for the longest reads in a set without any QC. 

  5. You might have noticed that ‘Total errors’ sometimes gets very high, larger than the genome size. This is because extra bases count towards total errors, and some assemblies contained a lot of extra sequence. The worst was the Klebsiella pneumoniae LJA none assembly, which was 44 Mbp in size (the genome is only 6 Mbp). 

  6. I use Plassembler in my Autocycler pipeline specifically to help recover small plasmids. 

  7. This seems inevitable, despite attempts at balancing things like input DNA concentration. Some barcodes just mysteriously give more yield than others. 

  8. This assumes you know the approximate genome size, but it doesn’t need to be exact. For example, a 10% error in genome size will lead to a 10% error in target depth, so if you were aiming for 100×, you might get 90–110× (no big deal). 

P2 Solo announcement and the trade-offs of a more stable ONT
2026-01-21 · https://rrwick.github.io/2026/01/21/p2_solo

(This is an opinion piece, a bit different from what I normally post here. My next post will return to my normal mini-study fare.)


I’ve been using Oxford Nanopore sequencing since 2016, and it’s been a ride! At the start, the data was rough – sequencing yields were low and read error rates were >10%. An ONT-only bacterial genome assembly from 2016 had tens of thousands of errors. But each year, it got better. By 2024, ONT-only bacterial genome assemblies had reached near perfection (and sometimes actual perfection). The read error rate in 2024 was lower than the assembly error rate in 2016. Riding that steady wave of improvement was fun.

But I can now say that at least for my domain (bacterial whole-genome sequencing), ONT today feels mostly the same as ONT from two years ago, and that’s not something I’ve ever felt before. Their last basecalling model update was 8 months ago, and it wasn’t dramatic. That ‘fun’ feeling is diminishing.

This is both good and bad. I work in academic research, but I also have many colleagues in public health, and I think we represent two ends of a spectrum. For me, ONT’s rapid development and improvement was great! It created new opportunities, new problems to solve, and let me work on the cutting edge.1 But on the public health side, ONT’s constantly-changing landscape was frustrating. Nobody wanted to spend time and effort developing and validating a pipeline that would be out-of-date in a year’s time. Now with the increased stability, some of my public health colleagues are comfortable replacing old Illumina-based workflows with ONT-based alternatives.2 While I do see the advantages to a slower pace of change, I sometimes pine for the ‘good old days’.3

While I’ve been feeling this sentiment for a while, this post was motivated by ONT’s recent announcement that the P2 Solo will be discontinued.4 The P2 Solo allowed PromethION flowcells to be run on an affordable computer, and its discontinuation could put high-throughput ONT sequencing out of financial reach for many academic labs.5 It’s bad news for researchers that use ONT. I’ve seen a lot of surprise and frustration in the community around this decision, and I’ll add my voice to the others requesting that ONT please reconsider.

As many have already pointed out, the P2 Solo’s discontinuation feels counter to the company’s motto: ‘To enable the analysis of anything, by anyone, anywhere’. By raising the cost of in-house sequencing, this could push more users to third-party sequencing providers, which will erode the DIY appeal that drew many labs to ONT in the first place. I can think of many other tech companies that were the cool new underdogs until they became just another big corporation driven by shareholder value, so I hope ONT doesn’t lose what made it special.

A disclaimer: everything I’ve written is from the perspective of bacterial isolate WGS, and users in other fields may have a different experience. I do really like ONT sequencing – it’s a great technology that I would like to keep using. Going forward, I just hope ONT can walk the tightrope: producing a stable product that can be relied upon, while also continuing to push the envelope and support researchers like me.

Footnotes

  1. ONT has benefitted from a strong feedback loop: they shared their new developments early, and researchers like me developed protocols, software tools and validation. In effect, the academic community often acted as an informal extension of their R&D. 

  2. ONT has many benefits in bacterial genomics, such as being able to clearly distinguish plasmids (which often carry antimicrobial resistance) that Illumina sequencing cannot. 

  3. I nostalgically remember Clive Brown (former ONT CTO) giving enthusiastic talks full of ambitious roadmaps and big ideas. Many of these never materialised (whatever happened to outie sequencing?), but it nevertheless added to the excitement around the technology. 

  4. Announcement is here (requires an ONT community login). 

  5. The P2 Solo doesn’t include compute for basecalling, which kept its cost down. We do our basecalling on a ‘gaming’ computer named OnION. The alternative to the P2 Solo (which ONT is keeping) is the P2 Integrated, which includes a big NVIDIA GPU and is much more expensive. 

Benchmark update: metaMDBG and Myloasm
2025-09-23 · https://rrwick.github.io/2025/09/23/autocycler-benchmark-update

New assembler releases

Our manuscript describing Autocycler was recently published:1
Wick RR, Howden BP, Stinear TP (2025). Autocycler: long-read consensus assembly for bacterial genomes. Bioinformatics. doi:10.1093/bioinformatics/btaf474.

But its benchmarking is already out of date! Since I ran the analyses for the paper, two long-read assemblers have had new releases: metaMDBG v1.2 and Myloasm v0.2.0. Both came with claims that caught my eye (‘improved assembly quality’ for metaMDBG and ‘cleaner contig outputs with better polishing’ for Myloasm), and both tools are still young (especially Myloasm). I therefore decided to rerun these new versions through the same benchmarking pipeline I used in the Autocycler paper.2

Updated results

Below is an updated version of Figure 2 from the Autocycler paper. Error counts are shown on the y-axes (pseudo-log transformed, lower is better). The original metaMDBG and Myloasm versions (from the paper) are orange, the new versions are green and everything else (less relevant here) is grey.
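(The exact pseudo-log transform used for the y-axes isn't stated; a common choice, and the one I'd assume here, is the asinh-based transform from ggplot2's scales::pseudo_log_trans, which behaves like log10 for large values but stays linear near zero, so zero-error assemblies can still be plotted. A minimal Python sketch:)

```python
import math

def pseudo_log(x: float, sigma: float = 1.0) -> float:
    """asinh-based pseudo-log: ~linear near 0, ~log10 for large x.
    Matches scales::pseudo_log_trans(sigma, base = 10) in R."""
    return math.asinh(x / (2 * sigma)) / math.log(10)

print(pseudo_log(0))     # 0.0 -- zero error counts are plottable
print(pseudo_log(1000))  # ~3.0, i.e. close to log10(1000)
```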

Autocycler benchmark updated results

I also updated the relevant supplementary figures using the same old-orange, new-green colour scheme.

Discussion

Both metaMDBG and Myloasm showed clear improvements in accuracy with their latest releases: fewer sequence errors (substitutions and indels) and fewer total structural errors.3 I was particularly impressed by the best cases for Myloasm v0.2.0 – a couple of the Listeria innocua assemblies had only one single-bp error, better than any other single-tool assembler.

When I run Autocycler, I usually use this Bash script to automate the process. Autocycler benefits from a diverse set of input assemblers, but I had previously left out Myloasm because v0.1.0 had relatively high error rates. These new results, along with positive reports from a colleague4, convinced me to add Myloasm to the pipeline.

It’s worth noting that both metaMDBG and Myloasm were developed as metagenome assemblers, but I’m using them here to assemble isolate genomes. As my results show, metagenome assemblers can work quite well on isolates! However, they can be more likely to leave low-depth contigs in the assembly. In metagenomes this is desirable, since there are often many low-abundance organisms. But for isolates, low-depth contigs usually indicate contamination.5 For these tests, I ran the assemblies via Autocycler helper using --min_depth_rel 0.1 to remove contigs below 10% chromosomal depth, and I recommend others do the same when applying these assemblers to isolates.

Footnotes

  1. At the time of writing, the paper is reviewed and accepted but still an unproofed advance article. 

  2. For the full methods and results, see the Autocycler paper GitHub repo. 

  3. The only metric that got worse with the new versions is ‘missing bases’, but this was balanced by improvements in the ‘extra bases’ metric (see Figure S1). 

  4. Michael Hall had one tricky genome where metaMDBG and Myloasm were the only two assemblers which could successfully assemble the chromosome. 

  5. A common cause would be cross-barcode contamination. In multiplexed ONT runs, some reads can ‘leak’ into other barcodes, and if the source is sufficiently high depth (e.g. a high-copy-number plasmid), the contamination can sometimes reach assemblable levels in wrong barcodes. 

Cross-sample homopolymer polishing with Pypolca
2025-09-04 · https://rrwick.github.io/2025/09/04/homopolymer-polishing

Cryptosporidium assembly

I was recently working with Torsten Seemann and our colleagues at the Centre for Pathogen Genomics to assemble a Cryptosporidium hominis genome from ONT reads. It’s small for a eukaryote at only 9.2 Mbp, has low GC content (~30%) and, importantly for this post, has lots of long A/T homopolymers (~100 of them ≥20 bp). The ONT sequencing was deep, and the assembly went smoothly: T2T for all eight chromosomes.1 However, I expect it contains homopolymer-length errors – while R10.4.1 is better than R9.4.1, I’m still suspicious of homopolymers >10 bp.

To illustrate the problem, I took a long homopolymer from the Crypto genome and used squigulator to simulate the raw ONT signal. For long stretches of a single base, the signal flatlines, making it difficult for the basecaller to determine the precise homopolymer length:

Long homopolymer ONT squiggle

Normally, I’d fix homopolymer lengths (and any other lingering errors) with short reads from the same isolate using Polypolish and Pypolca. But we didn’t have short reads for this genome, so I wondered: could I correct homopolymers using short reads from a closely related isolate instead?2 I’ll call this cross-sample homopolymer polishing.

Homopolymer-only polishing

So I decided to add a homopolymer-only polishing feature to Pypolca. When used, Pypolca will ignore all changes except for length adjustments in homopolymers above a threshold. George Bouras kindly merged my pull request and released it in v0.4.0, so this feature is now available. Documentation is here.

To try it, I downloaded all NCBI assemblies of Cryptosporidium and identified the closest relative to my genome using Mash. I then downloaded Illumina reads for that sample3 and polished with:

pypolca run -a draft.fasta -1 SRR1557959_1.fastq.gz -2 SRR1557959_2.fastq.gz -t 16 --homopolymers 6

This changed the length of 343 homopolymers:4

Pypolca homopolymer-only polishing distributions

The top plot shows the distribution of all homopolymers in the genome.5 The bottom plot shows homopolymers whose length changed. Both plots use a pseudo-log y axis. The dashed line marks the --homopolymers 6 threshold, i.e. shorter homopolymers were not allowed to change.

The results match my expectations: very few shorter homopolymers were altered (e.g. 2/17,297 6-mers and 3/7,070 7-mers), where ONT is reliable, but about one-quarter of longer homopolymers changed (e.g. 6/26 19-mers and 9/30 20-mers), where ONT struggles.
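If you want to profile your own genome's homopolymers (e.g. to decide whether homopolymer-only polishing is worth trying), a simple run-length scan is all it takes. A minimal sketch (my own illustration, not Pypolca's implementation):

```python
import re
from collections import Counter

def homopolymer_lengths(seq: str, min_len: int = 6) -> Counter:
    """Count homopolymer runs of each length at or above min_len."""
    counts = Counter()
    for match in re.finditer(r"A+|C+|G+|T+", seq.upper()):
        run = len(match.group())
        if run >= min_len:
            counts[run] += 1
    return counts

print(homopolymer_lengths("ACGTAAAAAATTTTTTTTTTCG"))  # Counter({6: 1, 10: 1})
```

Running this over each contig gives the kind of length distribution shown in the top plot above.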

I also annotated the genome before and after homopolymer-only polishing with Companion, and it seemed to improve the annotation: gene count increased (4069 → 4076), pseudogene count decreased (41 → 36) and the fraction of bases annotated increased (76.968% → 76.979%).

Discussion

Cross-sample homopolymer polishing looks promising, but it depends on several assumptions:

  1. ONT-only assemblies have length errors in long homopolymers.
  2. Illumina reads handle long homopolymers better than ONT reads.
  3. Closely related genomes usually share the same homopolymer lengths.
  4. Homopolymer lengths are biologically consistent (i.e. little true variation).

I ordered these by my confidence. Assumption 1 seems safe. I think assumption 2 is usually true, though short reads can struggle in extreme GC contexts.6 I’m less sure about assumptions 3 and 4, which probably depend on the organism – what holds in Crypto may not in other taxa.

So while Pypolca now supports cross-sample homopolymer-only polishing, treat the biological validity as experimental. If the assumptions hold, it may work very well. For our Crypto genome… I’m not yet sure. We’ll be sequencing our isolate with Illumina to get a firmer answer.

Footnotes

  1. This was my first time trying Autocycler on a eukaryote genome. It worked well, except for the telomeres at contig ends, which I had to manually repair. See the Autocycler graph here. 

  2. I couldn’t find an existing tool that does exactly this. This anvi’o script is similar, but it uses a reference genome (not reads) to set homopolymer lengths. Homopolish is also similar, but it can make non-homopolymer changes (see this issue). 

  3. The source of these reads: Comparative genomic analysis reveals occurrence of genetic recombination in virulent Cryptosporidium hominis subtypes and telomeric gene duplications in Cryptosporidium parvum. 

  4. For comparison, Pypolca with default settings (not homopolymer-only) made >1600 changes, most of which were not in homopolymers. Some may have been genuine fixes, but I suspect most are biological differences between our genome and the downloaded reads. 

  5. These homopolymers are much longer than I’m used to seeing in bacteria! As a comparison, I generated random sequences with the same length and GC content, and their longest homopolymer was typically around 15–16 bp. 

  6. Because homopolymers are pure A/T or G/C, they skew local GC. In Crypto, all long homopolymers are A/T, which combined with an already low average GC (~30%), means they often fall in very low-GC regions (<20%). Some Illumina preps (e.g. Nextera XT) do not do well with this. 

Dorado v1.0.0 and the v5.2.0 basecalling models
2025-05-27 · https://rrwick.github.io/2025/05/27/dorado-v1

Last week, ONT’s London Calling conference brought the release of Dorado v1.0.0. While I think of Dorado as a basecaller, ONT keeps adding other features: it aligns, trims, corrects, demultiplexes, polishes and (as of this release) calls variants. But for this post, I’ll focus only on basecalling.

A new version of Dorado is nice, but what really caught my attention was that it came with new DNA basecalling models: version 5.2.0. It’s been a full year since the previous models (v5.0.0), so I’m hoping for a boost in read and assembly accuracy. That’s what I’ll test in this blog post!

Model architectures

For both v5.0.0 and v5.2.0, the models follow the same basic architecture: hac uses LSTMs, and sup uses transformers.

The hac model has grown from five alternating-direction LSTM layers in v5.0.0 to seven in v5.2.0, making it 37% larger by parameter count. Dorado has seen many performance improvements in recent versions, and I suspect those optimisations gave ONT the headroom to increase model size while keeping hac basecalling quick enough on existing hardware.

The sup model architecture appears unchanged from v5.0.0 to v5.2.0, so differences there will come from training data and weights.

I didn’t test the fast models here, but their architecture also appears unchanged from v5.0.0 to v5.2.0.

Methods

I used the same dataset as in the Autocycler preprint. Briefly: five bacterial isolates with high-quality reference genomes (Illumina-polished), sequenced (along with other isolates) on a PromethION flowcell (~132 Gbp total run yield).

I basecalled the full run four times:1

  • Dorado v0.9.5 with hac@v5.0.0
  • Dorado v0.9.5 with sup@v5.0.0
  • Dorado v1.0.0 with hac@v5.2.0
  • Dorado v1.0.0 with sup@v5.2.0

To assess read accuracy, I simply aligned reads to their reference genome and calculated the identity of the alignments.2
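This identity metric (BLAST identity, see footnote 2) is easy to compute from minimap2's PAF output: matching bases over alignment length. A sketch with a made-up PAF line:

```python
def blast_identity(paf_line: str) -> float:
    """BLAST identity from a minimap2 PAF line: matching bases
    (column 10) divided by alignment length (column 11)."""
    fields = paf_line.rstrip("\n").split("\t")
    return int(fields[9]) / int(fields[10])

# Hypothetical PAF line: 9799 matching bases over a 10000 bp alignment.
line = "read1\t10000\t0\t10000\t+\tref\t4000000\t5000\t15000\t9799\t10000\t60"
print(blast_identity(line))  # 0.9799
```

Note that column 10 only counts matches when minimap2 is run with -c (base-level alignment); without it, column 10 is approximate.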

To assess assembly accuracy, I made six non-overlapping 50× read subsets per isolate (30 total). Each was assembled with Autocycler (with some manual curation to ensure that all small plasmids were included), and errors were counted using the assessment script from the Autocycler preprint.

Speed performance

Basecalling was done on an H100 GPU on the University of Melbourne’s Spartan cluster.

Dorado version | Model | Model parameters | Time (h:m) | Speed (samples/sec)
0.9.5 | hac@v5.0.0 | 6,425,328 | 2:39 | 1.91×10⁸
1.0.0 | hac@v5.2.0 | 8,790,768 | 3:18 | 1.53×10⁸
0.9.5 | sup@v5.0.0 | 78,718,162 | 25:06 | 2.01×10⁷
1.0.0 | sup@v5.2.0 | 78,718,162 | 18:45 | 2.69×10⁷

Hac basecalling got ~20% slower with the new model, likely due to its increased size.

Sup basecalling got ~33% faster, which surprised me since the model size is unchanged. This may be due to Dorado optimisations, though I didn’t spot anything obvious in the release notes between v0.9.5 and v1.0.0. Or it could be due to cluster factors like shared node usage. These benchmarks weren’t tightly controlled, so take the timing results with a grain of salt.

Read accuracy

The violin plots below show the read identity distributions (higher is better).3 The line inside each violin indicates the median.

Dorado v1 read accuracy

For hac, median identity increased from Q16.3 (97.65%) with hac@v5.0.0 to Q17.0 (97.99%) with hac@v5.2.0, which corresponds to ~15% fewer read errors.

For sup, there was little change. The newer sup@v5.2.0 reads actually had a slightly lower median than the older sup@v5.0.0 reads, but the difference was small.
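The qscores above are just log-transformed error rates, which is also where the '~15% fewer read errors' figure comes from:

```python
import math

def identity_to_qscore(identity: float) -> float:
    """Convert alignment identity to a Phred-style qscore."""
    return -10 * math.log10(1 - identity)

# The hac medians above: 97.65% is Q16.3 and 97.99% is Q17.0.
print(round(identity_to_qscore(0.9765), 1))  # 16.3
print(round(identity_to_qscore(0.9799), 1))  # 17.0

# Error rate dropped from 2.35% to 2.01%, i.e. ~15% fewer read errors.
print(round(1 - 0.0201 / 0.0235, 2))  # 0.14
```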

Assembly accuracy

The boxplots below show the number of assembly errors (lower is better). The line in each box shows the median, and whiskers span the full range.

Dorado v1 assembly accuracy

For hac, median errors per assembly dropped from 37.5 with hac@v5.0.0 to 13 with hac@v5.2.0 – a 65% reduction, despite only a 15% drop in read errors. Very nice! The mean assembly accuracy was Q50.6 for hac@v5.0.0 and Q54.5 for hac@v5.2.0.

For sup, both sup@v5.0.0 and sup@v5.2.0 had a median of 3 errors per assembly.4 Still, the newer model improved the error rate for some assemblies, reducing the mean errors per assembly by ~1. The mean assembly accuracy was Q60.4 for sup@v5.0.0 and Q61.6 for sup@v5.2.0.

Across all 30 sup@v5.2.0 assemblies, I counted 99 bp of errors in total. These occurred at 79 loci: 44 were homopolymer-length errors (indels up to 4 bp), 22 were 1-bp substitutions, 12 were 1-bp indels and one was a 2-bp substitution.

Discussion and conclusions

This new release is a big upgrade for hac users: modest read accuracy gains and large improvements in assembly accuracy. The downside is that hac basecalling is now a little bit slower, but I think that trade-off is worth it.

However, I almost exclusively use sup basecalling, so the hac improvements aren’t very important to me. The new sup model did improve assembly accuracy, but not by much. Given that a full year passed since the last model release, I was hoping for a bigger improvement in sup assemblies.

ONT’s accuracy has improved rapidly over the past decade, which has been exciting but also frustrating for users who want stability. In my domain (bacterial whole genome sequencing), this past year has been unusually stable. I appreciate not feeling the need to rebasecall my data every few months, but I also miss the thrill of shrinking error rates. My guess is that ONT has already picked the low-hanging fruit, and the assembly errors that remain are genuinely hard to avoid. I’m still hoping for a day when ONT-only assemblies can be reliably perfect, but for now, short-read polishing is still often needed to clean up the last few errors.

This post looked at Autocycler assembly accuracy without any post-assembly polishing. I noticed that new Dorado polish models were released alongside the new basecalling models, but there’s no updated bacterial model. In a previous post, I found the bacterial model outperformed the move-table-aware sup model for bacterial isolates. So here’s a request for ONT: please train a sup@v5.2.0 move-table-aware bacterial model for Dorado polish! I’m optimistic it would be the best option for polishing ONT-only bacterial genome assemblies.

Footnotes

  1. I think the basecalling model version (e.g. [email protected] vs [email protected]) is more relevant than the Dorado version (0.9.5 vs 1.0.0). I used Dorado v0.9.5 (released a couple of months ago) with the 5.0.0 models because I had already done the [email protected] basecalling for the Autocycler preprint. 

  2. I used what Heng Li calls BLAST identity: matching bases divided by alignment length. That’s column 10 / column 11 from a minimap2 -c PAF file. 

  3. These plots show how ONT reads have a bimodal identity distribution: some reads cluster around Q5–Q10, while most peak above Q15. I’ve seen this pattern before (not unique to this run), but I don’t know the cause. If you do, let me know! 

  4. In case anyone is closely checking my work: the curated assemblies in the Autocycler preprint (which used [email protected]) had a median of 4 errors per assembly, but I report a median of 3 in this post. That’s because I re-ran the assemblies for this analysis, and full Autocycler runs aren’t completely deterministic. Autocycler itself is deterministic, but some of the input assemblies it relies on are not. 

Ryan Wick
FASTQ assemblies with Dorado polish (2025-02-19, https://rrwick.github.io/2025/02/19/fastq-assemblies)

This post builds on my previous one about using Dorado for genome polishing. Its key takeaway: Dorado polish and Medaka use the same bacterial polishing model, making them effectively interchangeable for bacterial genomes (at least at the time of writing).

While writing that post, I noticed something interesting in the Dorado docs: using the --qualities option outputs the polished assembly in FASTQ format. I initially thought this was unique to Dorado, but I checked Medaka, and it actually has a -q option that does the same thing, so this feature has been hiding under my nose for a while now.1

The key difference between FASTA and FASTQ is that FASTQ includes per-base quality scores, encoded as ASCII characters.2 In a well-calibrated FASTQ, these scores indicate the absolute probability of an error, but if not well-calibrated they can still indicate relative error likelihood.

FASTQ is commonly used for reads, while assemblies typically use FASTA. But assemblies can use FASTQ format, and it could be useful in downstream analyses. For example, when calling variants from an assembly, errors can create false positives.3 FASTQ quality scores could help by allowing one to mask low-confidence bases in the assembly.

So what do Dorado polish FASTQs look like? How well do low qscores correlate with assembly errors? Are the scores well calibrated? In this post, I take an initial look using the same genomes from my previous post, focusing on the five that still had errors after Dorado-bac polishing.

Calibration

To assess how well the FASTQ quality scores are calibrated, I calculated the expected number of errors per genome by summing \(10^{-q/10}\) over all positions in the genome.
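In code, that calculation is just a sum over the quality string of each record. Here’s a minimal sketch (not the script I actually used), assuming standard four-line FASTQ records with Phred+33 quality encoding:

```python
def expected_errors(fastq_path):
    """Sum per-base error probabilities (10^(-q/10)) over a FASTQ file.
    Assumes standard four-line records and Phred+33 quality encoding."""
    total = 0.0
    with open(fastq_path) as f:
        for i, line in enumerate(f):
            if i % 4 == 3:  # every fourth line is the quality string
                for c in line.strip():
                    q = ord(c) - 33  # ASCII offset for Phred+33
                    total += 10 ** (-q / 10)
    return total
```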

Genome                  Actual errors   Expected errors
Shigella flexneri       5               58.4
Klebsiella pneumoniae   3               19.4
Providencia rettgeri    2               16.9
Enterobacter kobei      11              31.7
Escherichia coli        4               23.5

While not perfectly calibrated, the predictions are within an order of magnitude or so – better than I expected!

Qscore plots at error locations

Across the five genomes, there were 25 bp of errors at 17 loci. Each locus is plotted below, with qscores on the y axes. The x axes display the assembled sequence (top) and ground truth (bottom, sometimes with extra bases squeezed in when the assembly had a deletion). Red bars mark error positions, and for deletion errors I used red on both sides of the deletion.

Per-base qscores at error loci

As you can see, positions with or near errors often have a much lower qscore than their neighbours. This is excellent, as it means the FASTQ scores indeed provide a useful indicator of base reliability.

However, there are exceptions. In particular, E and F lacked qscore drops. On closer inspection, the ONT reads at these sites were very clear (no heterogeneity) but differed from the Illumina reads, suggesting these might not be errors at all.4 Since my ONT and Illumina reads came from separate DNA extractions, biological differences between the read sets are possible. Even discounting E and F, some errors showed only subtle qscore drops. For example, C and G appear to be real errors, yet their qscores remain above 40.

Discussion

These results just scratch the surface, but they show that Dorado-polish qscores, while imperfect, are potentially quite useful. Using my previous example of calling variants from an ONT assembly, one could reduce false positives by masking low-quality bases (perhaps with a local threshold, e.g. masking bases with a qscore significantly lower than their neighbours). The ideal strategy would balance sensitivity and specificity based on the application.
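To make the local-threshold idea concrete, here’s a hypothetical sketch. The window size and thresholds are placeholder values I haven’t tuned or tested, not settings from this analysis:

```python
import statistics

def mask_low_confidence(seq, qscores, window=25, drop=20, hard_min=10):
    """Replace bases with 'N' when their qscore is far below the local median.
    window, drop and hard_min are hypothetical placeholder thresholds."""
    masked = list(seq)
    for i, q in enumerate(qscores):
        lo, hi = max(0, i - window), min(len(qscores), i + window + 1)
        local_median = statistics.median(qscores[lo:hi])
        if q < hard_min or q < local_median - drop:
            masked[i] = 'N'
    return ''.join(masked)
```

An absolute floor plus a relative drop catches both globally bad bases and bases that stand out from an otherwise confident region.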

FASTQ assemblies could also improve genome polishing. I’ve spent a lot of time trying to make short-read polishing more reliable,5 but false-positive corrections can still occur.6 Dorado-polish qscores could help decide which changes to accept. For example, E and F were short-read polishing corrections in the Providencia genome, but their high Dorado-polish qscores could have convinced me to reject those changes.

Even if FASTQ output is a niche feature, I’m glad Dorado includes it. FASTQ assemblies provide a useful way to assess base-level accuracy, even when qscore calibration isn’t perfect. I think we should embrace FASTQ as an assembly format, especially for workflows that could benefit from identifying unreliable bases.

Footnotes

  1. The -q option was added to medaka_consensus in v1.8.1 (June 2023), but it’s undocumented in the Medaka README, which is probably why I missed it. 

  2. This is usually done with this formula: ascii - 33 = qscore. So the scale starts with the ! character (ascii = 33, qscore = 0) and could potentially go up to the ~ character (ascii = 126, qscore = 93), but the highest value I saw in a Dorado polish FASTQ was g (ascii = 103, qscore = 70). 

  3. I’m about to release a preprint on this very topic – stay tuned! 

  4. This means the error totals in my previous post are slightly off. I reported the Dorado-bac assemblies as having 25 total errors, but I now think 23 total errors is more accurate. 

  5. George Bouras and I published this paper last year with a lot of relevant info: How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies. 

  6. This is especially true when there are genuine biological differences between the short-read and long-read sets, as is likely the case for the Providencia genome in this post. I hate hybrid read sets made from separate DNA extractions! 

Ryan Wick
Medaka vs Dorado polish (2025-02-07, https://rrwick.github.io/2025/02/07/dorado-polish)

For years, Medaka has been the standard tool for polishing genome assemblies with ONT reads. However, with the release of Dorado v0.9.0 in December 2024, ONT introduced a new polishing option: the dorado polish command.

Shifting polishing to Dorado makes sense from a computational perspective. Dorado is written in C++ with libtorch and optimised for both NVIDIA and Apple GPUs, making it fast and efficient. But what really caught my attention was this note in the Dorado README:1

When auto or <basecaller_model> syntax is used and the input is a v5.0.0 dataset, the data will be queried for the presence of move tables and the best polishing model will be selected for the data. Move tables need to be exported during basecalling. If available, this allows for higher polishing accuracy.

Move tables store information about dwell times – how long each base lingers in the pore – which provides additional context for the neural network during polishing.2 Since Dorado polish can use this move-table data, it has the potential to outperform Medaka, which does not use it.

Initially, Dorado polish was not recommended for bacterial genomes,3 but the recent release of Dorado v0.9.1 introduced a bacterial polishing model. In this post, I test Dorado polish on a set of bacterial genomes and compare it to Medaka.

Methods

I used 10 bacterial genomes from a P2 Solo run, each from a different species.4 These genomes had deep ONT sequencing coverage and complementary Illumina reads.

To take advantage of Dorado’s move-table functionality, I first re-basecalled the ONT reads with the --emit-moves option, using the [email protected] basecalling model. An exciting side note: Dorado v0.9.1 includes performance optimisations that made basecalling about 50% faster than before!5

Next, I created high-quality reference assemblies using Autocycler, Polypolish and Pypolca. Everything went well, so I assume my reference genomes are error-free (or close to it).

To generate draft assemblies for polishing, I wanted ONT-only genome assemblies with a meaningful number of errors. My existing Autocycler ONT-only assemblies (used for generating the reference assemblies) were too accurate, containing just 21 total errors across all 10 genomes, with some genomes being completely error-free. To better assess polishing performance, I intentionally created lower-quality assemblies by:

  1. Randomly subsampling the ONT reads to 50× coverage.
  2. Assembling with Raven.
  3. Manually correcting any large indel errors.6

This produced 10 draft assemblies containing a total of 270 errors.7

I then polished each genome using five different methods:

  • Medaka-bac: Medaka with the r1041_e82_400bps_bacterial_methylation model. This is the model Medaka auto-selects when using the --bacteria option and is the recommended Medaka model for native bacterial DNA.
  • Medaka-sup: Medaka with r1041_e82_400bps_sup_v5.0.0 model. This is the model Medaka auto-selects when not using --bacteria.
  • Dorado-bac: Dorado with the dna_r10.4.1_e8.2_400bps_polish_bacterial_methylation_v5.0.0 model. This is the model Dorado auto-selects when using the --bacteria option and is the recommended Dorado model for native bacterial DNA. This model does not make use of move-table data.8
  • Dorado-sup: Dorado with the [email protected]_polish_rl model. This is the model Dorado auto-selects when not using --bacteria and when move-table data is not present.
  • Dorado-sup-mv: Dorado with the [email protected]_polish_rl_mv model. This is the model Dorado auto-selects when not using --bacteria and when move-table data is present.

Here are some details of each model’s neural network:

Polishing model   Architecture                Weights
Medaka-bac        2-layer bidirectional GRU   405253
Medaka-sup        2-layer bidirectional GRU   405253
Dorado-bac        2-layer bidirectional GRU   405253
Dorado-sup        CNN + 4-layer LSTM          4853501
Dorado-sup-mv     CNN + 4-layer LSTM          4853565

Other notes:

  • I used the 50× subsampled ONT reads as input for polishing, not the full ONT read set.
  • I gave each tool 32 threads to use, but they actually used far fewer. Dorado in particular spent most of its runtime on a single thread.
  • While both Medaka and Dorado support GPU acceleration, I ran all tests on CPU (an AMD EPYC 7742). The smaller models (Medaka-bac, Medaka-sup, Dorado-bac) ran efficiently on CPU, but the larger models (Dorado-sup, Dorado-sup-mv) were slower and would likely benefit from GPU acceleration.

Results

Polishing method                  Total errors   Overall qscore   Time (h:m:s)   RAM (GB)
draft assemblies (no polishing)   270            Q52.2            n/a            n/a
Medaka-bac                        26             Q62.4            0:02:09        7.2
Medaka-sup                        100            Q56.5            0:01:44        7.1
Dorado-bac                        25             Q62.5            0:03:30        10.3
Dorado-sup                        203            Q53.4            1:00:35        46.5
Dorado-sup-mv                     101            Q56.5            1:02:23        46.7

  • Total errors: Sum of all errors across the 10 genomes. Per-genome results are in the figure below.
  • Overall qscore: Calculated from total errors and total genome size (44.9 Mbp).
  • Time and RAM: Median values across the 10 genomes.
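The overall qscore column is just a Phred transform of the error counts; a quick sketch:

```python
import math

def overall_qscore(total_errors, genome_size):
    # Phred-style accuracy: Q = -10 * log10(error rate)
    return -10 * math.log10(total_errors / genome_size)

# e.g. Medaka-bac: 26 errors across 44.9 Mbp
print(round(overall_qscore(26, 44.9e6), 1))  # prints 62.4
```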

Dorado polish error counts

Discussion

Medaka-bac and Dorado-bac were the best-performing polishers, each reducing errors by about 10-fold (Q52 → Q62). Given that these are ONT’s recommended models for bacterial genome polishing, this result was expected. However, what stood out was that Medaka-bac and Dorado-bac produced nearly identical results – only a 1 bp difference across all 10 genomes. This led me to suspect that these models were not just similar in architecture but also identical in their trained weights. I confirmed this by checking with PyTorch, and yes, Medaka-bac and Dorado-bac share the exact same weights. At the time of writing, this means Medaka and Dorado are interchangeable for bacterial genome polishing. However, it seems likely that ONT will transition to Dorado as the recommended tool, which may lead to Medaka becoming deprecated in the future.9
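The weight comparison itself is simple once the parameter tensors are loaded. Here’s a generic sketch of that kind of check (unpacking the actual Medaka and Dorado model files takes some extra steps, which I’ve left out):

```python
def state_dicts_equal(a, b):
    """True if two model state dicts have the same parameter names and values.
    Works on array-likes (NumPy arrays, torch tensors) and plain Python values."""
    if a.keys() != b.keys():
        return False
    for k in a:
        eq = a[k] == b[k]
        if not (eq.all() if hasattr(eq, 'all') else eq):
            return False
    return True
```

With PyTorch, each checkpoint could be loaded with torch.load(path, map_location='cpu') before comparing, assuming it unpacks to a plain state dict.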

The Medaka-sup model performed worse than Medaka-bac, which isn’t surprising – it was presumably trained on data with less bacterial diversity and more non-bacterial reads. The Dorado-sup and Dorado-sup-mv models also did poorly. This aligns with ONT’s notes stating that these models are optimised for human genomes.10

Even though Dorado-sup and Dorado-sup-mv performed poorly on bacterial genomes, the move-table-aware model (Dorado-sup-mv) outperformed its non-move-table counterpart (Dorado-sup). This suggests that move-table data is indeed beneficial for polishing. Also, these models have a larger and more sophisticated neural network architecture than Dorado-bac. This raises an interesting question: What if ONT trained a bacterial-specific move-table-aware model with this bigger architecture? This hypothetical Dorado-bac-mv model could potentially outperform Dorado-bac. If and when such a model is released, I’ll test it out in a follow-up post.

One more interesting feature of Dorado polish: it has a --qualities option which makes Dorado output the polished genome in FASTQ format. This has interesting implications, but after drafting a section on it, I realised the topic is big enough for its own blog post, so stay tuned for that!

Footnotes

  1. Minor typos were corrected for clarity. 

  2. For more on move tables, see this explanation in Dorado’s README and this technical document from squigualiser. 

  3. See the Dorado v0.9.0 release notes and Mike Vella’s Bluesky post

  4. The genomes are Enterobacter hormaechei, Enterobacter kobei, Escherichia coli, Klebsiella planticola, Klebsiella pneumoniae, Listeria innocua, Listeria monocytogenes, Listeria seeligeri, Providencia rettgeri and Shigella flexneri. And yes, I know that Shigella is technically E. coli, so that’s really only nine unique species. 

  5. When I last tested Dorado’s speed on an NVIDIA A100 GPU, it basecalled at 6.76e+06 samples/sec. This time, it basecalled at 1.04e+07 samples/sec. Also see the Feb 2025 update in my Spring OnION post

  6. I manually corrected large indels because polishing tools are expected to fix small errors but not necessarily large structural errors. After corrections, all remaining errors in my draft assemblies were either substitutions or indels of ≤10 bp. 

  7. Errors were counted per base. For example, a 5-bp deletion counts as 5 errors. If instead each indel were counted as a single error regardless of size, the total count across all 10 draft assemblies would be 174 errors. 

  8. I initially tested Dorado-bac both with and without move-table data, but the results were identical. Later, I confirmed that the Dorado-bac model is identical to the Medaka-bac model, meaning it does not use move-table data. 

  9. See Chris Wright’s comment here: github.com/nanoporetech/medaka/issues/547 

  10. The Dorado v0.9.0 release notes (before the bacterial model was added) state that Dorado polish ‘is optimised for refining draft assemblies of human genomes.’ 

Ryan Wick
A first look at CycloneSEQ data (2024-12-17, https://rrwick.github.io/2024/12/17/cycloneseq)

MGI released their CycloneSEQ-WT02 nanopore sequencer this year, which bears a notable resemblance to ONT’s GridION.1 See this preprint for details on CycloneSEQ.

As a long-time user of ONT sequencing, I was curious how CycloneSEQ data compares. In this post, I use publicly available bacterial data to quantify CycloneSEQ’s read-level and consensus-level accuracy and compare it to ONT data.

Data

This preprint used both CycloneSEQ long reads and DNBSEQ short reads to assemble an ATCC type strain (Akkermansia) and 10 other bacterial genomes. It mostly focused on the accuracy of Unicycler hybrid assemblies, where accuracy is primarily determined by the short reads. CycloneSEQ-only assembly accuracy was only mentioned briefly, e.g. for their Akkermansia type strain assembly, where they reported 141 errors per 100 kbp, equivalent to 99.86% accuracy (Q28.5). The preprint also doesn’t focus on read-level accuracy – they show qscores in Figure 1 (average of Q14.4), but these seem to be from the FASTQ files, not the actual read accuracy as measured against a ground-truth reference.

The reads are available on the China National GeneBank DataBase, but downloading them was challenging2, so I only tested one of their genomes: E. coli AM114-O-1. The sequencing was very deep: over 500× short-read depth and 1200× long-read depth.

For an ONT comparison, I used an E. coli genome we recently sequenced with both ONT and Illumina. This wasn’t as deep as the CycloneSEQ data, but still more than enough: over 200× short-read depth and 200× long-read depth. I basecalled the ONT reads with Dorado v0.8.3 using the v5.0.0 models at each accuracy level: fast, hac and sup. Our E. coli is a different strain from the CycloneSEQ E. coli3, so the comparison isn’t perfect but good enough for a first impression.

Methods

I had four datasets to assemble: CycloneSEQ, ONT-fast, ONT-hac and ONT-sup, each with complementary short reads. For short-read QC, I ran fastp with default settings. For long-read QC, I only excluded reads <1 kbp in length.4 The ONT data was part of a barcoded run, so the demultiplexing process also served as a form of QC.5

I assembled each long-read set with Autocycler v0.1.16 to make a long-read-only assembly. I then polished with Polypolish v0.6.0 and Pypolca v0.3.17 to make a ground-truth assembly.

To quantify consensus-level accuracy, I aligned the long-read-only assembly to the ground-truth assembly, counting base differences.8 For read-level accuracy, I aligned the long reads to the ground-truth assembly and calculated accuracy for alignments ≥10 kbp.
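The read-level calculation can be sketched from a minimap2 PAF file, using BLAST identity (matching bases divided by alignment length, i.e. PAF columns 10 and 11) and keeping alignments of at least 10 kbp. This is an illustration, not my actual script:

```python
def read_identities(paf_path, min_len=10000):
    """BLAST identities (matches / alignment length) from a minimap2 PAF file,
    keeping only alignments of at least min_len bases."""
    identities = []
    with open(paf_path) as f:
        for line in f:
            parts = line.rstrip('\n').split('\t')
            matches, aln_len = int(parts[9]), int(parts[10])  # PAF columns 10 and 11
            if aln_len >= min_len:
                identities.append(matches / aln_len)
    return identities
```

Each identity can then be converted to a qscore as -10 × log10(1 − identity).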

Results

The accuracy metrics are summarised below, with read-level results shown as averages9:

Dataset      Consensus             Read – mean     Read – median   Read – mode
CycloneSEQ   1562 errors (Q35.1)   90.9% (Q10.7)   91.9% (Q10.9)   93.2% (Q11.1)
ONT-fast     2317 errors (Q33.6)   91.5% (Q11.0)   92.6% (Q11.3)   93.8% (Q12.1)
ONT-hac      13 errors (Q56.1)     96.7% (Q15.9)   97.8% (Q16.5)   98.5% (Q17.7)
ONT-sup      2 errors (Q64.3)      98.4% (Q20.5)   99.3% (Q21.3)   99.6% (Q23.1)

Below are histograms showing read-level accuracy distributions:

CycloneSEQ vs ONT read accuracies

Discussion

The preprint that provided the CycloneSEQ data reported read-level accuracy of Q14.4, but my analysis found a much lower value of Q11.1 (modal), probably because I used the ground-truth assembly (not FASTQ scores) to quantify accuracy. For consensus-level accuracy, the preprint reported Q28.5 for their Akkermansia genome, while my analysis of their E. coli genome did better at Q35.1. This could be because the E. coli was an ‘easier’ genome than the Akkermansia or because my assembly method (Autocycler) was more robust. Both the preprint’s results and my analysis found accuracy levels lower than CycloneSEQ’s advertised accuracy of 97% (Q15.2) for reads and 99.99% (Q40) for consensus.

Overall, CycloneSEQ data seems roughly comparable to ONT-fast data – CycloneSEQ was slightly better at consensus accuracy while ONT-fast was slightly better at read accuracy. The majority of CycloneSEQ’s consensus errors were homopolymer-length errors, often occurring in relatively short homopolymers (e.g. the ground-truth was G×5 but the assembly had G×4). This reminds me of ONT data from their previous pore (R9.4.1).

Basecalling greatly influences long-read accuracy, but CycloneSEQ’s basecalling process is unclear to me. Does CycloneSEQ offer different-sized basecalling models, similar to ONT’s fast/hac/sup? Are new models regularly released to allow for re-basecalling of existing data? Can users perform basecalling on a separate computer via the command line, or is it restricted to the workstation connected to the CycloneSEQ? I searched online for answers to these questions but found none. I only found mentions of CycloneMaster, a software tool that doesn’t appear to be freely available.

In conclusion, CycloneSEQ’s accuracy is not yet competitive with ONT. However, ONT’s accuracy improved dramatically over the course of its history, so I anticipate that CycloneSEQ may follow a similar trajectory. I will continue to watch this space!

Footnotes

  1. Perhaps too much resemblance – Oxford Nanopore has filed a lawsuit

  2. Transfer rates were extremely slow, and downloads frequently crashed. In order to get the three read files (two for short reads and one for long reads), I used curl to download the files in 100 kB pieces (small enough to have a reasonable chance of being successful) and then stitched them together afterward. It took a couple of days and was not fun! Maybe CNGBdb were having technical problems that week? Or perhaps the performance is better from within China? 

  3. Using the Achtman MLST scheme, our E. coli is ST1193 and the CycloneSEQ E. coli is ST117. 

  4. The CycloneSEQ data didn’t have any <1 kbp reads (presumably some length-based QC had already been applied) so my filter did nothing for that read set. 

  5. Very low-quality reads are more likely to fail demultiplexing (i.e. go into the unclassified bin), so barcode-demultiplexed data tends to be a bit better on average than whole-flowcell data. 

  6. Autocycler is the successor to Trycycler. At the time of writing, it’s not yet publicly released but will be very soon! 

  7. I used --careful with Pypolca. See our paper for lots of details on this polishing method: How low can you go? Short-read polishing of Oxford Nanopore bacterial genome assemblies

  8. I used this script for comparing two alternative versions of an assembly. It gives the total difference count and shows the regions of difference in a human-readable manner. 

  9. To get the mode, I rounded values to three significant figures and took the most common value. A subtle point: since qscore is a non-linear transformation of accuracy (-10 × log10(error rate)), the mean/mode of qscore-based accuracies is not equal to the mean/mode of percentage-based accuracies. So for this table, I calculated mean and mode separately for percentage-based and qscore-based accuracies. For anyone checking my maths, this is why mean/mode percentages and qscores seem to not match up – for example 90.9% equals Q10.4 not Q10.7. 
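The subtlety in footnote 9 is easy to demonstrate with a few hypothetical accuracy values (not numbers from the table above):

```python
import math

# Hypothetical per-read accuracies (not values from this analysis).
accuracies = [0.909, 0.919, 0.932]
qscores = [-10 * math.log10(1 - a) for a in accuracies]

mean_accuracy = sum(accuracies) / len(accuracies)
mean_qscore = sum(qscores) / len(qscores)

# qscore is a non-linear transform of accuracy, so the qscore of the mean
# accuracy is not the same number as the mean of the qscores.
q_of_mean_accuracy = -10 * math.log10(1 - mean_accuracy)
```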

Ryan Wick
Medaka v2: progress and potential pitfalls (2024-10-17, https://rrwick.github.io/2024/10/17/medaka-v2)

A new version of Medaka was recently released, featuring a model designed specifically for bacterial genomes. See this video and this video from ONT for an overview of the challenges posed by modified bases and some information on the new Medaka release.

As someone focused on bacterial genome assembly, this piqued my interest! Over the past few years, I’ve mostly moved away from using Medaka, as it didn’t usually improve assemblies from sup-basecalled reads.1 Could this new model change things? In this post, I’ll share my initial thoughts on the new Medaka model and highlight a key pitfall in genome polishing.

New Medaka model

In Medaka v2.0.0, you can specify the r1041_e82_400bps_bacterial_methylation model or simply use the --bacteria flag. This model is flexible with basecalling, supporting v4.2, v4.3 and v5.0 basecalls at both hac and sup speeds.2

To quickly test it, I used two genomes with higher-than-normal error rates in my ONT-only assembly: a Campylobacter lari genome with 18 errors and an Enterobacter cloacae genome with 11 errors. While these numbers may seem low, most of my ONT-only assemblies in 2024 have fewer than five errors. I suspect methylation motifs are behind these higher error rates.

Genome       Pre-Medaka errors   Medaka v1.12.1 errors   Medaka v2.0.0 errors
C. lari      18                  28                      2
E. cloacae   11                  30                      10

In both cases, the previous release of Medaka made the accuracy worse and the new release made it better! While this is based on just two genomes, it’s enough to reignite my interest in Medaka.

Missing plasmid pitfall

Encouraged by these results, I ran the new Medaka model on some Staphylococcus aureus assemblies from a current project. But to my surprise, Medaka often made the assemblies worse.

This genome has a 2.9 Mbp chromosome and two small plasmids: 4.4 kbp and 3.1 kbp. As I dug into the strange Medaka results, I found that the chromosome and 4.4 kbp plasmid share ~850 bp of sequence with ~80% identity. This normally wouldn’t be a problem, but some assemblies were missing that plasmid.3

Here’s an IGV screenshot of the problematic region in Medaka’s calls_to_draft.bam file from one such assembly:

IGV screenshot Medaka polishing

In the screenshot, the reads that align cleanly are from the chromosome, but others (at the bottom) are plasmid reads that erroneously aligned to the chromosome since the plasmid was missing. The plasmid reads outnumbered the correct chromosome reads, leading Medaka to introduce over 100 changes – all of them errors.

Lesson learned: Make sure your assembly is structurally sound before running Medaka. Missing plasmids can cause havoc during polishing.4 I also noticed that circularisation overlap (duplicated sequence at contig ends) sometimes acquired errors during Medaka polishing.5

To avoid this problem, I recommend trying Hybracter from George Bouras. It includes special logic to recover small plasmids which may be missing from the long-read assembly.6 Alternatively, you could filter out any reads that don’t have a high-identity full-length alignment before running Medaka.
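That filtering step could look something like this sketch, reading a minimap2 PAF of reads aligned to the draft assembly. The coverage and identity thresholds here are hypothetical, not tested recommendations:

```python
def well_aligned_reads(paf_path, min_identity=0.90, min_coverage=0.95):
    """Names of reads with at least one alignment covering >= min_coverage of
    the read at >= min_identity BLAST identity. Thresholds are hypothetical."""
    keep = set()
    with open(paf_path) as f:
        for line in f:
            p = line.rstrip('\n').split('\t')
            name, read_len = p[0], int(p[1])
            aln_span = int(p[3]) - int(p[2])   # query end - query start
            identity = int(p[9]) / int(p[10])  # matches / alignment length
            if aln_span / read_len >= min_coverage and identity >= min_identity:
                keep.add(name)
    return keep
```

The surviving read names could then be used to subset the FASTQ before running Medaka.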

Final thoughts

While I am impressed by the new version of Medaka, if you have short reads available, you can probably skip it and go straight to polishing with Polypolish and/or Pypolca. But if you don’t have short reads, the new Medaka model is worth a try.

After running Medaka, how can you assess whether the polished assembly is better or worse? There’s no perfect solution, but I find the mean length of predicted proteins to be a decent metric.7 It’s also good practice to manually inspect Medaka’s changes.8 Ideally, Medaka should make scattered, minimal changes. If you notice clusters of changes in a single region, that’s a red flag – not just for Medaka, but for any polisher.
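A mean-protein-length check is easy to script against a predicted-protein FASTA (e.g. Prodigal or Bakta output). This is an illustration of the idea, not the helper script linked in footnote 7:

```python
def mean_protein_length(faa_path):
    """Mean sequence length in an amino-acid FASTA (e.g. predicted proteins)."""
    lengths, current = [], 0
    with open(faa_path) as f:
        for line in f:
            if line.startswith('>'):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line.strip())
    if current:
        lengths.append(current)
    return sum(lengths) / len(lengths) if lengths else 0.0
```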

Footnotes

  1. See this post and this post for examples. 

  2. Other Medaka models are specific to a basecalling version and speed, e.g. r1041_e82_400bps_sup_v5.0.0. Presumably this reflects the data that model was trained on, so the new bacterial Medaka model was probably trained on a mix of different basecalling models. 

  3. A missing small plasmid is a common problem with long-read assemblies. Sometimes it’s due to library-prep bias (see this paper), but it can also be a fault of the assembler. 

  4. This missing plasmid issue isn’t specific to the new Medaka version – earlier versions would have encountered the same problem. 

  5. Canu contigs usually have circularisation overlap, but the contig headers specify how much should be trimmed, so I wrote this script to automate the trimming. I suspect Canu assemblies cleaned by this script will perform better in Medaka than unprocessed Canu assemblies. 

  6. Read more about Hybracter in this paper and this blog post

  7. Assembly errors can truncate coding sequences, so assemblies with more errors tend to have shorter predicted proteins. So when comparing alternative assemblies, the one with a larger mean protein length is likely better. This metric is more sensitive to indel errors than substitutions, because indels create frameshifts which often lead to premature stop codons. A helper script to measure this is available here

  8. I wrote this script to display pre- and post-polishing differences in a human-readable format. 

Ryan Wick
Spring OnION: a high-spec laptop for ONT sequencing (2024-08-16, https://rrwick.github.io/2024/08/16/springonion)

Last year, I shared this post about our ONT-sequencing desktop, OnION. It’s been working well, but Louise and the DAMG team often need to travel, so we’ve recently set up an ONT-sequencing laptop named Spring OnION. This post shares its details and performance to help others who might be considering a similar setup.

ONT sequencing laptop

Computer details

Laptops don’t offer as much opportunity for custom builds as desktops, so instead of buying Spring OnION from a boutique PC shop, we opted for a major manufacturer. We chose the Lenovo Legion Pro 7i 16”, purchased for 5070 AUD (on sale at the time).

Here are Spring OnION’s key specs:

  • Intel i9-14900HX CPU (32 threads)
  • 32 GB RAM
  • NVIDIA RTX 4090 Laptop GPU
  • 2TB SSD

While we often do bioinformatics on OnION, Spring OnION is primarily for sequencing, so we decided to leave Windows 11 as the OS rather than installing Linux. With WSL, command-line Linux is easy to use on Windows, so it still has plenty of bioinformatics capability. It’s been a while since I regularly used a Windows machine, and I was pleasantly surprised to find that the native Windows command-line (via the Windows Terminal) feels more Linux-like than it used to.1

Basecalling performance

To test Spring OnION’s basecalling performance, I grabbed 10 pod5 files from a recent run. These contained 40k reads, ~206 Mbp of sequence and had an N50 length of ~10.5 kbp. I basecalled them with Dorado v0.7.2 using the current hac and sup models on Spring OnION, OnION and the university’s HPC. Here are the times (min:sec) for basecalling to complete:

Model        Spring OnION (RTX 4090 Laptop)   OnION (RTX 4090)   HPC (A100)
[email protected]   2:20                             1:09               0:57
[email protected]   28:39                            11:18              6:24

As you can see, Spring OnION has about 40–50% of OnION’s basecalling performance. We’ve successfully run four MinIONs simultaneously on OnION with live sup basecalling, so I am confident that Spring OnION can handle two MinIONs at once.

That being said, at the time of writing, MinKNOW is still using the [email protected] basecalling model, not the current [email protected] model. The latter has shifted to using transformers, which has increased accuracy but runs considerably slower.2 So if and when MinKNOW starts using v5.0.0 models, this may reduce our number of simultaneous real-time sup-basecalling MinIONs.

Finally, despite having similar names, the RTX 4090 and the RTX 4090 Laptop are emphatically not the same GPU. The desktop version is a larger chip and clocked faster, giving it more than twice the performance of the laptop version. This isn’t surprising, as desktops have the room for more power and cooling, but the naming scheme is misleading. Don’t be fooled!

Basecalling performance – Feb 2025 update

Dorado v0.9.1 was released in Jan 2025, and its release notes include this comment:

This release of Dorado brings significant basecalling speed improvements for Nvidia GPUs with compute capabilities 8.6 (Ampere – e.g., RTX A6000), 8.7 (Ampere – e.g., Orin family), and 8.9 (Ada Lovelace).

At the suggestion of ONT’s Mike Vella, I re-ran the above benchmarks using Dorado v0.9.2. Since the same models were used, the basecalled reads are mostly unchanged.

Model        Spring OnION (RTX 4090 Laptop)   OnION (RTX 4090)   HPC (A100)
[email protected]   1:11                             0:32               0:31
[email protected]   11:50                            4:11               4:13

Compared to Dorado v0.7.2, the HPC (A100) was 50% faster for the sup model – a welcome improvement.3 More impressively, both OnION and Spring OnION (which use NVIDIA compute capability 8.9 GPUs) now deliver more than twice their previous sup-basecalling speed!

When first introduced in Dorado v0.7.0, the [email protected] model was noticeably slower than its [email protected] predecessor, but Dorado v0.9.1 compensates for that. These improvements are currently limited to command-line basecalling, since the update hasn’t yet been integrated into the Dorado basecall server used by MinKNOW. But this new enhanced speed will eventually make its way to MinKNOW, which will help keep real-time sup-basecalling for simultaneous MinION runs feasible on both OnION and Spring OnION.

Footnotes

  1. The commands I used for Dorado were the same for Linux and Windows. I also found that basic command-line navigation in Windows is more Linux-like than it used to be. For example, I can use ls instead of dir to view the contents of the current directory. It’s not quite the same as the Linux ls command, but close enough, and it’s nice to be able to use my muscle memory. 

  2. On the release notes page (login required), ONT said, ‘In this initial release, the v5 SUP models are expected to run more slowly than v4.3 models. We will add speed enhancements over the coming months.’ 

  3. This boost may have arrived with Dorado v0.8.0 (Sep 2024), as this version’s release notes say ‘improves the speed of v5 SUP basecalling models on A100/H100 GPUs’. Also, the A100 has compute capability 8.0, so it may not be affected by the Dorado v0.9.1 changes, but I didn’t test this. 

Ryan Wick, Louise Judd