Bacon of Hope - a technical write-up

“I think this points at the long-standing truth about demos that when you watch a demo you are not seeing the code at all. Experienced coders can guess how an effect works and they can tell that something is faked, but the code itself is not what is being demonstrated. This is why the code is less important than other things when it comes to demo quality.” (fizzer)

Is it that time of the year again? Episode III of the HAM trilogy has been released at GERP 2026, and it was about time. Here we will reveal some, but not all, of its secrets. I will mostly whine about the struggles and ask for sympathy along the long and winding road.

But first a short PSA to Noby: Q.e.d, puppet.

Before we go into what’s new with this trackmo, I suggest you read the write-up for HAM Eager and the one for HAMazing if you haven’t already. This allows me to keep it brief (haha) and focus on the interesting fried bacon bits.

Three weeks after having written the above lines: maybe you shouldn’t actually spend too much time on the older write-ups. With more than 120 KB of text, I think this one is enough to digest until you want to come back for more.

Download Bacon of Hope or watch it here (how often do you get a URL that starts with “hamy” for a HAM demo?), or the 60 Hz version.

Thank you all for the positive comments in person, via messages and email and even on the dysfunctional shit website.

This time, we will also have a guest appearance by Gigabates, who contributed the code for two major effects to the demo, among other valuable contributions.

Please forgive me if I, after all that time, forgot some of the details. I’ve seen and heard the demo so many times, I kind of got used to it and maybe can’t appreciate it myself completely anymore :D

Preamble

Preparation of the demo started right after the release of Is Real at GERP 2024 in January 2024, but was a bit delayed by other intros such as Undated and Framtidstro.

Again, as with the other demos, I wanted to do new effects and explore new things, and that meant starting almost from scratch again. New HAM effects for the world! As with the other demos, it was supposed to be a tech demo, showing advanced techniques rather than telling a (coherent) story.

It turned out a bit different, thanks to the music and graphics from mA2E, Optic and Steffest. All members of the team, Gigabates included, contributed so much to this production; without their help this might have been a very boring experience.

Special shout-out to Virgill for helping out on the AKlang samples that brought the UNZ tracks to life.

As hinted in the HAMazing write-up, I was a bit frustrated that we released a half-finished demo that could have been much better.

So we took our time finishing it, moving deadlines slightly (from GERP 2025, to 68k Inside 2025 to Christmas 2025 (and release at GERP 2026)) until it really was a polished product.

It was totally worth it.

So… what’s that thing about the bacon?

The first mention of the name within the team was already in July 2023. “Bacon of Hope” is (among other purposeful uses) – if you haven’t found out already – a quote by Annalena Baerbock, former foreign minister of Germany, who on 27.06.2023 unintentionally again showed her lack of English language skills as she actually wanted to say beacon instead of bacon. Aren’t you glad to have this warmongering anti-diplomat as president of the UN General Assembly right now?

But bacon makes everything better, and as this was supposed to be another tech demo about “The HAM Technology” (psenough), you have to agree it is a brilliant title for the demo.

Let’s reframe the PlatOS Framework

PlatOS, albeit with some annoying bugs, did a good job already for HAMazing, so there was not much change here, at least regarding the basics. I’ve added some support for Shrinkler very late on (we were very low on disk space), but that actually didn’t make it into the demo, especially because it is so slow to decrunch (I have some ideas I wanted to pursue, but it seems like zajc (hi!) has already been working on that in secret – you should be watching this guy more closely, he’s a genius!).

The trackmo startup now allocates memory in an OS semi-friendly way (compared to being too OS-friendly before, not quite using all available memory). It scans the memory lists and picks the first (highest priority) memory header for fast mem (unless it is marked as chip mem) and then scans for the chip memory one. It copies a tiny relocation routine into low chip mem and then executes it from there to relocate the framework into its final position in memory. It also allocates the whole memory presented in the memory header, which, unfortunately, can vary between Kickstart versions (e.g. the supervisor stack is sometimes included, sometimes it isn’t). But this way it should be using the best available fast memory and all the chip memory there is (so it also works on 1 MB chip machines).

New features are some run-time generated cubic easing and Bernstein polynomial tables (built without using multiplications) in the framework and some new ways to script things, inspired by Gigabates’ scripting macros. Some call it protothreads, some call it coroutines. It’s some basic jump-to-a-subroutine-via-pointer and (maybe) update the pointer thing.
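A classic way to build a cubic table with additions only is forward differencing. Whether the framework does it exactly like this is not stated above, so treat the following Python sketch as an illustration; the function name and the table size are made up.

```python
def cubic_table(n):
    """Return [0**3, 1**3, ..., (n-1)**3] using additions only."""
    table = []
    value = 0   # f(i) = i^3
    d1 = 1      # first difference:  f(i+1) - f(i) = 3i^2 + 3i + 1
    d2 = 6      # second difference: 6i + 6
    d3 = 6      # third difference of a cubic is constant
    for _ in range(n):
        table.append(value)
        value += d1
        d1 += d2
        d2 += d3
    return table
```

On a 68000 this boils down to three add instructions per entry, and the same scheme works for any polynomial once the initial differences are known.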

Nice & shiny bootblock and modern layout

The disk layout changed a bit. I mean, why not spend endless hours on optimizations that nobody will ever notice? For HAMazing, the bootblock reserved 512 bytes (with 206 bytes of code) and the uncompressed directory table started at block 1, so the bootblock knew where the framework (and script) was and loaded and decrunched it. Then the framework would reread the uncompressed directory table and load the first part.

This was a bit wasteful both on time and disk space. How can one get a super-fast boot time AND best compression possible?

I modified the bootloader, threw everything “unnecessary” out and, with some size optimizations on the ZX0 decruncher (partially with hints by Losso), got it down to 128 bytes (including the 12-byte header). This is the only uncompressed part of the disk; everything else is compressed.

It’s the bare minimum and the dynamic sizes are stored in the longword that normally contains the (useless) rootblock information:

_start:
        dc.b    'D','O','S',0 ; disk type
        dc.l    0           ; checksum
_sizes:
        dc.w    0,0

_entrypoint:
        move.l  a1,a3
        ; unpacked size.w, packed-load-size.w in _sizes
        movem.w _sizes(pc),a4/a5
        move.l  a5,d0
        add.l   a4,d0
        moveq.l #MEMF_CHIP,d1
        CALL    AllocMem
        move.l  d0,-(sp)        ; unpacked start, push to stack to use rts later
        adda.l  d0,a5           ; start of packed buffer
        movem.l a4/a5,IO_LENGTH(a3) ; and IO_DATA

        move.l  a3,a1
        CALL    DoIO

        move.l  (sp),a1
        lea     FW_PLATOS_OFFSET(a5),a0    ; start of compressed stream

        [...]   ; ZX0 decompressor follows

Looking at track 0, this leaves 5504 bytes of compressed space for the framework and the demo script. Why is this important? Well, AmigaOS had already loaded the complete track 0 into memory to be able to read the bootblock and taking data only from track 0 means that no additional disk access is required. If we want extremely fast booting, this is the key (or, well, one of the keys).
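The numbers can be checked with a few lines of Python. The sketch assumes the standard double-density Amiga format of 11 sectors of 512 bytes per track and 160 tracks per disk; the constant names are only for illustration.

```python
SECTOR_BYTES = 512
SECTORS_PER_TRACK = 11     # standard DD Amiga floppy (assumed)
TRACKS_PER_DISK = 160      # 80 cylinders, two sides (assumed)
BOOTLOADER_BYTES = 128     # size of the bootloader, from the text above

track_bytes = SECTOR_BYTES * SECTORS_PER_TRACK
framework_budget = track_bytes - BOOTLOADER_BYTES
disk_bytes = track_bytes * TRACKS_PER_DISK
```

The budget matches the size of slot 0 in the directory table further down, and the disk size matches the 901120 bytes reported by the image tool.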

It also made my life very miserable – I quite often had to size-optimize many parts of the framework, throw out unused code, shift things around. It got even nastier because the framework now also included the directory table appended to it, and any change in disk offsets and sizes for parts could and would make a difference in compression. You will be happy to learn that in the end the trackmo used every single byte of that space.

Just pour on some water: Instant booting

Like with State of the Art, where within a few seconds after putting in the disk, there is something happening on screen, I wanted to achieve a similar result.

So I created the “Part I” zoomer, which only took about 3 KB uncompressed and 1.2 KB on disk.

Timeline:

In total, that’s 580ms. Pretty nice.

You may wonder why it starts loading from track 3 and not 1. That’s okay.

Of course, some code will continue to read the disk in the background, load and decrunch the ship horn sample and play it once it’s ready.

It doesn’t stop there of course. The LSP music data is picked up next, together with the sample data for the first two audible samples (19 KB). It loads the squid part code with the minimal assets needed, and decompresses it.

At this point, the 3.5 seconds of the Part I zoomer are over and the Squid part can start, with music and all.

In the background, more assets are loaded. E.g. the background graphics – right now the scroller is still running across garbage memory, but you can’t see that because it’s still black (except for, maybe, on AGA). By the time it needs to fade it in, everything’s in its right place.

Now the rest of the samples (131 KB) are loaded, just before they become audible in the music. More graphic assets and the next part make their way from the disk into memory. After having read 222 KB of compressed data (25% of the disk), we can finally relax and turn off the drive – until we need more squiddly things.

Shape your disk image

The tool creating the disk image has been extended with caching, which decreased the turn-around times a lot. The framework also introduced a mode where you only specify a two-letter file name instead of a string of up to 16 characters. Yes, this was only done to reduce the size of the data needed.

So here’s the directory table:

Slot Disk Offset Name Disk size Hunk# Mem size ChipMem FastMem Attributes
0 128 OS 5504 0/0 9322 0 KB CHIP 9 KB FAST FAST DATA *DIRT ZX0 IN-PLACE
1 5632 CD 10991 0/0 10991 0 KB CHIP 10 KB FAST FAST DATA
2 16896 P1 1225 0/2 3204 0 KB CHIP 3 KB FAST FAST HUNK CODE ZX0 IN-PLACE
3 18122 P1 10 0/2 0 0 KB CHIP 3 KB FAST FAST HUNK RELOC
4 18132 P1 56 1/2 56 0 KB CHIP 3 KB FAST CHIP HUNK DATA
5 18188 HS 10406 0/0 14722 14 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE DELTA8
6 28594 L1 7757 0/0 29071 0 KB CHIP 28 KB FAST FAST DATA ZX0 IN-PLACE
7 36352 S1 8301 0/0 19352 18 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE DELTA8
8 44654 SQ 10943 0/2 50032 0 KB CHIP 48 KB FAST FAST HUNK CODE ZX0 IN-PLACE
9 55598 SQ 54 0/2 0 0 KB CHIP 48 KB FAST FAST HUNK RELOC
10 55652 SQ 2944 1/2 99072 96 KB CHIP 48 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
11 58596 SM 32964 0/0 91496 89 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE
12 91560 S2 89124 0/0 134492 131 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE DELTA8
13 180684 SO 20415 0/0 27616 26 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE
14 201100 EY 221 0/2 280 0 KB CHIP 0 KB FAST FAST HUNK CODE ZX0 IN-PLACE
15 201322 EY 8 0/2 0 0 KB CHIP 0 KB FAST FAST HUNK RELOC
16 201330 EY 323 1/2 348 0 KB CHIP 0 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
17 201654 BN 4096 0/0 4096 0 KB CHIP 4 KB FAST FAST DATA
18 205750 MP 9595 0/2 15428 0 KB CHIP 15 KB FAST FAST HUNK CODE ZX0 IN-PLACE
19 215346 MP 68 0/2 0 0 KB CHIP 15 KB FAST FAST HUNK RELOC
20 215414 MP 11975 1/2 25532 24 KB CHIP 15 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
21 227390 DB 12336 0/0 26400 25 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE
22 239726 SA 11174 0/0 36000 35 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE
23 250900 TT 15037 0/2 34728 0 KB CHIP 33 KB FAST FAST HUNK CODE ZX0 IN-PLACE
24 265938 TT 62 0/2 0 0 KB CHIP 33 KB FAST FAST HUNK RELOC
25 266000 TT 72385 1/2 121764 118 KB CHIP 33 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
26 338386 SS 10540 0/2 54120 0 KB CHIP 52 KB FAST FAST HUNK CODE ZX0 IN-PLACE
27 348926 SS 10 0/2 0 0 KB CHIP 52 KB FAST FAST HUNK RELOC
28 348936 SS 56 1/2 56 0 KB CHIP 52 KB FAST CHIP HUNK DATA
29 348992 L3 6067 0/0 28766 0 KB CHIP 28 KB FAST FAST DATA ZX0 IN-PLACE
30 355060 S3 92830 0/0 136494 133 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE DELTA8
31 447890 P2 1215 0/2 3660 0 KB CHIP 3 KB FAST FAST HUNK CODE ZX0 IN-PLACE
32 449106 P2 18 0/2 0 0 KB CHIP 3 KB FAST FAST HUNK RELOC
33 449124 P2 29339 1/2 36468 35 KB CHIP 3 KB FAST CHIP HUNK DATA ZX0 IN-PLACE DELTA8
34 478464 BC 4674 0/3 8628 0 KB CHIP 8 KB FAST FAST HUNK CODE ZX0 IN-PLACE
35 483138 BC 36 0/3 0 0 KB CHIP 8 KB FAST FAST HUNK RELOC
36 483174 BC 11577 1/3 25196 0 KB CHIP 33 KB FAST FAST HUNK DATA ZX0 IN-PLACE
37 494752 BC 93509 2/3 144196 140 KB CHIP 33 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
38 588262 BB 10112 0/3 22396 0 KB CHIP 21 KB FAST FAST HUNK CODE ZX0 IN-PLACE
39 598374 BB 24 0/3 0 0 KB CHIP 21 KB FAST FAST HUNK RELOC
40 598398 BB 64 1/3 64 0 KB CHIP 21 KB FAST CHIP HUNK DATA
41 598462 BB 59446 2/3 68560 0 KB CHIP 88 KB FAST FAST HUNK DATA ZX0 IN-PLACE
42 657908 ZF 21074 0/2 35212 0 KB CHIP 34 KB FAST FAST HUNK CODE ZX0 IN-PLACE
43 678982 ZF 24 0/2 0 0 KB CHIP 34 KB FAST FAST HUNK RELOC
44 679006 ZF 48 1/2 48 0 KB CHIP 34 KB FAST CHIP HUNK DATA
45 679054 TW 12666 0/3 19092 0 KB CHIP 18 KB FAST FAST HUNK CODE ZX0 IN-PLACE
46 691720 TW 80 0/3 0 0 KB CHIP 18 KB FAST FAST HUNK RELOC
47 691800 TW 12507 1/3 65536 0 KB CHIP 82 KB FAST FAST HUNK DATA ZX0 IN-PLACE DELTA8
48 704308 TW 3035 2/3 6592 6 KB CHIP 82 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
49 707344 TM 14476 0/2 52056 0 KB CHIP 50 KB FAST FAST HUNK CODE ZX0 IN-PLACE
50 721820 TM 182 0/2 0 0 KB CHIP 50 KB FAST FAST HUNK RELOC
51 722002 TM 4375 1/2 12220 11 KB CHIP 50 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
52 726378 HB 3266 0/2 6908 0 KB CHIP 6 KB FAST FAST HUNK CODE ZX0 IN-PLACE
53 729644 HB 44 0/2 0 0 KB CHIP 6 KB FAST FAST HUNK RELOC
54 729688 HB 71138 1/2 87736 85 KB CHIP 6 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
55 800826 BZ 55135 0/2 138240 0 KB CHIP 135 KB FAST FAST HUNK CODE ZX0 IN-PLACE
56 855962 BZ 40 0/2 0 0 KB CHIP 135 KB FAST FAST HUNK RELOC
57 856002 BZ 1392 1/2 2188 2 KB CHIP 135 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
58 857394 EP 12872 0/2 38784 0 KB CHIP 37 KB FAST FAST HUNK CODE ZX0 IN-PLACE
59 870266 EP 40 0/2 0 0 KB CHIP 37 KB FAST FAST HUNK RELOC
60 870306 EP 8272 1/2 32836 32 KB CHIP 37 KB FAST CHIP HUNK DATA ZX0 IN-PLACE
61 878578 EU 2265 0/0 10652 0 KB CHIP 10 KB FAST FAST DATA ZX0 IN-PLACE
62 880844 ES 16232 0/0 20398 0 KB CHIP 19 KB FAST FAST DATA ZX0 IN-PLACE
63 897076 WH 4043 0/0 4884 4 KB CHIP 0 KB FAST CHIP DATA ZX0 IN-PLACE DELTA8

Total size uncompressed: 1815988 (1773 KB). 31 files in image 901120 of 901120 used (0 (0 KB) free).

Although the disk contains many already hardware-compressed HAM graphics, we still reach a compression ratio of about 50%, which is much better than in the previous trackmos. Note that much of the data needed for the effects is run-time generated where possible, instead of using large pre-calculated tables and data blobs.

The harddisk version contains some higher quality samples and graphics with better dithering and less quantization that didn’t fit on the floppy disk, and is therefore about 46 KB bigger.

How the “Hope” part theme was developed

So the story about the whole Hope part goes as follows… This is going to be a bit chaotic, please stick with me.

On November 10th, 2024, I came up with this idea of using the HAM5 mode to create something that resembles horizontal Alcatraz/Kefrens bars (after having done the other “normal” bars a couple of days before). On the same day, I created a proof of concept with a random squid image from the internet.

And then in the evening I showed it to the team.

Squid proof of concept

It only had one moving tentacle arm, but they liked it anyway.

How does it work? This will be discussed further down.

Summer of Squid Atari version

The next day, Optic showed us his “Summer of Squid” image he had drawn for an abandoned Atari demo in his “tentacle phase”. Little did I know that he had more unused graphics.

A day later I presented a new prototype with five moving arms.

Squid with five arms

Gigabates then told me he had done this tentacle effect for day 8 of the Tiny Code Christmas on the TIC-80 (and later for D-Funk), and pondered whether a special HAM version of that would be feasible.

TIC-80 tentacles by Gigabates

This then turned into the Tentacle part later (see below).

Soon after Optic again came up with this 16 colour multi-screen image (320x600) from an abandoned project after he had seen the tentacle twirl:

Multiscreen Summer of Squid

This looked really wonderful and not something that could be wasted.

However, the squid in that image would not work with the effect. I tried to flip it for the right orientation, but its arms were not straight and were too thick.

So Optic did what he does best, he just drew another squid:

New squid, but facing the wrong direction

A week later I had added the water effect for the top screen.

Then mA2E chipped in (heh) and said he could donate the fantastic tune that I already had heard a lot of times in prior years as Pretracker version, before I left that project and created Frustro with the left-over code. But this was now years later, and mA2E feared that the project was going nowhere and the tune would never see the light of day.

I wasn’t too happy about that at first because that project didn’t deserve another hit. And taking away the music from a demo project is like ripping out its spine in Predator style. But then again, it’s his music and he can do what he wants to do with it. And I did understand that he wanted to get it out (but he probably didn’t anticipate that it would take one more year!).

The tune that mA2E renamed to “Bacon of Hope” (like the demo) set the pace to Part I.

We felt that the squid, the tentacles and the dodecahedron would fit the smooth and soothing track.

Part I Zoomer

Part 1 Hope

These are simple bitmap zoomers, zooming a 48x30 pixel bitmap by only setting the zoomed start and end points of each line, and then filling the one-dot image with the blitter.

It might not be the fastest method, but this time I am computing the scaled offsets and then generating some run-time speedcode for one line that turns these offsets into a series of test and bchg instructions, which are in turn just executed for each of the 30 lines.

So the generated code looks something like this, with the dx register and offsets being updated for every zoom factor:

        ; a0 chunky texture, a1 line buffer
.code
        lea     42(a0),a0   ; d2 high / low
        move.l  a0,a2
        tst.b   (a0)+

        REPT    47
        cmp.b   (a0)+,(a2)+ ; d3 high (8)
        beq.s   .cskip      ; d3 low (10)
        bchg    dx,$offset(a1) ; d4 high/low (16)
.cskip
        ENDR

        tst.b   (a2)        ; d5 high
        beq.s   .cskip2     ; d5 low
        bchg    dx,$offset(a1)
.cskip2

The vertical zoom is done with the copper of course.
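To make the idea behind the horizontal zoom concrete, here is a rough Python model of one line. It is a sketch under assumptions, not the demo's code: texels are reduced to 0/1, the helper names are invented, a running XOR stands in for the blitter fill, and the zoomed span is assumed to lie on screen (0 <= x0 < x1 <= width).

```python
def edge_offsets(n_texels, x0, x1):
    """Scaled x position of every texel edge (n_texels + 1 values)."""
    span = x1 - x0
    return [x0 + (i * span) // n_texels for i in range(n_texels + 1)]


def zoom_line(texels, offsets, width):
    """Scale one row of 0/1 texels to `width` pixels via edge toggling."""
    line = [0] * width
    prev = 0
    for i, texel in enumerate(texels):
        if texel != prev:            # colour change: toggle one dot (bchg)
            line[offsets[i]] ^= 1
        prev = texel
    if prev and offsets[-1] < width:  # close a run that is still open
        line[offsets[-1]] ^= 1
    state = 0
    filled = []
    for bit in line:                 # software stand-in for the blitter fill
        state ^= bit
        filled.append(state)
    return filled
```

Toggling instead of setting bits matters when zooming out: two edges that land on the same pixel cancel each other, which is exactly what a zero-width texel should do.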

Squid

Okay, now that a bit of the backstory has been told, let’s get back to the effects, one by one.

The screen is split into three sections (beta gfx):

The top part used twelve colours, which left some opportunities to change the gradients in the sky:

Top part

The middle section only used nine colours, but not at the same time, so one colour could be remapped with the copper further down. Eight colours was ideal for dual playfield use (see later):

Middle section

We would need to have some colour gradient that would suck the other red and green components out of the colours like it actually also happens in water. In the HAM section at the bottom of the sea, it could only use (almost) pure blue tones from $100 to $10f.

Bottom part

I manually separated background and squid, resulting in a 5-bitplane image (only 132 pixels high, though).

Then, finally there is the second squid to the right (early gfx).

New squid, but facing the wrong direction

As HAM only works from left to right, the squid had to be mirrored and the effect needed to start at a 16 pixels boundary.

Top section: Neruda poem


I put the poem “Ode to Hope” by Pablo Neruda into the beginning of the demo to set the stage. My connection to Neruda is that I had the chance to visit his home in Valparaiso about 15 years ago. Neruda was poisoned and killed by the Pinochet regime.

It was a lucky coincidence that his Ode to Hope was a perfect match to the graphics, the feeling and everything. There is a hint of despair with the empty boat, but we still have hope that whoever was inside the boat is going to be found alive – or maybe there was no-one in the boat to begin with.

The colours of the sunrise, the well animated silhouette of the sole bird, the calm and solemn water. Hope and fear, the gifts and the danger, life and death, it’s this opposition you can feel when you read his poem. But hope will prevail!

To honour Neruda, his poem is also available in its original Spanish language by holding down the right mouse button just before the part starts.

The poem is printed (with little random vertical offsets and proportional font) into a tall bitmap to avoid spending any time on rendering, and because we have the memory to spare (we will recycle the memory for different purposes later).

As said before, the top portion of the screen used to have 16 colours which allows us to use an extra bitplane for the text. We can even waste the sixth bitplane for a nice shadow using extra half-bright mode (which also needs a copy of the poem slightly offset to the right by two pixels because we cannot use the hardware shifting with bplcon1 here).

We need to update the bitplane pointers for bitplane 5 and 6 every line for the wobble effect to work (we can’t use even/odd modulos here).

The tricky part is how to avoid updating 16 colours for the gradient of the text, because we’re just moving a bitplane across four others and normally that would just push up the colours from the 0 to 15 range to the 16 to 31 range. Not only would this take up more copper time than we have in the horizontal blank period (so the colour changes would “bleed” into the left part of the image), it would also make it impossible to use the right colours for the boat sprite.

Fortunately, some golden coders have found an undocumented feature called SWIV mode, after the game SWIV where this trick was used in the top part of the score panel or so. If you set the display priority control register (bplcon2) to the illegal value $003f, then every pixel where bitplane 5 is set will end up at index 16 on OCS, regardless of what’s in bitplanes 1 to 4 (and 6, see below). I had used this feature in prior intros before (Waffles…), but here it came in extra handy.

The SWIV mode meant that only one colour needed to be updated per line and colours 17-31 were available for the boat. This unfortunately only works on OCS, so on AGA there is a different code path where you won’t get the gradient, but rather inverted colours and the sprite colours are set via the sprite palette banking feature.
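As a tiny Python model of the colour lookup described above (it only mirrors the behaviour as stated in the text, not the actual Denise logic, and the function name is invented):

```python
def colour_index(planes, swiv=False):
    """planes: five bits for one pixel, bitplane 1 first."""
    index = sum(bit << i for i, bit in enumerate(planes))
    if swiv and planes[4]:
        return 16      # bitplane 5 wins, planes 1 to 4 are ignored
    return index
```

With the trick enabled, the text layer therefore only ever touches colour 16, which is why a single copper colour write per line is enough for the gradient.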

But what about bitplane 6 with the shadow? I already had prepared to cut out that bitplane so that it does not overlap with bitplane 5 and the shadow would turn into a half-bright version of the scroller gradient. However, it seems the Amiga takes it very seriously with that bitplane 5 priority, so with both bitplane 5 and 6 set, it still just maps to index 16 without half-brightness! So I could remove that code again and learned something more.

The boat with and without shadow

The boat is a bit wiggly to give it a whiff of rotation rather than keeping it completely rigid. The bird is just blitted and restored at the right off-screen time, because we can’t use double buffering here.

The water effect is just line selection via modulo with some sine functions. But it does look excellent on the graphics that Optic drew. The data for the waves are precalculated into a table to avoid too much CPU time being used, because it still needs to be applied as we scroll down later and want to have some more effects.
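Line selection via modulo can be sketched like this in Python. The row width, wave parameters and function names are assumptions for illustration; the one hardware fact used is that after each displayed line the bitplane pointer has advanced by one row and then gets the modulo added.

```python
import math

BYTES_PER_ROW = 40   # 320 pixels / 8, one non-interleaved bitplane (assumed)


def wave_rows(height, amplitude, wavelength, phase, src_height):
    """Source row to show on each display line, clamped to the image."""
    rows = []
    for y in range(height):
        r = y + round(amplitude * math.sin(2 * math.pi * y / wavelength + phase))
        rows.append(min(max(r, 0), src_height - 1))
    return rows


def modulos(rows):
    """Modulo to set on each line so that the next line fetches rows[y + 1]."""
    return [(b - a - 1) * BYTES_PER_ROW for a, b in zip(rows, rows[1:])]
```

A modulo of 0 shows the next row, -40 repeats the current row and +40 skips one, so a precalculated table of these values per frame is all the copper needs.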

There was a blåhaj planned to swim across the bottom part of the screen, but in the end, it didn’t make it in.

No Blåhaj, no Stingray

The bottom part scrolling in then goes from 16 colours to eight colours for the middle part and then later to the HAM5 part, so the memory layout for the non-interleaved bitmaps needs to be in slightly non-standard order to be able to simply adjust the number of bitplanes for the sections.

Everything will be fulfilled!

There is temporal dithering on the underwater colours towards the seabed to compensate for the limited number of gradient steps possible with only 4 bits per gun. However, this will flicker badly on anything not running at the right frame rate, like a 60 Hz TFT. It might be a lesson learned to not use such dithering the next time.
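The principle of temporal dithering fits in a few lines of Python. The exact pattern used in the demo is not described above, so the frame/line checkerboard here is an assumption, as are the names.

```python
def dithered_blue(level_x2, frame, line):
    """level_x2: wanted blue level in half steps (0..30). Returns 0..15."""
    base, half = divmod(level_x2, 2)
    if half and (frame + line) & 1:
        base += 1      # show the brighter neighbour on every other frame
    return base
```

At 50 Hz the eye averages the two neighbouring values into an in-between shade; on a display that drops or repeats frames, the alternation becomes visible as flicker.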

Mid-section: Jellyfish

Middle section

Once we start scrolling down, we switch to a double-buffered version of the middle section. This allows us to paint lots of three-coloured, shadebob’ed, transparent (with saturation) jellyfish that Optic drew. The jellyfish come in two sizes, plus a second, runtime-generated version used as a far-distance silhouette that darkens the water instead of lighting it up.

So this is no normal bob rendering! Notice how overlapping jellyfish will have true transparency and also against the background.
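What “shadebob with saturation” means can be shown on a single row of pixel values in Python. This is a chunky model with invented names; the real thing works on bitplanes, and the maximum of 7 assumes the eight-colour middle section.

```python
def add_bob(background, bob, x0, max_value=7, sign=1):
    """Add (sign=1) or subtract (sign=-1) bob values, clamped to 0..max_value."""
    out = list(background)
    for i, value in enumerate(bob):
        x = x0 + i
        if 0 <= x < len(out):
            out[x] = min(max(out[x] + sign * value, 0), max_value)
    return out
```

Without the clamp, overlapping jellyfish would wrap around to dark colours instead of saturating at the brightest one; the subtracting variant corresponds to the darkening silhouettes.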

Jellyfish frames

Once the final vertical position has been reached, the screen mode switches to dual playfield to allow the logo to be painted.

What was the biggest obstacle with all of this? Keeping the temporal dither flickering at 50 Hz while the new copperlists were prepared, so that nothing would be jerky and no frame would be dropped at any point. If you’re looking at the WinUAE capture with DMA view enabled, take note of how often the copperlists change significantly during that time.

Mid-section: Logo painter

The painting of the eight-colour logo by Steffest is a bit more complicated than it might look at first sight. It’s being faded in using ordered dither on a ball shape. So it is a shade-bob routine similar to what I used to paint in the Sexy Woman in Ham Eager, but with added dithering.

To be fair, Steffest’s logo looked a lot better when it wasn’t downcoloured to just seven colours plus transparency. But it is really hard to get a good-looking logo with so many little details – it is still awesome!

Steffest's logo

With all the jellyfish being rendered in the background and the six bitplanes of dual playfield being enabled there is not much raster time left. There is a gradient in the logo, but it only changes one colour per rasterline – it still makes it a lot more colourful!

I invested some time to manually tweak the logo gradients to look as nice as possible as you can see from this debug version (that’s about 100 different colours in this eight colour logo):

Debugging image to make the gradients more visible

The water wiggle (bplcon1) needs to be updated, too. Note that for the later appearing credits text, I just left the last bplcon1 value hanging on. That movement of the credits looked good enough, so I kept it.

Logo almost ready with gradient

Mid-section: Seaweed

Many years ago, I was thinking of a sprite effect, where horizontal moving bands would overlap and the overlap would create some fancy transparency effect. I never got the chance to put it to use until now.

That's not how it looks in the demo

Eight sprites are moving in front and behind the second playfield. But how is the transparent glitter effect achieved? Simple! Just set the “attached” bit, so the sprites are actually 15 colour sprites, but unlike normally being placed directly over each other, they have different positions. That makes them three colour sprites with two sets of palettes (17, 18, 19 and 20, 24, 28), except for when they “accidentally” overlap, where the other colours come into play. The glitter effect only occurs between pairs of sprites though, so you won’t see it happening between e.g. sprite 0 and 2-7.
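The palette split follows directly from how an attached sprite pair forms its 4-bit pixel value: the even sprite supplies the low two bits, the odd sprite the high two bits. A small Python model (the function name is invented):

```python
def attached_colour(even_px, odd_px):
    """even_px, odd_px: 2-bit pixel values (0..3) of the two sprites.

    Returns the colour register number, or None for transparent.
    """
    value = (odd_px << 2) | even_px
    return None if value == 0 else 16 + value
```

Where only one sprite of the pair has a pixel, you get 17, 18, 19 or 20, 24, 28; any overlap lands on one of the nine remaining registers, and that is where the glitter colours go.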

Observing the above screenshot closely, you might notice that the seaweed in the demo looks different. That’s because it was taken before Steffest replaced my coder’s placeholder graphics last minute with some nice varied graphics (of course this had exceeded the free disk space, but we somehow fixed that in time):

Last minute new seaweed

It might not be totally obvious that the seaweed is made of sprites. In most demos, sprites are just used as 16 pixel wide stripes going downwards. You can, however, change the horizontal position every line. It’s easy to update the sprxpos control word, because it will not disarm the sprite, but this only gives you control over half the horizontal resolution (like with the laser in Ham Eager in the desert scene). It’s a bit more tricky to write the X position at full one-pixel resolution by also writing the sprxctl word. Timing is everything, because writing sprxctl also disables the horizontal comparator for that line (so the sprite would not show for the rest of the rasterline).
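The reason sprxpos alone only gives half resolution is visible in how the position is split across the two control words. A Python sketch of the packing (the function name is made up, the bit layout is the standard OCS one):

```python
def sprite_words(hstart, vstart, vstop, attach=False):
    """Pack a sprite position into the SPRxPOS and SPRxCTL words."""
    pos = ((vstart & 0xFF) << 8) | ((hstart >> 1) & 0xFF)
    ctl = (((vstop & 0xFF) << 8)
           | (0x80 if attach else 0)      # attach bit
           | (((vstart >> 8) & 1) << 2)   # VSTART bit 8
           | (((vstop >> 8) & 1) << 1)    # VSTOP bit 8
           | (hstart & 1))                # HSTART bit 0
    return pos, ctl
```

The lowest bit of the horizontal position lives in the control word, so two neighbouring X positions share the same sprxpos value and can only be told apart by also rewriting sprxctl.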

Also, updating such a nice dual sine wobble for eight sprites with a height of 224 pixels is not something that you would manage to do with the CPU alone. This would require some heavy shifting of values until they end up in the right words. So of course, we will do it with the blitter, which is perfect for this kind of bit fiddling.

Even with these optimizations, there wasn’t much DMA time left: A huge copperlist, eight sprites, some bobs still moving around, the blitter updating the copperlist to get the sprites to their right position, dual playfield with five or six bitplanes enabled and the CPU running hot with movem/add.l sequences to calculate the seaweed movements – that really almost saturates the frame.

Mid-section: Credits

Steffest drew these fine four colour rope graphics.

Steffest drew something nice!

Please don’t ask about “Graphx” ;) You can also easily see, what’s most important, just by the size of it :D

Look at this version with all the temporary graphics still in place to grasp how much it changed over time:

Early version

The fading in or cross-fading of the credits is not a normal “replace pixels through a changing mask” function. Instead, the pixels are faded in with that noise mask until they reach the brightness of the target pixel.

Similarly, when cross-fading to the next text, I had planned for each old pixel to fade out and the new one to fade in. But instead, it keeps fading brighter until it wraps around from 3 to 0, which actually looked quite cool (like glowing hot and disappearing).
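Reconstructed from the description above (so the details are guesses, and the names are invented), one step of that fade could look like this in Python for two-bit pixels:

```python
def fade_step(current, target, mask):
    """Brighten every masked pixel by one step until it matches its target."""
    return [(c + 1) & 3 if m and c != t else c
            for c, t, m in zip(current, target, mask)]
```

When fading in from black, a pixel simply climbs to its target value. When the target is darker than the current value, the pixel keeps climbing, wraps from 3 to 0 and then climbs to the new target, which gives the glowing-hot look.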

Old demoscener rule: Don’t change it if it looks good! It also wasn’t as computationally heavy as the initial plan.

Loading the graphics into the right spots had to be done in the background while the effect is running, because the frame time was so tight. It’s really nice to have multitasking.

Notice that there are over 250 different colours in this scene, thanks to all the gradients.

During the credits the last few jellyfish exit the screen on the top.

Bottom section: First squid

With the seaweed and the heavy copper lifting gone, we can draw a couple of transparent shade bobs for some bubbles (coder gfx), while shrinking the view down to a 16:9 letterbox display and scrolling down (and loading the final squid graphics from disk).

The 320x180 format was chosen early on for many of the HAM effects in all three HAM demos because of the highly demanding DMA bandwidth otherwise being needed for just displaying a full 256 pixel high HAM screen. In hindsight, the last effect in this part could have worked also for the full height, but who wants to go back and draw some more background graphics?

The final graphics for the bottom part

The first squid is then scrolled into view. Remember that it is using HAM5 mode (at this point in the demo, it could also have just used a normal 32 colour mode with the upper 16 colours set to a blue gradient), so there is a bit of eight colour graphics still present.

While we are scrolling, a background task calculates the movement tables for all frames of each squid tentacle.

Once the final parking position has been reached, we can on-the-fly remap the normal eight colour graphics to match the HAM5 colours to create a full height HAM5 image.

We will also need to reorganise the memory massively now. The next effect uses a 640x180 screen with double buffering and interleaved bitplane layout, that’s two times 70 KB of contiguous chip mem. Nothing of that is visible to the viewer except for a short blip in the DMA view (or Coppenheimer) where the blitter pushes around all the memory and goes from one copperlist to the next.

Only the left squid part

The flashing of the squid colours should symbolise some sort of communication. Squids use these colour bursts to talk to each other, which is pretty cool! The colours are generated in realtime using a 2D wave propagation / smoothing effect that interpolates between three different palettes. It also has the effect of adding some wilder colours in the bars on the right-hand screen.

Squidcatraz

Now we finally reach the actual HAM effect, after so much ado beforehand. Remember that it took less than a day to get this effect working?

While we scroll to the right, we will store some of the original graphics on the right quarter of the image into one of the buffers on the left quarter of the no longer visible part of the image to be able to restore the background behind the moving tentacles. This gives some extra virtual memory recycling points.

So let’s finally reveal how the effect works!

Only five bitplanes are active in HAM mode (so let’s call it HAM5), which means that you have 32 choices for each pixel instead of 64 for normal HAM6. The first 16 colours are index colours and the last 16 can only modify the blue component of the previous pixel (because the ones for red and green are in the next 32 colours that we can’t select with only five bitplanes).

But if the whole screen has a blue shaded background (16-31) and only modifies blue, this means that prior values of the red and green components will continue to bleed to the right. This is the same side effect that you use for HAM filling: Just bleed on purpose!
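To make the bleeding concrete, here is a minimal Python sketch of how one scanline decodes under the rules described above. The function name `decode_ham5_line` and the sample palette are made up for illustration, and starting each line from colour 0 is an assumption of this sketch; only the "index colour or modify blue" behaviour comes from the text.

```python
def decode_ham5_line(pixels, palette):
    """Decode one scanline of 5-bit HAM values into (r, g, b) tuples.

    Values 0-15 select a palette entry. Values 16-31 keep red and green
    of the previous pixel and replace only blue with the low four bits.
    """
    r, g, b = palette[0]  # assumed: each line starts from colour 0
    out = []
    for value in pixels:
        if value < 16:
            r, g, b = palette[value]  # index colour: load all components
        else:
            b = value & 0x0F          # modify blue only, red/green bleed on
        out.append((r, g, b))
    return out


# Hypothetical palette: entry 1 is an orange "arm" colour, the rest is black.
palette = [(0, 0, 0)] * 16
palette[1] = (15, 8, 0)

# Blue background, one arm pixel, then blue background again.
line = decode_ham5_line([20, 1, 20, 21, 22], palette)
```

After the single index pixel, red and green stay at 15 and 8 for the rest of the line while only blue follows the background values, which is exactly the bleed to the right.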

HAM5 Kefraz squid

So if we draw our arms with index colours, then at the edges where the bobs have a vertical offset, the prior pixels will continue until the right edge, which is basically what Alcatraz/Kefrens-bars are all about. The squid image itself needs one bluish index colour on the right edge to stop it from bleeding into the blue by default.

We’re using copper driven blits here. There are 352 blits for the arms, each of them blitting a two pixel wide part of it into five bitplanes. The height and the lengths of each arm are slightly variable.

To make the blits as efficient as possible and to minimise the register updates required, the texture is written into memory in such an order that the source pointer only needs to be loaded once and the address keeps increasing over all the blits without being touched again. I even wrote a program to rearrange the arms painted by Optic into the right order (the stripe of data you see at the left-hand side).

Source image for the arms

Because Optic somehow lost interest in finalizing these tentacle graphics, I (had to) finish them (but it does look better with more gradient!).

The update of the movement is done via generated speedcode, that just bulk loads the data from the big table in fast ram and writes the new values into the given offsets into the copperlist.

There was not much raster time left, so only the fifth bitplane is cleared every frame and the other four bitplanes of the background are restored using gradual refresh, going from right to left in quick succession. This leaves some blue ghost-arms for a couple of frames, which is not so bad.

I hope you noticed the sunk DsR ship in the background. Amazing work by Optic again!

HAM5 is not available on AGA. There, trying to enable HAM with anything other than six or eight bitplanes active will just do nothing. Hence, there is an AGA code path that makes it HAM6 with an empty sixth bitplane. That needs some more DMA time though, but hardware-scrolling makes it tricky to use higher fetch modes… It’s a mess, but I got it working on AGA after all.

Wave wipe

At the final horizontal position, the hardware scrolling is disabled so we can display eight sprites instead of only seven (ECS) or only six (OCS).

We will use this to create the (surreal) wave splashing over the effect. These sprites only use three colours, where one of the colours is black, so we can completely fill the area covered by the sprites.

As none of the graphic artists wanted to replace my coder sprites with something more beautiful, you have to live with that.

Wave wipe

As we only have eight sprites with 16 pixel width each, the maximum area we can cover is 128 pixels. This means we need to cut off the left view area once the area is fully black.

There is a slight colour gradient in the wave. Why? Because we can.

Lessons learned

Don’t be overly conservative regarding memory consumption and try to save and recycle memory unless you absolutely must. Don’t be wasteful either. There must be something in between where you don’t go nuts either way.

Turning a technical effect into a fully fledged story-like part with all the polish takes a lot of time and effort. This part with over 7500 lines of code is by far the largest (and longest with over two minutes) in the whole demo.

Meister Polyeder

Alright. This part has got nothing to do with this fella called Meister Eder:

Meister Eder

Except for parts of the name. In German, a polyhedron is a Polyeder, which is funny because both words are of Greek origin, the former meaning “many bases” while the latter is about “many vertices/corners”.

In Frustro I had experimented for the first time with anti-aliased line drawing using the blitter. The first scene in that intro was not using subpixel accuracy there, though.

Frustro

It was also time to try some new geometric shapes and the most logical one would be to go from the typical demoscene cube to a dodecahedron, because it shares eight of its corners with a cube and is one of the five Platonic solids.

The math

So a dodecahedron consists of 12 faces, 20 vertices and 30 edges. If we want to achieve super smooth movement and high sub-pixel accuracy, it’s important not to cut corners (heh), but to keep the rotation and projection math at elite™ quality.

But normal matrix multiplication will take twelve multiplications to generate the matrix, nine multiplications per vertex for a rotation around three axes and three multiplications to obtain the normal vector z coordinate for each face, totalling 228 multiplications (12 + 20 × 9 + 12 × 3).

That’s not gonna cut it. Let’s look at it a bit more in detail:

So many numbers

We got the orange vertices (the cube ones), which can be represented by a linear combination of three normal vectors in x, y and z direction ([1, 0, 0], [0, 1, 0], [0, 0, 1]). Because the normal vectors contain zeros in two of their 3D coordinates, we will only need 14 multiplications to rotate all three normal vectors around three axes.

phi (φ) is the golden ratio and is ~1.618034. This golden ratio number has many nice properties, but one of them is that 1 / φ = φ - 1 (another one is φ^2 = φ + 1). This means the green, blue and pink coordinates can be calculated by scaling the rotated normals (nine multiplications in total) by the golden ratio (or 1 / φ) and then either adding or subtracting the normal.

The twelve face normal z coordinates turn out to be at permutations of [φ, 1, 0], so we get these almost for free, too.

Using all the symmetry features of the dodecahedron, we are down to 23 multiplications for the whole object. And lots of negs and adds.
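The construction can be sketched in a few lines of Python. This is not the demo's 68k code: the helper names are invented, the rotation matrix is plain textbook floating point (so the 14-multiplication trick for the axes is not reproduced here), and only the "scale by φ, then add or subtract" idea is taken from the text.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618034


def rotation_matrix(ax, ay, az):
    """Rows are the rotated x, y and z unit vectors (textbook math)."""
    cx, sx = math.cos(ax), math.sin(ax)
    cy, sy = math.cos(ay), math.sin(ay)
    cz, sz = math.cos(az), math.sin(az)
    return [
        (cy * cz, cy * sz, -sy),
        (sx * sy * cz - cx * sz, sx * sy * sz + cx * cz, sx * cy),
        (cx * sy * cz + sx * sz, cx * sy * sz - sx * cz, cx * cy),
    ]


def add(a, b):
    return tuple(p + q for p, q in zip(a, b))


def sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def signed(v, sign):
    """Return v or -v: a negation, not a multiplication."""
    return v if sign > 0 else tuple(-c for c in v)


def dodecahedron(ex, ey, ez):
    """Build all 20 vertices from three rotated unit vectors."""
    axes = (ex, ey, ez)
    # Nine multiplications: every component of every axis times phi.
    big = [tuple(PHI * c for c in e) for e in axes]
    # 1 / phi = phi - 1, so the 1/phi-scaled axes only need subtractions.
    small = [sub(b, e) for b, e in zip(big, axes)]

    verts = []
    # Eight cube corners: (+-1, +-1, +-1).
    for s0 in (1, -1):
        for s1 in (1, -1):
            for s2 in (1, -1):
                verts.append(add(add(signed(ex, s0), signed(ey, s1)),
                                 signed(ez, s2)))
    # Twelve more: cyclic permutations of (0, +-1/phi, +-phi).
    for i in range(3):
        s, b = small[(i + 1) % 3], big[(i + 2) % 3]
        for ss in (1, -1):
            for sb in (1, -1):
                verts.append(add(signed(s, ss), signed(b, sb)))
    return verts


verts = dodecahedron(*rotation_matrix(0.3, 1.1, -0.7))
```

Everything after the nine scalings is additions, subtractions and negations, which matches the "lots of negs and adds" remark.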

Unlike other parts, where a standard sin/cos table with 1024 entries is used, we are using one with twice as many entries instead, and we will interleave sin and cos so we can grab both values with one longword read. There is no scaling involved for rotating the normals, we will take the -0x4000 to +0x4000 values verbatim.
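As a rough illustration of the table layout, here is a Python sketch that packs sin and cos into one 32 bit value per entry. Which of the two sits in the high word is an assumption of this sketch (the text only says they are interleaved), and the function names are invented.

```python
import math

ENTRIES = 2048      # twice the usual 1024 entries
AMPLITUDE = 0x4000  # values run from -0x4000 to +0x4000


def build_sincos_table():
    """One longword per angle: sin in the high word, cos in the low word."""
    table = []
    for i in range(ENTRIES):
        angle = 2 * math.pi * i / ENTRIES
        s = round(math.sin(angle) * AMPLITUDE) & 0xFFFF
        c = round(math.cos(angle) * AMPLITUDE) & 0xFFFF
        table.append((s << 16) | c)
    return table


def to_signed16(word):
    """Interpret a 16 bit value as two's complement."""
    return word - 0x10000 if word & 0x8000 else word


def sincos(table, index):
    """Fetch both values with a single table read."""
    longword = table[index % ENTRIES]
    return to_signed16(longword >> 16), to_signed16(longword & 0xFFFF)


table = build_sincos_table()
```

On the 68000 the equivalent of `sincos` would be a single `move.l` from the table followed by a `swap` to reach the other half.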

For the perspective projection, we will still use standard division to get the best accuracy (instead of a 1 / z multiplication). By selecting the right zoom factor we get results that fill the numeric range of a 16 bit word nicely.

The aftermath

With four fractional bits of our 12.4 fixed point coordinates, we will get really silky smooth subpixel movement. Now we only need to draw anti-aliased lines. For that purpose, I no longer relied on the implementation by Kalms, but wrote everything from scratch.

I prototyped my line drawing code in Processing, trying a long time to get all the different cases right. Or right enough. It’s not good enough for filling, but line drawing is fine. The code is a lot more compact than the one by Kalms. However, there are still two multiplications required for every line (but only two in all cases, unlike in Kalms’ code) and this adds up pretty quickly.

For getting a good anti-aliased effect, we need to draw each line of the polyhedron four times. Unfortunately, as far as I know you cannot draw blitter lines in a way that it is additive to a background (like shade bobs) – simply because the drawing operation happens always in-place.

First section

In Frustro, the lines with slight subpixel offset are therefore drawn into two separate bitplanes. With Meister Polyeder, four lines are drawn into four different bitplanes for the front facing edges and two lines are erased from the background for the backfacing ones.

Wait, background? So we have four bitplanes, which means we get 16 colours. However, unfortunately, we can’t use an arbitrary palette for it. If we want to get lines that look anti-aliased, the palette needs to be additive corresponding to the number of bits set – this way, overlapping lines will be brighter. We are allowed to alter the hue for each bitplane, but the brightness should be roughly the same.

If all four subpixel lines overlap, then colour 15 (white) will be visible. Here’s an example of the palette used for the effect:

Palette for Meister Polyeder

What I tried to show here is the original palette, and below how an overlap between the four separate lines will turn out, and some boxes to represent the expected brightness of the final line.
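The "brightness follows the number of bits set" rule is easy to express in code. The following Python sketch builds a grey-only version of such a palette with 4 bit components; the real palette varies the hue per bitplane, which is left out here, and the function name is made up.

```python
def additive_grey_palette(planes=4, max_level=15):
    """Brightness of each colour index grows with its number of set bits."""
    palette = []
    for index in range(1 << planes):
        bits_set = bin(index).count("1")
        # Integer rounding of max_level * bits_set / planes.
        level = (max_level * bits_set + planes // 2) // planes
        palette.append((level, level, level))
    return palette


palette = additive_grey_palette()
```

Colours 1, 2, 4 and 8 (one line each) come out equally bright, any overlap of two lines is brighter, and colour 15 is full white.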

I gave Optic the specification regarding the palette, and he drew something really nice, but completely against the limitations (you might… uhm… be able to find it somewhere in the demo):

Winner, winner, chicken dinner!

So in the end the demo got some coder graphics, which were created with my limited abilities and basically only 7 colours. This is also important because the lines are or'ed onto the background and if there’s too much background graphics, the anti-aliasing will be lost.

Coder graphics

Much (1/3rd) of the frame time is spent on restoring the background (four bitplanes), but that’s where the math (3D and line drawing) can run in parallel, so at least that’s effectively “for free”.

The rest of the time the blitter draws all those lines, while the CPU calculates and updates the next copperlist for line drawing.

Raster time

There is a minor, easy to find “happy accident” mode where it shows how it once looked when I got something wrong.

Dodecahedron fo(u)rce

To get a bit of variation to the effect, it is going into split screen mode (without the background) and adds some fancy interpolated lines in the inside of the object.

Quaddddro

Of course, we’re not drawing the 128x128 pixel object four times. Instead, we use a corkscrew buffer, where we keep clearing and painting only the top right object; the left object and the bottom objects are just from prior frames (the bottom row is duplicated with the copper, showing the bitmap memory shifted by one position).

The tricky part is timing the coppersplit so that it doesn’t interfere with the copper driving the blitter. It’s a bit easier here than in the next section, because there’s only one split.

The effect is actually CPU-bound because much time is spent on calculating the data for the line drawing.

033_quadraster.png

17, my lucky number

For the final section of this effect, we reduce the size of each dodecahedron to 64x64 but show 16 of them first, again using the corkscrew buffer.

Then a 17th one appears that moves across the coppersplits. But how is this possible? If we’re recycling the same bitmap memory for the repeated rows, then anything drawn there would surely also repeat across the lower parts of the screen, right? (And don’t even think about the wrap around handling!)

Seventeen dodecahedra

So if it can’t be on the normal bitplane… it must be… a sprite?

Indeed, that’s exactly what it is. Just like in Magnum A.I. However, simply generating a sprite version of one of the buffers and then putting it on the screen would not look very nice – the sprite can only be in front of or behind the bitmap graphics. And if it’s in front, it might draw darker pixels where there were brighter ones before, destroying the sweet antialiasing (and vice versa if it was beneath).

The solution is straightforward (just like in Magnum A.I.): We need to merge background and foreground bitplane by bitplane. The hardest part is getting the horizontal offset, the shifting and the vertical wrap right, but otherwise, easy-peasy!

Tentacles

I will hand over to Gigabates to explain what’s going on in this part.

Tentacles part screen grab

The idea for this part was to make a HAM version of an effect in D-Funk, which was originally inspired by a TIC-80 effect by Aldroid that was included in Tiny Code Christmas. The basic method is a feedback effect where we draw moving ellipses to a scrolling buffer without clearing, and the resulting trail gives the appearance of a group of intertwining bars / tentacles. Of course I ended up making it way more complicated than this!

The scrolling buffer uses the ‘corkscrew’ method, meaning it’s one continuous block of memory where we just increment the start pointer, and only need to restore the background image to the rightmost word to overwrite wrapped data. The background image itself is a horizontally tiling, 384x256, true colour image by Steffest:

Original background artwork
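For readers who have not met the corkscrew method before, here is a small Python model of it. Memory is a flat list of "words", the row stride equals the visible width, and all names are invented for this sketch; it only shows the pointer increment and the one-column repair described above.

```python
def make_corkscrew(width_words, height, extra_words):
    """One flat block: the visible rows plus room for the pointer to advance."""
    return [0] * (width_words * height + extra_words)


def view(memory, start, width_words, height):
    """Visible rows for a given start pointer (row stride = visible width)."""
    return [memory[start + y * width_words: start + (y + 1) * width_words]
            for y in range(height)]


def scroll_one_word(memory, start, width_words, height, new_column):
    """Advance the start pointer and repair only the rightmost word per row."""
    start += 1
    for y in range(height):
        # This word used to be the leftmost word of the row below.
        memory[start + y * width_words + width_words - 1] = new_column[y]
    return start


W, H = 4, 3
mem = make_corkscrew(W, H, extra_words=8)
for y in range(H):
    for x in range(W):
        mem[y * W + x] = y * 10 + x  # word value encodes (row, column)

start = scroll_one_word(mem, 0, W, H, [y * 10 + W for y in range(H)])
```

After the call every row shows columns 1 to 4 although only one word per row was written; everything else moved by changing `start` alone.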

We blit eight separate ellipses over the background, one for each tentacle. On D-Funk I was able to race the beam to do this in a single buffer, but this was not feasible this time round. With a double buffered setup we need to draw two updates on each frame for the feedback effect to work i.e. each ellipse is drawn twice: in the current and previous position.

The HAM screen setup

But before we go any further, how are we drawing bobs in a HAM screen in the first place? Platon has already demonstrated the ‘correct’ way to do this by fixing up the fringes on the start/end columns of the bob using indexed colours. This is by far the superior method, but too expensive to do for the number of blits required for this effect.

Instead I used a HAM7 based method. Platon talks about his version of this in relation to the lighthouse part, but here goes my attempt. This uses fixed data for the HAM control words, where each pixel updates a given RGB component in a repeating 8 pixel sequence RGBRGRBG. In other words, for each 8 pixel block, the first pixel always updates the red component, the second pixel updates the green, and so on. Unlike a true HAM6 image, indexed colours are not used at all, and we don’t get to choose the best component to update for each pixel. You’ll notice that in each block the red and green update three times, and the blue only twice. This is deliberately chosen to give the best perceptual result. Blue carries the least luma information and is less noticeable.

Overall this still results in a less than ideal result with visible artifacts, but I try to work around this in the image data. Any hard edges are particularly noticeable, and a small amount of blur can improve the appearance a lot.

The result of all this is that if our BOB images follow the same control sequence, we can just do a normal masked blit into our background. Well almost - we just lost the ability to use the blitter’s horizontal shift, as this will move the BOB data out of alignment with the control words. To get around this we need multiple pre-shifted versions of the source graphic like we would on lesser platforms :-)

As if this wasn’t enough, I add another level of complexity! In order to avoid the visible vertical banding resulting from every 8th pixel setting the same component, I stagger the control word sequence per line. The copper has to set the bpldat5/6 registers per line in order to achieve this.

RGBRGRBG…
GBRGRBGR…
BRGRBGRG…
RGRBGRGB…
GRBGRGBR…
RBGRGBRG…
BGRGBRGR…
GRGBRGRB…
…

But why stop there? We also introduce an interlace style effect to further hide blockiness, by offsetting the pattern by one on alternate frames. To be honest this is really the thing that makes it tolerable, especially for the text rendering, but it comes at the cost of needing double the data for the background image.

Taking all of this into account, the index of the preshifted BOB variant to use is (x+y+(frame%2))%8.
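Put into code, the staggering and the variant lookup look like this. A minimal Python sketch with made-up function names; which of the two alternating frames carries the extra offset is an assumption.

```python
PATTERN = "RGBRGRBG"


def component_at(x, y, frame=0):
    """Which RGB component the pixel at (x, y) modifies on a given frame."""
    return PATTERN[(x + y + (frame % 2)) % 8]


def bob_variant(x, y, frame):
    """Index of the pre-shifted BOB copy whose control sequence lines up."""
    return (x + y + (frame % 2)) % 8


# Reproduce the eight staggered lines listed above (frame 0).
rows = ["".join(component_at(x, y) for x in range(8)) for y in range(8)]
```

Variant `k` is simply the BOB encoded as if its top-left pixel started at position `k` of the sequence, so `PATTERN[bob_variant(x, y, frame)]` is always the component the screen expects at that spot.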

AGA support

As AGA doesn’t support HAM7, we need to use real bitplane data, filled with our pattern for bitplanes 5/6. The copper needs to set the pointers per line instead of data registers. For many effects this is enough, and the faster CPU speed of AGA systems (we can expect a minimum of 68020) makes up for the lost performance from the free bitplane DMA in HAM7 mode. Not so here! With all the blitting going on, DMA is still a bottleneck, and we can’t achieve 50Hz on base A1200. To get around this we need to set FMODE=1, to enable 32bit bitplane fetching. This requires that all bitplane data is 32 bit aligned, and hardware scrolling is now 0-31 pixels, using the additional bits in bplcon1. This results in some fiddly alternate code paths for AGA to compensate for this.

Tentacle texture

Ok, so now we have the basic method to blit the moving BOBs over the scrolling background, but there are a few more things going on. Rather than just overlaying the same graphic each time, it appears to ‘paint’ a texture along the resulting tentacle. This is done by positioning a repeating texture within an ellipse mask, and scrolling this texture in sync with the movement. What ends up happening is that the leftmost pixels of the ellipse remain visible, while the rest are overwritten, effectively drawing the texture along a path.

The tentacle texture scrolling across the ellipse mask

If we just scrolled the texture graphic at a constant X speed, the resulting graphic would appear stretched as the ellipse moves on the Y axis. Instead I base the scroll increment on the total Euclidean distance traveled since the previous frame. I used the following approximation which h0ffman shared with me a while back (thanks!):

dx+dy - min(dx,dy)/2 - min(dx,dy)/4 + min(dx,dy)/8
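The formula boils down to max + 3/8 · min. A quick Python check of how close it gets; taking the absolute values of dx and dy is an assumption of this sketch, and floating point divisions stand in for the shifts one would use on the 68000.

```python
import math


def approx_distance(dx, dy):
    """Approximate sqrt(dx*dx + dy*dy) without multiply or square root."""
    dx, dy = abs(dx), abs(dy)
    m = min(dx, dy)
    return dx + dy - m / 2 - m / 4 + m / 8


# Worst relative error over a small grid of deltas.
worst = max(abs(approx_distance(dx, dy) / math.hypot(dx, dy) - 1)
            for dx in range(65) for dy in range(65) if dx or dy)
```

For a 3/4/5 triangle this gives 5.125, and over the grid above the result stays within about 7% of the true distance, which is plenty for pacing a texture scroll.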

The radius of the ellipse also varies by just picking from a range of pre-generated masks.

Text Writer

The addition of the greetings text works in much the same way as the texture, and just introduces another round of blitting. Using a variable width font, it overlays the current character onto the texture at the correct offset in a temporary buffer. For each tentacle we have a list of strings with a pixel offset. Getting these in the right place where they’re actually readable, and not obscured by other tentacles was very much a matter of trial and error.

The biggest challenge was probably making a font that minimised the artifacts caused by sharp edges in the HAM7 mode and remained readable. I based it on VAG Rounded, and added some Gaussian blur, with an alpha matte of the average colour in the texture graphic. I also added some emboss and distortion to give the impression that the text is wrapped around a cylinder.

Fish BOBs

The fish BOB graphic

To add to the underwater theme I wanted to include some fish swimming in front of the tentacles. This should have been pretty straightforward, using the pre-shifted method I described earlier, but the pain point was in restoring the background given the underlying feedback effect. As we have no ‘clean’ background image to restore, we first have to copy the current state of the bounding box to a temporary buffer, to be restored on the next frame. This all adds to the blit heaviness of this part, and limited the number of bobs we could achieve.

One unexpected benefit of the control word pattern is that we can deliberately misalign the data to swap the red and green channels. We take advantage of this to get a free colour variant from the same bitmap data. This is lucky, because the amount of data required for this effect is starting to get a bit out of hand!
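Why this works can be read off the sequence itself. The Python sketch below searches for offsets at which the shifted RGBRGRBG sequence equals the original with red and green exchanged; the text does not say which offset the demo actually uses, so treat the result as a property of the pattern only.

```python
PATTERN = "RGBRGRBG"


def rotate(seq, n):
    """Sequence as seen when the data is misaligned by n pixels."""
    return seq[n:] + seq[:n]


def swap_red_green(seq):
    return seq.translate(str.maketrans("RG", "GR"))


swap_offsets = [n for n in range(8)
                if rotate(PATTERN, n) == swap_red_green(PATTERN)]
```

The only match is an offset of four pixels: there the blue slots stay where they are while every red slot lands on a green one and vice versa.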

Moai Sprite

As part of the transition into this part I used sprites to cover the left edge of the buffer as it scrolls into view. The use of the Moai statue was a reference to Ham Eager.

Similar to the colour cycled sprites in Inside the Machine, the idea was to simulate a moving light source. This was achieved by rendering the statue in Blender, with the greyscale value mapped to the surface normal. Then when the colour palette is cycled it gives the appearance of changing the light angle.

Different palette states showing simulated lighting direction

One challenge in this was that we lose one of the sprites when setting up the screen for hardware scrolling with additional data fetching, and we really wanted all available sprites for a 64px wide attached image. The solution was to initially use a standard screen setup and only switch on the extra fetching once one of the sprite pairs was out of view. This was complicated by the fact that we need to adjust the scroll offset to compensate for the switch, and on AGA we’re actually fetching an extra two words due to the fmode setting.

Summer of Squid

Original Atari image

The girl with the squid was originally supposed to be used for an Atari production, but Optic then released it at GERP 2022 in a graphics compo, where it got second place.

I took the image and extended it to 256 pixel height and even added a fluffy cloud :D

Slightly higher

It stayed this way for almost a year, but in the end Optic finally gave in to my nagging, reworked the background and some bits to make it fit better to Part II with the lighthouse:

Final image

The image at the end of Part I has also been placed there to allow disk loading of the next music (97 KB compressed), the track for Part II.

To save a couple of bytes the graphics is stored in chunky format.

Eyes

There is an alternative way to directly get to Part II and that’s by holding down the left mouse button while booting. To avoid long loading times with nothing, I’ve spent 0.5 KB of disk space on six sprites as an homage to all the mindblowing Fairlight C64 productions that we got the last couple of years.

It somehow also became a running gag for Fairlight to have Soya’s three guys from the Eyes demo reappear in almost every production since.

Soya's eyes

Have some free eyes peeking at you while loading. I would have liked to have it somewhere more prominent, but that’s what it is.

Part II

This intro works the same way that Part I did, just with a different direction of zooming and the long speech sample is already in memory while running.

Bacon for all

Magic would just skip this part with the famous quote by Mrs. Baerbock, but it does in fact have a purpose: It bridges a gap of about five seconds of otherwise black screen. While running, it decompresses the previously loaded music and loads the beacon part (107 KB) in parallel. The disk gauge is then at about 60%.

The decrunching of the beacon part won’t happen in parallel, so there are a few frames of delay before the beacon part kicks off with the new music. A short moment to breathe.

The top-bottom memory allocation scheme sometimes gets complicated; it helps to write down what’s happening. Here are some comments from the source code:

; this is brain-twisting, so let's write down the memory allocations:
; summer is located in low memory
; new music is loaded ALSO to low memory (fast), pushed
; part2 is preloaded to high memory
; summer ends, frees low memory
; part2 is decrunched into low memory
; part2 prehook also pushes low memory
; part2 prehook pops high memory, decrunches music to new high memory allocations
; part2 executes (low memory)
; beacon is preloaded to low memory and decrunched to high memory
; before beacon executes in high memory, low memory is pop'ed (three times)
; bars is fully preloaded to low memory

Memory allocation is still something that makes linking the parts with PlatOS a bit more of a hassle than it could be.

The Battle for Bacon

mA2E delivered an amazing soundtrack that sets the pace for Part II, putting a strong contrast to the slow, melodic and soothing Part I. He composed the track without actually knowing what the demo would look like and the first version of the track was rather short (2:18).

The name is probably a reference to Optic trying to eat up all the bacon at the breakfast buffet at the Majoren hotel in Skövde. This achievement has not yet been unlocked, they always seem to have more bacon in the kitchen.

Around April 2025 I started drafting the “storyboard” for both parts and tried figuring out the right order of the effects and how long they should stay on the screen, carefully listening to the cues inside the tunes. It wasn’t that easy because it wasn’t clear if some effects would make it into the demo or not, and how much time needed to be taken into account for transitions (which ones?) and for precalculations, loading and decrunching.

That included Gigabates’ texture mapper and the bilinear zoomer (a kill candidate) plus two more effects that ended up not being used. But even without these, 2:18 was not much to work with.

mA2E added some ending patterns for the bizoomer and 22 seconds for the texture mapper very late in the process and there was a bit of confusion during this process, so the last changes to the music were actually dropped.

In the end, it worked out okay, but it’s really better to have the storyboard set up before composing the music. That’s no guarantee for anything, of course, and I’ve no idea how this could work for tech demos (next time?).

Beacon of Hope

If you remember the first scene in HAMazing with the swinging lamp and how difficult I said it was to pull it off, then let’s hear some good news first: Once I gave up on making this lighthouse scene in the same 1x1 pixel way as in that demo, a 10-ton paperweight was lifted from my shoulders (and also code that just kept crashing and was untouched for months). Best decision in April 2025.

In the HAMazing tech-tech babble, I wrote in the “This sounds awfully complicated” section that I wanted to try out the HAM7 mode next time. So here we are, making ugly, blurry graphics with only four bitplanes. Except that I really tried to get the quality up as much as possible.

Similar to the Tentacles part, the screen is using a fixed pattern for modifying red, green and blue components, and not using any index colour, removing the need for any fringe fixing.

But how to choose the right pattern for R, G, B that fits nicely into a repeating 16 bit word? There is something called a superpermutation: a string that contains every possible permutation of n symbols as a substring.

There are six permutations of three symbols and a superpermutation of R, G, and B would be RBGRBRGBR.

Superpermutation by IntegralPython (CC-BY-SA)

In this example, there are only two greens, but three reds (not counting the last one) and blues. This means that there is a distance of max. 4 pixels between each green component and between 2 to 3 pixels between blue and red.

You will immediately say: “Well, that’s not a good choice.”

And you’re right – we won’t be using this pattern. Instead, we will swap the blue and green components from the example, which gives less blue but more green changes, being better suited for the visual response of the human retina. (However, in the end, it probably depends on the image itself what looks best!)
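Both claims are easy to verify. The Python sketch below checks that RBGRBRGBR really contains all six orderings and then performs the blue/green swap; the helper names are invented for this example.

```python
from itertools import permutations

SUPER = "RBGRBRGBR"


def contains_all_permutations(s, symbols="RGB"):
    """True if every ordering of the symbols occurs as a substring of s."""
    return all("".join(p) in s for p in permutations(symbols))


def swap(seq, a, b):
    """Exchange two symbols throughout the sequence."""
    return seq.translate(str.maketrans(a + b, b + a))


# Swap blue and green, then keep the eight symbols that repeat.
pattern = swap(SUPER, "B", "G")[:8]
counts = {c: pattern.count(c) for c in "RGB"}
```

The swapped pattern comes out as RGBRGRBG with three reds, three greens and two blues per eight pixels, the same sequence quoted in the Tentacles section.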

The pattern repeats after eight symbols, which is perfect for a 16 bit word. We can also change the starting position a bit in every line (0, +2, +3, +1). I think Gigabates used eight different positions (a diagonal shift) before the pattern repeats, whereas I chose only four, but made them more “random” (or bit-reversed?).

Combining different patterns

By introducing a second rendering of the image (with the leftover dithering), flickering quickly between both of them, and using a different superpermutation, we get more pixels to change their RGB values and a seemingly higher resolution.

When creating the figure I noticed that by not using a pattern shift of +7, +5, +4, +6 for the second pattern, but keeping the 0, +2, +3, +1 from above (see “alternative?”), I might have gotten a better colour representation. But now it’s too late :P

Optic painted this original 24 bit true-colour graphics before I ruined it by downconverting it to 12 bit with the HAM7 patterns above. Really beautiful!

Beautiful original graphics

If I am ever going down that rabbit hole of doing some Arcade-style game conversion with huge colourful bobs running on an A500, I might try this mode. Only having to blit four bitplanes as in any 16 colour mode seems ideal (you still can use 15 colour sprites for the main character, though?).

Keeping several different copies of bobs (see below) in memory doesn’t sound so nice though. Baconfighter II. You heard it here first.

TV teaser

Black and white TV noise

The part starts with a slightly tinted black & white version of the graphics. It works by just switching to regular 16 colour mode. Look at these nice crosshatch patterns created by the superpermutations!

The palette is modified in a similar way as with the flashing squids, smoothing out a 2D longitudinal wave that defines whether to use a colour from a brighter or darker palette.

There is a running bplcon1 distortion that is actually a reference to the first effect in Ham Eager.

Glitching is applied by the CIA music timer modifying the bitplane modulo registers.

All that glitching gradually subsides before we switch to HAM7 mode.

The lighthouse

Shine all your love on me

By drawing a polygon and filling it, we get a mask that we can use to cookie-cut graphics from a second bitmap that is brighter and has the demo title baked inside it.

Originally, the title should have appeared word by word with every rotation, but we didn’t have the disk (and memory?) space left for that.

There is a manual, non-DMA sprite on the left extending over the whole height to mask out a couple of pixels that would look odd because they would start from black and need some pixels to reach the final colour.

In HAMazing, the different shades were generated at run-time by halving and quartering the brightness. Here, we use a complete second image (that’s why it takes so much space on disk!), which allows us to have both the hidden logo, a ghost boat, shiny reflective water, and a nice gradient in brightness. The picture below shows how it looks in 24 bit true colour:

The bright version of the image

Some information on the memory use here: The screen is moving a bit up and down so the actual image resolution is 320x200 (31 KB). And because we’re using temporal dithering, we actually are using two versions of each as source (125 KB). Then we have two buffers to render to (62 KB) and two more bitplanes for the drawing and filling (16 KB), making the total chip memory consumption for these bitmaps around 203 KB.
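The numbers add up as follows. A small Python tally, assuming four bitplanes per image (HAM7), 1 KB = 1024 bytes and mask planes of the same 320x200 size; the variable names are made up.

```python
WIDTH, HEIGHT, PLANES = 320, 200, 4

plane_bytes = WIDTH // 8 * HEIGHT   # one bitplane
image_bytes = plane_bytes * PLANES  # one four-bitplane image

sources = 2 * 2 * image_bytes       # dark + bright image, two dither versions each
render_buffers = 2 * image_bytes    # double buffering
mask_planes = 2 * plane_bytes       # polygon drawing and filling

total = sources + render_buffers + mask_planes


def kb(n):
    return n / 1024
```

That is 31.25 KB per image, 125 KB of source data, 62.5 KB of render buffers and about 15.6 KB for the mask planes, 203.1 KB in total.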

Unlike the HAMazing lamp scene, we don’t have to worry about any HAM fringing here. The polygon data and the areas that need updating have been calculated offline via a prototype written for Processing in Kotlin. This means that the rotating beam doesn’t take much raster time at all, except for when it shines directly into the camera.

Lighthouse mask

A cookie-cut blit is applied at the top of the lighthouse, around the light housing, to make sure the beam goes behind the lighthouse and is not drawn in front of it.

What to do with all the free raster time?

After Steffest had come up with the pig hero for the Ham Bobs screen, I somehow had that silly idea of having an invasion of pigs.

I painted some tiny pigs and tried to draw as many of them onto the screen as possible, cued to the increasing rate of snare hits.

Steffest thankfully replaced my placeholder with something nicer, which wasn’t as easy because of the low horizontal resolution with the HAM7 mode:

Coder graphics was turned into Steffest graphics

To be able to draw a bob at any position on the screen, we need the version of it that matches the permutation pattern at that particular x and y location.

Thus, we need to create a lot of different versions of the pigs at runtime from the original 12-bit graphics: 8 horizontal offsets * 4 vertical offsets * 2 different superpermutations plus one mask makes 65 versions of the four animation cells.

Enter superpigs

As it’s dawn, the pigs are usually only visible as silhouettes, except when the beam hits them directly. As they are supposed to be at different positions in the air, the superpigs are briefly lit up when the beam passes them. Drawing only the silhouette is also cheaper in terms of blitting.

It's getting crowded

In the end I was able to draw and restore 40 superpigs, at least while the beam is not really updating much of the image.

The final idea was to have a transition wipe of a really fast and big pig sweeping from left to right directly in front of the camera (yeah, that image did get reused quite a lot!). The first versions had this wipe twice as fast, but that was more like a 10-frames “what the hell was that?” experience.

Pig wipe!

This big blit is split into three parts: the major section can be blitted without cookie cutting as a straight A->D blit, while the top and right edges need a mask (ABC->D). Because the bob is so huge, there is not enough raster time left to blit all 40 small pigs and restore the background each frame, so the latter is just dropped, turning the pigs into slightly larger splotches. I don’t think you noticed.

And thus the beacon part ends, but we’re ramping up the pace.

Between the Bars

HAM Eager had some tinting bars over a lemur picture. Didn’t look that great, but was a proof of concept to get five more hue permutations for free on HAM images (see that section for details).

How far do we get if we try to combine a common line selection effect with the hue shift using the HAM Technology™?

Here’s an early prototype of the idea written in Processing:

Processing prototype

To select an arbitrary line, we can either use the bitplane modulos or write each bitplane address pointer separately.

As a compromise, we will set fixed high word addresses of the bitplane pointers and only update the six lower halves of the pointers (leaving some capacity for other copper updates).

Insane with the mem-pain…

This means that each line selectable bitplane needs to stay within a 64 KB “page”.

So let’s calculate. Given a 320 pixel wide screen, we need 40 bytes per line, which leaves us at most 1638 lines in 64 KB. Assuming we are using a 16:9 layout as for many of the HAM effects, we need 180 lines for a background image. The remaining 1458 lines can be used for our bar effect.

Now let’s say we want to have a texture of about 96 pixels height, then we can store 15 different versions of this texture in memory, taking up 1440 of the 1458 available lines.

Sounds like a plan. But wait, how much chip mem would this need? Six bitplanes with 64 KB each is 384 KB of the total 500 KB we can obtain max, and we need 140 KB for the music already. Moreover, we need perfect 64 KB page alignment, but the lowest chip mem area also contains such unnecessary things as the interrupt vector table.

This is not going to fit, is it?

Luckily, it turns out we will not need six times 64 KB, but are able to cram the bitplane 5 and 6 into the same page, which will not even occupy all the page.

The 15 different versions we talked about earlier are just darkened or brightened versions of the same HAM texture. Since the HAM Eager write-up, you should know that this kind of manipulation only modifies the lower four bitplanes while the top two stay intact, because they determine whether index colour, red, green or blue modification is selected – and that doesn’t change.

So we only need to store 180 + 96 = 276 lines each for plane 5 and 6, right?

Not quite. Turns out we need one extra (black) line for the in-transition. Okay. Anything else? Yeah. Remember the hue trick that I mentioned in the beginning? We wanted to make use of that!

How does it work again? We need to permute the R, G, B HAM modification colours, so that e.g. a modification of red turns into green, and green turns into blue, and blue into red. Luckily, all six permutations can be achieved by switching around the contents of bitplanes 5 and 6 and sometimes using the xor'ed version of plane 5 and 6. As a result, we will need to store 2 * 180 + 3 * 96 + 1 = 649 lines in the last 64 KB page – only 26 KB used there.
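Why three sources are enough for all six hues can be checked with a tiny Python sketch. It relies on the standard HAM6 control encoding (planes 6 and 5 read as a two-bit code: 00 = index colour, 01 = modify blue, 10 = modify red, 11 = modify green); the function names are hypothetical and this is an illustration, not code from the demo.

```python
from itertools import permutations

# HAM control codes as (plane 6, plane 5) bit pairs
NAMES = {0b00: "index", 0b01: "blue", 0b10: "red", 0b11: "green"}

def sources(code):
    """The three candidate bit sources for one pixel: plane 5, plane 6, their xor."""
    p5 = code & 1
    p6 = (code >> 1) & 1
    return {"p5": p5, "p6": p6, "xor": p5 ^ p6}

def remap(code, new_p5, new_p6):
    """Control code the display sees when planes 5 and 6 fetch the named sources."""
    s = sources(code)
    return (s[new_p6] << 1) | s[new_p5]

def all_hue_mappings():
    """For every ordered pick of two sources, where do (red, green, blue) end up?"""
    result = {}
    for new_p5, new_p6 in permutations(["p5", "p6", "xor"], 2):
        result[(new_p5, new_p6)] = tuple(
            NAMES[remap(c, new_p5, new_p6)] for c in (0b10, 0b11, 0b01))
    return result

for choice, mapping in all_hue_mappings().items():
    print(choice, "R,G,B ->", mapping)
```

Each of the six ordered picks yields a different arrangement of red, green and blue, while code 00 (index colour) always stays 00, so index pixels are never turned into modification pixels by accident.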

In the end, we need a total of 282 KB of chipmem for our line buffers. Of page aligned chipmem. Plus about 22 KB for the copperlists, plus some more for sprites and other stuff (also we might want to preload in the next effect, remember?).

If we are not able to use the lowest page of memory, we will still run out of chipmem. But wait! How many bytes do we actually use of our page of lines? (180 + 15 * 96 + 1) * 40 = 64840 – which leaves us with 696 unused bytes!
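The page budget of this section can be double-checked with a few lines of Python. This is only a sketch of the arithmetic in the text; the constant names are mine, and the shared page is rounded up to whole KB to match the 26 KB figure above.

```python
import math

PAGE = 64 * 1024
LINE = 320 // 8                    # 40 bytes per line
BG, TEX, SHADES = 180, 96, 15      # background lines, texture lines, shade count

lines_per_page = PAGE // LINE                   # lines that fit into one page
bar_lines = lines_per_page - BG                 # what is left for the bars
textures = bar_lines // TEX                     # shaded texture copies that fit

used_low = (BG + SHADES * TEX + 1) * LINE       # planes 1-4: image, shades, black line
spare_low = PAGE - used_low                     # unused bytes per page

ctrl_lines = 2 * BG + 3 * TEX + 1               # planes 5 and 6, plus xor for the bars
ctrl_kb = math.ceil(ctrl_lines * LINE / 1024)   # shared page, rounded up
total_kb = 4 * 64 + ctrl_kb                     # four full pages plus the shared one

print(lines_per_page, bar_lines, textures, used_low, spare_low, ctrl_lines, total_kb)
```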

Luckily, the new framework allows us to use memory from $200 (512) on, which means we can make use of the lowest page without conflict with the interrupt table.

(That’s one of the reasons why the HD version needs 1 MB of chip mem – we cannot guarantee this kind of alignment there.)

Made in the Shading

In the HAM Eager lemur scene, the picture uses sliced HAM for the best HAM image quality and hence updates 15 index colours per line. If we’re going to use the hue permutation trick for the bars, the index colour palette also needs to be permuted (swapping the R, G, B components accordingly).

We cannot update 15 colours and 6 bitplane pointers in the horizontal blank. But not using index colours at all (for the bars) would degrade image quality with bad looking HAM fringes quite a bit (compare this to the kaleidoscope effect in HAMazing, where the texture does not use index colours at all!).

Again we need to compromise: We will allow the background to use 8 fixed colours and 8 dynamic (SHAM) ones per line and the bars can only use these 8 dynamic ones.

The 15 different shades are defined as -9, -7, -5, -4, -3, -2, -1, 0, +1, +2, +3, +5, +7, +9, +11 (from dark to bright). For the darkest shade, for example, each RGB colour gets $999 subtracted (clipped to black).
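As a small illustration of what such a shade step does to a 12-bit colour, here is a Python sketch. The clipping to black is from the text; clipping to white for the bright shades is my assumption, and `shade` is a hypothetical helper, not a routine from the demo.

```python
SHADES = [-9, -7, -5, -4, -3, -2, -1, 0, 1, 2, 3, 5, 7, 9, 11]

def shade(rgb12, delta):
    """Add delta to each 4-bit component of a 12-bit RGB value, clamped to 0..15."""
    out = 0
    for shift in (8, 4, 0):
        component = (rgb12 >> shift) & 0xF
        component = max(0, min(15, component + delta))   # upper clamp is an assumption
        out |= component << shift
    return out

for delta in SHADES:
    print(f"{delta:+3d} -> ${shade(0x48C, delta):03x}")
```

Because every component moves by the same amount, only the four data bits of a HAM pixel change, which fits the observation above that the control planes 5 and 6 stay untouched.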

Creating these (180 + 96 * 15 * 6 + 1) = 8821 lines for the background image, and differently shaded bars with six hues together with their dynamic palettes not only takes time, it consumes a huge amount of memory: 14 copper instructions per line need to be stored if we want a quick update of the copperlist. 242 KB of fast ram are used up for that. (An alternative you might have been thinking of would have been to use copper jumps, then we would only need to update one pointer – but that would require all copper instructions to reside in chip memory and we’re already out of it!)

Updating the screen from a table that holds the address of each line turns into a movem.l feast (for updating three lines per iteration):

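btb_update_line_selection_to_out_buffer:
        move.l  pd_CurrLineSelectionPtr(a6),a0  ; table with one source address per line
        move.l  pd_CurrImmLinesOutPtr(a6),a1    ; output buffer
        addq.l  #2,a1                           ; skip the first word of each output line
        moveq.l #(ELLIOTT_HEIGHT/3)-1,d7        ; three lines per iteration
.loop
        movem.l (a0)+,a2-a4                     ; fetch three source line pointers
        movem.l (a2),d0-d6                      ; read 7 longwords (28 bytes) for line 1
        movem.l d0-d6,(a1)
        movem.l (a3),d0-d6                      ; line 2
        movem.l d0-d6,COP_INST_PER_LINE*2(a1)
        movem.l (a4),d0-d6                      ; line 3
        movem.l d0-d6,2*COP_INST_PER_LINE*2(a1)

        lea     3*COP_INST_PER_LINE*2(a1),a1    ; advance output by three lines
        dbra    d7,.loop
        rts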

This is the heaviest operation of the effect, taking about two thirds of the time during display DMA. But to be fair, display DMA with 6 bitplanes active and 15 copper instructions per line (and the pig sprite DMA) already eats most of the free cycles in that space, so that’s maybe not a good estimate of how much time is spent on it in total.

The potential you’ll be, that you’ll never see

I named the effect after the song “Between the Bars” by the late Elliott Smith, whose (abbreviated) lyrics you can read (or not) for a glimpse. Whether this song is about alcoholism, someone being jailed, or something completely different, it strikes more than just a nerve.

My visit to the Figure 8 (defaced) wall in L.A.

Elliott was one of the greatest singer-songwriters and I owe him a lot (the photo above is from my visit to L.A., in front of the sadly many-times defaced Figure 8 wall). I would have loved to have a cover version of one of his songs somewhere, but it wouldn’t do his music any justice. But check out this fantastic rendition by Brad Mehldau. Goosebumps.

Elliott's lyrics

This water effect section of the part solely exists because the creation of the line and copperlist buffers takes a couple of seconds to finish, and we can’t have that delay.

Because the blitter is used to render the shaded bar graphics in the background, the water line selection effect is completely CPU-based.

Seven pigs walk into a bar…

Seven pigs walk into a bar...

For a long time we didn’t know what kind of texture and background we wanted to have for this effect. It was only with the pig theme manifesting that Steffest drew some pigs being jailed into (or stuck in) a capsule while Optic’s pig sprite looked on in disdain from the outside. The background is not very colourful for a sliced HAM image, but maybe it doesn’t need to be. Less Dutch coder colours, more style!

Pigs in action

The Phong-based lighting is done in realtime for the first bar, but once more bars start appearing, precalculated versions for each of the 1024 positions are used instead (about 60 KB of data calculated during the in-transition).

This shrinks the code to a very short inner loop for drawing a bar:

        moveq.l #0,d1
.copyloop
        move.b  (a1)+,d1        ; lighting
        move.l  (a2,d1.w),d4    ; map via brightness select table
        move.b  (a1)+,d1        ; line select
        add.w   d1,d2           ; increment texture pos
        add.w   (a3,d2.w),d4    ; look up increment in texture
        add.l   d3,d4           ; hue select
        move.l  d4,(a0)+
        dbra    d6,.copyloop

In the end, there was so much free frame time that I could easily add the wavy water effect that occurs when the bars “dip” into the background. It turned out to be more of a chewing gum or goo effect because I’m apparently too stupid to get the math right for 1D expanding waves :P

There were plans to actually have the lighting changed for when the bar was fully or partially “underwater”… but alas, this didn’t happen. Good enough is the enemy of slightly better.

HAMcatraz bars (aka Zeta Flanks Racer)

This is surely not the first version of HAM Alcatraz/Kefrens bars, probably not even the first one using HAM7, but I’m not aware of any other. However, it is my first attempt at doing these and also the first one doing diagonal ones (also, all HAM6 ones I saw had to use two rasterlines to update the bar).

And finally: I’m pretty sure I have never seen a scroller running on Alcatraz bars before (on any platform?).

“Zeta Flanks Racer”? Well, that’s just a silly anagram of “Alcatraz Kefrens”. Who doesn’t love the Riemann Zeta function and some orbs racing down the flanks of some bacon (yes, we tried a bacon texture, but it didn’t look as good as the flares texture in the demo)?

I will not describe in detail how Kefrens bars work (standard knowledge?), but basically it just updates the very same line over and over with a new bob line blit in sync with the raster beam, and that line is displayed again and again for the whole screen (negative bitplane modulo). Yes, it’s only a single line in memory. Only Amiga…
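For readers who have not met the technique, the principle can be simulated in a few lines of Python. This is a toy model with made-up dimensions and a plain list instead of bitplanes; it only shows why a single line of memory is enough, not how the blitter and copper are driven in the demo.

```python
WIDTH, HEIGHT = 40, 8            # tiny illustrative dimensions

def kefrens(x_positions, bar_pixels):
    """One line buffer, one bar 'blit' per raster line, buffer shown on every line."""
    line = [0] * WIDTH           # the single line that lives in memory
    screen = []
    for y in range(HEIGHT):
        # keep the bar inside the buffer so the slice assignment cannot grow it
        x = max(0, min(WIDTH - len(bar_pixels), x_positions[y]))
        line[x:x + len(bar_pixels)] = bar_pixels   # draw this row's bar slice
        screen.append(list(line))                  # what the beam displays on row y
    return screen

rows = kefrens([0, 10, 20, 30, 0, 10, 20, 30], [1, 2, 3, 3, 2, 1])
for row in rows:
    print("".join(".123"[p] for p in row))
```

Since nothing is ever cleared, every bar drawn on an earlier raster line stays visible below it, which gives the bars their characteristic look of stacked strips.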

Zeta Flanks Racer

HAM7 again to the rescue

Using HAM7 has the advantage of only needing four bitplanes for blitting and display DMA. The disadvantage is that the horizontal resolution with a fixed RGBG pattern is a lot lower (four pixels for red and blue, two for green). We will compensate for that by using temporal dithering by offsetting the pattern by two pixels every other frame.

Using page-aligned texture data, we can store the required four copies (with the right RGBG offset) of the texture of 64x256 pixels in 32 KB of ram. It needs to be the lower 32 KB of a 64 KB page. Otherwise, the blitting address pointers could sometimes overflow and increment the high word address and the rest of the texture would turn into garbage. I learnt that the hard way.
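A quick Python sketch of the size calculation and of the overflow condition. The 10-byte blit width corresponds to the 80 pixel blits used in this part; the example addresses and function names are hypothetical.

```python
def texture_bytes(width=64, height=256, planes=4, copies=4):
    """Memory for all pre-shifted copies of the texture."""
    return (width // 8) * height * planes * copies

def blit_crosses_page(start, length):
    """True if reading `length` bytes from `start` carries into the high address word."""
    return (start & 0xFFFF) + length > 0x10000

print(texture_bytes())                        # 32768 bytes = 32 KB
print(blit_crosses_page(0x47FF8, 10))         # last row, texture in the lower half
print(blit_crosses_page(0x4FFF8, 10))         # last row, texture in the upper half
```

With the texture in the lower half of the page, even a read that runs past the last texture row stays inside the page, so setting only the low word of the pointers is safe.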

Our bars are 64 pixels wide, so we need to have a blitting width of 80 pixels, which is much wider than what is usually the case. To start the interleaved blit, we use only six copper instructions (we only update the lower parts of the address registers), plus a wait. Each blit is using BC->D DMA channels with A source using a static $ffff value that is masked correctly via bltafwm and bltalwm – this doesn’t make the blit faster, but it adds an idle cycle that is free for other bus participants.

Each line blit doesn’t quite finish in time before the display starts (80 cycles best case for 60 cycles on the bus), but the bitplanes have been arranged in reverse order so that the least significant bits are written last and thus have the least visual impact. When the strip is in the far left part of the image, the distortion becomes more visible. The sprites also sometimes delay the blitting a bit, so it’s good when the pig has left (with) the boing ball.

Zeta DMA view

I think it would be possible for the copper to start filling the registers a bit earlier, so that the first write happens just after the last bitplane data has been fetched. But because of the damn line 256 copper wait problem, you would need to either split the copper list filling blits into two sections or add some nop instructions to keep the number of instructions per line the same. Hm. I should try that, next time?

Actually, I tried it on the spot and it would have been better indeed (sigh!):

This would have been better

You might also notice some copper-driven blits happening in the top part of the frame, and may wonder what’s happening there. And rightly so. You see, the blitter is used to actually fill in the bltcon0 (x-shift), bltcon1 (x-shift), bltbpt (texture with the right RGB offset), bltcpt (x-position) and bltdpt (x-position) of its own copperlist that is driving the drawing of the bar, so the CPU basically only needs to write the x-positions and texture offsets for each row into a linear array and be done with it – the blitter does all the address calculation.

You can also see from this screenshot that the effect only occupies about half a frame. This comes in handy because the following part with the texture mapper needs quite some texture preparation that all happens already here in the background and only finishes a few moments before the part ends.

Minor remark: During the Zeta part, the next two parts (texture mapper and twister) are actually loaded and decompressed into memory.

Have you lost your marbles?

I bet you did not notice that the marbles are actually animated (courtesy of Gigabates). Originally, I had ripped the ball animation from Slamtilt, but Gigabates just pulled out blender and gave it a shot.

Piggy marbles Marble madness

I also bet it did not occur to you that the colour of the marbles changes during the part. This is due to the pig sprite using the very same 15 colours. Luckily, I could remap the marbles to the piggy palette in a way where it still looked okay.

All the cute little pigs were “after-thoughts”. Optic just drew a couple of them on his own, and I put them where I found to be suitable. For this screen I asked for a floating one, but that one looked too fearful, so Optic did the boing ball one and the floating pig ended up in the bilinear zoomer.

The pigsheet

The pig needs to move out of the screen just before the first marble reaches its top edge due to sprite multiplexing. It’s a bit relaxed regarding the overlap, though, because the first 32 pixels of the pig in its first sprite area are empty.

I actually missed the opportunity to make a secret mode where the pig would take the original ball colours. Damn. Pigs of colour – are you allowed to even say that?

Remapped pigs

Anyway, it took a lot of fiddling to find the sweet spots where the marbles would still be able to cross the gap, and the same with the last two being somewhat in sync with the music when bouncing off the belt.

Funny anecdote: About a week before the party, I was modifying the trackdisk loading and almost lost my shit when suddenly the marbles that were expected to cross the gap also fell into the abyss. I never found out why this happened (for a while), because everything should be independent of the disk loading, right, right?!?

The belt after the gap uses the same texture, just with the RGBG pattern selected shifted a bit.

The scroller obviously just writes the letters onto a copy of the texture like a ring buffer.

Diagonal version

Diagonal Alcatraz bars

Just as a proof-of-concept I added a second version of the same effect, but this time made the bars go diagonally. It’s a simple modification that only requires an extra wide display line, the modification of the hardware scroll register bplcon1 and, at the right time, some bitplane modulo updates to get to the next display word.

There is a manual mode black sprite on the left-hand side border to cover a few pixels that otherwise would not look so nice.

Texture Mapper (Extra Bacon)

Texture mapper screen grab

Gigabates again…

The starting point for this part was the texture mapper effect in Inside The Machine. Really it was intended as a way to show off my sort-of-2x2 HAM7 chunky mode, but it took on a life of its own. I chose to use the dodecahedron object as a callback to Meister Polyeder.

The chunky mode

The display mode uses HAM7 which limits us to updating one RGB component per screen pixel, giving a theoretical maximum horizontal resolution of one virtual RGB pixel per 3 screen pixels. Given the awkwardness of a C2P in 3x resolution, the common approach used by myself and others has been to repeat one component and get a much more convenient 4x resolution. Having said that, it was very cool to see that Tarnow was able to make the 3x C2P work in GameOn, their contribution to the recent rotozoomer craze.

I took a different approach entirely though. It finally dawned on me that we don’t have to limit ourselves to ‘full’ RGB pixels at all. We can have a mode where each chunky pixel only sets two of three RGB components. With a HAM7 control word pattern of RGBG, every chunky pixel sets the green component (which is perceptually most important) but alternates between red and blue for the remaining component. Looking at this one way, only the green channel is really 2x2, but the important part is that we’re still drawing double the number of distinct chunky pixels. Doubling the vertical resolution too is of course trivial, other than the obvious performance hit of drawing more pixels.

The chunky format and C2P operation is not too different to my previous implementation for 4x4 HAM7. We still have one word per chunky pixel, albeit with a slightly different bit order, and it just adds one more blitter pass to merge two words (odd and even chunky pixels) into one word keeping the required bits for each component.

The format for our RGB pixel values in our chunky buffer is as follows. The bits are grouped by position, and we include two copies of the green component. [r3 g3 b3 g3 r2 g2 b2 g2 r1 g1 b1 g1 r0 g0 b0 g0]
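To make the bit order concrete, here is a short Python sketch that packs a 4-bit-per-component colour into this word format. `pack_chunky` is a hypothetical helper for illustration; in the demo this rearrangement happens during texture precalc.

```python
def pack_chunky(r, g, b):
    """Pack 4-bit r, g, b into [r3 g3 b3 g3 r2 g2 b2 g2 r1 g1 b1 g1 r0 g0 b0 g0]."""
    word = 0
    for bit in (3, 2, 1, 0):
        rb = (r >> bit) & 1
        gb = (g >> bit) & 1
        bb = (b >> bit) & 1
        # one nibble per bit position: r, g, b, and the second copy of g
        word = (word << 4) | (rb << 3) | (gb << 2) | (bb << 1) | gb
    return word

print(f"{pack_chunky(15, 0, 0):04x}")   # pure red
print(f"{pack_chunky(0, 15, 0):04x}")   # pure green, present twice per nibble
print(f"{pack_chunky(0, 0, 15):04x}")   # pure blue
```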

We could simplify the C2P further by pre-scrambling the pixel order in our effect code, but this really isn’t practical for a texture mapper. We really need a simple format of just sequential words per pixel.

C2P

The C2P is entirely blitter driven, and chains together the operations using interrupts. This is essential, because the blits far exceed the maximum height of 1024 on OCS and have to be split into multiple operations.

The first pass to combine odd and even pixels looks like this.

src:     [Ar3 Ag3 ___ ___ Ar2 Ag2 ___ ___ Ar1 Ag1 ___ ___ Ar0 Ag0 ___ ___]
src+2:   [___ ___ Bb3 Bg3 ___ ___ Bb2 Bg2 ___ ___ Bb1 Bg1 ___ ___ Bb0 Bg0]
src:     [Ar3 Ag3 Bb3 Bg3 Ar2 Ag2 Bb2 Bg2 Ar1 Ag1 Bb1 Bg1 Ar0 Ag0 Bb0 Bg0]
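Expressed as plain bit masks, the merge in the diagram above looks like this in Python. The mask values follow directly from the bit positions shown; the function name is made up, and on the Amiga this is a single masked blitter pass rather than CPU code.

```python
MASK_EVEN = 0xCCCC   # upper two bits of each nibble: r and g of the even pixel
MASK_ODD = 0x3333    # lower two bits of each nibble: b and g of the odd pixel

def merge_pair(even, odd):
    """Combine two chunky words (pixel A and pixel B) into one HAM7-ready word."""
    return (even & MASK_EVEN) | (odd & MASK_ODD)

# a pure red even pixel next to a pure blue odd pixel
print(f"{merge_pair(0x8888, 0x2222):04x}")
```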

And then we continue with our C2P process, first swapping groups of 4 bits into a temporary buffer. Let’s forget about RGB components now and just refer to output pixel destination A-P. Shifting to the right is done by blitting in descending mode.

Swap 4 left:
src        [A3 B3 C3 D3 -- -- -- -- A1 B1 C1 D1 -- -- -- --] [...]
src+2 >> 4 [>> >> >> >> E3 F3 G3 H3 -- -- -- -- E1 F1 G1 H1] [...]
tmp        [A3 B3 C3 D3 E3 F3 G3 H3 A1 B1 C1 D1 E1 F1 G1 H1] [...]

Swap 4 right:
src << 4 [A2 B2 C2 D2 -- -- -- -- A0 B0 C0 D0 << << << <<] [...]
src+2    [-- -- -- -- E2 F2 G2 H2 -- -- -- -- E0 F0 G0 H0] [...]
tmp+2    [A2 B2 C2 D2 E2 F2 G2 H2 A0 B0 C0 D0 E0 F0 G0 H0] [...]

And then combining swapping groups of 8 bits with copying to the destination planar buffer for each bitplane:

Copy Bpl 3:
tmp        [A3 B3 C3 D3 E3 F3 G3 H3 -- -- -- -- -- -- -- --] [...] [...] [...]
tmp+4 >> 8 [>> >> >> >> >> >> >> >> I3 J3 K3 L3 M3 N3 O3 P3] [...] [...] [...]
bpl3       [A3 B3 C3 D3 E3 F3 G3 H3 I3 J3 K3 L3 M3 N3 O3 P3]

Copy Bpl 2:
tmp+2      [A2 B2 C2 D2 E2 F2 G2 H2 -- -- -- -- -- -- -- --] [...] [...] [...]
tmp+6 >> 8 [>> >> >> >> >> >> >> >> I2 J2 K2 L2 M2 N2 O2 P2] [...] [...] [...]
bpl2       [A2 B2 C2 D2 E2 F2 G2 H2 I2 J2 K2 L2 M2 N2 O2 P2]

Copy Bpl 1:
tmp << 8   [A1 B1 C1 D1 E1 F1 G1 H1 << << << << << << << <<] [...] [...] [...]
tmp+4      [-- -- -- -- -- -- -- -- I1 J1 K1 L1 M1 N1 O1 P1] [...] [...] [...]
bpl1       [A1 B1 C1 D1 E1 F1 G1 H1 I1 J1 K1 L1 M1 N1 O1 P1]

Copy Bpl 0:
tmp+2 << 8 [A0 B0 C0 D0 E0 F0 G0 H0 << << << << << << << <<] [...] [...] [...]
tmp+6      [-- -- -- -- -- -- -- -- I0 J0 K0 L0 M0 N0 O0 P0] [...] [...] [...]
bpl0       [A0 B0 C0 D0 E0 F0 G0 H0 I0 J0 K0 L0 M0 N0 O0 P0]
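The passes above can be replayed in Python to convince yourself that the shifts and masks really end up as planar data. This sketch models 16 output pixels (A to P) as 4-bit values and compares the result with a naive bit-by-bit conversion; the function names are hypothetical and the code stands in for the blitter passes, it is not the demo's implementation.

```python
def make_word(pixels):
    """One merged source word from four 4-bit pixels: [p0.3 p1.3 p2.3 p3.3 ... p3.0]."""
    word = 0
    for bit in (3, 2, 1, 0):
        for p in pixels:
            word = (word << 1) | ((p >> bit) & 1)
    return word

def c2p_16(pixels):
    """Convert 16 pixels into four 16-bit bitplane words [bpl0, bpl1, bpl2, bpl3]."""
    src = [make_word(pixels[i:i + 4]) for i in range(0, 16, 4)]
    tmp = []
    for a, b in ((src[0], src[1]), (src[2], src[3])):
        tmp.append((a & 0xF0F0) | ((b >> 4) & 0x0F0F))   # swap 4 left: bits 3 and 1
        tmp.append(((a << 4) & 0xF0F0) | (b & 0x0F0F))   # swap 4 right: bits 2 and 0
    bpl3 = (tmp[0] & 0xFF00) | ((tmp[2] >> 8) & 0x00FF)
    bpl2 = (tmp[1] & 0xFF00) | ((tmp[3] >> 8) & 0x00FF)
    bpl1 = ((tmp[0] << 8) & 0xFF00) | (tmp[2] & 0x00FF)
    bpl0 = ((tmp[1] << 8) & 0xFF00) | (tmp[3] & 0x00FF)
    return [bpl0, bpl1, bpl2, bpl3]

def reference_planes(pixels):
    """Naive conversion: bit n of every pixel, leftmost pixel in the top bit."""
    planes = []
    for bit in range(4):
        word = 0
        for p in pixels:
            word = (word << 1) | ((p >> bit) & 1)
        planes.append(word)
    return planes

pixels = [(i * 7 + 3) & 15 for i in range(16)]
print(c2p_16(pixels) == reference_planes(pixels))
```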

3D Maths

Nothing too out of the ordinary here. For the rotation matrix I used the classic ‘6 muls’ optimisation with pre-multiplied pairs. For backface culling, I decided to try out the normals-based method, rather than the usual cross-product z sign. It involves transforming the camera vector into object space, calculating the dot product with the untransformed normal for each face, and comparing to a constant. In hindsight this was unnecessary. The advantage of this method is that you can flag the vertices for visible faces and only transform those, but I didn’t actually end up doing this!
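A rough Python sketch of the normals-based culling idea follows. The sign conventions (viewer on the negative z axis, a face being visible when the dot product exceeds the threshold) and all names are assumptions for illustration; the demo works in fixed point and may well compare the other way round.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def to_object_space(rotation, v):
    """Apply the inverse of a pure rotation (its transpose) to vector v."""
    return tuple(sum(rotation[r][c] * v[r] for r in range(3)) for c in range(3))

def visible_faces(rotation, normals, view=(0, 0, -1), threshold=0.0):
    """Indices of faces whose untransformed normal points towards the viewer."""
    view_obj = to_object_space(rotation, view)   # one transform for the whole object
    return [i for i, n in enumerate(normals) if dot(n, view_obj) > threshold]

IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
TURN_Y_180 = [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]
NORMALS = [(0, 0, -1), (0, 0, 1), (1, 0, 0)]

print(visible_faces(IDENTITY, NORMALS))
print(visible_faces(TURN_Y_180, NORMALS))
```

The point of the method is visible in `visible_faces`: only one vector is transformed per frame, while the face normals are used exactly as stored.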

Texture shading

The flat shading effect is achieved by picking from a range of pre-faded texture variants. These are generated in precalc from a single original texture, stored in 7 bit RGB, and we apply some simple error diffusion dithering. Where this relies on lookup tables to get the faded RGB values for each level, I was able to take advantage of this to bake in a gamma curve and an ambient level. By altering the ambient level between channels, I was able to introduce a hue shift, so colours get cooler as they get darker. This process also takes care of rearranging the bits into our custom chunky format.

Lighting

The moving light source is stored as a normalised vector, and we calculate the light intensity using its dot product with the normal of each face. As with the backface culling, rather than transforming the face normals, we transform the light source vector into object space.

I also reduce the light intensity as the object gets further from the camera, as a kind of fog approximation: intensity / (z + CAMERA_Z). The final intensity value is shifted and clamped to give us a texture index for the desired brightness.
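In Python, the intensity calculation could look roughly like the sketch below. Only the formula intensity / (z + CAMERA_Z) and the shift-and-clamp step are from the description above; the 8.8 fixed point format, the constant values and the number of texture levels are placeholders I picked to get a runnable example.

```python
CAMERA_Z = 256   # hypothetical camera distance
SHIFT = 4        # hypothetical scale from intensity to texture index
LEVELS = 16      # hypothetical number of pre-faded texture variants

def texture_index(normal, light_obj, z):
    """Pick a pre-faded texture for a face; vectors use 8.8 fixed point (256 = 1.0)."""
    d = sum(n * l for n, l in zip(normal, light_obj))   # dot product in object space
    intensity = d // (z + CAMERA_Z)                     # fade with distance
    return max(0, min(LEVELS - 1, intensity >> SHIFT))  # shift and clamp

print(texture_index((0, 0, 256), (0, 0, 256), 0))      # facing the light, close
print(texture_index((0, 0, 256), (0, 0, 256), 256))    # facing the light, far away
print(texture_index((0, 0, 256), (0, 0, -256), 0))     # facing away from the light
```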

The light source itself (the firefly pig!) is rendered as a sprite, picking a scaled version corresponding with its Z distance. A few caveats with the sprite were:

Drawing the triangles

The triangle draw routine, i.e. the actual texture mapper, adds some optimisations and accuracy improvements to my previous code but the basic approach is still the same. It’s an affine texture mapper which uses pre-stepping and self-modifying code to generate a single unrolled loop capable of drawing the widest span of the triangle. It then jumps into this code for each line, with the jump offset determining the width of the span, meanwhile interpolating the initial x position and texture offset. Vertical clipping of the triangle is applied in this routine by just skipping out-of-bounds rows.

Buffering

Once again, the big trick to make the object appear as large as possible is the use of cyclical buffers, and the zooming in/out motion. The object you see at its most zoomed in state is way bigger than we could actually render continuously, but we take advantage of the ‘easy’ frames where the object is smaller to fill up the buffer in advance. I also adjust the screen dimensions to the minimum required for the object’s bounding box, reducing the DMA usage, as well as the overhead for C2P and clearing.

HAM / HiRes Twister

I have a confession to make: This is my first twister. It was on my bucket list, and finally I can tick it off! Even twice, now that there is both a HAM twister and another one at high resolution – not vertically separated, but just next to it.

Get two for the price of one

The hires twister was a bit of an afterthought. For a long time, I only had a static hires single bitplane image (of a pan with bacon) to the right. But once everything was working, I wasn’t quite satisfied with it and… uhm… I needed something to compensate for the precalc time of the big HAM texture.

Twister and bacon!

Twist the memory into shape

There’s a bit more to this part than meets the eye at first sight. Twisters usually are line select effects, where you use the copper to pick a different display line from some precalculated graphics that usually have a rotating bar in them. We have discussed the Between the Bars line selection effect before, so many of the things apply here, too.

128 rotation steps are usually enough for a smooth rotation. When I calculated the memory requirements for the twister, I tried to stay on the safe side and only allocated 128 KB of page-aligned chipmem for it – instead of 256 KB. This left subsections of 16 KB for each bitplane to arrive at 1024 lines total.

1024 / 128 = 8 meant that I could get eight lines of textures or hues. That would have not been much of a variation, would it? So we again do the hue trick. This means we need to store six bitplanes plus the xored version of plane 5 and 6. This leaves one 16 KB area for other stuff, which we will use for both the background image and the hires twister.

The memory for the bitplanes is organized in this order (the first number is the relative offset to start of our 128 KB allocation):

The reason for this odd layout is not obvious at first. We only want to update the lower word addresses of the bitplane pointers to keep the copperlist down to as few commands as possible, so we need to make sure that our bitplane pointers don’t increment past the last word of the page – otherwise we will be left with the wrong high word address from time to time. I tried using negative modulo values, but that didn’t do the trick because the modulo is only applied to activated bitplanes at the time of the last fetch and at the end of the line, only plane 1 is still active.

So this works for all planes except for bitplane 4 pointer which unfortunately can overflow beyond $fffe and thus needs an extra write to its high word pointer every line (no, we can’t swap bitplane 1 and 4 between the pages).

The 8 KB for the hires bitplane data at a width of 336 pixels is enough for 195 lines of data (actually only 194 because I need one empty line to be able to slide in the twister).
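The twister memory budget boils down to a handful of divisions, sketched here in Python. The constant names are mine; the figures are the ones quoted in this section.

```python
ALLOC = 128 * 1024      # page-aligned chipmem for the twister
SECTION = 16 * 1024     # one subsection per bitplane
LINES = 1024            # lines per subsection
ROT_STEPS = 128         # rotation steps for a smooth twist

sections = ALLOC // SECTION              # subsections available
bytes_per_line = SECTION // LINES        # bytes per line and plane
pixels_per_line = bytes_per_line * 8     # maximum width of the HAM twister strip
variants = LINES // ROT_STEPS            # texture/hue lines per rotation step
spare = sections - (6 + 1)               # six planes plus the xor of planes 5 and 6

hires_bytes_per_line = 336 // 8
hires_lines = (8 * 1024) // hires_bytes_per_line

print(sections, bytes_per_line, pixels_per_line, variants, spare, hires_lines)
```

The 128 pixels per line that fall out of this also put the roughly 120 visible pixels of the HAM twister into perspective.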

This was enough when the effect was designed to be 320x180 in 16:9 format. But after almost everything was finished, including the pig placement and the timing, I thought: “Hey, let’s try to make it full-screen, there’s plenty of raster time left”.

I didn’t think much about the mere 194 lines of buffer when going from a height of 180 to 256. In the end, I had to mirror some of the top and bottom lines, but I’m not sure if you even noticed.

Mirror, mirror...

The split

So this is clearly not the first time someone has done a horizontal screenmode split on the Amiga – there were early intros doing that kind of trickery and experiments surely more than 35 years ago. I had used this first in G. Rowdy, where I needed to toggle between 32 colours and EHB screenmode to save some memory (and also trigger some emulation bugs ;) ).

So the HAM to Hires split is all about timing and experimenting with the right order of register writes. We don’t want to get artifacts while we are updating bplcon0 and bpl1pt for the background and also bpl2pt later for the hires twister.

Let’s look at some of the copperlist. Here’s some initial setup which sets the high pointers and some colours:

000382b0: 0108 ffd6  ;  BPL1MOD := 0xffd6
000382b4: 010a ffd6  ;  BPL2MOD := 0xffd6
000382b8: 00e0 0004  ;  BPL1PTH := 0x0004
000382bc: 00e4 0004  ;  BPL2PTH := 0x0004
000382c0: 00e8 0004  ;  BPL3PTH := 0x0004
000382c4: 00f0 0005  ;  BPL5PTH := 0x0005
000382c8: 00f4 0005  ;  BPL6PTH := 0x0005
000382cc: 01a2 0234  ;  COLOR17 := 0x0234
000382d0: 01a4 0dde  ;  COLOR18 := 0x0dde
000382d4: 01a6 089a  ;  COLOR19 := 0x089a
000382d8: 01aa 0345  ;  COLOR21 := 0x0345
000382dc: 01ac 0eef  ;  COLOR22 := 0x0eef
000382e0: 01ae 09ab  ;  COLOR23 := 0x09ab

The set of copper instructions for every line looks like this:

000382e4: 2bdf fffe  ;  Wait for vpos >= 0x2b and hpos >= 0xde
                     ;  VP 2b, VE 7f; HP de, HE fe; BFD 1
000382e8: 00ec 0005  ;  BPL4PTH := 0x0005
000382ec: 01fe 0000  ;  NULL := 0x0000
000382f0: 00e2 0c00  ;  BPL1PTL := 0x0c00
000382f4: 00e6 4c00  ;  BPL2PTL := 0x4c00
000382f8: 00ea 8c00  ;  BPL3PTL := 0x8c00
000382fc: 00ee cc00  ;  BPL4PTL := 0xcc00
00038300: 00f2 0c00  ;  BPL5PTL := 0x0c00
00038304: 00f6 4c00  ;  BPL6PTL := 0x4c00
00038308: 0100 6a00  ;  BPLCON0 := 0x6a00
0003830c: 2c75 fffe  ;  Wait for vpos >= 0x2c and hpos >= 0x74
                     ;  VP 2c, VE 7f; HP 74, HE fe; BFD 1
00038310: 0100 0200  ;  BPLCON0 := 0x0200
00038314: 00e2 e540  ;  BPL1PTL := 0xe540
00038318: 0100 9200  ;  BPLCON0 := 0x9200
0003831c: 0112 0000  ;  BPL2DAT := 0x0000
00038320: 0100 a200  ;  BPLCON0 := 0xa200
00038324: 00e6 dfe0  ;  BPL2PTL := 0xdfe0
00038328: 2cd1 fffe  ;  Wait for vpos >= 0x2c and hpos >= 0xd0
                     ;  VP 2c, VE 7f; HP d0, HE fe; BFD 1
0003832c: 0100 9200  ;  BPLCON0 := 0x9200

So the line for the HAM twister is selected by writing all the low words for the bitplane pointers and then enabling HAM mode in bplcon0.

Then it waits until the right spot and turns the display off, sets the new bitplane 1 pointer, turns to hires mode, but one plane only. Then it clears bpl2dat to avoid garbage from the previous content of the shift register, turns to two hires bitplanes first, and selects the line for the hires twister part on the right side. It waits for the right hand side, then turns off the second bitplane again, so we won’t get garbage from extra data, since the twister ends before the right border and end of bitplane DMA.
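If you want to follow the BPLCON0 values and waits in the listing without a hardware manual at hand, this small Python decoder may help. It only looks at the bits that matter here (hires, number of bitplanes, HAM, colour enable) and at the position fields of a WAIT; the function names are hypothetical.

```python
def decode_bplcon0(value):
    """Extract the BPLCON0 fields used in the twister copperlist."""
    return {
        "hires": bool(value & 0x8000),    # bit 15
        "planes": (value >> 12) & 0x7,    # bits 14-12 (BPU)
        "ham": bool(value & 0x0800),      # bit 11
        "colour": bool(value & 0x0200),   # bit 9, composite colour enable
    }

def decode_wait(first, second):
    """Split a copper WAIT into its vertical and horizontal compare positions."""
    assert first & 1 and not second & 1, "not a WAIT instruction"
    return (first >> 8) & 0xFF, first & 0xFE

for value in (0x6A00, 0x0200, 0x9200, 0xA200):
    print(f"{value:04x}", decode_bplcon0(value))
print([hex(p) for p in decode_wait(0x2BDF, 0xFFFE)])
```

Read this way, the per-line sequence is: six planes in HAM, display off, one hires plane, two hires planes, and back to one hires plane near the right edge.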

Because the copper is delayed by bitplane DMA, it’s not easy to get it right. You cannot write bitplane pointers that are being updated (incremented) by the bitplane DMA fetch logic at the same time slot either (the write would just be ignored), so not every order works as expected.

Believe me, I tried a lot of different combinations until I ended up with this. And that’s okay, because it’s all about exploring things. I’m glad that WinUAE is nowadays pretty accurate and the end result is the same as on real hardware.

Please mind the gap!

Changing bplcon0 has immediate effect (I think), even if there is still data to be pushed out from the previously loaded bplxdat shift registers – therefore the HAM twister is actually only about 120 pixels wide regarding the visual content because for the last seven pixels, HAM mode is already disabled and there would be garbage on the screen.

Here’s an example not from the actual demo, but slightly modified so the texture is offset by eight pixels to the right, so it shows where the garbage (the red arrow) would be visible. To me, it looks like it displays some pixels of the upper 16 colour indexes (sprite colours).

Some things explained

This screen also shows something less obvious: Some of the dithered background is drawn with sprites, so we can bridge the 32 pixels gap of blackness that we would get otherwise. Actually, we will use 48 pixels of sprites to make it overlap with the HAM twister.

The ordering of the sprites is a bit weird: Three sprites are used, the left-most is using number 6, the middle one number 7 and the right-most one is number 5.

Any idea why this was chosen this way? Unfortunately, the Amiga hardware only lets you specify up to which sprite pair the sprites go in front of the bitplanes. The rest of them go behind the graphics. And if you look closely, we want the HAM twister to go in front of the background sprite pixels, but the pigs and the panels should be displayed on top of the HAM and hires twisters.

Thus, we define that sprites 0-5 are on top of the bitmap graphics and sprites 6/7 (lulz!) are behind. That sprite 5 is drawn in front of the gfx doesn’t matter much here as the gap does not have any graphics that could go in front.

Luckily, the priority among sprites is such that lower sprite numbers are drawn on top of higher sprite numbers, so the panel sprites (0-5) appear nicely upon the sprite background.

There’s a minor problem with the sprite content: The background graphics are dithered in hires and sprites are always lowres in OCS/ECS. This means the dither pattern gets coarser in the sprites.

The background

This required generating two versions of the monochrome dithered image: one as hires (third region) and one as lowres (left-most). To make it less obvious that there is a resolution change, I manually tried to double up a few pixels even in the left-most part of the hires image so that there is no clear vertical cutting point of resolution change.

More piggy fun

I haven’t yet explained who our guest pigs are in this part: Piggeldy and Frederick are childhood figures by Elke and Dietrich Loewe, shown on public TV in Germany in the 70s and 80s, and sometimes still broadcast today (see here for the YouTube channel). Piggeldy would ask his older brother a question and Frederick would try hard to explain these things to him.

Although this is a complete design break from Optic’s pig artwork, these characters seemed appropriate for a funny reference: A tribute to the legendary Deep - Psilocybin Mix demo from 1995 (with a soundtrack that blew me away then and still does!), where some women “discuss” an environment-mapped spike ball effect.

Piggeldy and Frederick

The two characters seem ideal for some low-colour sprite action. Piggeldy is 80 pixels wide and requires five sprites, but Frederick is a bit larger with 96 pixels width, requiring six sprites. However, if you look at the image above, you will surely notice that the three-colour sprites do not look like that in the demo.

Instead, they look like this:

Six-coloured sprites

Did we invent six-coloured sprites on the Amiga? Of course not. It all boils down to modifying the colours for the paired sprites (you can compare this to a very tame version of the recolouring done in Magnum A.I.)

Here, only one colour is changed (the brown, pink or white part). This image shows the stripes of different colour allocations a bit better:

Pre-bacon stripes

Notice how Frederick’s sprites are paired differently to have the eyes in a different pair to allow them being white, because the middle body section does not yet use this third colour at this height.

Frederick will be using sprite number 6, too, which means we need to be switching sprite priorities in the lower part of the image so his ham part (hehe!) won’t be hidden by the hires background.

There is one more little thing: the dither sprites in the background will need to have their own grey colour. That’s no problem for the pair 6 & 7, because they end before Frederick’s ass begins, so we can swap colours in time. However, sprite 5 covers the whole height, and sprite 4 (pig) and 5 (dither) usually share the same palette – except when we use attached sprite mode (15 colour mode), but make sure that sprites don’t actually overlap. This moves the sprite 5 dither colour to index 20, which is not used by 3-colour sprites and sprite 4 will use the colours of sprite pair 0 & 1.

I apparently forgot about the “do not overlap” part so there are actually sometimes dots shining through the panel, something I only just noticed.

I could have painted Piggeldy’s eyes with a different colour by superimposing the 6th sprite – but it looks more psilocybinic with the HAM twister shining through.

The textures

We have two textures, one high resolution (256 pixels wide) monochrome one and one true colour one (120 pixels wide). What you can see here below is the 24 bit output of a Kotlin prototype program:

Twister texture

The monochrome texture is stored as chunky luminance values with reduced bit depth so it packs well. Then, blue noise is applied to the texture stretched to double width before it’s cut off via threshold to generate two different copies of one bitplane graphics per line (256 lines in total). These copies are alternated every second line to reduce the weird effect of vertical lines when the rotation value stays the same.

Monochrome texture

The HAM version is created at run-time from the 128 lines of texture data stored in HSV format. Actually, it’s using not HSV but a derivative that allows faster calculation.

Why in HSV and not in RGB? Remember that we need to generate eight different hue shifted copies of the same texture and calculating a hue rotation is obviously easier to do in HSV representation than in RGB.
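To illustrate why HSV makes this cheap, here is a minimal Python sketch that produces eight hue-shifted 12 bit RGB values from a single HSV texel. It uses plain HSV from the standard library, not the faster derivative mentioned above, and the even spacing of the eight hue offsets is an assumption made for the example.

```python
import colorsys


def hue_shifted_rgb12(h, s, v, steps=8):
    """Return `steps` 12 bit RGB words (0x0RGB) for one HSV texel.

    h, s, v are floats in [0, 1]. The hue offsets are spread evenly over
    the colour wheel, which is an assumption made for this sketch.
    """
    out = []
    for i in range(steps):
        # A hue rotation is just an addition (modulo 1) in HSV space.
        r, g, b = colorsys.hsv_to_rgb((h + i / steps) % 1.0, s, v)
        # Quantise each component to 4 bits and pack as 0x0RGB.
        r4, g4, b4 = (int(round(c * 15)) for c in (r, g, b))
        out.append((r4 << 8) | (g4 << 4) | b4)
    return out
```

In RGB the same rotation would need a matrix multiplication per texel; in HSV only the hue term changes, while saturation and value are reused for all copies.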

The third generation of my realtime RGB to HAM conversion routine is now amazingly fast. It also includes index colour picking and (approximate) best HAM pen selection. The latter part is table based, but the 32 KB table compresses down to a mere 677 bytes.

The 1024 lines are HSV->RGB->HAM converted while you are watching the monochrome twister spinning around. That’s the actual reason for the build up with the thin line spinning twister and then the monochrome one: Having enough time to calculate the main effect.

Here’s a memory rip of the HAM texture (via HRTmon) with modulos set so it appears vertically squashed by 1/4th, to allow you to see all eight hue shifts.

Squashed HAM texture

If you looked closely you should have noticed that the 128 lines rotation is of course not enough to make a twister seamlessly rotate around its sides, because the colours rotated in from one side do not match the ones rotated out. This means that if there should not be a visual break of the pattern, the hue needs to be transposed on every 90° rotation. That’s where the six RGB permutations come in again.

Both monochrome and HSV textures take only about 12 KB disk space total, so while it would have been possible to do all the lighting math in code, I refrained from doing so.

The build-up

The twister starts out with some sort of rough stepping to have a bit of a different effect than a normal twister. It looked a bit cooler on a less “crowded” texture.

Stepped

Then it smooths itself out to one pixel rotation accuracy. Each side still has one hue tone.

Smooth bakorator

And finally, it adds some hue shifting within the displayed sides for full colour blasting.

Going full ham

In total, we get a dual line selector effect with 6 * 1024 + 256 = 6400 lines as promised. Unlike Dope!

That’s sort of all the secrets behind the Twister. I hope you could appreciate all the tiny details.

Ham Bobs

'tis is how it looks like

Remz, author of the great Hamulet game, had created this demonstration of large animated HAM bobs where all the bob graphics, while animating, had pre-baked HAM defringing into the animation frames. Visually super impressive, but the bobs cannot move and only one bob is updated per frame, sequentially.

So I was thinking of taking it a step further and moving some big HAM bobs over a HAM screen in 50 FPS real-time. How hard can it be?

Early developments

The Ham Bobs were actually the first effect written down into code. Here’s a screenshot of some early version (June 2024) with test graphics:

Early prototype

This one was not scrolling the main screen around and yes, the smaller bob is showing off a basket of TRSAC ducks ;)

To make things more complicated (and dynamic) I enlarged the background and made it move (uses the colour 0 trick in the left border): Here’s a prototype with different graphics.

Prototype graphics

I’m glad Steffest came up with and painted this “Piglet of Hope” image a year (!) later…

Piglet of Hope

…that I grabbed from his warm hands and downsized it to a resolution of 384x216:

Cropped pig

It’s not as colourful as the old test graphics, but it is clearly visible that it’s more than just 32 colours.

Intro transition effect

To introduce the image, I created a transition effect that is like the cheap copper repeating lines effect, but instead of doing it only in vertical direction, I did it in both directions, which is of course, not possible with the copper alone.

Intro transition

Optic drew this extra large spiderpig after I begged him nicely :) Because the screen is using hardware scrolling at full 320 pixel width, only 6 DMA driven sprites on OCS (and 7 on ECS) are available, but the spiderpig has a width of 64 pixels, so we would need all 8 sprites.

This was solved by manually loading the sprite data for the two last sprites, which was not so straight-forward with the dynamic copperlist that was necessary for the flood effect and the vertical movement of the sprite at the same time (fun to watch in the DMA view).

The left and right borders are (incrementally) painted as a plain block of some index colour. The index colours are then changed with the copper per line according to the true colour information of the HAM graphics. This 12 bit true colour information is (as you maybe know from a lot of prior HAM effects) calculated from the HAM6 image and stored in fast ram (162 KB). As this calculation takes some seconds, it’s done during the Twister part.
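Deriving the 12 bit true colour information from a HAM6 image boils down to replaying the hold-and-modify rules pixel by pixel. The following Python sketch shows this for a single line. The function name and the `start` parameter are assumptions for illustration, and this is not the demo's own precalculation code.

```python
def ham6_line_to_rgb12(pixels, palette, start=0x000):
    """Decode one line of 6 bit HAM pixels into 12 bit RGB words.

    pixels  -- iterable of 6 bit values (two control bits, four data bits)
    palette -- 16 base colours as 0x0RGB words
    start   -- colour held before the first pixel (assumed: border colour)
    """
    rgb = start
    out = []
    for p in pixels:
        ctrl, data = (p >> 4) & 0x3, p & 0xF
        if ctrl == 0b00:      # load one of the 16 index colours
            rgb = palette[data]
        elif ctrl == 0b01:    # hold red and green, modify blue
            rgb = (rgb & 0xFF0) | data
        elif ctrl == 0b10:    # hold green and blue, modify red
            rgb = (rgb & 0x0FF) | (data << 8)
        else:                 # hold red and blue, modify green
            rgb = (rgb & 0xF0F) | (data << 4)
        out.append(rgb)
    return out
```

With such a decoded line at hand, the colour at any x position is known, which is what a per-line copper colour change for the borders (or a bob edge) needs.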

This part is fairly low on complexity and only takes 1/3 to 1/2 of the frame time.

Main Ham Bobs effect

Once we go to the main effect screen, an extra pixel column appears to restore the first pixel of the moving image (321 pixels wide screen).

There is actually not so much magic going on. If you closely observe the shape of the bobs in the different prototype stages, it turns out they are all completely convex:

Bob masks

Steffest’s first attempt to draw a new bob was not quite correct (still very cute!) as Gigabates nicely pointed out to him with this diagram:

Inconvex Duckling courtesy of Gigabates

The next version was a BIG improvement (posting the full size glory here) and fully convex:

Convex hero duck

We use a slightly downsized version of 143x98 pixels for the big bob and the small bob (a carrot rocket rabbit) is 80x59 pixels. Together with the piglet they are the “Team Hope” in the Bacon Part. I know, this is getting a bit weird now…

Convex Hero Duck 143x98 Carrot Rocket Rabbit 80x59

As always, that convexity restriction is part of the trickery. To draw a bob, we just draw the left-most pixel with an unused index colour that will be used to fix the left-hand edge of the bob and blit a right-hand side index colour ridge to fix the right border of the graphics after the blit. Note that these borders are fully baked into the graphics while blitting, so this is identical to a standard cookie-cutting bob blitting operation, but using six bitplanes.

Without fixing the colours every line but assigning static colours of magenta, blue, red and green, this would look like this:

Without fixup

That’s a standard technique that I already used in HAM Eager and HAMazing and many effects were based on this.

The left-hand edge colour information for every line comes from the 12 bit true colour information stored for the left edge of the bob. That’s easy because it’s always the same for each bob, just the y-offset inside the copperlist changes. Stored in chipmem, this means we can blit it directly into our copperlist at the right vertical position.

For the right-hand edge, the colours are supposed to be restored from the background. This is a bit more tricky.

Let’s start with the big bob: The right edge is a bit wonky and follows the shape of the bob. So it needs to pick the true colour information from the background with the right y-offset and x-offset.

This is solved by some speedcode that has been pre-generated by the HAM converter program, that calculates the correct combined offsets inside the true colour buffer:

hbb_big_bob_right_row_0_code:
        move.w  pd_BigBobYPos(a6),d0
        mulu    #HAMBOBS_WIDTH,d0
        add.w   pd_BigBobXPos(a6),d0
        add.l   d0,d0
        move.l  pd_TCBuffer(a6),a0
        adda.l  d0,a0
        move.l  pd_RightRow0Buffer(a6),a1

; Offset last row 0
        move.w  188(a0),(a1)+
        move.w  958(a0),(a1)+
        move.w  1730(a0),(a1)+
        move.w  2506(a0),(a1)+
        move.w  3282(a0),(a1)+
        move.w  4056(a0),(a1)+
        [...]
        move.w  32476(a0),(a1)+
        lea     $7ffe(a0),a0
        move.w  484(a0),(a1)+

As the true colour buffer is 162 KB, and each line is 768 bytes wide, it will need to increment the base pointer by about 32 KB from time to time to retain relative addressing possibility (yeah, that could have been optimized further slightly now that I look at it).
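How a converter might emit such speedcode can be sketched in a few lines of Python. The generator below is hypothetical (the real HAM converter is not shown in this write-up); the right-edge x positions used in the usage note are simply derived from the offsets in the listing via x = (offset - row * 768) / 2.

```python
LINE_BYTES = 768   # 384 pixels * 2 bytes per true colour word, as stated above
REBASE = 0x7FFE    # step used by the `lea` in the listing


def emit_right_edge_code(edges):
    """Emit move.w lines for (row, x) right-edge pixels in ascending row order.

    Whenever the displacement no longer fits into a signed 16 bit value,
    the base register is advanced with a `lea` first.
    """
    lines, base = [], 0
    for row, x in edges:
        offset = row * LINE_BYTES + x * 2 - base
        while offset > 0x7FFF:   # beyond the d16(An) addressing range
            lines.append(f"        lea     ${REBASE:04x}(a0),a0")
            base += REBASE
            offset -= REBASE
        lines.append(f"        move.w  {offset}(a0),(a1)+")
    return lines
```

Feeding it `[(0, 94), (1, 95), (2, 97)]` yields the displacements 188, 958 and 1730 from the listing, and `[(42, 110), (43, 113)]` reproduces the `32476(a0)`, `lea $7ffe(a0),a0`, `484(a0)` sequence at the end.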

pd_RightRow0Buffer is a linear buffer in chipmem that will be blitted with the right modulo for the target copperlist later.

But there is one more case we have to tackle: What should happen when the small bob (rabbit) is overlapping with the big bob (duck)?

Overlapping

If you look closely at the green line of the rabbit’s right border, it will sometimes touch the duck and sometimes not. If it’s not touching, everything is fine, the colour will be picked from the background as usual.

In the former case, however, the pixel needs to obtain the colour from the duck to restore the correct HAM colour without fringing – this means we need to store a true colour representation of our big duck bob, too!

Finding the overlap region requires a bit of not-so-hard clipping math (but it needs to be pixel perfect, no sloppy off-by-one situations allowed!), and comparing the pixel position of the right ridge of the small bob with the left and right borders of the big bob can be a bit finicky to get right. But it worked in the end and that’s what counts.
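The per-row decision can be sketched as follows. This is a simplified Python model with made-up names and data layout (per-row edge tables relative to each bob's top-left corner); the exact pixel conventions of the demo are not documented here, so treat the inclusive comparisons as an assumption.

```python
def right_edge_sources(small_pos, small_right, big_pos, big_left, big_right):
    """For every row of the small bob, tell where its right-edge colour comes from.

    small_pos, big_pos  -- (x, y) screen position of each bob's top-left corner
    small_right         -- per-row x offset of the small bob's right ridge pixel
    big_left, big_right -- per-row inclusive x extents of the big bob
    Returns ("background", x, y) or ("big_bob", x_in_bob, row_in_bob) per row.
    """
    sx, sy = small_pos
    bx, by = big_pos
    sources = []
    for row, edge in enumerate(small_right):
        x, y = sx + edge, sy + row   # absolute position of the ridge pixel
        big_row = y - by             # matching row inside the big bob, if any
        inside = (
            0 <= big_row < len(big_left)
            and bx + big_left[big_row] <= x <= bx + big_right[big_row]
        )
        if inside:
            sources.append(("big_bob", x - bx, big_row))
        else:
            sources.append(("background", x, y))
    return sources
```

Every comparison here is a potential off-by-one trap, which is where the "pixel perfect" requirement bites.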

This overlap calculation adds some variable CPU time to the frame – which is the reason for having a bit more of a relaxed frame time here: For a short time span, rendering a frame may take a bit more than 100%, but not for too long in succession.

To not make things even more complicated, I left out the case where the bobs leave the screen on the left-hand side of the screen. It’s not impossible to fix this situation, but the calculation is complicated and there’s no CPU time left. So I rather made sure that the movement of the screen works nicely with the bobs never leaving on the left side – the right side does not pose such a problem.

Restoring the background of the bobs is also done in an optimized way so that there is no overdraw happening (that’s why only the squid is jumping to the musical cues at the end, because it’s a 48 pixels wide sprite – I had forgotten that I could not freely move the HAM bobs from one frame to the next).

TLDR; Can we get some HAM games now please?

No, unfortunately this is not stuff that can be applied for games.

First, for every HAM bob drawn you will need to reserve two index colours for fringeless restoration of the graphics (simplified). As there are only 16 index colours max, and you probably don’t want to use the background colour for this, only up to seven bobs could be drawn this way.

Also, the convexity property of the outline is something that drastically limits what you can do with the bobs.

Finally, you already saw that two overlapping bobs already pose quite some computational overhead; getting arbitrary overlaps working is much more complicated.

And finally-finally, storing true-colour representations of all your graphics may pose some practical limitations on your available RAM.

Bilinear Zoom

We are falling!

With all that zoomer-rotator craze, why not ignore the trends and do something different? Especially because the first implementation was already done during the summer of 2024 (but yes, it could be turned into a bilinear rotozoomer, too!).

When zooming in on a texture, there are multiple ways to do it. What we usually get is some nearest neighbour algorithm, where the colour information will be just repeated and get more “chunky” the closer we get:

Nearest neighbour zooming

Linearly interpolating the colour via fractions from neighbouring pixels is of course a very straight-forward algorithm. You need to take four pixels into account and the fractional coordinate between them to determine the final colour. Because the interpolation needs to be applied in two dimensions, it’s called bilinear interpolation.

Bilinear zooming
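As a reference for what the optimised routine has to approximate, here is a straightforward floating point version of bilinear sampling on 12 bit RGB texels in Python. It is deliberately naive and has nothing to do with the 68000 implementation; all names are chosen for this example.

```python
def lerp(a, b, t):
    """Linear interpolation; t is the weight of b, in [0, 1]."""
    return a * (1.0 - t) + b * t


def bilinear_rgb12(texture, u, v):
    """Sample a texture (rows of 0x0RGB words) at fractional coordinates."""
    x0, y0 = int(u), int(v)
    fx, fy = u - x0, v - y0
    x1 = min(x0 + 1, len(texture[0]) - 1)   # clamp at the right edge
    y1 = min(y0 + 1, len(texture) - 1)      # clamp at the bottom edge
    result = 0
    for shift in (8, 4, 0):                 # red, green, blue
        def comp(x, y, shift=shift):
            return (texture[y][x] >> shift) & 0xF
        top = lerp(comp(x0, y0), comp(x1, y0), fx)
        bottom = lerp(comp(x0, y1), comp(x1, y1), fx)
        result |= int(round(lerp(top, bottom, fy))) << shift
    return result
```

Each colour component needs three interpolations (two horizontal, one vertical), which is where the multiplication count comes from.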

We have seen zoomers on high-end AGA demos (but right now I’m having trouble finding examples of those that really do interpolate). However, it’s an entirely different beast trying this on a 68000 with 7 MHz with a damn slow multiplication instruction with worst case execution time of 70 cycles and trying to do this on 12 bit true colour information.

The naive algorithm would do eight multiplications per pixel and colour component. For an 80x45 target resolution that’s 80 * 45 * 8 * 3 = 86400 multiplications per frame (where we can do about 1000 multiplications per frame). And that’s actually a reaaallly small on-screen resolution (the first implementation actually used a copper-chunky display with only 40x45 pixels before it was replaced with Gigabates’ HAM 4x4 C2P very late on).

Interpolation on 68000

So that huge number of multiplications is quite far out of reach for 68000. Therefore, we will need to use some table lookups to speed it up.

The normal formula v = a * x + b * (1 - x) can also be changed to v = a + (b - a) * x to halve the number of multiplications. However, this introduces an extra bit because the term (b - a) is now a signed value that can go in both directions. When applied to a 12 bit RGB value, this would expand the range to 15 bit (and the difference between two packed RGB values is not so easily calculated). Instead, we will stick to the first formula.

We can build a table with the 12 bit RGB value and use the upper bits for the fraction x that we want to multiply the value with. 1 - x is simply calculated by inverting the bits of x.

Our result is again a 12 bit value, so we will need two bytes to store it. Because we can’t use the fancy 68020+ addressing modes and want to avoid the more complicated shifting around for table access, we will limit ourselves to even addresses for accessing our table. That limits us to only 3 bits for the fraction in the lower bits of our 16 bit word and 64 KB table. Thus, the index to the table looks like this in binary notation: %rrrrggggbbbbxxx0

3 bits (8 steps) of fraction seems okayish given that our RGB values are only 4 bit per component ([0 - 15]), too. Thus, only with differences of more than eight steps between two pixels, we might get some harder gradients than the Amiga would allow.
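A table of this shape can be modelled in Python as shown below. Two details are assumptions of this sketch and not confirmed by the text: the fraction is scaled by 1/7 (so that x and its bit-inverse add up to a full weight), and values are rounded to nearest. The output uses the upshifted %rrrrggggbbbb0000 layout.

```python
FRAC_BITS = 3
FRAC_MAX = (1 << FRAC_BITS) - 1   # 7: x and its bit-inverse always sum to 7


def scale4(c, f):
    """Scale a 4 bit component by f/7, rounded to nearest (assumed rounding)."""
    return (2 * c * f + FRAC_MAX) // (2 * FRAC_MAX)


def build_table():
    """Word table for byte index %rrrrggggbbbbxxx0 (32768 words = 64 KB)."""
    table = [0] * (1 << 15)
    for rgb in range(1 << 12):
        r, g, b = (rgb >> 8) & 0xF, (rgb >> 4) & 0xF, rgb & 0xF
        for f in range(1 << FRAC_BITS):
            scaled = (scale4(r, f) << 8) | (scale4(g, f) << 4) | scale4(b, f)
            # The word index is the byte index shifted right by one.
            table[(rgb << FRAC_BITS) | f] = scaled << 4
    return table


def interpolate(table, a, b, x):
    """v = a*x + b*(1-x) with two lookups and one add; a, b are 0x0RGB."""
    inv = x ^ FRAC_MAX   # "1 - x" by inverting the fraction bits
    return table[(a << FRAC_BITS) | x] + table[(b << FRAC_BITS) | inv]
```

With this rounding, the two partial results never overflow a 4 bit component when added, so a single packed addition is enough to combine all three colour channels.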

(The first versions of the zoomer used 4 bits for the fraction, but only 3 bits for the blue component (%xxxxrrrrggggbbb0). That was a bit more noticeable than when using full 12 bits RGB and only 3 bits for the fraction.)

The usual approach is to first linearly interpolate everything in one direction (either horizontal or vertical), store the temporary values and then linearly interpolate these temporary values in the other direction (there is a minor accuracy loss involved).

Depending on the scaling factor, this will change the number of input texels that need to be interpolated in one direction. Unfortunately, it doesn’t change the number of texels to get to the final target resolution, because the intermediate data generated from the first interpolation pass will blow up the lines to full dimension and the final result will have to be interpolated from the full ranges in both directions.

To verify that this approach works, I programmed a prototype in Kotlin first, before I would spend weeks of assembly coding for nothing:

Prototype in Kotlin

The top left shows the two interleaved texture a and b values stretched horizontally, but you can see that fewer rows are needed. Below is the horizontal interpolation result. The top right part displays what the normal nearest neighbour zoom would look like, and bottom right shows the final bilinear interpolation result.

We haven’t yet talked about the output format for the interpolation tables: To be able to use the output as input for the vertical interpolation step, the table needs to contain the values as upshifted %rrrrggggbbbb0000 RGB values, just like our input texture. However, we will feed the result of the final stage into Gigabates’ HAM C2P routine, which needs the chunky data in a scrambled %rgbbrgbbrgbbrgbb format. This means we will actually need two different tables of 64 KB each.

Speaking of textures, it is a 256x256 24 bit true colour image, absolutely gorgeously drawn by Steffest, and downconverted to 12 bit for the demo:

Team Hope to the rescue!

The image above is the 24 bit original version. Also check out this pencil sketch he drew with such great detail!

Sketchy sketch

The texture uses 128 KB of fast memory, and because the non-interpolated zoomer also needs to output its data for the chunky format, it needs to be stored as another scrambled copy in memory, too.

So with these tables and the textures, 384 KB of fast memory are already gone.

The seven steps to bliss

I chose to interpolate in horizontal direction first. This has the advantage that we have fewer rows to interpolate (depending on the scale ratio) and the texture can be read linearly from memory.

Step 1: CPU – Calculate horizontal scaling and generate code

The first step prepares the horizontal scaling and does two things at the same time, using the CPU:

This is a fairly quick process with 30 - 44 cycles per horizontal pixel, so it takes less than 1800 cycles worst case.

The generated speedcode looks like this:

.loop   move.l  (a0)+,d2
        move.l  d2,(a1)+
        swap    d2
        move.w  (a0)+,d2
        move.l  d2,(a1)+
        move.l  d2,(a1)+
        swap    d2
        move.w  (a0)+,d2
        move.l  d2,(a1)+
        move.l  d2,(a1)+
        [...]
        move.l  d2,(a1)+
        adda.w  d6,a0
        dbra    d7,.loop
        rts

The sequence of whether there is a swap/move.w or a move.l of course depends on the zooming level and is implemented by accumulating the error until it overflows…
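The error accumulation can be sketched in Python like this. The operation names are labels invented for this example and map to the instructions in the listing as noted in the comments; the 16 bit fixed-point format and the `start` parameter are assumptions.

```python
def horizontal_ops(out_width, step, start=0, frac_bits=16):
    """List the operations for one output row when zooming in.

    step  -- texels advanced per output pixel as fixed point, below 1.0
    start -- initial value of the error accumulator (sub-texel offset)
    """
    one = 1 << frac_bits
    ops = ["load_pair", "write_pair"]   # move.l (a0)+,d2 / move.l d2,(a1)+
    error = start
    for _ in range(out_width - 1):
        error += step
        if error >= one:                # accumulator overflowed
            error -= one
            ops.append("advance")       # swap d2 / move.w (a0)+,d2
        ops.append("write_pair")        # move.l d2,(a1)+
    return ops
```

For example, `horizontal_ops(5, 0x8000, start=0x8000)` (a 2x zoom starting half a texel in) gives load, write, advance, write, write, advance, write, write, which is the same pattern as the beginning of the listing above.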

Step 2: Blitter – Prepare fraction data

The second step is done via the blitter (three blits, worst case 7520 cycles):

The last part is unfortunately necessary because we will have some blitting steps later that will operate with a bltsize width of 1 and cannot use a negative modulo to repeat the fraction buffer.

Step 3: CPU – Extract the texture window

Finally execute the generated speedcode to extract all required rows from the texture.

This can be done in parallel with the last blitter pass from above and – depending on the number of rows – can be rather slow (89148 cycles absolute worst case for a window of 80x46).

The result is the same as in the top left area of the prototype screenshot above.

Step 4: Blitter – Modify interpolation code

Now that we got the RGB values from the texture in the linear buffer, we need to put in the fractions for table lookup for the interpolation.

The interpolation speedcode looks like this (v = a * x + b * (1 - x)) and is generated once at start and only updated later:

        move.w  index1(a0),d0
        add.w   index2(a0),d0
        move.w  d0,(a2)+

In two blitter passes, the first and the second index are replaced by merging the texel colour information with the respective fraction from FracExtBuffer.

This takes up to 22080 blitter cycles (AC->D blit).

Step 5: CPU – Perform the horizontal interpolation

Execute the updated speedcode from above, filling the buffer with horizontally interpolated data. This is again done in parallel with the blitter.

This is the slowest part and takes up to 117760 cycles (it’s much faster when the zoom factor is higher).

Now our buffer contains the data from the bottom left part of the screenshot – interpolated, but only in horizontal direction.

Step 6: Blitter – Prepare vertical interpolation code

Now every row of the horizontal interpolated output is copied into another speedcode buffer for interpolation (same code as above), updating the index1 and index2 values and at the same time, adding the right y fraction value, which is a constant for each row (so these are just A->D blits).

Which ones of the horizontal rows are picked depends on the vertical scaling factor – every time the fraction overflows, we advance down a row.

This step takes a constant 21600 blitter cycles (A->D blits).

Step 7: CPU – Perform the final vertical interpolation

The final step is executing the above interpolation code and is taking a constant 116010 CPU cycles. Because this takes such a long time, we will run the blitter C2P for the previous frame in parallel to make use of any free DMA cycle.

How fast can we go?

Initially, the effect was running at a quarter resolution of 40x22 and an 8x4 copperchunky display, updating every second row interleaved. This achieved an update rate of around 25 FPS.

Old copperchunky

However, the resolution was too low to make it look any good. The effect was always on the brink of being dropped, especially because it didn’t quite fit to the rest of the demo.

This only changed when I threw out the chunkycopper and replaced it with Gigabates’ 4x4 HAM chunky-to-planar routine and cranked the resolution to 80x45, which meant four times the computation needed. But now the interpolation looked much better, you could actually see it!

Luckily, removing the copper chunky also meant that a lot of DMA bandwidth that would be wasted every frame updating the colour registers for the whole display area would be saved – it’s using the HAM7 mode with only four bitplanes active, the copper is only used to vertically scale the image.

Still, at that resolution the interpolation can take many frames to compute. For zooming factors smaller than one, a normal shrinking routine is used which easily runs in one frame including the C2P.

The worst case happens just above a zooming factor of one, when it changes from shrinking to zooming. Then, the number of rows that need to be calculated during the horizontal interpolation becomes a maximum (46 rows). At 2x zoom, that’s only 23 rows, at 4x only 12 rows. Most of the above steps (2 to 5) get faster the more you zoom in. Only steps 6-7 (lulz), have a constant execution time.

Including the C2P, we cannot get faster than 25 FPS while the bilinear zooming is in progress, but it could go down from 50 FPS to an unbearable 8 FPS temporarily, when the scale factor drops below 1.

We solve this by buffering frames ahead of time. Luckily, each C2P converted frame only takes 7 KB of memory (320x45 with 4 bitplanes), so we donate 176 KB of chipmem for 25 screen buffers. This allows us to smooth out the FPS needed and “rate control” the frame generation. It is a bit more complicated, though, because you need to estimate the picture-display-time for the future frames without underrunning the buffer and thus needing to wait for a frame that was expected to be ready right now.
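The buffer arithmetic and the underrun problem can be modelled with a small Python simulation. This is a simplified producer/consumer model with invented names and a fixed display time per frame; the actual rate control in the demo estimates display times dynamically and is not reproduced here.

```python
FRAME_BYTES = 320 // 8 * 45 * 4   # 320x45 pixels, 4 bitplanes -> 7200 bytes
BUFFERS = 25                      # 25 * 7200 bytes is roughly 176 KB


def first_underrun(render_vblanks, display_vblanks, prefill, buffers=BUFFERS):
    """Return the index of the first frame that is not ready in time, or None.

    render_vblanks  -- vblanks needed to render each frame, in order
    display_vblanks -- vblanks each frame stays on screen (2 means 25 FPS)
    prefill         -- frames rendered before playback starts (1..buffers)
    """
    assert 1 <= prefill <= buffers
    t = 0          # renderer clock in vblanks
    start = None   # vblank at which frame 0 goes on screen
    for i, cost in enumerate(render_vblanks):
        if i >= buffers:
            # The slot is reused: frame i - buffers must have left the screen.
            t = max(t, start + (i - buffers + 1) * display_vblanks)
        t += cost
        if i == prefill - 1:
            start = t
        if start is not None and t > start + i * display_vblanks:
            return i
    return None
```

Playing with `prefill` and `display_vblanks` shows the trade-off: more frames rendered ahead survive longer slow stretches, but cost chipmem and delay the start.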

To generate the smooth path around the texture, I extended my “spline” tool (it’s actually cubic Bézier curves) to make it easier to see where the camera pans:

Path tool
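For readers unfamiliar with cubic Bézier curves, a camera path segment can be evaluated as in the following Python sketch. It only shows the standard curve formula with 2D control points; how the tool stores and chains its segments is not covered here.

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve with 2D control points at t in [0, 1]."""
    u = 1.0 - t
    # Bernstein weights of the four control points.
    w = (u * u * u, 3 * u * u * t, 3 * u * t * t, t * t * t)
    return tuple(
        w[0] * a + w[1] * b + w[2] * c + w[3] * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
```

Stepping t from 0 to 1 moves the point from the first to the last control point, while the two inner control points shape the curve without being touched by it.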

I think the final result of attempting to do a bilinearly interpolating zoomer on plain 68000 (and blitter) is not so bad after all. It runs a bit smoother on faster machines (but appears slightly broken on too fast machines).

Real-time HAM6 conversion

Because the memory requirements for this part are so massive, it meant that there was no other reasonable placement in the demo other than having it at the end of Part II. mA2E did a great job extending the music so it could be shown more than just five seconds or so.

As the bizoomer only has a 4x4 resolution, and the zoomer will not move across the whole image, many details of the amazing picture would have gone unnoticed. And that would have been a shame, right?

So I took the new real-time HAM6 conversion routine that I already used for the twister part and added it to the bizoomer. Once the music ends, a 256x256 HAM6 conversion is shown on the screen:

HAM6 version of the texture

It takes about 40 frames to convert the 256x256 image from the original 12 bit true colour data (that’s about 1600 pixels per frame, not bad!). This is all done while the screen is rolling in, so the conversion is slightly ahead of the graphics becoming visible.

You might notice that the conversion is not 100% perfect. That’s because the table based conversion takes a shortcut that might lead to not updating the right pen if it’s only one step away.

In the harddisk version, the texture is using dithering, so the end result there is much better:

HD version end picture

Unfortunately, there was not enough disk space for this version on the floppy disk.

And finally, Part II of Bacon of Hope ends.

Endpart

UNZ

I had envisioned having some multichannel stuff for the trackmo early on.

Don’t forget that I still have plans to have a 7 channels part for you at some point – can’t promise we’ll manage during the ham effects, because they usually take much ram and cpu, but for an end scroller, this might be a good option! (06.04.2024)

Still planning to have some advanced music playback for the end scroller (best place, because this is usually a low CPU / bandwidth situation where we can waste memory and CPU on multichannel mixing) (11.08.2024)

I’d rather go with an Aklang tune at the end, because I would like to have 16 bit samples and 16 bit mixing and 14 bit output… (19.11.2024)

If I knew that this would be so much painful work, I probably wouldn’t have gone down this path.

In April 2022 I had written a basic audio mixer that was using the blitter to clip the samples when overflowing and also for splitting the 16 bit output to two 8 bit Paula channels for 14 bit playback. It was planned to be used for a 4K intro that never really went anywhere.

But if you don’t need to care about the overflow anymore (within limits), mixing channels at the same sample rate becomes just a series of add.ls (with a minor LSB error), and for different mixing rates, it’s add.ws with different fetching rates.
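One plausible reading of the "minor LSB error" is the carry that leaks from the lower 16 bit sample into the upper one when two samples are added as a single longword. The following Python sketch models that; the helper names are hypothetical and the exact source of the error in UNZ is an assumption.

```python
def pack(hi, lo):
    """Pack two signed 16-bit samples into one 32-bit longword."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack(longword):
    """Return the two signed 16-bit samples stored in a longword."""
    def signed16(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed16((longword >> 16) & 0xFFFF), signed16(longword & 0xFFFF)


def add_l(a, b):
    """Model a 68000 add.l: plain 32-bit addition with wrap-around."""
    return (a + b) & 0xFFFFFFFF
```

For example, `unpack(add_l(pack(100, 200), pack(10, 20)))` gives `(110, 220)`, while adding `pack(100, -1)` and `pack(10, -1)` gives `(111, -2)`: the lower sample is correct, the upper one is off by one least significant bit because of the carry.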

The aim with this (new?) sample mixer technology: How many channels can we mix on a bog-standard 68000 at 7 MHz at the highest mixing frequency in realtime, at the best possible quality?

I started working on it in May 2025 and spent most of my summer vacation on it. The secret sauce is in generating speedcode for every mixing frequency that can be reused regardless of the current accumulated mixing fraction error, but without ignoring it (something you can sometimes see in some rotozoomers that look a bit jiggly – we don’t want our audio to jiggle!).
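Why the accumulated fraction matters can be shown with a plain fixed-point resampler in Python. This is not the speedcode approach, just a sketch of the underlying stepping under the assumption of 16.16 fixed-point positions; `mix_block` and `FRAC_BITS` are names invented here.

```python
FRAC_BITS = 16  # assumed 16.16 fixed-point sample position


def mix_block(source, position, step, count):
    """Fetch `count` samples, advancing the fixed-point position by `step`."""
    out = []
    for _ in range(count):
        out.append(source[position >> FRAC_BITS])
        position += step
    return out, position
```

With a step of 1.5 (`3 << 15`), two blocks of three samples fetch source indices 0, 1, 3 and then 4, 6, 7 as long as the returned position (4.5) is carried into the second block. If the fraction is thrown away between blocks, the second block fetches 4, 5, 7 instead, which is the audio equivalent of a jiggly rotozoomer.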

But let’s not focus on the UNZ music engine here, let’s just say it combines some smart techniques and algorithms with the blitter hardware, especially the BLTNZERO bit, for the DSP work.

The endpart music plays AmigaKlang generated samples (remember – no disk space left!) at 16 bit instead of the normal 8 bit.

There is a converter that modifies the output of Dan’s aklang2exe exemusic.asm to get 16 bit output instead of 8 bit. Another converter takes the 12-channel FastTracker 2 track and runs it through an XMPlayer that I wrote in Kotlin (bloody hell, XM is an unnecessarily complicated format!), which tracks all the channels and notes and converts them into a custom format (a bit similar to what LSP does).

The module “Piglet of Hope” needs a lot of memory, over 522 KB for the 16 bit samples, and 128 KB for the mixing speedcode. In the end, only a single digit KB number of free memory was left.

The biggest problem, however: It takes about 35 seconds to render all the samples with AmigaKlang. We cannot have half a minute of a black screen before the endpart starts. We need to buy some time here!

Knowledge is power, France is bacon!

Putting up a quote by Francis Bacon starting with “Hope…” is like the Alpha and Omega of the demo:

Combining Hope and Bacon

After that, we can only troll people and fake a reset (also with the right LED and colour timing). Apparently it was so convincing that the compo crew at GERP cut off the video projector right after it and slightly spoiled the last fun effect, which was over by the time they switched it back on.

Spot the differences

This is a bit surprising once you notice the “Wörkbench” text (the numbers “3” and “7” are there in the image because they will be used as sprites).

It was Optic who came up with the idea of a “bacon hand” instead of the normal Amiga hand, even before we had thought of the fake reset screen. But it worked so well. Then he even drew some glitch animation:

112_optic_kickstart.gif

This looked fantastic, and even added more to pulling the finger, but of course there was no space for having it as an animation. So I needed to reproduce it as best as possible in code.

I think it worked out nicely by creating some dynamic copperlist that picks one image or the other on a span by span basis, with more or less horizontal glitch offset.
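The span selection can be sketched roughly like this in Python. The span lengths, offset range, and the use of a seeded random generator are assumptions for illustration; the real effect builds a copperlist from such decisions instead of returning a list.

```python
import random


def glitch_spans(height, rng, max_span=16, max_offset=8):
    """Cover `height` lines with (first_line, line_count, image, x_offset) spans."""
    spans = []
    y = 0
    while y < height:
        count = min(rng.randint(1, max_span), height - y)
        image = rng.randint(0, 1)                     # which of the two pictures
        offset = rng.randint(-max_offset, max_offset)  # horizontal glitch shift
        spans.append((y, count, image, offset))
        y += count
    return spans
```

Calling `glitch_spans(256, random.Random(1))` once per frame with a fresh generator state gives a different pattern every frame while always covering the full screen height.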

I would have loved to add some electric buzzing sound to it, but again: disk space (and I was already tired).

I mean, how bad would you feel if you spent a lot of time on making this effect, and then it wouldn’t even be shown on the big screen, huh?

At least, after all that crispy bacon, I thought it would be funny to have the “Who’s hungry?” sample from the ending of the Chemical Brothers’ Salmon Dance.

The diagonal scroller

So about 25 seconds after we started the endpart, we have enough samples calculated that we can start the 12-channel music and the scroller. We will continue to render more samples in the background to have them ready when they come into play.

Diagonal scroller

This part is running in hires-interlace dual playfield mode. The bitplane with the scrolltext is 1024 pixels wide and only 256 pixels high due to memory constraints. It’s not double-buffered, but there is a second copy, displayed every other frame, that has the letters shifted by one horizontal hires pixel.

Note that the font is not sheared in horizontal direction – instead the copper will modify the hardware shift register bplcon1 on every line by one lowres pixel (and modify the bpl1mod registers to scroll the bitplane pointer further than 8 lowres pixels). Together with the one hires pixel shift every other frame, this makes the font appear slanted while it is rendered straight.
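As a rough model of the per-line shear, the following Python sketch splits the total shift of each line into a fine scroll value (the part that goes into bplcon1) and a coarse word offset (the part that has to be handled through the modulo). The wrap period of 8 lowres pixels is taken from the description above; the function name and the tuple layout are assumptions.

```python
def shear_table(lines, pixels_per_word=8):
    """Return (fine_scroll, word_offset) for each display line.

    Line y is shifted by y lowres pixels in total: `fine_scroll` is the
    hardware scroll value, `word_offset` counts whole fetch words that the
    bitplane pointer has to be moved via the modulo register.
    """
    table = []
    for y in range(lines):
        word_offset, fine_scroll = divmod(y, pixels_per_word)
        table.append((fine_scroll, word_offset))
    return table
```

Whenever `word_offset` changes from one line to the next, the copper has to write a different modulo for that line; on all other lines only the scroll value changes.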

Here’s how it looks in memory (only one copy):

Memory view

You can see that the font is a bit higher and skewed upwards. Together with the bplcon1 shift, this will make the font appear rotated.

With this trick, the scroller is actually only moving upwards, not diagonally! We don’t have to do wide character renderings across more horizontal words (the font data fits nicely in a longword per letter).

Because the text moves up 0.5 pixels every frame, it will appear super-smooth on a real CRT monitor and should not be flickering at all. I saw this for the first time in the endpart of Andromeda’s Sequential and had used this technique in a couple of AMOS productions in the 90s before.

However, this means we cannot pause our scroller, it really needs to keep moving, otherwise it will flicker badly.

The background layer is only 512 pixels wide (so it will partly repeat!) and also only 256 pixels high, single buffered, all due to running out of memory. It moves 0.5 pixels downwards to also become super smooth.

The most painful thing about this scroller is the fact that it’s not using the “twice as high and off-screen rendering area” technique. Instead, the copper just wraps around the bitplane address while moving upwards.
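The wrap-around is essentially a ring buffer of bitplane rows. A minimal Python sketch, assuming a 256 row buffer and hypothetical helper names, of where the copper has to reset the bitplane pointer:

```python
def row_for_line(top_row, y, buffer_rows=256):
    """Buffer row shown on display line y when the display starts at top_row."""
    return (top_row + y) % buffer_rows


def wrap_line(top_row, display_lines, buffer_rows=256):
    """Display line where the pointer must jump back to row 0, or None."""
    first_wrapped = buffer_rows - top_row
    return first_wrapped if 0 < first_wrapped < display_lines else None
```

With `top_row = 100` and 256 display lines, line 155 still shows buffer row 255 and line 156 shows row 0 again, so the pointer reload has to happen at display line 156 and moves by one line every time the scroller advances.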

As a result, we cannot draw a whole letter or even a (diagonal) full line of text across the screen while scrolling (as the text is written right and upwards, this offscreen ahead-area would become rather large). We instead need to update one horizontal line one after the other, writing each line of each letter into the buffer.

This is realised by having a y-position sorted array of all the font letters, packed in a structure, that needs updating for every line coming in and for letters being finished. There is no time for y-sorting the letters, so the algorithm for fast and consistent (sort-stable) inserting becomes a bit tricky. The array keeps track of up to 256 letters at the same time. That seems like a lot, but is required for the long walls of text.
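The idea of a sort-stable insert without ever re-sorting can be sketched in Python with a binary search. This is only an illustration of the principle: the class name, its methods, and the use of two parallel lists are assumptions, whereas the demo uses a packed structure in 68000 assembly.

```python
import bisect


class LetterQueue:
    """Letters kept sorted by y position; equal keys keep insertion order."""

    def __init__(self, capacity=256):
        self.capacity = capacity
        self.keys = []     # y positions, always sorted
        self.letters = []  # payloads in the same order as self.keys

    def insert(self, y, letter):
        if len(self.keys) >= self.capacity:
            raise OverflowError("letter table full")
        # bisect_right places the new entry after equal keys, which keeps
        # the order of letters on the same line stable.
        i = bisect.bisect_right(self.keys, y)
        self.keys.insert(i, y)
        self.letters.insert(i, letter)

    def pop_finished(self, y_limit):
        """Remove and return all letters with y below y_limit."""
        n = bisect.bisect_left(self.keys, y_limit)
        done = self.letters[:n]
        del self.keys[:n]
        del self.letters[:n]
        return done
```

Inserting `(5, 'a')`, `(3, 'b')`, `(5, 'c')`, `(3, 'd')` in that order leaves the letters as `b, d, a, c`: sorted by y, with ties in arrival order.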

Okay, now this looks like shit

It took a long time to get it working. I will never do another one of that type of scrollers ever again. Promise!

Both the (coder gfx) ornaments and the scroller calculation/drawing are done in the background by a low-prio task (and CPU only!), which means that if the music playback took up too much raster time, the scroller and ornament update could be delayed by a couple of frames. Luckily, due to some last-minute optimizations in the UNZ engine, this now rarely happens and is completely unnoticeable.

Scroller wraps

And that’s it! I don’t think I want to write much more about anything.

Congratulations on getting this far, sifting through more than 128 KB of lengthy descriptions. I hope you liked some of it, got some interesting insights and got inspired to try out some similar or not so similar stuff.

Drop a note to one of us (see below), or just say hi at the next demoparty.

Tooling and development

KingCon was still used for converting the graphics, although for many of the HAM images and formats I now have three different HAM converters. I sometimes used my juggler tool (already known from HAMazing) to postprocess binary data.

For ZX0 compression I used Salvador rather than the much slower optimal ZX0 compressor.

For development, I used CLion with my MC68000 assembly language plugin.

Like before, I only ran the WinUAE emulator within a virtual machine on my Mac, but also vAmiga and Denise for testing (thanks to the latter authors for already having prepared after the release of this demo). Still no real Amiga involved, so big thanks need to go out to 4play, Gigabates and Virgill (and others) for testing on real hardware.

Best regards,

Chris ‘platon42’ Hodges, chrisly(at)platon42.de

Addendum: Storyboard

Information may not be completely accurate.

PlatOS base memory requirements: 12 KB Chip of 511 KB, 19 KB Fast of 504 KB

Partname Runtime Pt Chip Hunk Dynamic = Total Fast Hunk Dynamic = Total Disk Space LOC
Part I 0:3.5 0 KB 15 KB 16 KB 3 KB 0 KB 3 KB 1 KB 700
- Horn 14 KB 0 KB 10 KB
Music 1 3:35 28 150 KB 28 KB 103 KB
Squid 2:18 0 90 KB 147 KB 257 KB 38 KB 173 KB 221 KB 13+61 KB 7500
Mr. Polyeder 0:30 18 25 KB 61 KB 86 KB 15 KB 10 KB 25 KB 21 KB 4800
Tentacles 0:44 22 112 KB 208 KB 320 KB 32 KB 2 KB 34 KB 92 KB 2000
Summer of Squid 0:10 27.75 0 KB 34 KB 34 KB 53 KB 3 KB 56 KB 10 KB 500
Part II 0:4.5 37 KB 15 KB 52 KB 3 KB 0 KB 3 KB 30 KB 700
Music 2 2:52 64 140 KB 24 KB 97 KB
Beacon 0:22 0 140 KB 135 KB 275 KB 33 KB 4 KB 37 KB 107 KB 2200
Between the Bars 0:22 8 0 KB 332 KB 332 KB 11 KB 274 KB 362 KB 68 KB 3000
Zeta Racer 0:33 16 0 KB 106 KB 106 KB 34 KB 9 KB 43 KB 21 KB 1900
Texture Mapper 0:22 28 0 KB 304 KB 304 KB 45 KB 175 KB 230 KB 19 KB 3000
Twister 0:29 36 6 KB 232 KB 236 KB 82 KB 128 KB 210 KB 27 KB 2100
Ham Bobs 0:22 47 85 KB 192 KB 277 KB 6 KB 165 KB 171 KB 72 KB 2800
Bizoom 0:25 55 2 KB 311 KB 313 KB 135 KB 315 KB 450 KB 55 KB 3000
UNZ Music 3 2:13 26 35 KB 35 KB 30 KB 651 KB 681 KB 18 KB
Endpart 32 KB 153 KB 185 KB 37 KB 74 KB 111 KB 25 KB 5400

Yes, the endpart with the UNZ music needs almost all memory.