djh

Claude Code Questionnaires

Sat, 10 Jan 2026 20:55:34 +0000

One thing I’ve noticed about Claude Code is it will sometimes ask you questions, especially at the start of new projects, some clarifying things it wants to know so it can feed into the plan input, for example if you was creating a frontend project it might ask you what framework(s) you want to use.

I thought that was a neat feature but didn’t realise you can get the assistant to do this for your own stuff just by specifying it in prompts.

This came apparent to me when I was looking at automating adding new self-hosted services to my server. I have a bunch of configuration files for my homelab (nixos) and adding a new service requires a modifying a few things like reverse proxy configuration, docker-compose and some other minor bits. I wanted to automate these steps with Claude Code but there’s obviously a bunch of context it needs, but just by adding this to my CLAUDE.md…

When deploying a new self-hosted service, ask the user:

1. What domain the service should be accessible at (e.g., `foo.bar.com`)
2. Which server it should be hosted on (typically `myserver`)
3. What port the service listens on inside its container (the conta
iner port)
4. What dashboard section to add it to (Apps, Home, Finance, Media,
 or Infrastructure)
5. Does the container need any volumes for persistent data (e.g., `
/opt/data/servicename:/data`)
6. The docker image name (e.g. `foo:latest`)

and saying in my prompt that I wanted to deploy my service to the server - it popped up the same lil’ survey interface with the questions I want the assistant to ask.

That’s pretty cool, I can see myself using this pattern quite a lot when automating some workflows.

Using LLMs to turn scripts into applications

Tue, 19 Aug 2025 21:00:00 +0000

I’ve built a bunch of scripts and tools over the years that are largely hodge-podge, “good enough” monstrosities that should never see the light of day, but mostly useful, albeit awkward to use.

One of those tools was something I built to help me maintain my beancount file.

The tooling downloads transaction data from various financial institutions and then converts the contents into beancount format so that I can copy/paste the contents into my file.

As part of the processing pipeline it also included a rule-engine of sorts that formatted and categorised transactions depending on various conditions defined in the rules and this was all configured through one big, cumbersome JSON file.

It’s worked fine for a long time but frustratingly the “rule engine” part was becoming a pain to maintain, or even remember how to use in the first place. A small bit of JSON is cute, but JSON left unattended becomes something unspeakable, especially if you feed it after midnight.

I’ve never really taken the time to improve on it though, as ever time seems to evaporate as you grow older.

So I thought I’d look into LLMs to help with this and move the rules engine into an application to help me manage the rules in a saner way.

Hello BeanEngine

This is what Claude (and I?) built, we called it BeanEngine.

It’s a web application that implements my rules engine and allows you to configure it through a nice UI.

The application has two functions

An API where I can pass transactions, and then it returns everything I need to convert them into a beancount entries.
A UI to configure rules on how to process the transactions and classify what they are for

Over the course of a sessions on a few days (maybe 2-3 hours) Claude completed this for me.

Anyone who tells you these AI tools are one-shot miracle machines are lying to you, from my experience building this tool, there were a lot of iterations, but it felt more akin to supervising someone sometimes telling them where they were wrong.

The UI

I made claude build me a number of pieces of functionality

A rule engine to process transactions, complete with search functionality and CRUD actions for rules.
An account mapping engine to map accoutns to beancount accounts
A transfer rules engine to identify transactions that are transfers between personal accounts

Here are some screenshots of the screens Claude built for me, notice how the UI is fairly consistent between screens. All of this was designed by Claude, I did tell it to implement dark mode.

Rules engine

Claude even added some import/export functionality which I didn't even ask for but kept anyway

Account mappings engine

Transfer rules engine

All of these systems are not that complicated, just CRUD screens backed by a SQLite database. But these would have taken me days to build and lets be honest, building CRUD apps is tedious. With LLMs they’re fun to build because you get the LLM to do all the work.

What’s all this in service of?

Well, the API uses all these rules engines to process transactions

For example I can send this to the API

[
  {
    "date": "2025-08-01",
    "from_account": "creditcardprovider:-x12434",
    "recipient": "PATREON",
    "amount": 4.8,
    "reference": "PAYPAL *PATREON MEMBERS 1234",
    "currency": "GBP"
  },
  {
    "date": "2025-08-01",
    "from_account": "creditcardprovider:-x12434",
    "recipient": "GOOGLE SERVICES",
    "amount": 12.99,
    "reference": "GOOGLE*YOUTUBE 1231231",
    "currency": "GBP"
  },
  {
    "date": "2025-08-09",
    "from_account": "creditcardprovider:-x12434",
    "recipient": "WAITROSE",
    "amount": 75,
    "reference": "WAITROSE 123423     ONLINE",
    "currency": "GBP"
  },
  {
    "date": "2025-08-08",
    "from_account": "foo-bank:12345",
    "recipient": "Daniel Harper",
    "amount": 200,
    "reference": "bar-bank savings account",
    "currency": "GBP"
  }
]

and it will respond with

[
    {
        "date": "2025-08-01",
        "recipient": "Patreon",
        "reference": "Podcast sub",
        "from_account": "Liabilities:CreditCard:CreditCardProvider",
        "account": "Expenses:Fun:Subscriptions",
        "amount": 4.8,
        "currency": "GBP",
        "classification_type": "ml_prediction",
        "confidence": 0.9993922710418701,
        "original_from_account": "creditcardprovider:-x12434",
        "original_recipient": "PATREON",
        "original_reference": "PAYPAL *PATREON MEMBERS 1234"
    },
    {
        "date": "2025-08-01",
        "recipient": "YouTube",
        "reference": "Youtube Premium",
        "from_account": "Liabilities:CreditCard:CreditCardProvider",
        "account": "Expenses:Fun:Subscriptions",
        "amount": 12.99,
        "currency": "GBP",
        "classification_type": "ml_prediction",
        "confidence": 0.9989938139915466,
        "original_from_account": "creditcardprovider:-x12434",
        "original_recipient": "GOOGLE SERVICES",
        "original_reference": "GOOGLE*YOUTUBE 1231231"
    },
    {
        "date": "2025-08-09",
        "recipient": "Waitrose",
        "reference": "",
        "from_account": "Liabilities:CreditCard:CreditCardProvider",
        "account": "Expenses:Groceries",
        "amount": 75.0,
        "currency": "GBP",
        "classification_type": "rule_override",
        "confidence": 1.0,
        "original_from_account": "creditcardprovider:-x12434",
        "original_recipient": "WAITROSE",
        "original_reference": "WAITROSE 123423     ONLINE"
    },
    {
        "date": "2025-08-08",
        "recipient": "Daniel Harper",
        "reference": "bar-bank savings account",
        "from_account": "Assets:Bank:Current:FooBank:Current",
        "account": "Assets:Bank:Savings:BarBank:Savings",
        "amount": 200.0,
        "currency": "GBP",
        "classification_type": "transfer",
        "confidence": 1.0,
        "original_from_account": "foo-bank:12345",
        "original_recipient": "Daniel Harper",
        "original_reference": "bar-bank savings account"
    }
]

You may notice that the response contains a few things

The recipient is sometimes rewritten to something cleaner (e.g. “Google Cloud”)
The reference is sometimes rewritten to something nicer (e.g. “Podcast Sub”)
The beancount Expense account the transaction should be for is returned
There’s some indication of what type of transaction it is (transfer between personal accounts or “ML classification”)

My script can then convert these into nice, clean beancount transactions

2025-08-01 * "Patreon" "Podcast sub" ; confidence: 1.00
  Liabilities:CreditCard:CreditCardProvider -4.80 GBP
  Expenses:Fun:Subscriptions

2025-08-01 * "YouTube" "Youtube Premium" ; confidence: 1.00
  Liabilities:CreditCard:CreditCardProvider -12.99 GBP
  Expenses:Fun:Subscriptions

2025-08-09 * "Waitrose" "" ; confidence: 1.00
  Liabilities:CreditCard:CreditCardProvider -75.00 GBP
  Expenses:Groceries

2025-08-08 * "Daniel Harper" "bar-bank savings account" ; transfer
  Assets:Bank:Current:FooBank:Current      -200.00 GBP
  Assets:Bank:Savings:BarBank:Savings

Wait what, ML model?

You may also notice the response contains some mention of ML prediciton.

I’m not an ML engineer and do not pretend to be, but I was having so much fun with this I decided to let Claude loose on building a really simple classification model for me. Overkill? Yes. Probably a terrible model? Almost certainly. But hey, while we are here….

The model accepts

a recipient (e.g. GOOGLE SERVICES)
an amount (e.g. 12.99)

and attempts to predict the target account e.g. Expenses:Fun:Subscriptions

The model is trained on all my past beancount transactions. I got Claude to write the model trainer to accept a beancount file, extract the data, clean it up and then train a really really simple neural network, which is probably overkill - it did say better classifiers exist.

This model is then stored in a PKL file, and I got claude to write a simple system so I can upload new versions of the model via the UI and switch models without having to restart the application.

Did I need the ML model? No. You can acheive the same result just defining the rules properly in the rules engine. But I figured it might be handy for the transactions that are less frequent.

Anyway

Just a blog post about something I found pretty neat.

I think what’s interesting to me is these sorts of tools and things that I have in my toolbox are mostly personal to me. I think with the era of these LLM things, it gives the opportunity to unlock more more from them. Maybe at the expense of the ball of spaghetti Claude has come up with, but I’m willing to make that trade off for the time being, I’m the only user that has to live with it.

Cheers xxx

Side note: I’m aware that beancount has importing functionality, I don’t really use these though I prefer to maintain my beancount file myself, I’ve honed a bunch of techniques to speed this up over the years and I’m too stubborn to change. 🙂

Syncing a file to a remote server when it changes on OSX

Sat, 22 Feb 2025 20:00:00 +0000

I’m a big fan of plain text accounting and have been for years.

My tooling of choice has always been beancount with fava as the web application to render my beancount file.

I’ve always had one problem though, I edit my beancount file on my Mac using emacs¹, but my fava instance runs on a server in my homelab. So if I want to see the latest changes, I’d do stuff like manually copy the file (annoying) or implement stupidly overkill solutions like syncthing - which was a pain to maintain for just one file.

So I was looking for something simpler, my requirements were

Copy the file to my server when it changes
Do this automatically and be enabled all the time, even after reboots

After some very basic research and testing, I settled on a neat solution, using fswatch and launchd to basically monitor my beancount file and sync to my server on change events.

The solution is just a few lines of XML that runs a bash one-liner



 version="1.0">

    Label
    beancount.sync
    ProgramArguments
    
        /bin/bash
        -c
        /opt/homebrew/bin/fswatch -o beancount/finances.beancount  | xargs -n1 -I{} /usr/bin/rsync -avz beancount/finances.beancount server:/opt/data/beancount/
    
    RunAtLoad
    
    KeepAlive
    
    StandardOutPath
    /tmp/beancount.log
    StandardErrorPath
    /tmp/beancount.err
    UserName
    daniel

It works by fswatch firing some output every time the file changes and rsync does the sync up to the server.

Enabling the service was easy, I just ran this command about 6 months ago

launchctl load ~/Library/LaunchAgents/beancount.sync.plist

…and that’s it!

A remarkably simple solution that’s been working solidly, and given Fava monitors changes to the files it’s rendering - you see your updates reflected in the UI almost instantly or a few seconds at most.

Why fswatch

I’m aware launchd has a directive called WatchPaths, I tried using this initially but it never seemed to work properly and debugging why was depressing. I’ve been a Mac user for nearly 20 years and barely understand most of its internals.

The fswatch solution worked first time and has worked every time, so it gets a 👍 from me.

This is the only thing I use emacs for. I run it with beancount-mode and evil for vim keybindings. It’s the best editor for beancount data but I use Vim for everything else! ^[return]

My home network

Sun, 24 Nov 2024 12:00:00 +0000

I’ve become a rack guy.

This has been my obsession over the past few years. Whilst I’ve had various bits of network equipment, SoHo cabinets and a whole lot of chaos going on in the house over the past few years, this is the first time I’ve actually gotten myself a proper rack and I’m really pleased with it.

But this post I want to talk about my home network and all the bits I’ve put together to make it work, it’s not very interesting or novel but this sort of setup is something I’ve wanted for a long time and I just wanted to write about it and how I got here.

If you want a more informative post about home network racks, I highly recommend this post from Michael Lynch, he goes into way more detail and I found it useful.

The network

I’m lucky enough to have ethernet in some of the rooms of my house. This is not common in the UK, but newer build properties like mine are being built with CAT6 in the walls. Obviously it’s possible to retrofit cabling in UK houses but it requires some chopping and chasing to get everything in place, so it’s nice to have had that already done when I moved in. In hindsight I wish I’d requested more drops throughout the house but oh well, it’s good enough.

The drops all end at a point in a small cupboard on the ground floor, 4 ports in total one for each area, this is where the rack lives. Here’s a crude drawing of what the network looks like right now:

It took a while to get to this point though, so I’ll document the journey…

A bit of history…

When I moved in the ISP just provided a shitty router, which suits most people, but I wanted take advantage of the full setup with PoE access points etc and use all the ports in the rooms. This led me to the “pro-sumer” market which is a minefield of expense, and made a bunch of mistakes a long the way.

My initial foray was to not bother with a full size rack and purchase something smaller, the 10” “SoHo” cabinets seemed much cheaper, along with the gear that fits in it. The first thing I bought was a 6U cabinet, fitted with some shelves. For networking I settled on some TP-Link Omada gear, mainly the ER605 router, OC200 controller, 10 port PoE switch along with a few WiFi access points and mini-switches to put around the house, powered by PoE.

…and honestly? This served me well for a few years, it was great having everything nice and neatly tucked way in a little box in the cupboard, and outside of the Omada controller software/web interface being dogshit slow to navigate it worked well enough, and the experience of being able manage the router, switch and APs from one control surface was good.

But then the first problem arrived, soon after I purchased a 1L mini PC for self hosting/homelab purposes, to replace a set of Raspberry Pi’s that were becoming cumbersome to manage and constrained by RAM.

The PC did not fit into the little SoHo cabinet, I’d run out of room already and the bottom layer was consumed by surplus cabling. Maybe I could have rejigged some things around to make more room but it was becoming a hassle. So the PC ended up being put in the lounge upstairs and the fan in it was annoyingly loud.

So mistake no.1: don’t under-estimate or cheap out on your U’s, in hindsight I should have gone for a 9U or 12U cabinet

This, compounded with the fact that I had plans to get a NAS. I needed something bigger to accomodate all my things. I didn’t want all these devices dotted around my house because I’d run out of room downstairs. So the next expense was to upgrade to a 19” 12U open frame “network rack”^* for about £60 with the power distribution strip. The bits for it arrived in flatpack format and just required some assembly.

^{* note there’s a key difference between “network rack” and “server rack”. Server racks tend to be a lot deeper, whereas my network rack is only 482mm deep (from what I understand)}

Afterwards I bought some more shelves to put my things on, and because I’m a cheap git I decided to re-use the 8 port patch panel from my old cabinet and designed/3D printed some “extenders” for the left/right parts so it could fit into the wider rack.

With that out of the way the Mini PC could be moved back downstairs and into the rack, along with the Omada gear and recently purchased NAS. Unfortunately the NAS takes up about 5U of space, but there’s still plenty of room on the shelves.

I even designed and 3D printed a custom mount for the Omada controller and router so I could mount it in 1U of space which took hours and had to be glued together in sections because the printer bed on my printer was too small. Turns out 19” is quite a large amount of space to fill. In hindight this was a massive waste of time because I got the sizing slightly wrong so the franken-rackmount did not fit into the rack properly, I should have just used a shelf and not tried to be clever.

So mistake no.2: don’t try to be clever

Making it look nice

So, with all the stuff in place along with the janky 3D printed parts, there was one remaining niggle that bugged me to no end. The open frame leaves a big open gap at the top with all the wiring and power supplies and shame exposed to the open for you to look down on when you open the cupboard. I wanted to put something on top to make it look at least presentable.

This led me to an annoying quest of trying to get some nice wood cut to size, the options were surprisingly expensive. In the end, I just happened to be in IKEA one day looking in their recycled/returns/re-use section and spotted a cupboard door with a slight mark on the bottom selling for as-is for just £5. It was 23” square and I got my Dad to help me cut two sections off to get it down to size.

I quickly designed some 3D printed “pegs” to stick on the corners so they would fit into some bolt holes on the top of the rack (so the top can easily be lifted it off you need to access anything underneath)

…and the result was very satisfying! Yes it’s not super high quality and 2 of the 4 edges are rough from the cutting, but the good sides are the ones and show and it looks miles better than just a mess of cables.

At this point I thought I was done, but the homelab quest never seems to end, and quite naturally, my next side quest led me to investigate IPv6, which led me down the completely irrational path of replacing all my Omada gear entirely.

Brief interlude: IPv6 woes

You see, it all started when looking into renting a cheap VPS from Hetzner. One of their options is you can cheap out and get an IPv6 address for free or pay extra to rent an IPv4. As you may have gathered with the running theme of this post, I’m not one to open my wallet more than what I have to…

So after setting up IPv6 on my network and everything was running in the future, alarm bells set in when I could connect to one of my services running on my homelab from the VPS by just hitting an IP address. Uh-oh, all my devices and services were exposed to the internet 😱!

Isn’t my router a firewall though? was my first immediate question, and yes it is, but that only works for IPv4. The version of my router was too old to have the firmware upgrade to support IPv6 firewalling. So every device on the network is exposed all the time when you enable IPv6 on this device. While scanning the IPv6 space is probably unfeasible anyway, I didn’t like the idea of all my stuff being out there so I came crawling back to the sweet, comforting embrace of IPv4.

This experience left a niggle in my brain that was difficult to shake, if the router doesn’t have the right firmware to support some features what else is missing? The rational and cheaper move would have just been to upgrade the router to a more recent model and be done with it - the modular nature of the Omada setup makes this relatively simple.

However…

Irrational changes

It was time to see what else was out there. I remember back in the day looking at Ubiquiti gear but was put off by the expense along with stock shortages that were going on at the time (2022-ish) it was terribly off putting. The cost could have been stomached somewhat but what really put me of was their flagship router - the Dream Machine Pro did not have PoE ports, so you had to purchase a separate switch. This easily pushed the cost to over £800 once you factored in access points in and other accessories.

But while researching I noticed Ubiquiti had released a new version of their Dream Machine Pro, the “SE” special edition version, which upgrades all the ports to PoE. This was quite a compelling case to me, my thinking was, this would allow me to replace the 3 TP-Link devices in my rack plus janky 3D printed parts, with just 1 device actually designed to be rack mounted and possibly - better hardware?

So I pulled the trigger and purchased

1x Dream Machine Pro
2x U6+ WiFi 6 access points

Wise? No. Expensive? Definitely. Makes me happy? Yes.

Outside of the IPv6 issue, which I think could just be resolved by upgrading the router, there really isn’t much wrong with the Omada setup so the takeaway shouldn’t be “Omada bad, Ubiquiti good” - it’s really just a few things fell into place for me to make the switch more compelling to me.

The switchover

Switching between ecosystems seemed daunting but actually it ended up being surprisingly smooth.

Initially I just setup the dream machine on a desk, connecting just one of the APs, and configured the WiFi network to have the same SSID+password as my existing one. When my iPhone connected to it straight away it gave me confidence that everything else should be able to connect just fine.

As for the wired stuff, most things had static IPs configured on the devices themselves or via DHCP reservations, so after whipping up a quick spreadsheet to map out all the important stuff, I got on with the task of bascially tearing out the Omada gear and getting the dream machine racked up, plugged in and all patch cables transferred over.

…and everything just worked? It was honestly surprising how seamless it was.

Well, OK, there were a few niggles. One thing that broke was my EV charger on the wall outside, while the device was present in the Unifi web interface it just wasn’t speaking to the internet for some unknown reason. A cold trip outside to flip a breaker sorted that out.

Another, more annoying issue was Airplay just refused to work. I listen to podcasts through an IKEA Sonos speaker in the kitchen when making dinner and whilst my iPhone could see the speaker, trying to play any stream through it just failed. It turns out Unifi turns multicast DNS off by default and I had to enable this AND restart my iPhone to make Airplay work - this took a while to debug.

You have to 3D print something though

With the rack sorted and UDM in place my attention turned to the access points. I didn’t mount these to the walls/ceilings and didn’t with the TP-link ones either, instead I just designed and 3D printed these ‘legs’/stands for them to sit on and they rest on a shelf, one in my office and one on the other side of the house.

For the Unifi access points I just had to adjust the design for the stands as the hole placement is slightly shorter to fit the mounting bracket, but this did not take long and soon the APs were legless no more (after 8 hours of printing…)

You can find the design for these here on Printables

Wrapping up

Anyway, that’s everything up to this point I think. It’s been a journey. An expensive journey, but I’m really pleased with the final result^**.

One pleasant surprise about the Unifi ecosystem has just been how snappy the web interface is. This is probably a product of the hardware being much better than the TP Link OC200, but it’s really good! The iOS apps are excellent as well and they have a really good WiFi debugging app called WifiMan that’s been invaluable.

As for IPv6? I might experiment once again soon, but I’m gonna do some reading on how to configure the firewall on the Unifi system properly first.

Thanks for taking the time to read xxx

^{** this is what I tell myself, but nothing is ever final}

Bonus content: homelab overview

If you’re interested in what I run on my homelab here’s a quick overview. I might write a different blog post about my setup as I do some funky stuff with Tailscale and DNS and other things, but generally this is what I run:

Terramaster NAS

Hardware

2x 4TB Seagate Ironwolf HDDs

Software

Unraid - I don’t do anything other than run Unraid on the NAS, I don’t run docker containers or VMs.

Server

Hardware

Intel i5-6500
24GiB RAM
500GiB NVMe SSD
240GB SSD

Software - VM 1

This is the main VM that offers a bunch of network level things like DNS and HTTP proxy.

Traefik - reverse proxy.
Adguard Home - DNS level adblocker
Tailscaled - Acts as a subnet router so I can access everything on my network via tailscale

Software - VM 2

Librechat - for accessing all of the commercial LLMs via APIs wrapped in a nice interface
Grafana - displaying graphs for everything
Prometheus - metrics from everything inc. Home Assistant
Fava - finance UI for beancount
Paperless-ngx - document store for scanned documents/PDFs
Gitea - self hosted git
changedetection.io - tells me when stuff changes on websites
Manyfold - 3D print model store

Software - VM 3

Home assistant basically. One day I might move this onto a separate machine.

Home Assistant - controls everything in my house…
DeCONZ - Zigbee gateway controller/interface for ConBee II USB stick

Software - VM 4

Media stuff

Plex - watching movies/TV shows (streams off NAS)
Navidrome - for my music collection (streams off NAS)
Audiobookshelf - for my audiobook collection (streams of NAS)
Software for finding and archiving linux ISOs

^{As a side note I wrote all these bullet points without links in them and asked an LLM to link to the relavent projects/pages and it did it!}

No News is Good News: Using AI to auto skip the news on catch-up radio

Sun, 10 Nov 2024 21:00:00 +0000

Recently, I’ve been expanding my music horizons by listening to radio show recordings rather than relying on algorithm-driven recommendations. While this has been great for discovering new music, there’s one persistent annoyance: outdated news bulletins from the time of broadcast are still present in the recordings. Hearing old news reports can be jarring, and manually skipping them is often imprecise.

What’s really needed is a podcast-style chapter system that would let listeners jump past these sections. However, the current player from the broadcaster is just one continuous stream with no built-in navigation markers, making it impossible to efficiently bypass unwanted content.

So, with a weekend ahead of me free I went away and built something to auto-skip the news segments when listening. It’s a browser extension that runs on the same page as the player, gets a handle on the element and skips the news segments when the time is reached, and also includes a bonus feature of scrolling through a transcript of the show.

Your browser does not support the video tag.

Skipping over the news

Now I can listen to the programmes without the scourge of old news ruining it.

How it works

You’ll be disappointed to hear the solution isn’t magic or doing anything fancy. The news segment detection happens in an out of band batch process and the browser extension just takes the output of that to control the player.

player.addEventListener('timeupdate', () => {
    this.checkAndSkip(player);
});

seek(player, seconds) {
    console.log("Moving to =", seconds);
    player.currentTime = seconds;
    console.log("New currentTime =", player.currentTime);
}

checkAndSkip(player) {
    if (this.skipPoints == null) {
        return;
    }


    // checks if we need to skip the player forward
    // if the current time is within a skip point
    let currentTime = player.currentTime;
    for (let startTime in this.skipPoints["skips"]) {
        startTime = parseFloat(startTime);
        let endTime = this.skipPoints["skips"][startTime];

        // If we're at a skip point (with small buffer for precision)
        if (currentTime >= startTime && currentTime < endTime) {
            console.log(`Skip point detected, seeking to ${endTime}`);
           this.seek(player, endTime);
        }
    }
}

News Segment Detection

Identifying where news segments begin and end presents a challenge. While others have explored similar problems, like automatically removing radio advertisements through sophisticated digital signal processing (DSP) techniques, these solutions typically rely on audio cues such as jingles or “bookends” to mark segment boundaries. Unfortunately, the shows I listen to don’t use such clear audio markers to indicate news sections.

However, the news does tend to start/end with some words/phrases that could be considered bookends, although it’s unclear how consistent these are across shows and newsreaders.

BBC News at 7.30. This is Anthony Burchly.

…That’s 6 Music News, your next update is at 8.30

So my initial idea was to find some way of getting a transcription of the show to look for these phrases and identify them as segments to skip, e.g. chances are if something opens with “BBC News at…” then within 3-4 minutes there should be a corresponding closing remark to indicate the end of the news segment.

Where does one turn when you have no idea what you are doing and will blindly trust anything? AI of course!

Getting the transcription seemed like a good fit for multi-modal LLMs, but I didn’t feel comfortable uploading shows to the major players because getting the audio requires downloading them through means where it’s probably legally questionable¹. I didn’t want to get my AI accounts banned due to uploading copyrighted content².

I decided instead to use local models, specifically OpenAI’s whisper for speech recognition. After initial testing, I found that switching to whisper.cpp, which offers GPU support, significantly improved performance. This setup allowed me to process audio files at reasonable speeds on my modest M1 Macbook Air.

Whisper.cpp supports outputting the transcript as JSON and they come out looking something like this.

...
{
 "timestamps": {
   "from": "00:01:52,960",
   "to": "00:01:57,040"
  },
  "offsets": {
    "from": 112960,
    "to": 117040
  },
  "text": " Yee-haw, hey music fans, welcome to the 
new Music Fix Daily"
},
{
  "timestamps": {
    "from": "00:01:57,040",
    "to": "00:02:02,500"
  },
  "offsets": {
    "from": 117040,
    "to": 122500
  },
  "text": "on BBC six music with Tom and Deb starting the show"
}
...

With this in place my first stab was to try and identify the bookends by looking for the phrases I described earlier. This kind of worked ok but wasn’t consistent, e.g. you’d get situations like this

...
{
  "text": "BBC"
},
{
  "text": "news at"
},
{
  "text": "6:30, Sir Kier Starmer has"
}
...

Not optimal, and requires much more fooling around looking/forward backwards in the transcript to try to infer where the bookends sit. I was getting fed up around trying to come up with the solution that catches all things.

So where does one turn where they’re fed up of their stupid python script and wants someone else to do something about it? AI of course!

Rather than continue struggling with manual pattern matching, I turned to AI for help. Following some prompt refinements³ and an upgrade from Gemini Flash to Pro, the model proved remarkably effective at identifying news segments.

Prompt:

► This is a transcript from a radio show. Could you please identify the segments in the transcript that are BBC news reports so that I can trim them out 


I'd like the output to be in JSON format with no surrounding markdown just pure JSON, here's an example


  [
    {
        "segment":{
            "start": "00:33:00",
            "end": "00:35:58"
        },
        "duration_seconds": 178
    },
    {
        "segment":{
            "start": "01:13:22",
            "end": "01:16:33"
        },
        "duration_secs": 123
    }
  ]
  

The model produced something like this.

[
  {
    "segment": {
      "start": "00:29:36",
      "end": "00:32:41"
    },
    "duration_seconds": 185
  },
  {
    "segment": {
      "start": "01:29:37",
      "end": "01:32:41"
    },
    "duration_seconds": 184
  }
]

Whether the model is consistent enough across lots of shows to produce decent results remains unclear, but from the handful I’ve processed so far the results have looked pretty decent.

Putting all the bits together

Having solved the news detection challenge by trusting the output of whatever the AI gives, I created an automated pipeline that handles the entire process: downloading media files for the show by scraping the recordings page, processing them through whisper.cpp, and using Gemini Pro to identify the skip points.

The pipeline then saves both transcriptions and skip timestamps to disk, which are served via a simple server that resolves the programme ID to transcript.

When on the playback page for a show, the browser extension requests the corresponding transcript from the server. It then configures automatic news segment skipping and renders the transcript on the page. The transcript display isn’t essential to the functionality, I just thought it was neat.

I’m not gonna open source this just yet, I wrote the code in a weekend, it’s all a bit thrown together, untested garbage which I don’t apologise for, I built it for a stupid personal grievance.

On reflection

This was a fun project but probably way over engineered for a small use case. Additionally during development it was funny, and in hindsight, unsurprising to find that the news segments are remarkably punctual (around 30 minutes and 90 minutes into the show, each running around 180-190 seconds) so just looking around timestamps with a simpler approach could achieve the same result - but where’s the fun in that?

I also enjoyed writing the browser extension to augment some functionality on an existing website to suit my own needs. I even used LLMs to help me start writing it because I’d never written one before and it always seemed like a faff, but these models got me started so I could have something up and running pretty quickly.

Future ideas if I can be bothered

Try and find more ‘segments’ in the show e.g. interviews
Try and identify song boundaries - this is probably harder with transcriptions but I wonder if you just threw the audio file into one of the larger multi-modal LLMs it might be able to do something

Cheers xxx

I’m not a lawyer ^[return]
I’m well aware that scrapers for training these models don’t have the best reputation in regards to copyright… ^[return]
I’m not a prompt engineer I’m sure there are better ways of prompting these models ^[return]

Installing Unraid on a Terramaster F4-424

Thu, 25 Jul 2024 19:00:00 +0000

I recently bought a Terramaster F4-424 NAS for my home network. It’s quite a neat box and reasonably priced.

The reason why I chose this device over say, a Synology, is the fact that you can easily replace the operating system. I wasn’t keen on Terramaster’s provided software and wanted to try out Unraid instead.

Replacing the operating system is really just a matter of swapping a tiny bootable USB drive with another.

There are guides on this on YouTube that go pretty in-depth into what is a fairly simple process, but they’re mostly for one of the previous models, not the F4-424.

I just want to document some of the differences I encountered in the F4-424 compared to the above video and other guides you might find online.

Replacing the USB boot drive

The USB drive to replace is at the bottom, mounted at the edge of the motherboard.

There is not enough room between the top of the drive and the metal casing to be able to pull the drive out of the socket (contrary to the previous model)

So that means we need to lift the motherboard to get it out. Luckily it’s just a simple case of removing 4 screws

Then you just need to gently lift the motherboard up, it might be a bit tight because the board is connected to a daughter board via a PCI-E slot (I presume) but some light wiggling should get it out.

With the motherboard lifted, you can swap the USB drives and then it’s just a matter of reassembling everything back together.

Keyboard and mouse?

Some of the guides talk about plugging in HDMI, keyboard and mouse etc to configure the BIOS to boot from the right drive. In my experience I plugged a HDMI cable in and turned the device on and it automatically booted into Unraid anyway, so no further configuration required.

Hope this guide helps!

Prototyping Websites Using LLMs

Fri, 17 May 2024 19:00:00 +0000

I recently had an idea for a website I wanted to build.

The website would help Electric Vehicle owners work out the best time to charge their car if they’re on a particular time-of-use tariff here in the UK.

However, I’m my own worst enemy when getting started on projects like this. I feel I spent a good chunk of time just working out how to center a

on the damn page which usually saps all the joy out of everything. Even with just some hand rolled HTML it’s easily a 15 minute job of bootstrapping.

So this time round I decided to skip the frustration and use Large Language Models to help me get started on my project! Heck, if it’s still frustrating, at least you can tell the model off for getting it wrong!

It started with a sketch

So I whipped up a quick sketch of what I wanted my website to look like.

On the left would be the inputs e.g. how much battery is in your car, how powerful your charger is and how much you want to charge to.

On the right would be a summary of the cheapest time to charge, with some stats about prices in a table.

Below is a summary of how I got on. I might be misremembering some bits as it was a while ago.

Asking my mate Claude

I asked Claude 3 Opus to write me the HTML for this structure.

Not the most detailed prompt, but lets see what we got.

► Expand

Intercepting t.co links using DNS rewrites

Sun, 29 Jan 2023 13:00:00 +0000

When someone links to something on twitter, either by embedding something or just pasting a URL, twitter will front it with its own t.co link. This means that you cannot verify what the URL is until you click it and your browser goes to the end result via t.co. I only really noticed this properly when my DNS sinkholing server (Adguard home) started blocking t.co links and I was getting an error when say, clicking a linked news article.

The obvious fix for this would be to just add t.co to my DNS allow list so these requests can go through. However, the fact that you cannot see the URL until you’ve already navigated through to it irks me a lot, I’d rather verify the link I’m navigating to is something I want to visit.

There are browser extensions that solve this problem by modifying the DOM to uncloak the links (e.g. twitter-link-deobfuscator) which works pretty well, but this solution is limited to the browser and does not work on the Twitter app on Android. Other options are to copy the t.co link into a link uncloaking website, but this is fiddly and annoying, or install an app on your phone

So I was looking for a more general solution that works across devices, what if there was a way of “intercepting” when you click on t.co links, unwrapping where it eventually leads to and presenting the user an interstitial page detailing this, with an option to continue forward.

Enter the unwrapper, a small service I wrote this weekend that does exactly this, but it abuses a lot of safeguards we have in place for the web so comes at a high price.

The Unwrapper

The service is just a go server that, when receiving a request for t.co, makes a HEAD request to the shortened link and extracts the Location header before the redirect is followed. If this is found, the value in the header is returned and rendered on a simple interstitial page.

But how do you intercept the calls to t.co? The magic comes in by using (abusing?) the DNS rewriting feature on Adguard home to intercept DNS requests for t.co and return the IP of my reverse proxy, and then adding a proxy rule to forward all requests for the host t.co to the go service.

Surprisingly this worked pretty well, although with a huge caveat - this is your run of the mill Man in the Middle (MitM) attack. Browsers will, quite rightfully, complain that the certificate being presented from my reverse proxy is not valid for t.co so the user should not continue, or proceed with extreme caution.

To mitigate this I did something bad. Well, I’m already doing something bad, I run my own self-signed Certificate Authority (CA) and my reverse proxy uses certs signed by this CA. This root CA is trusted on my devices so I can access various internal services that run on my network when I’m connected to it, or via VPN when I’m remote.

With the badness in place, I figured why not add to it by adding t.co to the Subject Alternative Name on the cert for my reverse proxy? Now the browser has no problem and doesn’t complain anymore.

This is obviously a terrible solution and not recommended, but it works, and works on all my things, including the Twitter app on my phone. It’s bad though, I’m probably looking past a lot of other security issues that I’ve just opened myself up to.

During my testing, another interesting issue quickly arose, a lot of the links cloaked under t.co tend to be links to other link shortening services like bit.ly, buff.ly, trib.al (as it turns out….the list is endless) meaning the obfuscated link issue still remains.

Going deeper

So to get around this I extended the service to work with multiple “known” URL shortening services and “follow” the trail until you reach the end. Most of them work the same way, a simple redirect and Location header, so following the trail is really just finding where the chain stops, and throw an error if a cycle is detected.

It’s actually quite funny to see how many hops some of these links can take you on, the deepest I’ve ever seen is a link from NYTimes taking you on a 9 hop voyage around various link shortening services. I’m guessing people are pasting short URLs to other short URLs into social media distribution platforms which just adds to the chain.

Bonus feature

With the “real” link URL now avaialble it’s often polluted with various query parameters used for tracking or affiliate descriptors, so the service also strips these out and presents a “cleaned” version of the link alongside the original which I can then choose to click on or copy to send to others.

Lessons learned

I’m not sure whether I’m going to run this system full time as the obvious security safeguards put in place have just been overridden with hacky self signed certificates, DNS rewriting and other bad things, but it’s been a useful exercise in exploring what’s possible, even if it is absolutely awful. Probably should just unblock t.co on my adblocker and put the cowboy hacks aside for the time being 🤠

ChatGPT corrects itself when you tell it off

Fri, 02 Dec 2022 20:00:00 +0000

The internet has been abuzz around playing with ChatGPT, a new chat AI thing. Putting biases and unfortunate responses aside, curiosity got the better of me and I started to fool around with it.

One of the surprising interactions was, I told it (assistant? eliza? alexa? mega-clippy?) off for getting something wrong, and it corrected itself.

You are correct, the matrix in my previous response was incorrect.

Like a teenager half arsing their homework, I managed to get them to actually make an effort.

The conversation

I’m doing advent of code again this year, a task I heartily start but eventually give up after day 15 or so, but the early days are always fun. Day 2 starts off talking about the game Rock, Paper, Scissors and computing some scoring logic based on some simple rules.

Being lazy and tired after work I was having trouble picturing all the outcomes of the game to represent them in my program, so I asked the tool to print a truth table. Not a very good question I’ll admit, a truth table usually refers to boolean outcomes not the win/lose/draw states, but they ran with it anyway.

At first glance the output was pretty cool, but I found it hard to read. The labels for the rows were not printed so you couldn’t determine who was player 1 and player 2.

So I asked them to print me a matrix.

Nice.

But something was still off, the text talks about the rules of the game but the matrix looks completely wrong. You can’t tie if both players present different objects.

So I gave them a sharp reply - you are wrong!

To my surprise, they corrected themselves and printed the corrected matrix. Even admitting the previous answer was wrong!

In this case I knew the rules of the game, I just wanted something quick to reference while writing code, but it makes me wonder whether accepting the first response from these AI tools might not always be the best idea.

Warn me when processes on my Mac are running under Rosetta

Sat, 03 Sep 2022 09:00:00 +0000

Rosetta is a great piece of technology, but can be a battery drain.

Most things on my Mac run natively but now and again I need to run apps under the x86 translation layer, and sometimes I forget these apps are running.

You can see what processes are running under Rosetta by looking in Activity Monitor under the “Kind” column, but this requires frequent checking.

So I wrote this little xbar plugin to tell me when there are apps running under Rosetta. If there are, a warning icon appears on the status bar (refreshed every minute) and shows me what processes are running.

This way I get a visual aid and can close down any Intel apps that I don’t need to run anymore. The script is very simple, I used this answer from the Apple stack exchange to get all the processes running under Rosetta and wrote a script around it for xbar.

Source code and installation instructions can be found here: https://github.com/djhworld/rosetta-warn

Executable PNGs

Sat, 26 Dec 2020 09:00:00 +0000

It's an image and a program

A few weeks ago I was reading about PICO-8, a fantasy games console with limited constraints. What really piqued my interest about it was the novel way games are distributed, you encode them into a PNG image. This includes the game code, assets, everything. The image can be whatever you want, screenshots from the game, cool artwork or just text. To load them you pass the image as input to the PICO-8 program and start playing.

This got me thinking, wouldn’t it be cool if you could do that for programs on Linux? No! I hear you cry, that’s a dumb idea, but whatever, herein lies an overview of possibly the dumbest things I’ve worked on this year.

Encoding

I’m not entirely sure what PICO-8 is actually doing, but at a guess it’s probably use Steganography techniques to ‘hide’ the data within the raw bytes of the image. There are a lot of resources out there that explain how Steganography works, but the crux of it is quite simple, your image your want to hide data into is made up of bytes, an image is made up of pixels. Pixels are made up of 3 Red Green and Blue (RGB) values, represented as 3 bytes. To hide your data (the “payload”) you essentially “mix” the bytes from your payload with the bytes from the image.

If you just replaced each byte in your cover image with the bytes from your payload, you would end up with sections of the image looking distorted as the colours probably wouldn’t match with what your original image was. The trick is to be as subtle as possible, or hide in plain sight. This can be achieved by spreading your payload bytes over the bytes of the cover image by using the least significant bits to hide them in. In other words, make subtle adjustments to the byte values so the colour changes are not drastic enough to be perceptive by the human eye.

For example if your payload was the letter H, represented as 01001000 in binary (72), and your image contained a series of black pixels

The bits from the input bytes are spread across 8 output bytes by hiding them in the least significant bit

The output is two-and-a-bit pixels that are slightly less black than before, but can you tell the difference?

The pixels have been adjusted in colour slightly.

Well, an exceptionally trained colour connoisseur might be able to, but in reality these subtle shifts can really only be noticed by a machine. Retrieving your super secret H is just a matter of reading 8 bytes from the resulting image and re-assembling them back into 1 byte. Obviously hiding a single letter is lame, but this can scale to anything you want, a super secret sentence, a copy of War and Peace, a link to your soundcloud, the go compiler, the only limit is the amount of bytes available in your cover image as you’ll require at least 8x whatever your input is.

Hiding programs

So, back to the whole linux-executables-in-an-image thing, that old chestnut. Well, seeing as executables are just bytes, they can be hidden in images. Just like in the PICO-8 thing.

Before I could achieve this I decided to write my own Steganography library and tool to support encoding and decoding data into PNGs. Yes, there are lots of steganography libraries and tools out there but I learn better by building.

$ stegtool encode \
--cover-image htop-logo.png \
--input-data /usr/bin/htop \
--output-image htop.png
$
$ echo "Super secret hidden message" | stegtool encode \ 
--cover-image image.png \
--output-image image-with-hidden-message.png
$ stegtool decode --image image-with-hidden-message.png
Super secret hidden message

As it’s all written in Rust it wasn’t that difficult to compile to WASM, so feel free to play with it here:

Anyway, now that can embed data, including executables into an image, how do we run them?

Get it running

The simple option would be to just run the tool above, decode the data into a new file, chmod +x it and then run it. It works but that’s not fun enough. What I wanted was something similar to the PICO-8 experience, you pass something a PNG image and it takes care of the rest.

However, as it turns out, you can’t just load some arbitrary set of bytes into memory and tell Linux to jump to it. Well, not in a direct way anyway, but you can use some cheap tricks to fudge it.

memfd_create

After reading this blogpost it became apparent to me you can create an in-memory file and mark it as executable

Wouldn’t it be cool to just grab a chunk of memory, put our binary in there, and run it without monkey-patching the kernel, rewriting execve(2) in userland, or loading a library into another process?

This method uses the syscall memfd_create(2) to create a file under the /proc/self/fd namespace of your process and load any data you want in it using write. I spent quite a while messing around with the libc bindings for Rust to get this to work, and had a lot of trouble understanding the data types you pass around, the documentation for these Rust bindings doesn’t help much.

I got something working eventually though

unsafe {
    let write_mode = 119; // w
    // create executable in-memory file
    let fd = syscall(libc::SYS_memfd_create, &write_mode, 1);
    if fd == -1 {
        return Err(String::from("memfd_create failed"));
    }

    let file = libc::fdopen(fd, &write_mode); 

    // write contents of our binary
    libc::fwrite(
        data.as_ptr() as *mut libc::c_void, 
        8 as usize,
        data.len() as usize,
        file,
    );
}

Invoking /proc/self/fd/ as a child process from the parent that created it is enough to run your binary.

let output = Command::new(format!("/proc/self/fd/{}", fd))
    .args(args)
    .stdin(std::process::Stdio::inherit())
    .stdout(std::process::Stdio::inherit())
    .stderr(std::process::Stdio::inherit())
    .spawn();

Given these building blocks, I wrote pngrun to run the images. It essentially…

Accepts an image that has had our binary embedded in it from the steganography tool, and any arguments
Decodes it (i.e. extracts and re-assembles the bytes)
Creates an in-memory file using memfd_create
Puts the bytes of the binary into the in-memory file
Invokes the file /proc/self/fd/ as a child process, passing any arguments from the parent

So you can run it like this

$ pngrun htop.png

$ pngrun go.png run main.go
Hello world!

Once pngrun exits the in-memory file is destroyed.

binfmt_misc

It’s annoying having to type pngrun every time though, so my last cheap trick to this pointless gimmick was to use binfmt_misc, a system that allows you to “execute” files based on its file types. I think it was mainly designed for interpreters/virtual machines, like Java. So instead of typing java -jar my-jar.jar you can just type ./my-jar.jar and it will invoke the java process to run your JAR. The caveat is your file my-jar.jar needs to be marked as executable first.

So adding an entry to binfmt_misc for pngrun to attempt to run any png files that have the x flag set was as simple as

$ cat /etc/binfmt.d/pngrun.conf
:ExecutablePNG:E::png::/home/me/bin/pngrun:
$ sudo systemctl restart binfmt.d
$ chmod +x htop.png
$ ./htop.png

What’s the point

Well, there isn’t one really. I was seduced by the idea of making PNG images run programs and got a bit carried away with it, but it was fun none the less. There’s something amusing to me about distributing programs as an image, remember the ridiculous cardboard boxes PC software used to come in with artwork on the front, why not bring that back! (lets not)

It’s really dumb though and comes with a lot of caveats that make it completely pointless and impractical, the main one being needing the stupid pngrun program on your machine. But I also noticed some weird stuff around programs like clang. I encoded it into this fun LLVM logo and while it runs OK, it fails when you try to compile something.

$ ./clang.png --version
clang version 11.0.0 (Fedora 11.0.0-2.fc33)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /proc/self/fd
$ ./clang.png main.c
error: unable to execute command: Executable "" doesn't exist!

This is probably a product of the anonymous file thing, which can probably be overcome if I could be bothered to investigate.

Additional reasons why this is dumb

A lot of binaries are quite large, and given the constraints of needing to fit them into an image, sometimes these need to be big, meaning you end up with comically large files.

Also most software isn’t just one executable so the dream of just distributing a PNG kinda falls flat for more complex software like games.

Conclusion

This is probably the dumbest project I’ve worked on all year but it’s been fun, I’ve learned about Steganography, memfd_create, binfmt_misc and played a little more with Rust.

Socket activated services

Sat, 14 Nov 2020 17:00:00 +0000

I’ve been running Linux on my personal computer for nearly six months now and just thought I’d write about a neat feature in systemd called systemd.socket

Personal finance on-demand

tl;dr Created a socket activated service to spin up a local webapp I use sometimes when something connects to it, and then tear it down again after 5 minutes

I’m a big personal finance nerd and have spent the last 3 years cultivating a ledger file that contains pretty much every facet of my financial life. This file is in a format understood by a suite of command line software called Beancount. It’s is made even better used in conjunction with Fava, a webapp to explore your beancount file.

Fava is a HTTP a service that reads your beancount file stored on disk and presents you with a series of reports and charts to help you make sense of it. I usually only use it when editing my file or occasionally when I need to check something.

Example screenshot taken from https://beancount.github.io/fava/

For quite some time I had this as a --user level systemd service, enabled to run on startup. This worked fine, but I noticed recently that the process seems to consume 2% CPU at all times. Not a big deal in the grand scheme of things but battery life on my laptop comes at a premium.

Fava CPU usage

This made me think, what if there is a way of starting up Fava only when I need it 🤔, sort of like the serverless compute world where resources are spun up on-demand. It was at this point I remembered reading about systemd.socket where you can activate a service when something connects on a socket. At the time I’d filed it away under the “that sounds cool but I’ll probably never use it” place in my brain, but it’s pleasing to now have an actual use case!

The setup is as follows

Create a fava.socket file under ~/.config/systemd/user - this sets up the socket.

[Unit]
Description=Fava Socket
PartOf=fava.service

[Socket]
ListenStream=127.0.0.1:5000

[Install]
WantedBy=sockets.target

Create a fava.service file under ~/.config/systemd/user - this defines the service

[Unit]
Description=Fava
After=network.target fava.socket

[Service]
Type=simple
ExecStart=fava -H 127.0.0.1 /home/djh/beancount/financials.beancount
RuntimeMaxSec=300 # kill the service after 5 minutes.

[Install]
WantedBy=default.target

Enable and start the socket + service

systemctl --user daemon-reload
systemctl --user enable fava.socket
systemctl --user start fava.socket

Go to a web browser and visit http://127.0.0.1:5000 and Fava should load.

After 5 minutes the service will be terminated, but you can start it again by refreshing the page.

Ideally I’d want the socket to terminate the service if nothing connects to it for 5 minutes but I’m not sure how to do that, hence the RuntimeMaxSec setting in the service file 😞

Better ways

You might be wondering why not just host Fava on a Raspberry Pi, it’s a web application afterall. In my case I tend to only use it when editing my beancount file which is stored locally on my Mac. Changes are committed to a git repository when I’m happy with them. The hosted version could listen to changes to this git repository and keep its local copy up to date I suppose, but I’d have to constantly create/push commits to see my changes. This approach also requires network access.

Another way would to just run fava directly on the command line when I need it, but I like the idea of requesting it on demand via a HTTP request and letting systemd handle the rest.

Anyways just thought I’d share x.

Running Linux on my Macbook

Sun, 07 Jun 2020 09:00:00 +0000

Obligatory desktop shot with nothing on it

This is another one of those posts, the one’s where Linux desktop apologists have the urge to justify to the world why they do things.

So here we go, a few weeks ago I installed Fedora 32 on my Macbook Pro (early 2015 model). In this post I hope to document the pitfalls, traps and joyous moments I found along the way, complete with the annoyances that I’ve come to tolerate.

I’ll preface this post by saying most of the issues encountered are down to the minimalist nature of the setup I’ve gone with. So don’t take this as a reflection on Fedora/Linux, I suspect the defaults with GNOME come with a lot less footguns.

Rationale

The main reason for the switch was one killer app; the i3 window manager. Keyboard shortcuts, tiled windows, lightning fast to use - it feels like a piece of software designed for people who tinker and use computers a lot.

Since installing I’ve tweaked my configuration to

Take screenshots with keyboard shortcuts similar to OSX using maim
Always open my web browser on workspace 1
Remove title bars

The scratchpad window makes note taking a joy

..and it’s generally been a joy to use. My favourite feature is the “scratchpad” where you can bring up and dismiss a window in the same workspace via a keyboard shortcut. This has been an absolute blast with some custom note taking software I wrote, which I mount via FUSE. Writing notes is a joy because the context switch is minimal.

Software

The meteoric rise of the web browser as a platform has made me realise that I don’t really use that much native GUI software anymore. All I seemed to use on OSX was a web browser, Visual Studio Code and a terminal with lots of CLI/TUI software configured via a set of dotfiles.

I’m a huge gamer at heart but I’m fortunate enough to own a beefy gaming PC and a set of consoles to meet that need, so I can’t really comment on Linux support on that side of things.

So switching OS’s isn’t really that much of a barrier for most of my use cases.

Installation

Installation was a bit of anxiety inducing to start with, for one thing I didn’t want to screw up the boot partition of my Mac just in case things went a bit wrong and it took a while to search about ways to do this cleanly.

Thankfully Alex Dzyoba wrote an excellent article on creating the appropriate partitions for dual booting.

Once that was done it mostly just seemed to work. Wi-Fi worked, sound worked, so I installed i3, applied my dotfiles and got going.

Keyboard woes

The first issue I encountered was trying to get the keyboard settings to work with the Macbook layout, especially on a GB localised keyboard and tuning it to recognise that I like Caps Lock and Ctrl to be switched.

This was solved with setxkbmap which you have to run on login:

setxkbmap -layout gb -model apple_laptop -variant mac -option "ctrl:swapcaps"

Additionally, sometimes I throw my laptop onto a desk and plug it into an external keyboard (not Apple branded) and this also needs additional tuning when I plug it in as the alt/windows keys are swapped for some reason.

setxkbmap -layout gb -model apple_laptop -variant mac \ 
-option "altwin:swap_lalt_lwin" -option "ctrl:swapcaps"

This took a while to reach those settings but I’m happy with them.

The copy and paste problem

Moving from OSX to Linux means throwing away 10 years of muscle memory on keyboard shortcuts. Cmd+C, Cmd+V for clipboard just won’t work without a lot of tinkering that just didn’t seem worth the effort to me. Pull off the band aid instead, it’s got to happen some time.

So I’ve had to train myself to go back to using Ctrl as the modifier key, which was tricky at first but it’s amazing how quickly I’ve adapted. The annoying part is having to remember to hit the Shift key when copying/pasting into terminals.

Displays

As mentioned previously I often plug in my laptop to an external monitor and like the keyboard tweaking, this took a lot of effort. For one, I position my laptop below my external monitor so like the layout to be above/below. It took me ages to try and find the decent xrandr settings to support this. Every time I tried the monitor above bled into the laptop screen below.

Eventually I found the excellent tool arandr which presents a GUI interface to generate the appropriate settings.

arandr

Unfortunately I’ve not found a way of automatically applying these settings when unplugging/plugging the monitor so I’ve had to write a script to run when switching (which also includes the setxkbmap settings described above)

xrandr --output eDP1 --primary --mode 2560x1600 --pos 640x2400 \
--rotate normal --output DP1 --scale 2x2 --mode 1920x1200 --pos 0x0 \
--rotate normal

Retina

To get everything scaling nicely on the laptop took a bit of effort but thankfully Doug Beney wrote a decent guide which was simple to implement.

Brightness tuning

It’s easy to take for granted something as simple as changing the brightness on your screen, but it took me a while to figure this out. Thankfully using the guidance in this askubuntu post and setting some i3wm config settings, I was able to make the brightness keys work roughly the way you would expect.

# Screen brightness controls
bindsym XF86MonBrightnessUp exec xbacklight -inc 5
bindsym XF86MonBrightnessDown exec xbacklight -dec 5

Please just go to sleep

Closing the lid on the laptop should suspend the OS but this just never seemed to work, I’d often find my machine in a state of hot panic the next morning with 30 minutes of battery left.

To solve this, thanks to an excellent post by Josh Sherman, you need to prevent the USB controller from waking up the system.

Unfortunately you need to apply this setting every time you boot so I wrote a systemd service and script to enable this.

Farewell Firefox

This is probably the most depressing part of switching. I’ve been a long time Firefox fan, especially with extensions like Tree Style Tabs that acted as an enabler for my tab hoarding vices.

Unfortunately on Linux it’s just been dogshit, absolute dogshit. Slow, takes ages to start, websites render in really slowly, switching between tabs feels so lethargic it’s like the fire in the fox has gone out.

I’ve tried everything to fix it, changing things in about:config, trying Firefox Nightly and enabling WebRender but nothing seemed to be working.

In contrast Google Chrome is lightning fast, it really is night and day, so for my Linux forays sadly I’ve had to go with the big G for now.

Other tools and things I’ve setup

alacritty for my terminal, very slick, fast and has great font rendering.
redshift which acts like f.lux for OSX, meaning my eyes don’t get burnt at night.
Dropbox was very easy to set up, especially when adding a systemd service (thanks to Joe Roback)
Visual Studio Code - unsurprisingly no problems there.
GIMP - works fine for screenshot editing, if a little cumbersome

Joys

After getting everything working the way I like it, it all kinda just works?

i3wm

i3wm is a blast to use, switching between workspaces, moving windows and getting used to the tiling has been a little bit of a learning curve but it’s meant I’m using my mouse much less.

The additional benefit is just how fast everything feels, it might just be a matter of perception but sometimes perceptions matter.

Me writing this blog post

Note: I’m aware of swaywm that is config compatible with i3 and runs on Wayland. I’ve tried this out and it seems neat, and it would probably solve the keyboard/monitor issues described above, but on my Retina display Chrome just looks really blurry. I’m guessing because it’s rendered via XWayland, once that’s resolved I’ll look into making the switch.

systemd

Systemd gets a lot of flak in the community but I really, really like it. I’ve already written a few services of my own that perform tasks or run software, installed under .config/systemd/user and they were trivial to write.

Getting used to the tooling has been a hill to climb but it feels so much better than the old days of init.d scripts.

DNF

I use Fedora on other machines in headless mode so I’m fairly used to the tooling, but it’s nice to have a decent package manager that generally keeps everything up to date. Homebrew on OSX is a heroic effort, but it’s just not the same.

Annoyances

WiFi sometimes drops

I’ve not been able to figure this one out, but maybe once or twice a week the WiFi driver will just stop working. To fix it I have to issue a command to reload the kernel module.

sudo modprobe -r brcmfmac && sudo modprobe brcmfmac

Webcam

I do sometimes use Skype and Zoom to communicate with family members but the webcam doesn’t work out of the box, it looks like there is a reverse engineering effort going on to remedy this, but I’ve found my iPad works as a decent video calling device, so I’ve not gotten around to fixing it.

Browser hardware video acceleration

Web browsers in Linux straight up do not support GPU video acceleration. This became apparent to me when investigating why my laptop was panting, expelling enough heat to cook an egg while watching a YouTube video about…cooking eggs.

Apparently there is a patched version of Chromium out there that supposedly supports it but for the time being it doesn’t look like the browser vendors see this a top priority. It’s a shame but oh well.

Installing VLC and the intel driver works though, so that will have to do.

OSX/Linux differences

There are a few things I immediately missed from the OSX world, but there mostly appear to be Linux equivalents or workarounds.

Screenshot editing: On OSX it’s nice to take a screenshot then immediately jump into the in-built editor to add annotations and adjustments. This can be somewhat replicated with maim+GIMP.
pbcopy/pbpaste: these are command-line tools to interact with the clipboard. The linux equivalent is xclip
Spotlight/Alfred: I only really used these as a quick calculator, and never made that much use of the file searching features. Firing up a terminal (Alt+Enter) and using bc seems to be a reasonable equivalent. I might see about binding this to a hotkey.
Notifications: Setting up dunst offers good enough desktop notification support.
Airdrop: Very infrequently I would use this to send things to my iPad - I’ve not found a suitable solution for this yet.
1Password: This really hasn’t been an issue as 1PasswordX works fine, if anything, I think it’s better!

Work

Switching to Linux in my home life suddenly presents a problem at work. The cognitive overhead of alternating between different keyboard shortcuts on different OS’s didn’t seem very appealing, along with the fact that I just knew I’d miss i3.

So to workaround this I use a virtual machine via VMware Fusion which works surprisingly well. A little too well, it feels almost native! My work machine is a MBP 2019 with 6 cores and 32GB of RAM, so it’s more than capable.

Overall

There are always compromises to be had, whether it be Firefox performance, lack of HW video decoding in the browser and having to tweak a few things, but overall it’s been a mostly positive experience to switch. I’ll admit that a lot of the issues I came across were of my own making, but it’s been worth it.

I’ve really not missed OSX that much, in fact it’s probably sealed my decision to not go with Apple next time round. The hardware is excellent, but getting Linux running on Macbook models >= 2016 sounds like an exercise in sadness so it might be the end of the road for Apple laptops in my house.

I don't know how CPUs work so I simulated one in code

Tue, 21 May 2019 09:00:00 +0000

A few months ago it dawned on me that I didn’t really understand how computers work under the hood. I still don’t understand how modern computers work.

However, after making my way through But How Do It Know? by J. Clark Scott, a book which describes the bits of a simple 8-bit computer from the NAND gates, through to the registers, RAM, bits of the CPU, ALU and I/O, I got a hankering to implement it in code.

While I’m not that interested in the physics of the circuitry, the book just about skims the surface of those waters and gives a neat overview of the wiring and how bits move around the system without the requisite electrical engineering knowledge. For me though I can’t get comfortable with book descriptions, I have to see things in action and learn from my inevitable mistakes, which led me to chart a course on the rough seas of writing a circuit in code and getting a bit weepy about it.

The fruits of my voyage can be seen in simple-computer; a simple computer that’s simple and computes things.

Example programs

It is quite a neat little thing, the CPU code is implemented as a horrific splurge of gates turning on and off but it works, I’ve unit tested it, and we all know unit tests are irrefutable proof that something works.

It handles keyboard inputs, and renders text to a display using a painstakingly crafted set of glyphs for a professional font I’ve named “Daniel Code Pro”. The only cheat bit is to get the keyboard input and display output working I had to hook up go channels to speak to the outside world via GLFW, but the rest of it is a simulated circuit.

I even wrote a crude assembler which was eye opening to say the least. It’s not perfect. Actually it’s a bit crap, but it highlighted to me the problems that other people have already solved many, many years ago and I think I’m a better person for it. Or worse, depending who you ask.

But why you do that?

“I’ve seen thirteen year old children do this in Minecraft, come back to me when you’ve built a REAL CPU out of telegraph relays”

My mental model of computing is stuck in beginner computer science textbooks, and the CPU that powers the gameboy emulator I wrote back in 2013 is really nothing like the CPUs that are running today. Even saying that, the emulator is just a state machine, it doesn’t describe the stuff at the logic gate level. You can implement most of it using just a switch statement and storing the state of the registers.

So I’m trying to get a better understanding of this stuff because I don’t know what L1/L2 caches are, I don’t know what pipelining means, I’m not entirely sure I understand the Meltdown and Spectre vulnerability papers. Someone told me they were optimising their code to make use of CPU caches, I don’t know how to verify that other than taking their word for it. I’m not really sure what all the x86 instructions mean. I don’t understand how people off-load work to a GPU or TPU. I don’t know what a TPU is. I don’t know how to make use of SIMD instructions.

But all that is built on a foundation of knowledge you need to earn your stripes for, so I ain’t gonna get there without reading the map first. Which means getting back to basics and getting my hands dirty with something simple. The “Scott Computer” described in the book is simple. That’s the reason.

Great Scott! It’s alive!

The Scott computer is an 8-bit processor attached to 256 bytes of RAM, all connected via an 8-bit system bus. It has 4 general purpose registers and can execute 17 machine instructions. Someone built a visual simulator for the web here, which is really cool, I dread to think how long it took to track all the wiring states!

A diagram outlining all the components that make up the Scott CPU
Copyright © 2009 - 2016 by Siegbert Filbinger and John Clark Scott.

The book takes you on a journey from the humble NAND gate, onto a Bit of memory, onto a register and then keeps layering on components until you end up with something resembling the above. I really recommend reading it, even if you are already familiar with the concepts because it’s quite a good overview. I don’t recommend the Kindle version though because the diagrams are sometimes hard to zoom in and decipher on a screen. A perennial problem for the Kindle in my experience.

The only thing that’s different about my computer is I upgraded it to 16-bit to have more memory to play with, as storing even just the glyphs for the ASCII table would have dwarfed most of the 8-bit machine described in the book, with not much room left for useful code.

My development journey

During development it really was just a case of reading the text, scouring the diagrams and then attempting to translate that using a general purpose programming language code and definitely not using something that’s designed for integrated circuit development. The reason why I wrote it in Go, is well, I know a bit of Go. Naysayers might chime in and say, you blithering idiot! I can’t believe you didn’t spend all your time learning VHDL or Verilog or LogSim or whatever but I’d already written my bits and bytes and NANDs by that point, I was in too deep. Maybe I’ll learn them next and weep about my time wasted, but that’s my cross to bear.

In the grand scheme of things most of the computer is just passing around a bunch of booleans, so any boolean friendly language will do the job.

Applying a schema to those booleans is what helps you (the programmer) derive its meaning, and the biggest decision anyone needs to make is decide what endianness your system is going to use and make sure all the components transfer things to and from the bus in the right order.

This was an absolute pain in the backside to implement. From the offset I opted for little endian but when testing the ALU my hair took a beating trying to work out why the numbers were coming out wrong. Many, many print statements took place on this one.

Development did take a while, maybe about a month or two during some of my free time, but once the CPU was done and successfully able to execute 2 + 2 = 5, I was happy.

Well, until the book discussed the I/O features, with designs for a simple keyboard and display interface so you can get things in and out of the machine. Well I’ve already gotten this far, no point in leaving it in a half finished state. I set myself a goal of being able to type something on a keyboard and render the letters on a display.

Peripherals

The peripherals use the adapter pattern to act as a hardware interface between the CPU and the outside world. It’s probably not a huge leap to guess this was what the software design pattern took inspiration from.

How the I/O adapters connect to a GLFW window

With this separation of concerns it was actually pretty simple to hook the other end of the keyboard and display to a window managed by GLFW. In fact I just pulled most of the code from my emulator and reshaped it a bit, using go channels to act as the signals in and out of the machine.

Bringing it to life

This was probably the most tricky part, or at least the most cumbersome. Writing assembly with such a limited instruction set sucks. Writing assembly using a crude assembler I wrote sucks even more because you can’t shake your fist at someone other than yourself.

The biggest problem was juggling the 4 registers and keeping track of them, pulling and putting stuff in memory as a temporary store. Whilst doing this I remembered the Gameboy CPU having a stack pointer register so you could push and pop state. Unfortunately this computer doesn’t have such a luxury, so I was mostly moving stuff in and out of memory on a bespoke basis.

The only pseudo instruction I took the time to implement was CALL to help calling functions, this allows you to run a function and then return to the point after the function was called. Without that stack though you can only call one level deep.

Also as the machine does not support interrupts, you have to implement awful polling code for functions like getting keyboard state. The book does discuss the steps needed to implement interrupts, but it would involve a lot more wiring.

But anyway enough of the moaning, I ended up writing four programs and most of them make use of some shared code for drawing fonts, getting keyboard input etc. Not exactly operating system material but it did make me appreciate some of the services a simple operating system might provide.

It wasn’t easy though, the trickiest part of the text-writer program was getting the maths right to work out when to go to a newline, or what happens when you hit the enter key.

main-getInput:
	CALL ROUTINE-io-pollKeyboard
	CALL ROUTINE-io-drawFontCharacter
	JMP main-getInput

The main loop for the text-writer program

I didn’t get round to implementing the backspace key either, or any of the modifier keys. Made me appreciate how much work must go in to making text editors and how tedious that probably is.

On reflection

This was a fun and very rewarding project for me. In the midst of programming in the assembly language I’d largely forgotten about the NAND, AND and OR gates firing underneath. I’d ascended into the layers of abstraction above.

While the CPU in the is very simple and a long way from what’s sitting in my laptop, I think this project has taught me a lot, namely:

How bits move around between all components using a bus
How a simple ALU works
What a simple Fetch-Decode-Execute cycle looks like
That a machine without a stack pointer register + concept of a stack sucks
That a machine without interrupts sucks
What an assembler is and does
How a peripherals communicate with a simple CPU
How simple fonts work and an approach to rendering them on a display
What a simple operating system might start to look like

So what’s next? The book said that no-one has built a computer like this since 1952, meaning I’ve got 67 years of material to brush up on, so that should keep me occupied for a while. I see the x86 manual is 4800 pages long, enough for some fun, light reading at bedtime.

Maybe I’ll have a brief dalliance with operating system stuff, a flirtation with the C language, a regrettable evening attempting to solder up a PiDP-11 kit then probably call it quits. I dunno, we’ll see.

With all seriousness though I think I’m going to start looking into RISC based stuff next, maybe RISC-V, but probably start with early RISC processors to get an understanding of the lineage. Modern CPUs have a lot more features like caches and stuff so I want to understand them as well. A lot of stuff out there to learn.

Do I need to know any of this stuff in my day job? Probably helps, but not really, but I’m enjoying it, so whatever, thanks for reading xxxx

Reed Solomon codes are cool

Sun, 24 Feb 2019 09:00:00 +0000

Imagine wending your way through a great book on your e-reader, the world melting away, and suddenly everything comes crashing back to reality with an apologetic Sorry! Chapter 20 corrupted! message.

A few tired cells of the flash storage gave up the ghost overnight and corrupted your book.

Wouldn’t it be great if your device didn’t complain about its innards, and recovered from the problem itself?

As with all blog articles with long winded opening sections, there’s a big reveal coming. Which gives me great pleasure to welcome Reed Solomon to the stage! Come on out Reed, Solomon - you too!

Recovering from corruption

sigh. I know…

The central idea of Reed Solomon codes, is you can recover data in the event of corruption or data loss up to a tolerated level of failure. This is commonly found in data storage or signal processing systems as an extra layer of protection in the event of problems arising.

It’s actually used a lot in your day to day life, according to Wikipedia:

They have many applications, the most prominent of which include consumer technologies such as CDs, DVDs, Blu-ray Discs, QR Codes, data transmission technologies such as DSL and WiMAX, broadcast systems such as satellite communications, DVB and ATSC, and storage systems such as RAID 6.

Even if a QR code gets a bit smudged in the rain, or your CD-ROM copy of Encarta ‘98 is suffering from a bout of bit rot, you can take solace in the fact that this error correction algorithm keeps the good times rolling.

Which is kind of reassuring in a way, especially with the write-once-read-many situations where getting a backup copy might involve a lot of time and money.

The Wikipedia page doesn’t mention cloud storage, I’m pretty sure Google use it as part of their file storage systems¹, so you have Reed Solomon to thank for at least one of the lines of defense that keep your cloud data durable.

It’s even being used as a safety net for storing data in DNA, which is mind blowing. If anything, it shows that if you can store something in bits, this algorithm is generic enough to provide an error correction mechanism.

What’s going on

In the storage case, you prepare your data by splitting it into fixed evenly sized data shards, then feed them to the algorithm. After some number crunching, it will return some additional parity shards. This bag of shards make up the set of data you store on disk(s).

Those parity shards, are the magic sauce that lets you recover any combination of corrupted or missing shard(s).

It just amazes me that this works! It feels like some sort of magic, but it’s actually grounded in some clever mathematics, that I won’t attempt to explain or provide a strained analogy for, other people have done it better.

It’s not limitless though, the amount of parity shards dictates the maximum number of failed shards you can recover. So you’d base this on expected level of failure.

You could say you want one parity shard for each data shard, the downside to that approach is the storage required would be double, as well as increasing the computation time needed to recover from failure. Probably overkill for not much gain.

According to this post, Backblaze uses a parity of 3 for their Vault product, which from the sounds of it is the sweet spot between balancing the risk of failure vs. processing speed.

A Vault splits data into 17 shards, and has to calculate 3 parity shards from that, so that’s the configuration we use for performance measurements. Running in a single thread on Storage Pod hardware, our library can process incoming data at 149 megabytes per second.

Yeah, seen it all before

This might be common knowledge to a lot of people but I found it an interesting detour down the Wikipedia rabbit hole. I’m not sure if I’ll ever find myself implementing something that uses this, but it’s nice to have it in the toolbox.

The Data Centre as a Computer, 1.6.2 https://www.morganclaypool.com/doi/pdf/10.2200/S00874ED3V01Y201809CAC046 ^[return]

I ported my Gameboy Color emulator to WebAssembly

Fri, 21 Sep 2018 09:00:00 +0000

Around five years ago I wrote a Gameboy Color emulator in Go. It was a very frustrating, but rewarding experience that I’ve been dining out on in job interviews ever since.

However, as the passage of time progressed, it landed on the pile of mostly-done-but-not-finished projects and left largely abandoned. One might generously say, on hiatus. Well, until very recently that is.

That 5 year gap

You see, a few weeks ago Go 1.11 came out, and with it came the promise of experimental support for compiling Go code to WebAssembly. There’s nothing one likes more than experimental APIs so this got me thinking, what could I do to test out this new WASM target?

If only I had a decently sized project written in Go that wasn’t some trivial TODO list manager 🤔

Hello, old friend

Going back to old code is like looking at old photos of yourself. So young, so naive, questionable style. Much to my surprise though, compiling the project using the new WASM target actually worked.

As in, within 5 minutes of commenting out code related to GLFW/GL calls, there was something running in the browser. Obviously, not rendering anything to the page, but there was stuff printing to the developer tools console at least to indicate the emulator was running.

This absolutely blew my mind, here was some old code, written for a non browser environment in a language not supported by browsers, running in the browser. It was exciting enough for me to blast out furious torrent of commits.

The end result being gomeboycolor WASM edition (warning: ~5mb or so download), go on, try it out, load some ROMs on it that you’ve illegally downloaded¹.

Alternatively try out the demo below, click “start” to run the emulator and “stop” to stop it. The preinstalled ROM is a test suite used to test the CPU:

Reinventing the wheel

Running emulators in the browser isn’t new and I’d imagine some people have had some fun porting other emulators using emscripten. In fact someone is doing a WASM Gameboy emulator in AssemblyScript, so I’m definitely not the first.

There are some caveats and performance issues of this WASM implementation that I will explain below if you’re still interested in my long ramblings, but it mostly works.

Except in Google Chrome, that is. Oh boy there’s trouble there, we’ll get onto that. Firefox and Safari seem to perform reasonably well though. I’ve not tried Edge.

Admit it, you're just here for the screenshots

Draw an owl ²

No one ever talks about the journey, just the destination. But this time I’ll indulge myself a little and document a few challenges I found along the way. Maybe we can all learn something. Learn something about porting Gameboy Color emulators to WASM at least.

By the way, as a side note, if you want to know what programming your own Gameboy emulator is like, @Tomek1024 paints a pretty accurate portrait, I’m impressed he managed to do it in less than two months. Embarrassingly it took me about six. Okay maybe seven. It was 2013.

Back to WebAssembly…

The first real issue was, while my previous self had endeavored to make the code very modular, with individual packages for each hardware component (e.g. CPU, GPU, memory management unit etc) there were some bits of the code base that made a lot of assumptions about its environment. These issues only presented themselves at runtime with the WASM virtual machine throwing up.

The problems were namely anything that was doing stuff around the os package, so like opening files, querying filesystems and user info. My emulator expected to read ROM files, and save battery state³ to disk. The browser WASM environment is in a sandbox and won’t let you play outside it.

So the first round of refactoring was to move the ROM-loading and save-saving layers to the outer edges and use interfaces like io.Reader further in. As a professional Java developer I should have known better, even 5 years ago, but still.

So that’s lesson 1. Limit the scope of your environment to the bits of code that need to know. Abstract elsewhere

Graphics

That was the easy bit. But no one likes playing a Gameboy with no screen, so this is where the hacks started to creep in.

Every so often the emulator emits a 2D array (frame) of pixels. The real hardware does this at a rate of 59.7 frames per second.⁴

Drawing an array of pixels seemed like a prime candidate for the HTML5 canvas API, constructing an ImageData object and repainting on every frame update. When trying this though, using the go syscall/js package, the performance was abysmal, eventually causing the browser tab to freeze. It appeared as though the having the emulator and UI on the same thread was causing a lot of contention.

Sounded like a job for threading to me, but WebAssembly doesn’t support threads (yet) so I needed to figure out something else.

Have faith in the workers

Which led me towards Web Workers.

Web Workers is a simple means for web content to run scripts in background threads. The worker thread can perform tasks without interfering with the user interface.

Perfect!

Initialising the emulator inside a worker and then using the postMessage API to post the frames to the user interface on every tick seemed a more promising approach.

There were still performance issues though, as I was sending an array of bytes out of WASM land into JS land on every frame call. While the postMessage API supports sending an extra parameter to ‘transfer’ ownership of large data sets…

Transferable objects are transferred from one context to another with a zero-copy operation, which results in a vast performance improvement when sending large data sets.

…it seemed tricky to do in Go code because the transferable parameter needs to be an ArrayBuffer ~~which doesn’t seem possible~~^{*see edit} with the API that syscall/js provides.

Instead, to workaround this, the first hack was born. Within my Go code, the array of pixels gets converted to a base64 string, which is sent to a global javascript function that issues the postMessage call. This yielded a much smoother experience in the user interface!

The hack in diagram form.

Go code:

func (s *html5CanvasDisplay) DrawFrame(screenData *types.Screen) {
    for y := 0; y < 144; y++ {
        for x := 0; x < 160; x++ {
            var pixel types.RGB = screenData[y][x]
            s.imageData[i] = pixel.Red
            s.imageData[i+1] = pixel.Green
            s.imageData[i+2] = pixel.Blue
            s.imageData[i+3] = 255
            i += 4
        }
   }

   // hack
   screenData := base64.StdEncoding.EncodeToString(s.imageData)

   // call global function
   js.Global().Call("sendScreenUpdate", screenData)
}

Note the image data is a flattened array that represents the pixel grid

Javascript in the worker:

function sendScreenUpdate(bs64) {
    // decode base 64 back to byte array
    var bytes = base64js.toByteArray(bs64)
    var buf =  new Uint8ClampedArray(bytes).buffer;

    // uses transferable on post message
    postMessage(["screen-update", buf], [buf]);
}

Decode the base64 and then send the data to the user interface

The update canvas function in the user interface:

worker.onmessage = function(e) {
    if(e.data[0] == "screen-update") {
       updateCanvas(e.data[1]);
    }
}
function updateCanvas(screenData) {
    var decodedData = new Uint8ClampedArray(screenData);
    var imageData = new ImageData(decodedData, 160, 144);
    canvasContext.putImageData(imageData, canvas.width / 4, 4);
}

Repaint the canvas context with the new frame

The curse of the workers

Moody

Unfortunately the web worker approach presented a whole new set of challenges. It was beginning to feel I was making a rod for my own back.

Gamers usually like to play games by interacting with them via key or button presses. As Web Workers are isolated, they don’t have access to all the good stuff you get in the browser like setting up handlers for keyboard events and so on. The only way you can communicate with them is by sending them letters via the postMessage call and hoping they read them.

So the next challenge was to start posting the keyboard updates back to the emulator.

The Circle

Go code:

var messageCB js.Callback
messageCB = js.NewCallback(func(args []js.Value) {
    input := args[0].Get("data")
    switch input.Index(0).String() {
        case "keyup":
            // tell emulator what key has been released 
            i.KeyHandler.KeyUp(input.Index(1).Int())
        case "keydown":
            // tell emulator what key has been pressed
            i.KeyHandler.KeyDown(input.Index(1).Int())
    }
})

// receive messages from outside
self := js.Global().Get("self")
self.Call("addEventListener", "message", messageCB, false)

Handling the messages received in the worker

Javascript code in the user interface:

let keydownhandler = function (e) {
    worker.postMessage(["keydown", e.which])
}
let keyuphandler = function (e) {
    worker.postMessage(["keyup", e.which])
}
document.addEventListener("keydown", keydownhandler);
document.addEventListener("keyup", keyuphandler);

Sending keyboard change events to the worker

This back and forth postMessage approach was working and soon ballooned into a some sort of faux protocol that

Allows users to load ROMs into the emulator by passing the byte array to the worker
Allows users to configure the emulator settings by passing configuration data to the worker
Allows the emulator to send battery save data to the UI thread, which in turn puts it in LocalStorage
- On emulator start, the save data is loaded from LocalStorage and sent to the emulator
Allows the emulator to send a diagnostic frame rate counter back to the user interface

I’ll be honest, it was a royal pain in the backside to write and is probably very fragile and prone to breakages. There would be more arrows on this diagram to express the to-me, to-you handshaking but frankly I’ve had enough of diagrams.

Imagine more arrows

Performance

That was the implementation side of things, there was a lot of deck chair rearranging elsewhere, but most of the effort was in getting stuff in and out of the emulator.

Performance was still a problem though, the framerate was choppy and the keyboard input mechanism was unreliable, with some presses being skipped, making the games feel spongy and not fun to play.

To try and isolate whether the problem was due to WebAssembly performance, I introduced a “headless” mode that stops sending the screen data on every draw frame call. This was to try and remove the whole web worker → UI dance from the equation and just see how well the emulator can run.

The following tests were performed using these browsers on OSX 10.13.6:

Chrome 69.0.3497.100
Firefox 63.0b4
Safari 12.0 (13606.2.11)

Using the Game Tetris DX, running for 60 seconds, these were the results:

As you can see, Safari runs a pretty smooth shop. Firefox was more erratic but the emulator kept on pausing every so often so that would account for the drops. Chrome, while a fairly straight line, didn’t even make it past 20fps.

So WASM performance, at least on Firefox and Safari seemed pretty reasonable.

Repeating the same test with headless mode turned off, there was a definite 10 frame or so performance penalty, probably owing to the base64 conversions and passing messages bit. Oddly Firefox didn’t seem to pause at all during this test.

The problem with the unreliable keypresses was still there though, sometimes the emulator just wasn’t responding at all to anything. My hunch at the time was, as there is no threading in WASM yet, there’s probably a lot of spinning plates going on around handling postMessage callbacks. I don’t know how the browser internals work on this one, but I’d imagine they have timers and stuff going on that poll for updates.

So the next logical step was to see about slowing the emulator down by locking the frame rate to a maximum fixed value. The reasoning being, a slower emulator might give the browser more headroom to do its thing. I chose a 30fps lock for this test

A marked improvement in stability! Plus, my hunch was right, the keyboard was much much more reliable. Chrome continued to be in the doldrums though and still wasn’t acknowledging input. Redoing the test with a frame rate lock of 25fps just about made the keyboard work for Chrome, but it made for pretty choppy visuals.

Conclusion

Good god, that was a long post. Sorry.

This little experiment made for a fun ride. I somewhat suspect the use of web workers in this manner is definitely not what that feature was designed for. It’s doubtful that many people would build video games in the browser in this way. A better approach might be to use WebGL, but my mental, physical and emotional strength is not there right now to open that can of worms.

The coolest thing about this, and maybe the promise of WASM in general, is I can send my emulator to my friends without having to worry about whether they have shared libraries on their system. I don’t have to spin up a bunch of infrastructure to build versions for different operating systems⁵. It just works, and I’m sure it’s going to get better and better as WASM develops.

On the Chrome front? I don’t know why the performance just isn’t there for this use case. The biggest surprise to me was actually Safari, I’m a born and bred Firefox boy and Safari never really made it into my esteemed browser list. Good job Apple 👍

If I were to do this again I’d probably take a second look at wasmBoy to see how they’re doing it, it doesn’t look like they are using web workers, and are using HTML5 canvas to render the output so I’ve probably made a few missteps somewhere.

If you want to see the code, I have a few repos. A benefit of doing refactoring work was it allowed me to decouple the ‘frontend’ and ‘backend’ bits of the codebase. So anyone can write a frontend that handles the screen display and keyboard controls which hooks into the backend where the emulator logic is. The frontends I’ve written so far can be found here:

gomeboycolor-wasm - The WASM version described in this blog post
gomeboycolor-glfw - This is what my emulator was originally written for, and uses GLFW to render the screen to a window
gomeboycolor/_examples - For fun I wrote an example frontend that renders in the terminal. It’s playable to a degree!

Thank you for reading, happy WASM’ing.

UPDATE - 2018-09-22: Since this post was written, Johan Brandhorst on the gophers slack #webassembly channel gave me some tips on how to avoid the base64 hack with some tweaks to the Go code, turns out it is possible to use transferable on the postMessage call after all - see this commit for the changes!

Nintendo have been very active recently shutting down ROM sites, and threatening legal action. So I’m not going to risk hosting the ROMs myself. ^[return]
http://sethgodin.typepad.com/.a/6a00d83451b31569e2019aff29b7cd970c-800wi ^[return]
Save data was written to memory and kept active via a battery. Which was cool until the battery ran out of power… ^[return]
https://en.wikipedia.org/wiki/Game_Boy#Technical_specifications ^[return]
While Go supports cross compiling, it gets tricky if you have CGO bindings to shared libraries like libSDL or libGLFW ^[return]

I thought I found a browser security bug

Sun, 12 Aug 2018 09:00:00 +0000

A few weeks ago I thought I’d stumbled across something really bad when just casually browsing the web. It all started on a financial information website, upon clicking a link, the page partially loaded some of its content, then, without warning, redirected the browser to a completely different domain with some weird spam/search engine content on it, from a known domain squatter.

Strange…

After refreshing a few times, it was still doing it. Oddly, this behaviour seemed to only appear in Firefox; Chrome and Safari did not exhibit the same. This was a Firefox thing, I was sure of it!

After digging into the source of the affected website, it became apparent that something seemed off with this tag for flash content:

<embed 
  id="button" 
  src="http:///zclip/js/ZeroClipboard.swf" 
  type="application/x-shockwave-flash">
embed>

Always host your own

They had hardcoded a URL to a SWF file on a domain since taken over by a squatter. A quick look on waybackmachine suggested the previous owner was a developer hosting some code they had written, but had since let the domain expire.

So instead, the browser gets whatever is now being served. This is bad in itself because they could craft a malicious SWF file to do nefarious things if they were really inclined.

However, even if they did return a SWF file, it didn’t explain why the redirect was occurring, because I don’t have Flash installed on my computer.

Flash or no flash

This is where I begun to think this was a security issue, because the response for the ‘swf’ request was HTML with some javascript in it, with the smoking gun being right here:

if (top.location != location){
  top.location.href = location.protocol + "//" 
                        + location.host 
                        + location.pathname;
}

Get the browser to it!

Setting top.location.href was causing the redirect to happen.

So wait a minute, the affected website had an tag in it, expecting some flash content, but got served some HTML+Javascript instead, which it embedded and executed? That seemed weird fallback behaviour to me.

I raised this on bugzilla because at the time my initial thought was this was only affecting Firefox.

After writing a small application to mimic the issue, the browsers behaved differently when it came to setting the type to application/x-shockwave-flash¹

Chrome displayed a “click here to run flash” placeholder
- After enabling flash, it did not embed the response (i.e. it did nothing)
Firefox embedded the response so executed the javascript
Safari displayed a “Missing Plug-in” placeholder

Examples of the placeholders in Safari/Chrome

After further investigation it turns out the “problem” can be replicated in Chrome too, using other MIME types like type="video/webm". In this particular case the behaviour changed slightly

Chrome embedded the response so executed the javascript
Firefox embedded the response so executed the javascript
Safari displayed a “Missing Plug-in” placeholder

I tried this with a few different MIME types and got differing results. You can see this in action on this demo page I built. Try it out on different browsers to see how they behave. Note if you have Flash or Quicktime installed, then you will see different results.

Embed ain’t so simple

One of the purposes of the tag is to allow third party plug-ins such as Flash or Quicktime to be embedded into a page, but as these have fallen out of favour over the years it appears the general advice is “don’t use it”

Keep in mind that most modern browsers have deprecated and removed support for browser plug-ins, so relying upon is generally not wise if you want your site to be operable on the average user’s browser²

However, clearly the browsers seem to behave differently with the type attribute, and how they handle the response. While Safari seems to be staunchly against doing anything without a plugin installed for the type, Firefox and Chrome are a bit more of a mixed bag.

Results

The firefox bug engendered some really interesting discussion, and their engineers implemented a patch to change this behaviour in Firefox, possibly scheduled for version 63! They also raised a spec bug on the HTML standards github around this to get some clarity on this issue. I thought that was really cool!

On the Chrome side of things, I filed a similar report, but they closed it, saying it was a larger problem with the web in general. Very polite and curt response, but fair enough.

You are correct that there can be a vector for crypto-miners, or possibly deceptive messaging to a user, but that is a larger problem on the web. When content is loaded there are signals as to what type of content it is, but we don’t constrain it based the the extension of the resource file (i.e. ‘.webm’).

As for the financial website? I eventually got a contact in their information security department and filed the issue with them. They’ve since fixed it by removing the hardcoded dependency on that domain and are hosting the SWF file themselves.

Is it a security bug?

I started out this journey just bumbling around on the web, and ended up encountering some behaviour that didn’t seem right.

But after thinking about it, I’m in two minds.

On the one hand, the tag is doing its job, embedding stuff. The question is, if you specify the type attribute with a valid MIME type, what should the browser do, and what should the fallback case be if the requested plugin cannot be handled or is missing?

You could argue that this is a non-issue that just stems from questionable coding practises, especially around hardcoding dependencies to third party sites that you cannot trust.³ However there is a bit of me that wonders if there are dark corners of websites out there that are embedding video⁴ or flash content from domains that have since changed owners, or started returning dodgy responses.

Clearly the domain squatter is setting top.location.href for a reason, which makes me think they already know about this behaviour and are using it as a way to drive the browser into redirecting to their content. Other nefarious actors could use this ‘feature’ to silently embed cryptominers, or craft redirects to phishing websites, but it’s quite an involved process that relies on the ‘victims’ website embedding content from elsewhere, so probably not a high risk.

It was an interesting ride anyway.

UPDATE - 2018-08-13: The Chromium team kindly relaxed the viewing permissions on the bug I had filed (it was in their security space which is understandably restricted), so I’ve updated the post to include the link

I didn’t have time to test Microsoft Edge, or Opera. It’s safe to assume Internet Explorer has weird behaviour though. ^[return]
Quoted from MDN web docs - : The Embed External Content element ^[return]
The JavaScript Supply Chain Paradox: SRI, CSP and Trust in Third Party Libraries - troyhunt.com ^[return]
Given the general advice for audio/video is to use the HTML5 media elements, this is probably a legacy issue. ^[return]

Using HyperLogLog in production, a retrospective

Thu, 29 Mar 2018 12:00:00 +0000

A few years ago I was involved in a project that required us to provide a time series metric on how many concurrent users were using our products, and what quality of service they were receiving.

On embarking on this journey, it quickly became apparent that the tricky part would be doing the count of unique concurrent users, over a set of dimensions in one minute windows. We’d run into the classic count-distinct problem.

Our dataset was stored in Amazon S3, with PrestoDB as our query backend, and the query we were running was slow and difficult to scale. As you can imagine, running count(distinct) on thousands of anonymous session IDs across millions of records isn’t an easy problem to solve. Well, until someone pointed us in the direction of the approx_distinct function.

tl;dr: We tried using HyperLogLog over fine grained time series data in production, but at the time it was not working as well as we’d hoped. I wanted to explore HyperLogLog further though as I thought it was cool, so spent some time building HyperLogLog Playground, a website to help visualise what this cool algorithm is doing under the hood.

By trading away some degree of accuracy, approx_distinct gave us a result in a much quicker time, with what looked like a reasonable trade off for production use. However, after a few weeks, it was becoming obvious that, while the speed of the query was satisfactory, sometimes the result added noise. You would see that over one minute windows, the line on our graphs would deviate in a sporadic fashion.

This meant users were questioning the validity of the data, was it the error in the approximation, or were there user impacting problems? In this case, at the time it seemed the approximate count was the wrong choice for fine grained time series analysis, and meant we had to pivot towards a streaming data pipeline that would give a fully accurate picture, at the cost of latency, added infrastructure and complexity.

Since then, it looks like the Presto contributors implemented a change to increase the precision options of approx_distinct, so this might have alleviated our noise problem, but we had moved on by that point. Our users liked having a near real time, accurate figure from the new pipeline.

So while, at the time, approx_distinct did not work for us, I was still intrigued by what approx_distinct was actually doing. What made it so quick? How could it get so close to the actual figure? This line of questioning inspired me to create a small website called HyperLogLog Playground where I could somewhat consolidate my understanding, while learning a bit of Javascript on the way. It was a fun exercise and I hope people like it.

Whenever you are presented with a count distinct problem, it’s always worth keeping HyperLogLog in mind as a tool to use, but just be 100% sure that the small loss of accuracy is tolerable, and for time series data make sure you test it against a variety of scenarios.

Running Go AWS Lambda functions locally

Sat, 27 Jan 2018 12:00:00 +0000

AWS recently announced Go support for Lambda, giving developers more choice over how their functions are written.

In an attempt to kick the tires of the new runtime, I found myself rummaging around the open source library required when writing Lambda functions in Go, and was delighted to find a glimpse into what happens when your function is invoked.

This post is a brief tour of what I’ve gathered, and describes a simple way of invoking your function in a local environment.

tl;dr: AWS Go Lambdas are invoked using net/rpc over TCP and make use of the Go standard library. You can “simulate” a lambda being invoked, which could be useful for integration tests or sanity checking, see below for an example

Background

A Go lambda needs two things to run

A handler to handle requests
A main function that calls lambda.Start(...) with your handler as an argument

package main

import (
    "strings"
    "github.com/aws/aws-lambda-go/lambda"
)
 
func ToUpperHandler(input string) (string, error) {
    return strings.ToUpper(input), nil
}
 
func main() {
    lambda.Start(ToUpperHandler)
}

The simplest of functions that converts a string to uppercase

lambda.Start(ToUpperHandler) is where the magic happens, it is a blocking call that will block until the process is killed or an exception is propagated that cannot be handled.

This gives us our first clue that AWS isn’t simply just running your go binary every time a function is invoked, it’s actively listening for requests and passing them to your handler.

Doing it this way has a few performance benefits, it allows you to set up expensive, thread safe dependencies up front so they are warm if your function is called more than once¹.

Looking underneath

Taking a look at the code under the hood, we can see that it:

Creates a TCP server listening on a port defined by the environment variable _LAMBDA_SERVER_PORT
Uses the net/rpc package to handle Ping and Invoke requests from remote clients
Uses the context package to store and manage state
Uses the encoding/json package to perform SerDe for objects you pass to and from your function

The use of net/rpc is really neat in a way, just plain old Go standard library code.

Remote calls

The two methods you can call via RPC are in function.go and they allow remote clients to:

Perform Ping requests by sending a *messages.PingRequest object, and unsurprisingly this does exactly what it says on the tin. I’m assuming AWS use this to check liveliness of your function and whether it is still reachable wherever they are hosting it.
Perform Invoke requests by sending a *messages.InvokeRequest object,
- I’m wondering what the Deadline attribute is for. At one point in the execution path, they use whatever this is set to with the context.WithDeadline function, which leads me to believe this might be the timeout you have configured against your lambda.

These functions will be called by something in AWS that is managing the lifetime of an invocation.

Diagram

To help visualise this, I drew this crude, overly simplified diagram that describes the above interactions. Obviously the AWS Lambda service has a lot of components and infrastructure that we are not privy to, but I think conceptually it’s mostly right.

Testing locally

As the lambda is just listening on a port over TCP, it’s pretty simple to test the above behaviour locally.

By forcing the lambda to run on a known port²

_LAMBDA_SERVER_PORT=8001 go run lambda.go

And then writing a client to submit a InvokeRequest to it, you can successfully execute the function end-to-end, which might be useful for integration testing or whatever.

$ go run client.go 8001 "\"daniel\""
"DANIEL"

I’ve created a small library imaginatively called go-lambda-invoke that wraps up this logic, meaning you can just make the following call in your code

response, err := golambdainvoke.Run(8001, "daniel")

It probably has limited uses, in most cases just writing plain old unit tests for your logic should be sufficient, rather than testing the scaffolding AWS erects around it. However I could see it being useful if you want to test that you’ve built a valid linux binary and perform some pre-deploy sanity tests or something.

Conclusion

Speculating over what AWS are doing under the hood with Lambda is a pastime that has circled the internet ever since it launched, and this post doesn’t really reveal much to answer that question. Is it containers? Who knows. Probably.

But I think looking at how the Go programming model works, a plain old TCP server that allows RPC clients to connect to it, gives us at least a small glimpse into the interactions AWS are performing with your application.

See Using Global State to Maximize Performance ^[return]
Note the environment variable _LAMBDA_SERVER_PORT might change between library versions ^[return]