quanttype

Software for myself

Sun, 15 Mar 2026 00:00:00 +0000

Thanks to the coding agents, it’s easier than ever to create small pieces of software.

I’ve been creating small apps, tools, and toys for myself. Last week I posted about Goblin Mode, my tool for spinning up development VMs. Let’s take a look at what else I’ve made!

Small web apps

I made a photo gallery for sharing my photos online. The app is called kuvasivu (Finnish for “picture page”) and it’s implemented in Rust. I’m self-hosting it on a Hetzner virtual server.

I’ve been meaning to put more of my photos online, but there hasn’t been an obvious place for them. There are dozens of decent photo gallery projects and services, but it felt easier to just prompt one into existence and self-host it.¹

I also made a web app for scheduling meetings with my friends. You create a poll with a few dates as options and everyone fills in their availability. If anyone still remembers Doodle, it’s like that but it does not have ads. It’s called beet-scheduler and it, too, is implemented in Rust.

Again, there are plenty of alternatives out there. You could use Tapaaminen.net (a service) for example. It’s probably AI-free.

Also, I made a dashboard that shows how often I climb and run. I record my runs on SmashRun and my climbs in a spreadsheet. The dashboard downloads the data directly from them. I like to use it to check that I’m exercising the Goldilocks amount.

Small games

The coding agents are pretty good at one-shotting small games. I like to try out new models by asking them to create a snake game.

I tried to create a QWOP-like game for paddling a kayak. Back in 2023, I did a nine-day kayaking trip on Lake Saimaa. It was an emotional rollercoaster, type 2 fun, and I wanted to make a game to commemorate it.

Unfortunately I didn’t get the LLM to understand how kayaks work quickly enough, and I couldn’t get the controls to click. The game was certainly frustrating but not especially fun. Type 3 fun maybe?

Implementing research

A while ago I heard about a data structure called Bw^e tree. It’s an evolution of B+ tree that is optimized for the current storage solutions. You could use it to implement a database index, for example.

The paper was published by Alibaba and they’re using Bw^e trees in production in some of their database services, but they haven’t open-sourced their implementation.

I thought: why don’t I simply prompt an implementation into existence? A couple of days with Claude Code and Opus 4.5 (this was in January) and I got some 10k lines of Rust. It even has YCSB-based benchmarks where it beats RocksDB, just like the paper said it would!

However… I don’t know how to verify the implementation. I have skimmed the paper, but I don’t have an in-depth understanding of it, and 10k lines is a lot of code. For example, the structural modification operations like page split look subtle and they’re crucial for correct concurrent operation.

According to the paper, Alibaba’s C++ implementation is 33k lines, which makes the 10k number suspiciously low.

I’m unlikely to publish the code as I don’t know what’s in there. If you need it, spend two days with Claude, or implement it by hand. You’ll do a better job.

Speeding up compression algorithms

pi-autoresearch is an autoresearch plugin for the Pi agent. It prompts the agent to autonomously experiment to improve some numeric metric about the codebase.

That seems cool, so I tried it on floatbungler. Last year, I was studying lightweight float compression algorithms like Gorilla. I implemented a few of them (by hand! I wanted to understand them!) and put them into a Python library.
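For a flavour of what Gorilla-style algorithms work with: they XOR the 64-bit pattern of each value with that of the previous value and then store only the non-zero middle part of the result. The sketch below is not floatbungler’s code. The function names are made up for illustration, and it stops before the actual bit packing; it only computes the XOR and the leading and trailing zero counts an encoder would use.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_stream(values):
    """Yield (xor, leading_zeros, trailing_zeros) for consecutive values.

    Hypothetical helper: a real encoder would go on to pack these
    numbers into a bit stream, which is left out here.
    """
    prev = None
    for value in values:
        bits = float_to_bits(value)
        if prev is not None:
            xor = bits ^ prev
            if xor == 0:
                # Identical values: nothing but zeros to store.
                yield (0, 64, 64)
            else:
                leading = 64 - xor.bit_length()
                # xor & -xor isolates the lowest set bit.
                trailing = (xor & -xor).bit_length() - 1
                yield (xor, leading, trailing)
        prev = bits
```

Similar consecutive values give an XOR with long runs of zeros at both ends, and that is where the compression comes from.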

I didn’t consider the performance at all when I wrote them, so I figured I could unleash pi-autoresearch on the codebase and it would find ways to speed them up. It sort of worked. You can see an example of what it did for Chimp128. It turned the code into a mess but the runtime benchmark did improve almost 3x.

There’s no need to make floatbungler faster, so I’m not going to merge the changes. Better keep it simple and understandable in case I want to look at it again.

Playing around with pi-autoresearch revealed some flaws in the implementations and the test suite, so I did fix those. The coding agents seem to be terrible at debugging the library, possibly because I haven’t built any debugging tools.

I did not notice attempts to cheat. The agent did try some changes that broke the algorithm, but after the tests failed, it rolled them back.

In conclusion

Crafting great products remains hard, but there’s a lot of fun in making software that’s good enough for exactly one person: yourself.

¹ I’ve worked with software engineers who seem to have an urge to rewrite all the code they work on into their own style. Maybe I’m becoming one of them.

Goblin Mode, or spinning up VMs for feral agents

Thu, 05 Mar 2026 00:00:00 +0000

I’ve been working on a new tool for spinning up development environments. It’s called Goblin Mode. It spins up a new virtual machine (VM) on Hetzner and configures everything to be ready so that you can just SSH in and start developing.

It is something that I’ve wanted to have for a while now, but the arrival of the coding agents got me to implement it.

The agents like Claude Code and Codex are excellent. I’d like to run them in the YOLO mode / --dangerously-skip-permissions where they do not ask you for permission to run commands, write files, search the web, and so on.

However, my laptop has too much ambient authority for that. I’m logged in to all kinds of services and I’ve got a lot of important data there. I don’t want an agent to mess with it. A potential – but by no means the only – solution is to use a separate VM with just enough authority to get the job done. The agents can then go goblin mode there.

I created an alpha version of the tool and I’ve been using it for a while. It’s too specific to me and my workflows to be published right now, but it’s still interesting enough to talk about.

How does it work?

I had two goals:

The environment should not need any manual setup. Running gob up should be all you need to receive an environment that works.
It should be cheap to run. By “cheap” I mean something like a few euros per month for my typical use. For commercial development, a much higher price would be acceptable, but I want to use this for my hobby projects, too.

I ended up with a command-line tool implemented in Rust. When you run gob up, it does a few things:

It spins up a project-specific VM on Hetzner. Hetzner’s CX43 instances are available for less than two cents per hour and they’re beefy enough for compiling Rust projects. This is cheap enough for me not to have to think about it unless I leave the box running all the time.
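To make the “cheap enough” claim concrete, here is a back-of-the-envelope calculation. It takes two cents per hour as an upper bound on the price and assumes the box is deleted whenever it is not in use; the ten hours per week is an invented usage figure, not a measurement.

```python
# Upper bound: the instances cost "less than two cents per hour".
HOURLY_PRICE_EUR = 0.02


def monthly_cost(hours_per_week: float, weeks: float = 4.0) -> float:
    """Upper-bound monthly cost in euros for a VM deleted when idle."""
    return round(hours_per_week * weeks * HOURLY_PRICE_EUR, 2)


# A box left running for a whole 30-day month, for comparison.
ALWAYS_ON_HOURS = 24 * 30
always_on_cost = round(ALWAYS_ON_HOURS * HOURLY_PRICE_EUR, 2)
```

With these assumptions, ten hours of hobby hacking per week stays under one euro per month, while a forgotten box costs up to about 14 euros.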

It connects the box to Tailscale. Tailscale is a VPN solution that is really nice to use. Thanks to its MagicDNS, the box gets an easy-to-remember hostname that I can SSH to. For the projects that launch network servers, Goblin Mode uses Tailscale Serve to expose the service over HTTPS. This is great for developing web applications.

It installs the toolchain. Goblin Mode detects whether the project is a Python project or a Rust project (my two main programming languages) and installs the appropriate tools. Some tools like git always get installed, and you can add extra packages via a config file.

This is implemented with cloud-init. It’s a standard YAML config file for newly provisioned servers. Many cloud providers, Hetzner included, support it. Goblin Mode generates a list of packages to install and commands to run and adds them to cloud-init when launching the VM.
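As a rough illustration of the idea (Goblin Mode itself is written in Rust and this is not its code), the sketch below detects the project type from marker files and renders a cloud-config document. The marker-to-package mapping and all function names are assumptions made up for the example; packages, package_update and runcmd are standard cloud-init keys.

```python
import json
from pathlib import Path

BASE_PACKAGES = ["git"]  # git always gets installed

# Hypothetical mapping: marker file -> (apt packages, shell commands).
TOOLCHAINS = {
    "Cargo.toml": (
        ["build-essential"],
        ["curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y"],
    ),
    "pyproject.toml": (["python3", "python3-venv"], []),
}


def detect_toolchains(project_dir: Path) -> list[str]:
    """Return the marker files that exist in the project root."""
    return [name for name in TOOLCHAINS if (project_dir / name).is_file()]


def render_cloud_config(project_dir: Path, extra_packages=()) -> str:
    """Build a cloud-config document listing packages and commands."""
    packages = list(BASE_PACKAGES)
    commands = []
    for marker in detect_toolchains(project_dir):
        pkgs, cmds = TOOLCHAINS[marker]
        packages += pkgs
        commands += cmds
    packages += list(extra_packages)
    config = {"package_update": True, "packages": packages, "runcmd": commands}
    # JSON is valid YAML, so the standard library is enough here.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"
```

The resulting string is what you would pass as user data when creating the server.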

What is a bit annoying about this is that there are so many ways to install packages. I run Debian and you can get many but not all packages via apt. Some packages, like Claude Code, are best installed by curl | bash and for others I resort to cargo-binstall. It would be nice to have a single solution - is Nix the answer?

It sets up dotfiles. Can you imagine doing anything without your custom zsh config? Me neither.

In practice this is implemented by cloning your dotfiles git repository on the VM and running the included installation script in it.

It sets up the git repo. You’re expected to run gob up in the git repository for the project you’re developing. Goblin Mode sets up a new repo on the VM, pushes the contents of the local repo there and copies the configuration for remotes over.
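One possible way to do this with plain git and SSH is sketched below. This is an assumption about how such a sync could work, not a description of what Goblin Mode actually runs: the function name is invented, it only builds the commands instead of executing them, and it ignores quoting of paths with spaces. The receive.denyCurrentBranch setting is there because git by default refuses pushes into the checked-out branch of a non-bare repository.

```python
def plan_repo_sync(
    host: str, project: str, branch: str, remotes: dict[str, str]
) -> list[list[str]]:
    """Return the commands (as argv lists) that mirror a local repo onto a VM."""
    remote_git = ["ssh", host, "git", "-C", project]
    plan = [
        # Create the repo on the VM with the same branch name as locally.
        ["ssh", host, "git", "init", f"--initial-branch={branch}", project],
        # Let pushes update the checked-out working tree on the VM.
        remote_git + ["config", "receive.denyCurrentBranch", "updateInstead"],
        # Push the local HEAD; host:path is relative to the home directory.
        ["git", "push", f"{host}:{project}", f"HEAD:refs/heads/{branch}"],
    ]
    # Copy the remote configuration over, one remote at a time.
    for name, url in remotes.items():
        plan.append(remote_git + ["remote", "add", name, url])
    return plan
```

Each entry could be handed to subprocess.run in order; the local remotes would come from something like git remote -v.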

This has been enough for my projects right now. I’ve added a few convenience commands, too:

gob mosh to mosh to the VM - in practice I always use mosh instead of SSH
gob zed to open the project on the VM in Zed via its remote development support.

Problems

Spinning up VMs is slow. gob up takes something like two minutes. It’s a bit annoying - you think “alright, I’m going to work a bit on project X” and then you have to wait for a few minutes while the server boots up.

I’m not sure if there’s any great solution to this that does not involve running more capacity than you need.

Also, while Hetzner is great, you aren’t guaranteed to get a cheap cloud VM when you need it. I’ve seen a bunch of these:

Error: Failed to create server (412): {
 "error": {
  "code": "resource_unavailable",
  "message": "error during placement",
  "details": {}
 }
}

Logging in to Claude Code is annoying. If you want to use Claude Code, you’ll have to go through the browser-based flow to log in. I’m not sure if this could be automated. Likely yes if you use the API; not sure if you use a Claude subscription.

Restricting GitHub permissions is annoying. GitHub does not make it too easy to grant just the right amount of permissions to the development VM. Ideally the VM would be able to check out the project, work with the issues and create PRs, but not merge to main. With fine-grained personal access tokens (PATs) and bot accounts this should be possible, assuming you have a paid-for GitHub plan, but you can’t create PATs via GitHub’s API.

For now my solution has been to use a self-hosted Forgejo instead of GitHub. It’s not ideal, but it has enabled me to experiment without fear.

Product management is still needed. I had a bright idea that I’ll just dump my improvement ideas into Forgejo issues and let the agents work on them. I created a script that loops through the issues and prompts an agent to solve each one and to create a PR.
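A loop like that can be very small. The sketch below is a reconstruction of the idea, not the actual script: fetching the issues is left out, the issue dictionaries are assumed to have number, title and body fields, and the function names and prompt wording are invented. It assumes the claude command is on the PATH and uses its non-interactive -p mode.

```python
import subprocess


def build_prompt(issue: dict) -> str:
    """Turn one issue into a prompt for the coding agent."""
    return (
        f"Solve issue #{issue['number']}: {issue['title']}\n\n"
        f"{issue.get('body') or ''}\n\n"
        "Work on a new branch and open a pull request when done."
    )


def run_agent(prompt: str) -> int:
    """Run Claude Code non-interactively and return its exit code."""
    return subprocess.run(
        ["claude", "-p", prompt, "--dangerously-skip-permissions"]
    ).returncode


def work_through(issues, runner=run_agent):
    """Prompt the agent once per issue and collect the exit codes."""
    return {issue["number"]: runner(build_prompt(issue)) for issue in issues}
```

The runner parameter makes it possible to try the loop with a fake agent before letting the real one loose on the backlog.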

A few hours later I had a dozen PRs and 5k lines of Rust to review. The PRs were good but I quickly realized that my ideas were half-assed. They were for nice-to-haves that could be great one day but that I did not yet need. I didn’t want to maintain the code, so I left most of the PRs unmerged.

So, yeah, you still need to have a proper vision.

Successes

I’ve enjoyed the tool more than I expected. It’s nice when you can spin up a new environment with one command and everything actually works (modulo Claude Code auth). It has also had a couple of benefits that I didn’t expect:

If you want to demo a web service to your friends, you can expose it to the Internet with Tailscale Funnel. Start the service in tmux, leave the development VM running, close your laptop, and your friends can still access it. It’s not a great production setup, but it’s excellent for trying things out.
If you need more capacity, provision a bigger VM. My laptop has a paltry 16 GB of RAM and I needed more for some data crunching, so I launched a big VM and got the job done.

I know of at least four projects with similar goals:

devcontainers use Docker containers to create an ephemeral dev environment locally.
gmab (Give Me A Box) is a CLI tool that spins up ephemeral cloud boxes.
exe.dev is a neat commercial service that gives you virtual machines for development. My project is sort of “we have exe.dev at home”.
Sprites by Fly.io is another commercial take on the same idea.

Let's automate our jobs

Wed, 25 Feb 2026 00:00:00 +0000

Thanks to the coding agents like Claude Code, programming is now over. It’s more efficient to prompt an AI model to write code than it is to write it by hand. However, programming is just one of the many tasks that a software engineer has to take care of, albeit a central one. What about the other tasks? Can they be handled by an AI agent?

The software delivery loop

When you work in a company that produces software, the software delivery works something like this:

You start with some business problem to solve. You refine that into technical requirements for building the software that can provide the solution.
You build some code that matches the requirements and you test it.
You deliver the software to your users, often by deploying to production or by pushing an update to an app store.
You gather data on user behavior, customer feedback, business results and technical issues like bugs, production incidents, and performance problems. This gives you some new business requirements. You go back to step 1.

Claude Code can do a lot for step 2. It cannot yet take care of it completely – right now, there’s a lot of writing prompts and reviewing results involved. As the models improve, I expect them to be able to handle more and more complex coding tasks. Let’s look at what else can be automated.

Getting the agents to complete the tasks autonomously. You shouldn’t be prompting claude manually. It should look at your GitHub issues or Jira backlog and start the work independently. Maybe you will review the PR it creates, once you’ve had your agents review it first of course.

You already can do something like this by assigning tasks to GitHub Copilot or by mentioning @claude on GitHub. But why isn’t Copilot or Claude deciding by itself what to work on?

OpenClaw is an attempt to make the agents decide by themselves. It has already led to interesting results – like an agent publishing a hit piece on a human open-source author.

Structuring big programs, architecture, and projects. The agents are not that great yet at designing big programs. Right now, fairly fine-grained tasks work the best. Maybe better models can solve this – or maybe having a hierarchy of models where your Software Architect model does the high-level design and creates tasks for the Coder models?

Steve Yegge’s Gas Town was a briefly infamous attempt at building a hierarchy of agents that can complete tasks autonomously. It was silly, but I think Yegge was on the right track and we’re going to see other takes on the same ideas. Claude Code already comes with subagents.

Running agents safely. If you’re running claude on your laptop, you’re giving it a lot of ambient authority. Are you logged into your company’s AWS production account? Great, so is Claude.

On one hand, this enables the agents to do things and get the job done. On the other hand, the agents make mistakes all the time - just yesterday someone used OpenClaw to accidentally delete their inbox.

There are all kinds of attempts to build a sandbox (including the one in Claude Code itself), but the best practices have yet to emerge.

Setting up verification. The coding agents work the best when they can programmatically verify that what they produced is what you want. They’re good at fixing compiler errors and test failures. A big question for software engineers in the near future is how to best provide the agents with these guardrails. How to go from the business requirements to verifiable technical requirements?

And, given how the agents are lazy in interesting ways, how to check that they’re not cheating with the verification?

Operations and monitoring. There’s a whole bunch of work in keeping the lights on for software that runs in production. There are alerts, performance degradations, and runtime exceptions. Traditionally a software engineer or a site reliability engineer triages the issues and devises solutions. Could an agent do the triage instead? Can they debug the issues or connect them back into the software delivery loop?

For example, if you have a data pipeline and something changes in the input data and now your ETL Python script complains about NoneTypes, can you make an agent fix it without being involved yourself?

Refining the business requirements into technical requirements. If you start from scratch, the coding agents are actually not too bad at this. They can do a lot with a simple prompt.

However, if you work in a big organization, there’s a lot of context that the agents are missing. They don’t know how to fit the new software into your architecture and tech roadmap. How are you going to give this information?

Closing the loop for user and customer feedback. Your users and your customers have things to say about your software. You probably also have metrics that show what they’re doing. You may want to drive those metrics. How could agents do it?

I’ve focused on things that could be attempted with the current AI models. Even if the models were frozen today, they’re so good that we could probably automate a lot more of software engineering work than we have already done.

The models are getting better all the time, though. For now, it’s an interesting opportunity to invent new ways of working. What it means for software engineering in the long run, that I don’t know.

Programming is over

Thu, 12 Feb 2026 00:00:00 +0000

Coding agents are so good now that they’re going to change the work of software engineers permanently. Allow me to have some takes¹.

Programming is over

Claude Code with Opus 4.5 has left a strong impression on me. It can code now. Not just a bit – it can create big chunks of decent enough code. The quality of the output is as good as that of many human developers but it works much faster. It can figure out complicated code bases, fix bugs, and add features.

This leads to my first hot take: programming is over.

By programming I mean writing and editing code by hand. More and more code will be written by models. Writing code yourself will be relegated by and large to hobbyists. It will take a while for the change to take effect across the industry, but let’s say that in 2030 it’s going to be rare for most software engineers to write code by hand. Undoubtedly there will remain some holdouts of programming-by-hand far in the future.

This is subject to a bunch of assumptions:

The models continue to improve. I don’t know how long the rapid improvement will continue, but I assume it has not yet tapered out.
The coding agents remain available for moderate prices. Maybe the financial shenanigans will crash the AI industry or maybe the leading companies will find a way to lock you in and jack up the prices.
The society stays stable. We live in tumultuous times and the AI tools themselves are part of the tumult. You know that thing where you use an AI model to run a company? A friend calls it strapping a paperclip maximizer to a paperclip maximizer.

In any case, software engineering is not yet over. If anything, I expect that more software than ever gets produced. There remains room for people who are experts in it even if the need for programming is gone.

Code review is over

The coding agents can produce a lot of code quickly. How do you know if the code does what it is supposed to do? You could review it. Programmers famously love reviewing – you never hear complaints about how slow the process is or about rubberstamped “Looks Good To Me” reviews.

Wait, no, that’s exactly what you hear about all the time. This leads to my next hot take: code review is over.

We’re not going to be reviewing all that code generated by the agents, whether the agents are operated by ourselves or our teammates. We might do some spot checks, but we will be relying on other ways of making sure it’s correct.

What are those ways going to be? People are trying to figure it out. Testing the code is going to be a big deal, no doubt about that.

People who know me know that I’ve advocated for code review for a long time. It’s because I see it as a point of collaboration and a forcing function for understandable code. If programming is theory-building, so is review. However, when it’s the agent working on the code, not you, the value of that work becomes ambiguous.

What about theory building?

Peter Naur argued in Programming as Theory Building (1985) that programming is not just about producing program text. Instead, it’s about the programmer developing insights – a theory – about how “certain affairs of the world will be handled by, or supported by, a computer program”. This enables them to explain why each part of the program is the way it is and, importantly, to modify the program.

With coding agents, we cede the production of program text to the agent. But what happens to the theory-building? More specifically:

Do we humans still need to theory-build? Is it enough for us to have a high-level theory while the agent takes care of the low-level details? Can you debug a program without having a theory of it?
Does the agent have a theory of the program? How does it form it and where does it reside? Does the agent reconstruct it every time you start a new session?

I don’t have clear answers.

A lot of debugging is like code archeology where you dig through layers of unfamiliar code. In the short term, it will become even more so: now your own code is unfamiliar and you will have to develop an understanding on the spot. In the long term, as the agents catch up, debugging might be over, too.

¹ The takes are hot, fresh off the pan. Better have them now, they might turn stale quickly.

Yearnote 2025

Wed, 07 Jan 2026 00:00:00 +0000

Welcome to 2026. It sure has been a year already in terms of world events. Nevertheless, I’m going to talk a bit about how my 2025 was, the same procedure as every year.

On the photos: there’s one photo for each month, in chronological order.

Professional life

I make my living by working as a software engineer. It’s a job that I really enjoy: building software is a lot of fun.

Early in 2025, I left my job and ventured out as an independent consultant. Most of my year was spent with a big corporate client where I worked on modernizing the tools and the infrastructure for orchestrating their data processing jobs – classic data engineering work. It was a nice project and I really enjoyed working with the team. All in all, I’d consider my first year a success.

That said, it’s not like I have figured this business out. I found the big gig by bumping into a friend at a party. That was lucky, but now I need to do it again.

I’ve been struggling a bit to explain what it is that I want to do and what it is that I excel at. In lieu of a focused pitch, here are a few things I do:

Got a Python backend that is a bit slow? Would you like to make it go faster? Maybe some Rust would help? Hit me up.
Are you using Amazon S3 or another object storage in anger? Need someone to dig deep into the details? Hit me up.

If you get me on your team, I’ll also fix your development environment¹, make your GitHub Actions pipeline’s cache work, be endlessly patient with AWS IAM problems, and foster an environment where it’s okay to not know things.

Now I’m again open to new engagements. If you would like to work with me, send me an e-mail or connect on LinkedIn and let’s talk.

Software engineering community

My main way of engaging with the software engineering community was public speaking.

I gave a lightning talk at HYTRADBOI about Why S3’s conditional writes made people excited (see also the companion blog post).
In May, I gave a talk about compressing floating point data with Gorilla at Helsinki Python Meetup.
In September at PyCon UK and again in October at PyCon Finland, I gave a talk about using Python and Rust together. I also wrote a post on setting up uv and Maturin so that they work together.

It was fun to give the talks and I’m hoping to continue it this year. I have at least one idea for a talk about data storage.

Other life events

I bought a home. I wasn’t planning for it, but the landlord decided to sell the apartment I was living in and after taking a look at the housing market, I ended up buying it. Now I get to renovate it. It would be amazing to finally live in an apartment with a dishwasher. Wish me luck.

Outdoors life

It was a weird year for me. I had no big adventures (apart from sailing from Scotland to Norway), which feels a bit disappointing. On the other hand, I climbed more than ever, skated more than ever, and ran more than ever.

It was a year of courses. I took a tour skating course, two kayaking courses, two climbing courses (introductions to lead climbing and climbing outdoors), and two first aid courses. I also acted as an assistant teacher on an introductory kayaking course.

I tend to take courses too late, so that I already know the stuff they’re going to teach. Still, the social element of learning together is enjoyable.

I did publish a couple of posts on my outdoors newsletter Small Rapids - see the archive. The most popular one was the one about ditching Spotify called You can still listen to .mp3s.

In 2026, I’m going to do a big hike again, that’s for sure.

Best of 2025

Best new album: Neko-a-Sekai by Tinyhawk & Bizzarro. They play groovy instrumental rock. It’s both a lot of fun and it leaves me in awe of their skill.
Best book read: Matara by Matias Riikonen. It’s about a group of boys spending the summer playing society that just so happens to resemble Ancient Rome. I enjoyed Riikonen’s rich language and how the book takes play seriously. (The book has not yet been translated to English.)
New favorite writer: Anne Carson. Have you read Autobiography of Red? There’s a surprising amount to it considering how little it is.

Traditional commentary on Finnish politics

In each yearnote, I express (lack of) surprise at the current cabinet of the government of Finland.

Like I predicted, prime minister Petteri Orpo’s cabinet held together through the whole year. It wasn’t because of a lack of scandals! However, Orpo has his eyes on the ball and will get through every crisis to reach his party’s policy goals.

Nevertheless, I believe that the cabinet will fall apart before the parliamentary election of 2027. The cabinet has already got massive policy changes done and there’s always a chance to score a few points in the eyes of voters by breaking up. As the election approaches, I bet both the True Finns and the Swedish People’s Party will get anxious about their popularity. Thus it’s more likely than not that the cabinet will break up before the year is over.

¹ Fixing the development environment used to mean things like making sure that the tests can be run locally and that the IDE features work. In 2026, that probably also means ensuring that the coding agents can be run reliably and safely. New challenges!

PyCon Finland 2025

Sun, 19 Oct 2025 00:00:00 +0000

After a nine-year break, PyCon Finland was back this Friday as a part of PloneConf 2025.

I took a morning train to Jyväskylä and gave my talk about extending Python with Rust. It went fine. I went through the slides so fast that I must have skipped over something, but thankfully there were a lot of great questions from the audience so we made good use of the extra time. A talk video will be uploaded eventually.

Like at PyCon UK, I bumped into a few people who already are extending Python with Rust. In the talk, I claim that it’s a pattern, a trend. The evidence for the trend keeps accumulating.

I didn’t catch many other talks, but the keynote by Patrik Lauha about Muuttolintujen kevät was a delight. It’s a mobile app that can automatically recognize bird songs from recordings, tailored for the bird species in Finland. If you hear a bird that you can’t recognize, you can record its song with your phone and the app will tell you the species.

Patrik is the scientist who has developed the machine learning model behind the app. His talk covered how the app works and how they’re using the data they’ve gathered through it. Through the citizen science style data collection, they’ve been able to make existing bird distribution models more accurate. Cool stuff!

Saturday was the sprint day. Most of the people were working on Plone and Plone-related projects, but Hugo van Kemenade was running a CPython sprint. I sat down with him and a few other people to work on CPython issues. That was nice, and a first for me. I haven’t attempted to contribute to CPython before.

The combination of frosty weather and autumn leaves made Jyväskylä look beautiful, especially around the Ylistö campus of the University of Jyväskylä. I posted a few photos on Mastodon.

It was great to see PyCon Finland happening again and I’m glad I got a chance to give a talk there. Thanks to all the organizers, sponsors, and attendees for making it happen and making it fun!

Check out Juha-Matti Santala’s conference report, too.

PyCon UK 2025

Mon, 29 Sep 2025 00:00:00 +0000

Welcoming. That’s the word that comes to my mind when I think about PyCon UK 2025.

PyCon UK is the main Python conference in the UK. This year’s edition took place a week ago - mid-September - in Manchester at Contact Theatre. I attended the conference in person and gave a talk, too.

My way to Manchester

In May, I was thinking about extending Python with Rust. I had done it at work and in a hobby project and it had turned out to be far easier and more convenient than I thought. People should know about that!

When I saw someone advertising the PyCon UK CFP, I figured that I might as well pitch a talk about it. I used to think that conferences and giving talks were not for me. However, attending Heart of Clojure made me change my mind.

To my surprise, the talk proposal was accepted. Another surprise was that it attracted a full audience.

The title of the talk is Python and Rust, a perfect pairing and it is about extending Python with Rust using PyO3, Maturin, and rustimport. The talk video is already on YouTube and the slides are here. Check it out!

If you’d like to see me present it live, you still have a chance. I’m going to give the talk again at PyCon Finland 2025 in Jyväskylä on Oct 17 - slightly improved based on feedback.

The other talks

You can fit a lot of talks in three days. What most piqued my interest were the talks about Python performance. Sasha Romijn talked about Python performance mistakes. Kolen Cheung compared Numba and JAX. Peichao Qin talked about extending Python with C++ and Pybind11. My talk contributed to this theme as well.

Out of the three keynotes, my favorite was the one by Sheena, titled Playing the long game. It was a balanced take on the LLM tools and how they might be affecting software development careers, especially the people early in their careers.

It’s a different tack, but there was a rehearsed reading of Emily Holyoake’s play Ada. I enjoyed it a lot.

The hallway track

The hallway track is the most important track in every conference.

For me, it was a chance to meet a bunch of Internet friends in person for the first time. That was great! I also met with a number of new people and talked with them about Python and Rust, and about climbing and life. That was great, too.

So what made the conference welcoming? It’s not just one thing, but the conference’s inclusion efforts seemed to pay off, as noted by keynote speaker Hynek Schlawack.

Manchester

I didn’t have a chance to see that much of Manchester, but I did see Oxford Road a lot. My hotel was on Oxford Road, as was the conference venue, as was the main bar we went to, as was Manchester Museum (they have frogs), as were the coffee places I had my morning coffee at.

After the conference, I did a walk in the Peak District near Kinder Scout. The day was stunning and the scenery beautiful. This time, I was lugging all my stuff with me, so I had to walk. The next time, it would be great to run the route and explore further. If there’s a PyCon UK in Manchester in 2026, I will do it.

uv and maturin

Fri, 12 Sep 2025 00:00:00 +0000

uv is a Python package and project management tool.¹ Maturin is a build backend for Python extension modules implemented in Rust. How to best use them together?

Why uv?

First of all, if you’ve already got Maturin, why do you need uv? There are a couple of reasons:

uv takes care of the virtualenv management.
You can use uv to install development dependencies such as pytest.

You could use another virtualenv manager such as tox. Myself, I wanted to use uv because I’m already familiar with it and it’s fast.

Setting up the project

Start by creating a new project the usual way:

maturin new my_project

Among other things, this creates a standard pyproject.toml that works with uv. If you now run uv sync, uv will build the project using Maturin’s build backend, create a virtualenv and install the package there.

However, there is a problem: when you edit your Rust source files, there’s no easy way to rebuild the package. Running uv sync will not help because uv does not know that the Rust source files affect the package.

Running maturin develop does rebuild the package, but if you do anything with uv that triggers a sync - for example, using uv run - then it will re-install the cached old version of the package.

Below, I present three options for making it work.

Option 1: Let uv handle the rebuild

You can tell uv that the package needs to rebuild whenever the Rust source changes. To do that, you’ll need to set tool.uv.cache-keys in pyproject.toml:

[tool.uv]
cache-keys = [{file = "pyproject.toml"}, {file = "Cargo.toml"}, {file = "**/*.rs"}]

Note that even if you have a mixed Python/Rust package, the package does not need to rebuild when the Python files change. uv sync by default installs the package as editable, so the Python interpreter uses the source files directly.

There’s a downside to this approach: uv sync builds the package in the release mode which yields faster code but slower build times than the debug mode used by maturin develop.

Option 2: Use maturin develop

The alternative is to not install the project package at all when running uv sync. Edit pyproject.toml:

[tool.uv]
package = false

Now if you run uv sync, it will install dependencies but not the project itself. You need to install it manually with Maturin:

maturin develop

You will have to re-run this command whenever you change the Rust files.

Option 3: Use maturin import hook

If you want to use maturin develop, but you also want the code to rebuild automatically whenever it changes, then the Maturin import hook is the way to go. It rebuilds the code if needed when you import it in Python.

Like in option 2, you need to set tool.uv.package to false. You also need to set the minimum required Python version to 3.9 or higher - the default by maturin new is 3.8:

[project]
# ...
requires-python = ">=3.9"

Then, let’s install the import hook:

uv add --dev maturin_import_hook
uv run -m maturin_import_hook site install
maturin develop

You need to install the project with maturin develop once, but after that the import hook will take care of the rebuilds.

Gotcha: If you’re on macOS and you use Python installed with Mac Homebrew, this will not work with Maturin 0.3.0. A workaround is to use uv-managed Python: uv sync --managed-python

Managing Python dependencies

You can manage Python dependencies with the usual uv commands. For example, to add pytest as a development dependency:

uv add --dev pytest

In my opinion, uv is the snappiest and the easiest-to-use package management tool for Python right now and it’s the one that I recommend to most people. I’ve previously recommended Poetry, but uv has surpassed it in features and it was always much faster. ↩︎

Compressing floating point data with Gorilla

Mon, 16 Jun 2025 00:00:00 +0000

Last week, I gave a talk titled Compressing floating point data with Gorilla at Helsinki Python Meetup. There’s no recording, but here is a blog version of the slides.

Gorilla is a simple algorithm for compressing floats. It’s over a decade old and it’s not the state of the art anymore but since it sparked the development of a whole family of algorithms, it’s a great starting point for learning about this topic.

I had a lot of fun learning about this and I figured others would enjoy it as well. You get to learn a bit about floats, time series data, and data compression. I hope you like it!

Actually, don’t just read this blog, go subscribe!

The talk title mentions the Gorilla algorithm, but the actual Gorilla was a time series database developed by Facebook. It was not ever made publicly available but they described it in a paper published in 2015.

They were trying to monitor web backends, including metrics like CPU load and network latency, and they wanted to be able to query recent data quickly. Because of this, they designed a new database that could hold the data in the memory. It was pretty much an observability tool, although they didn’t call it that back then.

To make the data fit in the memory, they came up with a simple but efficient encoding for the floats. This encoding is the algorithm that we’re talking about.

There are many general-purpose compression algorithms like DEFLATE and Zstandard. These algorithms need to work on any data you can throw at them. However, if you know something about the data you’re going to be compressing, you may be able to design an algorithm that is faster or compresses better or uses less memory than the general-purpose algorithms.

With Gorilla, we’re going to compress time series data, which is data where the data points are associated with a timestamp. Do we know anything about time series data that we could use?

One thing is that it’s often produced by some sort of slow process. For example, on the slides you can see some temperature data from my home. As you can see, the numbers are roughly the same. The temperature changes slowly and within a limited range. It’s not going to suddenly jump to plus or minus a hundred degrees.

Here’s an insight we need: in time series data, the consecutive values typically resemble each other and stay within a narrow range.

Before we can talk about the algorithm itself, we need to know a bit about how floats work.

Link: https://float.exposed/0x3ff0000000000000

float.exposed is a web playground for seeing how computers represent the floats. You can enter a float at the top and it shows you the bits and the decomposition.

You can click the bits to toggle them.

See what happens when you toggle the sign bit (marked with red underline).
See what happens when you increase or decrease the exponent.
Toggle the bits in the significand (marked with blue underline) and see if you can do the binary math in your head to verify the result.

Try entering 1.25. The slides said that the exponent would be 0 but float.exposed says it’s 1023. That’s because the value is biased. You have to subtract 1023 to get the actual exponent. This is what allows us to store negative exponents in that field - try entering 1 in the exponent and see what you get.

We don’t have to remember all of this to understand Gorilla. However, it’s good to notice that the sign bit and the exponent come first. This means that if we’ve got a bunch of numbers that are roughly the same, and they aren’t right around zero, the left-hand side of their bit representations is going to be roughly the same as it’s only the significand that changes.
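If you’d rather poke at the bits in Python than in the browser, here is a minimal sketch that splits a 64-bit float into the three fields. The helper name decompose is made up for this example; it uses only the standard struct module.

```python
import struct


def decompose(x: float) -> tuple[int, int, int]:
    """Split a 64-bit float into (sign, biased exponent, significand bits)."""
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63                     # 1 bit
    exponent = (bits >> 52) & 0x7FF       # 11 bits, biased by 1023
    significand = bits & ((1 << 52) - 1)  # 52 bits
    return sign, exponent, significand


sign, exponent, significand = decompose(1.25)
# Subtract the bias to get the actual exponent.
print(sign, exponent, exponent - 1023, f"{significand:052b}")
```

For 1.25 the biased exponent is 1023, so the actual exponent is 0, and only one bit of the significand is set (the one worth 0.25).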

Now we’re ready to talk about the algorithm!

The second column shows the binary representation for the inputs. I made up the syntax in the third row - you can’t actually xor two floats in Python - but I’m using it to mean the xor of the binary representations.

Xor sets the bits that differ to one and the bits that are the same to zero. See the highlighted bits - they are the only two that differ in the inputs.

Xor is also reversible. If you xor 20.5 with 21.0 and then xor the result with 21.0 again, you’ll get 20.5 back.
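Here is a small sketch of the same thing in real Python. The helper names float_to_bits and bits_to_float are made up for this example: they convert a float to its 64-bit integer representation and back, so that we can xor the integers.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the 64-bit representation of a float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(bits: int) -> float:
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


a = float_to_bits(20.5)
b = float_to_bits(21.0)
xorred = a ^ b

print(f"{a:064b}")
print(f"{b:064b}")
print(f"{xorred:064b}")  # only the differing bits are set

# Xorring with 21.0 again gives 20.5 back.
restored = bits_to_float(xorred ^ b)
print(restored)
```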

Now if we look at the xor result, we can see there’s a run of leading zeros. Then there’s a handful of meaningful bits. Now it’s just two ones but there could be some zeros in the middle. And finally there’s a run of trailing zeros.

Here’s how we’re going to encode this: we’re going to store the number of zeros instead of the actual zero bits.
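Counting the zeros is easy with Python ints. A minimal sketch, with a made-up helper name split_xor, that takes the xor result of 20.5 and 21.0 and returns the number of leading zeros, the meaningful bits, and the number of trailing zeros:

```python
def split_xor(xorred: int) -> tuple[int, int, int]:
    """Return (leading zeros, meaningful bits, trailing zeros) of a non-zero 64-bit value."""
    leading = 64 - xorred.bit_length()
    # x & -x isolates the lowest set bit; its position is the trailing zero count.
    trailing = (xorred & -xorred).bit_length() - 1
    meaningful = xorred >> trailing
    return leading, meaningful, trailing


# 0x0001800000000000 is the xor of the bit patterns of 20.5 and 21.0.
leading, meaningful, trailing = split_xor(0x0001800000000000)
print(leading, bin(meaningful), trailing)
```

Instead of 64 bits, we only need to remember two small numbers and the two meaningful bits.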

We’re going to store the first value as-is and then for the remaining values, the value xorred with the previous value. Since xor is reversible, we can revert this process when we’re decoding the data.

Then we encode the values as described in the slides.

In the second case (control bits 10), the precise condition is that the number of leading zeros is the same or greater than in the previous value and the number of trailing zeros is the same. If you have more leading zeros than in the previous value, you will have to include those zeros in the meaningful bits.
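The slides are not reproduced here, so here is a minimal sketch of the value encoding for reference. It is not the code from python-gorilla. It assumes the scheme from the Gorilla paper with the condition described above: control bit 0 for a value identical to the previous one, control bits 10 for reusing the previous window of meaningful bits, and control bits 11 followed by 5 bits for the number of leading zeros, 6 bits for the length, and the meaningful bits. Producing a string of "0" and "1" characters, clamping the leading zeros to 31, and storing the length minus one are simplifications made for this example.

```python
import struct


def float_to_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(bits: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def encode(values: list[float]) -> str:
    prev = float_to_bits(values[0])
    out = [format(prev, "064b")]  # the first value is stored as-is
    window = None  # (leading, trailing) of the last value written with 11
    for value in values[1:]:
        cur = float_to_bits(value)
        xorred, prev = cur ^ prev, cur
        if xorred == 0:
            out.append("0")
            continue
        leading = min(64 - xorred.bit_length(), 31)  # must fit in 5 bits
        trailing = (xorred & -xorred).bit_length() - 1
        if window and leading >= window[0] and trailing == window[1]:
            # Reuse the previous window; extra leading zeros are written out.
            length = 64 - window[0] - window[1]
            out.append("10" + format(xorred >> trailing, f"0{length}b"))
        else:
            length = 64 - leading - trailing
            out.append("11" + format(leading, "05b") + format(length - 1, "06b"))
            out.append(format(xorred >> trailing, f"0{length}b"))
            window = (leading, trailing)
    return "".join(out)


def decode(bits: str, count: int) -> list[float]:
    prev = int(bits[:64], 2)
    values, pos = [bits_to_float(prev)], 64
    leading = trailing = 0
    while len(values) < count:
        if bits[pos] == "0":
            pos += 1  # same as the previous value
        else:
            if bits[pos + 1] == "1":  # a new window follows
                leading = int(bits[pos + 2:pos + 7], 2)
                length = int(bits[pos + 7:pos + 13], 2) + 1
                trailing = 64 - leading - length
                pos += 13
            else:  # reuse the previous window
                length = 64 - leading - trailing
                pos += 2
            prev ^= int(bits[pos:pos + length], 2) << trailing
            pos += length
        values.append(bits_to_float(prev))
    return values


data = [20.5, 21.0, 21.0, 21.2, 21.1, 20.9]
encoded = encode(data)
print(len(encoded) / len(data), "bits/value")
print(decode(encoded, len(data)))
```

The decoder needs to know how many values to read, so a real format would store the count or an end marker as well.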

Link: https://miikka.github.io/python-gorilla/?floats=20.5%2021.0%2021.0%2021.2%2021.1%2020.9

Okay, sorry blog readers, this part would really benefit from a screencast with me gesturing at things.

This is a playground that you can use to see the algorithm in action. Use it to step through the compression of the values 20.5 21.0 21.0 21.2 21.1 20.9.

The input is in the first column and the output is in the third column.
The second column shows the binary representation. The first value is as-is and the remaining values have been xorred with the previous value.
You can hover the highlighted blocks to see what they mean.

Background color guide for the highlighted blocks:

Green: control bits
Orange: the number of leading zeros
Violet: the number of meaningful bits
Navy: the meaningful bits themselves

In the last step you can see that the compressed data uses 41.17 bits/value. Uncompressed data would use 64 bits/value.¹ Is this good performance? We’ll return to this question in a moment.

Note how the changing bits are mostly in the middle or on the right-hand side.

Link: https://github.com/miikka/python-gorilla. See the implementation and the test suite.

The implementation, including encoding and decoding, is just a bit over 100 lines, so you can see it’s a fairly simple algorithm. The algorithm uses little memory since all the state it needs to keep is the previous value.
I used the bitstring library to do bitwise IO. It was okay, but it’s easy to do the bit twiddling with ints and do the bitwise IO yourself.
The test suite has just one test but it’s a pretty good one since it’s a property-based test powered by Hypothesis.

How do you know if a compression algorithm is any good? One way to do it is to gather a set of data that you want to compress and run a benchmark on it, comparing it to other algorithms. I believe Facebook did this, but the Gorilla paper is a bit thin on the results. Luckily there are some later papers that have more informative benchmarks.

Chimp is an algorithm directly inspired by Gorilla. They ran benchmarks over a number of time series data sets and here’s one of the graphs they produced.

The horizontal axis is bits/value and the vertical axis is how long the compression takes. In both cases, smaller is better.

We can compare Gorilla (blue circle) to Zstd (orange circle), which is a great general purpose compression algorithm. Gorilla seems to produce about twice as many bits and to be four times faster.

Is this a good tradeoff? It could be - you have to decide. In any case, Zstd was published only in 2016, so it did not exist in public when Gorilla was published.

Here’s another benchmark from the ALP paper.

The horizontal axis is decompression speed and the vertical axis is compression speed. The compression rate is in the labels. For all of them, the higher the better. You can see that Zstd (black) seems to actually beat Gorilla (green) in decompression speed.

ALP, published in 2024, is the state of the art algorithm for compressing floats. It’s significantly more complex but it also delivers results. As you can see, it’s right there in the upper-right corner with excellent 3x compression ratio.

The algorithm does not actually make use of the float structure directly, so you can compress any data with it. I have not tried it, but you could compress your web pages or source tarballs with Gorilla if you wanted, as long as you pad the length to be a multiple of 8 bytes. I expect that the compression ratio is rather poor.

Most of you probably won’t be implementing Gorilla or Chimp or ALP yourself, but you might be already using some of them indirectly. For example, the analytics database DuckDB implements Chimp and ALP.

Thanks for reading!

The astute reader might notice that the string 20.5 uses only four bytes, or 32 bits, whereas storing it as a float uses 8 bytes, or 64 bits. Another idea for a compression algorithm right there! ↩︎

FP compression family tree

Wed, 07 May 2025 00:00:00 +0000

I’ve been looking into algorithms for compressing floating-point data in columnar databases. Here’s a family tree for a few of them.

Gorilla: The original algorithm based on xorring the consecutive values. Described in 2015 by Facebook in the paper describing their Gorilla timeseries database.

Chimp: Improves upon Gorilla by using more complicated, more efficient encoding for the xorred values. Paper from 2022.

Chimp128: Instead of xorring the current value with the previous value, look at the previous 128 values and xor the current value with the one that produces the most trailing zeros. Published along with Chimp.
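To illustrate the Chimp128 idea, here is a naive sketch that picks the reference value by brute force. The function names are made up for this example, and a linear scan is an assumption made for clarity: it shows what is being optimized, not how a real implementation finds the reference quickly.

```python
import struct


def float_to_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits in a 64-bit value (64 if the value is zero)."""
    return 64 if x == 0 else (x & -x).bit_length() - 1


def best_reference(current: float, window: list[float]) -> int:
    """Index of the value in the window whose xor with current has the most trailing zeros."""
    cur = float_to_bits(current)
    return max(
        range(len(window)),
        key=lambda i: trailing_zeros(cur ^ float_to_bits(window[i])),
    )


# 21.0 appeared earlier in the window, so xorring with it gives all zeros.
print(best_reference(21.0, [20.5, 21.0, 21.2]))
```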

Patas: The algorithms above produce a stream of bits which makes the IO inefficient. Patas is a simplification of Chimp128 that produces a stream of bytes instead. The compression ratio suffers, but it goes much faster. Developed for DuckDB in 2022.

Elf: A careful mathematical analysis gives a way for even more efficient encoding for the xorred values. The compression ratio is a bit better than with Chimp/Chimp128 and the performance is worse. Paper from 2023.

Camel: A new XOR-based algorithm. I haven’t studied this one enough to summarize it, but it is included here for the sake of completeness. Paper from 2024, but seemingly not open access.

Pseudodecimal Encoding (PDE): Unlike the previous algorithms, this one is not XOR-based. The key insight is that a lot of float data is originally fixed-point decimal data and it’s more efficient to store it as a tuple of significant digits and a decimal exponent. Published as part of the BtrBlocks columnar data format in 2023.
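A rough sketch of the key insight, not of the BtrBlocks implementation: try increasing decimal exponents until the float can be reconstructed exactly from an integer and a power of ten. The function name and the max_exponent limit are assumptions made for this example.

```python
from typing import Optional


def to_pseudodecimal(x: float, max_exponent: int = 17) -> Optional[tuple[int, int]]:
    """Return (digits, exponent) such that digits / 10**exponent == x, or None."""
    for exponent in range(max_exponent + 1):
        digits = round(x * 10**exponent)
        # Only accept the pair if it reproduces the float bit-for-bit.
        if digits / 10**exponent == x:
            return digits, exponent
    return None


print(to_pseudodecimal(21.2))
print(to_pseudodecimal(20.5))
```

The pair of small integers is much easier to compress than the 64-bit float, whose significand for a value like 21.2 is full of seemingly random bits.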

Adaptive Lossless floating-point Compression (ALP): ALP starts with the same idea as PDE but it is designed for vectorized execution, yielding much better performance. It also has an alternative scheme for storing the values that do not compress well with PDE, hence being called adaptive. Paper from 2023.

Storing timeseries data

Fri, 28 Mar 2025 00:00:00 +0000

What are your options for a file format if you want to store timeseries data?

CSV or JSON lines: The simple, stupid option which is not to be overlooked. It works as long as you don’t have too much data or too stringent performance requirements.

Parquet: Apache Parquet is the industry standard for columnar data used by data lakes and similar. Possible alternatives include ORC; however, it seems that ORC’s popularity is waning.

Arrow: Apache Arrow is an in-memory format for columnar data. It’s great, but it’s not designed for efficient long-term storage.

DuckDB: DuckDB is an embedded database focused on columnar data. Like SQLite, it can be a relevant option for storing data. Since version 0.10, they are promising backwards compatibility for files.

The next-generation columnar formats: Lance promises to be better suited for ML than Parquet. Vortex just promises to be all-around better than Parquet. BtrBlocks and FastLanes are more academic projects.

Roll your own: The fun option.

Photo: A small isle on frozen Lake Bodom in Espoo.

Leader election with S3 and If-Match

Tue, 25 Feb 2025 00:00:00 +0000

Let’s implement leader election using Amazon S3’s If-Match condition by building a distributed lock with it.

In August 2024, Gunnar Morling published a blog post that shows how to do it with the If-None-Match condition. Back then, If-Match had not yet been released. This post shows another way to solve the same problem.

The post is intended to stand on its own so you don’t need to read Gunnar’s post first. But do read it as well to see how the solutions compare!

What’s If-Match

PutObject is the API call that you use to upload data to Amazon S3. By default, the PutObject calls are upserts: they will replace the object contents or create an object if one does not already exist.

In 2024, Amazon introduced two conditions for the PutObject calls: If-Match (announcement) and If-None-Match (announcement). They allow you to restrict the behavior in the following ways:

If you set If-None-Match: *, the call will only succeed if the object does not already exist.
If you set If-Match: <value>, the call will only succeed if the object exists and its content has the matching entity tag (ETag) value. The entity tag is essentially a checksum for the object content.¹

DeleteObject also takes the If-Match condition, so you can delete an object only if it has a matching ETag.

If the call fails, you’ll get a 412 error response (or, in some cases, another 4xx error).

Together with S3’s consistency guarantees these conditions allow you to do compare-and-swap (CAS) operations. They are a key building block for distributed systems.

What’s leader election?

Many distributed systems require designating one of the nodes as the leader. Typically the leader accepts the write requests from the clients and then sends them to the other nodes that process read requests.

How do the nodes choose the leader? Martin Kleppmann in Designing Data-Intensive Applications writes:

One way of electing a leader is to use a lock: every node that starts up tries to acquire the lock, and the one that succeeds becomes the leader.

If we can build a distributed lock, we can perform leader election. Let’s see how to do that on S3.

The locking protocol

We will use a single object in the bucket for locking. Let’s call it lock. It will be a JSON blob that looks like this:

{
  "expires_at": 1740151473.206179
}

Here expires_at is a timestamp in seconds since the UNIX epoch for when the lock expires.

To acquire the lock, the nodes do the following.

Read the contents of lock. If the object does not exist, there’s no lock and we can jump to step 3.
If expires_at is in the past, the lock has expired and we can continue. Otherwise acquiring the lock has failed.
Put a new version of lock with the desired expiration time and with one of the conditions:
- If lock existed in step 1, use If-Match with its ETag value.
- If lock did not exist in step 1, use If-None-Match.

If the put in step 3 succeeds, the node has acquired the lock.

S3 has strong read-after-write consistency, so if there is a lock, in step 1 every node is guaranteed to see up-to-date version of the lock data. In step 3, the use of the conditions guarantees that only one node will succeed at acquiring the lock.

If the leader wants to release the lock, it can delete the object using If-Match with the ETag value received in step 3.

Fencing tokens

The elephant in the room is that this relies on the nodes having their clocks in sync, which is a famously difficult problem. Consider what happens if the leader’s clock is behind the others or the clock of one of the secondaries is ahead of the others: the leader thinks it still holds the lock while the secondary thinks it has expired. If the secondary now grabs the lock, the former leader can end up issuing zombie requests.

In his post How to do distributed locking, Martin Kleppmann explains that you can use fencing tokens to solve the issue. A fencing token is a number that increases every time a node acquires the lock. The token should then be included in the requests to the system that we hold the lock over, and it should track the highest token it has seen and reject the requests with lower tokens. This filters out the zombie requests.
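To make the receiving side concrete, here is a minimal in-memory sketch. FencedResource is a hypothetical stand-in for the system that we hold the lock over; it is not part of S3 or any library.

```python
class FencedResource:
    """Hypothetical resource that rejects writes carrying a stale fencing token."""

    def __init__(self) -> None:
        self.highest_token = -1
        self.value = None

    def write(self, token: int, value: str) -> bool:
        # A token lower than the highest one seen belongs to a former leader.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


resource = FencedResource()
print(resource.write(1, "from the old leader"))  # accepted
print(resource.write(2, "from the new leader"))  # accepted, token increased
print(resource.write(1, "zombie request"))       # rejected
```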

In our case, even expires_at could work as a fencing token if the lock duration is always the same. The protocol guarantees that it will always increase.

However, we do not have to make the lock duration fixed. We can add another field token to the JSON object:

{
  "expires_at": 1740151473.206179,
  "token": 1
}

token is a number, starting at zero, that should be incremented every time the lock is acquired. The node acquiring the lock reads it in step 1 and it can increase it in step 3.

Releasing the lock by deleting the object does not work anymore as that would reset the token. You can release the lock by setting expires_at to zero without incrementing token.

{
    "expires_at": 0,
    "token": 1
}
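Here is a sketch of how the lock body could evolve with tokens. It only computes the JSON bodies; writing them to S3 would still need the same conditional put as in the locking protocol. The names TokenLockData, next_lock, and released are made up for this example, and treating the first acquisition as token 1 (zero incremented once) is an assumption.

```python
import dataclasses
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenLockData:
    expires_at: float
    token: int

    def to_json(self) -> str:
        return json.dumps(dataclasses.asdict(self))


def next_lock(
    existing: Optional[TokenLockData], now: float, expires_in: float
) -> Optional[TokenLockData]:
    """Return the body to put when acquiring the lock, or None if it is still held."""
    if existing is None:
        # No lock object yet: the token starts at zero and is incremented once.
        return TokenLockData(expires_at=now + expires_in, token=1)
    if now <= existing.expires_at:
        return None
    return TokenLockData(expires_at=now + expires_in, token=existing.token + 1)


def released(held: TokenLockData) -> TokenLockData:
    """Return the body to put when releasing: expired, but with the token kept."""
    return dataclasses.replace(held, expires_at=0)


first = next_lock(None, now=100.0, expires_in=60.0)
print(first.to_json())
print(released(first).to_json())
```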

Python implementation

Here’s a basic implementation in Python using boto3. Adding support for the fencing tokens and releasing the lock is left as an exercise for the reader.

import dataclasses
import json
from dataclasses import dataclass
from datetime import UTC, datetime, timedelta
from typing import TYPE_CHECKING, Self

import boto3
import botocore.exceptions

if TYPE_CHECKING:
    from mypy_boto3_s3.client import S3Client

s3_client: "S3Client" = boto3.client("s3")


@dataclass(frozen=True)
class LockData:
    expires_at: float

    def to_json(self) -> str:
        return json.dumps(dataclasses.asdict(self))

    @classmethod
    def from_json(cls, data: str) -> Self:
        return cls(**json.loads(data))


def acquire_lock(
    s3_client: "S3Client",
    bucket: str,
    key: str = "lock",
    expires_in: timedelta = timedelta(seconds=60),
) -> bool:
    """Try to acquire a lock using S3 as the coördination mechanism.

    Args:
        s3_client: boto3 S3 client
        bucket: S3 bucket name
        key: S3 object key
        expires_in: Lock timeout

    Returns:
        bool: True if the lock was acquired, False otherwise
    """

    try:
        existing_lock = s3_client.get_object(
            Bucket=bucket,
            Key=key,
        )
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            existing_lock = None
        else:
            raise

    if existing_lock is not None:
        existing_data = LockData.from_json(existing_lock["Body"].read().decode("utf-8"))

        if datetime.now(UTC).timestamp() <= existing_data.expires_at:
            return False

        condition = {"IfMatch": existing_lock["ETag"]}
    else:
        condition = {"IfNoneMatch": "*"}

    lock_data = LockData(expires_at=(datetime.now(UTC) + expires_in).timestamp())

    try:
        s3_client.put_object(
            Bucket=bucket,
            Key=key,
            Body=lock_data.to_json(),
            **condition,  # type: ignore[arg-type]
        )
    except botocore.exceptions.ClientError as error:
        if error.response["Error"]["Code"] in (
            "ConditionalRequestConflict",
            "PreconditionFailed",
        ):
            # We could alternatively retry on ConditionalRequestConflict (409)
            return False
        raise

    return True

Here’s another exercise for the reader: The lock object does not include information about who is holding the lock as it’s not necessary for the protocol. However, it would be handy in a real-world implementation in case you ever need to debug this.

Does this make sense?

What’s nice about this compared to Gunnar’s version is that there’s no need for a background process to delete the stale lock objects. Gunnar’s design creates a new object every time a lock is acquired but in this version, there’s only a single object that gets modified.

However, with both designs you have to ask whether they make sense in the real world. As I’ve mentioned before, while S3 storage is fairly inexpensive, the requests are not cheap: in the standard tier and us-east-1 region, PUTs cost $0.005 per 1000 requests and GETs cost $0.0004 per 1000 requests. The latencies are in double-digit milliseconds. S3 Express One Zone makes the requests only 2x cheaper, so it does not materially change the situation.
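To get a feel for the numbers, here is a back-of-the-envelope sketch using the prices above. The workload is an assumption made up for illustration: every node polls the lock with one GET per interval, and there is one PUT per lock term.

```python
PUT_PRICE_PER_1000 = 0.005   # USD, standard tier in us-east-1 (from the text above)
GET_PRICE_PER_1000 = 0.0004  # USD


def monthly_cost(
    nodes: int, poll_interval_s: float, lock_term_s: float, days: int = 30
) -> float:
    """Estimate the monthly request cost in USD under the assumed workload."""
    seconds = days * 24 * 60 * 60
    gets = nodes * seconds / poll_interval_s  # every node reads the lock
    puts = seconds / lock_term_s              # one successful put per lock term
    return gets / 1000 * GET_PRICE_PER_1000 + puts / 1000 * PUT_PRICE_PER_1000


# Three nodes polling once a second with a 30-second lock term.
print(f"${monthly_cost(nodes=3, poll_interval_s=1, lock_term_s=30):.2f} per month")
```

With these assumptions the bill is a few dollars per month, most of it from the GETs. Polling faster or adding nodes scales the cost linearly.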

This means that if you’re looking to build a high-performance, low-cost distributed lock, S3 is not going to be your first choice. You would probably use it because you’re already using S3 for something else and you want to hold a lock over S3 resources. Unfortunately S3 does not support fencing tokens for PutObject calls, which limits the usefulness of this approach.

New winds

Fri, 31 Jan 2025 00:00:00 +0000

In my last post, I wrote about my plans for this year:

This time I’m trying to be more intentional, especially career-wise.

I concluded that it is time for me to move on from my position as a software engineer at Oura Health. I had a great two and a half years there, building and operating the backend-slash-database that is the main data store of the company, and I learned a lot about Python, Rust, AWS services, cloud databases, and scaling backend systems.

However, I want to focus even more closely on how data is stored in the cloud. Therefore I’m planning to venture out as an independent expert.

More on that later. Meanwhile, if you’ve got exciting opportunities for me, email me or connect on LinkedIn.

Photo: A view of Merikoski power plant in Oulu in early January.

Yearnote 2024

Wed, 08 Jan 2025 00:00:00 +0000

It’s 2025 now. This century is now 25% complete! As is my tradition, I’m going to reflect on the year gone by.

On the photos: there’s one photo for each month, in chronological order.

Software engineering

At work, I continued to build cloud backends to manage data. I ended up spending a lot of time with Rust, OpenTelemetry, Docker, GitHub Actions (GHA). Out of these technologies, learning Rust was great and trying to use OpenTelemetry only led to frustration.

Docker and GHA got the job done. Later I posted about how to cache Docker builds.

Later in the year, I spent some time on clarifying how we do schema evolution. Then I switched to building developer tooling. That was a nice change of pace. Building tools is fun and when you build internal tooling, your end users are right there.

The tooling project led me to revisit Python packaging. It didn’t go well.

Rust in production

The Rust project was interesting. We implemented a new web backend in Rust and shipped it to production. We managed to run into a lot of rough spots in the Rust web dev ecosystem.

For starters, async Rust is Rust in hard mode and it’s also work-in-progress – for example, async fn was allowed in traits only right when we were starting the project. This was the first real Rust project for all of us, so there was a bit of a learning curve.

Then we ran into numerous sqlx issues. The OpenTelemetry crates are unstable and there are regular breaking changes which increased the maintenance burden. If you’re used to Python testing tools, Rust just isn’t on the same level - we struggled with fixtures and with mocking AWS services.

Not everything was bad. For routing we used Poem, which got the job done even if it was pretty verbose. Cargo Lambda was also great – it just worked and even cross-compilation works out of the box.

It wasn’t the smoothest project, but at least we learned a lot. Later we were able to use that knowledge to introduce Rust into another system where it gets to shine for real.

Software engineering community

I didn’t have much time for open source, but I did create paketoi, a command-line tool for building Python deployment packages for AWS Lambda. The announcement post explains the motivation.

I attended two conferences, Heart of Clojure and EuroRust 2024. In-person conferences are great except that I got predictably sick afterwards.

At Heart of Clojure, I gave a lightning talk where I said that you should blog more. You should blog more!

Blogging

In August, I started posting weeknotes. Having a weekly writing routine was a big success. Even if most of the posts are nothing special, it got my writing engine going. Many of the posts sparked nice conversations with colleagues, friends, and Internet strangers.

This year, my most read post was Do not use requirements.txt (my most popular post ever!). Out of the articles I’ve posted this year, the trip report on Heart of Clojure was the most popular one.

I also announced that I’m starting a newsletter for my outdoors adventures called Small Rapids. The first post is still waiting for itself, but I promise it is coming soon! It will be about tour skating and ice safety.

It was great to get back to writing in public and I want to continue it in 2025. I’m hoping to write a couple of better-researched and better-edited posts.

Microblogging

I’m active on Mastodon. Check out my most banger posts:

Looks like Fediverse enjoys cable cars.

Outdoors life

It was a good outdoors year. It began with a cold winter in Helsinki, which meant that I got plenty of good opportunities to skate on the sea ice.

In the spring, I ran regularly. This paid off when I decided on a whim to go hike in Urho Kekkonen National Park after Midsummer. Never has hiking been so easy! I spent a few days there, visiting Paratiisikuru and taking the high route whenever possible.

It was actually my first hike that was over 100 km. I was going to do one already in 2020 - I was hoping to hike Kungsleden in Sweden - but I had to cancel it due to COVID-19. Back then it felt like such a big goal. Now it was just something I did without much thought.

Paddling

My big summer project was to learn to roll a sea kayak. I started by taking a course by Anssi Nupponen. Anssi’s teaching was great, but two days was not enough for me to learn even a butterfly roll. However, I learned enough to practice myself.

After a bunch of sessions, I managed to do a sweep roll on one side. Later at the rescue practice camp of my kayak club, I managed to roll the kayak reliably in pretty big waves – big for the Baltic Sea, that is. That felt great.

Maybe in 2025 I will learn to do it from the other side. It would be fun to learn some other rolls, too.

Climbing

In the spring, I learned to climb on top rope. I didn’t go climbing outdoors too many times, but I had a good time when I did. This year, I hope to learn to lead climb.

Best of 2024

Best new album: Lehto / Korpi by Pauli Lyytinen. Who could not love saxophone improvisation on top of bird song field recordings?
Best gig: The secret gig at We Jazz where Antti Lötjönen, Heli Hartikainen and Aino Juutilainen played together. The atmosphere of the gig was something else, so focused. You just had to be there.
Most interesting book read: Raukoilla rajoilla by Markku Eskelinen. It’s an iconoclastic history of Finnish prose and it filled me with new interest in Finnish literature. Eskelinen does not hold it in very high regard, but the counter-examples he points out are interesting.
New favorite writer: Lydia Davis. I read her latest book Our Strangers and her super-short short stories observing everyday life are wonderful.

What about 2025?

Last year I wrote:

What about 2024? I don’t know yet. I’ll figure it out.

I did not explicitly set out to figure it out and the result was that I did not figure it out. It left me feeling aimless. This time I’m trying to be more intentional¹, especially career-wise. I’m asking myself where I want to go. Haven’t answered it yet.

Traditional commentary on Finnish politics

In each yearnote, I express (lack of) surprise at the current cabinet of the government of Finland.

A year ago I predicted that while the scandals will continue, Petteri Orpo will be able to hold his cabinet together. Looks like I was right for once!

What about the next year? The closer we get to the election, the more likely the cabinet is to fall apart. There are two reasons for this. First, the more time goes on, the more Orpo has been able to achieve and thus it’s less of an imperative to hold together. Second, the policies of the cabinet have been incredibly unpopular and the closer we get to the election, the more anxious the parties get to distance themselves from the cabinet.

Orpo’s cabinet was formed in 2023 and the next parliamentary election is in 2027, so there’s still plenty of time to the next election. Thus my prediction is that Orpo’s cabinet will hold together for the next year. The turbulence from the upcoming municipal and county elections will not be enough to break it.

It’s not that you have to wait for the year to change to start being intentional. The timing just happens to coincide. ↩︎

Weeknote 19: ADRs record decisions

Sun, 22 Dec 2024 00:00:00 +0000

This week I wrote a bunch of architectural decision records (ADRs) for a project I’m working on.

ADRs were popularized by Michael Nygard’s blog post Documenting architecture decisions. Nygard does a great job explaining why they will be useful in the future.

They will give useful context about decisions for the people who weren’t around when the decisions were made, and also for people who were there but already forgot about it. This in turn allows being intentional about accepting or changing decisions.

However, they have benefits already today.

Writing things down clarifies your thoughts. I keep banging this drum, but it’s worth mentioning again. As you write an ADR, you will have to work through the details. This will clarify your thoughts and reveal the flaws in your arguments.

An ADR makes it clear if a decision was made. ADR is a kind of a design document, but not all processes for design documents track their status. This can become a problem because while you can find the old docs, you don’t know what was decided about them and whether they were ever implemented. This is bound to happen if nobody ever explicitly makes a decision about them.

ADRs can act as a forcing function. Nygard’s ADR template has a Status field. He suggests that the value can be “proposed”, “accepted”, “deprecated”, or “superseded”. The template also has a section called “Decision” that spells out the decision. This should make the situation clear. You would not mark an ADR as accepted unless a decision was made, right?

You still have to keep publishing new ADRs and mark the old ones as deprecated as you go along, but now you’ve got a process for it.
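
To make the status check concrete, here is a minimal sketch of reading the status out of an ADR. It assumes the ADRs are Markdown files where a "Status" heading is followed by the status on the next non-empty line. That layout and the `adr_status` helper are assumptions for illustration, not part of Nygard’s post.

```python
import re
from typing import Optional

# The four status values suggested in Nygard's post.
KNOWN_STATUSES = {"proposed", "accepted", "deprecated", "superseded"}


def adr_status(text: str) -> Optional[str]:
    """Return the status of an ADR, or None if no known status is found.

    Assumes (hypothetically) a Markdown layout where a "Status" heading
    is followed by the status on the next non-empty line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if re.fullmatch(r"#+\s*Status\s*", line, flags=re.IGNORECASE):
            for candidate in lines[i + 1:]:
                if candidate.strip():
                    # "Superseded by ADR 12" still counts as superseded.
                    word = candidate.split()[0].lower().rstrip(".")
                    return word if word in KNOWN_STATUSES else None
    return None
```

Running something like this over a docs folder would list the records that never got past “proposed”, which are the ones where nobody made a decision.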

Blog meta

This is going to be the last weeknote of the year. I’m going to be off for a vacation. The next post is going to be my yearly yearnote for 2024. Happy holidays everyone!

Photo: A pile of firewood. ADRs form a decision log, I suppose.

Weeknote 18: Code comments

Sun, 15 Dec 2024 00:00:00 +0000

When do you add comments to your code? Do you do it at all? Everyone has seen one of these comments:

counter += 1  # increment counter by 1

This comment explains what is already obvious from the code, so it’s not useful. What’s worse is when somebody then changes the code to be like this:

counter += 2  # increment counter by 1

Now the comment does not match the code anymore. People bring up examples like this when they disparage comments. However, there are other kinds of code comments that are much more valuable.

Explain why the code looks wrong

Code should be written to be easy to understand because understanding is required for both debugging and modifying the code. This is mainly about the code itself: writing clear code, choosing module boundaries carefully, using descriptive names, and maintaining cohesive style.

At times, comments are needed too. In A Philosophy of Software Design, John Ousterhout proposes the following principle:

Comments should describe things that aren’t obvious from the code.

I agree. One heuristic that I’ve lately used is that you should add a comment if the code looks wrong. Think of the next person reading the code – is there anything they would think of as weird or out of place? If the wrong-looking code is correct but you can’t make it look correct, you should add a comment.

For example, I recently ran into code that looked like this:

thing1 = create_thing()
thing2 = create_thing()
thing3 = create_thing()
thing4 = create_thing()

for thing in [thing1, thing2, thing3]:
    process_thing(thing)

Isn’t there something odd about this? For some reason, thing4 is not processed. Maybe the author of the code has accidentally left it out. Should you go and add it to the list?

In this case, there was a good reason for the omission and the code itself did not need changes. We added a comment, though, explaining the reason right next to the for loop. This way, the next reader will not need to question it.

# `thing4` is not included because it will be processed by another component
for thing in [thing1, thing2, thing3]:
    process_thing(thing)

GHC Notes

Sometimes you need to explain the same thing in comments in multiple places. Glasgow Haskell Compiler (GHC) has a nice convention sometimes known as GHC notes. They have a syntax for long comments that can be referred to from other places.

My Python adaptation of the style looks like this:

# Note [This is a note]
# ---------------------
# Here goes the long explanation I want to refer to some other place in the code.
# It is usually multiple lines long.

Then you can refer to it from another comment like this:

do_a_thing()  # See Note [This is a note]

I’ve used this a bit in the last few months and it feels promising. Now there’s only one place for the information, which makes it easier to keep it up to date.

It would be great to have editor support for this so you could jump to the actual note from the reference. Unfortunately so far I haven’t found anything suitable for VS Code.
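
Short of editor support, a small script can at least check that every reference points at an existing note. This is a minimal sketch that assumes the exact comment shapes used in the Python adaptation above. The function name and the regular expressions are made up for illustration.

```python
import re

# A note definition: a comment line such as "# Note [This is a note]".
NOTE_DEFINITION = re.compile(r"^\s*#\s*Note \[([^\]]+)\]\s*$")
# A reference anywhere on a line: "See Note [This is a note]".
NOTE_REFERENCE = re.compile(r"See Note \[([^\]]+)\]")


def dangling_note_references(source: str) -> set:
    """Return the note titles that are referenced but never defined."""
    defined = set()
    referenced = set()
    for line in source.splitlines():
        definition = NOTE_DEFINITION.match(line)
        if definition:
            defined.add(definition.group(1))
        referenced.update(NOTE_REFERENCE.findall(line))
    return referenced - defined
```

It only looks at a single string of source code. For notes shared across files, you would concatenate the files or collect the two sets per repository.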

Photo: A snow-covered small tree on the shore of Lake Vähä-Parikas.

Weeknote 17: Caching Docker builds on GitHub Actions

Sun, 08 Dec 2024 00:00:00 +0000

In the past few weeks, I’ve spent a lot of time building Docker images on GitHub Actions. Anyone who has done it knows what it is like: I’ve been waiting a lot.

To avoid waiting, you’ll want to set up build caching. Here are a few things I learned about it.

Building Docker Compose targets

For one of our services, we have a Docker Compose based setup for running it locally. To ensure that it continues to work, I created a CI workflow with GitHub Actions that builds the images, starts the system, and executes a few tests to check that everything is okay.

The first version of the build step looked like this:

- run: docker compose build

It worked great except for one thing: this does not cache anything between CI jobs. The way to go for caching Docker builds is to use BuildKit’s nice cache backends. But how to use them with Docker Compose?

Docker now has the docker buildx bake command for building the targets defined in a docker-compose.yml file. There’s an official action docker/bake-action for it.

GitHub Actions cache backend (type=gha) is the easiest backend to use as it needs no setup. Use the set input to enable it like this:

- uses: docker/bake-action@v5
  with:
    set: |
      *.cache-from=type=gha
      *.cache-to=type=gha,mode=max

The *. wildcard means it will be enabled for all targets.
If you’re building a multi-stage image, you’ll want to set the cache mode to max with mode=max to cache the earlier stages in addition to the last one.

See the docs for all the settings, but you’ll probably want to use one or both of these:

To push the images, set push: true.
To make the images available for local docker commands, set load: true.

The complete test workflow could look something like this:

name: Test the local setup

on: [push]

jobs:
  bake:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/bake-action@v5
        with:
          load: true
          set: |
            *.cache-from=type=gha
            *.cache-to=type=gha,mode=max
      - name: Start the services
        run: docker compose up -d --wait
      - name: Test the services
        run: echo TODO

The registry backend

When building Docker images, you can easily run into the 10GB size limit of GitHub Actions cache. As a workaround, you can use the registry backend (type=registry) which uses a container registry as the cache backend.

Since you’re already on GitHub, the easiest solution can be to use GitHub Container Registry (GitHub Packages), but for example AWS ECR works too.

The setup is a bit more involved - see the docs for your registry on how to allow GitHub Actions to push there.

Gotcha: Multi-arch builds

If you’re caching a multi-arch build, note that it does not work out of the box, at least not with type=registry. You have to jump through hoops to cache it.

Photo: An Independence Day view on the snow-covered lake Vähä-Parikas in Vihti, Finland. It is a color photo.

Weeknote 16: Late code review

Sun, 01 Dec 2024 00:00:00 +0000

This week I reviewed my notes about code review and noticed there’s something I haven’t written about: code review often happens too late.

Typically a developer requests code review for their pull request once it’s pretty much done. This is too late.

If you have developed a big change and the reviewer thinks the whole approach is wrong, it’s wasteful to throw the whole change away at this point, and it’s not fun feedback to get. This puts a limit on how big changes the reviewer can ask for in the review.

Another issue is that if in your mind, as the author, the pull request is finished and ready to merge, then it’s not fun when somebody pops up to ask you to make changes. This is especially true if your code review process is slow and you have already moved on to work on something else.¹

Dealing with it

How to counteract it? You need to get the reviewer involved earlier. Here are a few things that I’ve tried:

Talk about what you’re doing. Telegraph your moves. Tell your colleagues in the daily standup what you’re working on and how you’re going about it. This is simple and usually works well enough.

Split up your work. Do multiple pull requests that build up to the goal you’re working towards. Smaller pull requests are easier to review, too. This has worked well, too, although the overhead in the review process will slow you down.

Request an early review. I’ve sometimes asked my colleagues to review the general approach taken in a PR when it’s still a draft. It has not worked very well. Either people do not say anything or they review the PR as if it were already finished. I still think this could work, but you need to be clear about what kind of feedback you’re looking for.

Bottom line

Code review is an opportunity for collaboration and it works much better if you embrace it as such, both as an author and a reviewer.

Photo: I’ve temporarily run out of seasonal photos, so here’s a summer view from the top of the Wank near Garmisch-Partenkirchen.

Starting new work when you’ve got old work in progress is questionable but common. ↩︎

Weeknote 15: Technology radar

Sun, 24 Nov 2024 00:00:00 +0000

Every time ThoughtWorks publishes their Technology Radar, I take a look. It’s a nice source for finding out about new technologies and I think the Hold/Assess/Trial/Adopt system makes sense even if I often disagree with the classification choices.

At work, I’ve been building some internal tooling for working with timeseries data and I’ve had a chance to try some new things. I decided to make my own mini-radar based on these experiences.

Adopt

These are great, you should use them.

Polars is a dataframe library for Python and Rust. A dataframe is an abstraction for tabular data - in my mind it’s similar to a database table. I think they originally come from R.

While pandas has long been the go-to Python library for dataframes, Polars feels fresh and streamlined. I’ve found it to be both fast and easy to use. The API makes sense to me and it’s easy to get data into and out of Polars. It has been a great choice for working with generic timeseries data.

DuckDB is “SQLite for columnar data”. It’s an embeddable single-file database like SQLite, but it’s geared towards analytics type of queries. My favorite feature is that it has support for a number of data sources so if you need to analyze, say, a bunch of Parquet files on S3, DuckDB can do it out of the box. It’s fast, too, and the duckdb command-line tool is nice to use.

I can’t believe I haven’t mentioned DuckDB on the blog before!

Trial

These are pretty good, but I’ve got some qualms about them.

htmx is a JavaScript library that makes it easy to use AJAX without writing any JavaScript yourself. I’m not sure how to best explain it, but check out the introduction.

It’s great for when you have made a traditional server-rendered web application and need to sprinkle just a little bit of JavaScript on it to make it nice to use. I’m putting it under “Trial” because while it is handy, I don’t understand when and how to switch to something more advanced.

Assess

These show a lot of promise, but I haven’t yet tried them seriously.

uv is a Python package management tool. Initially it started as a pip replacement, but it has recently gained full project support. It’s snappy and it’s gaining features fast. I’ve previously recommended Poetry for Python project management, but uv could be a better choice already since it’s both faster and more compatible with the rest of Python ecosystem than Poetry is.

I’m putting uv under “Assess” since I have not yet had a chance to try it in a big project.

Hold

I don’t think these work very well in practice.

Python for command-line tools. Due to packaging difficulties, I’ve found it difficult to ship a command-line tool written in Python to software engineers who are not Python developers. You need to get all the dependencies installed somehow and all approaches seem to run into trouble.

If I were to do it again, I would seriously consider Rust or Go where there’s good support for shipping software as single binary.

In conclusion

This is what’s on my radar - what about yours?

Long-time readers may have noticed that I haven’t posted about outdoors hobbies in a while. That’s because I have wanted to focus this blog on software engineering.

I’m starting a newsletter about my outdoors pursuits such as kayaking and tour skating. It’s called Small Rapids. The first post is coming soon - subscribe so you won’t miss it!

Weeknote 14: Throwing it away

Sun, 17 Nov 2024 00:00:00 +0000

Apparently it was Fred Brooks in The Mythical Man Month who wrote that, when building a new software system, you should “plan to throw one away”.

It sounds like great advice, but I haven’t seen anyone follow it. I recently built a prototype of an internal tool at work and I thought this time I will throw it away.

The technology I chose was Python. Normally when I write Python, I insist on using mypy, the type checker. This time I didn’t use it, and I didn’t write any tests either. It’s not code we’re going to keep if it hasn’t got tests or types, right?

However, I showed my prototype to a few people and some of them became early adopters and started using the tool for real work. This meant that while I needed to make big changes, I didn’t want to break it for them. Thus I added mypy and created a basic test suite. And boom, now we might as well keep the code.

I think I got the basics right, so it’s probably fine. Next time I will write the prototype in Elixir or some other language that nobody at work uses. Then we can’t keep it, right?

In other news

People seemed to be delighted by my HTML cable car.

Photo: A closeup of moss with needles and dead leaves on it.

Weeknote 13: Deterministic Simulation Testing

Sun, 10 Nov 2024 00:00:00 +0000

This Tuesday was the day of the first Systems from HEL meetup. It’s a meetup about systems programming – there’s a bunch of systems meetups in the US and now there’s one in Helsinki, too.

Pekka Enberg, the founder of the meetup, was also the first person to give a presentation. He talked about deterministic simulation testing (DST). He gave an overview of the technique and demoed his prototype implementation penberg/hiisi. Pekka’s company Turso has recently started using DST in anger and he shared some lessons they had learned. Pekka’s talk can be viewed on YouTube.

A problem with systems like databases is that there are these bugs that are really difficult to trigger. Sometimes you need to do things in just the right order to trigger a bug, but in real world systems there are many sources of non-determinism: file and network I/O can be slow or fail, threads can get scheduled in different order, etc. This kind of bug may get triggered in production, but it’s difficult to debug because you can’t reliably reproduce it.

DST’s answer to the problem is simple: take control of all sources of non-determinism and make them deterministic in the testing environment. If you abstract away the calls to a file system, the network, or a time source, you can create a simulator runtime that can deterministically mock the results and inject faults.

Just like in property-based testing, you can use a pseudo-random number generator (PRNG) to generate the results and the faults. You can also use it to generate the inputs to the system such as client calls. If you re-run the test with the same PRNG seed, you should get the same results – now you can debug it. By running the system with a lot of random PRNG seeds, you get a good chance of triggering rare bugs.
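
As a toy illustration of the idea, here is a minimal sketch in Python. It is not how hiisi or Turso’s simulator works. The `SimulatedDisk` class, the retry function, and the failure rate are all invented for the example. The point is only that a single seeded PRNG decides both the injected faults and the client calls, so a run can be replayed exactly.

```python
import random


class SimulatedDisk:
    """A fake disk whose failures are decided by a seeded PRNG."""

    def __init__(self, rng: random.Random, failure_rate: float = 0.2):
        self.rng = rng
        self.failure_rate = failure_rate
        self.blocks = {}

    def write(self, key: str, value: str) -> bool:
        # Inject a fault: the write is dropped and reports failure.
        if self.rng.random() < self.failure_rate:
            return False
        self.blocks[key] = value
        return True


def save_with_retry(disk, key, value, attempts=3) -> bool:
    """The toy system under test: retry a write a few times."""
    return any(disk.write(key, value) for _ in range(attempts))


def run_simulation(seed: int, steps: int = 50) -> list:
    """Run one simulated workload and return a trace of what happened."""
    rng = random.Random(seed)  # the only source of randomness in the run
    disk = SimulatedDisk(rng)
    trace = []
    for step in range(steps):
        # The same PRNG also generates the client calls.
        key = f"key{rng.randrange(5)}"
        ok = save_with_retry(disk, key, f"value{step}")
        trace.append((step, key, ok))
    return trace


def find_failing_seeds(seeds):
    """Return the seeds whose run contains a write that failed every retry."""
    return [
        seed for seed in seeds
        if any(not ok for _, _, ok in run_simulation(seed))
    ]
```

Calling `find_failing_seeds(range(1000))` sweeps a thousand runs, and any seed it returns can be passed back to `run_simulation` to get the identical trace for debugging.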

DST in practice

The trouble with DST is that it’s difficult to pull off. The biggest takeaway for me from Pekka’s experiences was that you don’t have to go all in to get benefits. Controlling every source of non-determinism is a lot of work, but tackling even some of them lets you find bugs. At Turso, their experience was that every time they have taught the simulator new tricks, they have found new bugs.

If you think it sounds like fuzzing and chaos testing in addition to property-based testing, yup, you’re right. It combines ideas from all of them.

Historically FoundationDB pioneered the technique about a decade ago. Right now some of the same people are pushing the envelope with Antithesis, a general-purpose simulator testing platform. They have gone as far as developing a deterministic hypervisor. Another well-known implementer of DST is TigerBeetle, a financial database.

In conclusion

The presentation was interesting and there were plenty of questions during and after it. Great discussion altogether! I’m looking forward to the next meetup.

Photo: Overripe berries of lily-of-the-valley in autumn sun. I was going to take a photo of the presentation for this post but I was following it so closely that I forgot!

Weeknote 12: In the weeds

Mon, 04 Nov 2024 00:00:00 +0000

Last time I thought I had a trick in my pocket to ensure that I get some writing done:

The solution I’m going to try is free writing, similar to morning pages: set a timer for ten minutes and just write whatever is on your mind.

It did not quite work the way I hoped for. Free writing is great, but it’s just that my work last week was in the weeds and I cannot write about it in detail.

Instead, I’m going to tell you about my Mastodon data exporter.

Before Mastodon had search, I wanted to be able to search my own toots¹. I made some scripts to download the data locally and search them in my text editor.

I’m using the “git scraping” technique described by Simon Willison. I made a Python script that gets all my statuses via Mastodon’s API and stores them in a local JSON file. Then I made a private GitHub repo and a GitHub Actions cronjob that runs the script and stores the result in the repo.
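
The download step could look roughly like this. It is a sketch rather than the actual script: it assumes Mastodon’s `GET /api/v1/accounts/:id/statuses` endpoint paged backwards with `max_id`, and the function names, the instance URL, and the account id are placeholders.

```python
import json
import urllib.parse
import urllib.request


def fetch_page(base_url, account_id, token, max_id=None):
    """Fetch one page of statuses (newest first) from a Mastodon instance."""
    params = {"limit": "40"}
    if max_id is not None:
        params["max_id"] = max_id
    query = urllib.parse.urlencode(params)
    url = f"{base_url}/api/v1/accounts/{account_id}/statuses?{query}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_statuses(get_page):
    """Collect every status by walking backwards through the pages.

    get_page is any callable that takes max_id and returns a list of
    status dicts. Passing it in keeps the paging logic testable offline.
    """
    statuses = []
    max_id = None
    while True:
        page = get_page(max_id)
        if not page:
            return statuses
        statuses.extend(page)
        max_id = page[-1]["id"]  # continue from the oldest status seen so far
```

A real run would pass something like `lambda max_id: fetch_page("https://example.social", "12345", token, max_id)` and dump the result with `json.dump`.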

Whenever I want to search my posts, I pull the repo. I’ve got another script that formats the data into a Markdown file and I’ve also imported the data into a SQLite file. The SQLite file is useful if I want to find e.g. my most popular post.
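
For the SQLite part, a minimal sketch with the standard `sqlite3` module might look like this. The table layout is invented for the example, and “most popular” is assumed to mean favourites plus boosts, using the `favourites_count` and `reblogs_count` fields of a status.

```python
import sqlite3


def load_statuses(connection, statuses):
    """Store a list of status dicts in a (hypothetical) statuses table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS statuses ("
        "id TEXT PRIMARY KEY, created_at TEXT, content TEXT, "
        "favourites_count INTEGER, reblogs_count INTEGER)"
    )
    # Named placeholders pick the fields they need from each dict.
    connection.executemany(
        "INSERT OR REPLACE INTO statuses VALUES "
        "(:id, :created_at, :content, :favourites_count, :reblogs_count)",
        statuses,
    )
    connection.commit()


def most_popular(connection):
    """Return (id, content, score) for the status with the highest score."""
    return connection.execute(
        "SELECT id, content, favourites_count + reblogs_count AS score "
        "FROM statuses ORDER BY score DESC LIMIT 1"
    ).fetchone()
```

Using `INSERT OR REPLACE` means the import can be re-run after every pull without creating duplicates.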

I haven’t shared the scripts because I’ve got the scripts and the data all mixed up in one repo. Maybe one day. In any case, it’s nice that GitHub provides us this free infrastructure you can use to automate tasks like this.

Photo: Empty plastic water jugs lying in high grass, illuminated by autumn sun.

Maybe it was possible to search your own toots in Mastodon already back then. In any case, I learned about it only after I had built the exporter. ↩︎

Weeknote 11: Weeknotes

Sun, 27 Oct 2024 00:00:00 +0000

I’ve now done ten weeks of weeknotes. Let’s talk about how it has gone. In the first note, I wrote this:

I haven’t posted much. I’d like to change that. Instead of trying harder, I’d like to try solving it. A friend suggested posting weeknotes, so here goes.

Certainly it worked in the sense that I’ve posted every week. In general I’ve seen three kinds of benefits from blogging:

Writing clarifies my thoughts for myself.
Sharing the writing with people I know sparks conversations.
Getting a broad audience for the writing builds my reputation.

The weeknotes have hit the first and the second points on the list. My thoughts have been clarified and I’ve had a bunch of great conversations.

Whether these posts are an asset for my reputation as a software engineer, I don’t know. I have not attempted to circulate the posts widely as I don’t think they’re likely to get much traction. I’ve mostly shared them with a few groups of friends and on Mastodon.

I’ve spent maybe an hour per week on writing the weeknotes. It’s not much, so considering the benefits, it has been time well spent. I’m planning to continue the practice for the time being.

Making it smoother

There are a few things I want to change.

The earlier in the week you start writing the note, the easier it will be to write. The trouble is that sometimes you notice that it’s already Thursday and you feel you haven’t thought anything worthwhile.

The solution I’m going to try is free writing, similar to morning pages: set a timer for ten minutes and just write whatever is on your mind.

My work has been a bit scattered and you can see it in how the weeknotes jump from one topic to another. However, I’ve now been able to focus more tightly on building an internal tool and I’m hoping that I can now bring more continuity to the weeknotes as well.

Finally, I’ll drop the non-engineering tidbits. They have been well-received but they feel out of place in the weeknotes.

Photo: A yellow leaf lying in a puddle on asphalt.

Weeknote 10: Prototyping

Mon, 21 Oct 2024 00:00:00 +0000

My week wasn’t too focused as I was slightly ill, so I’m going to offer a selection of small thoughts.

My main concern at work has been how to get the tool I started working on a while ago to the users. The work has been slow due to my conference trips and other priorities, but I’ve gotten to the point where I must get feedback from the potential users.

I’ve tried to figure out what is needed to make their work easier. However, I haven’t myself done the work they’re doing, so it’s just a guess. I can continue to build based on my hunches, but the more I build without validation, the more wasted work there will be.

There’s a chicken-and-egg problem, though. You’ve got to build something to show the users and have them try it out, but you need to get feedback from the users to know what to build.

Anyway, I found some people willing to try my prototype. It feels like wheels are spinning forward again. It remains to be seen if there’s any traction.

I’m using FastAPI and the way I develop the app is by using FastAPI’s development server (fastapi dev). Under the hood (I believe) it uses Uvicorn’s auto-reload support to reload the code whenever the code changes. The auto-reloader works by restarting the whole Python process whenever the code changes.

What’s good about this approach is that it gives you a clean slate. There’s no old code dangling around. What’s bad is that it’s slow if you have heavy dependencies.

In my case, the deps are heavy indeed and it takes seconds to reload. If I was using Clojure, I could just send the changed functions to the running server via REPL or use tools.namespace to reload just the changed code.

Is there anything like that for Python? I may try to rig up something. Waiting for seconds takes too long.
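
One standard-library starting point is `importlib.reload`, which re-executes a single module in place instead of restarting the process. The following is a minimal sketch, not wired into FastAPI or Uvicorn, and the `reload_if_changed` helper is hypothetical.

```python
import importlib
import os


def reload_if_changed(module, last_mtime):
    """Reload `module` in place if its source file changed since last_mtime.

    Returns the file's current modification time (in nanoseconds) so the
    caller can pass it back in on the next check.
    """
    mtime = os.stat(module.__file__).st_mtime_ns
    if mtime != last_mtime:
        importlib.reload(module)
    return mtime
```

The catch is the flip side of the clean slate: objects created before the reload, and names imported with `from module import name`, keep pointing at the old code.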

Since I was a bit ill and I had to stay at home, I started playing The Legend of Zelda: Echoes of Wisdom. It’s the latest game in the series and the first Zelda game I’ve played since Oracle of Seasons on GameBoy Color.

Beware: There will be slight spoilers about the game below the photos!

I hate the boss fights!

The boss fights are puzzles. You’ll have to figure out the weird trick that works on this boss and then do it three times or so. After you defeat the boss for the first time, it comes back with a more powerful attack, as is traditional. Just like there has to be modulation in every Eurovision song!

The main game mechanic is that you can create “echoes” of items you’ve seen and monsters you’ve encountered. Usually the solution to the boss fight is to use one of the monsters. This means that you’ll run around dodging the boss attacks and drinking potions and trying stuff until you figure out something that works.

Once you figure out the trick, then you run around dodging the attacks and spawning the monsters until the boss is defeated. It feels like a chore.

Some other echo puzzles are nice, though. I like the jumping puzzles, where you have to get to some hard-to-reach spot by building bridges, ladders, and platforms. The most important item in the game is the old bed. It’s the basic 2x1x1 tile item and the solution to so many puzzles.

I’ve had a good time so far, but I’ve now played about two thirds of the main quest (I think) and it’s starting to feel repetitive to be honest.

Photos: Autumn vibes from Suvilahti, Helsinki.

Weeknote 9: EuroRust 2024

Sun, 13 Oct 2024 00:00:00 +0000

This week I attended EuroRust 2024 in Vienna, Austria. It was a two-day conference about the Rust programming language, organized by Mainmatter, a Rust consultancy¹, and this was the second time it was organized.

This was my first Rust conference. Unfortunately, it ended up being more of a miss than a hit for me, and it might be the last one in a while.

It was on me, really. I go to conferences to meet people, not for the talks. However, I was not in the mood for getting to know new people and I didn’t have many pre-existing connections to the community, so the hallway track didn’t amount to much.

Conference impressions

Overall the conference was well-run. There were some snags, but nothing unusual. The talks that I attended were fine, but none of them really hit it out of the park. Not much to report.

There was a bit of unconference in the form of “impl rooms”. Open source maintainers could post ideas for contributing to their projects and there was time and space reserved for them to mentor the contributors. I did a couple of PRs but unfortunately they could not be merged during the event.

So was there anything surprising about the event?

Level of experience. I’ve used Rust professionally alongside Python since late 2023 and I’ve thought of myself as a newcomer to Rust, but it turned out I have plenty of experience already.

Many attendees seemed enthusiastic about Rust but they weren’t actually using it much. They wanted to find Rust jobs or to introduce Rust at the workplace. As a veteran of Clojure and Haskell communities, I’ve had these conversations many times before! I think the Rust people will have more success, though.

Focus. The conference seemed focused on the language itself. This was foreshadowed by the opening talk by Jon Gjengset about things people struggle with when learning Rust. He dove into technical details and talked about things like understanding the Send and Sync traits.

I’ve spent my time on nerding over programming languages, but I’m more interested in what kind of systems people are building in Rust and how. Charlie Marsh’s closing talk about how uv, the Python package manager, was built in Rust and what kind of tricks they used to make it fast was much more to my liking.

Descriptive complexity theory. From Amanda Stjerna’s talk on Polonius, the next generation of Rust borrow checker, the following factoid stuck in my head: Datalog is a language for programs that can be run in P. Surely I’ve encountered this fact before, but it’s interesting in any case.

Vienna was cool though

After the first conference day, I decided to skip the dinners and went to St. Stephen’s Cathedral to listen to a concert by the Finnish organist Jan Lehtola. That was great! Bach’s Chaconne was a powerful experience when played with the giant organ of the cathedral.

I also had time for Egon Schiele’s paintings at Leopold Museum and for a glass of Austrian wine at Die Rundbar (great wine, relaxed atmosphere). Vienna seemed like a nice city and I would like to go there again.

Photo: The huge inflatable EuroRust mascot at the conference venue.

A programming language conference organized by a consultancy specialized in the language? I feel like I’ve seen this pattern with some other programming language. ↩︎

Weeknote 8: Feedback

Sun, 06 Oct 2024 00:00:00 +0000

It is the perf season at work. We’ve been reviewing our own and each other’s job performance. The reviews will feed into managers’ assessment of our performance which in turn affects whether we will get a raise or a promotion.

I struggled a bit with writing the reviews. I realized I haven’t been giving as much feedback as I should have been and now I’ve got a backlog and the perf review is an awkward place to bring up new feedback. Praise is fine, but it’s not fun to get surprise negative feedback during the perf review. It’d be better to receive it directly from the other person when it’s fresh.

I talked with a few friends about how to get better at giving feedback. It would be much easier to do if there was a culture of feedback where it was normal and safe to share feedback. But how to foster such culture?

The best advice I got was to start by soliciting feedback yourself. It builds the relationship and lays the groundwork for also giving it.

It’s also apple season

The Finnish word for apple crumble is omenahyve, literally “apple virtue”. Considering how delicious it is, it might as well be apple vice.

Photo: A rock I saw at the Kauhala crag in Kirkkonummi.

Weeknote 7: Memray + k8s

Mon, 30 Sep 2024 00:00:00 +0000

I wanted to track a memory leak in a Python program. The program was leaking only in production and so I had to figure out how to use Memray to attach to a process in Kubernetes. There were a few hurdles on the way, so here’s what I did.

Add Memray to your container image. We use Poetry for our Python projects, so I added Memray as a dependency.
```
poetry add memray
```
If you’re using the official Docker Python images as a base, be sure to use the non-slim variant. The debug symbols have been stripped from the slim variant.
Memray relies on gdb (or alternatively lldb), so install that. We’ll also need the setcap binary, so install that too. On Debian-based images:
```
apt-get install gdb libcap2-bin
```
Unless you’re running a privileged container, gdb needs the CAP_SYS_PTRACE capability to work. As explained in py-spy docs, add it to Deployment.spec.template.spec.containers in your k8s spec:
```
securityContext:
  capabilities:
    add:
      - SYS_PTRACE
```
Use setcap to add CAP_SYS_PTRACE to the permitted and effective capability sets of the gdb binary.
```
setcap cap_sys_ptrace+pe /usr/bin/gdb
```
Use kubectl exec to attach Memray to your Python process. Typically its PID is 1.
```
kubectl exec -it <your pod> -- memray attach 1
```

This starts memray’s live TUI. In practice you’ll want to generate a flamegraph, but I’ll let you figure that out.

Note that CAP_SYS_PTRACE can be used for privilege escalation.

Something you should not do is to use setcap to set cap_sys_ptrace on the Python binary. The trouble is that it makes the actual process you want to inspect non-dumpable. As explained by PR_SET_DUMPABLE man page, a process’s dumpable attribute can get set to 0 when it executes a program that has capabilities:

The process executes (execve(2)) a program that has file capabilities (see capabilities(7)), but only if the permitted capabilities gained exceed those already permitted for the process.

I made this mistake because I thought that Memray itself needs the capability. That’s not the case since it relies on gdb.

Debugging notes

/proc/PID/status is your friend - see the lines starting with Cap for capability sets. You can use capsh --decode to make sense of the numbers.
If /proc/PID/mem is owned by root even though the uid of the process is something else, that means the process is non-dumpable.
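
If you would rather check the capability sets from Python than decode the hex by hand, here is a minimal sketch. The function names are made up for illustration; the only hard fact it relies on is that CAP_SYS_PTRACE is capability number 19 on Linux.

```python
# CAP_SYS_PTRACE is capability number 19 in <linux/capability.h>.
CAP_SYS_PTRACE = 19


def parse_cap_sets(status_text: str) -> dict[str, int]:
    """Return the Cap* lines of a /proc/PID/status dump as integer bitmasks."""
    caps = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Cap"):
            # The kernel prints each set as a hexadecimal bitmask.
            caps[key] = int(value.strip(), 16)
    return caps


def has_capability(mask: int, cap: int = CAP_SYS_PTRACE) -> bool:
    """Check whether bit number `cap` is set in a capability bitmask."""
    return bool(mask >> cap & 1)


def read_cap_sets(pid: int) -> dict[str, int]:
    """Read the capability sets of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_cap_sets(f.read())
```

Calling read_cap_sets on the PID you care about and passing its CapPrm or CapEff mask to has_capability gives a yes/no answer for CAP_SYS_PTRACE.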

Recommendation: Making music with C64

I bumped into Linus Åkesson’s video Making 8-bit Music From Scratch at the Commodore 64 BASIC Prompt and I recommend watching it. He shows how to code a small sequencer on Commodore 64 and makes some music with it.

Photo: Fog over reeds on a seashore on an autumn morning. Sunrise colors the clouds pink.

Weeknote 6: Heart of Clojure

Wed, 25 Sep 2024 00:00:00 +0000

Last week I attended Heart of Clojure. It was last held in 2019 and I had such a great time that I decided to go again even though I don’t use Clojure these days. I knew what to expect, but nevertheless I ended up surprised.

I gave a lightning talk where I told everyone to blog as it clarifies their thoughts and feelings. Well, here’s a plot twist: Heart of Clojure ended up being an emotionally complex event for me and I have a hard time putting it into words, at least in public. Therefore I’m just going to skip it.

I’ll list a few things I liked about it. The keynote about working in the open by Lu Wilson. Meeting in person people I’ve known for years over the Internet. Meeting old friends. Meeting new people. The livecoding set by pulu. Juggling. Food. Giving a lightning talk (always give a lightning talk!). Belgian chocolate. In his keynote Eric Normand talked about category theory without mentioning the words category theory.

Luckily, other people have written about it and Toni Väisänen even made a video. Here are a few links:

Heart of Clojure 2024: It’s okay by Daniel Janus
Scrappy hearts and Clojure fiddles by Manuel Uberti
Travel Log: Heart of Clojure 2024 by Johnny
People of Heart of Clojure by Toni Talks Dev

Big thanks to everyone involved!

More conferences

Jamie Brandon has announced HYTRADBOI 2025.

Back in 2022, he organized a little online conference called Have You Tried Rubbing A Database On It, or HYTRADBOI. The conference was chaos, but it was also great fun and it had a big impact on me. Have you noticed that databases are everywhere? Have you noticed that databases are cool?

I’m definitely attending it again, and submitting a lightning talk, too, if there’s an opportunity.

Even more conferences

EuroRust 2024 is right around the corner and I will be attending it. Say hi to me if you’re coming too!

Weeknote 5: Broken Input

Sun, 15 Sep 2024 00:00:00 +0000

I was thinking about debugging tools last week. The same theme continues this week.

My big insight is that many development and debugging tools must work with broken or incomplete inputs. Here are a few examples:

A debugger must work with programs that crash. In fact, that is exactly the time when many of us reach for a debugger.
Your editor should work even if there are some syntax or type errors in your code. You expect features like syntax highlighting and autocompletion to work while you’re in the middle of writing code.

This also applies to the tool I’m working on. If you want to use it for debugging, it’s not enough that it works for the kind of well-behaved data that we’ve got in production. It also has to work on the kind of data that we get when things are not ready yet and we’ve got some debugging to do.

Dynamic typing is possibly another example of allowing broken input. I’m a big fan of Python’s type system, but I’ve enjoyed not using mypy while prototyping new code. It’s convenient that you can have slightly broken code and you can just run it and see how it behaves. When you’re experimenting and playing, you don’t need to handle every None or other corner case.

Upcoming events

I’m going to be at Heart of Clojure next week. If you’re coming, come say hi to me!

Photo: Geese in a park in Munich.

Weeknote 4: Debugging Tools

Sun, 08 Sep 2024 00:00:00 +0000

I’m switching gears a bit at work and instead of working on databases, I’ll be working on a new internal developer tool for making sense of some timeseries data. That is what has been on my mind this week.

Essentially I’m making a debugging tool. What’s the bar for a successful debugging tool? At least it must be better than printf.

There are endless articles where experienced engineers admit that they debug software by inserting printf calls instead of reaching for a debugger like gdb. I do it, too, and printf debugging has a couple of pretty big upsides compared to more powerful tools.

You already know how to do it and how it works. Printing is one of the first things you learn when you start learning programming.
It works almost everywhere.
It fits into your existing workflow – just add a line of code and re-run your code or tests.
The performance penalty is small and you control where you incur it.

I’ve tried the Python debugger integration in VS Code a few times and it has not been a success. The instructions for setting it up were cryptic and I wasn’t sure if it’s going to work. It made the whole program very very slow, not just the parts that I wanted to debug. The control flow is difficult to understand when you step through async code. Ultimately I gave up.

It’s tough to beat printf on its own turf, but when it comes to timeseries data, printf has a weakness: it’s difficult to analyze series of long lists of numbers by just printing them out. Having a graph would help a lot.
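
To illustrate the gap, here is a minimal sketch of the smallest possible step up from printing raw numbers: a one-line text sparkline. It is not the tool I’m building and it is no substitute for a real plotting library; the function name and the choice of block characters are just assumptions for the example.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values: list[float]) -> str:
    """Render a series as one line of block characters, scaled to its own range."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant series has no shape to show, so draw a flat line.
        return BLOCKS[0] * len(values)
    top = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * top)] for v in values)
```

Something like print(sparkline(samples)) still fits the printf workflow, but the shape of the series is visible at a glance instead of hiding in a long list of numbers.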

Now, there’s an abundance of tools for visualizing timeseries data. I hope to make use of some of them. Certainly I’m not going to implement my own plotting library. The problem for me to solve is how to integrate them so smoothly in our developers’ workflows that they actually want to use them.

Literary pondering: On being aimless

I’m reading Pussikaljaromaani by Mikko Rimminen. Published in 2004, it’s a novel that chronicles how three good-for-nothings spend a summer day in the Kallio district of Helsinki.

In the spring I went to see Waiting for Godot, the play by Samuel Beckett. In the play, the two main characters Vladimir and Estragon, well, wait for Godot whom they’re supposed to meet any moment now.

What these two works have in common is that in both, the protagonists are spending time. Nay, they are killing time. They have no aim, no goals, other than to pass time. They are idle.

After seeing Waiting for Godot, I found it difficult to relate to but I couldn’t explain why. Now with Pussikaljaromaani, I understand it better: it is the aimlessness. But why is it an issue? I don’t know yet and it bothers me a bit. This thought is still in progress; eventually there will be a conclusion.

My recommendation this week, insofar as there is one, is to find a cultural work that is widely appreciated and that you don’t get, and ponder why.

Photo: Sea waves are hitting some rocks that are right at sea level somewhere west of Hanko.

Weeknote 3: Object Storage

Mon, 02 Sep 2024 00:00:00 +0000

This week I’ve been thinking about object storage services such as Amazon S3. Despite being called “object” storage, in my mind they are used for storing files. However, that’s not the only way to think about them. Another perspective is that they are key-value stores for storing binary blobs.

Nowadays a lot of people want to use them as a storage backend for their databases because they’re very durable and the data storage is cheap. We do, too, at work.

Retrofitting object storage to an existing system is not easy, though. Richard Artoul from WarpStream explains how difficult it is to make Kafka use object storage. Designing a new system based on it is easier but, as Chris Riccomini explains, the price of API calls remains too high:

The most naive implementation of a cloud-native LSM might simply send all WAL writes directly to object storage. This works and is reasonably low latency with S3 Express. Unfortunately, it’s expensive when you have a lot of writes. PUTs are $0.0025 per-1000 requests. A high-volume service that sustains 10,000 writes per-second would cost 2.5c per-second, or $65,000 per-month.

You need to do some kind of tiered storage where you combine multiple storage mediums with different trade-offs and automatically move data between them. Riccomini’s new project SlateDB initially stores the writes in memory – cheap but not durable – and then flushes them in batches to an object storage service – durable but the API calls are expensive.
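
The arithmetic in the quote, and the effect of batching, can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the PUT price and write rate come from the quote, while the 30-day month and the batch size of 1,000 writes per PUT are assumptions picked for illustration, not SlateDB’s actual settings.

```python
PUT_PRICE_PER_1000 = 0.0025  # USD per 1,000 PUT requests, from the quote
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def monthly_put_cost(writes_per_second: float, writes_per_put: int = 1) -> float:
    """Monthly PUT bill in USD when `writes_per_put` writes share one request."""
    puts_per_second = writes_per_second / writes_per_put
    return puts_per_second * SECONDS_PER_MONTH * PUT_PRICE_PER_1000 / 1000


naive = monthly_put_cost(10_000)  # every write is its own PUT
batched = monthly_put_cost(10_000, writes_per_put=1_000)  # hypothetical batch size
print(f"naive: ${naive:,.0f}/month, batched: ${batched:,.2f}/month")
```

The naive figure comes out at roughly $64,800, in line with the quoted $65,000. Batching divides the bill by the batch size, at the price of keeping unflushed writes in non-durable memory for a while.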

There’s another problem: latency, as Jack Vanlightly points out. Standard S3 comes with double-digit millisecond latency for small objects, and while S3 Express One Zone brings it down to single-digit milliseconds, it comes at the cost of durability.

Object storage makes a great bottom tier for large amounts of rarely-accessed data in a “cloud native” database. But what should the higher layers be?

Recommendation: Lehto / Korpi by Pauli Lyytinen

This week I’m recommending music again. I just bumped into saxophonist Pauli Lyytinen’s new album Lehto / Korpi which combines recordings from the Finnish nature with jazz and it’s delightful. I feel like I shouldn’t say too much and you should just listen to the track Korpi III.

Photo: The southern tip of Bodö in the Archipelago Sea. I’m not 100% sure if it’s Bodö, but if not, it’s one of the islands around it, maybe Bergholm.

Weeknote 2: Developer Experience

Sun, 25 Aug 2024 00:00:00 +0000

This week at work we were talking about improving the developer experience around the company. It is about how it feels to work with the technical systems as a software developer. Is it fun or frustrating to develop a new feature? What about debugging a production issue?

If you were truly ambitious about it, you might ask how to make it more fun and more exciting and more meaningful. We were more pragmatic and asked how to make it less frustrating.

Doesn’t lessening frustration make things more fun? To an extent yes, but joy is not just a lack of frustration. Being less frustrated makes more space for joy but there has to be something else to be joyful about.

But maybe you can’t bring joy or meaning into the work from a developer experience perspective. That has to come from the work itself.

Anyway, reducing frustration in the development workflow comes down to two things:

Reducing friction. When you want to do something, do you know how to do it? Do you have to ask someone or can you just do it? Do you know how to look things up? Do you need to execute many manual steps? Do you need to make many decisions you don’t care about?
Tightening the feedback loops. When you do something, you’ll want to know if it had a positive effect. You made a code change – did it work? You deployed something – did it work?

These are overlapping themes. You can reduce friction by doing things like improving documentation, automating manual work, and agreeing on standard ways of doing things. Feedback loops can be tightened by speeding up build times, test suites, and CI pipelines, and improving observability.

The elephant in the room is technical debt. Legacy systems, hasty implementations, and poor architectural choices are a big source of frustration for software developers, but they cannot be made to go away by polishing the workflow.

Every team is going to have tech debt, but you can keep it under control by actively managing it. However, that requires investing time and effort into it and often the decisionmakers aren’t eager to allow that despite it slowing everything down.

Recommendation: Yeung Man Hamburger Helper

This time I’m going to recommend a recipe I recently discovered.

While we were sailing, a friend cooked us hamburger helper based on the recipe by Yeung Man Cooking. It’s a pasta dish with tofu, soy sauce, and red wine, inspired by the American food product from the 1970s. I liked it, and I liked it when I cooked it myself at home. There’s no guarantee that what tastes great during a long day on the sea also tastes great at home, but this time it worked.

I’ve been looking for new, easy, vegetarian dishes for my weeknight cooking rotation and after cooking it a few times, looks like it’s going to be one. I don’t cook much with tofu or red wine, so it brings some new variety.

You can read the recipe from the video description but I recommend watching the video. The ASMR style presentation is great and funny.

Photo: A cable car going to the top of Wank. You can see Garmisch-Partenkirchen in the valley.

Weeknote 1: Schema Evolution

Sun, 18 Aug 2024 00:00:00 +0000

Preface: I’m trying out weeknotes

I believe writing regularly in public about your ideas is valuable. Writing helps you to clarify your thinking and sharing it lets you get feedback. You get something you can refer back to and link to.

I’ve been writing regularly. However, as regular readers may have noticed, I haven’t posted much. I’d like to change that. Instead of trying harder, I’d like to try solving it. A friend suggested posting weeknotes, so here goes.

Weeknotes are weekly updates about what you’re working on. I’m going to post about a software engineering topic that has been on my mind that week. I can’t write in public in detail about what I work on in my job, but at least I can write about the concepts. I’ll also include some non-engineering tidbit or recommendation.

This week: schema evolution

I’ve been thinking about how data models can be changed in a system where you cannot update all the participants at once. A typical example is a backend service that is called by a mobile app. When you change the backend API schema, the already-in-use versions of the mobile app should continue to work. To make it work, your changes have to be backwards and forwards compatible:

Backwards compatible: data written with an old version of the schema can be read with the new version of the schema.
Forwards compatible: data written with the new version of the schema can be read with old versions of the schema.

Backwards compatibility is required so that the backend service accepts requests from the old app versions. Forwards compatibility is required so that the old app versions accept responses from the backend service.

What this exactly means depends on how you have implemented everything. For example, maybe your API schema includes a JSON object that contains an optional field name. Can you remove the field?

From the backwards compatibility perspective, it’s okay if your deserialization code ignores unknown fields. If it doesn’t, you’ll get errors about the unknown field name.

From the forwards compatibility perspective, you need to ask what an optional field means. Does it mean that the field can be omitted entirely or does it mean that the field is nullable, i.e. {"name": null} is acceptable? Do you accept both and do they mean the same thing? If the field can be omitted, then the change is okay.

If you just think about JSON, saying that nullable and optional are the same may sound silly. But if you consider how you’d model an optional field with a data validation library like Pydantic in Python, the sensible way is to use a nullable field:

from pydantic import BaseModel

class MyModel(BaseModel):
    name: str | None

Technically you could omit properties from a Python object instance but that would be strange and un-Pythonic.
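
To make the two directions concrete, here is a minimal stdlib-only sketch of removing the optional field name. UserV1, UserV2, and decode are hypothetical names, and the lenient reader is an assumption standing in for whatever your deserialization library actually does.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class UserV1:
    """Hypothetical old schema: `name` is optional."""
    id: int
    name: Optional[str] = None


@dataclass
class UserV2:
    """Hypothetical new schema: `name` has been removed."""
    id: int


def decode(cls, payload: str):
    """Lenient reader: drop unknown keys, let missing optional keys default."""
    data = json.loads(payload)
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in data.items() if k in known})


# Backwards compatible: the new reader accepts data written with the old schema.
old_data = '{"id": 1, "name": "Ada"}'
new_reader_result = decode(UserV2, old_data)

# Forwards compatible: the old reader accepts data written with the new schema,
# and treats an omitted field the same as an explicit null.
new_data = '{"id": 1}'
old_reader_result = decode(UserV1, new_data)
```

If decode rejected unknown keys instead of dropping them, the first call would fail. If name had no default, the second one would. Those are exactly the two questions to ask before removing the field.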

You could, of course, avoid the whole problem by building a system that translates the data between schema versions. The most ambitious take in this space is Ink & Switch’s Cambria. If anyone is running a system like that at scale, I’d love to hear about it.

Recommendation: Ghosts by Hania Rani

A few years back I moved on from Spotify to buying albums and this means that now I listen to the same albums again and again. Ghosts by Hania Rani is a recent favorite. I like how her soundscapes are rather abstract, but her singing brings the music back to the concrete. It’s a good album to listen to in the morning as the songs have energy but they’re not in your face about it.

Picture: A view from the top of Wank, a mountain near Garmisch-Partenkirchen.

paketoi 0.1

Tue, 23 Apr 2024 00:00:00 +0000

A few months ago I wrote about how to build AWS Lambda deployment packages with Pex. It works, but it left me wondering: why isn’t there a one-command solution for building the packages for simple Python projects? I decided to build one.

It’s called paketoi. It takes a requirements.txt file and your source files and bundles them into a zip file that you can deploy on AWS Lambda. I’ve released the initial version on PyPI.

It’s a bit rough right now - it is version 0.1 after all - but I hope to polish it later on.

Why use paketoi?

AWS Lambda’s developer guide has instructions for building deployment packages with pip. Why use paketoi instead of them? There are two benefits:

It’s a single command instead of a bunch of calls to pip and zip
It works around that pip bug that sometimes results in wrong versions of dependencies being installed, depending on the Python version you’re using.

How it works under the hood is that it downloads deps with pex like in my previous post. It comes with the “complete platform information” files, so you don’t have to care about them. As a bonus, the result is zipped with repro-zipfile, so the checksum of the deployment package stays the same if the inputs stay the same.
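
To show why a zip file’s checksum is not stable by default and what it takes to fix that, here is a minimal sketch using only the standard library. It is not how repro-zipfile or paketoi is implemented; it only demonstrates the general idea of pinning the entry order, timestamps, and permissions. The function names and the fixed values are assumptions.

```python
import hashlib
import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest timestamp the zip format allows


def deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Zip `files` so that identical inputs always give identical bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):  # fixed entry order
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)  # fixed mtime
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed file permissions
            zf.writestr(info, files[name])
    return buf.getvalue()


def checksum(data: bytes) -> str:
    """SHA-256 hex digest of the archive bytes."""
    return hashlib.sha256(data).hexdigest()
```

With this, checksum(deterministic_zip(files)) stays the same across runs as long as the file names and contents do, which is handy when you want a deploy tool to notice that nothing has changed.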

Usage

For full installation and usage instructions, see the README. The easiest way to install it is with pipx: pipx install paketoi.

Here’s a small example. Let’s say you have a simple lambda with just one source code file, lambda_function.py, and some dependencies listed in requirements.txt.

.
├── lambda_function.py
└── requirements.txt

Let’s say you want to build a deployment package that works with Python 3.12 runtime on arm64 architecture. You can do that by running the following command in the project directory:

paketoi -r requirements.txt --runtime 3.12 --platform arm64 lambda.zip

Now upload lambda.zip to AWS Lambda and enjoy your function.

Could you use uv instead?

Update 2024-09-11: Looks like uv has added the --target flag since I wrote this post. So this section is wrong and you could, in fact, use uv.

uv is a brand new Python package installer developed by Astral, the same company that also develops ruff. uv is billed as a drop-in replacement for pip so you might ask if it has the same shortcoming as pip.

Unfortunately uv does not yet support the --target flag, so it cannot be easily used for building the deployment packages.

But what about Poetry?

Are you using Poetry instead of requirements.txt like I have recommended on this blog? That’s great! I’ve noticed that since my post about Pex, a new Poetry plugin has appeared: poetry-plugin-lambda-build. I have not tried it out, but it looks potentially useful.

A toolbox of methods

Tue, 26 Mar 2024 00:00:00 +0000

There’s a lot of debate about software engineering methods and paradigms such as test-driven development (TDD), pair programming, agile methods, functional programming, and so on. Which one is the best?

This is how the debate is often framed: it’s about finding the one true method that will deliver the best results and should be used always. Someone will write a post putting forward their favorite method; another person will counter that it won’t work in their specific situation, thus clearly it isn’t good for anything.

I think this misses the mark. The way I think about methods is that you have a toolbox of methods out of which you pick the right tool for the job.

For example, Hillel Wayne recently wrote about What Mob Programming is Bad At. He posited that pairing and mobbing are great for knowledge sharing but they suck for optimization work.

My experience is the same. I’m glad that I have pairing in my toolbox and I can use it when it works well such as when onboarding people to a new project. I don’t think it’s the best choice for every situation. I have other tools for those times.

Code review is another great tool that I use all the time and advocate for, but let’s face it: sometimes all it does is slow you down.

Any tool can be used as a hammer but you do not have to.

Listing all the variables that affect the suitability of a method is left as an exercise for the reader. Here’s a starter:

Team composition and experience level
Interpersonal dynamics and power dynamics
Time and resource constraints
Type of work: design, feature development, maintaining an existing system, debugging, optimization
Goal of the work: prototyping, shipping production-quality code, learning, knowledge sharing

GitHub's PRs could be better

Thu, 29 Feb 2024 00:00:00 +0000

At work, we use GitHub to collaborate on code. We create short-lived feature branches and merge them back to the main branch via pull requests (PRs). This is a fairly standard workflow.

Unfortunately I’m not too happy with it. I’ve had trouble finding a perfect way of working with git and GitHub’s PR view.

I’d like to have the following:

Useful git history. There are many opinions on what makes git history great. Myself, I look at the blame annotations regularly, so for me descriptive PR titles are the most important part. Implement <feature> or Fix <bug> are great; Code review fixes or Make it work not so much. When using command-line tools, you can use --first-parent to see the merge commits instead¹, and in VS Code and on GitHub you can quickly jump to the PR that touched the line.

Easy re-reviews. As a PR reviewer, it’s nice if you can easily see what has changed since the previous review round. If the PR author has pushed new commits, then it’s easy: GitHub can show you changes from the commits that were added since the last review. However, if the author has amended the existing commits and force-pushed them, then GitHub won’t do this.

No busywork or custom tools. When I’m ready to merge, I’d like to just press a button in GitHub to do so.

So, what’s the issue?

When you make changes to your PR after the first round of review, you need to either add new commits or amend the existing ones. If the changes are small, then it’d be better to amend them into the existing commits to avoid messy history, but then you won’t get easy re-reviews.

As a compromise, we sometimes create fixup! commits and push those. Once the PR has been approved, we then rebase them with autosquash, force-push, and add the PR to the merge queue. You cannot just merge by pushing a button on GitHub. Maybe we should script this, but this goes against my desire to avoid custom tools.

You could also consider using GitHub’s Squash and merge option, which squashes all the commits into a single commit on top of the default branch. This could be a great option if you do single-commit PRs anyway, except for one thing: now git on your computer cannot tell that the branch was merged. git branch -d will complain, git rebase -i will include stray commits, and if you use Jujutsu, jj git fetch does not hide the merged branch.

I’m not going to try to tell GitHub what they should do, but the situation does not feel optimal!

A general point: when we debate about how to best use git, it’s not just about git itself. It’s also about all the tools that integrate with git: the code forge, the editor, the CI system, etc. It’s also about the people with whom you use git.

¹ I recently learned that GitHub now allows you to include the PR title and description in the merge commit message. This improves the usefulness of seeing the merge commits a lot!

Creating AWS Lambda zip files with Pex

Wed, 31 Jan 2024 00:00:00 +0000

Update: See also my new tool that simplifies this.

So, you want to deploy a Python script to AWS Lambda and you have a few dependencies with native code. How do you build the .zip deployment package for it?

Let’s say that you have your script in lambda_function.py and your dependencies listed in requirements.txt. If you want to follow along at home, I’ve prepared a GitHub repo with an example script.

AWS Lambda’s documentation suggests a pip invocation that looks like this for installing the dependencies in the directory package:

pip install \
  --platform manylinux2014_x86_64 \
  --target=package \
  --implementation cp \
  --python-version 3.12 \
  --only-binary=:all: --upgrade \
  -r requirements.txt

You can then create a .zip file like this:

cd package
zip -r ../package.zip .
cd ..
zip package.zip lambda_function.py

And that’s it: package.zip is your deployment artifact.

However, the pip invocation above does not always result in a correct deployment package due to a shortcoming in pip. If your local environment does not match the AWS Lambda platform, the result may be wrong.

Hopefully the issue is fixed some day. Meanwhile, a common solution is to run the same command inside a Docker container. That works, but Docker on macOS is annoyingly slow. Wouldn’t it be great to have a correct solution without Docker?

Turns out pex offers one.

Let’s do it with pex

Pex is a tool for generating Python Executable files. It allows you to take a Python program and all its dependencies and wrap them into a single .pex file that can be executed with python. The idea is similar to uberjars that are used to deploy Java and Clojure programs.

Taking a Python program and its dependencies and wrapping them into a single .zip file is what we want to do for AWS Lambda and pex’s new pex3 variant can do that. You can provide it with “complete platform information” that allows it to choose the right wheels, unlike pip.

You can install pex with pipx:

pipx install pex

Note that this installs two binaries, pex and pex3. They have different features and command-line interfaces. We will use pex3.

Getting the complete platform information

You can get the complete platform information for your local environment like this:

pex3 interpreter inspect --markers --tags

The result is a large JSON blob containing environment information such as Python version and the list of platform tags compatible with your environment.

However, we do not want platform information for your laptop. Instead, we need it for your AWS Lambda environment. Huon Wilson offers a solution on the issue tracker of Pants build system: upload the following code to AWS Lambda and run it.

import subprocess

def lambda_handler(event, context):
    subprocess.run(
        """
        pip install --target=/tmp/subdir pex
        PYTHONPATH=/tmp/subdir /tmp/subdir/bin/pex3 interpreter inspect --markers --tags
        """,
        shell=True
    )
    return {
        'statusCode': 200,
        'body': "{}",
    }

Grab the result from the logs and store it in a file called complete_platform.json.

It’s crude but effective. I’ve run the code on AWS Lambda for Python 3.12 on x86_64. You can see the result on GitHub.

I don’t know how often AWS changes their Python environment in such a way that you would need to generate a new file. My guess would be that not very often.

Building the deployment package

pex3 venv create will build a Lambda-compatible zip file for you if you use --layout flat-zipped. Like this:

pex3 venv create \
  --layout flat-zipped \
  --dir package \
  --complete-platform complete_platform.json \
  --no-build \
  -r requirements.txt
zip package.zip lambda_function.py

And that’s it! Upload package.zip to AWS Lambda and try it out.

What if there are no pre-built wheels?

You might encounter a dependency with native code and no pre-built wheels. Unfortunately Pex won’t magically set up a cross-compiling environment for you. Using Docker might really be the easiest solution for ensuring a consistent build environment.

An exercise for the reader

Wouldn’t it be cool if there was a simple tool that built AWS Lambda deployment packages quickly and correctly? Pip and pex and Docker get the job done, but it feels complicated.

Considering how popular both Python and AWS Lambda are, I’m surprised that there does not seem to be a popular tool that would just do it.

poetry bundle lambda, anyone?

Yearnote 2023

Mon, 08 Jan 2024 00:00:00 +0000

Hello and happy new year. Like last year, I’m going to indulge in self-reflection and tell you about my year.

On the photos: there’s one photo for each month, in chronological order.

Software engineering

I poured a lot of energy into technical work. I built a system for managing large data migrations and spent a lot of time on polishing and operating a large cloud backend written in Python.

It was nice to spend a lot of time on actually implementing things. I learned a lot about new technology. I now know a lot more about various AWS services (AWS Lambda, looking at you, and DynamoDB too) and about using Python at scale.

The flipside of the coin is that I’m not sure if I’m any more skilled at shipping large projects than I was a year ago. Technical work is in my comfort zone; collaboration and coördination less so.

Blogging

I wrote this about 2022:

I didn’t blog much in the latter half of the year. I was busy at work and I didn’t have much to say. I hope to get back to blogging soon, though. I’ve learned a few lessons worth sharing and it would be great to participate in the software engineering community’s intellectual discourse.

I didn’t get back to blogging in 2023. A part of the problem is that after leaving the Clojure community behind, I haven’t become a member of another software engineering community. Writing in the void doesn’t work – you have to write for someone and right now I don’t know who I am writing for.¹

Still, I wrote one good post about Python packaging, titled “Do not use requirements.txt”. Thanks to everyone who engaged with it! I didn’t realize it would strike such a nerve, but it got a lot of attention. Alas, the trouble with Python packaging is here to stay.

Eating better

I changed how I cook and eat and it improved my life!

In fall, I was having trouble with cooking for myself on weekdays. Starting to cook when you’re already hungry is the worst. After work, I would often procrastinate with cooking and then feel miserable the whole evening.

Luckily I bumped into booritney’s ultimate guide to meal prep. Obviously I had heard about meal prep before, but it was such an inspiring post that I started meal prepping immediately.

I have always cooked with the intention of eating the leftovers the next day, but now I reserved time for cooking even bigger batches of food. I started to assemble the dishes into ready-to-eat portions and ensured that I have a small dessert for each meal.

Turns out this works really well for me! I enjoy cooking as long as I don’t have to do it while hungry, so batching it into time when I have the energy for it was a great improvement.

Despite all the meal prep, I was often weirdly hungry and craving for treats. I decided to keep a food diary for a week. My conclusion was that… I’m not eating enough. I started eating more and I became less hungry. Wow!

My calorie and protein intakes were clearly under various recommended numbers. This is not a common problem for the average Finn in this day and age, so it had not occurred to me to consider it. I’m glad it turned out to be so simple to fix. Being hungry sucks.

Outdoors life

In the summer, I kayaked Soisalo Runt. Located in Lake Saimaa between Varkaus, Kuopio and Heinävesi, Soisalo is the largest island in Finland. I paddled around it, starting and ending in Varkaus.

It took me nine days to paddle the 245 km trip. Along the way there are a bunch of beautiful places like Southern Kallavesi and Kolovesi National Park. I especially enjoyed the scenery of Heinävesi Route and it was fun to paddle downstream.

The trip was not without hardships. The constant rain and mosquitos made my mood miserable and my new paddling jacket gave me a rash. Still, I finished the planned route. That was some Type 2 fun.

As a cherry on the top, I went through Taipale Canal with a kayak. Going through a 160 m lock with a five-meter difference in the water level was exciting and slightly scary.

I also participated in Nordic Sea Kayak Camp in Inkoo organized by Suomen Merimelonta (formerly known as NIL Finland). It was a great weekend of kayaking lessons, hanging out with other kayakers, and even a small competition.

My favorite lesson was the one about paddling backwards, taught by Anssi Nupponen. Turns out it is fairly easy to do as long as you understand how it works, and that understanding also transfers to paddling forwards.

In addition to kayaking, I climbed indoors a lot, sailed a bit, ice skated a bit, and did a small hike in Lapland.² Also, I fell through the ice while walking on a frozen lake. It was less unpleasant than I expected.

Reading

I read a bunch of books and I did a Fedi thread on them. It was a nice exercise to write at least a couple sentences about each book.

The book I’ve thought about the most this year was a graphic novel. Kate Beaton’s Ducks: Two Years in the Oil Sands is a memoir about how she graduates from the university and goes to work at Canada’s oil sands to pay off her student debt. A big topic is being one of few women at the isolated work camps (content warning: sexual abuse). Recommended!

Beaton is also known for her web comic Hark! A Vagrant, which I recommend for a lighter mood.

Best of 2023

Best album: Tremors in the Static by Vega Trails. Lovely atmospheric jazz. In general I love everything released by Gondwana Records.
Best porridge: The rice porridge from Helsinki Christmas market with browned butter and miso caramel. I didn’t actually have it at the Christmas market but I followed the recipe (in Finnish only) and it turned out great!

What about 2024?

I don’t know yet. I’ll figure it out.

Traditional commentary on Finnish politics

In each yearnote, I express (lack of) surprise at the current cabinet of the government of Finland.

Last year, I predicted that Sanna Marin’s cabinet would fall apart right before the election like Juha Sipilä’s cabinet did. That did not happen but I will not let it put a damper on my political punditry.

I don’t think Petteri Orpo’s cabinet is going to fall apart in 2024. The cabinet has gone through many scandals already and their politics face strong opposition. Nevertheless, the cabinet parties continue to enjoy their voters’ support. Prime Minister Orpo gets it that you can stay in power as long as you keep the cabinet parties happy.

I expect that the scandals will continue, though. The Finns Party has a strong track record there and many of their politicians got elected by being provocative and polarizing. I don’t think they are planning to stop.

This post in specific is for my friends, but that’s not the case for my more engineering-focused posts. ↩︎
Before writing this post, my perception was that I had a slow outdoors year. After going through all my photos and notes, I’d say I actually had a pretty good outdoors year. It’s easy to forget all that you have done when you hear your friends’ cool stories. ↩︎

Do not use requirements.txt

Tue, 31 Oct 2023 00:00:00 +0000

Are you developing a backend service in Python? I have two pieces of advice for you:

Do not use pip and requirements.txt to manage Python dependencies. They lack crucial features that should be built-in.
Use Poetry instead.

To me, the first one is a no-brainer. The second one is more tentative: Poetry is a great option, but it’s hardly the only option worth considering. I’ll explain below.

pip’s missing features

pip is a tool that you can use to install packages from The Python Package Index (PyPI). It comes with Python and if you’re a Python developer, you have probably used it many times.

The traditional way to manage dependencies for a Python project was to list them in a file called requirements.txt and use pip install -r requirements.txt to install. However, pip was designed to be a package installer and not a full-fledged project workflow tool. pip lacks two essential features, dependency lockfiles and automatic management of virtualenvs.

Dependency lockfiles

If you want to get the same behavior in all environments - your laptop, CI, production - you need to pin the versions of your dependencies and their transitive dependencies. You can pin the versions of your direct dependencies in a requirements.txt by specifying for example requests==2.31.0 instead of requests.

However, pip won’t pin the versions of the transitive dependencies. This can be solved by using pip-tools to expand requirements.txt into a file that lists the full dependency graph with exact versions and checksums for the artifacts. pip-tools is great but you need to set it up yourself and figure out how it fits your workflow.
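To make the difference between pinned and unpinned requirements concrete, here is a minimal Python sketch that flags lines of a requirements.txt-style list that lack an exact == pin. The function name find_unpinned is made up for illustration and the parsing is deliberately naive (it skips pip options and ignores environment markers). It is not how pip or pip-tools work internally.

```python
import re

# Naive pattern: a package name, optional extras, then an exact "==" pin.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;]+")


def find_unpinned(lines):
    """Return requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Calling find_unpinned(["requests==2.31.0", "urllib3", "idna>=3"]) reports the last two entries. Even a fully pinned list of direct dependencies says nothing about transitive ones, which is the gap a lockfile fills.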

This feature is table stakes in other languages - for example, npm has had package-lock.json for years now and Cargo has Cargo.lock. This really should be a built-in feature in a project workflow tool.

Automatic management of virtualenvs

The way to create isolated environments in Python is by the use of virtualenvs. Traditionally you manage them manually: you create one with a shell command (python -m venv example to create a virtualenv called example) and when you want to use it, you need to activate it with another shell command.

This is error-prone: forgetting to activate the virtualenv or activating a wrong virtualenv are common mistakes. There are a bunch of workarounds. For example, you can use pyenv-virtualenv to make your shell auto-activate a virtualenv when you enter a project directory. direnv can do it, too.
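One cheap guard against the "forgot to activate" mistake is to have a script check whether it is running inside a virtualenv at all. The sketch below relies on the documented behavior of the standard library's venv: inside a virtual environment, sys.prefix differs from sys.base_prefix. The function name in_virtualenv is my own, and the check only tells you that some virtualenv is active, not that it is the right one.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style virtualenv."""
    # Inside a venv, sys.prefix points at the virtualenv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if not in_virtualenv():
    print("warning: no virtualenv is active", file=sys.stderr)
```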

Again, this too should be a built-in feature in your workflow tool. You should not need to glue multiple tools together. You won’t hear about npm or Cargo users having problems with virtualenvs.

Poetry and other options

Fortunately, a lot of people have identified these problems and worked to solve them. Less fortunately, this has resulted in an explosion of Python project workflow tools. So how to pick one?

My recommendation is: go with Poetry. It has lockfiles, it has virtualenv management, it’s popular and actively developed. In my experience, it’s not perfect but it works.

You could also consider Hatch or PDM. They’re similar to Poetry. I haven’t used them myself, but I’ve heard other people use them with success. Hatch seems to be especially popular with library authors.

If you’re looking for a more powerful option that can deal with e.g. multiple subprojects, Pants build system has great Python support. It has a significantly steeper learning curve, however.

Finally, if you’re looking for a rustup-style solution that can install Python for you, there’s rye. It’s new and experimental, but maybe it’s the right choice for you?

Where is the canonical workflow tool?

It would be great if Python came with a canonical project workflow tool. A lot of people wish that pip would become one. Node.js comes with npm and Rust comes with Cargo, so why can’t Python come with one? Why are there so many competing options?

The biggest obstacle, to my knowledge, is that since Python is used so widely and for so many different use cases, coming up with a universal official solution is difficult and slow (and underfunded) work. It’s not clear if pip is the right home for these features, either.

If you want to learn more, read and listen to these people who are, unlike me, deeply involved in the Python community:

Stargirl (Thea Flowers) on Fediverse: So You Want to Solve Python Packaging: A Practical Guide
Pradyun Gedam: Thoughts on the Python packaging ecosystem
Talk Python to Me (podcast): Reimagining Python’s Packaging Workflows

An aside on Clojure

Clojurists reading my blog may ask: hey, what about Clojure, how come we do not have lockfiles? That’s a great question!

The Clojure community has solved this by always using explicit versions instead of version ranges for dependencies, even in libraries. The version descriptors would actually support ranges, but nobody ever uses them. This way, as long as the version resolution algorithm is stable, you always get the same versions.

In theory, the transitive dependency version mismatches could be a problem, but Clojure is amenable to a coding style where it rarely causes issues.

In contrast, in Python and Node.js communities it is expected that libraries list version ranges for their dependencies and the package management tools complain about version mismatches.

Recipes for updating poetry.lock

Thu, 25 May 2023 00:00:00 +0000

I’ve been using Poetry for package management in Python projects for a while now and, for what it’s worth, it’s working well for me. However, some regular tasks require multiple commands with specific arguments. Here are a few recipes you might find handy.

Updating the lock file after editing pyproject.toml. After you edit pyproject.toml, you’ll want to update your lockfile and your virtualenv. Here are the right commands:

poetry lock --no-update
poetry install --sync

Without --no-update, Poetry will upgrade all dependencies that are not pinned down, which usually is not what you want. Without --sync, Poetry does not remove packages that you have removed from pyproject.toml.

I use these commands so often that I’ve put them into a script called poetry-locksync.
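The script itself is not shown here, so the following is only a guess at what a wrapper like that could look like, written in Python. It runs the two commands from the recipe in order and stops at the first failure. The name locksync and the injectable run parameter are assumptions made for this sketch, and the flags are exactly the ones from the recipe. Other Poetry versions may spell them differently.

```python
import subprocess

# The two commands from the recipe, in the order they should run.
LOCKSYNC_COMMANDS = [
    ["poetry", "lock", "--no-update"],
    ["poetry", "install", "--sync"],
]


def locksync(run=subprocess.run):
    """Run the lock and install steps, stopping at the first failure."""
    for command in LOCKSYNC_COMMANDS:
        # check=True makes subprocess.run raise if the command exits non-zero.
        run(command, check=True)
```

Calling locksync() from the project directory does the same thing as typing the two commands by hand. Passing a different run function makes it easy to try out without Poetry installed.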

Upgrading a secondary dependency. If you want to update a direct dependency, you can edit pyproject.toml and run poetry lock --no-update. But how do you upgrade a dependency of one of your direct dependencies to a specific version? You might want to do that to upgrade a package with a security vulnerability, for example.

One way to do it is by adding the dependency as a direct dependency with poetry add and then removing it again.

poetry add --lock your-library@latest
poetry remove your-library

Resolving merge conflicts in the lockfile. If two developers change the dependencies at the same time, you will end up with a merge conflict in poetry.lock at least in the content-hash line. The easiest way to resolve them is to regenerate the file with Poetry. First, resolve any conflicts in pyproject.toml. Then you can run this script which I call git-resolve-poetry-lock:

git checkout --ours poetry.lock
poetry lock --no-update
git add poetry.lock

Branchless git workflows

Mon, 15 May 2023 00:00:00 +0000

I’ve been experimenting with the so-called branchless version-control workflows. The idea is that instead of using named branches, you just juggle a bunch of commits on top of the main branch.

Here are a couple of tools implementing the idea - while Sapling and Jujutsu bill themselves as new VCSs, they both work with git repos:

git-branchless, which builds on top of git
Sapling (sl), a new VCS published by Meta
Jujutsu (jj), a new VCS which has some Google backing

git-branchless is the one I’ve used most so far. The others I have only dabbled with.

A central feature of these tools is the “smartlog” which shows the graph of commits you have on top of the upstream main branch. Sapling’s smartlog output looks like this - you have three commits on top of main and the @ sign indicates that you’ve checked out the commit with hash 335bb92d2.

$ sl
@  335bb92d2  3 seconds ago  miikka.koskinen
│  fix: fix bug B
│
│ o  d5c08952e  21 seconds ago  miikka.koskinen
├─╯  feat: implement feature A
│
o  5ca31cccb  69 seconds ago  miikka.koskinen
   refactor: refactor the code base

This kind of situation happens to me regularly: I’ve made a refactoring (or a bug fix) in one branch and I want to start another branch on top of it. But what happens if you spot a mistake in the refactoring and want to edit it?

In plain git, you’d probably do a fixup! commit on top of either of your branches - the one with the feature A or the bug fix B - and git rebase -i it into the refactoring commit. Then you’d rebase the other branch on top of the new refactoring commit.

In Sapling, you’d switch to the refactoring commit with sl previous or sl goto and use sl amend to edit it. This automatically rebases the descendant commits on top of the new commit.
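The idea of "amend a commit and have its descendants follow" can be shown with a toy model in Python. Commits are just entries in a parent map, and amending one creates a rewritten copy of it and of every descendant. This is an illustration only: the amend function and the trailing-apostrophe naming of rewritten commits are invented for the sketch and say nothing about how Sapling, Jujutsu, or git-branchless are implemented.

```python
def amend(parents, messages, target, new_message):
    """Amend `target` and rebase its descendants onto the amended commit.

    `parents` maps commit id -> parent id (None for the root) and
    `messages` maps commit id -> commit message. Rewritten commits are
    added to both dicts; the old ones are left in place for simplicity.
    Returns a dict mapping each old commit id to its rewritten id.
    """
    rewritten = {}
    pending = [target]
    while pending:
        old = pending.pop()
        new = old + "'"  # toy stand-in for a new commit hash
        old_parent = parents[old]
        # The parent is either rewritten already or untouched by the amend.
        parents[new] = rewritten.get(old_parent, old_parent)
        messages[new] = new_message if old == target else messages[old]
        rewritten[old] = new
        # Every child of a rewritten commit has to be rewritten as well.
        pending.extend(child for child, parent in parents.items() if parent == old)
    return rewritten
```

With a refactoring commit that has a feature commit and a bug fix commit on top of it, amending the refactoring yields three rewritten commits, and both children end up pointing at the amended refactoring. That is the bookkeeping you would otherwise do by hand with git rebase.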

This is especially nice if you like to use stacked PRs. I’ve lately used them a lot since I’ve worked on big changes that would be difficult to review at once. I’ve yet to try any of the tools’ GitHub integration - I’ve just manually managed the PRs - but the tools make it easier to deal with the code review fixes to the root PR.

Another thing it’s good for is creating experimental commits - having the smartlog and not having to name your branches removes a lot of friction from branching out for experiments.

All of this is possible with plain git, but the new tools make it more convenient and less error-prone.

Yearnote 2022

Sat, 21 Jan 2023 00:00:00 +0000

Year 2023 is here. So, how was 2022? Allow me to review my year.

On the photos: there’s one photo for each month, in chronological order.

Professionally

In January 2022 I wrote this:

For many years, I’ve focused on building web services in Clojure. However, I feel that I’ve done enough of it for now and more interesting problems and bigger impact await elsewhere. Thus I’d like to turn a new page in my career. I didn’t quite manage to do it in 2021, but there were a few starts.

The new page got turned. In April, I left my job of six years at Metosin (the Clojure consultancy). After a month and a half of job search, I landed at Oura Health (the smart ring company) where I work on data storage and access.

Here’s what I learned about job search:

It’s a lot of work! Leaving my previous job without having the next one lined up was the right choice for me so I was able to properly focus on the search and the interviews. I’d recommend it if you’re in a position financially and otherwise to do so.
Cold applications are a waste of time. I did get the new job by sending an application, but most of my other applications didn’t even get a response. Networking on Twitter worked much better and uncovered many interesting opportunities.

The start at the new job was rocky. It’s not so easy to get to know your coworkers when you are a remote worker joining during the summer vacation season, and I was certainly second-guessing my choice of employer when the company announced layoffs a week after I started. Once the summer was over, I was bitten by a tick, got infected with TBE and ended up on a long sick leave.

Luckily I recovered well. I got back to work and started to finally get up to speed. I got to know my teammates and people beyond my team and learned or re-learned the tech that we’re using (Python, various AWS services). I even got a nice data migration project under my belt.

Blogging and microblogging

While I was looking for work, I wrote a few posts about databases. I thought they turned out well:

I didn’t blog much in the latter half of the year. I was busy at work and I didn’t have much to say. I hope to get back to blogging soon, though. I’ve learned a few lessons worth sharing and it would be great to participate in the software engineering community’s intellectual discourse.

After Elon Musk took over Twitter, I lost my interest in posting there. I don’t think any single platform can fully replace Twitter, but there’s a nice thing going on right now with Fediverse. I migrated there and you can follow me at @[email protected].

Hobbies

I first tried bouldering in late 2021. In 2022, it really became my thing. I spent a lot of time at the climbing gym, doing at least one session almost every week, often more. I ventured out a few times, too.

What I like about bouldering is the feeling of progress and mastery it gives you. I’ve surprised myself by how strong I’ve become (not very strong, but I wasn’t very strong to begin with) and it feels great to be able to climb a difficult route after enough practice.

I did a few kayaking trips, visited Amsterdam, sailed, read a bunch of books and completed a Discworld Pareto Read.

Something I didn’t do was photography. I’m a bit bummed about it! I wish I had taken more photos of things I’ve done and friends I’ve spent time with. In 2023, I want to take more photos.

Best of 2022

Just some good things that have stuck with me.

Best mämmi experience: I made mämmi myself and it turned out well! Still, not worth the effort, I’ll just buy it the next time.
Best wine bar experience: Let Me Wine’s pop-up wine bar at Harju8. They had created a smoky atmosphere by using a fog machine. It was so cool and the wine was great, but it was so hard to breathe in there that we had to leave after having only one glass.
Best taco experience: Bacalar in Amsterdam-Noord. The tacos with meat were great but the vegetarian options were even better. And they served me my favorite drink, a mezcal negroni!
Best reading experience: Radalla by Iida Sofia Hirvonen. Hirvonen’s command of language is something special.
Best music experience: Far Star by Gilad Hekselman. I love this kind of rich yet simple jazz.

What about 2023?

I’ve enjoyed my work a lot lately and I hope that continues and I can grow my impact and influence inside the company. It would be great to find ways to write and talk about my work externally, too.

I’m going to continue bouldering a lot. I didn’t have time for a proper hike in 2022 and that’s something I would like to rectify in 2023. Or maybe I’ll do a big kayaking trip.

Traditional commentary on Finnish politics

In each yearnote, I express (lack of) surprise at the current cabinet of the government of Finland.

Sanna Marin’s cabinet held together like I expected, although there were a few close calls. The patient safety act debacle was one, and so was the time when the MPs of Center Party (a cabinet member) voted against the nature conservation act proposed by the cabinet.

However, I don’t think that they will hold together until the next parliamentary election in April. Instead, I think the cabinet will fall apart right before the election in such a way that it does not really affect any policymaking but may score the parties some points for the election.

SQL, Clojure, and editor support

Mon, 20 Jun 2022 00:00:00 +0000

When you’re writing code with a modern editor or an IDE, you can count on having a number of convenient features such as syntax highlighting, autocomplete, and code navigation. Unless you’re writing SQL, of course!

A lot of editors and IDEs, such as IntelliJ IDEA, have really nice support for SQL. However, it’s not so easy to benefit from it in practice. If you’re writing Clojure and you want to use SQL to query a database, you have a few options:

Embed SQL into strings in your Clojure code.
Put SQL into .sql files and import them with HugSQL or similar.
Use a low-level query builder such as HoneySQL.
Use a high-level query builder or an ORM such as Toucan.

Let’s look into each of them in more detail.

Embed SQL into strings. A basic query would look like this:

(jdbc/query db ["SELECT title, rating FROM movies WHERE movie_id = ?" movie-id])

This is simple, but most likely you aren’t getting syntax highlighting or other features for the string. For example, IntelliJ IDEA does support SQL inside Java strings, but it does not work in Cursive. Furthermore, if you need to parametrize the query in a way that is not allowed by the ? placeholder, you’ll have to resort to string templating, which is prone to SQL injection bugs.
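The injection risk is easy to demonstrate. The following sketch uses Python's built-in sqlite3 module rather than Clojure and JDBC, with a made-up movies table and two hypothetical helper functions, so treat it as an analogy: one query splices the value into the SQL text, the other passes it through a ? placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [(1, "Movie A", 4.0), (2, "Movie B", 3.5)],  # made-up sample rows
)


def get_movie_templated(movie_id):
    # Unsafe: the value becomes part of the SQL text itself.
    sql = f"SELECT title, rating FROM movies WHERE movie_id = {movie_id}"
    return conn.execute(sql).fetchall()


def get_movie_parametrized(movie_id):
    # Safe: the value is passed separately from the SQL text.
    sql = "SELECT title, rating FROM movies WHERE movie_id = ?"
    return conn.execute(sql, (movie_id,)).fetchall()
```

Passing the string "1 OR 1=1" to the templated version returns every row in the table, because the input is interpreted as SQL. The parametrized version treats the same string as a plain value and finds nothing.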

Put SQL into .sql files. With HugSQL, the SQL file would look something like this:

-- :name get-movie-by-id
SELECT title, rating FROM movies WHERE movie_id = :id

When you load this with HugSQL, it defines a function called get-movie-by-id which executes the query.

(hugsql/def-db-fns "my_queries.sql")
(get-movie-by-id db {:id movie-id})

What is great about this approach is that now you get the full editor support for SQL. In Cursive, however, you can’t jump from Clojure code that refers to get-movie-by-id to its definition in SQL. Jumping to definition works with normal Clojure functions, but Cursive does not know how to deal with functions defined by HugSQL.¹

HugSQL has advanced support for parametrizing your queries using snippets and Clojure expressions. If you use Clojure expressions, though, you now have the opposite problem: there’s Clojure code embedded into your SQL comments and there’s no editor support for it.

Use a low-level query builder. With HoneySQL, the query would be built like this:

(->> (sql/format {:select [:title :rating]
                  :from   [:movies]
                  :where  [:= :movies.movie_id movie-id]})
     (jdbc/query db))

There’s a lot to like about this. Your editor’s Clojure support works with this. You won’t get autocomplete for database identifiers, but completion for commonly used keywords can be good enough. Queries are Clojure data, so you can use the full power of Clojure to generate them.
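For readers who don't write Clojure, the "queries are data" idea can be sketched in Python with plain dicts. The format_query function below is a toy written for this post, not a port of HoneySQL: it only supports equality filters and it trusts the table and column names it is given.

```python
def format_query(query):
    """Turn a query dict into (sql, params). Only equality filters are supported."""
    # Identifiers are spliced in as-is, so they must come from trusted code.
    sql = "SELECT {} FROM {}".format(", ".join(query["select"]), query["from"])
    conditions = []
    params = []
    for column, value in query.get("where", {}).items():
        conditions.append(f"{column} = ?")
        params.append(value)  # values travel separately as parameters
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


base = {"select": ["title", "rating"], "from": "movies"}
# Because the query is plain data, it can be extended like any other dict.
by_id = {**base, "where": {"movie_id": 42}}
```

Here format_query(by_id) produces the SQL string with a ? placeholder plus the parameter list [42]. The base query can be reused for other filters without any string manipulation.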

The main problem is that you need to learn a new, non-canonical syntax for your SQL queries. You know how to write the query you want in SQL, but now you need to figure out how to map it to HoneySQL. It shouldn’t be too hard, but over the years my colleagues and I have struggled with it.

Use a high-level query builder or an ORM. With Toucan, the query would look like this:

(defmodel Movie :movies)
(db/select Movie :movie-id movie-id)

From the editor support perspective, this is about the same as using HoneySQL. It does save you quite a bit of boilerplate. Toucan relies on HoneySQL for advanced parametrization, so the syntax problem remains.

None of the approaches seems like an obvious winner. In practice, every big project I’ve seen has used a mix of them.

Contrast this with Datomic and MongoDB: the query languages of both can be represented cleanly enough as Clojure data and so that’s what you use. This assortment of options does not exist for them because it’s not needed.

In his article Against SQL, Jamie Brandon argues that SQL’s drawbacks cause “a massive drag on quality and innovation in runtime and tooling”. He does not mention editor support, but I can’t help but think that it’s an example of the effects of that drag.

I think jumping to definition works in CIDER and/or Calva, but have not verified it. ↩︎

What does `identical?` do?

Wed, 15 Jun 2022 00:00:00 +0000

Dear fellow Clojure enthusiasts, do you know what the following two code snippets evaluate to? And why is that the result?

(identical? ##NaN ##NaN)
(let [x ##NaN] (identical? x x))

I didn’t know it a couple of days ago and it took me a while to understand it. Now I want to share my understanding.

Go on, make a guess and check it with a REPL. When you’re ready – or if you already saw me post about it on Twitter – scroll past the photos below for an explanation.

The photos in this post are from Lill-Skorvan island near Porkkalanniemi, Finland.

Here are the results on Clojure 1.11.1:

(identical? ##NaN ##NaN)          ; => true
(let [x ##NaN] (identical? x x))  ; => false

At first I thought that this is the NaN != NaN feature¹ of IEEE 754 floats, but that is not the case.

clojure.core/identical? checks the reference equality of its arguments. Its docstring says:

Tests if 2 arguments are the same object

##NaN refers to the constant Double/NaN, which is a primitive double. That is, it’s not an object. When a primitive value is passed to a Clojure function as an argument, it gets wrapped into an object. This is called boxing. Concretely this means calling Double/valueOf, which converts the primitive double into a java.lang.Double object.

The two snippets evaluate to different values because in the first snippet ##NaN gets boxed only once, but in the second snippet each function argument is boxed separately. This comes down to implementation details of Clojure. You can see the behavior in the disassembled byte code I posted on Ask Clojure.

When the reference equality of two boxed doubles is compared, they’re considered not equal even if they wrap the same value. This explains the results we saw.
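The distinction between reference equality and value equality exists in other languages, too. Here is a small Python analogy, assuming CPython: the is operator plays the role of identical? and == plays the role of numeric comparison. Python has no primitive doubles or boxing, so this does not reproduce the Clojure surprise; it only shows the two kinds of equality side by side.

```python
a = float("nan")
b = float("nan")

same_object = a is a       # reference equality of an object with itself
distinct_objects = a is b  # two separately created float objects
value_equal = a == a       # IEEE 754 comparison: NaN is not equal to NaN
```

Here same_object is True while both distinct_objects and value_equal are False. Two separately created wrappers around NaN are different objects, and independently of that, NaN never compares equal to itself by value.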

Here’s a bonus exercise: what does this evaluate to and why?

(identical? 1.0 1.0)

I’ve used Clojure for a decade and there are still nooks and crannies I’m not familiar with. I guess it just takes a while to learn a programming language properly.

There are good reasons for the feature, or at least they were good back in the day. A lot of programmers dislike floats, but my hot take is that they’re actually a successful solution to a complicated problem. What we should do is to start using decimal floats, which would match programmer intuitions better than binary floats. ↩︎

Tasks of a schema migration tool

Mon, 16 May 2022 00:00:00 +0000

Last time, I wrote about how database schema migration tools could do more to help us. The way I see it, any schema migration solution has to cover three tasks: creating, managing, and executing migrations.

Creating migrations. The first task is to create a migration script. If your scripts consist of DDL commands in SQL files, you’ll probably write them by hand. A linter like Squawk can help you to ensure the migrations do not cause unnecessary downtime.

Things get more interesting if you can describe your target schema in another programming language or as data. For example, when using Django’s ORM or automigrate for Clojure, the tool generates the migration automatically by comparing the target schema to the current schema. Sometimes you may need to edit the generated migration by hand, but mostly the tool does the job for you.¹
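To give a feel for what "comparing the target schema to the current schema" means, here is a deliberately naive sketch in Python. Schemas are plain dicts from table name to columns, and the hypothetical diff_schema function emits DDL for added and dropped tables and columns. It is not how Django or automigrate are implemented.

```python
def diff_schema(current, target):
    """Return DDL statements that turn `current` into `target`.

    Both arguments map table name -> {column name: SQL type}.
    Only added and dropped tables and columns are handled.
    """
    statements = []
    for table, columns in target.items():
        if table not in current:
            cols = ", ".join(f"{name} {type_}" for name, type_ in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for name, type_ in columns.items():
            if name not in current[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {type_};")
        for name in current[table]:
            if name not in columns:
                statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    for table in current:
        if table not in target:
            statements.append(f"DROP TABLE {table};")
    return statements
```

Note that in this naive version a renamed column shows up as a drop plus an add, which would lose data. Cases like that are where editing the generated migration by hand comes in.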

Managing migrations. You need to keep track of which migrations have been applied to each of your databases (local, staging, production). When you want to migrate a database, the tool is able to give you a list of required migrations and the order in which they should be performed. The tools tackling this problem end up having essentially the same features.
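The bookkeeping behind this is small. The sketch below uses Python's built-in sqlite3 module and a tracking table called schema_migrations. The table name, the function names, and the choice to order migrations by sorting their file names are all assumptions made for the example, not the design of any particular tool.

```python
import sqlite3


def pending_migrations(conn, available):
    """Return names from `available` not yet recorded as applied, in sorted order."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (name TEXT PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT name FROM schema_migrations")}
    return sorted(name for name in available if name not in applied)


def mark_applied(conn, name):
    """Record that a migration has been run against this database."""
    conn.execute("INSERT INTO schema_migrations (name) VALUES (?)", (name,))
```

Each database carries its own tracking table, so the same list of migration files yields a different pending list for local, staging, and production.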

One problem that migration managers need to solve is how to support multiple developers creating migrations at once. If you number your migrations incrementally like Flyway does, migrations in concurrently developed branches may end up using the same number.

To avoid this, some tools such as dbmate and Migratus use timestamps in migration names. Ben Kuhn’s sketch of a migration tool has an alternative solution: it uses incrementing numbers, but it provides a rebase command to automatically renumber the migrations based on Git history. That’s a nice touch!

Executing migrations. At the basic level this means running the DDL commands from your SQL files. However, as we learned the last time, you need to be careful to avoid downtime and do things like set lock_timeout in PostgreSQL.

While every tool that manages migrations can also execute them, there are specialist tools such as gh-ost for MySQL which focus only on executing migrations in a robust way without touching any of the other tasks.

Again, not using DDL for migrations seems to open the door for innovation. For example, Reshape for PostgreSQL has a novel way of executing migrations: by using schemas, views, and triggers, it allows you to use both the old and the new schema at the same time. This means that your application does not have to be backwards-and-forwards compatible with database changes.

As seen, there are plenty of tools out there for dealing with these tasks. There’s no one-size-fits-all solution. If your needs are simple, a simple tool will suffice. However, if you’re looking to ensure safety and efficiency, you should look further.

You could describe the target schema with a CREATE TABLE command and have a tool generate the migration by comparing that against the current schema. I’m not aware of such a tool, however. ↩︎

Schema migrations and avoiding downtime

Mon, 02 May 2022 00:00:00 +0000

If you’re developing an application that is backed by a SQL database, sooner or later you will need to do a schema migration. Maybe you’ll need to add a new table or a new column, create a new index, or change some constraint.

Luckily there are plenty of tools to help you! Migratus for Clojure, Flyway for Java, and dbmate as a more language-independent option are examples of the most common design pattern: you write migrations as DDL commands in SQL files and the tool keeps track of which of the migrations have been applied to the database.

Let’s say you forgot to make the e-mail field of your user table unique and now you want to fix it. With these tools, the migration could look something like this:

ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);

Many object-relational mappers such as Django’s ORM and ActiveRecord from Ruby on Rails exhibit another pattern: the migrations are written using a DSL in the ORM’s programming language (Python and Ruby, respectively). Here’s what the migration above could look like if it were created with Django:

from django.db import migrations, models

class Migration(migrations.Migration):
    operations = [
        migrations.AlterField(
            model_name='user',
            name='email',
            field=models.TextField(unique=True),
        ),
    ]

One tricky thing about migrations is that to alter a table, you have to lock it potentially for a long time. This means that migrations can cause downtime. Braintree has published a great guide for avoiding downtime with PostgreSQL migrations but even then you have to be careful.

For example, when you add a uniqueness constraint in PostgreSQL, an index gets created. The guide advises that the migration above should be done in two steps to avoid grabbing an exclusive lock while the index is being created:

First, create a unique index using CREATE INDEX CONCURRENTLY.
Then, add the constraint with USING INDEX which only requires a short-lived exclusive lock to alter the table’s metadata.
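Spelled out, the two steps could look like the statements generated by this small Python helper. The function name and the table_column_unique naming scheme are made up for the example, and the helper does no identifier quoting, so it is a sketch rather than something to paste into a migration tool.

```python
def unique_constraint_steps(table, column):
    """Return the two PostgreSQL statements for adding a unique constraint."""
    name = f"{table}_{column}_unique"
    return [
        # Step 1: build the unique index without blocking writes to the table.
        f"CREATE UNIQUE INDEX CONCURRENTLY {name} ON {table} ({column});",
        # Step 2: attach the already-built index to the table as a constraint.
        f"ALTER TABLE {table} ADD CONSTRAINT {name} UNIQUE USING INDEX {name};",
    ]
```

For the users table and the email column this yields an index and a constraint both called users_email_unique. Keep in mind that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, which matters if your migration tool wraps each migration in a transaction.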

If you’re like me, you might think that it’d be a great idea to encode all this knowledge into a tool. And it turns out that people have done it!

There are linters such as Squawk that check your SQL files. Here’s what Squawk has to say about our migration:

example.sql:1:0: warning: disallowed-unique-constraint

   1 | ALTER TABLE users ADD CONSTRAINT users_email_unique UNIQUE (email);

  note: Adding a UNIQUE constraint requires an ACCESS EXCLUSIVE lock which blocks reads.
  help: Create an index CONCURRENTLY and create the constraint using the index.

Seems good – this is exactly what I was looking for.
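To show that a rule like this is not magic, here is a toy version of the check in Python. It is a rough regex-based sketch that assumes one statement per line. Squawk itself is a real linter with proper SQL parsing, and this is not its implementation. The lint function and the warning text are invented for the example.

```python
import re

UNIQUE_CONSTRAINT = re.compile(
    r"ALTER\s+TABLE\s+.*\bADD\s+CONSTRAINT\b.*\bUNIQUE\b", re.IGNORECASE
)
USING_INDEX = re.compile(r"\bUSING\s+INDEX\b", re.IGNORECASE)


def lint(sql):
    """Return (line number, message) pairs for risky statements in `sql`."""
    warnings = []
    for number, line in enumerate(sql.splitlines(), start=1):
        # Flag unique constraints that are not built from an existing index.
        if UNIQUE_CONSTRAINT.search(line) and not USING_INDEX.search(line):
            warnings.append((number, "unique constraint added without USING INDEX"))
    return warnings
```

Running lint on the one-line migration from earlier produces a single warning for line 1, while the USING INDEX variant passes cleanly.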

Since Django migrations are specified on a higher level, could it do these kinds of tricks automatically? Possibly, but as far as I know, currently it doesn’t. For ActiveRecord, there’s pg_ha_migrations which implements these ideas.

Tools like gh-ost and pt-online-schema-change for MySQL are another stab at this problem. They’re only concerned with running ALTER TABLE statements in a safe, robust way. The problem of managing migrations is left to you.

My point is to be careful with your migrations, but my meta-point is this: migrations of heavily-loaded databases take skill and your tools can help you. I’ve heard it said that all you need for migrations is a bunch of SQL files in a directory. It goes a long way, yeah, but you can go further.

Clojure and what could've been

Wed, 06 Apr 2022 00:00:00 +0000

Epistemic status: Anecdotes and opinions.

I’ve used Clojure for over a decade now: I first learned it in 2012 and started using it professionally in 2013. I’ve been reflecting on what has happened and what the future looks like. In this post, I want to share a few musings about it.

Missed opportunity: production REPLs

REPL is so central to Clojure that people even talk about REPL-driven development. It’s not just about having a prompt for typing commands like in Python¹. Instead, it’s a way of connecting your editor to a running Clojure system.

Usually this is done only during development. You could connect to the REPL of a production system to debug or maintain it, if you dare. There are stories about adventurous Clojure developers doing so to hotfix a running system.² Usually these stories end with a word of caution: the devs will tell you about a time when they forgot to commit the fix to the version control system and the changes were lost the next time the system was restarted.

You used to be able to do it, anyway. Nowadays when the standard model of deploying software is ephemeral containers in a Kubernetes cluster, the production REPL is less meaningful. If you need to apply the same hotfix to eight different containers and Kubernetes can re-schedule them at any moment, there isn’t much point to this. Your services are now cattle, not pets.

At Metosin, we had this belief that a small but skilled team using sharp tools can deliver better software faster than an ordinary team relying on standard tools. The recent blog post by Tailscale about their database choices seems like an example of this: SQLite is an unusual choice for their setup, but they know what they’re doing and it’s working great.

I wonder if we’ve missed a similar opportunity here. Ephemeral containers have their benefits, especially for scaling, but do they always outweigh the productivity benefits of using production REPLs for debugging?³

FaaS did not kill Clojure

A few years ago, Function-as-a-Service (FaaS) platforms such as AWS Lambda were surging in popularity. I thought it could be the end of Clojure for web services. Replacing all the glue code of web services with an API gateway and some lambdas seemed like a big win: all your code is about the business logic; the infrastructure has been abstracted away. I didn’t see what Clojure could offer here: its startup time was slow and REPL-driven development didn’t fit the FaaS platforms.

I was wrong. People still develop long-running server software just as they did in 2015. AWS Lambda became a popular tool for scripting AWS services and for connecting them to each other, but it didn’t replace traditional web backends. As far as I understand, the startup time problems have been mitigated, too, and Clojure lambdas are a feasible choice nowadays.

Thinking back, I expected the AWS Lambda developer experience to improve rapidly. This didn’t happen and it is still clunky if you have a lot of lambdas. Frameworks such as Serverless tried to smooth it out, but they never hit the big time. Is this another missed opportunity?

JavaScript is taking over ClojureScript

There’s a lot to like about developing browser applications with ClojureScript. Tools such as shadow-cljs and Google Closure Compiler are great – I’ll take them any day over configuring webpack. There are a bunch of good libraries such as Reagent, and of course the language itself is nice.

Despite the rough spots such as JavaScript interop and awkward testing tools, for a long time ClojureScript was a clear winner over JavaScript for me. Today, I’m not so sure about it. JavaScript language, tooling, and ecosystem have improved immensely over the last decade.⁴

In some ways, the JavaScript experience is superior now. One essential feature is async/await syntax for asynchronous programming. It’s much smoother than using core.async or promises and it looks like we’re not going to have it in ClojureScript.

Another one is TypeScript. JavaScript ecosystem is strongly embracing TypeScript and it’s enabling powerful static analysis. I’m not seeing an easy way for ClojureScript to benefit from that.

JavaScript has improved so much that it’s harder for me to look over ClojureScript’s rough spots now. I certainly feel that JavaScript is overtaking ClojureScript. Time will tell if I’m wrong once again.

Python REPL leaves a lot to be desired if you’re used to Clojure REPLs, but to give it credit, Python has popularized another innovative way of interacting with a live system: notebooks. ↩︎
As a teenager, I fixed my websites by editing the PHP files directly on the server over SFTP. Now that was continuous delivery. ↩︎
Another thing I wonder about is what Erlang and Elixir are doing here. Their support for distributed systems is on the next level. If you have seen a good article on how Erlang and Elixir people deploy and interact with their production systems, please send it my way! ↩︎
Admittedly, you may still need to use webpack, and it’s still difficult to configure. ↩︎

Yearnote 2021

Sun, 16 Jan 2022 00:00:00 +0000

2021 is over. It was the second full year of COVID-19, possibly establishing the new normal.

I felt pretty good about 2020. This year was more of a mixed bag. I missed my friends a lot and felt stuck professionally, but I had plenty of good times, too.

On the photos: there’s one photo for each month, in chronological order.

Professional life

For many years, I’ve focused on building web services in Clojure. However, I feel that I’ve done enough of it for now and more interesting problems and bigger impact await elsewhere. Thus I’d like to turn a new page in my career. I didn’t quite manage to do it in 2021, but there were a few starts.

A highlight was taking part in a project to implement a cookie banner for a popular Finnish web service. The banner itself isn’t that interesting, but I enjoyed learning about the compliance issues involved. Getting a large organization to honor the users’ data collection consent is a lot of work even when everyone is on board with the change!

This made me interested in privacy engineering. However, I didn’t have a chance to dig into that more deeply. The direction I ended up taking was to become more involved in the operations: responding to incidents, increasing observability, learning about resilience engineering. I’ve always enjoyed debugging so incident response fits me well.

Open source and writing

It was a slow open-source year for me, but I had some small successes.

In early 2021, I ran a bunch of open source mob programming sessions at Metosin. Mob programming is like pair programming but with a bigger group of people. We would fix bugs or implement features over a video call. One person would share their screen and everybody else would tell them what to do.

This was an experiment to get others more involved in the open source work. The sessions were well-received but they didn’t become a regular habit. They needed more preparation than what I had time for.

In April, I thought that I’ll stop the weekly posting cadence on the blog and post better-thought-out posts more rarely. Well, that didn’t happen. I did write a lot, but it was all notes and morning pages and none of it ended up on the blog.

Out of the posts I published, here are the ones I liked the best, in chronological order:

Hobbies

It was a big year for my hobbies. Here are some highlights:

Getting a drysuit. I bought a drysuit (Bora by Palm Equipment) and this allowed me to start the kayaking season early, right after the sea ice was gone. I did a lot of day paddles between April and July.

Whitewater kayaking course. I attended a beginner course organized by my kayaking club, Merimelojat. We spent a weekend at Pernoonkoski in Kotka. I learned a lot and it was a lot of fun! The whitewater community in Finland is not big but it seems like they’re having a good time.

Hiking. I did a number of trips but the highlight was my big trip to Lapland. First I hiked the Hetta-Pallas trail and then did a loop in Urho Kekkonen national park. Along with Karhunkierros, these are the most classic hikes in Finland.

Bouldering. I had tried indoor bouldering a few times during the pandemic and enjoyed it. In the fall, I gave it a serious go and ended up doing something like 10 sessions over six weeks. Again, a lot of fun: bouldering combines elements of exercise and puzzle-solving. Indoor bouldering is also a nice social sport in that it’s easy to try out and you can have a good time even if your skill levels vary. Hopefully in 2022, I’ll hit the outdoor boulders as well.

Slay the Spire. It’s a deck-builder video game and I played a lot of it. If you get into the game, I recommend watching Jorbs’s videos – he’s a great player and has useful explanations for why he does things. To allow yourself to discover things, play a bit before watching, though.

What about 2022?

I don’t know. I want to do so many things that there’s no way I can do them all. I don’t expect the pandemic situation to significantly change during 2022.

Traditional commentary on Finnish politics

I’m not surprised that Sanna Marin’s cabinet held together this year. There have been some signs of internal conflict, but I still expect them to hold together until the next parliamentary election in 2023.

Split tokens in Clojure

Sat, 04 Sep 2021 00:00:00 +0000

On Dhole Moments, there’s a nice post about a recent Lobste.rs password reset vulnerability. Via the post, I learned about a simple technique called split tokens for making your password reset token validation more resistant to timing attacks. I wanted to poke at it a bit and ended up creating a tiny Clojure library for generating and validating split tokens, called split-token. Check it out if you’re into generating random tokens!
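As I understand the technique, the token is split into a selector, which is used for the database lookup, and a verifier, which is only stored as a hash and compared in constant time. The following is a rough sketch of that idea using only JDK classes. It is not the API of the split-token library, and all the function names and the token layout (selector, a dot, verifier) are made up for this illustration.

```clojure
(require '[clojure.string :as str])
(import '(java.security MessageDigest SecureRandom)
        '(java.util Base64))

(defn random-bytes [n]
  (let [buf (byte-array n)]
    (.nextBytes (SecureRandom.) buf)
    buf))

(defn b64 [^bytes bs]
  ;; URL-safe alphabet, so the "." separator can never appear in a part
  (.encodeToString (.withoutPadding (Base64/getUrlEncoder)) bs))

(defn sha256 [^String s]
  (.digest (MessageDigest/getInstance "SHA-256") (.getBytes s "UTF-8")))

(defn generate-token
  "Returns the token to send to the user and the record to store."
  []
  (let [selector (b64 (random-bytes 16))
        verifier (b64 (random-bytes 16))]
    {:token  (str selector "." verifier)
     :record {:selector selector
              :verifier-hash (sha256 verifier)}}))

(defn valid-token?
  "lookup is any function from selector to stored record (or nil)."
  [lookup token]
  (let [[selector verifier] (str/split token #"\." 2)]
    (if-let [record (and selector verifier (lookup selector))]
      ;; MessageDigest/isEqual compares the hashes in constant time
      (MessageDigest/isEqual ^bytes (:verifier-hash record)
                             ^bytes (sha256 verifier))
      false)))

;; Usage, with a plain map standing in for the database
(let [{:keys [token record]} (generate-token)
      db {(:selector record) record}]
  (valid-token? db token))
;; => true
```

The lookup by selector may still leak timing information, but only about the selector. The verifier never takes part in a database query.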

Enjoying the silence

Tue, 30 Mar 2021 00:00:00 +0000

I brewed some coffee while on the go.

Last year, Finland closed down the week I had my winter vacation. This year, the government was debating movement restrictions. Since COVID-19 broke out in Finland, I’ve thought so many times that “surely this will be over by date X” just to see date X come and go. I’m not going to speculate about the unprecedented restrictions we’re going to see on my winter vacation next year.

I spent a night at Liesjärvi. My mom said that it must have been nice to enjoy the silence in the nature. To which I say: I don’t know about that.

I was camping next to a lake and I was alone, except for the (quiet) mouse that wanted to inspect my backpack to see if there’s anything edible.

But still, somebody was camping on the other side of the lake and they chopped firewood. During the night, there were cranes calling. In the morning chickadees were singing and during the day woodpeckers pecked the wood. The lake was frozen and the ice was creaking, booming, and banging.

So much for the silence.

clojure.xml and untrusted input

Sat, 13 Mar 2021 00:00:00 +0000

Clojure’s standard library includes the namespace clojure.xml, which implements an XML parser. It’s not used much – which is great, because it’s vulnerable to XML external entity (XXE) attacks. It’s something that you want to be aware of if you’re using clojure.xml to process untrusted input.

Update (2022-03-27): XXE processing has been disabled in Clojure 1.11.0.

Juha Jokimäki tweeted about this already back in 2014. However, I still see clojure.xml occasionally used, so I thought it’s a good idea to blog about it.

Note: clojure.xml is not to be confused with data.xml, which is a separate library. data.xml has disabled XXE by default.

XML external entity attacks

XML external entities allow you to refer to resources outside of the file that you’re processing. For example, you can include the content of an external file. Here’s an example from OWASP:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file:///etc/hostname" >]>
<foo>&xxe;</foo>

Let’s try it out:

;; I saved the example above as "hostname.xml"
(require 'clojure.xml '[clojure.java.io :as io])
(with-open [input (io/input-stream "hostname.xml")]
  (clojure.xml/parse input))
;; => {:tag :foo, :attrs nil, :content ["nixos\n"]}

My laptop’s hostname is nixos, so that checks out!

If you point the file:/// reference to a directory instead of a file, you get a listing of the directory contents. In principle, you could use http:// URLs too, but that did not work on my machine.

If you use a domain name in the file:// URL, Java tries to connect to it over FTP.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE foo [
  <!ELEMENT foo ANY >
  <!ENTITY xxe SYSTEM "file://quanttype.net" >]>
<foo>&xxe;</foo>

You might be able to exfiltrate data using this mechanism. At least it’s a way to call home and if your FTP server is suitably broken, the parser seems to get stuck forever.

XML bombs

Juha Jokimäki’s example code also demonstrates a small XML bomb. An XML bomb is a short XML file that gets expanded to an extremely large one when processed.

Luckily JDK defines some limits on the entity expansion to hinder this attack. The Wikipedia article has an example with a billion-time expansion, but JDK limits the expansion factor to 64 000 by default.

Thus, the Wikipedia example does not work, but here’s a 1.4 KB file that gets expanded to 47 megabytes:

<?xml version="1.0"?>
<!DOCTYPE lolz [
 <!ELEMENT lolz (#PCDATA)>
 <!ENTITY lol0 "0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
 <!ENTITY lol1 "&lol0;&lol0;&lol0;&lol0;&lol0;">
 <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
 <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
 <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
 <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
]>
<lolz>&lol5;</lolz>

Let’s try it:

;; Save the example above as "lol.xml"
(with-open [input (io/input-stream "lol.xml")]
  (-> (clojure.xml/parse input) (:content) (first) (count)))
;; => 50000000

(/ 50000000 1024.0 1024.0)
;; => 47.6837158203125

It’s not catastrophic: a single XML document won’t crash your server. Still, you might want to think about it if you process XML files from untrusted sources.
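The numbers in this example can be checked with a bit of arithmetic. This sketch assumes that lol0 is 1000 characters long (which is what the count above implies) and that the JDK limit counts individual entity expansions. The second assumption is my reading of the limit, not something the example proves.

```clojure
;; How many times each entity references the one below it,
;; from lol5 down to lol1 (lol1 contains five copies of lol0).
(def fan-outs [10 10 10 10 5])

;; Assumed length of lol0, implied by the character count above.
(def lol0-length 1000)

;; Size of the fully expanded document body in characters
(def expanded-size (* lol0-length (reduce * fan-outs)))
;; => 50000000

;; Entity references expanded per level:
;; 1 (lol5) + 10 + 100 + 1000 + 10000 + 50000
(def expansion-count (reduce + (reductions * 1 fan-outs)))
;; => 61111
```

Under those assumptions the file needs 61 111 expansions, which stays just below the default limit of 64 000.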

Workaround

Juha Jokimäki shows how to create a parser that disallows the document type declarations (DTDs) required by the attacks above:

(defn startparse-sax-no-doctype [s ch]
  (..
    (doto (javax.xml.parsers.SAXParserFactory/newInstance)
      (.setFeature javax.xml.XMLConstants/FEATURE_SECURE_PROCESSING true)
      (.setFeature "http://apache.org/xml/features/disallow-doctype-decl" true))
    (newSAXParser)
    (parse s ch)))
    
(with-open [input (io/input-stream "hostname.xml")]
  (clojure.xml/parse input startparse-sax-no-doctype))
;; Execution error (SAXParseException) at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper/createSAXParseException (ErrorHandlerWrapper.java:204).
;; DOCTYPE is disallowed when the feature "http://apache.org/xml/features/disallow-doctype-decl" set to true.

However, my recommendation is to replace clojure.xml with data.xml. It has a couple of benefits:

It has a nice, full-featured interface.
The parse tree it produces is similar to the one produced by clojure.xml, so for many users it’s a drop-in replacement.
It’s part of the Clojure contrib library suite, so it’s widely used and maintained.

XXE processing is disabled by default:

;; clj -Sdeps '{:deps {org.clojure/data.xml {:mvn/version "0.2.0-alpha6"}}}'
(require 'clojure.data.xml)
(with-open [input (io/input-stream "hostname.xml")]
  (clojure.data.xml/parse input))
;; => #xml/element{:tag :foo}

XML bombs are subject to the same limits as clojure.xml, since both the libraries use JDK’s XML parsing facilities. If you want to prevent them altogether, you can disable DTDs by setting :support-dtd false:

(with-open [input (io/input-stream "lol.xml")]
  (clojure.data.xml/parse input :support-dtd false))
;; Error printing return value (XMLStreamException) at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl/next (XMLStreamReaderImpl.java:652).
;; ParseError at [row,col]:[11,13]
;; Message: The entity "lol5" was referenced, but not declared.

Update: As a follow-up, see CLJ-2611 which aims to disable XXE processing in clojure.xml.

clojure.spec and untrusted input

Sat, 06 Mar 2021 00:00:00 +0000

If you’re going to use clojure.spec to validate or conform untrusted input, you should be careful. It’s easy to write code that looks correct, but opens the door for denial-of-service (DoS) attacks. For example, if you have implemented an HTTP API in Clojure and you use spec to check the incoming requests, you should be aware of this.

I believe that this is well-known among the experienced practitioners. For example, Dominic Monroe recently mentioned the issue on the defn podcast (the section starts around 12:30). However, I have not seen blog posts about this before.

In clojure.spec, specs for entity maps are open. This means that they are allowed to have keys that are not included in the spec. For example:

(require '[clojure.spec.alpha :as s])

;; Let's define a spec for an empty map
(s/def ::my-map (s/keys))

;; Empty map is valid, as expected
(s/valid? ::my-map {}) ; => true

;; Extra keys are allowed as well
(s/valid? ::my-map {:example "dog"}) ; => true

If the extra keys have specs, they will be validated as well:

(s/def ::my-map (s/keys))
(s/def ::number int?)
(s/valid? ::my-map {::number 2}) ; => true
(s/valid? ::my-map {::number "two"}) ; => false

This is great for many use cases, but it’s problematic for validating untrusted inputs. There are two potential problems:

An attacker may be able to set fields that they were not supposed to set.
An attacker may be able to make the validation very slow.

The first problem is nothing new – I’ve seen it in hand-written validation code as well. The second problem is clojure.spec-specific and I’m going to focus on it here.

clojure.spec has support for structural regular expression specs with s/cat, s/+ and others. They’re usually used for writing specs for functions and implementing parsers in macros. Unfortunately they also make spec vulnerable to regular expression denial of service (ReDoS) attacks.

We can come up with a pathological regex. Here’s a Clojure vector equivalent of the regular expression (0+)+1:

(s/def ::slow (s/cat :zeros (s/+ (s/+ #{0})) :one #{1}))
(time (s/valid? ::slow [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]))
;; "Elapsed time: 1932.420446 msecs"
;; true

That’s slow, considering the input is a vector of 18 integers. Worse, the asymptotic complexity of the algorithm seems to be O(2^n). If you add one more zero to the input, it takes twice as long to validate it.

Now, most likely you would not use regular expressions specs to validate untrusted input. However, spec validates the extra map keys in the entity maps, so if you have loaded a library that defines a slow spec, an attacker may be able to craft an input that gets validated against it.
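Here is a small sketch of how that plays out. The ::request and ::name specs are made up for this illustration and the inputs are kept short so that it runs quickly. The point is only to show that the extra key is checked against ::slow, even though ::request never mentions it.

```clojure
(require '[clojure.spec.alpha :as s])

;; The pathological regex spec, e.g. defined by some library you have loaded
(s/def ::slow (s/cat :zeros (s/+ (s/+ #{0})) :one #{1}))

;; A hypothetical request spec that knows nothing about ::slow
(s/def ::name string?)
(s/def ::request (s/keys :req-un [::name]))

;; A well-behaved request
(s/valid? ::request {:name "dog"})
;; => true

;; The attacker adds a namespace-qualified key that matches the slow spec.
;; A value that does not match ::slow makes the whole request invalid,
;; which shows that the value really was validated against it.
(s/valid? ::request {:name "dog" ::slow [0 0 2]})
;; => false

;; A matching value passes, after however long ::slow takes to check it
(s/valid? ::request {:name "dog" ::slow [0 0 1]})
;; => true
```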

For example, Ghostwheel comes with some slow specs.

;; To load Ghostwheel: clj -Sdeps '{:deps {gnl/ghostwheel {:mvn/version "0.3.9"}}}'
(require '[ghostwheel.core :as g])
(time (s/valid? ::g/some-unsafe-ops (repeat 35 'let)))
;; "Elapsed time: 13896.968285 msecs"
;; true

Note: I’m not here to pick on Ghostwheel. Ghostwheel is using clojure.spec to parse macro input, which is an intended use case of clojure.spec. It was the first example of pathologically slow specs in the wild that I could find, but I’m sure there are more examples out there.

The input is a long list of let symbols. If your untrusted input comes in as JSON, it won’t get converted into symbols. However, many Clojure services use EDN or Transit, which support symbols. The input is about 300 bytes of Transit.

Pulling a denial-of-service attack based on this requires specific circumstances:

you process untrusted input encoded in EDN or in Transit, for example via an HTTP API,
you use entity maps (s/keys) to validate the input, and
you have loaded a library that defines slow specs.

That does not describe every Clojure backend service I have ever seen, but it’s not an unheard-of combination either.

As far as I can tell, Clojure itself does not come with pathologically slow specs. Still, there are some slow-ish specs available in core.specs.alpha, which always gets loaded.

(time (s/valid? (s/keys) {:clojure.core.specs.alpha/ns-clauses
                          (repeat 100000 (list :gen-class))}))
;; "Elapsed time: 4048.091583 msecs"
;; => true

;; How big is the input? About 1.24 MiB
(/ (count (pr-str (repeat 100000 (list :gen-class)))) 1024.0 1024.0)
;; => 1.2397775650024414

Workaround

One of the new features in spec 2 is the support for closed maps. That should improve the situation once it gets released. Meanwhile, you can use select-spec from spec-tools to remove the extra keys:

;; clj -Sdeps '{:deps {metosin/spec-tools {:mvn/version "0.10.5"}}}'
(require '[clojure.spec.alpha :as s] '[spec-tools.core :as st])

(s/def ::good int?)
(s/def ::good-map (s/keys :opt-un [::good]))

(st/select-spec ::good-map {:good 1, :bad 2})
;; => {:good 1}
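If you would rather not add a dependency, a cruder alternative is to drop unknown keys by hand before validating. This is only a sketch: the allow-list and the validate-closed helper are hypothetical names, the list has to be kept in sync with the spec manually, and nested maps are not handled.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::good int?)
(s/def ::good-map (s/keys :opt-un [::good]))

;; Hypothetical, hand-maintained allow-list for ::good-map
(def allowed-keys #{:good})

(defn validate-closed
  "Drops keys outside the allow-list, then validates what is left.
  Returns the trimmed map, or nil if it is not valid."
  [spec allowed m]
  (let [trimmed (select-keys m allowed)]
    (when (s/valid? spec trimmed)
      trimmed)))

(validate-closed ::good-map allowed-keys {:good 1, :bad 2})
;; => {:good 1}

(validate-closed ::good-map allowed-keys {:good "one"})
;; => nil
```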

Why take notes, anyway?

Sat, 27 Feb 2021 00:00:00 +0000

Recently I read Sönke Ahrens’s book How to Take Smart Notes. It is about notetaking using the Zettelkasten method. The gist of the method is that you note each idea on a separate card. Then you organize the cards into a hierarchy and interlink them using a clever numbering system. This interlinking allows you to generate new ideas. Nowadays you can use a computer program to do the same, of course.

The book is thin on how the method actually works and focuses on why you should use it. I’ve never been great at taking notes and the book gave me insight into why that is: notetaking is part of a bigger process.

The purpose of Zettelkasten is not to remember what you have read. The purpose is to be able to write. The book is geared towards students and researchers in academia and their main job is to write:

Studying does not prepare students for independent research. It is independent research. (p. 35)

The book asserts that the writing process starts when you read something and take notes about it. Zettelkasten is a way to convert what you’ve learned from reading into writing of your own: first you make notes of interesting ideas, then you develop those ideas by connecting them to your existing ideas, and then you put that together into a paper. You do all this work using the interlinked note cards.

Now, I do not write papers, but I do have this blog, so Zettelkasten is relevant for me. I’ve never thought about the ideas-to-blog pipeline in this way, but it makes sense. However, if you do not read to write, or to speak, Zettelkasten might not be the right method for you.

The book also asserts that writing is thinking. This rings true even if writing is not the only way to think. To write about an idea, you have to understand it. To write an argument, you have to confront the gaps in it. If you want to think more or think better and you do not write much, you should try writing more.

The goal is to deliver working software

Sat, 20 Feb 2021 00:00:00 +0000

As software engineers, how do we evaluate new technologies, programming languages, and practices such as code review? We must keep our goal in mind.

Our goal is to deliver working software. We need to achieve this goal with limited resources: we have only so much time, manpower, and computing capacity available.

The goal is not to perfectly follow a procedure described by a book. The goal is not to craft the perfect masterpiece of code. The goal is not to make you feel smart, either.

Forgetting this leads to arguments that look at the technologies and the practices in isolation but forget about their context.

For example, at work I’ve thought about splitting up a Scala web service and re-implementing a part of it in Rust. The resulting microservice would be simple, but it would be under heavy load. It is one of the hottest nodes in our graph of services, so great performance is important.

Just looking at the technology, this seems like a no-brainer: surely Rust would offer us greater and more predictable performance than Scala on JVM. The size of the service would be so small that there would be minimal risk of screwing it up.

On the other hand, our team has zero Rust experience right now. Our JVM expertise does not transfer to operating Rust services. We would need to learn how to do it and keep the team staffed with people who are able to and want to work with Rust for years to come.

Is it worth it? It could be, it could be not.

On the other hand, I’m not here to argue that the ends justify any means. The goal may not be to make you feel smart, but software engineering is not supposed to hurt, either.

Consider crunching, the practice of working overtime to meet a deadline. It is common in the games industry where the games have big bang launch dates. Twitter is full of stories about how miserable this has been for everyone involved. They were trying to deliver working, delightful software, but was the price acceptable?

Winter-posting

Sat, 13 Feb 2021 00:00:00 +0000

Mustikkamaa is one of my favorite spots in Helsinki. It’s a small island that is connected to Helsinki by Isoisänsilta, the white bridge that is in many of my blog pictures. There’s a nice walking path along the shore that I like to follow.

This winter has presented us with a rare treat: the sea has frozen.

The ice is covered by snow, with a couple of buoys and sea marks sticking up here and there. Only at the shoreline and under the bridges there are a few brown spots of ice. The snow is marked by footprints and ski tracks – people have seized the opportunity to walk on the ice. I, too, did so just today.

Near the bridge to Korkeasaari Zoo, there was a family fishing. They had drilled two holes, with two of the kids sitting still around one of the holes. “Something is pulling”, they would shout. “It is pulling right now!” The other two kids were swinging their rods wildly over the other hole. I think they were more likely to find success as dancers than as fishers.

Every few minutes, there was a sound of swooshing. The cross-country skiers looking for exercise were passing by on the track ashore. On the walking path, there were runners and couples pushing prams. Everybody else – the flâneurs – were on the ice, either walking or skiing. They were taking photos of themselves and the ice.

Between Mustikkamaa and Korkeasaari, there are two big rings made from snow. Somebody has ploughed the snow inside the rings. Why? I don’t know. They’re a bit small for skating rinks. Maybe they look like something from the air. Maybe they are art? At least everybody passing by stopped and took a look at them.

The ice won’t be here too long. Better enjoy it while it lasts.

When to not use code review

Sat, 06 Feb 2021 00:00:00 +0000

Code review is not a panacea, unfortunately.

I talk a lot about code review and think that it’s a great tool, but I want to re-iterate a point I made previously: code review is not the right tool in every situation.

What I mean by code review is the widely-used, pull request style workflow.

When you want to integrate your code changes to the main branch, you post your changes to a code review tool, for example by creating a pull request in GitHub. Your teammates then review your change and either approve it or ask you for some improvements. Once the change has been approved, it is merged to the main branch.

Furthermore, I’m talking about professional software development teams shipping software-based products or services. Thus, my advice won’t directly apply to e.g. open-source development. The same workflow is popular in the open-source world, but the relationship between open-source collaborators is different from the relationship between the members of a professional team.

In this context, what are some situations where code review is not beneficial?

High-velocity collaboration over a small code base. For example, when you’re starting a new project, there’s a lot of scaffolding to set up and many new things to build. Typically the details do not yet matter much and changing things is easy. Code review is too slow and too detail-oriented - you’re better off with talking to each other and reading each other’s code after it has been merged to the main branch.

When code review does not do anything. If getting your changes reviewed takes days and the result is just “LGTM” (looks good to me), the review process is not bringing you any value. Slow reviews are demoralizing, as finishing any task takes seemingly forever, and they hurt the team’s ability to ship.

You could try fixing the broken process, but the simplest way to improve the situation is to stop reviewing altogether. There’s an emotional barrier to actually stopping, but if the reviews are not providing any value, it is a safe step to take.

Writing is a core skill for developers

Sat, 30 Jan 2021 00:00:00 +0000

We software developers write a lot of code, but we write text, too. We write issues, commit messages, specifications, references, docs, and architectural decision records. We write chat messages and e-mails. Some of us even blog!

That’s a lot of writing and it would be great if the writing was good. When working remotely, it matters even more, since you can’t solve everything by talking. This is a bit of a blind spot for some: a person can be a great communicator in person, yet they may not be comfortable with text.

It looks like the pandemic and remote work are going to be with us for a while, so it’s worth it to look at how your everyday writing is doing. It does not have to be eloquent, but it has to be effective. Does it get your point across quickly and clearly?

Clojure project automation tool of my dreams

Thu, 14 Jan 2021 00:00:00 +0000

For a long time, Leiningen was the Clojure project automation tool almost everyone used. Clojure itself did not come with a tool for managing dependencies – you had to use Leiningen, Maven, or Boot, or download the dependencies yourself and construct the classpath by hand.

This changed when Clojure 1.9 was released in 2017. It included the new clojure command-line tool that supported the deps.edn file for declaring dependencies. Many people started to use the new tool instead of Leiningen.

Leiningen is straightforward to use and it has a good set of features and a nice plugin ecosystem. However, it is a heavy tool if all you want to do is to run some Clojure code. deps.edn was created to solve this problem: it offers a lightweight way to run Clojure code that uses external libraries.

I think it’s great: in addition to being nimbler than Leiningen, the dependency information is now data and it supports git dependencies.
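For illustration, here is what such a file can look like. It is a data file rather than code to evaluate. The second library, its URL, and its all-zero commit hash are placeholders made up for this example, and older versions of the tool spell the :git/sha key as :sha.

```clojure
;; deps.edn: a plain EDN map that any Clojure program can read
{:deps {;; a regular Maven dependency
        org.clojure/data.xml {:mvn/version "0.2.0-alpha6"}
        ;; a hypothetical library fetched straight from a git repository
        com.example/some-lib {:git/url "https://example.com/some-lib.git"
                              :git/sha "0000000000000000000000000000000000000000"}}}
```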

It is not a replacement for Leiningen, however. It is not, and it was never meant to be, a full-fledged project automation tool. I’ve used it quite a bit and I find myself missing many of the nice features of Leiningen. In theory you could use Leiningen and deps.edn together with lein-tools-deps but in practice it has turned out to be awkward.

Here are a few things that I’m missing from Leiningen:

lein new for setting up a new project
lein repl for starting a featureful, nREPL-enabled REPL
lein uberjar for building jars with dependencies included - an essential part of many deployment workflows
lein install, lein deploy, and lein release for publishing the project to a Maven repo such as Clojars

The community has built versions of all of these for deps.edn. However, you have to set each of them up separately and they do not form a coherent whole.

What I would love to see is a new tool – Leiningen 3.0, if you will – that would resolve this tension. There must be a way to build a full-featured, user-friendly project automation tool on top of deps.edn.

What I’d like to see is something with a simple setup, with convention over configuration. Clojure is all about configuration and making things your own, but Leiningen’s strength comes from its good defaults. It is a very configurable tool but a lot of people have been happy with the default settings. That makes it easy to jump between projects, too.

Yearnote 2020

Wed, 06 Jan 2021 00:00:00 +0000

We didn’t have fireworks, but at least we had a jätkänkynttilä for New Year’s Eve.

2020 was definitely a year. As is my habit, I want to reflect a bit on how it was.

I was lucky and privileged in that my year wasn’t that bad. It was easy for me to start working remotely, the business was good, and me, my close friends and my family did not have big health problems, due to COVID-19 or otherwise. Many people had a way worse year than I did.

Still, it would have been nice to meet people in person. It would have been nice to do things.

Working life

I worked with two big Finnish companies. I switched from one to the other during the lockdown in April. It’s a bit odd that I’ve only met the people I work with once – we had lunch together in September – but we’ve got the work working nicely.

It was great to get to finally try out remote working for real. It suits me well: I enjoy the lack of commute and the peace of my home compared to open-plan offices. Remote work forces you to communicate more explicitly and to document things better, but that’s just a plus in my book. I just wish we had more space at home. I would love to have a study.

Will I return to the office in 2021? I don’t know. There’s so much uncertainty about the vaccination schedule that it’s hard to say when it would be possible. It could make sense to take steps to make remote-by-default a permanent arrangement.

Open source

I published clj-branca and clj-base62, but apart from that, I didn’t do much. I wasn’t really in the mood for working on open source in my free time, and at Metosin, there were more pressing needs than open source.

At Metosin, we released Malli (my contributions were minor). It was great to see such positive reception from the Clojure community!

Free time

I had a lot of time for hobbies this year. I read more books than ever. I wrote more blog posts than ever, many of them technical. Here are a couple of my favorite ones about Clojure:

I wrote more Christmas cards than ever. I started to write morning pages, too. That has turned out to be a beneficial practice. I had this feeling of “I wish I would think more” and the morning pages were exactly the structure that I needed.

In the summer, I kayaked more than ever. It became a theme for the summer: when I didn’t know what to do, I went to kayak. I cooked more than ever. Thanks to the transition to remote work, I started to eat lunch at home. It also meant that I have washed more dishes than ever.

I exercised a fair bit, although probably not more than ever, and even bought a kettlebell.

Regretfully, I did not hike more than ever. I had hoped to hike the northernmost section of the Kungsleden trail but had to cancel it when travel to Sweden was restricted. I did hike the Kaakkurinkierros loop around Repovesi National Park, though. We were there on some of the hottest days of the summer and it was great. The numerous small lakes offer a lot of opportunities for swimming.

In 2021, I hope to hike more than ever. I’ll go to Kungsleden if that is possible but if not, Finland has plenty of interesting trails.

What’s next?

2020 ended in an expectant mood. Who knows what will happen in 2021? I feel hopeful that we can make it a better year than the previous one.

Traditional commentary on Finnish politics

I’m not surprised that Sanna Marin’s cabinet has held together. I think they’ve done a pretty good job with the pandemic. Were there even any other political themes this year?

NixOS impressions

Sun, 20 Dec 2020 00:00:00 +0000

In February, I got a new laptop for home use and installed NixOS on it. This was the first time I have ever used NixOS. In this post, I’ll share some thoughts about how it has worked for me.

The laptop is a second-hand ThinkPad X250. Installing NixOS was pretty straightforward and I haven’t had any trouble with hardware compatibility.

Good. Configuring your system with a centralized configuration file is a pleasure. It offers a unified way to configure everything and a short config goes a long way. If you like Infrastructure as Code, you will like this. You can see my configuration.nix here.

Another great feature is that you can roll upgrades and configuration changes back. Just today I upgraded my packages (nixos-rebuild switch --upgrade), rebooted the laptop, and the system didn’t come up anymore. It got stuck waiting for udev, whatever that means. I rebooted the system again, selected the previous generation from the boot menu, and the system started again. I will have to face the broken udev eventually – hopefully a later upgrade will fix it – but at least the system works for now.

Bad. Let’s face it: Nix and NixOS are doing their own thing. Your skills with other Linux distros won’t directly transfer. You will be tempted to say “eh, I’ll just edit this file and run this command” and you will be frustrated when you can’t find those files and those commands do not work. Learning to operate the system and make your own packages takes a while.

Since I’m mostly using this laptop for writing and surfing the web, I have not bothered to learn much. I did manage to package and install git-cal with Nix, but have not looked into how to make that package available for others. Should I contribute it to the main nixpkgs repo? I don’t know.

Ugly. The tooling does not put much emphasis on the versions of packages. You’ll get a package, sure, but which version you get is secondary. If I were using NixOS for development, I’d like to pin the versions of the tools I’m using like I pin the versions of the libraries. Apparently the upcoming Nix Flakes feature offers a decent solution, but I was surprised that this wasn’t a solved problem.

Overall, Nix and NixOS remind me of Clojure: they’re powerful, they do things differently from what you’re used to, and learning them is… complicated. I’ll leave it for another time to debate whether it is worth it.

Code review in context

Sun, 13 Dec 2020 00:00:00 +0000

I’ve posted about code review a number of times, about how it requires trust and needs to be fast. But, at the end of the day, code review is just one tool in the toolbox of collaborative software development. It should be evaluated in the context in which it is used.

For example, when you’re starting a new project from scratch, usually there’s a lot of scaffolding to set up. This is an important time for collaboration as you’re laying the foundation for building new things. However, most of the code written at this stage will be boilerplate. Detail-focused PR-based code review will be a hindrance: there’s no point in line-by-line critique of boilerplate and it’s too slow. You’ll be better off pair coding and iterating quickly. Maybe simply regularly talking to each other is all you need!

Whether code review is the right tool for you and how it should be done depends on the team, the goals, the environment, and the stage of the project. This is why I believe that the team should choose its own tools and continuously evaluate them. Regular retrospectives are a great way to make this happen.

I’ve realized that I give a lot of advice that assumes a skilled team in a psychologically safe environment. It works to an extent for less skilled teams as long as the environment is safe and they want to learn.

But what if the environment is not safe? I’m pretty sure you’ll have to start by fostering safety, but I don’t know how that is done.

New shell prompt with Starship

Sun, 06 Dec 2020 00:00:00 +0000

I’ve used zsh for at least 15 years. Occasionally I’ve toyed with the idea of switching to some of the newer shells such as fish (“Finally, a command line shell for the 90s”) or nushell. I’m stuck in my ways, though, so despite all the cool features and ease of use promised by the other shells, I’ve kept using zsh.

I decided to at least freshen up my shell prompt. In 2012, I added git branch information to my prompt using zsh’s vcs_info. However, I never got around to configuring nice prompts for different git states like rebase.

It was time to fix this. Instead of diving into vcs_info, I decided to use Starship, a cross-shell tool that produces shell prompts for you.

Its default configuration is spammy – for some reason, it insists on displaying the version of the programming language implementation you’re using when you’re in a project directory. For example, if you’re in a directory with PHP files, it shows the version of PHP you have installed. This is information that I almost never need.

However, Starship is easy to customize and you can disable all this stuff. My configuration now has a list of blocks like this:

[php]
disabled = true

What is nice is that Starship’s git support is good and shows useful information like whether you’re ahead of or behind the upstream branch. I constantly use git status (well, my custom alias git st) to check this, so it makes sense to put it in the prompt. The downside is that this makes your prompt slow when entering big repositories.

Another nice touch is that it only shows your hostname when you connect over SSH. This is something I had been meaning to make my prompt do for ages but never got around to it.

My prompt used to look like this:

Now it looks like this:

Yes, I did configure it to be exactly the same. If I want to add some new stuff, now it actually feels doable. And as a pleasant surprise, the new prompt feels as fast as the old one (outside of big git repos)!

If your prompt needs tweaking, I recommend taking a look at Starship – just take your time with the configuration.

Early impressions on morning pages

Sun, 29 Nov 2020 00:00:00 +0000

A couple of months ago, I was feeling that I should think more about things. It was not that I was too busy to think – the problem was that my thinking was scatter-brained.

Since then, I’ve discovered a practice that helps. It’s called morning pages and the idea is this: every morning, you sit down, grab a pen, and write three pages. Just write about whatever pops into your mind.

Basically it’s a journaling practice. The idea comes from the book The Artist’s Way by Julia Cameron. I have not read the book, but, as far as I can tell, you do not need to read the book to use the method.

Writing morning pages does two things for me:

It offers me a moment every day to sit down and be in touch with the stuff that is on my mind.
Writing it down helps me to focus and actually pursue lines of thought instead of just jumping around.

Topics I’ve written about range from the taste of my morning coffee to my thoughts on company strategy. There’s no wrong topic to write about, it’s just what happens to be on my mind on any given morning.

I’ve tried it a few times before, but this time I’ve stuck to it. Since I started in early November, I’ve written 16 times – not every day, but almost. It’s working so well that my intention is to continue until I’ve done it 100 times.

In practice, I write three pages with a pen in an A5 size notebook. I write in my native Finnish. It takes me about half an hour to write the 350 or so words. When I wake up, first I have breakfast and then I write; but if I can’t write in the morning, I try to do it later in the day.

It’s worth it to experiment a bit with the specifics. For example, I tried to write on the computer, but I found that I prefer the limit of three physical pages to a word limit on the screen. There’s something satisfying about filling a notebook, too.

It’s better to have a length limit instead of a time limit to ensure that you actually write something. If nothing else pops into your mind, you can write about how nothing else pops into your mind.

You don’t need a fancy pen or a fancy notebook to write your morning pages, but if you are into such things, morning pages are a great opportunity to use them. If you don’t want to keep a written record of your thoughts around, you can use loose sheets and put them into a shredder once you’re done. That’s okay, too.

I’ll report back in March when I’ve completed the 100-morning project. Meanwhile, you should give it a go.

Code review is for collaboration

Sun, 22 Nov 2020 00:00:00 +0000

A lot of people see code review as a gatekeeping step in the software development process. They think that the reviewer is there to prevent bugs from getting into production. This is not a good way to think about it.

Code review is an opportunity for collaboration. The task of the reviewer is to work together with the author to produce great software. Identifying flaws is a part of that, as is finding ways to address them.

If the goal was to prevent shipping bugs, you could just block every change. It guarantees that no bugs are shipped – but no bugfixes or features are shipped, either.

Why bother with Integrant?

Sun, 15 Nov 2020 00:00:00 +0000

The Clojure backends that I’ve worked on have often had a number of subcomponents: an HTTP server such as Jetty, a database connection pool, a scheduler, a message queue processor and so on. When you start the backend, you need to start all those components. You may have seen or written code like this to do so:


(defn start []
  (let [connection-pool (connection-pool/create ...)
        stop-scheduler (scheduler/create ...)
        ring-handler (create-handler connection-pool)
        stop-server (jetty/start ring-handler)]
    (fn []
      (stop-server)
      (stop-scheduler)
      (.stop connection-pool))))

This function starts a bunch of service components and returns a function that stops those components when called. As you probably know, the same problem can be solved with Integrant. This works, though, so why bother?

Correct start-up and shut-down order. In the example, to create the Ring handler, you need to first create a connection pool and then pass it to the handler. They should be closed in the opposite order. Integrant keeps track of this for you and automatically starts and stops everything in the correct order.

Starting a partial system. In the example above, if you only want to start the HTTP server but not the scheduler - maybe for development or tests - you need to create another function that takes care of that. In Integrant, all you need to do is to call init with a sequence of keywords specifying the components that you want to start.

Good for REPL-driven and test-driven workflows. If you run a REPL and your tests, or if you run your tests in parallel, in the same JVM, you’ll probably want to start separate instances of your service for each of those. On the other hand, for REPL-driven development, you’ll probably want to have a single global instance so that it is easy to poke. You can set it up by hand… or you can use Integrant and integrant-repl.
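Here’s a minimal sketch of how the first two points can look in practice. It assumes Integrant is on the classpath; the :app/... keys and the stub components are hypothetical, chosen to mirror the hand-written example above. The stubs only log when they start and stop.

```clojure
(require '[integrant.core :as ig])

;; Log of start and stop events, in the order they happen.
(def events (atom []))

;; Hypothetical system: ig/ref declares a dependency on another key.
(def config
  {:app/db-pool   {}
   :app/scheduler {}
   :app/handler   {:db-pool (ig/ref :app/db-pool)}
   :app/server    {:handler (ig/ref :app/handler)}})

;; Stub components that only log. A real init-key would create the
;; connection pool, start Jetty, and so on.
(doseq [k (keys config)]
  (defmethod ig/init-key k [key _opts]
    (swap! events conj [:start key])
    key)
  (defmethod ig/halt-key! k [key _component]
    (swap! events conj [:stop key])))

;; Full system: :app/db-pool starts before :app/handler, which starts
;; before :app/server. halt! stops them in the reverse order.
(def system (ig/init config))
(ig/halt! system)

;; Partial system: only :app/server and the keys it depends on.
(def partial-system (ig/init config [:app/server]))
(keys partial-system) ;; does not include :app/scheduler
(ig/halt! partial-system)
```

After running this, events shows the pool starting before the handler and the server, and stopping after them, without any ordering logic in the code itself.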

You do not need to use Integrant (or its alternatives like mount). It’s easy enough to write something passable by hand. But using it will cut down boilerplate, enable nice workflows, and probably squash some subtle bugs as well.

Two albums of sad music

Sun, 08 Nov 2020 00:00:00 +0000

Looking for something to listen to? Here are two great, slightly sad albums.

This year has not been like the other years. Maybe it’s the gravity of COVID-19, but I’ve found myself listening to more sorrowful music than before. That’s the mood I associate with these two albums, one of them by a duo of folk musicians and the other by a jazz pianist.

Mielo by Maria Kalaniemi and Eero Grundström is full of yearning for the wilderness. Kalaniemi plays an accordion and Grundström plays a harmonium and a modular synthesizer. The album’s weird energy comes from the combination of the traditional instruments and the beat of the synthesizer.¹

Listen on: Spotify.

My Finnish Calendar by Iiro Rantala consists of 12 tracks, one for each month. January starts with melancholy but the year has many moods and already February is full of joy and energy. My favorite track, September, lies somewhere in between. Rantala combines piano with some light percussion. If you’re familiar with his music, the album definitely sounds like him although it’s easier-going than some of his work.

Listen on: Bandcamp or Spotify

This is not the only time Eero Grundström has been experimenting with folk and synthesizers. For a very different take, check out Suistamon sähkö. ↩︎

What are DIDs?

Sun, 01 Nov 2020 00:00:00 +0000

Have you heard of Decentralized Identifiers (DIDs)? They’re a work-in-progress W3C recommendation that I’ve seen pop up in a couple of places - most recently in the just-updated Thoughtworks Technology Radar. Since I’ve been interested in technologies related to identity and access management, I thought I should take a look.

Why do you need identifiers?

When you want to refer to a subject, for example a person, in a data processing system, you need an identifier for them. If you’re operating in a single system, you can get away with a surrogate key: an arbitrary identifier assigned by the system. For example, in a typical web application each user gets assigned a user ID. Usually it’s either randomly generated or based on an incrementing counter.

If you need to exchange data between multiple systems, the systems need to be able to refer to the same subject with the same identifier. One solution is to have a central authority that issues identifiers. For books, this could be ISBN. For humans, it can mean using a national identification number such as henkilötunnus in Finland¹ or using their account elsewhere as the identifier via services like Facebook Login.

What if you do not want to be tied to a central authority, or no suitable authority exists? Self-sovereign or decentralized identifiers could be the solution. The idea is to create unique identifiers such that the controller of the identifier can cryptographically prove that they control it. DID is an attempt to create a standard framework for such identifiers.

What are DIDs like?

Note: I’m basing this post on the version “W3C Working Draft 27 October 2020” of the DID specification.

DID identifiers (“DIDs”) are URIs and they look like this:

did:example:123456789abcdefghi

There are three parts separated by colons. The first part, did, identifies the URI scheme. The second part, example, identifies the DID method, and the last part, 123456789abcdefghi, is the method-specific identifier.
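As a small illustration, here’s a sketch that splits a DID into those three parts. The function name parse-did is my own and this is not a validating parser: it only checks the scheme and that all three parts are present.

```clojure
(require '[clojure.string :as str])

(defn parse-did
  "Splits a DID into scheme, method, and method-specific identifier.
  The limit of 3 keeps any further colons inside the identifier.
  Returns nil if the string does not have the did:method:id shape."
  [did]
  (let [[scheme method id] (str/split did #":" 3)]
    (when (and (= scheme "did") method id)
      {:scheme scheme :method method :id id})))

(parse-did "did:example:123456789abcdefghi")
;; => {:scheme "did", :method "example", :id "123456789abcdefghi"}
```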

Each DID is associated with metadata (“DID document”) and the DID method tells you how to find (“resolve”) that metadata. The specification itself does not specify any methods but there already are plenty of them in the DID Method registry. Most of them seem to be blockchain-based, but there’s e.g. the github-did that looks up a specially-named file in a specially-named repo for the given user. My GitHub account is miikka, so I could control the DID did:github:miikka if I created a suitable file.

The specification defines the data model for a DID document and includes three serialization formats (“representations”): JSON, JSON-LD, and CBOR.

You can, of course, define your own representations if you want. If the Clojure community found a use for DIDs, I imagine somebody would quickly define an EDN representation.

Here’s a JSON-LD example of a DID document for did:example:123456789abcdefghi:

{
  "@context": "https://www.w3.org/ns/did/v1",
  "id": "did:example:123456789abcdefghi",
  "authentication": [{
    "id": "did:example:123456789abcdefghi#keys-1",
    "type": "Ed25519VerificationKey2018",
    "controller": "did:example:123456789abcdefghi",
    "publicKeyBase58": "H3C2AVvLMv6gmMNam3uVAjZpfkcJCwDwnZn6z3wXmqPV"
  }]
}

Under authentication, there’s an Ed25519 public key that can be used to verify that somebody is acting on behalf of the subject of this DID. Again, there are plenty of verification methods if Ed25519 does not float your boat.

What’s the point?

There’s a related standard called Verifiable Credentials (VCs). They’re a way to issue cryptographically verifiable claims about a subject and, well, you need to be able to identify the subject. I believe this is the origin of DIDs.

There’s a list of use cases for VCs. They include stuff like universities issuing VCs to certify that a person has completed a degree.

At this point, I’m not sure what to think. Does this solve a real problem in such a way that people are willing to use it?

In any case, this is a framework with so many options that if you intend to build something actually interoperable on it, you should start by defining a profile of what representations, verification methods, resolution methods etc. have to be implemented and should be used for your use case.

Are you a citizen of Finland? Consider signing this initiative to outlaw using HETU to authenticate people. ↩︎

Caching HTTP requests in Clojure

Sun, 25 Oct 2020 00:00:00 +0000

Sometimes you want to cache the results of a function with side-effects. For example, you might cache the results of HTTP requests or database queries.

If you’re using Clojure, you might reach for core.cache. The caches created by core.cache are supposed to be wrapped in an atom, so you would write something like this:

(require '[clojure.core.cache :as cache])

(defn http-get [url] ...)

;; cache the results for a minute
(def my-cache (atom (cache/ttl-cache-factory {} :ttl 60000)))

(defn fetch-data [url]
  (-> (swap! my-cache cache/through-cache url http-get)
      (cache/lookup url)))

If you try it, it seems to work. There’s a bug, though, but you will likely notice it only under high load. Can you spot it?

The problem is that swap! may re-run the function that is passed to it. This may cause a variation of a cache stampede.

(swap! atom f) works something like this:

Read the value of atom.
Apply f to the value.
Compare-and-set the new value to atom: if the value of atom is the same as it was in step 1, update it to the new value. If it has changed, go to step 1.

In our case, if the cache gets updated while we’re doing an HTTP request in step 2, the request has to be re-done to update the cache – even if the other update was for another cache key! If you’re processing a lot of requests in parallel, it may take multiple retries to successfully update the cache.
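To see the retry happen, here’s a simplified model of swap! built on compare-and-set!. It’s a sketch of the idea, not the actual implementation in clojure.core, and the “other thread” is simulated by having the function modify the atom on its first run.

```clojure
(defn my-swap!
  "Simplified model of swap!: retry until compare-and-set! succeeds."
  [a f]
  (loop []
    (let [old-value @a
          new-value (f old-value)]
      (if (compare-and-set! a old-value new-value)
        new-value
        (recur)))))

(def state (atom {}))
(def calls (atom 0))

(defn update-with-interference
  "Counts its invocations. On the first one it also modifies state, standing
  in for another thread that updates the atom while f is still running."
  [m]
  (when (= 1 (swap! calls inc))
    (swap! state assoc :other-key 1))
  (assoc m :my-key 2))

(my-swap! state update-with-interference)

@calls ;; => 2, f ran twice although the other update touched a different key
@state ;; => {:other-key 1, :my-key 2}
```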

We experienced this at work recently. A microservice was calling another microservice exactly once per incoming request. When we enabled caching for the requests, implemented like above, the typical rate of requests went down but we started to see 10x spikes in requests. This is not what you want to see for one of your busiest services.

Luckily there’s a simple solution: wrap the side-effecting call with delay.

(defn fetch-data [url]
  (let [new-value (delay (http-get url))]
    (-> (swap! my-cache cache/through-cache url (fn [_] @new-value))
        (cache/lookup url))))

The cache update may still take multiple attempts, but the delayed value is computed at most once.
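The difference is easy to check without a real cache or a real HTTP client. In this sketch, fake-http-get is a stand-in that only counts its calls, a plain map in an atom plays the role of the cache, and a retry is forced by changing the atom during the first attempt.

```clojure
(def state (atom {}))
(def requests (atom 0))

(defn fake-http-get
  "Stand-in for a real HTTP call; it only counts how often it runs."
  [url]
  (swap! requests inc)
  (str "response for " url))

(defn fetch-with-delay
  "Stores the response for url in state and returns the number of attempts
  that swap! needed."
  [url]
  (let [new-value (delay (fake-http-get url))
        attempts  (atom 0)]
    (swap! state
           (fn [m]
             ;; Force one retry by changing state during the first attempt.
             (when (= 1 (swap! attempts inc))
               (swap! state assoc :other-url "other response"))
             (assoc m url @new-value)))
    @attempts))

(fetch-with-delay "https://example.com/") ;; => 2 attempts
@requests                                  ;; => 1 request
```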

This looks a bit messy, so let’s use the new clojure.core.cache.wrapped namespace that was introduced in core.cache 0.8.0 (August 2019). It takes care of wrapping the cache in an atom and implements the delaying logic and more:

(require '[clojure.core.cache.wrapped :as cache])

(def my-cache (cache/ttl-cache-factory {} :ttl 60000))

(defn fetch-data [url]
  (cache/lookup-or-miss my-cache url http-get))

This is nice, but there’s still room for improvement.

If multiple threads request the same URL at roughly the same time, they all will do the HTTP request. It would be more efficient if only one of the threads would do the request and the others would wait for it to finish. You could implement this yourself by doing some locking… but you could also use core.memoize, which does it for you.

(require '[clojure.core.memoize :as memo])
(def fetch-data (memo/ttl http-get :ttl/threshold 60000))

I guess the moral of the story is that if you use high-quality higher-level libraries, the authors will have already solved the thorny lower-level problems for you.

Generating random tokens in Clojure

Sun, 18 Oct 2020 00:00:00 +0000

Web applications have a couple of common use cases for random tokens. For example, e-mail confirmation or password reset e-mails usually have a link that contains a random token. The same goes for “share this item” links in case the item does not have a canonical URL.

The token should be:

Unpredictable. It would be bad if an attacker was able to guess the token for a password reset e-mail.
URL-safe. You’re going to embed it in a URL.

How to generate such tokens in Clojure?

Random data

An easy and secure way to generate random data on JVM is to use java.security.SecureRandom. On UNIX-y operating systems SecureRandom uses /dev/urandom by default, which is great at least on Linux.

How much random data do you need? In other words, how long should the token be?

Long enough to not run out of tokens. I’m stating the obvious, but if you’re going to generate 100k 16-bit tokens, you will generate duplicates as 2^16 = 65536 < 100000.
Long enough to avoid collisions. Due to the birthday paradox, you have roughly a 40% chance of generating a duplicate n-bit token after generating 2^(n/2) tokens. If you use 16-bit tokens, this means you have a significant chance of collisions already after generating 2^8 = 256 tokens.
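The birthday estimate is easy to check numerically. This sketch computes the probability of at least one duplicate among k uniformly random n-bit tokens; the function name is my own. At exactly 2^(n/2) tokens the figure comes out at about 40%, which does not change the conclusion.

```clojure
(defn collision-probability
  "Probability that k uniformly random n-bit tokens contain at least one
  duplicate: 1 minus the product of (1 - i/2^n) for i from 0 to k-1."
  [n-bits k]
  (let [space (Math/pow 2 n-bits)]
    (- 1.0 (reduce * (map #(- 1.0 (/ % space)) (range k))))))

(collision-probability 16 256)  ;; roughly 0.39
(collision-probability 16 1000) ;; above 0.99
```

For 256-bit tokens and any realistic k, the result is 0.0 in double precision.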

You will probably store the tokens in a database with a uniqueness constraint. Having a high chance of collisions will degrade the performance of your systems because you have to regularly retry the generation. Worse, an attacker can generate random tokens themselves and try them, and they will have a high chance of finding one that works.

In 2009, Colin Percival recommended using 256-bit random IDs in his Cryptographic Right Answers. He wrote:

I doubt any application thus far has come close to selecting 2^64 random values; but if computers continue to scale exponentially, this could occur in the upcoming decade. In most applications, using 256-bit random values instead of 128-bit random values carries no significant increase in cost; but it puts randomly finding a collision safely into the realm of “not going to happen with all the computers on Earth in the lifetime of the solar system” problems.

This seems like good advice¹ and Latacora concurred in 2018 in their version of Cryptographic Right Answers. Go for 256 bits (32 bytes).

URL-safety

By URL-safe, I mean that you should be able to embed the token into a URL and it should come out intact after all the encoding and decoding and parsing involved in handling URLs. My favorite answer for making data URL-safe is using the URL-safe variant of Base64 encoding without padding. It encodes arbitrary byte data using lower-case and upper-case letters, numbers, - (minus), and _ (underscore). Conveniently Java comes with java.util.Base64.

Let’s put it all together

(import 'java.security.SecureRandom 'java.util.Base64)

(let [random (SecureRandom.)
      base64 (.withoutPadding (Base64/getUrlEncoder))]
  (defn generate-token []
    (let [buffer (byte-array 32)]
      (.nextBytes random buffer)
      (.encodeToString base64 buffer))))

Calling (generate-token) returns tokens like this:

"EE_jyfwk78cQgCcXkO8CAslDhZOL9T-8v9tHXLadenk"
"ANB9bv2D_jhYZJVoYk0NQvXNSWrrWisKEGEUdeuosIo"
"72mAcjEXWUALSxdmXc0A4jwd51s8t6r-JMmWkFdW868"
"Z3ek1rKEJLexyqx9rwZAmIXEBHphRBFLIK5I1zBhC3s"
"pknMoF8qZFNsq8nu-8Zfv5WOlaejEkvTM2xxSV6tSis"

For something like a URL shortener, you may want something shorter and without ambiguous character combinations like iI1l or oO0Q. Otherwise this should be a good starting point for your random token needs.

Update: If you want to make your tokens even more secure, take a moment to learn about split tokens.

Counterpoint: Many programming languages and databases have built-in support for random UUIDs (128 bits, out of which 122 are random), but they do not have an equally convenient way of handling 256-bit IDs. ↩︎

clj-branca: lessons learned

Sun, 11 Oct 2020 00:00:00 +0000

Last week, I wrote about creating a Clojure library for encoding and decoding Branca tokens. The library is finally ready. It’s called clj-branca and version 0.1.1 is now out.

Quick pitch

Need to pass information from a service to another service, possibly going through a user’s browser? URL-safe authenticated encrypted tokens could be the solution you’re looking for. Check clj-branca out!

(Please do not use it for stateless sessions, it’s a bad idea. Also, this is a side project, it has not been audited, and I’m not a security engineer. These same caveats apply to a lot of security-related open-source libraries but it does not mean that you should ignore them.)

Lessons learned

When I started the project, I thought it would be a quick way to kick the tires of Branca and libsodium. Like I wrote last time, it wasn’t quick, but on the other hand, I learned more than I expected.

base62 encoding. Branca tokens are Base62-encoded. This means taking the raw token data as an array of bytes and encoding it into a printable, URL-safe ASCII string with a 62-character set. It’s the same idea as Base64, but without the - and _ characters used by the URL-safe variant of Base64.

Encoding and decoding Base64 is very efficient. Since log2(64) = 6, each Base64 character represents exactly six bits. It’s straightforward to create encoders and decoders that work in linear O(n) time. However, log2(62) ≈ 5.95. Arbitrary radix conversions cannot be done in linear time – based on some Internet searches, I think their complexity is O(n log(n)). This means Base62 encoding and decoding cannot work in linear time.
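To show why Base62 is a radix conversion rather than a bit-slicing exercise, here’s a naive encoder sketch. It treats the whole byte array as one big number and divides it by 62 repeatedly. The alphabet order is an assumption (digits, upper case, lower case), leading zero bytes are not preserved, and this is not the code from clj-base62.

```clojure
(def alphabet
  "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")

(defn base62-encode
  "Naive Base62 encoder: interprets data as an unsigned big-endian integer
  and collects the remainders of repeated division by 62."
  [^bytes data]
  (loop [n (BigInteger. 1 data)
         digits '()]
    (if (zero? (.signum n))
      (if (empty? digits) "0" (apply str digits))
      (let [[q r] (.divideAndRemainder n (BigInteger/valueOf 62))]
        ;; conj on a list prepends, so the most significant digit ends up first
        (recur q (conj digits (nth alphabet (.intValue r))))))))

(base62-encode (byte-array [61])) ;; => "z"
(base62-encode (byte-array [62])) ;; => "10"
```

Every division has to walk through the whole remaining number, which is why this simple approach is worse than linear in the length of the input.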

Base62 is probably fast enough for all Branca use, but aesthetically this bothers me. Base64 is practically as URL-safe (as long as you use the URL-safe variant and do not use padding) and there are performant implementations available for almost any platform imaginable.

sodium on JVM. Lazysodium is probably the easiest way to use libsodium on JVM, because it bundles the libsodium binaries in the jar. However, its “Lazy” interface does weird hex string conversions and I couldn’t figure out how to use it interoperably with libsodium on Node.js. I recommend using the “Native” interface, which looks very C-like, but which does what you expect from the type signature and libsodium documentation.

An even better option would be to use Google’s Tink, which contains pure-Java implementations of many of the same algorithms. It encourages you to buy into its key management scheme, which is probably a good idea.

deps.edn and releases. Clojure CLI does not come with any tools for building jars or deploying them to Clojars. Using this guide, I managed to cobble together Maven, pack, and a bunch of shell scripts. If you know what you’re doing, it is possible to create a release with correct SCM (git revision) information with this setup. The jar is not signed, but I’ve given up on that.

Anyway, I would not recommend this brittle setup. If you’re developing new libraries, save yourself time and energy by using Leiningen and lein release.

sourcehut. clj-branca is hosted on sourcehut, which is this new software development platform akin to GitHub. It’s structured as a bunch of separate services: there’s the project hub, git repo, issue tracker, and, uh, mailing list. If you want to contribute a patch, you’re welcome to send it to the mailing list. You can do this via sourcehut’s user interface if you register a user account.

To be honest, I’m not a big believer in mailing lists. This is going to be a barrier against contributions, but I was not expecting many contributions in the first place.

Sourcehut is pretty basic, but the features that are there seem to work well and quickly. I think I will use it for my private projects, but the biggest benefit of GitHub for open-source projects is that everybody else is already a user and that is going to be hard to beat.

Branca and yak shaving

Sun, 04 Oct 2020 00:00:00 +0000

A while ago I wrote about the alternatives to JWT tokens. One of them was Branca. It’s a base62-encoded, XChaCha20-Poly1305 encrypted token with an arbitrary payload. Before writing the post, I thought it would be nice to have a Branca library for Clojure and that’s something I’ve occasionally worked on since then.

First I wrote base62 encoder and decoder and published them as a tiny library called clj-base62. That was easy enough. Then I needed to implement the encryption part.

First I looked at using caesium, a Clojure binding for libsodium. It looked great except for one thing: it does not come with native libsodium binaries (the .so/.dll/.dylib files). You’ll have to install them yourself and this makes using it a hurdle.

Then I looked at using Google’s Tink, which has pure-Java implementations of the required algorithms. It is also opinionated about key management. The opinions seem very smart, but they also make it hard to just pass in a string as an encryption key. Passing in a string does not seem like a great idea, but I’d like my library to be interoperable with the other Branca libraries that do exactly that.

I also couldn’t build tinkey, Tink’s key management tool, on my personal laptop. It requires a specific version of Bazel that was not available via NixOS’s package collection.

Finally I tried using Lazysodium. Like caesium, it’s a Java binding for libsodium, but unlike caesium, it comes with the binaries. I quickly implemented token encryption and decryption - and realized that my code refuses to decode tokens produced by branca-js and vice versa.

I was using Lazysodium’s Lazy interface. It operates on strings, which seemed weird to me, until I realized that it’s not just byte arrays disguised as strings – it’s hex strings (strings of ASCII hexadecimal digits). So now I still need to either add some byte-arrays-to-hex-strings conversions or use Lazysodium’s Native interface, which is very C-like but which operates on byte arrays. Then I’m hopefully done.
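For reference, the conversions themselves are short. Here is a minimal sketch in plain Clojure that does not touch Lazysodium at all; the helper names bytes->hex and hex->bytes are made up for this example.

```clojure
;; Hypothetical helper names; nothing here is part of Lazysodium's API.

(defn bytes->hex
  "Encodes a byte array as a lowercase hex string."
  [^bytes bs]
  ;; JVM bytes are signed, so mask each one to 0-255 before formatting.
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

(defn hex->bytes
  "Decodes a hex string with an even number of digits into a byte array."
  [^String s]
  (byte-array
   (map (fn [[hi lo]]
          ;; Parse two hex digits, then wrap 128-255 back into a signed byte.
          (unchecked-byte (Integer/parseInt (str hi lo) 16)))
        (partition 2 s))))

(bytes->hex (byte-array [0 15 -1]))  ; "000fff"
(vec (hex->bytes "000fff"))          ; [0 15 -1]
```

The sketch does no input validation: an odd number of digits silently drops the last one and a non-hex character throws NumberFormatException.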

I thought implementing Branca tokens in Clojure would be a quick two-evening job. Alas.

Solving the diamond problem with shading

Sun, 27 Sep 2020 00:00:00 +0000

Dealing with difficult library upgrades has been a recurring task in my career as a software developer. This week I got a new tool into my toolbox for handling dependency conflicts.

I was working on a Scala project. It heavily uses a library that is no longer maintained. We would like to migrate to a newer library.

It would be nice to migrate piece by piece: that would make both development and testing much easier. However, the two libraries depend on different versions of the same library. Our dependency graph looks something like this:

In our case, there’s no need to pass data structures between old-lib and new-lib, so this could work if v1 and v2 were somewhat compatible. Unfortunately that was not the case: the changes were so severe that our app would not even start if the newer version of base-lib was added to the classpath.

This is a variation of the diamond dependency problem. It’s an annoying problem in general, but in this case we were able to solve it easily by shading base-lib in old-lib. This means that we created our own version of old-lib that bundles base-lib with all the base-lib classes renamed and all the usage sites in old-lib changed to use the new names.

For example, net/quanttype/base_lib/ExampleClass.class would be renamed to shaded/net/quanttype/base_lib/ExampleClass.class in the resulting .jar file. Our dependency graph now looks like this:

Note that shaded-old-lib does not explicitly depend on old-lib or base-lib, since it already includes shaded versions of them.

Shading sounds difficult, but it was easy in practice using sbt-assembly’s built-in shading support. To do this, we need to create two sbt projects. The first one is the one that does shading:

lazy val shadedOldLibRoot = project
  .settings(
    libraryDependencies ++= Seq(
      // We depend on old-lib, which pulls in the correct version of base-lib
      "net.quanttype" %% "old-lib" % "1.0.0"
    ),
    assemblyShadeRules in assembly := Seq(
      // You can add more rules here if needed
      ShadeRule.rename("net.quanttype.base_lib.**" -> "shaded.net.quanttype.base_lib.@1").inAll,
    ),
    assemblyOption in assembly := (assemblyOption in assembly).value.copy(includeScala = false),
    skip in publish := true
  )

The second project publishes the first one without the dependency information:

lazy val shadedOldLib = project
  .settings(
    name := "shaded-old-lib",
    version in ThisBuild := "1.0.0",
    organization in ThisBuild := "net.quanttype",
    packageBin in Compile := (assembly in (shadedOldLibRoot, Compile)).value
  )

The resulting package can be installed to the local Ivy repository with sbt shadedOldLib/publishLocal. You can do a full-fledged publish if you want – for us the local version was enough.

Add the new package to the dependencies of your main project:

libraryDependencies += "net.quanttype" %% "shaded-old-lib" % "1.0.0"

Finally, if you have used base-lib directly in your app, you’ll have to rewrite any imports in your codebase. Like this:

find . -name "*.scala*" | \
  xargs sed -i.bak 's/net\.quanttype\.base_lib/shaded.&/g'

Now you can add new-lib as a dependency and everything should just work.

When to use this?

There are some drawbacks to this approach.

The biggest one is that if old-lib or base-lib have any other dependencies, they get included in shaded-old-lib. If your app also depends on them, there may be new conflicts and they’re more confusing than before, since the conflict is not directly visible in the dependency graph. You can manually exclude those dependencies - see sbt-assembly’s instructions.

For us, shading solved the diamond problem neatly. Since we’re trying to migrate away from the old library, the drawbacks were acceptable.

Local memoized recursive functions

Sun, 20 Sep 2020 00:00:00 +0000

You probably know how to create a top-level memoized recursive function in Clojure, but how do you create a local one? By local, I mean defined with let or letfn.

For lack of a better example, consider a Clojure function that returns the n-th Fibonacci number. Here’s a top-level definition created with defn:

(defn fibo [n]
  (if (< n 2)
    n
    (+ (fibo (dec n)) (fibo (- n 2)))))

It’s a classic example of a recursive function. However, the implementation above is not exactly efficient. If you run (fibo 50), it will take forever to finish.

There are two standard ways to make it fast: tail recursion and memoization. You can use clojure.core/memoize to create a memoized version of fibo:

(def fibo2
  (memoize
    (fn [n]
      (if (< n 2)
        n
        (+ (fibo2 (dec n)) (fibo2 (- n 2)))))))
        
(fibo2 20)  ; 6765
(fibo2 50)  ; 12586269025

This is fast enough for calculating the 50th Fibonacci number.

fibo2 is defined on the top level, but how do you create a locally-bound version of it? It seems like it shouldn’t be too hard. Here’s the non-memoized version created with let:

(let [fibo (fn fibo [n]
             (if (< n 2)
               n
               (+ (fibo (dec n)) (fibo (- n 2)))))]
  (fibo 20))

Note that we had to name the function twice: first in the let form and second time inside fn. This is because the binding created by let is not visible inside fn. With letfn, only one name is enough:

(letfn [(fibo [n]
          (if (< n 2)
            n
            (+ (fibo (dec n)) (fibo (- n 2)))))]
  (fibo 20))

Exercise: If you want, try to define a memoized version of fibo inside let yourself. Try calling it with large n to check that it works correctly.

Here are a couple of nice pictures so that you don’t accidentally scroll to the answer before you’re ready. If you don’t feel like attempting the exercise, that’s okay too.

We can’t use memoize with letfn, so let’s continue with let. Here’s a naïve attempt:

(let [fibo (memoize (fn fibo [n]
                      (if (< n 2)
                        n
                        (+ (fibo (dec n)) (fibo (- n 2))))))]
  (fibo 20))

If you try this with (fibo 50), you’ll notice that it does not work as intended. This is because inside the fn, fibo refers to the fn itself and not to the memoize-wrapped version. If you called (fibo 50) twice in a row, the second time would be fast since the final result of the calculation does get memoized.

There are a bunch of solutions on StackOverflow. For example, the answer by Michał Marczyk can be adapted:

(let [fibo
      (with-local-vars
        [fibo (memoize (fn [n]
                         (if (< n 2)
                           n
                           (+ (fibo (dec n)) (fibo (- n 2))))))]
        (.bindRoot fibo @fibo)
        @fibo)]
  (fibo 50))

However, this code probably makes you go hmmm. Calling Var methods via Java interop does not feel very Clojure-like to me. I want to offer an alternative, more functional solution.

The function can’t refer to the memoized version of itself from the inside, but we can pass it in from the outside as a parameter. Let’s first try it without memoization:

(let [fibo (fn [rec n]
             (if (< n 2)
               n
               (+ (rec rec (dec n)) (rec rec (- n 2)))))]
  (fibo fibo 40))

The first parameter of fibo is the function itself. It looks a bit odd, but it works! We can, once again, clean it up by using a fixed-point combinator. It allows us to wrap fibo so that it receives the wrapped version of itself as the first argument.

(defn fix [f] (fn g [& args] (apply f g args)))

(let [fibo (fn [fibo n]
             (if (< n 2)
               n
               (+ (fibo (dec n)) (fibo (- n 2)))))
      fibo2 (fix fibo)]
  (fibo2 20))

Now the function definition looks pretty normal. fibo takes “itself” as the first parameter, but it’s not very different from (fn fibo [n] ...). Let’s try to memoize it:

(let [fibo (fn [fibo n]
             (if (< n 2)
               n
               (+ (fibo (dec n)) (fibo (- n 2)))))
      fibo2 (fix (memoize fibo))]
  (fibo2 50))

It works!
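If you want to see that the recursive calls really go through the cache, counting how often the function body runs is a quick check. This is just a sketch: count-calls is a made-up helper, and fix is repeated so that the snippet runs on its own.

```clojure
(defn fix [f] (fn g [& args] (apply f g args)))

;; Hypothetical helper: runs fibo for n and returns how many times the
;; function body was executed, with or without memoization.
(defn count-calls [n memoized?]
  (let [calls (atom 0)
        fibo  (fn [fibo n]
                (swap! calls inc)
                (if (< n 2)
                  n
                  (+ (fibo (dec n)) (fibo (- n 2)))))
        fibo2 (fix (if memoized? (memoize fibo) fibo))]
    (fibo2 n)
    @calls))

(count-calls 20 false)  ; 21891
(count-calls 20 true)   ; 21
```

With memoization the body runs exactly once for each n from 0 to 20, which is what you would expect if every recursive call hits the memoized wrapper.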

I’m always delighted to find use cases for fixed-point combinators. Writing a nice macro to wrap this up is left as an exercise for the reader.

No blog this week

Sun, 13 Sep 2020 00:00:00 +0000

I’ve committed to blogging every week and I even have a Beeminder goal to enforce it. But really, there’s no blog post in me this week, so I’m just posting this to appease Beeminder. Sorry.

Is this art?

Sun, 06 Sep 2020 00:00:00 +0000

This view in Pensar reminds me of certain famous Finnish paintings.

Sometimes when people encounter a creative work that challenges them, they ask “is this art?” and struggle to answer. To find a satisfying answer, you need to go beyond that question.

Epistemic status: Armchair philosophy.

I’m not going to give an all-encompassing definition of art, but the word is used in at least two senses:

Art as in artwork: Creative works made with the intention of expressing ideas and skill and affecting emotions.
Art as in artful: Works that are skillful and aesthetic.

When somebody says that “this piece of code is a work of art”, they usually mean the latter (there are exceptions). When you refer to the body of work by an artist, you mean the former, although the latter might apply as well. The trouble starts when you conflate these two meanings.

Some famous (if dated) works that raise the question of what is art are Marcel Duchamp’s Fountain and John Cage’s 4'33". That is just as well – they were meant to do that. They both are art in the former sense, but are they skillful? Considering their fame, I’d say yes, but the skills they demonstrate are different from the usual kind of sculpture and composition.

The word art is too loaded and leads to frustrating discussions. If a creative work has been created with artistic intention, why not just accept that it is art or at least analyze it as if it were art? Then you can continue to more interesting questions to poke at its artistic merits. For example:

Does it evoke emotions?
Does it convey a message?
Does it interest you?
Is it boring, or imaginative, or beautiful, or hideous, or funny?
Was it made skillfully?
Why did the author make it?
Why was it exhibited or performed?

JWT and its alternatives

Sun, 30 Aug 2020 00:00:00 +0000

So are there any worthwhile alternatives to JWTs?

JSON Web Tokens (JWT) are a popular way to create URL-safe access tokens for web applications. They are often used for stateless sessions¹ and they’re part of the OpenID Connect (OIDC) protocol.

Note: JWT is just one part of the JavaScript Object Signing and Encryption (JOSE) framework. It would be more accurate to talk about JOSE but JWT has come to represent the whole suite, so I’ll go with that.

Unfortunately, JWT is not a great design. It’s complex and many implementations have had serious security flaws directly connected to that complexity. For example: No Way JOSE! Javascript Object Signing and Encryption is a Bad Standard That Everyone Should Avoid

Recently I had a legitimate need for a URL-safe authenticated token. I needed to pass some information via the user’s browser from one service to another one that I control. The receiving service had to be able to verify that the information was coming from the original service intact.

The common wisdom on Twitter is to just slap an HMAC-SHA256 on it. That seems like a good idea. To avoid any issues with canonicalization, I’d dump my data as JSON, encode the result with base64url (no padding!), and authenticate that with HMAC-SHA256 and a secret shared by the services.
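To make the idea concrete, here is a rough Clojure sketch that uses only JDK classes. It is an illustration, not a vetted implementation: the function names and the payload.tag layout are my assumptions, and the JSON is taken as an already-serialized string so that no JSON library is needed.

```clojure
(require '[clojure.string :as str])
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.security MessageDigest)
        '(java.util Base64))

(defn- b64url-encode [^bytes bs]
  ;; URL-safe alphabet without "=" padding.
  (.encodeToString (.withoutPadding (Base64/getUrlEncoder)) bs))

(defn- b64url-decode [^String s]
  (.decode (Base64/getUrlDecoder) s))

(defn- hmac-sha256 [^bytes secret ^String message]
  (let [mac (Mac/getInstance "HmacSHA256")]
    (.init mac (SecretKeySpec. secret "HmacSHA256"))
    (.doFinal mac (.getBytes message "UTF-8"))))

(defn sign-token
  "Returns payload.tag, where both parts are base64url-encoded."
  [^bytes secret ^String json]
  (let [payload (b64url-encode (.getBytes json "UTF-8"))]
    ;; The tag covers the encoded payload, so nothing has to be re-serialized.
    (str payload "." (b64url-encode (hmac-sha256 secret payload)))))

(defn verify-token
  "Returns the JSON string if the tag matches, otherwise nil."
  [^bytes secret ^String token]
  (let [[payload tag & more] (str/split token #"\.")]
    (when (and payload tag (empty? more))
      (try
        ;; MessageDigest/isEqual compares the tags in constant time.
        (when (MessageDigest/isEqual (hmac-sha256 secret payload)
                                     (b64url-decode tag))
          (String. ^bytes (b64url-decode payload) "UTF-8"))
        ;; Thrown by the decoder when a part is not valid base64url.
        (catch IllegalArgumentException _ nil)))))

(def secret (.getBytes "a secret shared by both services" "UTF-8"))
(def token (sign-token secret "{\"user\":\"alice\"}"))

(verify-token secret token)            ; "{\"user\":\"alice\"}"
(verify-token secret (str token "x"))  ; nil
```

The receiver only parses the JSON after the tag has been checked, which is the point of authenticating the encoded bytes rather than the data structure.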

This sounds very much like a JWS (the signed variant of JWT). I’d need to implement it in two programming languages, JavaScript and Clojure. That sounded like a chore and the temptation to just use a library was weighing heavy on me. But since JWTs are bad, I decided to take a look at alternatives.

I know of a couple of alternatives to JWTs:

Fernet. Encrypted tokens only. Encrypted with AES-128 in CBC mode and encoded with base64url. The choice of cryptographic algorithm seems a bit dated by now. Simple.

Branca. Encrypted tokens only. Encrypted with XChaCha20-Poly1305 and encoded with base62. Intended as a modern version of Fernet. Simple, but base62 seems like an odd choice when base64url implementations are so widely available.

PASETO. Has both signed and encrypted variants. Version 2 (the latest version) uses XChaCha20-Poly1305 with base64url encoding. More complex and e.g. has its own string array encoding. Still less complex than JWT.

I don’t know how Branca and PASETO both ended up with XChaCha20-Poly1305. It’s not something that I’m familiar with and I don’t follow the latest developments in cryptography, but I’m going to assume that it’s a good, modern choice. At least it’s available in libsodium.

Any of these token standards could have solved my problem. I didn’t need encryption but I don’t mind it as long as the tokens are authenticated.

Nevertheless, I decided to go with JWTs. I took a look at the library situation of the alternative tokens and either the libraries did not exist or they looked like they hadn’t seen much use.

Working with OIDC, I’ve had to deal with JWTs for years, and by this point there are libraries that I trust despite the abysmal track record of JWT libraries in general (shoutout to Buddy for Clojure!). There are also tools such as JWT Debugger and the step command-line tool. Creating a library and tooling myself would be possible, but not a worthwhile investment for a one-off library.

Looks like I’m going to be stuck with JWTs for a while.

Stateless meaning that there’s no server-side state. The session state lives in the token. It sounds great on paper, but in my experience you’ll end up introducing server-side state by the time you try to make logout work the way you want. For more information, see Stop using JWT for sessions, part 1 and part 2 by Sven Slootweg. ↩︎

Looking good in a suit

Sun, 23 Aug 2020 00:00:00 +0000

What have you been wearing during COVID-19?

For me, the strong isolation measures meant that I started working remotely. I wasn’t meeting people and this changed how I dressed. Instead of my normal office clothes - sweaters and chinos - I would dress in athleisure. Basically I looked all the time like I was on my way to a yoga session or a hike.

Recently, however, I’ve had the chance to wear a suit. We’ve celebrated some big events in friends’ lives.

Picking the right clothes was a bit of a problem. I have this early 2000s (almost vintage!) grey suit that I really like. It has a Prince of Wales check and wide legs. Unfortunately the fit is not perfect: it bunches slightly at the neck.

I have these three commandments for looking good in clothes:

The clothes should be clean and not damaged.
The clothes should fit you well.
The clothes should exhibit good taste.

The earlier commandments are more important than the later ones. No matter how fancy the clothes are, they won’t look good on you if they’re the wrong size or stained.

I nevertheless ended up choosing the grey suit, because between it and the black suit that does not meet the first commandment, it was the better choice…

The minimalist program

Sun, 16 Aug 2020 00:00:00 +0000

In visual arts, minimalism was a movement that emphasized non-figurative, non-emotive, abstract art. There’s no need to refer to the world – the shapes and the materials make the piece of art interesting in itself.

In computing, minimalism refers to something else. Mostly it’s about reduction and parsimony. But I wonder: are there programs that would fit the visual arts definition of minimalism?

Programs relate to the world by solving problems. The minimalist program should be afunctional. It should not solve a problem – it should exist because the computation itself is worthy of our attention.

Can you think of such programs? My suggestion would be a quine, a program that prints its own source code.

Two things that make logging out hard

Sat, 08 Aug 2020 00:00:00 +0000

When building a web service, logging out is often an afterthought. When you finally get to it, it turns out to be complicated.

As a user, you have a couple of reasons to log out from a web service:

You want to switch to another account.
You want to prevent anyone else (or even yourself) from accessing your account using the same device.

We now have multiple devices and the services can be used with browsers and native apps. Sometimes you want to log out of the other devices, for example if your phone gets lost or stolen. The service administrator may log out all your devices if they detect that your account has been compromised¹.

In a traditional Web 2.0 architecture, the active sessions are stored in a database. The user gets an HTTP cookie with the session ID and every request is checked against the database to see if the session is still active and to whom it belongs. Implementing logout is straightforward: delete the user’s session from the database and the user is logged out.
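As a toy illustration of that design, here is a sketch where an atom stands in for the sessions table. All the names are made up for the example, and a real service would of course use a database and proper cookie handling.

```clojure
;; In-memory stand-in for the sessions table (an assumption of this sketch).
(def sessions (atom {}))

(defn log-in!
  "Creates a session for user-id and returns the session ID for the cookie."
  [user-id]
  (let [session-id (str (java.util.UUID/randomUUID))]
    (swap! sessions assoc session-id {:user-id user-id})
    session-id))

(defn current-user
  "Checked on every request; nil means the session is not active."
  [session-id]
  (get-in @sessions [session-id :user-id]))

(defn log-out!
  "Logging out is just deleting the session."
  [session-id]
  (swap! sessions dissoc session-id)
  nil)

(defn log-out-everywhere!
  "Deletes every session of the user, e.g. after a phone gets stolen."
  [user-id]
  (swap! sessions
         (fn [m] (into {} (remove (fn [[_ s]] (= user-id (:user-id s)))) m)))
  nil)

(let [phone  (log-in! "alice")
      laptop (log-in! "alice")]
  (log-out! phone)
  [(current-user phone) (current-user laptop)])  ; [nil "alice"]
```

Because every request consults the store, both per-device and all-device logout take effect immediately.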

Contemporary web services and single-page applications come with two things that complicate the situation:

Single sign-on (SSO). You have logged in to multiple services using the same credentials. Now you click the log out button in one of the services. What happens? Do you get logged out from that service or all the services? On that device or on all the devices? Does the answer depend on whether the single sign-on provider is run by the same organization as the service or by a third party?

I’ve discussed this question with developers, designers, and product managers and I’ve learned that very reasonable people disagree about it. It depends.

Stateless sessions. Instead of storing sessions in a database, you can use a signed (and possibly encrypted) cookie or a token. As long as the user’s requests include a valid token, they’re logged in. There’s no server-side state associated with the session, hence the name.

In this design, logging out means forgetting the session token. You can unset the cookie and delete the tokens from applications’ storage. The security people do not like this, because the token stays valid even if the user has nominally logged out. Having a short expiration time helps, but then you have to figure out how to periodically get fresh tokens. Logging out other devices is impossible without re-introducing server-side state.
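Here is a small sketch of what that looks like on the server. It assumes the token’s signature has already been verified and its claims parsed into a map; :exp and :jti mirror the JWT claim names, while revoked-ids and the function names are made up for the example.

```clojure
;; The re-introduced server-side state: IDs of tokens that were logged out.
(def revoked-ids (atom #{}))

(defn token-active?
  "True if the token has not expired and has not been revoked.
  now and :exp are seconds since the epoch."
  [claims now]
  (and (< now (:exp claims))
       (not (contains? @revoked-ids (:jti claims)))))

(defn revoke!
  "Logs out one token by remembering its ID on the server."
  [claims]
  (swap! revoked-ids conj (:jti claims))
  nil)

(def claims {:sub "alice" :jti "token-1" :exp 1600000900})

(token-active? claims 1600000000)  ; true
(token-active? claims 1600001000)  ; false, expired
(revoke! claims)
(token-active? claims 1600000000)  ; false, revoked
```

Without revoked-ids, the only thing the server can check is the expiration time, which is why a nominally logged-out token keeps working until it expires.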

In conclusion. If you’re looking to implement an SSO system or a stateless session architecture, talk about how logging out should work early on. Especially do this if you’re going to use OpenID Connect (which is often used in a stateless fashion). It does not come with great answers to these questions.

For example, if they detect that all the accounts on their service have been compromised. ↩︎

Summer reading

Sun, 02 Aug 2020 00:00:00 +0000

In case you’re looking for some summer reading, here are a couple of books that I’ve read this year and want to recommend:

This Is How You Lose the Time War by Amal El-Mohtar and Max Gladstone. It’s a story about two time agents who fight on opposing sides but slowly get to know each other. The story is told entirely through the letters they send to each other. The book is rich in language and references.

I recommend it to sci-fi readers, but I think that even people who are not into sci-fi could like it if they’re into literature.

Bad Blood by John Carreyrou. The book tells the story of the infamous Silicon Valley startup Theranos. To me, it’s a story of the magic of Silicon Valley: you can will almost anything into existence if you are just persistent enough, have the right connections, and look the part. It’s rather thrilling!

I recommend it to people who follow Silicon Valley startup news. (If you’re a programmer and you read Hacker News, that counts.)

Monimuotoisuus by Juha Kauppinen. This one is in Finnish. It’s about the decline of diversity in Finnish nature. The author visits a number of places in Finland, looking for species that are threatened or already extinct. These eloquent stories are used to explain the importance of diverse habitats. It’s a sad book.

I’d recommend it to people who enjoy nature and can read Finnish, of course.

Signing .jars is worthless

Sun, 26 Jul 2020 00:00:00 +0000

If you try to deploy a new release of a Clojure library with Leiningen, it prompts you to sign the .jar file with GPG. This step often causes confusion and breaks. I believe that it’s not worth the effort to make it work.

As far as I know, nobody ever verifies the signatures in a systematic way. There are a bunch of obstacles:

It’s unclear if any tools for verifying the signatures actually work. For example, I just tried to run lein deps :verify against a couple of projects and it reported every dependency as :unsigned. I know that some of those dependencies are signed and I verified that the .asc files exist on repo.clojars.org.
It’s hard to find the public keys for the library maintainers. Sometimes they upload them on the keyservers, sometimes not.
There’s no established way of communicating which public keys should be trusted. If there’s a new release and it has been made with a new key, your best bet is to e-mail the maintainer and ask what is up.

It’s hard to get any security benefits from the signatures in practice. Thus it’s okay to set :sign-releases to false in your project.clj even if Leiningen’s manual does not recommend it. Something like this:

:deploy-repositories [["clojars" {:url "https://clojars.org/repo"
                                  :sign-releases false}]]

In principle, the systematic checking of signatures could provide security against a dangerous supply-chain attack: weak or leaked passwords for package manager accounts. For example, several RubyGems have been attacked this way. Most likely the signing keys would not be compromised at the same time.

There are alternative solutions, though, such as disallowing publishing packages without multi-factor authentication. Using Clojars’s deploy tokens helps a bit as well.

Right now we place a lot of trust in Clojars and Maven Central. If either of them got compromised, we all would be screwed. Package signing could be a part of a solution to mitigate that risk, but a comprehensive solution would be something like using The Update Framework. Go’s checksum database is also worth taking a look at.

Finally, if you’re moved to do something about this, please do not build anything new using PGP. To quote Latacora: PGP is bad and needs to go away.

I’ve written this post in part to be proven wrong. I’m eagerly waiting for posts from y’all about how you do, in fact, systematically verify the signatures.

On paddling

Sun, 19 Jul 2020 00:00:00 +0000

Every now and then, I kayak. I joined a kayaking club a couple of years ago, but I’m still a novice. I just haven’t been kayaking that much. This summer I’ve made a new effort to become more skilled and confident at paddling.

I’ve been practicing the basic techniques - how to paddle efficiently and how to maneuver the kayak. Turning a kayak is easy enough when the water is flat, but when there’s wind and waves, it gets more involved.

Just the other day I got into a situation where I was sideways to the wind and had a hard time getting the kayak to turn in either direction without using a stern rudder. It’s weird because usually the kayak turns into the wind, or if the skeg is down, downwind. I’m sure the situation would have been trivial for an experienced kayaker but for me it was a mystery.

What I like about kayaking is how immediate you are with the sea. It’s right there. The sea is a bit scary! There is always wind and waves, and there are boats and ships. Before midsummer, the water is cold and now it’s just cool. But I also like it - it feels good to learn to deal with the waves and everything.

I’ve come to like paddling alone. Due to my meager skills, I’ve had to limit myself to safe and familiar routes and calm weather. Once I learn more, I can explore more alone. There’s a good reason why every paddling safety guide suggests paddling in a group: capsizes, accidents, and any other problem situations are much easier to deal with when there are multiple people.

There’s no kayaking photo in this post because, well, I haven’t taken any. At first I took some photos with my phone - I had this handy transparent waterproof floating case for my phone. One day I dropped it in the water and it turns out it’s neither waterproof nor floating. I managed to catch the phone before it sank, and it survived intact, but putting it in a non-transparent actually-waterproof bag prevents photography.

Summer vacation

Sun, 14 Jun 2020 00:00:00 +0000

I’m starting my summer vacation next week and this blog will also take a small break. My original plan was to hike on Kungsleden in Sweden, but COVID-19 messed that up. I’m not sure what I’m going to do – I guess I’ll settle for some hiking and paddling in Southern Finland. I’ll be back in July.

Building software without hiring anyone

Sun, 07 Jun 2020 00:00:00 +0000

A number of enterprises in Finland rely heavily on external contractors for software development even though the software is a core part of their business. Often the product owners, project managers, and possibly architects work for the company, but the designers, developers, ops, QA specialists, and data scientists work for consulting companies.

Contractors are the obvious choice when you need extra staff quickly or when you need someone with skills that your organization does not have.

For example, many years ago when I worked at ZenRobotics, none of us were experts in building user interfaces. Our product, the robotic recycler, needed a control panel, so we hired a consulting company to build it. I wasn’t too happy with the result back then¹, but the premise still makes sense to me.

But why use contractors as the backbone of your software development organization? I have a couple of educated guesses, but lacking insider knowledge, I’m going to keep them to myself for now. Instead, I’m going to highlight a couple of oddities compared to the more traditional solution of hiring people.

Teams are thrown together. When the client needs more hands, there’s no hiring process. Instead, there’s either a sales process or a public tender. The client has little agency in picking the specific persons they want - the consultants are assumed to be interchangeable.

On the upside for the consultants, if you work for a company with a good reputation, you’re assumed to be competent. Bullshit interviews and whiteboarding are much more rare.

Contractors are second-class citizens. You may be excluded from internal communications, meaning that nobody will tell you about the client’s product strategy, or the upcoming office renovation - even though you may work at the office every day² for years. They’re even less likely to ask your opinion.

If you’re driven to build great products, you can still do that as a contractor, but it’s hard to have the same level of ownership as an employee could have. I think this is partially because contractors exist outside of the organization chart, so they cannot “officially” own anything.

People management is weird. In the traditional setup, if you are an individual contributor (IC), you’d have a manager who is your boss and whose responsibilities include supporting you and your growth.

In the consulting setup, you don’t have a people manager at the client. There is probably some kind of project manager, but it’s not their responsibility to support you. You probably have some kind of boss at the consulting company, but typically they have very little insight into the day-to-day work at the client and they cannot support you.

You may end up in a team where no two people have the same employer, which makes peer-to-peer support harder to establish.

Incentives are weird. Let me crudely over-simplify: the contractors do not have incentives to ensure the business success of what they build, but they may have incentives to generate more work for themselves and their company. For employees, the incentives are opposite.

My point is: from the day-to-day perspective, building your software development organization on contractors is not an obvious recipe for success. I’d like to understand why it is such a big thing in the Finnish IT scene.

The interface was a single-page web application built with AngularJS running in Chrome. I took over its maintenance and let’s just say that when I later learned React and Reagent, I never looked back. By the way, I’ve heard that SpaceX built their flight interface with JavaScript running in Chromium. I don’t know if they use Angular. ↩︎
At least when there is no ongoing global pandemic. ↩︎

Who is going to use the programming language?

Sun, 31 May 2020 00:00:00 +0000

When talking about programming languages, we often talk about the cool features that we could use and the thriving ecosystems that we could leverage. But when we’re choosing a programming language for a project, what really matters is the team that is going to use it.

Learning a new programming language – if you want to use it in anger – takes time, even if you have a lot of experience. In addition to the language itself, you have to learn about architecture, ecosystem, deployment, and operations with the new language.

For example, I was working in a team of experienced Clojure and Java developers when the management decided to invest in Python. We went along and started a bunch of new projects in Python.

None of us had significant recent Python experience, but it has a reputation of being an easy language. It took us a good while to figure out how to handle dependencies, how to get a suitable test setup, how to structure our programs, how to deploy them with our existing infrastructure. We didn’t actually get the software into production use before I moved on, but I bet that would have taught us a number of new lessons.

This was an investment with real opportunity cost: we had a tight six-month schedule to ship a new product and we took weeks to learn Python. We could have spent all that time on polishing the product, had we used a language that we already knew. Hopefully the investment pays off in the years to come.

In the short term, an experienced team using a programming language that they know will always beat an experienced team using a programming language new to them. In the long term things may be different.

Automating spec-tools releases

Sun, 24 May 2020 00:00:00 +0000

My employer, Metosin, is well-known for its Clojure open-source libraries. When people hear that I work for Metosin, they often ask if I contribute to the development of the libraries. I do, but not so much in the form of coding new features or bug fixes. My focus has been on maintainership tasks such as creating releases.

At the moment, we do not have a well-defined process or schedule for releases. Personally, I believe in small releases. If a library has some unreleased work and it looks like there won’t be more changes in the immediate future, I’ll do a release.

The actual release process goes something like this:

Merge ready PRs. First I check whether there are any open PRs that could be merged. If yes, I merge them.

Update the changelog. I go through the PRs merged since the last release and add them to CHANGELOG.md. If there are breaking changes, I mark them as such.

Update the version number. If there are breaking changes, I bump the minor or the major number. If not, bumping the patch number is enough.

Build and deploy the release. On a good day, this means running lein release.

Announce the release. Usually just on Clojurians Slack.
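
The version-number rule from the steps above can be sketched in Python. This is a hypothetical helper, not part of the actual release tooling: `bump_version` and its `change` labels are names invented for illustration, and the sketch assumes plain `MAJOR.MINOR.PATCH` version strings.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a MAJOR.MINOR.PATCH string.

    change is "major", "minor" or "patch" (hypothetical labels).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


# Breaking changes bump the minor or the major number...
print(bump_version("0.10.3", "minor"))  # 0.11.0
# ...and otherwise bumping the patch number is enough.
print(bump_version("0.10.3", "patch"))  # 0.10.4
```

Whether a breaking change warrants a minor or a major bump stays a human decision here; the helper only does the arithmetic.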

This takes quite a bit of manual work and I’d like to lessen the load for myself and the other maintainers. As an experiment, I decided to automate the build-and-deploy step for spec-tools. It’s a small step, but it’s the easiest to automate. It’s also something that we regularly get wrong:

Many of our releases are missing git tags and some of those tags are wrong.
Some of the releases have been built against dirty work trees - that is, they contain uncommitted code.
If there’s Java code (like in jsonista and reitit), it won’t work with Java 8 unless it’s compiled with Java 8. The release is basically broken if you build it while running Java 11.
Signing the artifacts with GPG constantly goes wrong.

I used GitHub’s Release feature, GitHub Actions, and a bit of shell scripting to automate the build.

There’s a GitHub Actions workflow that runs when you create a new GitHub release. Each GitHub release corresponds to a git tag. The workflow checks out the code for the tag and runs lein deploy.
There’s a shell script that does a couple of sanity checks and creates a new GitHub release from the head of master branch. Me and the other maintainers can run it whenever we want to publish a new release.
The release artifacts are not signed with GPG. If somebody managed to steal the Clojars deploy token used by GitHub Actions, they would be able to steal the GPG key as well. And anyway, I don’t think anyone checks the signatures.

It seems to work! I hope to start to use this in more projects – especially the Java 8 problem would be nice to automate away.

There’s room for improvement, for sure. For example, the GitHub Release text is now just whatever is the latest commit message, and we could trigger a cljdoc build right after the release is deployed.

To go further, we could use a tool like release-drafter to generate the changelog from pull request titles. We’ve used release-drafter for work projects with good results and I’ve experimented with something similar before. This would save a lot of work.

Have you seen the swan?

Sun, 17 May 2020 00:00:00 +0000

Töölönlahti is a surprisingly good bird spot in central Helsinki

How have you spent the copious free time given to you by the COVID-19 isolation? I have been watching the urban nature.

Usually in May, I’m spending the weekends at the summer cottage and in Nuuksio and in the nature in general. This year travelling and even using public transport has been discouraged, so I’ve taken walks in the nearby parks.

It’s funny, but I’ve probably paid more attention to the flora and the fauna than ever before. When you visit the same places again and again, you have more time to look at them and you start noticing the changes. It’s spring, so flowers pop up, trees get leaves, and new birds start showing up.

One of my favorite sights right now is the brooding mute swan in Töölönlahti. It’s been there for a couple of weeks - hopefully soon we get to see some cygnets.

You also start to notice the gaps in your knowledge. I have no idea of the names of the flowers and there are so many birdsongs that I should recognize but I do not. I’ve been slowly looking them up. Eventually I’ll learn.

Essential features of data specification libraries

Sun, 10 May 2020 00:00:00 +0000

Last week I took a look at three data specification libraries in Clojure: Schema, Spec, and Malli. This week, let’s talk about the essential features of these libraries.

Data specification language. This is the foundation of every data specification library: a way to describe the data with a schema. You could use an existing language such as JSON Schema, but in practice that’s clunky and everybody develops their own language.

;; Let's model users. We want to know the user's name and the year of birth.

;; Schema
(require '[schema.core :as schema])
(def User {:name schema/Str, :year-of-birth schema/Int})

;; Spec
(require '[clojure.spec.alpha :as spec])
(spec/def ::year-of-birth int?)
(spec/def ::name string?)
(spec/def ::user (spec/keys :req-un [::name ::year-of-birth]))

Validation. The basic operation is to validate data against a schema. What is interesting is what happens when the validation fails. Easy-to-read error messages are essential for the developers, but having errors as data is a useful building block, too. For example, you could build front-end form validation on top of a data specification library, and then you’d know what was the problem and in which field.

;; Our user data example is missing the year of birth. Who gives out their
;; real year of birth, anyway, to the services we (the software industry) build?
(def a-user {:name "[email protected]", :year-of-birth nil})

(schema/validate User a-user)
;; Execution error (ExceptionInfo) at schema.core/validator$fn (core.clj:155).
;; Value does not match schema: {:year-of-birth (not (integer? nil))}

(spec/valid? ::user a-user)
;; => false

(spec/explain ::user a-user)
;; nil - failed: int? in: [:year-of-birth] at: [:year-of-birth] spec: :user/year-of-birth

Conforming. This feature is only implemented by Spec, but the idea is more general. A data specification language allows you to declare a grammar for your data structure. Conforming is parsing your data against that grammar. The result is akin to an abstract syntax tree, and Spec calls the conversion in the other direction “unforming”.

;; The regular expression specs and spec/or reveal the power of conforming.
;; Let's model hiccup-style HTML data.
(spec/def ::hiccup
  (spec/cat :tag keyword?
            :options (spec/? map?),
            :children (spec/* (spec/or :tag ::hiccup, :text string?))))

;; Now let's parse an anchor tag into a neat map.
(spec/conform
  ::hiccup
  [:a {:href "http://www.example.com/"} "Check out " [:i "example.com"]])
;; {:tag :a,
;;  :options {:href "http://www.example.com/"},
;;  :children [[:text "Check out "]
;;             [:tag {:tag :i,
;;                    :children [[:text "example.com"]]}]]}

Coercion. Coercion, or more generally, schema-driven transformation of data means that you walk your data and schema together and apply transformations to build new data. This allows you to do things like converting date strings in JSON to java.time instants and vice versa.
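
To make "walk your data and schema together" concrete, here is a minimal sketch in Python. It is not how any of the Clojure libraries implement coercion: the dict-and-list schema notation, `coerce`, `decode_leaf` and `encode_leaf` are all made up for illustration, and Python's `datetime` stands in for java.time instants.

```python
from datetime import datetime, timezone

# Hypothetical mini schema language: a dict maps keys to sub-schemas,
# a one-element list describes a homogeneous list, and a type is a leaf.
USER_SCHEMA = {"name": str, "last-login": datetime, "tags": [str]}


def decode_leaf(schema, value):
    """JSON -> program: parse ISO 8601 strings where the schema says datetime."""
    if schema is datetime and isinstance(value, str):
        return datetime.fromisoformat(value)
    return value


def encode_leaf(schema, value):
    """Program -> JSON: the opposite direction of decode_leaf."""
    if schema is datetime and isinstance(value, datetime):
        return value.isoformat()
    return value


def coerce(schema, value, leaf):
    """Walk schema and value in lockstep and apply leaf at the leaves."""
    if isinstance(schema, dict):
        # Keys that the schema does not mention are passed through untouched.
        return {
            key: (coerce(schema[key], item, leaf) if key in schema else item)
            for key, item in value.items()
        }
    if isinstance(schema, list):
        return [coerce(schema[0], item, leaf) for item in value]
    return leaf(schema, value)


raw = {"name": "Ada", "last-login": "2020-05-10T12:00:00+00:00", "tags": ["admin"]}
decoded = coerce(USER_SCHEMA, raw, decode_leaf)
print(decoded["last-login"].year)  # 2020
print(coerce(USER_SCHEMA, decoded, encode_leaf) == raw)  # True
```

The walking logic is written once and the direction of the transformation is chosen by swapping the leaf function.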

Program instrumentation. Yet another use case for data validation is to define a contract for a function: what kind of inputs it takes, what kind of data it returns, and how these are related. This is what Spec’s fdef does.

(require '[clojure.spec.test.alpha :as stest])

(defn plus [x y] (+ x y))
(spec/fdef plus :args (spec/cat :x int? :y int?) :ret int?)
(stest/instrument `plus)

(plus 1 2)
;; => 3

(plus 1 "2")
;; Execution error - invalid arguments to user/plus at (REPL:1).
;; "2" - failed: int? at: [:y]

Schema introspection. The ability to inspect the schema enables features such as generating JSON Schema from your schemas, as shown by spec-tools.
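
As a rough illustration of why introspection matters, the sketch below generates a JSON Schema document from a schema that is plain data. This is not how spec-tools works internally: the dict-and-list schema notation and `to_json_schema` are hypothetical, and treating every key as required is an assumption made to keep the example short.

```python
# Hypothetical mini schema language: dict = object, [x] = list of x, type = leaf.
USER_SCHEMA = {"name": str, "year-of-birth": int, "tags": [str]}

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_json_schema(schema):
    """Inspect the schema recursively and emit the matching JSON Schema."""
    if isinstance(schema, dict):
        return {
            "type": "object",
            "properties": {key: to_json_schema(sub) for key, sub in schema.items()},
            # Assumption: every key in the schema is required.
            "required": sorted(schema),
        }
    if isinstance(schema, list):
        return {"type": "array", "items": to_json_schema(schema[0])}
    return {"type": JSON_TYPES[schema]}


print(to_json_schema(USER_SCHEMA))
```

This only works because the schema can be inspected as data; a schema hidden inside an opaque validation function could not be translated this way.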

Performance. Data specification libraries can end up playing a pretty important role in your application. For REST APIs, JSON coercion is part of every request, and if you use instrumentation, validation is literally everywhere.

In my experience, indiscriminate use of Spec’s instrumentation makes programs crawl. Michiel Borkent shared a similar experience on The REPL podcast:

If you have an application and you have like 100 core specs and then you start instrumenting those, the application becomes really really slow, even in just in development. So the overhead of calling spec validations on every function call in Clojure program becomes too much for core functions. That is what I had found and I was a little bit disappointed that it didn’t work out, so, at least for dev purpose.

The performance matters and your data specification library can become a bottleneck.

As far as I know, no data specification library for Clojure has all of the features mentioned above. Hopefully this list helps you to choose one – or to decide to roll your own!

Schema, Spec, and Malli

Sun, 03 May 2020 00:00:00 +0000

There are a number of data specification libraries for Clojure. The best-known ones are Schema and clojure.spec. Then there’s Malli, designed by my colleagues at Metosin.

The libraries are used in two ways. They’re used for specifying the shape of data inside the program, for example the parameter and return types of a function. They’re also used for specifying the external interfaces of the program, for example what kind of JSON a REST API endpoint accepts. I’ll call these use cases internal and external.

Schema was designed for internal use, but it has seen a lot of external use via libraries such as compojure-api.
Spec was designed for internal use. For example, its support for conforming (normalization) and regular expressions over data structures are great for parsing little languages in macros and configuration.
Malli was designed for external use. For example, the robust support for schema-driven transformations is a key feature (see e.g. my basic JSON coercion example) and the schema language of Malli is designed to be serializable.

Both Malli and Spec can be extended to the other use case via libraries such as spec-tools and aave.

So which one should you use? I don’t think there’s one obvious answer at the moment. Here’s how the situation looks to me:

Schema is a proven choice, but it won’t be getting any new features. If it does what you need, great.
Spec is the best option for the internal use case and it has a nice ecosystem of libraries such as Expound built around it. However, spec isn’t exactly easy to extend and I’m saying this as one of the authors of spec-tools. Spec’s alpha2 fixes some problems and brings new features such as select, but the development has been slow and I don’t know if there’s a good story about how we’re going to migrate the ecosystem to alpha2.
Malli has great features for the external use case and it’s easy to build on. There’s just one downside… it has not been released yet. Go for it if you’re okay with using a library that will get some potentially-breaking changes before its first release.

Thus, it depends, but more so than usual. You have to consider your use case and your capacity for bearing technical risks.

Elegant knowledge transfer

Sun, 26 Apr 2020 00:00:00 +0000

Let’s say you’re leaving a software team and somebody else is joining the team to replace you. How do you efficiently transfer knowledge to the new person in a short time?

My answer is that you… don’t.

You can bring on the new member in the usual way. Maybe somebody gives a presentation, maybe you’ll pair-code. The new person starts with some easy tasks that act as a good introduction.

I hope you have a great README. The common knowledge should ideally be encoded in the shared artifacts of the team: the software itself, the documentation, and the project management tools. This helps with onboarding.

You can’t, however, teach everything an individual has learned over the course of months and years in a week or two - unless you’ve poured a lot of effort into making it teachable.

If you have specialized knowledge, you can try to teach it to the existing team members instead of the newcomer. At least you have a lot more common ground with them.

I believe people understand this intuitively yet I keep seeing well-resourced organizations doing rushed hand-offs. I assume it’s to keep costs down. It causes a loss of tacit knowledge, but that’s hard to measure so it’s assumed to be essentially free.

Ricoh GR III, a year later

Sun, 19 Apr 2020 00:00:00 +0000

The red buttons are the anchors for Peak Design’s camera straps.

I got my Ricoh GR III in April 2019. Back then, I thought that it’s a pretty great pocketable camera. Now it’s time to take a second look.

Background: Ricoh GR III is a fixed lens compact camera with a fast, 28 mm-equivalent lens. It’s a successor to GR II, which has a cult following in the street photography circles.

I still haven’t explored the camera that much and, honestly, that’s a good sign. I quickly settled with a workflow and the camera has faded into the background. I’ve been focusing on the photos, not on operating the camera. This is how it should be.

The good. The camera fits in my pocket and it’s quick to operate with one hand. Focusing with the touchscreen is great. I like the colors in straight-out-of-camera JPEGs.

Initially I used the camera with a neck strap, but nowadays I just keep it in my hand or stash it in the pocket. When I put it in the bag, I use a small Pelican case (1020 Micro Case) that just fits the camera and the charging cable. It’s sturdy but a bit heavy.

The firmware updates solved my problems with autofocus and the battery life has been good enough. When I hiked Karhunkierros, I recharged the camera only once despite the cold weather, and I took a lot of photos.

Flaws. The P mode still skews towards wide apertures. You can easily shift it with the front dial, but I sometimes forget to do it and get photos with too shallow depth of field.

I’ve discovered a new flaw in the camera. The camera comes with a ring cap that protects the lens barrel. The cap can be detached to attach a wide conversion lens. The flaw is that the cap sometimes comes off by itself. I dropped the cap somewhere and a new one costs 40 €, so now I don’t have a cap.

I sometimes wish I had zoom, and on sunny days, a viewfinder would be nice. However, the lack of these features is part of what makes the camera so great for me. Can’t get everything.

In conclusion. Still great.

Put files where they are expected

Sun, 12 Apr 2020 00:00:00 +0000

When creating a new file for your programming project, how do you choose how to name it and where to put it?

I recommend choosing a location that is expected by both humans and programs. Here are a couple of examples:

README. There’s a decades-long convention of putting the basic overview documentation for a project in a file called README. This convention is now recognized by tools: for example, GitHub renders it in the repository front page.

Docker Compose. If you use docker-compose to set up the development environment, you could put its configuration file with a custom name in some subdirectory and use docker-compose -f path/to/my/config-file… or you could put it in docker-compose.yml in repository root and ignore the -f parameter. Then docker-compose up would be enough to start the dev environment.

Clojure tests. When you create tests for your Clojure project, you could name them arbitrarily… or you could put tests for the namespace a.b.c in a.b.c-test and you can enjoy using Projectile’s and Cursive’s “jump between test and implementation” feature. Cursive even has a nice feature for creating new tests, but it assumes this naming convention.
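
The Clojure test convention is mechanical enough to express as a tiny function, which is what lets tools jump between the two files. The sketch below is in Python with hypothetical helper names; the `src` and `test` directory roots are an assumption (they are common defaults, but a project can configure others).

```python
def companion_test_namespace(namespace: str) -> str:
    """Tests for the namespace a.b.c live in a.b.c-test."""
    return namespace + "-test"


def namespace_to_path(namespace: str, root: str) -> str:
    """Map a Clojure namespace to its source file under the given root.

    Dots become directory separators and hyphens become underscores.
    """
    return root + "/" + namespace.replace("-", "_").replace(".", "/") + ".clj"


implementation = "a.b.c"
print(namespace_to_path(implementation, "src"))  # src/a/b/c.clj
print(namespace_to_path(companion_test_namespace(implementation), "test"))  # test/a/b/c_test.clj
```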

If you do what the humans expect, they don’t have to ask questions or look up the documentation. If you do what the programs expect, they’re more ergonomic to use.

Video calls are cool but have you tried writing

Sun, 05 Apr 2020 00:00:00 +0000

It takes some skill to work remotely. When migrating from office to remote work, you can’t just set up some communications tools and expect the work to work as if nothing happened. You and everybody else have to learn how to use the tools to collaborate.

Now software development organizations are mass-migrating to remote work. Everybody is setting up Zoom or Teams to enable video calls and learning to raise their hand to get their turn to speak. That’s a great start, but the real remote transformation (in Finnish: etäloikka?) will happen when you move to asynchronous communication. That means communicating by writing.

Replacing meetings with asynchronous communication in text has a couple of benefits.

Your schedule is not dictated by meetings. You can manage your time and your energy yourself. You don’t have to sit through irrelevant meetings in full and this gives you time to focus on the issues that really need your attention.

There are fewer interruptions. They can’t interrupt you if they can’t reach you.

Text has great features. Text can be reread, searched, and shared easily.

Organizations usually already have good-enough tools for this. For example, we have a chat, a wiki, an issue tracker, a code review tool, and there’s always e-mail. It’s more about the mindset and developing the skills. Do you really need a meeting?

Video calls are a great tool when you need them. Personally I think that they’re great for creating a feeling of human connection in a way that is hard to have in text. I just wanted to say that you can go further.

Freezing deployments is risky

Sun, 29 Mar 2020 00:00:00 +0000

A log in a bog.

If you’re running mission-critical software, how do you minimize the risk of it breaking during these difficult times? Releases cause problems, so it can be tempting to require an approval of each release by a change advisory board. You could even declare a deployment freeze: no new versions of the software can be deployed unless absolutely necessary.

Releases are inherently risky and a freeze does reduce this risk by making the releases rarer. On the other hand, it makes the releases even more risky than usual by making them bigger. Because releases become rare and it’s hard to get an approval, people will cram as many fixes and features as they can into each release.

If you start with a reasonably reliable release process, I believe that making releases bigger and rarer causes more problems than it solves. The Accelerate State of DevOps Report 2019 concurs. They write about heavyweight change processes (pp. 50-51):

We found that formal change management processes that require the approval of an external body such as a change advisory board (CAB) or a senior manager for significant changes have a negative impact on software delivery performance.

According to their research, the heavyweight processes increase change fail rates. They conclude that “Analysis suggests this approach will make things worse.”

Working from home: initial impressions

Thu, 12 Mar 2020 00:00:00 +0000

COVID-19 is out there and like many, I’m working from home. Allow me to offer my unique perspective.

Not everyone can work from home - not all jobs can be done remotely and even if they can, there are many obstacles to working from home. For example, the people you share your home with - partners, kids, roommates - have to give you enough physical and mental space so that you can work. And they have to have space for their work, too!

Myself, I’m lucky and privileged enough to pull it off. My biggest gripe is that I don’t have a proper home office, so I’m working at our dinner table. It’s not ergonomic enough for the long term, but I can handle it for a couple of weeks.

The organization I’m working with has previously given the option of working 1-2 days per week remotely, but right now everybody is recommended to work remotely full time. This is the first time we’re remoting in anger.

I’ve worked remotely occasionally for ages, but software development is team work. Being productive is not just about the individual. The partial remoting was not enough for the organization to develop proper remote work capability, so this has been an interesting experience.

Connections: Most of the developers do not have a VPN connection to the internal network, due to the usual kind of enterprise bureaucracy. We just made some changes that increase the need to access the internal network. What great timing!

The issue is getting resolved now with high priority, but it’s a bit sad if it takes a pandemic to get a working VPN.¹

Meetings: It was always possible to attend meetings remotely via a video call, but it wasn’t great compared to showing up: it’d be hard to follow what is being said in person, the screensharing didn’t always work, etc.

Now that the meetings are conducted online-first, this stuff has to work. Turns out it does. We’ve made a lot of progress on this front and I’ve already had fun, productive meetings this way!

Communications: It’s common advice that you should overcommunicate when working remotely and it’s true. You have to perform the work even more than in the office. You aren’t bumping into people at the coffee machine, after all. The writing skills that I’ve honed by IRCing for two decades now get to shine on Slack.

At the office, if you ever wonder what is going on, at least you can look around and go talk to the people. When you’re remote, all there is is Slack. A quiet Slack is of no use, but people are warming up.

The coffee machine: So what’s the online version of bumping into people at the coffee machine? I don’t know yet but I’m sure something will appear.

Once the epidemic has passed, the organization will return to office work. I hope that the remote work skills we develop now carry over and allow fully productive partial remoting in the future.

Based on what I’m hearing from my software developer friends, this is by no means a unique experience. ↩︎

Programming is writing

Tue, 25 Feb 2020 00:00:00 +0000

When you write programs, you have two audiences: the humans who read the code and the computers that analyze and execute the code. Both audiences have to understand the code.

Humans read programs like a hypertext¹: not from start to end but by jumping around following references.

For humans, the careful choice of terminology is of utmost importance. Consistency matters. The computers do not care. They care about the grammar, though.

Elegant programs are easy for humans to understand and easy for computers to execute.

Psst: Math is programming, ergo math is writing.

A well-known example of a hypertext is Wikipedia. ↩︎

Making decisions without asking your boss

Wed, 19 Feb 2020 00:00:00 +0000

At work, we aim to have a self-managed organization instead of a hierarchical one.¹ One of the challenges is how decisions should be made. In a hierarchical organization you could ask your boss, but what if your boss does not want to make every decision? You could seek consensus, but what if you can’t reach a consensus and a decision nevertheless has to be made?

A practical solution is to use the advice process. Anyone can make a decision after seeking advice from:

Everyone who will be meaningfully affected by the decision, and
Everyone who has expertise in the matter.

As long as everyone is able to trust the process², this neatly solves the problem of consensus.

For example, we were having problems with Flowdock (an enterprise chat service): the mobile clients were constantly broken and the search wasn’t great. It quickly became apparent that we wouldn’t reach a consensus about what we should do: some people thought we should move to Slack, others (me included) preferred Zulip, and some thought that continuing with Flowdock was the best option despite its flaws.

A person announced that they’d be making a decision about the course of action in two weeks. Since the chat is our most important communication tool, they announced it to everyone in the company and solicited advice. People made arguments about communication styles and social, technical and monetary aspects of each solution. Once the deadline passed, the decision maker announced the solution: we’d be moving to Slack as it had a bunch of practical advantages and was also the most popular solution.

Even though my favorite solution didn’t get picked, I felt like this was a good way to make the decision: everyone affected had a say and it was a relatively efficient, timeboxed process that was guaranteed to produce a decision.

You do not have to send me a link to The Tyranny of Structurelessness. It’s a great essay and I have read it and I still believe that you can improve upon strictly hierarchical organizations. Look, I don’t send a link to Moral Mazes or to The Gervais Principle every time I hear somebody say that they work in a hierarchical organization. ↩︎
You have to trust the decision maker. I’m not sure if trusting the advice-givers is 100% necessary. It does help, of course. ↩︎

No need for something to say

Wed, 12 Feb 2020 00:00:00 +0000

When you’re out there, struggling to create, staring at the blank page, the blank canvas, or the blank MS Paint window, you might be thinking “I have nothing to say”, and you might be right.¹ Yet that is not the problem, is it?

The problem is how to have a great idea. Having something to say, a message to the world, can be a great source of creativity, but it’s not the only one. What else is out there?

I challenge you to consider art. When you look at it, listen to it, experience it, what do you see, hear, feel? Is there a message? Did the author want to say something? Or was there something else driving them?²

Those without a message, I bet they were playing.³ Maybe they looked at the materials and went hmm or tried out a new technique and went I wonder. Maybe they saw a constraint or a challenge and went I can overcome that. Maybe they felt something and went now that’s delightful. Maybe they saw something and went but nobody else has ever seen this. Maybe they just had a good time playing or painting or writing or dancing.

Play and messages: did I cover the sources of creativity exhaustively?

You might be wrong. ↩︎
For many professional creatives, both historical and contemporary, earning a livelihood has been an important driver. Is that a source of ideas? ↩︎
It could’ve been us, but. ↩︎

Logging request IDs in Tornado

Wed, 05 Feb 2020 00:00:00 +0000

When you’re debugging a web service, it’s handy if you can get all the log entries associated with a single HTTP request. This is easy if you generate a unique ID for each request and include it in all the log entries.

If your service calls other services, you can build a simple tracing system by including this ID in all those calls. If you’re using nginx as a reverse proxy, you can use it to generate the IDs.¹

At work, we’re using the Tornado web framework in Python and we wanted to have the request IDs in our logs. The most obvious solution is to pass the request ID to every function that logs anything and manually include it in the logging calls. This leads to cluttered code, though, and it’s easy to forget to add the ID everywhere.

We wanted to have an easier, more magical solution and we found it in context variables and log filters.

Context variables are “context-local” variables. They’re similar to thread-local variables – and to Clojure’s dynamic vars – but they work correctly with Python’s asyncio. Thus they’re a great choice for storing “request-local” data such as the request ID.
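
A small standalone sketch shows the property that matters here: two asyncio tasks that interleave each see only their own value. The `handle` coroutine and the `"-"` default are made up for this demonstration and are not part of our handler code.

```python
import asyncio
import contextvars

request_id_var = contextvars.ContextVar("request_id", default="-")


async def handle(request_id, seen):
    # Each asyncio task runs in its own copy of the context, so this set()
    # is invisible to the other task.
    request_id_var.set(request_id)
    await asyncio.sleep(0)  # yield so that the other task runs in between
    seen.append((request_id, request_id_var.get()))


async def main():
    seen = []
    await asyncio.gather(handle("a", seen), handle("b", seen))
    return seen


results = asyncio.run(main())
# Every pair holds the ID a task stored and the ID it read back after yielding.
print(results)
```

If the variable were a plain module-level global, the second task's assignment would overwrite the first one's value while the first task was suspended.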

In your handler, you can generate or extract the ID and store it in a context variable:

import contextvars
import uuid

import tornado.web

request_id_var = contextvars.ContextVar("request_id")

class MyHandler(tornado.web.RequestHandler):
    # prepare is called at the beginning of request handling
    def prepare(self):
        # If the request headers do not include a request ID, let's generate one.
        request_id = self.request.headers.get("request-id") or str(uuid.uuid4())
        request_id_var.set(request_id)

In my example, I’ve implemented prepare in the same class as my actual handler, but in our real application, all our handlers inherit from a custom base class that implements request ID preparation and some other common features.

Python’s logging cookbook has two recipes for adding contextual information to logging output: LoggerAdapters and filters. We chose filters to avoid having to wrap loggers everywhere. Then we can use the familiar pattern of getting a module-specific logger:

import logging
logger = logging.getLogger(__name__)
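
For comparison, the LoggerAdapter recipe that we decided against might look roughly like the sketch below. The `ListHandler` class, the logger name and the hard-coded ID are made up for this demonstration; it only shows that every piece of code that logs needs access to a wrapped logger.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that collects formatted messages for the demo."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


base_logger = logging.getLogger("adapter_demo")
base_logger.setLevel(logging.INFO)
base_logger.propagate = False  # keep the demo output out of the root logger
collector = ListHandler()
collector.setFormatter(logging.Formatter("%(levelname)s %(request_id)s %(message)s"))
base_logger.addHandler(collector)

# The adapter attaches the extra fields to every record logged through it,
# so a new adapter has to be created and passed around for each request.
request_logger = logging.LoggerAdapter(base_logger, {"request_id": "abc-123"})
request_logger.info("generate a number")

print(collector.messages)  # ['INFO abc-123 generate a number']
```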

The main use case for filters is limiting which log entries get emitted, but they’re allowed to mutate the records. Our filter always returns True:

class MyFilter(logging.Filter):
    def filter(self, record):
        # Fall back to "-" for log entries emitted outside of a request.
        record.request_id = request_id_var.get("-")
        return True

Unlike log handlers, filters do not propagate. This means that you have to add the filter to every logger… or you can add it to the handlers of your root logger.

my_filter = MyFilter()
for handler in logging.getLogger().handlers:
    handler.addFilter(my_filter)

Here’s the full example:

import contextvars
import logging
import uuid

import tornado.ioloop
import tornado.web

logger = logging.getLogger(__name__)
request_id_var = contextvars.ContextVar("request_id")


# Let's have an async function for the sake of demonstration
async def generate_number():
    logger.info("generate a number")
    return 4


class MyHandler(tornado.web.RequestHandler):
    # prepare is called at the beginning of request handling
    def prepare(self):
        # If the request headers do not include a request ID, let's generate one.
        request_id = self.request.headers.get("request-id") or str(uuid.uuid4())
        request_id_var.set(request_id)

    async def get(self):
        number = await generate_number()
        self.write(f"Here's a number: {number}")


def make_app():
    return tornado.web.Application([(r"/", MyHandler),])


class MyFilter(logging.Filter):
    def filter(self, record):
        record.request_id = request_id_var.get("-")
        return True


if __name__ == "__main__":
    logging.basicConfig(
        format="%(levelname)s %(request_id)s %(message)s", level=logging.INFO
    )

    # Log filters do not propagate, but handlers do. Thus we add the filter
    # to the handlers of the root logger so that the messages of child loggers
    # get filtered as well.
    my_filter = MyFilter()
    for handler in logging.getLogger().handlers:
        handler.addFilter(my_filter)

    port = 8000
    app = make_app()
    app.listen(port)
    logger.info("Listening at http://localhost:%d/", port)
    tornado.ioloop.IOLoop.current().start()

The example is also available as a Gist.

If you run it (remember to pip install tornado) and do an HTTP request, you will see something like this:

INFO - Listening at http://localhost:8000/
INFO f00e793f-73e5-4210-8709-41aefe839e5a generate a number
INFO f00e793f-73e5-4210-8709-41aefe839e5a 200 GET / (::1) 1.27ms

See also my other posts about Python.

For a more sophisticated distributed tracing system, take a look at something like OpenTracing. ↩︎

Hello again, Python

Sun, 19 Jan 2020 00:00:00 +0000

Thanks to some organizational surprises, I’m now developing web services in Python. The last time I used Python in anger was in 2016. Has anything changed?

Records. Python has records now, thanks to attrs and dataclasses. Namedtuples were too simple and writing classes by hand for everything was too complicated. Attrs and dataclasses are just right. I didn’t get this in 2016 when attrs was new, but I get it now. Attrs has a nice overview of how it compares to the other solutions.
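
As a quick illustration of what "records" means here, a minimal sketch with the standard library's dataclasses follows. The `User` class is a made-up example, and making it frozen is a choice for the sketch, not a requirement.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class User:
    name: str
    year_of_birth: int


ada = User(name="Ada", year_of_birth=1990)

# The constructor, repr and equality come for free.
print(ada)  # User(name='Ada', year_of_birth=1990)
print(ada == User("Ada", 1990))  # True

# Frozen instances are updated by creating a modified copy.
older = replace(ada, year_of_birth=1985)
print(older.year_of_birth)  # 1985
```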

Type annotations. The type annotations are useful now. I’m using JetBrains PyCharm as my IDE and it displays type information in doc popups and warns about type errors. On the command line, mypy works – and many third-party libraries actually have type annotations! Python is no Haskell – heck, it’s not even TypeScript – but this is a welcome development.
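As a rough illustration of the kind of mistake a type checker catches, consider this sketch. The function find_user_id is hypothetical, and the mistake is left in a comment so that the snippet still runs.

```python
from typing import Dict, Optional


def find_user_id(name: str, ids: Dict[str, int]) -> Optional[int]:
    """Return the id for name, or None if the name is unknown."""
    return ids.get(name)


# A type checker such as mypy should complain about a line like the one
# below, because the result may be None and None cannot be added to an int:
#
#     next_id = find_user_id("ada", {"ada": 1}) + 1
```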

Python 2 vs. Python 3. We’re finally in the era of Python 3. I hope.

Package management. The tools have become more robust. I have so many experiences of easy_install or pip or virtualenv mysteriously breaking. Now everything seems to work. It’s still complicated but at least it works.

I’m pleasantly surprised. I still believe that Python leaves a lot to be desired, but undeniably the developer experience has improved!

Joys of a heavy camera

Sun, 12 Jan 2020 00:00:00 +0000

I’ve had three rolls of medium-format (120) film sitting in my fridge. They’ve been there long enough that they have all expired already. It’s black-and-white film, though, so it should be fine, especially since I’ve kept it refrigerated. I’ve shot with decade-old films and the results were okay.

The Mamiya is still with me, so I decided to finally go and shoot the films. I headed to Mustikkamaa for my “usual” set: ice, rocks, and reeds.

This winter has been unusually warm in Helsinki, so we do not have much ice. You have to crop carefully to make the small bits of ice look like a winter wonderland. At least there was enough ice under the Korkeasaari Zoo bridge to make the weird packing noises.

It was my first session with the Mamiya in two years. My usual camera is light and nimble. The Mamiya is the opposite: it’s heavy and cumbersome. It challenges you in two ways.

Every shot takes work. You have to adjust the tripod, measure the light, cock the mirror, advance the film, remove the dark slide, and, finally, trigger the shutter. It’s not cheap, either: the film and processing easily cost over 1 € per frame. You’ll want to cover every motif with as few shots as possible.

Finding a new location takes work. You have to lug around the camera and the tripod while walking on icy rocks. You’ll want to get everything out of the spot where you plop the tripod down.

This leads to a slower, more deliberate photography experience. Every shot matters. You’ll focus on the composition and on shooting at the critical moment. For me, it’s a delight.

Sometimes you want to slow down. Sometimes you want to speed up. Heavy cameras have their upsides.

The photos in this post are digital, but click here to see the film shots on Flickr.

Yearnote 2019

Sun, 05 Jan 2020 00:00:00 +0000

New year, new shenanigans, as we say in Finland. A new decade! ¹ I, on the other hand, am back to my bullshit. Please allow me to talk about me in the year 2019.

It was a difficult year for me, but I don’t want to dwell on that too much. Instead, let’s focus on the good stuff.

Working

I continued to work as a software developer. In autumn, I started to work for a new client. This is my first time working for a big (in the Finnish scale) tech organization. Finally I get to see in action all the big organization dynamics that I’ve only read and heard about.

Working in a big organization is much more performative than in a small one. It’s not enough to do work – the others must also know that you’ve done it.

Thoughtleadering

A year ago I wrote that I want to level up my thoughtleadering game. I did succeed, in a small way!

My post on brewing coffee made it to Hacker News front page.
I gave a lightning talk at :clojureD.

These two experiences made me realize that I prefer blogging to public speaking, both as a writer/speaker and as a reader/listener. Accordingly, I gave up on my public speaking plans and started blogging regularly.

I attended three Clojure conferences in 2019: :clojureD, Heart of Clojure, and ClojuTRE. Heart of Clojure was especially good, but all of them were a pleasure.

I don’t think I will attend as many conferences in 2020, but I hope to get a chance to meet up again with at least some of the cool people I’ve gotten to know from the Clojure community.

The most distressing thing I ate in 2019: this combo of amazake and konjac balls.

Traveling

In April, I travelled to Japan for three weeks. I loved to hike up the mountains in Hakone and in Yakushima and to eat dorayakis at every occasion. This was my first time in Japan, but I do get now why so many people here in Finland love the country. It’s such a foreign place, yet so easy to travel in.

In October, I hiked Karhunkierros. It was my first solo hike and it left me wanting more. I don’t have many plans for 2020, but doing another long hike is one of them.

Traditional commentary on Finnish politics

I can’t believe that Antti Rinne’s cabinet already fell apart. I hope that Sanna Marin will have a better run.

As a mathematician, I assert that the 20s begin in the year 2020. ↩︎

Standard problems, standard solutions

Thu, 19 Dec 2019 00:00:00 +0000

I work as a software consultant. What we do is that we develop software for other companies. Sometimes we do it as a team of our own and sometimes embedded in an in-house development team. Sometimes the clients come to us for our special expertise and sometimes they just need butts in the seats churning out code.

Our job is to implement standard solutions for standard problems. The clients are not in the high-tech business – or if they are, their in-house developers work on the innovative secret sauce and we come in to build the scaffolding needed for making the secret sauce into a complete product.

The main challenges are rarely technical. Instead, we need to understand the business domain and figure out which of the usual problems need solving. Then we need to figure out how to navigate the client organization to allow us to build the usual solutions efficiently.

Our key deliverable is the software itself. Another deliverable is the process that produced the software. If the software is to live on after we move on, the process has to continue and evolve.

After all, software is done only once its last user stops using it.

Just automate syntax formatting

Thu, 12 Dec 2019 00:00:00 +0000

Fighting over syntax formatting in code review is, mostly, a waste of time. Having consistent formatting is great for readability, but code review is not the right place to enforce it. To keep code reviews fast and smooth, you’ll want to focus on high-impact issues. With formatting, the returns diminish quickly.

Instead, you should use computers to enforce consistent formatting. There are two ways to go about it:

Use a code formatter tool. Try to find a fast one and configure everybody’s editor to run it on save, or use a git pre-commit hook. If you write Go, use gofmt. If you write JavaScript, I hear Prettier is a popular one.

Use matching editor configurations. If your programming language doesn’t have a good code formatter, your next best bet is to configure everybody’s editors to match each other as well as possible. This solution doesn’t really scale, but for small teams it’s feasible. An EditorConfig file can help a bit.

For example, there’s no canonical formatter for Clojure. Matching the config is simple if everybody uses the same editor, but we’ve managed to get close enough with Cursive and clojure-mode.

What do we do when the editor config mismatch causes unnecessary code reformatting? We let it be.

It’s better to accept some code churn than to fight over indentation in code review.
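The git pre-commit hook option mentioned above could look roughly like this. It is a sketch under a few assumptions: the project is written in Go, gofmt is on the PATH, and the function names (select_files, staged_paths, unformatted, main) are made up for this example. Note that gofmt looks at the files in the working tree, not at the staged content, so partially staged files can slip through.

```python
import subprocess
import sys


def select_files(paths, suffix=".go"):
    """Keep only the paths that the formatter should look at."""
    return [p for p in paths if p.endswith(suffix)]


def staged_paths():
    """Ask git for the added, copied, or modified files in the index."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def unformatted(paths):
    """gofmt -l lists the files whose formatting differs from gofmt's."""
    if not paths:
        return []
    out = subprocess.run(
        ["gofmt", "-l", *paths], check=True, capture_output=True, text=True
    ).stdout
    return out.splitlines()


def main():
    bad = unformatted(select_files(staged_paths()))
    for path in bad:
        print(f"not formatted: {path}", file=sys.stderr)
    # A non-zero exit status makes git abort the commit.
    return 1 if bad else 0
```

To use it as a hook, you would save it as an executable .git/hooks/pre-commit file with a Python shebang line and end the file with sys.exit(main()).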

See also my other posts about code review.

Keeping code review fast

Thu, 05 Dec 2019 00:00:00 +0000

If you want mandatory code review to really suck, make it slow.

Power games aside, one of the most frustrating aspects of code review is how long it takes and how many context switches it involves. There are two main things that contribute to this:

how long it takes for the reviewers to react to new and updated pull requests (PRs)
how many rounds of review are needed

In my experience in a 2-5 person team, if everybody is willing to dedicate some time for reviews every day, the wait for reviews is never too long. If I can get reviews for morning PRs in the afternoon and for the afternoon PRs in the next morning, I’m pretty happy already.

If your team members are not willing to dedicate this time, usually it’s because they either do not see code review as valuable – possibly because the management does not see it as valuable! – or they don’t know how to do it. You’ll have to solve those issues before you can improve the review velocity.

When reviewing, I use a three-level system of approvals:

Not approved; changes are needed and the PR needs to be re-reviewed after the changes.
Approved conditionally: the author needs to make some changes, but they can merge the PR afterwards without another review round. For example, if I point out something minor like a typo, I trust that the author can fix it themself.
Full approval. The PR can be merged as-is.

Ideally, most of your PRs would be approved in a single review round, with some needing another round. There are a couple of things that help:

Keep PRs small. A shorter PR is easier to review than a longer one.

Give actionable feedback. The second round of review will go more smoothly if the author knows exactly what changes the reviewer wants to see to approve the PR.

Automate what you can. If you care about consistent syntax formatting, use a linter or an autoformatter to enforce the format. Then you can ignore it entirely in the review process.

Figure out what to do if consensus cannot be reached. Sometimes the author and the reviewer can’t agree. My personal rule is that if there’s no obvious authority, the author makes the decision. You want to be mindful of power dynamics, though.

Figure out what to do if no-one wants to approve the PR. For example, reviewers may hesitate to approve the PR when they are not very familiar with the codebase. My personal rule is that the PR should be approved unless the reviewers can give constructive feedback. The PR won’t get any better by sitting on it.

If many of your PRs seem to need three or more rounds of review, you might want to change some other part of the process than code review:

Having a design discussion before implementing anything helps to ensure that nobody fundamentally disagrees with the approach when the PR reaches review.
Having clear coding standards helps to avoid unnecessary conflict in the review phase.
Pair coding is more efficient than multiple rounds of review. If it looks like a PR needs a lot of changes, consider pair coding the improvements with the author. Then there are at least two people who think it’s an adequate solution.

See also my other posts about code review.

The power of code review

Thu, 28 Nov 2019 00:00:00 +0000

Last week, Camille Fournier tweeted this:

Questioning the value of mandatory code review is definitely the most popular underground belief held by senior engineers I know

It launched a discussion about the merits of code review and a bunch of people came forward about their bad experiences. For them, code review had become an arena of power games and a place to demonstrate how smart you are.

I’ve experienced code review as a highly valuable practice and advocated for it, but it’s easy to see how it could go wrong.

Sarah Mei has a talk called The Power of Agile. She speaks about how the power differentials, such as the one between juniors and seniors, cause problems in extreme programming practices such as pair programming. Her key point is that the agile practices do not really address these problems at all.

The same goes for code review as it is commonly practiced. It’s easy for reviewers to bring the process to a halt if they want to be gatekeepers or if they are just a bit too pedantic for their own good. They can keep asking for more changes, raise more concerns, and nitpick more.

This happens even when people see each other as equals. It must be worse when you’re a member of a group whose expertise gets constantly questioned due to prejudice. And when you’re a reviewer, there’s the problem of having your review comments ignored.

I’m sure everyone who has extensive experience with code review has sometimes felt frustrated with the feedback they’ve gotten, but if it’s a constant source of frustration, something is wrong.

The junior-senior power differential in code review was always obvious to me – I was introduced to code review as a junior developer – but I’ve only just started to think about the more general power differentials. I hope to come back to this topic with more insight later, but I’m going to leave you with one thought and one recommendation.

Thought: Developing software is collaborative team work. Tool-assisted code review is just one of the tools in the collaboration toolbox, along with pair programming. Reviewing is not an end in itself. The goal is not to produce “perfect” code. The goal is to deliver working software. When reviewing becomes power games, it hinders that goal.

Recommendation: Alex Hill has a great post and a great talk about how to give and receive code reviews gracefully. You should follow her advice. It will make your reviews more egalitarian and more fun.

See also my other posts about code review.

Coercing JSON with malli

Wed, 20 Nov 2019 00:00:00 +0000

One of the problems with JSON is its limited selection of datatypes. For example, if you want to represent timestamps in JSON, you have to encode them as strings or numbers. There’s no way to tag the specific values as timestamps, so if you’re building an application that consumes such JSON, you have to build a mechanism for coercing the strings or numbers into your programming language’s timestamp datatype.

For example, consider JSON data like this:

{"start":       "2019-01-01T00:00:00Z",
 "end":         "2019-01-31T23:59:59Z",
 "description": "The month of January",
 "tags":        ["month"]}

We’re programming in Clojure, so what we would like to actually have is this:

{:start       #inst "2019-01-01T00:00:00.000-00:00",
 :end         #inst "2019-01-31T23:59:59.000-00:00",
 :description "The month of January",
 :tags        #{:month}}

If you’re building a Clojure web backend and you’re receiving JSON via an HTTP request, your routing library probably has a nice, schema-driven way of handling the coercion. When the data is coming from some other sources, there’s usually no “built-in” solution.

I’ve seen a bunch of solutions to this problem. Sometimes the coercion code is intermingled with business logic. This is not great: it’s hard to tell where and when the data gets converted to the proper datatypes and when it’s just strings. Usually the error handling is not great, either.

A better solution would be to have an explicit coercion function and call it before handing the data to the business logic. Something like this:

(defn coerce [data]
  (-> data
      (update :start parse-timestamp)
      (update :end parse-timestamp)
      (update :tags #(into #{} (map keyword) %))))

This is a good start, but it’d be nice to make this more declarative, just like the API definitions in compojure-api and reitit. It’s not too hard to do this yourself with Schema or clojure.spec, at least if you use spec-tools. However, for the sake of novelty, I’m going to use malli.

Malli is the new data specification library by Metosin. It’s more like Schema than clojure.spec: its main use cases are data validation and transformation at the edges of the program, whereas clojure.spec is focused more on the shape of data inside the program. You can read more in malli’s README. Malli has not yet been released, but I think it’s starting to look promising.

Let’s define a schema, then:

(def Event
  [:map
   [:start inst?]
   [:end inst?]
   [:description string?]
   [:tags [:set keyword?]]])

We can now re-write our coercion function using malli:

(require '[malli.core :as m]
         '[malli.transform :as mt])

(defn coerce [data]
  (m/decode Event data mt/json-transformer))

That’s it. mt/json-transformer is a built-in transformer that knows how to decode instants and keywords from strings and how to coerce a vector into a set. And now you have a schema for your data, which you can use for validation.

You don’t have to use Malli, but do yourself a favor and do not mix data coercion logic with business logic.
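The same separation works in other languages, too. Here is a hand-rolled sketch in Python. It is not malli's API: EVENT_SCHEMA, parse_timestamp, and coerce are names invented for this example, and the schema is just a dict that maps each key to the function that decodes its value.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2019-01-01T00:00:00Z."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Hypothetical schema: each key maps to the decoder for its value.
EVENT_SCHEMA = {
    "start": parse_timestamp,
    "end": parse_timestamp,
    "description": str,
    "tags": frozenset,
}


def coerce(schema, data):
    """Decode every field named in the schema; a missing key raises KeyError."""
    return {key: decode(data[key]) for key, decode in schema.items()}
```

The business logic then only ever sees the output of coerce, so it is clear where the strings turn into proper datatypes.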

Karhunkierros IV

Thu, 14 Nov 2019 00:00:00 +0000

This post continues from where part 3 ended.

How did I get home from Ruka? I took a taxi to Kuusamo, took a bus to Oulu, had decent palak paneer at Garam Masala right next to the railway station, and took a train to Helsinki. It took me 12 hours in total. Boring and uneventful.

Cooking with Andrew Skurka

I read Andrew Skurka’s blog before the trip and got inspired.

I used his Fast & Light stove setup for cooking: a tiny FireMaple gas burner, a 0.9 litre pot, a 2.5 dl cup and a large plastic spoon. This is simpler and lighter than the Trangia I’ve used before, but I didn’t miss anything. The only downsides were that you couldn’t boil water while you were eating and the gas burner was a bit slow in the cold weather.

I also tried a couple of Skurka’s recipes:

Peanut sauce & noodles is great and a nice departure from “red hiking meal” genre. A friend tried it out with her scout group. Apparently the taste was too weird and the recipe was too vegetarian for Luvia teenagers. But I, with my more metropolitan taste, will definitely use this recipe again.
Pesto noodles wasn’t anything special. Not awful, but probably won’t use it again.
Oatmeal with fixings was nice. It’s not exactly a unique idea, but I liked the Skurka version.

I didn’t try his beans + fries with Fritos & cheese recipe because WTF even are Fritos, but apparently it’s so good that it’s an Internet meme. Maybe next time.

I assumed that I need 3000 kcal per day and prepared 800 kcal portions for lunch and dinner. This turned out to be too much. I just couldn’t eat such a large amount of food in one go and had to start splitting the portions in half. Next time I will go with 500 kcal meals and add more snacks to have enough calories.

My trip was one day shorter than expected, I split some of the portions, and had one meal at a restaurant. Because of all of this, I’m confused about whether the 3000 kcal estimate was right. It was in the right ballpark, at least. The food weighed about 750 g per day.

In conclusion

I’d like to do a slightly longer trip next time. It took me almost two days to get in the proper hiking mood, but once I was in the mood, I could’ve continued for a couple of days more. I’m going to look for 120 km hikes next time.

Would I recommend Karhunkierros? It’s a good destination for inexperienced hikers like me:

You can stay at the huts and the lean-to shelters.
Water is easily available from the river.
The trail is marked well – there’s no risk of getting lost.
The trail is easy to get to using public transport.
Great sights!

The downside is that it’s very popular during the season. Off-season, the winter is harsh, so you need to know what you’re doing.

I think it’s suitable for a 4-6 day hike for people of average fitness. You can adjust the number of days based on your fitness, skill level, and how easily you get bored when you’re not walking. If you’re in a hurry, take part in NUTS Karhunkierros trail running competition and run it in a day.

Karhunkierros III

Thu, 07 Nov 2019 00:00:00 +0000

This post continues from where part 2 ended.

Day 4 - from Ansakämppä to Siilastupa - 22 km

People’s sleeping patterns at the huts were a mystery to me. I had been one of the last to go to sleep and I was one of the first to wake up. People were going to bed already at eight. Me, the well-known night owl, stayed up until ten and woke up just before seven. Many were still sleeping when I left the hut after eight. I guess people just love to sleep?

My hiking speed was picking up, so I decided to re-evaluate my plans. Originally the hike was supposed to take six days. This day, the fourth day, I’d camp at some lean-to on the way to Pieni Karhunkierros. On the fifth day, I’d reach Porontimajoki and then it’d be a short walk to Ruka on the sixth day. I decided to speed things up and get to Siilastupa that day and finish the hike the next day. Otherwise I’d have to spend too much time getting bored at huts.

The morning was uneventful and I got to Jussinkämppä quickly. Then I hit one of the hardest sections of my hike. The route took me right to the riverside. In the map, it looked like a flat and easy section and I thought I’d blaze through it.

In reality, the path was sloped and it was full of rocks and roots. Climbing over them got tiring quickly. When the trail finally stopped hugging the river, there was a steep uphill. I was so tired that I worried that I wouldn’t make it to the hut.

After the hill, the path got easier. Soon I entered Pieni Karhunkierros, a shorter trail that overlaps with Karhunkierros. It’s frequented by dayhikers and the trail is in very good condition. It’s not quite as spectacular as its bigger sibling – there’s so much forest that you can’t see much.

Just before the sunset I reached the Siilastupa hut. The hut is small and there was already a couple with a dog and a German family, but the upper floor of the two-level group bunk bed was still entirely free. The other guests were feeding the fireplace eagerly and for the first time on the trip, it was too warm.

Day 5 - from Siilastupa to Ruka - 23 km

In the morning I chatted with the German dad. He apologized for snoring but pointed out that I hadn’t been entirely quiet either. During the night, I kept rolling and rolling around like I was in a rotisserie, and my pad is loud. This inspired him to propose a trail name for me: Kebab. I might pick it up if I ever go hiking in the US, where they actually use trail names…

When I started walking, I got lost for the first time. The whitewater lured me off the path to take a picture. After walking 20 meters, I realized that I wasn’t seeing any trail markings anymore and backed out. That’s how well the trail is marked.

The first 15 km of the day were easy. It was flat, uninspiring forest and marsh. When the final 8 km started, shit got real. First there was a steep, small hill called Konttainen.

I was worried when I was approaching it. It looked steep both on the map and in real life and to get to it, I had to scramble up rocky slopes packed with ice. Luckily I was greeted by the sight of stairs at the bottom of the hill.

I walked up and down. After Konttainen, there’s Valtavaara. It’s steep and long with three peaks. This was the most physically demanding section. You had to be careful to not slip when ascending and descending the icy slopes. I fell over a couple of times but did not hurt myself.

It felt good to reach the day hut on top of Valtavaara. There was fog, so there weren’t any sights to mention, but making a cup of hot chocolate in the hut was a welcome break.

After descending Valtavaara and seeing there was 1 km left, I was feeling pretty jubilant already. I had forgotten that the final kilometer goes over Rukatunturi. It was pretty disappointing to realize that the path goes up the hill and under a ski lift.

One more hill later and just before the sunset, I finally reached the end of the trail. The gate of the trail is not in use, because they have dug out the stairs to make space for a driveway. I scrambled up the hill to walk through the gate anyway.

That was it. The day had been demanding and I could not have done another day with so many hills. I was happy to have made it to Ruka.

I stayed the night in the skiing resort hotel. Sauna felt good and the hotel restaurant had a pretty decent vegan burger.

This is the third part of a series about hiking Karhunkierros. Read part 4.

Karhunkierros II

Thu, 31 Oct 2019 00:00:00 +0000

This post continues from where part 1 ended.

Day 2 - From Perttumakoski lean-to to Taivalköngäs hut - 13 km

The night was cold and I woke up before the sunrise. My mornings tend to be slow: I took my time packing up the very wet tent and making some porridge. An hour and a half later, the sun had already risen and I started walking.

The trail follows the Oulankajoki river. It took me to a high hill on the side of the river valley and to the first sight of the trail: the Rupakivi rock in Oulankajoki. In photos, you usually see it from the flat angle, but because the trail goes high, you approach the rock by descending a set of stairs. You get to properly see how weirdly shaped the rock is.

Unfortunately there are a lot of trees and they prevent getting nice photos from the hill. If you want to see it properly, you just have to go there!

Around lunch time I reached the Savilampi wilderness hut. Before cooking lunch I climbed up on the north side of the Oulanka canyon. The ascent is steep and you’ll likely come back the same route, so I recommend leaving your backpack at the hut.

The view was spectacular! This was my favorite sight along the trail. You do not have to hike the whole Karhunkierros to see it – there’s a parking spot nearby. It would make a nice dayhike.

I returned to the hut and made some lunch. At the hut I met one of the many families with kids who were on an overnight trip to hike a section of the trail. There were a couple of huskies, too. People love to hike with their dogs, it seems.

The water from the river is potable after boiling. All the huts have gas stoves. My friend Mäkipää lent me a tiny canister stove. It’s small and light, but it took seemingly forever to boil a litre of water outside when it was snowing. In warmer weather, it’s probably great, but I wouldn’t want to take it to any colder situations. I used the stoves at the huts as much as possible.

After lunch, I continued to the Taivalköngäs wilderness hut and arrived there early at around 15. The distance for the day was short, but I was still a bit shocked by the hike so I was happy to stop there. It’s a picturesque place: the hut is next to little rapids and there’s a rope bridge across the river.

Again there were a couple of dads with their kids. Hikers kept arriving. We had a full house for the night and one group of late arrivers even set up a tent in the yard. In theory the people who have been at the hut the longest should make room for the newcomers, but in practice nobody is going to get kicked out in the middle of the night.

Day 3 - from Taivalköngäs to Ansakämppä - 17 km

Even though we were sleeping inside a hut, I had a cold night. I woke up grumpy and with a headache. Probably it was because of some combination of dehydration, caffeine withdrawal, and a smoky hut.

I stepped out of the hut to get some water and was met by a surprise: there was snow. It was a Full HD winter wonderland situation.

After some porridge, I started walking. I could see from the fresh snow that I was the first person to hike the section from Taivalköngäs to the next lean-to shelter that morning. Only a squirrel and some reindeer had been there before me. It was delightful, but still it was one of the hardest sections mentally. After the poorly slept night I was in a really bad mood and even considered giving up once I reached Oulanka Visitor Centre.

Once I finally got to the visitor centre, I decided to have a long lunch and ordered a burger and a couple of cups of coffee. After eating, drinking, and resting for two hours my mood had improved considerably. I started walking again and finally got into the proper hiking state of mind, where you stop thinking about how much is left and start to just enjoy being there.

Right after the visitor centre there are the Kiutaköngäs rapids. It’s another highlight of the route: a canyon with huge rocks and strong whitewater. You can easily get there by a car and there were a lot of visitors and dayhikers.

After the rapids, the trail was flat and in great condition. I even reached my top speed of 5 km/h. I made it easily to the next hut, Ansakämppä. There I had a dinner and was considering continuing to Jussinkämppä. While I was washing the dishes, a woman walked in and informed us that Jussinkämppä was going to be full of people and dogs. People are fine, but five dogs is a bit much. I decided to stay.

This is the second part of a series about hiking Karhunkierros. Read part 3.

Karhunkierros I

Thu, 24 Oct 2019 00:00:00 +0000

Last week I hiked the Karhunkierros trail. To summarize, it was dark and wet but not very cold.

Karhunkierros is an 82 km hiking trail in Northern Finland. The southern end of the trail is at Ruka skiing resort in Kuusamo and the northern end is in a small village called Hautajärvi in Salla. Usually people start at Hautajärvi.

I wanted to do my first longer solo hike and Karhunkierros seemed like a good choice: the length is suitable for a week-long vacation and the trail is well-marked and has good services. There’s a string of wilderness huts and lean-to shelters and you can get to the trailhead by public transport.

Karhunkierros is the most popular hiking route in Finland. This has some downsides. It’s cramped on-season and the trail has been worn out by the heavy traffic. An upside is that if you get in trouble, there’s a good chance that somebody will walk by.

October is off-season, in theory, except that I picked the autumn school holiday week of Southern Finland. The Ruka slopes had been just opened and there were dozens of skiers dayhiking and doing overnight trips. I stayed three nights in huts and they were almost full every night. If you’re looking for some alone time, you should go somewhere else.

Day 0 - from Helsinki to Oulu

My trip got started on Sunday night. I had briefly considered flying to Kuusamo, but you can’t take gas canisters onto the plane. Avoiding flying is good for the climate anyway. Thus I went to Helsinki railway station, took a photo in Minuuttibaari, and stepped on the night train to Oulu.

Day 1 - from Oulu to Perttumakoski lean-to – 7 km

The next morning in Oulu, I took a bus to Kuusamo. There’s not much going on in Kuusamo, but I had time to eat a vegan lunch at Karpalo. According to their home page, it is the northernmost vegan restaurant in the world. Is it really? I don’t know. The food was okay.

From Kuusamo, I took a bus to Hautajärvi. It’s marketed as the Karhunkierros bus, but turns out it’s also the school bus for the local kids. There were me, two couples with backpacks, and a dozen 15-year-olds taking the Karhunkierrobussi. The bus took me right next to the trailhead in Hautajärvi and at around 16:00, I started walking.

The walk took me through some marshes where I first heard and then saw two Siberian jays. To my surprise, they allowed me to get right next to them. Later on I saw multiple groups feeding the jays – not such a big surprise after all!

After seven kilometers of walking, I reached the Perttumakoski lean-to shelter right before the sunset at 17:45. I felt that it’d be nice to have a bit more privacy than a lean-to offers and so I pitched my tent. It became dark and it started snowing while I was pitching it and it took me a while to get it right. The tent has color-coded ribbons in the corners, but of course I forgot about this. Pro-tip: if you plan to pitch a tent with the rain fly first so that the inner tent keeps dry, better practice it at home.

I cooked a quick dinner, read a book for a couple of hours and went to sleep.

In the spring, I had bought a new synthetic three-season sleeping bag, Marmot Trestles Elite Eco 20. This was the first night where I tried it anywhere close to freezing. Even though it was barely freezing and the bag’s comfort rating is about 0 °C, I woke up to put on more clothes a couple of times. Honestly, it was disappointing. I’ll be in the market for a nice warm down bag.

This is the first part of a series about hiking Karhunkierros. Read part 2.

The joys of coverage

Thu, 10 Oct 2019 00:00:00 +0000

In the project I’m working on, we have been tracking the test coverage of a Clojure web backend with Cloverage and Codecov. We use Codecov’s ratcheting scheme where every pull request has to have as high form coverage as the master branch.

We’ve had this setup for eight months and we’ve gotten a number of benefits out of it:

The coverage report helps you to check if your tests work correctly. Many times I’ve thought that I’ve written a unit test to run all the code of a function, but the report has revealed that there’s a branch that has not been executed. Either the test is wrong or the code is wrong. In any case I am wrong.
The coverage report can help you to find dead code. If a private function is not covered, it’s not needed for anything.
Sometimes I think that “this code is so simple that it does not need a test”, but then Codecov complains and I write one anyway. These tests have found way more bugs than I’d like to admit.

Because Clojure is a dynamically typed language, even just trying to run your code without checking the results can find bugs.

The downsides

In practice, we sometimes have to merge PRs even though they do not have high enough coverage. Cloverage measures both the line coverage (which lines have code that was executed) and form coverage (which Clojure forms were executed). Getting high line coverage is straightforward, but getting full form coverage is tricky:

Sometimes getting full form coverage is impossible. Cloverage measures the form coverage on macroexpanded code. Many macros expand to code with unreachable branches. For example, consider an assertion:
```
(assert (int? x))
```
It expands to the following code:

    (if (clojure.core/int? user/x)
     nil
     (do
      (throw
       (new
        java.lang.AssertionError
        (clojure.core/str
         "Assert failed: "
         (clojure.core/pr-str '(clojure.core/int? user/x)))))))

The only way to get full form coverage for this code is to execute both branches of the if expression. That is, you need to get the assertion to both pass and fail! But since a failing assertion means that there’s a bug in the program, the failure branch should be impossible to reach.

Cloverage is implemented with side-effecting macros, but Clojure is implemented in such a way that a macro call may be evaluated more than once (see CLJ-1407). This causes bugs where e.g. fully covered loop and doseq forms are reported as partially covered.
We have a namespace which is too large to instrument with Cloverage. Since Cloverage works by adding annotations to the code before evaluating it, it can make the code size go over the “Method code too large” threshold.
There are some oddities and outright bugs in Codecov’s reporting, so you can’t blindly trust their reports.
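One way to see the unreachable branch from the assertion example above is to expand the macro at the REPL. Here is a minimal sketch using clojure.walk/macroexpand-all. The symbol x is only a placeholder, and the exact printed form of the expansion may vary between Clojure versions.

```clojure
(require '[clojure.walk :as walk])

;; Fully expand the assertion so that the underlying `if` becomes visible.
(def expanded
  (walk/macroexpand-all '(assert (int? x))))

;; The expansion is an `if` form with two branches: one for a passing
;; assertion and one that throws. Form coverage wants both to be executed.
(first expanded)

;; Print the whole expansion to compare it with the coverage report.
(prn expanded)
```

Reading the expansion like this makes it easier to decide whether a partially covered form is a real gap in the tests or just an artifact of a macro.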

In conclusion

When we started tracking the coverage in February, our coverage was around 50% and it has been floating around 80-85% for the last six months. For select parts we’ve made the effort to maintain it over 90%.

I know some experienced developers believe that 100% coverage is mandatory. I want to believe it but I just don’t know how it should be interpreted in the world of Clojure with all the problems! I’m also hesitant to adopt this practice for open-source projects.

That said, Cloverage and Codecov are straightforward to set up if you already are running your tests via a CI service. If you’re okay with dealing with imperfect coverage measurements, I recommend trying it out.

The hallway track conference

Mon, 12 Aug 2019 00:00:00 +0000

If you have ever talked to an experienced software developer about tech conferences, you have probably heard them say something like this: “Oh, I mostly ignore the talks and focus on talking with people in the hallway.”

I’m one of those people. It’s a bit sad: the speakers have spent a huge amount of effort to prepare their talk and we travel there from the other side of the world to ignore them.

A week ago I attended the Heart of Clojure conference in Leuven, Belgium. What was so great about it was that they had plenty of time and space for the hallway track and not too many talks¹.

They had long breaks, including the extra-long siesta break in the middle of the day, and non-talk activities such as a sketchnote workshop, meditation, and a screen-printing workshop (I was delighted).

In the evening there was the Adventurous Dinner, where the participants were randomly split into groups of eight or so people and sent to different restaurants around the city center. This was great for meeting a new group of people and chatting with them about the day. Afterwards we congregated at a noisy bar for a more traditional conference after-party.

The result was that everybody was talking to each other. People made new friends and it seems like everybody was having a great time. There was a real feeling of community. Often at conferences you have to be a speaker or otherwise an insider to get the full experience. This time everybody got the insider experience.

I’d recommend the conference but, understandably, it looks like Heart of Clojure won’t happen again.

Out of the talks I attended, my favorite was the keynote by Rachel Lawson: My opensource project; It’s all about the code. So topical for the Clojure community! ↩︎

Interpreting Moriyama

Wed, 24 Jul 2019 00:00:00 +0000

Daido Moriyama is one of the best-known Japanese photographers. So I read in his and Takeshi Nakamoto’s book How I Take Photographs.

His name did not ring any bells but I’ve seen his photos before. I doubt the photos in the book are his best work but at least the interviews are interesting. He talks about how he only takes snapshots.

Based on the book I couldn’t understand why he would be one of the most important Japanese photographers. Where’s the appeal? This inspired me to try to imitate his style. Here’s how I interpreted it:

If you see something interesting, snap a photo.
If in doubt, snap a photo.
Go closer.
Impressions are more important than technical perfection.
Movement is okay; sharpness is not important.
All photography is copying, anyway. Photos of posters are fine.
It’s okay for the photographer to be visible.

Click here to view the resulting series.

The hardest commandment to follow for me was to go closer - especially since I shoot with such a wide lens. I’m going to keep practicing.

Ricoh GR III - initial impressions

Sun, 26 May 2019 00:00:00 +0000

I’ve got a new camera: Ricoh GR III. I got it when it first became available in early April and since then I’ve shot about 2500 frames with it.

I haven’t explored it enough to offer any kind of conclusive review. Heck, I’ve mostly shot in P mode with JPEG output with default settings! Still, I want to share my initial impressions and some pictures I’ve taken with the camera.

A bit of background: GR III is a fixed lens compact camera with a fast, 28 mm-equivalent lens. It’s a successor to GR II, which has a cult following in street photography circles.

Good. The build quality and the form factor are great. The pictures are very sharp and the colors are nice (but maybe not as nice as Fujifilm’s). It fits well into a pocket. The touch screen is great for choosing a focus point. It can be charged over USB-C. The exposure compensation joystick is quick to use.

Not so good. The battery life is so-so – get an extra battery! I get maybe one day’s worth of travel photography out of one charge. The P mode skews too much toward wide-open aperture for my taste. Manual focus is cumbersome (but there’s the snap focus mode).

Autofocus is usually fast but it has problems with low-contrast scenes. There’s a firmware update that promises to improve the performance. I’m deferring my judgement until I’ve installed the update.

I’m carrying the camera with Peak Design’s Leash sling strap. To get the anchor cords through the attachment holes in the camera body, you have to place them just right and use enough force and a piece of string (you need to do this only once). Leash is light and easy to adjust and slides well over my clothes, so I’m pretty happy with the setup.

In summary, Ricoh GR III is a great pocketable choice for a travel and everyday camera if you’re okay with the battery life – and if the idea of a fixed 28 mm-equivalent lens makes sense to you in the first place.

All photos in the post have been shot with Ricoh GR III.

Handbrewing coffee

Wed, 06 Mar 2019 00:00:00 +0000

Shawn Blanc wrote about how he switched to a Moccamaster for brewing his morning coffee. It’s a nice post about productivity, but it made me want to talk about coffee.

We hand-brew all¹ the coffee we drink at the office with a drip cone. We have been doing this for a couple of years already. I like it, but out of courtesy towards my colleagues I’ve floated the idea of buying an electric coffee maker a couple of times. So far they have preferred to continue with hand-brewing. But why bother?

It’s not because of the taste. I don’t think that I could tell apart my hand-brew and well-made batch brew in a blind test.

I like the ritual, and the exercise in patience. First you weigh² and grind the beans while waiting for the way-too-slow kettle to boil the water. Then you pour and wait and pour and wait. Then you wash the cone and finally you get to taste the coffee.

The drip cone maintenance is easy, too. Washing the cone after each use takes only a couple of seconds. The fundamentals of good coffee are freshly-ground beans and clean equipment. Taking care of the latter couldn’t be easier. And hand-brewing makes it easier to not drink too much coffee, because the coffee is not there just waiting for you to overdose.

On the other hand, using an electric coffee maker would free me from pouring the water. I brew coffee maybe once per day, and the pouring takes me about 3.5 minutes each time. There are about 200 working days in a year, so that amounts to almost 12 hours. It takes something like six hours to read a medium-size novel. In a year I could read two more books during my breaks instead of staring at coffee. The choices!

We do have an espresso machine, but we use it maybe once a month. Also, we buy beans that work great for filter coffee and it turns out that not all of them are great for espresso. ↩︎
My standard recipe is 15 grams of coffee to 250 grams of water. ↩︎

Revisiting Clojure testing

Tue, 29 Jan 2019 00:00:00 +0000

Two years ago I wrote about the Clojure test runner of my dreams. Back then, I asked for a Clojure test runner that would support clojure.test and that would have the following features:

Output catching to make noisy test suites quiet.
Test tagging for selectively running tests.
JUnit output for integration with CI tools such as Circle.
Test slowness reporting to speed up slow test suites.

Let’s see where we are now.

After my post, Eftest soon got all the features. It’s a great library and while the included Leiningen plugin is quite rudimentary, bat-test offers a feature-rich wrapper for use with Leiningen and Boot.

When Arne Brasseur announced that he is developing Kaocha, I didn’t see much point in a new test runner. However, I was working on a new project that used tools.deps as the dependency management tool. Since bat-test does not have a clj -m -compatible version, I decided to give Kaocha a go.

After using Kaocha for a couple of months, I have to admit that Arne has created something worthwhile. Kaocha offers test running with all the above-mentioned features and more (e.g. watcher, Cloverage integration) in one coherent, well-documented package. Furthermore, Kaocha’s design allows easy extensions. For example, it was super-easy to create a plugin that toggles on Orchestra instrumentation.

If you’re looking for a new test runner, definitely check out Kaocha.

Side-note: Arne’s work on Kaocha has been funded by Clojurists Together. Their funding has paid for a number of improvements in Clojure projects you’re most likely using. The money comes from individual and corporate sponsors. It seems like a great way for Clojure-using companies to fund work on the tools that their developers use. I’m happy that Metosin has been a sponsor since the beginning and I’m hoping to see more companies on the member list.

What’s next?

Test runners are good now. Does this mean that the Clojure testing landscape is perfect now? Well… I see at least two areas of improvement.

ClojureScript unit testing. Doo (which I co-maintain) is tricky to set up, I’ve had a number of problems with boot-cljs-test, and using Karma directly is a lot of work. Getting my list of dream features to work is possible but tedious.

Luckily there are two new developments: Figwheel Main has built-in unit testing support and Kaocha has kaocha-cljs. Both look promising. This is what I’m most excited about Kaocha – I’m hoping that Kaocha manages to bring the same turn-key experience to ClojureScript that it already has for Clojure.

Assertion libraries. In clojure.test the assertions are written with the is macro, which isn’t very expressive. The main selling point of Midje was its powerful syntax for writing assertions. Unfortunately that power came at the cost of a complex implementation.

Apart from writing your own predicates, what are the current options? I’m aware of testit (my go-to choice), iota, and matcher-combinators. Each of them is a significant improvement over the is macro, but I don’t love any of them. I guess I need to come up with a list of features for the assertion library of my dreams!

Yearnote 2018

Tue, 01 Jan 2019 00:00:00 +0000

It’s time to look back at 2018 and talk about me.

On being a professional

The biggest thing for me personally was finishing my master’s thesis and graduating as a Master of Science. I’m glad it’s finally done. After graduating I continued working at Metosin as a software developer. In autumn, I joined Metosin’s board of directors.

At work, I took more responsibility for project management. It felt meaningful even if it wasn’t always fun. In general, I feel like I’ve made more mistakes lately. I reckon this is a good thing: either my job has become more challenging or I’ve become better at recognizing mistakes. Both mean more learning.

Here are some things I learned from, other than making mistakes:

Camille Fournier’s book The Manager’s Path helped me to understand how engineering management works.
The Lead Developer London conference was useful as well.
Zach Tellman’s Elements of Clojure and John Ousterhout’s A Philosophy of Software Design are good treatises on software design in the small.

On having a life outside work

These were fun:

I started playing piano again after a decade-long break.
I made a backpack.
A year ago I hoped to hike and sail more and do more yoga. My hiking plans fell apart, but I did have a great sailing trip in the summer and practiced yoga almost every week!

Some cool cultural works:

On Body and Soul was a great, subtle movie.
N. K. Jemisin’s Broken Earth trilogy is solid and topical. Recommended to anyone who reads sci-fi.

What about my plans for 2019?

I’m planning to level up my thoughtleader game this year, so expect more blog posts. I’m back to a situation where it would be sustainable to consistently blog.
I’m giving a lightning talk at the :clojureD conference in Berlin in February. If you’re attending, come to say hi! :)
I hope to give a full-length talk at some other conference later this year. Working on it!

While 2018 was a pretty okay year for me, I know it was a hard year for a lot of people both on the personal and the societal level. Frankly, I’m not feeling optimistic about the politics. It will get worse before it gets better.

Finally, a year ago I wrote this:

I can’t believe it’s 2018 and Juha Sipilä’s cabinet still hasn’t fallen apart.

Unfortunately it’s 2019 and Sipilä’s cabinet still hasn’t fallen apart, but at least the parliamentary elections are coming up in April.

How I use tap>

Thu, 18 Oct 2018 00:00:00 +0000

One of the new features in Clojure 1.10 is tap. The changelog describes it as follows:

tap is a shared, globally accessible system for distributing a series of informational or diagnostic values to a set of (presumably effectful) handler functions. It can be used as a better debug prn, or for facilities like logging etc.

tap> sends a value to the set of taps. Taps can be added with add-tap and will be called with any value sent to tap>. The tap function may (briefly) block (e.g. for streams) and will never impede calls to tap>, but blocking indefinitely may cause tap values to be dropped. If no taps are registered, tap> discards. Remove taps with remove-tap.

I’m already using it as a better debug prn! I’m using Cursive and I connect to a REPL launched by Boot. With my setup, (prn :DEBUG value) has two potential downsides.

The output may go either to the Boot terminal or to the Cursive IDE depending on the code path.
The output is not pretty-printed.

Tap allows me to solve both problems. I want my debug prints to always appear in Cursive’s REPL, so after starting the REPL, I add a tap handler by running this command:¹

(add-tap (bound-fn* puget.printer/cprint))

Here bound-fn* ensures that the output goes to Cursive and not to the terminal. Puget is the pretty-printer I’m used to, but you can replace it with your favorite printer. If you do not want to add new deps, you can use clojure.pprint or even plain old prn:

(add-tap (bound-fn* clojure.pprint/pprint))
(add-tap (bound-fn* prn))

Now when I want to debug-print something, I do (tap> "hello world"). Since both tap> and add-tap are in clojure.core, I don’t need to require anything. I can just tap> away.

Another debugging trick is to store the tapped value in an atom. I’ve used this only once so far, but it was pretty handy. Setup:

(def debug-a (atom nil))
(add-tap #(reset! debug-a %))

Now I can tap> an intermediate value in the middle of some complicated code and then poke at it in the REPL via @debug-a. Ideally you’d use a debugger, but if you are in hurry, maybe tap is enough.
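Putting the pieces above together, here is a minimal sketch of the atom trick that also cleans up after itself. The names tapped-values, remember-tap, and complicated-calculation are made up for this example. Note that taps are delivered on a separate thread, so the value may show up in the atom a moment after tap> returns.

```clojure
;; Collect every tapped value instead of only the latest one.
(def tapped-values (atom []))

;; A named function, so that the same function can be passed to remove-tap later.
(defn remember-tap [value]
  (swap! tapped-values conj value))

(add-tap remember-tap)

(defn complicated-calculation [x]
  (let [intermediate (* x 2)]
    ;; Peek at the intermediate value without changing the result.
    (tap> {:intermediate intermediate})
    (inc intermediate)))

(complicated-calculation 20)

;; Taps are delivered asynchronously, so give the tap thread a moment.
(Thread/sleep 200)

;; Poke at the collected values in the REPL.
@tapped-values

;; Unregister the handler once the debugging session is over.
(remove-tap remember-tap)
```

Using a named function instead of an anonymous one matters here: remove-tap needs the very same function object that was given to add-tap.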

Thanks to Wade Mealing for feedback on this post.

~~I don’t know how to do this automatically. If somebody knows, please tell me.~~ You can do it automatically by adding it to user.clj. ↩︎

Clojure libraries I recommend

Sun, 14 Oct 2018 00:00:00 +0000

Every now and then I see people asking what Clojure and ClojureScript libraries they should be using. In this post, I’ll share my list of libraries for building full-stack Clojure/ClojureScript web applications. I’ve been building these kinds of applications for a while now and I believe you can rely on these libraries.

Here’s what I’m looking for:

Robustness. I don’t want to debug buggy libraries – I write enough bugs of my own.
Extensibility. With a long-lived application, you’ll end up wanting all kinds of custom things that the library authors didn’t anticipate. Data-driven design helps here.
Implementation is easy to understand. Eventually I’ll read the source anyway. Either there is a bug or there is no documentation.

This list is not meant as the final truth: there are plenty of good libraries out there and you might choose different ones based on your needs and preferences. There’s an obvious bias, too: many of these libraries were made by my colleagues at Metosin.

Clojure backend

Framework. I appreciate the data-driven nature of Integrant. Combine it with Integrant-REPL for reloaded workflow.

JDBC database connection pool. I’ve never had any problems with HikariCP and it’s easy to configure and instrument. hikari-cp is a handy Clojure wrapper.

Configuration. I use Maailma and I’ve written about how to use it. cprop is very similar and probably a bit more powerful.

HTTP routing. I like the data-driven nature of reitit. It’s fast, too, and has enough extension points so that anything you want is possible. I would not use compojure-api because the implementation is way too hard to understand.

Logging. I use Timbre because it’s so easy to write custom appenders for it. Admittedly I haven’t checked out the popular Java options.

Test runner. Eftest is the test runner of my dreams and you can’t go wrong by using it. At some point I want to give Kaocha a go.

Test assertion library. I like testit and Juxt’s iota. They’re adequate, but I think there’s room to do better here.

Test data generation. I only just started using specmonstah for generating graphs of test data, but I’m excited. It’s so much nicer than writing the equivalent code by hand. There’s a bit of a learning curve, though.

ClojureScript frontend

Build tool. Figwheel works very well and it’s getting better all the time. I’d either use it with Leiningen or directly with clj.

Front-end framework. re-frame is the way to go. Check out re-frame-10x for debugging.

Translations. Tempura works for me. I use a small Clojure tool for extracting the translatable strings into a PO file for editing with Poedit. The tool is proprietary right now but I hope to open source it.

General tools

clojure.spec helpers. Expound makes the spec error messages human-readable and Orchestra instruments the function return values.

Data manipulation utilities. I use the combination of Potpuri, xforms, and Specter. Potpuri is an old-fashioned “missing parts of clojure.core” library and xforms is the same for transducers. Specter is the closest thing to lenses for Clojure. It’s powerful but takes some time to learn. I recommend trying it out for some simple tasks – soon you’ll see opportunities for it everywhere.

What about …?

I haven’t included any SQL libraries, HTTP clients or servers, or async libraries, because I don’t have a clear recommendation for any of these (important!) categories.

I made a backpack

Tue, 21 Aug 2018 00:00:00 +0000

For a long time, I used a Haglöfs Tight Evo XL as my everyday backpack. It was not optimal: it’s a bit too large for my everyday needs, but a bit too small for extended trips. It’s not too stylish, either. However, I didn’t want to spend money on a new backpack. Luckily there is a simple solution: make your own gear.

My main inspiration was the DIY IKEA backpack which is made out of IKEA shopping bags. We didn’t have any spare IKEA bags, but there was a worn-out Clas Ohlson bag which is made of similar material. My girlfriend’s colleague gave me a piece of parachute cutting waste and a broken Haglöfs backpack for scavenging webbing and buckles. I was set for materials.

I came up with a list of design constraints:

Should have enough space for a 15" laptop and headphones, but not much else.
Needs to have a convenient place for a bicycle U-lock.
Must look cool!!
Should be simple enough so that I can actually make it.

I realized that I had no idea what I was doing. In these kinds of situations my usual solution is to copy from others. Thus I started by taking measurements from a Kånken and modifying them to accommodate a 15" laptop.

I wanted to have a top-loading pack and zippers seemed expensive and tricky to sew, so I opted to have a drawcord closure with a lid on top. I also added open pockets to the front and the sides. I don’t like limp packs, so I added an internal pocket for a framesheet. I drew up a pattern and found out that there’s just enough fabric for one bag. Perfect!

At first I thought that I would make the shoulder straps out of the webbing scavenged from the broken backpack. That didn’t work because there wasn’t enough of it. Then I realized that I could just use the old backpack’s shoulder straps as-is and get comfy straps for free.

It took me one Saturday to make most of the pack and then a couple of evenings to finish it. I did end up buying a cordlock, some cord, and a piece of cardboard to be used as a framesheet. Total budget: about 6 €.

I made the pack in May and I’ve been using it almost daily ever since. I’m really happy with how well it turned out. The size is just right and I like the looks. The side pockets are great for the U-lock and as a bonus feature, the framesheet pocket is great for transporting stacks of paper.

I’ve had to fix it a couple of times. I didn’t know how to attach the shoulder straps and as a result, they’re falling off. Unfortunately it’s hard to fix without taking the pack apart. The material wasn’t as sturdy as I thought. It should have been folded for reinforcement in the places with most stress.

Anyway, making a backpack was fun, easy, and rewarding. I’ve made a pair of pants before, but making clothes is hard because they need to actually fit you. With bags, the exact fit hardly matters.

I’m not much of a maker, but it’s great to make something concrete every now and then. It also made me appreciate the high quality of factory-made backpacks more – they might be worth the money after all.

Fully automated releases

Sat, 11 Aug 2018 00:00:00 +0000

Have you ever contributed a patch to an open-source project, got it merged, and then waited months for a new release that would include your patch? Me too, reader, I’ve been there.

Moreover, I’ve been the maintainer who hasn’t gotten around to cutting a release. Cutting releases is a chore. Usually it’s a fragile, multi-step process that is not especially fun.

As programmers, what is our answer to fragile, multi-step processes? We automate them.

How to do it

When creating a release, there are a couple of steps where human input and human judgement are needed.

When to create a release?
How much should the version number be incremented?
What to write in the change log?

There’s an automation-enabling answer to the first question that is familiar to many developers from their work environment: embrace continuous delivery and continuous deployment. Each pull request should leave the project in a state where it can be released. Then you can automatically create a release every time you merge a pull request.

You still need a human to answer the second and third questions, at least if you have a conventional versioning scheme. To move the burden away from the maintainer, you can ask the contributors to fill in this information during the contribution process.

I know of two actual implementations of this: Hypothesis continuous release process and semantic-release. Hypothesis asks you to include a special file in your pull requests that looks like this:

RELEASE_TYPE: minor

This release adds a function for printing the text "Hello, world!".

semantic-release relies on specially-formatted commit messages:

feat(core): add function for printing "Hello, world!"

Here feat means this commit adds a new feature, implying a minor release if you’re following Semantic Versioning.

In both cases, after a pull request has been successfully merged, the CI server will read this information, update the change log, increment the version number, and push a new release to the package manager. As a contributor this means that if your patch gets merged, it will be released.
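To make the Hypothesis-style flow more concrete, here is a rough sketch of the step a CI job would perform after a merge: read the release type from the special file and increment the version number accordingly. The function names parse-release-file and bump-version are made up, and I am assuming that the file starts with a RELEASE_TYPE line followed by the change log text. This is not how Hypothesis or semantic-release actually implement it.

```clojure
(require '[clojure.string :as str])

(defn parse-release-file
  "Splits the contents of a release file into the release type and the
  change log entry. Assumes the first line looks like `RELEASE_TYPE: minor`."
  [contents]
  (let [[first-line & more] (str/split-lines contents)
        [_ release-type] (re-matches #"RELEASE_TYPE:\s*(major|minor|patch)\s*"
                                     first-line)]
    (when-not release-type
      (throw (ex-info "Missing or invalid RELEASE_TYPE line" {:line first-line})))
    {:release-type (keyword release-type)
     :changelog (str/trim (str/join "\n" more))}))

(defn bump-version
  "Increments a MAJOR.MINOR.PATCH version string according to the release type."
  [version release-type]
  (let [[major minor patch] (map #(Long/parseLong %) (str/split version #"\."))]
    (case release-type
      :major (str (inc major) ".0.0")
      :minor (str major "." (inc minor) ".0")
      :patch (str major "." minor "." (inc patch)))))

(def release
  (parse-release-file
   "RELEASE_TYPE: minor\n\nThis release adds a function for printing the text \"Hello, world!\"."))

;; A minor release resets the patch number.
(bump-version "1.4.2" (:release-type release))
;; => "1.5.0"
```

A real pipeline would also prepend the :changelog text to the change log file, tag the commit, and publish the artifact, but those steps depend on the build tool.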

What about Clojure?

I’m not aware of anyone doing this in the Clojure community¹, but I believe it would be beneficial. There are a lot of small projects that would get contributions but the maintainers are not around to merge and release them. Automated releases would make the work of the existing maintainers easier and it would also make it simpler to onboard new maintainers.

I have implemented a proof-of-concept version of the Hypothesis process for cache-metrics, a small library of mine, but I haven’t yet dared to introduce it to any “real” libraries. Many actively developed projects would need to change their ways of working as you couldn’t just merge or commit random things to master.

I hope this post acts as a starting point for a discussion. What do you think?

cljsjs/packages kind-of does it, but it is a package repository and not a library. ↩︎

Migrated to Hugo

Sun, 05 Aug 2018 00:00:00 +0000

After five years of using Hakyll to generate this site from a bunch of Markdown files, I finally migrated to Hugo. Hakyll is pretty cool, but using it only makes sense if you’re using Haskell for other stuff as well. I haven’t touched Haskell lately. Alas. Hugo has everything I need built-in and seems to work well enough.

I’ve tried to keep all the important URLs the same or add redirects when that is not possible. Feed readers might do something weird with the updated feed, though. Sorry about that. If something is broken, please let me know.

Why interceptors?

Fri, 03 Aug 2018 00:00:00 +0000

At Metosin, we have been thinking about interceptors in Clojure and ClojureScript. We’ve thought about them so much that in fact we made our own interceptor library for Clojure, called Sieppari¹. To understand why, let’s take a look at the pros and cons of using interceptors instead of middleware.

Interceptors are a Clojure pattern, pioneered by the Pedestal framework, that replaces the middleware pattern used by Ring. In Pedestal, they are used for handling HTTP requests, but they can be used for handling all kinds of requests. For example in re-frame they’re used in handling web frontend events such as button clicks.

At Metosin, we’ve used them in a bunch of projects and we’re developing Sieppari to be used with reitit, our (latest) HTTP routing library.

Let’s work this out

In Ring, an HTTP request handler is a function that takes a request map and returns a response map. Something like this:

(defn my-handler [request]
  {:status 200,
   :headers {"Content-Type" "text/plain"},
   :body "hello!!"})

To enhance the behavior of the handler in a reusable way, you can wrap it with a higher-order function that takes the handler as a parameter. This can be used to implement features like content encoding and decoding and authentication.

For example, here’s a debugging middleware that prints the incoming request map and the outgoing response map:

(defn print-middleware [handler]
  (fn [request]
    (prn :REQUEST request)
    (let [response (handler request)]
      (prn :RESPONSE response)
      ;; prn returns nil, so return the response explicitly
      response)))

The good thing about middleware is that they’re simple to implement: it’s just a Clojure function and you can use standard constructs such as try-catch. They’re fast, too.

The problems start when you try to handle asynchronous operations. Ring specifies async handlers as functions that take callbacks for sending a response and raising an exception as arguments. We have to write a separate version of our debugging middleware for asynchronous handlers.

(defn debug-middleware [handler]
  (fn [request respond raise]
    (prn :REQUEST request)
    (handler request
             (fn [response]
               (prn :RESPONSE response)
               (respond response))
             raise)))

Using the same code for the synchronous and asynchronous handler is tricky and error handling gets difficult.

Interceptors offer a solution: you split the middleware in two phases², :enter and :leave (or :before and :after as they’re called by re-frame). :enter is called before executing the handler, :leave is called afterwards. Both phases get a context map as a parameter and they return an updated context map. The request is under the key :request and the handler’s response is put under :response.

(def debug-interceptor
  {:enter
   (fn [{:keys [request] :as context}]
     (prn :REQUEST request)
     context)
   :leave
   (fn [{:keys [response] :as context}]
     (prn :RESPONSE response)
     context)})

Middleware can be composed by nesting function calls. With interceptors that does not work, so you need to have an executor that takes a chain of interceptors (called a queue) and executes them in order.
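To make the idea of an executor concrete, here is a minimal synchronous sketch. It is not how Pedestal or Sieppari implement execution. The execute function and the two example interceptors are made up for illustration, and the sketch ignores asynchronous results and error handling.

```clojure
(defn execute
  "Runs the :enter functions in order, calls the handler, and then runs
  the :leave functions in reverse order. Returns the response."
  [interceptors handler request]
  (let [run-phase (fn [context phase interceptor]
                    ;; Both phases are optional, so skip missing ones.
                    (if-let [f (get interceptor phase)]
                      (f context)
                      context))
        ;; :enter phase, first interceptor first
        entered (reduce #(run-phase %1 :enter %2)
                        {:request request}
                        interceptors)
        ;; The handler turns the request into a response.
        handled (assoc entered :response (handler (:request entered)))
        ;; :leave phase, last interceptor first
        left (reduce #(run-phase %1 :leave %2)
                     handled
                     (reverse interceptors))]
    (:response left)))

(def add-request-id
  {:enter (fn [context] (assoc-in context [:request :request-id] 42))})

(def add-server-header
  {:leave (fn [context] (assoc-in context [:response :headers "Server"] "sketch"))})

(defn my-handler [request]
  {:status 200
   :headers {"Content-Type" "text/plain"}
   :body (str "hello, request " (:request-id request))})

(execute [add-request-id add-server-header] my-handler {:uri "/"})
```

The interceptors never call each other. All the control flow lives in the executor, which is what makes it possible to change the execution strategy without touching the interceptors.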

A cool thing you can now do is that if your interceptor returns an asynchronous result (a deferred or core.async channel for example), the executor can wait for it, and if the interceptor returns a synchronous result, the executor can act on it directly. This allows you to use the same interceptors for synchronous and asynchronous operations. The downside is that the executor is bound to be slower than nested function calls.

Another downside is that structures like try-catch and with-open do not work anymore. To allow proper error handling, interceptors have an optional :error phase that gets called if any of the inner interceptors throws an exception.
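
To make the executor less abstract, here is a minimal synchronous sketch. It is not how Pedestal or Sieppari implement it. The function names (execute, enter-all, leave-all, run-stage) and the context keys :stack and :error are made up for illustration, asynchronous results are ignored entirely, and the sketch assumes that the interceptor that threw also gets a chance to handle the error.

```clojure
(defn- run-stage
  "Calls the interceptor's function for `stage`, if it has one. A thrown
  exception is stored under :error instead of propagating."
  [context interceptor stage]
  (if-let [f (get interceptor stage)]
    (try
      (f context)
      (catch Exception e
        (assoc context :error e)))
    context))

(defn- enter-all
  "Moves interceptors from :queue to :stack one by one, calling :enter,
  until the queue is empty or an error has occurred."
  [context]
  (loop [ctx context]
    (let [[interceptor & more] (:queue ctx)]
      (if (or (nil? interceptor) (:error ctx))
        ctx
        (recur (-> ctx
                   (assoc :queue (vec more))
                   (update :stack conj interceptor)
                   (run-stage interceptor :enter)))))))

(defn- leave-all
  "Unwinds :stack in reverse order. Calls :error while the context holds
  an error and :leave otherwise."
  [context]
  (loop [ctx context]
    (if-let [interceptor (peek (:stack ctx))]
      (recur (-> ctx
                 (update :stack pop)
                 (run-stage interceptor (if (:error ctx) :error :leave))))
      ctx)))

(defn execute
  "Runs `interceptors` against `request` and returns the :response.
  Rethrows the error if no interceptor handled it."
  [interceptors request]
  (let [ctx (-> {:request request :queue (vec interceptors) :stack '()}
                enter-all
                leave-all)]
    (if-let [e (:error ctx)]
      (throw e)
      (:response ctx))))

;; Usage: the handler is just the last interceptor in the chain.
(def handler
  {:enter (fn [ctx] (assoc ctx :response {:status 200}))})

;; An :error function marks the error as handled by removing it.
(def recover
  {:error (fn [ctx]
            (-> ctx
                (dissoc :error)
                (assoc :response {:status 500})))})

(def boom
  {:enter (fn [_] (throw (ex-info "boom" {})))})

(execute [recover handler] {:uri "/"})       ; => {:status 200}
(execute [recover boom handler] {:uri "/"})  ; => {:status 500}
```

Supporting asynchronous results would mean that run-stage may return something to wait on instead of a context map, which is where the real executors spend most of their complexity.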

The queue as data

Middleware do not have to call the handler. For example, an authorization middleware may decide that a request is not authorized and instead of calling the handler, it returns an error response.
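
As a sketch, such a middleware could look like this. The name wrap-authorization and the authorized? predicate are hypothetical, not part of Ring.

```clojure
(defn wrap-authorization
  "Calls `handler` only if `(authorized? request)` is truthy.
  Otherwise returns an error response without calling the handler."
  [handler authorized?]
  (fn [request]
    (if (authorized? request)
      (handler request)
      {:status 403 :headers {} :body "Forbidden"})))

;; Usage: treat a request as authorized if it has a :user key.
(def app
  (wrap-authorization (fn [_] {:status 200 :headers {} :body "secret"})
                      :user))

(:status (app {:user "alice"}))  ; => 200
(:status (app {}))               ; => 403
```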

Interceptors go further: the remaining queue and the stack of the already-entered interceptors are exposed in the context map and you can manipulate them. If your authorization interceptor wants to return early, it can (assoc context :queue []). Another example is that you can have a routing interceptor that pushes route-specific interceptors and a handler to the queue.
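
Both tricks can be sketched as plain functions on the context map. The :user key on the request and the shape of the routes table (a map from URI to a vector of interceptors) are assumptions made up for this example.

```clojure
(def authorization-interceptor
  {:enter
   (fn [{:keys [request] :as context}]
     (if (:user request)
       context
       ;; Not authorized: empty the queue so that nothing else is entered,
       ;; and provide the response ourselves.
       (assoc context
              :queue []
              :response {:status 403 :headers {} :body "Forbidden"})))})

(defn routing-interceptor
  "Returns an interceptor that looks up the request URI in `routes` and
  puts the matching interceptors in front of the remaining queue."
  [routes]
  {:enter
   (fn [{:keys [request] :as context}]
     (let [extra (get routes (:uri request) [])]
       (update context :queue #(into (vec extra) %))))})
```

Since both are just functions from context to context, you can try them out at the REPL without an executor by calling the :enter function directly on a hand-written context map.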

Finally, since your interceptor chain is now data instead of middleware-nesting code, you can do fancy tricks like dynamically re-order interceptors based on the dependencies between them. angel-interceptor is an implementation of this and Sieppari supports it as well. I’m a bit skeptical about whether there are real use cases for this, but it’s there if you need it.

Summary

Interceptors allow easy mixing of synchronous and asynchronous code.
Interceptors expose the queue and call stack as data, which gives you fine-grained control over the execution.
Interceptors prevent you from doing error handling with try-catch – not that it would work well with asynchronous code anyway.
Interceptors are probably a bit slower than middleware.

It’s not yet ready for production. At the time of writing, the latest release is 0.0.0-alpha5. ↩︎
You can do this without interceptors, of course. See e.g. how the session middleware is implemented in the Ring codebase. ↩︎

Lead Developer London 2018

Thu, 05 Jul 2018 00:00:00 +0000

Last week, I attended The Lead Developer conference in London. It’s a conference about being a technical lead. The talks were a mix of general leadership advice (it’s all about feedback!), tech lead specifics (focus on the high-impact stuff in code reviews!), tech (ReasonML is cool!), and self-help (remember to exercise and meditate!).

I decided to attend the conference because I’ve ended up taking on some tech lead responsibilities and I’ve felt that I don’t know what I’m doing. Spending two days listening to talks about tech leadership seemed like a good idea.

The talks repeated the fundamentals: getting and giving good feedback is a key activity, you need to have empathy, and you need to be able to articulate why you’re doing things in addition to how. The hard part, of course, is putting these ideas in action.

The most interesting talks

I have this taxonomy of conference talks that I like. It’s subjective since it depends on what you already know. Here it is anyway:

Clarifying talks. These do not necessarily teach you new things, but they make you see the things you already know more clearly and connect them to other ideas.
Inspiring talks. These make you want to do things and do them well.
Toolbox talks. These talks give you actionable ideas to use in your work. Great if the ideas are new to you, potentially boring otherwise.
Tour talks. These talks walk you through a topic by covering a lot of ground with way too many details in way too little time. The audience might get confused, but I like that.

Here are my highlights from the conference in each of the categories. Unfortunately the videos haven’t been published yet.

Clarifying talks:

Alex Hill’s talk on code review. She talked about how defensiveness can get in the way of effective code reviews and what to do to avoid it. See Alex’s slides and her blog post. My commandments for code review were inspired by similar ideas.

Inspiring talks:

Tara Ojo’s talk on mentoring junior developers. I’ve heard so many stories about early-career people getting hired but then forgotten by the company. Without help and guidance, there’s a big chance that they won’t succeed and then they either leave or get fired. This is a waste of everybody’s time. See Tara’s slides: she has good suggestions on what to do.
Alicia Liu’s talk on how to navigate your job when you get promoted to a new management position and have no idea of what you’re doing.

Toolbox talks:

Kevin Goldsmith’s talk on techniques for 1:1s, team meetings, etc. I haven’t got much experience with these, but his methods sounded good. See Kevin’s slides.

Tour talks:

Jenny Duckett’s talk on how to build sustainable teams. There was way too much information for me to process in one go, but what stuck with me was that you and your team should make a habit of sharing information about your work frequently to prevent the silo-ing of knowledge. For example, actually write down why you decided to do something. See Jenny’s slides.

On the event

Usually the conferences I go to have been organized by people who are not professional event organizers. The Lead Developer conference is organized by an event organizing company and it showed in a good way: everything was smooth.

My only complaint is about the venue. The lobby was so noisy during the breaks that it was hard for me to hear what people were saying. This obviously makes having a conversation hard and so I didn’t get to talk to people much. Otherwise the Barbican Arts Centre was a pretty good venue. Even though there were something like 1100 attendees, it didn’t feel crowded at all.

Some nice touches that would be cool to see at other conferences:

Live captioning. All the talks were captioned live: next to the big screen, there were smaller screens where the stenographer’s text was projected. Obviously this helps the people with hearing difficulties, but it’s great for everybody. If you lose the context for a second or if the speaker uses weird words, you can look at the captioning and get back on track.
Speaker office hours. Instead of a Q&A session after each talk, there was a room where you could meet the speakers during the break after their talks and ask them questions. The one-on-one conversations were interesting and you didn’t have to endure any “this is more of a comment than question” takes.
They played kick-ass music when the speakers entered the stage. It really gave me the feeling that we’re now going to hear from an amazing person. I assume it made the speakers feel cool, too.
They had a quieter chill-out area. There were guided mindfulness sessions, and while I’m not into guided meditation myself, the sessions ensured a peaceful atmosphere.

In general, I’d recommend the conference if you’re the kind of person who likes conferences, you’re interested in this topic, and your employer foots the bill.

A night in Nuuksio

Tue, 15 May 2018 00:00:00 +0000

Last summer, inspired by Rich Hickey and the general trends, I bought a hammock. I didn’t end up using it much: I only slept half a night in it. I had pitched it in the attic of a summer cottage, but there was no sleeping pad or underquilt. Even indoors the convection was so bad that my butt froze. I spent the rest of the night in a bed.

Still, I wanted to get a full night’s sleep in the hammock outdoors! I don’t have a tarp and the EN13537 lower limit for my sleeping bag is 9 degrees Celsius (basically it’s a summer-only sleeping bag) so the weather would need to be perfect: warm, no rain, not too much wind.

Last weekend, the weather forecast promised exactly that! The night temperature would be a balmy 8 degrees Celsius, the sky would be clear and there wouldn’t be any wind. I’ve never spent a night alone in the forest, but it was time to cross that off my bucket list.

I left home in the evening and after taking a metro, a train, and a bus, and hiking a couple of kilometers, I arrived at the northern Iso-Holma campsite in Nuuksio. It was already past eight o’clock and the sun would set at 21:48. I set up my hammock by the nice little lake and cooked some dinner.

Turns out that a Trangia lid (at least one that got bent) sucks for frying eggs. The eggs stick to it and cook unevenly. I forgot the spatula at home and the one I improvised from a piece of wood sucked as well. A lesson learned for the next time: either bring a spatula or learn to carve.

After the dinner, I settled in the hammock to read and to listen to the birds. There were cuckoos, woodpeckers, and woodcocks. I heard some raptors that I couldn’t identify and a black grouse, which I haven’t recognized before! A couple of whooper swans visited the lake, but luckily they didn’t spend the night there – they were extremely loud. And in the morning I heard what must have been a fox. All in all, it was a pretty good night for observing wildlife by ear.

Camping in Nuuksio on a warm spring weekend is not exactly a unique idea. In the national park, you’re only allowed to camp on the designated camping sites and each one I passed was full of people. I’m not sure if it counts as being alone when there are a dozen tents in your vicinity. In addition to the birds, I got to listen to a computer science student praising CS and weed on the other side of the lake.

Sleeping in the hammock was… okay. While I didn’t freeze, my sleeping bag was clearly a bit too cold for the weather. They say that you should sleep at a slight angle in the hammock to avoid the banana shape. It was hard to get the inflatable sleeping pad nicely at that angle, so I ended up as a banana anyway. Next time I will try a foam pad – it’s more practical anyway.

Still, it was cool to sleep under the stars¹. I’ve always felt that you have to have a tent to sleep in the forest, but that’s just not true.

I didn’t see any stars. ↩︎

Edit clipboard contents in Vim

Mon, 07 May 2018 00:00:00 +0000

Wouldn’t it be handy to be able to edit the contents of the clipboard in a text editor? Yes, it would, or at least I do it all the time. For example, I do it when I want to copy text from one website to another website, but I need to reformat it a bit first. For macOS, I have a script called pbedit. It is super-simple:

#!/bin/bash
pbpaste | vipe | pbcopy

vipe is a small program that launches $EDITOR and allows you to edit the data piped between two programs. It’s part of moreutils, which Homebrew users can install with brew install moreutils. pbpaste and pbcopy are the built-in macOS command-line tools for pasting and copying the clipboard.

Try this, or if you’re a Linux user, fashion its equivalent with xclip. Soon you’ll find yourself using it all the time.

How to write a talk proposal

Mon, 19 Mar 2018 00:00:00 +0000

There’s a tech conference or a meet-up coming up and you want to give a talk. Thus you will have to write a proposal with a title and a description for your talk. What should go into it?

Let’s recap the goals of your proposal. There are at least two:

Convince the event organizers to pick your talk.
Convince the event attendees to come to your talk.

The goals are well-aligned: the organizers want to have a line-up of talks that makes the potential attendees excited.

You will need to do some selling and I’ll leave that part up to you. That said, there’s some basic information that you should always include. When I’m reading a proposal, I’m always trying to answer these three questions:

What is this talk about? This includes how you will talk about it. For example, if you’re going to talk about a technology, I want to know if you’re going to give a conceptual overview or do a deep dive into hairy details, or something else.
Why should I learn about this? It’s cool to learn about new stuff, but I’m not going to be excited about your talk if I don’t have a clue about why it’d be interesting for me.
Who is this talk for? If I’m completely new to the topic, can I still get something out of your talk? Or if I’m an expert, will I immediately get bored?

There’s no magical recipe for the perfect talk proposal, but addressing these questions should get you started. Keep in mind that there are plenty of “right” answers. For example, “it’s fun” is a legitimate answer to the second question for many events.

A lot has been written about this topic. If you want to read more, I recommend starting with these two links:

Lena Reinhard’s very comprehensive article How to prepare and write a tech conference talk. She uses a similar list of questions!
The We Are All Awesome page on How to write a compelling proposal.

Name this conversation pattern

Mon, 22 Jan 2018 00:00:00 +0000

There’s an annoying conversational anti-pattern for which I’d like to have a name. It goes like this:

Person A makes an argument.
Person B disagrees with the argument.
Person A assumes that person B simply didn’t understand the argument properly and re-states it more elaborately.

The crux is that it does not occur to person A that person B could legitimately disagree with them. This leads to frustrating discussions for both sides. Sometimes this is done deliberately to derail conversations, but I’ve seen people do it in good faith.

I associate this pattern with highly-privileged people arguing with marginalized people, especially if the highly-privileged person is committed to the status quo but at the same time wants to be seen as an ally. You can see this happening on Twitter every now and then.

I can’t believe I’m the only one noticing this pattern. If you know a good name for it, please let me know (e-mail me or tweet at me).

Yearnote 2017

Sat, 06 Jan 2018 00:00:00 +0000

It’s January, so it’s time to look both at the past and at the future! Here’s some good stuff that happened to me in 2017:

I finally graduated as a BSc in mathematics and made good progress towards graduating as an MSc.
I had a relaxing summer vacation that included hiking and sailing.
I attended ICFP. I’ve wanted to go there for ages and finally had the chance. It was the most interesting conference I’ve been to.
In the fall, I started practicing ashtanga yoga. It turned out to be a good combination of exercise and light meditation.

My thinking about software development evolved in small but important ways:

Nowadays I’m more concerned about whether our team does good work than whether I do good work personally. Might be a sign of maturing as an engineer.
I no longer care about Haskell. Being a Haskell programmer used to be a part of my ego, but it’s time to admit that I don’t find Haskell interesting anymore. ICFP made this clear. I’m over being a <programming language X> person.

Here’s what I hope 2018 will bring along:

More good things: graduating as an MSc, more hiking, sailing, and yoga! More time outdoors!
I’d like to figure out how to publish writing regularly and sustainably. Blogging every week didn’t work for me, so I stopped it. I did continue to write, but I didn’t publish anything – that wasn’t useful either.
In general, it’d be cool to ship more things. I’ve always been good at understanding things, but it’s not useful unless you somehow reflect that understanding back to the world.

I enjoyed some things in 2017:

The best book I read was Anna Karenina by Leo Tolstoy. It covers universal themes like love, marriage, the death of a loved one, and the birth of a child. I felt that Tolstoy did a good job describing how differently various characters feel about the same events. I recommend it to everyone!
Good Life Coffee’s Kayon Mountain was excellent, we drank a lot of it at the office.
Umami, the sushi place in Tampereen kauppahalli, is great! I’m not a sushi connoisseur, but it’s the best sushi I’ve had in Finland.

Finally, I can’t believe it’s 2018 and Juha Sipilä’s cabinet still hasn’t fallen apart.

Secure Scuttlebutt: some technical details

Sat, 30 Dec 2017 00:00:00 +0000

I’ve poked a bit at Secure Scuttlebutt (SSB). It’s a gossip protocol for syncing append-only cryptographically verified feeds. Its main application is social networking – I recommend giving Patchwork a go if you want to see it in action.

The protocol is mostly defined by the implementation, which is an archipelago of tiny Node.js modules. To make it easier for the next person trying to figure this out, let me give you a rough overview of the outer layers of the protocol:

All the cryptographic operations are performed with libsodium.
The connection starts with a Secret Handshake. It is used to authenticate the connecting parties and to agree on session keys. It’s implemented in the secret-handshake module.
After the handshake, everything is encrypted with the session keys using the framing documented and implemented in the pull-box-streams.
The encrypted content consists of muxrpc commands and data.
Concretely it’s a stream of packets encoded with the packet-stream-buffers module.

Edit: For more, check out the Scuttlebot Protocol guide. It looks super-informative, but I didn’t know about it before writing this post! Thanks to André Staltz for pointing me to it.

I toyed around with implementing SSB in Pony and this is how far I got. I suppose it’d be simple to implement a client that connects to a Scuttlebot server to publish a message.

I don’t know how much it’d take to implement a full-blown SSB node. At least you’d need the feed synchronization. I think it’s implemented by the ssb-friends module, but I’m not 100% sure.

A new JSON library appears

Thu, 21 Dec 2017 00:00:00 +0000

At Metosin, we’ve released the first version of our own JSON library for Clojure, jsonista. I wrote about it on Metosin’s blog.

If you’re a JSON enthusiast, you might want to check out Tim Bray’s post on the new RFC 8259. He calls it “the last specification of JSON that anyone will ever publish”.

Break from blogging

Thu, 14 Sep 2017 00:00:00 +0000

I’m going to take a break from blogging on quanttype. I haven’t enjoyed writing the posts lately and the quality has suffered accordingly. There’s no point in forcing it.

The future of blogs that go on hiatus is always uncertain. I’m not going to make any promises.

What is first-order logic?

Mon, 28 Aug 2017 00:00:00 +0000

I want to tell you about descriptive complexity theory, but it’s hard to explain if you don’t know what first-order logic is. Let’s have a tiny refresher.

Even if you haven’t heard about first-order logic, you’re probably familiar with it anyway. It’s the basic language of logic with the following elements:

logical connectives ∧ for conjunction (AND), ∨ for disjunction (OR), and ¬ for negation (NOT)¹
existential quantifier $\exists$ and universal quantifier $\forall$
equality symbol $=$

Using this language, you can talk about a domain which is a set of values. It’s like having a database and writing queries against it using the first-order logic as the query language. For example, the statement $\exists x (\exists y (\neg (x = y)))$ asserts that there are at least two distinct values in the domain.

First-order logic, abbreviated FO, is not very powerful. You can make it more powerful by adding some extra predicates. For example, if you want to talk about natural numbers, you could add the predicates + and <. Then you can state theorems like $\forall x (\forall y (x \leq x + y))$. The resulting logic is called FO(+,<).
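
To make the database analogy concrete, here is a small sketch that evaluates the two example statements over a finite domain, with some playing the role of $\exists$ and every? the role of $\forall$. The function names are made up, and the second check can only sample finitely many natural numbers, so it illustrates the statement rather than proving it.

```clojure
(defn at-least-two-distinct?
  "Evaluates ∃x ∃y ¬(x = y) over a finite domain."
  [domain]
  (boolean
   (some (fn [x]
           (some (fn [y] (not= x y)) domain))
         domain)))

(defn sum-never-smaller?
  "Evaluates ∀x ∀y (x ≤ x + y) over a finite sample of natural numbers."
  [domain]
  (every? (fn [x]
            (every? (fn [y] (<= x (+ x y))) domain))
          domain))

(at-least-two-distinct? #{:a})     ; => false
(at-least-two-distinct? #{:a :b})  ; => true
(sum-never-smaller? (range 50))    ; => true
```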

FO is called first-order logic, because the existential and the universal quantifiers quantify over atomic values. In second-order logic (SO), you’re allowed to quantify over predicates. In third-order logic, you’re allowed to quantify over predicates of predicates etc. This gives you a lot of leverage. For example, in SO you can assert that the domain contains an even number of elements. This is not expressible in FO.

The exact set of connectives varies by the source, but it does not really matter. Having ∧, ∨, and ¬ is enough, because you can express all the other logical connectives in terms of them. ↩︎

Please publish changelogs

Mon, 21 Aug 2017 00:00:00 +0000

When you release a new version of your library, please do your users a favor and publish a changelog entry highlighting the most important changes.

The changelog tells your users what’s new – it gives them a reason to upgrade. It tells them what’s broken, so they won’t be surprised when nothing works anymore.

I’ve heard this quip that the changelog is one of the most important pieces of documentation, because even if the other documentation is lacking, it tells you what is outdated about the knowledge you have discovered yourself.

Sure, the commit history is always there, but usually it’s hard to understand. It’s easier to write a passable changelog than to curate the history.

How do you do it in practice? Keep a Changelog has elaborate instructions. If you want to follow them, that’s great, but as long as you use a consistent format with version numbers and release dates, I’m happy.

Write more macros

Mon, 14 Aug 2017 00:00:00 +0000

If you’re a Clojure programmer, it’s likely that you don’t write many macros. Everybody is always warning against writing too many macros. Those people are wrong: macros are great and you should write more of them.

Here are some good uses for macros:

Control structures keep your code simple.
with- macros keep your resource usage tidy.
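
As a sketch of the second point, here is a hypothetical with-resource macro. It is a generalization of what clojure.core/with-open does for closeable objects: you pass the release function yourself, and it is always called, even if the body throws.

```clojure
(defmacro with-resource
  "Binds `sym` to the value of `acquire`, evaluates `body`, and finally
  calls `(release sym)`, whether or not the body threw an exception."
  [[sym acquire release] & body]
  `(let [~sym ~acquire]
     (try
       ~@body
       (finally
         (~release ~sym)))))

;; Usage: record what happens and in which order.
(def log (atom []))

(with-resource [conn (do (swap! log conj :open) :conn)
                (fn [_] (swap! log conj :close))]
  (swap! log conj [:using conn]))

@log  ; => [:open [:using :conn] :close]
```

Without the macro, every call site would have to repeat the try-finally dance and sooner or later someone would forget it.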

I dislike def macros, because I often need the corresponding non-def function macros and I usually have to read the source to figure out how they differ. That said, they’re okay when they look like this:

(declare foo)

(defmacro deffoo [x & args]
  `(def ~x (foo ~@args)))

You could even write a macro for defining def macros!

Hiking from Pyhä to Luosto

Mon, 07 Aug 2017 00:00:00 +0000

A pond in the Isokuru gorge.

Last week I hiked in the Pyhä-Luosto national park for four days. This was my first multi-day hike ever and it was great! I’m not going to post a full-blown travelogue, but I’ll try to make some notes about what made it great.

On terminology: I’m out of my depth with all these nature words. It’s hard to properly map them from Finnish to English. Figuring out the proper terms for wetlands is especially hard, so I’ll just call them mires. Sorry.

There are many reasons for hiking, but I do it because I enjoy beautiful landscapes. Pyhä-Luosto delivers: it has fells and mires. They are my favorite Finnish landscape elements. This was reflected in our pace. We walked only 10-12 km per day to have time for admiring and photographing the nature, to take detours, to cook and in general just to not be in a hurry.

Hiking in Pyhä-Luosto is easy: the routes are well-marked and all the rest spots are well-equipped. The huts even have gas stoves for cooking. This suited me well as I’m not much of a bushcraft person. The only thing that required some consideration was potable water. This is not the part of Lapland with brooks everywhere. There’s water in the mires, for sure, but the idea of drinking it didn’t exactly thrill me.

Duckboards in the Pyhälatva mire.

Recommendations

If you’re planning to hike in the Pyhä-Luosto national park, here are my recommendations:

Go see the Isokuru gorge. Its main sight is the Pyhänkasteenputous waterfall and it’s one of the most picturesque places I’ve visited in Finland.
If you like mires, walk the Luosto nature trail (vaellusluontopolku). It goes through the Pyhälatva mire, showcasing different kinds of mire vegetation. If you’re there at the right time of year, you might find a lot of cloudberries, too. Unfortunately for us, they weren’t ripe yet.
If you’re going to climb on Ukko-Luosto, the south-east trail walks amidst beautiful rocks, shrubs, and brush. I would go up the south-east trail and take the stairs on the north-east side to get down.
The Lampivaara café has excellent home-made donuts.

This is why they tell you to use good shoes.

Lessons for the next hike

I had some packing problems. I’ll try to do better next time.

Use a backpack that can handle the weight of your equipment. My Haglöfs internal frame backpack is a travel model, but it was supposed to withstand hiking. This turned out to be not true: I loaded it with some 20 kg of clothes, equipment, food, and water. The frame got bent and on the second day of the hike part of the frame popped out through two layers of fabric. We managed to fix it, but I don’t want to load it fully again.
Maybe do not pack a Trangia stove sideways in the backpack. The lid of the stove was circular when we started the trip. Now it’s oval. I don’t know what happened.
Go on a practice walk with the fully-packed backpack. I hadn’t used this backpack before with a load of over 15 kg. It took me a while to figure out how to adjust it properly for the heavy loads. I could have done it closer to home.
Blocky or cylindrical water bottles would make packing easier. I used two 1.5 liter soft drink bottles to carry the water. The pointy tops of the bottles made it hard to pack them efficiently.
Bring enough cocoa to have hot chocolate every day. Self-explanatory.
Bring a backup battery for the camera. The X100T battery won’t last for four days.

Summer vacation

Mon, 17 Jul 2017 00:00:00 +0000

I’m going on vacation and this blog will be on vacation as well. The regularly scheduled musings on programming and stuff will continue in August.

Focus on understandable code

Tue, 11 Jul 2017 00:00:00 +0000

When you review code, what do you focus on?

I focus on understanding the code. My rule is that in any project with a long-term team, all members should be able to understand almost all of the code.

If it’s hard for me to understand some code, it’s hard for me to change it or debug it. Moreover, if it’s hard for me to understand it now, it’s likely hard for the author to understand it once they’ve forgotten the details. Complicated code is better at hiding its bugs, too.

Sometimes there are parts that require so much magic and advanced trickery that it’s not reasonable to expect others to understand it without serious study. Working around third-party bugs sometimes creates this kind of code, as does micro-optimization. Usually these should be only a tiny fraction of your code base.

This is why understandable code has such a high priority for me. It makes the code easier to work with, for you and for everybody else, right now and in the future.

This is one of those situations where the more junior developers can provide valuable feedback for the more senior developers. If you’re a senior developer and you wrote some code that the junior developer on your team has a hard time understanding, it’s time to do a reality check: is your code complicated for a reason, or did you just mess it up?

What is pair programming like?

Tue, 04 Jul 2017 00:00:00 +0000

Do you like pair programming? I tried it and I liked it.

Since May, I’ve been working in a team where we regularly do pair programming. This is my first experience with extensive pairing. We aren’t anywhere close to 100 %, but even doing a couple of sessions per week is way more pairing than I’ve ever done before. In general, it has been a positive experience.

Pairing is at its best when both of you are unsure about what you’re doing. Having someone there helps a lot when you’re trying to figure out how something should work.

My favorite sessions have been the ones where we write the SQL for the data model of our greenfield application. Even though we’ve sketched the data model beforehand, there are always some small problems that you have to think through.

Pairing also works well when one of you is unsure about what you’re doing. That person should then “drive” (i.e. write the code) while the more knowledgeable person guides them. This seems like a good way to share knowledge.

Pairing is less useful when both of you know what you’re doing. The person writing the code does not really need help or guidance. There’s not much to do for the other person.

Pairing is intensive. I can do it only for two hours or so and then I need to do something else. I expect this to change with time, but right now I have a hard time imagining pairing full-time.

While I’ve enjoyed pair programming, I find pair debugging frustrating. My working style when I’m debugging does not suit pair work: I concentrate intensely, jump quickly around, and rely on my intuition. This is not helpful for either driving or guiding. My friend Mikhail pointed out that maybe I should consciously focus on helping others to debug instead of solving the problem. I have not yet tried this.

Pairing and code review

Pairing and code review have similar benefits. They both increase cohesion and spread knowledge of the code base. They also help improve coding skills and catch bugs and quality issues.

I have a lot of experience with code review and would recommend it for almost any project except short-lived prototypes. It’s worth it for the knowledge transfer alone. My updated recommendation is that if you use pairing, you do not necessarily need code review – you already got most of the benefits with pairing.

My rule of thumb is that all code should be engaged with by (at least) two persons. To achieve this, we mix these practices: some code is written by pairing and the rest of it goes through code review.

I’m happy that I have a new, robust tool for producing good code in my toolbox. If you haven’t yet tried regular pairing, I recommend giving it a go.

Using Beeminder to keep blogging

Tue, 27 Jun 2017 00:00:00 +0000

I aim to publish a blog post every week, at least when it’s not vacation season. I’m happy that I’ve published so many blog posts even though not all of them are very good. Unfortunately actually writing the posts is not that fun. It’s easy to postpone the task when you’re not feeling inspired.

I’ve solved the problem by using Beeminder. It’s a service that allows you to make a bet with yourself about reaching some quantifiable goals. I’ve bet $10 that a new item will appear on the quanttype RSS feed every week. If it doesn’t happen, I will pay $10 to Beeminder the company.¹ It’s not a big pile of money, but having something on the line makes the commitment feel more real.

The brilliant part of Beeminder is that you can change the bet any time, but there’s a delay of one week before the change takes effect. For example, if I want to give up on my blogging goal right now, that’s perfectly fine, but I still have to publish one blog post this week. This means that I can’t just cancel the bet if I’m not feeling inspired some week.

When I tell people about Beeminder, they tell me that it’s weird and ridiculous. It works, though. I’ve kept blogging and I haven’t lost any money in a long time.

Sometimes people say that the money should go to a charity or something instead of Beeminder. If you’re opposed to funding Beeminder, better not to lose the bet, then. ↩︎

What are hybrid maps?

Thu, 15 Jun 2017 00:00:00 +0000

Clojure people sometimes talk about hybrid maps. What are they talking about?

A map, in general, is a data structure that associates keys with values. There are two common usage patterns for maps in Clojure:

As records. The map acts as a collection of fields with predefined keys. For example, a row in a relational database is a record. When you query the database, you know what columns the result will have. Typically in Clojure, you get a map or a collection of maps as a result of a database query.
As indices. The map acts as a collection of objects indexed by something. For example, you could represent a database table as a Clojure map of rows indexed by the primary key of each row.
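
Here is a small sketch of the two patterns side by side. The rows and their keys are made-up sample data.

```clojure
;; A record: a fixed set of keys that you know in advance.
(def row {:id 1 :name "Ada" :email "ada@example.com"})

;; A query result: a collection of records.
(def rows
  [{:id 1 :name "Ada"   :email "ada@example.com"}
   {:id 2 :name "Grace" :email "grace@example.com"}])

;; An index: the same rows keyed by their primary key.
(def rows-by-id
  (into {} (map (juxt :id identity)) rows))

(:name row)                    ; => "Ada"   (record access by a known field)
(get-in rows-by-id [2 :name])  ; => "Grace" (index lookup, then field access)
```

You typically access a record with keywords written in the code, whereas the keys of an index come from the data.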

See LispCast for a more extensive discussion of these patterns.

A hybrid map is a map that acts as a record and as an index at the same time. This pattern is not common – usually it’s better to have the index map as a field in the record. One case where it works nicely is the Clojure map destructuring syntax:

(let [{var1 :key1 var2 :key2 :keys [key3] :as result}
      (some-function-call ...)]
  ...)

Here :keys and :as work like record fields, but the bindings for var1 and var2 work like an index.

What about heterogenous maps?

A map is heterogenous if it can contain keys and/or values of multiple types. If a map is not heterogenous, it’s homogenous. Here are some examples:

{:a 1 :b 2}         ; homogenous
{:a 1 :b "hello"}   ; heterogenous values
{:a 1 "b" 2}        ; heterogenous keys
{:a 1 "b" "hello"}  ; heterogenous keys and values
{}                  ; could be either!
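
The classification above can be sketched as a predicate. This is a simplification of my own: it treats “same type” as “same concrete class”, so it would, for example, call a map with both Long and Double values heterogenous. The name homogenous? is hypothetical.

```clojure
(defn homogenous?
  "True if all keys share one concrete type and all values share one
  concrete type. An empty map trivially satisfies this."
  [m]
  (and (<= (count (set (map type (keys m)))) 1)
       (<= (count (set (map type (vals m)))) 1)))

(homogenous? {:a 1 :b 2})        ; => true
(homogenous? {:a 1 :b "hello"})  ; => false
(homogenous? {:a 1 "b" 2})       ; => false
(homogenous? {})                 ; => true
```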

This concept is not discussed much in the Clojure community, because Clojure maps are heterogenous by default.¹ An example of a less heterogenous map in Clojure is data.int-map, where the keys must be integers. The values can still be anything, though.

The concept of heterogenous maps, and of heterogenous collections in general, is more interesting in statically typed languages like Scala and Haskell. Expressing the type of a homogenous map is straightforward, but typing a heterogenous map is more complex. This can be seen in how there are separate libraries for heterogenous collections, like the popular shapeless library for Scala.

What’s the relationship between hybrid maps and heterogenous maps? There’s no relationship, really. Like records, hybrid maps often are heterogenous, but there’s no reason why you couldn’t use a homogenous map for a hybrid purpose.

¹ Unless you argue that Clojure is unityped, in which case all Clojure maps are homogenous.

ZuriHac 2017

Sun, 11 Jun 2017 00:00:00 +0000

The basic dilemma of ZuriHac: to stay inside and code or to enjoy the lake?

ZuriHac, the Haskell hackathon in/near Zürich, took place this weekend and it was again great!

These days, a hackathon usually means a prototyping competition. ZuriHac is a hackathon in a different sense. There’s no competition. Instead, people come there to collaborate, to learn, and to have fun together. Together we worked on open-source projects, taught each other Haskell, and had a good time drinking beer by the lake.

There were some talks, too. The keynotes were given by seasoned Haskellers, but my favorite was the talk given by Mario Meili and Cyrill Schenkel. Mario and Cyrill are students at HSR Hochschule für Technik Rapperswil, where the event took place. They presented their work-in-progress master’s theses.

Mario’s part turned out to be quite a provocative conversation opener. His thesis is about what prevents the wider adoption of Haskell in the industry. He had gathered often-presented concerns about Haskell, such as its poor performance. The listeners were quick to shoot these claims down, but Mario does have a point – people really do believe these things about Haskell.

My take is that Haskell’s lack of industrial success is not about its drawbacks. The lack of compelling benefits is the actual problem. It’s hard to make a concrete case for Haskell.

Some familiar faces from Helsinki having fun with FunLists.

While on the plane back home, I read this article about rOpenSci’s Unconference. Sean Kross describes the Unconf like this:

Some have described it as a hackathon, but I think that’s a mischaracterization. Though a great deal of code is written in a short period of time, a substantial proportion of the attendees make large and important contributions while hardly writing any code. The Unconf is not a competition, and it’s intensely social and collaborative.

Sounds like Unconf’s spirit is similar to ZuriHac’s spirit! This is a great event format for programming communities – it’s more social and more collaborative than the usual conferences. After all, everybody seems to agree that the social part is the best part of conferences.

Commit messages are worthless

Fri, 02 Jun 2017 00:00:00 +0000

Commit messages written in a rush tend to be less than helpful.

A couple of years ago I wrote about how I hoped that writing good commit messages would be worthwhile. It never paid off.

Turns out it’s never my commit messages I’m reading. It’s my coworkers’ messages, or open-source contributors’ messages. It’s not enough that you write good messages; you need to foster a culture of good commit messages in all the projects you work with.

Even people who write good commit messages most of the time sometimes create commits with messages like “make it work” or “WIP” or “blargh”. For some reason it’s always these commits that contain the mysterious bugs or the unclear code.

Even if you do write good commit messages, if you use GitHub, it’s likely that nobody reads the messages. GitHub’s user interface isn’t designed for reading commit messages. When you do a pull request, the commit titles are in small font and the bodies are hidden. If you use GitHub, it’s better to spend time on writing good pull request messages. They do get read.

On sailing

Sat, 27 May 2017 00:00:00 +0000

I was on a boat and it had strings attached.

I enjoy sailing. I don’t have a sailboat, and I don’t know how to sail, but sometimes I get lucky and my friends invite me over to their boats.

I like it because when you’re on a boat, you can’t really do anything but sail. You can’t go anywhere. If there’s good wind, you can go fast and you’re busy with the sails and navigating. If there’s little wind, you just sit there and enjoy the sea. It’s one of those in-the-moment activities.

If you haven’t ever sailed and you get the chance, you should take it.

On JSONfeed

Sat, 20 May 2017 00:00:00 +0000

I love it when people link to their RSS feed and it’s actually Atom.

There’s a new alternative to RSS and Atom called JSON Feed. It’s like RSS except that it uses JSON instead of XML. Having written a couple of RSS parsers in my life, I reckon this is a good idea: in many programming languages, parsing and producing XML is a huge hassle compared to dealing with JSON. If you try to consume random RSS feeds, you’ll quickly discover how broken they are.

Based on my experience, it’s common to produce XML by concatenating strings or using a string-based templating language. It’s also common to get this slightly wrong – the content is poorly escaped or the tags are mismatched. In contrast, especially in dynamic languages, JSON is easy to produce by serializing a data structure. This means that coders in a hurry are more likely to produce syntactically valid JSON than syntactically valid XML.
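
Here is a minimal sketch of the escaping failure, using only what ships with Clojure. The helpers item-xml and well-formed? are hypothetical names for this illustration; a real feed generator would have many more fields.

```clojure
(require '[clojure.xml :as xml])

(defn item-xml
  "Naive templating: builds XML by string concatenation, with no escaping."
  [title]
  (str "<item><title>" title "</title></item>"))

(defn well-formed?
  "True if the string parses as XML, false if the parser rejects it."
  [^String s]
  (try
    (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8")))
    true
    (catch org.xml.sax.SAXException _ false)))

(well-formed? (item-xml "Cats and dogs"))  ; => true
(well-formed? (item-xml "Cats & dogs"))    ; => false, bare ampersand
```

A JSON serializer given the same title as a plain string would quote it correctly without the author having to think about it, which is the asymmetry described above.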

Caveat lector: I do not have experience in consuming random JSON files. My picture might be way too rosy.

The IndieWeb wiki offers some criticism of feed files. I subscribe to approximately 200 blogs and the point about feeds becoming out of date or broken is valid. Feeds just mysteriously break even though the HTML version of the blog continues to work. Sometimes people relocate their feeds without setting up a redirect. The trouble is that the publishers never notice because they don’t read their own feeds. It’s up to the reader to let them know.

A new feed format won’t fix this problem, but then again, I don’t know what would fix it.

I have yet to create a JSON feed for quanttype, but I might as well do it. If someone creates a JSON feed module for Hakyll, let me know.

clojure.spec for configuration validation

Sun, 14 May 2017 00:00:00 +0000

Some tools do not require configuration at all.

In March, I wrote about configuring Clojure web applications. I recommended storing the configuration in EDN files and loading them with Maailma. Something I didn’t mention at all was validating the configuration.

The problem

What happens if you mistype a configuration key? Clojure is not known for great error messages and you’ll witness this unless you validate your configuration.

Let’s say you use HTTP Kit’s HTTP server and your configuration file looks something like this:

{:http/server {:potr 3000}}

Maybe you start the server like this:

(require '[org.httpkit.server :refer [run-server]])

(defn start-server [config]
  (run-server app {:port (get-in config [:http/server :port])}))

But uh oh, there was a typo in the config! It should say :port instead of :potr. HTTP Kit is going to receive {:port nil}. Can you guess the error message?

boot.user=> (run-server app {:port nil})
java.lang.NullPointerException:

Of course. What if you instead pass the whole configuration submap to HTTP Kit?

(defn start-server [config]
  (run-server app (get config :http/server)))

This time you won’t get any error message. The server will quietly ignore the configuration and start at the default port 8090.

I’m using HTTP Kit as an example, but this problem is not specific to it. It’s just rare in the Clojure ecosystem to give useful error messages on bad input, and Clojure’s dynamism does not help here.

This is why configuration validation matters. You’ll save yourself a lot of debugging time by spending a bit of time on validation.

The clojure.spec solution

Clojure 1.9 is going to ship with clojure.spec, a library for specifying and validating data shape. My impression is that it’s primarily intended as a development and testing tool. I do not have much experience with that yet, but it works nicely for writing configuration validation code.

Let’s write a spec for the configuration above.

(require '[clojure.spec.alpha :as s])

;; The top level has one required key, :http/server,
;; specified below
(s/def ::config (s/keys :req [:http/server]))

;; :http/server has one required unqualified key, :port,
;; which should be a valid port number.
(s/def :http/server (s/keys :req-un [:http/port]))
(s/def :http/port (s/int-in 0 65536))

(defn validate-config [config]
  (when-not (s/valid? ::config config)
    (s/explain ::config config)
    (throw (ex-info "Invalid configuration." (s/explain-data ::config config))))
  config)

Let’s try this with our example configuration.

boot.user=> (validate-config {:http/server {:potr 3000}})
In: [:http/server] val: {:potr 3000} fails spec: :http/server
at: [:http/server] predicate: (contains? % :port)
clojure.lang.ExceptionInfo: Invalid configuration.

Much better, and it took only a couple of lines of code!
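
The same specs catch more than typos. As a sketch, assuming Clojure 1.9 or later (where clojure.spec.alpha is available), here are a few more configurations checked with s/valid?. The specs are repeated so that the snippet stands alone.

```clojure
(require '[clojure.spec.alpha :as s])

;; Same specs as above.
(s/def ::config (s/keys :req [:http/server]))
(s/def :http/server (s/keys :req-un [:http/port]))
(s/def :http/port (s/int-in 0 65536))

(s/valid? ::config {:http/server {:port 3000}})    ; => true
(s/valid? ::config {:http/server {:port "3000"}})  ; => false, not an integer
(s/valid? ::config {:http/server {:port 70000}})   ; => false, out of range
(s/valid? ::config {:http/sever {:port 3000}})     ; => false, :http/server missing
```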

What is functional analysis?

Sun, 07 May 2017 00:00:00 +0000

Studying functional analysis will help you understand wavelets.

I took an introductory course on functional analysis. Let me tell you about it. (See also: descriptive set theory, forcing.)

Functional analysis is the study of infinite-dimensional vector spaces. These spaces are usually (always?) spaces of functions. For example, consider the sequence spaces $\ell^p$ where $1 \leq p < \infty$. They’re defined as follows: $$ \ell^p := \left\{ (x_n)_{n=1}^\infty : \| x \|_p := \left( \sum_{n=1}^\infty | x_n |^p \right)^{\frac{1}{p}} < \infty \right\} $$

Here $\| x \|_p$ is the norm. You can also consider them spaces of functions of type $\mathbb{N} \rightarrow \mathbb{K}$, where $\mathbb{K}$ is the scalar field.
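
As a quick worked example (a standard one, not taken from the course), consider the sequence $x_n = 1/n$. Its $\ell^2$ norm is finite but its $\ell^1$ norm is not, so it belongs to $\ell^2$ but not to $\ell^1$:

```latex
\| x \|_2 = \left( \sum_{n=1}^\infty \frac{1}{n^2} \right)^{\frac{1}{2}}
          = \left( \frac{\pi^2}{6} \right)^{\frac{1}{2}}
          = \frac{\pi}{\sqrt{6}} < \infty,
\qquad
\| x \|_1 = \sum_{n=1}^\infty \frac{1}{n} = \infty .
```

So the choice of $p$ really does change which sequences are in the space.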

All the $\ell^p$ spaces are complete. Thus they’re Banach spaces. Banach spaces, or complete normed vector spaces, are central to functional analysis because the functions in them are quite well-behaved. Consider the following results:

Open mapping theorem: A bounded linear mapping between Banach spaces is open iff it’s surjective.
If $M$ is a subspace of a Banach space $E$ and $f : M \rightarrow \mathbb{R}$ is a suitable linear function, the Hahn-Banach theorem may allow you to extend it to the whole space $E$.

What about the name? Functional analysis is not related to functional programming. A functional is a function from a vector space to its scalar field. Of course, in the case of function spaces it is a function of functions. Functionals often come up in functional analysis – for example the Hahn-Banach theorem above is about functionals.

Why do people care about functional analysis? A big reason is that it gives tools for studying partial differential equations. Unfortunately I’m not familiar enough with the matter to say how it helps, except that having well-behaved functions is always nice.

JUnit output for Clojure tests

Thu, 27 Apr 2017 00:00:00 +0000

I’m trying to get my Clojure testing ducks in a row.

In January, I wrote about the features I’d like to have in a Clojure test runner. One of them was JUnit XML reports for the test results. I’ve since contributed a JUnit reporter to Eftest, the test runner library developed by James Reeves.

I had two goals. The first was to support running tests in parallel, which is what Eftest does by default. The second was to produce human-readable output and a JUnit report at the same time. Both of the goals are accomplished by Eftest 0.3.1.

For the second goal, you have to write your own reporting function that calls both the pretty reporter (or the progress reporter, or whatever you like) and the JUnit reporter. Then you can use eftest.report/report-to-file to redirect one of them to a file. Here’s how it goes:

(ns my-project.report
  (:require clojure.test
            [eftest.report :refer [report-to-file]]
            [eftest.report.junit :as junit]
            [eftest.report.pretty :as pretty]))

(def xml-report
  (report-to-file (junit/report "test-results.xml")))

(defn report [m]
  (pretty/report m)
  (binding [clojure.test/*report-counters* nil]
    (xml-report m)))

You have to bind *report-counters* to nil when executing one of the reporters to prevent each assertion from getting counted twice. Ideally this would be handled by eftest, but the design of clojure.test makes it hard to do. Check out eftest’s issue 23 for discussion.

Now run your tests with {:report my-project.report/report} and you’ll get Eftest’s colorful exception reports in the console and a JUnit report in the file test-results.xml.
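
For example, from a REPL this could look roughly like the following. It is a sketch that needs Eftest on the classpath, and it assumes that your test namespaces live in a directory called test.

```clojure
(require '[eftest.runner :refer [find-tests run-tests]]
         'my-project.report)

;; "test" is an assumed directory name; point find-tests at wherever
;; your test namespaces actually are.
(run-tests (find-tests "test")
           {:report my-project.report/report})
```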

Test tagging and other features

Another feature on my list was filtering tests by tags. Already in January, Eftest had the ability to filter tests with an arbitrary function. Since then, James has created a Leiningen plugin for Eftest and Facundo Olano has implemented the test selectors supported by lein test.

Output catching and the test slowness report are still missing. Output catching is a bit hairy if you want to deal with loggers, but I already wrote a version that catches any output written to *out*. It might be the 80/20 solution. I’ll send a pull request soon. The slowness report should be easy to implement with a reporter function as well.

I think Eftest is going to be the test runner I wished for. I recommend it, and if you want to use it with Boot, boot-alt-test is a wrapper for it.

Update 2017-09-19: Added the binding for *report-counters*. Thanks to Andrew Gnagy for pointing out this problem.

prog1 in Clojure

Thu, 20 Apr 2017 00:00:00 +0000

Clean your code with this one weird macro trick.

When programming Clojure, sometimes you need the equivalent of Common Lisp’s prog1 macro. It’s like Clojure’s do except that it returns the value of the first form, not the last one.

You could depend on useful, which calls it returning:

(defmacro returning
  "Compute a return value, then execute other forms for side effects.
  Like prog1 in common lisp, or a (do) that returns the first form."
  [value & forms]
  `(let [value# ~value]
     ~@forms
     value#))

Of course, when you’re in a hurry, there’s no time for adding new dependencies. There’s not even time to write your own inline version of the macro. Besides, they say that you shouldn’t ever write your own macros. So what do you do? You compose doto and do:

(doto x
  (do
    (y)
    (z)))
;; returns x
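
Why does this work? doto threads x in as the first argument of (do ...), where it is evaluated and thrown away, and then doto returns x. Here is a concrete version you can paste into a REPL. The atom calls is only there to make the side effects visible.

```clojure
(def calls (atom []))

(def result
  (doto 42
    (do
      (swap! calls conj :y)
      (swap! calls conj :z))))

result  ; => 42
@calls  ; => [:y :z]
```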

Update (September 2021): Robert Levy pointed out to me that you can use constantly. Clever!

((constantly x)
 (y)
 (z))

But maybe instead of x you have (a complicated form) and you want to give its result a name. Luckily there’s as->.

(doto (a complicated form)
  (as-> x 
    (do
      (y x)
      (z x))))

When you’re debugging a long -> thread, a print function that returns the printed value would be handy so you could insert it in the middle of the chain. This is of course exactly what tools.trace is for. But again, who has time for adding dependencies? Just use doto.

(-> thing
    do-something-1
    do-something-2
    (doto prn)
    do-something-3)
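
A concrete instance of the same trick, with made-up steps:

```clojure
(def result
  (-> 1
      inc
      (doto prn)   ; prints 2 and passes 2 along
      (* 10)))

result  ; => 20
```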

By the way, the other day I wrote this macro that resembles cond->:

(defmacro if-> [expr cond then else]
  `(let [e# ~expr]
     (if ~cond (-> e# ~then) (-> e# ~else))))

I used it with a builder object. The code looked something like this:

(-> (Builder.)
    (.withSetting "password" "kissa2")
    (if-> production-mode?
      (.useProductionMode)
      (.useTestingMode))
    (.build))

Can somebody figure out how to do that nicely without writing a macro?

Darkroom update

Fri, 14 Apr 2017 00:00:00 +0000

It’s Easter holiday and I’m looking forward to photographing Haronmäki again. This time it’s going to be in black and white, shot in medium format.

I’ve been practicing darkroom printing for half a year now and I’m finally starting to get somewhere. Or at least I’m not making mistakes every time I go to the darkroom. I’m starting to get consistent results and sometimes I even think I’m finding my own “voice”. I had an especially good time photographing ice this winter.

I’ve wasted a lot of paper, but luckily I’ve ruined only two rolls of film. Lesson learned: developing two films at once is a great way to ruin two films at once.

Centre Party identity in the vaalikone

Sun, 09 Apr 2017 00:00:00 +0000

Today is the municipal election day in Finland. As usual, the vaalikone websites are popular. They’re recommendation engines for the elections: you fill in a quiz about your political opinions and the website recommends you candidates whose answers match your answers. Juho Snellman describes them as OkCupid for elections - I recommend his article for more information. I did some analysis on the vaalikone data from Ilta-Sanomat. The data is in Finnish, so the statements quoted below are translations.

It is often claimed that being a Centre Party (Keskusta) person cannot be defined on the traditional left-right axis or on the right-liberal axis. At least the Helsingin Sanomat vaalikone supports this idea: in the paper’s own visualization, the Centre Party candidates are spread evenly all over the value map. So what do Centre Party people have in common?

Helsingin Sanomat has published the questions and answers of its own vaalikone and of the Ilta-Sanomat vaalikone as open data. I decided to find out which vaalikone answers are especially typical of Centre Party candidates. I looked at the questions of the Ilta-Sanomat vaalikone, because they are more plainly worded than the HS ones. Further down I explain how I analyzed the data, but let’s take the fun part first: the results.

The following two responses increase the probability of being a Centre Party municipal election candidate the most:

Agree: All of Finland should be kept inhabited, even if it means costs for taxpayers.
Disagree: Old boys’ networks steer municipal decision-making.

Below are the results for the other major parties as well, in alphabetical order:

Kokoomus (National Coalition Party)

Agree: More public services should be outsourced to private companies.
Disagree: It is better for a municipality to raise its tax rate than to cut its services.

Kristillisdemokraatit (Christian Democrats)

Disagree: Gay and lesbian couples should have the same marriage and adoption rights as heterosexual couples.
Disagree: Euthanasia should be allowed.

Perussuomalaiset (Finns Party)

Disagree: My home municipality should actively receive asylum seekers for integration.
Disagree: Multiculturalism is a good thing for the municipality.

RKP (Swedish People’s Party)

Disagree: The growth of income differences is harmful to society.
Agree: My home municipality should actively receive asylum seekers for integration.

SDP (Social Democratic Party)

Disagree: More public services should be outsourced to private companies.
Disagree: Old boys’ networks steer municipal decision-making.

Vasemmistoliitto (Left Alliance)

Disagree: More public services should be outsourced to private companies.
Disagree: Begging on the streets should be banned by law.

Vihreät (Greens)

Agree: The amount of vegetarian food in schools should be increased and the amount of meat-based food reduced.
Agree: Gay and lesbian couples should have the same marriage and adoption rights as heterosexual couples.

At least for me, there were no big surprises here. For the Centre Party candidates, regional policy stands out. I don’t know what I expected. The old boys’ networks made me chuckle a bit, but I suppose it’s natural that the candidates of the parties that hold municipal power think that the exercise of power works. Kokoomus also got a high positive coefficient for that statement.

Technical details

I have no training in statistics, so I don’t dare to promise much about the reliability of this analysis. It’s probably not very reliable.

I modeled the situation with logistic regression. The explanatory variables are the answers to the questions on a continuous scale of 1-5 (where 1 means “completely disagree” and 5 “completely agree”), and the response variable is membership in the party being analyzed. Above I picked the two questions whose coefficients were the largest. The coefficients can be interpreted as logarithms of odds ratios. However, I left the odds ratios out, since I can’t assess their reliability.
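
For reference, this is the textbook form of the model, written with symbols of my own choosing: $p$ is the probability that a candidate belongs to the party being analyzed, $x_i$ is the answer to question $i$ on the 1-5 scale, and $\beta_i$ is the fitted coefficient for that question.

```latex
\log \frac{p}{1 - p} = \beta_0 + \sum_{i} \beta_i x_i
\quad\Longrightarrow\quad
\frac{\mathrm{odds}(x_i + 1)}{\mathrm{odds}(x_i)} = e^{\beta_i},
\qquad \mathrm{odds} = \frac{p}{1 - p}
```

In other words, moving one step towards “completely agree” on question $i$ multiplies the odds by $e^{\beta_i}$, so a positive coefficient corresponds to agreeing and a negative one to disagreeing.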

The HS vaalikone data has the questions of both the Ilta-Sanomat vaalikone and the HS vaalikone in the same file. I used only the IS questions. Unlike the HS vaalikone, the IS vaalikone has no municipality-specific questions.

The data of the HS and IS municipal election vaalikone has been published under the CC BY 4.0 license.

How I use Anki

Sun, 02 Apr 2017 00:00:00 +0000

I’ve been using Anki for a couple of years and it’s great. It’s flashcard software - basically a tool for memorizing things. If you’re not familiar with it, Gwern has an introduction and Robb Seaton has some helpful tips.

For a long time, I thought Anki was cool, but I didn’t really know how to use it. I wasn’t actively studying any foreign languages or anything like that. The topics (like mathematics) I was learning about didn’t seem to be that amenable to memorization. It took me a while, but I’ve now figured out a way that works for me, and it applies to almost any topic.

Whenever I need to look something up twice, I add it to Anki. For example, English is not my native language, so I often look up definitions of English words in a dictionary. If I look up a word twice in a short period of time, I add it and its Finnish definition to Anki. If a word strikes me as especially important, I might add it already on the first lookup. Now my deck has cards for words like flaccid, amygdala, and pithy (that’s a hard one, I always mix it up with pitiful).

I’ve had success with memorizing mathematical identities as well. After years of struggling to remember the formula for the limit of a geometric series, I finally added it to Anki and now I remember it. I have some more abstract mathematical cards, too. I had to think it through a couple of times recently, so now I’ve added a card ”What’s the relationship between Borel sets and the Lebesgue measure?” to my deck (answer: ”Borel sets are Lebesgue measurable but not vice versa”).

People sometimes oppose memorizing because you can just look things up when needed, especially now that we have the Internet, or derive them when it comes to mathematics. You can’t always look up or derive everything, though. First, it takes too much time, and second, it’s easier to look things up when you already know something related to them.

I figure that if I need to look up something repeatedly, I might as well add it to the base of condensed knowledge in my head.

Configuring Clojure apps

Thu, 23 Mar 2017 00:00:00 +0000

For the record, I’ve published a post about configuring Clojure web applications on Metosin’s blog. Something I did not write about is how to configure the ClojureScript frontend. If you have any insights into this, please blog about it!

Thinking about gear (acquisition syndrome)

Sun, 19 Mar 2017 00:00:00 +0000

One of the distractions of the hobbyist photographer is the Gear Acquisition Syndrome (GAS): instead of making photographs, you spend all your time obsessing over new gear. Your current gear seems inadequate when there are so many better options available.

This is not exclusive to photographers: I’ve seen at least guitarists and cyclists do this, too. Basically there are two distinct versions of these hobbies: the main hobby and the gear hobby. There’s nothing wrong with this. When it comes to your free time, you do you.

Why does Gear Acquisition Syndrome cause distress, then? It’s because your priorities and actions are in conflict: you’re focusing on the gear hobby when you feel you should be focusing on the main hobby.

I’ve had my bouts of photography GAS, but nowadays I get over it easily. Let me share some of the thoughts that helped me.

First, clarify your priorities. Instead of thinking of what you should do, think about what you want to do. Should is sneaky, beware should! Myself, I care about making great photographs much more than I care about having a great camera. Some people may find that they in fact prefer cameras to photographs. This is fine, too.

Once you understand your priorities, think about what’s the bottleneck in your process. Have you reached the limits of your gear? For example, I could buy a new digital camera with a bigger sensor. The sensor size is clearly something that people care about, as evidenced by how enthusiastic people are about the new era of medium format digital cameras. Despite this, I’m unable to argue how this would make my photos better. The sensor size is not my bottleneck – the time spent on photography is.

Finally, it’s helpful to study the old masters. It’s easy to forget how amazingly good and easy to use everything on the market right now is. Compared to the modern equipment, the photographers of the early and even the late 20th century used convoluted crap. Nevertheless they managed to make pictures that speak to me decades later. Your current gear likely is good enough for pretty great photography.

Of course, it’s easy for me to say this. My photography is decidedly non-technical. So maybe you do need that new lens after all…

Ice in black and white

Sun, 12 Mar 2017 00:00:00 +0000

I wrote a blog post about Clojure, but I realized it was a cheap rant and deleted it. Nevertheless I must fill this week’s blogging quota. Please look at this photo instead. I love the look of ice in black and white.

The surprises of photography

Sat, 04 Mar 2017 00:00:00 +0000

Roland Barthes’s Camera Lucida is a book about what the essence of photography is. I’ve been studying it with the university photography club’s reading group. What it says about surprise got my attention.

In Chapter 14, Barthes writes about the role of surprise in photography:

I imagine (this is all I can do, since I am not a photographer) that the essential gesture of the Operator is to surprise something or someone (through the little hole of the camera), and that this gesture is therefore perfect when it is performed unbeknownst to the subject being photographed.

Barthes goes on to list five different modes of surprise.

The rare, where the subject is something uncommon like a person with two heads.
The decisive moment, where rapid action is frozen during its course. For example a person jumping out of a window.
The prowess, where the photograph displays extreme technical skill. For example a photograph of the explosion of a drop of milk.
Contortions of technique, where the photographer deliberately plays tricks with the medium. For example a photograph created by superimposition.
La trouvaille, which is the lucky finding of the perfectly composed natural scene.

Barthes mainly cares about photographs of people. I’m focused on landscape photography, though. What about that? How are you going to surprise a landscape? It does make sense: an essential part of landscape photography is finding the perfect light. You’re trying to capture the decisive moment of the sun.

The book was published in 1980. The contortions of technique may have lost some of their power since then. The digital tools have made it so easy to create impossible images that they do not have the same surprise value anymore. In general, Barthes argues that the power of photography comes from its ability to depict what has been. I wonder what Barthes would think now that the testimonial value of a photograph has greatly declined.

Barthes concludes the chapter:

In an initial period, Photography, in order to surprise, photographs the notable; but soon, by a familiar reversal, it decrees notable whatever it photographs. The ”anything whatever” then becomes the sophisticated acme of value.

This is something for us amateur photographers to keep in mind. A photograph is not interesting just because it exists.

Reading highlights from 2016

Sat, 25 Feb 2017 00:00:00 +0000

The stack of books mentioned in this post.

I want to highlight some of the best books I read in 2016. None of them were actually published in 2016 and they’re all well-known, popular books. Looks like my taste in books agrees with the masses!

Non-fiction

Dreams from My Father by Barack Obama. This is a memoir by Obama first published in 1995, way before he became president. He writes about his childhood in Hawaii and Indonesia, about his college studies, about how he worked as a community organizer in Chicago, and about how he visited his relatives in Kenya. The book conveys a sense of rootlessness: Obama isn’t quite Hawaiian, but he’s no Kenyan either. He also touches on the troubles of being black in the US, with themes similar to Ta-Nehisi Coates’s book Between the World and Me (highly recommended!). I recommend this book to everyone. (Amazon, Goodreads)

Crossing the Chasm by Geoffrey A. Moore. This is a classic marketing book about the struggles of high-tech companies when they try to mainstream their innovative products. The book observes that what is easy to sell to the early adopters, who look for game-changing innovation, does not work in the mainstream market, where fully-developed products are needed. The book then suggests that the way forward is to focus on taking over a suitable niche of the mainstream market. The book has been around for a while, so I assume that actual marketing professionals are well aware of its ideas. For me, as a tech professional working with high-tech companies, it was helpful for understanding the business environment where the companies operate. I recommend this book to tech people. (Amazon, Goodreads)

Fiction

Not Before Sundown by Johanna Sinisalo. It’s a story about a photographer called Angel, who finds a wounded troll and starts taking care of it. Soon the troll becomes entangled with Angel’s romantic and professional life. A short, unique book mixing urban life and Finnish mythology! I read it in Finnish, but I hear the English translation is good, too. I recommend this book to everyone. (Amazon with its other English title, Troll: A Love Story, Goodreads)

The Three-Body Problem by Liu Cixin. This is a bestseller, so if you read sci-fi, you’ve probably heard about it. It’s a story about an encounter with an alien species, spanning from the Cultural Revolution to the present day. The characters are flat – they seem to all be nihilists – but the sci-fi ideas are a lot of fun. Unfortunately I don’t know what to say about them without spoiling the book. I recommend this book to people who like hard sci-fi. (Amazon, Goodreads)

Technical tooling for making better software

Thu, 16 Feb 2017 00:00:00 +0000

Ripping seams is easy when you use the right tool.

David R. MacIver has a list of some things that might help you make better software. Some of the things, like continuous integration, require technical tooling in addition to adopting the practices.

If you agree with David’s list – and I do - this raises a question: does your technical stack provide the tooling needed for implementing David’s suggestions? In my case, do these tools exist for Clojure and ClojureScript? Let’s find out.

Continuous integration. No problems here: CI servers are mostly language-agnostic. If you want a hosted solution, there’s e.g. Travis and Circle, and if you prefer running your own, Jenkins is always there.

Local automated tests. For Clojure, things are okay. Test runners are not perfect, but there’s a reasonable workflow for running single tests in common editors like Emacs and Cursive, and running the whole test suite obviously works. For ClojureScript, there’s a workable setup for the whole test suite. In theory, the same editor workflow should work for ClojureScript, but in practice I’ve always encountered problems with the ClojureScript REPL.

Code coverage. For Clojure, there’s cloverage, although I’ve had some trouble making it work with all the projects. As far as I know, there’s nothing for ClojureScript yet. Maybe something could be whipped up with Istanbul and remap-istanbul?

Property-based testing. There’s test.check, which David ranks as very good. clojure.spec’s generator support makes it even nicer. It works with both Clojure and ClojureScript.

Static analysis. Cursive does some in-editor analysis with both Clojure and ClojureScript. For Clojure, there’s Eastwood, which does not work with every project, but it is nice when it works. There’s also kibit, which is less useful, but also works with ClojureScript.

Production error monitoring. I’ve actually never done this with Clojure or ClojureScript, but I know that there’s at least Sentry tooling for various Java and Clojure logging libraries. Sentry also supports JavaScript error tracking, which might be good enough for ClojureScript.

Auto formatting and style checking. At least both Cursive and Emacs have support for reformatting Clojure(Script) code. cljfmt can both check and format your code and lein-bikeshed checks some things as well. They’re not quite as advanced as something like ESLint, though.

Overall, I’d rate the situation as okay but not amazing. If you look at David’s list, having the right processes and the right culture will have a much bigger impact than having good tools. On the other hand, changing any processes involving tools is easier when the tools actually work.

Running ClojureScript tests with Karma

Wed, 08 Feb 2017 00:00:00 +0000

In the previous post I mentioned that you can use Karma to run ClojureScript tests. The easiest way to do it is to use doo or boot-cljs-test. Sometimes you need more advanced configuration and it’s best to use Karma directly. I’ve set up a GitHub repository that shows how to do it.

It’s actually pretty straightforward, but that was not obvious to me before I tried to do it.

Clojure test runner of my dreams

Thu, 26 Jan 2017 00:00:00 +0000

How do you run your Clojure unit tests? Does it make you happy?

When working on new code, I rely on CIDER’s clojure.test support. It allows you to run all tests in a namespace and then re-run the failed ones after you’ve made changes. This is good, because it makes the feedback loop tight: write code, send it to the REPL, run the tests, repeat. Cursive supports a similar workflow.

When I want to run the full test suite from the command-line, for example in a CI job, none of the available test runners makes me fully happy. For large test suites - especially the ones with integration tests - I’d like to have the following:

Output catching. If a test prints something to stdout, it should be shown only if the test fails. There’s no point in wading through thousands of lines of logs generated by successful tests to find that one actual exception.
Test tagging. Many test runners allow you to filter tests by name. I’d like to filter them by custom tags. For example, I’d like to say “run all the tests except those tagged with :large”.
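
As a rough illustration of the two essential features above, here is a minimal Python sketch of a runner that captures stdout and keeps it only for failing tests, and that filters tests by tag. It is not modelled on any existing Clojure or Python runner; the names `tagged`, `run_tests` and the `case_*` functions are made up for this example.

```python
import contextlib
import io


def tagged(*tags):
    """Attach a set of tags to a test function (hypothetical helper)."""
    def decorate(fn):
        fn.tags = set(tags)
        return fn
    return decorate


def run_tests(tests, exclude=()):
    """Run tests, skipping excluded tags.

    Returns {test name: (passed, captured output)}. Output is kept
    only for failing tests.
    """
    results = {}
    for test in tests:
        if getattr(test, "tags", set()) & set(exclude):
            continue  # filtered out by tag
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                test()
            results[test.__name__] = (True, "")  # drop output of passing tests
        except AssertionError:
            results[test.__name__] = (False, buffer.getvalue())
    return results


@tagged("large")
def case_slow_integration():
    print("talking to the database")


def case_addition():
    print("noisy log line")
    assert 1 + 1 == 2


def case_broken():
    print("value was 3")
    assert 1 + 1 == 3


results = run_tests(
    [case_slow_integration, case_addition, case_broken],
    exclude=["large"],
)
for name, (passed, output) in results.items():
    print(name, "ok" if passed else "FAILED\n" + output)
```

Only the failing test's output reaches the report, and the test tagged `large` is never run.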

Here are some less essential but still nice to have features:

JUnit output in addition to the normal output. CI tools like Jenkins or Circle know how to create nice reports from JUnit output. The reports are helpful for making sense of large test suites. clojure.test knows how to generate JUnit, but it can only do it instead of the normal output. When debugging test failures, I prefer the usual logs to the nice reports.
Reporting the slowest tests. This is not something I care about all the time, but it’s handy when you wonder why the full test suite takes 45 minutes to run.

I’ve seen or implemented all of the above in custom test runners, but none of the open-source runners – lein test, boot-test, eftest, boot-alt-test, … – offers everything in a coherent package. Basically what I want is pytest for Clojure. Supporting ClojureScript would be great, too, although you can already use Karma with karma-cljs-test.

Since I know what I want, I should just go ahead and implement this, right? Right. I wanted to try something new and first write about this to see if anyone else cares. Does anyone else miss these features?

Update 2017-04-27: Check out the update on progress!

Curry On and ZuriHac

Sun, 31 Jul 2016 00:00:00 +0000

The venue for the Curry On after party was impressive.

Last week I attended Curry On in Rome and ZuriHac in Zürich.

Curry On

The idea of Curry On is to bring together academic and industry people to talk about programming languages. There were talks about programming languages, tools for making programming languages, tools for analysing programs etc. I’d like to summarize it as language-driven development: you create a custom language in which it’s easy to express and solve your problems.

In general, the talks were of high quality. I was positively surprised by the quality of the Q&A sessions as well - especially the senior academics made insightful questions and remarks. The talk videos are available on YouTube if you want to watch them. Here are my favorites:

The Racket Manifesto by Matthias Felleisen was about how Racket is a programming language for creating languages and what this means, what problems there are and how Racket solves them. I was very impressed. This talk makes a nice pair with Larry Wall’s Perl 6 keynote - Perl 6 has a similar goal, but its philosophy is quite different.

Why the Free Monad isn’t Free by Kelley Robinson. Robinson first gave an explanation of what free monads are in the context of Scala and then discussed the cognitive costs associated with such advanced abstractions. The learning curve can be steep and it can make a piece of software unmaintainable. It’s a fair point and I haven’t seen much discussion about it.

The conference did reinforce my belief that programming languages matter. In this age of JavaScript I’ve had my doubts.

ZuriHac

ZuriHac is a Haskell hackathon. It’s one of those old-school hackathons where instead of competing, you gather together to hack on open source projects. There aren’t that many Haskell conferences, so ZuriHac has become an important meeting point for the European Haskell community. I liked this: hackathons feel more social than conferences and there are fewer talks to be ignored. Basically hackathons are like hallway-track-only conferences.

I did not get that much done, but at least I got in a small patch to Agda. Hopefully it enables some further work on the Agda JS backend, which isn’t in a very good state right now. Zürich is a beautiful city and I had a great time - I hope to be there next year as well.

Some questions (June 2016)

Tue, 28 Jun 2016 00:00:00 +0000

How does one make the perfect pancake?

Once before I wrote about some questions I couldn’t answer back then. Instead of now answering them, I’m going to ask some new questions. I’ve been thinking about these lately.

What are coder super powers? What are some examples of programmer skills that are highly valuable and very rare at the same time? Can I learn some of them?

When is it worthwhile to update your application’s dependencies? If you upgrade early, you get all the bugs and the compatibility problems. If you upgrade late, you’ll miss out on the bug fixes and the new features. I’m eager to update dependencies, but I’m not sure I can argue that it’s time well spent.

Is generative testing worthwhile? I’ve tried to use it several times and I’ve only ever found trivial bugs. Experts would argue that I’m doing it wrong. Even if not, the trivial bugs accumulate. A similar argument applies to the correctness-enforcing abilities of static typing and code review.

Is there a pattern where smart developers create software with bad architecture? If you’re smart enough, you can make software work even if it has catastrophic architecture. If you were less smart, you couldn’t deal with the complexity and would have to come up with a design that actually fits the problem.

On feeling guilty about not being good enough

Tue, 21 Jun 2016 00:00:00 +0000

Summary: Guilt is a poor source of motivation.

On a certain IRC channel, we’ve had a series of discussions about how a dabbling software developer can become an established professional. One of the recurring topics is feeling guilty for not being good enough.¹

There are a lot of highly visible cool programmers out there. Their knowledge is deep and wide, they’re pushing out popular open-source libraries, they’re giving spectacular talks, they’re working for or starting prestigious companies.

As a junior developer, it’s easy to conclude that this is what you need to do to be a professional programmer. A junior developer seldom does any of these. This can make you feel guilty.

Of course, nobody can realistically expect a junior to do those things. Not every experienced developer does those things, or wants to. It takes time and effort to have impact and to become well-known, even if you’re lucky and privileged. You eat an elephant one bite at a time, as they say.

The guilt may still be there, though, even when you know all this.

I don’t know how to get rid of it, but I want to tell you my story with guilt. It’s about math instead of code, but the logic is the same.

As a kid, I was always considered “talented” in mathematics. I did very well in math in high school and eventually ended up studying it at university. The high school math needed no effort, but the university math was harder. Turns out you have to work for your grades at university!

Some students seemed to solve the exercises and pass the exams without any effort². I felt like I should be able to do it as easily as these other students did, but I couldn’t. This made me feel bad about the exercises, which made me avoid them, which prevented me from learning to solve them, which made me feel even worse about them. The vicious circle was complete. I pretty much dropped out of the university because of this.

I don’t know what changed it, but eventually I stopped caring about how easy the exercises should be. Instead I started to focus on how much I liked learning math. Studying became something I looked forward to. It has gone to the point that I’m excited about an upcoming exam, because it’s a reason to study math.

It’s a long road to become great at what you do, whether it’s software development, mathematics, or something completely different. I hope you can find a way to enjoy your journey.

¹ There’s probably a catchy name for this. Impostor syndrome is similar but not the same.
² It was not true. I’ve later learned that they did in fact put in a lot of effort, just like everyone else who succeeds in mathematics. I just failed to notice it back then.

You'll want locally scoped CSS

Tue, 14 Jun 2016 00:00:00 +0000

Cascading is like class inheritance: mostly to be avoided.

Summary: Avoid cascading in CSS; use BEM, CSS modules, or other similar tools.

At work, we recently took over a complex web application that has been developed over many years. I’m working on improving the front-end. I like the architecture, the code is straightforward and there are tests. Based on the code bases I’ve seen, the situation looks good. There is one problem area, though: the style sheets.

The style sheets for the application have organically evolved over years, written by multiple authors. While there has been some effort to scope CSS into components, every component is a bit different and there are no clear principles on what styles should be set where. This results in two problems:

Various parts of the UI that should look the same are subtly different.
When you try to modify something or add something new, you’ll end up fighting the cascading and the specificity of the existing styles.

I imagine this is a familiar situation for many web developers! I’ve certainly been here before. What makes it different is that this time I decided to learn what’s the state of the art for architecting your CSS to avoid these problems. If you are an experienced web developer, you probably already know everything in this article.

The feature of CSS that makes it so easy to mess up is that there’s a single global scope where all your styling rules are interacting. Your styles will accidentally cascade and then you’re in trouble. The solution is to basically create local scopes in CSS, one way or another. I learned that there are two groups of solutions: the manual ones and the more recent automated ones.

The manual way is to structure your CSS in a principled way. There are a bunch of methodologies with similar ideas: BEM, SMACSS, OOCSS etc. I chose BEM, short for block-element-modifier, because it is the simplest one and as such seemed like a good starting point. The basic idea is the following:

Only ever use class selectors.
Name your classes according to the block-element-modifier convention.
No cascading.

This ensures that the styles you want, and only those, get applied to exactly where you want them. Based on my brief experience, it works.
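
To make the naming convention concrete, here is a small Python sketch that builds class names in the widely used `block__element--modifier` spelling. The helper `bem` and the `card` example are hypothetical, and some teams spell modifiers with a single underscore instead.

```python
def bem(block, element=None, modifier=None):
    """Build a BEM class name such as 'card__title--large'."""
    name = block
    if element:
        name += "__" + element   # element belongs to the block
    if modifier:
        name += "--" + modifier  # modifier marks a variant or state
    return name


# Each rule targets exactly one flat class selector, so nothing cascades.
selectors = [
    "." + bem("card"),
    "." + bem("card", "title"),
    "." + bem("card", "title", "large"),
    "." + bem("card", modifier="active"),
]
print("\n".join(selectors))
```

Because every selector is a single class, all rules have the same specificity and a component's styles cannot leak into another component by accident.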

The automated way is to write CSS as usual and then apply it only to specific parts of your DOM tree by using tools like CSS modules or Polymer’s shadow DOM style encapsulation. The good part is that the tools require less manual work and less discipline than BEM. I chose to skip them this time, because they require quite a bit of infrastructure that does not (yet) exist in my project.

I’ve looked at BEM before. Back then, I thought it was suspiciously verbose and did not see the point. I still think it’s verbose, but now I see that it’s also useful.

Thanks to Mikhail and Santeri for pointers.

Why look beyond JavaScript?

Tue, 07 Jun 2016 00:00:00 +0000

A dangerous slope ahead.

There are a number of interesting programming languages that you could use in the browser instead of JavaScript. ClojureScript, TypeScript, PureScript, and Elm are some of the prominent examples among many others. The controversial question is: why would you use any of them instead of JavaScript?

After all, JavaScript has good tools and a huge, fast-moving ecosystem. As a language, JavaScript used to be quite awkward, but it has improved a lot in recent years. ECMAScript 2015 in particular was a great improvement and made JS so much nicer to write.

My answer is that (some of) the alternatives are easier to think about. You can get features like immutable data structures as the standard, static typing, and algebraic data types. All of these help to make the intent of your code clear. They bring crispness to your code.

You can get immutable data structures via libraries for JavaScript (e.g. mori and Immutable.js) and there are static type checkers as well (e.g. Flow). The problem is that they are not widely used. When you want to use third-party libraries, you need to convert back to mutable data and come up with type annotations.

The alternative languages can also give you simpler semantics. JavaScript has not been able to escape the complicated this or the uneasy use of objects as maps. Programmers structure their code to avoid the traps, but it’s hard to avoid them entirely. Elm is an impressive example of how you could do something different here.

This is why I advocate alternatives to JavaScript. Coincidentally this is also why I advocate functional programming techniques. Code that is easy to think about has fewer bugs and is easier to change.

What is descriptive set theory?

Tue, 24 May 2016 00:00:00 +0000

Today I took the final exam in a course on descriptive set theory. Let me tell you about it.

Sets are a basic structure in mathematics. They’re very generic, so they come up everywhere - everybody needs a collection of things, after all. Because they’re so generic, you can’t say much about them. To do something useful with a set, you need to know more about it. You have to add some structure.

My mental image of this is that a set is like a blob of jelly. If you try to pick it up and move it somewhere, you’ll just mess up everything. Put it on a plate, though, and it’s easy to move around.

One of the ways to add structure is topology. In topology, you describe the open sets of your space. They behave very nicely, but they have too much structure. Not that many interesting sets are open.

Descriptive set theory finds a middle-ground by defining Borel sets. The collection of Borel sets of a topological space is the smallest collection of sets that includes the open sets and is closed under complement and countable unions and intersections. They’re the sets you can make out of open sets by these basic operations. To make even more sets, you can define the projective hierarchy where you start with Borel sets and iteratively project them.
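
Spelled out in symbols, assuming $\tau$ denotes the collection of open sets of the space $X$ (notation chosen for this sketch, not taken from the course):

```latex
\mathcal{B}(X) = \bigcap \left\{ \mathcal{A} \subseteq \mathcal{P}(X) \;:\; \tau \subseteq \mathcal{A},\ \mathcal{A} \text{ is closed under complements and countable unions} \right\}
```

Closure under countable intersections then comes for free by De Morgan's laws: $\bigcap_n A_n = X \setminus \bigcup_n (X \setminus A_n)$.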

After taking the course, I don’t know where you would want to use Borel sets or projective sets, but if you do, they will behave quite well.

How I solved the Orbital Challenge

Tue, 17 May 2016 00:00:00 +0000

Reaktor, a local IT consulting company, recently posted a programming puzzle called Reaktor Orbital Challenge. You’re given a starting point and an end point on the globe and the locations for a bunch of communication satellites. The task is to find a route from the starting point to the end point via the satellites. Preferably the route should be the shortest possible one. The satellites and the ground stations can only communicate if they have a line-of-sight.

An Oculus Rift was raffled among the participants. This generated some buzz among the Finnish programming community. I especially enjoyed the low-effort solutions like copy-pasting the coordinates to Google Maps and eyeballing the route.

I have a bit of a love/hate relationship with these kinds of recruitment puzzles: I wouldn’t want to spend time on them, but they nerd-snipe me so easily. Therefore I decided to aim for a low-effort solution that still finds the shortest path. Let me tell you about it.

I wrote my solution in Clojure and used the graph library Loom, the linear algebra library core.matrix and the utility library potpuri. You can find the full source here.

The idea

Finding the shortest route is an instance of the well-known problem of finding the shortest path between two nodes in a weighted graph. Satellites are the nodes and they’re connected if they can see each other. The weight of an edge is the distance between satellites. The ground stations need to be included as well.

There are so few satellites that you do not need to do anything clever - using a standard algorithm would be fine performance-wise. I looked up graph libraries for Clojure and decided to go with Loom, which has an implementation of Dijkstra’s algorithm.
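
For illustration, the algorithm fits in a few lines of Python. This is a from-scratch sketch using a binary heap, not Loom's implementation, and the node names and weights in the toy graph are made up.

```python
import heapq


def shortest_path(edges, start, end):
    """Dijkstra's algorithm over non-negative edge weights.

    edges maps a node to a list of (neighbor, weight) pairs.
    Returns (total distance, path) or None if end is unreachable.
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == end:
            return dist, path
        if node in visited:
            continue  # already settled via a shorter route
        visited.add(node)
        for neighbor, weight in edges.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return None


# Toy graph with made-up distances; edges are listed in one direction only.
toy_edges = {
    "start": [("SAT0", 700.0), ("SAT1", 900.0)],
    "SAT0": [("SAT1", 100.0), ("end", 1500.0)],
    "SAT1": [("end", 800.0)],
}
print(shortest_path(toy_edges, "start", "end"))
```

In the real puzzle visibility is symmetric, so each edge would be listed in both directions.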

The locations are given in longitude/latitude/altitude coordinates. I realized that you could do the calculations in this coordinate system, but I didn’t know how to do it and I didn’t know if it would be easy (turns out it is). Converting to the Euclidean coordinates is easy and I knew how to work with them, so that’s what I did.

How do you build the graph, then? Calculating the weight is easy, we can just use the Euclidean distance of the points. All that is left is to check that the points can see each other. As we will see below, it’s not very hard either.

Coordinate transform

Initially I did the coordinate transform the same way as I’ve done it since I was 15 years old: by looking up rotation matrices and multiplying them. Later I realized I should’ve looked up this thing instead. The problem statement says that Earth is a perfect sphere with the radius of 6371 km. Thus in Wikipedia’s formulas, $N(\phi) = 6371\ \textrm{km}$ and $e = 0$. In Clojure:

(def earth-radius 6371)

(defn geo->xyz
  "Convert latitude/longitude/altitude to Euclidean coordinates."
  [[lat lon alt]]
  (let [h (+ earth-radius alt)
        lat (to-radians lat)
        lon (to-radians lon)]
    [(* h (Math/cos lat) (Math/cos lon))
     (* h (Math/cos lat) (Math/sin lon))
     (* h (Math/sin lat))]))

I chose to represent x-y-z vectors as three-element Clojure vectors. This is both concise and understood by core.matrix.

When doing this conversion, you have to fix the origin and the directions for the Euclidean axes. I did it the “obvious” way:

We use a right-handed coordinate system.
The origin is in the center of the Earth.
The positive Z axis goes through the North Pole.
The positive X axis goes through the prime meridian.

When looking the formulas up, I learned that this way has a name. It’s called the Earth-Centered, Earth-Fixed coordinate system.
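
The same transform can be sanity-checked against the conventions listed above with a short Python sketch. The function name `geo_to_xyz` is an assumption for this example; the radius is the one given in the problem statement.

```python
import math

EARTH_RADIUS_KM = 6371.0  # perfect sphere, as stated in the puzzle


def geo_to_xyz(lat_deg, lon_deg, alt_km):
    """Convert latitude/longitude (degrees) and altitude (km) to ECEF x, y, z in km."""
    r = EARTH_RADIUS_KM + alt_km
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return (
        r * math.cos(lat) * math.cos(lon),
        r * math.cos(lat) * math.sin(lon),
        r * math.sin(lat),
    )


# The equator at the prime meridian lies on the positive X axis,
# and the North Pole lies on the positive Z axis.
equator_prime = geo_to_xyz(0.0, 0.0, 0.0)
north_pole = geo_to_xyz(90.0, 0.0, 0.0)
print(equator_prime, north_pole)
```

Whatever the latitude and longitude, the length of the resulting vector should equal the Earth's radius plus the altitude, which makes for an easy check.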

Line of sight

Next, we have two satellites and the Earth. How do we know if the satellites can see each other and that the Earth is not between them?

One way is to write the equations for the sphere of Earth and the line segment between the satellites and to solve for the intersection points. Another way is to calculate the distance between the line segment and the center of Earth. If the distance is less than Earth’s radius, there’s an intersection.

I always think about the distance between a line and a point as projecting the point to the line. It was quick to write the required calculations and I even got them right on the first try! The vector operations are kindly provided by core.matrix.

(defn point-line-segment-dist
  "Compute the shortest distance between the line segment a-b and point c."
  [a b c]
  (let [k (- b a)
        l (- c a)
        ;; position of c's projection along a-b, as a fraction of k's length
        t (/ (dot l k) (dot k k))]
    (distance (+ a (* (clamp t 0 1) k)) c)))

(defn line-of-sight? [a b]
  (<= earth-radius (point-line-segment-dist a b [0 0 0])))
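
Here is the projection idea again as a dependency-free Python sketch, with points as plain 3-tuples in kilometres. The function names and the sample coordinates are assumptions for this example. The projection parameter is normalised by the squared length of the segment so that clamping it to [0, 1] keeps the closest point on the segment.

```python
import math

EARTH_RADIUS_KM = 6371.0


def point_segment_distance(a, b, c):
    """Shortest distance between the line segment a-b and the point c."""
    k = [bi - ai for ai, bi in zip(a, b)]  # direction of the segment
    l = [ci - ai for ai, ci in zip(a, c)]  # from a to the point
    kk = sum(x * x for x in k)             # squared length of the segment
    # Fraction along a-b where c projects; 0 if the segment is a single point.
    t = 0.0 if kk == 0 else sum(x * y for x, y in zip(l, k)) / kk
    t = max(0.0, min(1.0, t))              # stay within the segment
    closest = [ai + t * ki for ai, ki in zip(a, k)]
    return math.dist(closest, c)


def line_of_sight(a, b):
    """True if the segment a-b does not pass through the Earth."""
    return point_segment_distance(a, b, (0.0, 0.0, 0.0)) >= EARTH_RADIUS_KM


# Two points on opposite sides of the Earth cannot see each other...
print(line_of_sight((7000.0, 0.0, 0.0), (-7000.0, 0.0, 0.0)))
# ...but two nearby points well above the surface can.
print(line_of_sight((7000.0, 0.0, 0.0), (7000.0, 1000.0, 0.0)))
```

`math.dist` needs Python 3.8 or later.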

Building the graph

The input file looks something like this:

#SEED: 0.24904920277185738
SAT0,78.47003444920836,-80.16227317274806,556.3585069486544
SAT1,18.2619020748309,89.92023247596006,367.93737770788107
# ...lines omitted...
SAT19,43.337390738091216,117.63969735544856,541.2530580475883
ROUTE,-44.35263069870813,162.23137604080517,62.2777848501151,-90.67241530334475

I’m not going to bore you with the parsing code. I parsed the input into a map like this:

{"SAT0" [78.47003444920836 -80.16227317274806 556.3585069486544]
 ;; ...lines omitted...
 :start [-44.35263069870813 162.23137604080517 0]
 :end [62.2777848501151 -90.67241530334475 0]}

Now all that is needed is a sequence of edges and weights for Loom:

(defn build-graph [data]
  (let [xyz-data (map-vals geo->xyz data)]
    (apply weighted-graph
           (for [[key1 pos1] xyz-data
                 [key2 pos2] xyz-data
                 :when (and (not= key1 key2)
                            (not= #{:start :end} (set [key1 key2]))
                            (line-of-sight? pos1 pos2))]
             [key1 key2 (distance pos1 pos2)]))))

Our work here is done.

(defn get-route [graph]
  (butlast (rest (dijkstra-path graph :start :end))))

For the full solution, see my gist. It took me 80 lines of Clojure and about an hour, so the effort was sufficiently low.

Mamiya RB67 Pro-S

Tue, 10 May 2016 00:00:00 +0000

I thought I wouldn’t write about cameras on this blog, but I’ve been shooting with such a cool camera that I want to tell you about it. It’s a Mamiya RB67 Pro-S with a 127 mm lens, lent to me by a friend.

RB67 is a medium format SLR camera from the 1970s. Medium format means it takes 6x7 cm photos - this is in contrast to the smaller 35 mm format and to the larger 4x5 inch format. It’s actually a camera system - in addition to the lens, the film back and the finder are interchangeable. My friend has only got the one lens and the waist-level finder, though.

Taking a picture with the camera is quite an involved operation compared to modern cameras. You have to take out the dark slide that protects the film back from light. You have to turn a lever to cock the shutter and after taking the picture, you get to turn another lever to wind the film. You can rotate the film back, so you can take both horizontal and vertical pictures without rotating the whole camera.

There’s no light meter, either. I’ve got a vintage light meter, but I’m not sure how reliable it is, so I’ve been relying on my other cameras’ light meters. It has worked fine so far. The film backs I’ve got take 120 roll film. Luckily it’s still widely available, although it’s not as common as 35 mm film. The photography shops will develop it as well.

I like the waist-level finder a lot. Waist-level means that you look at it from above. I don’t know why, but everybody looks so great through the finder that I just have to take a picture of them. Maybe it’s because I’m tall, so my waist level is more suitable for photographing people than my eye level.

The camera is so bulky that I assume it was meant to be used with a tripod. Probably it was more at home in a studio than on the street. Lugging around a tripod is not my thing, so I got a neck strap and have been shooting outdoors. (OP/Tech Super Prop Strap with the B type connector fits the camera.)

So far a shop has developed the films for me and I’ve got only proofs made. Eventually I’d like to get a big print of some photo I’ve taken with this camera. Unfortunately this means you get to enjoy scans of the proofs. Oh well.

I’ve had some problems, though. In the last two rolls, some of the pictures had over-exposed bands like this. The band is always in the same part of the frame, but it’s not always fully burnt-out. Maybe it’s a hardware problem, or maybe I’ve made some mistake. If you have any ideas, let me know.

Anyway, medium format is cool.

What's the point of dependent types?

Tue, 03 May 2016 00:00:00 +0000

There are many languages for telling computers what to do. In some languages, you have to tell the computer what kind of things it can use to do something. Often this is good, like when stopping the computer from confusing numbers with words.

In most languages you can say simple things like “this is a number” or “this is a group of words”. The computer will check that what you say is true before it does anything.

In some languages, you can say much more. You can say things like “this is a number and it is bigger than zero” or “if this thing is a group of words, then this other thing must be a group of words with an extra word”. Again the computer will check this. This means you can better prove that the computer will do what you think it should do.

This is good, because this way you can be sure that the computer does what you want. The problem is that so far it is hard work to make the computer understand what you want. Only people who have trained a lot can do it.

This was an attempt to briefly explain the point of dependently typed programming languages using only the 1000 most common English words (according to Cleartext).

We're in early days of software engineering

Mon, 25 Apr 2016 00:00:00 +0000

Whenever there’s a discussion about whether software development is engineering or not, somebody brings up the fact that we software developers mostly do not know what we’re doing, unlike bridge builders. How do we know that bridge builders know what they’re doing? It’s because bridges mostly do not collapse, unlike software projects.

Civil engineering is not the only kind of engineering, though. For example, there’s nuclear engineering.

Wikipedia has some interesting articles about accidents in the early days of nuclear engineering. See for example the demon core incidents, where two scientists fatally irradiated themselves while performing measurements on an almost-critical sphere of plutonium. Or see the Windscale fire, where it took almost 48 hours for the operators to confirm that their nuclear reactor was on fire.

By today’s standards these incidents seem almost absurd, yet as far as I can tell, the Los Alamos scientists as well as the Windscale designers and operators were well-educated experts.

Sometimes I feel that software engineering is like early nuclear engineering. At least we only leak data, not radioactive material.

Finding that lemma: Coq search commands

Tue, 19 Apr 2016 00:00:00 +0000

No hay in this needlestack.

One of the hurdles in using Coq is finding suitable lemmas in the standard library. There are lots of them and while the naming is consistent, it’s hard to remember all of them. Luckily Coq has search commands to help you out.

Note: The following commands work only on modules you have required. If a lemma exists, but you haven’t required its module, you’re out of luck. Also, before Coq 8.5 Search was called SearchAbout and SearchHead was called Search.

The simplest way to search is to search by name. This is one of the things the Search command does:

Coq < Search "len".
length: forall A : Type, list A -> nat

You can also search for theorems (or other objects) whose statement contains a given identifier.

Coq < Search False.
False_rect: forall P : Type, False -> P
False_ind: forall P : Prop, False -> P
(* ... *)

Coq < Search 0.
nat_rect:
  forall P : nat -> Type,
  P 0 -> (forall n : nat, P n -> P (S n)) -> forall n : nat, P n
nat_ind:
  forall P : nat -> Prop,
  P 0 -> (forall n : nat, P n -> P (S n)) -> forall n : nat, P n
(* ... *)

Another thing you can do is to search for patterns with holes _:

Coq < Search (S _ <= _).
le_S_n: forall n m : nat, S n <= S m -> n <= m
le_n_S: forall n m : nat, n <= m -> S n <= S m

When searching for a pattern, Search matches anywhere in the statement. If you only want to search for the pattern in the conclusion, use SearchPattern:

Coq < SearchPattern (S _ <= _).
le_n_S: forall n m : nat, n <= m -> S n <= S m

If you’re looking for a rewrite, there’s SearchRewrite. It finds conclusions of type _ = _ where one of the sides matches the given pattern.

Coq < SearchRewrite (_ + 0).
plus_n_O: forall n : nat, n = n + 0

As always, see the manual for details. Coq’s manual looks intimidating, but it does contain a lot of good information.

How to divide by zero?

Tue, 12 Apr 2016 00:00:00 +0000

What’s the benefit of dependent types, anyway? Pyry pointed this out to me: they allow you to make your functions total by moving the preconditions to the caller side.

You often end up with partial functions because of some preconditions. For example, you might write an integer division function, but division by zero isn’t defined. How do you handle this? In Haskell, you get a runtime exception.

λ> 1 `div` 0
*** Exception: divide by zero
λ> 1 `rem` 0
*** Exception: divide by zero

Elm tries to avoid runtime exceptions and it makes division total by extending the usual definition of division:

> 1 // 0
0 : Int
> 1 `rem` 0
NaN : Int

The unorthodox result for 1 `rem` 0 is likely a bug. This solution quietly breaks the invariant x == (x // y) * y + x `rem` y, but it’s not a big deal. Coq does the same thing. Another solution would be to make the division function return Maybe. In Haskell:

safeDiv :: Integral a => a -> a -> Maybe a
safeDiv a 0 = Nothing
safeDiv a b = Just (a `div` b)

This means that you have to lift all your division-using computations into Maybe. A language like Coq offers you yet another possibility: you can demand that the caller proves that the divisor is non-zero:

Require Import Arith.
Require Import Nat.

Definition safeDiv x (y : { n : nat | 0 < n }) : nat :=
  match y with
    | exist _ O pf => match lt_irrefl _ pf with end
    | exist _ (S y') _ => div x (S y')
  end.

This is a total function. If you want to call it, you have to do it along with a proof that y is non-zero. For example, divide 3 by 2:

Example div_3_2 : nat := safeDiv 3 (exist _ 2 Nat.lt_0_2).
Eval compute in div_3_2. (* = 1 : nat *)

We could try dividing 3 by 0. Let’s do it in type-driven style with the refine tactic. It allows us to leave holes (_) in a term and fill them using Coq’s goal mechanism:

Example div_3_0 : nat.
  refine (safeDiv 3 (exist _ 0 _)).

Here’s the goal we get:

1 subgoal, subgoal 1 (ID 9)
  
  ============================
   0 < 0

Good luck with that.

Take better photos by looking carefully

Tue, 05 Apr 2016 00:00:00 +0000

The Internet is full of advice about the technical aspects of photography, but it’s much harder to find good advice on the artistic aspects. Therefore I’ve come up with some advice of my own.

My friends tell me that a fundamental part of learning to paint or draw is to learn to see the scene properly. You may think you know what a scene looks like, but when you try to draw it, you quickly notice how poorly you have observed it. This is one of the attractions of drawing a live model: it’s very easy to notice that your drawing does not match what you’re seeing.

I’ve been mindful about this while photographing and it has helped me. For example, my film photos are better than my digital ones. The main reason is that I’m so slow at operating my film cameras that I end up looking at the scene more carefully while shooting, leading to better pictures.

Spend more time looking at what you’re shooting. And do not just look at the scene - see it. What do you see? How is the light? What makes the scene interesting? What are you trying to capture?

Spend more time looking at what you’ve shot. Does it match what you saw in the scene? What makes the picture interesting?

defaultdicts all the way down

Tue, 29 Mar 2016 00:00:00 +0000

You may know the Haskell function fix:

fix :: (a -> a) -> a
fix f = let x = f x in x

This function applies its argument to itself. It’s called fix because it finds a (least-defined) fixed point of the function: f (fix f) == fix f. Here are some examples:

> take 10 $ fix (1:)  -- fix (1:) == [1,1..]
[1,1,1,1,1,1,1,1,1,1]
> fix (\f n -> if n == 0 then 1 else n * f (pred n)) 5  -- factorial
120

It’s not a function you need very often, but the other day I needed it in Python! I wanted to have a defaultdict that defaults to defaultdict that defaults to defaultdict that defaults… all the way down. This is a fixed point of defaultdict. Here we go:

def fix(f):
    return lambda *args, **kwargs: f(fix(f), *args, **kwargs)

We have to wrap f inside a lambda so that it’s not evaluated when fix is called. Let’s try it out:

>>> from collections import defaultdict
>>> d = fix(defaultdict)()
>>> d["a"]["b"]["c"]
defaultdict(<function <lambda> at 0x105c4bed8>, {})
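Here is a slightly longer sketch of how such a dictionary behaves when you assign into it. The helper to_dict and the alternative tree function are additions for illustration only; tree is the more conventional, directly recursive way to get the same structure.

```python
from collections import defaultdict


def fix(f):
    # Same definition as above: the lambda delays the recursive call.
    return lambda *args, **kwargs: f(fix(f), *args, **kwargs)


d = fix(defaultdict)()
d["a"]["b"]["c"] = 1  # intermediate levels are created on demand
d["a"]["x"] = 2


def to_dict(node):
    """Convert nested defaultdicts into plain dicts for printing or comparison."""
    if isinstance(node, defaultdict):
        return {key: to_dict(value) for key, value in node.items()}
    return node


plain = to_dict(d)


def tree():
    """Direct recursion: a defaultdict whose default is another tree."""
    return defaultdict(tree)
```

Note that merely reading a missing key, as in the REPL session above, also creates it.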

You can bet I was feeling clever when I wrote this.

Runtime exceptions in Elm

Tue, 22 Mar 2016 00:00:00 +0000

Today was the first Elmsinki meetup, where we gathered to discuss Elm the programming language. Ossi gave us a quick introduction to Elm. One of the points in his Elm elevator pitch was that there are no runtime exceptions. I asked what this means, but ultimately misunderstood the explanation. After thinking it through, here’s my current understanding:

Elm does not have an exception system. There’s no mechanism for throwing and catching exceptions. You might be able to build one, though.

You can have runtime errors in Elm. For example, there’s Debug.crash : String -> a, which is equivalent to Haskell’s error :: String -> a. They both abort the computation - there’s no way to handle the error. You can use this to define partial functions:

unsafeHead x =
  case x of
    (y :: _) -> y
    _ -> Debug.crash "oh no :("

There are also some other ways to get a runtime error, like running out of stack:

> g x = 0 + g x
<function> : a -> number
> g 0
RangeError: Maximum call stack size exceeded

You won’t have pattern matching errors in Elm. You have to always handle all the cases. We might try to define unsafeHead like this:

unsafeHead (x :: _) = x

The Elm compiler does not accept this and prints an error message:

This pattern does not cover all possible inputs.

6│ unsafeHead (x :: _) = x
               ^^^^^^
You need to account for the following values:

    []

When I heard “no runtime exceptions”, I first thought of total languages. Elm clearly isn’t one. I guess “no runtime exceptions” means “runtime exceptions/errors are rare”. Fair enough.

Multitenant Flask-SQLAlchemy

Tue, 15 Mar 2016 00:00:00 +0000

Just one flask. The multiple databases are not pictured.

So you’re writing a web backend with Flask and Flask-SQLAlchemy. Now you want to make the same backend connect to different databases based on the request parameters. What do you do?

Flask-SQLAlchemy supports multiple databases through the bind mechanism. The binds allow you to specify in which database each table lives. What I want to do is to choose the database for all the tables in one go but on per-request basis.

I couldn’t find a definitive solution from the Internet, so I’ll share what I came up with. Here is a small extension of Flask-SQLAlchemy that allows you to (ab)use binds for this:

from flask import g
from flask_sqlalchemy import SQLAlchemy


class MultiTenantSQLAlchemy(SQLAlchemy):
    def choose_tenant(self, bind_key):
        if hasattr(g, 'tenant'):
            raise RuntimeError('Switching tenant in the middle of the request.')
        g.tenant = bind_key

    def get_engine(self, app=None, bind=None):
        if bind is None:
            if not hasattr(g, 'tenant'):
                raise RuntimeError('No tenant chosen.')
            bind = g.tenant
        return super().get_engine(app=app, bind=bind)

We essentially have a per-request default bind for all the tables without a bind key. Now, before you do any database queries, do db.choose_tenant(name). This tells SQLAlchemy which bind to use. For example, you could implement the tenant choosing logic in the @app.before_request hook:

from flask import Flask, request

app = Flask(__name__)
app.config['SQLALCHEMY_BINDS'] = {
    'test1': 'sqlite:///test1.db',
    'test2': 'sqlite:///test2.db'
}
db = MultiTenantSQLAlchemy(app)


@app.before_request
def before_request():
    # Just use the query parameter "tenant"
    db.choose_tenant(request.args['tenant'])

Now http://localhost:5000/?tenant=test1 goes to test1.db and http://localhost:5000/?tenant=test2 goes to test2.db.

It was surprisingly simple to make this work. Making Alembic work with this is left as an exercise for the reader.

The full source for the demo is available.

What is forcing, anyway?

Tue, 08 Mar 2016 00:00:00 +0000

Water forces its way through the forest in Nuuksio.

Today was the final exam in a course on set-theoretical forcing. It was one of the hardest courses I’ve attended, but at least the exam was easy. But what the heck is forcing anyway?

It’s a technique for independence proofs. It was originally developed by Paul Cohen for proving the independence of the Continuum Hypothesis (CH) from Zermelo-Fraenkel set theory with the Axiom of Choice (ZFC).

A theory is consistent if it does not allow contradictions. For example, ZFC is thought to be consistent (although you can’t prove it in ZFC), so you can’t derive a contradiction from the axioms of ZFC.

An axiom is independent of a theory if you can’t prove or disprove it from the theory. You can prove the independence by showing that the theory is consistent with the axiom and with the negation of the axiom. Assuming the consistency of ZFC, you can prove that ZFC together with CH is consistent. Using forcing, you can also prove that ZFC together with the negation of CH is consistent. Thus CH is independent of ZFC.

How does this work in practice? We assume the existence of a countable transitive model of ZFC, $V$. Then we come up with a partially-ordered set (forcing poset) that is used to construct a generic extension of the model, $V[G]$. This model is constructed so that it witnesses whatever we want to prove. Its existence proves the claim.¹

To prove that ZFC is consistent with the negation of the continuum hypothesis, i.e. $2^\omega > \omega_1$, we would take a cardinal $\kappa$ that is larger than $\omega_1$ in $V$. We then construct $V[G]$ so that there are at least $\kappa$ subsets of $\omega$. Since $V$ and $V[G]$ have the same cardinals, $\kappa$ is still larger than $\omega_1$ in $V[G]$, and so $2^\omega \geq \kappa > \omega_1$.

The tricky part is finding a suitable forcing poset. One of the ways to make it easier is to use iterated forcing, where you repeat the forcing a transfinite number of times. I’d tell you how it works, but unfortunately I don’t understand it.

I definitely do not understand this part, but I trust the authorities. ↩︎

Elementary algebra in Coq: Trivial group

Tue, 01 Mar 2016 00:00:00 +0000

In this series, I’m taking small steps towards understanding Coq.

Recall our definition of groups in Coq from the last time:

Structure group :=
  {
    G :> Set;

    id : G;
    op : G -> G -> G;
    inv : G -> G;

    op_assoc : forall (x y z : G), op x (op y z) = op (op x y) z;
    op_inv_l : forall (x : G), op (inv x) x = id;
    op_id_l : forall (x : G), op id x = x
  }.

This is a record. You can construct a value of type group by providing a value for all the fields. By default, the constructor is called Build_group:

Build_group
     : forall (G : Set) (id : G) (op : G -> G -> G) (inv : G -> G),
       (forall x y z : G, op x (op y z) = op (op x y) z) ->
       (forall x : G, op (inv x) x = id) ->
       (forall x : G, op id x = x) -> group

This is also the reason why we didn’t include the right-hand side versions of op_inv_l and op_id_l into the definition of group. If we did, you’d have to provide proofs of the right-hand side laws when constructing a group value.

We can construct the trivial group over unit:

Example trivial_group : group.
  refine (Build_group unit tt (fun _ _ => tt) (fun _ => tt) _ _ _).
  - intros. auto.
  - intros. auto.
  - intros. destruct x. trivial.
Defined.

More interestingly, we can define the additive group of integers Z. Since Coq’s standard library contains a good set of properties for Z, defining the group is straightforward.

Require Import Coq.ZArith.BinInt.
Open Scope Z.

Example Z_add_group : group.
  refine (Build_group Z (0 : Z) Z.add Z.opp Z.add_assoc Z.add_opp_diag_l _).
  - trivial.
Defined.

Please give demanding tech talks

Tue, 23 Feb 2016 00:00:00 +0000

I recently attended a tech conference. It was a well-run event. There were talks about a good selection of topics and all the speakers were good, even great at presenting. The food was good and I enjoyed the after-party. Yet there was a problem: I was bored. None of the talks went over my head.

This problem isn’t unique to this event. When I watch conference videos, the situation is the same. Introductory talks are everywhere, deep talks are few and far between.

I wish this wasn’t the case. When I attend a tech talk, I want to struggle to understand it¹. If you do not need to put in any effort to understand an idea, are you even learning anything?

Tech speakers, please give demanding, ambitious talks! Tech event organizers, please invite such talks to your event!

Every time I talk about this with friends, they shrug and say “I thought it was okay”. Maybe it’s just me.

Preferably because of the depth and breadth of the ideas, not because of poor presentation. ↩︎

Elementary algebra in Coq: Defining a group

Tue, 16 Feb 2016 00:00:00 +0000

When I was first learning about theorem provers, one of the first things I wanted to do was to formalize some of the things I had learned about abstract algebra. Abstract algebra should be easy to formalize, since it’s so axiomatic.

How would you formalize groups, then? One of the ways is to use structures (records):

Structure group :=
  {
    G :> Set;

    id : G;
    op : G -> G -> G;
    inv : G -> G;

    op_assoc : forall (x y z : G), op x (op y z) = op (op x y) z;
    op_inv_l : forall (x : G), op (inv x) x = id;
    op_id_l : forall (x : G), op id x = x
  }.

This record contains the underlying set G, the group operations and also witnesses of the group axioms. Now you can state theorems for all groups with forall (X : group) and you have access to the axioms. :> means a coercion from the group to the underlying set, so if you have group X, you can write g : X instead of g : G X.

You can make some of the arguments implicit and define a notation to make the theorems easier to state.¹

Arguments id {g}.
Arguments op {g} _ _.
Arguments inv {g} _.

Notation "x <.> y" := (op x y) (at level 50, left associativity).

Now we can state and prove a simple theorem, namely that in all groups, $f \circ f = f$ implies $f$ is the identity element.

Theorem square_is_unique (G : group) :
  forall (f : G), f <.> f = f -> f = id.
Proof.
  intros f H1.
  rewrite <- (op_id_l G f), <- (op_inv_l G f), <- op_assoc. 
  f_equal.
  assumption.
Qed.

Note that I defined group with only the left-hand side versions of op_inv_l and op_id_l. Deriving the right-hand versions is left as an exercise for the reader.

Update 2016-02-27: Simplified the proof for square_is_unique. The original version is here.

You could also set up some hints for auto and autorewrite to make the theorems easier to prove. My CoqIDE just crashed and ate my hints, so they will have to wait for the next time. ↩︎

What's in a proof?

Tue, 09 Feb 2016 00:00:00 +0000

Let’s work through this very simple theorem in Coq:

Theorem plus_n_O :
  forall (n : nat), n = n + 0.
Proof.
  intros n.
  induction n; simpl.
  - reflexivity.
  - rewrite <- IHn. reflexivity.
Qed.

The theorem, called plus_n_O, states that n + 0 equals n for all natural numbers n (represented by the inductive datatype nat).

The first two lines are the theorem statement and below them is the proof script. If you look at the script, you notice that it’s similar to what you’d do in a pen-and-paper proof: use induction on n, evaluate + in both cases, use induction hypothesis in the inductive step.

Another way of viewing this code snippet is that we define a function plus_n_O that, given a natural number n, returns a value of type n = n + 0. Both of these interpretations are valid - this idea is known as propositions as types.

The proof script does not look much like a function, but it does generate one. With the command Print plus_n_O, we can look at the proof object it generates:

plus_n_O = 
fun n : nat =>
nat_ind
  (fun n0 : nat => n0 = n0 + 0)
  eq_refl
  (fun (n0 : nat) (IHn : n0 = n0 + 0) =>
   eq_ind n0 (fun n1 : nat => S n0 = S n1) eq_refl (n0 + 0) IHn)
  n
     : forall n : nat, n = n + 0

What’s going on here? nat_ind is the induction principle for nat, eq_refl is the constructor for the equality type ? = ? and eq_ind is the induction principle for the equality type.

Coq < Check nat_ind.
nat_ind
     : forall P : nat -> Prop,
       P 0 -> (forall n : nat, P n -> P (S n)) -> forall n : nat, P n

Coq < Check eq_ind.
eq_ind
     : forall (A : Type) (x : A) (P : A -> Prop),
       P x -> forall y : A, x = y -> P y

Let’s go through the arguments of nat_ind in plus_n_O:

(fun n0 : nat => n0 = n0 + 0): this is the proposition we want to prove inductively.
eq_refl: this is the base case. 0 + 0 is convertible to 0, so nothing else is needed.
(fun (n0 : nat) (IHn : n0 = n0 + 0) => eq_ind n0 (fun n1 : nat => S n0 = S n1) eq_refl (n0 + 0) IHn): this is the inductive case. We’ve proven the proposition for n0 and want to prove it for S n0. IHn is the induction hypothesis.
n: the final argument tells that we apply the inductive proof to n that plus_n_O is quantified over.

In the inductive case, eq_ind is used to rewrite S n0 = S n0 into S n0 = S (n0 + 0), which is convertible into S n0 = S n0 + 0, which is what we want.

I’m not sure if this write-up helps anyone else, but it was helpful for me to work through this example to better understand the relationship between a proof script and a proof term. I recommend the exercise.

Getting started with Coq

Tue, 02 Feb 2016 00:00:00 +0000

Here are some resources for programmers who want to get started with Coq:

Coq is best used as an interactive theorem prover. When you’re learning, you’ll want to be able to jump back and forth between the steps in your proof scripts, comment them out etc. To be able to do that, I recommend that you use CoqIDE instead of your usual editor and the REPL (coqtop). CoqIDE is not the best editor out there, but it supports proof navigation and it’s easy to set up. In case your usual editor is Emacs, you can use Proof General as well.
The best tutorial out there is the book Software Foundations by Benjamin C. Pierce et al. It’s the best because it has such a good set of exercises. To start learning Coq, start working through those exercises.
I’ve found Adam Chlipala’s Certified Programming with Dependent Types useful as well. CPDT places more emphasis on using powerful proof automation than SF. My gut feeling is that SF teaches you what’s going on and CPDT teaches you how to do things in practice.
Another interesting text for beginners is Ilya Sergey’s Programs and Proofs. Instead of plain Coq, it uses Ssreflect, an extension of Coq that was developed for implementing large mathematical proofs. I’ve used Ssreflect only a bit but it looks powerful.
Finally, you’ll want to keep the Coq reference manual, esp. the tactics chapter, and the standard library docs at hand.

I’ve also read parts of the Coq’Art book, but I didn’t get much out of it, so I wouldn’t recommend that.

Helsinki Haskell User Group

Tue, 26 Jan 2016 00:00:00 +0000

Oleg explaining Servant at Wunderdog in the January 2016 meetup.

I want to remind all the programmers in Helsinki about the existence of the Helsinki Haskell User Group a.k.a. HaskHEL. The group has been meeting since 2014 and lately it has been very active, meeting once per month. Big thanks to the organizers, especially Oleg, and the companies that have hosted us for making this happen.

The current pattern is to have presentations every other month and a pub meetup every other. The presentation topics have included things like interesting libraries, finger trees, and the history of functional programming (presented by yours truly!). You do not have to be a Haskell expert to attend - usually at least one of the talks has been geared towards beginners.

The best place to hear about the next meeting is the meetup page, but there’s also a Twitter account @HaskHEL and an IRC channel #haskhel on Freenode. January meetup was today, so the next meetup will be in February. It will be a pub meetup.

I know that the organizers are always looking for new presentations and companies interested in hosting the meetup, so if you have either of those, please let them know. You can contact them through the meetup page (or contact me and I’ll put you in touch).

Personally I’d like to hear about any kind of experience of using Haskell in real-world projects. Note that the presentations do not have to be about Haskell, as long as they’re interesting for Haskell users - for example, I bet many Haskellers would like to hear about Elm.

So - if you’re in Helsinki and interested in functional programming, see you at HaskHEL? Oh, and the word on the street is that the next Clojure Finland meetup is in the works as well. See you there as well.

On Infinite Jest

Tue, 19 Jan 2016 00:00:00 +0000

In December, I finally finished reading Infinite Jest, the magnum opus of David Foster Wallace. It was quite an effort: I started reading it in 2010 after hearing about Infinite Summer.¹

Why did it take so long? It’s a complex, demanding book. There are a dozen central characters. It’s full of long-winded footnotes and invented words. There’s no single plot - it’s more like a collection of intertwined plots. Turns out that you won’t finish that kind of book by reading it every now and then before you go to bed.

Was it worth it? It’s one of those books where the journey was more important than the destination. I enjoyed the rich language and the humor. At times the book has quite serious takes on addiction, its main theme. Still, as a whole, it feels unsatisfying - not much was resolved. As a book, it’s hardly my favorite, but reading it was one of my top book-reading experiences.

Would I recommend it? I was going to title this post “You should read Infinite Jest”, but I’m not sure about that. If you’re thinking about reading the book, try reading the first 50 pages or so. If you hate it, well, it’s not going to get any better. If you like it, it’s probably a worthwhile read. If you decide to go forward, here are some tips:

Use (at least) two bookmarks, one for the main text and one for the footnotes. Even better, read it as an e-book. An e-book is easier to carry around as well.
At first, it may seem confusing, but it will start making more sense after 300 pages or so.
You definitely shouldn’t skip the footnotes. They are as important as the main text.
Consider making notes about the characters. There are so many of them that you will lose track of who is who.
Reserve time for reading.

I guess I could tell you what it is about, but really, does anyone ever tell you what Joyce’s Ulysses is about? No. They will tell you that it’s a complex book. ↩︎

The bare minimum to know about RELAX NG

Tue, 12 Jan 2016 00:00:00 +0000

I know what you’re thinking: who uses XML these days! But maybe you do, and maybe you want to validate your XML. One of your options is to use RELAX NG. I spent a moment today to learn about it and here’s what I know.

RELAX NG has two syntaxes, the XML (“regular”) syntax and the compact syntax. If you’re writing RELAX NG by hand, you likely want to use the compact syntax. If needed, you can use Trang to convert the compact syntax to XML. For the record, the compact syntax looks like this:

# HTML goes something like this, I think?
element html {
  attribute lang { text }?,
  element head {
    element title { text }
  },
  element body {
    element h1 { text }* &
    element p { text }*
  }
}

You could check your XML file with Jing:

jing -c your_schema.rnc your_xml_file.xml

RELAX NG Compact Syntax Tutorial is a good source for learning. I also wrote a RELAX NG cheatsheet.

Yearnote 2015

Sun, 27 Dec 2015 00:00:00 +0000

Here’s what I did in 2015. (Previously: 2014)

Working and studying

In spring, I did a bunch of courses at the university. Against my plans, I didn’t write my bachelor’s thesis and therefore I didn’t graduate as a BSc. That is disappointing, but at least I got some MSc-level courses done and finished a minor in cognitive science. The latter is important because I’m afraid the University of Helsinki may stop teaching cognitive science soon.

In summer, I returned to full-time work at ZenRobotics. There’s not much to be said about that. In fall, I concluded that I want to move on, which led to me quitting the company in December.

What went well?

In spring, I started jogging three times a week and continued for more or less the whole year. This is probably the best thing I’ve done this year - exercising regularly makes me feel so much better about everything.

I made some progress in my thinking about software work. This year I realized it’s so much more about people than it is about technology. I’m still exploring the implications of this.

What didn’t go well?

In my free time, I worked on a bunch of projects, but all of them fell through before I managed to publish anything. Either I ran out of energy to work on them, or I ended up with a problem that I couldn’t solve.

Clearly working harder is not going to cut it. I’m going to prioritize getting stuff out of the door, so my plan is to move the goalposts for my projects:

Set a very achievable first goal, so I will get at least something out before I run out of steam.
If I encounter an unsolvable problem, I will work around it by changing the goal.

Stuff made by me

Continued practicing photography. See my Flickr stream and Instagram.
Experimented with film photography: I, II.
Gave a talk titled 20th century functional programming at Helsinki Haskell User Group.
Posted 39 times on quanttype. My most popular post was Python is not good enough.

Travel

Again celebrated volbripäev with Raimla in Tartu.
Spent a week in Copenhagen.

Best of

Here are some things that impressed me in 2015:

Best fiction book: Memory of Water by Emmi Itäranta.
Best non-fiction book: Between the World and Me by Ta-Nehisi Coates.
Best movie: The Man Without a Past, directed by Aki Kaurismäki.
Best album: Bullhorn by Verneri Pohjola.
Best coffee shop: Helsingin Kahvikomppania. Consistently the best cappuccino in town!

How did my plans for 2015 go?

I did have a two-week summer holiday, I saw a wild hedgehog, and I finished reading Infinite Jest. As said, I did not graduate as a BSc. Oh well.

What’s next?

I’m joining Metosin as a part-timer and continuing my studies. It’s high time that I graduate as a bachelor of science. I hope to give at least one talk at a tech meetup. Otherwise my plans are open.

Math is hard, let's go blogging

Wed, 23 Dec 2015 00:00:00 +0000

I had a blogpost prepared about probabilities, but then I realized that I had messed up the mathematics. Because I have a Beeminder goal about blogging weekly, I have to post something, so here’s a video of 2000 ducks. They flow and that’s awesome.

Code review requires trust

Wed, 16 Dec 2015 00:00:00 +0000

Why do we review code? Here are some of the reasons:

To find bugs and fix problems in code before it’s deployed.
To get and give feedback on the system architecture.
To mentor and train developers.
To be aware of how some part of the system works.

Clearly a big part of code review is giving feedback on code. This often includes pointing out mistakes and areas of improvement. From experience I know that receiving this kind of feedback sometimes hurts. It can hurt even when given by a person with the best intentions.

It’s easy to end up defending yourself to avoid feeling hurt. However, if you refuse to admit your mistakes, you won’t learn anything. This is why there must exist a certain level of trust between the person giving the feedback and the person receiving it. The person receiving the feedback must feel safe to admit their mistakes.

This is especially important when your team includes juniors, who tend to feel more insecure about their skills. Then again, if the senior people in your team do not feel safe to admit mistakes, you’re in for some serious trouble, since their mistakes are likely to have the biggest impact.

See also my commandments for code review.

Color film update

Tue, 08 Dec 2015 00:00:00 +0000

I finally shot and got developed my first roll of color film. The colors are amazing! I don’t really understand what makes them look so much better than the colors in my (and many others’) digital pictures and what to do about it, but trying to emulate the look could be a worthwhile exercise.

In other news, I’ve enjoyed browsing the helsinki on film pool on Flickr. It’s more interesting than most Helsinki photography on Flickr.

Python is not good enough

Wed, 02 Dec 2015 00:00:00 +0000

Python by William Warby. CC BY 2.0.

For the last six months, I’ve been working on some networked Python components for a robotics application. There’s a web app, some data recording, etc. I’m increasingly feeling that Python is not a good enough language and ecosystem for my needs. Here’s my beef:

Debugging and profiling live systems is hard. I’d like to profile a Python process with tens of threads. Better yet, it should be done in production, because the performance issues never show up in the testing environment. Or maybe I just want to get an idea of what a weirdly-behaving process is up to. With Clojure, I’d open VisualVM and poke around. With Python, well, there’s a bunch of profilers, but none of them seem to produce much useful information.

Deploying is awkward. I have a bunch of Python projects and I’d like to deploy them on a server. In Clojure world, I could build an uberjar, send that to a server and run it. In Python world, self-contained builds do not exist. I could build a wheel, or clone a git repo, but it won’t include the dependencies. Of course, with Docker you can make a self-contained build out of anything.

Standard library. Python is “batteries included”, but the batteries leak acid. I concur. Even Guido van Rossum admits it. There are sharp corners everywhere and the development speed is glacial. And do you know if the standard library modules you’re using are thread-safe?

Poor support for functional programming. Python’s lack of persistent data structures and proper syntactic support for anonymous functions mean that functional programming is cumbersome. I’ve fixed so many list and dict mutation bugs that simply wouldn’t have happened in Clojure, because the standard data types are immutable. Come on Python, even Java has full-power lambda expressions!

Poor concurrency support. Python’s concurrency tools are quite traditional (native threads, forking, locking) and the performance has not been great. Combined with the lack of persistent data structures, writing robust concurrent programs is hard when compared to e.g. Clojure or Haskell. “You shouldn’t write concurrent programs in Python” is not an acceptable answer. I hear the things might be better with Python 3 and asyncio, though.

Limited language extension possibilities. Higher-order functions are awkward and there are no macros, so it’s hard to have your own control structures. I realize that it’s part of the Zen of Python to not be able to extend the language, but the Zen of me begs to differ. You seldom need this power, but it’s great to have when you do.

Awkward tooling. This is a personal preference, but IPython, pip and virtualenv never seem to work quite as smoothly as Leiningen and Clojure REPLs, or npm and the browser REPLs.
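To make the complaint about list and dict mutation bugs concrete, here is a minimal sketch of the kind of aliasing bug I mean. The names DEFAULTS, with_tag_buggy and with_tag are invented for this example and do not come from any real project.

```python
DEFAULTS = {"retries": 3, "tags": ["base"]}


def with_tag_buggy(config, tag):
    # Bug: appends to the caller's list in place, so every other holder
    # of this config (including a shared default) sees the new tag.
    config["tags"].append(tag)
    return config


def with_tag(config, tag):
    # Builds a new dict and a new list, leaving the input untouched.
    return {**config, "tags": config["tags"] + [tag]}
```

With persistent data structures the second behavior is the default. In Python you have to remember to copy every time.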

I don’t think that Python is a bad language or a bad ecosystem, per se. It’s just that other languages like Clojure, JavaScript, Go, and Haskell have made great strides forward in the last few years while Python has been falling behind. For some niches like scientific computing and machine learning, Python still is great: there are good libraries and an active community. For my niche of writing network servers, it’s not where the action is anymore.

To end on a positive note, here are some things I like about Python:

Docstrings. More languages should have docstrings, or otherwise a standard convention for writing function and module documentation.

Keyword arguments with default values. You see awkward implementations of keyword arguments everywhere in JavaScript and Clojure code, because they’re great!

There are some great libraries. Lately I’ve enjoyed using Requests and Werkzeug. numpy has served me well whenever I have needed it. I’m impressed by Hypothesis, although I haven’t been able to use it much.

It’s good for small command-line utilities. Fast startup, no compilation, and argparse means it’s easy to whip up a quick CLI tool.

Printers now work

Wed, 25 Nov 2015 00:00:00 +0000

Our home printer, Epson XP-325. It works!

I don’t know if you have noticed, but printers nowadays work. In the last few years, I’ve set up and used a bunch of printers and it has been very much a plug-and-play experience. Just connect the printer to the wireless network and you’re ready to print. I use OS X, but I hear it’s the same for my Linux-using friends. I’ve also successfully printed from my phone. Heck, I even scan wirelessly from my laptop!

Ink is still atrociously expensive and runs out immediately, and printers are getting less and less important as people move away from paper. Still, it’s cool that we’ve finally made the printers work. They used to be such a pain.

For other things that have started working in this millennium, see Dan Luu’s article What’s Worked in Computer Science.

Spinning while sleeping

Wed, 18 Nov 2015 00:00:00 +0000

I have some long-running scripts that sleep while waiting for something to happen. It’d be nice to look at the terminal and know whether the process is working or sleeping. To that end, I replaced some of the sleep(1) invocations with spinner.py, which shows a simple spinner while sleeping.
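For illustration, a spinner like this can be sketched in a few lines of Python. This is not the actual spinner.py; the function name spin_sleep, its parameters and the frame characters are assumptions made for this example.

```python
import itertools
import sys
import time


def spin_sleep(seconds, stream=sys.stderr, interval=0.1, frames="|/-\\"):
    """Sleep for roughly `seconds`, redrawing a one-character spinner."""
    deadline = time.monotonic() + seconds
    for frame in itertools.cycle(frames):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # "\r" moves the cursor back to the start of the line, so each
        # frame overwrites the previous one.
        stream.write("\r" + frame)
        stream.flush()
        time.sleep(min(interval, remaining))
    # Erase the spinner before returning.
    stream.write("\r \r")
    stream.flush()
```

Calling spin_sleep(5) in place of time.sleep(5) keeps the terminal visibly alive for the duration of the wait.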

Now, David R. MacIver pointed out that I of course should’ve enhanced sleep(1) with a LD_PRELOAD hack. LD_PRELOAD is an environment variable that tells the Linux dynamic loader to look at given shared objects before loading anything else. This allows, among other things, overriding calls to the C standard library.

I’ve never used LD_PRELOAD before, but it turned out to be simple! Create a shared library with a function with the name and signature of the function you want to overwrite, point LD_PRELOAD to it and you’re done.

A quick search for gnu sleep source tells me that sleep(1) uses nanosleep(2), so that’s the function I rewrote:

Making this work on OS X is left as an exercise for the reader.

Joylent mini-review

Wed, 11 Nov 2015 00:00:00 +0000

A year ago I reviewed Ambronite. Now I’m back with Joylent. Like Ambronite and Soylent, it’s a nutritional powder. A friend gave me a bag, so let me offer you a mini-review.

I got the chocolate-flavored Joylent, but it’s also available in strawberry, vanilla and banana flavor. Friends in the know tell me that chocolate is the best one, though. What it actually tastes like is artificially sweet, bland choco milk.

Like Ambronite, it has the astringent taste of banana peels (i.e. it contains tannins). Ambronite’s astringency might come from nutritional yeast, but about Joylent I don’t know. I asked around, but almost nobody complained about astringency in Joylent. I’m starting to suspect that I have some genetic predisposition for tasting bitterness!

How about the mouthfeel, then? Joylent is powdery and mixes very well with water. Unlike Ambronite, it has almost no texture. It’s like drinking thick milk. This gives it a less wholesome feel. Nevertheless I felt satiated after drinking a portion.

To be honest, I liked Ambronite more. I’d go with the exquisite (and bitter) sawdust instead of the bland (and bitter) choco milk.

Dabbling in film photography

Wed, 04 Nov 2015 00:00:00 +0000

A broken armchair in Kallio. One of my first film shots. (Flickr)

I became interested in film photography when a friend showed me how to take photos with a beer can and photo paper.¹ I figured it’d be cool to take some photos with a real camera, too. Soon I managed to borrow a film SLR and I’ve now shot two rolls of B&W film. Here are my initial thoughts.

Gear talk

The camera I borrowed is a Minolta 7000 AF with an AF 28-135 mm 1:4-4.5 lens. It’s a 35 mm SLR camera that came out in 1985. Apparently it’s the world’s first SLR camera with autofocus.

Ken Rockwell describes it as a straightforward camera that is simple and enjoyable to use. I concur. It also has surprisingly good ergonomics for my big hands. It has buttons for selecting aperture and shutter time. I’d slightly prefer dials, but at least the buttons are easy to reach. The focus ring is unconventionally in the rear of the lens, but it’s also easy to reach there.

Even though the camera has autofocus, I’ve ended up focusing manually. The AF is slow and noisy and not that good. Some of my pictures have been mis-focused, but I doubt AF would’ve done any better job.

Film talk

The colors of autumn are so beautiful I shot them in black-and-white. (Flickr)

The film I first tried was Agfaphoto APX 400. I didn’t know what I wanted, so I went to the local photography store and asked for a black-and-white ISO 400 film. APX 400 was the cheapest option available.

Since it’s the only film I’ve used so far, it’s hard to make comparisons. I’m happy how my urban photos have turned out, but humans in my photos look weird. Right now I’m trying out color film, but after that I’ll experiment with some other B&W films.

So far I’ve let a local photography shop develop and print my film. I’d like to try developing it myself to get the full film experience. I was planning to go to a darkroom course organized by the university photography club, but I messed up some dates and didn’t even manage to sign up before the courses were over.

So what have I learned?

I’m starting to like black-and-white nature photography… (Flickr)

Prints look stunning. Really. I had totally forgotten how good printed photos can look. The depth and the contrast are great. These aren’t any special prints, either, just the ordinary 10x15 prints from the local shop from my so-so first roll of film. I believe it’s the magic of the printer and not the magic of the film - I have to get some digital photos printed.

Photography has become so much easier. With digital cameras, you get immediate feedback. With film, you have to develop the roll before you even know if you had suitable exposure settings. Film also costs money. Digital cameras give you free infinite retries, and you do not need to know in advance if you’re going to shoot B&W or color, and what ISO you need.

Another thing is that it’s so easy to edit digital photos. My first prints taught me that I can’t shoot straight. I’ve always relied on straightening the photos in post-processing.

The cumbersomeness of film photography does make you think more about the photos you’re shooting. It’s a good exercise, but I’m glad I have my digital camera.

My scanner sucks. What do you do with photos if you don’t post them on social media? They must be scanned, then. Turns out my scanner isn’t really up to the task. Oh well, I guess you can’t expect too much from a 50 € printer/scanner combo.

What’s next?

I’m going to shoot a couple more rolls of film, both color and B&W. I want to get some pictures of snow once the winter hits Helsinki. Maybe I’ll also find a way to develop the films myself.

See e.g. these instructions. ↩︎

Webpacking a project

Wed, 28 Oct 2015 00:00:00 +0000

This week I used Webpack at work and it was great. Let me tell you about it.

I took over an experimental JavaScript project with the aim of making it a proper part of our software engineering process. This means cleaning up the code, integrating it with our build system, ensuring that it has essential documentation (“how to run it”), etc.

The code didn’t look too bad. It’s a browser-based tool with a single HTML file and a couple of thousand lines of JavaScript split across a dozen files. There was no build step - all the JavaScript files were included in HTML with <script> tags, and the dependencies were checked into the repo. There were some bad signs, too: there were no tests and the code was weirdly formatted. Still, it was clean enough. And at least there was no sign of Angular!

As the first step, I ran all the code through js-beautify. Did you know that Spacemacs has a shortcut for running js-beautify? It’s SPC m =. This took care of all the weird indentation.

The code was split into one object/“class” per file. It was almost modular style, except that of course all the files share the same scope. There was a main module that uses all the other modules, but also defines some global variables that the other modules use.

I figured it would be a good idea to set up JSHint. The initial run didn’t reveal any problems beyond some missing semicolons, but I didn’t want to break anything. Since JSHint does not know that the files share the same scope, I had to add all the global variables to the globals array in .jshintrc.

The global variables containing the application state made me feel uneasy. Since the code was almost modular already, I realized it would be easy to use Webpack to get real modules. Webpack (like browserify) takes your code, wraps each file into a module and gives you a Node-style interface with require() and module.exports. Using modules would make the global state explicit and Webpack would also make it easy to use ES6, to have live-reloading code, and other good stuff.

What I did was to add an entrypoint that requires the main module and calls the function that starts the application. Initially I only included the application code in webpack and pulled in the dependencies in the old <script> way.

Then I removed the module names from the JSHint globals one by one. JSHint gave me a list of all the files that needed attention and I added the relevant imports and exports there.

To deal with the global state variables, I added a new module called GlobalState to contain them. If I have time to refactor this later, finding all the places that use it is just one grep away.

Most of the third-party dependencies could be pulled from NPM. Create a package.json with your dependencies, add node_modules to Webpack’s resolve roots, run npm install and you’re good to go.

With the more obscure libraries, I decided to keep the vendored libraries in the repository and use shimming to make them work with Webpack. For example to use roslibjs, I added the following loader:

// roslibjs requires EventEmitter2 and defines ROSLIB
{
  test: /roslib\.js$/,
  loaders: [
    'imports?EventEmitter2=eventemitter2',
    'exports?ROSLIB'
  ]
}

Then I could do var ROSLIB = require('roslib.js') anywhere I wanted.

When integrating Webpack with the rest of our build system, I wanted to pass it the resolve roots as a command-line flag. Webpack does not have such a flag out-of-the-box. This is not a problem since webpack.config.js is an ordinary Node module and you can use e.g. yargs to parse the flags:

var argv = require('yargs').argv;
module.exports = {
  resolve: {
    root: argv.resolveRoot
  }
};
// now do webpack --resolve-root=foo --resolve-root=bar

I like the declarative configuration of Webpack. Basically you say “here is my code, here are my deps, please compile” and it does what you want. To get ES6, just add babel-loader to the loaders. To get live-reloading, just run the dev server with --hot --inline. This is so much better than complex gulpfiles. If only the documentation was less confusing!

Exercising is like eating

Wed, 21 Oct 2015 00:00:00 +0000

I used to think that you should have fun while exercising. I had trouble building an exercise habit since, let’s face it, exercise isn’t always fun. Sure, it’s often great, but sometimes it’s just boring and tedious.

Recently there has been a shift in my thinking. Exercising is like eating: you have to do it whether you like it or not. Realizing this has made it so much easier for me to go jogging. If running does not feel so good this time, it’s not a huge disappointment, it’s just life. Afterwards I’m pretty much always happy that I did it.

Moral obligations of ad blocking

Fri, 09 Oct 2015 00:00:00 +0000

With iOS 9, Apple enabled ad blocking in Mobile Safari. This change received a lot of attention. Users were happy, because web ads and trackers significantly degrade the user experience and intrude on privacy. However, publishers were quick to announce that their publications are doomed, since they will be unable to make any money when everybody blocks ads.

A central question of the debate is this: are users morally obliged to not block ads?

I look at this question from the perspective of software compatibility. When you publish a website, you’re essentially shipping a piece of software to your readers around the world. Some of them are bound to be using setups that are not fully compatible with your site. Maybe they’re running an old browser that doesn’t support the latest HTML5 features, or maybe they’re running an ad blocker. Do they have a moral obligation to maintain 100% compatibility with your website?

A couple of years ago, Flash ads were very popular. A simple way to block a big chunk of the most annoying ads was to uninstall Flash. Do your users have a moral obligation to have Flash installed?

A web site is a kind of distributed system: typically the user’s computer will have to gather the pieces of code and content from a bunch of different servers. Network partitions are a common failure mode of distributed systems.

For example, I have this one computer that does not run ad-blocking software per se, but that is unable to connect to certain well-known hosts that serve tracking code. It is essentially partitioned off from certain parts of the Internet. Do your users have a moral obligation to ensure that there are no network partitions in your distributed systems?

How to make money with Internet publishing is a hard question. I do not have an answer, but I doubt that limiting the users’ ability to be picky about the software they run on their computers is the way.

Commandments for code review

Mon, 07 Sep 2015 00:00:00 +0000

Code review is the practice of soliciting and giving feedback on code. At work, reviewing is an integral part of our software engineering process. Code won’t be merged to the master before at least one person has reviewed it — the only exception is experimental research code. This means that over the past two years, I’ve reviewed a lot of code.

How do you review code, then? I’ve distilled my experience into a short list of “commandments”:

Review the code, not the people.
Review what computers can’t review.
Give constructive feedback.
Praise good work.
Everybody can and should review everybody’s code.
Stay humble.

Some questions I can't answer

Sun, 06 Sep 2015 00:00:00 +0000

This is a picture of a concrete building in a post about building concrete knowledge.

I feel like I know quite a bit about programming, but a lot of that knowledge is tacit. I know intuitively how various bits fit together, but it’s hard to put that knowledge into words. One of the reasons why I write this blog is to practice making my knowledge explicit.

Today I was thinking of something else: how to make my lack of knowledge explicit? If you know what you don’t know, that’s fine, you can work on it. If you don’t know it, you might be in trouble. Who knows? Not you, that’s for sure.

To get started, I decided to map some of my known unknowns. I wrote down some questions about software engineering that I can’t answer. I recommend this exercise: it is surprisingly difficult to condense the unknowns into concrete questions. Here is my list:

How do you manage dependency updates for JavaScript projects? JavaScript projects tend to have a lot of fast-moving dependencies, and the common practice seems to be to use bleeding-edge versions of them. How do you ensure that the updates do not break your software?

Are projects a good way to organize software engineering work? I mean project as in subcontracting project, not as in open source project. The question probably gives away the fact that I think the answer might be “no”. Projects by definition have an end date, but to me it seems that software that is used requires continuous work. What are alternatives to projects, though?

What does a well-working software engineering process look like? I’m thinking of something like agile-as-in-the-manifesto, but what are the essential parts?

What makes a senior programmer? I mean senior-as-recognized-by-peers, not senior-as-in-title. What are the essential skills and skill areas? Seniority is a continuum, but are there natural checkpoints on that continuum?

What were the most significant new discoveries in software engineering and computer science in the 00s? What will the years 2000-2009 be remembered for? My knowledge on this is so weak that I don’t even know what to suggest. Maybe the advances in deep learning could be on the list?

These are the questions on my mind tonight. They are my unknowns: obviously there are many people out there who could give well-reasoned answers to these questions. It also shows what I’ve been thinking about lately. The list will look different next month, or next year.

So, what does your list look like?

Everyday carry (August 2015)

Sun, 30 Aug 2015 00:00:00 +0000

Everyday carry (EDC) means the collection of items you lug around with you every day. It’s also one of the most popular genres of manly lifestyle blogging. A typical EDC post contains a picture of knolled contents of the author’s EDC, along with a list of the items with their brand names and optionally a description of why each item was chosen. One of the biggest EDC blogs is everydaycarry.com.

In theory, EDC is about being prepared for the situations you might end up in. That’s why you’ll see lots of multitools, knives and other tools in the EDC postings. If you browse everydaycarry.com, it quickly becomes obvious that the posts are also about style, fashion and building your identity.

I find it fascinating to browse EDC posts. What even is a tactical pen? Do all those software architects really use all those knives for something? What do people write or draw into all those Field Notes notebooks? Where are all the charging cables?

Let’s have my contribution to the genre, then.

I almost always have my backpack with me and typically carry around a lot of stuff. In this picture I’ve included only the items whose absence nags me.

Smartphone. It’s a communication device, but it’s also a book, a camera, a map, a notebook and a bus timetable.

Smartphone charger. Unfortunately, the battery of an actively used smartphone runs out quickly. Carrying a smartphone without battery is almost worse than just leaving the smartphone home. This means I have to be always ready to charge it when the need arises.

There’s actually a recent study by Ferreira, McGregor and Lampinen about the impact of smartphone batteries on everyday life! The article is called Caring for Batteries. Here’s a conclusion they draw:

[W]e see how battery work impacts our lives in various ways. Also, we see how it is not just about particular moments when batteries go flat, but rather a matter of constant strategizing and anticipating of when and where one will be able to charge, navigating between a complex and varying infrastructure that we learn, build, and maintain. We take these tasks upon ourselves despite the stress they bring into our lives (as seen in the richness of the emotional-laden vocabulary used by participants); we rarely reflect back on these tasks. Perhaps there is a feeling that little can be done and we are unable to consider battery care as something optional.

Notebook, pen and pencil. I like to take lecture notes on paper. Sometimes I think on paper: if I can’t figure something out, I write what I know about it on paper or draw a picture. It usually helps.

Writing in the notebook feels too permanent, so I actually write on throw-away pieces of found paper, yet I carry around the notebook. I should work on this.

Watch. For years, I didn’t wear a watch, instead relying on the smartphone as a clock. I didn’t like the fact that my clock ran out of battery so often, so this spring I bought a watch. It hasn’t yet run out of battery. Success!

Key-sized multitool. A.k.a. the only multitool I ever actually use. Compared to bigger multitools, this one sucks. However, it totally beats using your keys as improvised cutting tools! I use this to open packages and to cut strings almost daily. It’s made by Semptec, but it seems to be identical to Swiss+Tech Utili-Key.

For further reading, check out Venkatesh Rao’s recent post The Things You Carry.

Copenhagen highlights

Tue, 18 Aug 2015 00:00:00 +0000

A cargo bike. Copenhagen is full of them. (Flickr)

I’m on vacation and I spent the last week in Copenhagen, Denmark. It was a tourist trip - I even saw the Little Mermaid! Here are my highlights.

Everything works. Copenhagen was very easy to visit. Everybody spoke perfect English, all information was easily available in English and on the Internet. Everything was clean and everything worked. My only complaint is that not every bathroom has a single-handle mixer tap. Come on, Danes, you even have an iconic design tap, there’s no excuse.

A typical Copenhagen street, full of bikes. How can there be so many bikes?

Bike infrastructure. Cyclists in Helsinki sometimes lament how much worse the cycling infrastructure is compared to Copenhagen. I’ve thought that it can’t be that much better, but yes it can. Bike paths are everywhere, they’re full of cyclists and devoid of parked cars. Wow.

Museums. I liked Glyptoteket. They had an interesting exhibition about Man Ray and his Shakespearean Equations, their collection of French masters was great and the winter garden is beautiful. We also visited Louisiana, the museum of modern art. The exhibitions were high quality, but I wasn’t so interested in them. Still, I recommend visiting Louisiana just to see the modernist buildings and the beautiful site.

A bowl of oat porridge with fresh apple and caramel sauce. Yum.

Food. We visited some nice places, but the best food experience was at GRØD. This is a bit surprising since it’s a fast-food joint serving porridge. They served some of the best oat porridge I’ve ever had and I’ve had a lot of porridge in my life. Also, what’s not to like about new Nordic fast food?

Coffee. The Coffee Collective is an obvious place to visit for coffee, but another great coffee shop I visited was Forloren Espresso (false espresso?). I got an excellent cup of Kaiguri AA that was meticulously hand-brewed.

You can find both The Coffee Collective and GRØD in Torvhallerne (market halls?) near Nørreport station. However, I recommend visiting their other locations. Torvhallerne are very busy and you’ll get better service in the calmer places. Instead, go to Jægersborggade in Nørrebro. The street is full of small design shops and restaurants. I imagine Vaasankatu in Helsinki will look like Jægersborggade in ten years.

Street corner with graffiti in Nørrebro.

All my friends in Helsinki seem to be going to Copenhagen right now. I can see the appeal: even though Copenhagen isn’t that much bigger than Helsinki, it feels much bigger, yet it has the Nordic flavor. Unfortunately it has the Nordic price level, too…

Personally, I didn’t fall in love with Copenhagen. It’s a vibrant city, but it’s too busy and noisy for me. I’m glad that I went there, but I don’t think I will be back too soon. Next time I travel, I’m going to go see some nature.

Shell pro-tip: create weekly working directories

Mon, 10 Aug 2015 00:00:00 +0000

Is your home directory filled with temporary files, directories and other kinds of experiments? Mine was until I learned this tip from Leah Neukirchen’s blog almost ten years ago: every week I create a new directory to use.

The path is ~/mess/YEAR-WEEK, so this week’s path is ~/mess/2015-33. I have a small shell function called mess that cds to the current weekly directory, creating it if it does not exist. It also points the symlink ~/mess/current to the current weekly directory. I’ve also added mess to zsh’s directory hash table, so ~mess points to ~/mess/current.

Here’s the code:

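# zsh: make ~mess a named directory pointing at the current week
hash -d mess=~/mess/current

function mess() {
    # ISO year and week number, e.g. 2015-33
    local MESSDIR=~/mess/$(date +%G-%V)

    # First use this week: create the directory and repoint the symlink
    if [ ! -e "$MESSDIR" ]; then
        mkdir -p "$MESSDIR"
        ln -snf "$MESSDIR" ~/mess/current
    fi

    cd "$MESSDIR"
}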
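The %G-%V format is the ISO 8601 week-numbering year and week, which is why the directory name stays consistent around New Year. To check which directory a given date maps to, here is a small Python sketch; weekly_dir_name is a hypothetical helper for illustration and not part of the shell setup.

```python
from datetime import date


def weekly_dir_name(day):
    """Return the YEAR-WEEK name that `date +%G-%V` would print for `day`."""
    iso_year, iso_week, _weekday = day.isocalendar()
    # %V is zero-padded to two digits, so pad the week number the same way.
    return f"{iso_year}-{iso_week:02d}"
```

For example, weekly_dir_name(date(2015, 8, 10)) gives 2015-33, and weekly_dir_name(date(2016, 1, 1)) gives 2015-53, because the first days of January 2016 still belong to the last ISO week of 2015.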

Wash your smelly travel towel with vinegar

Sun, 02 Aug 2015 00:00:00 +0000

Do you have a smelly travel towel? Soak it in vinegar to clean it up.

I have one of those microfiber travel towels. They’re great: they take very little space, dry up quickly, and after breaking in, they’re comfy, too. There’s just one problem with them: if you forget to let them dry, they will stink (like all towels) and it’s hard to get rid of the smell (unlike cotton towels).

This happened to me. I forgot my towel in the bag overnight and next morning it smelled like vomit. The smell didn’t go away even after washing the towel several times. In fact, the smell seemed to become even worse. I was ready to throw the towel away, but then a friend suggested using vinegar. I tried it and it worked great! The towel is non-stinky and usable again.

Here’s what I did:

Submerge the towel into a solution of equal parts of vinegar and water. Leave it there for 24 hours.
Rinse the towel.
Wash it the usual way (machine wash warm).

I believe that using vinegar is a well-known method among the more experienced laundry-washers, yet my googling didn’t bring it up. Hopefully writing this down will help the next person with the same problem.

FRP and self-adjusting computation

Sat, 25 Jul 2015 00:00:00 +0000

Modern FRP is all about event streams.

If you follow me on Twitter attentively, you know that a while ago I realized that functional reactive programming (FRP) and self-adjusting computation (SAC) are related, but I didn’t know what the difference is. I have now figured it out.

Functional reactive programming is programming with values that change over time. For example, if you want to move a sprite on the screen, the position of the sprite changes as a function of time.

Umut Acar, whose PhD thesis was the original work on self-adjusting computation, describes SAC thus:

Self-adjusting computation refers to a model of computing where computations can automatically respond to changes in their data. […] Perhaps the most important idea in self-adjusting computation is the treatment of computation as a “first-class” mathematical and computational object that can be remembered, re-used, and adapted to changes in its environment.

So, both FRP and SAC are about computations with changing inputs. What’s the difference?

The best explanation I’ve found comes from the blog post Breaking down FRP by Yaron Minsky: FRP is history-sensitive and SAC is not. In FRP, the values that change over time, called behaviors, can be defined in terms of the history of their inputs. In SAC, only the current state can be used.
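To make the distinction concrete, here is a toy Python sketch. It is not taken from any FRP or SAC library: the class names Behavior and Computation and everything else in it are invented for this example, and a real SAC system would reuse the unaffected parts of the computation instead of naively recomputing everything.

```python
class Behavior:
    """FRP-style value: the current value is a fold over every event so far."""

    def __init__(self, initial, step):
        self.value = initial
        self._step = step

    def on_event(self, event):
        # The new value depends on the old one, i.e. on the input history.
        self.value = self._step(self.value, event)


class Computation:
    """SAC-style value: a function of the current inputs only."""

    def __init__(self, func, **inputs):
        self._func = func
        self._inputs = dict(inputs)
        self.value = func(**self._inputs)

    def change(self, **updates):
        # Only the latest inputs matter; how we got here is irrelevant.
        self._inputs.update(updates)
        self.value = self._func(**self._inputs)  # naive full recompute


clicks = Behavior(0, lambda count, _event: count + 1)
for _ in range(3):
    clicks.on_event("click")

area = Computation(lambda width, height: width * height, width=3, height=4)
area.change(width=5)
```

The click counter cannot be written as a Computation without storing the count as an input, which is exactly the history-sensitivity the post refers to.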

It can be helpful to think about the context where FRP and SAC were developed. FRP’s roots are in animations in virtual reality. SAC was developed for efficiently implementing algorithms where part of the input changes and you do not want to recalculate everything. Also, there’s an answer on Theoretical Computer Science StackExchange that explains when to use FRP, SAC and partial evaluation.

Running is great for lazy people

Mon, 20 Jul 2015 00:00:00 +0000

If you’re looking for an easy way to get some exercise, try running. It definitely worked for me. Over the years, I’ve tried to build swimming, cycling and gym habits, but they never lasted. In March I started running three times a week and I’ve run almost every week ever since.

You can run almost anywhere and at any time. I like swimming, but the pool closes so early and besides first you have to go there. With running, you can just step outside and you’re ready to go.
You do not need much gear. I like cycling, but sometimes my bicycle breaks and fixing it is no fun. Sneakers do not suddenly stop working.
You can run alone. No need to arrange a group of people to be at the same place at the same time.

In other words, running is the perfect hobby for lazy opportunistic people! If you get an urge to get some exercise, there’s no excuse for not following the urge.

Admittedly, running itself is not that exciting. I’ve solved this by listening to podcasts - Beats, Rye & Types and Tyyppimuunnos (in Finnish) are some of my favorites. Sometimes the weather is bad, but so far I’ve always been glad that I went out. We will see what happens in the winter.

They say that regular exercise is good for your health. The only change I’ve noticed so far is that I now have a daily urge to get some exercise. Beware the addiction.

The Moat of Scrumbut

Mon, 13 Jul 2015 00:00:00 +0000

Whenever there’s talk about agile software development, someone will tell you about how their team was burned by agile methods, and someone will counter that the team just didn’t do agile right.

Agile proponents promise sky-rocketing productivity, but if you make just one small mistake, you end up doing worse than by just doing whatever you were doing before. Here’s how it looks to me:

Scrumbut is where you almost use Scrum: “We use Scrum, but…”

What’s up with the moat of scrumbut? Is agile really this fragile?

You don’t want to use a fragile development method. There will be situations where you can’t fully follow the process. Even things like vacation season can cause problems with following the rules to the letter. This should be a small speed bump, not a catastrophe.

So, what’s up? I don’t know - I do not have enough experience to say. I suspect that some of this is caused by what Martin Fowler calls Flaccid Scrum. In Flaccid Scrum, you pick up the agile management practices, but none of the agile technical practices. Scrum does not mandate any specific technical practices, after all. Neglecting the technical process will get you in trouble.

If everybody makes the same big mistake, that does not yet make the method fragile. It does suggest that there are problems in how the method is taught, though.

Speech as art

Mon, 06 Jul 2015 00:00:00 +0000

Kimmo Modig is this artist whose work tends to center around speaking and performance. I’ve been following his work for a while and I find it weirdly brilliant. You can find many of his works online, so let’s have a look.

First, Express Yourself (2012). It’s one of the earlier works, but shows what all of this is about (speaking and pop culture).

Second, here’s DEEP4U (2014). It’s a collaboration between Kimmo Modig, Georges Jacotey, Shawné Michaelain Holloway and Jennifer Chan. Like a lot of Modig’s art, it’s very meta.

Finally, Court of Helburg (2014), a 90-minute play. It’s hard to summarize, but basically Kimmo Modig has gotten your attention for 90 minutes so he might as well entertain you. It felt to me like start-up comedy for adults (in the sense of mature taste, not sexual content).¹

I don’t think any of you will actually watch this through, but hey. I did watch it even though I was in the premiere. ↩︎

A Mind for Numbers

Mon, 29 Jun 2015 00:00:00 +0000

I’ve never been very good at studying. This point was driven home by my latest exam. I studied for it for many hours, spread over a month, and it was about a topic I like (abstract algebra). Yet I failed the exam.

It was not the first exam I’ve failed, but what makes it poignant is that I really spent a lot of energy in studying for no result. So what went wrong?

To answer that, I read a book on math study technique. The one I picked was A Mind for Numbers by Barbara Oakley. It deals with topics like self-testing, chunking and procrastination. The book is practical, but it has solid grounding in cognitive science. It confirmed to me many things I’ve thought about studying (including some of my study hacks) - even if I haven’t been able to put them in practice.

My main takeaway is that constant self-testing is the key for deeply learning mathematics. When you’ve read something, put away the book and try to summarize it. Try to reproduce key definitions and proofs. If you can’t explain your learnings to yourself, have you really learned them? Dr. Oakley writes:

“In the same amount of time, by simply practicing and recalling the material, students learned far more and at a much deeper level than they did using any other approach.”

Another key point is the importance of building solid foundations. By building a library of basic knowledge, it’s easy to learn the more advanced topics. Without the foundations, you can end up with brittle knowledge: you can solve the formulaic exercises in the course or in the book, but you can’t apply the knowledge anywhere else.

So what about that exam?

It’s the latter point that I failed to account for. I was studying after work and I had a very tight schedule, so I just pushed forward without taking the time to understand the basics. I ended up doing almost all the exercises in the course, but without understanding anything.

Actually, I realized that things were going wrong even before the exam and started reading the book. I went back, took all the cognitive tricks and rehearsed the basics. It worked, but I only had time to cover half of the topics. I aced the first half of the test and had no idea what to do with the other half. Alas.

Failing an exam sucks, but it’s high time for me to learn to study. This book gave me a good idea on how to better structure my studies. I wish I had read it as a freshman, even though I doubt I would’ve heeded the advice.

Spend some time away from computers

Mon, 22 Jun 2015 00:00:00 +0000

The view from the veranda. (details on Flickr)

I spent the last couple of days at a summer cottage. I had my phone and laptop with me, but I didn’t use them at all even though it was raining all the time. Instead I read books and played board games. It was very relaxing.

I feel stupid for writing this down, but spending some time away from the Internet is a great idea. I know it and I think you know it, too. I also know that I sometimes forget it. It is the summer vacation season, so I thought it’d be a good time for a gentle reminder.

ROS: Good, bad and ugly

Mon, 15 Jun 2015 00:00:00 +0000

I don’t think this cool little bot uses ROS. Photo CC-BY by langfordw.

At work, we use the Robot Operating System (ROS) in some parts of our software. ROS is a software framework for building robot applications. There’s a message passing system with C++ and Python implementations and a bunch of ready-made robotics modules and hardware wrappers that use the message passing system.

We’ve now used ROS for a while. It has been a success in some regards, but also a source of frustration. Below I’ll look at some of the issues.

This post is about the core of ROS. A major selling point for ROS is the ecosystem of existing modules. We use almost none of them, so I can’t comment on their quality.

The good

The message passing system itself is nice. It’s based on publish-subscribe channels and works over TCP. There’s a protobuf-style serialization format and a server for channel discovery (rosmaster). There are also command-line tools for inspecting, recording and replaying the messages.

Adopting a message passing system was a great idea. It’s now easy to write new nodes, because there’s an easy and flexible way to communicate with the existing system. Inspecting a running system by looking at the message streams is handy, and you can also use the same tools to look at the recorded messages.

You could build a message passing system of your own, but it’s nice to find an integrated package.
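To make the publish-subscribe idea concrete, here is a minimal in-process sketch in Python. It is not the ROS API: the `Bus` class, the `/odometry` topic name and the callbacks are all hypothetical, and real ROS adds TCP transport, serialization and discovery through rosmaster on top of this basic pattern.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List


class Bus:
    """Toy in-process publish-subscribe bus (hypothetical, not ROS)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # The publisher only names a topic; it does not know who is listening.
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)


bus = Bus()
recorded = []  # a "recorder" node that simply stores every message it sees

bus.subscribe("/odometry", recorded.append)
bus.subscribe("/odometry", lambda msg: None)  # a second, independent consumer
bus.publish("/odometry", {"x": 1.0, "y": 2.0})
```

Because the recorder is just another subscriber, adding it requires no change to the publisher. That decoupling is what makes it easy to bolt new nodes and inspection tools onto a running system.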

The bad

ROS is very framework-y. It comes with its own build system and a tool for running the applications developed with it (roslaunch). In theory, you do not have to use them, but ROS is pretty much built around the assumption that you do. Retrofitting ROS communication into an existing application can be painful. Prepare for yak shaving.

We’ve also had problems building robust long-running systems with ROS. There have been resource leaks and thread safety issues and we’ve had problems with handling failures like rosmaster dying. I have a feeling that this might be because there aren’t that many ROS users building systems that actually need to be robust.

The ugly

There’s an unofficial, unmaintained Java port. The resource leaks were so bad that we simply stopped using it.

Should I use ROS?

If you’re considering ROS for a small project, like a school project, go for it - especially if you can leverage the third-party modules for ROS. You will get a lot of handy stuff pre-made and likely the bad bits won’t bite you.

If you’re considering ROS for a bigger project, do your due diligence. Are you able to deal with robustness issues if they come up? Is supporting C++ and Python enough? The ideas behind ROS are good but not unique. For a long-term project, handling problems should be a bigger issue than getting stuff “for free”.

The Law of Partial Test Coverage

Mon, 08 Jun 2015 00:00:00 +0000

I’ve been thinking about unit tests lately. Here’s an observation I made - I call it the law of partial test coverage:

When your test coverage is less than 100%, the most annoying bugs will be in the untested part.

Why is this? An obvious reason is that untested code is more likely to have regressions. A more subtle reason is that it’s not random what parts of the code are tested and what parts are not.

Usually there are two kinds of untested code: trivial and hard to test.

Trivial code is not a problem. For example, in Python, maybe you haven’t bothered to write tests for all your __repr__ methods, or maybe nothing exercises the if __name__ == "__main__": main() line. If there’s a bug, you’ll likely spot it immediately.

The other kind of untested code is the tricky part: the code that is hard to test. There might be some complex I/O operations involved, for example. If the code is hard to test automatically, it’s often hard to test manually. It might even be hard to understand.

This means that you just won’t have any idea of whether your code works after you make some changes. You will only find it out by running the code in production. There will be bugs.

Good math exercises build trust

Mon, 01 Jun 2015 00:00:00 +0000

There’s this feeling you get when working on an interesting bug. You just can’t stop until you’ve figured out what’s wrong. That’s also what solving a good math exercise feels like.

It’s not always the case. If you’re working on a bug - or a math exercise - that is way beyond your skills, you’re in for a great deal of frustration. You don’t know what to do and you don’t even know how to find out what to do.

I’m working through a set of exercises in abstract algebra and it has been frustrating. This got me thinking. In the spring, I enjoyed working on topology and I used to like algebra. This course shouldn’t be harder than the ones I’ve seen before. What’s wrong?

Some of the exercises required big leaps of understanding. They were too hard for me. I tried to do them for a while, then went to look at the solutions for hints. There were some steps that I wouldn’t have thought of in any reasonable time!

This broke my trust in my own skills in doing these exercises. Now every time I encounter a hard exercise, I’m wondering whether I can actually solve it or if it again requires some clever trick I wouldn’t think of. Then it’s way too easy to give up and I never get the satisfaction of a solved problem.

To avoid this, exercises should grow in difficulty as you progress, but not too fast. This way they build trust in your skills. Here’s an induction axiom that an ideal exercise set should fulfill:

If you solved exercises 1 to n, you can solve exercise n+1.

Note: The said exercises and the material were meant to be used on a lecture course, but I’m self-studying. Attending the lectures would likely make the exercises work much better. The reason I picked this material is that I’m going to take an exam based on it. Oh well.

Quickly jumping between git branches

Sat, 23 May 2015 00:00:00 +0000

When working on software with Git, I create a new branch for every change request I’m going to make. This means that I might be working on a bunch of branches in any repository. How to switch between them easily?

I use a small script to do it, called git-b (the name is inspired by autojump’s j). It gives you a list of branches ordered by the time of the latest commit. It uses selecta so you get to choose the branch by typing a part of its name. It’s simple, but so much better than git checkout <tab>, which is what I did before. (I imagine you could actually configure zsh’s tab completion for a similar experience. Alas.)

Here’s the code:
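What follows is not the original git-b script but a rough Python sketch of the same idea. It assumes `git` is on the PATH, and it replaces selecta’s fuzzy selection with a plain substring filter. The function names are hypothetical.

```python
import subprocess
from typing import List


def parse_branches(for_each_ref_output: str) -> List[str]:
    """Turn the output of `git for-each-ref` into a list of branch names."""
    return [line.strip() for line in for_each_ref_output.splitlines() if line.strip()]


def branches_by_recency() -> List[str]:
    """List local branches, the most recently committed first."""
    output = subprocess.run(
        ["git", "for-each-ref", "--sort=-committerdate",
         "--format=%(refname:short)", "refs/heads/"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_branches(output)


def filter_branches(branches: List[str], query: str) -> List[str]:
    """Keep branches whose name contains the query (a crude stand-in for selecta)."""
    return [branch for branch in branches if query in branch]


def checkout(branch: str) -> None:
    """Switch to the given branch."""
    subprocess.run(["git", "checkout", branch], check=True)


# Example use inside a repository (not run here):
#   matches = filter_branches(branches_by_recency(), "login")
#   if matches:
#       checkout(matches[0])
```

Sorting by `committerdate` in descending order puts the branch you most recently worked on at the top, so a short query is usually enough to pick it.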

Emacs: Get the path for current buffer from command-line

Sun, 17 May 2015 00:00:00 +0000

At work, I use Emacs (nowadays with the Spacemacs configuration). My favorite Git history browser is tig. When I want to see the git history or blame for the file I’m currently editing, what do I do?

There are many possible solutions, but here’s mine: a small script called cur that returns the path to the buffer currently active in Emacs.

#!/bin/sh
emacsclient -e '(buffer-file-name (window-buffer))' | \
    sed -e 's/^"//' -e 's/"$//'

It’s easy to use with tig – or any other command-line tool you might need:

tig blame "$(cur)"

Birds and context-free grammars

Mon, 27 Apr 2015 00:00:00 +0000

Swans are not known for their beautiful song. (details on Flickr)

A lecturer once claimed that birdsongs conform to a formal grammar. I didn’t ask about it then, but I’ve thought about it many times since.

Birdsongs and human languages have certain similarities. For example, the brain structures used for producing them are similar, and both humans and birds go through a “babbling” stage in their infancy when learning to speak and sing. Could it be that birdsongs have as grammatically complex a structure as human languages?

While studying something else, I stumbled upon an article that gestures towards the answer “no”.

G. J. L. Beckers, J. J. Bolhuis, K. Okanoya, R. C. Berwick: Birdsong neurolinguistics: songbird context-free grammar claim is premature. NeuroReport 23:139–145. 2012.

There’s an earlier paper by Abe and Watanabe that claims that Bengalese finches can learn to discriminate songs based on a context-free grammar (CFG). Beckers et al. claim that the experiment design is flawed and that matching on acoustic similarity is a more likely explanation. My understanding is that this is the general state of the research: there have been attempts to show that birds can learn CFGs, but so far the evidence is not convincing.

There’s even a book called Birdsong, Speech, and Language with several chapters on this issue. I haven’t read it, but maybe it would have some answers for me.

My hunch is that many birdsongs have a regular grammar. This gave me an idea that maybe you could generate novel birdsongs with a Markov chain. Split a large collection of song recordings into syllables, cluster them based on the acoustic similarity, create a Markov chain from them and produce new songs.

When I tried to do this, I wasn’t able to figure out the clustering. I think it could be done if I just knew a bit more about pattern recognition. I made this webtoy, though.
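Leaving the hard part (clustering) aside, the Markov chain step can be sketched as follows. This assumes the syllables have already been reduced to cluster labels; the labels `a`, `b`, `c` and the example songs are made up for illustration, not taken from real recordings.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence

START, END = "<start>", "<end>"


def build_chain(songs: Sequence[Sequence[str]]) -> Dict[str, List[str]]:
    """Collect, for each syllable label, the labels observed right after it."""
    chain: Dict[str, List[str]] = defaultdict(list)
    for song in songs:
        previous = START
        for syllable in song:
            chain[previous].append(syllable)
            previous = syllable
        chain[previous].append(END)
    return dict(chain)


def generate_song(chain: Dict[str, List[str]], rng: random.Random,
                  max_length: int = 50) -> List[str]:
    """Random-walk the chain from START until END or max_length syllables."""
    song: List[str] = []
    current = START
    while len(song) < max_length:
        # Successors are stored with repeats, so frequent transitions
        # are picked proportionally more often.
        current = rng.choice(chain[current])
        if current == END:
            break
        song.append(current)
    return song


# Hypothetical cluster labels standing in for clustered syllables.
songs = [
    ["a", "b", "b", "c"],
    ["a", "c", "b"],
    ["b", "a", "b", "c"],
]
chain = build_chain(songs)
new_song = generate_song(chain, random.Random(0))
```

A first-order chain like this only looks one syllable back, so every song it produces is a path through a finite set of states. That fits the regular-grammar hunch, but it cannot capture the nested structure a context-free grammar would allow.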

You should take lecture notes with pen and paper

Tue, 21 Apr 2015 00:00:00 +0000

My highly advanced note-taking setup. (details on Flickr)

Is it a good idea to use a laptop to take lecture notes instead of writing them on paper?

My personal experience says no. There’s the allure of the Internet, and looking around in a classroom, I’m not the only one who can’t always resist it. Even when I do avoid the distractions, my concentration feels more superficial than when using paper and pen. Taking mathematical notes has the extra problem of writing down the notation and capturing the explanatory drawings - both are straightforward on paper.

Anecdotes are fun, but is there any research to back this claim? Here’s a study¹ that agrees with me:

P. A. Mueller, D. M. Oppenheimer: The Pen Is Mightier Than the Keyboard: Advantages of Longhand Over Laptop Note Taking. Psychological Science. 2014-04-23.

They acknowledge the risk of distractions, but even when the distractions are avoided, they claim that laptop notes are worse:

The present research suggests that even when laptops are used solely to take notes, they may still be impairing learning because their use results in shallower processing.

This is because laptop users tend to transcribe lectures verbatim, whereas longhand users have to put it in their own words, because they write slower with a pen than with a keyboard. The cognitive processing required for reframing the information is the key for better recall. (This is also mentioned in my scientific learning hacks!)

To me, taking useful notes seems like a skill that takes practice. The article does not discuss this. My gut feeling is that if you’re mindful about results like this, you can become very efficient at taking notes with a laptop. In the second experiment described in the article, laptop users were asked to avoid verbatim notes. This didn’t help much, but maybe you can do better, because you understand why it matters.

Myself, I’m going to continue with pen and paper. In theory, computerized notes are great - they’re searchable and everything! - but in practice I haven’t found many benefits. They’re a bit easier to write, but that easiness is what makes them more shallow.

¹ You should never trust just one study…

Tornien taisto is coming soon

Tue, 14 Apr 2015 00:00:00 +0000

Here’s a bird I saw last Sunday. (details on Flickr)

The yearly Finnish birdwatching competition Tornien taisto (Battle of the Towers) is coming in early May. This will be the fifth time I’m taking part.

Here’s the concept: around Finland, teams of at least three people go to birdwatching towers for eight hours and try to identify as many bird species as possible. The best teams spot over a hundred species - our team’s score has usually been around 40. The competition takes place when the migratory birds are migrating to Finland for the summer, so there should be lots of them to be seen and heard.

Personally I’m not very good at identifying birds - I’ve teamed up with some friends who are quite a bit better. Still, I’ve been practicing the songs of some birds. Last year, my proudest moment was to identify a redwing by ear. Finnish Wikipedia describes redwing as “one of the most common birds in Finland”. I admit it’s a small win, but hey, you have to start somewhere.

Tornien taisto isn’t a very serious competition. For me, it’s more about spending a day in nature, having a good time and learning about birds. I recommend taking part. If you aren’t up for the competition, you could still visit a tower - some of the towers even organize guided tours.

New theorem prover: Lean

Tue, 07 Apr 2015 00:00:00 +0000

Microsoft Research and CMU are developing a new theorem-prover, Lean. Here’s how they describe their goals:

The Lean Theorem Prover aims to bridge the gap between interactive and automated theorem proving, by situating automated tools and methods in a framework that supports user interaction and the construction of fully specified axiomatic proofs. The goal is to support both mathematical reasoning and reasoning about complex systems, and to verify claims in both domains.

It looks similar to Coq and Agda, but I’m not enough of an expert to provide a comparison.

What I find interesting is that in addition to a traditional, Emacs-based proof development environment, it also works in the browser. This should lower the barrier to entry for new users - I’ve seen a lot of complaints about how hard e.g. Coq is to set up (see my instructions if you’re using OS X).

I haven’t yet played around with it, but I hope to take a look at the tutorial soon.

The value of MOOCs is not in the videos

Tue, 31 Mar 2015 00:00:00 +0000

In the article Why My MOOC is Not Built on Video, Prof. Lorena A. Barba writes:

The participants of #NumericalMOOC will have noticed that we made only one video for the course. I thought that maybe I would do a handful more. But in the end I didn’t and I don’t think it matters too much.

I haven’t been very interested in MOOCs, because I’ve seen them as glorified collections of lecture videos. I seldom find lectures useful, and in video lectures you can’t even ask questions.

This article made me change my mind. Instead of collections of videos, I now see MOOCs as potential collections of exercises. As Prof. Barba writes:

Videos are nice, they can get you exposed to a new concept for the first time in an agreeable way, but they do not produce learning, on their own. Students need to engage with the concepts in various ways, interact with ideas and problems, work through a process of “digestion” of the learning material.

When learning a new topic by myself, one of the hard things is to find a useful progression of challenges to tackle. The challenges should be hard enough to be interesting, but easy enough to not be frustrating. For me, in the offline math courses the exercise sheets are the most valuable part of the course. MOOCs can and do provide the same value.

I do not know why I didn’t realize this earlier.

Speaking of MOOCs, if you want to learn functional programming, check out University of Helsinki’s course Functional programming with Clojure. There are no lecture videos, but there are exercises. I took the course when it was an offline course and the exercises were great.

Are quantum computers faster than classical computers?

Sat, 21 Mar 2015 00:00:00 +0000

We do not know - not from the point of view of computational complexity, anyway.

The complexity class $\textrm{BQP}$ (bounded-error quantum polynomial-time) roughly corresponds to the problems that can be solved efficiently with a quantum computer, the same way as problems in $\textrm{P}$ can be solved efficiently with a classical computer. BQP is essentially the quantum version of $\textrm{BPP}$ (bounded-error probabilistic polynomial-time). For a more precise description, see Scott Aaronson’s lecture notes or Wikipedia.

The question in the title corresponds to the question of whether $\textrm{P}$ is a strict subset of $\textrm{BQP}$. The following hierarchy is known:

$$\textrm{P} \subseteq \textrm{BQP} \subseteq \textrm{PSPACE}$$

The strictness of these relations is an open question. Proving that quantum computers are faster than classical computers would imply that $\textrm{P} \subsetneq \textrm{PSPACE}$, which would be a breakthrough result in classical complexity theory!
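The implication takes only one step to spell out. Suppose the first inclusion is strict, so that some language $L$ is in $\textrm{BQP}$ but not in $\textrm{P}$. Then:

$$L \in \textrm{BQP} \setminus \textrm{P} \;\textrm{ and }\; \textrm{BQP} \subseteq \textrm{PSPACE} \implies L \in \textrm{PSPACE} \setminus \textrm{P} \implies \textrm{P} \subsetneq \textrm{PSPACE}$$

The converse does not follow: $\textrm{P} \subsetneq \textrm{PSPACE}$ could hold with the gap lying entirely between $\textrm{BQP}$ and $\textrm{PSPACE}$.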

Of course, the existence of Shor’s algorithm and some other quantum algorithms that are faster than the best known classical algorithms does support the idea that quantum computers are more powerful than classical computers. A proof of the contrary result would be a surprise.

What about $\textrm{NP}$? We do not know much about the relationship between $\textrm{BQP}$ and $\textrm{NP}$. It is widely believed, though not proven, that, contrary to the popular culture description, quantum computers won’t allow us to magically solve NP-complete problems efficiently. Scott Aaronson writes in the lecture notes:

[A] quantum computer is not a device that could “try every possible solution in parallel” and then instantly pick the correct one. If we insist on seeing things in terms of parallel universes, then those universes all have to “collaborate” – more than that, have to meld into each other – to create an interference pattern that will lead to the correct answer being observed with high probability.

This Week in Finnish Politics

Sat, 14 Mar 2015 00:00:00 +0000

Today was the last workday of the Parliament of Finland before the election break. This final week was so eventful that I had to write this post to make sense of it.

Healthcare for paperless immigrants

On Monday, the cabinet proposed a law that would guarantee a minimum level of healthcare for paperless immigrants. The law was prepared by Minister of Health and Social Services Susanna Huovinen, of the Social Democratic Party. The law would likely have been approved, except that MP Kari Rajamäki, also of SDP, demanded postponing the vote. Because this is the last week of the Parliament, there was no time for another vote and so the law was abandoned.

Mr. Rajamäki, after 32 years in the Parliament, is not going to run for the Parliament again. He must be proud of his last act as a member of Parliament.

Limiting student benefits

The cabinet proposed taking away the student benefits for studying for a second degree of the same level that one already has. The proposal was widely criticized by the student movement, but it was nevertheless initially approved on Tuesday by votes 95-91.

On Thursday, a change to the law about the Sami Parliament of Finland was withdrawn by the cabinet. MPs of SDP (which is a member of the cabinet coalition) announced that they would also postpone the ratification of ILO Convention 169.

This pissed off the Swedish People’s Party (also a member of the cabinet coalition) and they announced on Friday that they’d vote against the student benefit law in the final vote. Almost immediately, the National Coalition Party (yet another member of the cabinet) announced that they would follow suit. In turn, SDP announced that they’d vote against the second-level education budget cuts proposed by the cabinet.

Today, in the final vote, the parliament voted against the student benefit law 185-1. Only Kimmo Sasi (NCP) voted for it. He commented that he is fed up with all the budget cuts being flushed down the toilet.

Prime Minister Alexander Stubb (NCP) commented that the coalition government is not in chaos or collapsed. Yeah right.

Types of JavaScript

Sat, 31 Jan 2015 00:00:00 +0000

Ever since I heard about isomorphic JavaScript, I’ve been wondering what other morphisms describe JavaScript. Here’s a brief summary of my findings:

Isomorphic JavaScript: The same code can be run on the server and in the browser.
Endomorphic JavaScript: The server/client code can be run with another server/client runtime.
Epimorphic JavaScript: The same code can be run in all the browsers. Also known as epic JavaScript.
Homeomorphic JavaScript: Isomorphic JavaScript that is continuously deployed.
Catamorphic JavaScript: Isomorphic JavaScript, where the server code is minified before it is run in the browser.
Homotopic JavaScript: Isomorphic JavaScript, where the server code is transpiled before it is run in the browser.

Setting up Coq, Ssreflect and Proof General on OS X

Sat, 17 Jan 2015 00:00:00 +0000

Previously I’ve used CoqIDE to interact with Coq, because I figured it’d be tricky to set up Proof General. Proof General is the Emacs interface to Coq and other interactive theorem provers. This time I decided to take the PG route and it turns out it’s really easy if you’re using Homebrew! While at it, I also set up Ssreflect, which is an enhanced version of Coq with an alternative standard library.

brew install emacs coq ssreflect

# Use --with-emacs to point to the Emacs you're using, so that the elisp
# files get compiled with the correct Emacs version.
brew install proof-general --with-emacs=/usr/local/bin/emacs

Next, do as Homebrew tells you and put these lines into your ~/.emacs:

(load-file "/usr/local/share/emacs/site-lisp/ProofGeneral/generic/proof-site.el")
(load-file "/usr/local/Cellar/ssreflect/1.5/share/ssreflect/pg-ssr.el")

Finally, customize Emacs variable coq-prog-name to the path of coqtop, i.e. /usr/local/bin/coqtop.

Spacemacs

This section was added 2016-02-27.

If you’re using Spacemacs, there’s now a very simple Coq layer available. It sets up Homebrew-installed Proof General, although it does not support Ssreflect.

First, clone the layer:

git clone https://github.com/olivierverdier/spacemacs-coq ~/.emacs.d/private/coq

Then, add coq to dotspacemacs-configuration-layers. You can edit your .spacemacs with SPC f e d and reload it with SPC f e R.

Yearnote 2014

Mon, 05 Jan 2015 00:00:00 +0000

Here’s what I did in 2014.

Working and studying

In spring, I concentrated on my work as a software developer. Then I decided to finally graduate from the university and asked my employer for study leave from September till April. In summer, I worked even more to save up some money for studying. In autumn, I studied more than ever. That brought me over 55 course credits (the usual rate is about 30 per semester). I’m happy I finally found success with studies - my earlier semesters have been much slower.

Travel and events

Visited Stockholm thrice. Highlights: having coffee at Drop Coffee and discovering Adisgladis, the lifestyle store right next to Drop Coffee.
Visited Tartu to celebrate volbripäev with Raimla. For the record: Tartu Student Villa is a pretty cool hostel.
Visited the Isle of Anglesey in Wales. Highlights: walking about Holyhead Mountain, seeing a grey seal on a Puffin Island cruise.
Attended Electromagnetic Field 2014, the hacker camp in Bletchley, UK.
Attended ClojuTRE 2014 in Tampere, Finland.

Stuff made by me

Made Flappy Sine the game.
Practiced photography.
Fixed the golden pig.
Posted 27 times on quanttype.

Best of

Here are some things I was impressed by in 2014:

Best album: Merelle by Pikku Kukka
Best back-up software: Arq
Best coffee shop: Freese Coffee Co.
Best low-alcohol beer: BrewDog’s Nanny State
Best month of abstinence: hevoseton helmikuu
Best performance art: Court of Helburg by Kimmo Modig
Best restaurant lunch: mac’n’cheese in Juuri
Best wine: Mouth Bomb by Winepunk

Not awarded: best movie, best book, best hamburger. I didn’t read enough good books, watch enough good films or eat good hamburgers.

Plans for 2015

This year I hope to work a bit less than last year, and to have more time to make interesting stuff. Some plans I have for this year:

Graduate as a Bachelor of Science.
Have a summer vacation that is at least two weeks long in one go.
See a wild hedgehog. (I live in one of the most urban spots of Finland…)
Read Infinite Jest.

Ambronite mini-review

Wed, 17 Dec 2014 00:00:00 +0000

Update: See also my Joylent mini-review.

A while ago I backed Ambronite on Indiegogo, mostly out of curiosity. It’s a nutritional drink powder, similar to Soylent. Today I received my batch of Ambronite and I’ve just consumed the first portion. Let me tell you about it.

It’s green and thick and looks nice. It smells good, too, like sawdust or bran – I like that. Unfortunately it also tastes like sawdust, and not in a good way. It has the mouthfeel of porridge made of sawdust. The aftertaste is okay, though, and it did leave me feeling full. I didn’t expect it to taste nice, but I’m surprised how off-putting the taste and the texture are. Seriously.

Maybe adding something to the mix will make the taste more pleasant. There have been reports of using juice instead of water, or adding banana. Maybe I can eat it with yoghurt instead of bran. The crowdfunding campaign promised a recipe booklet, but it isn’t done yet.

Oh well.

Going to write a bachelor's thesis

Tue, 16 Dec 2014 00:00:00 +0000

My aim is to finally graduate as a Bachelor of Science in mathematics in the spring. To do that, I need to write a bachelor’s thesis. I’ve now agreed with my advisor about the topic: formalizing mathematics with Coq. It’s a topic I don’t yet know much about, and neither does my advisor. Still, I’ve been interested in it for a long time and this seemed like a good opportunity to capitalize on that interest.

It won’t be anything fancy, but I hope to make at least one useful contribution to the science: establishing some terminology in Finnish. The thesis must be in Finnish and not much has been written about the topic in Finnish before. Nobody reads bachelor’s theses, so I’m thinking of publishing a Finnish glossary on the web and improving the coverage in Finnish Wikipedia.

Guts of the Golden Pig

Sat, 29 Nov 2014 00:00:00 +0000

Last year Salla made an interactive sculpture as a school project. It’s a golden pig. You drop a coin in the coin slot, it prints you a receipt with wisdom about money out of its mouth. Here’s a video:

Video posted by Miikka (@arcatan) on November 11, 2014 at 3:53 PST

I cobbled together the electronics. The pig was re-exhibited at the Art of Research conference and right now it is at TOKYO’s Christmas Sales. Before the exhibition we gave the internals a small update and I thought it’d be nice to write down how it works.

Here are the parts:

Coin detector is a light gate. There’s a red LED that points to an LDR. They’re taped to a chute made out of plywood. The coin slot funnels the coins to the chute and they roll through the light gate, triggering the printing. The first version of the chute was made of cardboard, but it was hard to align the LED and the LDR in a stable way. Plywood is a huge improvement.
Arduino is used as an analog-to-digital converter. It reads the LDR in a loop using the analog input pins and outputs every reading through the serial interface.
Python program on Raspberry Pi reads the measurements from Arduino over USB. When the numbers go below a certain threshold, it chooses a random PDF file from a directory and calls lpr filename.pdf. In the first version, we had a netbook, but I wanted to port this to Raspberry Pi since they’re so trendy.
The printer is Star TSP143, which works great on Linux, although you have to install the drivers yourself. At first I was worried, because there are no drivers for Linux on ARM, but no worries: the drivers come with the source, which was trivial to compile. We chose this printer because it was the only receipt printer we could borrow.

For the new version, I made a program that draws a live graph of the signal, for calibrating the threshold. This was another great improvement - previously I just eyeballed the numbers in the console.

The original design was guided by the fact that I don’t know anything about electronics. I decided to do the simplest thing that could work. Another constraint was to use parts and materials we already had, since we didn’t want to spend any money. It’s nothing fancy, but it works surprisingly well.

There is one known bug: it sometimes double- or even triple-prints. I’ll fix it for version 1.2.
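The threshold step of the Python program can be sketched like this. It is not the pig’s actual code: the function name, the sample readings, the threshold of 500 and the `cooldown` parameter are assumptions made for illustration. The sketch fires only when the signal first drops below the threshold, so one coin that shadows the LDR for several readings counts once.

```python
from typing import Iterable, Iterator


def detect_coins(readings: Iterable[int], threshold: int,
                 cooldown: int = 0) -> Iterator[int]:
    """Yield the index of each reading where the signal first drops below threshold.

    Only the falling edge counts. After a detection, new falling edges
    within the next `cooldown` readings are ignored.
    """
    was_below = False
    ignore_until = -1
    for index, value in enumerate(readings):
        is_below = value < threshold
        if is_below and not was_below and index > ignore_until:
            yield index
            ignore_until = index + cooldown
        was_below = is_below


# Hypothetical LDR readings: bright (around 800) until a coin passes (around 300).
samples = [810, 805, 300, 290, 310, 800, 805, 295, 802]
events = list(detect_coins(samples, threshold=500))
```

In the real setup, each detected event would be followed by picking a random PDF and calling lpr on it, as described above. Whether edge detection or a cooldown has anything to do with the double-printing is only a guess, since the cause of that bug is not stated.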

Event notes: ClojuTRE 2014

Wed, 26 Nov 2014 00:00:00 +0000

Yesterday was ClojuTRE 2014. It’s the biggest (and the only) Clojure conference in Finland. Here are my main impressions:

Wow, there are lots of people in Finland interested in Clojure, and they’re actually using it in production. Clojure is one of the better programming languages available right now and I’m glad that people have found it. Compare this to the Haskell situation: there’s a lot of interest, but almost no-one uses it for “serious” work.
ClojureScript is at least as important as Clojure. Judging by the talks and the chatter, people are very much into CLJS. Makes sense: JavaScript is everywhere and therefore ClojureScript can be everywhere.
Haskell evangelists still have some work to do. Reactions to Bodil’s Haskell talk ranged from “Haskell is the best” to “why on Earth would I use Haskell?” I thought everybody already loved Haskell. Alas!

The talks were pretty basic and there were some technical problems, but otherwise everything went well. Since the talks were short (20-30 min), there was a lot of variety, which was great. In general everybody seemed to have a good time. Thanks to everyone involved!

Talks

I’ve summarized the key points of each talk below. Please note: This was written by me, not the presenters. If it sounds like they said something stupid, it’s probably me putting words in their mouth.

Wrapping Java in Awesomeness (by Tero Kadenius & Juha Heimonen). While Java is not the nicest language, there are good Java libraries out there. They can be cumbersome to use, so you should wrap them nicely with Clojure. You’ll do this by restricting mutation and feeding the Java objects with immutable Clojure data.
Let’s Steal Some Java Goodness (by Antti Virtanen). Clojure books tell you how to write the fib function, but not how to do real-world stuff like logging or performance monitoring. Inspired by Spring Boot, Antti has created a Clojure web app skeleton which covers these and more, so you do not have to figure them out yourself.
How to Introduce Clojure to Your Organization (by Erik Assum). Clojure is elegant and expressive and it has a REPL, which will blow your mind if you haven’t seen it before. But how do you impress a programmer with deadlines? Build a simple, valuable and non-critical project with it, like a useful chatbot.
Load Testing with core.async (by Markus Hjort). To sneak in Clojure, Markus has built a Gatling-style tool for load testing, called clj-gatling. During the talk, Markus showed us how to build the core logic of the tool using core.async. It was straightforward.
Hacking FirefoxOS with ClojureScript (by Timo Sulg). FirefoxOS applications are developed with HTML5, which makes ClojureScript a first-class citizen on the platform. There are some hurdles, like no REPL yet (because eval is disabled), but there are likely workarounds. Since the Firefox OS phones are so cheap (even if the specs are a bit dated), they make a great platform for hacking projects.
A Clojurian’s Quest for Qt (by Pauli Jaakkola). Using Qt with Clojure is non-trivial: Qt Jambi is not maintained anymore and Qt Widgets for Node.js does not really work. The way to do it is to use Qt Quick, QML and ClojureScript. QML is no fun, but Pauli has created a Hiccup-style library for generating QML.
ClojureScript Live Coding With Ease Using Lively (by Immo Heikkinen). By live coding Immo means editing the live application by reloading code in the running process (as opposed to playing music by writing code, which also involves live coding in Immo’s sense). He has created the library Lively for this purpose. Compared to the alternatives like figwheel, it’s simpler and more straightforward to use. Immo demoed Lively by building a snake game. This was my favorite talk - fun and well presented!
Build Tooling With Boot (by Juho Teperi). If you try to build a full-stack web project (with Clojure, ClojureScript, LESS, …) with Leiningen, it soon gets messy. Boot is designed to solve this problem. Boot tasks are Clojure functions which run in the same JVM. This makes them faster and more composable than Leiningen tasks, and not every plugin has to implement file-watching. There are still things missing like IDE support and some common tasks.
Live Hacking with Unity 3D (by Tims Gardner). There were some technical problems with this presentation, but basically it was Tims demoing Arcadia, a Clojure interface for Unity.
Hay - an embedded language (by Meikel Brandmeyer). Sometimes you want to make your application extensible by users. Exposing the host language - like XMonad does - requires the users to have the full toolchain and expertise in the host language. That’s why a simpler scripting language might be warranted. Meikel is working on Hay, a stack-based concatenative language that is closely integrated with Clojure. Meikel walked us through the basics of Hay and also showed us a useful application of Clojure metadata: storing the stack signatures of Hay functions.
Haskell for Clojurists (by Bodil Stokke). The Clojure community has been open to the influences of other programming languages - as shown by e.g. core.logic and core.typed. Bodil thinks that a good language to learn from is Haskell (I concur) and she walked us through implementing linked lists up to their Foldable instance.

A photo essay: Haronmäki

Mon, 17 Nov 2014 00:00:00 +0000

I made you a photo essay titled Haronmäki. By photo essay, I mean a short series of photos that are to be viewed as a whole.

I’m not sure if an HTML page of huge JPEGs is the best way to deliver it. Suggestions welcome.

Customizing Nix packages

Fri, 07 Nov 2014 00:00:00 +0000

I’m using Nix to install some packages on a host where I don’t have root and do not want to pester the admins all the time. The good side of this is that I can install packages with one command. The bad side is that installing anything takes forever. Because I use a custom store location, I can’t use the binary packages. Nix builds all the dependencies (including e.g. gcc and glibc), which takes a while.

Some Nix packages can be customized. For example, I wanted to install ikiwiki and use it with git. If you look at ikiwiki’s Nix expression, you’ll see that git support is disabled by default. How does one enable it?

The documentation on this can be hard to find, but it does exist¹. The gist of it is to create the file ~/.nixpkgs/config.nix and override the settings there. Here’s how I enabled ikiwiki’s git support:

{
    packageOverrides = pkgs: with pkgs; {
        ikiwiki = ikiwiki.override {
            gitSupport = true;
        };
    };
}

Now you can install ikiwiki with nix-env -i ikiwiki and git will be pulled in.

Mostly you’ll find mentions about a file called configuration.nix, but as far as I can tell, that only applies if you’re using NixOS. ↩︎

Did taking photos teach me anything?

Thu, 23 Oct 2014 00:00:00 +0000

My 28 days of practising photography are now over. You can see the photos on Flickr. Did I learn anything?

My goals were to think about the photos I take, learn the basics of taking an urban landscape shot and take at least one shot I could be proud of. The last goal first, here’s the one shot I really like:

I didn’t really plan this shot. I was trying to take another shot of the university buildings on Siltavuorenpenger. I had even brought a tripod, but there just wasn’t enough light at 23 o’clock. Before going home, I decided to take a quick snap at Pitkäsilta.

The project did improve my skills, but I can’t offer a coherent theory of the basics of urban landscape photography. I wish I could, but it has become apparent that I have even less idea of what I’m doing than I thought.

Nevertheless, here are some ideas for my novice peers:

Everybody loves one point perspective. (example)
Separating the ground from the foreground would be great, but I can’t do it with my camera. (example of me not doing it)
Everything looks more dramatic at night. (example: night vs. day)
Re-trying a shot on another day is a good idea. This one was the third time I took a photo at this place.
That contrast slider in your photo editing program is really tempting. Try not to overuse it.
Take a lot of photos. You’ll accidentally take some good ones.

I uploaded some of my photos on 500px. 500px’s algorithm seems to be very good at putting my (a new user?) photos in front of people, since I received a lot of likes. If you like social media likes, consider 500px.

Taking a single photo is great for photography-as-visual-art, but not so great for photography-as-storytelling. It’s not like you can’t tell a story with a single picture, it’s just that using more pictures often leads to better results. Hemingway’s six-word novel is cool, too, but there are benefits in longer-form writing. My 28 photos do not form a whole and they do not tell a story. It wasn’t the purpose, either, but it might be what I want to explore next.

As a bonus, here’s my best color shot:

Four scientific ways to hack your learning

Tue, 21 Oct 2014 00:00:00 +0000

During my university studies, I haven’t been to any book examinations until now. I’m studying a minor in cognitive science and there’s a bunch of book examinations I need to take.

The book I’ve now studied is Fundamentals of Cognition (Michael Eysenck, 2012). I’ve been worried, because I don’t have any practice in studying for a book examination. Luckily this book covers topics such as memory and learning! What does cognitive psychology say about studying efficiently?

Process the information deeply

According to the levels-of-processing theory, the deeper you analyze the information you’re learning, the longer-lasting the memory traces are.

Do not just read the material - do something with it. For example, put the key points in your own words. This forces you to process the text semantically instead of just glossing over. The best thing you can do is to relate the information to yourself somehow - this gives an extra boost to recollection.

Test your learning

Research says that the long-term retention is better when you test yourself while you study. Instead of reading a chapter twice, read it once, close the book and try to remember as much as you can. (This is also known as the Feynman Technique.)

When rehearsing the material, spaced repetition software (SRS) such as Anki can be a powerful tool. The forgetting curve hypothesis says that forgetting happens exponentially. Anki will test you just ahead of this curve. Also, creating a deck of flashcards is a way to process the material more deeply.
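To make the forgetting curve a bit more concrete, here is a minimal Python sketch. It assumes retention decays as exp(-t/S), where the "stability" S, the 10-day value and the 90% target are all made up for illustration. This is not Anki's actual scheduling algorithm, just the idea of reviewing shortly before recall drops too low.

```python
import math

def retention(days: float, stability: float) -> float:
    """Fraction still remembered after `days`, assuming exponential forgetting."""
    return math.exp(-days / stability)

def review_due(stability: float, target: float = 0.9) -> float:
    """Days until retention has dropped to `target`; review just before this."""
    return -stability * math.log(target)

# Hypothetical flashcard whose memory stability is 10 days.
due = review_due(stability=10.0)
print(f"review after about {due:.2f} days")
print(f"retention at that point: {retention(due, 10.0):.2f}")
```

In this toy model a successful review would increase the stability, which pushes the next review further into the future.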

Learn in an environment similar to the exam hall

You recall information better when you’re in a similar context to the one you were in when learning the information. This includes the physical environment but also your internal physiological state and mood.

This would suggest that you should study at the same time of the day as the exam is. If you’re going to eat or drink coffee before exam, do it before studying, too. Study in an environment that is similar to the examination hall.

Bonus: Get very drunk immediately after studying

Consolidating new memory traces is a process that takes hours. Alcohol inhibits the formation of new memories, so if you drink heavily right after studying, nothing new will interfere with the consolidation of the material you’ve just learned. This means you will remember the material more clearly.

I haven’t tried out this method myself, but hey, there’s research so it must be true.

NP and non-deterministic Turing machines

Thu, 09 Oct 2014 00:00:00 +0000

The complexity class P is pretty straightforward: it’s the class of problems that can be solved in polynomial time. What is NP, though?

I’m now attending a course on computational complexity. While I’ve known about complexity classes for a long time, my understanding has been hazy. I’ve known that the problems that can be solved in polynomial time belong to P and harder problems belong to NP and then there’s something beyond that. This course has finally clarified to me what NP is.

The key to NP is the non-deterministic Turing machine (NDTM). NDTMs are a generalization of the ordinary, deterministic Turing machines: there can be more than one rule for each state-symbol pair. This is the same distinction as between non-deterministic and deterministic finite automata.

An NDTM can be thought of as running all the possible computations in parallel. A possible intuition is that it “makes a copy of itself”, multiverse-style, when it reaches a configuration where it can apply two or more rules, and the different copies apply different rules.¹

An NDTM accepts an input when there exists at least one computation (a series of configurations of the machine) that finishes in an accepting state. It halts if all the possible computations halt.
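The “all computations in parallel” intuition is easiest to try out on a non-deterministic finite automaton rather than a full Turing machine. Here is a minimal Python sketch; the example automaton (it accepts binary strings that end in 01) and all the names are made up for illustration. Instead of copying the machine, it tracks the set of states that some copy could currently be in.

```python
# Rules of a hypothetical NFA: (state, symbol) -> set of possible next states.
# In state "q0" on symbol "0" two rules apply, so the machine "forks".
RULES = {
    ("q0", "0"): {"q0", "q1"},
    ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START = "q0"
ACCEPTING = {"q2"}

def accepts(word: str) -> bool:
    """Follow every possible computation at once by tracking a set of states."""
    current = {START}
    for symbol in word:
        # Every live copy applies each rule available to it;
        # a copy with no applicable rule simply dies.
        current = {nxt
                   for state in current
                   for nxt in RULES.get((state, symbol), set())}
    # Accept if at least one computation ended in an accepting state.
    return bool(current & ACCEPTING)

print(accepts("1101"))
print(accepts("110"))
```

For finite automata this trick is cheap because there are only finitely many states. For Turing machines the number of distinct configurations can grow exponentially, which is exactly why simulating an NDTM deterministically is expensive.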

Formally, P is the complexity class of decision problems that can be solved in polynomial time by a deterministic Turing machine. NP is the complexity class of decision problems that can be solved in a polynomial time by a non-deterministic Turing machine.

It is sometimes said that NP is the class of problems whose solutions can be checked efficiently (i.e. in polynomial time). Here’s how it works: construct an NDTM that non-deterministically generates a solution candidate and then checks it with that efficient procedure. This NDTM accepts the input if at least one computation ends up accepting, i.e. if at least one candidate passes the check, and hence it solves the decision problem efficiently.

Consider the partition problem: you have a set of items with known weights and two knapsacks and you want to split the items into the sacks so that both sacks weigh the same. The decision problem asks, given the weights, whether this can be done. A possible solution can be efficiently verified: if you come up with a way to split up the items, you can easily sum the weights and see if they match. This problem is indeed NP-complete: it can be solved efficiently with an NDTM, but not with a deterministic Turing machine unless P=NP.
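Here is a small Python sketch of the partition example, with function names made up for illustration. `verify` is the efficient check of a single candidate split. `can_partition` is a deterministic stand-in for the NDTM: where the NDTM would guess the split non-deterministically, it simply tries all 2^n candidates one by one, which takes exponential time.

```python
from itertools import product

def verify(weights: list[int], in_first_sack: list[bool]) -> bool:
    """Check one candidate split in time linear in the number of items."""
    first = sum(w for w, chosen in zip(weights, in_first_sack) if chosen)
    second = sum(w for w, chosen in zip(weights, in_first_sack) if not chosen)
    return first == second

def can_partition(weights: list[int]) -> bool:
    """Brute force: accept if at least one of the 2**n candidate splits verifies."""
    return any(verify(weights, list(guess))
               for guess in product([False, True], repeat=len(weights)))

print(can_partition([3, 1, 1, 2, 2, 1]))  # 3 + 2 against 1 + 1 + 2 + 1
print(can_partition([2, 3, 4]))           # total weight is odd
```

The gap between the cheap `verify` and the expensive loop around it is the whole point: nobody knows whether that loop can be replaced by something polynomial.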

This multiverse intuition is sometimes offered as an explanation of what quantum computers do. I do not know much about quantum computing, but I’m pretty sure this is wrong and quantum computers are less powerful than NDTMs. Edit: See a later post about this. ↩︎

Why does everything fall apart so quickly?

Mon, 06 Oct 2014 00:00:00 +0000

I’ve lately thought about how much waste we produce. It’s mind-boggling. Every time I put something in the garbage bin, I think again, and I doubt that our household is amongst the worst producers of waste. Where did all that waste come from? Especially all that plastic? What’s going to happen to it? Why do all my clothes wear out so quickly? Why do all the electronic devices stop working so quickly? Why can’t they be repaired?

In our apartment, we have a couple of chairs found in my girlfriend’s grandparents’ attic. The chairs can be seen in a photo from the 1920s, which means they’re at least ninety years old, yet they’re in great condition. They haven’t been in use all the time, but I’m still impressed: this kind of durability is in stark contrast to the planned obsolescence of modern devices.

For people who like shopping and new stuff, and for the companies producing that stuff, the short lifetime of their stuff is a blessing. If your clothes quickly wear out, then you soon get to buy new ones. For me, it feels wasteful and tedious.

Finding durable items

All this has made me pay more attention to the lifecycle of things around me. Here are some things to think about:

Will it age well? All items wear in usage. Some of them do it with dignity (this is the aesthetic of wabi-sabi), while others take it less well.
Can it be repaired? Can you do it yourself? Can you get spare parts? Wonders can be done to clothes and furniture, but many electronic devices are not designed to be user-serviceable. Still, there is a cool Ello post by Clay Shirky about hardware hackers in China: in China, repairing any electronics is business as usual.
Can it be resold or given away? Can it be recycled? When the item comes to the end of its road, can its materials be re-used? Can they be safely disposed of?

Practicing photography

Sat, 27 Sep 2014 00:00:00 +0000

I’ve been complaining that I take worse photos than, say, five years ago. A couple of days ago I decided to do something about it and start a small photography project.

I’m very much a beginner when it comes to photography, so I’ll probably use silly words and say things that are obvious or obviously wrong. Please bear with me while I’m learning.

Click here to see my progress on Flickr.

I take most of my photographs when I travel. I take them to better remember the places I’ve been to and to better describe my adventures to friends and family. I’d like to proudly show my pictures, but instead I’m going ugh, this is so bad.

To get better, for the next 28 days, I’m going to publish one 4:3 black-and-white urban landscape shot taken with my point-and-shoot on that day. The photos will be on Flickr. This may sound somewhat arbitrary. I could come up with some justifications for these choices, but to be honest, the idea just popped into my head and why not?

There’s only so much you can learn with such a small project. My goal is to make myself think about the photos I take, and to get a basic idea of how to take an urban landscape shot. Also I hope to take at least one shot I can be proud of.

Initial insights

I’ve now been doing this for four days. Here’s what I’ve figured out so far.

One thing I quickly realized is that it’s much easier to take an agreeable photo of a detail (e.g. an interesting ornament) than it is to take a photo of a whole (e.g. a stretch of street). This also helped me understand why my older photos were better. I’ve assumed it’s because I practiced more, which is true, but it is also because I took mostly photos of details.

During the last few years I’ve shifted to trying to capture wholes. While detail pictures are fun and important, they feel very detached if you don’t have pictures of the whole to tie them together.

So, capturing the whole is what I’m now interested in. How to take a photo that gives you a feeling of the city? I might be even doing the wrong thing: instead of concentrating on single shots, should I concentrate on series of images?

Another hunch is that a very important feature of the urban landscape is its shape in three dimensions. The city is not just a painting on a plane and you will fail if you try to capture it like that. One way to make an interesting image is to capture the shape of the space.

Update (2014-10-23): Read here what happened!

The alluring research career

Fri, 26 Sep 2014 00:00:00 +0000

This post was originally written in Finnish, because the articles I refer to are in Finnish. In brief: I think doing research would be cool, but the career options in (Finnish) universities seem atrocious.

Every now and then I’ve wondered whether I should steer my career towards research instead of practical software development. For example, programming language research is a field that I follow with interest and that I would gladly dive deeper into.

Luckily, university folks have reminded me of the problems of a research career. For example, according to Jenny Kangasvuo, postgraduate studies burn people out and the university offers no support. According to Janne Saarikivi and Tiedemies, researchers, and professors in particular, do not have time to do research, because all their time goes to administrative tasks and applying for funding. Yesterday, in turn, one could read about how the University of Helsinki handles layoffs. It does not seem to have gone very well.

When you read writings like these, the choice is not hard. Academic freedom is a cool thing, but it is also cool that in the IT industry coders are put to work coding, employment contracts are permanent, and you even get paid. At least for now.

In praise of Hiccup

Thu, 25 Sep 2014 00:00:00 +0000

I’m writing a simple CRUD web application for a school course and I’m doing it in Clojure. I chose to do the templating with Hiccup. It’s a Clojure library for generating HTML. You specify the structure as inline data, like this:

(require '[hiccup.page :as h])

(defn example-page [title]
  (h/html5
   [:head
    [:title title]]
   [:body
    [:div.container
     [:h1 title]
     [:p "Hello world!"]]]))

For a long time, there has been a debate about how much or how little logic you should have in your templates. At one end of the spectrum is Mustache, which advertises itself as logic-less. At the other end is PHP, where all you have are templates and programs are fully embedded in them.

I’m unsure which one is the right way, but I do say this: if you choose to embed logic in your templates, you should use a proper programming language. This is what Hiccup does: Hiccup templates are just Clojure code that generates plain old Clojure data. Contrast this to e.g. Django’s templating language, which allows you to do a lot of things, but never quite as much as you’d want.

Is being normal anti-establishment?

Fri, 12 Sep 2014 00:00:00 +0000

The more alike to others you are, the more you need to be surveilled to characterise you properly. Does that make aiming to be normal an act against the surveillance establishment?

Electromagnetic Field 2014

Sun, 07 Sep 2014 00:00:00 +0000

Last weekend I went to Electromagnetic Field 2014 in Bletchley, UK. EMF is a hackercamp - there were over 1000 makers and hackers camping on a field. You could get electricity to your tent, and there was a fast Internet connection.

It was a pretty cool event. I met some nice people, attended some cool talks, and generally had a good time. The Arduino Due-compatible event badge, TiLDA MKe, was cool too, even though the radio network never really worked.

There were several really good talks. Here are my favorites:

Walt Disney World: This was supposed to be the future by Dan W was a talk about the history of Epcot, which is part of Disney World in Florida. Turns out it was Walt Disney’s attempt at utopian city planning. A fascinating story.
Where Games Break by game designer Hannah Nicklin was a personal talk about how and where various games (understood very broadly) break. It was a gripping and poetic talk, more like a performance than a lecture. Unfortunately I can’t really summarize it in words. Text of the talk here.
bach.js, an unhistory of how the great Baroque composer pioneered Javascript, 250 years before Netscape even existed by James Aylett was a wonderfully silly talk about the exploits of Johann Sebastian Bach as an early JavaScript programmer. I’m not sure if it made much sense, and half of the jokes went over my head because I don’t know much about Bach, but hey.
Surreal Numbers And Mathematical Games by Tom Hall was a talk about how all games of a certain kind are actually games of Nim. Even though it was fast-paced, Tom ran out of time. It was so fast-paced that I’m not sure if anyone in the audience was able to really follow the talk, unless they had previous knowledge of combinatorial game theory. Slides here.

I usually don’t get much out of talks at this kind of event, because I’m superficially familiar with a lot of stuff and the speakers tend to not go very deep. Probably one of the reasons that these talks stood out was that I was new to all of the topics and couldn’t properly follow the talks.

All in all, it was a great weekend and I will certainly consider coming back for EMF 2016.

Math is programming

Tue, 05 Aug 2014 00:00:00 +0000

Lately the people of the Internet have been concerned about whether programming is math. Sarah Mei has written a good post about this: Programming Is Not Math. Here I want to record my internal dialogue on the matter.

Is programming math?

Yes.

Why?

Because programming and math are so deeply interlinked that it’s impossible to distinguish them in their core. For example, have a look at type theory. Does it describe mathematics? Yes. Does it describe programming? Yes.

Is that the only reason?

No. Doing programming and doing mathematics feel the same to me. Thinking about proofs feels like debugging programs. Exploring mathematics feels like exploratory programming.

Do I need to be good at math to be a good programmer?

Yes. Programming is math, so you can’t be good at programming if you aren’t good at math.

Do I need to learn math to learn programming?

Yes. Programming is math, so you need to learn math.

What kind of math should I learn to learn to program?

Since programming is math, your best bet is to start by learning programming.

What if I’m bad at math? Can I still learn to program?

When you say you’re bad at math, you probably mean you didn’t do too well in the school math class. Programming is math, but luckily it’s very different math from what they teach at schools. You should give it a try, you might be surprised by how good you are at math.

Do you agree with Sarah Mei?

Yes, even though programming is math.

Does it matter whether programming is math?

Not at all. Math is programming, though, and that is interesting.

When is static typing worthwhile?

Tue, 24 Jun 2014 00:00:00 +0000

Static typing is a crucial tool for constructing correct programs. That does not mean you should always use a statically typed programming language. Often program correctness does not matter and “mostly works” is good enough. Then you might be better off with a dynamic programming language that has good libraries for solving your problem.

When is static typing worthwhile, then? I have a couple of hunches:

The value of types increases over time. They make it easier to understand large codebases and they ensure that your changes are correct.
The value of types increases together with problem complexity. If the problem is complex, the extra structure given by static typing allows you to concentrate on the problem itself instead of worrying whether you’ve constructed a correct program.
Types boost you in the core of the program, but hinder on the edges. Often you need to interface with the untyped world, e.g. via JSON-speaking APIs. Marshalling the data back and forth takes extra effort in the statically typed world.

Hard things in software engineering

Sun, 04 May 2014 00:00:00 +0000

According to common wisdom, there are only two hard things in computer science: cache invalidation, naming things and off-by-one errors.

In a similar vein, here’s my list of hard things in software engineering (not in order):

security
concurrency
dealing with people

What makes them hard is that when solving a security/concurrency/people problem, the difference between a good solution and a catastrophically bad one is often subtle. All of these are commonly seen as much simpler than they are.

Linux problems, April 2014

Sun, 06 Apr 2014 00:00:00 +0000

At work I use Linux, or to be more precise, Ubuntu 12.04. I’ve had all kinds of problems with it and I thought that it’d be nice to track them by making a list. I’ve included the things that I’d really expect to just work by this point.

Recently solved problems

Firefox menus stopped working. When you started the browser, everything would work just fine, but after a while the menus and drop-downs would stop opening. I think it is a known bug in Firefox. It disappeared a while ago, maybe after some software update. Update 2014-04-18: It’s happening again. :(
Chrome tabs would black out. Some pages, typically ones with Flash or some other multimedia content, would suddenly turn totally black. Sometimes when you opened the page again in a new tab, it would work, sometimes not. I think it had something to do with nVidia drivers. The problem disappeared after upgrading the drivers and rebooting the computer.
nvidia-settings crashes when you try to apply settings. It still does, but it turns out that the Save to X Configuration File button actually works and you just need to restart the X server after making changes. No more hand-editing Xorg.conf.

Current problems

Whenever I plug in my USB DAC (i.e. soundcard), I need to restart Spotify to hear audio again. It used to work without restarts, but maybe some software update broke it. (The said DAC, NuForce uDAC-2, occasionally overheats, so I unplug it when I leave the office.)
Sometimes, like every time I plug in a USB device, the X keyboard layout gets reset. I use xmodmap to set up a custom layout and this undoes all those customizations. My boss points out that instead of using xmodmap, I should create a proper XKB layout. I guess it would help, but I’ve never encountered this problem with xmodmap elsewhere.

Baudrillard on Jogging

Sat, 05 Apr 2014 00:00:00 +0000

One of the books I’m reading right now is Jean Baudrillard’s The Transparency of Evil. It’s a collection of essays. In one of them, Operational Whitewash, he writes:

Jogging is another activity in the thrall of the performance principle. […] The pleasure (or pain) of jogging has nothing to do either with sport or with the body in its fleshly reality: it is the pleasure not of pure physical exertion but of a dematerialization, of an endless functioning. The body of the jogger is like one of Tinguely’s machine: ascesis and ecstasis of the performance principle. Making the body run soon gives way, moreover, to letting the body run: the body is hypnotized by its own performance and goes on running on its own, in the absence of a subject, like somnambulist and celibate machine.

I’m not sure if I understand what he is saying, but the comparison to Tinguely’s machines is apt. Consider Hannibal II:

I can’t make up my mind about the book. Sometimes I think that Baudrillard is just putting words after words without saying anything. Sometimes I feel like he’s doing an excellent analysis of how people project a perfected image of themselves in social media – even though the book was published in 1990. People never change, I guess.

Renaming files with zmv

Sun, 09 Mar 2014 00:00:00 +0000

Sometimes you want to rename a bunch of files. For example, maybe you want to rename all the .htm files in the directory to .html. Previously I’d give a shell command like this:

for f in *.htm; do
    # quote the expansions so that names with spaces survive
    mv "$f" "$(basename "$f" .htm).html"
done

It works, but it’s cumbersome. There are better tools available. For example, Perl comes with rename, which lets you do this:

rename 's/\.htm$/.html/' *.htm

Another tool, and the one I’ve been using lately, is zmv, which is a part of Zsh. The .htm renaming goes like this:

zmv '(*).htm' '$1.html'

As a more advanced example, recently I wanted to rename all the subdirectories of a directory to uppercase. This was inside a git repo, so I needed to use git mv instead of plain mv. Here’s how I did it:

zmv -Q -i -p 'git' -o 'mv' '(*)(/)' '${(U)1}'

-Q turns on bare glob qualifiers. This is required for (/), which makes * match only directories.
-i enables interactive mode, as I wanted to manually confirm each renaming.
-p specifies the program and -o the arguments for it.

To use zmv, you need to load it with autoload -U zmv. See man zshcontrib for documentation. I also highly recommend skimming the Zsh manual on expansion (man zshexpn) – it’s a treasure trove that keeps giving.

Weak views, strongly held

Sat, 01 Mar 2014 00:00:00 +0000

A couple of days ago I read a fascinating essay. It’s Venkatesh Rao’s post The Cactus and The Weasel. He starts from the archetypes of fox and hedgehog, originally introduced by Isaiah Berlin, and goes on to analyze their thinking in detail. I attempted to summarize it, but failed. Just read it yourself.

I’m certainly more of a fox than a hedgehog.

Flappy Sine

Sat, 15 Feb 2014 00:00:00 +0000

My tribute to Flappy Bird. It’s not quite as hard as Flappy Bird, though.

nix-docker and docker volumes

Mon, 03 Feb 2014 00:00:00 +0000

nix-docker is a tool by Zef Hemel that allows you to provision your Docker images using Nix. If that sounds like a good idea, head over to Zef’s blog to read his post Declaratively Provision Docker Images Using Nix. He makes a compelling case.

Over the weekend I experimented a bit with nix-docker. One thing that took me a while to figure out is that if you want to mount a data volume, the mount point must exist on the image.

Zef’s WordPress example uses a volume mounted at /data. It works, because the Apache service specification takes care of creating the directory. If you want to create the directory yourself, you can add a script to docker.buildScripts.

{ config, pkgs, ... }:
{
  docker.volumes = [ "/data" ];
  docker.buildScripts.createVolumeDirs = ''
    mkdir -p /data
  '';
}

This could be and should be abstracted away, but I’m very much a beginner with Nix. Or maybe nix-docker should take care of creating the directories.

Baana, Helsinki

Tue, 28 Jan 2014 00:00:00 +0000

Compiling assembler files with avr-gcc without C runtime

Mon, 27 Jan 2014 00:00:00 +0000

Written down so that I’ll remember this the next time I’m trying to do this: If you want to compile assembler files with avr-gcc to be used with your C files, just compile them like your .c files.

avr-gcc -c -mmcu=atmega328p -o foo.o foo.S

If you want to run your assembler code standalone, without the C runtime support (crt*.o), you’ll need to tell the linker that the entry point is inside your program instead of the C initialization routines. The command-line option -e for avr-ld is what you’re looking for. Consider the following assembler program:

    #include <avr/io.h>

    .section .text

    .org 0
    .global init
init:
    rjmp main

    .org 0x020
    .global main
main:
    ; on Arduino, this will light up the on-board LED
    ldi 16, 0xFF
    out _SFR_IO_ADDR(DDRB), 16
    out _SFR_IO_ADDR(PORTB), 16

loop:
    rjmp loop

To start the execution from init, compile and link it like this:

avr-gcc -c -mmcu=atmega328p -o foo.o foo.S
avr-ld -e init -o foo.elf foo.o
avr-objcopy -O ihex foo.elf foo.hex

Also, check out avr-libc’s manual on assembler programs.

People Never Change

Fri, 03 Jan 2014 00:00:00 +0000

For ages thinkers have been lamenting the common people for their lack of passion in their lives. For example, here’s a quote from Søren Kierkegaard’s Concluding Unscientific Postscript (originally published in 1846, translation by Alistair Hannay):

Every human being is fitted by nature to become a thinker (all honour and praise to the God who created man in his image!). God cannot help it if habit and routine and want of passion, and affectation, and gossiping with neighbours next door and opposite little by little ruin most people so that they become thoughtless – and base their eternal happiness on one thing and then another and then something else – not noticing the secret that their talk about their eternal happiness is an affectation precisely because it is devoid of passion, which is why it can also be so excellently supported by matchstick arguments.

I encountered this quote when reading Torsti Lehtinen’s book Eksistentialismi¹. It’s an overview of existentialism and the main thinkers behind it. All the existentialists found despicable the complacent, petit bourgeois mode of existence. While I don’t have any quotes ready, I’m sure that these thoughts aren’t new. Surely you can find ancient Greeks saying similar things.

This makes me wonder: have there been any philosophers who thought the opposite? Maybe humans aren’t created to be thinkers - maybe the human nature is to be petit bourgeois?

If you’re interested in the subject and you’re able to read Finnish, I highly recommend the book. It was a great introductory read. ↩︎

Peruna, a horse and a medicine

Mon, 25 Nov 2013 00:00:00 +0000

The mascot of the athletic team of Souther Methodist University is a black shetland pony called Peruna. Peruna has been their mascot since 1932 - the current Peruna is the ninth in the succession. Here’s Peruna III:

Peruna got its name from a patent medicine that was sold around the turn of the 20th century. The temperance movement in the US was strong and alcohol dealers faced all kinds of hindrances. Still, many medicines with alcohol were available even in states under prohibition, and they were very popular. Their medical effects were dubious but at least they made you drunk.

What was in Peruna the medicine? According to these reprints, Collier’s Weekly gave the following recipe for home-made Peruna:

Half a pint of 190-proof (95 vol-%) cologne spirits
A pint and a half of water
A little cubebs for flavor
A little burned sugar for color
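
For a rough sense of how strong that recipe is, here is a back-of-the-envelope calculation. It assumes the volumes of spirits and water simply add up, which is not exactly true for ethanol and water, and the function name peruna_abv is made up for this sketch.

```shell
# Estimate the alcohol by volume (in percent) of a spirits-and-water mix.
# Arguments: pints of spirits, strength of the spirits in vol-%, pints of water.
# Assumption: volumes add up linearly; flavoring and coloring are ignored.
peruna_abv() {
  awk -v spirits="$1" -v strength="$2" -v water="$3" 'BEGIN {
    printf "%.2f\n", spirits * strength / (spirits + water)
  }'
}

peruna_abv 0.5 95 1.5   # prints 23.75
```

Under those assumptions the home-made version comes out at a little under a quarter alcohol by volume.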

There’s also a thorough article by Jack Sullivan on the history of Peruna in the May-June 2007 edition of Bottles and Extras (the official publication of the Federation of Historical Bottle Collectors).

Where did the medicine get its name? I don’t know. Peruna means potato in Finnish and I was hoping to learn that the medicine was made of potato alcohol. Maybe Dr. Samuel Brubaker Hartman, the inventor of Peruna, was a Finnish American or something, I thought. Unfortunately that does not seem to be the case.

Photo: Peruna III bucking / Southern Methodist University, Central University Libraries, DeGolyer Library.

Mental Attitude of the Grid

Mon, 18 Nov 2013 00:00:00 +0000

The typographic grid is a graphic designer’s tool for structuring the page. You split the page into a grid and use that as a guide for laying out the design elements. Wikipedia writes:

The grid serves as an armature on which a designer can organize graphic elements (images, glyphs, paragraphs) in a rational, easy to absorb manner.

The word rational is telling. The grid was born during the 20th century inspired by the modernist ideals. It’s not just a practical tool, but a principled one. The definitive book on the grid is Josef Müller-Brockmann’s Grid Systems in Graphic Design. He lists the reasons to use a grid:

economic reasons: a problem can be solved in less time and at lower cost.

rational reasons: both simple and complex problems can be solved in a uniform and characteristic style.

mental attitude: the systematic presentation of facts, of sequences of events, and of solutions to problems should, for social and educational reasons, be a constructive contribution to the cultural state of society and an expression of our sense of responsibility.

A graphic designer is someone who solves problems. The problem they’re solving is how best to communicate something visually. I’ve never seen it put this way on the Internet, but it seems like a very valid way of viewing the profession.

How do they solve the problems? Efficiently, of course, both visually and economically. But they also solve them responsibly and for the betterment of society.

I was surprised to encounter a call for responsible design in a book on grid systems, but it makes sense. There’s a great talk by Mike Monteiro along similar lines, titled How Designers Destroyed the World. To summarize it:

You are directly responsible for what you put into the world.

The talk is not just for designers, but for everyone who works on products. You could do worse than watch it.

Silver Streetcar for the Orchestra

Sat, 16 Nov 2013 00:00:00 +0000

A colleague pointed me to Silver Streetcar for the Orchestra by Alvin Lucier. Here is one of the performances available on the Internet.

The solo triangle player dampens the triangle with their hand while playing a fast rhythm. They’re exploring how the dampening and the location, speed and loudness of the tapping affect the sound of the triangle. To quote Alvin Lucier:

During the course of the performance, the acoustic characteristics of the folded metal bar are revealed.

I find it wonderful how such a rich musical work can come out of such a constrained setup. This piece keeps playing in my head.

The name of the piece comes from Luis Buñuel’s surrealist piece of writing Orchestration, which describes the roles of different instruments in an orchestra. A silver streetcar for the orchestra is of course the triangle. It’s a worthy read.

ferret.gif

Fri, 23 Aug 2013 00:00:00 +0000

Why do you write commit messages?

Tue, 20 Aug 2013 00:00:00 +0000

If you write commit messages, take a couple of minutes to consider the following questions:

Who reads your commit messages?
When do they read them?
Why do they read them?

Your answers likely depend on the kind of projects you’re working on. For my personal projects, the answers to the first two are “only me” and “never”. As a result, my commit messages are mostly one-liners along the lines of fix everything. That’s fine, because my projects are one-man short-term operations.

At work, the situation is different. We have around twenty people working on a codebase with over 150k lines of code and years of history. I can’t tell when (or if ever) my co-workers read my commit messages, but I do know when I read theirs.

The first time I read the messages is when the code is in review. I like to get a quick overview of the changes by looking at the commit messages of the change request. However, it’s not a big deal if the messages don’t tell much - the main concern is the code itself.

The second time is more important: it’s when I try to understand the purpose of some old code. This often takes me digging through all the commits that have touched the code. Ideally the code would be self-documenting and there would be an explanatory comment next to it. In practice, the code and the comments get obsolete and out-of-sync. The code might be written or modified by a person who does not fully understand it. A wide-ranging change can miss something. Something obvious to the original author might not be obvious two years later. Sometimes you just need to make an awful hack.

In these cases the original commits introducing the code often help me understand what is happening. It’s even better when there’s a commit message documenting the thinking behind the change: What problem does this commit solve? Why was it solved this way? Why wasn’t some other way used?
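
To make this concrete, here is a small sketch of what such a message and the later digging could look like. Everything in it is made up for illustration: the throwaway repository, the settings.conf file, the timeout values and the wording of the messages. It only assumes that git is installed.

```shell
# Build a throwaway repository with two commits, then find the commit that
# introduced a value and read the reasoning recorded in its message.
demo_commit_history() (
  set -eu
  repo=$(mktemp -d)
  cd "$repo"
  git init -q
  git config user.name "Example Author"
  git config user.email "author@example.com"

  echo "timeout = 30" > settings.conf
  git add settings.conf
  git commit -q -m "Add settings file"

  echo "timeout = 120" > settings.conf
  # First -m is the subject line, second -m is the body with the reasoning.
  git commit -q -a -m "Raise upload timeout to 120 seconds" \
    -m "Large uploads failed with the 30 second limit. Retrying was considered, but it would only hide the slow transfers."

  # Pickaxe search: show the commits that added or removed this exact string,
  # printing subject and body.
  git log -S"timeout = 120" --format='%s%n%n%b' -- settings.conf
)

demo_commit_history
```

The body of the second commit answers the three questions above, which is exactly what a future reader gets back from the search.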

I’ve worked at ZenRobotics for only eight months, so my code hasn’t yet had to take the test of time. Nevertheless I’ve started to make it more future-proof by writing more extensive commit messages. Hopefully in two years I’ll be thanking myself for doing it.

I hope these questions help you figure out what kind of commit messages are useful for you and your contributors.

Setting up nginx for static content with Pallet

Sun, 18 Aug 2013 00:00:00 +0000

In this post I’ll show you how I used Pallet to configure the server hosting this blog.

What and why?

I’m setting up a web server to host a bunch of static HTML, CSS and image files. That’s a pretty simple task, so what is Pallet and why am I using it?

I want to have reproducible server configuration. If I ever need to move quanttype.net to a new server, I do not want to figure out which packages to install and which hand-edited configuration files I need to copy over. Various configuration management tools solve this problem by programmatically applying my configuration to a server.

I’ve earlier dabbled with some of the more popular configuration management tools such as Puppet and Chef. This time I chose to use Pallet because it’s simpler and hence suits my simple needs better.

Pallet: cloud automation with Clojure

Pallet is a Clojure library that can be used to describe a server configuration and then apply it to a server. It’s simple to use: all you need is a Clojure REPL on your local computer and an SSH connection to your servers. Pallet also supports automatically spinning up nodes with cloud providers such as Amazon or Rackspace.

Pallet’s abstraction level is somewhere between those of Puppet and Fabric. You can easily use pre-made modules (“crates” in Pallet parlance) for installing and configuring common software like nginx, but you do not need to set up any central servers or repositories.

Acquiring a server

Quanttype is hosted on DigitalOcean, mainly because their prices are low. Pallet does not support their API, so I manually created an Ubuntu VM, or a droplet as DigitalOcean calls them. Before using Pallet I did some setup on the server:

Create a non-root user with passwordless sudo access. This is not mandatory: you could use Pallet as root as well, but this is the way I like to organize things.
```
# as root
adduser arcatan
visudo
```
Install your SSH key. Pallet uses SSH to connect to the server and you don’t want to store your password in your Pallet configuration.
```
# on your own computer
ssh-copy-id quanttype.net
```
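
The visudo step above boils down to adding one line to the sudoers configuration. A minimal sketch, assuming a standard Ubuntu setup where /etc/sudoers.d is included by the main sudoers file. The helper name sudoers_entry is made up for illustration, and the post itself only says that visudo was used.

```shell
# Print the sudoers entry that gives the named user passwordless sudo.
sudoers_entry() {
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

sudoers_entry arcatan

# As root, the entry could be installed as a drop-in file and syntax-checked
# instead of being typed into visudo by hand:
#   sudoers_entry arcatan > /etc/sudoers.d/arcatan
#   chmod 0440 /etc/sudoers.d/arcatan
#   visudo -c -f /etc/sudoers.d/arcatan
```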

A new Pallet project

Pallet’s getting started guide suggests using Leiningen to create a new project with the pallet template. I did so, but I didn’t know what to do with all the stuff Leiningen put there, so I reverted to a plain Clojure project.

lein new quanttype-ops

Frankly, I had a hard time following Pallet’s documentation on this. It covers okay how you do various things, but it’s lacking on how to structure your Pallet project.

The stable release of Pallet is 0.7, but 0.8 is going to be released soon. The first release candidate is out there already, so it’s probably a good idea to base new projects on that.

Pallet is distributed as a Clojure library, so I added [com.palletops/pallet "0.8.0-RC.1"] to project.clj. I’m also going to use nginx-crate, so I’ll add that, too.

(defproject quanttype-ops "0.1.0-SNAPSHOT"
  :description "Pallet for quanttype.net"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 [com.palletops/pallet "0.8.0-RC.1"]
                 [org.clojars.strad/nginx-crate "0.8.3"]])

Main configuration

Now we can start configuring the server. To properly understand this walkthrough, keep Pallet documentation handy.

I put my configuration in the namespace net.quanttype.ops.core. Here’s the ns form:

(ns net.quanttype.ops.core
  (:use
   pallet.actions
   pallet.api
   pallet.compute
   pallet.crate.nginx)
  (:require
   [pallet.crate :refer [defplan]]))

Pallet does not support DigitalOcean’s API, but once you’ve acquired a server, you can just give Pallet a list of your servers’ IPs and it will happily connect to them. To do so, you need to define a node-list compute service:

(def digital-ocean
  (instantiate-provider
    "node-list"
    :node-list
    ;; A list of nodes: [name group IP operating-system]
    [["quanttype.net" "web" "37.139.12.210" :ubuntu]]))

Pallet also needs to know about the user account it should use. As I already set up an account that uses my default SSH key and has passwordless sudo, all I need is to tell Pallet the username.

(def my-user
  (make-user "arcatan"))

First, I’m going to want some packages for easier usage of the server. I created a server specification that installs my packages. In Pallet parlance, a phase is a sequence of actions to be executed. Different phases are applied at different times - :bootstrap is applied when a new node is started. I also set my shell to be zsh.

(def with-my-packages
  (server-spec
   :phases {:bootstrap
            (plan-fn
             (packages :aptitude ["git" "zsh" "vim"])
             (user (:username my-user) :shell :zsh))}))

Next, I will set up nginx. This is easiest to do with nginx-crate. I used Ryan Stradling’s fork which has been updated to Pallet 0.8. The configuration keys match those of the real nginx configuration files, so it’s best to see nginx documentation to figure out what you need.

(def http-server-config
  {:install-strategy :packages
   :user "www-data"
   :group "www-data"
   :sites [{:action :enable
            ;; Name of the configuration file must be something.site,
            ;; because nginx's main configuration file includes *.site.
            :name "quanttype.site"
            :servers [{:server-name "quanttype.net www.quanttype.net \"\""
                       :listen "80"
                       :index "index.html"
                       :root "/var/www/quanttype.net"}]}]})

With this configuration, nginx serves quanttype.net from /var/www/quanttype.net. Here’s a plan function that makes sure that the directory exists and has the proper rights for me to rsync files there.

(defplan quanttype-directories
  []
  (group "www-data")
  (user "www-data"
        :system true
        :home "/var/www"
        :create-home false
        :shell false
        :group "www-data")
  (exec-script ("usermod" "-a" "-G" "www-data" ~(:username my-user)))
  (directory "/var/www/quanttype.net"
             :owner (:username my-user)
             :group "www-data"
             :mode "0755"))

We need another server specification to encapsulate the nginx configuration. The key here is that it extends (nginx http-server-config), where nginx is a function provided by nginx-crate.

(def quanttype-server
  (server-spec
   :extends [(nginx http-server-config)]
   :phases {:configure (plan-fn (quanttype-directories))}))

Finally, to pull everything together, I define the "web" group. In the node-list we defined quanttype.net’s group as web and this group specification maps it to the server specifications we saw above.

(def web-group
  (group-spec
   "web"
   :extends [with-my-packages quanttype-server]))

Applying the configuration

The configuration is applied to the servers with lift and converge. I have only one server, so lift is what I need. Here’s a helper function for executing the config.

(defn execute
  [& args]
  (apply lift
         web-group
         :user my-user
         :compute digital-ocean
         args))

Now, open a REPL and use (execute). Because I didn’t use Pallet to provision the nodes, I need to run :bootstrap myself.

user> (use 'net.quanttype.ops.core)
nil
user> (execute :phase :bootstrap)
user> (execute :phase :install)  ;; custom phase by nginx-crate to install nginx
user> (execute)                  ;; :configure is executed by default
user> (execute :phase :restart)  ;; custom phase by nginx-crate to restart nginx

Ta-da, everything is ready and the only thing left is to rsync the content to the server.

Conclusion

Now you’ve seen how to automatically install and configure nginx with Pallet. Hopefully sharing this helps people to get started with Pallet. I found that it’s pretty easy to use after a while, but figuring out what to do at first is hard.

Here’s a to-do list for the future:

Testing the configuration locally in VirtualBox
Automating the pre-Pallet steps
Support for DigitalOcean in Pallet

Another thing I’ve been looking at is Docker. It packages applications in isolated containers that can be easily deployed anywhere (as long as you’re running modern-enough Linux). If it catches on, it might be a very good idea for future-proofing the setup. I’m not yet quite sure how it fits in this picture, but I’m keeping an eye on it.

Tomaatti-Sota

Wed, 14 Aug 2013 00:00:00 +0000

You know those animated GIFs from the 1990s with a rotating construction sign, or maybe with a guy digging? This page needs one of those. But what should you do while I'm searching for a suitable GIF?

Well. Do you remember the classic Finnish freeware game Tomaatti Sota? It was made by WizeQuiz and it had an epic story.

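"Olipa kerran (ja miksei toisenkin) suuri Tomaatti-Jumala, joka oli huomannut, että hänen kaksi kansaansa, Tuore- ja PakasteTomaatit, elivät sulassa sovussa. Niinpä tähän oli tultava muutos. Ja perkele, sitähän tulikin! Joten nyt he taistelevat loputonta taisteluaan, kunnes piru heidät pieree..."

(Roughly, in English: "Once upon a time (and why not a second time, too) there was a great Tomato God who had noticed that his two peoples, the Fresh and the Frozen Tomatoes, were living in perfect harmony. So this had to change. And damn it, change it did! So now they fight their endless battle, until the devil farts them out...")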

I guess you could download the game and give it a go, for the old times' sake. It works in DOSBox.
