lachrist

Let’s Reify Effects

2023-05-14T00:00:00+00:00

This is my attempt at demystifying effect systems of pure functional languages and how to implement them in impure imperative languages.

Function Purity

To tell whether a function written in an impure language is pure, just ask yourself whether it can be replaced by a (possibly infinite) mapping from its arguments to its results – nice talk here. This corresponds to the mathematical definition of a function. Effects are what make functions impure:

Input/Output actions.
Never returning.
Crashing the program.
Throwing an exception.
In-memory mutation.

Pros of pure functions:

Easier to test – e.g., no mock db, no file setup.
Type signature offers better documentation.
No race conditions, which makes concurrency easier.
Can be memoized.
Easier to reason about (somewhat subjective).
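Memoization is safe precisely because a pure function behaves like a fixed mapping from arguments to results. Here is a minimal sketch, assuming single-argument functions whose arguments work as Map keys. The names memoize, nextId and counter are hypothetical. Memoizing the impure nextId silently changes its behavior, which is the purity test above in action.

```javascript
// Hypothetical helper: caches the results of a single-argument function.
const memoize = (f) => {
  const cache = new Map();
  return (x) => {
    if (!cache.has(x)) {
      cache.set(x, f(x));
    }
    return cache.get(x);
  };
};

// Pure: memoizing it changes nothing observable.
const square = (x) => x * x;
const memoSquare = memoize(square);

// Impure: reads and mutates an outer variable, so it is not a fixed mapping.
let counter = 0;
const nextId = (prefix) => `${prefix}${counter++}`;
const memoNextId = memoize(nextId);

console.log(memoSquare(4), memoSquare(4)); // prints 16 16
console.log(memoNextId("id"), memoNextId("id")); // prints id0 id0
console.log(nextId("id")); // prints id1, the memoized version never would
```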

Pros of impure functions:

Can actually do something useful by performing I/O actions.

So we want to write programs with pure functions, but we also want to produce programs that have effects on the real world. Because effects are contagious, a function becomes impure when it calls an impure function. That means that if we want to maximize the amount of logic implemented by pure functions, we need to push effects to the edge of our programs. This architecture has many names: functional core and imperative shell, ports and adapters, or hexagonal architecture – nice talk here.
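On a tiny program, this architecture might look like the sketch below. The names summarize, shell, readLines and print are all hypothetical. The shell receives its I/O as plain functions (the ports), so tests can pass stubs instead of setting up a mock file system.

```javascript
// Functional core [pure]: all the decision making, no I/O.
const summarize = (lines) => {
  const nonEmpty = lines.filter((line) => line.trim() !== "");
  return `${nonEmpty.length} non-empty line(s) out of ${lines.length}`;
};

// Imperative shell [impure]: performs I/O through the functions it is given
// (the "ports") and delegates everything else to the core.
const shell = (readLines, print) => {
  print(summarize(readLines()));
};

// In production the adapters would read a real file and write to stdout;
// here the input is a stub.
shell(() => ["a", "", "b"], console.log); // prints 2 non-empty line(s) out of 3
```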

Supporting Output Actions

One approach to move logic inside the functional core is to reify effects. That is, impure functions are turned into pure functions by making them compute descriptions of their effects rather than performing those effects directly. These reified effects still need to be interpreted to do anything useful.

// Runtime [impure] //
import { writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    await writeFile(effect.path, effect.content, effect.encoding);
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});

// Main [pure] //
const main = writeFileEffect("file.txt", "foo", "utf8");

// Runtime [impure] //
await run(main);

At the extreme, every single effect (besides running forever or crashing) performed by the program is reified that way. We then speak of an effect system. Effect systems can be made generic and reusable. This is what pure functional languages usually do to support IO actions:

In Haskell, it is called IO.
In Elm, it is called Task.
In PureScript, it is called Effect.
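A reified effect is plain data, so the code that builds it can be checked without touching the file system. In this minimal sketch, writeFileEffect is repeated from above so the block stands alone, and backupEffect is a hypothetical pure function.

```javascript
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});

// Hypothetical pure function: decides what should be written, writes nothing.
const backupEffect = (path, content) =>
  writeFileEffect(`${path}.bak`, content, "utf8");

// Inspecting the description requires no mock file system.
const effect = backupEffect("notes.txt", "hello");
console.log(JSON.stringify(effect));
// prints {"type":"writeFile","path":"notes.txt.bak","content":"hello","encoding":"utf8"}
```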

Combining Output Actions

We can add both concurrent and sequential composition to our effect system:

// Runtime [impure] //
import { writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    await writeFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "concurrent") {
    await Promise.all([run(effect.left), run(effect.right)]);
  } else if (effect.type === "sequence") {
    await run(effect.first);
    await run(effect.second);
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const concurrentEffect = (left, right) => ({
  type: "concurrent", left, right,
});
const sequenceEffect = (first, second) => ({
  type: "sequence", first, second,
});

// Main [pure] //
const main = sequenceEffect(
  writeFileEffect("file1.txt", "foo", "utf8"),
  concurrentEffect(
    writeFileEffect("file2.txt", "bar", "utf8"),
    writeFileEffect("file3.txt", "qux", "utf8"),
  ),
);

// Runtime [impure] //
await run(main);
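Because main is a tree of plain objects, other pure functions can inspect it before anything runs. For example, a function can list every file the program would write. This minimal sketch handles only the three effect types above. The helper writtenPaths is hypothetical, and the constructors are repeated so the block stands alone.

```javascript
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const concurrentEffect = (left, right) => ({ type: "concurrent", left, right });
const sequenceEffect = (first, second) => ({ type: "sequence", first, second });

// Hypothetical pure analysis: collects every path the effect would write to,
// in the order they appear in the tree (concurrent writes may finish in any order).
const writtenPaths = (effect) => {
  if (effect.type === "writeFile") {
    return [effect.path];
  } else if (effect.type === "concurrent") {
    return [...writtenPaths(effect.left), ...writtenPaths(effect.right)];
  } else if (effect.type === "sequence") {
    return [...writtenPaths(effect.first), ...writtenPaths(effect.second)];
  } else {
    throw new Error("unknown effect");
  }
};

const main = sequenceEffect(
  writeFileEffect("file1.txt", "foo", "utf8"),
  concurrentEffect(
    writeFileEffect("file2.txt", "bar", "utf8"),
    writeFileEffect("file3.txt", "qux", "utf8"),
  ),
);
console.log(writtenPaths(main).join(", ")); // prints file1.txt, file2.txt, file3.txt
```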

Supporting Input Actions with Abstract Machines

Up until now, our effect system could only express entirely static programs that do the same thing every time they are executed. To make our programs more dynamic, we need to retrieve the values returned by our effects and somehow plug them back into the functional core.

One idea is to turn our main into a function that accepts an input and returns an effect. Then we repeatedly call this function with whatever input is available. To help our main function make sense of the input, we also keep track of a state.

// Runtime [impure] //
import { readFile, writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    return await writeFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "readFile") {
    return await readFile(effect.path, effect.encoding);
  } else if (effect.type === "concurrent") {
    return await Promise.all([run(effect.left), run(effect.right)]);
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const readFileEffect = (path, encoding) => ({
  type: "readFile", path, encoding,
});
const concurrentEffect = (left, right) => ({
  type: "concurrent", left, right,
});

// Main [pure] //
const main = {
  initial: {
    state: "reading",
    effect: readFileEffect("file.txt", "utf8"),
  },
  step: ({state, input}) => {
    if (state === "reading") {
      return {
        state: "writing",
        effect: concurrentEffect(
          writeFileEffect("copy1.txt", input, "utf8"),
          writeFileEffect("copy2.txt", input, "utf8"),
        ),
      };
    } else if (state === "writing") {
      return {
        state: "final",
        effect: null,
      };
    } else {
      throw new Error("invalid state");
    }
  },
};

// Runtime [impure] //
{
  const { step } = main;
  let { initial: { state, effect } } = main;
  while (effect !== null) {
    ({ state, effect } = step({ state, input: await run(effect) }));
  }
}

Our main effectively encodes an abstract machine: an initial state, a final state, and a state transition function. By also reifying the state, abstract machines make it easier to reason formally about programs and enable powerful introspection techniques (e.g., time travel debugging). However, this approach doesn’t scale well with complexity, and expressing real-world software requirements as an abstract machine seems like a nightmare. I might be wrong. Maybe they will be the way to write programs in the future. With the advent of AI, who can tell?

reading
  |
  | -> readFile
  | <- "content"
  |
writing
  |
  | -> (writeFile, writeFile)
  | <- [undefined, undefined]
  |
final
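Because both step and the effects are pure, a fake interpreter can also drive the machine. That reproduces the trace above without touching the disk and records every visited state, which is the raw material for time travel debugging. In this minimal sketch, simulate and fakeRun are hypothetical, the fake file system is a plain object, and the main machine is repeated so the block stands alone.

```javascript
const readFileEffect = (path, encoding) => ({ type: "readFile", path, encoding });
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const concurrentEffect = (left, right) => ({ type: "concurrent", left, right });

const main = {
  initial: { state: "reading", effect: readFileEffect("file.txt", "utf8") },
  step: ({ state, input }) => {
    if (state === "reading") {
      return {
        state: "writing",
        effect: concurrentEffect(
          writeFileEffect("copy1.txt", input, "utf8"),
          writeFileEffect("copy2.txt", input, "utf8"),
        ),
      };
    } else if (state === "writing") {
      return { state: "final", effect: null };
    } else {
      throw new Error("invalid state");
    }
  },
};

// Hypothetical fake interpreter: synchronously reads and writes a plain
// object instead of the disk.
const fakeRun = (files, effect) => {
  if (effect.type === "readFile") {
    return files[effect.path];
  } else if (effect.type === "writeFile") {
    files[effect.path] = effect.content;
    return undefined;
  } else if (effect.type === "concurrent") {
    return [fakeRun(files, effect.left), fakeRun(files, effect.right)];
  } else {
    throw new Error("unknown effect");
  }
};

// Same loop as the real runtime, but it also records every visited state.
const simulate = (machine, files) => {
  let { state, effect } = machine.initial;
  const trace = [state];
  while (effect !== null) {
    ({ state, effect } = machine.step({ state, input: fakeRun(files, effect) }));
    trace.push(state);
  }
  return trace;
};

const files = { "file.txt": "content" };
console.log(simulate(main, files).join(" -> ")); // prints reading -> writing -> final
console.log(files["copy2.txt"]); // prints content
```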

It’s worth noting that there are actually two nested abstract machines at play here. What we discussed above corresponds to the outer machine, which describes how the program transitions from one effect to the next. But each of these outer transitions is itself composed of many inner transitions that depend on the programming language at hand. A good formalism to express these inner transitions is the CESK machine.

reading
  |
  | -> readFile
  | <- "content"
  |
  writeFileEffect()
  |
  writeFileEffect()
  |
  concurrentEffect()
  |
writing

Supporting Input Actions with Callbacks

Another, more practical way to support input actions is to allow pure functions inside effects. Let’s tweak our effect system to handle these callbacks and use some of the big-brain Haskell names.

// Runtime [impure] //
import { readFile, writeFile } from "node:fs/promises";
const run = async (effect) => {
  if (effect.type === "writeFile") {
    return await writeFile(effect.path, effect.content, effect.encoding);
  } else if (effect.type === "readFile") {
    return await readFile(effect.path, effect.encoding);
  } else if (effect.type === "fmap") {
    return effect.mapping(await run(effect.child));
  } else if (effect.type === "liftA2") {
    return effect.combine(...(await Promise.all([run(effect.left), run(effect.right)])));
  } else if (effect.type === "bind") {
    return await run(effect.makeSecond(await run(effect.first)));
  } else if (effect.type === "return") {
    return effect.result;
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const readFileEffect = (path, encoding) => ({
  type: "readFile", path, encoding,
});
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
// Effects are functors:
const fmapEffect = (mapping, child) => ({
  type: "fmap", mapping, child,
});
// Effects are applicatives:
const liftA2Effect = (combine, left, right) => ({
  type: "liftA2", left, right, combine,
});
// Effects are monads:
const bindEffect = (first, makeSecond) => ({
  type: "bind", first, makeSecond,
});
const returnEffect = (result) => ({
  type: "return", result,
});

// Main [pure] //
const main = bindEffect(
  readFileEffect("file.txt", "utf8"),
  (content) => liftA2Effect(
    (_write1, _write2) => null,
    writeFileEffect("copy1.txt", content, "utf8"),
    writeFileEffect("copy2.txt", content, "utf8"),
  ),
);

// Runtime [impure] //
await run(main);

The code is more concise and readable, but states are no longer explicit. Indeed, states are now hidden in the callbacks stored inside the effects: for instance, the file content only lives as a closure variable. As a result, the two nested abstract machines are no longer cleanly separated.

Also, because effects now contain functions, they are no longer pure data and cannot be expressed in a data-only DSL (for instance, they cannot be serialized to JSON).
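Functions inside effects do not prevent alternative interpreters, though. Below is a minimal sketch of a synchronous interpreter backed by an in-memory Map, which could serve in tests. The name runInMemory is hypothetical, and the interpreter only handles the effect types above. The constructors are repeated so the block stands alone.

```javascript
const readFileEffect = (path, encoding) => ({ type: "readFile", path, encoding });
const writeFileEffect = (path, content, encoding) => ({
  type: "writeFile", path, content, encoding,
});
const fmapEffect = (mapping, child) => ({ type: "fmap", mapping, child });
const liftA2Effect = (combine, left, right) => ({
  type: "liftA2", left, right, combine,
});
const bindEffect = (first, makeSecond) => ({ type: "bind", first, makeSecond });
const returnEffect = (result) => ({ type: "return", result });

// Hypothetical test interpreter: same effect descriptions, but the "file
// system" is a Map and everything runs synchronously.
const runInMemory = (files, effect) => {
  if (effect.type === "readFile") {
    return files.get(effect.path);
  } else if (effect.type === "writeFile") {
    files.set(effect.path, effect.content);
    return undefined;
  } else if (effect.type === "fmap") {
    return effect.mapping(runInMemory(files, effect.child));
  } else if (effect.type === "liftA2") {
    return effect.combine(
      runInMemory(files, effect.left),
      runInMemory(files, effect.right),
    );
  } else if (effect.type === "bind") {
    return runInMemory(files, effect.makeSecond(runInMemory(files, effect.first)));
  } else if (effect.type === "return") {
    return effect.result;
  } else {
    throw new Error("unknown effect");
  }
};

const main = bindEffect(
  readFileEffect("file.txt", "utf8"),
  (content) => liftA2Effect(
    (_write1, _write2) => null,
    writeFileEffect("copy1.txt", content, "utf8"),
    writeFileEffect("copy2.txt", content, "utf8"),
  ),
);

const files = new Map([["file.txt", "hello"]]);
runInMemory(files, main);
console.log(files.get("copy1.txt"), files.get("copy2.txt")); // prints hello hello
console.log(runInMemory(files, fmapEffect((text) => text.toUpperCase(), readFileEffect("copy1.txt", "utf8")))); // prints HELLO
```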

Supporting In-Memory Mutations

From a functional point of view, memory mutations are no different from IO actions. They are just faster. Let’s use the same system to support them.

// Runtime [impure] //
const run = (effect) => {
  if (effect.type === "get") {
    return effect.map.get(effect.key);
  } else if (effect.type === "set") {
    return effect.map.set(effect.key, effect.val);
  } else if (effect.type === "log") {
    return console.log(effect.message);
  } else if (effect.type === "bind") {
    return run(effect.makeSecond(run(effect.first)));
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const getEffect = (map, key) => ({
  type: "get", map, key,
});
const setEffect = (map, key, val) => ({
  type: "set", map, key, val,
});
const logEffect = (message) => ({
  type: "log", message,
});
const bindEffect = (first, makeSecond) => ({
  type: "bind", first, makeSecond,
});

// Main [pure] //
const map = new Map();
const main = bindEffect(
  setEffect(map, "foo", "bar"),
  (_) => bindEffect(
    getEffect(map, "foo"),
    logEffect,
  ),
);

// Runtime [impure] //
run(main); // logs bar
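One way to check that building main is still pure: it leaves the map untouched, and only run performs the mutation. This minimal sketch repeats the definitions above with one hypothetical change. Log effects go to an injected log function instead of console.log, so the output can be checked.

```javascript
const getEffect = (map, key) => ({ type: "get", map, key });
const setEffect = (map, key, val) => ({ type: "set", map, key, val });
const logEffect = (message) => ({ type: "log", message });
const bindEffect = (first, makeSecond) => ({ type: "bind", first, makeSecond });

// Same interpreter as above, except that "log" effects are sent to `log`.
const run = (effect, log) => {
  if (effect.type === "get") {
    return effect.map.get(effect.key);
  } else if (effect.type === "set") {
    return effect.map.set(effect.key, effect.val);
  } else if (effect.type === "log") {
    return log(effect.message);
  } else if (effect.type === "bind") {
    return run(effect.makeSecond(run(effect.first, log)), log);
  } else {
    throw new Error("unknown effect");
  }
};

const map = new Map();
const main = bindEffect(
  setEffect(map, "foo", "bar"),
  (_) => bindEffect(getEffect(map, "foo"), logEffect),
);
console.log(map.size); // prints 0, describing the mutation did not perform it
const logged = [];
run(main, (message) => logged.push(message));
console.log(map.get("foo"), logged.join(",")); // prints bar bar
```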

Supporting Timeouts

Up until now, effects had a clear before and after, so callbacks were only needed in the bind effect combinator. This is no longer the case with timers, which are indeed effects as they depend on the global state of the JavaScript VM. For instance, setTimeout is not referentially transparent because setTimeout(() => {}, 1000) !== setTimeout(() => {}, 1000).

// Runtime [impure] //
const run = (effect) => {
  if (effect.type === "setTimeout") {
    return setTimeout(() => run(effect.callback()), effect.time);
  } else if (effect.type === "clearTimeout") {
    return clearTimeout(effect.timer);
  } else if (effect.type === "log") {
    return console.log(effect.message);
  } else if (effect.type === "bind") {
    return run(effect.makeSecond(run(effect.first)));
  } else {
    throw new Error("unknown effect");
  }
};

// Prelude [pure] //
const setTimeoutEffect = (callback, time) => ({
  type: "setTimeout", callback, time,
});
const clearTimeoutEffect = (timer) => ({
  type: "clearTimeout", timer,
});
const logEffect = (message) => ({
  type: "log", message,
});
const bindEffect = (first, makeSecond) => ({
  type: "bind", first, makeSecond,
});

// Main [pure] //
const main = bindEffect(
  setTimeoutEffect(
    () => logEffect("this should not be printed"),
    2000,
  ),
  (timer) => setTimeoutEffect(
    () => clearTimeoutEffect(timer),
    1000,
  ),
);

// Runtime [impure] //
run(main);
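The interpreter, not the program, decides what a timer means. Below is a minimal sketch of a deterministic interpreter with a fake clock, assuming the effect types above. The names createFakeClock, advance and printed are hypothetical, and timers are plain numeric ids rather than real handles. Time only moves when advance is called, so the scenario above can be checked instantly.

```javascript
const setTimeoutEffect = (callback, time) => ({ type: "setTimeout", callback, time });
const clearTimeoutEffect = (timer) => ({ type: "clearTimeout", timer });
const logEffect = (message) => ({ type: "log", message });
const bindEffect = (first, makeSecond) => ({ type: "bind", first, makeSecond });

// Hypothetical fake clock: pending timers live in a Map (id -> { due, callback }).
const createFakeClock = () => {
  let now = 0;
  let nextId = 1;
  const timers = new Map();
  const printed = [];
  const run = (effect) => {
    if (effect.type === "setTimeout") {
      const id = nextId++;
      timers.set(id, { due: now + effect.time, callback: effect.callback });
      return id;
    } else if (effect.type === "clearTimeout") {
      timers.delete(effect.timer);
      return undefined;
    } else if (effect.type === "log") {
      printed.push(effect.message);
      return undefined;
    } else if (effect.type === "bind") {
      return run(effect.makeSecond(run(effect.first)));
    } else {
      throw new Error("unknown effect");
    }
  };
  // Fires due timers one at a time, earliest first, up to time `until`.
  const advance = (until) => {
    while (true) {
      let earliest = null;
      for (const [id, timer] of timers) {
        if (timer.due <= until && (earliest === null || timer.due < earliest[1].due)) {
          earliest = [id, timer];
        }
      }
      if (earliest === null) break;
      const [id, timer] = earliest;
      timers.delete(id);
      now = timer.due;
      run(timer.callback());
    }
    now = until;
  };
  return { run, advance, printed };
};

const main = bindEffect(
  setTimeoutEffect(() => logEffect("this should not be printed"), 2000),
  (timer) => setTimeoutEffect(() => clearTimeoutEffect(timer), 1000),
);
const clock = createFakeClock();
clock.run(main);
clock.advance(5000);
console.log(clock.printed.length); // prints 0
```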

Many other effects also require a callback directly inside the effect rather than inside the bind combinator: listening to HTTP requests, listening to user clicks on a button, etc. Note that callbacks inside the bind combinator can be nicely chained, whereas callbacks directly inside the effects require manual nesting to be composed. That is why you should prefer promises wherever they make sense, despite what Mikeal Rogers says here.
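Here are the same two delayed steps written both ways, to make the contrast concrete. This is a minimal sketch. The helper delay is hypothetical and wraps setTimeout in a promise, and log is whatever sink the caller passes in.

```javascript
// Callback style: each step nests inside the previous one.
const nested = (log, done) => {
  setTimeout(() => {
    log("first");
    setTimeout(() => {
      log("second");
      done();
    }, 10);
  }, 10);
};

// Promise style: the same steps chain without extra nesting.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
const chained = (log) =>
  delay(10)
    .then(() => log("first"))
    .then(() => delay(10))
    .then(() => log("second"));
```

Adding a third step to nested means one more level of indentation, while chained just gets one more .then.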

Takeaway

If you have the opportunity to work with a pure functional language, great. But let’s be real: most of us are stuck with imperative languages. Nonetheless, I found that reasoning in terms of functional core and imperative shell is helpful, and pushing effects to the edge of the program is generally a good idea. I sometimes use effect reification as a design pattern to achieve this. Even if it is not a full-blown generic effect system, it is already beneficial.

I had to instrument JavaScript files for my work. This required fetching the associated source map file and source files. I ended up pushing readFile effects out of the pure code by reifying them into URL requests:

import { readFile } from "node:fs/promises";

// https://github.com/getappmap/appmap-agent-js/blob/055d138abb9dba260db8fc95ad412dcf339be3e4/components/instrumentation/default/codebase.mjs#L21
export const extractMissingUrlArrayPure = (url, files) => {
  // returns the urls still missing from `files`, which may include:
  // - the url of the file to be instrumented
  // - the url of its source map file
  // - the urls of its source files
};

// https://github.com/getappmap/appmap-agent-js/blob/055d138abb9dba260db8fc95ad412dcf339be3e4/components/instrumentation/default/index.mjs#L21
export const instrumentPure = (url, files) => {
  // return: the instrumented content of the file.
};

// https://github.com/getappmap/appmap-agent-js/blob/055d138abb9dba260db8fc95ad412dcf339be3e4/components/agent/default/index.mjs#L77
export const instrumentImpure = async (url, content) => {
  const files = new Map([[url, content]]);
  while (true) {
    const urls = extractMissingUrlArrayPure(url, files);
    if (urls.length === 0) {
      return instrumentPure(url, files);
    }
    // Assumes the missing urls are file: URLs, which readFile accepts as URL objects.
    for (const missingUrl of urls) {
      files.set(missingUrl, await readFile(new URL(missingUrl), "utf8"));
    }
  }
};
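The same loop can run without any I/O if the caller supplies how files are loaded. Below is a minimal, synchronous sketch. The sourceMappingURL lookup is a hypothetical and much simpler stand-in for the real extractMissingUrlArrayPure, and it ignores the source files listed in the map. Likewise, this instrumentPure only prefixes a comment, and instrumentWith and load are assumed names.

```javascript
// Hypothetical stand-in: asks for the source map named by a
// `//# sourceMappingURL=` comment until it has been loaded.
const extractMissingUrlArrayPure = (url, files) => {
  const match = /\/\/# sourceMappingURL=(\S+)/.exec(files.get(url));
  if (match === null) {
    return [];
  }
  const mapUrl = new URL(match[1], url).href;
  return files.has(mapUrl) ? [] : [mapUrl];
};

// Hypothetical stand-in: a real version would rewrite the code.
const instrumentPure = (url, files) =>
  `/* instrumented with ${files.size} file(s) */\n${files.get(url)}`;

// The driver loop, with the I/O injected as `load` (synchronous here).
const instrumentWith = (load, url, content) => {
  const files = new Map([[url, content]]);
  while (true) {
    const urls = extractMissingUrlArrayPure(url, files);
    if (urls.length === 0) {
      return instrumentPure(url, files);
    }
    for (const missingUrl of urls) {
      files.set(missingUrl, load(missingUrl));
    }
  }
};

const loaded = [];
const result = instrumentWith(
  (url) => {
    loaded.push(url);
    return "{}";
  },
  "file:///app/main.js",
  "run();\n//# sourceMappingURL=main.js.map",
);
console.log(loaded.join(",")); // prints file:///app/main.js.map
```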

Mutations make me paranoid

2023-04-27T00:00:00+00:00

The sad, sad reality of JavaScript and most programming languages out there is that there is little to no guarantee on what a function can do.

Little Space Guarantee

The number of values that a function can reach and mutate is enormous. That is because the object graph is super connected. Think of the famous “gorilla and banana” problem.

I think the lack of reusability comes in object-oriented languages, not functional languages. Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle. – Joe Armstrong

I would distinguish between two flavors of mutation. First, the bad: mutation of arguments. But at least it does not break referential transparency, and the caller can still inspect the argument after calling the function to see what’s up. I’m guilty of using those and even getting comfortable doing so. It is neat for implementing state transition functions without having to reconstruct an entire new state. By state transition functions, I mean functions of type (state, input) -> (state, output). That is pretty much any method you would find on objects in OOP.

// state transition function (mutates its gen argument);
// computeNext and randomFromSeed are assumed pure helpers defined elsewhere:
export const createGen = (seed) => ({ seed });
export const random = (gen, min, max) => {
  gen.seed = computeNext(gen.seed);
  return randomFromSeed(gen.seed, min, max);
};

// With OOP syntactic sugar:
export class Gen {
  constructor (seed) {
    this.seed = seed;
  }
  random (min, max) {
    this.seed = computeNext(this.seed);
    return randomFromSeed(this.seed, min, max);
  }
}
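For comparison, here is the same generator as a pure state transition function that returns the new state instead of mutating its argument. This is a minimal sketch. The name randomPure is hypothetical. Because the original leaves computeNext and randomFromSeed undefined, they are replaced by assumed stand-ins (one step of a tiny linear congruential generator).

```javascript
// Stand-in helpers (assumed): one LCG step, and a seed mapped into [min, max).
const computeNext = (seed) => (seed * 1664525 + 1013904223) % 4294967296;
const randomFromSeed = (seed, min, max) => min + (seed % (max - min));

// (state, input) -> (state, output), with no mutation.
const randomPure = (gen, min, max) => {
  const seed = computeNext(gen.seed);
  return [{ seed }, randomFromSeed(seed, min, max)];
};

const gen0 = { seed: 42 };
const [gen1, a] = randomPure(gen0, 0, 10);
const [, b] = randomPure(gen0, 0, 10); // same input, same output
const [, c] = randomPure(gen1, 0, 10); // threading the new state moves on
console.log(a === b, gen0.seed); // prints true 42
```

The price is that the caller must thread gen1 into the next call, which is exactly the reconstruction work that mutating the argument avoids.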

Second, the ugly: mutation of values in free variables. This is worse because it introduces implicit state and breaks referential transparency. The caller has no idea what is going on. The worst is mutating values reachable through global variables. People will tell you to never do that. But I don’t care; sometimes you have to do it. I do dynamic program analysis for a living, and sometimes I have to do this abomination even if it always bites me in the ass. This kind of code is hard to maintain and often requires diving deep into it to understand the link between seemingly unrelated parts.

// Global log shared by every instrumented function:
const LOG = [];
// Original version:
const square = (x) => x * x;
// Instrumented version:
const squareInstrumented = (x) => {
  LOG.push("begin-square");
  try {
    return x * x;
  } finally {
    LOG.push("end-square");
  }
};
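For contrast, the same instrumentation can move from "the ugly" to "the bad" by passing the log as an argument on every call. The mutation then shows up in the signature and happens before the function returns. This is a minimal, hypothetical sketch, not the author's instrumentation. Changing the signature is often not an option when instrumenting existing code, which is what forces the global in the first place.

```javascript
// Hypothetical variant: the log is an argument, mutated only during the call.
const squareInstrumented = (log, x) => {
  log.push("begin-square");
  try {
    return x * x;
  } finally {
    log.push("end-square");
  }
};

const log = [];
console.log(squareInstrumented(log, 3), log.join(",")); // prints 9 begin-square,end-square
```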

No Time Guarantee

Bloody hell, mutations can even happen after the function returns. Now the caller is getting really paranoid. It is not enough to check whatever mess the function made right after it returned: it could still mess things up later! I struggled with this recently. I needed to record some events and flush them. I ended up doing something like this:

// hook.mjs
import process from "node:process";
import { createHook } from "node:async_hooks";
export const hook = (buffer) => {
  process.on("uncaughtExceptionMonitor", (error) => {
    buffer.push({
      type: "error",
      error,
    });
  });
  createHook({
    before: (id) => {
      buffer.push({
        type: "before",
        id,
      });
    },
    after: (id) => {
      buffer.push({
        type: "after",
        id,
      });
    },
  }).enable();
}

// flush.mjs
import { connect } from "node:net";
export const flush = (buffer) => {
  const socket = connect(8080, "localhost");
  setInterval(() => {
    socket.write(JSON.stringify(buffer));
    buffer.length = 0;
  }, 1000);
};

// main.mjs
import { hook } from "./hook.mjs";
import { flush } from "./flush.mjs";
const buffer = [];
hook(buffer);
flush(buffer);

I think this is bad code because the interaction between hook.mjs and flush.mjs is not immediately apparent from main.mjs. Adapting hook.mjs and flush.mjs to use callbacks can make this interaction explicit.

// main.mjs
import { hook } from "./hook.mjs";
import { flush } from "./flush.mjs";
const buffer = [];
hook((event) => {
  buffer.push(event);
});
flush(() => buffer.splice(0, buffer.length));
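The adapted flush.mjs is not shown, so here is one way it might look. This is a sketch under assumptions. A write function is injected instead of the socket so the block has no network dependency. The helper flushOnce is hypothetical and exists so the draining logic can be called directly.

```javascript
// Hypothetical adapted flush: it never touches the buffer itself, it only
// calls `drain`, which returns the pending events and empties the buffer.
const flushOnce = (drain, write) => {
  const events = drain();
  if (events.length > 0) {
    write(JSON.stringify(events));
  }
};
const flush = (drain, write, interval = 1000) =>
  setInterval(() => flushOnce(drain, write), interval);

// Wiring as in main.mjs, with an array standing in for the socket.
const buffer = [];
const emit = (event) => {
  buffer.push(event);
};
const sent = [];
emit({ type: "before", id: 1 });
flushOnce(() => buffer.splice(0, buffer.length), (data) => sent.push(data));
console.log(sent[0], buffer.length); // prints [{"type":"before","id":1}] 0
```

Every mutation of buffer now happens inside a callback defined in main.mjs, which is where a reader would look for it.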

Maybe we can write saner JavaScript code by following these two rules:

Arguments can only be mutated synchronously.
Asynchronous mutations can only happen with explicit callbacks.