<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Nick Fisher's Blog</title>
    <description>Latest posts</description>
    <link>https://hydroxide.dev/</link>
    <atom:link href="https://hydroxide.dev/feed.xml" rel="self" type="application/rss+xml"/>
    <lastBuildDate>Sat, 03 Jan 2026 04:23:17 +0000</lastBuildDate>
    <generator>Blog Builder</generator>
    <item>
      <title>Managing 32-bit floating point precision with relative camera offsets</title>
      <link>https://hydroxide.dev/articles/32-bit-floating-point-precision-relative-camera-offsets/</link>
      <description>Filament vertex shaders expose the getWorldFromModelMatrix() function, which is used to compute the position of the vertex in 'world space': void materialVertex(inout MaterialVertexInputs materia</description>
      <pubDate>Tue, 09 Dec 2025 00:00:00 +0000</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/32-bit-floating-point-precision-relative-camera-offsets/</guid>
      <content:encoded>&lt;p>Filament vertex shaders expose the &lt;code>getWorldFromModelMatrix()&lt;/code> function, which is used to compute the position of the vertex in 'world space':&lt;/p> &lt;pre>&lt;code>void materialVertex(inout MaterialVertexInputs material) {
    vec3 position = getPosition().xyz;
    material.worldPosition.xyz = (getWorldFromModelMatrix() * vec4(position, 1.0f)).xyz;
}
&lt;/code>&lt;/pre> &lt;p>If your material uses &lt;code>vertexDomain: object&lt;/code>, you don't need to explicitly compute this value; it has already been computed internally before &lt;code>materialVertex()&lt;/code> is invoked (see &lt;code>initMaterialVertex&lt;/code> in &lt;code>surface_material_inputs.vs&lt;/code> and &lt;code>computeWorldPosition&lt;/code> in &lt;code>surface_getters.vs&lt;/code>).&lt;/p> &lt;p>&lt;img src="/assets/images/32-bit-float-precision/world_transform_correct.png" alt="Screenshot from 3D game showing a tuft of grass at the world origin" />&lt;/p> &lt;p class="label">With an identity transform, the tuft of grass is rendered at (0,0,0) in world space, as we'd expect&lt;/p> &lt;p>Since this matrix transforms the position of each vertex from model space to 'world space', it is conceptually similar to the &lt;em>model matrix&lt;/em>. However, there's one important qualification in the Filament documentation:&lt;/p> &lt;p class='quote'>“world space” in Filament's shading system does not necessarily match the API-level world space. To obtain the position of the API-level camera, custom materials can use getUserWorldFromWorldMatrix() to transform getWorldCameraPosition().&lt;/p> &lt;p>This suggests that the 'world space' position computed from &lt;code>getWorldFromModelMatrix&lt;/code> is not the 'real' world space position of the vertex.
We can verify this by omitting the world transform when setting 'material.worldPosition':&lt;/p> &lt;pre>&lt;code>void materialVertex(inout MaterialVertexInputs material) {
    vec3 position = getPosition().xyz;
    material.worldPosition.xyz = position;
}
&lt;/code>&lt;/pre> &lt;p>If &lt;code>material.worldPosition&lt;/code> were the 'real' world space position for the vertex, we would expect to see the tuft of grass rendered at the same position as above. That's not the case:&lt;/p> &lt;p>&lt;img src="/assets/images/32-bit-float-precision/world_transform_incorrect.png" alt="Screenshot from 3D game showing a tuft of grass rendered larger and at a different position" />&lt;/p> &lt;p>We see something quite different; in fact, the tuft of grass actually moves with the camera:&lt;/p> &lt;p>&lt;video src="/assets/images/32-bit-float-precision/world_transform_incorrect.mov" controls preload="none">&lt;/video>&lt;/p> &lt;p>This suggests that &lt;code>material.worldPosition&lt;/code> and &lt;code>getWorldFromModelMatrix()&lt;/code> are actually &lt;em>relative to the camera position&lt;/em> rather than the true world position.&lt;/p> &lt;p>Let's check the Filament source to see what's going on. &lt;code>getWorldFromModelMatrix&lt;/code> returns the value of the uniform &lt;code>object_uniforms_worldFromModelMatrix&lt;/code>, which has been set to the value of &lt;code>sceneData.elementAt&amp;lt;WORLD_TRANSFORM&amp;gt;()&lt;/code> in &lt;code>FScene::prepareVisibleRenderables&lt;/code>. This latter value is itself &lt;code>shaderWorldTransform&lt;/code> in &lt;code>FScene::prepare&lt;/code>:&lt;/p> &lt;pre>&lt;code>const mat4f shaderWorldTransform{ worldTransform * tcm.getWorldTransformAccurate(ti) };
&lt;/code>&lt;/pre> &lt;p>&lt;code>getWorldTransformAccurate()&lt;/code> returns the 'real' world transform for the renderable instance.
&lt;code>worldTransform&lt;/code> was passed from &lt;code>FRenderer::renderJob&lt;/code> to &lt;code>FView::prepare&lt;/code> and &lt;code>FScene::prepare&lt;/code>, and is actually computed in &lt;code>FView::computeCameraInfo&lt;/code>:&lt;/p> &lt;pre>&lt;code>CameraInfo FView::computeCameraInfo(FEngine&amp;amp; engine) const noexcept {
    double3 translation;
    FCamera const* const camera = mViewingCamera ? mViewingCamera : mCullingCamera;
    if (engine.debug.view.camera_at_origin) {
        // this moves the camera to the origin, effectively doing all shader computations in
        // view-space, which improves floating point precision in the shader by staying around
        // zero, where fp precision is highest. This also ensures that when the camera is placed
        // very far from the origin, objects are still rendered and lit properly.
        translation = -camera-&amp;gt;getPosition();
    }
    return { *camera, mat4{ rotation } * mat4::translation(translation) };
}
&lt;/code>&lt;/pre> &lt;p>So when &lt;code>camera_at_origin&lt;/code> is true (the default), &lt;code>worldTransform&lt;/code> is actually a translation by the &lt;em>negative&lt;/em> of the camera position. Multiplying the vertex position by &lt;code>getWorldFromModelMatrix()&lt;/code> in the vertex shader therefore returns the position of the vertex relative to the camera. Or, to think of it another way, the world origin is moved to the camera origin before any vertex or fragment calculations take place.&lt;/p> &lt;p>Per the documentation, we can recover the &amp;quot;true&amp;quot; world space position of the camera by multiplying &lt;code>getWorldCameraPosition()&lt;/code> by &lt;code>getUserWorldFromWorldMatrix()&lt;/code>.
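&lt;/p> &lt;p>The relationship between these two matrices can be sketched with numpy (a toy illustration using a hypothetical camera position and ignoring rotation; this is not Filament's actual code):&lt;/p>

```python
import numpy as np

def translation(t):
    # build a 4x4 homogeneous translation matrix
    m = np.eye(4)
    m[:3, 3] = t
    return m

cam_pos = np.array([9_999_990.0, 0.0, 0.0])  # hypothetical camera position

# worldTransform: moves the world origin to the camera position
world_transform = translation(-cam_pos)

# userWorldFromWorldMatrix: its inverse, mapping back to API-level world space
user_world_from_world = np.linalg.inv(world_transform)

# a vertex in "true" (API-level) world space
true_world = np.array([10_000_000.0, 0.5, 0.0, 1.0])

# shader-side "world space" position: a small value near the camera
shader_world = world_transform @ true_world

# applying the inverse recovers the API-level world position
recovered = user_world_from_world @ shader_world

print(shader_world[:3], recovered[:3])
```

&lt;p>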
From &lt;code>src/ds/ColorPassDescriptorSet.cpp&lt;/code>, we see this latter value is simply the inverse of the camera world transform calculated in &lt;code>computeCameraInfo&lt;/code> above:&lt;/p> &lt;pre>&lt;code>s.userWorldFromWorldMatrix = mat4f(inverse(camera.worldTransform));
&lt;/code>&lt;/pre> &lt;p>So the matrix returned by &lt;code>getWorldFromModelMatrix()&lt;/code> translates the vertex into camera-relative space, and &lt;code>getUserWorldFromWorldMatrix&lt;/code> moves it back. Filament's &amp;quot;world space&amp;quot; is therefore what I refer to as &amp;quot;camera offset space&amp;quot;, and Filament's &amp;quot;API-level world space&amp;quot; is what I refer to as &amp;quot;true world space&amp;quot;.&lt;/p> &lt;p>The reason for this is floating point precision.&lt;/p> &lt;h2>32-bit floating point numbers&lt;/h2> &lt;p>A 32-bit number ranges from &lt;code>0000 0000 0000 0000 0000 0000 0000 0000&lt;/code> (&lt;code>00 00 00 00&lt;/code> in hexadecimal) to &lt;code>1111 1111 1111 1111 1111 1111 1111 1111&lt;/code> (&lt;code>FF FF FF FF&lt;/code>).&lt;/p> &lt;p>If this represents an IEEE 754 floating point number, the smallest (most negative) and largest (most positive) finite values will be:&lt;/p> &lt;ol> &lt;li>&lt;code>1111 1111 0111 1111 1111 1111 1111 1111&lt;/code>, &lt;code>FF 7F FF FF&lt;/code> and &lt;code>-3.40282346638528859811704E+38&lt;/code>&lt;/li> &lt;li>&lt;code>0111 1111 0111 1111 1111 1111 1111 1111&lt;/code>, &lt;code>7F 7F FF FF&lt;/code> and &lt;code>3.402823466385288598117042E+38&lt;/code>&lt;/li> &lt;/ol> &lt;p>in binary, hexadecimal and scientific decimal notation respectively (or in C, -FLT_MAX and FLT_MAX). All-zeros/all-ones aren't used for -FLT_MAX/FLT_MAX because some bit patterns are reserved for infinities, NaNs, signed zeros, etc.&lt;/p> &lt;p>Consider a camera positioned at (FLT_MAX, 0.0, 0.0). By decrementing the last bit of FLT_MAX (i.e.
&lt;code>7F 7F FF FE&lt;/code>), we will find the closest 32-bit floating point number to FLT_MAX in the direction of the origin. This equates to &lt;code>3.402823263561192561600338E+38&lt;/code> in decimal, roughly &lt;code>2e31&lt;/code> world units away from FLT_MAX! Clearly, there is a huge loss of precision when working with large floats.&lt;/p> &lt;p>Let's look at the following triangle:&lt;/p> &lt;pre>&lt;code>                (0,0.5,0)
                   /\
                  /  \
 (-0.5,0.0,0)    /____\    (0.5,0.0,0)
&lt;/code>&lt;/pre> &lt;p>Imagine we want to translate this triangle by &lt;code>(10_000_000, 0, 0)&lt;/code> (or &lt;code>4B 18 96 80&lt;/code> in hex float 32). We would expect the vertices to be positioned like so:&lt;/p> &lt;pre>&lt;code>                (10_000_000,0.5,0)
                   /\
                  /  \
 (9_999_999.5,0.0,0) /____\ (10_000_000.5,0.0,0)
&lt;/code>&lt;/pre> &lt;p>However, neither &lt;code>9_999_999.5&lt;/code> nor &lt;code>10_000_000.5&lt;/code> can be exactly represented as a 32-bit floating point number. The closest representable values are &lt;code>(9_999_999, 0, 0)&lt;/code> (&lt;code>4B 18 96 7F&lt;/code> in hexadecimal) in the negative direction and &lt;code>(10_000_001, 0, 0)&lt;/code> (&lt;code>4B 18 96 81&lt;/code>) in the positive direction. Applying this transform will most likely round both values to &lt;code>10_000_000&lt;/code>, collapsing the triangle to a single point.
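&lt;/p> &lt;p>The size of one representable step (a unit in the last place, or ULP) at a given magnitude can be inspected directly with &lt;code>numpy.spacing&lt;/code>; a quick sketch:&lt;/p>

```python
import numpy as np

# one ULP (the gap to the next representable float32) at various magnitudes
print(np.spacing(np.float32(10_000_000)))    # 1.0, so x.5 values are unrepresentable
print(np.spacing(np.finfo(np.float32).max))  # roughly 2e31 near FLT_MAX
print(np.spacing(np.float32(10.0)))          # roughly 9.5e-07 near the origin
```

&lt;p>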
We can verify this with numpy:&lt;/p> &lt;pre>&lt;code>&amp;gt;&amp;gt;&amp;gt; import numpy
&amp;gt;&amp;gt;&amp;gt; numpy.float32(10_000_000) + numpy.float32(0.5)
10000000.0
&amp;gt;&amp;gt;&amp;gt; numpy.float32(10_000_000) + numpy.float32(-0.5)
10000000.0
&lt;/code>&lt;/pre> &lt;blockquote> &lt;p>Python's &lt;code>float&lt;/code> is almost always a 64-bit floating point number, so I'm using numpy to make sure I'm working with 32-bit floats.&lt;/p> &lt;/blockquote> &lt;p>Since floating point numbers have greater precision the closer you get to zero, if we're forced to work with only 32 bits, we can squeeze out more precision by keeping the magnitude of the values we work with small.&lt;/p> &lt;p>For example, we could adjust the model matrix so that all transforms are specified &lt;em>relative to the camera position&lt;/em>, rather than the world origin.&lt;/p> &lt;p>Imagine a camera positioned at &lt;code>(9_999_990, 0, 0)&lt;/code>, looking at our triangle above. In &amp;quot;camera offset space&amp;quot;, the transformed triangle origin is at &lt;code>(10,0,0)&lt;/code> (rather than &lt;code>(10_000_000, 0, 0)&lt;/code> in &amp;quot;real world space&amp;quot;). This leaves us with sufficient precision to render the triangle:&lt;/p> &lt;pre>&lt;code>&amp;gt;&amp;gt;&amp;gt; numpy.float32(10) + numpy.float32(0.5)
10.5
&amp;gt;&amp;gt;&amp;gt; numpy.float32(10) + numpy.float32(-0.5)
9.5
&lt;/code>&lt;/pre> &lt;p>Returning to Filament, it should be clear that moving the world origin to the camera origin is Filament's way of maximizing precision in floating point calculations.&lt;/p></content:encoded>
    </item>
    <item>
      <title>July 2025 - Open Source Update</title>
      <link>https://hydroxide.dev/articles/july-2025-update-on-open-source/</link>
      <description>Quick update regarding some of my recent open source activity. Thermion v0.3.0 is now available on pub.dev . This was the culmination of months of internal refactoring, mostly to expose more of the</description>
      <pubDate>Wed, 30 Jul 2025 16:00:00 +0000</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/july-2025-update-on-open-source/</guid>
      <content:encoded>&lt;p>Quick update regarding some of my recent open source activity.&lt;/p> &lt;p>&lt;a href="https://pub.dev/packages/thermion_flutter">Thermion v0.3.0 is now available on pub.dev&lt;/a>. This was the culmination of months of internal refactoring, mostly to expose more of the public Filament API to the internal Dart API. This made it considerably easier to implement things like texture projection (which required multiple views/render targets), as well as various miscellaneous rendering options like fog, custom camera projections, and so on. This also built heavily on the &lt;a href="/articles/dart-javascript-interop-web-assembly">js_interop work I did&lt;/a>, which means the web version has now reached feature parity with the desktop &amp;amp; mobile versions. Here's a quick demo of a basic game built with Thermion, running in Chrome:&lt;/p> &lt;p>&lt;video src="/assets/media/thermion_web.webm" controls>&lt;/video>&lt;/p> &lt;p>I also spent around a week extracting the timeline/keyframe system I had developed for &lt;a href="https://mixreel.ai">mixreel&lt;/a> into a standalone package:&lt;/p> &lt;p>&lt;video src="/assets/media/flutter_timeline.webm" controls>&lt;/video>&lt;/p> &lt;p>I hadn't planned to do this, but when someone asked if I knew any open source packages that did the same thing, I figured it would be a good excuse to clean up my work and abstract everything away from the mixreel internals.&lt;/p> &lt;p>I'm glad I did so, because it forced me to clean up the interfaces and fix some of the things that had really been bugging me. The published version is &lt;a href="https://pub.dev/packages/flutter_keyframe_timeline">available on pub.dev here&lt;/a>. The source repository is &lt;a href="https://github.com/nmfisher/flutter_keyframe_timeline">available on GitHub here&lt;/a>.&lt;/p> &lt;p>I also quickly vibe-coded &lt;a href="https://github.com/nmfisher/dart_static_site_generator">yet another static site generator&lt;/a>.
This one uses markdown/Liquify templates to output static HTML; the earlier Jaspr version was proving overkill for rendering mostly static HTML.&lt;/p></content:encoded>
    </item>
    <item>
      <title>Dart + WebAssembly with Javascript Interop</title>
      <link>https://hydroxide.dev/articles/dart-javascript-interop-web-assembly/</link>
      <description>The lack of proper web compatibility for Thermion has been bugging me for some time. This hasn't been for lack of trying - the main obstacle was that Dart doesn't have an (easy) path for direct inte</description>
      <pubDate>Fri, 13 Jun 2025 00:00:00 +0000</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/dart-javascript-interop-web-assembly/</guid>
      <content:encoded>&lt;p>The lack of proper web compatibility for &lt;a href="https://thermion.dev">Thermion&lt;/a> has been bugging me for some time. This hasn't been for lack of trying - the main obstacle was that Dart doesn't have an (easy) path for direct interop with WebAssembly libraries.&lt;/p> &lt;p>Native targets (Android, iOS, macOS, Linux and Windows) can all use &lt;code>dart:ffi&lt;/code> to invoke external functions directly and pass around native data structures and pointers to native memory. This was previously partially available when compiling to WebAssembly, but after &lt;a href="https://github.com/dart-lang/sdk/issues/37355">the public API was removed&lt;/a> in 2024, the only officially supported method for interop with WebAssembly libraries is (indirect) invocation via Javascript interop (which I'll cover in more detail below).&lt;/p> &lt;h2>Dart FFI interop for native targets&lt;/h2> &lt;p>First, let's recap the FFI approach. Say we have a header file &lt;code>my_native_func.h&lt;/code>:&lt;/p> &lt;pre>&lt;code>int my_native_function(int number);
&lt;/code>&lt;/pre> &lt;p>and matching implementation &lt;code>my_native_func.c&lt;/code>:&lt;/p> &lt;pre>&lt;code>int my_native_function(int number) {
    return number + 1;
}
&lt;/code>&lt;/pre> &lt;p>and we've already compiled this library to &lt;code>my_native_func.dylib&lt;/code>.&lt;/p> &lt;p>We now want to write a Dart program that loads this library and invokes this method.
In our &lt;code>main.dart&lt;/code>, we could write the following binding:&lt;/p> &lt;pre>&lt;code>@Native&amp;lt;Int32 Function(Int32)&amp;gt;(isLeaf: true)
external int my_native_function(int number);
&lt;/code>&lt;/pre> &lt;p>and then invoke it directly in our &lt;code>main&lt;/code> function:&lt;/p> &lt;pre>&lt;code>void main(List&amp;lt;String&amp;gt; args) {
  // I didn't explain how to compile/link the native library
  // because it's not relevant to this topic,
  // so just assume this loads the library
  loadDynamicLibrary();

  final value = my_native_function(0);
  print(&amp;quot;Value: $value&amp;quot;);
}
&lt;/code>&lt;/pre> &lt;p>Running &lt;code>dart run main.dart&lt;/code> will print &lt;code>Value: 1&lt;/code>.&lt;/p> &lt;pre>&lt;code>(base) nickfisher@Nicks-Mac-mini % dart run main.dart
Value: 1
&lt;/code>&lt;/pre> &lt;h2>Dart JS interop for WebAssembly&lt;/h2> &lt;p>When compiling to WebAssembly, however, this won't work. In fact, it will throw a compile-time error complaining that &lt;code>dart:ffi&lt;/code> is not available for the target platform.&lt;/p> &lt;pre>&lt;code>(base) nickfisher@Nicks-Mac-mini % dart compile wasm main.dart
main.dart:1:1: Error: 'dart:ffi' can't be imported when compiling to Wasm.
import 'dart:ffi';
^
main.dart:14:37: Error: The '.address' expression can only be used as argument to a leaf native external call.
final value = my_native_function(data.address);
&lt;/code>&lt;/pre> &lt;p>Luckily, Dart has very robust options for &lt;em>Javascript&lt;/em> interop (&lt;code>package:web&lt;/code> and &lt;code>dart:js_interop&lt;/code>).
WebAssembly compilers generally don't just produce &lt;code>.wasm&lt;/code> modules, they can also generate Javascript wrappers to allow those WebAssembly modules to be loaded and executed in Javascript runtimes (like v8/Node/etc).&lt;/p> &lt;p>Assuming we have &lt;code>my_native_func.wasm&lt;/code> (the WebAssembly module) and &lt;code>my_native_func.wasm.js&lt;/code> (the corresponding Javascript wrapper), we can invoke our &amp;quot;native&amp;quot; function like so:&lt;/p> &lt;pre>&lt;code>@JS('my_native_function')
external JSNumber my_native_function(JSNumber number);

void main(List&amp;lt;String&amp;gt; args) {
  // Similarly, just assume this loads the Javascript module
  loadDynamicLibrary();

  final value = my_native_function(0.toJS);
  print(&amp;quot;Value: ${value.toDart}&amp;quot;);
}
&lt;/code>&lt;/pre> &lt;p>While this looks superficially similar to the &lt;code>dart:ffi&lt;/code> approach, there are a few key differences:&lt;/p> &lt;ul> &lt;li>the @JS annotation indicates where to find the function in the global JS namespace&lt;/li> &lt;li>rather than defining the argument/return types as a primitive Dart &lt;code>int&lt;/code>, this converts the argument and return types to/from the equivalent JS types respectively. This isn't actually necessary (if you define these as &lt;code>int&lt;/code>, the Dart compiler will take care of the conversion for you), but I have kept these in to explicitly illustrate that you are passing Javascript types around - not native WebAssembly types.&lt;/li> &lt;/ul> &lt;h2>The problems with JS interop&lt;/h2> &lt;p>If the native library only comprises a single native function, then it's no problem to do this by hand. If/when something changes in the native library, it's trivial to edit the sole JS binding manually.&lt;/p> &lt;p>A package like Thermion, though, exposes more than 500 native function definitions.
While it's possible to write every single binding by hand, it's very time-consuming and tedious.&lt;/p> &lt;p>Using native FFI, this isn't a problem; the &lt;code>ffigen&lt;/code> package auto-generates bindings practically instantly. There's no equivalent for &lt;code>js_interop&lt;/code>.&lt;/p> &lt;p>What's more, FFI interop types are slightly different from JS interop types. Take the following native function definition as an example:&lt;/p> &lt;pre>&lt;code>void my_native_function(uint8_t* ptr);
&lt;/code>&lt;/pre> &lt;p>With &lt;code>dart:ffi&lt;/code>, the binding is straightforward:&lt;/p> &lt;pre>&lt;code>@ffi.Native&amp;lt;ffi.Void Function(ffi.Pointer&amp;lt;ffi.Uint8&amp;gt;)&amp;gt;(isLeaf: true)
external void my_native_function(
  ffi.Pointer&amp;lt;ffi.Uint8&amp;gt; ptr,
);
&lt;/code>&lt;/pre> &lt;p>Without &lt;code>dart:ffi&lt;/code>, there's no &lt;code>Pointer&lt;/code> or &lt;code>Uint8&lt;/code> class - so how do you know which Javascript data type will be accepted/returned by your native function? You need to roll your own, which requires at least passing familiarity with the WebAssembly ABI and calling conventions. You might then also need to maintain two separate calling libraries - one for FFI, one for JS-interop WebAssembly.&lt;/p> &lt;h2>jsgen to the rescue&lt;/h2> &lt;p>The obvious solution to these problems is a code generator that does all of this for you. That's exactly what I did with &lt;a href="https://github.com/nmfisher/native/tree/jsgen/pkgs/ffigen/lib/jsgen">&lt;code>jsgen&lt;/code>&lt;/a> - I forked the &lt;code>ffigen&lt;/code> package to generate JS interop bindings from C headers (and additionally, to reimplement most &lt;code>dart:ffi&lt;/code> and &lt;code>package:ffi&lt;/code> Dart types).&lt;/p> &lt;p>Where possible, I reused the interfaces and syntax from &lt;code>dart:ffi&lt;/code>; this makes it much easier to use as a drop-in replacement for bindings generated with &lt;code>ffigen&lt;/code>.
For example, we only need to slightly refactor &lt;code>main.dart&lt;/code> in the above example to use conditional imports:&lt;/p> &lt;pre>&lt;code class="language-main.dart">import 'bindings.dart';

void main(List&amp;lt;String&amp;gt; args) {
  // I didn't explain how to compile/link the native library
  // because it's not relevant to this topic,
  // so just assume this loads the library
  loadDynamicLibrary();

  final value = my_native_function(0);
  print(&amp;quot;Value: $value&amp;quot;);
}
&lt;/code>&lt;/pre> &lt;p>In bindings.dart:&lt;/p> &lt;pre>&lt;code>export 'bindings.ffi.g.dart'
    if (dart.library.io) 'bindings.ffi.g.dart'
    if (dart.library.js_interop) 'bindings.js.g.dart';
&lt;/code>&lt;/pre> &lt;p>In bindings.ffi.g.dart:&lt;/p> &lt;pre>&lt;code>@Native&amp;lt;Int32 Function(Int32)&amp;gt;(isLeaf: true)
external int my_native_function(int number);
&lt;/code>&lt;/pre> &lt;p>In bindings.js.g.dart:&lt;/p> &lt;pre>&lt;code>@JS('my_native_function')
external int my_native_function(int number);
&lt;/code>&lt;/pre> &lt;p>With this approach, you only need to refactor the import paths, not every invocation of &lt;code>my_native_function&lt;/code>. This was a huge relief when migrating Thermion; I would have needed a full day to refactor otherwise.&lt;/p> &lt;h2>Shared memory&lt;/h2> &lt;p>Unfortunately, Javascript interop isn't an exact substitute for FFI interop. Some FFI data structures don't have equivalents in Javascript.
For example, Thermion often uses the following pattern to pass byte data from Dart to native code:&lt;/p> &lt;pre>&lt;code>@Native&amp;lt;Void Function(Pointer&amp;lt;Uint8&amp;gt;)&amp;gt;(isLeaf: true)
external void native_load(Pointer&amp;lt;Uint8&amp;gt; addr);

Future loadGltf(String assetPath) async {
  var byteData = await rootBundle.load(assetPath);
  var buffer = byteData.buffer.asUint8List(byteData.offsetInBytes);
  native_load(buffer.address);
}
&lt;/code>&lt;/pre> &lt;p>The &lt;code>.address&lt;/code> accessor and the &lt;code>isLeaf: true&lt;/code> annotation here are key. With the latter, the Dart garbage collector is guaranteed not to run while &lt;code>native_load&lt;/code> is executing. This ensures the memory location of the Dart &lt;code>buffer&lt;/code> object will not move while the function is executing; the Pointer derived from the &lt;code>.address&lt;/code> accessor will therefore remain valid for the lifetime of the invocation. Conversely, the &lt;code>.address&lt;/code> accessor cannot be used if the native function being invoked is &lt;em>not&lt;/em> explicitly marked as a leaf call.&lt;/p> &lt;p>js_interop has no such equivalent. &lt;code>.address&lt;/code> isn't a runtime lookup, it's actually a compiler transformation which simply isn't available when compiling Dart to Javascript or WebAssembly.&lt;/p> &lt;p>That's not all though - the whole concept doesn't make sense in the context of Javascript interop, where Dart code runs in one memory space and WebAssembly code in another.
Just because we can use Javascript to communicate between the two doesn't mean that memory addresses are mutually accessible.&lt;/p> &lt;p>So conceptually, we first need to figure out how to implement some form of shared memory between Dart and WebAssembly.&lt;/p> &lt;p>Luckily, with the emscripten compiler, we can allocate memory on the stack and/or heap and pass this around as a typed Javascript array:&lt;/p> &lt;pre>&lt;code>emscripten::val emscripten_make_uint8_buffer(int length) {
    uint8_t *buffer = (uint8_t*)malloc(length);
    auto v = emscripten::val(emscripten::typed_memory_view(length, buffer));
    return v;
}
&lt;/code>&lt;/pre> &lt;p>If you're not familiar with emscripten, here's a high level overview:&lt;/p> &lt;ul> &lt;li>malloc allocates &lt;em>length&lt;/em> bytes of memory on the emscripten heap&lt;/li> &lt;li>emscripten::typed_memory_view creates a view on the underlying buffer (so we can pass around a typed array without copying the underlying data)&lt;/li> &lt;li>emscripten::val creates a Javascript object that can be passed directly back to Dart as a JSObject&lt;/li> &lt;/ul> &lt;p>With Dart static extensions, I can create a compatible implementation of the &lt;code>.address&lt;/code> extension on &lt;code>dart:typed_data&lt;/code> classes like so:&lt;/p> &lt;pre>&lt;code class="language-js_interop.dart">@JS('emscripten_make_uint8_buffer')
external JSUint8Array emscripten_make_uint8_buffer(int length);

@JS('free')
external void emscripten_free(int ptr);

final _allocations = &amp;lt;Uint8List, int&amp;gt;{};

extension Uint8ListExtension on Uint8List {
  Pointer&amp;lt;Uint8&amp;gt; get address {
    if (this.lengthInBytes == 0) {
      return nullptr;
    }
    final jsTypedArray = emscripten_make_uint8_buffer(this.lengthInBytes);
    jsTypedArray.toDart.setRange(0, length, this);
    final addr = jsTypedArray.offsetInBytes;
    _allocations[this] = addr;
    return Pointer&amp;lt;Uint8&amp;gt;(addr);
  }

  void free() {
    if (_allocations.containsKey(this)) {
      emscripten_free(_allocations[this]!);
      _allocations.remove(this);
    }
  }
}
&lt;/code>&lt;/pre> &lt;p>Accessing &lt;code>.address&lt;/code> on an instance of &lt;code>Uint8List&lt;/code> will create a JSUint8Array via emscripten, copy the contents of the Dart &lt;code>Uint8List&lt;/code> and return a pointer to the emscripten heap to the caller. This pointer can be passed directly back to a native function exactly as we do with FFI interop:&lt;/p> &lt;pre>&lt;code class="language-gltf.dart">Future loadGltf(String assetPath) async {
  var byteData = await rootBundle.load(assetPath);
  var buffer = byteData.buffer.asUint8List(byteData.offsetInBytes);
  native_load(buffer.address);
  buffer.free();
}
&lt;/code>&lt;/pre> &lt;p>The memory copy is unfortunate, but unavoidable unless you have complete control over how the &lt;code>Uint8List&lt;/code> was constructed in the first place. You also now need to manually track the lifetime of the additional native allocation, which is painful.&lt;/p> &lt;p>Until Dart fully supports direct WebAssembly interop (rather than indirect interop via Javascript), that's what we're stuck with.&lt;/p></content:encoded>
    </item>
    <item>
      <title>About Me</title>
      <link>https://hydroxide.dev/about/</link>
      <description>Currently doing a lot of backend/AI agent work, speech-to-text, text-to-speech, 3D animation. Mostly working with Dart, C++, WebAssembly, Python/PyTorch, Typescript and Blender/bpy. My projects:</description>
      <pubDate>Thu, 12 Jun 2025 16:00:00 +0000</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/about/</guid>
      <content:encoded>&lt;p>Currently doing a lot of backend/AI agent work, speech-to-text, text-to-speech, 3D animation. Mostly working with Dart, C++, WebAssembly, Python/PyTorch, Typescript and Blender/bpy.&lt;/p> &lt;p>My projects:&lt;/p> &lt;ul> &lt;li>&lt;a href="https://mixreel.ai">Mixreel&lt;/a>&lt;/li> &lt;li>&lt;a href="https://playmixworld.com">Mixworld&lt;/a>&lt;/li> &lt;li>&lt;a href="https://polyvox.app">Polyvox&lt;/a>&lt;/li> &lt;/ul> &lt;p>Get in touch via &lt;a href="https://github.com/nmfisher/">GitHub&lt;/a>, &lt;a href="https://nickfisherau.bsky.social">BlueSky&lt;/a>, &lt;a href="https://twitter.com/NickFisherAU">Twitter&lt;/a>, &lt;a href="https://mstdn.social/@nickfisherau">Mastodon&lt;/a> or &lt;a href="mailto:nick.fisher@avinium.com">e-mail&lt;/a>.&lt;/p></content:encoded>
    </item>
    <item>
      <title>Building a (Mini) 3D Flutter Game - Part 1</title>
      <link>https://hydroxide.dev/articles/building-a-mini-3d-flutter-game-engine-part-1/</link>
      <description>I've been working on flutter_filament for some time now, a package that enables cross-platform 3D rendering in Flutter apps with the Filament Physically Based Rendering library. I still haven't</description>
      <pubDate>Fri, 03 May 2024 00:00:00 +0000</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/building-a-mini-3d-flutter-game-engine-part-1/</guid>
      <content:encoded>&lt;p>I've been working on &lt;a href="https://github.com/nmfisher/flutter_filament">flutter_filament&lt;/a> for some time now, a package that enables cross-platform 3D rendering in Flutter apps with the &lt;a href="https://github.com/google/filament">Filament&lt;/a> Physically Based Rendering library.&lt;/p> &lt;p>I still haven't managed to write a blog post on the package itself, but if you're interested, in the meantime you can check out &lt;a href="https://nick-fisher.com/articles/3d-pbr-with-flutter-presentation-to-singapore-flutter-meetup/">a presentation I gave at the Singapore Flutter meetup here&lt;/a> and &lt;a href="https://github.com/nmfisher/flutter_filament/issues/22#issuecomment-2071802422">this GitHub issue&lt;/a> for a high-level overview.&lt;/p> &lt;h2>From Renderer To Game&lt;/h2> &lt;p>I'm not a game developer or designer (aside from a few toy projects in my university days), but I'd been itching to extend &lt;code>flutter_filament&lt;/code> with basic game functionality for some time.&lt;/p> &lt;p>When Google launched a &lt;a href="https://flutter.dev/global-gamers">Flutter game competition&lt;/a> earlier in the year, it gave me a convenient excuse to set aside some time to do so.&lt;/p> &lt;p>That being said, no-one was sponsoring me to write a game, let alone a game engine. With bills to pay, I couldn't take &lt;em>too&lt;/em> much time away from paying work, so I needed to be very judicious and implement only the absolute bare minimum to make a game (hence the &amp;quot;mini&amp;quot; game).&lt;/p> &lt;p>The initial game concept was intentionally basic: paddle a canoe down a river, fish rubbish out of the water with a net.
Different objects would be worth different points, and the objective would be to collect as many points as possible before reaching the end of the river.&lt;/p> &lt;p>&lt;img src="/assets/images/flutter_game_concept/flutter_game_concept1.png" alt="Screenshot from first game concept showing a character in a boat on a river with various pieces of floating rubbish" />&lt;/p> &lt;blockquote> &lt;p>You've probably realized that this is &lt;em>not&lt;/em> the game I ended up submitting for the competition. After I went through the engine work below, I started afresh with a different concept, which I'll cover in part 2.&lt;/p> &lt;/blockquote> &lt;p>The 2D UI overlay would be handled by Flutter, and I already had a renderer in flutter_filament, so I was able to render/transform 3D objects, add lights/skybox, and play animations.&lt;/p> &lt;p>What I needed to add, though, was:&lt;/p> &lt;ol> &lt;li>the ability to attach an over-the-shoulder camera&lt;/li> &lt;li>collision detection to stop the canoe itself clipping through the river banks, and to detect when an object was caught by the net&lt;/li> &lt;li>keyboard/mouse controls for moving the character and triggering the animations&lt;/li> &lt;/ol> &lt;p>(1) was very straightforward, simply because I could cheat and avoid the issue by exporting the canoe/character model from Blender with a camera node as a child. 
Since this was time away from paid work, I had no shame in taking shortcuts every which way!&lt;/p> &lt;h2>Collision detection&lt;/h2> &lt;p>Collision detection wasn't going to be as trivial as that, but I was hoping I could get away with something simple like the following pseudo-code:&lt;/p> &lt;pre>&lt;code>bool collides(Entity entity1, Entity entity2) { // implement this } void calculateCollisions() { for(auto entity1 : scene) { for(auto entity2 : scene) { if(entity1 != entity2 &amp;amp;&amp;amp; collides(entity1, entity2)) { collisionCallback(entity1, entity2); } } } } void renderLoop() { while(true) { calculateCollisions(); render(); } } &lt;/code>&lt;/pre> &lt;p>This O(N^2) complexity in the hot path would obviously be terrible for a real game, but this concept only needed a few dozen renderable entities, so I didn't expect it to be a problem.&lt;/p> &lt;p>Implementing the actual &lt;code>collides(...)&lt;/code> method didn't appear too difficult at first glance either. The Filament library (and the glTF format more generally) expose &lt;a href="https://developer.mozilla.org/en-US/docs/Games/Techniques/3D_collision_detection#axis-aligned_bounding_boxes">axis-aligned bounding boxes&lt;/a> for assets, so I thought I could get away with something as simple as:&lt;/p> &lt;pre>&lt;code>auto aabb1 = worldTransform(entity1,entity1.aabb); auto aabb2 = worldTransform(entity2,entity2.aabb); for(auto vertex : aabb1.vertices) { if(aabb2.contains(vertex)) { return true; } } return false; &lt;/code>&lt;/pre> &lt;p>However, there were a few problems with this simple approach.&lt;/p> &lt;p>One is that Filament uses the rest pose of the model to calculate the bounding box when it is imported. This isn't a problem for static (i.e. non-animated) models - like determining whether the canoe hit the river bed. 
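&lt;/p> &lt;p>As an aside, a corner-containment check like the one above can miss cases where two boxes overlap without either containing a corner of the other, so the textbook AABB test compares the min/max extents along each axis instead. Here's a minimal Python sketch of that test (the &lt;code>Aabb&lt;/code> type is a hypothetical stand-in, not flutter_filament's actual API):&lt;/p>

```python
# Axis-aligned bounding boxes overlap iff their extents overlap on
# every axis. Aabb is a hypothetical stand-in type (min/max corners),
# not flutter_filament's actual API.
class Aabb:
    def __init__(self, mn, mx):
        self.mn, self.mx = mn, mx

def intersects(a, b):
    # Separated on some axis => no collision; otherwise they overlap
    return not any(a.mn[i] > b.mx[i] or b.mn[i] > a.mx[i] for i in range(3))

box1 = Aabb((0, 0, 0), (2, 2, 2))
box2 = Aabb((1, 1, 1), (3, 3, 3))   # overlaps box1
box3 = Aabb((5, 0, 0), (6, 1, 1))   # disjoint from box1
print(intersects(box1, box2), intersects(box1, box3))  # True False
```

&lt;p>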
But for a character model with a swipe animation, the bounding box remains fixed and doesn't account for the fact that the animated limb is now &amp;quot;outside&amp;quot; this box.&lt;/p> &lt;p>&lt;img src="/assets/images/flutter_game_concept/flutter_game_concept1_aabb1.jpeg" alt="Screenshot from first game concept showing the boat on top of a piece of rubbish" />&lt;/p> &lt;p>Another is that this is an &lt;em>axis-aligned&lt;/em> bounding box, meaning that the extent along each axis changes depending on the rotation of the model. This means that the AABBs can intersect, even though visually, there's no collision.&lt;/p> &lt;p>The other problem is that the bounding box of the top-level entity is (obviously) larger than the bounding box of the actual object we want to test (the end of the net). We only want to award points when the net hits the floating object in the water, not (for example) when the canoe reverses into one.&lt;/p> &lt;p>&lt;img src="/assets/images/flutter_game_concept/flutter_game_concept1_no_collision.jpeg" alt="Screenshot from first game concept showing the boat on top of a piece of rubbish" />&lt;/p> &lt;p>In keeping with the spirit of &amp;quot;do as little work as possible&amp;quot;, I avoided the issue again by attaching a number of small hidden cubes in Blender to the collidable parts (the riverbanks, the canoe and the front of the canoe where the scoop animation would intersect with the water).&lt;/p> &lt;h2>Keyboard/mouse control&lt;/h2> &lt;p>At this stage I was mostly working on the desktop (MacOS) version, so I wanted to be able to use conventional FPS controls to move the character (WASD keys for forward/back/strafe, mouse movement for look and the mouse button for the &amp;quot;swing net&amp;quot; action).&lt;/p> &lt;p>In a normal game engine, you'd expect to be able to collect/process user input inside the main loop:&lt;/p> &lt;pre>&lt;code>void main() { while(true) { processInput(); calculateCollisions(); waitForVsync(); render(); 
} } &lt;/code>&lt;/pre> &lt;p>This doesn't quite fit the way that &lt;code>flutter_filament&lt;/code> is structured, though, where the Flutter UI loop runs on the main thread and a separate render loop runs on a background thread.&lt;/p> &lt;p>There's no inherent reason why we couldn't process keyboard and mouse events in both loops. But with the Flutter framework providing tools to handle user input across all supported platforms, why reinvent the wheel?&lt;/p> &lt;p>I had already implemented basic manipulation via Flutter &lt;code>GestureDetector&lt;/code> widgets for the main 'scene' camera, so it was relatively straightforward to extend this to manipulating the camera attached to the model.&lt;/p> &lt;p>To maintain consistent movement speed (and to stop the transform updates when a collision is detected), though, I needed to queue up the user input so it was only processed once per (render) frame.&lt;/p> &lt;p>As a side note, I don't &lt;em>think&lt;/em> there's any inherent reason why I couldn't restructure &lt;code>flutter_filament&lt;/code> to run more like a conventional game loop:&lt;/p> &lt;pre>&lt;code>void main() { while(true) { processInput(); calculateCollisions(); waitForVsync(); flutterEngine.tick(); render(); } } &lt;/code>&lt;/pre> &lt;p>In this structure, the Flutter engine would render into an offscreen render target, and the game is then responsible for compositing this on top of the scene view. This strikes me as functionally similar to the &lt;a href="https://docs.flutter.dev/add-to-app">Flutter &amp;quot;add-to-app&amp;quot; scenario&lt;/a>, so this is probably feasible - I just haven't had any compelling reason to do so yet.&lt;/p> &lt;p>In the next post, I'll go into some more detail on the work needed for the second iteration (GPU instancing, menu callbacks on entity mouseover).&lt;/p></content:encoded>
    </item>
    <item>
      <title>Static Blog Generator with Dart &amp; Jaspr</title>
      <link>https://hydroxide.dev/articles/2024-05-02-static-blog-generator-with-dart-jaspr/</link>
      <description>I was happy to recently discover the jaspr project , a Dart framework for generating dynamic and static HTML/JS pages with a pseudo-Flutter syntax. It's a far better alternative to Flutter web, at</description>
      <pubDate>Wed, 01 May 2024 16:00:00</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/2024-05-02-static-blog-generator-with-dart-jaspr/</guid>
      <content:encoded>&lt;p>I was happy to recently discover the &lt;a href="https://github.com/">jaspr project&lt;/a>, a Dart framework for generating dynamic and static HTML/JS pages with a pseudo-Flutter syntax.&lt;/p> &lt;p>It's a far better alternative to Flutter web, at least for general purpose web pages or even certain basic web applications.&lt;/p> &lt;p>For one, SEO is a non-starter when it comes to Flutter web apps. Text elements are rendered straight to a canvas, so search engine crawlers will have a hard time indexing content.&lt;/p> &lt;p>The bundle size for a Flutter web app is also relatively large: last I checked, release builds using the HTML renderer were ~1.4mb (and a CanvasKit build clocked in at around ~6.5mb!). That's way too much for a static site, where load time is a priority.&lt;/p> &lt;p>I was looking for a Dart-based static site generator to consolidate a few of my sites (including this one). I had been using a mixture of Hugo, PHP, and vanilla HTML/JS, and it was getting to be a pain. I wanted a simple tool that I could point to a directory of &lt;code>.md&lt;/code> files, and generate static HTML with the appropriate routing and styling for upload directly to Cloudflare.&lt;/p> &lt;p>jaspr's static site generation offered the majority of what I needed, and I wrote a small extension to cover the remainder (markdown parsing, basic templating, route configuration). I've &lt;a href="https://github.com/nmfisher/jaspr_blog">open-sourced the repository&lt;/a>, though (as usual) it's not very well documented at this stage.&lt;/p> &lt;p>I've started eating my own dog food, having moved over both this site and the &lt;a href="https://polyvox.app/blog">Polyvox blog&lt;/a>. I've noticed a few painful omissions (code formatting is broken, and there's no way to specify custom tags or otherwise directly embed Javascript or an iframe), which I'll gradually address when I have the time.&lt;/p></content:encoded>
    </item>
    <item>
      <title>3D PBR with Flutter - Talk at Singapore Flutter Meetup</title>
      <link>https://hydroxide.dev/articles/3d-pbr-with-flutter-presentation-to-singapore-flutter-meetup/</link>
      <description>Here are the slides from my talk at the Singapore Flutter Meetup on my flutter_filament rendering package.</description>
      <pubDate>Tue, 20 Feb 2024 00:00:00</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/3d-pbr-with-flutter-presentation-to-singapore-flutter-meetup/</guid>
      <content:encoded>&lt;p>&lt;a href="https://docs.google.com/presentation/d/e/2PACX-1vRgMocYTfvrzP4a-BD8pxxEzuslg60OYAqogecSVIhdQC6cuJfUkJdROtEbje9Gy3GP1MtgOl1_bbZ-/pub?start=false&amp;loop=false&amp;delayms=3000">Here are the slides&lt;/a> from my talk at the Singapore Flutter Meetup on my flutter_filament rendering package.&lt;/p></content:encoded>
    </item>
    <item>
      <title>Backpropagation with asymmetric weights</title>
      <link>https://hydroxide.dev/articles/backpropagation-with-asymmetric-weights/</link>
      <description>A number of recent papers have explored learning in deep neural networks without backpropagation, often motivated by the apparent biological implausibility of backpropagation. Supposedly, the</description>
      <pubDate>Sun, 15 Sep 2019 00:00:00</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/backpropagation-with-asymmetric-weights/</guid>
      <content:encoded>&lt;p>A number of &lt;a href="https://arxiv.org/abs/1909.01311">recent&lt;/a> &lt;a href="https://arxiv.org/abs/1908.01580v1">papers&lt;/a> have explored learning in deep neural networks without backpropagation, often motivated by the apparent &lt;em>biological implausibility&lt;/em> of backpropagation.&lt;/p> &lt;p>Supposedly, the brain's neurons can only transmit electrical impulses in a single direction; neurons have no way of &amp;quot;communicating&amp;quot; error backwards.&lt;/p> &lt;p>So while backpropagation is an effective method for training neural networks, it is not a reasonable analogue for biological learning.&lt;/p> &lt;p>Now I haven't the faintest clue about neuroscience, so I can't comment on whether machine learning &lt;em>does&lt;/em> or &lt;em>should&lt;/em> replicate biological learning.&lt;/p> &lt;p>But I do find &amp;quot;learning without backpropagation&amp;quot; very intriguing for two practical reasons.&lt;/p> &lt;p>First, since the gradients at $layer_{t-1}$ depend on the gradients at $layer_{t}$, backpropagation isn't an inherently parallelizable algorithm. This is a practical bottleneck for efficiently training larger networks.&lt;/p> &lt;p>Second, backpropagation is still computationally expensive. 
Even where the backwards pass requires only as many operations as the forward pass, this still doubles the effective training cost for a given network.&lt;/p> &lt;p>So is it possible to train a network with conventional gradient-based methods while minimizing the downsides to backpropagation?&lt;/p> &lt;p>According to the &lt;a href="https://arxiv.org/pdf/1411.0247.pdf">theory of feedback alignment&lt;/a>, the answer is yes!&lt;/p> &lt;p>At a high level, feedback alignment means using &lt;em>random weights&lt;/em> to communicate an error signal back through a network, rather than reusing the same weights during the forward and backward passes.&lt;/p> &lt;p>I found this pretty counterintuitive when I first came across the idea. How can you train a network if the error signal is essentially random? Not that I didn't believe the paper, but I really needed to write it out from scratch to really understand what's going on.&lt;/p> &lt;h1>Backpropagation refresher&lt;/h1> &lt;p>Let's revisit the theory behind backpropagation. I'm going to assume some basic familiarity - if you need a refresher, there are dozens of existing tutorials. I recommend &lt;a href="http://neuralnetworksanddeeplearning.com/">Chapter 2 of Michael Nielsen's book&lt;/a>.&lt;/p> &lt;p>Let's take a basic neural network with $t$ layers and a single real-valued output.&lt;/p> &lt;p>The output layer and loss function can be depicted as follows:&lt;/p> &lt;p>&lt;img src="/images/asymmetric-backprop/network_output.png#bg-white" alt="network_output" />&lt;/p> &lt;p>At the end of the forward pass, the network outputs a scalar prediction $\hat{y}$. A loss function $L$ is applied to $\hat{y}$ and $y$, returning a scalar representing the error of the network.&lt;/p> &lt;p>During backpropagation, we first calculate the error of the weights at layer $t-1$ (i.e. 
$W_{t-1}$) with respect to this loss.&lt;/p> &lt;p>We'll see shortly that this is used to calculate the error at $layer_{t-2}$.&lt;/p> &lt;p>In other words, we propagate the total network loss back through every intermediate layer in the network - hence the term backpropagation.&lt;/p> &lt;p>But what does this mean mathematically?&lt;/p> &lt;p>The network's output is calculated by multiplying $W_{t-1}$ by the activation outputs from the previous layer ($z_{t-2}$).&lt;/p> &lt;p>$$ \hat{y} = W_{t-1}z_{t-2} $$&lt;/p> &lt;p>For stochastic gradient descent with learning rate $\eta$, the update rule for $W_{t-1}$ will be as follows:&lt;/p> &lt;p>$$ W_{t-1} := W_{t-1} - \eta \frac{\partial L}{\partial W_{t-1}} $$&lt;/p> &lt;p>Per the chain rule:&lt;/p> &lt;p>$$ \frac{\partial L}{\partial W_{t-1}} = \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial W_{t-1}} = \frac{\partial L}{\partial \hat{y}} z_{t-2} $$&lt;/p> &lt;p>This follows because the derivative of $(1)$ with respect to $W_{t-1}$ is $z_{t-2}$.&lt;/p> &lt;p>Now let's look at $z_{t-2}$, the activation outputs from the preceding layer.&lt;/p> &lt;p>$$ z_{t-2} = \sigma(W_{t-2}z_{t-3}) $$&lt;/p> &lt;p>The update rule for $W_{t-2}$ is as follows:&lt;/p> &lt;p>$$ W_{t-2} := W_{t-2} - \eta \frac{\partial L}{\partial W_{t-2}} $$&lt;/p> &lt;p>And again, by the chain rule:&lt;/p> &lt;p>$$ \frac{\partial L}{\partial W_{t-2}}= \frac{\partial L}{\partial \hat{y}} \frac{\partial \hat{y}}{\partial z_{t-2}}\frac{\partial z_{t-2}}{\partial W_{t-2}}$$&lt;/p> &lt;p>Since&lt;/p> &lt;p>$$ z_{t-2} = \sigma(W_{t-2}z_{t-3}) $$&lt;/p> &lt;p>then&lt;/p> &lt;p>$$ \frac{\partial z_{t-2}}{\partial W_{t-2}} = \sigma\prime(W_{t-2}z_{t-3})z_{t-3} $$&lt;/p> &lt;p>Where $\sigma$ is some non-linear activation function, and $\sigma\prime$ is its corresponding derivative.&lt;/p> &lt;p>This gives us an expression for the last component of $(6)$. But what about the second component - i.e. 
$\frac{\partial \hat{y}}{\partial z_{t-2}}$?&lt;/p> &lt;p>During the backward pass for the last layer, we calculated $\frac{\partial \hat{y}}{\partial W_{t-1}}$ (i.e. the change in network output with respect to the last layer's weights) but not $\frac{\partial \hat{y}}{\partial z_{t-2}}$ (the change with respect to the penultimate activation output).&lt;/p> &lt;p>From $(1)$, this is simply the transpose of the last layer's weights:&lt;/p> &lt;p>$$ \frac{\partial \hat{y}}{\partial z_{t-2}} = W_{t-1}^\intercal $$&lt;/p> &lt;p>Substituting $(8)$ and $(9)$ into $(6)$, we now have:&lt;/p> &lt;p>$$ \frac{\partial L}{\partial W_{t-2}} = \frac{\partial L}{\partial \hat{y}}W_{t-1}^\intercal\sigma'(W_{t-2}z_{t-3})z_{t-3} $$&lt;/p> &lt;p>This line is the key to understanding feedback alignment.&lt;/p> &lt;p>Conventional backpropagation pushes the error &amp;quot;back&amp;quot; through the network via multiplication with that layer's weight matrix.&lt;/p> &lt;p>In other words, the weights used are &lt;em>symmetric&lt;/em> between forward and backward passes.&lt;/p> &lt;p>I've only depicted backpropagation from the last (linear) to the penultimate (non-linear) layer, but the same applies at every layer in a deep network - the transpose of the weight matrix is used at each step to propagate the error backwards.&lt;/p> &lt;p>&amp;quot;Weight symmetry&amp;quot; is therefore just a way of saying &amp;quot;the chain rule requires multiplication by the weight matrix&amp;quot;.&lt;/p> &lt;p>Now what's interesting is considering whether a network can learn &lt;em>without&lt;/em> this weight symmetry. What happens if we replace the transposed weight matrix with purely random values?&lt;/p> &lt;p>Intuitively, you might expect that the network would simply fail to learn. 
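&lt;/p> &lt;p>As a sanity check, the symmetric chain-rule expression can be verified numerically: for a toy two-layer network, the analytic gradient (which reuses the forward weights) should agree with a finite-difference estimate of the loss. Here's a small self-contained Python sketch with hardcoded toy values (purely illustrative - this is separate from the F# code discussed later in the post):&lt;/p>

```python
# Numerically check the chain-rule gradient for a toy 2-2-1 network:
#   y_hat = w3 . relu(W2 @ x),  L = (y_hat - y)^2
# The analytic dL/dW2 reuses the forward weights w3 (the transposed-weights
# term in the derivation); it should agree with central differences.
# All values are hardcoded toy numbers, purely for illustration.

def relu(v):
    return [max(a, 0.0) for a in v]

def relu_grad(v):
    return [1.0 if a > 0.0 else 0.0 for a in v]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

W2 = [[0.6, -0.4], [0.3, 0.8]]   # hidden-layer weights
w3 = [1.1, -0.7]                 # output-layer weights
x, y = [0.5, 1.5], 0.25

def loss(W2_):
    z = relu(matvec(W2_, x))
    y_hat = sum(w3[k] * z[k] for k in range(2))
    return (y_hat - y) ** 2

# Analytic gradient: dL/dW2[i][j] = dL/dy_hat * w3[i] * relu_grad(pre[i]) * x[j]
pre = matvec(W2, x)
y_hat = sum(w3[k] * relu(pre)[k] for k in range(2))
dL_dyhat = 2.0 * (y_hat - y)
g = relu_grad(pre)
analytic = [[dL_dyhat * w3[i] * g[i] * x[j] for j in range(2)] for i in range(2)]

# Central-difference estimate of the same gradient
h = 1e-6
numeric = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        Wp = [row[:] for row in W2]; Wp[i][j] += h
        Wm = [row[:] for row in W2]; Wm[i][j] -= h
        numeric[i][j] = (loss(Wp) - loss(Wm)) / (2.0 * h)

max_err = max(abs(analytic[i][j] - numeric[i][j]) for i in range(2) for j in range(2))
print(max_err)
```

&lt;p>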
That was certainly my initial reaction.&lt;/p> &lt;p>But you only need to read the title of the feedback alignment paper - &amp;quot;Random feedback weights support learning in deep neural networks&amp;quot;. Neural networks can still learn, even when completely random weights are used during the backwards pass!&lt;/p> &lt;h2>Implementing asymmetric weight transfer&lt;/h2> &lt;p>It's difficult to believe, which is why I had a stab at coding an asymmetric network from scratch. This shows that random weights can indeed learn XOR.&lt;/p> &lt;p>&lt;a href="https://github.com/nmfisher/feedback_alignment_demo/blob/master/Library.fs">Here's a link to the full code&lt;/a>, but I'll go through it here line-by-line.&lt;/p> &lt;p>I've used my preferred language (F#), so I'll try and explain some of the F#-specific syntax where it's used.&lt;/p> &lt;p>To make sure I really understood what was going on, I tried to keep the code as close as possible to the maths. This meant avoiding automagic DL frameworks like Keras/PyTorch, but I did use the MathNet library for the underlying matrix operations.&lt;/p> &lt;p>As this was only an exercise, it's all hand-coded, so there's no support for custom loss functions, optimizers, operations or even layers.&lt;/p> &lt;h3>The objective&lt;/h3> &lt;p>Ultimately, we're experimenting to see if a basic neural network can learn an arbitrary function with random weights used during the backpass.&lt;/p> &lt;p>XOR seemed a convenient test function to choose - it's not linearly separable, so it can only be learned via a non-linear model.&lt;/p> &lt;p>Let's start with the main invocation code.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} let nn = NN(10) let rnd = System.Random() let inputs = [ 0.0,1.0,1.0; 1.0,0.0,1.0; 1.0,1.0,0.0; 0.0,0.0,0.0 ]; let next () = List.item (rnd.Next inputs.Length) inputs {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>This is just basic setup code. 
We first initialize a neural network object, the XOR inputs/output tuples and a function that returns a random XOR input sample.&lt;/p> &lt;p>Next, the training/evaluation loop:&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} let iterations = 5000 for i in seq { 0..iterations } do let (x1,x2,y1) = next() let x = array2D [ [x1; ]; [x2] ] |&amp;gt; DenseMatrix.ofArray2 {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>We run 5000 training iterations, with a single input/output tuple at each iteration.&lt;/p> &lt;p>The last line is just to ensure the input can be matrix-multiplied with the weights in the first layer. Repeating this on every iteration is very wasteful, but for this experiment/problem, I prefer readability over efficiency.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} nn.Forward x y1 |&amp;gt; ignore nn.Backward x false nn.Update 0.01 x {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Next is the crux of the training loop - one forward pass, one backwards pass, and a parameter update. We'll dive into these shortly. 
The forward pass will return the predicted value, which we need to explicitly pipe to the &lt;em>ignore&lt;/em> function (F# complains if a function returns a value that isn't used anywhere).&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} if i % 100 = 0 then let preds = seq { for j in seq { 0..20 } do let (x1,x2,y1) = next() let pred = nn.Forward (array2D [ [x1;];[x2 ] ] |&amp;gt; DenseMatrix.ofArray2) y1 |&amp;gt; (fun x -&amp;gt; match x &amp;gt; 0.5 with | true -&amp;gt; 1 | _ -&amp;gt; 0) if pred = int(y1) then yield true else yield false } let accuracy = (float(preds |&amp;gt; Seq.where id |&amp;gt; Seq.length) / float(preds |&amp;gt; Seq.length)) printfn &amp;quot;Accuracy : %f&amp;quot; accuracy {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>To evaluate the network's performance, every 100 iterations we feed a batch of 21 random samples into the network and compare the generated predictions with the actual labels.&lt;/p> &lt;p>If the network learns the XOR problem perfectly, the accuracy will converge to 1.0 (i.e. 
100% of predictions were correct).&lt;/p> &lt;p>Now let's move to the internals of the neural network implementation.&lt;/p> &lt;p>First, define a type that takes a constructor argument for the dimension of each layer.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} type NN (dim:int) = let mutable w1 = Matrix&lt;double>.Build.Random(dim, 2); // weights let mutable z1 = Matrix&lt;double>.Build.Random(dim, 1); // first layer activation input let mutable w1' = Matrix&lt;double>.Build.Random(dim, 2); // first layer grads let mutable w2 = Matrix&lt;double>.Build.Random(dim, dim); // second layer weights let mutable w2' = Matrix&lt;double>.Build.Random(dim, dim); // second layer grads let mutable z2 = Matrix&lt;double>.Build.Random(dim, 1); // second layer activation input let mutable w3 = Matrix&lt;double>.Build.Random(1, dim); // linear output weights let mutable err = 0.0 {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Like I said above, I'm rolling everything by hand here with no concern for extensibility.&lt;/p> &lt;p>This means hardcoding three layers - two non-linear layers plus one linear output layer.&lt;/p> &lt;p>To do this, I initialize matrices for each layer's weights, activation outputs and gradients, and the scalar error.&lt;/p> &lt;p>Note that the MathNet Random() function draws from a Gaussian distribution; this is known to be a suboptimal initialization strategy, but that's not important for this particular exercise.&lt;/p> &lt;p>In F#, all variables are immutable by default. 
These matrices will be updated during every forward/backward pass, so we explicitly denote these matrix variables as mutable.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} let relu x = max x 0.0 let relu' x = match x &amp;gt; 0.0 with | true -&amp;gt; 1.0 | false -&amp;gt; 0.0 {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>The basic network is only going to use hardcoded ReLU activations, so I define two functions - ReLU and its derivative.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} member x.Forward (input:Matrix&lt;double>) (output:float) = z1 &amp;lt;- w1 * input |&amp;gt; Matrix.map relu // (dx2) * (2x1) -&amp;gt; (dx1) z2 &amp;lt;- w2 * z1 |&amp;gt; Matrix.map relu let y_hat = (w3 * z2).Row(0).Item(0) if y_hat.Equals(nan) then failwith &amp;quot;NAN&amp;quot; let loss = (abs(y_hat - output)) err &amp;lt;- match y_hat &amp;gt; output with | true -&amp;gt; 1.0 | false -&amp;gt; -1.0 y_hat {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>This is the forward pass for our 3-layer network to learn XOR, accepting a vector of size (1x2) as input and outputting a scalar.&lt;/p> &lt;p>We multiply the first layer's weights by the input vector, followed by the relu activation. Note for simplicity, I haven't included any bias parameter.&lt;/p> &lt;p>This gives us $z_{1}$ (the first layer's activation outputs). 
Next, a matmul between $z_{1}$ and the second layer's weights, followed by the nonlinearity, giving us $z_{2}$.&lt;/p> &lt;p>Finally, a linear multiplication between $z_{2}$ and the 3rd layer's weights, giving us the network's output.&lt;/p> &lt;p>For the loss function, the code actually uses the absolute error, the derivative of which evaluates to $\pm 1$ depending on the sign of $\hat{y} - y$ - exactly the value stored in &lt;code>err&lt;/code> above.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} member x.Backward (input:Matrix&lt;double>) (symmetric:bool) = if symmetric then w2' &amp;lt;- (err * w3.Transpose()).PointwiseMultiply(Matrix.map relu' z2) w1' &amp;lt;- (w2.Transpose() * w2').PointwiseMultiply((Matrix.map relu' z1)) else w2' &amp;lt;- (err * r2.Transpose()).PointwiseMultiply(Matrix.map relu' z2) w1' &amp;lt;- (r1 * w2').PointwiseMultiply((Matrix.map relu' z1)) {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>For a conventional (i.e. symmetric) backward pass, the error gradient at the output layer is multiplied by $W_{3}^\intercal$, the transpose of the last layer's weight matrix.&lt;/p> &lt;p>This is then pointwise-multiplied by the derivative of $z_{2}$ (the penultimate layer's activation function), giving the error gradient at the penultimate $layer_{2}$.&lt;/p> &lt;p>Likewise, to propagate the error from $layer_{2}$ to $layer_{1}$, we multiply by $W_{2}^\intercal$.&lt;/p> &lt;p>For now, let's skip the case where asymmetric weights are used.&lt;/p> &lt;p>Once we've finished our backwards pass, we perform the weight update:&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} member x.Update (lr:double) (inputs:Matrix&lt;double>)= w1 &amp;lt;- w1 - (lr * w1' * inputs.Transpose()) w2 &amp;lt;- w2 - (lr * w2' * z1.Transpose()) w3 &amp;lt;- w3 - (lr * err * z2.Transpose()) {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>That completes a single training iteration.&lt;/p> &lt;p>If we run this with symmetric weights, the network will quickly converge to 100% accuracy.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} Accuracy : 0.476190 
Accuracy : 0.619048 Accuracy : 0.761905 Accuracy : 1.000000 {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Exactly how many iterations this takes will depend on the weight initialization, which is a topic best reserved for another day. But at the least we know our network's implementation is correct!&lt;/p> &lt;p>Let's return to the backwards pass and consider the case of &lt;em>asymmetric&lt;/em> weights.&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} let r2 = Matrix&lt;double>.Build.Random(1, 10) let r1 = Matrix&lt;double>.Build.Random(dim, dim) member x.Backward (input:Matrix&lt;double>) (symmetric:bool) = if symmetric then w2' &amp;lt;- (err * w3.Transpose()).PointwiseMultiply(Matrix.map relu' z2) w1' &amp;lt;- (w2.Transpose() * w2').PointwiseMultiply((Matrix.map relu' z1)) else w2' &amp;lt;- (err * r2.Transpose()).PointwiseMultiply(Matrix.map relu' z2) w1' &amp;lt;- (r1 * w2').PointwiseMultiply((Matrix.map relu' z1)) {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Rather than multiplying by the transpose of each layer's weight matrix, what if we multiply by &lt;em>completely random weights&lt;/em>? 
How does that look?&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} Accuracy : 0.857143 Accuracy : 0.666667 Accuracy : 0.761905 Accuracy : 0.857143 Accuracy : 1.000000 Accuracy : 1.000000 Accuracy : 1.000000 {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>The network still reaches 100% accuracy, showing that weight symmetry isn't necessary for the network to learn a non-linear function.&lt;/p> &lt;p>It's important to note that the random backpass weights are fixed - they are not adjusted or learned during the parameter update stage.&lt;/p> &lt;p>As a side-note, another curious result is that re-randomizing the weights during each backwards pass also doesn't prevent the network from learning.&lt;/p> &lt;p>Now why is this effective?&lt;/p> &lt;p>My own hand-wavy explanation is that training with random backpass weights equates to training a pseudo-objective $W_{r}$ rather than $W_{z}$. If this pseudo-objective is optimized with respect to the &amp;quot;true&amp;quot; loss, the direction of the gradients will be pushed in the direction of the &amp;quot;true&amp;quot; gradient. This means the pseudo-objective will converge towards the true objective.&lt;/p> &lt;p>I also speculate that there's a close relationship between feedback alignment and random projections.&lt;/p> &lt;p>Either way, the paper sets out a much more formal/rigorous explanation and proof.&lt;/p> &lt;p>I found it very instructive to go through this line-by-line, so I hope that others find this useful too. If you have any comments, criticisms or errata, please reach out in the comments or via Twitter!&lt;/p></content:encoded>
    </item>
    <item>
      <title>Calling F#/.NET code from Flutter with Mono</title>
      <link>https://hydroxide.dev/articles/calling-fsharp-dotnet-code-from-flutter/</link>
      <description>I'm obliged to issue a severe warning to anyone who found their way here. DO NOT DO anything I explain below. Seriously. I'd sooner recommend juggling a pair of flaming chainsaws. Everything I</description>
      <pubDate>Mon, 02 Sep 2019 04:58:40</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/calling-fsharp-dotnet-code-from-flutter/</guid>
      <content:encoded>&lt;p>I'm obliged to issue a severe warning to anyone who found their way here.&lt;/p> &lt;p>&lt;em>DO NOT DO&lt;/em> anything I explain below.&lt;/p> &lt;p>Seriously.&lt;/p> &lt;p>I'd sooner recommend juggling a pair of flaming chainsaws. Everything I'm about to say is a terrible idea and you should ignore it completely.&lt;/p> &lt;p>Why?&lt;/p> &lt;p>Because although it's &lt;em>possible&lt;/em> to glue the Mono and Flutter runtimes together, it will leave your application looking like the software equivalent of the Somme circa 1916.&lt;/p> &lt;p>Only try this yourself if you really, really want an application that:&lt;/p> &lt;ul> &lt;li>is not testable end-to-end&lt;/li> &lt;li>is double its original binary size&lt;/li> &lt;li>is dependent on C, CMake, F# and MSBuild, most of which are foreign (or at least &lt;em>new&lt;/em>) to your average mobile developer&lt;/li> &lt;li>is only buildable in parts, since no single build system can address the assembly dependency/versioning issues you'll experience on the .NET side with the conventional Android/Objective-C/Swift build system.&lt;/li> &lt;/ul> &lt;p>If you really need .NET in a mobile application, use Xamarin.&lt;/p> &lt;p>If you really want to use Flutter, rewrite your library in Dart.&lt;/p> &lt;p>I beg you - don't mix and match.&lt;/p> &lt;p>I did it, and ended up with such a mess that I went back and took the second option. 
In less time than it took me to do the below, I might add.&lt;/p> &lt;p>I honestly can't think of a &lt;em>single&lt;/em> situation where the trade-off will be net positive.&lt;/p> &lt;p>With that out of the way, let's look at the motivation.&lt;/p> &lt;h2>Bridging Flutter and .NET with Mono&lt;/h2> &lt;p>Say you have an existing cross-platform mobile application written in Flutter/Dart.&lt;/p> &lt;p>One day you happen upon a marvellous .NET library written in (say) F# that takes your application from humdrum to unicorn.&lt;/p> &lt;p>You're itching to integrate the two. How do you do it?&lt;/p> &lt;p>First, a quick recap of what a Flutter application looks like.&lt;/p> &lt;p>&lt;img src="/images/dotnet-flutter/flutter-application-architecture.png" alt="architecture-overview" />&lt;/p> &lt;p>A single Dart code base with a nice layout/rendering framework that runs on both Android and iOS via the Dart VM.&lt;/p> &lt;p>On Android, your application code and the framework itself are compiled to bytecode, which is then JIT-compiled by the Dart VM and executed by the operating system on the CPU.&lt;/p> &lt;p>Apple does not allow JIT-compiled code on iOS, so the code is AOT-compiled and skips the bytecode-JIT compile step.&lt;/p> &lt;p>Bear in mind I'm using a fairly loose definition of &amp;quot;operating system&amp;quot; here, encompassing emulators, simulators, system libraries and native APIs.&lt;/p> &lt;p>Now on the .NET side, things look reasonably similar. 
F# code, compiled to CIL (bytecode), JIT-compiled via the runtime (.NET or Mono) and executed by the OS on the CPU.&lt;/p> &lt;p>&lt;img src="/images/dotnet-flutter/dotnet-application-architecture.png" alt="architecture-overview" />&lt;/p> &lt;p>For a Flutter application to interop with native code, the two runtimes (or static libraries, in an AOT context) need to communicate.&lt;/p> &lt;p>Although this is under development, Dart does &lt;a href="https://dart.dev/server/c-interop">not currently support native interop&lt;/a>. Flutter can only communicate with native code via platform channels in Java (for Android) or Objective C/Swift. These are straightforward and well-documented, so I won't go into those here.&lt;/p> &lt;p>To interop with a F#/.NET assembly, we'll need to embed a native CLR capable of running on arm64, armv7 or x86 (for emulators), as well as the iOS and Android operating systems.&lt;/p> &lt;p>Mono is the only runtime that fits the bill here. Neither .NET Framework nor .NET Core are available for mobile architectures - I assume this is because Microsoft acquired Xamarin (which runs on Mono under the hood), and supporting two cross-platform CLR implementations wouldn't be prudent.&lt;/p> &lt;h1>Embedding Mono&lt;/h1> &lt;p>So, how do we &amp;quot;embed&amp;quot; Mono? What does that even mean?&lt;/p> &lt;p>Similar to .NET Framework, a Mono &amp;quot;installation&amp;quot; is a set of libraries (either statically or dynamically compiled) containing basic CLR types (String, Object, etc), system libraries (file, memory allocation, etc), a JIT compiler and instructions (&amp;quot;trampolines&amp;quot;) needed to invoke JIT-compiled code. 
Methods in your own .NET assembly are invoked via the Mono runtime.&lt;/p> &lt;p>&amp;quot;Embedding&amp;quot; Mono basically means:&lt;/p> &lt;ul> &lt;li>Building/deploying the correct Mono libraries for the given OS/architecture combination&lt;/li> &lt;li>Initializing the Mono runtime to ensure all base libraries are correctly loaded&lt;/li> &lt;li>Glue code to convert data going into/coming out of Mono from &amp;quot;native&amp;quot; data types to &amp;quot;Mono&amp;quot; types (e.g. converting JNI jstrings or Objective-C NSStrings to Mono strings)&lt;/li> &lt;li>Glue code to locate your own assembly (and the methods/classes you want to use) so you can pass your data in and handle any exceptions.&lt;/li> &lt;/ul> &lt;p>Note this all occurs in native (C) code - so on Android, for example, your application will end up looking like this:&lt;/p> &lt;p>&lt;img src="/images/dotnet-flutter/dart-java-c-dotnet.png" alt="architecture-overview" />&lt;/p> &lt;p>By now, you might be starting to understand why the complexity just isn't worth it.&lt;/p> &lt;h1>Compiling Mono&lt;/h1> &lt;p>To begin with, you'll need to compile the Mono runtime and a cross-compiler (since all assemblies need to be AOT-compiled for iOS).&lt;/p> &lt;p>Clone the repository at &lt;a href="https://github.com/mono/mono/">https://github.com/mono/mono/&lt;/a> and follow the instructions in the sdks directory (note this requires an existing Mono installation to bootstrap, as well as automake and ninja).&lt;/p> &lt;p>For Android, it's theoretically possible to do this on Windows, but I had major issues via both Cygwin and WSL. In the end, I built all the Android-specific libraries on Linux.
For the iOS libraries, everything needs to be built on macOS.&lt;/p> &lt;p>Under sdks/out, you'll end up with Mono builds for each architecture/OS combination:&lt;/p> &lt;ul> &lt;li>./android-arm64-v8a-release&lt;/li> &lt;li>./android-armeabi-v7a-release&lt;/li> &lt;li>./ios-bcl&lt;/li> &lt;li>./ios-cross64-release&lt;/li> &lt;li>./ios-target64-release&lt;/li> &lt;/ul> &lt;p>This &lt;em>should&lt;/em> build the base class libraries (BCLs - standard CLI libraries handling core tasks like assembly loading, security, etc). However, if it does not, you may need to run autogen and make in the root directory first:&lt;/p> &lt;p>{{&amp;lt; highlight shell &amp;gt;}} ./autogen.sh --with-monodroid # for Android ./autogen.sh --with-monotouch # for iOS make {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;h1>Compile your own assembly&lt;/h1> &lt;p>Let's say the F# module you want to invoke looks like this:&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} module Hello&lt;/p> &lt;p>let say x = sprintf &amp;quot;You said %s&amp;quot; x&lt;/p> &lt;p>[&lt;EntryPoint>] let main _ = 0 {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Compile this with fsharpc (or dotnet build), targeting your main project as a .NET 4.6.2+ console application.&lt;/p> &lt;p>Mono requires an assembly entry point to set up the correct AppDomain paths (a function marked [&lt;EntryPoint>] must accept a string array and return an int), and there's some incompatibility with .NET Core apps.
Note the EntryPoint annotation.&lt;/p> &lt;h1>[Android] Copy libraries&lt;/h1> &lt;p>On Android, zip all project DLLs, BCLs and config together.&lt;/p> &lt;p>{{&amp;lt; highlight shell &amp;gt;}} mkdir myproj mkdir myproj/lib mkdir myproj/lib/mono mkdir myproj/lib/mono/4.5 mkdir myproj/etc mkdir myproj/etc/mono cp $SRCDIR/hello.dll myproj cp -R $MONO_REPO_DIR/mcs/class/lib/monotouch/*.dll myproj/lib # replace monotouch with monodroid for Android cp -R $MONO_REPO_DIR/mcs/class/lib/monotouch/Facades/*.dll myproj/lib mv myproj/lib/mscorlib.dll myproj/lib/mono/4.5 cp $MONO_REPO_DIR/sdks/builds/ios-target64-release/data/config myproj/etc/mono # replace ios-target-64 with android-arm64-v8a-release for Android {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>I've used the environment variables &lt;em>SRCDIR&lt;/em> and &lt;em>MONO_REPO_DIR&lt;/em> arbitrarily - you will need to set these to the correct directories.&lt;/p> &lt;p>The config file is needed as Mono maps certain DLLs to specific native libraries, depending on the chosen platform.&lt;/p> &lt;p>On my build, these config files are the same between architectures (e.g. armv8 vs armv7), so it doesn't matter whether you choose ios-target-64/ios-target-32/etc. However, this may change in future, so take care.&lt;/p> &lt;p>There's nothing particularly important about this folder structure. As we'll see shortly, Mono allows us to manually set the config and base library directories.&lt;/p> &lt;p>However, two points to note:&lt;/p> &lt;ul> &lt;li>Mono expects to find mscorlib.dll under the mono/4.5 subfolder&lt;/li> &lt;li>Mono looks for native remapped libraries in the same directory as its DLLs, meaning we will need to copy our architecture-specific libraries to the myproj/lib directory inside our application.&lt;/li> &lt;/ul> &lt;p>In your Flutter pubspec.yaml, add this zip file as an asset.
You'll need to write your own script at startup to unzip these libraries to the application's ${dataDir}/app_flutter subdirectory. &lt;a href="https://gist.github.com/nmfisher/467f7a3d441147b9c3829988b151ca0d">I've created a gist here&lt;/a> that will help.&lt;/p> &lt;p>Ideally, I'd integrate this with Gradle build, but I couldn't figure out a way to embed files other than dynamic libraries (.so).&lt;/p> &lt;p>I assume Android requires all non-library files to be packaged as application assets.&lt;/p> &lt;h1>[iOS] AOT compile libraries and link via Xcode&lt;/h1> &lt;p>On iOS, all assemblies need to be AOT-compiled and copied via Xcode (rather than zipped and extracted at runtime).&lt;/p> &lt;p>I encountered a &lt;em>lot&lt;/em> of difficulty in getting this all to work.&lt;/p> &lt;p>A successful compile doesn't mean the libraries will actually run.&lt;/p> &lt;p>There are a lot of strong, hidden dependencies under the hood that mean you need to use the correct version of mscorlib.dll, FSharp.Core.dll, compile switches, and so on.&lt;/p> &lt;p>Roughly speaking, I needed to:&lt;/p> &lt;ul> &lt;li> &lt;p>use the FSharp.Core.dll from the Xamarin.iOS binaries repository. 
Don't copy from NuGet or your project build folder - I believe an older build is required under Mono.&lt;/p> &lt;/li> &lt;li> &lt;p>use exactly the same BCLs from your Mono build, not the assemblies used to build your project&lt;/p> &lt;/li> &lt;li> &lt;p>compile mscorlib with the static and direct-icalls AOT options.&lt;/p> &lt;/li> &lt;/ul> &lt;p>This &lt;em>shouldn't&lt;/em> be necessary if you link against the icall library, but Mono crashed every time without it, complaining that the icall lookup table couldn't be found.&lt;/p> &lt;ul> &lt;li>create your own ld/assembler scripts for each architecture.&lt;/li> &lt;/ul> &lt;p>Again, this shouldn't be needed, but my Mono installation wasn't calling the native assembler/linker correctly, so I had to set this up manually.&lt;/p> &lt;p>When you compile, use the correct tool-prefix switch (e.g. tool-prefix=aarch64- or tool-prefix=armv7s-):&lt;/p> &lt;p>{{&amp;lt; highlight shell &amp;gt;}} $ cat /usr/local/bin/aarch64-as as -arch arm64 $@&lt;/p> &lt;p>$ cat /usr/local/bin/aarch64-ld clang -Xlinker -v -Xlinker -F/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib/ -arch arm64 $@&lt;/p> &lt;p>$ cat /usr/local/bin/armv7s-as as -arch armv7 $@&lt;/p> &lt;p>$ cat /usr/local/bin/armv7s-ld clang -Xlinker -v -Xlinker -F/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS.sdk/usr/lib/ -arch armv7s $@ {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>{{&amp;lt; highlight shell &amp;gt;}} mono --aot=full,static,tool-prefix=aarch64-,direct-icalls mscorlib.dll {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Rinse and repeat for all BCLs and project assemblies (excluding the static and direct-icalls switches for DLLs other than mscorlib.dll).&lt;/p> &lt;h1>[Android] Link libraries via CMake&lt;/h1> &lt;p>I won't cover the ins-and-outs of including CMake in your build pipeline via the Gradle/Flutter build process.
Long story short, you'll need to copy libmonosgen-2.0.so for each architecture to the corresponding folders in your Flutter project's android/src/main/jniLibs directory:&lt;/p> &lt;p>{{&amp;lt; highlight shell &amp;gt;}} cp $MONO_REPO_DIR/sdks/out/android-arm64-v8a-release/lib/libmonosgen-2.0.so $FLUTTER_PROJECT_DIR/android/src/main/jniLibs/arm64-v8a cp $MONO_REPO_DIR/sdks/out/android-armeabi-v7a-release/lib/libmonosgen-2.0.so $FLUTTER_PROJECT_DIR/android/src/main/jniLibs/armeabi-v7a # repeat for all architectures {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Your CMakeLists.txt file will need to include the following:&lt;/p> &lt;p>{{&amp;lt; highlight cmake &amp;gt;}} link_directories(&amp;quot;${PROJECT_SOURCE_DIR}/../jniLibs/${ANDROID_ABI}&amp;quot;) add_library(hello SHARED hello.c) target_include_directories(hello PUBLIC &amp;quot;$MONO_REPO_DIR/sdks/out/android-x86-release/include/mono-2.0&amp;quot; &amp;quot;$GLIB_SRC_DIR&amp;quot; &amp;quot;$GLIB_SRC_DIR/glib&amp;quot; &amp;quot;$GLIB_BUILD_DIR/glib&amp;quot; ) target_link_libraries(hello monosgen-2.0 android log gcc m) {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;h1>[iOS] Link libraries via Xcode&lt;/h1> &lt;p>In Xcode, you'll need to link your entire application with the Mono runtime &lt;em>and&lt;/em> the libraries you just compiled (static or otherwise). You'll also need to copy the original assemblies (even though the actual compiled code is statically linked, the runtime still needs the reference metadata from the assemblies to load this code).&lt;/p> &lt;p>Under your Runner - Build Phases - &amp;quot;Link Binary with Libraries&amp;quot;, link in all AOT-compiled libraries.
For dynamic libraries, these also need to be included under &amp;quot;Embed Frameworks&amp;quot;.&lt;/p> &lt;p>For dynamic libraries, you will also need to use install-name-tool to change the ID of each dynamic library and its rpath.&lt;/p> &lt;p>You'll also need to include both glib and Mono header files.&lt;/p> &lt;p>Do &lt;em>not&lt;/em> try to set these directly via the Pods project in Xcode - these settings will be overwritten. Set these via the ios/yourapp.podspec file as follows:&lt;/p> &lt;p>{{&amp;lt; highlight ruby &amp;gt;}} s.pod_target_xcconfig = { 'USER_HEADER_SEARCH_PATHS' =&amp;gt; '/usr/local/lib/glib-2.0/include /usr/local/include/glib-2.0 $MONO_REPO_DIR/include/mono-2.0', 'ALWAYS_SEARCH_USER_PATHS' =&amp;gt; 'YES' } {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>The standard Mono SDK build doesn't include x86-64 binaries, so if you want to run on the simulator, you'll need to manually lipo all the libmonosgen-2.0-compat.dylib files into one &amp;quot;fat&amp;quot; library and reference that instead:&lt;/p> &lt;p>{{&amp;lt; highlight shell &amp;gt;}} lipo $MONO_REPO_DIR/sdks/out/target-ios-target-64/libmonosgen-2.0-compat.dylib $MONO_REPO_DIR/sdks/out/target-ios-target-32/libmonosgen-2.0-compat.dylib -create -output libmonosgen-2.0_fat.dylib # repeat inputs for all architectures {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;h1>C code to load the Mono runtime and invoke an assembly method&lt;/h1> &lt;p>This step is quite involved, and becomes very app-specific once the runtime is loaded.&lt;/p> &lt;p>Refer to the &lt;a href="https://www.mono-project.com/docs/advanced/embedding/">Mono documentation&lt;/a> for further information on converting native types to pass to the Mono runtimes, finding methods/classes, and so on.&lt;/p> &lt;p>I'll just cover the runtime initialization here.&lt;/p> &lt;p>Note your interop code will need to build against glib: {{&amp;lt; highlight shell &amp;gt;}} yum install glib # Android (Linux build host) brew install glib # iOS (macOS build host)
{{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>One concern I have is that glibconfig.h is a generated/architecture-specific file containing typedefs for certain sizes. Although I didn't experience any issues from this, I can't guarantee that this is OK. Someone with more experience would need to comment.&lt;/p> &lt;p>Roughly speaking, the native code will:&lt;/p> &lt;ul> &lt;li>Find the absolute path of the app install directory&lt;/li> &lt;li>Find the subdirectory where Xcode copied the dynamic libraries and DLLs&lt;/li> &lt;li>Call mono_set_dirs on this path&lt;/li> &lt;li>[iOS] Call mono_aot_register_module on any statically compiled DLLs&lt;/li> &lt;li>[iOS] Call mono_jit_set_aot_mode(MONO_AOT_MODE_FULL);&lt;/li> &lt;li>Call mono_jit_init(&amp;quot;myapp&amp;quot;);&lt;/li> &lt;li>Call mono_domain_assembly_open (myDomain, path_to_your_dll);&lt;/li> &lt;/ul> &lt;p>Here's the iOS portion of my code, lightly cleaned up (note each allocation reserves an extra byte for the NUL terminator, and that I actually bundled all Mono libraries/DLLs as a separate framework, so yours may look slightly different):&lt;/p> &lt;p>{{&amp;lt; highlight C &amp;gt;}} char* subdir = &amp;quot;/Frameworks/ParserAOT.framework/lib/&amp;quot;; assembly_path = malloc(strlen(path) + strlen(subdir) + 1); strcpy(assembly_path, path); strcat(assembly_path, subdir); parser_dll_path = malloc(strlen(assembly_path) + strlen(ASSEMBLY_FILE_NAME) + 1); strcpy(parser_dll_path, assembly_path); strcat(parser_dll_path, ASSEMBLY_FILE_NAME);&lt;/p> &lt;p>config_path = malloc(strlen(assembly_path) + strlen(&amp;quot;config&amp;quot;) + 1); strcpy(config_path, assembly_path); strcat(config_path, &amp;quot;config&amp;quot;);&lt;/p> &lt;p>mono_set_dirs(assembly_path, assembly_path); mono_aot_register_module(mono_aot_module_mscorlib_info); mono_config_parse(config_path); mono_jit_set_aot_mode(MONO_AOT_MODE_FULL); myDomain = mono_jit_init(&amp;quot;myapp&amp;quot;); MonoAssembly *assembly = mono_domain_assembly_open(myDomain, parser_dll_path); {{&amp;lt; / highlight &amp;gt;}}&lt;/p> 
&lt;h1>Bridge Dart/Java/Objective-C code to native&lt;/h1> &lt;p>Your Dart code will need to send a message to your Java/Objective-C code via a platform channel. This is pretty straightforward so I won't cover it here.&lt;/p> &lt;p>On the Android/Java side, you'll then need to bounce this method to native code via a JNI method. This involves adding a method signature in your Java class like:&lt;/p> &lt;p>{{&amp;lt; highlight Java &amp;gt;}} public native String invokeJNI(String method, String data); {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>...and a matching C method like:&lt;/p> &lt;p>{{&amp;lt; highlight C &amp;gt;}} JNIEXPORT jobjectArray JNICALL Java_com_avinium_parser_ParserPlugin_invokeJNI(JNIEnv *env, jobject obj, jstring method, jstring json) { {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>Again, read up on JNI for a proper implementation.&lt;/p> &lt;p>For iOS apps, invoke your C method directly from your Objective-C code.&lt;/p> &lt;h1>Postscript&lt;/h1> &lt;p>There you go. A very high-level view of what's needed to get a Flutter application to communicate with an F# (or any .NET) assembly.&lt;/p> &lt;p>Worth it? Definitely not.&lt;/p> &lt;p>This was an insane amount of work for zero eventual benefit.&lt;/p> &lt;p>If I'm totally honest, I should have given up halfway through.&lt;/p> &lt;p>I only persisted because the OCD took over and I was compelled to finish what I started, no matter how daft.&lt;/p> &lt;p>As a side note - you may think that Mono's mkbundle will package everything together as a single shared library - the Mono runtime, BCLs, your application DLL and its dependencies.&lt;/p> &lt;p>This will actually work - but for x86_64 only. mkbundle is intended for desktop binaries and will not handle mobile targets.&lt;/p></content:encoded>
    </item>
    <item>
      <title>Nelder Mead Optimization with F# + Fable</title>
      <link>https://hydroxide.dev/articles/nelder-mead-optimization-with-fsharp-and-fable/</link>
      <description>Gradient descent is a spectacularly effective optimization technique, but it's not the only method for optimizing non-convex functions. There are a number of alternative numerical methods that can be</description>
      <pubDate>Tue, 02 Jul 2019 04:58:40</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/nelder-mead-optimization-with-fsharp-and-fable/</guid>
<content:encoded>&lt;p>Gradient descent is a spectacularly effective optimization technique, but it's not the only method for optimizing non-convex functions. There are a number of alternative numerical methods that can optimize a function without using or calculating its gradient (or indeed, where the gradient isn't known).&lt;/p> &lt;p>The Nelder-Mead algorithm is one such numerical method that was &lt;a href="https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method">first proposed in 1965&lt;/a>. As we'll see, the method is actually quite straightforward, evaluating the function at various points within a neighbourhood, then iteratively moving those points in the optimal direction until convergence.&lt;/p> &lt;p>Assume we have some task T, like constraint solving or image classification. For current purposes, it doesn't really matter &lt;em>what&lt;/em> this task is. We just know that our task takes some input $x$, and we want to solve it by producing some numerical output.&lt;/p> &lt;p>To do this, we formulate the task as some function $f$ of $x$ and parameters $\theta$.&lt;/p> &lt;p>With this formulation, the minimum of the function (i.e. $\min f(x;\theta)$) is the optimal solution for T.&lt;/p> &lt;p>Let's leave aside how we decided on this formulation (or the parameterization of that function). That's obviously a critical step in the optimization process, just not the focus of this post.&lt;/p> &lt;p>So we want to minimize $f(x;\theta)$ to find the best parameters for our task T.&lt;/p> &lt;p>The logic behind NM is quite straightforward - evaluate the objective function at a few random points, try to move the worst point towards the midpoint of the better two, and repeat until the points converge to the minimum.&lt;/p> &lt;p>Let's look at a function where we know the minimum, like $f(x) = x^2$. 
We'll work in two dimensions to make things easier to visualize.&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/function.png" alt="function" />&lt;/p> &lt;p>NM starts with a set of n+1 random points (where n is the dimensionality of the domain of $f$). This is known as a simplex, and in two dimensions, forms a triangle. Our input here is only one-dimensional, so the simplex is simply a line segment - but I chose three points anyway for the purpose of illustration.&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/simplex.png" alt="simplex" />&lt;/p> &lt;p>These points are then sorted according to their performance under the objective function - leaving us with points B, G and W (think &amp;quot;best&amp;quot;, &amp;quot;good&amp;quot;, and &amp;quot;worst&amp;quot;).&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/wgb.png" alt="wgb" />&lt;/p> &lt;p>NM then finds the midpoint M between B and G, and performs a sequence of different transformations, evaluating the objective function at each transformed point and comparing it with the function evaluated at B, G or W.&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/midpoint.png" alt="midpoint" />&lt;/p> &lt;p>For example, W is first reflected through M to give a new point R - if this reflection R performs better than B (i.e. the objective function evaluated at R is closer to zero than evaluated at B), an extension is performed in the same direction.&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/reflection.png" alt="screenshot1" />&lt;/p> &lt;p>Depending on which performs better, point W is then replaced with either R or the extended point. 
Alternatively, if the reflection performs worse, the simplex is contracted or shrunk towards W.&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/extension.png" alt="extension" />&lt;/p> &lt;p>&lt;img src="/images/nelder-mead/contraction.png" alt="contraction" />&lt;/p> &lt;p>This process is repeated until the distance between the original and transformed points falls beneath some threshold.&lt;/p> &lt;p>Here is an F# implementation of Nelder-Mead:&lt;/p> &lt;p>{{&amp;lt; highlight fsharp &amp;gt;}} type Simplex(points:Point[], objective:Point -&amp;gt; float) =&lt;/p> &lt;pre>&lt;code> let scored = points |&amp;gt; Array.map( fun x -&amp;gt; x, objective x) |&amp;gt; Array.sortBy (fun (x,y) -&amp;gt; y) let best = fst scored.[0] let f_best = snd scored.[0] let good = fst scored.[1] let f_good = snd scored.[1] let worst = fst scored.[2] let f_worst = snd scored.[2] let midpoint = (best + good) / 2.0 member this.compute () = let reflect = (midpoint * 2.0) - worst let f_reflect = objective reflect match f_reflect &amp;lt; f_good with | true -&amp;gt; match f_best &amp;lt; f_reflect with | true -&amp;gt; Simplex([|best; good; reflect|], objective) | false -&amp;gt; let extension = (reflect * 2.0) - midpoint let f_extension = objective extension match f_extension &amp;lt; f_reflect with | true -&amp;gt; Simplex([|best; good; extension|], objective) | false -&amp;gt; Simplex([|best; good; reflect|], objective) | false -&amp;gt; match f_reflect &amp;lt; f_worst with | true -&amp;gt; Simplex([|best; good; reflect|], objective) | false -&amp;gt; let c1 = (midpoint + worst) / 2.0 let c2 = (midpoint + reflect) / 2.0 let contraction = match objective c1 &amp;gt; objective c2 with | true -&amp;gt; c2 | false -&amp;gt; c1 let f_contraction = objective contraction match f_contraction &amp;lt; f_worst with | true -&amp;gt; Simplex([|best; good; contraction|], objective) | false -&amp;gt; let shrinkage = (best + worst) / 2.0 let f_shrinkage = objective shrinkage 
Simplex([|best; midpoint; shrinkage|], objective) &lt;/code>&lt;/pre> &lt;p>{{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>This is a fairly bare-bones implementation (the canonical version should contain scale parameters for each transformation and checks at each iteration to ensure the simplex has actually shrunk).&lt;/p> &lt;p>This was also a good opportunity to take &lt;a href="https://fable.io">Fable&lt;/a> for a spin. Fable is a transpiler, letting you write F# code that will run/render in the browser. Here's Nelder-Mead in action, running courtesy of Fable.&lt;/p> &lt;div id="graph" style="height:500px;margin-bottom:100px">&lt;/div> &lt;p>Click &amp;quot;Reset&amp;quot; to reset the simplex to three randomly chosen points, then click &amp;quot;Step&amp;quot; to go through one iteration of Nelder-Mead. You'll see the simplex converging to 0 - the optimal solution for our basic function.&lt;/p> &lt;p>One of the cool things about this is that the above uses Plotly.js as its charting library, &lt;em>all invoked from F# code&lt;/em>. ts2fable is a TypeScript to F# transpiler, letting you generate F# bindings for any TypeScript library. This allowed me to take the Plotly.js definitions from the &lt;a href="https://github.com/DefinitelyTyped/DefinitelyTyped">DefinitelyTyped project&lt;/a>, generate F# bindings and stitch the Nelder-Mead algorithm/buttons to the chart, all via F#. Nifty, right?&lt;/p> &lt;p>Admittedly, the project is still relatively young so it's not all smooth sailing (and ts2fable in particular generates some odd code that needed manual fixing).&lt;/p> &lt;p>But seeing the F# language venture beyond the .NET runtime is very promising. I initially tried a JavaScript implementation of Nelder-Mead. Not only was it literally twice as long as the F# version, it was also only half as comprehensible. 
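&lt;p>If you want to poke at the per-iteration logic outside the browser demo, here's a minimal Python sketch - my own loose translation of the F# above, operating on plain scalars and keeping the same simplifications, so treat it as illustrative rather than canonical:&lt;/p>

```python
# A loose Python translation of the Simplex.compute step above.
# Points are plain floats; best/good/worst mirror the F# naming.

def nelder_mead_step(points, objective):
    """Apply one bare-bones Nelder-Mead iteration to a 3-point simplex."""
    best, good, worst = sorted(points, key=objective)
    midpoint = (best + good) / 2.0
    reflect = 2.0 * midpoint - worst               # reflect W through M
    if objective(good) > objective(reflect):
        if objective(reflect) > objective(best):
            return [best, good, reflect]
        extension = 2.0 * reflect - midpoint       # push further past M
        if objective(reflect) > objective(extension):
            return [best, good, extension]
        return [best, good, reflect]
    if objective(worst) > objective(reflect):
        return [best, good, reflect]
    c1 = (midpoint + worst) / 2.0                  # contract towards W
    c2 = (midpoint + reflect) / 2.0                # contract towards R
    contraction = c2 if objective(c1) > objective(c2) else c1
    if objective(worst) > objective(contraction):
        return [best, good, contraction]
    return [best, midpoint, (best + worst) / 2.0]  # shrink the simplex

f = lambda x: x * x
simplex = [3.0, -2.0, 1.5]
for _ in range(200):
    simplex = nelder_mead_step(simplex, f)
```

&lt;p>Running a couple of hundred iterations on $f(x) = x^2$ drives the simplex towards 0, mirroring what you see when you click &amp;quot;Step&amp;quot; above.&lt;/p>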
I find ML-style pattern matching an excellent idiom for representing algorithmic work concisely and with minimal clutter.&lt;/p> &lt;p>&lt;a href="https://github.com/nmfisher/nelder-mead-fsharp-fable">Code is available on GitHub&lt;/a> if you want to check it out.&lt;/p> &lt;p>Header image courtesy of &lt;a href="https://www.flickr.com/photos/baccharus/4477128396/in/photolist-7PCshU-goXXdY-arRd8Q-nTVxWZ-nRQk4V-nzEcs4-nzCZrV-nS8xY2-nRZM79-nzCAvz-nRQRWr-nRZhfS-iFvNCB-pi6EBF-27L1Dx-9wvqkZ-nzCzEr-8N1MRa-nQ68ZN-8cGEYg-6Zugh1-3atRxK-fzgnPA-ouGuNq-73ichN-pSaPY6-aFYGX8-5njKyG-mzSk9C-8NxWST-AYZKkB-aE3L1Z-5FQdPa-2m9x5e-6BtiJn-93hKNt-dhweee-qRcBeK-p48tfc-a8QbUY-5MCqoj-o5MgFL-ndzkFj-grWwLh-95dssq-kouCLj-d74rru-pnC9Jn-9nWYNE-eHGVcN">Flickr&lt;/a>&lt;/p> &lt;script src="https://cdn.plot.ly/plotly-latest.min.js">&lt;/script> &lt;script src="/js/nelder-mead/bundle.js">&lt;/script></content:encoded>
    </item>
    <item>
      <title>WPF ItemsSource not updating when items added to ObservableCollection?</title>
      <link>https://hydroxide.dev/articles/wpf-itemssource-not-updating-when-items-added-to-observablecollection/</link>
      <description>Let's say our WPF application has an ItemsControl whose ItemsSource is bound to an ObservableCollection. {{&amp;lt; highlight python &amp;gt;}} &amp;lt;UserControl.Resources&amp;gt; &amp;lt;/UserControl.Resources&amp;gt;</description>
      <pubDate>Sat, 01 Sep 2018 04:58:40</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/wpf-itemssource-not-updating-when-items-added-to-observablecollection/</guid>
<content:encoded>&lt;p>Let's say our WPF application has an ItemsControl whose ItemsSource is bound to an ObservableCollection. {{&amp;lt; highlight xml &amp;gt;}} &lt;UserControl x:Class="Lexico.View.UserControl1" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:d="http://schemas.microsoft.com/expression/blend/2008" xmlns:local="clr-namespace:Lexico.View" mc:Ignorable="d" d:DesignHeight="450" d:DesignWidth="800"> &amp;lt;UserControl.Resources&amp;gt; &lt;ResourceDictionary Source="pack://application:,,,/Lexico.WPF;component/Resources/MasterDictionary.xaml" /> &amp;lt;/UserControl.Resources&amp;gt; &lt;Grid> &lt;ItemsControl ItemsSource="{Binding Items}" Visibility="{Binding Items, Converter={StaticResource NonEmptyContainerVisCollapseConverter}}"> &lt;/ItemsControl> &lt;TextBlock Visibility="{Binding Items, Converter={StaticResource EmptyContainerVisibilityCollapseConverter}}"> Looks like this document doesn't contain any questionnaire items. &lt;/TextBlock> &lt;/Grid> &lt;/UserControl> {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>The Visibility property of the ItemsControl is bound to a custom IValueConverter that returns Visibility.Visible if the ItemsSource is non-empty, or Visibility.Collapsed if empty. 
Additionally, we have a TextBlock with a reverse IValueConverter.&lt;/p> &lt;p>In other words, if our ObservableCollection is empty, we display the TextBlock to notify the user there's nothing there, otherwise we show the ItemsControl.&lt;/p> &lt;p>If everything goes to plan, on first load, we should see our &amp;quot;empty&amp;quot; message.&lt;/p> &lt;p>&lt;img src="/assets/img/wpf-itemssource-not-updating-when-items-added-to-observablecollection-1.png" alt="screenshot1" />&lt;/p> &lt;p>So far so good.&lt;/p> &lt;p>Now if we add an item to the ObservableCollection, the ItemsControl should update and we should see our newly added object.&lt;/p> &lt;p>&lt;img src="/assets/img/wpf-itemssource-not-updating-when-items-added-to-observablecollection-2.png" alt="screenshot1" />&lt;/p> &lt;p>No bueno - the UI isn't updating; seemingly nothing has been added.&lt;/p> &lt;p>If we step through the debugger, though, we can verify that everything is wired up correctly. The item is definitely added to the collection.&lt;/p> &lt;p>&lt;img src="/assets/img/wpf-itemssource-not-updating-when-items-added-to-observablecollection-3.png" alt="screenshot1" />&lt;/p> &lt;p>So what's going on?&lt;/p> &lt;p>The key here is the Visibility binding:&lt;/p> &lt;p>{{&amp;lt; highlight xml &amp;gt;}} &lt;ItemsControl ItemsSource="{Binding Items}" Visibility="{Binding Items, Converter={StaticResource NonEmptyContainerVisCollapseConverter}}"> &lt;/ItemsControl> &lt;TextBlock Visibility="{Binding Items, Converter={StaticResource EmptyContainerVisibilityCollapseConverter}}"> Looks like this document doesn't contain any questionnaire items. 
&lt;/TextBlock> {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>If we remove this, suddenly everything clicks!&lt;/p> &lt;p>&lt;img src="/assets/img/wpf-itemssource-not-updating-when-items-added-to-observablecollection-4.png" alt="screenshot1" />&lt;/p> &lt;p>The logic is pretty straightforward - feel free to kick yourself like I did when I figured it out. ObservableCollection raises a CollectionChanged event whenever an item is added or removed - the ItemsSource binding will listen for this event and refresh appropriately.&lt;/p> &lt;p>The Visibility binding, however, will only listen for the PropertyChanged event, which is not raised when the collection changes.&lt;/p> &lt;p>This means that, although the ItemsControl itself is being updated, its Visibility (and the Visibility of the overlaid TextBlock) are not.&lt;/p> &lt;p>A simple solution is to add an event handler to the CollectionChanged event on the ObservableCollection:&lt;/p> &lt;p>{{&amp;lt; highlight csharp &amp;gt;}} Items.CollectionChanged += Items_CollectionChanged;&lt;/p> &lt;p>private void Items_CollectionChanged(object sender, System.Collections.Specialized.NotifyCollectionChangedEventArgs e) { NotifyPropertyChanged(&amp;quot;Items&amp;quot;); } {{&amp;lt; / highlight &amp;gt;}}&lt;/p> &lt;p>This will then refresh the Visibility bindings whenever the ObservableCollection changes, ensuring the updated ItemsControl is actually visible.&lt;/p> &lt;p>Depending on the size of the collection, this may be quite resource-intensive, so YMMV.&lt;/p></content:encoded>
    </item>
    <item>
      <title>Conditional Random Fields for Company Names</title>
      <link>https://hydroxide.dev/articles/conditional-random-fields-for-company-names/</link>
      <description>Let's assume we have a sequence of words, and we want to predict, as accurately as possible, whether each word is a name, verb, or some other part of speech. This is equivalent to predicting a seque</description>
      <pubDate>Thu, 10 May 2018 04:58:40</pubDate>
      <guid isPermaLink="true">https://hydroxide.dev/articles/conditional-random-fields-for-company-names/</guid>
<content:encoded>&lt;p>Let's assume we have a sequence of words, and we want to predict, as accurately as possible, whether each word is a name, verb, or some other part of speech.&lt;/p> &lt;p>This is equivalent to predicting a sequence of labels (where each label represents a part-of-speech tag) from a sequence of observations (where each observation is a word from a natural language vocabulary).&lt;/p> &lt;p>In mathematical terms, this equates to finding the sequence of labels that is &lt;i>most probable&lt;/i> for the given sequence of words.&lt;/p> &lt;h2>Brute force&lt;/h2> &lt;p>We could do this by determining p(Y|X) - the conditional probability of the label sequence (Y) given the observation sequence (X).&lt;/p> &lt;p>Assuming we had some model that could give us p(X), p(Y) and p(X|Y), we could explicitly calculate this by using Bayes' Rule:&lt;/p> &lt;div class="mathblock"> &lt;span>$p(Y|X) = \frac{p(X|Y)p(Y)}{p(X)}$&lt;/span> &lt;/div> &lt;p>Our label prediction would then simply be the sequence with the maximum probability.&lt;/p> &lt;p>This would, however, require us to explicitly model both &lt;span>$p(Y)$&lt;/span> and &lt;span>$p(X)$&lt;/span> - the probability of the observed word sequence (over all possible word sequences), and the probability of every possible label sequence (over all possible label sequences).&lt;/p> &lt;p>This is equivalent to modelling the joint probability of both label and word sequences.&lt;/p> &lt;h2>Reducing the event space&lt;/h2> &lt;p>This is intractable for all but the simplest cases - the number of possible word sequences grows exponentially as the sequence grows longer.&lt;/p> &lt;p>This brute-force approach is of complexity &lt;span>$O(|V|^n)$&lt;/span> for a vocabulary &lt;span>$V$&lt;/span> and sequence length &lt;span>$n$&lt;/span>, so we want to find a better approach.&lt;/p> &lt;p>What if we disregarded the probability of the observation itself?&lt;/p> &lt;p>We know intuitively that many combinations of words, even if combinatorially &lt;i>possible&lt;/i>, never 
actually appear in the English language.&lt;/p> &lt;p>We don't want to waste computational time modelling nonsensical phrases like &amp;quot;trumpet cat the bag overlay&amp;quot;.&lt;/p> &lt;p>Ideally, we would ignore the probability of the observation and consider only &lt;span>$p(Y|X)$&lt;/span>, the probability of each label sequence &lt;i>conditioned on that sequence of words&lt;/i>.&lt;/p> &lt;p>This seems intuitively reasonable for our label prediction task.&lt;/p> &lt;p>We only really care about labelling word sequences that we are given, so avoiding modelling &lt;span>$p(X)$&lt;/span> delivers a significant reduction in computational complexity.&lt;/p> &lt;h2>Assumption of independence&lt;/h2> &lt;p>Could we similarly reduce the complexity of our model for &lt;span>$p(Y)$&lt;/span>?&lt;/p> &lt;p>Without any simplifying assumptions, we would again need to consider every possible sequence of labels - a set that is too large to work with in general.&lt;/p> &lt;p>What if we assumed that each label in the sequence is independent of the others?&lt;/p> &lt;p>This would allow us to focus on the probability of each label in its respective position, separately and independently of anything that comes before or after it.&lt;/p> &lt;p>This would reduce the number of quantities to model for &lt;span>$p(Y)$&lt;/span> from one per possible K-length sequence of N labels (&lt;span>$N^K$&lt;/span>) to a more manageable one per position and label - &amp;quot;K independent choices among N possible labels&amp;quot; (&lt;span>$N \cdot K$&lt;/span>) - a clear reduction.&lt;/p> &lt;p>The question then becomes whether or not this is a reasonable assumption for the task at hand&lt;sup class="footnote-ref">&lt;a href="#fn-1" id="fnref-1">1&lt;/a>&lt;/sup>.&lt;/p> &lt;p>In English, at least, we know that adjectives tend to appear before nouns, adverbs don't follow prepositions, and so on.
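&lt;/p> &lt;p>Concretely, under this strong independence assumption, tagging decomposes into a separate argmax at each position. As a toy sketch - the words, tags, and probability tables below are invented purely for illustration:&lt;/p>

```python
# Per-position tagger under the strong (naive) independence assumption:
# each word's tag is chosen independently of the tags around it.
# These probability tables are made up for illustration only.
p_tag_given_word = {
    "the":   {"DET": 0.95, "NOUN": 0.05},
    "old":   {"ADJ": 0.60, "NOUN": 0.30, "VERB": 0.10},
    "man":   {"NOUN": 0.70, "VERB": 0.30},
    "boats": {"NOUN": 0.80, "VERB": 0.20},
}

def tag_independently(words):
    # argmax per position - no interaction between neighbouring labels
    return [max(p_tag_given_word[w], key=p_tag_given_word[w].get) for w in words]

print(tag_independently(["the", "old", "man", "boats"]))
# -> ['DET', 'ADJ', 'NOUN', 'NOUN']
# In the garden-path reading "the old man the boats", "man" is a VERB -
# context that a per-position independent tagger can never use.
```

&lt;p>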
The labels are clearly not independent.&lt;/p> &lt;h2>Relaxed assumption of interdependence&lt;/h2> &lt;p>What if we adopted a more relaxed independence assumption - that each label is conditionally independent of all earlier labels, given its immediate predecessor?&lt;/p> &lt;p>For example, say we want to predict the blank in the following sequence:&lt;/p> &lt;div class="mathblock" style="font-size:18px"> NUM ORG VB _ &lt;/div> &lt;p>We want to model the probability of the label at position i, given every preceding label in the sequence:&lt;/p> &lt;div class="mathblock"> &lt;span>$p(y_{i} | y_{i-1}, y_{i-2}, y_{i-3})$&lt;/span> &lt;/div> &lt;p>By the chain rule, the joint probability of these labels factorizes as:&lt;/p> &lt;div class="mathblock"> &lt;span>$p(y_{i}, y_{i-1}, y_{i-2} | y_{i-3}) = p(y_{i} | y_{i-1}, y_{i-2}, y_{i-3}) \cdot p(y_{i-1} | y_{i-2}, y_{i-3}) \cdot p(y_{i-2} | y_{i-3})$&lt;/span> &lt;/div> &lt;p>If we assume that each label is independent of everything except its immediate predecessor, then each conditional collapses:&lt;/p> &lt;div class="mathblock"> &lt;span>$p(y_{i} | y_{i-1}, y_{i-2}, y_{i-3}) = p(y_{i} | y_{i-1})$&lt;/span> &lt;/div> &lt;p>This is a more realistic assumption than naive/strong independence.&lt;/p> &lt;p>Now we can capture some first-order interactions and interdependence &lt;i>between&lt;/i> labels, which fits much better with our prior knowledge of linguistic structure&lt;sup class="footnote-ref">&lt;a href="#fn-2" id="fnref-2">2&lt;/a>&lt;/sup>.&lt;/p> &lt;p>If we were also explicitly modelling the probability of the observations - that is, modelling &lt;span>$p(Y)$&lt;/span> and &lt;span>$p(X|Y)$&lt;/span>, and hence the joint &lt;span>$p(X, Y)$&lt;/span> - this would give us a Hidden Markov Model, where each label corresponds to a hidden state and each word in the sequence corresponds to an observation.&lt;/p> &lt;p>Without explicitly modelling &lt;span>$p(X)$&lt;/span> - modelling &lt;span>$p(Y|X)$&lt;/span> directly instead - this gives us a Conditional Random Field.&lt;/p> &lt;p>This model is equivalent to a Markov &lt;u>Random Field&lt;/u>, &lt;u>conditioned&lt;/u> on a sequence of observations (hence the name - Conditional Random Field).&lt;/p> &lt;p>CRFs are the discriminative counterpart of Hidden Markov Models: by modelling &lt;span>$p(Y|X)$&lt;/span> directly, they avoid the need to explicitly model the probability of the observations, which keeps the probability model tractable.&lt;/p> &lt;p>This is important for our label sequence prediction task, where the input space - that is, the space of all possible word sequences - is simply too large to consider explicitly.&lt;/p> &lt;p>For discriminative tasks such as this, CRFs deliver competitive performance by making weaker independence assumptions over a smaller search space.&lt;sup class="footnote-ref">&lt;a href="#fn-3" id="fnref-3">3&lt;/a>&lt;/sup>&lt;/p> &lt;p>So far, I've only talked about &lt;i>what&lt;/i> probability we're trying to model - not &lt;i>how&lt;/i> we might actually model it.&lt;/p> &lt;p>In the next post, I'll give a breakdown of what it actually means to model the probability under a linear chain CRF.&lt;/p> &lt;div class="references"> &lt;h2>References &amp; Further Reading&lt;/h2> &lt;ul> &lt;li>H. Wallach, &lt;i>Conditional Random Fields: An Introduction&lt;/i>, University of Pennsylvania CIS Technical Report MS-CIS-04-21&lt;/li> &lt;li>C. Sutton and A. McCallum, &lt;i>An Introduction to Conditional Random Fields&lt;/i>, Foundations and Trends in Machine Learning, Vol. 4, No. 4 (2011), 267-373&lt;/li> &lt;li>R. Klinger and K.
Tomanek, &lt;i>Classical Probabilistic Models and Conditional Random Fields&lt;/i>, Algorithm Engineering Report TR07-2-013&lt;/li> &lt;/ul> &lt;/div> &lt;section class="footnotes"> &lt;ol> &lt;li id="fn-1"> &lt;p>Note that this is the strong independence assumption adopted by Naive Bayes. &lt;a href="#fnref-1" class="footnote-backref">↩&lt;/a>&lt;/p> &lt;/li> &lt;li id="fn-2"> &lt;p>This has also been explored empirically - see R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms using different performance metrics,” Technical Report TR2005-1973, Cornell University, 2005. &lt;a href="#fnref-2" class="footnote-backref">↩&lt;/a>&lt;/p> &lt;/li> &lt;li id="fn-3"> &lt;p>The downside is that, by not modelling &lt;span>$p(X)$&lt;/span>, we can no longer calculate &lt;span>$p(X|Y)$&lt;/span>. This means we can only assign labels to words (a classification/discriminative objective), and we are unable to create a generative model that can generate the most likely word given a label. &lt;a href="#fnref-3" class="footnote-backref">↩&lt;/a>&lt;/p> &lt;/li> &lt;/ol> &lt;/section></content:encoded>
    </item>
  </channel>
</rss>