Dealing with Government of Canada logos on the web

Despite the Government of Canada having a web presence for 30 years now, one of the weirdly persistent problems when building government web applications is how to deal with logos.

The problem is with what is known as “signature blocks”; the combination of flag plus department name (or just “Government of Canada”) that you’ll find at the top of every website.

At first glance, it’s a solved problem, but if you look closely at one of these largely textual logos you’ll notice that it’s a picture: the logos all include an image of text rather than text itself, along with a flag.

This is visible everywhere from the new Design System, to Canada.ca itself, all recycling the same problematic logos.

Given that this is a well-known accessibility problem, doesn’t respect my choice of language (on the web I can specify my language, but the logo images always include both), and doesn’t work for mobile devices (cramming both languages onto a small screen isn’t good)… it’s an odd pattern to see across every government site.
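With real text instead of an image, the language problem disappears, because the markup can follow the user’s declared language. A hypothetical sketch (the helper name is mine, not from any government library):

```typescript
// Hypothetical helper: derive the signature text from the page's language
// (e.g. document.documentElement.lang), instead of baking both languages
// into a single image.
function signatureText(lang: string): string {
  return lang.toLowerCase().startsWith("fr")
    ? "Gouvernement du Canada"
    : "Government of Canada";
}

console.log(signatureText("en-CA")); // Government of Canada
console.log(signatureText("fr-CA")); // Gouvernement du Canada
```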

Why would anyone make a picture of text?

Because in 1970, the Federal Identity Program selected Helvetica as the official font of the Government of Canada. An eminently reasonable decision for the centralized team and world of print that existed at the time, it has caused major problems for generations of departmental web developers, who eventually realize that, as a commercial font, Helvetica can’t legally be used without a licence. Given the government’s broken procurement process, and the insanity of trying to navigate it for each web application, people have opted to simply take a picture of the text instead.

This decision, combined with the fact that the example logos all include both languages because of the FIP’s origins in signage and stationery (where users can’t express which language they want to see), has created the bizarre mess we have currently.

Helvetica compatible fonts to the rescue

There have been a few “Helvetica clones” published over the years that allow us to do something reasonable for the web without messing with existing signage.

For our goal of visual compatibility with existing signage, one of the selection criteria is whether they preserve Helvetica’s distinctive “spur” on the capital G, something the words “Government of Canada” will make immediately noticeable. Overused Grotesk does this, and also stands out for having a web-friendly variable font version.

With font in hand, we can solve these accessibility and screen real-estate problems by using a proper font with Government branding.

To show how this can work, here is a copy-paste-friendly example of setting up Overused Grotesk in a new project so that it’s properly preloaded and the page loads without the infamous Flash of Unstyled Text (FOUT).

Let’s use Rsbuild to scaffold a basic React application for us.

$ npm create rsbuild@latest

> npx
> create-rsbuild

◆  Create Rsbuild Project
│
◇ Project name or path
│  fontpreload
│
◇ Select framework
│  React 19
│
◇ Select language
│  TypeScript
│
◇ Select additional tools (Use <space> to select, <enter> to continue)
│  Add Biome for code linting and formatting
│
◇ Next steps ─────────────╮
│                          │
│  1. cd fontpreload       │
│  2. git init (optional)  │
│  3. npm install          │
│  4. npm run dev          │
│                          │
├──────────────────────────╯
│
└  All set, happy coding!

In this application we’ll need a copy of that Overused Grotesk font. We can use curl to both download the font and put it in static/font, which works nicely with Rsbuild’s defaults.

curl 'https://raw.githubusercontent.com/RandomMaerks/Overused-Grotesk/refs/heads/main/fonts/variable/OverusedGrotesk-VF.woff2' --create-dirs --output-dir static/font --output OverusedGrotesk-VF.woff2

Rsbuild’s default template doesn’t include a lang attribute on the <html> element, or a <meta> description, so we’ll add a minimalist template in the static folder too.

cat << EOF > static/index.html
<!doctype html>
<html lang="en">
  <head>
    <meta name="description" content="Rsbuild application" />
  </head>
  <body>
    <div id="<%= mountId %>"></div>
  </body>
</html>
EOF

Now we’ll add some basic config for Rsbuild, telling it to use our template, skip bundling licence files, and most importantly to preload our font and other assets, which is the key to avoiding the Flash of Unstyled Text (FOUT).

patch rsbuild.config.ts <<'EOF'
diff --git a/rsbuild.config.ts b/rsbuild.config.ts
index c55b3e1..799d5ae 100644
--- a/rsbuild.config.ts
+++ b/rsbuild.config.ts
@@ -4,4 +4,29 @@ import { pluginReact } from '@rsbuild/plugin-react';
 // Docs: https://rsbuild.rs/config/
 export default defineConfig({
   plugins: [pluginReact()],
+  html: {
+    // use a custom template to address A11y and SEO issues.
+    template: "./static/index.html",
+    tags: [
+      {
+        tag: 'link',
+        attrs: {
+          rel: 'preload',
+          type: 'font/woff2',
+          as: 'font',
+          href: '/static/font/OverusedGrotesk-VF.woff2',
+          crossorigin: 'anonymous',
+        },
+      },
+    ],
+  },
+  output: {
+    // This will prevent .LICENSE.txt files from being generated
+    legalComments: "none",
+    filename: {
+      // Don't use a hash in the font filename, so our tags above can
+      // reference the font files directly.
+      font: "[name][ext]",
+    },
+  },
 });
EOF

Now we include an @font-face rule specifying where we want to load our font from. Rsbuild will run Rspack on this file, which would normally rewrite the src to point to the bundled file (something like src: url(/static/font/OverusedGrotesk-VF.1656e9bd.woff2) format("woff2");). Setting filename.font: "[name][ext]" ensures that it doesn’t add that hash, so the name lines up with the href in the tags section.

patch src/App.css << 'EOF'
diff --git a/src/App.css b/src/App.css
index 164c0a6..ae9754b 100644
--- a/src/App.css
+++ b/src/App.css
@@ -1,7 +1,15 @@
+@font-face {
+  font-family: "Overused Grotesk";
+  src: url("../static/font/OverusedGrotesk-VF.woff2") format("woff2");
+  font-weight: 300 900;
+  font-style: normal;
+  font-display: fallback;
+}
+
 body {
   margin: 0;
   color: #fff;
-  font-family: Inter, Avenir, Helvetica, Arial, sans-serif;
+  font-family: "Overused Grotesk", Inter, Avenir, Helvetica, Arial, sans-serif;
   background-image: linear-gradient(to bottom, #020917, #101725);
 }
EOF

Now we can add a little text with a capital G, so that it’s easy to see that our font is loaded and used.

patch src/App.tsx << 'EOF'
diff --git a/src/App.tsx b/src/App.tsx
index dff1751..cebdc5b 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -3,7 +3,7 @@ import './App.css';
 const App = () => {
   return (
     <div className="content">
-      <h1>Rsbuild with React</h1>
+      <h1>Rsbuild with Overused Grotesk</h1>
       <p>Start building amazing things with Rsbuild.</p>
     </div>
   );
EOF

And now it’s time to build our production bundle.

$ npm run build

> [email protected] build
> rsbuild build

  Rsbuild v1.4.8

info    build started...
ready   built in 0.13 s

File (web)                                           Size       Gzip
dist/static/css/index.062da953.css                   0.53 kB    0.34 kB
dist/index.html                                      0.63 kB    0.35 kB
dist/static/js/index.58333bd8.js                     1.3 kB     0.78 kB
dist/static/font/OverusedGrotesk-VF.1656e9bd.woff2   85.1 kB
dist/static/js/lib-react.a4a8b05c.js                 182.8 kB   57.9 kB

                                            Total:   270.4 kB   144.4 kB

To get a sense of what this will look like in production, use npm run preview to have Rsbuild serve the production build files it just created in the dist folder. The Lighthouse scores tell us we did this right.

This is pretty good so far. We have a font, and we’re loading it in a way that is good for web performance, but we haven’t really solved the larger problem with logos until we take one final step and make a signature block that uses it.

First, we’ll need to create a React component representing the Canadian flag. You might notice this is written as a headless component, so that it’s easy to add ARIA attributes (important for accessibility), and it plays nicely with whatever approach you’re using for CSS (something like PandaCSS or Tailwind).

cat << EOF > src/Signature.tsx
import React from "react";

export function SVG({ children, ...props }: React.ComponentProps<"svg">) {
  return (
    <svg
      xmlns="http://www.w3.org/2000/svg"
      preserveAspectRatio="xMinYMin"
      role="img"
      width="65.669"
      height="31.116"
      {...props}
    >
      {children}
    </svg>
  );
}

export function Flag(props: React.ComponentProps<"path">) {
  return (
    <path
      {...props}
      d="m30.675 6.467 2.223-4.443 2.206 4.281c.275.452.499.415.938.2l1.898-.921-1.234 5.977c-.258 1.174.422 1.518 1.162.722l2.705-2.837.718 1.605c.241.485.605.415 1.086.328l2.794-.577-.938 3.464v.074c-.11.453-.33.83.186 1.05l.993.485-5.782 4.783c-.587.593-.384.776-.164 1.444l.532 1.605-5.372-.954c-.663-.162-1.124-.162-1.14.361l.219 6.048h-1.614l.22-6.031c0-.594-.461-.577-1.547-.357l-4.983.937.642-1.605c.22-.614.279-1.029-.22-1.444l-5.87-4.716 1.086-.651c.313-.237.33-.486.165-1.013l-1.104-3.505 2.831.593c.79.183 1.01 0 1.213-.414l.79-1.59 2.799 3.07c.494.577 1.196.2.976-.63l-1.344-6.48 2.08 1.174c.329.2.68.253.883-.124M50.099 0h15.57v31.116h-15.57ZM0 0h15.57v31.116H0Z"
    />
  );
}
EOF

And finally we’ll import that into our App component to create our new signature block. Notice how we can add styles and ARIA attributes as needed, without running into a customizability wall.

patch src/App.tsx << 'EOF'
diff --git a/src/App.tsx b/src/App.tsx
index cebdc5b..f9f58f9 100644
--- a/src/App.tsx
+++ b/src/App.tsx
@@ -1,9 +1,26 @@
 import './App.css';
+import { Flag, SVG } from "./Signature.tsx";

 const App = () => {
   return (
-    <div className="content">
-      <h1>Rsbuild with Overused Grotesk</h1>
+    <div className="content" style={{ justifySelf: "center" }}>
+      <section
+        style={{ display: "flex" }}
+      >
+        <SVG>
+          <title>Canadian Flag</title>
+          <Flag style={{ fill: "#ea2d37" }} />
+        </SVG>
+        <span
+          style={{
+            paddingLeft: "0.8em",
+            lineHeight: "1em",
+            textAlignLast: "left",
+          }}
+        >
+          Government of<br /> Canada
+        </span>
+      </section>
       <p>Start building amazing things with Rsbuild.</p>
     </div>
   );
EOF

The result is super satisfying: a razor-sharp, accessible signature block that looks great at any zoom level and on any screen size. It’s easily styled, and works really well with internationalization libraries like the one I covered in a previous post.

Using an Open Source font to maintain visual compatibility with the FIP feels like a win-win: nobody needs to change physical signage or letterhead, lingering accessibility problems can be solved, and the user experience of every government website improves a little… all for a cost of $0.

Build vs buy in government

“Build vs buy” is presented as a fundamental binary choice that organizations must make as part of their “due diligence” around technology projects.

Here in 2025, this idea is suffering from “concept drift” that makes its simplistic dichotomy a dangerous one, especially for governments.

The Build option

The first problem with “build vs buy” is that “build” doesn’t mean what it used to.

In the past decade or two, Open Source software has “won”, and in doing so fundamentally changed what “building” means. 40 years ago “build” meant tonnes of first-of-its-kind software painstakingly written in low-level languages, but these days programmers are building with bricks, not sand.

Back in 2018, Laurie Voss, the co-founder of the npm package registry, captured this shift with some analysis showing that 97% of the code in modern applications comes from npm. By pulling in and assembling these packages, developers write just the last 3% of the code needed to connect everything together, while generating 100% of the value.

The big idea here is that applications are now composed or assembled like a prefab house rather than built from scratch. While prefab components are recognized as a way to mitigate risk in construction, senior executives’ “limited exposure and experience with information technology projects” (OAG report, pg 10) has meant that no such acknowledgement has happened above the working level of the Canadian government.

Without that acknowledgement, while web applications only take a couple of months to build (even for teams in the government!), the “risk management” process around them continues to look like the Manhattan Project: budgets in the millions, multi-year PowerPoint-driven gated processes full of earnest “go/no-go” decisions, tonnes of scope creep to “align” with tech decisions past and present, HiPPO effects, and a mandatory 8+ month compliance audit.

With the “risk management” process itself adding more cost, time and risk to software projects than the actual software, the “build” option continues to be an option of last resort in spite of an $8.8 trillion ecosystem of open source components readily available.

While the executive class in the Canadian government largely has no idea about this shift from custom code to composition, private industry is well aware of it. This adds a frustrating twist to the “build vs buy” conversation: executives often avoid “build” because of perceived risk, forbidding employees from assembling the Open Source components themselves in favour of contractors who then assemble the same components for 60% more cost.

This means that “build vs buy” is no longer about “open source vs closed source”; it’s about which organization will hold the skills to competently navigate the Open Source ecosystem.

The buy option

As Thoughtworks points out, “a commodity capability will be provided under the assumption that you will adapt your processes to that particular vendor’s definition of industry ‘best practice’”, but the core function of government – governing – is inherently unique to each nation.

With 5,162 municipalities in Canada, this is a big enough customer base that competitive markets can emerge, driving down costs, preventing capture and spreading the “best practices” that software embeds. This is the world the “buy” option is intended to be exercised in.

Where “buy” breaks down

At the federal level it’s more complicated. Federal governments are unique in the literal sense: there is only one per country, each shaped by a set of laws and policies unique to that country’s historical and political context. Software that implements federal “business logic” will have a market of one (a competitive market can’t be created), which suggests that, in general, you’d expect to find more custom software at the federal level.

This doesn’t mean “buy” is off the table. Technically savvy governments are able to square their unique needs with the imperative to support the free market by breaking large systems full of custom logic into smaller components, finding the generic ones (like sending an email, authenticating users, or managing a wait queue) that can easily adapt to a vendor’s definition of ‘best practice’, and buying those. Commercial cloud providers offer hundreds of commodity services at exactly this level of granularity to allow for this. Again, the result is a “composed” system: some amount of custom logic plus whatever code is needed to integrate some number of third-party services.
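As a sketch of what that composition looks like in code (all names here are hypothetical, for illustration only): an interface marks the seam where a commodity service can be bought, while the eligibility rule is the custom logic with a market of one.

```typescript
// Hypothetical seam: any vendor's email service can sit behind this interface.
interface EmailSender {
  send(to: string, body: string): Promise<void>;
}

// Stand-in implementation; in production this would wrap a commercial API.
class ConsoleEmailSender implements EmailSender {
  async send(to: string, body: string): Promise<void> {
    console.log(`sending to ${to}: ${body}`);
  }
}

// Custom "business logic" unique to this government stays in-house.
function isEligible(age: number): boolean {
  return age >= 65;
}

// The "composed" system: custom logic plus a little integration code.
async function notifyIfEligible(sender: EmailSender, to: string, age: number): Promise<boolean> {
  if (!isEligible(age)) return false;
  await sender.send(to, "You are eligible for this benefit.");
  return true;
}

notifyIfEligible(new ConsoleEmailSender(), "[email protected]", 70);
```

Swapping vendors then means swapping the EmailSender implementation, not rewriting the custom logic.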

It’s unfortunately common in the Canadian federal government for executives not to think about systems at a granular enough level, often insisting on the “buy” option for entire systems, even in the face of unique requirements and processes the government either can’t or won’t adapt.

The canonical example is Phoenix, which, despite its complexity (22 different employers in the core public service and agencies, 80 collective agreements and 80,000 business rules), was somehow considered an obvious candidate for “commercial off-the-shelf” software. Phoenix was “probably destined for failure the minute they made the decision to go with off-the-shelf software,” explains André Boudreau, reflecting on lessons from the failed 1995 attempt to replace the pay system.

The government refused to adapt any of its practices to fit the “commodity capability” it was buying, but charged ahead anyway, ordering IBM to make 1,500 changes which involved rewriting significant parts of the software. This spiraled into a $3.5 billion fiasco that has yet to be fixed (2025 update: now $5.1 billion!).

Despite the lessons learned from the failed 1995 attempt to replace the pay system (that the pay rules and processes would need to be simplified before attempting a new system), the choice of a replacement for Phoenix seems to be based on promises that the government won’t need to adapt its practices, promises which unsurprisingly don’t seem to be panning out. With a favourite vendor chosen through a brutal process nobody will be willing to repeat, it appears we are on track to learn this lesson again.

From the looks of it, we have a complex, unique need that requires custom software, but the government has neither the skills to build it, nor the skills to contract it out.

Somewhat unsurprisingly, the skills needed to build and the skills needed to contract it out are related, and over-use of the “buy” option may well be eroding both.

That outsourcing leads to deskilling is seemingly well established, with Dell’s self-inflicted wounds as a cautionary tale from the private sector.

Some years ago, Dell began outsourcing manufacturing to a Taiwanese electronics manufacturer, ASUSTeK. The outsourcing started with simple circuit boards, then the motherboard, then the assembly of the computer, then the management of the supply chain, and finally the design of the entire computer.

The end result? ASUSTeK became Dell’s formidable competitor, while Dell itself, apart from its brand, was hardly more than a shell, without any real expertise to run or grow its business.

The Canadian Government has been aggressively following Dell’s example for years now, and here in 2024 government executives spent a record amount on outsourcing, with the American government on a very similar path.

NASA recently published “NASA at a Crossroads”, discussing some of its struggles with the deskilling effects of outsourcing. They write of one contract that “In this case, NASA is more of a contract monitor than a technical organization capable of taking humanity into the solar system”, and explain that, generally, over-use of contracting will “erode the agency’s in-house capabilities”.

Connecting the dots, NASA points out that doing so also affects their contracting ability, and ultimately their mission: “the concern is not only an erosion of ‘smart-buyer’ capability but also of the capacity to invent and innovate”. Anyone who has worked in the Canadian government will likely recognize that state.

Meanwhile at Canada’s Treasury Board, principle #5 of the 2023 cloud strategy is still pushing outsourcing by prioritizing “buy before build”, thus deskilling their staff (turning them into “contract monitors”), while principle #6 says they want to re-skill staff, seemingly without ever wondering why they need to do that.

With the need to form coherent regulations on technical topics like AI, cyberwarfare, crypto-currencies and ransomware, along with a burning need to address the aging IT infrastructure that was identified 24 years ago… there is a deep need in government for technical expertise.

Despite the bad outcomes, somehow the Canadian Government remains bent on following Dell by outsourcing not just technology but policy, until it too is just a shell without any of the expertise needed for its business of governing. If the handling of Bill C-18 isn’t enough to suggest a problem, remember that the government hired a consultant to figure out how to save money on consultants.

The underlying problem here isn’t buying things; it’s that losing touch with the technical skills needed to build things is setting the stage for both internal-facing disasters like Phoenix and outward-facing ones, in the form of poor service delivery and ignorant, damaging regulation that makes things worse for business.

Why rehash this now?

The fundamentals of governing depend on basic capabilities, like being able to pay the public service or safeguard citizens’ data. Doing these things increasingly requires things government can’t buy, and thus requires a builder’s skill set.

Zero Trust is the poster child for this problem: the US has recognized that the Federal Government can no longer depend on conventional perimeter-based defenses to protect critical systems and data, and ordered the entire US Federal Government to adopt the Zero Trust model. After an uncomfortably long pause, Canada has decided to follow (see the Enterprise Cyber Security Strategy and Cloud Adoption Strategy).

Unfortunately, “Zero Trust is a strategy. It’s not a product. You can’t buy it”.

Without a product to buy, it is going to take a builder’s skill set to consistently interpret and implement Zero Trust principles across all architectural layers in a complex environment full of legacy systems.

After years of outsourcing and deskilling, it’s unclear that the Canadian Government is capable of implementing this security model (especially when each pillar is assigned, matrix-style, to a different group with differing priorities and wildly different levels of technical skill).

Without it, the government’s ability to protect its citizens’ data is suspect, and allies’ willingness to share data with Canada will steadily decrease as Canada fails to implement what they now consider “basic cyber hygiene”, all of which threatens the basic functioning of government.

From build last to something new

The old “build vs buy” debate needs an update. Beyond the basic observation that folks responsible for IT projects should have actual skills in this area, people in government need to wrap their heads around how the two sides of this little dichotomy have shifted if they want better outcomes.

On the “build” side, the time and cost involved are a fraction of what they once were, but the risk management process doesn’t reflect it. The inability to automate key services causes political crises, and the inability to implement key enabling architectures like Data Mesh and Zero Trust (neither of which can be purchased) is setting us up for future crises. Weighing “build” against “buy” without considering how “build” is entwined with these broader issues is damaging.

On the “buy” side, granular services now exist that allow smart use of commodity components inside larger custom software efforts, but techniques that would help, like Wardley mapping, are basically unheard of in the Canadian Government. Using those techniques to safely outsource commodity capabilities also requires a builder’s skill set.

With service levels directly connected to trust in the government, and the CIA noting that “in coming years, this mismatch between governments’ abilities and publics’ expectations is likely to expand and lead to more political volatility”, Treasury Board needs to recognize that the inability to build is a threat to the most basic functions of government.

While there are legitimate moments for the “build vs buy” question, without knowing how things are actually built, having the capability to actually pull it off, and understanding the consequences of too much buying, this discussion is just faux diligence that does more harm than good.

TBS’s current “buy before build” mandate makes building an act of last resort, suppressing that critical skill set when it’s needed most.

Instead of “buy before build” we probably need to remember what NASA’s Wernher von Braun had figured out in 1964:

“A good engineer gets stale very fast if he doesn’t keep his hands dirty . . . it is for this reason that we are spending about 10 percent of our money in-house; it enables us to really talk competently about what we are doing. This is the only way known to us to retain professional respect on the part of our contractors.”

We managed to get from “cloud first” to “cloud smart”. Maybe something like “build smart, buy smart” is possible too.

i18n for Rsbuild with Lingui

With the sunsetting of Create React App, I’ve wanted to find something similar but with a more minimalistic vibe.

Two projects that caught my eye are the Rust-based rewrite of Webpack called Rspack, and its Create React App-equivalent companion project, Rsbuild. Of course, here in Canada I’m going to want that along with proper internationalization (i18n) using my favourite library, Lingui.

With Rsbuild creating a minimalistic Single Page Application pre-configured with the Rspack bundler, and both Lingui and Rspack working with the Rust-based transpiler SWC (a faster equivalent of Babel), these actually work together nicely and are my new favourite way to start a project.

What follows is a cut-and-paste friendly walk-through to get translations working.

We’ll use the command npm create rsbuild@latest to create our project skeleton.

$ npm create rsbuild@latest

> npx
> create-rsbuild


◆  Create Rsbuild Project
│
◇  Project name or path
│  lingui-demo
│
◇  Select framework
│  React 19
│
◇  Select language
│  JavaScript
│
◇  Select additional tools (Use <space> to select, <enter> to continue)
│  Add Biome for code linting and formatting
│
◇  Next steps ─────────────╮
│                          │
│  1. cd lingui-demo       │
│  2. git init (optional)  │
│  3. npm install          │
│  4. npm run dev          │
│                          │
├──────────────────────────╯
│
└  All set, happy coding!

Using fd, the modern (and git-aware) equivalent of find, we can see that the structure of this project is nice and simple: 2 JS files and a bit of CSS, along with 3 config files and a readme.

$ git init && echo node_modules >> .gitignore
Initialized empty Git repository in /home/mike/projects/lingui-demo/.git/
$ fd
README.md
biome.json
package.json
public/
rsbuild.config.mjs
src/
src/App.css
src/App.jsx
src/index.jsx

The first step is to install the dependencies for Lingui.

npm install @lingui/core @lingui/react
npm install --save-dev @lingui/swc-plugin @lingui/cli

Next we’ll add a config file for Lingui specifying the locales we want and where they should be stored.

cat << EOF > lingui.config.js
import { defineConfig } from '@lingui/cli';

export default defineConfig({
  sourceLocale: 'en',
  locales: ['fr', 'en'],
  catalogs: [
    {
      path: '<rootDir>/src/locales/{locale}/messages',
      include: ['src'],
    },
  ],
});
EOF

And now we’ll connect the dots by adding some SWC config to our rsbuild.config.mjs file. The idea is that Rsbuild will pass this through to Rspack, so that its built-in swc-loader is properly configured with Lingui’s @lingui/swc-plugin.

patch rsbuild.config.mjs <<'EOF'
diff --git a/rsbuild.config.mjs b/rsbuild.config.mjs
index c9962d3..70154c3 100644
--- a/rsbuild.config.mjs
+++ b/rsbuild.config.mjs
@@ -3,4 +3,13 @@ import { pluginReact } from '@rsbuild/plugin-react';

 export default defineConfig({
   plugins: [pluginReact()],
+  tools: {
+    swc: {
+      jsc: {
+        experimental: {
+          plugins: [['@lingui/swc-plugin', {}]],
+        },
+      },
+    },
+  },
 });
EOF

I’ll use jq to add scripts for Lingui’s extract and compile commands (documentation) to the scripts section of our package.json file.

jq '.scripts += {"extract": "lingui extract", "compile": "lingui compile"}' package.json | sponge package.json

Now we need two things that don’t exist yet: a function to dynamically load the locales (borrowing heavily from the one in their documentation) and some buttons that would let us switch languages.

First up, that dynamic loading function.

cat <<'EOF' > src/i18n.js
import { i18n } from '@lingui/core';

export const defaultLocale = 'en';

export async function dynamicActivate(locale) {
  const { messages } = await import(`./locales/${locale}/messages`);
  i18n.load(locale, messages);
  i18n.activate(locale);
  return i18n;
}
EOF

And then our fancy buttons.

cat <<'EOF' > src/LocaleSwitcher.jsx
import React from 'react';
import { dynamicActivate } from './i18n.js';

function LocaleSwitcher() {
  return (
    <div>
      <button type="button" onClick={async () => dynamicActivate('en')}>
        English
      </button>
      <button type="button" onClick={async () => dynamicActivate('fr')}>
        Français
      </button>
    </div>
  );
}

export default LocaleSwitcher;
EOF

Now we’ll use that dynamicActivate function to help set up Lingui’s I18nProvider.

patch src/index.jsx <<'EOF'
diff --git a/src/index.jsx b/src/index.jsx
index 65a8dbf..c919c24 100644
--- a/src/index.jsx
+++ b/src/index.jsx
@@ -1,10 +1,16 @@
 import React from 'react';
 import ReactDOM from 'react-dom/client';
 import App from './App';
+import { I18nProvider } from '@lingui/react';
+import { dynamicActivate, defaultLocale } from './i18n';
+
+const i18n = await dynamicActivate(defaultLocale);

 const root = ReactDOM.createRoot(document.getElementById('root'));
 root.render(
   <React.StrictMode>
-    <App />
+    <I18nProvider i18n={i18n}>
+      <App />
+    </I18nProvider>
   </React.StrictMode>,
 );
EOF

We’ll mark the default text in src/App.jsx for translation and pull in that <LocaleSwitcher/> component so we can see this thing working.

patch src/App.jsx <<'EOF'
diff --git a/src/App.jsx b/src/App.jsx
index dff1751..629fab9 100644
--- a/src/App.jsx
+++ b/src/App.jsx
@@ -1,10 +1,18 @@
+import React from 'react';
+import { Trans } from '@lingui/react/macro';
+import LocaleSwitcher from './LocaleSwitcher.jsx';
 import './App.css';

 const App = () => {
   return (
     <div className="content">
-      <h1>Rsbuild with React</h1>
-      <p>Start building amazing things with Rsbuild.</p>
+      <LocaleSwitcher />
+      <h1>
+        <Trans>Rsbuild with React</Trans>
+      </h1>
+      <p>
+        <Trans>Start building amazing things with Rsbuild.</Trans>
+      </p>
     </div>
   );
 };
EOF

With the setup out of the way, we’ll use the lingui extract command to comb through our code for translatable strings and write them into the translation files. Notice it even tells us that we’re missing some translations!

$ npm run extract

> [email protected] extract
> lingui extract

✔
Catalog statistics for src/locales/{locale}/messages:
┌─────────────┬─────────────┬─────────┐
│ Language    │ Total count │ Missing │
├─────────────┼─────────────┼─────────┤
│ fr          │      2      │    2    │
│ en (source) │      2      │    -    │
└─────────────┴─────────────┴─────────┘

That extract command will extract all the strings marked as translatable with the <Trans> component into the .po files in the locales directory. Then we’ll update the non-default translation file in src/locales/fr/messages.po with the translated text.

patch src/locales/fr/messages.po <<'EOF'
diff --git a/src/locales/fr/messages.po b/src/locales/fr/messages.po
index 0c52442..f7dabe8 100644
--- a/src/locales/fr/messages.po
+++ b/src/locales/fr/messages.po
@@ -15,8 +15,8 @@ msgstr ""

 #: src/App.jsx:11
 msgid "Rsbuild with React"
-msgstr ""
+msgstr "Rsbuild avec React"

 #: src/App.jsx:14
 msgid "Start building amazing things with Rsbuild."
-msgstr ""
+msgstr "Commencez à créer des choses incroyables avec Rsbuild."
EOF

Don’t forget to run npm run compile to get those strings ready for use in your app. After that, run npm run dev to admire your freshly internationalized application!
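Under the hood, the compiled catalogs behave roughly like a lookup table keyed by locale and message ID. A simplified sketch (not Lingui’s actual implementation) of the mechanism:

```typescript
// Simplified model of compiled message catalogs: the active locale selects a
// catalog, and message IDs (the source-language strings) resolve to
// translated strings.
type Catalog = Record<string, string>;

const catalogs: Record<string, Catalog> = {
  en: { "Rsbuild with React": "Rsbuild with React" },
  fr: { "Rsbuild with React": "Rsbuild avec React" },
};

function translate(locale: string, id: string): string {
  // Fall back to the message ID itself (the source string) if untranslated,
  // which is why untranslated apps still render in English.
  return catalogs[locale]?.[id] ?? id;
}

console.log(translate("fr", "Rsbuild with React")); // Rsbuild avec React
```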

Only enterprise architects can save us from Enterprise Architecture

Enterprise architecture (EA) is a troublesome discipline. I think it’s fair to argue that the famous Bezos API mandate and the birth of AWS are both essentially enterprise architecture efforts, as is Netflix’s Simian Army. These efforts have clearly delivered a huge positive business impact, but it’s much harder to make that case for the version of EA that exists in government.

For this government version, if we look beyond the tendencies towards self-referential documentation, and the use of frameworks that lack empirical grounding, there is an increasingly visible conflict with a growing body of knowledge about risk and resilience that is worth considering.

In Canada, Treasury Board’s Policy on Service and Digital requires the GC CIO to define an enterprise architecture, while the Directive on Service and Digital requires departmental CIOs to “align” with it.

EA is used to design a target architecture, but more people are familiar with it as a project gating mechanism where the pressure to “align” is applied to projects. Mostly this takes the form of EA arguing for centralization and deduplication largely justified in terms of imagined cost savings (despite the fact that the combination of virtualization and usage-based billing has largely eliminated the benefits of shared infrastructure).

This focus stands in sharp contrast with the literature on resilience, which largely views this sort of cost-optimization activity as stripping a system of its adaptive capacity.

What’s common to all of these approaches- robustness, redundancy, and resilience, especially through diversity and decentralization- is that they are not efficient. Making systems resilient is fundamentally at odds with optimization, because optimizing a system means taking out any slack.

Deb Chachra, How Infrastructure Works

Since this stuff can feel pretty abstract, we can try to make this concrete with a look at Treasury Board’s Sign-in-Canada service which is a key part of their “target enterprise architecture”.

The Government of Canada has ~208 Departments and Agencies, 86 of which have their own accounts. This is often held up as an example of inefficiency and duplication, and the kind of thing that EA exists to fix. As TBS describes: “Sign‑in Canada is a proposal for a unified authentication mechanism for all government digital engagement with citizens.”

If you skip past the meetings required to get all 86 systems to use Sign-in-Canada, the end result would be a “star graph” style architecture; Sign-in-Canada in the center, with digital services connecting to it.

A “star graph”. Imagine the central point as a central sign-in service, or some other shared resource (maybe a shared drive, or a firewall) with other users/systems connecting to it.

Prized for efficiency and especially for central control, this star-graph style architecture shows up everywhere in governments. To get to this architecture, EA practitioners apply steady pressure in those meetings (the gating functions of Enterprise Architecture Review Boards) to avoid new sign-in systems and to ensure new and existing systems connect to/leverage Sign-in-Canada.

In graph theory there is a term for networks that are formed under such conditions: “preferential attachment”, where new “nodes” in the network attach to existing popular nodes.

Networks formed under a preferential attachment model (called “scale-free” in the literature) have some really interesting (and well studied) properties that I think are exactly what EA is trying to encourage; networks formed like this are surprisingly robust to random failures.

If you imagine the power/cooling/rack space constraints of a traditional physical data center, and the challenge of staying within those limits while limiting the effects of random failures, the centralization/deduplication focus of EA is a huge benefit.

A demonstration from the online Network Science textbook of how scale free networks are surprisingly difficult to destroy by randomly removing nodes.


But “scale-free” networks also have another property: they are very fragile to targeted attack. Only a handful of highly connected nodes need to be removed before the network is completely destroyed. If targeted attacks are suddenly the concern, the preferential attachment playbook starts to look like a problem rather than a solution.

A demonstration from the Network Science textbook showing how specifically targeting central nodes quickly destroys a scale-free network.

It’s these ideas that show why an EA practice narrowly focused on reuse/centralization/deduplication ends up conflicting with resilience engineering and modern security architecture.

Through that resilience lens, the success of Sign-in-Canada means a successful hack (the Okta breach gives us a preview) could paralyze 86 government organizations, something that isn’t currently possible given the redundancy of our current “inefficient” system.

In academic terms, what we’ve done is increase our system’s “fragility”, a well-known byproduct of the kinds of optimizations that EA is tasked with making.

We need to understand that this mechanistic goal of optimization as creating this terrible fragility and that we need to try and think about how we can mitigate against this.

Paul Larcey: Illusion of Control 2023

These system/network properties are well known enough that the US military has developed an algorithm that will induce fragility in human organizations. It uses this to make networks (terror networks in their case) more vulnerable to targeted attack.

The algorithm is called “greedy fragile” and it works by selecting nodes for “removal” via “shaping operations” (you can imagine what removing someone from a social network means in a military context), so that the resulting network is more centralized (“star-like”) and fragile; centralizing as a way to maximize the impact of a future attack.

Explaining the goal of military “shaping operations”, to make a network more “star-like” and fragile.

While it might sound uncharitable to lay the responsibility for systemic fragility at the feet of enterprise architecture, it is literally the mandate of these groups to identify and make many of these optimizations happen. It’s worth saying that executives fixated on centralization, and security’s penchant for highly centralized security “solutions”, are big contributors too.

I would argue the 2022 hack of Global Affairs, which brought down the entire department for over a month, is an example of this fragility. When an entire department can fail as a single unit, that is an architectural failure as much as it is a security failure; one that says a lot about the level of centralization involved.

It’s worth saying that architecting for resilience definitely still counts as “enterprise architecture”, and in that way I think EA is actually more important than ever. However, as pointed out in How Infrastructure Works, it would be a big shift from current practice.

“Designing infrastructural systems for resilience rather than optimizing them for efficiency is an epistemological shift”

Deb Chachra, How Infrastructure Works

We very much need EA teams (and security architecture teams) to make that shift to assuming targeted attacks and focusing on resilience. The EA folks I’ve met are brilliant analysts, more than capable of updating their playbooks with ideas from complex systems, cell-based architecture, resilience patterns like the bulkhead pattern, chaos engineering, or Team Topologies, and using them to build more resilient architectures at every level: both system and organizational.

With Global Affairs, FINTRAC and the RCMP all hit within a few weeks of each other in early 2024, making resilience a priority across the government is crucial, and there is nobody better placed to do that than enterprise architects.

Modernise security to modernise government

With the resignation of the CIO of the Government of Canada, the person placed at the top of the Canadian public service to fix the existing approach to IT, there is lots of discussion about what’s broken and how to fix it.

Across these discussions, one thing stands out to me: IT security always seems to get a pass in discussions of fixing/modernising IT.

This post is an attempt to fix that.

As the article about the CIO points out, “All policies and programs today depend on technology”. IT Security’s Security Assessment and Authorization (SA&A) process applies to all IT systems, placing it squarely on the critical path of “all policies and programs”. This one process adds a 6-24 month delay to every initiative and somehow escapes any notice or criticism at all.

If you imagine some policy research, maybe a public consultation and then implementation work, plus 6-24 months caught in the SA&A process, it should be clear that a single term in office may not be enough to craft and launch certain initiatives, let alone see benefits from them while in office. Hopefully all political parties can agree that fixing this is in their best interests.

As a pure audit process, the SA&A is divorced from the technical work of securing systems (strangely done by operations groups or developers, rather than by security groups) leaving lots of room to reshape (or eliminate) this process without threatening actual security work. Improvements in this process are probably the single most impactful change that can be made in government.

It’s also key to accelerating all other modernisation initiatives.

Everyone in Ottawa is well aware that within each department lie one or more likely political-career-ending ticking legacy IT timebombs. Whether it’s the failure of the system itself, or of the initiative launched to fix it, or even just the political fallout from fixed-capacity systems failing to handle a surge in demand, every department has these and the only question is who will be in office when it happens.

Though you’d never guess it, inside government it is actually known how to build systems that can be modernised incrementally, changed quickly, rarely have user-visible downtime, and can expand to handle waves of traffic without falling over.

The architecture that allows this (known as microservices) was made mandatory by TBS in the 2017 Directive on Service and Digital. The Directive’s successor (the Enterprise Architecture Framework) doesn’t use the term directly, but requires that developers “design systems as highly modular and loosely coupled services”, along with several other hallmarks of the microservices architecture that allow for building resilient digital services.

I think TBS was correct in its assessment that this architecture is key to many modernisation initiatives and to avoiding legacy system replacements just as inflexible as their predecessors. Treasury Board itself describes the difference between current practice and its target architecture as a “major shift”, but the number of departments willing and able to make that shift hovers close to zero.

AWS uses the same microservices architecture to deliver their digital services and promotes it, along with the infrastructure and team structures needed to support it, under the banner “Modern Applications”. Substantially similar advice is given by Google and others, and these best practices have been worked into TBS’s policy since 2017.

While much of TBS IT policy has been refreshed, all core IT security guidance and patterns are built around pre-cloud ideas (ITSG 22 from 2007, ITSG 38 from 2009) and process (ITSG 33 from 2012).

While TBS might want departments to adopt microservices (created circa 2010 around the “death” of SOA), it’s the 1990s-era 3-tier architecture (what ITSG-38 still calls the “preferred architecture”) that the network and security infrastructure is set up to support, rather than the fancy compute clusters and cloud functions needed for microservices: an application architecture that exists to fix the very visible problems governments (and others) have with availability and scalability.

Unfortunately, these legacy design patterns mean that security teams spend months or years on nonsensical routing and subnets, and on placing fixed-capacity Virtual-Machine-based security appliances between the internet and services running in the cloud, before a cloud account can be used. Beyond the staggering delay, the result is an anti-pattern where the network layer cancels out the scalability and availability benefits of the application architecture above and the cloud architecture below.

Fixed capacity firewalls can’t handle waves of traffic like AWS Lambda or Google Cloud Run, and high-availability microservices architecture doesn’t matter when security brings down the firewall in front to patch it.

Similar conflicts exist with TBS’s requirements for the Dev(Sec)Ops and “multidisciplinary teams” practices needed for this architecture. These practices are likely not feasible given that almost every action currently requires manual security approvals, or given the current interpretations and enforcement of ITSG-33’s AC-5 “separation of duties” and the naive insistence that distant monitoring/SOC teams operating with zero context can somehow parse meaning out of the complex communication patterns of microservices applications.

While TBS has updated its policy to require agile development practices, ITSG-33, the foundation of all government security process, is explicitly waterfall. And while adapting it to agile is theoretically possible, it’s developers who get exposure to agile methods, rather than the well-intentioned auditors and former network admins that populate most security groups.

Beyond the shift to agile, the division of labour (the split between dev/sec/ops) systemically undermines ITSG-33’s approach: the technical work of securing systems builds security engineering expertise in operations groups, while the spreadsheet-based audit work of security teams chases away technical talent and steadily erodes any remaining technical capacity.

This dynamic reliably creates a skills imbalance that undermines the audit/watchdog role ITSG-33 imagines for security teams. The 2019 State of DevOps report noted a similar effect with CoEs: “This disconnect between theory and hands-on practice will eventually threaten their expertise”. Luck and smart hiring can help, but security teams are forever swimming against this current. When you mix security and compliance, what survives is compliance.

The failure case is the familiar security-theater of the audit process: non-technical auditors “verifying” the work of developers and operations via screenshots without being able to read or run code and executives placing their faith in this paperwork rather than the judgement of the dev/ops teams doing the engineering work of securing systems.

Typically contractor-driven, each SA&A costs over $70,000 and takes at least 6 months for a paper audit of a simplistic one-piece monolithic application. Nobody is sure how to apply this process to the microservices architecture TBS is pushing, where each application is made of dozens, or even thousands, of separate services.

This process is almost single-handedly responsible for the lack of progress on modernisation: Since all departments have hundreds of applications, assuming an optimistic audit timeline of 6 months each, even the most milquetoast modernization effort implies a SA&A/audit process bottleneck measured in decades.

Surrounded by hollowed-out waterfall security processes and pre-cloud security architecture that no-one seems equipped to change, Treasury Board’s vision of the government delivering modern, reliable digital services flounders.

The idea here is that in many cases modernising security is a precondition to successfully modernising anything else. For those that overlook security, the assumptions embedded in their tools, processes and architectures will subtly but steadily undermine their efforts. Treasury Board is filled with smart policy analysts learning this the hard way.

Security is the base of the modernisation pyramid… start there to fix things.

ArangoDB on Kubernetes

Running a database on Kubernetes is still considered a bit of a novelty, but ArangoDB has been doing exactly that as part of their managed service, and has released the Kube-ArangoDB operator they built to do it, so we can do it too.

I found there was a bit of a learning curve with Kube-ArangoDB, so the idea here is to try to flatten it a bit for others. To do that I’ve created a repo showing a sane single instance setup using Kustomize and Minikube.

Kustomizing Kube-ArangoDB

The key file in that repo is the kustomization.yaml file. Kube-ArangoDB installs into the default namespace, so we’re using kustomize’s namespace option to ensure that everything ends up in the db namespace.

The main event is the resources: we’re pulling Kube-ArangoDB directly from GitHub and adding two files of our own.

Then, in the replicas section, we’re saying to run only a single instance of each of the operators (since we’re running a single instance of the database).

And finally, we’re generating a secret called arangodb from a .env file.

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: db
resources:
- db-namespace.yaml
- arangodeployment.yaml
- https://raw.githubusercontent.com/arangodb/kube-arangodb/1.1.3/manifests/arango-crd.yaml
- https://raw.githubusercontent.com/arangodb/kube-arangodb/1.1.3/manifests/arango-deployment.yaml
- https://raw.githubusercontent.com/arangodb/kube-arangodb/1.1.3/manifests/arango-storage.yaml
- https://raw.githubusercontent.com/arangodb/kube-arangodb/1.1.3/manifests/arango-deployment-replication.yaml
replicas:
- name: arango-deployment-replication-operator
  count: 1
- name: arango-deployment-operator
  count: 1
- name: arango-storage-operator
  count: 1
secretGenerator:
- envs:
  - arangodb.env
  name: arangodb
  namespace: db
generatorOptions:
  disableNameSuffixHash: true
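
For reference, the arangodeployment.yaml listed in the resources above is what describes the database itself. A minimal single-server deployment looks something like this (a sketch based on the Kube-ArangoDB custom resource; the repo’s actual file may wire in the secret and other options):

apiVersion: "database.arangodb.com/v1alpha"
kind: ArangoDeployment
metadata:
  name: arangodb
spec:
  # Single = one database server, matching the single-instance setup here.
  mode: Single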

In the secretGenerator section, we told Kustomize to expect a .env file in the directory, so we should create that next. The values in that file will be used as the credentials for the root user.

cat <<EOF > arangodb.env
username=root
password=test
EOF

Running it

With the setup complete, we’ll start minikube, and build and apply the config.

minikube start
kustomize build . | kubectl apply -f -

You can watch the creation of the pods and pvc, and when it looks like this, you’ll know it’s ready.

$ kubectl get po,pvc -n db
NAME                                                          READY   STATUS    RESTARTS   AGE
pod/arango-deployment-operator-7c54bb947-67qdn                1/1     Running   0          3m49s
pod/arango-deployment-replication-operator-558b49f785-k99hf   1/1     Running   0          3m49s
pod/arango-storage-operator-68fb5f6949-zzf4c                  1/1     Running   0          3m49s
pod/arangodb-sngl-miotcqdv-435cf0                             2/2     Running   0          3m9s

NAME                                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistentvolumeclaim/arangodb-single-miotcqdv   Bound    pvc-71fee7f0-2387-4985-a551-a79ab8671a34   10Gi       RWO            standard       3m9s

With that running, you can connect to the admin interface by forwarding the service’s port to your local machine.

kubectl port-forward -n db svc/arangodb 8529:8529
Forwarding from 127.0.0.1:8529 -> 8529
Forwarding from [::1]:8529 -> 8529

With that you should be able to connect to localhost:8529, with the credentials you gave above.

That was easy

I didn’t find Kube-ArangoDB super approachable at first. But after piecing a few things together and a few reps, I’m really impressed with how easy it is to get my favourite database up and running in Kubernetes.

A look at Overlay FS

Lots has been written about how Docker combines Linux kernel features like namespaces and cgroups to isolate processes. One overlooked kernel feature that I find really interesting is Overlay FS.

Overlay FS was built into the kernel back in 2014, and provides a way to “present a filesystem which is the result of overlaying one filesystem on top of the other.”

To explore what this means, let’s create some files and folders to experiment with.

$ for i in a b c; do mkdir "$i" && touch "$i/$i.txt"; done
$ mkdir merged
$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
└── merged

4 directories, 3 files

At this point we can use Overlay FS to overlay the contents of a, b and c and mount the result in the merged folder.

$ sudo mount -t overlay -o lowerdir=a:b:c none merged
$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
└── merged
    ├── a.txt
    ├── b.txt
    └── c.txt

4 directories, 6 files
$ sudo umount merged

With merged containing the union of a, b and c, suddenly the name “union mount” makes a lot of sense.

If you try to write to the files in our union mount, you will discover they are not writable.

$ echo a > merged/a.txt
bash: merged/a.txt: Read-only file system

To make them writable, we will need to provide an “upper” directory, and an empty scratch directory called a “working” directory. We’ll use c as our writable upper directory.

$ mkdir working
$ sudo mount -t overlay -o lowerdir=a:b,upperdir=c,workdir=working none merged

When we write to a file in one of the lower directories, it is copied into a new file in the upper directory. Writing to merged/a.txt creates a new file with a different inode than a/a.txt in the upper directory.

$ tree
.
├── a
│   └── a.txt
├── b
│   └── b.txt
├── c
│   └── c.txt
├── merged
│   ├── a.txt
│   ├── b.txt
│   └── c.txt
└── working
    └── work [error opening dir]

6 directories, 6 files
$ echo a > merged/a.txt
$ tree --inodes
.
├── [34214129]  a
│   └── [34214130]  a.txt
├── [34217380]  b
│   └── [34217392]  b.txt
├── [34217393]  c
│   ├── [34737071]  a.txt
│   └── [34211503]  c.txt
├── [34217393]  merged
│   ├── [34214130]  a.txt
│   ├── [34217392]  b.txt
│   └── [34211503]  c.txt
└── [34737069]  working
    └── [34737070]  work [error opening dir]

6 directories, 7 files

Writing to merged/c.txt modifies the file directly, since c is our writable upper directory.

$ echo c > merged/c.txt
$ tree --inodes
.
├── [34214129]  a
│   └── [34214130]  a.txt
├── [34217380]  b
│   └── [34217392]  b.txt
├── [34217393]  c
│   ├── [34737071]  a.txt
│   └── [34211503]  c.txt
├── [34217393]  merged
│   ├── [34214130]  a.txt
│   ├── [34217392]  b.txt
│   └── [34211503]  c.txt
└── [34737069]  working
    └── [34737070]  work [error opening dir]

6 directories, 7 files

After a little fooling around with Overlay FS, the GraphDriver output from docker inspect starts looking pretty familiar.

$ docker inspect node:alpine | jq .[].GraphDriver.Data
{
  "LowerDir": "/var/lib/docker/overlay2/b999fe6781e01fa651a9cb42bcc014dbbe0a9b4d61e242b97361912411de4b38/diff:/var/lib/docker/overlay2/1c15909e91591947d22f243c1326512b5e86d6541f83b4bf9751de99c27b89e8/diff:/var/lib/docker/overlay2/12754a060228233b3d47bfb9d6aad0312430560fece5feef8848de61754ef3ee/diff",
  "MergedDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/merged",
  "UpperDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/diff",
  "WorkDir": "/var/lib/docker/overlay2/25aba5e7a6fcab08d4280bce17398a7be3c1736ee12f8695e7e1e475f3acc3ec/work"
}

We can use these like Docker does to mount the file system for the node:alpine image into our merged directory, and then take a peek to see the nodejs binary that image includes.

$ lower=$(docker inspect node:alpine | jq .[].GraphDriver.Data.LowerDir | tr -d \")
$ upper=$(docker inspect node:alpine | jq .[].GraphDriver.Data.UpperDir | tr -d \")
$ sudo mount -t overlay -o lowerdir=$lower,upperdir=$upper,workdir=working none merged
$ ls merged/usr/local/bin/
docker-entrypoint.sh  node  nodejs  npm  npx  yarn  yarnpkg

From there we could do a partial version of what Docker does for us, using the unshare command to give a process its own mount namespace and chroot it to the merged folder. With our merged directory as its root, running ls /usr/local/bin should give us those node binaries again.

$ sudo unshare --mount --root=./merged ls /usr/local/bin
docker-entrypoint.sh  nodejs                npx                   yarnpkg
node                  npm                   yarn

Seeing Overlay FS and Docker’s usage of it has really helped flesh out my mental model of containers. Watching docker pull download layer after layer has taken on a whole new significance.

Kubernetes config with Kustomize

If you are working with Kubernetes, it’s pretty important to be able to generate variations of your configuration. Your production cluster probably has TLS settings that won’t make sense while testing locally in Minikube, or you’ll have service types that should be NodePort here and LoadBalancer there.

While tools like Helm tackle this problem with PHP style templates, kustomize offers a different approach based on constructing config via composition.

With each piece of Kubernetes config uniquely identified, composite key style, through a combination of kind, apiVersion and metadata.name, kustomize can generate new config by patching one yaml with others.

This lets us generate config without wondering what command line arguments a given piece of config might have been created from, or having Turing-complete programming languages embedded in it.
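
For example, to change just the type of a Service named helloworld, a patch repeats the identifying trio (kind, apiVersion and metadata.name) plus only the fields being changed (a hypothetical fragment):

# Matched to the base Service by kind + apiVersion + metadata.name;
# everything else is inherited from the base config.
apiVersion: v1
kind: Service
metadata:
  name: helloworld
spec:
  type: NodePort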

This sounds a little stranger than it is, so let’s make this more concrete with an example.

Let’s say you clone a project and see kustomization.yaml files lurking in some of the subdirectories.

mike@sleepycat:~/projects/hello_world$ tree
.
├── base
│   ├── helloworld-deployment.yaml
│   ├── helloworld-service.yaml
│   └── kustomization.yaml
└── overlays
    ├── gke
    │   ├── helloworld-service.yaml
    │   └── kustomization.yaml
    └── minikube
        ├── helloworld-service.yaml
        └── kustomization.yaml

4 directories, 7 files

This project is using kustomize. The folder structure suggests that there is some base configuration, and variations for GKE and Minikube. We can generate the Minikube version with kubectl kustomize overlays/minikube.

This actually shows one of the nice things about kustomize, you probably already have it, since it was built into the kubectl command in version 1.14 after a brief kerfuffle.

mike@sleepycat:~/projects/hello_world$ kubectl kustomize overlays/minikube/
apiVersion: v1
kind: Service
metadata:
  labels:
    app: helloworld
  name: helloworld
spec:
  ports:
  - name: "3000"
    port: 3000
    protocol: TCP
    targetPort: 3000
  selector:
    app: helloworld
  type: NodePort
status:
  loadBalancer: {}
...more yaml...

If you want to get this config into your GKE cluster, it would be as simple as kubectl apply -k overlays/gke.

This tiny example obscures one of the other benefits of kustomize: it sorts the configuration it outputs in the following order to avoid dependency problems:

  • Namespace
  • StorageClass
  • CustomResourceDefinition
  • MutatingWebhookConfiguration
  • ServiceAccount
  • PodSecurityPolicy
  • Role
  • ClusterRole
  • RoleBinding
  • ClusterRoleBinding
  • ConfigMap
  • Secret
  • Service
  • LimitRange
  • Deployment
  • StatefulSet
  • CronJob
  • PodDisruptionBudget

Because it sorts its output this way, kustomize makes it far less error prone to get your application up and running.

Setting up your project to use kustomize

To get your project set up with kustomize, you will want a little more than the functionality built into kubectl. There are a few ways to install kustomize, but I think the easiest (assuming you have Go on your system) is go get:

go get sigs.k8s.io/kustomize

With that installed, we can create some folders and use the fd command to give us the lay of the land.

$ mkdir -p {base,overlays/{gke,minikube}}
$ fd
base
overlays
overlays/gke
overlays/minikube

In the base folder we’ll need to create a kustomization file and some config. Then we tell kustomize to add the config as resources to be patched.

base$ touch kustomization.yaml
base$ kubectl create deployment helloworld --image=mikewilliamson/helloworld --dry-run -o yaml > helloworld-deployment.yaml
base$ kubectl create service clusterip helloworld --tcp=3000 --dry-run -o yaml > helloworld-service.yaml
base$ kustomize edit add resource helloworld-*

The kustomize edit series of commands (add, fix, remove, set) all exist to modify the kustomization.yaml file.

You can see that kustomize edit add resource helloworld-* added a resources: key with an array of explicit references rather than an implicit file glob.

$ cat base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- helloworld-deployment.yaml
- helloworld-service.yaml

Moving over to the overlays/minikube folder we can do something similar.

minikube$ touch kustomization.yaml
minikube$ kubectl create service nodeport helloworld --tcp=3000 --dry-run -o yaml > helloworld-service.yaml
minikube$ kustomize edit add patch helloworld-service.yaml
minikube$ kustomize edit add base ../../base

Worth noting is the bases entry, which tells kustomize where to look for the base config to apply the patches to. The resulting kustomization.yaml file looks like the following:

$ cat overlays/minikube/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
patchesStrategicMerge:
- helloworld-service.yaml
bases:
- ../../base

One final jump into the overlays/gke folder gives us everything we will need to see the difference between two configs.

gke$ kubectl create service loadbalancer helloworld --tcp=3000 --dry-run -o yaml > helloworld-service.yaml
gke$ touch kustomization.yaml
gke$ kustomize edit add base ../../base
gke$ kustomize edit add patch helloworld-service.yaml

Finally we can generate the two different configs and diff them to see the changes.

$ diff -u --color <(kustomize build overlays/gke) <(kustomize build overlays/minikube/)
--- /dev/fd/63	2019-05-31 21:38:56.040572159 -0400
+++ /dev/fd/62	2019-05-31 21:38:56.041572186 -0400
@@ -12,7 +12,7 @@
     targetPort: 3000
   selector:
     app: helloworld
-  type: LoadBalancer
+  type: NodePort
 status:
   loadBalancer: {}
 ---

It won't surprise you that what you see here is just scratching the surface. There are many more fields that are possible in a kustomization.yaml file, and nuance in what should go in which file given that kustomize only allows addition not removal.

The approach kustomize is pursuing feels really novel in a field that has been dominated by DSLs (which hide the underlying construct) and templating (with the dangers of embedded languages and concatenating strings).

Working this way really helps deliver on the promise of portability made by Kubernetes; thanks to kustomize, you’re only a few files and a kustomize build away from replatforming if you need to.

Exploring GraphQL(.js)

When Facebook released GraphQL in 2015 they released two separate things: a specification and a working implementation of the specification in JavaScript called GraphQL.js.

GraphQL.js acts as a “reference implementation” for people implementing GraphQL in other languages but it’s also a polished production-worthy JavaScript library at the heart of the JavaScript GraphQL ecosystem.

GraphQL.js gives us, among other things, the graphql function, which is what does the work of turning a query into a response.

graphql(schema, `{ hello }`)
{
  "data": {
    "hello": "world"
  }
}

The graphql function above returns a promise that resolves to the response. It takes two arguments: one is the { hello } query; the other, the schema, could use a little explanation.

The Schema

In GraphQL you define types for all the things you want to make available.
GraphQL.org has a simple example schema written in Schema Definition Language.

type Book {
  title: String
  author: Author
  price: Float
}

type Author {
  name: String
  books: [Book]
}

type Query {
  books: [Book]
  authors: [Author]
}

There are a few types we defined (Book, Author, Query) and some that GraphQL already knew about (String, Float). All of those types are collectively referred to as our schema.

You can define your schema with Schema Definition Language (SDL) as above or, as we will do, use plain JavaScript. It’s up to you. For our little adventure today I’ll use JavaScript and define a single field called “hello” on the mandatory root Query type.

var { GraphQLObjectType, GraphQLString, GraphQLSchema } = require('graphql')
var query = new GraphQLObjectType({
  name: 'Query',
  fields: {
    hello: {type: GraphQLString, resolve: () => 'world'}
  }
})
var schema = new GraphQLSchema({ query })

The queries we receive are written in the GraphQL language, which will be checked against the types and fields we defined in our schema. In the schema above we’ve defined a single field on the Query type, and mapped a function that returns the string ‘world’ to that field.

GraphQL is a language like JavaScript or Python, but the inner workings of other languages aren’t usually as visible or approachable as GraphQL.js makes them. Looking at how GraphQL works can tell us a lot about how to use it well.

The life of a GraphQL query

Going from a query like { hello } to a JSON response happens in four phases:

  • Lexing
  • Parsing
  • Validation
  • Execution

Let’s take that little { hello } query and see what running it through that function looks like.

Lexing: turning strings into tokens

The query { hello } is a string of characters that presumably make up a valid query in the GraphQL language. The first step in the process is splitting that string into tokens. This work is done with a lexer.

var {createLexer, Source} = require('graphql/language')
var lexer = createLexer(new Source(`{ hello }`))

The lexer can tell us the current token, and we can advance the lexer to the next token by calling lexer.advance().

lexer.token
Tok {
  kind: '',
  start: 0,
  end: 0,
  line: 0,
  column: 0,
  value: undefined,
  prev: null,
  next: null }

lexer.advance()
Tok {
  kind: '{',
  start: 0,
  end: 1,
  line: 1,
  column: 1,
  value: undefined,
  next: null }

lexer.advance()
Tok {
  kind: 'Name',
  start: 1,
  end: 6,
  line: 1,
  column: 2,
  value: 'hello',
  next: null }

lexer.advance()
Tok {
  kind: '}',
  start: 6,
  end: 7,
  line: 1,
  column: 7,
  value: undefined,
  next: null }

lexer.advance()
Tok {
  kind: '',
  start: 7,
  end: 7,
  line: 1,
  column: 8,
  value: undefined,
  next: null }

It’s important to note that we are advancing by token not by character. Characters like commas, spaces, and new lines are all allowed in GraphQL since they make code nice to read, but the lexer will skip right past them in search of the next meaningful token.
These two queries will produce the same tokens you see above.

createLexer(new Source(`{ hello }`))
createLexer(new Source(`    ,,,\r\n{,\n,,hello,\n,},,\t,\r`))

The lexer also represents the first pass of input validation that GraphQL provides. Invalid characters are rejected by the lexer.

createLexer(new Source("*&^%$")).advance()
Syntax Error: Cannot parse the unexpected character "*"
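GraphQL.js’s real lexer is far more complete, but the core loop can be sketched in a few lines of plain JavaScript. The toyLex function below is an illustration only; it handles just the token kinds our { hello } query needs (braces and names), while the real lexer handles strings, numbers, punctuation, comments and more.

```javascript
// A toy lexer for just the token kinds { hello } needs: braces and names.
function toyLex(source) {
  const tokens = []
  let i = 0
  while (i < source.length) {
    const ch = source[i]
    // commas and whitespace are "ignored characters" in GraphQL
    if (/[ \t\r\n,]/.test(ch)) { i++; continue }
    if (ch === '{' || ch === '}') {
      tokens.push({ kind: ch, start: i, end: i + 1 })
      i++
    } else if (/[_A-Za-z]/.test(ch)) {
      const start = i
      while (i < source.length && /[_0-9A-Za-z]/.test(source[i])) i++
      tokens.push({ kind: 'Name', start, end: i, value: source.slice(start, i) })
    } else {
      throw new Error(`Syntax Error: Cannot parse the unexpected character "${ch}"`)
    }
  }
  return tokens
}

console.log(toyLex('{ hello }').map(t => t.kind))
// [ '{', 'Name', '}' ]

// whitespace and commas don't change the tokens produced
console.log(toyLex('    ,,,\r\n{,\n,,hello,\n,},,\t,\r').map(t => t.kind))
// [ '{', 'Name', '}' ]
```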

Parsing: turning tokens into nodes, and nodes into trees

Parsing is about using tokens to build higher level objects called nodes.
If you look at a node you can see the tokens in there, but nodes have more going on: a kind, location information, and references to other nodes.

If you use a tool like grep or ripgrep to search through the source of GraphQL.js, you will see where these nodes are coming from. There are specialised parsing functions for each type of node, the majority of which are used internally by the parse function. These functions follow the pattern of accepting a lexer and returning a node.

$ rg "function parse" src/language/parser.js
124:export function parse(
146:export function parseValue(
168:export function parseType(
183:function parseName(lexer: Lexer): NameNode {
197:function parseDocument(lexer: Lexer): DocumentNode {
212:function parseDefinition(lexer: Lexer): DefinitionNode {
246:function parseExecutableDefinition(lexer: Lexer): ExecutableDefinitionNode {
271:function parseOperationDefinition(lexer: Lexer): OperationDefinitionNode {
303:function parseOperationType(lexer: Lexer): OperationTypeNode

Using the parse function is as simple as passing it a GraphQL string. If we print the output of parse with some spacing, we can see what’s actually happening: it’s constructing a tree. Specifically, an Abstract Syntax Tree (AST).

> var { parse } = require('graphql/language')
> console.log(JSON.stringify(parse("{hello}"), null, 2))
{
  "kind": "Document",
  "definitions": [
    {
      "kind": "OperationDefinition",
      "operation": "query",
      "variableDefinitions": [],
      "directives": [],
      "selectionSet": {
        "kind": "SelectionSet",
        "selections": [
          {
            "kind": "Field",
            "name": {
              "kind": "Name",
              "value": "hello",
              "loc": {
                "start": 1,
                "end": 6
              }
            },
            "arguments": [],
            "directives": [],
            "loc": {
              "start": 1,
              "end": 6
            }
          }
        ],
        "loc": {
          "start": 0,
          "end": 7
        }
      },
      "loc": {
        "start": 0,
        "end": 7
      }
    }
  ],
  "loc": {
    "start": 0,
    "end": 7
  }
}

If you play with this, or a more deeply nested query, you can see a pattern emerge. You’ll see SelectionSets containing selections containing SelectionSets. With a structure like this, a function that calls itself would be able to walk its way down this entire object. We’re all set up for some recursive evaluation.
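To make that concrete, here is a sketch in plain JavaScript (no GraphQL.js required) of exactly such a self-calling function, walking an object shaped like the AST above and collecting node kinds as it goes:

```javascript
// A minimal recursive walk: anything with a "kind" property is a node,
// and we recurse into every array and object value a node holds.
function collectKinds(node, kinds = []) {
  if (Array.isArray(node)) {
    node.forEach(child => collectKinds(child, kinds))
  } else if (node && typeof node === 'object') {
    if (node.kind) kinds.push(node.kind)
    Object.values(node).forEach(value => collectKinds(value, kinds))
  }
  return kinds
}

// A pared-down version of the { hello } AST printed above
const ast = {
  kind: 'Document',
  definitions: [{
    kind: 'OperationDefinition',
    operation: 'query',
    selectionSet: {
      kind: 'SelectionSet',
      selections: [{
        kind: 'Field',
        name: { kind: 'Name', value: 'hello' },
      }],
    },
  }],
}

console.log(collectKinds(ast))
// [ 'Document', 'OperationDefinition', 'SelectionSet', 'Field', 'Name' ]
```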

Validation: Walking the tree with visitors

The reason for an AST is to enable us to do some processing, which is exactly what happens in the validation step. Here we are looking to make some decisions about the tree and how well it lines up with our schema.

For any of that to happen, we need a way to walk the tree and examine the nodes. For that there is a pattern called the Visitor pattern, which GraphQL.js provides an implementation of.

To use it we require the visit function and make a visitor.

var { visit } = require('graphql')

var depth = 0
var visitor = {
  enter: node => {
    depth++
    console.log(' '.repeat(depth).concat(node.kind))
    return node
  },
  leave: node => {
    depth--
    return node
  },
}

Our visitor above has enter and leave functions attached to it. These names are significant, since the visit function looks for them when it comes across a new node in the tree or moves on to the next node.
The visit function accepts an AST and a visitor, and you can see our visitor at work printing out the kind of each node it encounters.

> visit(parse(`{ hello }`), visitor)
 Document
  OperationDefinition
   SelectionSet
    Field
     Name
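Under the hood, a visit function like that can be sketched in plain JavaScript. This toy version is a gross simplification of the real one in GraphQL.js (which also supports editing nodes, kind-specific visitors, and breaking out of the traversal early), but it shows the essential shape: a depth-first traversal calling enter on the way down and leave on the way up.

```javascript
// A bare-bones visit: any object value with a "kind" property is
// treated as a child node and visited recursively.
function toyVisit(node, visitor) {
  visitor.enter(node)
  Object.values(node).forEach(value => {
    const children = Array.isArray(value) ? value : [value]
    children.forEach(child => {
      if (child && typeof child === 'object' && child.kind) {
        toyVisit(child, visitor)
      }
    })
  })
  visitor.leave(node)
}

// The same indenting visitor as above, run over a tiny hand-written tree
let depth = 0
const lines = []
toyVisit(
  { kind: 'Document', definitions: [{ kind: 'OperationDefinition' }] },
  {
    enter: node => { depth++; lines.push(' '.repeat(depth) + node.kind) },
    leave: node => { depth-- },
  }
)
console.log(lines.join('\n'))
//  Document
//   OperationDefinition
```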

With the visit function providing a generic ability to traverse the tree, the next step is to use this ability to determine if this query is acceptable to us.
This happens with the validate function. By default, it seems to know that kittens are not a part of our schema.

var { validate } = require('graphql')
validate(schema, parse(`{ kittens }`))
// GraphQLError: Cannot query field "kittens" on type "Query"

The reason it knows that is that there is a third argument to the validate function. Left undefined, it defaults to an array of rules exported from ‘graphql/validation’. These “specifiedRules” are responsible for all the validations that ensure our query is safe to run.

> var { validate } = require('graphql')
> var { specifiedRules } = require('graphql/validation')
> specifiedRules
[ [Function: ExecutableDefinitions],
  [Function: UniqueOperationNames],
  [Function: LoneAnonymousOperation],
  [Function: SingleFieldSubscriptions],
  [Function: KnownTypeNames],
  [Function: FragmentsOnCompositeTypes],
  [Function: VariablesAreInputTypes],
  [Function: ScalarLeafs],
  [Function: FieldsOnCorrectType],
  [Function: UniqueFragmentNames],
  [Function: KnownFragmentNames],
  [Function: NoUnusedFragments],
  [Function: PossibleFragmentSpreads],
  [Function: NoFragmentCycles],
  [Function: UniqueVariableNames],
  [Function: NoUndefinedVariables],
  [Function: NoUnusedVariables],
  [Function: KnownDirectives],
  [Function: UniqueDirectivesPerLocation],
  [Function: KnownArgumentNames],
  [Function: UniqueArgumentNames],
  [Function: ValuesOfCorrectType],
  [Function: ProvidedRequiredArguments],
  [Function: VariablesInAllowedPosition],
  [Function: OverlappingFieldsCanBeMerged],
  [Function: UniqueInputFieldNames] ]

validate(schema, parse(`{ kittens }`), specifiedRules)
// GraphQLError: Cannot query field "kittens" on type "Query"

In there you can see checks to ensure that the query only includes known types (KnownTypeNames) and things like variables having unique names (UniqueVariableNames).
This is the next level of input validation that GraphQL provides.

Rules are just visitors

If you dig into those rules (all in src/validation/rules/) you will realize that these are all just visitors.
In our first experiment with visitors, we just printed out the node kind. If we look at this again, we can see that even our tiny little query ends up with 5 levels of depth.

visit(parse(`{ hello }`), visitor)
 Document  // 1
  OperationDefinition // 2
   SelectionSet // 3
    Field // 4
     Name // 5

Let’s say for the sake of experimentation that 4 is all we will accept. To do that we’ll write ourselves a visitor, and then pass it into the third argument to validate.

var { GraphQLError } = require('graphql')

var fourDeep = context => {
  var depth = 0, maxDepth = 4 // 😈
  return {
    enter: node => {
      depth++
      if (depth > maxDepth) {
        context.reportError(new GraphQLError('💥', [node]))
      }
      return node
    },
    leave: node => { depth--; return node },
  }
}
validate(schema, parse(`{ hello }`), [fourDeep])
// GraphQLError: 💥

If you are building a GraphQL API server, you can take a rule like this and pass it as one of the options to express-graphql, so your rule will be applied to all queries the server handles.

Execution: run resolvers. catch errors.

This brings us to the execution step. There isn’t much exported from ‘graphql/execution’. What’s worthy of note here is the root object and the defaultFieldResolver. These work in concert to ensure that wherever there isn’t a resolver function, by default you get the value for that field name on the root object.

var { execute, defaultFieldResolver } = require('graphql/execution')
var args = {
  schema,
  document: parse(`{ hello }`),
  // value 0 in the "value of the previous resolver" chain
  rootValue: {},
  variableValues: {},
  operationName: '',
  fieldResolver: defaultFieldResolver,
}
execute(args)
{
  "data": {
    "hello": "world"
  }
}

Why all that matters

For me the take-away in all this is a deeper appreciation of what GraphQL being a language implies.

First, giving your users a language is empowering them to ask for what they need. This is actually written directly into the spec:

GraphQL is unapologetically driven by the requirements of views and the front‐end engineers that write them. GraphQL starts with their way of thinking and requirements and builds the language and runtime necessary to enable that.

Empowering your users is always a good idea but server resources are finite, so you’ll need to think about putting limits somewhere. The fact that language evaluation is recursive means the amount of recursion and work your server is doing is determined by the person who writes the query. Knowing the mechanism to set limits on that (validation rules!) is an important security consideration.

That caution comes alongside a big security win. Formal languages and type systems are the most powerful tools we have for input validation. Rigorous input validation is one of the most powerful things we can do to increase the security of our systems. Making good use of the type system means that your code should never be run on bad inputs.

It’s because GraphQL is a language that it lets us both empower users and increase security, and that is a rare combination indeed.

Tagged template literals and the hack that will never go away

Tagged template literals were added to JavaScript as part of the 2015 update to the ECMAScript standard.

While a fair bit has been written about them, I’m going to argue their significance is underappreciated and I’m hoping this post will help change that. In part, it’s significant because it solves a problem people had resigned themselves to living with: SQL injection.

Before ES 2015, combining query strings with variables was done via concatenation using the plus operator.

It’s common to retrieve things from the database with information (like a product id) supplied by users, so code like this was common and resulted in many security vulnerabilities.

let query = "select * from widgets where id = " + 1 + ";"
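To see why, imagine the id comes straight from a request parameter. The value below is hypothetical, but it shows how easily concatenation lets a user rewrite the query itself:

```javascript
// A hypothetical malicious value: instead of a number, the user supplies
// a string that smuggles a second statement into the query.
let userSuppliedId = "1; drop table widgets"
let query = "select * from widgets where id = " + userSuppliedId + ";"
console.log(query)
// select * from widgets where id = 1; drop table widgets;
```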

As of ES 2015, you can now write this differently, by creating multi-line strings using backticks, and use ${} for variable interpolation within them.

let query = `select * from widgets where id = ${1};`

This is a lot nicer (less “noise”) but still problematic security-wise. It’s pairing this new syntax with another language feature, known as tagged templates, that gives us the tools to solve SQL injection once and for all:

let id = 1
// define a function to use as a "tag"
sql = (strings, ...vars) => ({strings, vars})
[Function: sql]
// call our "tag" function with a template literal
sql`select * from widgets where id = ${id};`
{ strings: [ 'select * from widgets where id = ', ';' ], vars: [ 1 ] }

What you see above is still just a function call, but it no longer works the same. Instead of doing the variable interpolation first and then calling the sql function with the resulting string (select * from widgets where id = 1;), the sql function is passed an array of string segments and the variables that are supposed to be interpolated.

You can see how different this is from the standard evaluation process by adding parentheses to make this a standard function invocation. The string is once again interpolated before being passed to the sql function, entirely losing the distinction between the value (which we probably don’t trust) and the string (which we probably do). The result is an injected string and an empty array of variables.

sql(`select * from widgets where id = ${id};`)
{ strings: 'select * from widgets where id = 1;', vars: [] }

This loss of context (the distinction between the variables/values and the query itself) is the heart of the matter when it comes to SQL injection (or injection attacks generally). The moment the strings and variables are combined, you have a problem on your hands.

So why not just use parameterized queries or something similar? It’s generally held that good code expresses the programmer’s intent. I would argue that select * from widgets where id = ${id}; perfectly expresses the programmer’s intent: the programmer wants the id variable to be included in the query string.

When the clearest expression of a programmer’s intent is also a security problem, what you have is a systemic issue which requires a systemic fix.

This is why, despite years of condescending security training, developer shaming and “push left” pep-talks, SQL injection stubbornly remains “the hack that will never go away”. Pointing out the problem is easy, but providing implementable solutions is hard (especially since most security people don’t write code). As others have pointed out:

First, we need to deal with the standing advice of “Don’t trust your input.” This advice doesn’t give the programmers any actionable solution: what to trust, and how to build trust?

Given how ineffective the security industry has been so far, it’s fascinating to see Mike Samuel from Google’s security team as the champion of the “Template Strings” proposal. Even more telling is the mantra from his GitHub profile: “make the easiest way to express an idea in code a secure way to express that idea”.

You can see the fruits of his labour in library authors leveraging this to deliver a great developer experience while doing the right thing for security. Alan Plum, the driving force behind the ArangoDB JavaScript driver, leverages tagged template literals to let users query ArangoDB safely.

The aql (ArangoDB Query Language) function lets you write what would in any other language be an intent-revealing SQL injection, and safely returns an object with a query and some accompanying bindVars.

aql`FOR thing IN collection FILTER thing.foo == ${foo} RETURN thing`
{ query: 'FOR thing IN collection FILTER thing.foo == @value0 RETURN thing',
  bindVars: { value0: 'bar' } }

Mike Samuel himself has a number of node libraries that leverage Tagged Template Literals, among them one to safely handle shell commands.

sh`echo -- ${a} "${b}" 'c: ${c}'`

It’s important to point out that Tagged Template Literals don’t entirely solve SQL injections, since there are no guarantees that any particular tag function will do “the right thing” security-wise, but the arguments the tag function receives set library authors up for success.
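As a sketch of what a library author can do with those arguments, here is a hypothetical tag function (the name sqlSafe is made up, and the $1-style placeholders are borrowed from PostgreSQL) that keeps the query text and the values separate all the way to the database driver:

```javascript
// Rebuild the query with numbered placeholders instead of ever
// interpolating the values; the values travel separately, so the
// driver can keep untrusted input out of the query text entirely.
const sqlSafe = (strings, ...vars) => ({
  text: strings.reduce(
    (query, segment, i) => query + '$' + i + segment
  ),
  values: vars,
})

let id = 1
console.log(sqlSafe`select * from widgets where id = ${id};`)
// { text: 'select * from widgets where id = $1;', values: [ 1 ] }
```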

Authors using them get to offer an intuitive developer experience rather than the clunkiness of prepared statements, even though the tag function may well be using prepared statements under the hood. The safest thing is also the best experience; it’s a great example of creating a “pit of success” for people to fall into.

// Good security hinges on devs learning to write clunky
// stuff like this instead of the simple stuff above.
const ps = new sql.PreparedStatement(/* [pool] */)
ps.input('param', sql.Int)
ps.prepare('select * from widgets where id = @id;', err => {
    // ... error checks
    ps.execute({id: 1}, (err, result) => {
        // ... error checks
        ps.unprepare(err => {
            // ... error checks
        })
    })
})

It’s an interesting thought that JavaScript’s deficiencies seem to have become its strengths. First Ryan Dahl filled out the missing IO pieces to create Node.js, and now missing features like multi-line string support provide an opportunity for some of the world’s most brilliant minds to insert cutting-edge security features alongside these much-needed fixes.

I’m really happy to finally see language-level fixes for things that are clearly language-level problems. It’s the only way I can see to move the needle in the security space and make perennial problems like “the hack that will never go away” finally go away.