Skip to content

GAAIM-standard/canonicalize

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GAAIM Canonicalization Reference Implementations

Spec: GAAIM Core v0.1-draft §4.2 (Canonical Serialization) License: Apache-2.0 Status: Reference implementations — validated against shared test vectors

This package contains reference implementations of the GAAIM canonical JSON serialization in TypeScript and C#. Both produce bit-identical output for the same input, enabling cross-platform signature interoperability.

What this code does

The canonicalizer takes a GAAIM event (a JSON object) and produces the canonical UTF-8 byte sequence that is signed per §5. This is the single source of truth for what gets hashed and signed — implementations that canonicalize differently will not verify each other's signatures.

The canonicalization procedure, per §4.2:

  1. Remove top-level signature and signaturekey attributes.
  2. Apply JCS (RFC 8785) to the remaining object:
    • Sort object keys lexicographically by UTF-16 code unit, recursively.
    • No insignificant whitespace.
    • Numbers serialized per ECMAScript ToString(Number).
    • Strings emitted as UTF-8 with minimal I-JSON escaping.
  3. Emit UTF-8 bytes (no BOM).

Per §4.2.1, absent attributes MUST NOT be synthesized as null. Both implementations enforce this: only keys present in the source object are emitted in the canonical form.

Directory layout

canonicalize/
├── README.md                              ← this file
├── test-vectors/
│   └── canonical-vectors-v0.1.json        ← 6 shared test vectors with SHA-256 hashes
├── typescript/
│   ├── package.json
│   ├── tsconfig.json
│   └── src/
│       ├── canonicalize.ts                ← the implementation
│       └── test.ts                        ← test runner
└── csharp/
    ├── src/
    │   ├── Gaaim.Canonicalization.csproj
    │   └── Canonicalizer.cs               ← the implementation
    └── tests/
        ├── Gaaim.Canonicalization.Tests.csproj
        └── Program.cs                     ← test runner

Test vectors

The test-vectors/canonical-vectors-v0.1.json file contains six normative vectors. Every conformant canonicalizer MUST produce matching output for all six. The vectors cover:

Vector What it exercises Byte length SHA-256 prefix
full-event-no-audit Complete A.1 event, all optional attributes present 903 8d028f41...
minimal-event REQUIRED fields only, many absent attributes 378 d6249473...
unicode-content Non-ASCII in keys, values, tags 502 f22007e2...
signature-stripping Source has signature/signaturekey that must be removed 378 81a503c8...
prev-null Explicit prev: null preserved vs absent keys dropped 411 8d59333b...
number-edge-cases Integer vs float (1.0 → 1), zero, negative, fractional 421 73bce3c5...

Full hashes and expected canonical forms are in the vectors file.

TypeScript — build and run

cd typescript
npm install
npm run test

Expected output:

Running 6 test vectors from .../canonical-vectors-v0.1.json

✓ full-event-no-audit
✓ minimal-event
✓ unicode-content
✓ signature-stripping
✓ prev-null
✓ number-edge-cases

Verifying canonicalSha256Hex() async helper:
✓ full-event-no-audit (async)
... [all 12 checks pass]

==================================================
6 passed, 0 failed

Status in this package: validated end-to-end. All six vectors pass, plus six additional async-helper validations.

Usage

import { canonicalize, canonicalizeToString, canonicalSha256Hex } from '@gaaim/canonicalize';

const event = {
  specversion: "1.0",
  id: "01HQ5P3KJ6X8W2YQGMZB9N4T7R",
  source: "ide-plugin://example.org/code-adapter/instance-7",
  type: "gaaim.core.artifact.created",
  time: "2026-04-04T21:30:15.123Z",
  gaaimversion: "1.0",
  profile: "core",
  data: { /* ... */ }
};

// Get canonical UTF-8 bytes (this is what you sign)
const canonicalBytes: Uint8Array = canonicalize(event);

// Or get the canonical string
const canonicalStr: string = canonicalizeToString(event);

// Or compute SHA-256 directly
const hash: string = await canonicalSha256Hex(event);

C# — build and run

cd csharp/tests
dotnet run -- ../../test-vectors/canonical-vectors-v0.1.json

Requires .NET 8 SDK or later. Uses System.Text.Json (no external dependencies).

Status in this package: written and verified by inspection against the TypeScript reference. Number-formatting edge cases (very large/small doubles using scientific notation) are the most likely divergence point; run the test vectors to confirm on your target .NET version.

Usage

using Gaaim.Canonicalization;
using System.Text.Json;

string eventJson = "{ \"specversion\": \"1.0\", /* ... */ }";

// Get canonical UTF-8 bytes (this is what you sign)
byte[] canonicalBytes = Canonicalizer.Canonicalize(eventJson);

// Or from a JsonElement
using var doc = JsonDocument.Parse(eventJson);
byte[] bytes2 = Canonicalizer.Canonicalize(doc.RootElement);

// Get canonical string
string canonicalStr = Canonicalizer.CanonicalizeToString(doc.RootElement);

// Compute SHA-256 directly
string hash = Canonicalizer.CanonicalSha256Hex(eventJson);

Implementation notes

Number serialization

The trickiest part of JCS. Both implementations handle:

  • Integer fast path: 2847"2847", 1247"1247"
  • Whole-valued doubles: 1.0"1" (matches String(1.0) === "1" in ECMAScript)
  • Negative zero: -0.0"0"
  • Fractional values: 0.92"0.92" via shortest-round-trip formatting
  • Rejects NaN and Infinity (not valid JSON)

Known limitation: the exponent threshold where scientific notation kicks in can differ between the ECMAScript algorithm and .NET's default double.ToString("R"). ECMAScript switches to exponent form at ≥ 1e21 or < 1e-6. .NET's threshold is similar but not guaranteed identical for all edge cases. Audit-event payloads rarely contain such extreme values (token counts, lines changed, durations in ms), so this matters only in unusual cases.

The C# NormalizeExponent helper converts "1E+21""1e+21" and ensures the + sign is present on positive exponents, matching ECMAScript's output format.

String escaping

Per RFC 8785 §3.2.2.2, JCS uses I-JSON (RFC 7493) string encoding: escape only the minimal set.

  • Always escaped: ", \, and control characters U+0000 through U+001F
  • Short escapes preferred: \b \t \n \f \r
  • Other control chars: \u00XX lowercase hex
  • Non-ASCII characters emitted as UTF-8 bytes, NOT as \uXXXX escapes

The unicode test vector exercises this — a canonicalizer that emits \u65E5\u672C\u8A9E instead of the UTF-8 bytes for 日本語 will fail.

Key sorting

JCS requires lexicographic sort by UTF-16 code unit. JavaScript's default Array.prototype.sort() does this. In .NET, StringComparer.Ordinal is the equivalent — it compares strings char-by-char, and char in .NET is a UTF-16 code unit.

For characters outside the Basic Multilingual Plane (code points > U+FFFF), both languages represent them as surrogate pairs in their string types, and both sort them the same way when using code-unit comparison.

Integration checklist

When you integrate these into your GAAIM producer or verifier:

  • Run the test vectors. All six MUST pass.
  • Do not modify the canonicalizer output with pretty-printing, logging instrumentation, or encoding transforms before signing.
  • Remember: sign the bytes from canonicalize(), not the string. Re-encoding the string could introduce BOMs or normalization differences.
  • Verifier: re-canonicalize on receipt before verifying signature. Don't trust that the sender canonicalized correctly — the bytes you verify must be the canonical bytes you produce from the received event.
  • After Fix 4 integration (the auditeligible attribute), regenerate your production test vectors to include auditeligible: true on L1+ events.

Verifying an event end-to-end (pseudocode)

function verifyEvent(event, registry):
    # 1. Extract signature and signaturekey
    signature = event.signature  # e.g., "ed25519:<base64url>"
    signaturekey = event.signaturekey  # e.g., "https://keys.example.com/v1/keys/adapter-2026q2"
    
    # 2. Resolve public key from registry
    keyRecord = registry.get(signaturekey)
    if keyRecord.keyUri != signaturekey: reject("registry-mismatch")
    if keyRecord.revokedAt != null: reject("key-withdrawn")
    
    # 3. Canonicalize event (signature fields stripped automatically)
    canonicalBytes = canonicalize(event)
    
    # 4. Verify signature over canonical bytes
    sigBytes = base64urlDecode(signature.after(":"))
    publicKey = parseSpkiBase64(keyRecord.publicKey)
    if not Ed25519.verify(publicKey, canonicalBytes, sigBytes):
        reject("signature-invalid")
    
    # 5. Verify chain continuity (§5.6.2) if event.auditeligible
    if event.auditeligible and event.prev != lastSeenEventId:
        flag("chain-gap")
    
    accept(event)

License

Apache-2.0. See the individual project files for details. This is reference code intended to be copied, modified, and embedded in GAAIM producers and verifiers.

About

Reference implementations of GAAIM canonical JSON serialization (RFC 8785 + GAAIM rules). TypeScript and C#.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors