Available Games

This repo contains a collection of board games, written in C (C99 and built using GNU Make), for the purpose of exploring/experimenting with reinforcement learning agents.

Available Games

Game	Tic-Tac-Toe	Pig
Num Players	2	2
State Size	3	5
Max Actions	9	6
Observation: Number of Dimensions	3	1
Observation: Dimensions	[2, 3, 3]	[3]
Observation: Size	18	3
Features: Number of Dimensions	3	1
Features: Dimensions	[2, 3, 3]	[3]
Features: Size	18	3

Note: Each game also defines a STRING_BUF_SIZE macro.

This is not shown in the table above as it is a derived constant for internal formatting and is not relevant to agent integration.

Building

The main build you will want to do is to build the static library, which will be output to bin/board_games_lib.a:

make lib

From there, you would include the headers in include/* and bin/board_games_lib.a in your project.

I've also included the ability to compile a "demo" executable, this is compiled by the demo target:

make demo

This is intended as a way to demo or smoke-test the game engine you are implementing.

Engine Structure

Each game must completely fulfil the Game interface located at include/board_game.h. This allows for the following generic game loop:

Deterministic Games

flowchart TD
    init["init()"]
    gcp["get_current_player()"]
    gva["get_valid_actions()"]
    obs["get_observation()<br>get_features()"]
    agent["Agent selects action"]
    aa["apply_action()"]
    terminal{"is_terminal?"}
    outcome["get_outcome()"]

    init --> gcp
    gcp --> gva
    gva -.-> obs
    obs -. "feeds agent's<br>decision" .-> agent
    gva --> agent
    agent --> aa
    aa --> terminal
    terminal -- "no" --> gcp
    terminal -- "yes" --> outcome

    style init fill:#f1efe8,stroke:#5f5e5a,color:#2c2c2a
    style gcp fill:#eeedfe,stroke:#534ab7,color:#26215c
    style gva fill:#eeedfe,stroke:#534ab7,color:#26215c
    style obs fill:#f1efe8,stroke:#5f5e5a,stroke-dasharray:5 5,color:#2c2c2a
    style agent fill:#f1efe8,stroke:#5f5e5a,stroke-dasharray:5 5,color:#2c2c2a
    style aa fill:#eeedfe,stroke:#534ab7,color:#26215c
    style terminal fill:#e1f5ee,stroke:#0f6e56,color:#04342c
    style outcome fill:#faece7,stroke:#993c1d,color:#4a1b0c

Note that the game loop differs slightly for games that feature an element of stochasticity, see Modelling Stochasticity for more details.

See also src/main.c for a working example.

The `Game` Interface

The Game struct (defined in include/board_game.h) is a vtable of function pointers and metadata that every game implementation must provide. All state is stored externally in a caller-allocated uint64_t array - the Game itself holds no mutable state.

Note that state is intended as a semi-opaque container for the game state. Agents are expected to observe the state of the game via the get_observation() or get_features() functions - not by inspecting the state directly! This is because some board games such as Liar's Dice or Poker are incomplete/hidden information games, so their state arrays will contain the entire information of the game state, which is not available to each player. get_observation() and get_features() expose only the information that a specific player should see.

Required Macros

Every game must define the following compile-time macros, each prefixed with a namespace (e.g. TTT_ for Tic-Tac-Toe). These are used by callers to allocate correctly-sized buffers before calling into any Game function.

Macro	Description
`<NAMESPACE>_NUM_PLAYERS`	Number of players in the game.
`<NAMESPACE>_STATE_SIZE`	Number of `uint64_t` elements required to represent the full game state.
`<NAMESPACE>_MAX_NUM_ACTIONS`	Maximum number of actions that `get_valid_actions()` can return in any state.
`<NAMESPACE>_STRING_BUF_SIZE`	Minimum `char` buffer size (including null terminator) that `to_string()` requires.
`<NAMESPACE>_OBS_NDIMS`	Number of dimensions in the observation tensor (e.g. `1` for a flat vector, `3` for channels × rows × cols).
`<NAMESPACE>_OBS_SIZE`	Total number of elements in the observation. Must equal the product of the first `OBS_NDIMS` entries in `obs_dims`.
`<NAMESPACE>_FEATURES_NDIMS`	Number of dimensions in the features tensor.
`<NAMESPACE>_FEATURES_SIZE`	Total number of elements in the features tensor.

Function Pointers

Function	Signature	Description
`init`	`void (const void* config, uint64_t state[])`	Initialise `state` to the start of the game. `config` is game-specific (may be `NULL`).
`get_current_player`	`uint64_t (const uint64_t state[])`	Return the zero-based index of the player whose turn it is.
`is_chance_node`	`bool (const uint64_t state[])`	Return `true` if the next action is a random outcome (die roll, card draw, etc.) rather than a player decision. Deterministic games always return `false`. See the "Modelling Stochasticity" section for details.
`get_valid_actions`	`uint64_t (const uint64_t state[], uint64_t actions_out[])`	Write valid action IDs into `actions_out` and return the count.
`apply_action`	`void (uint64_t state[], uint64_t action)`	Mutate `state` in-place by applying `action`.
`is_terminal`	`bool (const uint64_t state[])`	Return `true` if the game is over (win, loss, or draw).
`get_outcome`	`void (const uint64_t state[], int64_t scores_out[])`	Write per-player scores into `scores_out` (e.g. +1 win, -1 loss, 0 draw). Only meaningful when terminal.
`get_observation`	`void (const uint64_t state[], uint64_t player, uint8_t* obs_out)`	Encode the `state` as a discrete observation tensor from player's perspective.
`get_features`	`void (const uint64_t state[], uint64_t player, float* features_out)`	Encode the `state` as a floating-point feature tensor from `player`'s perspective, suitable for neural network input.
`to_string`	`uint64_t (const uint64_t state[], uint64_t buf_size, char* buf)`	Write a human-readable representation of the state into `buf`. Returns the number of bytes written.
`help_prompt`	`const char* (void)`	Return a static string describing the game's rules and move format.

Static Fields

Field	Type	Description
`obs_dims[4]`	`uint64_t[4]`	Shape of the observation tensor. Only the first `OBS_NDIMS` entries are meaningful (e.g. `{2, 3, 3}` for TTT).
`features_dims[4]`	`uint64_t[4]`	Shape of the features tensor. Only the first `FEATURES_NDIMS` entries are meaningful.

Observation vs Features

Both get_observation() and get_features() encode the game state from a given player's perspective, but they serve different purposes:

Observation (uint8_t*) - A semantically faithful, discrete/integer encoding of the game state. This is the "ground truth" representation i.e. each element maps precisely to a meaningful game concept. Useful for rule-based agents, debugging, and inspection.
Features (float*) - A float encoding optimised for neural network consumption. This may normalise values, broadcast scalars across spatial dimensions, or apply other transformations beyond a simple cast of the observation. Note: A simple game (like Tic-Tac-Toe) may produce identical representation, but more complex games will diverge.

Memory Conventions

All memory is caller-allocated. The Game functions never allocate or free memory themselves. Before calling any function, the caller must allocate buffers according to the macros listed above. For example:

uint64_t state[STATE_SIZE];
uint64_t actions[MAX_NUM_ACTIONS];
int64_t  scores[NUM_PLAYERS];
uint8_t  observation[OBS_SIZE];
float    features[FEATURES_SIZE];
char     string_buffer[STRING_BUF_SIZE];

Players are zero-indexed throughout (player 0 moves first).

Modelling Stochasticity

Some games contain randomness that is not directly controlled by any of the players e.g. dice rolls, card draws, shuffles, or any other outcome determined "by the game itself". The Game interface assumes that apply_action(state, action) is deterministic: given the same state and the same action, the resulting state should always be identical. This is a hard requirement for techniques like Monte Carlo Tree Search, which rely on being able to copy a state and replay actions to explore hypothetical futures.

To reconcile these two tensions, randomness is modelled as a dedicated chance "player". Rather than baking a PRNG into state and having apply_action() draw from it, the engine exposes random outcomes as actions to be resolved at "chance nodes" (in the context of a game tree). Whether the current state is a chance node is reported by the is_chance_node() function on the Game interface - deterministic games always return false - as there is no random event to sample, stochastic games return true when awaiting a random outcome.

When is_chance_node() returns true:

get_valid_actions() returns every possible random outcome as a list of action IDs (e.g. {0, 1, 2, 3, 4, 5} for a six-sided die, {0, 1} for a coin flip).
The caller - not the engine - samples one of those actions according to the game's chance distribution and passes it to apply_action().
apply_action() resolves the sampled outcome deterministically, then hands control back to whichever real player should act next.

Agents never control the chance player directly. In a standard game loop, chance nodes are handled by a small branch that samples a random action and applies it without consulting the agent. See src/main.c for a reference implementation of this pattern.

A game has at most one chance player, visited possibly many times at different nodes. Multiple random events in a game are sequenced (or combined into a single joint action), not parallelised - the interface has exactly one get_current_player() and one get_valid_actions() per state, and this reflects a deeper assumption that at any given state, exactly one actor is about to act.

Decomposing Chance Events

How a game author breaks a random event into chance nodes is a modelling decision that affects whether a caller can sample correctly with a uniform RNG.

If a random event decomposes into independent uniform sub-events, it is more ergonomic to model it as sequential chance nodes rather than a single "fat" chance node. For the scenario of rolling 2×d6 dice, for example, would naïvely be represented as one chance node with 21 or 36 actions - but then the correct distribution over those actions is non-uniform (e.g. (3,4) is twice as likely as (1,1)), and a caller sampling uniformly would silently compute wrong expected values. To fix this, the caller would have to carefully model the correct probability distribution to sample from! The more ergonomic modelling choice is two sequential chance nodes of six actions each, with apply_action() transitioning from "awaiting die 1" to "awaiting die 2" to the next player's turn. This way, each sub-event is now uniform, therefore using uniform sampling now is correct, and the joint distribution falls out for free because probability composes multiplicatively down the game tree.

The same principle/strategy applies to revealing multiple cards, rolling multiple dice, or any compound random event whose components are independent and uniform.

At this point in time, I cannot envision a source of randomness in a board game that is genuinely non-uniform and cannot be decomposed, so this is the main strategy used to model randomness. The few cases this principle does not handle are rare: inherently weighted physical randomisers (e.g. a loaded die), continuous distributions (which cannot be enumerated at all.

Summary: Why This Design

Modelling chance as a dedicated player (rather than embedding a PRNG in state) has several benefits:

state remains a pure snapshot. Copying a state and replaying actions produces an identical trajectory only if the same chance actions are replayed; different samples give different trajectories, exactly as a real random process would. This is essential for MCTS and any algorithm that explores hypothetical futures from a saved state.
No hidden RNG plumbing inside the engine. The engine contains no global RNG, no per-state seed, and no hidden mutation across calls that misleadingly appear pure. Callers bring their own PRNG, with whatever seeding and parallelism strategy suits their use case.
Chance nodes are first-class citizens of the game tree. MCTS can expand them by enumerating every possible outcome and weighting each child by its probability. Expectimax search works without special-casing. Any tree-based search algorithm handles chance naturally rather than working around it.
Generalises cleanly. The same mechanism handles dice, cards, tile draws, and any other stochastic event a future game might introduce — each game decides internally what "chance node" means via its is_chance_node() implementation.

The game loop differs slightly to the deterministic game loop in order to incorporate the chance player:

flowchart TD
    init["init()"]
    chance{"is_chance_node?"}
    gcp["get_current_player()"]
    gva["get_valid_actions()"]
    obs["get_observation()<br>get_features()"]
    agent["Agent selects action"]
    sample["Sample chance action<br>(caller supplies RNG)"]
    aa["apply_action()"]
    terminal{"is_terminal?"}
    outcome["get_outcome()"]

    init --> chance
    chance -- "no" --> gcp
    gcp --> gva
    chance -- "yes" --> gva
    gva -.-> obs
    obs -. "feeds agent's<br>decision" .-> agent
    gva -- "decision node" --> agent
    gva -- "chance node" --> sample
    agent --> aa
    sample --> aa
    aa --> terminal
    terminal -- "no" --> chance
    terminal -- "yes" --> outcome

    style init fill:#f1efe8,stroke:#5f5e5a,color:#2c2c2a
    style chance fill:#e1f5ee,stroke:#0f6e56,color:#04342c
    style gcp fill:#eeedfe,stroke:#534ab7,color:#26215c
    style gva fill:#eeedfe,stroke:#534ab7,color:#26215c
    style obs fill:#f1efe8,stroke:#5f5e5a,stroke-dasharray:5 5,color:#2c2c2a
    style agent fill:#f1efe8,stroke:#5f5e5a,stroke-dasharray:5 5,color:#2c2c2a
    style sample fill:#f1efe8,stroke:#5f5e5a,color:#2c2c2a
    style aa fill:#eeedfe,stroke:#534ab7,color:#26215c
    style terminal fill:#e1f5ee,stroke:#0f6e56,color:#04342c
    style outcome fill:#faece7,stroke:#993c1d,color:#4a1b0c

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
codegen		codegen
include		include
src		src
.clang-format		.clang-format
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Available Games

Building

Engine Structure

Deterministic Games

The `Game` Interface

Required Macros

Function Pointers

Static Fields

Observation vs Features

Memory Conventions

Modelling Stochasticity

Decomposing Chance Events

Summary: Why This Design

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Available Games

Building

Engine Structure

Deterministic Games

The Game Interface

Required Macros

Function Pointers

Static Fields

Observation vs Features

Memory Conventions

Modelling Stochasticity

Decomposing Chance Events

Summary: Why This Design

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

The `Game` Interface

Packages