Dom's Tech blog: a blog for my programming projects and findings, by Dom Slee

Solving Wordle - Novel Strategies for the NP hard problem
2022-10-14 · https://domslee.com/2022/10/14/Solving-wordle

Wordle is the popular game where you try to guess a hidden word in 6 guesses.

Each guess you get a pattern consisting of black, yellow and green colours for each letter of the guess.

A guess of “PARSE” would return a hint with a pattern, like this:

[tile graphic: P A R S E, each letter coloured with its hint]
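As a concrete reference, here is a minimal sketch of how such a hint can be computed (a hypothetical `getPattern` helper, not the repo's actual code). Greens are resolved first so duplicate letters are not double-counted when assigning yellows:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Compute the Wordle hint for `guess` against `hidden`:
// 'G' = green (right letter, right spot), 'Y' = yellow, 'B' = black.
std::string getPattern(const std::string &guess, const std::string &hidden) {
    std::string pattern(guess.size(), 'B');
    std::array<int, 26> remaining{};  // letters of `hidden` not matched green
    for (std::size_t i = 0; i < guess.size(); ++i) {
        if (guess[i] == hidden[i]) pattern[i] = 'G';
        else remaining[hidden[i] - 'a']++;
    }
    for (std::size_t i = 0; i < guess.size(); ++i) {
        if (pattern[i] == 'B' && remaining[guess[i] - 'a'] > 0) {
            pattern[i] = 'Y';
            remaining[guess[i] - 'a']--;
        }
    }
    return pattern;
}
```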

Terminology

| Word | Meaning |
|------|---------|
| Hidden word | A word that could be the answer, but it is hidden from the player |
| Test word | A word that can be used as a guess |
| Pattern | The hint given by the game after making a guess |
| "Easy" mode | The original game, where the set of test words is not restricted |
| "Hard" mode | A variation where the set of test words gets smaller depending on the previous guesses and patterns. In this variation, you can only use guesses that match the patterns provided |

Results

The regular wordle list from 15-Feb-22 has 2315 hidden words $H$, and 12972 test words $T$.

The regular wordle problem has been looked at in great depth by Alex Selby on The best strategies for Wordle, part 2.

BIGHIDDEN mode, a harder version where there are 12972 hidden words and 12972 test words, is what I focused on. In particular, I focused on wordle "easy" mode, which is ironically harder to compute than wordle hard mode, where the test words you can use are restricted by your previous guesses.

With BIGHIDDEN mode, it can be proven (by verifying the models) that there are at least 6460 unique starting test words that have a solution. Unless there are mistakes in the code, there should be exactly 6460 unique starting test words for this problem, because it was exhaustively calculated.

Finding the solution is quite difficult, but verifying that it is a correct solution is simple. The complete 6460 word list is available here, and all the models are here.

Using solve from domsleee/wordle.

| Goal | Command |
|------|---------|
| Find all solutions | `./bin/solve --guesses ext/wordle-combined.txt --answers ext/wordle-combined.txt` |
| Verify a model | `./bin/solve --guesses ext/wordle-combined.txt --answers ext/wordle-combined.txt --verify model.json` |

New strategies

1. EndGame Db

Refer to the EndGameAnalysis/ folder.

This is a cutoff that caches all powersets of sets of words $E$. Currently it uses 2276 sets of words that match on 4 letters + positions, e.g. the pattern `.arks`.

Wordle-EndGameDatabase

2. Lower bounds using a cache entry that is a subset

The main idea: when querying for a lower bound for a set of hidden words $Q$, and lower bounds are already known for the sets of hidden words $L = \{H_1, H_2, ..., H_n\}$, a valid lower bound is

$$l = \max_{H \in L}\{lb(H) \mid H \subseteq Q\}$$

This only makes sense for easy mode, since for hard mode you would also need to consider the test words.

In terms of implementation details, I tried an inverted index approach SubsetCache.h, and also a trie approach.
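As a small illustration of the lookup itself (not the repo's SubsetCache implementation), hidden word sets can be encoded as bitmasks, and the best known lower bound for a query set $Q$ is the maximum over cached sets that are subsets of $Q$:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A cached lower bound for a set of hidden words, encoded as a bitmask.
struct CacheEntry { uint64_t words; int lb; };

// Hypothetical sketch: the best known lower bound for query set Q is the
// max lb(H) over all cached sets H with H subset of Q (H & Q == H).
int subsetLowerBound(const std::vector<CacheEntry> &cache, uint64_t Q) {
    int best = 0;
    for (const auto &e : cache)
        if ((e.words & Q) == e.words) best = std::max(best, e.lb);
    return best;
}
```

An inverted index or trie, as mentioned below, replaces the linear scan with something faster over many cache entries.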

3. Most solved in 2 or 3 guesses

See this PR.

We know the following

  • 2 guesses ⇒ any 3 words can be solved (by exhaustion)
  • 3 guesses ⇒ any 5 words can be solved (reasoning, {1,1,1} can add any 2)

normal: bin/solve --guesses ext/wordle-guesses.txt --answers ext/wordle-answers.txt

bighidden: bin/solve --guesses ext/wordle-combined.txt --answers ext/wordle-combined.txt

| `--other-program` | normal | bighidden |
|---|---|---|
| any3in2 | ✅ | ✅ |
| any4in2 | ❌ (13 groups, e.g. batch,catch,hatch,patch) | ❌ (11423 groups) |
| any6in3 | ✅ | ❌ (e.g. fills,jills,sills,vills,bills,zills) |

4. Remove guesses that have a better guess

Refer to RemoveGuessesBetterGuess/

Goal: reduce the size of the test word set $T$ by removing test words that have a better test word.

Let $\mathcal{P}(H, t)$ be the partitioning of the hidden word set $H$ using test word $t$, i.e.

$$\mathcal{P}(H,t)=\{P(H,t,x)\mid x\in X\}$$

where $P(H,t,x)$ is the set of words in $H$ which would give pattern $x$ when guessing the word $t$.

Define $t_1 \geq t_2$, meaning $t_1$ is at least as good as $t_2$, to hold if every partition of $t_1$ is a subset of a partition of $t_2$:

$$\forall_{p_1\in\mathcal{P}(H,t_1)}\exists_{p_2\in\mathcal{P}(H,t_2)}(p_1 \subseteq p_2)$$

$t_1 \geq t_2$ implies that $t_2$ can be deleted from $T$.
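The dominance check can be sketched directly from the definition, with partitions encoded as bitmasks over the hidden words (an illustrative helper, not the repo's code):

```cpp
#include <cstdint>
#include <vector>

// t1 >= t2 iff every partition of t1 is contained in some partition of t2.
// Each uint64_t is a bitmask over the hidden word set H.
bool atLeastAsGood(const std::vector<uint64_t> &P1,
                   const std::vector<uint64_t> &P2) {
    for (uint64_t p1 : P1) {
        bool contained = false;
        for (uint64_t p2 : P2)
            if ((p1 & p2) == p1) { contained = true; break; }  // p1 ⊆ p2
        if (!contained) return false;
    }
    return true;
}
```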

proof

Consider this definition of solvability, which returns $1$ iff there is a solution in $d$ guesses:

$$\mathcal{F}(H,d) = \begin{cases} [|H|\leq d] & d\leq 1\\ \max_{t\in T}\prod_{p\in\mathcal{P}(H,t)}\mathcal{F}(p,d-1) & \text{otherwise} \end{cases}$$

We want to prove that $\mathcal{F}(H,d)$ will return an equivalent value if we remove values from $T$.

For every $t_1$ that is used as a better guess than $t_2$:

$$\prod_{p\in\mathcal{P}(H,t_1)}\mathcal{F}(p,d-1) \geq \prod_{p\in\mathcal{P}(H,t_2)}\mathcal{F}(p,d-1)$$

We can prove this by cases:

If any value of the RHS is $0$, the claim is clearly true because the domain is $\{0,1\}$.

If all values of the RHS are $1$, then all values of the LHS are $1$, because every partition of the LHS is a subset of a partition on the RHS.

qed

For least possible guesses

The definition for least possible guesses, taken from The best strategies for Wordle:

$$f(H) = \begin{cases} |H| & |H|\leq 1\\ |H|+\min_{t\in T}\sum_{s \ne \text{GGGGG}}f(P(H,t,s)) & \text{otherwise} \end{cases}$$

The same definition of "better guess" doesn't work, because the sum excludes a partition ($s \ne \text{GGGGG}$). So for least guesses, we must add an additional check that $P(H,t_1,\text{GGGGG})=P(H,t_2,\text{GGGGG})$.

Approximation

This is nice, however I could not find a fast enough way of doing it exactly.

Using the partitioning approach, I could find an $O(T^2H)$ algorithm, but with a reasonable approximation there is an $O(T\log{T})$ approach. Typical values are $T = 4000$ and $H = 40$.

An approximation of this partitioning check is to reason about which partitions $p_2$ would be split, by reasoning about how two words of $H$ are differentiated:

1 Non-letter rule

If a letter does not occur in any word of $H$, then any occurrence of that letter in a test word will not differentiate any two words of $H$.

2 Green letter rule

If a letter occupies the same position(s) in every word of $H$ (for all positions of that letter in the word), then any occurrence of that letter in a test word will not differentiate any two words of $H$.

3 exactly-n letter rule (yellow letter rule)

If there are exactly $n$ occurrences of $c$ in every word in $H$, then any occurrence of $c$ in a test word that is in a position other than the positions of $c$ in $H$ can be ignored.

  • If there are at most two $c$ in every word in $H$, and one of them is green, the same rule does not apply
    • Reason: for a guess $t$, one yellow gives no information when $t$ has one $c$, but gives information when $t$ has two $c$. Example: one yellow from a guess with two $c$ implies the hidden word $h$ has exactly one $c$.

4 most-n letter rule

If there are at most $n$ occurrences of $c$ in every word in $H$, a word $t$ with $\geq n+m$ occurrences of $c$ can mark up to $m$ positions that don't occur in any position of $H$: they are guaranteed to be useless B (black).

5 least-n letter rule

If there are at least $n$ occurrences of $c$ in every word in $H$, a word $t$ with $\leq n$ occurrences of $c$ can mark all letters that don't occur in any position of $H$: they are guaranteed to be useless Y (yellow).
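The non-letter rule above lends itself to a simple canonicalisation sketch (a hypothetical `canonicalize` helper, not the repo's code): letters absent from every word of $H$ cannot distinguish hidden words, so they can all be replaced by a single wildcard, and test words that become identical are interchangeable:

```cpp
#include <array>
#include <string>

// Replace letters that occur in no hidden word with '.'. Test words with
// equal canonical forms partition H identically, so all but one can go.
std::string canonicalize(const std::string &t,
                         const std::array<bool, 26> &usedInH) {
    std::string out = t;
    for (char &c : out)
        if (!usedInH[c - 'a']) c = '.';
    return out;
}
```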

Pruning Duplicate Nodes in N-Puzzle
2022-01-29 · https://domslee.com/2022/01/29/npuzzle-cycle-pruning

Introduction

See papers:

The above paper and technical document talk about a technique that reduces the number of nodes traversed by IDA* search when finding an optimal solution for the n-puzzle.

IDA* search does not use any memory to store which nodes have already been visited, and so it will revisit the same nodes many times in the DFS search. The technique described in the paper is to create a state machine to govern successor nodes for a given path.

The technique is to find a list of forbidden operator strings, and then use the Aho-Corasick algorithm to build a finite state machine that can be used to skip operator strings that are redundant in the search. For example, exploring up to depth 2 produces operator strings lr, rl, ud, du, which builds an FSM with 9 states.
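A sketch of such an automaton (not the repo's code) is below: a plain trie over the forbidden strings, with a BFS that collapses the failure function into the edges so every state has all four outgoing edges. Landing in a "word" state means the current path ends in a forbidden operator string, so the branch can be skipped:

```cpp
#include <array>
#include <queue>
#include <string>
#include <vector>

// Aho-Corasick automaton over the operator alphabet u/d/l/r.
struct DuplicateFsm {
    std::vector<std::array<int, 4>> nxt;
    std::vector<bool> isWord;
    std::vector<int> fail;

    static int idx(char c) {
        return c == 'u' ? 0 : c == 'd' ? 1 : c == 'l' ? 2 : 3;
    }

    int newState() {
        nxt.push_back({-1, -1, -1, -1});
        isWord.push_back(false);
        fail.push_back(0);
        return (int)nxt.size() - 1;
    }

    explicit DuplicateFsm(const std::vector<std::string> &forbidden) {
        newState();  // root
        for (const auto &w : forbidden) {  // plain trie insert
            int s = 0;
            for (char c : w) {
                if (nxt[s][idx(c)] == -1) nxt[s][idx(c)] = newState();
                s = nxt[s][idx(c)];
            }
            isWord[s] = true;
        }
        // BFS: fill missing edges via failure links, so that no failure
        // function is needed at search time.
        std::queue<int> q;
        for (int c = 0; c < 4; ++c) {
            int &t = nxt[0][c];
            if (t == -1) t = 0;
            else { fail[t] = 0; q.push(t); }
        }
        while (!q.empty()) {
            int s = q.front(); q.pop();
            if (isWord[fail[s]]) isWord[s] = true;
            for (int c = 0; c < 4; ++c) {
                int &t = nxt[s][c];
                if (t == -1) t = nxt[fail[s]][c];
                else { fail[t] = nxt[fail[s]][c]; q.push(t); }
            }
        }
    }

    // Advance with move c; true means the branch is a duplicate and is pruned.
    bool prunes(int &state, char c) const {
        state = nxt[state][idx(c)];
        return isWord[state];
    }
};
```

With the depth-2 set {lr, rl, ud, du} this builds the 9-state FSM described above.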

Results

The branching factor was calculated by finding the number of paths of lengths 23, 24 and 25. The geometric mean was taken of the branching factors of odd and even parity, i.e. $\sqrt{\frac{N_{25}}{N_{24}} \cdot \frac{N_{24}}{N_{23}}}$.
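As a one-line sketch of that calculation (hypothetical helper names):

```cpp
#include <cmath>

// Geometric mean of the odd- and even-parity branching factors, from the
// number of paths N23, N24, N25 at depths 23, 24 and 25.
double branchingFactor(double n23, double n24, double n25) {
    return std::sqrt((n25 / n24) * (n24 / n23));
}
```

Note the product telescopes to $\sqrt{N_{25}/N_{23}}$.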

The result for duplicate operators matches what has already been shown in this paper:

Branching factor, calculated by:

./bin/puzzle -d databases/6666-reg --fsmDepthLimit 2 -e

The set of 16366272 duplicate operator strings for depth 22 can be downloaded here:

| Type | Number of words, number of FSM states | Branching factor |
|---|---|---|
| idfs2 | 4 strings, 5 states | 2.36762 = sqrt(2.39446 × 2.34108) |
| idfs14 | 18414 strings, 65558 states | 2.24915 = sqrt(2.27542 × 2.22317) |
| idfs22 | 16366272 strings, 61701626 states | 2.16475 = sqrt(2.18836 × 2.1414) |

Experimental results

Using the first two problems from Solving the 24-Puzzle.

The single-threaded solver uses plain IDA*, not a hybrid search. See log file: duplicate-nodes-log.txt.

Number of nodes

| problem# | idfs2 | idfs14 | idfs22 |
|---|---|---|---|
| 1 | 58,644,915,709 | 23,045,422,923 (39.29%) | 14,531,111,159 (24.78%) |
| 2 | 21,770,762,064 | 7,842,119,256 (36.02%) | 5,089,706,902 (23.38%) |

Runtime

| problem# | idfs2 | idfs14 | idfs22 |
|---|---|---|---|
| 1 | 7,029.19s | 2,977.23s (42.3%) | 2,527.94s (36.0%) |
| 2 | 2,447.82s | 956.079s (39.0%) | 870.295s (35.6%) |

Nodes per second

| fsm type | nodes per second |
|---|---|
| idfs2 | 8,485,342.7 nodes/s (80,415,677,773 / 9,477.01) |
| idfs14 | 7,852,813.5 nodes/s (30,887,542,179 / 3,933.309) |
| idfs22 | 5,773,826.1 nodes/s (19,620,818,061 / 3,398.235) |

These results show that as the state machine grows in number of states, the nodes per second decreases, probably due to cache locality. However, the decrease in the branching factor drastically decreases the number of the nodes that need to be expanded in the search.

In these two problems, idfs22 outperformed idfs14; however, there will be easier problems where idfs14 performs better.

Algorithm to find the duplicate operator strings

The technique suggested in the paper uses a BFS to explore a larger version of the puzzle. For example, for the 24-puzzle (5x5), you would do a BFS on the 9x9 puzzle so that all paths would be found. In the search, paths where the bounding box of the blank tile is larger than 5 horizontally or vertically are excluded. For example, lrrrrr cannot occur in the 24-puzzle, but can occur on the 9x9 grid.

By keeping track of the board states that have been visited before, we can exclude operator strings that revisit a known board state. We must ensure that the two board states can occur for every starting position of the blank tile. Taylor describes how to check this "operator precondition".

To deal with this situation, a routine was written to test the "bounding box" of the actions of a pair of strings, A and B. B can be a duplicate if it is a match of A, and A has a bounding box contained within or identical to that of B.

For example:

When the blank tile begins at position 1 as in the video, then one of these operator strings can be considered a duplicate. However, if the blank tile begins in position 0, only the right operator string is valid, so there is no duplicate operator string in this case.

So to consider either of these as a duplicate operator is wrong, and could result in missing the optimal solution.

How to handle running out of memory

When using BFS and keeping track of seen board states, memory is quickly exhausted.

A key observation - if we have a list of operator strings of length n-1, and we know all shortest paths of length n for a given board state, we can make a function that will correctly choose the same subset of these shortest paths to be considered duplicate operators.

For this set of operator strings, we want to split it into two groups - a set of permitted operators, and a set of forbidden operators. We then must not violate the following constraint:

For each forbidden operator, there must be at least one permitted operator of length less than or equal to it, whose bounding box is a subrange of the forbidden operator's.

To compute this, we look at all 2-partitions of the set of operators for a given board state, and we have a sorting function that always determines the same “best” 2-partition to use.

Since the forbidden operator strings we are computing form a set, it does not matter if we process the same board state more than once! Because of this, an iterative DFS can be used to find all board states, and all paths to each board state, of a given depth. When searching at depth $n$, an FSM with operator strings of length $n-1$ is used to find new operator strings of length $n$. When memory is running low, a clean-up process can begin: a DFS of depth $n$ which does not track any more board states, and only finds paths to known board states with paths of length $n$. This clean-up process will find new duplicate operators, and remove the board states from memory, since the duplicate operators of these board states have already been computed.

In this way, we can compute very long duplicate operators, at the cost of significantly more computation. When the clean-up process is used, the effective branching factor is squared, so the search cost moves from $O(2.36^n)$ to $O(5.57^n)$. The memory requirement is reduced, and is now bounded by the number of paths to a board state of the same length, which is minuscule up to depth 22.

Improving the scoring function

A naive scoring function is to count the number of possible starting positions of the blank tile for a forbidden operator. For example, a 2x2 bounding box in a 5x5 grid will have 16 possible starting positions. When scoring a 2-partition, we can take the sum of the number of possible starting positions of each forbidden operator in the forbidden operator partition.
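For a bounding box of width $w$ and height $h$ on a $g \times g$ grid, the count is simply $(g-w+1)(g-h+1)$; a minimal sketch (hypothetical helper name):

```cpp
// Naive score: number of starting positions of the blank tile for which a
// forbidden operator's bboxW x bboxH bounding box fits on the grid.
int numStartPositions(int bboxW, int bboxH, int gridW, int gridH) {
    return (gridW - bboxW + 1) * (gridH - bboxH + 1);
}
```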

I could not find any improvement on this scoring function. An idea that was explored was looking at the sum of the probabilities of the starting positions of the blank tile. However this did not produce a different scoring order.

The ordering used was the sum of the tiles, with a fallback to lexicographic ordering when two operators had the same score. Note that changing the fallback ordering results in a different set of forbidden operator strings, which suggests there may be a scoring function that produces better run times.

FSM Structural Improvement

The first four duplicate operator strings are lr, rl, ud, du. The basic Aho-Corasick structure is represented like:

This is a trie whose nodes each have four children and a boolean marking whether the node is a word. Our implementation does not need a failure function because we populate every edge of the trie; it is not space-efficient to use a double-array trie because our character set is small.

So the size of a trie node is four ints plus a boolean, which is 17 bytes per word:

struct TrieNode
{
    int children[4];
    bool isWord;
};
TrieNode trie[NUM_WORDS];

We can improve on this by noticing that we never visit word nodes. When the IDA* search detects a word through the FSM, it does not search that branch. So we can replace every word in the trie with a dummy integer value.

In this example, there are 5 states from the original 9 states. This is also faster because checking if a child is a word has one less indirection:

// old structure
bool isChildAWord = trie[node.children[0]].isWord;

// new structure
bool isChildAWord = node.children[0] != WORD_STATE;

The amount of space used in this representation is now 16 bytes per word, and there are now fewer states. For a dictionary of 178302 strings, the state reduction is from 832012 to 653710 states, which is about 21.4% smaller.
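To make the collapsed representation concrete, here is an illustrative hand-built version of the 5-state FSM for {lr, rl, ud, du} (the `WORD_STATE` sentinel value and alphabet order u/d/l/r are assumptions for this sketch, not the repo's constants):

```cpp
#include <vector>

constexpr int WORD_STATE = -1;  // sentinel child: forbidden string completed

struct CompactNode { int children[4]; };  // 16 bytes, no isWord flag

// State 0 = root; states 1..4 = "last move was u/d/l/r" (u=0,d=1,l=2,r=3).
// Since every move leads to a single-character suffix, the default edge for
// move c is state c+1; the four inverse-move edges become WORD_STATE.
std::vector<CompactNode> buildDepth2Trie() {
    std::vector<CompactNode> trie(5);
    for (int c = 0; c < 4; ++c) trie[0].children[c] = c + 1;
    for (int s = 1; s <= 4; ++s)
        for (int c = 0; c < 4; ++c) trie[s].children[c] = c + 1;
    trie[1].children[1] = WORD_STATE;  // u then d
    trie[2].children[0] = WORD_STATE;  // d then u
    trie[3].children[3] = WORD_STATE;  // l then r
    trie[4].children[2] = WORD_STATE;  // r then l
    return trie;
}
```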

Solving the 24-Puzzle
2021-11-25 · https://domslee.com/2021/11/25/solving-the-24-puzzle

See papers:

In the 1996 paper, 10 instances of the 24-puzzle problem are looked at. Nine of ten of those instances are solved in the paper, with a lower bound for the tenth being established.

I have implemented a solver that can solve all 10 instances. This can be reproduced by running ./run from this commit 4f2a1be.

| Instance | Nodes generated¹ | Time taken (s) | Optimal solution |
|---|---|---|---|
| 17 1 20 9 16 2 22 19 14 5 15 21 0 3 24 23 18 13 12 7 10 8 6 4 11 | 6,508,216,548 | 168 | 100 |
| 14 5 9 2 18 8 23 19 12 17 15 0 10 20 4 6 11 21 1 7 24 3 16 22 13 | 3,949,323,558 | 176 | 95 |
| 7 13 11 22 12 20 1 18 21 5 0 8 14 24 19 9 4 17 16 10 23 15 3 2 6 | 74,382,671,211 | 3172 | 108 |
| 18 14 0 9 8 3 7 19 2 15 5 12 1 13 24 23 4 21 10 20 16 22 11 6 17 | 75,528,769,943 | 1928 | 98 |
| 2 0 10 19 1 4 16 3 15 20 22 9 6 18 5 13 12 21 8 17 23 11 24 7 14 | 127,858,033,287 | 5536 | 101 |
| 16 5 1 12 6 24 17 9 2 22 4 10 13 18 19 20 0 23 7 21 15 11 8 3 14 | 373,302,608,938 | 8572 | 96 |
| 21 22 15 9 24 12 16 23 2 8 5 18 17 7 10 14 13 4 0 6 20 11 3 1 19 | 330,453,737,334 | 17656 | 104 |
| 6 0 24 14 8 5 21 19 9 17 16 20 10 13 2 15 11 22 1 3 7 23 4 18 12 | 68,097,975,369 | 3146 | 97 |
| 3 2 17 0 14 18 22 19 15 20 9 7 10 21 16 6 24 23 8 5 1 4 11 12 13 | 492,114,567,189 | 24793 | 113 |
| 23 14 0 24 17 9 20 21 2 18 10 13 22 1 3 11 4 16 6 5 7 12 8 15 19 | 10,172,974,643,392 | 422230 | 114 |

¹ The nodes generated do not include the last iteration, so these values are in reality a bit higher.

Run on Ubuntu 20.04 with a Ryzen 9 3900X CPU and 32GB of RAM.

The solution to the tenth instance, using a demo from Michael Kim:

The demo can be reproduced using these strings:

To generate the board from a reverse board (empty on top left):
L U R U L L D R D R U U L L L D L D R U R U U R R D L D L U L D R D L U L U R U U R D R U R D D D L D L U L L D R U U U U L D D D R U U U R D D D L U U U R D D R D L L U R R U U R D D D D L U U U L L D R D D L U R R U R D D L L

To solve the board (a reverse sequence of above with inverted operators):
R R U U L D L L D R U U L U R R D D D R U U U U L D D L L D R R U L U U L D D D R U U U L D D D L U U U R D D D D L U R R D R U R U U U L D L U L D D L D R D R U L U R D R U R U L L D D L D L U R U R R R D D L U L U R R D L D R

A zip containing the log file of all the results is here.

Differences to the implementation in the Korf & Taylor paper:

  • Better heuristics were used. This implementation uses walking distance and disjoint pattern databases. Michael Kim talks about it in his blog post. These heuristics made some of the problems easier to solve.
  • Uses multiprocessing to utilise 24 threads. A small BFS is used to generate a pool of starting nodes, which are then used in the IDA* search. Threads are allocated groups of starting nodes to solve for each iteration of the IDA* search. This could be generalised in a distributed system, since it is possible to generate a large pool of nodes to start with. For implementation details, see IdaStarMulti.cpp.
  • The FSM used to prune nodes was smaller. The forbidden words dictionary for this implementation was generated from 16 BFS searches of depth 14, which had 10018 words and 33451 states. Korf & Taylor generated a structure with 619,000 states, so this structure would have pruned many more nodes.
High score tetris AI using branching algorithms
2020-04-12 · https://domslee.com/2020/04/12/high-score-tetrisai-improved

This post will explore how branching algorithms can improve the performance of agents playing tetris without gravity.

The techniques used are similar to Nintendo Tetris AI Revisited by Meat Fighter. It may be worth reading his post first, as this post builds upon his work.

PSO configuration

We define an evaluate function as a heuristic that gives a score to a tetris board and block type (e.g. I-PIECE). evaluate is calculated by taking the dot product of a feature vector and a weights vector of the same size, i.e. $\text{score} = W_v \cdot F_v$. PSO was used to find optimal weights for the weights vector of evaluate. The feature vector and the PSO setup used are similar to those of Meat Fighter's post. Note that the branching algorithm is not incorporated in the PSO training phase; evaluate is used directly.
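The dot-product form of evaluate amounts to this minimal sketch (illustrative only; the real feature extraction is omitted):

```cpp
#include <cstddef>
#include <vector>

// score = W . F: a weighted linear combination of board features.
double evaluate(const std::vector<double> &W, const std::vector<double> &F) {
    double score = 0;
    for (std::size_t i = 0; i < W.size(); ++i) score += W[i] * F[i];
    return score;
}
```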

Notable differences include:

  • Using global PSO from pyswarms, probably with different parameters
    • {'c1': 2.5, 'c2': 2.5, 'w': 0.92}
    • num particles: 96
    • num iterations: 3600
  • Removing lock_height from the evaluation function

Our configuration for each particle is as follows:

  • each particle plays 150 games using the same piece sets
  • the score for a particle is the average of the top 50% of games
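The particle fitness described above can be sketched as (a hypothetical `particleScore` helper, assuming higher game scores are better):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// A particle's fitness: the mean of the top 50% of its game scores
// (150 games per particle in this post's setup).
double particleScore(std::vector<double> scores) {
    std::sort(scores.begin(), scores.end(), std::greater<double>());
    std::size_t k = std::max<std::size_t>(1, scores.size() / 2);
    double sum = 0;
    for (std::size_t i = 0; i < k; ++i) sum += scores[i];
    return sum / k;
}
```

Averaging only the best half reduces the influence of unlucky piece sets on the fitness signal.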

Branching algorithm description

The branching algorithm is inspired by alpha-beta pruning algorithms, except with only one agent. The inputs are a board and the next two block types from the piece set (call these $p_1$ and $p_2$). The idea is to search promising branches to find the best placement for $p_1$.

Bfs algorithm

Define the following methods:

  • findAllMoves(board, blockType): finds all moves (a.k.a. PieceInfo) given a board and a blockType
  • evaluate(board, PieceInfo): uses predefined weights to give a score for the game state

Firstly, if there is a move that results in a tetris with $p_1$, select that move.

If not, observe that we can apply these functions two times each to find an ideal position for $p_1$, using just $p_1$ and $p_2$. This results in a tree of height 2, where we take the branch whose leaf node has the best evaluation. This is the simplest way of utilising $p_1$ and $p_2$, and describes what is used as Bfs in the results.
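The depth-2 lookahead can be sketched abstractly (boards reduced to ints here; `movesP1`, `movesP2` and `evaluate` are stand-ins for `findAllMoves` + `applyPiece` on the real engine): keep the first-piece placement whose best second-piece follow-up evaluates highest.

```cpp
#include <functional>
#include <vector>

int bestFirstMove(int board,
                  const std::function<std::vector<int>(int)> &movesP1,
                  const std::function<std::vector<int>(int)> &movesP2,
                  const std::function<double(int)> &evaluate) {
    int best = -1;
    double bestScore = -1e18;
    for (int b1 : movesP1(board)) {        // place p1
        for (int b2 : movesP2(b1)) {       // then place p2
            double s = evaluate(b2);       // score the leaf
            if (s > bestScore) { bestScore = s; best = b1; }
        }
    }
    return best;
}
```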

Bfs+branching algorithm

For more predictive power, we aim to maximise the expected evaluation of a partial search tree. To do this, we sum, over block types $t$, the evaluation if we had block type $t$ multiplied by the probability $P(t)$ based on the piece RNG algorithm. Adding a depth parameter $d$ to make the results computable, the recursive relation looks like:

$$\texttt{evalBoard}(b, d) = \sum_{t \in T}{P(t)\cdot\texttt{evalGivenBlockType}(b, t, d)}$$
$$\text{where } T \text{ is the set of block types}$$
$$\texttt{evalGivenBlockType}(b, t, d) = \begin{cases} \max\limits_{p\in M(b, t)}{\texttt{evaluate}(b, p)} & \text{if } d = 0 \\ \max\limits_{p\in M(b, t)}\texttt{evalBoard}(\texttt{applyPiece}(b, p), d-1) & \text{otherwise} \end{cases}$$
$$\text{where } M = \texttt{findAllMoves}$$

From this, we see that the branching factor for evalGivenBlockType calls is the number of block types (7) multiplied by the number of moves from findAllMoves (typically 10-20). To mitigate this, we define a pruning factor $n_1$, and only continue the search in the top $n_1$ scores, bringing the branching factor down to $7n_1$ (however, evaluate still needs to be called on all of these states to establish which should be pruned).

An implementation detail is that evaluate adds an offset of -1e9 when a tetris is made. Slight modifications are made to ensure that this offset is carried through to the leaf node of the tree, to promote aggressive decisions.

With pruning on $p_1$ and $p_2$ included (prune factors of 3 for the first layer, 2 for subsequent layers), only one extra layer could be computed in a reasonable time (132x slower than no branching). Profiling reveals that the ratio of evaluate to findAllMoves is about the same between bfs and bfs+branching (73:19 and 78:16 respectively), and the number of calls to evaluate and findAllMoves is 154x and 146x higher respectively.

Bfs+branching visualisation

Below is a graph that visualises the branching factor. The numbers on the edges indicate how many children the parent node has. $s_1$ and $s_2$ are special because their block types are known. Note that, for each rectangle node, all the possible moves result in a call to evaluate, either for pruning or for finding the best value in the leaf node. This explains the number of calls to evaluate despite the selective search (and therefore why it is much slower to compute).

Branching graph

Results

After training is finished, a different set of 150 games was used to see how the bot performs. This was run with two configurations, with and without branching, using an identical state evaluation function evaluate:

| name | median | mean | maxout% |
|---|---|---|---|
| Bfs | 1066910 | 1053015 | 81.33 |
| Bfs+branching | 1230460 | 1193441 | 94.67 |

Note that the highest score that can be achieved is 1542000. Raw data: bfs, bfs_branch.csv

We can also compare the probability of a bot achieving a certain score:

From this data we draw the following conclusions:

  • 96% of the time, bfs+branching is superior
  • On average, bfs+branching achieves an additional 140426 points (or a 13% improvement)
  • Notably, bfs+branching is 132x less efficient than bfs (913.5s vs 6.9s to compute)

Possible improvements

There are a few things that can be done to improve the performance of the agent:

  • Adjusting the parameters
  • Adding or finding more features
  • Using a neural network instead of a linear combination of parameters is an idea: it can allow one feature to have more or less weight depending on another feature. For example, the agent should play more safely when the pile height is high or the board is too jagged

There are a few improvements that will be possible with better hardware:

  • Training with a larger dataset
  • Training with the branching algorithm instead of evaluate

Ultimately I'm satisfied with the results: a 13% improvement is significant, especially considering it is approaching the theoretical maximum score. It would be nice if it were less computationally expensive. There is potential for offloading the evaluation function (which is 75% of the processing) to the GPU, which would be a substantial improvement.
