engcode - binary2english

Motivation

Words, even unfamiliar ones, are easier to remember than numbers or cryptic strings.

Algorithm

Each word represents its 0-based index in the alphabetically ordered word list. Assuming that the word list has 2^k entries, encoding binary data can be done as follows:

Break the input into groups of k bits.
Interpret each group as an unsigned k-bit integer where the most significant bit is the first one. Lets denote this value n.
Output nth word from the word list.

Decoding is similar:

For each word in the input, find its position in the word list
Encode it as an unsigned k-bit integer (MSB first again)
Concatenate values from all groups

Implementations of the described procedures for k=18 are in engcode.py and engdecode.py. In the case of k=18, 9 bytes can be encoded using 4 words. There is a much more efficient encoder implemented in engcode.c that uses this fact. Decoding could also be implemented faster, for example using a precalculated prefix trie of words with their numbers in leaves.

The bits left over

If the number of bits to be encoded is not divisble by k, the missing at the end are assumed to be 0. This means that encodings of 0b1 and 0b10 are indistinguishable. To avoid this, one could either transmit the length separately or use a padding scheme as for block ciphers.

The word list

Based on Yet Another Word List by Mendel Cooper and Alan Beale from 12-Oct-2008, a list of 2^18 words was generated by selecting 2^18 shortest words and breaking the ties in favor of including words earlier in the alphabetical ordering. The resulting list starts with words aa and aah, ends with zyzzyvas and zzz and when delimited with newlines has an MD5 checksum of 4e0766283a0b1892b2ecbc28691e64b7. The script used is prepare.sh.

Legal...

Yet Another Word List was released to the public domain and so is the derived list and this specification. The code accompanying it is GPL3.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.gitignore		.gitignore
engcode.c		engcode.c
engcode.py		engcode.py
engdecode.py		engdecode.py
prepare.sh		prepare.sh
preptable.c		preptable.c
readme.md		readme.md
test.sh		test.sh
testvectors.txt		testvectors.txt
words.txt		words.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

engcode - binary2english

Motivation

Algorithm

The bits left over

The word list

Legal...

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

engcode - binary2english

Motivation

Algorithm

The bits left over

The word list

Legal...

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages