Re: Factor

Standard Deviation

Wed, 04 Feb 2026 08:00:00 -0700

Standard deviation is “a measure of the amount of variation of the values of a variable about its mean.” It’s a useful measure from statistics, and it is available as std in the math.statistics vocabulary included with Factor.

I bumped into this – um, are they still called tweets? – today:

Word with the lowest standard deviation of letter position in the alphabet, for each length pic.twitter.com/caHYDDpyBx
— Adam Aaronson (@aaaronson) February 4, 2026

Of course, I wondered how that looks for the /usr/share/dict/words on my computer:

IN: scratchpad "/usr/share/dict/words" utf8 file-lines
               [ length ] collect-by
               [ [ std ] minimum-by ] assoc-map
               sort-keys values
               [ dup std "%s: %s\n" printf ] each
A: 0.0
aa: 0.0
aba: 0.5773502691896257
baba: 0.5773502691896257
abaca: 0.8944271909999159
bacaba: 0.816496580927726
deedeed: 0.5345224838248488
poroporo: 1.3093073414159542
susurrous: 1.9364916731037085
beefheaded: 1.9578900207451218
cabbagehead: 2.46429041972071
fiddledeedee: 2.424621182533032
promonopolist: 2.911075226502911
monogonoporous: 3.1830595551871363
prophototropism: 3.5023801430836525
philophilosophos: 3.855731664245668
sulphophosphorous: 4.227013964824759
chemicoengineering: 4.556472090169843
plutonometamorphism: 5.220293285659733
encephalomeningocele: 4.871776937249466
philosophicoreligious: 5.014740177478695
philosophicohistorical: 5.485320828241917
philosophicotheological: 5.2090678006509625
scientificophilosophical: 5.538894097749744
antidisestablishmentarianism: 6.7081053397123425
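
For comparison, here is a minimal Python sketch of the same computation. It assumes the sample standard deviation (dividing by n - 1), which is what the values above are consistent with, and it treats single-letter words as 0.0. The word list is a tiny made-up sample rather than a real dictionary, and the function names are mine:

```python
from statistics import stdev

def letter_std(word):
    """Sample standard deviation of the character codes in a word."""
    codes = [ord(ch) for ch in word]
    # stdev() needs at least two data points, so report 0.0 for one letter.
    return stdev(codes) if len(codes) > 1 else 0.0

def lowest_std_by_length(words):
    """Map each word length to the word with the lowest letter_std."""
    best = {}
    for word in words:
        n = len(word)
        if n not in best or letter_std(word) < letter_std(best[n]):
            best[n] = word
    return dict(sorted(best.items()))

# Hypothetical sample input; swap in the lines of a real word list.
sample = ["aa", "ab", "aba", "abc", "baba", "abaca", "zebra"]
for word in lowest_std_by_length(sample).values():
    print(f"{word}: {letter_std(word)}")
```

Like the Factor version, this works on raw character codes, so upper and lower case letters are not folded together.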

Interesting, both similar and different. The results are pretty close, and it’s also pretty obvious that we are using slightly different dictionaries. I’m not sure what deedeed or poroporo mean, and they aren’t in the SCRABBLE Players Dictionary.

Anyway, fun!

PBRT

Tue, 03 Feb 2026 08:00:00 -0700

PBRT is an impressive photorealistic rendering system:

From movies to video games, computer-rendered images are pervasive today. Physically Based Rendering introduces the concepts and theory of photorealistic rendering hand in hand with the source code for a sophisticated renderer.

The fourth edition of their book is now available on Amazon as well as freely available online.

I thought it would be fun to explore the PBRT v4 file format using Factor.

Here’s a short example pbrt file from their website:

LookAt 3 4 1.5  # eye
       .5 .5 0  # look at point
       0 0 1    # up vector
Camera "perspective" "float fov" 45

Sampler "halton" "integer pixelsamples" 128
Integrator "volpath"
Film "rgb" "string filename" "simple.png"
     "integer xresolution" [400] "integer yresolution" [400]

WorldBegin

# uniform blue-ish illumination from all directions
LightSource "infinite" "rgb L" [ .4 .45 .5 ]

# approximate the sun
LightSource "distant"  "point3 from" [ -30 40  100 ]
   "blackbody L" 3000 "float scale" 1.5

AttributeBegin
  Material "dielectric"
  Shape "sphere" "float radius" 1
AttributeEnd

AttributeBegin
  Texture "checks" "spectrum" "checkerboard"
          "float uscale" [16] "float vscale" [16]
          "rgb tex1" [.1 .1 .1] "rgb tex2" [.8 .8 .8]
  Material "diffuse" "texture reflectance" "checks"
  Translate 0 0 -1
  Shape "bilinearmesh"
      "point3 P" [ -20 -20 0   20 -20 0   -20 20 0   20 20 0 ]
      "point2 uv" [ 0 0   1 0    1 1   0 1 ]
AttributeEnd

And this is what it might look like:

Using our new pbrt vocabulary, we can convert that text into a set of tuples that we could do computations on, or potentially look into rendering or processing. And, of course, it also supports round-tripping back and forth from text to tuples.

{
    T{ pbrt-look-at
        { eye-x 3 }
        { eye-y 4 }
        { eye-z 1.5 }
        { look-x 0.5 }
        { look-y 0.5 }
        { look-z 0 }
        { up-x 0 }
        { up-y 0 }
        { up-z 1 }
    }
    T{ pbrt-camera
        { type "perspective" }
        { params
            {
                T{ pbrt-param
                    { type "float" }
                    { name "fov" }
                    { values { 45 } }
                }
            }
        }
    }
    T{ pbrt-sampler
        { type "halton" }
        { params
            {
                T{ pbrt-param
                    { type "integer" }
                    { name "pixelsamples" }
                    { values { 128 } }
                }
            }
        }
    }
    T{ pbrt-integrator { type "volpath" } { params { } } }
    T{ pbrt-film
        { type "rgb" }
        { params
            {
                T{ pbrt-param
                    { type "string" }
                    { name "filename" }
                    { values { "simple.png" } }
                }
                T{ pbrt-param
                    { type "integer" }
                    { name "xresolution" }
                    { values { 400 } }
                }
                T{ pbrt-param
                    { type "integer" }
                    { name "yresolution" }
                    { values { 400 } }
                }
            }
        }
    }
    T{ pbrt-world-begin }
    T{ pbrt-light-source
        { type "infinite" }
        { params
            {
                T{ pbrt-param
                    { type "rgb" }
                    { name "L" }
                    { values { 0.4 0.45 0.5 } }
                }
            }
        }
    }
    T{ pbrt-light-source
        { type "distant" }
        { params
            {
                T{ pbrt-param
                    { type "point3" }
                    { name "from" }
                    { values { -30 40 100 } }
                }
                T{ pbrt-param
                    { type "blackbody" }
                    { name "L" }
                    { values { 3000 } }
                }
                T{ pbrt-param
                    { type "float" }
                    { name "scale" }
                    { values { 1.5 } }
                }
            }
        }
    }
    T{ pbrt-attribute-begin }
    T{ pbrt-material { type "dielectric" } { params { } } }
    T{ pbrt-shape
        { type "sphere" }
        { params
            {
                T{ pbrt-param
                    { type "float" }
                    { name "radius" }
                    { values { 1 } }
                }
            }
        }
    }
    T{ pbrt-attribute-end }
    T{ pbrt-attribute-begin }
    T{ pbrt-texture
        { name "checks" }
        { value-type "spectrum" }
        { class "checkerboard" }
        { params
            {
                T{ pbrt-param
                    { type "float" }
                    { name "uscale" }
                    { values { 16 } }
                }
                T{ pbrt-param
                    { type "float" }
                    { name "vscale" }
                    { values { 16 } }
                }
                T{ pbrt-param
                    { type "rgb" }
                    { name "tex1" }
                    { values { 0.1 0.1 0.1 } }
                }
                T{ pbrt-param
                    { type "rgb" }
                    { name "tex2" }
                    { values { 0.8 0.8 0.8 } }
                }
            }
        }
    }
    T{ pbrt-material
        { type "diffuse" }
        { params
            {
                T{ pbrt-param
                    { type "texture" }
                    { name "reflectance" }
                    { values { "checks" } }
                }
            }
        }
    }
    T{ pbrt-translate { x 0 } { y 0 } { z -1 } }
    T{ pbrt-shape
        { type "bilinearmesh" }
        { params
            {
                T{ pbrt-param
                    { type "point3" }
                    { name "P" }
                    { values
                        { -20 -20 0 20 -20 0 -20 20 0 20 20 0 }
                    }
                }
                T{ pbrt-param
                    { type "point2" }
                    { name "uv" }
                    { values { 0 0 1 0 1 1 0 1 } }
                }
            }
        }
    }
    T{ pbrt-attribute-end }
}
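
To give a feel for what the parsing step involves, here is a rough Python sketch of a tokenizer and statement parser for the subset of the format used in the example above. It is not how the pbrt vocabulary is implemented. The Param and Statement classes are hypothetical names, and it assumes that directives are bare words, that parameter declarations are quoted "type name" pairs, and that '#' never appears inside a string:

```python
import re
from dataclasses import dataclass, field

# Quoted strings, brackets, or bare words/numbers.
TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]"]+')

@dataclass
class Param:
    type: str
    name: str
    values: list

@dataclass
class Statement:
    name: str
    args: list = field(default_factory=list)
    params: list = field(default_factory=list)

def tokenize(text):
    tokens = []
    for line in text.splitlines():
        # Naive comment stripping: assumes '#' never appears inside a string.
        tokens.extend(TOKEN.findall(line.split("#", 1)[0]))
    return tokens

def parse_value(token):
    if token.startswith('"'):
        return token[1:-1]
    try:
        return int(token)
    except ValueError:
        return float(token)

def parse(text):
    tokens, statements, i = tokenize(text), [], 0
    while i < len(tokens):
        stmt = Statement(tokens[i])
        i += 1
        # Everything up to the next bare word belongs to this directive.
        while i < len(tokens) and not tokens[i][0].isalpha():
            token = tokens[i]
            if token.startswith('"') and " " in token:
                # A "type name" declaration, then one value or a [ list ].
                type_, name = token[1:-1].split()
                i += 1
                if tokens[i] == "[":
                    end = tokens.index("]", i)
                    values = [parse_value(t) for t in tokens[i + 1:end]]
                    i = end + 1
                else:
                    values = [parse_value(tokens[i])]
                    i += 1
                stmt.params.append(Param(type_, name, values))
            else:
                stmt.args.append(parse_value(token))
                i += 1
        statements.append(stmt)
    return statements

scene = '''
LookAt 3 4 1.5  # eye
       .5 .5 0  # look at point
       0 0 1    # up vector
Camera "perspective" "float fov" 45
Film "rgb" "string filename" "simple.png"
     "integer xresolution" [400] "integer yresolution" [400]
WorldBegin
LightSource "infinite" "rgb L" [ .4 .45 .5 ]
'''
statements = parse(scene)
for stmt in statements:
    print(stmt)
```

Bare-word values such as booleans outside of brackets would confuse this sketch, so a real parser needs to know each directive's expected arguments.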

This is available now in the development version of Factor!

Migrating to GTK3

Wed, 17 Dec 2025 08:00:00 -0700

Factor has a native ui-backend that allows us to render our UI framework using OpenGL on top of platform-specific APIs for our primary targets of Linux, macOS, and Windows.

On Linux, for a long time that has meant using the GTK2 library, which has also meant using X11 and an old library called libgtkglext which provides a way to use OpenGL within GTK windows. Well, Linux has moved on and is now pushing Wayland as the “replacement for the X11 window system protocol and architecture with the aim to be easier to develop, extend, and maintain”. Most modern Linux distributions have moved to GTK3 or GTK4 and abstraction libraries like libepoxy for working with OpenGL and others for supporting both X11 and Wayland renderers.

I was reminded of this after our recent Factor 0.101 release when someone asked the question:

Does that message mean that Factor still relies on GTK2? IIRC it was EOL:ed around 2020.

Well, this is embarrassing – yeah it sure does! Or rather – yes it sure did.

I got motivated to look into what it would take to support GTK3 or GTK4. We had a pull request that was working through adding support for GTK4. After merging that, and modifying it to also provide GTK3 support, I re-discovered that our OpenGL rendering was generally using OpenGL 1.x pipelines and that would not work in a GTK3+ world.

So, after adding OpenGL 3.x support for most of the things our user interface needs, and migrating from GTK 2.x to GTK3, we now have experimental nightly builds using the GTK3 backend:

You can revert to the older GTK2 backend by applying this diff and then performing a fresh bootstrap:

diff --git a/basis/bootstrap/ui/ui.factor b/basis/bootstrap/ui/ui.factor
index 2974e530f9..416704ce29 100644
--- a/basis/bootstrap/ui/ui.factor
+++ b/basis/bootstrap/ui/ui.factor
@@ -12,6 +12,6 @@ IN: bootstrap.ui
     {
         { [ os macos? ] [ "ui.backend.cocoa" ] }
         { [ os windows? ] [ "ui.backend.windows" ] }
-        { [ os unix? ] [ "ui.backend.gtk3" ] }
+        { [ os unix? ] [ "ui.backend.gtk2" ] }
     } cond
 ] if* require
diff --git a/basis/opengl/gl/extensions/extensions.factor b/basis/opengl/gl/extensions/extensions.factor
index 2d408e93bb..51394eeb4a 100644
--- a/basis/opengl/gl/extensions/extensions.factor
+++ b/basis/opengl/gl/extensions/extensions.factor
@@ -7,7 +7,7 @@ ERROR: unknown-gl-platform ;
 << {
     { [ os windows? ] [ "opengl.gl.windows" ] }
     { [ os macos? ]  [ "opengl.gl.macos" ] }
-    { [ os unix? ] [ "opengl.gl.gtk3" ] }
+    { [ os unix? ] [ "opengl.gl.gtk2" ] }
     [ unknown-gl-platform ]
 } cond use-vocab >>

It seems like the newer OpenGL 3.x functions might introduce some lag which is visible when scrolling on some installations, perhaps by not caching certain things that were cached in the OpenGL 1.x code paths. There will need to be some improvements before we are ready to release it, but it is plenty usable as-is.

I also migrated our macOS backend to use the OpenGL 3.x functions as well to allow us to more broadly test and improve these new rendering paths.

This is available in the latest development version.

DNS LOC Records

Wed, 10 Dec 2025 08:00:00 -0700

DNS is the Domain Name System and is the backbone of the internet:

Most prominently, it translates readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols. The Domain Name System has been an essential component of the functionality of the Internet since 1985.

It is also an oft-cited reason for service outages, with a funny decade-old r/sysadmin meme:

Factor has a DNS vocabulary that supports querying and parsing responses from nameservers:

IN: scratchpad USE: tools.dns

IN: scratchpad "google.com" host
google.com has address 142.250.142.113
google.com has address 142.250.142.138
google.com has address 142.250.142.100
google.com has address 142.250.142.101
google.com has address 142.250.142.102
google.com has address 142.250.142.139
google.com has IPv6 address 2607:f8b0:4023:1c01:0:0:0:8b
google.com has IPv6 address 2607:f8b0:4023:1c01:0:0:0:8a
google.com has IPv6 address 2607:f8b0:4023:1c01:0:0:0:64
google.com has IPv6 address 2607:f8b0:4023:1c01:0:0:0:65
google.com mail is handled by 10 smtp.google.com

Recently, I bumped into an old post on the Cloudflare blog about The weird and wonderful world of DNS LOC records and realized that we did not properly support parsing RFC 1876, which specifies a format for returning a LOC (location) record that describes the physical location of a service.

At the time of the post, Cloudflare indicated they handle “millions of DNS records; of those just 743 are LOCs.” I found a webpage that lists sites supporting DNS LOC and contains only nine examples.

It is not widely used, but it is very cool.

You can use the dig command to query for a LOC record and see what is returned:

$ dig alink.net LOC
alink.net.              66      IN      LOC     37 22 26.000 N 122 1 47.000 W 30.00m 30m 30m 10m

The fields that were returned include:

latitude (37° 22’ 26.00" N)
longitude (122° 1’ 47.00" W)
altitude (30.00m)
entity size estimate (30m)
horizontal precision (30m)
vertical precision (10m)

In Factor 0.101, the field is available and returned as bytes but not parsed:

IN: scratchpad "alink.net" dns-LOC-query answer-section>> ...
{
    T{ rr
        { name "alink.net" }
        { type LOC }
        { class IN }
        { ttl 300 }
        { rdata
            B{
                0 51 51 19 136 5 2 80 101 208 181 8 0 152 162 56
            }
        }
    }
}

Of course, I love odd uses of technology like Wikipedia over DNS and I thought Factor should probably add proper support for the LOC record!

First, we define a tuple class to hold the LOC record fields:

TUPLE: loc size horizontal vertical lat lon alt ;

Next, we parse the LOC record, converting sizes (in centimeters), lat/lon (in degrees), and altitude (in centimeters):

: parse-loc ( -- loc )
    loc new
        read1 0 assert=
        read1 [ -4 shift ] [ 4 bits ] bi 10^ * >>size
        read1 [ -4 shift ] [ 4 bits ] bi 10^ * >>horizontal
        read1 [ -4 shift ] [ 4 bits ] bi 10^ * >>vertical
        4 read be> 31 2^ - 3600000 / >>lat
        4 read be> 31 2^ - 3600000 / >>lon
        4 read be> 10000000 - >>alt ;
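
As a cross-check, here is the same decoding sketched in Python and applied to the 16 bytes returned for alink.net above. It follows the RFC 1876 wire layout: a version byte, three size bytes with the base in the high nibble and a power-of-ten exponent in the low nibble, then three big-endian 32-bit integers. The function and key names are mine:

```python
import struct
from fractions import Fraction

def decode_size(byte):
    """High nibble is the base, low nibble the power of ten (centimeters)."""
    return (byte >> 4) * 10 ** (byte & 0x0F)

def parse_loc(rdata):
    version, size, horiz, vert, lat, lon, alt = struct.unpack("!BBBBIII", rdata)
    if version != 0:
        raise ValueError(f"unsupported LOC version {version}")
    return {
        "size_cm": decode_size(size),
        "horizontal_cm": decode_size(horiz),
        "vertical_cm": decode_size(vert),
        # Thousandths of an arc second, offset so that 2**31 is the
        # equator (latitude) or the prime meridian (longitude).
        "lat_deg": Fraction(lat - 2**31, 3_600_000),
        "lon_deg": Fraction(lon - 2**31, 3_600_000),
        # Centimeters, offset so that 10,000,000 is the reference altitude.
        "alt_cm": alt - 10_000_000,
    }

rdata = bytes([0, 51, 51, 19, 136, 5, 2, 80, 101, 208, 181, 8, 0, 152, 162, 56])
loc = parse_loc(rdata)
print(loc)
```

The three size bytes decode to 3000, 3000, and 1000 centimeters, which lines up with the 30m 30m 10m at the end of the dig output.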

We hook up the LOC type to be parsed properly:

M: LOC parse-rdata 2drop parse-loc ;

And then build a word to print the location nicely:

: LOC. ( name -- )
    dns-LOC-query answer-section>> [
        rdata>> {
            [ lat>> [ abs 1 /mod 60 * 1 /mod 60 * ] [ neg? "S" "N" ? ] bi ]
            [ lon>> [ abs 1 /mod 60 * 1 /mod 60 * ] [ neg? "W" "E" ? ] bi ]
            [ alt>> 100 / ]
            [ size>> 100 /i ]
            [ horizontal>> 100 /i ]
            [ vertical>> 100 /i ]
        } cleave "%d %d %.3f %s %d %d %.3f %s %.2fm %dm %dm %dm\n" printf
    ] each ;
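
The degrees, minutes, and seconds arithmetic in that word can be sketched in Python like this. It is a rough equivalent rather than a translation: it does the split in integer milliseconds of arc to avoid floating-point drift, and the function names are made up for illustration:

```python
def to_dms(degrees, positive, negative):
    """Format signed decimal degrees as 'D M S.sss H'."""
    hemisphere = negative if degrees < 0 else positive
    total_ms = round(abs(degrees) * 3_600_000)  # milliseconds of arc
    d, rest = divmod(total_ms, 3_600_000)
    m, ms = divmod(rest, 60_000)
    return f"{d} {m} {ms / 1000:.3f} {hemisphere}"

def format_loc(lat, lon, alt_cm, size_cm, horizontal_cm, vertical_cm):
    """Render LOC fields in the same order that dig prints them."""
    return (
        f"{to_dms(lat, 'N', 'S')} {to_dms(lon, 'E', 'W')} "
        f"{alt_cm / 100:.2f}m {size_cm // 100}m "
        f"{horizontal_cm // 100}m {vertical_cm // 100}m"
    )

# Values decoded from the alink.net record shown earlier.
line = format_loc(134_546_000 / 3_600_000, -439_307_000 / 3_600_000,
                  3000, 3000, 3000, 1000)
print(line)
```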

And, finally, we can give it a try!

IN: scratchpad "alink.net" LOC.
37 22 26.000 N 122 1 47.000 W 30.00m 30m 30m 10m

Yay, it matches!

This is available in the latest development version.

Factor 0.101 now available

Mon, 08 Dec 2025 20:00:00 -0700

“Keep thy airspeed up, lest the earth come from below and smite thee.” - William Kershner

I’m very pleased to announce the release of Factor 0.101!

OS/CPU	Windows	Mac OS	Linux
x86	0.101		0.101
x86-64	0.101	0.101	0.101

Source code: 0.101

This release is brought to you with almost 700 commits by the following individuals:

Aleksander Sabak, Andy Kluger, Cat Stevens, Dmitry Matveyev, Doug Coleman, Giftpflanze, John Benediktsson, Jon Harper, Jonas Bernouli, Leo Mehraban, Mike Stevenson, Nicholas Chandoke, Niklas Larsson, Rebecca Kelly, Samuel Tardieu, Stefan Schmiedl, @Bruno-366, @bobisageek, @coltsingleactionarmyocelot, @inivekin, @knottio, @timor

Besides some bug fixes and library improvements, I want to highlight the following changes:

Moved the UI to render buttons and scrollbars rather than using images, which allows easier theming.
Fixed HiDPI scaling on Linux and Windows, although it currently doesn’t update the window settings when switching between screens with different scaling factors.
Update to Unicode 17.0.0.
Plugin support for the Neovim editor.

Some possible backwards compatibility issues:

The argument order to ltake was swapped to be more consistent with words like head.
The environment vocabulary on Windows now supports disambiguating f and "" (empty) values.
The misc/atom folder was removed in favor of the factor/atom-language-factor repo.
The misc/Factor.tmbundle folder was removed in favor of the factor/factor.tmbundle repo.
The misc/vim folder was removed in favor of the factor/factor.vim repo.
The http vocabulary request tuple had a slot rename from post-data to data.
The furnace.asides vocabulary had a slot rename from post-data to data, and might require running ALTER TABLE asides RENAME COLUMN "post-data" TO data;.
The html.streams vocabulary was renamed to io.streams.html.
The pdf.streams vocabulary was renamed to io.streams.pdf.

What is Factor

Factor is a concatenative, stack-based programming language with high-level features including dynamic types, extensible syntax, macros, and garbage collection. On a practical side, Factor has a full-featured library, supports many different platforms, and has been extensively documented.

The implementation is fully compiled for performance, while still supporting interactive development. Factor applications are portable between all common platforms. Factor can deploy stand-alone applications on all platforms. Full source code for the Factor project is available under a BSD license.

New libraries:

base92: adding support for Base92 encoding/decoding
bitcask: implementing the Bitcask key/value database
bluesky: adding support for the BlueSky protocol
calendar.holidays.world: adding some new holidays including World Emoji Day
classes.enumeration: adding enumeration classes and new ENUMERATION: syntax word
colors.oklab: adding support for OKLAB color space
colors.oklch: adding support for OKLCH color space
colors.wavelength: adding wavelength>rgba
combinators.syntax: adding experimental combinator syntax words @[, *[, and &[, and short-circuiting n&&[, n||[, &&[ and ||[
continuations.extras: adding with-datastacks and datastack-states
dotenv: implementing support for Dotenv files
edn: implementing support for Extensible Data Notation
editors.cursor: adding support for the Cursor editor
editors.rider: adding support for the JetBrains Rider editor
gitignore: parser for .gitignore files
http.json: promoted json.http and added some useful words
io.streams.farkup: a Farkup formatted stream protocol
io.streams.markdowns: a Markdown formatted stream protocol
locals.lazy: prototype of emit syntax
monadics: alternative vocabulary for using Haskell-style monads, applicatives, and functors
multibase: implementation of Multibase
pickle: support for the Pickle serialization format
persistent.hashtables.identity: support an identity-hashcode version of persistent hashtables
raylib.live-coding: demo of a vocabulary to do “live coding” of Raylib programs
rdap: support for the Registration Data Access Protocol
reverse: implementation of std::flip
slides.cli: simple text-based command-line interface for slides
tools.highlight: command-line syntax-highlighting tool
tools.random: command-line random generator tool
ui.pens.rounded: adding rounded corner pen
ui.pens.theme: experimental themed pen
ui.tools.theme: some words for updating UI developer tools themes

Improved libraries:

alien.syntax: added C-LIBRARY: syntax word
assocs.extras: added nzip and nunzip, map-zip and map-unzip macros
base32: adding the human-oriented Base32 encoding via zbase32> and >zbase32
base64: minor performance improvement
benchmark: adding more benchmarks
bootstrap.assembler: fixes for ARM-64
brainfuck: added BRAINFUCK: syntax word and interpret-brainfuck
bson: use linked-assocs to preserve order
cache: implement M\ cache-assoc delete-at
calendar: adding year<, year<=, year>, year>= words
calendar.format: parse human-readable and elapsed-time output back into duration objects
cbor: use linked-assocs to preserve order
classes.mixin: added definer implementation
classes.singleton: added definer implementation
classes.tuple: added tuple>slots, rename tuple>array to pack-tuple and >tuple to unpack-tuple.
classes.union: added definer implementation
checksums.sha: some 20-40% performance improvements
command-line: allow passing script name of - to use stdin
command-line.parser: support for Argument Parser Commands
command-line.startup: document -q quiet mode flag
concurrency.combinators: faster parallel-map and parallel-assoc-map using a count-down latch
concurrency.promises: 5-7% performance improvement
continuations: improve docs and fix stack effect for ifcc
countries: adding CQ country code for Sark
cpu.architecture: fix *-branch stack effects
cpu.arm: fixes for ARM-64
crontab: added parse-crontab which ignores blank lines and comments
db: making query-each row-polymorphic
delegate.protocols: adding keys and values to assoc-protocol
discord: better support for network disconnects, added a configurable retry interval
discord.chatgpt-bot: some fixes for LM Studio
editors: make the editor restart nicer looking
editors.focus: support open-file-to-line-number on newer releases, support Linux and Windows
editors.zed: support use of Zed on Linux
endian: faster endian conversions of c-ptr-like objects
environment: adding os-env?
eval: move datastack and error messages to stderr
fonts: make <font> take a name, easier defaults
furnace.asides: rename post-data slot on aside tuples to data
generalizations: moved some dip words to shuffle
help.tour: fix some typos/grammar
html.templates.chloe: improve use of CDATA tags for unescaping output
http: rename post-data slot on request tuples to data
http.json: adding http-json that doesn’t return the response object
http.websockets: making read-websocket-loop row-polymorphic
ini-file: adding ini>file, file>ini, and use LH{ } to preserve configuration order
io.encodings.detect: adding utf7 detection
io.encodings.utf8: adding utf8-bom to handle optional BOM
io.random: speed up random-line and random-lines
io.streams.ansi: adding documentation and tests, support dim foreground on terminals that support it
io.streams.escape-codes: adding documentation and tests
ip-parser: adding IPV4 and IPV6 network words
kernel: adding until*, fix docs for and* and or*
linked-sets: adding LS{ syntax word
lists.lazy: changed the argument order in ltake
macho: support a few more link edit commands
make: adding ,% for a push-at variant
mason.release.tidy: cleanup a few more git artifacts
math.combinatorics: adding counting words
math.distances: adding jaro-distance and jaro-winkler-distance
math.extras: added all-removals, support Recamán’s sequence, and Tribonacci Numbers
math.factorials: added subfactorial
math.functions: added “closest to zero” modulus
math.parser: improve ratio parsing for consistency
math.primes: make prime? safe from non-integer inputs
math.runge-kutta: make generalized improvements to the Runge-Kutta solver
math.similarity: adding jaro-similarity, jaro-winkler-similarity, and trigram-similarity
math.text.english: fix issue with very large and very small floats
metar: updated the abbreviations glossary
mime.types: updating mime.types file
msgpack: use linked-assocs to preserve order
qw: adding qw: syntax
path-finding: added find-path*
peg.parsers: faster list-of and list-of-many
progress-bars.models: added with-progress-display, map-with-progress-bar, each-with-progress-bar, and reduce-with-progress-bar
raylib: adding trace-log and set-trace-log-level, updated to Raylib 5.5
readline-listener: store history across sessions, support color on terminals that support it
robohash: support for "set4", "set5", and "set6" types
sequences: rename midpoint@ to midpoint, faster each-from and map-reduce on slices
sequences.extras: adding find-nth, find-nth-last, subseq-indices, deep-nth, deep-nth-of, 2none?, filter-errors, reject-errors, all-same?, adjacent-differences, and partial-sum.
sequences.generalizations: fix ?firstn and ?lastn for string inputs, removed (nsequence) which duplicates set-firstn-unsafe
sequences.prefixed: swap order of <prefixed> arguments to match prefix
sequences.repeating: adding <cycles-from> and cycle-from
sequences.snipped: fixed out-of-bounds issues
scryfall: update for duskmourn
shuffle: improve stack-checking of shuffle( syntax, added SHUFFLE: syntax, nreverse
sorting: fix sort-with to apply the quot with access to the stack below
sorting.human: implement human sorting improved
system-info.macos: adding “Tahoe” code-name for macOS 26
terminfo: add words for querying specific output capabilities
threads: define a generalized linked-thread which used to be for concurrency.mailboxes only
toml: use linked-assocs to preserve order, adding >toml and write-toml
tools.annotations: adding <WATCH ... WATCH> syntax
tools.deploy: adding a command-line interface for deploy options
tools.deploy.backend: fix boot image location in system-wide installations
tools.deploy.unix: change binary name to append .out to fix conflict with vocab resources
tools.directory-to-file: better test file metrics, print filename for editing
tools.memory: adding heap-stats-of arbitrary sequence of instances, and total-size size of everything pointed to by an object
txon: use linked-assocs to preserve order
ui: adding adjust-font-size
ui.gadgets.buttons: stop using images and respect theme colors
ui.gadgets.sliders: stop using images and respect theme colors
ui.theme.base16: adding a lot more (270!) Base16 Themes
ui.tools: adding font-sizing keyboard shortcuts
ui.tools.browser: more responsive font sizing
ui.tools.listener: more responsive font sizing, adding some UI listener styling
ui.tools.listener.completion: allow spaces in history search popup
unicode: update to Unicode 17.0.0
webapps.planet: improve CSS for video tags
words: adding define-temp-syntax
zoneinfo: update to version 2025b

Removed libraries:

ui.theme.images

VM Improvements:

More work on ARM64 backend (fix set-callstack, fix generic dispatch)

zxcvbn

Fri, 05 Dec 2025 08:00:00 -0700

Years ago, Dropbox wrote about zxcvbn: realistic password strength estimation:

zxcvbn is a password strength estimator inspired by password crackers. Through pattern matching and conservative estimation, it recognizes and weighs 30k common passwords, common names and surnames according to US census data, popular English words from Wikipedia and US television and movies, and other common patterns like dates, repeats (aaa), sequences (abcd), keyboard patterns (qwertyuiop), and l33t speak.

And it appears to have been successful – the original implementation is in JavaScript, but there have been clones of the algorithm generated in many different languages:

At Dropbox we use zxcvbn (Release notes) on our web, desktop, iOS and Android clients. If JavaScript doesn’t work for you, others have graciously ported the library to these languages:

zxcvbn-python (Python)

zxcvbn-cpp (C/C++/Python/JS)

zxcvbn-c (C/C++)

zxcvbn-rs (Rust)

zxcvbn-go (Go)

zxcvbn4j (Java)

nbvcxz (Java)

zxcvbn-ruby (Ruby)

zxcvbn-js (Ruby [via ExecJS])

zxcvbn-ios (Objective-C)

zxcvbn-cs (C#/.NET)

szxcvbn (Scala)

zxcvbn-php (PHP)

zxcvbn-api (REST)

ocaml-zxcvbn (OCaml bindings for zxcvbn-c)

In today’s era of password managers, WebAuthn (also known as passkeys), and many pwned accounts, passwords may seem like a funny sort of outdated concept. They have definitely provided good entertainment over the years, from XKCD: Password Strength comics to the 20-year-old hunter2 meme:

I have wanted a Factor implementation of this for a long time – and finally built zxcvbn in Factor!

We can use it to check out some potential passwords:

IN: scratchpad USE: zxcvbn

IN: scratchpad "F@ct0r!" zxcvbn.
Score:
  1/4 (very guessable)
Crack times:
  Online (throttled):   4 months
  Online (unthrottled): 8 hours
  Offline (slow hash):  30 seconds
  Offline (fast hash):  less than a second
Suggestions:
  Add another word or two. Uncommon words are better.
  Capitalization doesn't help very much.
  Predictable substitutions like '@' instead of 'a' don't help very much.

IN: scratchpad "john2025" zxcvbn.
Score:
  1/4 (very guessable)
Crack times:
  Online (throttled):   3 months
  Online (unthrottled): 6 hours
  Offline (slow hash):  23 seconds
  Offline (fast hash):  less than a second
Warning:
  Common names and surnames are easy to guess.
Suggestions:
  Add another word or two. Uncommon words are better.

That’s not so good, maybe we should use the random.passwords vocabulary instead!
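
To see how the crack times relate to each other, here is a small Python sketch that turns a guess count into per-scenario times and a 0 to 4 score. The attack rates and score thresholds are what I recall from the reference JavaScript implementation, so treat them as assumptions. The guess count of 300,000 is also hypothetical, picked because it roughly reproduces the first report above:

```python
# Guesses per second for each attack scenario (assumed rates, see above).
RATES = {
    "online_throttled": 100 / 3600,   # 100 guesses per hour
    "online_unthrottled": 10,
    "offline_slow_hash": 1e4,
    "offline_fast_hash": 1e10,
}

def crack_times_seconds(guesses):
    """Seconds needed to try `guesses` passwords in each scenario."""
    return {name: guesses / rate for name, rate in RATES.items()}

def guesses_to_score(guesses):
    """Bucket a guess count into a 0-4 score (assumed thresholds)."""
    delta = 5
    for score, limit in enumerate((1e3, 1e6, 1e8, 1e10)):
        if guesses < limit + delta:
            return score
    return 4

guesses = 300_000  # hypothetical estimate, not output from the vocabulary
print(guesses_to_score(guesses))
for name, seconds in crack_times_seconds(guesses).items():
    print(f"{name}: {seconds:,.5f} seconds")
```

With those assumptions, 300,000 guesses works out to 30 seconds against a slow hash, a little over 8 hours unthrottled online, and about 125 days when throttled.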

This is available on my GitHub.

AsyncIO Performance

Mon, 01 Dec 2025 08:00:00 -0700

Factor has green threads and a long-standing feature request to be able to utilize native threads more efficiently for concurrent tasks. In the meantime, the cooperative threading model allows for asynchronous tasks, which are particularly useful when waiting for I/O, such as with sockets over a computer network.

And while it might be true that asynchrony is not concurrency, there are a lot of other things one could say about concurrency and multi-threaded or multi-process performance. Today I want to discuss an article that Will McGugan wrote about the overhead of Python asyncio tasks and the good discussion that followed on Hacker News.

Let’s go over the benchmark in a few programming languages – including Factor!

Python

The article presents this benchmark in Python that does no work but measures the relative overhead of the asyncio task infrastructure when creating a large number of asynchronous tasks:

from asyncio import create_task, wait, run
from time import process_time as time

async def time_tasks(count=100) -> float:
    """Time creating and destroying tasks."""

    async def nop_task() -> None:
        """Do nothing task."""
        pass

    start = time()
    tasks = [create_task(nop_task()) for _ in range(count)]
    await wait(tasks)
    elapsed = time() - start
    return elapsed

for count in range(100_000, 1_000_000 + 1, 100_000):
    create_time = run(time_tasks(count))
    create_per_second = 1 / (create_time / count)
    print(f"{count:9,} tasks \t {create_per_second:0,.0f} tasks per/s")

Using the latest Python 3.14, this is reasonably fast on my laptop taking about 13 seconds:

$ time python3.14 foo.py
  100,000 tasks    577,247 tasks per/s
  200,000 tasks    533,911 tasks per/s
  300,000 tasks    546,127 tasks per/s
  400,000 tasks    488,219 tasks per/s
  500,000 tasks    466,636 tasks per/s
  600,000 tasks    469,972 tasks per/s
  700,000 tasks    434,126 tasks per/s
  800,000 tasks    428,456 tasks per/s
  900,000 tasks    404,905 tasks per/s
1,000,000 tasks    376,167 tasks per/s

python3.14 foo.py  12.69s user 0.27s system 99% cpu 12.971 total

Factor

We could translate this directly to Factor using the concurrency.combinators vocabulary.

In particular, the parallel-map word starts a new thread applying a quotation to each element in the sequence and then waits for all the threads to finish:

USING: concurrency.combinators formatting io kernel math ranges sequences
tools.time ;

: time-tasks ( n -- )
    <iota> [ ] parallel-map drop ;

: run-tasks ( -- )
    100,000 1,000,000 100,000 <range> [
       dup [ time-tasks ] benchmark 1e9 / dupd /
       "%7d tasks \t %7d tasks per/s\n" printf flush
    ] each ;

After making an improvement to our parallel-map implementation that uses a count-down latch for more efficient waiting on a group of tasks, this runs 2.5x as fast as Python:

IN: scratchpad gc [ run-tasks ] time
 100000 tasks   1246872 tasks per/s
 200000 tasks   1209500 tasks per/s
 300000 tasks   1141121 tasks per/s
 400000 tasks   1121304 tasks per/s
 500000 tasks   1119707 tasks per/s
 600000 tasks   1135459 tasks per/s
 700000 tasks    956541 tasks per/s
 800000 tasks   1091807 tasks per/s
 900000 tasks    944753 tasks per/s
1000000 tasks   1137681 tasks per/s

Running time: 5.142044833 seconds

That’s pretty good for a comparable dynamic language, and especially since we are still running in Rosetta 2 on Apple macOS translating Intel x86-64 to Apple Silicon aarch64 on the fly!

It also turns out that 75% of the benchmark time is spent in the garbage collector, so probably there are some big wins we can get if we look more closely into that.
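A count-down latch is a small synchronization primitive: it starts at the number of outstanding tasks, each task decrements it once on completion, and a waiter blocks until it reaches zero. The sketch below illustrates the idea with Python threads. The `CountDownLatch` class and `run_tasks` function are hypothetical names for this example, and this is not how Factor's parallel-map is actually implemented.

```python
import threading


class CountDownLatch:
    """Hypothetical helper: block a waiter until `count` tasks have checked in."""

    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        # Called once by each finishing task; the last one wakes any waiters.
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def wait(self):
        # Block until the count has reached zero.
        with self._cond:
            self._cond.wait_for(lambda: self._count <= 0)


def run_tasks(n):
    latch = CountDownLatch(n)
    results = [None] * n

    def task(i):
        results[i] = i * i
        latch.count_down()

    for i in range(n):
        threading.Thread(target=task, args=(i,)).start()
    latch.wait()  # returns only after all n tasks have counted down
    return results
```

The waiter sleeps on a single condition instead of joining each task in turn, which is the kind of saving a latch is meant to provide when waiting on a large group.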

Go

We could translate that benchmark into Go 1.25:

package main

import (
    "fmt"
    "sync"
    "time"
)

func timeTasks(count int) time.Duration {
    nopTask := func(done func()) {
        done()
    }
    start := time.Now()
    wg := &sync.WaitGroup{}
    wg.Add(count)
    for i := 0; i < count; i++ {
        go nopTask(wg.Done)
    }
    wg.Wait()
    return time.Since(start)
}

func main() {
    for n := 100_000; n <= 1_000_000; n += 100_000 {
        createTime := timeTasks(n)
        createPerSecond := (1.0 / (float64(createTime) / float64(n))) * float64(time.Second)
        // %d needs an integer; passing the float64 directly would print a format error.
        fmt.Printf("%7d tasks \t %7d tasks per/s\n", n, int64(createPerSecond))
    }
}

And show that it is about 11x faster than Python using multiple CPUs.

$ time go run foo.go
 100000 tasks    3889083 tasks per/s
 200000 tasks    5748283 tasks per/s
 300000 tasks    6324955 tasks per/s
 400000 tasks    6265341 tasks per/s
 500000 tasks    6301852 tasks per/s
 600000 tasks    5572898 tasks per/s
 700000 tasks    6239860 tasks per/s
 800000 tasks    6276241 tasks per/s
 900000 tasks    6226128 tasks per/s
1000000 tasks    6243859 tasks per/s

go run foo.go  2.44s user 0.71s system 270% cpu 1.165 total

If we limit GOMAXPROCS to one CPU, it runs only 7.5x faster than Python:

$ time GOMAXPROCS=1 go run foo.go
 100000 tasks    2240106 tasks per/s
 200000 tasks    2869379 tasks per/s
 300000 tasks    2745897 tasks per/s
 400000 tasks    3759142 tasks per/s
 500000 tasks    3090267 tasks per/s
 600000 tasks    3489138 tasks per/s
 700000 tasks    3608874 tasks per/s
 800000 tasks    3200636 tasks per/s
 900000 tasks    3682102 tasks per/s
1000000 tasks    3259778 tasks per/s

GOMAXPROCS=1 go run foo.go  1.65s user 0.08s system 99% cpu 1.735 total

JavaScript

We could build the same benchmark in JavaScript:

async function time_tasks(count=100) {
    async function nop_task() {
        return performance.now();
    }

    const start = performance.now()
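    // Array.from calls nop_task for every index; a bare Array(count) is sparse and map() skips its empty slots
    let tasks = Array.from({ length: count }, nop_task)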
    await Promise.all(tasks)
    const elapsed = performance.now() - start
    return elapsed / 1e3
}

async function run_tasks() {
    for (let count = 100000; count < 1000000 + 1; count += 100000) {
        const ct = await time_tasks(count)
        console.log(`${count}: ${Math.round(1 / (ct / count))} tasks/sec`)
    }
}

run_tasks()

And it runs pretty fast on Node 25.2.1 – about 26x faster than Python!

$ time node foo.js
100000: 9448038 tasks/sec
200000: 11555322 tasks/sec
300000: 18286318 tasks/sec
400000: 10017217 tasks/sec
500000: 12587060 tasks/sec
600000: 14198956 tasks/sec
700000: 13294620 tasks/sec
800000: 12045403 tasks/sec
900000: 11135513 tasks/sec
1000000: 13577663 tasks/sec

node foo.js  0.82s user 0.10s system 185% cpu 0.496 total

But it runs even faster on Bun 1.3.3 – about 36x faster than Python!

$ time bun foo.js
100000: 9771222 tasks/sec
200000: 13388075 tasks/sec
300000: 13242548 tasks/sec
400000: 13130144 tasks/sec
500000: 16530496 tasks/sec
600000: 16979009 tasks/sec
700000: 16781272 tasks/sec
800000: 17098919 tasks/sec
900000: 17111784 tasks/sec
1000000: 18288515 tasks/sec

bun foo.js  0.37s user 0.02s system 111% cpu 0.353 total

I’m sure other languages perform both better and worse, but this gives us some nice ideas of where we stand relative to some useful production programming languages. There is clearly room to grow, some potential low-hanging fruit, and known features such as supporting native threads that could be a big improvement to the status quo!

PRs welcome!

Cosine FizzBuzz

Sat, 22 Nov 2025 08:00:00 -0700

After revisiting FizzBuzz yesterday to discuss a Lazy FizzBuzz using infinite lazy lists, I thought I would not return to the subject for a while. Apparently, I was wrong!

Susam Pal just wrote a really fun article about Solving Fizz Buzz with Cosines:

We define a set of four functions { s0, s1, s2, s3 } for integers n by:

s0(n) = n

s1(n) = Fizz

s2(n) = Buzz

s3(n) = FizzBuzz

And from that, they derive a formula, essentially a finite Fourier series, for computing the nth value in the FizzBuzz sequence. It cycles with a fixed period across n mod 15, resolving at each value of n to one of the integers 0, 1, 2, or 3, which selects among those four functions.

I recommend reading the whole article, but I will jump to an implementation of the formula in Factor:

:: fizzbuzz ( n -- val )
    11/15
    2/3 n * pi * cos 2/3 * +
    2/5 n * pi * cos 4/5 * +
    4/5 n * pi * cos 4/5 * + round >integer
    { n "Fizz" "Buzz" "FizzBuzz" } nth ;

And we can use that to compute the first few values in the sequence:

IN: scratchpad 1 ..= 100 [ fizzbuzz . ] each
1
2
"Fizz"
4
"Buzz"
"Fizz"
7
8
"Fizz"
"Buzz"
11
"Fizz"
13
14
"FizzBuzz"
16
17
"Fizz"
19
"Buzz"
...
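As a cross-check, the same cosine formula can be sketched in Python. The coefficients below (11/15, 2/3, 4/5, 4/5) are the ones that make the sum land exactly on 0, 1, 2, or 3 for every n; the function name `fizzbuzz` simply mirrors the Factor word, and the floating-point rounding is assumed to be accurate enough for the values tried here.

```python
from math import cos, pi


def fizzbuzz(n):
    # Finite Fourier series with period 15; the rounded value is 0, 1, 2, or 3.
    index = round(
        11 / 15
        + (2 / 3) * cos(2 * pi * n / 3)
        + (4 / 5) * cos(2 * pi * n / 5)
        + (4 / 5) * cos(4 * pi * n / 5)
    )
    # The index selects n itself or one of the three labels.
    return [n, "Fizz", "Buzz", "FizzBuzz"][index]


for n in range(1, 16):
    print(fizzbuzz(n))
```

For very large n the cosine arguments lose precision, so this sketch should not be expected to stay correct indefinitely.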

Or, even some arbitrary values in the sequence:

IN: scratchpad 67 fizzbuzz .
67

IN: scratchpad 9,999,999 fizzbuzz .
"Fizz"

IN: scratchpad 10,000,000 fizzbuzz .
"Buzz"

IN: scratchpad 1,234,567,890 fizzbuzz .
"FizzBuzz"

That’s even more fun than using lazy lists!

Lazy FizzBuzz

Fri, 21 Nov 2025 08:00:00 -0700

I wrote about FizzBuzz many years ago. It’s a silly programming task often cited and even included on RosettaCode. The task is described as:

Write a program that prints the integers from 1 to 100 (inclusive).

But:

for multiples of three, print "Fizz" instead of the number;

for multiples of five, print "Buzz" instead of the number;

for multiples of both three and five, print "FizzBuzz" instead of the number.

This has been solved ad nauseam, but a few days ago Evan Hawn wrote about solving Fizz Buzz without conditionals or booleans using Python and the itertools.cycle function to create an infinitely iterable solution.

Let’s build this in Factor!

There are several ways to implement this, including generators, but we will be using the lists.lazy vocabulary to provide a lazy and infinite stream of values. In particular, we combine a stream of integers with a cycle of "Fizz" and a cycle of "Buzz".

The lists.circular vocabulary extends circular sequences to support the lists protocol:

IN: scratchpad USE: lists.circular

IN: scratchpad { 1 2 3 } <circular> 10 ltake list>array .
{ 1 2 3 1 2 3 1 2 3 1 }

Using that, we can create an infinite FizzBuzz list:

: lfizzbuzz ( -- list )
    1 lfrom
    { "" "" "Fizz" } <circular>
    { "" "" "" "" "Buzz" } <circular>
    lzip [ concat ] lmap-lazy lzip ;

We can print out the first few values quite simply:

IN: scratchpad lfizzbuzz 20 ltake [ first2 "%2d: %s\n" printf ] leach
 1:
 2:
 3: Fizz
 4:
 5: Buzz
 6: Fizz
 7:
 8:
 9: Fizz
10: Buzz
11:
12: Fizz
13:
14:
15: FizzBuzz
16:
17:
18: Fizz
19:
20: Buzz
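For comparison, the same shape can be sketched in Python with itertools.cycle, the function mentioned above. This is an illustration written for this post, not the code from the article it refers to; the name `lfizzbuzz` just mirrors the Factor word.

```python
from itertools import count, cycle, islice


def lfizzbuzz():
    # Two endless cycles of labels, concatenated position by position.
    fizz = cycle(["", "", "Fizz"])
    buzz = cycle(["", "", "", "", "Buzz"])
    labels = (f + b for f, b in zip(fizz, buzz))
    # Pair each label with its integer, starting from 1.
    return zip(count(1), labels)


for n, label in islice(lfizzbuzz(), 20):
    print(f"{n:2d}: {label}")
```

Because every piece is lazy, nothing is computed until values are taken, and `label or n` turns each pair into the traditional output without an explicit conditional.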

And if we wanted a more traditional stream alternating between numbers and labels:

IN: scratchpad lfizzbuzz 100 ltake [ first2 [ nip ] unless-empty . ] leach
1
2
"Fizz"
4
"Buzz"
"Fizz"
7
8
"Fizz"
"Buzz"
11
"Fizz"
13
14
"FizzBuzz"
16
17
"Fizz"
19
"Buzz"
...

While not the ultimate FizzBuzz Enterprise Edition, this seems like a fun way to improve upon the simple, meant-for-whiteboards implementation that is most often shared.

Lorem Ipsum

Thu, 20 Nov 2025 08:00:00 -0700

Lorem ipsum is a type of placeholder text that can be used in graphic design or web development. The most common form of it will often begin like this paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I wanted to make a program to generate believable lorem ipsum text using Factor.

We start by defining a bunch of possible words:

CONSTANT: words qw{
    a ab accusamus accusantium ad adipisci alias aliquam aliquid
    amet animi aperiam architecto asperiores aspernatur
    assumenda at atque aut autem beatae blanditiis commodi
    consectetur consequatur consequuntur corporis corrupti culpa
    cum cumque cupiditate debitis delectus deleniti deserunt
    dicta dignissimos distinctio dolor dolore dolorem doloremque
    dolores doloribus dolorum ducimus ea eaque earum eius
    eligendi enim eos error esse est et eum eveniet ex excepturi
    exercitationem expedita explicabo facere facilis fuga fugiat
    fugit harum hic id illo illum impedit in incidunt inventore
    ipsa ipsam ipsum iste itaque iure iusto labore laboriosam
    laborum laudantium libero magnam magni maiores maxime minima
    minus modi molestiae molestias mollitia nam natus
    necessitatibus nemo neque nesciunt nihil nisi nobis non
    nostrum nulla numquam obcaecati odio odit officia officiis
    omnis optio pariatur perferendis perspiciatis placeat porro
    possimus praesentium provident quae quaerat quam quas quasi
    qui quia quibusdam quidem quis quisquam quo quod quos
    ratione recusandae reiciendis rem repellat repellendus
    reprehenderit repudiandae rerum saepe sapiente sed sequi
    similique sint sit soluta sunt suscipit tempora tempore
    temporibus tenetur totam ullam unde ut vel velit veniam
    veritatis vero vitae voluptas voluptate voluptatem
    voluptates voluptatibus voluptatum
}

We then use these to build a random ipsum sentence:

: random-sentence ( -- str )
    2 ..= 5 random [
        words 3 ..= 12 random sample " " join
    ] replicate ", " join
    0 over [ ch>upper ] change-nth "?." random suffix ;

Then build a random ipsum paragraph from sentences:

: random-paragraph ( -- str )
    2 ..= 4 random [ random-sentence ] replicate " " join ;
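The sentence and paragraph logic above can be sketched in Python as follows. It assumes the ranges are inclusive (2 to 5 clauses, 3 to 12 words per clause, 2 to 4 sentences) and uses only a small subset of the word list to keep the example short; `WORDS`, `random_sentence`, and `random_paragraph` are names chosen for this sketch.

```python
import random

# A small subset of the word list above, large enough for a sample of 12.
WORDS = "a ab ad amet animi at aut culpa cum dolor ea enim eos est et ex id in".split()


def random_sentence():
    # 2 to 5 comma-separated clauses, each with 3 to 12 distinct words.
    clauses = [
        " ".join(random.sample(WORDS, random.randint(3, 12)))
        for _ in range(random.randint(2, 5))
    ]
    text = ", ".join(clauses)
    # Capitalize the first letter and end with "?" or ".".
    return text[0].upper() + text[1:] + random.choice("?.")


def random_paragraph():
    # 2 to 4 sentences separated by single spaces.
    return " ".join(random_sentence() for _ in range(random.randint(2, 4)))


print(random_paragraph())
```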

We can define the initial paragraph above:

CONSTANT: initial-paragraph "\
Lorem ipsum dolor sit amet, consectetur adipisicing \
elit, sed do eiusmod tempor incididunt ut labore et \
dolore magna aliqua. Ut enim ad minim veniam, quis \
nostrud exercitation ullamco laboris nisi ut aliquip ex \
ea commodo consequat. Duis aute irure dolor in \
reprehenderit in voluptate velit esse cillum dolore eu \
fugiat nulla pariatur. Excepteur sint occaecat cupidatat \
non proident, sunt in culpa qui officia deserunt mollit \
anim id est laborum."

And use it to make random ipsum paragraphs, starting with the initial one:

: random-paragraphs ( n -- str )
    <iota> [
        zero? [ initial-paragraph ] [ random-paragraph ] if
    ] map "\n" join ;

Or even generate a list of random ipsum words, understanding that sample can’t generate more samples than the length of the sequence being sampled from:

:: random-words ( n -- str )
    words length :> w
    [
        n [ words over w min sample % w [-] ] until-zero
    ] { } make ;

We can make a command-line interface using the argument parser to return words, sentence, or paragraph:

CONSTANT: OPTIONS {
    T{ option
        { name "--w" }
        { help "Generate some lorem ipsum words" }
        { #args 1 }
        { type integer }
    }
    T{ option
        { name "--s" }
        { help "Generate a lorem ipsum sentence" }
        { const t }
        { default f }
    }
    T{ option
        { name "--p" }
        { help "Generate a lorem ipsum paragraph" }
        { const t }
        { default f }
    }
}

MAIN: [
    OPTIONS [
        "w" get [ random-words print ] when*
        "s" get [ random-sentence print ] when
        "p" get [ random-paragraph print ] when
    ] with-options
]

And that gives you automatic help text showing the available options:

$ ./factor -run=lorem-ipsum --help
Usage:
    factor -run=lorem-ipsum [--help] [--w W] [--s] [--p]

Options:
    --help    show this help and exit
    --w W     Generate some lorem ipsum words
    --s       Generate a lorem ipsum sentence
    --p       Generate a lorem ipsum paragraph

We can test it by generating some words, a sentence, and a paragraph:

$ ./factor -run=lorem-ipsum --w 10
vero eos quos optio magni soluta nulla delectus voluptas neque

$ ./factor -run=lorem-ipsum --s
Totam dicta laborum perferendis unde voluptas, culpa dignissimos odio
distinctio rem eius, tempora harum corporis accusamus.

$ ./factor -run=lorem-ipsum --p
Quaerat maiores veniam minus reprehenderit architecto numquam mollitia earum,
natus assumenda eius cumque minima sint magni accusantium facere, eius aperiam
explicabo molestias voluptatibus aspernatur maiores assumenda, nulla illo
doloremque voluptatum excepturi accusamus porro officiis tempore molestiae
saepe, iusto quibusdam explicabo obcaecati saepe quasi voluptate? Velit libero
tempore in nobis ratione nisi laborum rerum natus ipsam, aperiam placeat
laborum delectus dolor ab dolores itaque. Fuga maxime culpa quae adipisci, modi
quod distinctio ipsam, et vero natus consequuntur neque placeat saepe quam
perferendis, voluptate nemo ducimus ullam recusandae iusto laboriosam iure
temporibus sed saepe, optio dignissimos dolor modi accusamus quod culpa ab? Ad
dolore dignissimos, perferendis accusamus ducimus fuga eveniet a ut.

The code for this is on my GitHub.

Cardinal Direction

Wed, 19 Nov 2025 08:00:00 -0700

Cardinal direction describes points on a compass — North (N), East (E), South (S), and West (W).

In addition to those, there are also 4 intercardinal directions (Northwest or NW, NE, SW, SE), 8 secondary intercardinal directions (North-Northwest or NNW, WNW, NNE, ENE, WSW, SSW, ESE, SSE), as well as 16 tertiary intercardinal directions (Northwest-by-North or NWbN, etc.).

In fact, all the points of the compass can be divided into 32 textual directions.

I like puzzles and sometimes those come from code golfing. I stumbled across two symmetric challenges on the Code Golf and Coding Challenges Stack Exchange: converting degrees to a compass point, and converting a compass point back to degrees.

I thought it would be a good task to code in Factor.

We start by enumerating the descriptive names for all 32 points of the compass. For convenience we use the quoted words vocabulary.

CONSTANT: directions qw{
    N NbE NNE NEbN NE NEbE ENE EbN
    E EbS ESE SEbE SE SEbS SSE SbE
    S SbW SSW SWbS SW SWbW WSW WbS
    W WbN WNW NWbW NW NWbN NNW NbW
}

Then, converting compass degrees to a textual name involves rounding to the nearest of the 32 points:

: compass>string ( compass -- str )
    360/32 / round >integer 32 mod directions nth ;

We can use some test cases from the challenges to check that it works:

{ "N"    } [ 0     compass>string ] unit-test
{ "NNE"  } [ 23.97 compass>string ] unit-test
{ "NEbN" } [ 33.7  compass>string ] unit-test
{ "ENE"  } [ 73.12 compass>string ] unit-test
{ "EbN"  } [ 73.13 compass>string ] unit-test
{ "SWbS" } [ 219   compass>string ] unit-test
{ "W"    } [ 275   compass>string ] unit-test
{ "WbN"  } [ 276   compass>string ] unit-test
{ "WNW"  } [ 287   compass>string ] unit-test

And the reverse converts the name back to compass degrees by grabbing the index and multiplying:

: string>compass ( str -- compass )
    directions index 360/32 * ;
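The same two conversions can be sketched in Python. Each compass point spans 360/32 = 11.25 degrees, so dividing by that step and rounding gives an index into the list. One assumption to flag: ties are rounded up here with floor(x + 0.5), because Python's built-in round() rounds halves to the nearest even number; the function names are chosen for this sketch.

```python
import math

DIRECTIONS = """
    N NbE NNE NEbN NE NEbE ENE EbN
    E EbS ESE SEbE SE SEbS SSE SbE
    S SbW SSW SWbS SW SWbW WSW WbS
    W WbN WNW NWbW NW NWbN NNW NbW
""".split()

STEP = 360 / 32  # 11.25 degrees per compass point


def compass_to_string(degrees):
    # Round to the nearest point; "% 32" wraps values just below 360 back to N.
    return DIRECTIONS[math.floor(degrees / STEP + 0.5) % 32]


def string_to_compass(name):
    # The index of the name times the step gives the heading in degrees.
    return DIRECTIONS.index(name) * STEP


print(compass_to_string(73.13), string_to_compass("SW"))
```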

It might not be the shortest solution, but it works and it was fun to build!

Flood Fill

Tue, 18 Nov 2025 08:00:00 -0700

Yesterday, Rodrigo Girão Serrão wrote an article about implementing the Floodfill algorithm in Python. He included a JavaScript demonstration you can click on, as well as a step-by-step example at the end of his blog post to go over how flood fill works.

We are going to implement this in Factor and then extend the images.viewer vocabulary.

When working with bitmap pixel data, we typically store colors using integers in the range [0..255]. We can generate a random color with 4 bytes by simply using replicate:

:: random-color ( -- color )
    4 [ 256 random ] B{ } replicate-as ;

This allows us to implement the flood fill four-way algorithm:

1. Start from a specified pixel in an image.
2. Choose a random but different color to assign.
3. If the pixel is not the initial color, you’re done.
4. If the pixel is, then change its color.
5. For each surrounding pixel, continue from step 3.

CONSTANT: neighbors { { 1 0 } { 0 1 } { -1 0 } { 0 -1 } }

:: floodfill ( x y image -- ? )
    image dim>> first2 :> ( w h )
    {
        [ x 0 >= ] [ x w < ]
        [ y 0 >= ] [ y h < ]
    } 0&& [
        x y image pixel-at :> initial
        f [ drop random-color dup initial = ] loop :> color

        color x y image set-pixel-at
        V{ { x y } } :> queue

        [ queue empty? ] [
            queue pop first2 :> ( tx ty )
            neighbors [
                first2 :> ( dx dy )
                tx dx + :> nx
                ty dy + :> ny
                {
                    [ nx 0 >= ] [ nx w < ]
                    [ ny 0 >= ] [ ny h < ]
                    [ nx ny image pixel-at initial = ]
                } 0&& [
                    color nx ny image set-pixel-at
                    { nx ny } queue push
                ] when
            ] each
        ] until t
    ] [ f ] if ;

Note: as implemented, we change every pixel that matches the first click to a different color. It might be more aesthetic to allow for anti-aliasing to adjust colors that are fairly close to the original color.
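The four-way fill itself can be sketched in a few lines of Python. This is a simplified illustration rather than a port of the word above: it works on a hypothetical list-of-lists grid instead of an image, takes the replacement color as an argument instead of picking a random one, and keeps pending pixels on an explicit stack.

```python
def floodfill(grid, x, y, color):
    """Recolor the 4-connected region containing (x, y); return True if anything changed."""
    h, w = len(grid), len(grid[0])
    if not (0 <= x < w and 0 <= y < h):
        return False
    initial = grid[y][x]
    if initial == color:
        return False
    grid[y][x] = color
    stack = [(x, y)]
    while stack:
        tx, ty = stack.pop()
        # Visit the right, down, left, and up neighbors.
        for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            nx, ny = tx + dx, ty + dy
            if 0 <= nx < w and 0 <= ny < h and grid[ny][nx] == initial:
                # Recolor before pushing so a pixel is never queued twice.
                grid[ny][nx] = color
                stack.append((nx, ny))
    return True


grid = [[0, 0, 1], [0, 1, 1], [1, 1, 0]]
floodfill(grid, 0, 0, 2)
print(grid)
```

Coloring a pixel at the moment it is pushed, rather than when it is popped, is what keeps the work proportional to the size of the filled region.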

Now, we’ll extend the image-gadget:

TUPLE: floodfill-gadget < image-gadget ;

: <floodfill-gadget> ( image -- gadget )
    floodfill-gadget new-image-gadget* ;

We implement a click handler that performs a flood fill and if it changed the image, cleans up the texture object and re-renders the gadget.

:: on-click ( gadget -- )
    gadget hand-rel first2 [ gl-scale >integer ] bi@ :> ( x y )
    x y gadget image>> floodfill [
        gadget delete-current-texture
        gadget relayout-1
    ] when ;

That word is assigned as a gesture on button-up mouse clicks:

floodfill-gadget "gestures" f {
    { T{ button-up { # 1 } } on-click }
} define-command-map

And, for convenience, make a main window word to launch the user interface with an example image:

MAIN-WINDOW: floodfill-window { { title "Floodfill" } }
    "vocab:floodfill/logo.png" <floodfill-gadget> >>gadgets ;

This is available on my GitHub.

Parsing Chemistry

Fri, 24 Oct 2025 08:00:00 -0700

In Python, the chemparse project is available as a “lightweight package for parsing chemical formula strings into python dictionaries” mapping chemical elements to numeric counts.

It supports parsing several variants of formula such as:

simple formulas like "H2O"
fractional stoichiometry like "C1.5O3"
groups such as "(CH3)2"
nested groups such as "((CH3)2)3"
square brackets such as "K4[Fe(SCN)6]"

I thought it would be fun to build similar functionality using Factor.

We are going to be using the EBNF syntax support to more simply write a parsing expression grammar. As is often the most useful way to implement things, we break it down into steps. We can parse a symbol as one or two letters, a number as an integer or float, and then a pair which is a symbol with an optional number prefix and postfix.

EBNF: split-formula [=[
symbol = [A-Z] [a-z]? => [[ sift >string ]]
number = [0-9]+ { "." [0-9]+ }? { { "e" | "E" } { "+" | "-" }? [0-9]+ }?
       => [[ first3 [ concat ] bi@ "" 3append-as string>number ]]
pair   = number? { symbol | "("~ pair+ ")"~ | "["~ pair+ "]"~ } number?
       => [[ first3 swapd [ 1 or ] bi@ * 2array ]]
pairs  = pair+
]=]

We can test that this works:

IN: scratchpad "H2O" split-formula .
V{ { "H" 2 } { "O" 1 } }

IN: scratchpad "(CH3)2" split-formula .
V{ { V{ { "C" 1 } { "H" 3 } } 2 } }

But we need to recursively flatten these into an assoc, mapping element to count.

: flatten-formula ( elt n assoc -- )
    [ [ first2 ] [ * ] bi* ] dip pick string?
    [ swapd at+ ] [ '[ _ _ flatten-formula ] each ] if ;

And combine those two steps to parse a formula:

: parse-formula ( str -- assoc )
    split-formula H{ } clone [
        '[ 1 _ flatten-formula ] each
    ] keep ;

We can now test that this works with a few unit tests that show each of the features we hoped to support:

{ H{ { "H" 2 } { "O" 1 } } } [ "H2O" parse-formula ] unit-test

{ H{ { "C" 1.5 } { "O" 3 } } } [ "C1.5O3" parse-formula ] unit-test

{ H{ { "C" 2 } { "H" 6 } } } [ "(CH3)2" parse-formula ] unit-test

{ H{ { "C" 6 } { "H" 18 } } } [ "((CH3)2)3" parse-formula ] unit-test

{ H{ { "K" 4 } { "Fe" 1 } { "S" 6 } { "C" 6 } { "N" 6 } } }
[ "K4[Fe(SCN)6]" parse-formula ] unit-test
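To show the parse-then-flatten idea in another form, here is a small stack-based sketch in Python. It is not the chemparse implementation and not a port of the grammar above: it handles only trailing multipliers, plain integer or decimal numbers, and round or square brackets, and the names `TOKEN` and `parse_formula` are chosen for this example.

```python
import re

TOKEN = re.compile(
    r"(?P<symbol>[A-Z][a-z]?)|(?P<number>\d+(?:\.\d+)?)|(?P<open>[(\[])|(?P<close>[)\]])"
)


def parse_formula(formula):
    stack = [{}]    # counts per open group; the innermost group is last
    pending = None  # most recent symbol or closed group, awaiting a multiplier

    def commit():
        # Merge the pending counts into the innermost open group.
        nonlocal pending
        if pending:
            for elt, n in pending.items():
                stack[-1][elt] = stack[-1].get(elt, 0) + n
        pending = None

    for m in TOKEN.finditer(formula):
        kind, text = m.lastgroup, m.group()
        if kind == "number":
            if pending is None:
                raise ValueError("prefix multipliers are not handled in this sketch")
            n = float(text) if "." in text else int(text)
            pending = {elt: count * n for elt, count in pending.items()}
            continue
        commit()
        if kind == "symbol":
            pending = {text: 1}
        elif kind == "open":
            stack.append({})
        else:  # a closing bracket turns the finished group into the pending item
            pending = stack.pop()
    commit()
    return stack[0]


print(parse_formula("K4[Fe(SCN)6]"))
```

Multiplying a whole group when its closing bracket is followed by a number plays the same role as the recursive multiplication in flatten-formula.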

This is available on my GitHub.

Split Lines

Sun, 12 Oct 2025 08:00:00 -0700

William Woodruff recently noticed that Python’s splitlines does a lot more than just newlines:

I always assumed that Python’s str.splitlines() split strings by “universal newlines”, i.e., \n, \r, and \r\n.

But it turns out it does a lot more than that.

The recent Factor 0.100 release included a change to make the split-lines word split on Unicode line breaks, which matches the Python behavior.

IN: scratchpad "line1\nline2\rline3\r\nline4\vline5\x1dhello"
               split-lines .
{ "line1" "line2" "line3" "line4" "line5" "hello" }

These are considered line breaks:

Character	Description
`\n`	Line Feed
`\r`	Carriage Return
`\r\n`	Carriage Return + Line Feed
`\v`	Line Tabulation
`\f`	Form Feed
`\x1c`	File Separator
`\x1d`	Group Separator
`\x1e`	Record Separator
`\x85`	Next Line (C1 Control Code)
`\u002028`	Line Separator
`\u002029`	Paragraph Separator
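The Python side of this comparison is easy to check directly. The snippet below runs the same input through str.splitlines and then confirms that each separator in the table splits a two-part string; the variable names are chosen for this example.

```python
text = "line1\nline2\rline3\r\nline4\vline5\x1dhello"
lines = text.splitlines()
print(lines)

# Every separator from the table, written with Python escape syntax.
BREAKS = ["\n", "\r", "\r\n", "\v", "\f", "\x1c", "\x1d", "\x1e", "\x85", "\u2028", "\u2029"]
split_ok = all(f"a{b}b".splitlines() == ["a", "b"] for b in BREAKS)
print(split_ok)
```

Note that "\r\n" counts as a single break, so it does not produce an empty line between the two characters.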

This might be surprising – or just what you needed!

Extensible Data Notation

Sun, 05 Oct 2025 08:00:00 -0700

I wrote about the Data Formats support that comes included in Factor. As I mentioned in that post, there are many more that we could implement. One of those is Extensible Data Notation – also known as EDN – which comes from the Clojure community.

We can see a nice example of the EDN format in Learn EDN in Y minutes:

; Comments start with a semicolon.
; Anything after the semicolon is ignored.

;;;;;;;;;;;;;;;;;;;
;;; Basic Types ;;;
;;;;;;;;;;;;;;;;;;;

nil         ; also known in other languages as null

; Booleans
true
false

; Strings are enclosed in double quotes
"hungarian breakfast"
"farmer's cheesy omelette"

; Characters are preceded by backslashes
\g \r \a \c \e

; Keywords start with a colon. They behave like enums. Kind of
; like symbols in Ruby.
:eggs
:cheese
:olives

; Symbols are used to represent identifiers. 
; You can namespace symbols by using /. Whatever precedes / is
; the namespace of the symbol.
spoon
kitchen/spoon ; not the same as spoon
kitchen/fork
github/fork   ; you can't eat with this

; Integers and floats
42
3.14159

; Lists are sequences of values
(:bun :beef-patty 9 "yum!")

; Vectors allow random access
[:gelato 1 2 -2]

; Maps are associative data structures that associate the key with its value
{:eggs        2
 :lemon-juice 3.5
 :butter      1}

; You're not restricted to using keywords as keys
{[1 2 3 4] "tell the people what she wore",
 [5 6 7 8] "the more you see the more you hate"}

; You may use commas for readability. They are treated as whitespace.

; Sets are collections that contain unique elements.
#{:a :b 88 "huat"}

;;;;;;;;;;;;;;;;;;;;;;;
;;; Tagged Elements ;;;
;;;;;;;;;;;;;;;;;;;;;;;

; EDN can be extended by tagging elements with # symbols.

#MyYelpClone/MenuItem {:name "eggs-benedict" :rating 10}

Recently, I implemented support for EDN. I originally used a Parsing Expression Grammar to do the parsing, then added support for encoding Factor objects into EDN, and finally switched to a faster stream-based parsing approach.

This now allows us to parse that example above into:

{
    null
    t
    f
    "hungarian breakfast"
    "farmer's cheesy omelette"
    103
    114
    97
    99
    101
    T{ keyword { name "eggs" } }
    T{ keyword { name "cheese" } }
    T{ keyword { name "olives" } }
    T{ symbol { name "spoon" } }
    T{ symbol { name "kitchen/spoon" } }
    T{ symbol { name "kitchen/fork" } }
    T{ symbol { name "github/fork" } }
    42
    3.14159
    {
        T{ keyword { name "bun" } }
        T{ keyword { name "beef-patty" } }
        9
        "yum!"
    }
    V{
        T{ keyword { name "gelato" } }
        1
        2
        -2
    }
    LH{
        { T{ keyword { name "eggs" } } 2 }
        { T{ keyword { name "lemon-juice" } } 3.5 }
        { T{ keyword { name "butter" } } 1 }
    }
    LH{
        { V{ 1 2 3 4 } "tell the people what she wore" }
        { V{ 5 6 7 8 } "the more you see the more you hate" }
    }
    HS{
        88
        T{ keyword { name "a" } }
        T{ keyword { name "b" } }
        "huat"
    }
    T{ tagged
        { name "MyYelpClone/MenuItem" }
        { value
            LH{
                { T{ keyword { name "name" } } "eggs-benedict" }
                { T{ keyword { name "rating" } } 10 }
            }
        }
    }
}

The edn vocabulary is now included in the Factor standard library.

You can see some information about the various words currently available:

IN: scratchpad "edn" help
Extensible Data Notation (EDN)

The edn vocabulary supports reading and writing from the Extensible Data
Notation (EDN) format.

Reading from EDN:
 read-edns ( -- objects )
 read-edn ( -- object )
 edn> ( string -- objects )

Writing into EDN:
 write-edns ( objects -- )
 write-edn ( object -- )
 >edn ( object -- string )

Basic support is included for encoding Factor objects:

IN: scratchpad TUPLE: foo a b c ;

IN: scratchpad 1 2 3 foo boa write-edn
#scratchpad/foo {:a 1, :b 2, :c 3}

But we don’t automatically parse these tagged objects back into a Factor object at the moment.

Check it out!

Pseudo Encrypt

Sat, 04 Oct 2025 12:00:00 -0700

Pseudo Encrypt is a function drawn from the PostgreSQL project.

pseudo_encrypt(int) can be used as a pseudo-random generator of unique values. It produces an integer output that is uniquely associated to its integer input (by a mathematical permutation), but looks random at the same time, with zero collision. This is useful to communicate numbers generated sequentially without revealing their ordinal position in the sequence (for ticket numbers, URLs shorteners, promo codes…)

Its implementation is defined as:

CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns int AS $$
DECLARE
l1 int;
l2 int;
r1 int;
r2 int;
i int:=0;
BEGIN
 l1:= (value >> 16) & 65535;
 r1:= value & 65535;
 WHILE i < 3 LOOP
   l2 := r1;
   r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
   l1 := l2;
   r1 := r2;
   i := i + 1;
 END LOOP;
 return ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql strict immutable;

Let’s implement this in Factor using some of the words from the math.bitwise vocabulary, working with the intermediate results as 32-bit signed integers:

: pseudo-encrypt ( x -- y )
    [ -16 shift ] keep [ 16 bits ] bi@ 3 [
        [
            1366 * 150889 + 714025 rem 714025.0 / 32767 *
            round >integer bitxor 32 >signed
        ] keep swap
    ] times 16 shift + 32 >signed ;

We can compare our results for [-10..10] which are helpfully provided on the original linked page:

IN: scratchpad -10 ..= 10 [ dup pseudo-encrypt "%3d %12d\n" printf ] each
-10  -1270576520
 -9   -236348969
 -8  -1184061109
 -7    -25446276
 -6  -1507538963
 -5   -518858927
 -4  -1458116927
 -3   -532482573
 -2   -157973154
 -1  -1105881908
  0   1777613459
  1    561465857
  2    436885871
  3    576481439
  4    483424269
  5   1905133426
  6    971249312
  7   1926833684
  8    735327624
  9   1731020007
 10    792482838

Great – it matches!
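The same three-round Feistel structure can be sketched in Python. The integer is split into two 16-bit halves, each round swaps them and XORs one half with a value derived from the other, and the result is wrapped back into a signed 32-bit integer. The function name mirrors the SQL one, and round() is assumed to be an adequate stand-in for the SQL cast to int.

```python
def pseudo_encrypt(value):
    # Split into 16-bit halves; masking also handles negative inputs.
    l1 = (value >> 16) & 0xFFFF
    r1 = value & 0xFFFF
    for _ in range(3):
        # Round function: a small linear congruential step scaled to 15 bits.
        f = round(((1366 * r1 + 150889) % 714025) / 714025.0 * 32767)
        l1, r1 = r1, l1 ^ f
    result = (r1 << 16) + l1
    # Reinterpret the unsigned 32-bit result as a signed integer.
    return result - (1 << 32) if result >= (1 << 31) else result


for n in (-1, 0, 1):
    print(n, pseudo_encrypt(n))
```

Because each round is invertible regardless of the round function, distinct inputs always give distinct outputs, which is where the zero-collision property comes from.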

std::flip

Mon, 29 Sep 2025 20:00:00 -0700

Morwenn posted a blog about implementing a std::flip operation in C++:

This is basically walking up the tree from the child node as if it were a linked list. The reverse operation either implies walking through two children nodes, or simply flipping the order of parameters, which is where std::flip intervenes:
auto is_descendant_of = std::flip(is_ancestor_of);

// This property should always hold
assert(is_descendant_of(node1, node2) == is_ancestor_of(node2, node1));

Spoiler: the std::flip operator is not part of the C++ standard library, although an implementation is provided at the end of the blog post in around 90 lines of code.

Still, I thought it would be fun to implement in Factor.

As it turns out, we already have a flip word that modifies a sequence, essentially by returning the transpose of a matrix. One could argue that transpose might be a better name for that operation. In any event, let’s focus on implementing the std::flip operation.

How would we reverse the arguments to a word?

a b can become b a by calling swap.
a b c can become c b a by calling swap rot.
a b c d can become d c b a by calling swap rot roll.

We can generalize this into a macro by repeatedly calling -nrot:

MACRO: nreverse ( n -- quot )
    0 [a..b) [ '[ _ -nrot ] ] map [ ] concat-as ;

And then show that it works:

IN: scratchpad { } [ 0 nreverse ] with-datastack .
{ }

IN: scratchpad { "a" } [ 1 nreverse ] with-datastack .
{ "a" }

IN: scratchpad { "a" "b" } [ 2 nreverse ] with-datastack .
{ "b" "a" }

IN: scratchpad { "a" "b" "c" } [ 3 nreverse ] with-datastack .
{ "c" "b" "a" }

IN: scratchpad { "a" "b" "c" "d" } [ 4 nreverse ] with-datastack .
{ "d" "c" "b" "a" }

IN: scratchpad { "a" "b" "c" "d" "e" } [ 5 nreverse ] with-datastack .
{ "e" "d" "c" "b" "a" }

Note: this has been added to the shuffle vocabulary.

Using this, we can build some syntax that takes the next token and searches for a matching word with that name, and then calls it after reversing the inputs:

SYNTAX: flip:
    scan-word [ stack-effect in>> length ] keep
    '[ _ nreverse _ execute ] append! ;

As an example, we will use the 4array word that returns an array consisting of four arguments from the stack.

IN: scratchpad 10 20 30 40 4array .
{ 10 20 30 40 }

IN: scratchpad 10 20 30 40 flip: 4array .
{ 40 30 20 10 }
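For readers more at home in a language with named parameters, the same operation is a short higher-order function. This Python sketch uses hypothetical names `flip` and `four_array` to mirror the example above; it reverses positional arguments only.

```python
def flip(func):
    # Return a wrapper that calls func with its positional arguments reversed.
    def flipped(*args):
        return func(*reversed(args))
    return flipped


def four_array(a, b, c, d):
    return [a, b, c, d]


print(four_array(10, 20, 30, 40))
print(flip(four_array)(10, 20, 30, 40))
```

The Factor version has to know the arity up front to shuffle the stack, which is why it reads the word's stack effect; here the wrapper simply accepts whatever arguments arrive.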

We could have different syntax for flipping arbitrary code: first parse a quotation, then infer its stack effect, and then inline a version that reverses the arguments.

SYNTAX: flip[
    parse-quotation [ infer in>> length ] keep
    '[ _ nreverse @ ] suffix! ;

We can try that out with a simple block of code:

IN: scratchpad 1 2 3 flip[ [ 10 * ] tri@ ] call 3array .
{ 30 20 10 }

And only a few lines of code in total.

Pretty cool!

Scream Cipher

Sun, 21 Sep 2025 08:00:00 -0700

Seth Larson wrote about a Scream Cipher:

You’ve probably heard of stream ciphers, but what about a scream cipher 😱? Today I learned there are more “Latin capital letter A” Unicode characters than there are letters in the English alphabet. You know what that means, it’s time to scream:

We can use bidirectional assocs to keep a single cipher data structure that efficiently maps into and out of the cipher:

CONSTANT: cipher $[
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "AÁĂẮẶẰẲẴǍÂẤẬẦẨẪÄǞȦǠẠȀÀẢȂĀĄ"
    zip >biassoc
]

: >scream ( str -- SCREAM )
    [ ch>upper cipher ?at drop ] map ;

: scream> ( SCREAM -- str )
    [ cipher ?value-at drop ] map ;
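The two-way lookup can be sketched in Python with a pair of dictionaries, one built by inverting the other. The names below are chosen for this example, and the strings are normalized to NFC on the assumption that each accented letter should be a single precomposed code point.

```python
import unicodedata

PLAIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
SCREAM = unicodedata.normalize("NFC", "AÁĂẮẶẰẲẴǍÂẤẬẦẨẪÄǞȦǠẠȀÀẢȂĀĄ")

TO_SCREAM = dict(zip(PLAIN, SCREAM))
FROM_SCREAM = {v: k for k, v in TO_SCREAM.items()}


def to_scream(text):
    # Characters outside the alphabet pass through unchanged.
    return "".join(TO_SCREAM.get(ch, ch) for ch in text.upper())


def from_scream(text):
    normalized = unicodedata.normalize("NFC", text)
    return "".join(FROM_SCREAM.get(ch, ch) for ch in normalized)


print(to_scream("FACTOR!"))
print(from_scream(to_scream("FACTOR!")))
```

A bidirectional assoc keeps both directions in one structure; two plain dictionaries are the closest everyday equivalent.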

And then give it a try!

IN: scratchpad "FACTOR!" >scream .
"ẰAĂẠẪȦ!"

IN: scratchpad "ẰAĂẠẪȦ!" scream> .
"FACTOR!"

Fun!

Environment Variables

Fri, 19 Sep 2025 08:00:00 -0700

Factor has an environment vocabulary for working with process environment variables on all the platforms we currently support: macOS, Windows, and Linux.

Recently, I noticed that .NET 9 added support for empty environment variables. This was particularly relevant due to a test failure of the new Dotenv implementation on Windows. It turns out that we inherited the same issue that earlier .NET versions had, which is an inability to disambiguate an unset environment variable from one that was set to the empty string. This issue has now been fixed in the latest development version.

Before:

IN: scratchpad "FACTOR" os-env .
f

IN: scratchpad "" "FACTOR" set-os-env
IN: scratchpad "FACTOR" os-env .
f

After:

IN: scratchpad "FACTOR" os-env .
f

IN: scratchpad "" "FACTOR" set-os-env
IN: scratchpad "FACTOR" os-env .
""

IN: scratchpad "FACTOR" unset-os-env
IN: scratchpad "FACTOR" os-env .
f
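
The same unset-versus-empty distinction can be sketched in Python, where a missing variable comes back as None and an empty one as "". The variable name below is hypothetical, and the behaviour shown is what I would expect on POSIX systems; I have not checked how Python treats empty values on Windows.

```python
import os

NAME = "FACTOR_ENV_DEMO"  # hypothetical variable name, used only for this demo

os.environ.pop(NAME, None)          # make sure we start from "unset"
print(os.environ.get(NAME))         # None: the variable is unset

os.environ[NAME] = ""
print(repr(os.environ.get(NAME)))   # '': set, but empty

del os.environ[NAME]
print(os.environ.get(NAME))         # None again
```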

There might be other cross-platform environment-related topics to investigate, such as an open issue to look into case-preserving but case-insensitive environment variables on Windows.

PRs welcome!

HiDPI

Tue, 02 Sep 2025 08:00:00 -0700

HiDPI is a name for high resolution displays, sometimes called retina displays. A long long time ago, I added support for Retina Displays on macOS using Factor. But, they have not been well supported on either Linux or Windows platforms.

That ends today!

Linux

Some users have seen the “small window” problem on Linux, where on high resolution displays the Factor UI listener was rendered super tiny:

This is now fixed: it renders at the appropriate resolution, either by detecting the screen it is launched on or by using the GDK_SCALE environment variable:

There has been one report that this works in the Gnome environments but not on KDE, so we might still have a few code changes necessary to make this more universal. And we also still need to switch from using our older GTK integration to the newer one with clean support for Wayland.

Windows

Other users have noticed the blurry text on Windows, due to using a legacy compatibility mode:

This is now fixed, rendering with the correct scaling factor:

It has been tested with 200% and 300% scaling factors. It is possible that intermediate scaling factors like 150% are not well supported and additional tweaks might be necessary to make this more universal.

Currently, on all three supported platforms, we use a global scaling factor which does not allow for moving Factor windows cleanly between screens with different scaling factors, for example when presenting on an external display over HDMI.

PRs welcome!

Neovim

Wed, 27 Aug 2025 08:00:00 -0700

Neovim is a modern implementation of a vim-like editor. It started as a refactor, but “not a rewrite but a continuation and extension of Vim”. It does have some ability to load plugins built in Vimscript, but most new plugins seem to be written using the Lua programming language.

Factor has many different editor integrations supporting various text editors as well as plugins for some that provide additional features. One of these is the factor.vim plugin for Vim, which I happen to use frequently.

In the Big Omarchy 2.0 Tour, DHH presents about the Omarchy customization of Linux. I noticed that they have a pretty nice Neovim integration, particularly with the system themes. It turns out to be based somewhat on the Lazyvim system.

In any event, I wondered about how easy it would be to make a Neovim plugin for Factor. It isn’t fully necessary as there is support for Vimscript plugins and the Factor one works pretty well out of the box. However, I thought I’d ask Claude Code to go off and YOLO an implementation based on the existing one. Thankfully this was not a FAFO moment, and after a few cycles it came back with something that mostly works!

This is available in the factor.nvim repository and should be pretty easy to integrate into your Neovim setup. Perhaps give it a try and see what you think? I’ve been using it and it seems to work pretty well.

New Icon

Tue, 26 Aug 2025 18:00:00 -0700

Encouraged by a fun rant about MacOS Tahoe’s Dead-Canary Utility App Icons, the reality that Apple is moving into the wonderful squircle-filled future, and the particularly annoying fact that legacy icons look terrible on macOS Tahoe – we have a new icon for Factor!

The latest development version includes new icon files in both PNG and SVG formats. And these are being used across macOS, Windows, and Linux builds. And, it might be burying the lede, but this is a particularly good time to do this as we finally have high-resolution “HiDPI” support working on Windows and Linux.

The next release is likely to be a good one!

TA-Lib

Sun, 24 Aug 2025 08:00:00 -0700

The TA-Lib is a C project that supports adding “technical analysis to your own financial market trading applications”. It was originally created in 2001 and is well-tested, recently released, and popular:

200 indicators such as ADX, MACD, RSI, Stochastic, Bollinger Bands etc… See complete list…

Candlestick patterns recognition

Core written in C/C++ with API also available for Python.

Open-Source (BSD License). Can be freely integrated in your own open-source or commercial applications.

Of course, I wanted to be able to call the library using Factor. We have a C library interface that makes it pretty easy to interface with C libraries.

First, we add the library we expect to load:

<< "ta-lib" {
    { [ os windows? ] [ "libta-lib.dll" ] }
    { [ os macos?   ] [ "libta-lib.dylib" ] }
    { [ os unix?    ] [ "libta-lib.so" ] }
} cond cdecl add-library >>

LIBRARY: ta-lib

Then, we can define some types and some library functions to calculate the relative strength index:

TYPEDEF: int TA_RetCode

FUNCTION: TA_RetCode TA_RSI ( int startIdx, int endIdx, double* inReal, int optInTimePeriod, int* outBegIdx, int* outNBElement, double* outReal )
FUNCTION: int TA_RSI_Lookback ( int optInTimePeriod )

We use a simple code generator to define all the functions, as well as wrappers that can be used to call them:

:: RSI ( real timeperiod -- real )
    0 int <ref> :> outbegidx
    0 int <ref> :> outnbelement
    real check-array :> inreal
    inreal length :> len
    inreal check-begidx1 :> begidx
    len 1 - begidx - :> endidx
    timeperiod TA_RSI_Lookback begidx + :> lookback
    len lookback make-double-array :> outreal
    0 endidx inreal begidx tail-slice timeperiod outbegidx outnbelement outreal lookback tail-slice TA_RSI ta-check-success
    outreal ;

And, now we can use it!

IN: scratchpad 10 10 randoms 3 RSI .
double-array{
    0/0.
    0/0.
    0/0.
    50.0
    62.16216216216216
    31.506849315068497
    46.38069705093834
    32.33644859813084
    59.75541967759867
    66.53570603189276
}

You’ll note that the first few values are 0/0. which represents a NaN when we don’t have enough data to compute an answer – either because we are in the lookback phase or because the inputs have NaNs.

For convenience, we convert the inputs to double-array to perform the calculation, but if the input is already a double-array then there is not any data conversion cost.
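
To make the lookback behaviour concrete, here is a rough pure-Python sketch of a Wilder-style RSI that pads the first few outputs with NaN. It is an approximation under my own assumptions, not TA-Lib's code, and I would not expect it to match TA-Lib digit for digit in every case.

```python
import math

def rsi(values, period):
    """Wilder-style RSI; the first `period` outputs are NaN (the lookback)."""
    out = [math.nan] * len(values)
    if len(values) <= period:
        return out

    deltas = [b - a for a, b in zip(values, values[1:])]
    # Seed the averages with a simple mean over the first `period` changes.
    avg_gain = sum(max(d, 0.0) for d in deltas[:period]) / period
    avg_loss = sum(max(-d, 0.0) for d in deltas[:period]) / period

    def ratio(gain, loss):
        total = gain + loss
        return 100.0 * gain / total if total else 0.0

    out[period] = ratio(avg_gain, avg_loss)
    for i in range(period + 1, len(values)):
        d = deltas[i - 1]
        # Wilder smoothing: blend the previous average with the new change.
        avg_gain = (avg_gain * (period - 1) + max(d, 0.0)) / period
        avg_loss = (avg_loss * (period - 1) + max(-d, 0.0)) / period
        out[i] = ratio(avg_gain, avg_loss)
    return out

print(rsi([1, 2, 3, 4, 5, 6], 3))
```

A strictly rising series has no losses, so every value after the lookback comes out as 100.0.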

There are some advanced techniques including use of the Abstract API for meta-programming, default values for parameters, candlestick settings, streaming indicator support, and documentation that we probably should think about adding as well.

This is available on my GitHub.

String Length

Sat, 23 Aug 2025 08:00:00 -0700

I was recently reminded of a great article about Unicode string lengths:

It’s Not Wrong that "🤦🏼‍♂️".length == 7

But It’s Better that "🤦🏼‍♂️".len() == 17 and Rather Useless that len("🤦🏼‍♂️") == 5

This comes at a time of excessive emoji tsunami thanks to the proliferation of large language models and probably lots of Gen Z in the training data sets. Sometimes emojis are fun and useful like in Base256Emoji and sometimes it can get carried away like in the Emoji Kitchen.

I have written about Factor’s unicode support before and wanted to use this example to show a bit more about how Factor represents text using the Unicode standard.

IN: scratchpad "🤦" length .
1

IN: scratchpad "🤦🏼‍♂️" length .
5

Wat.

Well, what is happening is that the current strings vocabulary stores Unicode code points. This can be both useful and useless depending on the task at hand. We can print out which ones are used in this example:

IN: scratchpad "🤦🏼‍♂️" [ char>name . ] each
"face-palm"
"emoji-modifier-fitzpatrick-type-3"
"zero-width-joiner"
"male-sign"
"variation-selector-16"

When a developer expresses a need to store or retrieve textual data, they likely need to know about character encodings. In this case, we can see the number of bytes required to store this string in different encodings:

IN: scratchpad "🤦🏼‍♂️" utf8 encode length .
17

IN: scratchpad "🤦🏼‍♂️" utf16 encode length .
16

IN: scratchpad "🤦🏼‍♂️" utf32 encode length .
24
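
Those byte counts can be cross-checked in Python. The string is written with escapes so the code points are explicit. Python's "utf-16" and "utf-32" codecs prepend a byte order mark, which accounts for the 16 and 24; my assumption is that the Factor encoders above do the same.

```python
# face palm, skin tone modifier, zero width joiner, male sign, variation selector-16
s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

print(len(s))                      # 5 code points
print(len(s.encode("utf-8")))      # 17 bytes
print(len(s.encode("utf-16-le")))  # 14 bytes: 7 UTF-16 code units, no BOM
print(len(s.encode("utf-16")))     # 16 bytes: 2-byte BOM + 14
print(len(s.encode("utf-32")))     # 24 bytes: 4-byte BOM + 5 * 4
```

The 7 UTF-16 code units are where the "length == 7" in the article title comes from.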

But, what if we just want to know how many visual characters are in the string?

IN: scratchpad "🤦🏼‍♂️" >graphemes length .
1

This is covered in The Absolute Minimum Every Software Developer Must Know About Unicode in 2023, which is also a great article and covers this as well as a number of other aspects of the Unicode standard.

Anubis

Thu, 21 Aug 2025 08:00:00 -0700

Tavis Ormandy wrote a great blog post about the Anubis project asking a very valid-sounding question:

Hey… quick question, why are anime catgirls blocking my access to the Linux kernel?

The answer seems to be that it “weighs the soul of your connection using one or more challenges in order to protect upstream resources from scraper bots”. In particular, this project is an attempt to fight the AI scraperbot scourge which is making many popular websites annoying to use these days and spawning a kind of arms race amongst website owners, content delivery networks, and well-funded and morally-ambiguous AI firms.

Tavis goes into great detail about the estimated costs and inconvenience of this approach — and why it might likely inconvenience scraper bots which use many different IP addresses more than normal traffic which typically does not — as well as how the proof-of-work methodology is implemented using a solution written in the C programming language.

Without going into the safety versus security debate (a focus of the discussion on Lobsters), I thought I would show how to implement this using Factor.

How does Anubis work?

The Anubis challenge is a message, for example 5d737f0600ff2dd, which we use as a prefix while trying up to 262144 different nonce suffixes to find a SHA-256 hash that starts with 16 bits of zero.

The Anubis response in this case is 47224, which has a SHA-256 hash starting with 0000:

$ printf "5d737f0600ff2dd%d" 47224 | sha256sum
000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1  -
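
Before getting to Factor, the search itself can be sketched with Python's hashlib. This is my own illustration rather than the Anubis source: the function name find_anubis and its limit parameter are assumptions, and the example challenge is the one from the post.

```python
import hashlib

def find_anubis(message, limit=1 << 18):
    """Return (nonce, hex digest) for the first nonce whose hash starts with 16 zero bits."""
    base = hashlib.sha256(message.encode("ascii"))  # hash the shared prefix once
    for nonce in range(limit):
        h = base.copy()                              # reuse the prefix state
        h.update(str(nonce).encode("ascii"))         # append the decimal nonce
        digest = h.digest()
        if digest[:2] == b"\x00\x00":                # 16 leading zero bits
            return nonce, digest.hex()
    return None

print(find_anubis("5d737f0600ff2dd"))
```

Copying the partially computed hash state means only the nonce bytes are hashed on each iteration.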

Solving in Factor

Factor includes support for SHA-2 checksums that we can use to solve the example puzzle:

USING: checksums checksums.sha kernel math math.parser sequences ;

: find-anubis ( message -- nonce anubis )
    18 2^ <iota> [
        >dec append sha-256 checksum-bytes
        [ B{ 0 0 } head? ] keep f ?
    ] with map-find swap ;

And a test showing that it works:

{
    47224
    "000043f7c4392a781a04419a7cb503089ebcf3164e2b1d4258b3e6c15b8b07f1"
} [ "5d737f0600ff2dd" find-anubis bytes>hex-string ] unit-test

But, I noticed that it’s not that fast:

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] time
Running time: 0.216027939 seconds

Solving in Factor using C

Tavis used some functions from the OpenSSL library to compute the checksum.

We can take the same approach using the C library interface. It would be great to be able to parse header files and make this a little simpler, but for now we can define these C functions that we would like to call:

LIBRARY: libcrypto

STRUCT: SHA256_CTX
    { h uint[8] }
    { Nl uint }
    { Nh uint }
    { data uint[16] }
    { num uint }
    { md_len uint } ;

FUNCTION: int SHA256_Init ( SHA256_CTX* c )
FUNCTION: int SHA256_Update ( SHA256_CTX* c, void* data, size_t len )
FUNCTION: int SHA256_Final ( uchar* md, SHA256_CTX* c )

And then build the same C program in Factor:

:: find-anubis ( message -- nonce anubis )
    SHA256_CTX new :> base
    base SHA256_Init 1 assert=
    base message binary encode dup length SHA256_Update 1 assert=

    32 <byte-array> :> hash
    18 2^ <iota> [
        base clone :> ctx
        ctx swap >dec binary encode dup length SHA256_Update 1 assert=
        hash ctx SHA256_Final 1 assert=
        hash B{ 0 0 } head?
    ] find nip hash ;

Is it fast?

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] time
Running time: 0.009508132 seconds

Sure is!

How does that compare to the original C program?

$ gcc -Ofast -march=native anubis-miner.c -lcrypto -o anubis-miner
$ time ./anubis-miner 5d737f0600ff2dd
47224

real    0m0.005s
user    0m0.003s
sys     0m0.002s

Pretty favorably!

Solving in Factor using C approach

Part of the reason the C approach is fast is that it hashes the message once and then only has to hash the additional bytes of the nonce and check whether the result passes. We can try this by cloning our sha-256-state and checking on each iteration whether it passes the test:

USING: checksums checksums.sha kernel math math.parser sequences ;

: find-anubis ( message -- nonce anubis )
    sha-256 initialize-checksum-state swap add-checksum-bytes
    18 2^ <iota> [
        [ clone ] dip >dec add-checksum-bytes
        get-checksum [ B{ 0 0 } head? ] keep f ?
    ] with map-find swap ;

Is that faster?

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] time
Running time: 0.183254613 seconds

A little bit. But what’s the problem?

IN: scratchpad [ "5d737f0600ff2dd" find-anubis ] profile

IN: scratchpad top-down profile.
depth   time ms  GC %  JIT %  FFI %   FT %
  13     183.0   0.00   0.00  13.11   0.00 T{ thread f "Initial" ~quotation~ ~quotation~ 19 ~box~ f t f H{...
  14     149.0   0.00   0.00   8.72   0.00   M\ sha-256-state get-checksum
  15      85.0   0.00   0.00   2.35   0.00     M\ sha2-short checksum-block
  16      27.0   0.00   0.00   7.41   0.00       4be>
  17      10.0   0.00   0.00   0.00   0.00         M\ virtual-sequence nth-unsafe
  18       8.0   0.00   0.00   0.00   0.00           M\ slice virtual@
  19       6.0   0.00   0.00   0.00   0.00             +
  17       6.0   0.00   0.00   0.00   0.00         M\ fixnum shift
  17       4.0   0.00   0.00  50.00   0.00         fixnum-shift
  17       4.0   0.00   0.00   0.00   0.00         M\ byte-array nth-unsafe
  17       1.0   0.00   0.00   0.00   0.00         bitor
  17       1.0   0.00   0.00   0.00   0.00         >
  17       1.0   0.00   0.00   0.00   0.00         M\ slice length
  16       7.0   0.00   0.00   0.00   0.00       M\ fixnum integer>fixnum
  16       5.0   0.00   0.00   0.00   0.00       M\ fixnum >fixnum
  16       4.0   0.00   0.00   0.00   0.00       be>
  17       2.0   0.00   0.00   0.00   0.00         M\ slice length
  16       2.0   0.00   0.00   0.00   0.00       (byte-array)
  16       2.0   0.00   0.00   0.00   0.00       M\ slice length
  16       1.0   0.00   0.00   0.00   0.00       c-ptr?
  16       1.0   0.00   0.00   0.00   0.00       M\ fixnum integer>fixnum-strict
  15      35.0   0.00   0.00  25.71   0.00     pad-last-block
  16      26.0   0.00   0.00  26.92   0.00       %
  17       7.0   0.00   0.00  85.71   0.00         set-alien-unsigned-1
  17       7.0   0.00   0.00   0.00   0.00         M\ growable nth-unsafe
  18       1.0   0.00   0.00   0.00   0.00           M\ byte-vector underlying>>
  17       4.0   0.00   0.00   0.00   0.00         M\ growable set-nth-unsafe
  18       1.0   0.00   0.00   0.00   0.00           M\ byte-vector underlying>>
  17       2.0   0.00   0.00  50.00   0.00         resize-byte-array
  17       1.0   0.00   0.00   0.00   0.00         M\ growable lengthen
  18       1.0   0.00   0.00   0.00   0.00           M\ byte-vector length>>
  17       1.0   0.00   0.00   0.00   0.00         M\ growable length
  17       1.0   0.00   0.00   0.00   0.00         M\ byte-array set-nth-unsafe
  17       1.0   0.00   0.00   0.00   0.00         integer?
  16       5.0   0.00   0.00   0.00   0.00       >slow-be
  17       2.0   0.00   0.00   0.00   0.00         M\ fixnum shift
  17       1.0   0.00   0.00   0.00   0.00         fixnum-shift
  17       1.0   0.00   0.00   0.00   0.00         M\ fixnum integer>fixnum
  16       1.0   0.00   0.00 100.00   0.00       <byte-array>
  16       1.0   0.00   0.00 100.00   0.00       set-alien-unsigned-1
  16       1.0   0.00   0.00   0.00   0.00       M\ growable set-nth
  17       1.0   0.00   0.00   0.00   0.00         M\ byte-vector underlying>>
  16       1.0   0.00   0.00   0.00   0.00       ,
  17       1.0   0.00   0.00   0.00   0.00         assoc-stack
  18       1.0   0.00   0.00   0.00   0.00           M\ hashtable at*
  15      16.0   0.00   0.00   6.25   0.00     >slow-be
  16       8.0   0.00   0.00   0.00   0.00       M\ fixnum shift
  16       2.0   0.00   0.00  50.00   0.00       (byte-array)
  16       1.0   0.00   0.00   0.00   0.00       fixnum-shift
  16       1.0   0.00   0.00   0.00   0.00       <
  16       1.0   0.00   0.00   0.00   0.00       M\ fixnum integer>fixnum
  15       3.0   0.00   0.00   0.00   0.00     M\ chunking nth-unsafe
  16       2.0   0.00   0.00   0.00   0.00       M\ groups group@
  17       1.0   0.00   0.00   0.00   0.00         M\ fixnum min
  17       1.0   0.00   0.00   0.00   0.00         *
  15       2.0   0.00   0.00   0.00   0.00     >be
  15       1.0   0.00   0.00   0.00   0.00     M\ sha2-state H>>
  15       1.0   0.00   0.00 100.00   0.00     fixnum/i
  15       1.0   0.00   0.00   0.00   0.00     M\ sha2-state clone
  16       1.0   0.00   0.00   0.00   0.00       M\ sha2-state H<<
  15       1.0   0.00   0.00   0.00   0.00     M\ uint-array nth-unsafe
  14      23.0   0.00   0.00  30.43   0.00   M\ block-checksum-state add-checksum-bytes
  15      18.0   0.00   0.00  27.78   0.00     >byte-vector
  16       7.0   0.00   0.00   0.00   0.00       M\ virtual-sequence nth-unsafe
  17       1.0   0.00   0.00   0.00   0.00         M\ slice virtual@
  18       1.0   0.00   0.00   0.00   0.00           +
  16       5.0   0.00   0.00  80.00   0.00       set-alien-unsigned-1
  16       2.0   0.00   0.00   0.00   0.00       M\ byte-array nth-unsafe
  16       2.0   0.00   0.00   0.00   0.00       M\ growable nth-unsafe
  17       2.0   0.00   0.00   0.00   0.00         M\ byte-vector underlying>>
  16       1.0   0.00   0.00 100.00   0.00       (byte-array)
  16       1.0   0.00   0.00   0.00   0.00       M\ slice length
  15       2.0   0.00   0.00 100.00   0.00     fixnum/i
  15       1.0   0.00   0.00   0.00   0.00     M\ string nth-unsafe
  15       1.0   0.00   0.00   0.00   0.00     >
  15       1.0   0.00   0.00   0.00   0.00     number=
  14       4.0   0.00   0.00   0.00   0.00   head?
  15       1.0   0.00   0.00   0.00   0.00     M\ byte-array length
  15       1.0   0.00   0.00   0.00   0.00     >
  15       1.0   0.00   0.00   0.00   0.00     M\ byte-array nth-unsafe
  15       1.0   0.00   0.00   0.00   0.00     integer?
  14       3.0   0.00   0.00  66.67   0.00   M\ fixnum positive>dec
  15       2.0   0.00   0.00 100.00   0.00     <string>
  14       2.0   0.00   0.00 100.00   0.00   M\ sha2-state clone
  15       1.0   0.00   0.00 100.00   0.00     M\ checksum-state clone
  16       1.0   0.00   0.00 100.00   0.00       (clone)
  15       1.0   0.00   0.00 100.00   0.00     M\ uint-array clone
  16       1.0   0.00   0.00 100.00   0.00       (clone)
  14       1.0   0.00   0.00   0.00   0.00   M\ integer >base
  14       1.0   0.00   0.00   0.00   0.00   reverse!

Visualizing the profile using the flamegraph vocabulary allows us to dig a little bit further:

Looks like a lot of generic dispatch, inefficient byte swapping, memory allocations, and type conversions. Probably this could be made much faster by looking into how we handle block checksums.

PRs welcome!

Left to Right

Tue, 19 Aug 2025 08:00:00 -0700

An article about Left to Right Programming was posted a few days ago, with good discussions on Hacker News and on Lobsters. It’s a nice read with some syntax examples in different languages, looking at some code blocks that are structured left-to-right or right-to-left.

We can look at a few of the shared examples and think about how they might look in Factor, which inherits a natural data flow style due to the nature of being a concatenative language.

The Challenge

The blog post discusses Graic’s 2024 Advent of Code solution, written in Python:

len(list(filter(lambda line: all([abs(x) >= 1 and abs(x) <= 3 for x in line]) and (all([x > 0 for x in line]) or all([x < 0 for x in line])), diffs)))

And compares it to an equivalent improved form in JavaScript:

diffs.filter(line => 
    line.every(x => Math.abs(x) >= 1 && Math.abs(x) <= 3) &&
    (line.every(x => x > 0) || line.every(x => x < 0))
).length;

There’s nothing quite like syntax wars – the nerd version of the linguistic wars – to get people interested in a topic. It is only one dimension, but perhaps the most visible one, to evaluate a programming language on.

I usually get excited for any code that solves a problem, and I give kudos to Graic for their efforts solving the Advent of Code! It’s often only working code that we can make iterative improvements upon, and that should be appreciated.

The Response

On Hacker News, someone shared a version using Python’s list comprehensions:

len([line for line in diffs
     if all(1 <= abs(x) <= 3 for x in line)
     and (all(x > 0 for x in line) or all(x < 0 for x in line))])

As well as a direct translation in Rust:

diffs.iter().filter(|line| {
    line.iter().all(|x| x.abs() >= 1 && x.abs() <= 3) &&
    (line.iter().all(|x| x > 0) || line.iter().all(|x| x < 0))
}).count()

And a single pass version in Rust with improved performance:

diffs.iter().filter(|line| {
    let mut iter = line.iter();
    let range = match iter.next() {
        Some(-3..=-1) => -3..=-1,
        Some(1..=3) => 1..=3,
        Some(_) => return false,
        None => return true,
    };
    iter.all(|x| range.contains(x))
}).count()

Someone else shared a version using numpy arrays:

sum(1 for line in diffs
    if ((np.abs(line) >= 1) & (np.abs(line) <= 3)).all()
       and ((line > 0).all() or (line < 0).all()))

And another comment shared a version in Kotlin:

diffs.countIf { line -> 
    line.all { abs(it) in 1..3 } and ( 
        line.all { it > 0} or
        line.all { it < 0}
    )
}

There was also a version shared in Python that is perhaps a bit more idiomatic:

sum(map(lambda line: all(1 <= abs(x) <= 3 for x in line)
                     and (all(x > 0 for x in line) or all(x < 0 for x in line)),
        diffs))

What about Factor?

As you might imagine, I was also curious about what this would look like in Factor.

Directly translating using local variables does up to three passes through the line:

[| line |
    line [ abs 1 3 between? ] all?
    line [ 0 > ] all?
    line [ 0 < ] all? or and
] count

However, it would be better if we only check against zero if the first check passes:

[| line |
    line [ abs 1 3 between? ] all? [
        line [ 0 > ] all?
        line [ 0 < ] all? or
    ] [ f ] if
] count

And, despite still being three passes, it is better if we only check negative if the positive check fails:

[| line |
    line [ abs 1 3 between? ] all? [
        line [ 0 > ] all? [ t ] [
            line [ 0 < ] all?
        ] if
    ] [ f ] if
] count

We can do these short-circuiting checks using a short-circuit combinator:

[
    {
        [ [ abs 1 3 between? ] all? ]
        [ { [ [ 0 > ] all? ] [ [ 0 < ] all? ] } 1|| ]
    } 1&&
] count

Checking that the signs of all the numbers are the same only does two passes through the line:

[| line |
    line empty? [ t ] [
        line [ abs 1 3 between? ] all?
        line unclip sgn '[ sgn _ = ] all? and
    ] if
] count

Comparing the first value to subsequent values does a single pass through the line:

[
    [ t ] [
        unclip { [ abs 1 3 between? ] [ sgn ] } 1&& [
            '[ { [ abs 1 3 between? ] [ sgn ] } 1&& _ = ] all?
        ] [ drop f ] if*
    ] if-empty
] count

We often encourage writing combinators to do algorithmic things:

:: all-same? ( seq quot: ( elt -- obj/f ) -- ? )
    seq [ t ] [
        unclip quot call [ '[ quot call _ = ] all? ] [ drop f ] if*
    ] if-empty ; inline

Which makes for a satisfyingly simple version:

[ [ { [ abs 1 3 between? ] [ sgn ] } 1&& ] all-same? ] count
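
For comparison with the Python versions earlier in the post, the combinator idea might be sketched like this. The names all_same and small_sign are hypothetical, the sample data is made up, and None stands in for Factor's f.

```python
def all_same(seq, key):
    """True if key(x) is the same non-None value for every x (true when empty)."""
    it = iter(seq)
    try:
        first = key(next(it))
    except StopIteration:
        return True
    # all() stops at the first mismatch, so this is a single pass.
    return first is not None and all(key(x) == first for x in it)

def small_sign(x):
    """Sign of x when 1 <= |x| <= 3, otherwise None."""
    if 1 <= abs(x) <= 3:
        return 1 if x > 0 else -1
    return None

diffs = [[1, 2, 3], [-1, -2, -3], [1, -2], [4, 1], []]  # made-up sample data
print(sum(all_same(line, small_sign) for line in diffs))  # 3
```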

We could even do something like the Rust version above, getting the endpoints from the first value to check that the subsequent ones match:

[
    [ t ] [
        unclip {
            { [ dup 1 3 between? ] [ drop 1 3 ] }
            { [ dup -3 -1 between? ] [ drop -3 -1 ] }
            [ drop f f ]
        } cond [ '[ _ _ between? ] all? ] [ nip ] if*
    ] if-empty
] count

And that simplifies even more if we use range syntax:

[
    [ t ] [
        unclip { 1 ..= 3 -3 ..= -1 } [ in? ] with find nip
        [ '[ _ in? ] all? ] [ drop f ] if*
    ] if-empty
] count
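
The range-based variant translates to Python fairly directly too. This is a sketch with made-up sample data; the name safe is mine, and it assumes the values are integers so that membership tests on range objects are exact.

```python
RANGES = (range(1, 4), range(-3, 0))  # 1..3 and -3..-1, both inclusive

def safe(line):
    """Pick the range containing the first value, then require the rest to match it."""
    if not line:
        return True
    chosen = next((r for r in RANGES if line[0] in r), None)
    return chosen is not None and all(x in chosen for x in line[1:])

diffs = [[1, 2, 3], [-1, -2, -3], [1, -2], [4, 1], []]  # made-up sample data
print(sum(safe(line) for line in diffs))  # 3
```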

As usual, there is more than one way to do it, and that’s okay.

Are any of these best? How else might we write this better?

Pickle

Mon, 18 Aug 2025 08:00:00 -0700

Pretty much everything pickle is great: sweet, dill, bread and butter, full sour, half sour, gherkins, achar, even pickleball. In addition to being both yummy and fun and a great Tuesday night on the Playa, pickle is also the name for Python object serialization.

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.

Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.

Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.

Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.

Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.

Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

While recently learning about how the pickle protocol works, I was able to build a basic unpickler in Factor. The implementation is about 300 lines of code, and has some decent tests. There are a few more features we should add for completeness, but it’s a good start!

I thought I’d go over a few parts of the implementation here.

The pickle protocol is stack-based, which we represent by a growable vector, and uses a memoization feature to refer to objects by integer keys when they repeat in the data stream, which we store in a hashtable:

CONSTANT: stack V{ }

CONSTANT: memo H{ }

ERROR: invalid-memo key ;

: get-memo ( i -- )
    memo ?at [ stack push ] [ invalid-memo ] if ;

: put-memo ( i -- )
    [ stack last ] dip memo set-at ;

It also has the concept of markers which can be placed using the +marker+ symbol and then used, for example, to pop all items on the stack until the last marker was seen:

SYMBOL: +marker+

: pop-from-marker ( -- items )
    +marker+ stack last-index
    [ 1 + stack swap tail ] [ stack shorten ] bi ;

Unpickling starts with a dispatch loop that acts on each supported opcode. We can use a +no-return+ symbol to indicate that we are not ready to return an object until the STOP opcode is seen.

ERROR: invalid-opcode opcode ;

SYMBOL: +no-return+

: unpickle-dispatch ( opcode -- value )
    +no-return+ swap {
        ! Protocol 0 and 1
        { CHAR: ( [ load-mark ] }
        { CHAR: . [ drop stack pop ] }
        { CHAR: 0 [ load-pop ] }
        { CHAR: 1 [ load-pop-mark ] }
        { CHAR: 2 [ load-dup ] }
        { CHAR: F [ load-float ] }
        { CHAR: I [ load-int ] }
        { CHAR: J [ load-binint ] }
        { CHAR: K [ load-binint1 ] }
        { CHAR: L [ load-long ] }
        { CHAR: M [ load-binint2 ] }
        { CHAR: N [ load-none ] }
        { CHAR: P [ load-persid ] }
        { CHAR: Q [ load-binpersid ] }
        { CHAR: R [ load-reduce ] }
        { CHAR: S [ load-string ] }
        { CHAR: T [ load-binstring ] }
        { CHAR: U [ load-short-binstring ] }
        { CHAR: V [ load-unicode ] }
        { CHAR: X [ load-binunicode ] }
        { CHAR: a [ load-append ] }
        { CHAR: b [ load-build ] }
        { CHAR: c [ load-global ] }
        { CHAR: d [ load-dict ] }
        { CHAR: } [ load-empty-dict ] }
        { CHAR: e [ load-appends ] }
        { CHAR: g [ load-get ] }
        { CHAR: h [ load-binget ] }
        { CHAR: i [ load-inst ] }
        { CHAR: j [ load-long-binget ] }
        { CHAR: l [ load-list ] }
        { CHAR: ] [ load-empty-list ] }
        { CHAR: o [ load-obj ] }
        { CHAR: p [ load-put ] }
        { CHAR: q [ load-binput ] }
        { CHAR: r [ load-long-binput ] }
        { CHAR: s [ load-setitem ] }
        { CHAR: t [ load-tuple ] }
        { CHAR: ) [ load-empty-tuple ] }
        { CHAR: u [ load-setitems ] }
        { CHAR: G [ load-binfloat ] }

        ! Protocol 2
        { 0x80 [ load-proto ] }
        { 0x81 [ load-newobj ] }
        { 0x82 [ load-ext1 ] }
        { 0x83 [ load-ext2 ] }
        { 0x84 [ load-ext4 ] }
        { 0x85 [ load-tuple1 ] }
        { 0x86 [ load-tuple2 ] }
        { 0x87 [ load-tuple3 ] }
        { 0x88 [ load-true ] }
        { 0x89 [ load-false ] }
        { 0x8a [ load-long1 ] }
        { 0x8b [ load-long4 ] }

        ! Protocol 3 (Python 3.x)
        { CHAR: B [ load-binbytes ] }
        { CHAR: C [ load-short-binbytes ] }

        ! Protocol 4 (Python 3.4+)
        { 0x8c [ load-short-binunicode ] }
        { 0x8d [ load-binunicode8 ] }
        { 0x8e [ load-binbytes8 ] }
        { 0x8f [ load-empty-set ] }
        { 0x90 [ load-additems ] }
        { 0x91 [ load-frozenset ] }
        { 0x92 [ load-newobj-ex ] }
        { 0x93 [ load-stack-global ] }
        { 0x94 [ load-memoize ] }
        { 0x95 [ load-frame ] }

        ! Protocol 5 (Python 3.8+)
        { 0x96 [ load-bytearray8 ] }
        { 0x97 [ load-readonly-buffer ] }
        { 0x98 [ load-next-buffer ] }

        [ invalid-opcode ]
    } case ;

With that, we can build our unpickle word that acts on an input-stream, first clearing state and then looping until we see an object to return:

: unpickle ( -- obj )
    stack delete-all memo clear-assoc
    f [ drop read1 unpickle-dispatch dup +no-return+ = ] loop ;
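
To show the stack, memo, and marker working together, here is a deliberately tiny Python sketch of the same dispatch loop. It handles only a handful of opcodes, just enough for a small list, and is an illustration under my own simplifications rather than a port of the Factor vocabulary. The names mini_unpickle and pop_from_marker are made up.

```python
import io
import pickle

MARK = object()  # sentinel standing in for +marker+

def pop_from_marker(stack):
    """Remove and return everything above the most recent MARK, plus the MARK itself."""
    idx = max(i for i, item in enumerate(stack) if item is MARK)
    items = stack[idx + 1:]
    del stack[idx:]
    return items

def mini_unpickle(data):
    stream = io.BytesIO(data)
    stack, memo = [], {}
    while True:
        op = stream.read(1)
        if op == b"\x80":            # PROTO: one-byte protocol version
            stream.read(1)
        elif op == b"\x95":          # FRAME: eight-byte frame length, ignored here
            stream.read(8)
        elif op == b"]":             # EMPTY_LIST
            stack.append([])
        elif op == b"(":             # MARK
            stack.append(MARK)
        elif op == b"\x94":          # MEMOIZE: remember the top of the stack
            memo[len(memo)] = stack[-1]
        elif op == b"h":             # BINGET: push a memoized object
            stack.append(memo[stream.read(1)[0]])
        elif op == b"K":             # BININT1: one-byte unsigned integer
            stack.append(stream.read(1)[0])
        elif op == b"\x8c":          # SHORT_BINUNICODE: length byte + UTF-8 data
            size = stream.read(1)[0]
            stack.append(stream.read(size).decode("utf-8"))
        elif op == b"a":             # APPEND: one item onto the list below it
            item = stack.pop()
            stack[-1].append(item)
        elif op == b"e":             # APPENDS: all items above the marker
            items = pop_from_marker(stack)
            stack[-1].extend(items)
        elif op == b".":             # STOP: the result is on top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode: {op!r}")

print(mini_unpickle(pickle.dumps(["abc", 123], protocol=4)))
```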

For convenience, a pickle> word acts on concrete data:

GENERIC: pickle> ( string -- obj )

M: string pickle> [ unpickle ] with-string-reader ;

M: byte-array pickle> binary [ unpickle ] with-byte-reader ;

In addition, we needed to support Python’s string escapes which are slightly different than the ones that Factor defines – mainly \u#### and \U########, and then add support for some of the basic class types that we might encounter such as byte-arrays, decimals, timestamps, etc.

We currently do not support: persistent IDs, readonly vs read/write buffers, out-of-band buffers, the object build opcode, and the extension registry. And of course, this is just unpickling; we do not yet support pickling of Factor objects, although that shouldn’t be too hard to add.

But, despite that, it works pretty well!

Here’s an example where we store some mixed data in a pickles file with Python:

>>> data = ["abc", 123, 4.56, {"a":1+5j}, {17,37,52}]

>>> import pickle

>>> with open('pickles', 'wb') as f:
...     pickle.dump(data, f)
...

And then inspect and load that pickles file with Factor!

IN: scratchpad USE: tools.hexdump

IN: scratchpad "pickles" hexdump-file
00000000  80 04 95 54 00 00 00 00 00 00 00 5d 94 28 8c 03  ...T.......].(..
00000010  61 62 63 94 4b 7b 47 40 12 3d 70 a3 d7 0a 3d 7d  abc.K{G@.=p...=}
00000020  94 8c 01 61 94 8c 08 62 75 69 6c 74 69 6e 73 94  ...a...builtins.
00000030  8c 07 63 6f 6d 70 6c 65 78 94 93 94 47 3f f0 00  ..complex...G?..
00000040  00 00 00 00 00 47 40 14 00 00 00 00 00 00 86 94  .....G@.........
00000050  52 94 73 8f 94 28 4b 11 4b 34 4b 25 90 65 2e     R.s..(K.K4K%.e.
0000005f

IN: scratchpad USE: pickle

IN: scratchpad "pickles" binary file-contents pickle> .
V{ "abc" 123 4.56 H{ { "a" C{ 1.0 5.0 } } } HS{ 17 52 37 } }
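For comparison, here is a minimal Python sketch that reads the same framing bytes visible in the hexdump above: the PROTO opcode (0x80) and protocol number, the FRAME opcode (0x95) with its 8-byte little-endian length, and the trailing STOP opcode (`.`). The describe_pickle helper is a hypothetical name, and it assumes a protocol 4 or later pickle small enough to fit in a single frame.

```python
import pickle

def describe_pickle(blob):
    """Summarize the framing of a protocol 4+ pickle (single-frame assumption)."""
    info = {"protocol": None, "frame_length": None, "stops": blob.endswith(b".")}
    if blob[:1] == b"\x80":          # PROTO opcode, followed by the protocol number
        info["protocol"] = blob[1]
    if blob[2:3] == b"\x95":         # FRAME opcode, then an 8-byte little-endian size
        info["frame_length"] = int.from_bytes(blob[3:11], "little")
    return info

data = ["abc", 123, 4.56, {"a": 1 + 5j}, {17, 37, 52}]
blob = pickle.dumps(data, protocol=4)
info = describe_pickle(blob)
print(info)
print(pickle.loads(blob) == data)
```

The frame length should account for everything after the 11 header bytes, which matches the 0x54 (84) length byte and the 0x5f (95) total size in the hexdump.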

This is available in the latest development version.

Marp

Fri, 15 Aug 2025 08:00:00 -0700

Marp, also known as the Markdown Presentation Ecosystem, is a way to “create beautiful slide decks using an intuitive Markdown experience”.

If you’ve seen a presentation about Factor before, you might notice that we have a slides vocabulary that allows us to build presentations in Factor and then present them using the Factor UI. Many of our previous talks are shared in the factor-talks repository, including the slides for the SVFIG talk.

Today, I thought it would be fun to merge these two concepts together, and allow us to build slides in Factor but do the presentation using Marp.

What does a Factor slide look like?

Let’s start by examining a $slide, and see what it looks like:

{ $slide "Quotations"
    "Quotation: un-named blocks of code"
    { $code "[ \"Hello, World\" print ]" }
    "Combinators: words taking quotations"
    { $code "10 dup 0 < [ 1 - ] [ 1 + ] if ." }
    { $code "{ -1 1 -2 0 3 } [ 0 max ] map ." }
}

As with most businessy slides, it starts with a title, and then has a sequence of various blocks to render.

What would a Marp slide look like?

We can manually translate this to a similar-looking slide using Markdown:

---

# Quotations
- Quotation: un-named blocks of code
```factor
[ "Hello, World" print ]
```

- Combinators: words taking quotations
```factor
10 dup 0 < [ 1 - ] [ 1 + ] if .
```

```factor
{ -1 1 -2 0 3 } [ 0 max ] map .
```

Can we automate this?

Of course!

Our slides vocabulary uses elements from the help system to provide markup (which is how we render it in the Factor user interface). These elements are specified as a kind of array, with typing provided by their first argument.

We can leverage this to manually support a few types:

GENERIC: write-marp ( element -- )

M: string write-marp write ;

M: array write-marp
    unclip {
        { \ $slide [ write-slide ] }
        { \ $code [ write-code ] }
        { \ $link [ write-link ] }
        { \ $vocab-link [ write-vocab-link ] }
        { \ $url [ write-url ] }
        { \ $snippet [ write-snippet ] }
        [ write-marp [ write-marp ] each ]
    } case ;
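To make the shape of that dispatch concrete, here is a rough Python sketch of the same idea: strings become bullet lines, and tagged lists are dispatched on their first element. The list-based slide representation and the to_marp name are assumptions for illustration only; the real write-slide and write-code words are not shown in this post and may render things differently.

```python
FENCE = "`" * 3  # built at runtime so this example can itself live inside a fence

def to_marp(element):
    """Render a string or a [tag, *children] list as Marp-flavored Markdown."""
    if isinstance(element, str):
        return f"- {element}\n"
    tag, *children = element
    if tag == "$slide":
        title, *blocks = children
        body = "".join(to_marp(block) for block in blocks)
        return f"---\n\n# {title}\n{body}\n"
    if tag == "$code":
        code = "\n".join(children)
        return f"{FENCE}factor\n{code}\n{FENCE}\n\n"
    # Unknown tags: fall back to rendering the children in order.
    return "".join(to_marp(child) for child in children)

slide = ["$slide", "Quotations",
         "Quotation: un-named blocks of code",
         ["$code", '[ "Hello, World" print ]'],
         "Combinators: words taking quotations",
         ["$code", "10 dup 0 < [ 1 - ] [ 1 + ] if ."],
         ["$code", "{ -1 1 -2 0 3 } [ 0 max ] map ."]]

markdown = to_marp(slide)
print(markdown)
```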

Using that, we can create a Marp file, with some chosen style:

: write-marp-file ( slides -- )
    "---
marp: true
theme: gaia
paginate: true
backgroundColor: #1e1e2e
color: #cdd6f4
style: |
  section {
    font-family: 'SF Pro Display', 'Segoe UI', sans-serif;
  }
  h1 {
    color: #89b4fa;
  }
  h2 {
    color: #94e2d5;
  }
  h3 {
    color: #f5c2e7;
  }
  code {
    background-color: #313244;
    color: #cdd6f4;
    border-radius: 0.25em;
  }
  pre {
    background-color: #313244;
    border-radius: 0.5em;
  }
  ul {
    list-style: none;
    padding-left: 0;
  }
  ul li::before {
    content: \"▸ \";
    color: #89b4fa;
    font-weight: bold;
  }" print [ write-marp ] each ;

And now we can use that to generate a Marp file of our talk!

IN: scratchpad "~/FACTOR.md" utf8 [
                   svfig-slides write-marp-file
               ] with-file-writer

And then use the Marp CLI to convert it to HTML and open in a browser!

$ marp FACTOR.md
[  INFO ] Converting 1 markdown...
[  INFO ] FACTOR.md => FACTOR.html

$ open -a Safari FACTOR.html

And then view the slides, embedded below for convenience:

This is available on my GitHub.

Data Formats

Thu, 31 Jul 2025 08:00:00 -0700

A data format is a standardized way of encoding, storing, and representing data, allowing different software applications to interpret and process it. I was reminded of this recently when a link was shared to From XML to JSON to CBOR which discusses three pivotal data formats and their evolution.

Some data formats that Factor supports in its extensive standard library:

Bencode or BitTorrent encoding
BSON or Binary JSON
CBOR or Concise Binary Object Representation
CSV or Comma-separated values
JSON or JavaScript Object Notation
MessagePack – It’s like JSON, but fast and small
TOML or Tom’s Obvious Minimal Language
TXON or Text Object Notation
XML or Extensible Markup Language
YAML or Yet Another Markup Language

Most of these are general purpose and can encode most basic object types, including nested structures, with some exceptions: for example, csv doesn’t support nesting, txon supports only string keys and values, and xml requires some manual object-to-XML conversion.

In any event, here is an example showing data that round-trips through seven different data formats:

IN: scratchpad LH{
                   { "name" "Factor" }
                   { "age" 22 }
                   { "list" { 4 8 15 16 23 42 } }
                   { "map" { LH{ { "one" 1 } { "two" 2 } } } }
               } [
                   >json json>
                   >msgpack msgpack>
                   >toml toml>
                   >cbor cbor>
                   >bson bson>
                   >bencode bencode>
                   >yaml yaml>
               ] keep = .
t

There are two more data formats that might not be obvious, but also round-trip:

The serialize vocabulary: object>bytes bytes>object
The prettyprint vocabulary: [ unparse ] without-limits eval( -- obj )
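The same round-trip test can be sketched in Python using only standard-library encoders. This is an analogy rather than a port: the formats chosen here (JSON, pickle, property lists, and the printed repr re-parsed with ast.literal_eval) are simply the ones that ship with Python, and the last one plays a role similar to unparse followed by eval.

```python
import ast
import json
import pickle
import plistlib

data = {
    "name": "Factor",
    "age": 22,
    "list": [4, 8, 15, 16, 23, 42],
    "map": [{"one": 1, "two": 2}],
}

def round_trip(value):
    """Pass a value through several encoders and decoders in sequence."""
    value = json.loads(json.dumps(value))            # JSON text
    value = pickle.loads(pickle.dumps(value))        # Python's binary serializer
    value = plistlib.loads(plistlib.dumps(value))    # XML property list
    value = ast.literal_eval(repr(value))            # printed form, re-parsed
    return value

print(round_trip(data) == data)
```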

And there are probably many more useful ones we could add to the standard library. For example, Zig has a new data format I’d love to support someday called Zon or Zig Object Notation.

PRs welcome!

Tribonacci Numbers

Thu, 24 Jul 2025 08:00:00 -0700

Tribonacci numbers are a three-term variant of the Fibonacci sequence, where each number is the sum of the previous three. Specifically, they are defined by the first 3 numbers in the sequence and then a recursive formula:

T(0) = 0
T(1) = 1
T(2) = 1
T(n) = T(n-1) + T(n-2) + T(n-3)

Note: Some other tribonacci sequences have different starting values, and in general are defined by their first 3 values: a, b, and c — in this case: a=0, b=1, c=1.

Let’s go ahead and build this in Factor, making sure to use memoization to improve performance!

MEMO: tribonacci ( n -- r )
    {
        { 0 [ 0 ] }
        { 1 [ 1 ] }
        { 2 [ 1 ] }
        [ 3 2 1 [ - tribonacci ] tri-curry@ tri + + ]
    } case ;

And write some tests to make sure it works:

{ 0 } [ 0 tribonacci ] unit-test
{ 1 } [ 1 tribonacci ] unit-test
{ 1 } [ 2 tribonacci ] unit-test
{ 2 } [ 3 tribonacci ] unit-test

{ 98079530178586034536500564 } [ 100 tribonacci ] unit-test

And just for fun, we can easily print the 10,000th tribonacci number:

IN: scratchpad 10,000 tribonacci .
104970887945512179720156402562881884282669908701507376396226849
507715880964253502566653834053217081868282654421270875998847083
354963158781389172213221498323513985337426272701217011859592971
352141098891805206420900134582429646587046427436005495255798883
246073618031475000139396853427893513436406142049154715999765086
094048789520150380301629928195007762620367936007950031281567424
852796946143324764098612546824050145564601448681311727789904337
617216196640219842106590802021786745819241462665059249132141577
894579630772922197404756269452287431128951784520600882264401580
836789430310052922795100107599314150699335030135447534365101972
212856424810302264081751728175450009246816642287808303756431642
190229866618955021220164854376449799003570929253752356712382127
291554067625436999262482815025295032640196317590693733687242877
497883233594054067312780455244938306491277888412407808533948897
005235795246966581196760639279751925372382092157588570477362227
283625689293926489206546461529591280578644740130681436616270947
399935554399753037011040986396134560464682737227548122835065232
761643277488106437767837022869773684192764695319605886553868567
702494744004565987231502563031894899087422143067624588247239246
209236949388263940118095456771744163270015460183378322484380736
904832346845106349873481841268013335459411318711113804921252701
685818949515305915278859806703369590630798205066890114557299609
200541159049091178364828975083353240663240973018961078785884623
754180387714833134965646472067988582492257260500195932241449015
697681978518891318647464646160278703772005509287994250063276822
738895989306087163591588324360743585286169954643632915363025595
643953167630122098912124456056268912098768342566371268760363924
442889888379550780367698680933768551834171582391185682035582704
931276111575092809048515510639048207408965825151544995661087361
202117757159833101907264820660711114748638250941161679012607717
553825173370181059141538350424453457744455331742600716423279982
267989192083554417406081238679478603607583176975551563763168876
520982495090528860247780663854311039370938787207531507059750255
331836155899138120278616825251002413907470627146790383955760350
784999437437547112858192681843997153570155437021076042591359601
550872111063258125973939212682654169479750147837067955643100320
187644616709502788318461621953487535087913200131635597400731137
771945144547828956806094091586378911072748574105653223033633806
849090649589109247111615640029564835225683410527907839461600211
731583707928939187244387112961247901688359209735935698701057372
453283021738155105525668197910700091097683706470201003266177476
093728339123616912247097871463717510180588398935948725584082432
8

And that, of course, shows off our bignum integers holding almost 8,800 bits.

But, if we want to calculate the 100,000th tribonacci number, we get a retain stack overflow. This error depends on how many recursions have to be performed before a previously memoized answer is found:

IN: scratchpad 100,000 tribonacci
Retain stack overflow

Type :help for debugging help.

Instead, we would need to approach it by calculating interim values:

IN: scratchpad 10 [1..b] [
                   10,000 * dup tribonacci log2
                   "%s = %s\n" printf
               ] each
10000 = 8789
20000 = 17581
30000 = 26372
40000 = 35164
50000 = 43955
60000 = 52747
70000 = 61538
80000 = 70330
90000 = 79121
100000 = 87913

An iterative implementation might do more computation when called repeatedly, due to lacking memoization, but gets us directly to the answer we want:

: tribonacci ( n -- r )
    {
        { 0 [ 0 ] }
        { 1 [ 1 ] }
        { 2 [ 1 ] }
        [ [ 0 1 1 ] dip 2 - [ [ + + ] 2keep rot ] times 2nip ]
    } case ;
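For readers following along in Python, the same sliding-window loop might be sketched like this. It is only an illustration of the iterative idea, not a port of the Factor word; the bit_length() - 1 expression is a stand-in for an integer floor(log2).

```python
def tribonacci(n):
    """Iterative tribonacci with T(0)=0, T(1)=1, T(2)=1."""
    a, b, c = 0, 1, 1
    for _ in range(n):
        a, b, c = b, c, a + b + c   # slide the three-value window forward
    return a

print(tribonacci(100))
print(tribonacci(10_000).bit_length() - 1)   # floor(log2) of the 10,000th value
```

Because only three values are kept at a time, there is no recursion depth to worry about and no cache to grow.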

We can see that the millionth tribonacci number has almost 880,000 bits!

IN: scratchpad 10,000 tribonacci log2 .
8789

IN: scratchpad 100,000 tribonacci log2 .
87913

IN: scratchpad 1,000,000 tribonacci log2 .
879144

We could combine memoization and iterative computation by implementing an inline cache:

:: tribonacci ( n -- r )
    V{ 0 1 1 } :> seq
    n seq ?nth [
        n 1 + next-power-of-2 dup
        seq capacity > [ seq expand ] [ drop ] if
        seq 3 tail-slice* first3 n seq length 1 - -
        [ [ + + ] 2keep rot dup seq push ] times 2nip
    ] unless* ;

Now subsequent calls are fast, at the expense of a possibly large cache!

! First computation is slow
IN: scratchpad [ 100,000 tribonacci ] time .
Running time: 0.211418458 seconds

! Second computation is fast (cached)
IN: scratchpad [ 100,000 tribonacci ] time .
Running time: 0.000272875 seconds

! Other requests in range are fast (cached)
IN: scratchpad [ 99,999 tribonacci ] time .
Running time: 0.000344792 seconds

In this case, the cache now holds 100,000 integers in about 550 MB of memory.

IN: scratchpad 100,000 <iota> [ tribonacci size ] map-sum megs
543.9405975341797 MB

Nifty!

Tak Function

Wed, 23 Jul 2025 08:00:00 -0700

The Tak function is a recursive function, named after Ikuo Takeuchi, used sometimes as a programming language benchmark. On the Concatenative Wiki, I noticed a page with concatenative implementations of the Tak function, and wanted to explore some different ways to build it in Factor.

def tak(x: int, y: int, z: int) -> int:
    if y < x:
        return tak(
            tak(x - 1, y, z),
            tak(y - 1, z, x),
            tak(z - 1, x, y)
        )
    else:
        return y

Note: there are two variants of this function; some definitions return z instead of y.

Some parameters cause an extremely large number of recursions. For example, tak(18, 12, 25) makes 76,527,789 function calls! Measuring the execution time demonstrates that:

In [23]: %time tak(18, 12, 25)
CPU times: user 1.79 s, sys: 4.89 ms, total: 1.79 s
Wall time: 1.79 s

Out[23]: 25

A direct translation of the Python code might look like this:

:: tak ( x y z -- result )
    y x < [
        x 1 - y z tak
        y 1 - z x tak
        z 1 - x y tak
        tak
    ] [ y ] if ;

And we can see that it works and is almost 6x faster than Python.

IN: scratchpad [ 18 12 25 tak ] time .
Running time: 0.307333458 seconds

25

But, we could get all stacky and build it like this:

: tak ( x y z -- result )
    2over > [
        [ [ 1 - ] 2dip tak ] 3keep
        [ [ 1 - ] dip rot tak ] 3keep
        1 - -rot tak tak
    ] [ drop nip ] if ;

Or, cleavy and build it like this:

: tak ( x y z -- result )
    2over > [
        {
            [ [ 1 - ] 2dip tak ]
            [ [ 1 - ] dip rot tak ]
            [ 1 - -rot tak ]
        } 3cleave tak
    ] [ drop nip ] if ;

Or, composey and build it like this:

: tak ( x y z -- result )
    2over > [
        [ ] [ rot ] [ -rot ] [ '[ @ [ 1 - ] 2dip tak ] ] tri@ 3tri tak
    ] [ drop nip ] if ;

Or, reduce some unnecessary recursion with a more efficient version:

:: tak ( x! y! z! -- result )
    [ x y > ] [
        x y :> ( oldx oldy )
        x 1 - y z tak x!
        y 1 - z oldx tak y!
        x y <= [ z 1 - oldx oldy tak z! ] unless
    ] while y ;

But, it turns out there is a simpler solution:

:: tak ( x y z -- result )
    {
        { [ y x >= ] [ y ] }
        { [ y z <= ] [ z ] }
        [ x ]
    } cond ;

Which could be written more concisely in one line:

:: tak ( x y z -- result )
    y x >= [ y ] [ y z <= z x ? ] if ;
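One way to gain some confidence in the non-recursive form is to compare it against the recursive definition over a small grid of inputs. The following Python sketch does that; the memoization is only there to keep the brute-force check quick, and the grid size is an arbitrary choice.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def tak(x, y, z):
    """Recursive definition (the variant that returns y)."""
    if y < x:
        return tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y))
    return y

def tak_closed(x, y, z):
    """Non-recursive form, mirroring the cond above."""
    if y >= x:
        return y
    return z if y <= z else x

mismatches = [
    (x, y, z)
    for x in range(8) for y in range(8) for z in range(8)
    if tak(x, y, z) != tak_closed(x, y, z)
]
print(mismatches)
```

An empty list of mismatches is not a proof, but it is a cheap sanity check before trusting the shortcut.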

And that is the fastest of all:

IN: scratchpad [ 18 12 25 tak ] time .
Running time: 0.000073916 seconds

25

Fun.

Ask OK?

Tue, 22 Jul 2025 08:00:00 -0700

To illustrate some varied aspects of programming with Factor, I wanted to show how you might write an example used in the Python documentation to demonstrate control flow and default values:

def ask_ok(prompt, retries=4, complaint='Yes or no, please!'):
    while True:
        ok = input(prompt)
        if ok in ('y', 'ye', 'yes'):
            return True
        if ok in ('n', 'no', 'nop', 'nope'):
            return False
        retries = retries - 1
        if retries < 0:
            raise IOError('invalid user response')
        print(complaint)

You can see how it works in Python:

>>> ask_ok("Continue? ")
Continue? y
True

>>> ask_ok("Continue? ")
Continue? no
False

>>> ask_ok("Continue? ")
Continue? r
Yes or no, please!
Continue? r
Yes or no, please!
Continue? r
Yes or no, please!
Continue? r
Yes or no, please!
Continue? r
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    ask_ok("Continue? ")
    ~~~~~~^^^^^^^^^^^^^^
  File "<python-input-0>", line 10, in ask_ok
    raise IOError('invalid user response')
OSError: invalid user response

Direct Translation

Without focusing on the keyword arguments with default values for the moment, we could directly translate this to a similar looping implementation with all arguments provided on the stack:

:: ask-ok ( prompt retries! complaint -- ? )
    f [
        drop prompt write bl flush readln {
            { [ dup { "y" "ye" "yes" } member? ] [ drop t f ] }
            { [ dup { "n" "no" "nop" "nope" } member? ] [ drop f f ] }
            [
                retries 1 - retries!
                retries 0 < [ "invalid user response" throw ] when
                complaint print t
            ]
        } cond
    ] loop ;

And then try it out for a pretty similar result:

IN: scratchpad "Continue?" 4 "Yes or no, please!" ask-ok
Continue? y

--- Data stack:
t

IN: scratchpad "Continue?" 4 "Yes or no, please!" ask-ok
Continue? no

--- Data stack:
f

IN: scratchpad "Continue?" 4 "Yes or no, please!" ask-ok
Continue? r
Yes or no, please!
Continue? r
Yes or no, please!
Continue? r
Yes or no, please!
Continue? r
Yes or no, please!
Continue? r
invalid user response

Type :help for debugging help.

Using Namespaces

One way to get default values would be to use initialized dynamic variables, and an inner word that implements the retry logic with tail calls, while also simplifying our prefix check for each supported input variation:

INITIALIZED-SYMBOL: retries [ 4 ]
INITIALIZED-SYMBOL: complaint [ "Yes or no, please!" ]

: (ask-ok) ( prompt n -- ? )
    [ "invalid user response" throw ] [
        1 - over write bl flush readln {
            { [ "yes" over head? ] [ 3drop t ] }
            { [ "nope" over head? ] [ 3drop f ] }
            [ drop complaint get print (ask-ok) ]
        } cond
    ] if-zero ;

: ask-ok ( prompt -- ? )
    retries get (ask-ok) ;

This has the benefit that those arguments can be changed easily using the namespaces vocabulary.

Option Arguments

Another way might be to provide an options tuple, with default values:

TUPLE: ask prompt retries complaint ;

: <ask> ( prompt -- ask )
    4 "Yes or no, please!" ask boa ;

:: ask-ok ( ask -- ? )
    f [
        drop ask prompt>> write bl flush readln {
            { [ "yes" over head? ] [ drop t f ] }
            { [ "nope" over head? ] [ drop f f ] }
            [
                ask [ 1 - dup ] change-retries drop
                0 < [ "invalid user response" throw ] when
                ask complaint>> print t
            ]
        } cond
    ] loop ;

And then use it:

IN: scratchpad "Continue?" <ask> ask-ok
Continue? yes

--- Data stack:
t

Combinators

A different approach might be to use exception handling instead and separate out the logic of the trying quotation from the erroring one, and build a retrying combinator that loops and throws after n attempts:

: with-retries ( try-quot error-quot n -- )
    -rot '[
        [ _ dip ]
        [ swap 1 - [ rethrow ] [ nip @ ] if-zero ] recover t
    ] loop drop ; inline
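The control flow of such a retrying combinator can be sketched in Python as a loop around try/except. This is a rough analogue under the assumption that n counts total attempts; the scripted answers iterator stands in for reading from the user and is purely illustrative.

```python
def with_retries(attempt, on_error, n):
    """Call attempt() until it succeeds; re-raise after n failed attempts."""
    while True:
        try:
            return attempt()
        except Exception:
            n -= 1
            if n == 0:
                raise          # out of attempts: propagate the original error
            on_error()         # otherwise complain and go around again

answers = iter(["r", "r", "yes"])
complaints = []

def ask():
    answer = next(answers)
    if answer in ("y", "ye", "yes"):
        return True
    if answer in ("n", "no", "nop", "nope"):
        return False
    raise ValueError(f"invalid user response: {answer!r}")

result = with_retries(ask, lambda: complaints.append("Yes or no, please!"), 4)
print(result, len(complaints))

attempts = []

def always_fail():
    attempts.append(1)
    raise ValueError("still wrong")

try:
    with_retries(always_fail, lambda: None, 4)
except ValueError:
    gave_up = True
else:
    gave_up = False
print(gave_up, len(attempts))
```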

With that we can then make our simple ask-ok word with an error class to throw the invalid input:

ERROR: invalid-user-response input ;

:: ask-ok ( prompt -- ? )
    prompt write bl flush readln {
        { [ "yes" over head? ] [ drop t ] }
        { [ "nope" over head? ] [ drop f ] }
        [ invalid-user-response ]
    } cond ;

And then try it out with the specified retry logic:

IN: scratchpad [ "Continue?" ask-ok ]
               [ "Yes or no, please?" print ] 4 with-retries
Continue? r
Yes or no, please?
Continue? r
Yes or no, please?
Continue? r
Yes or no, please?
Continue? r
invalid-user-response
input "r"

Type :help for debugging help.

This is now generic and can be used in any other place that retrying is required.

Restarts

Restartable errors are a neat feature, and we can use this to throw a restart when invalid input is provided:

:: ask-ok ( prompt -- ? )
    prompt write bl flush readln {
        { [ "yes" over head? ] [ drop t ] }
        { [ "nope" over head? ] [ drop f ] }
        [
            drop "invalid user response"
            { { "Yes" t } { "Nope" f } } throw-restarts
        ]
    } cond ;

And see that we throw a restart, and then can select one of the options:

IN: scratchpad "Continue?" ask-ok
Continue? r
invalid user response

The following restarts are available:

:1      Yes
:2      Nope

Type :help for debugging help.

IN: scratchpad :1

--- Data stack:
t

Other improvements could include splitting out the yes and nope checks into their own words:

: yes? ( input -- ? ) "yes" swap head? ;

: nope? ( input -- ? ) "nope" swap head? ;

This would allow us to simplify the code – or make it more complex without impacting the calling context – and also to write unit tests for small pieces of the overall system.
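As an example of the kind of edge case such small tests can surface, here is a Python sketch of the two predicates. In Python at least, the empty string counts as a prefix of every string, so the sketch guards against an empty answer; whether the Factor words want the same guard is exactly the sort of question a unit test would settle. The is_yes and is_nope names are hypothetical.

```python
def is_yes(answer):
    """True when answer is a non-empty prefix of "yes"."""
    return bool(answer) and "yes".startswith(answer)

def is_nope(answer):
    """True when answer is a non-empty prefix of "nope"."""
    return bool(answer) and "nope".startswith(answer)

print([is_yes(s) for s in ("y", "ye", "yes", "", "yess")])
print([is_nope(s) for s in ("n", "no", "nope", "", "nah")])
```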

How else might you write this?

Asserting Implications

Mon, 21 Jul 2025 08:00:00 -0700

Asserting Implications is a short, but targeted, blog post by the TigerBeetle team about writing some asserts differently – along with some interesting lobste.rs discussions. Their main conclusion was made using an example code change in Zig:

// Before:
assert(header_b != null or replica.commit_min == replica.op_checkpoint);

// After:
if (header_b == null) assert(replica.commit_min == replica.op_checkpoint);
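The two forms express the same implication: "if header_b is null, then the two values must match." A small Python sketch can confirm that they accept and reject exactly the same cases. The function names and the sample values are made up for illustration.

```python
def implies(p, q):
    """Material implication: p -> q is the same as (not p) or q."""
    return (not p) or q

def check_before(header_b, commit_min, op_checkpoint):
    assert header_b is not None or commit_min == op_checkpoint

def check_after(header_b, commit_min, op_checkpoint):
    if header_b is None:
        assert commit_min == op_checkpoint

def passes(check, *args):
    """Report whether a check runs without tripping its assertion."""
    try:
        check(*args)
        return True
    except AssertionError:
        return False

cases = [(h, c, 7) for h in (None, "header") for c in (7, 8)]
agree = all(
    passes(check_before, *case) == passes(check_after, *case)
    == implies(case[0] is None, case[1] == case[2])
    for case in cases
)
print(agree)
```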

Directly translating into Factor using local variables might look like this:

! Before
header_b >boolean replica commit-min>> replica checkpoint>> = or t assert=

While using short-circuit combinators looks both less and more readable:

{
    [ header_b >boolean ]
    [ replica commit-min>> replica checkpoint>> = ]
} 0|| t assert=

And writing it awkwardly with dataflow combinators looks inscrutable:

2dup [ >boolean ] [ [ commit-min>> ] [ checkpoint>> ] bi = ] bi* or t assert=

Nesting the assertion does indeed look much better:

! After
header_b [ replica commit-min>> replica checkpoint>> assert= ] unless

Alternatively, using implicit arguments on the stack that are preserved:

2dup '[ _ [ commit-min>> ] [ checkpoint>> ] bi assert= ] unless

Neat idea!

World Emoji Day

Thu, 17 Jul 2025 08:00:00 -0700

Today, and every July 17th, is World Emoji Day. Kind of amazing that there have already been twelve annual global emoji celebrations! In any event, I was reminded of that by this post:

Your fate is sealed in emoji! 🎲

Roll a d20 and face your destiny on World Emoji Day! pic.twitter.com/aIyzDJB4Bw
— Dungeons & Dragons (@Wizards_DnD) July 17, 2025

We have had support in the calendar.holidays vocabulary for defining and computing on holidays, or at least the ones that we have defined already. Turns out, we were missing World Emoji Day!

Well, after adding it, we can now see:

! The last world emoji day (today)
IN: scratchpad today [ world-emoji-day ] year<= .
T{ timestamp
    { year 2025 }
    { month 7 }
    { day 17 }
}

! The next world emoji day!
IN: scratchpad today [ world-emoji-day ] year> .
T{ timestamp
    { year 2026 }
    { month 7 }
    { day 17 }
}
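Judging from the output above, year<= finds the most recent occurrence on or before a date and year> finds the first occurrence strictly after it. Under that assumption, the logic can be sketched in Python like this (the function names are hypothetical and this is not how calendar.holidays is implemented):

```python
from datetime import date

def world_emoji_day(year):
    """World Emoji Day falls on July 17th every year."""
    return date(year, 7, 17)

def last_on_or_before(today, holiday):
    """Most recent occurrence on or before today."""
    this_year = holiday(today.year)
    return this_year if this_year <= today else holiday(today.year - 1)

def next_after(today, holiday):
    """First occurrence strictly after today."""
    this_year = holiday(today.year)
    return this_year if this_year > today else holiday(today.year + 1)

today = date(2025, 7, 17)
print(last_on_or_before(today, world_emoji_day))
print(next_after(today, world_emoji_day))
```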

And we can implement the fate checker from the original post by defining our emojis:

CONSTANT: emojis $[
    "💩🤬🤡😰🤮☠️😱😭🤢🙃🙈🙉🙊👍👀🙂😀🤗😍🐲"
    >graphemes [ >string ] map
]

We could choose one at random:

IN: scratchpad emojis random .
"💩"

Or by rolling dice, and adjusting to zero-based indices:

IN: scratchpad ROLL: 1d20 1 - emojis nth .
"💩"

Crap.

One Billion Loops

Wed, 16 Jul 2025 08:00:00 -0700

About six months ago, The Primeagen had a great video called Language Performance Comparisons Are Junk, talking about the 1 Billion nested loop iterations benchmark that Benjamin Dicken wrote. You can find a copy of the benchmark code on GitHub.

The loops benchmark can be summarized by a version in Python:

import sys
import random

u = int(sys.argv[1])          # Get an input number from the command line
r = random.randrange(10000)   # Get a random number 0 <= r < 10k
a = [0] * 10000               # Array of 10k elements initialized to 0
for i in range(10000):        # 10k outer loop iterations
    for j in range(100000):   # 100k inner loop iterations, per outer loop iteration
        a[i] += j % u         # Simple sum
    a[i] += r                 # Add a random value to each element in array
print(a[r])                   # Print out a single element from the array

As a benchmark, it has some flaws, but it was fun to iterate on. I would like to compare Factor as a percentage of Zig, which was the fastest solution, and find out once again whether Factor is faster than Zig!

The Zig version takes about 1.3 seconds on the computer I used for benchmarking:

$ git clone https://github.com/bddicken/languages.git

$ cd languages/loops/zig/

$ zig version
0.14.1

$ zig build-exe -O ReleaseFast code.zig

$ time ./code 100
4958365

real    0m1.292s
user    0m1.288s
sys     0m0.002s

Benjamin has some thoughts about using test runners to eliminate startup time and arrive at stable benchmark times. I’m going to ignore this for the purpose of this blog post, but it might be worth revisiting at some point to see what the steady state performance of Factor is.

Version 1

Our first implementation will be the simplest direct copy of the Python version:

:: loops-benchmark ( u -- )
    10,000 random :> r
    10,000 0 <array> :> a
    10,000 [| i |
        100,000 [| j |
            i a [ j u mod + ] change-nth
        ] each-integer
        i a [ r + ] change-nth
    ] each-integer r a nth . ;

This takes 4.7 seconds or about 3.6x of Zig.

Version 2

By default change-nth performs bounds-checking and we can notice that our indices are always within bounds, so we can use the unsafe version instead:

:: loops-benchmark ( u -- )
    10,000 random :> r
    10,000 0 <array> :> a
    10,000 [| i |
        100,000 [| j |
-            i a [ j u mod + ] change-nth
+            i a [ j u mod + ] change-nth-unsafe
        ] each-integer
-        i a [ r + ] change-nth
+        i a [ r + ] change-nth-unsafe
-    ] each-integer r a nth . ;
+    ] each-integer r a nth-unsafe . ;

This does not really speed the benchmark up, taking 4.7 seconds.

Version 3

We can improve our math dispatch by specifying that the argument is an integer:

-:: loops-benchmark ( u -- )
+TYPED:: loops-benchmark ( u: integer -- )
    10,000 random :> r
    10,000 0 <array> :> a
    10,000 [| i |
        100,000 [| j |
            i a [ j u mod + ] change-nth-unsafe
        ] each-integer
        i a [ r + ] change-nth-unsafe
    ] each-integer r a nth-unsafe . ;

This takes 4.5 seconds or about 3.5x of Zig.

Version 4

The Zig version operates on 32-bit unsigned ints, so we could enforce using fixnum integers:

-TYPED:: loops-benchmark ( u: integer -- )
+TYPED:: loops-benchmark ( u: fixnum -- )
-    10,000 random :> r
+    10,000 random { fixnum } declare :> r
    10,000 0 <array> :> a
    10,000 [| i |
        100,000 [| j |
-            i a [ j u mod + ] change-nth-unsafe
+            i a [ j u mod fixnum+fast ] change-nth-unsafe
        ] each-integer
-        i a [ r + ] change-nth-unsafe
+        i a [ r fixnum+fast ] change-nth-unsafe
    ] each-integer r a nth-unsafe . ;

This takes 3.5 seconds or about 2.7x of Zig.

Version 5

We could operate on those same 32-bit unsigned ints using specialized-arrays:

+SPECIALIZED-ARRAY: uint32_t

TYPED:: loops-benchmark ( u: fixnum -- )
    10,000 random { fixnum } declare :> r
-    10,000 0 <array> :> a
+    10,000 uint32_t <c-array> :> a
    10,000 [| i |
        100,000 [| j |
            i a [ j u mod fixnum+fast ] change-nth-unsafe
        ] each-integer
        i a [ r fixnum+fast ] change-nth-unsafe
    ] each-integer r a nth-unsafe . ;

This takes 3.1 seconds or about 2.4x of Zig.

Version 6

We could notice that the value added to each array element can be computed first and then added:

TYPED:: loops-benchmark ( u: fixnum -- )
    10,000 random { fixnum } declare :> r
    10,000 uint32_t <c-array> :> a
    10,000 [| i |
-        100,000 [| j |
-            i a [ j u mod fixnum+fast ] change-nth-unsafe
-        ] each-integer
+        100,000 <iota> [ u mod ] map-sum :> v
+        i a [ v + ] change-nth-unsafe
        i a [ r fixnum+fast ] change-nth-unsafe
    ] each-integer r a nth-unsafe . ;

This takes 2.4 seconds or about 1.8x of Zig.

Version 7

While the previous version did the same number of loops, it did fewer modifications to the array memory. It turns out that not only can the value added to each array element be computed first – it can be computed outside the loop. And once you do that, you’ll notice that each element of the array gets the same value, and you don’t need to compute the array at all. This is effectively a compiler optimization that the compiler isn’t doing and, after writing a little proof in your head, it can reduce down to:

:: loops-benchmark ( u -- )
    10,000 random 100,000 <iota> [ u mod ] map-sum + . ;

This takes 0.0003 seconds which is now 4300x faster than Zig.
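The "little proof" can be spot-checked in Python by comparing the nested loops against the collapsed form on small sizes. The loop sizes are parameters here only so the naive version finishes quickly; the function names are hypothetical.

```python
def loops_naive(u, r, outer, inner):
    """The benchmark as written, with the loop sizes as parameters."""
    a = [0] * outer
    for i in range(outer):
        for j in range(inner):
            a[i] += j % u
        a[i] += r
    return a[r]

def loops_closed(u, r, inner):
    """Every element receives the same total, so no array is needed."""
    return sum(j % u for j in range(inner)) + r

print(loops_naive(7, 3, 20, 50) == loops_closed(7, 3, 50))
print(loops_closed(100, 8365, 100_000))
```

With u = 100 the inner sum is 1,000 full cycles of 0 through 99, or 4,950,000, so the 4958365 printed by the Zig run earlier corresponds to a random r of 8365.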

Conclusion

I’ve always thought of Factor as able to run within about 2x-4x of the run time of C with reasonable looking generic and dynamic code. This depends somewhat on which benchmark is being considered, and on occasion we can get as fast as C.

We can easily profile code using the sampling profiler, visualize profiles using the flamegraph vocabulary, print optimized output from the compiler, as well as disassemble words to investigate the actual machine code that our optimizing compiler generates.

In addition to all the open issues about performance, we have an open issue to improve fixnum iteration that would likely help this benchmark, and that I hope gets resolved someday. And, there are likely many other improvements we could make to our use of tagged fixnum and integer unions or generic dispatch to improve the un-typed arithmetic in the examples above.

Some interesting results on the relative value of different Factor optimizations!

Command Arguments

Thu, 10 Jul 2025 08:00:00 -0700

A question was asked recently on the Factor mailing list about the Argument Parser that I had previously implemented in Factor:

I have been trying to hack on command-line.parser to add the ability to call it with commands.

The specific feature they want is similar to the ArgumentParser.add_subparsers function in Python’s argparse module. I spent a little bit of time thinking about a quick implementation that can get us started, and applied this patch to support commands.

Here’s their example of a MAIN with two commands with different options using with-commands:

MAIN: [
    H{
        {
            "add"
            {
                T{ option
                    { name "a" }
                    { type integer }
                    { #args 1 }
                }
            }
        }
        {
            "subtract"
            {
                T{ option
                    { name "s" }
                    { type integer }
                    { #args 1 }
                }
            }
        }
    } [ ] with-commands
]

We currently produce no output by default when no command is specified:

$ ./factor foo.factor

The default help prints the possible commands:

$ ./factor foo.factor --help
Usage:
    factor foo.factor [--help] [command]

Arguments:
    command    {add,subtract}

Options:
    --help    show this help and exit

Or get default help for a command:

$ ./factor foo.factor add --help
Usage:
    factor foo.factor add [--help] [a]

Arguments:
    a

Options:
    --help    show this help and exit

Or print an error if the argument is not a valid command:

$ ./factor foo.factor multiply
ERROR: Invalid value 'multiply' for option 'command'
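For reference, the Python feature being mirrored looks roughly like this with argparse. The program name foo and the dest name command are choices made for this sketch; the option names follow the Factor example above.

```python
import argparse

def build_parser():
    """Two commands, each with a single integer argument."""
    parser = argparse.ArgumentParser(prog="foo")
    commands = parser.add_subparsers(dest="command")
    add = commands.add_parser("add")
    add.add_argument("a", type=int)
    subtract = commands.add_parser("subtract")
    subtract.add_argument("s", type=int)
    return parser

parser = build_parser()
args = parser.parse_args(["add", "3"])
print(args.command, args.a)

# With no command given, argparse leaves the dest as None by default.
print(parser.parse_args([]).command)

# An unknown command makes argparse report an error and exit.
try:
    parser.parse_args(["multiply"])
except SystemExit:
    rejected = True
else:
    rejected = False
print(rejected)
```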

There are other features we might want to add to this including per-command metadata with a brief description of the command, support for additional top-level options besides just the command, and perhaps a different way of handling the no command case rather than empty output.

This is available in the latest development version!

Fibonacci Style

Tue, 08 Jul 2025 08:00:00 -0700

About 14 years ago, I wrote about Fibonacci Wars which described the relative performance of three different methods of calculating Fibonacci numbers. Today, I wanted to address a style question that someone in the Factor Discord server asked:

How could I write this better?
: fib ( n -- f(n) ) 0 1 rot 1 - [ tuck + ] times nip ;
In a more concatenative style.

I’ve written before about conciseness, concatenative thinking and readability. I found this question to be a good prompt that provides another opportunity to address these topics.

Their suggested solution is an iterative one and is fairly minimal when it comes to “short code”. It uses less common shuffle words like tuck that users might not understand easily. It is probably true that even rot is more inscrutable to people coming from other languages.

Let’s look at some potential variations!

You could use simpler stack shuffling:

: fib ( n -- f(n) )
    [ 1 0 ] dip 1 - [ over + swap ] times drop ;

You could factor out the inner logic to another word:

: fib+ ( f(n-2) f(n-1) -- f(n-1) f(n) )
    [ + ] keep swap ;

: fib ( n -- f(n) )
    [ 0 1 ] dip 1 - [ fib+ ] times nip ;

You could use higher-level words like keepd:

: fib ( n -- f(n) )
    [ 1 0 ] dip 1 - [ [ + ] keepd ] times drop ;

You could use locals and use index 0 as the “first” fib number:

:: fib ( n -- f(n) )
    1 0 n [ [ + ] keepd ] times drop ;
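
Note that this version loops n times instead of n - 1 times, so every result is shifted by one position compared to the earlier definitions. As a rough cross-check outside of Factor, here is a small Python sketch of the two conventions (the function names are made up for illustration):

```python
def fib_one_based(n):
    # Mirrors "0 1 rot 1 - [ tuck + ] times nip": n - 1 updates of the pair.
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b


def fib_zero_based(n):
    # Mirrors "1 0 n [ [ + ] keepd ] times drop": n updates of the pair.
    a, b = 1, 0
    for _ in range(n):
        a, b = a + b, a
    return a


print([fib_one_based(n) for n in range(1, 7)])
print([fib_zero_based(n) for n in range(0, 6)])
```

Both lines print the same six numbers, which shows that the two conventions differ only in where they start counting.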

You could write a recursive solution using memoization for improved performance:

MEMO: fib ( n -- f(n) )
    dup 2 < [ drop 1 ] [ [ 2 - fib ] [ 1 - fib ] bi + ] if ;

You could use local variables to make it look nicer:

MEMO:: fib ( n -- f(n) )
    n 2 < [ 1 ] [ n 2 - fib n 1 - fib + ] if ;

But, in many cases, beauty is in the eye of the beholder. And so you could start at a place where you find the code most readable, and that might even be something more conventional looking like this version that uses mutable locals and comments and whitespace to describe what is happening:

:: fib ( n -- f(n) )
    0 :> f(n-1)!
    1 :> f(n)!

    ! loop to calculate
    n [
        ! compute the next number
        f(n-1) f(n) + :> f(n+1)

        ! save the previous
        f(n) f(n-1)!

        ! save the next
        f(n+1) f(n)!
    ] times

    ! return the result
    f(n) ;

Are any of these clearly better than the original version?

Are there other variations we should consider?

There are often multiple competing priorities when improving code style – including readability, performance, simplicity, and aesthetics. I encourage everyone to spend some time iterating on these various axes as they learn more about Factor!

Jaro-Winkler

Sat, 21 Jun 2025 08:00:00 -0700

Jaro-Winkler distance is a measure of string similarity and edit distance between two sequences:

The higher the Jaro–Winkler distance for two strings is, the less similar the strings are. The score is normalized such that 0 means an exact match and 1 means there is no similarity. The original paper actually defined the metric in terms of similarity, so the distance is defined as the inversion of that value (distance = 1 − similarity).

There are actually two different concepts – and RosettaCode tasks – implied by this algorithm:

Jaro similarity and Jaro distance
Jaro-Winkler similarity and Jaro-Winkler distance.

Let’s build an implementation of these in Factor!

Jaro Similarity

The base that all of these are built upon is the Jaro similarity. It is calculated as a score by measuring the number of matches (m) between the strings, counting the number of transpositions divided by 2 (t), and then combining these with the lengths of each sequence (|s1| and |s2|) into a weighted score.
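
Written out in the usual notation (this is the standard definition as given in the Wikipedia article, using m, t, |s1| and |s2| as above), the Jaro similarity is:

```latex
sim_j =
\begin{cases}
0 & \text{if } m = 0 \\[4pt]
\dfrac{1}{3} \left( \dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m} \right) & \text{otherwise}
\end{cases}
```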

In particular, it considers a matching character to be one that is found in the other string within a match distance away, which is half the length of the longer sequence (rounded down) minus one.
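
As a formula, two equal characters only count as a match when their positions differ by no more than:

```latex
\left\lfloor \frac{\max(|s_1|, |s_2|)}{2} \right\rfloor - 1
```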

There are multiple ways to go about this, with varying performance, but I decided one longer function was simpler to understand than breaking out the steps into their own words. We use a bit-array to efficiently track which characters have been matched already as we iterate:

:: jaro-similarity ( s1 s2 -- n )
    s1 s2 [ length ] bi@       :> ( len1 len2 )
    len1 len2 max 2/ 1 [-]     :> delta
    len2 <bit-array>           :> flags

    s1 [| ch i |
        i delta [-]            :> from
        i delta + 1 + len2 min :> to

        from to [| j |
            j flags nth [ f ] [
                ch j s2 nth = dup j flags set-nth
            ] if
        ] find-integer-from
    ] filter-index

    [ 0 ] [
        [ length ] keep s2 flags [ nip ] 2filter [ = not ] 2count
        :> ( #matches #transpositions )

        #matches len1 /f #matches len2 /f +
        #matches #transpositions 2/ - #matches /f + 3 /
    ] if-empty ;

The Jaro distance is then just a subtraction:

: jaro-distance ( s1 s2 -- n )
    jaro-similarity 1.0 swap - ;

I’m curious if anyone else has a simpler implementation – please share!

Jaro-Winkler Similarity

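The Jaro-Winkler similarity builds upon this by factoring in the length of the common prefix (l) times a constant scaling factor (p) that is usually set to 0.1 in most implementations I’ve seen. The prefix bonus is added on top of the Jaro similarity, scaled by how far that similarity is from 1.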
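
In the usual notation, with the prefix length capped at 4 characters as in the code that follows, that is:

```latex
sim_w = sim_j + \ell \, p \, (1 - sim_j), \qquad 0 \le \ell \le 4, \quad p = 0.1
```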

We can implement this by calculating the Jaro similarity, computing the length of the common prefix, and then generating the result:

:: jaro-winkler-similarity ( s1 s2 -- n )
    s1 s2 jaro-similarity :> jaro
    s1 s2 min-length 4 min :> len
    s1 s2 [ len head-slice ] bi@ [ = ] 2count :> #common
    1 jaro - #common 0.1 * * jaro + ;

The Jaro-Winkler distance is again just a subtraction:

: jaro-winkler-distance ( s1 s2 -- n )
    jaro-winkler-similarity 1.0 swap - ;

Try it out

The Wikipedia article compares the similarity of FARMVILLE and FAREMVIEL:

IN: scratchpad "FARMVILLE" "FAREMVIEL" jaro-similarity .
0.8842592592592592

We can also see that the algorithm considers the transposition of two close characters to be less of a penalty than the transposition of two characters farther away from each other. It also penalizes additions and substitutions of characters that cannot be expressed as transpositions.

IN: scratchpad "My string" "My tsring" jaro-winkler-similarity .
0.9740740740740741

IN: scratchpad "My string" "My ntrisg" jaro-winkler-similarity .
0.8962962962962963
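
For readers who want to check these numbers outside of Factor, here is a rough Python sketch of the same two calculations. It is not the code from the math.similarity vocabulary. The function names are invented for this example, and it finds the common prefix by stopping at the first mismatch, whereas the Factor word above counts matching positions within the first four characters. The two approaches agree for the examples shown here.

```python
def jaro_similarity(s1, s2):
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Match distance: half the longer length, minus one, never negative.
    delta = max(max(len1, len2) // 2 - 1, 0)
    flags = [False] * len2  # which positions of s2 are already matched
    matched1 = []
    for i, ch in enumerate(s1):
        start = max(i - delta, 0)
        stop = min(i + delta + 1, len2)
        for j in range(start, stop):
            if not flags[j] and s2[j] == ch:
                flags[j] = True
                matched1.append(ch)
                break
    m = len(matched1)
    if m == 0:
        return 0.0
    # Matched characters of s2, in s2 order, compared pairwise with those of s1.
    matched2 = [ch for ch, used in zip(s2, flags) if used]
    t = sum(a != b for a, b in zip(matched1, matched2)) // 2
    return (m / len1 + m / len2 + (m - t) / m) / 3


def jaro_winkler_similarity(s1, s2, p=0.1):
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


print(jaro_similarity("FARMVILLE", "FAREMVIEL"))
print(jaro_winkler_similarity("My string", "My tsring"))
print(jaro_winkler_similarity("My string", "My ntrisg"))
```

The printed values should agree with the Factor results above, up to floating-point rounding in the last digits.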

We can compare the rough performance of Julia using the same algorithm:

julia> using Random

julia> s = randstring(10_000)

julia> t = randstring(10_000)

julia> @time jarowinklerdistance(s, t)
  1.492011 seconds (108.32 M allocations: 2.178 GiB, 1.87% gc time)
0.19016926812348256

Note: I’m not a Julia developer, I just play one on TV. I adapted this implementation in Julia, which originally took over 4.5 seconds. A better developer could probably improve it quite a bit. In fact, it was pointed out that we are indexing UTF-8 String in a loop, and should instead collect the Char into a Vector first. That does indeed make it super fast.

Compare that to the implementation in Factor that we built above, which runs quite a bit faster:

IN: scratchpad USE: random.data

IN: scratchpad 10,000 random-string
               10,000 random-string
               gc [ jaro-winkler-distance ] time .
Running time: 0.259643166 seconds

0.1952856823031448

That’s not bad for a first version that uses safe indexing with unnecessary bounds-checking, generic iteration on integers when usually the indices are fixnum (something I hope to fix someday automatically), and should probably order the input sequences by length for consistency.

If we fix those problems, it gets even faster:

IN: scratchpad USE: random.data

IN: scratchpad 10,000 random-string
               10,000 random-string
               gc [ jaro-winkler-distance ] time .
Running time: 0.068086625 seconds

0.19297898770334765

This is available in the development version in the math.similarity and the math.distances vocabularies.

Best Shuffle

Wed, 18 Jun 2025 08:00:00 -0700

The “Best shuffle” is a Rosetta Code task that was not yet implemented in Factor:

Task

Shuffle the characters of a string in such a way that as many of the character values are in a different position as possible.

A shuffle that produces a randomized result among the best choices is to be preferred. A deterministic approach that produces the same sequence every time is acceptable as an alternative.

Display the result as follows:
original string, shuffled string, (score)
The score gives the number of positions whose character value did not change.

There are multiple ways to approach this problem, but the way that most solutions seem to take is to shuffle two sets of indices, and then iterate through them swapping the characters in the result if they are different.

I wanted to contribute a solution in Factor, using local variables and short-circuit combinators:

:: best-shuffle ( str -- str' )
    str clone :> new-str
    str length :> n
    n <iota> >array randomize :> range1
    n <iota> >array randomize :> range2

    range1 [| i |
        range2 [| j |
            {
                [ i j = ]
                [ i new-str nth j new-str nth = ]
                [ i str nth j new-str nth = ]
                [ i new-str nth j str nth = ]
            } 0|| [
                i j new-str exchange
            ] unless
        ] each
    ] each

    new-str ;

And we can write some code to display the result as requested:

: best-shuffle. ( str -- )
    dup best-shuffle 2dup [ = ] 2count "%s, %s, (%d)\n" printf ;

And then print some test cases:

IN: scratchpad {
                   "abracadabra"
                   "seesaw"
                   "elk"
                   "grrrrrr"
                   "up"
                   "a"
               } [ best-shuffle. ] each
abracadabra, raabaracdab, (0)
seesaw, easwse, (0)
elk, lke, (0)
grrrrrr, rrrrgrr, (5)
up, pu, (0)
a, a, (1)
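
To make the swapping rule easier to follow for readers coming from other languages, here is a rough Python sketch of the same idea. The names best_shuffle and score are chosen for this example only, and random.sample stands in for Factor's randomize.

```python
import random


def best_shuffle(text):
    new = list(text)
    n = len(new)
    # Two independently shuffled index orders, like range1 and range2 above.
    range1 = random.sample(range(n), n)
    range2 = random.sample(range(n), n)
    for i in range1:
        for j in range2:
            blocked = (
                i == j
                or new[i] == new[j]    # swapping equal characters changes nothing
                or text[i] == new[j]   # would put an original character back at i
                or new[i] == text[j]   # would put an original character back at j
            )
            if not blocked:
                new[i], new[j] = new[j], new[i]
    return "".join(new)


def score(original, shuffled):
    # Number of positions whose character did not change.
    return sum(a == b for a, b in zip(original, shuffled))


for word in ["abracadabra", "seesaw", "elk", "grrrrrr", "up", "a"]:
    shuffled = best_shuffle(word)
    print(f"{word}, {shuffled}, ({score(word, shuffled)})")
```

Because the index orders are random, most of the shuffled strings will differ from run to run.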

This is reminiscent of the recent work I had done on derangements and generating a random derangement. While this approach does not generate a perfect derangement of the indices – and happens to be accidentally quadratic – it is somewhat similar with the additional step that we look to make sure not only are the indices different, but that the contents are different as well before swapping.

Dotenv

Tue, 17 Jun 2025 08:00:00 -0700

Dotenv is an informal file specification, a collection of implementations in different languages, and an organization providing cloud-hosting services. They describe the .env file format and some extensions:

The .env file format is central to good DSX and has been since it was introduced by Heroku in 2012 and popularized by the dotenv node module (and other libraries) in 2013.

The .env file format starts where the developer starts - in development. It is added to each project but NOT committed to source control. This gives the developer a single secure place to store sensitive application secrets.

Can you believe that prior to introducing the .env file, almost all developers stored their secrets as hardcoded strings in source control. That was only 10 years ago!

Besides official and many unofficial .env parsers available in a lot of languages, the Dotenv organization provides support for dotenv-vault cloud services in Node.js, Python, Ruby, Go, PHP, and Rust.

Today, I wanted to show how you might implement a .env parser in Factor.

File Format

The .env files are relatively simple formats with key-value pairs that are separated by an equal sign. These values can be un-quoted, single-quoted, double-quoted, or backtick-quoted strings:

SIMPLE=xyz123
INTERPOLATED="Multiple\nLines"
NON_INTERPOLATED='raw text without variable interpolation'
MULTILINE = `long text here,
e.g. a private SSH key`

Parsing

There are a lot of ways to build a parser – everything from manually spinning through bytes using a hand-coded state machine to higher-level parsing grammars like PEG or explicit parsing syntax forms like EBNF.

We are going to implement a .env parser using standard PEG parsers, beginning with some parsers that look for whitespace, comment lines, and newlines:

: ws ( -- parser )
    [ " \t" member? ] satisfy repeat0 ;

: comment ( -- parser )
    "#" token [ CHAR: \n = not ] satisfy repeat0 2seq hide ;

: newline ( -- parser )
    "\n" token "\r\n" token 2choice ;

Keys

The .env keys are specified simply:

For the sake of portability (and sanity), environment variable names (keys) must consist solely of letters, digits, and the underscore ( _ ) and must not begin with a digit. In regex-speak, the names must match the following pattern:
[a-zA-Z_]+[a-zA-Z0-9_]*

We can build a key parser by looking for those characters:

: key-parser ( -- parser )
    CHAR: A CHAR: Z range
    CHAR: a CHAR: z range
    [ CHAR: _ = ] satisfy 3choice

    CHAR: A CHAR: Z range
    CHAR: a CHAR: z range
    CHAR: 0 CHAR: 9 range
    [ CHAR: _ = ] satisfy 4choice repeat0

    2seq [ first2 swap prefix "" like ] action ;
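
As a quick way to sanity-check which names the quoted pattern accepts, here is a small Python sketch that applies the same regular expression directly. The helper name is_valid_key is made up for this example.

```python
import re

# The pattern quoted above, applied to the whole name.
KEY_PATTERN = re.compile(r"[a-zA-Z_]+[a-zA-Z0-9_]*")


def is_valid_key(name):
    return KEY_PATTERN.fullmatch(name) is not None


for name in ["PORT", "_private1", "1PORT", "MY-KEY"]:
    print(name, is_valid_key(name))
```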

Values

The .env values can be un-quoted, single-quoted, double-quoted, or backtick-quoted strings. Only double-quoted strings support the full set of escape characters, but single-quoted and backtick-quoted strings do support escaping a backslash and their own quote character.

: single-quote ( -- parser )
    "\\" token hide [ "\\'" member? ] satisfy 2seq [ first ] action
    [ CHAR: ' = not ] satisfy 2choice repeat0 "'" dup surrounded-by ;

: backtick ( -- parser )
    "\\" token hide [ "\\`" member? ] satisfy 2seq [ first ] action
    [ CHAR: ` = not ] satisfy 2choice repeat0 "`" dup surrounded-by ;

: double-quote ( -- parser )
    "\\" token hide [ "\"\\befnrt" member? ] satisfy 2seq [ first escape ] action
    [ CHAR: " = not ] satisfy 2choice repeat0 "\"" dup surrounded-by ;

: literal ( -- parser )
    [ " \t\r\n" member? not ] satisfy repeat0 ;

Before we implement our value parser, we should note that some values can be interpolated:

Interpolation (also known as variable expansion) is supported in environment files. Interpolation is applied for unquoted and double-quoted values. Both braced (${VAR}) and unbraced ($VAR) expressions are supported.

Direct interpolation

${VAR} -> value of VAR

Default value

${VAR:-default} -> value of VAR if set and non-empty, otherwise default

${VAR-default} -> value of VAR if set, otherwise default

And some values can have command substitution:

Add the output of a command to one of your variables in your .env file. Command substitution is applied for unquoted and double-quoted values.

Direct substitution

$(whoami) -> value of $ whoami

We can implement an interpolate parser that acts on strings and replaces observed variables with their interpolated or command-substituted values. This uses a regular expression and re-replace-with to substitute values appropriately.

: interpolate-value ( string -- string' )
    R/ \$\([^)]+\)|\$\{[^\}:-]+(:?-[^\}]*)?\}|\$[^(^{].+/ [
        "$(" ?head [
            ")" ?tail drop process-contents [ blank? ] trim
        ] [
            "${" ?head [ "}" ?tail drop ] [ "$" ?head drop ] if
            ":-" split1 [
                [ os-env [ empty? not ] keep ] dip ?
            ] [
                "-" split1 [ [ os-env ] dip or ] [ os-env ] if*
            ] if*
        ] if
    ] re-replace-with ;

: interpolate ( parser -- parser )
    [ "" like interpolate-value ] action ;
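
To illustrate the difference between the two default forms, here is a rough Python sketch of the variable expansion rules quoted above. It is deliberately simplified: it takes the environment as a plain dictionary, skips command substitution entirely, and expands unset variables without a default to an empty string, which is an assumption of this sketch rather than something the text above specifies.

```python
import re

# ${NAME}, ${NAME:-default}, ${NAME-default}, or a bare $NAME.
_VARIABLE = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?:(?P<op>:?-)(?P<default>[^}]*))?\}"
    r"|\$(?P<bare>[A-Za-z_][A-Za-z0-9_]*)"
)


def interpolate(text, env):
    def expand(match):
        bare = match.group("bare")
        if bare is not None:
            return env.get(bare, "")
        value = env.get(match.group("name"))
        op = match.group("op")
        default = match.group("default")
        if op == ":-":
            # Use the default when the variable is unset or empty.
            return value if value else default
        if op == "-":
            # Use the default only when the variable is unset.
            return value if value is not None else default
        return value if value is not None else ""

    return _VARIABLE.sub(expand, text)


env = {"PORT": "8080", "EMPTY": ""}
print(interpolate("https://${HOST:-localhost}:${PORT:-80}/index.html", env))
print(repr(interpolate("${EMPTY:-fallback}", env)))
print(repr(interpolate("${EMPTY-fallback}", env)))
```

The last two lines show the distinction: with an empty but set variable, the ":-" form falls back to the default while the "-" form keeps the empty value.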

We can use that to build a value parser, remembering that only un-quoted and double-quoted values are interpolated, and making sure to convert the result to a string:

: value-parser ( -- parser )
    [
        single-quote ,
        double-quote interpolate ,
        backtick ,
        literal interpolate ,
    ] choice* [ "" like ] action ;

Key-Values

Combining those, we can make a key-value parser, that ignores whitespace around the = token and uses set-os-env to update the environment variables:

: key-value-parser ( -- parser )
    [
        key-parser ,
        ws hide ,
        "=" token hide ,
        ws hide ,
        value-parser ,
    ] seq* [ first2 swap set-os-env ignore ] action ;

And finally, we can build a parsing word that looks for these key-value pairs while ignoring optional comments and whitespace:

PEG: parse-dotenv ( string -- ast )
    ws hide key-value-parser optional
    ws hide comment optional hide 4seq
    newline list-of hide ;

Loading Files

We can load a file by reading the file-contents and then parsing it into environment variables:

: load-dotenv-file ( path -- )
    utf8 file-contents parse-dotenv drop ;

These .env files are usually located somewhere above the current directory, typically at a project root. For now, we make a word that traverses from the current directory up to the root, looking for the first .env file that exists:

: find-dotenv-file ( -- path/f )
    f current-directory get absolute-path [
        nip
        [ ".env" append-path dup file-exists? [ drop f ] unless ]
        [ ?parent-directory ] bi over [ f ] [ dup ] if
    ] loop drop ;
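
The upward search can be sketched in Python with pathlib, which makes the loop a little easier to see. This is only an illustration of the traversal; the function name and the explicit start argument are choices made for this example.

```python
from pathlib import Path


def find_dotenv_file(start="."):
    start = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


print(find_dotenv_file())
```

Like the Factor word, it returns the first .env file found on the way up, or nothing when there is none.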

And now, finally, we can find and then load the relevant .env file, if there is one:

: load-dotenv ( -- )
    find-dotenv-file [ load-dotenv-file ] when* ;

Try it out

We can make a simple .env file:

$ cat .env
HOST="${HOST:-localhost}"
PORT="${PORT:-80}"
URL="https://${HOST}:${PORT}/index.html"

And then try it out, overriding the PORT environment variable:

$ PORT=8080 ./factor
IN: scratchpad USE: dotenv
IN: scratchpad load-dotenv
IN: scratchpad "URL" os-env .
"https://localhost:8080/index.html"

Some additional features that we might want to follow up on:

investigate the POSIX-compliant dotenv syntax specification and included test cases
optionally use a startup hook to automatically load .env files
support for dotenv-vaults and encrypted deploys
command-line support similar to dotenvx

This is available in the latest development version. Check it out!

Color Prettyprint

Sun, 15 Jun 2025 08:00:00 -0700

Factor has a neat feature in the prettyprint vocabulary that allows printing objects, typically as valid source literal expressions. There are small caveats to that regarding circularity, depth limits, and other prettyprint control variables, but it’s roughly true that you can pprint most everything and have it be useful.

At some point in the past few years, I noticed that Xcode and Swift Playground have support for color literals that are rendered in the source code. You can see that in this short video describing how it works:

Inspired by that – and a past effort at color tab completion – I thought it would be fun to show how you might extend our color support to allow colors to be prettyprinted with a little gadget in the UI that renders their colors.

First, we need to define a section object that holds a color and renders it using a colored border gadget.

TUPLE: color-section < section color ;

: <color-section> ( color -- color-section )
    1 color-section new-section swap >>color ;

M: color-section short-section
    " " <label> { 5 0 } <border>
        swap color>> <solid> >>interior
        COLOR: black <solid> >>boundary
    output-stream get write-gadget ;

Next, we extend pprint* with a custom implementation for any color type as well as our named colors that adds a color section to the output block:

M: color pprint*
    <block
        [ call-next-method ]
        [ <color-section> add-section ] bi
    block> ;

M: parsed-color pprint*
    <block
        [ \ COLOR: pprint-word string>> text ]
        [ <color-section> add-section ] bi
    block> ;

And, now that we have that, we can push some different colors to the stack and see how they are all displayed:

Pretty cool.

I did not commit this yet – partly because I’m not sure we want this as-is and also partly because it needs to only display the gadget if the UI is running. We also might want to consider the UI theme and choose a nice contrasting color for the border element.

Tracking Dict

Sat, 14 Jun 2025 08:00:00 -0700

Peter Bengtsson wrote about building a Python dict that can report which keys you did not use:

This can come in handy if you’re working with large Python objects and you want to be certain that you’re either unit testing everything you retrieve or certain that all the data you draw from a database is actually used in a report.

For example, you might have a SELECT fieldX, fieldY, fieldZ FROM ... SQL query, but in the report you only use fieldX, fieldY in your CSV export.

class TrackingDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._accessed_keys = set()

    def __getitem__(self, key):
        self._accessed_keys.add(key)
        return super().__getitem__(key)

    @property
    def accessed_keys(self):
        return self._accessed_keys

    @property
    def never_accessed_keys(self):
        return set(self.keys()) - self._accessed_keys

We can build a version of this in Factor intended to show off a few language features. The original version in Python used inheritance rather than composition to implement the data structure. Instead, we build a data structure that will wrap an existing assoc and delegate to it.

First, a simple tuple definition that will have the underlying assoc and a set of accessed keys:

TUPLE: tracking-assoc underlying accessed-keys ;

: <tracking-assoc> ( underlying -- tracking-assoc )
    HS{ } clone tracking-assoc boa ;

INSTANCE: tracking-assoc assoc

We then implement the assoc protocol by using delegation to the underlying assoc, with an override for tracking accessed keys:

CONSULT: assoc-protocol tracking-assoc underlying>> ;

M: tracking-assoc at*
    [ underlying>> at* ] [ accessed-keys>> adjoin ] 2bi ;

And for fun – since we could have built a normal word to do this – we define a protocol slot that we then implement to compute the never accessed keys:

SLOT: never-accessed-keys

M: tracking-assoc never-accessed-keys>>
    [ underlying>> keys ] [ accessed-keys>> ] bi diff ;

And we show it works using a simple example from the original blog post:

H{
    { "name" "John Doe" }
    { "age" 30 }
    { "email" "[email protected]" }
} <tracking-assoc>

"name" over at "John Doe" assert=

[ accessed-keys>> "Accessed keys: %u\n" printf ]
[ never-accessed-keys>> "Never accessed keys: %u\n" printf ] bi

Which prints this:

Accessed keys: HS{ "name" }
Never accessed keys: HS{ "email" "age" }
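
For comparison, the same wrap-and-delegate design could be sketched in Python using composition instead of subclassing dict. This is not from the original blog post. The class name TrackingMapping and the placeholder email address are made up for this example.

```python
from collections.abc import Mapping


class TrackingMapping(Mapping):
    """Wraps an existing mapping and records which keys were looked up."""

    def __init__(self, underlying):
        self._underlying = underlying
        self.accessed_keys = set()

    def __getitem__(self, key):
        # Record the key first, so failed lookups are tracked as well.
        self.accessed_keys.add(key)
        return self._underlying[key]

    def __iter__(self):
        return iter(self._underlying)

    def __len__(self):
        return len(self._underlying)

    @property
    def never_accessed_keys(self):
        return set(self._underlying) - self.accessed_keys


user = TrackingMapping(
    {"name": "John Doe", "age": 30, "email": "john@example.invalid"}
)
assert user["name"] == "John Doe"
print("Accessed keys:", user.accessed_keys)
print("Never accessed keys:", user.never_accessed_keys)
```

Only the three abstract methods need to be written by hand. The Mapping base class derives get, items, and the rest from them, much like CONSULT: fills in the assoc protocol above.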

Fun!

bab=aaa, bbb=bb

Wed, 04 Jun 2025 08:00:00 -0700

Admittedly, I struggle sometimes when I read the word “monoid”. It seems to always remind me of that saying “A Monad is just a Monoid in the Category of Endofunctors” which is a tongue-twister, requires repeated effort to understand, and is sometimes used in casual conversation when jealously describing the features and capabilities of the Haskell programming language.

In any event, the topic of monoids came up recently on the Factor Discord server. Slava Pestov, the original creator of the Factor programming language, was describing recent work he was doing on some fun mathematical problems:

I’m searching for examples of finitely-presented monoids that cannot be presented by finite complete rewriting systems:

⟨a, b | aba=aa, baa=aab⟩ – my first result in this space.

⟨a, b | bab=aaa, bbb=bb⟩ – explore the equivalence class of a⁸ in this remarkable monoid.

He clarified in the discussion that “the Knuth-Bendix algorithm can solve many cases but not these two, which is how I found them in the first place”.

The second link above – made extra fun because it uses a=🍎 and b=🍌 to make a more emojiful experience – describes this specific problem in more detail and presents it as a simple game to play. You can see the available rules, the current state, and the next possible states achieved by applying either of the rules, which are bi-directional.

Your pie recipe calls for 10 apples, but you only have 8 apples.

Can you turn your 8 apples into 10 apples with these two magic spells?

🍌🍎🍌 ↔️ 🍎🍎🍎

🍌🍌🍌 ↔️ 🍌🍌

Current state:

🍎🍎🍎🍎🍎🍎🍎🍎

Tap to cast a spell:

🍌🍎🍌🍎🍎🍎🍎🍎

🍎🍌🍎🍌🍎🍎🍎🍎

🍎🍎🍌🍎🍌🍎🍎🍎

🍎🍎🍎🍌🍎🍌🍎🍎

🍎🍎🍎🍎🍌🍎🍌🍎

🍎🍎🍎🍎🍎🍌🍎🍌

When exploring things like this, many questions come to mind. For example:

Is this even solvable?
What is the shortest solution?
How many short solutions exist?
How many of the possible states lead to the solution?
How large is the set of all possible states?

After the link was shared, I must have clicked through about 5000 different state transitions hoping to randomly stumble upon the solution. And, eventually, I recognized that it might be a good idea – or even possibly poetic – to do that exploration using Factor.

Warning: Spoilers ahead!

Let’s start by writing the rules:

CONSTANT: rules {
    { "bab" "aaa" }
    { "bbb" "bb" }
}

For convenience, we will make a sequence containing all the rules – since these are bi-directional and can be applied in either direction – using a literal expression.

CONSTANT: all-rules $[
    rules dup [ swap ] assoc-map append
]

We can make a word that takes a from node and, for a rule a -> b, calls a quotation with each state produced by replacing one occurrence of a with b. Notice that we’re able to use our previous work on finding subsequences:

:: each-move ( from a b quot: ( next -- ) -- )
    from dup a subseq-indices [
        cut a length tail b glue quot call
    ] with each ; inline

And then a word that returns all the next states:

: all-moves ( from -- moves )
    [ all-rules [ first2 [ , ] each-move ] with each ] { } make ;

It’s often good practice to try each step out during development, so let’s do that and show the first six possible next states match the ones from the game:

IN: scratchpad "aaaaaaaa" all-moves .
{
    "babaaaaa"
    "ababaaaa"
    "aababaaa"
    "aaababaa"
    "aaaababa"
    "aaaaabab"
}
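
The same move generation can be sketched in Python as a cross-check. The names below are chosen to echo the Factor words but are otherwise made up, and str.find stands in for subseq-indices, stepping one character at a time so that overlapping occurrences are found.

```python
RULES = [("bab", "aaa"), ("bbb", "bb")]
# Each rule can be applied in either direction.
ALL_RULES = RULES + [(b, a) for a, b in RULES]


def all_moves(state):
    moves = []
    for a, b in ALL_RULES:
        i = state.find(a)
        while i != -1:
            # Replace the occurrence of a at index i with b.
            moves.append(state[:i] + b + state[i + len(a):])
            i = state.find(a, i + 1)
    return moves


for move in all_moves("aaaaaaaa"):
    print(move)
```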

The next state is nice to have, but we’re generally going to be accumulating paths which are a series of states achieved by traversing the graph of all possible states:

:: all-paths% ( path -- )
    path last all-rules [
        first2 [ path swap suffix , ] each-move
    ] with each ;

: all-paths ( paths -- paths' )
    [ [ all-paths% ] each ] { } make members ;

So, these are the first two steps in traversing the graph. You can see that some of the possible second moves end up circling back to the starting position, which makes sense since the rules are bi-directional and if applied can be un-applied on the next step.

IN: scratchpad { { "aaaaaaaa" } } all-paths dup . nl all-paths .
{
    { "aaaaaaaa" "babaaaaa" }
    { "aaaaaaaa" "ababaaaa" }
    { "aaaaaaaa" "aababaaa" }
    { "aaaaaaaa" "aaababaa" }
    { "aaaaaaaa" "aaaababa" }
    { "aaaaaaaa" "aaaaabab" }
}

{
    { "aaaaaaaa" "babaaaaa" "aaaaaaaa" }
    { "aaaaaaaa" "babaaaaa" "babbabaa" }
    { "aaaaaaaa" "babaaaaa" "babababa" }
    { "aaaaaaaa" "babaaaaa" "babaabab" }
    { "aaaaaaaa" "ababaaaa" "aaaaaaaa" }
    { "aaaaaaaa" "ababaaaa" "ababbaba" }
    { "aaaaaaaa" "ababaaaa" "abababab" }
    { "aaaaaaaa" "aababaaa" "aaaaaaaa" }
    { "aaaaaaaa" "aababaaa" "aababbab" }
    { "aaaaaaaa" "aaababaa" "aaaaaaaa" }
    { "aaaaaaaa" "aaababaa" "babbabaa" }
    { "aaaaaaaa" "aaaababa" "aaaaaaaa" }
    { "aaaaaaaa" "aaaababa" "babababa" }
    { "aaaaaaaa" "aaaababa" "ababbaba" }
    { "aaaaaaaa" "aaaaabab" "aaaaaaaa" }
    { "aaaaaaaa" "aaaaabab" "babaabab" }
    { "aaaaaaaa" "aaaaabab" "abababab" }
    { "aaaaaaaa" "aaaaabab" "aababbab" }
}

Let’s solve for the shortest paths. We keep track of states we’ve previously seen to avoid cycles, and we iterate using breadth-first search until we find any solutions:

:: shortest-paths ( from to -- moves )
    HS{ from } clone :> seen
    { { from } }     :> stack!

    f [
        drop

        ! find all next possibilities
        stack all-paths

        ! reject ones that circle back to visited nodes
        [ last seen in? ] reject

        ! reject any that are over the length of ``to``
        to length '[ last length _ > ] reject stack!

        ! add the newly visited nodes
        stack [ last seen adjoin ] each

        ! stop when we find any solutions
        stack [ last to = ] filter dup empty?
    ] loop ;

Note: we reject any states that are longer than our goal state. This provides a nice way to cull the graph and make the search performance more reasonable. You could also choose not to do that, and exhaustively search into that area. However, while this is not generally a valid approach to solving these types of problems, it is specifically a valid approach to this one.

There are quite a few shortest paths:

IN: scratchpad "aaaaaaaa" "aaaaaaaaaa" shortest-paths length .
560

Each of those contains 16 nodes, which means 15 rules were applied:

IN: scratchpad "aaaaaaaa" "aaaaaaaaaa" shortest-paths first length .
16

But they only go through a seemingly small number of nodes:

IN: scratchpad "aaaaaaaa" "aaaaaaaaaa" shortest-paths concat members length .
43
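
The level-by-level search can be mirrored in Python as a cross-check. This is only a sketch under the same assumptions as the Factor word: states longer than the goal are discarded, and states seen on an earlier level are never revisited. The function names are invented here, and paths are kept as tuples so that duplicates can be removed with a set.

```python
RULES = [("bab", "aaa"), ("bbb", "bb")]
ALL_RULES = RULES + [(b, a) for a, b in RULES]


def all_moves(state):
    moves = []
    for a, b in ALL_RULES:
        i = state.find(a)
        while i != -1:
            moves.append(state[:i] + b + state[i + len(a):])
            i = state.find(a, i + 1)
    return moves


def shortest_paths(start, goal):
    seen = {start}
    paths = [(start,)]
    while paths:
        # Extend every current path by one move, dropping duplicate paths.
        candidates = {p + (nxt,) for p in paths for nxt in all_moves(p[-1])}
        # Drop paths that revisit earlier levels or overshoot the goal length.
        paths = [
            p for p in candidates
            if p[-1] not in seen and len(p[-1]) <= len(goal)
        ]
        seen.update(p[-1] for p in paths)
        solutions = sorted(p for p in paths if p[-1] == goal)
        if solutions:
            return solutions
    return []


paths = shortest_paths("aaaaaaaa", "aaaaaaaaaa")
print(len(paths), len(paths[0]), len({s for p in paths for s in p}))
```

If the sketch is faithful to the Factor version, the three printed numbers should agree with the path count, path length, and node count reported above.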

How many nodes are there in total in the graph? Let’s find out!

:: full-graph ( from to -- seen )
    HS{ from } clone :> seen

    { { from } } [
        ! find all next possibilities
        all-paths

        ! reject any that are over the length of ``to``
        to length '[ last length _ > ] reject

        ! only include ones that visit new nodes
        [ last seen ?adjoin ] filter
    ] until-empty seen ;

We can see that the shortest solutions go through about 15% of the nodes:

IN: scratchpad "aaaaaaaa" "aaaaaaaaaa" full-graph cardinality .
279

We can use our graph traversal approach and Graphviz to visualize where solutions are found, showing how some areas of the graph are quite hard to randomly get out of and then on the correct path to a solution. We draw the starting node green, the ending node blue, and the nodes that are involved in the shortest paths gray:

And that’s kind of interesting, but if we cluster nodes by their depth when first discovered, some other patterns show up:

Such a fun problem!

Bitcask

Tue, 03 Jun 2025 08:00:00 -0700

Phil Eaton issued a HYIBY? – have you implemented bitcask yet? – challenge yesterday. Of course, I immediately realized that I have not and also that it would be fun to build in Factor.

Bitcask is described in the original Bitcask paper as a “log-structured hash-table for fast key/value data”, and was developed by Basho Technologies as part of the Riak distributed database. Besides the original paper, various developers over the years have bumped into Bitcask and implemented it in different programming languages. Arpit Bhayani, for example, has a nice blog post describing Bitcask that is worth reading for more background.

At its core, Bitcask describes an append-only storage mechanism for building a key-value database. It consists of one or more data files, each of which has an optional index file to allow faster recovery when initializing the database, and generally supports GET, PUT, and DELETE operations.

Data Files

Our data file contains a series of entry records. Each record consists of a key length, value length, key bytes, and value bytes. A simple word provides a way to write these bytes to a file:

: write-entry-bytes ( key value -- )
    [ dup length 4 >be write ] bi@ [ write ] bi@ ;

Then, using the serialize vocabulary we can store Factor objects quite simply:

: write-entry ( key value -- )
    [ object>bytes ] bi@ write-entry-bytes ;

We need the ability to store tombstone records which indicate that a key has been deleted from the database. In this case, we choose to store a zero-sized value to indicate that:

: write-tombstone ( key -- )
    object>bytes f write-entry-bytes ;

Assuming that a data file has had its seek position moved to the beginning of an entry record, we can read the value that it contains, or return a boolean indicating that it is not found because it was stored as a tombstone:

: read-entry ( -- value/f ? )
    4 read be> 4 read be> [
        drop f f
    ] [
        [ seek-relative seek-input ]
        [ read bytes>object t ] bi*
    ] if-zero ;
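
To make the record layout concrete, here is a rough Python sketch of the same format using the struct module: two 4-byte big-endian lengths, followed by the key bytes and the value bytes. It works on raw bytes rather than serialized objects, so unlike the Factor version an empty value cannot be told apart from a tombstone. The function names are made up for this example.

```python
import io
import struct


def write_entry(stream, key, value):
    # Header: key length and value length as 4-byte big-endian integers.
    stream.write(struct.pack(">II", len(key), len(value)))
    stream.write(key)
    stream.write(value)


def write_tombstone(stream, key):
    # A zero-length value marks the key as deleted.
    write_entry(stream, key, b"")


def read_entry(stream):
    key_len, value_len = struct.unpack(">II", stream.read(8))
    if value_len == 0:
        return None, False
    stream.seek(key_len, io.SEEK_CUR)  # skip over the key bytes
    return stream.read(value_len), True


log = io.BytesIO()
first = log.tell()
write_entry(log, b"k", b"v1")
second = log.tell()
write_tombstone(log, b"k")

log.seek(first)
print(read_entry(log))
log.seek(second)
print(read_entry(log))
```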

Index Files

Our index file contains hints that provide a way to recover the record offsets into the data files. These hints consist of a series of index records. Each record consists of a key length, key bytes, and file offset.

We can write our index mapping of keys to offsets:

: write-index ( index -- )
    [
        [ object>bytes dup length 4 >be write write ]
        [ 4 >be write ] bi*
    ] assoc-each ;

And then read it back into memory:

: read-index ( -- index )
    H{ } clone [
        4 read [
            be> read bytes>object 4 read be>
            swap pick set-at t
        ] [ f ] if*
    ] loop ;

We want to make the index files optional. To recover the index, we first seek to the last entry that we have in the index, and then iterate across the remaining records in the data file, making sure to delete any keys that are subsequently observed to have tombstone entries:

: recover-index ( index -- index' )
    dup values [ maximum seek-absolute seek-input ] unless-empty
    [
        tell-input 4 read [
            be> 4 read be> [ read bytes>object ] dip
            [ pick delete-at drop ] [
                [ pick set-at ]
                [ seek-relative seek-input ] bi*
            ] if-zero t
        ] [ drop f ] if*
    ] loop ;

Bitcask Implementation

The associative mapping protocol describes the words that an assoc should support. This type of object provides a mapping of key to value, with ways to add, update, and delete these mappings.

We want our bitcask type to use a single data file, reading and recovering from an index file, and then providing ways to modify – by appending to the data file – the database.

TUPLE: bitcask path index ;

:: <bitcask> ( path -- bitcask )
    path dup touch-file
    path ".idx" append dup touch-file
    binary [ read-index ] with-file-reader
    path binary [ recover-index ] with-file-reader
    bitcask boa ;

INSTANCE: bitcask assoc

The application should control when and how these index files are persisted:

: save-index ( bitcask -- )
    dup path>> ".idx" append binary
    [ index>> write-index ] with-file-writer ;

The first operation we support will be set-at, updating the index after writing the entry.

M:: bitcask set-at ( value key bitcask -- )
    bitcask path>> binary [
        tell-output
        key value write-entry
        key bitcask index>> set-at
    ] with-file-appender ;

Next, we support at*, to look up a value by seeking in the data file and reading the entry:

M:: bitcask at* ( key bitcask -- value/f ? )
    key bitcask index>> at* [
        bitcask path>> binary [
            seek-absolute seek-input read-entry
        ] with-file-reader
    ] [ drop f f ] if ;

And finally, delete-at removes a key from the index after writing a tombstone:

M:: bitcask delete-at ( key bitcask -- )
    key bitcask index>> key? [
        bitcask path>> binary [
            key write-tombstone
            key bitcask index>> delete-at
        ] with-file-appender
    ] when ;

The assoc-size of our database is the size of the index:

M: bitcask assoc-size
    index>> assoc-size ;

It is helpful to implement >alist to provide a conversion to an association list, although if the database gets quite large, this might be of less practical value:

M:: bitcask >alist ( bitcask -- alist )
    bitcask path>> binary [
        bitcask index>> [
            seek-absolute seek-input read-entry t assert=
        ] { } assoc-map-as
    ] with-file-reader ;

And a way to clear-assoc by writing tombstones and clearing the index:

M:: bitcask clear-assoc ( bitcask -- )
    bitcask path>> binary [
        bitcask index>>
        dup keys [ write-tombstone ] each
        clear-assoc
    ] with-file-appender ;

There are some elements desirable in a production database that are not implemented, for example:

reducing the amount of opening and closing files to increase performance
controlling when file writes are flushed to disk
storing other metadata such as timestamps for each entry
rolling over data log files as they reach a maximum size
providing a way to vacuum the database files to remove tombstone entries
compressing the database entries or data log files if size is a consideration
protocol for accessing it over the network by other parts of the application

This is now available in the development version in the bitcask vocabulary!
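
As a rough sketch of how this might be used, here is a short session. It assumes the development-version bitcask vocabulary is available and uses a made-up path, /tmp/example.db. Because bitcask is declared an instance of assoc, the generic at word works through the at* method above:

```factor
USING: assocs bitcask kernel prettyprint ;

! Hypothetical database path, created if it does not exist.
"/tmp/example.db" <bitcask>

"value" "key" pick set-at   ! append an entry to the data file
"key" over at .             ! "value"
"key" over delete-at        ! append a tombstone
"key" swap at .             ! f
```

Each call opens and closes the data file, which is the first item on the list of missing production features above.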

Game of Life

Tue, 27 May 2025 08:00:00 -0700

One of my first and most memorable graphical programs was implementing John Conway’s Game of Life. At the time, that implementation was as a Java applet. I’ve revisited it periodically in different programming languages including several years ago when I started to implement the Game of Life in Factor – something I’ve always wanted to write about.

The Game of Life is a two-dimensional grid of square cells with fairly simple logic. Each cell can be either live or dead. Each cell interacts with its eight neighboring cells with the following rules determining the next state of the game board:

Any live cell with fewer than two live neighbours dies, as if by underpopulation.
Any live cell with two or three live neighbours lives on to the next generation.
Any live cell with more than three live neighbours dies, as if by overpopulation.
Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
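
The four rules collapse into one small decision, sketched here as a standalone word. The name next-state is hypothetical and is not part of the game-of-life vocabulary; it takes the current state and the live neighbour count:

```factor
USING: kernel math math.order prettyprint ;

! Hypothetical helper: live? is the current state, n the live neighbour count.
: next-state ( live? n -- live?' )
    swap [ 2 3 between? ] [ 3 = ] if ;

t 1 next-state .  ! f (underpopulation)
t 2 next-state .  ! t (lives on)
t 4 next-state .  ! f (overpopulation)
f 3 next-state .  ! t (reproduction)
```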

You can run this in any release since Factor 0.98:

IN: scratchpad "game-of-life" run

And it will look something like this:

(Video: the Game of Life running in a Factor window.)

Let’s go ahead and build it!

Game Logic

We will model our two-dimensional game board as an array of arrays. And in particular, since each cell has only two states, we will use bit-arrays to reduce the memory requirements by efficiently storing the state, one bit for each cell.

: <grid> ( rows cols -- grid )
    '[ _ <bit-array> ] replicate ;

: grid-dim ( grid -- rows cols )
    [ length ] [ first length ] bi ;

Making a random grid, which is useful in testing:

: random-grid ( rows cols -- grid )
    '[ _ { t f } ?{ } randoms-as ] replicate ;

And a word we can use for debugging, to print a grid out:

: grid. ( grid -- )
    [ [ CHAR: # CHAR: . ? ] "" map-as print ] each ;

Some implementations choose to make the game boards infinite, but we are instead going to build a wraparound game board. This allows, for example, a glider shape to fly off the bottom right and then re-appear on the top left of the board, which is a lot more fun to watch.

A useful word calculates adjacent indices for a cell – that wrap at a max value of rows or columns:

:: adjacent-indices ( n max -- n-1 n n+1 )
    n [ max ] when-zero 1 -
    n
    n 1 + dup max = [ drop 0 ] when ;

Test it out, showing how it might work in a hypothetical 10 x 10 grid:

! in the middle
IN: scratchpad 3 10 adjacent-indices 3array .
{ 2 3 4 }

! at the start, wrapped around
IN: scratchpad 0 10 adjacent-indices 3array .
{ 9 0 1 }

! at the end, wrapped around
IN: scratchpad 9 10 adjacent-indices 3array .
{ 8 9 0 }
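
The same wraparound can also be expressed with modular arithmetic. This is only a sketch of an alternative, with the hypothetical name adjacent-indices-rem; it relies on rem from the math vocabulary, which always returns a non-negative result:

```factor
USING: arrays kernel locals math prettyprint ;

! Hypothetical alternative to adjacent-indices using modular arithmetic.
:: adjacent-indices-rem ( n max -- n-1 n n+1 )
    n 1 - max rem
    n
    n 1 + max rem ;

0 10 adjacent-indices-rem 3array .  ! { 9 0 1 }
9 10 adjacent-indices-rem 3array .  ! { 8 9 0 }
```

The version in the text avoids the division, which may matter in a loop that runs once per row and once per live cell.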

The main game logic requires counting neighbors for each cell. Since each cell can have 8 neighbors, we can store this count in a half-byte – a nibble – which can hold the values [0..15]. In the batteries-included standard library, we have a nibble-arrays vocabulary that makes this easy.

The simplest implementation would just iterate across the game board, and for each cell that is live, increment the count for the neighboring indices around it:

:: count-neighbors ( grid -- counts )
    grid grid-dim :> ( rows cols )
    rows [ cols <nibble-array> ] replicate :> neighbors
    grid [| row j |
        j rows adjacent-indices
        [ neighbors nth ] tri@ :> ( above same below )

        row [| cell i |
            cell [
                i cols adjacent-indices
                [ [ above [ 1 + ] change-nth ] tri@ ]
                [ nip [ same [ 1 + ] change-nth ] bi@ ]
                [ [ below [ 1 + ] change-nth ] tri@ ]
                3tri
            ] when
        ] each-index
    ] each-index neighbors ;

Then the last piece of game logic we need is to adjust the grid cells according to the rules – making some transition from live to dead, and others from dead to live based on their state and the neighboring counts.

:: next-step ( grid -- )
    grid count-neighbors :> neighbors
    grid [| row j |
        j neighbors nth :> neighbor-row
        row [| cell i |
            i neighbor-row nth
            cell [
                2 3 between? i row set-nth
            ] [
                3 = [ t i row set-nth ] when
            ] if
        ] each-index
    ] each-index ;

Before we move on to creating a graphical user interface for the game, let’s try it out in the Factor listener:

! Create a random 10x10 grid
IN: scratchpad 10 10 random-grid

! Print it out
IN: scratchpad dup grid.
#..#..#.##
##....####
..###.####
.##...#..#
.##....###
..###..#.#
...###.#..
.###....##
#...###.##
.##..#.#..

! Compute the neighbors for each cell
IN: scratchpad dup count-neighbors .
{
    N{ 5 5 4 1 2 3 4 6 5 5 }
    N{ 5 3 4 4 3 4 4 7 7 7 }
    N{ 6 5 4 3 1 4 4 6 6 5 }
    N{ 5 4 5 5 2 3 3 6 7 4 }
    N{ 5 4 5 5 2 2 3 3 5 3 }
    N{ 3 3 4 5 4 3 4 3 6 2 }
    N{ 3 3 6 6 5 2 3 2 5 3 }
    N{ 4 2 3 4 6 5 4 4 4 4 }
    N{ 4 5 5 4 3 3 3 4 4 4 }
    N{ 5 3 2 3 4 4 5 4 5 6 }
}

! Compute the next generation
IN: scratchpad dup next-step

! Print it out
IN: scratchpad dup grid.
.....#....
.#..#.....
...#......
.....##...
......##.#
##...#.#.#
##...###.#
.##.......
....###...
.###......

It works!

Game Interface

In Factor, one of the ways we can build user interfaces is using gadgets and OpenGL rendering instructions. We start by modeling our game as a gadget with a grid object, a size that specifies the rendered pixels-per-cell, and a timer to control the speed of repainting new generations.

TUPLE: grid-gadget < gadget grid size timer ;

Our default gadget will have cells that are 20 pixels square, and repaint 10 times per second:

: <grid-gadget> ( grid -- gadget )
    grid-gadget new
        swap >>grid
        20 >>size
        dup '[ _ dup grid>> next-step relayout-1 ]
        f 1/10 seconds <timer> >>timer ;

Gadgets are grafted onto the render hierarchy, and then later ungrafted when they are removed. We handle that state change by stopping the timer before delegating to the parent to clean up further:

M: grid-gadget ungraft*
    [ timer>> stop-timer ] [ call-next-method ] bi ;

The default dimension for our gadget is the grid dimension times the pixel size:

M: grid-gadget pref-dim*
    [ grid>> grid-dim swap ] [ size>> '[ _ * ] bi@ 2array ] bi ;

If the grid size changes – for example, by using the mouse scroll wheel to zoom in or out – we can create and store a new grid, keeping the cells that are visible in the same state they were in:

:: update-grid ( gadget -- )
    gadget dim>> first2 :> ( w h )
    gadget size>> :> size
    h w [ size /i ] bi@ :> ( new-rows new-cols )
    gadget grid>> :> grid
    grid grid-dim :> ( rows cols )
    rows new-rows = not cols new-cols = not or [
        new-rows new-cols <grid> :> new-grid
        rows new-rows min [| j |
            cols new-cols min [| i |
                i j grid nth nth
                i j new-grid nth set-nth
            ] each-integer
        ] each-integer
        new-grid gadget grid<<
    ] when ;

We can draw the cells that are live as black squares:

:: draw-cells ( gadget -- )
    COLOR: black gl-color
    gadget size>> :> size
    gadget grid>> [| row j |
        row [| cell i |
            cell [
                i j [ size * ] bi@ 2array { size size } gl-fill-rect
            ] when
        ] each-index
    ] each-index ;

And then draw the gray lines that define the grid of cells:

:: draw-lines ( gadget -- )
    gadget size>> :> size
    gadget grid>> grid-dim :> ( rows cols )
    COLOR: gray gl-color
    cols rows [ size * ] bi@ :> ( w h )
    rows 1 + [| j |
        j size * :> y
        { 0 y } { w y } gl-line
    ] each-integer
    cols 1 + [| i |
        i size * :> x
        { x 0 } { x h } gl-line
    ] each-integer ;

Putting this together, we draw our gadget by updating the grid, drawing the cells, and drawing the lines:

M: grid-gadget draw-gadget*
    [ update-grid ] [ draw-cells ] [ draw-lines ] tri ;

And, with the “visual REPL”, you can directly render the grid gadget, to see it work:

We now need to build the interactive parts. Let’s start by handling a click, to toggle the state of a cell, and storing which state it was toggled to in the last-click variable:

SYMBOL: last-click

:: on-click ( gadget -- )
    gadget grid>> :> grid
    gadget size>> :> size
    grid grid-dim :> ( rows cols )
    gadget hand-rel first2 [ size /i ] bi@ :> ( i j )
    i 0 cols 1 - between?
    j 0 rows 1 - between? and [
        i j grid nth
        [ not dup last-click set ] change-nth
    ] when gadget relayout-1 ;

That allows us to build a drag feature, where as we drag, we continue to either set cells to live or dead according to what the first click was doing:

:: on-drag ( gadget -- )
    gadget grid>> :> grid
    gadget size>> :> size
    grid grid-dim :> ( rows cols )
    gadget hand-rel first2 [ size /i ] bi@ :> ( i j )
    i 0 cols 1 - between?
    j 0 rows 1 - between? and [
        last-click get i j
        grid nth set-nth
        gadget relayout-1
    ] when ;

We implement a scrolling feature to adjust the size of the rendered cells, clamping the value when it gets too small or too large:

: on-scroll ( gadget -- )
    [
        scroll-direction get second {
            { [ dup 0 > ] [ -2 ] }
            { [ dup 0 < ] [ 2 ] }
            [ 0 ]
        } cond nip + 4 30 clamp
    ] change-size relayout-1 ;

And we store these as "gestures" that are supported by the gadget:

grid-gadget "gestures" [
    {
        { T{ button-down { # 1 } } [ on-click ] }
        { T{ drag { # 1 } } [ on-drag ] }
        { mouse-scroll [ on-scroll ] }
    } assoc-union
] change-word-prop

The last bit we need is to make the toolbar, which has a few commands we can run:

:: com-play ( gadget -- )
    gadget timer>> restart-timer ;

:: com-stop ( gadget -- )
    gadget timer>> stop-timer ;

:: com-clear ( gadget -- )
    gadget dup grid>> [ clear-bits ] each relayout-1 ;

:: com-random ( gadget -- )
    gadget dup grid>> [ [ drop { t f } random ] map! drop ] each relayout-1 ;

:: com-glider ( gadget -- )
    gadget dup grid>> :> grid
    { { 2 1 } { 3 2 } { 1 3 } { 2 3 } { 3 3 } }
    [ grid nth t -rot set-nth ] assoc-each relayout-1 ;

:: com-step ( gadget -- )
    gadget dup grid>> next-step relayout-1 ;

And then store these as the "toolbar" command map:

grid-gadget "toolbar" f {
    { T{ key-down { sym "1" } } com-play }
    { T{ key-down { sym "2" } } com-stop }
    { T{ key-down { sym "3" } } com-clear }
    { T{ key-down { sym "4" } } com-random }
    { T{ key-down { sym "5" } } com-glider }
    { T{ key-down { sym "6" } } com-step }
} define-command-map

And finally, we can wrap the grid gadget with something that makes a toolbar, and creates a main window when launched:

TUPLE: life-gadget < track ;

: <life-gadget> ( -- gadget )
    vertical life-gadget new-track
    20 20 <grid> <grid-gadget>
    [ <toolbar> format-toolbar f track-add ]
    [ 1 track-add ] bi ;

M: life-gadget focusable-child* children>> second ;

MAIN-WINDOW: life-window
    { { title "Game of Life" } }
    <life-gadget> >>gadgets ;

As with anything, there are probably things we could continue to improve in our UI framework, but one of the biggest missing pieces is examples of working code, which is largely what motivated writing about this today.

Check it out!

And maybe think about how you might adjust it to be an infinite game board, or to increase performance when computing the next generation, to improve the OpenGL rendering logic, persist the game board between launches, or do things like communicate age of each cell by the color that it is rendered with.

Sorting IPv6

Fri, 23 May 2025 08:00:00 -0700

Recently, Chris Siebenmann was lamenting the lack of a good command line way to sort IPv6 addresses. This followed a post of his a few years ago about how sort -V can easily sort IPv4 addresses. Since I had some fun talking about sorting Roman numerals recently – and we have an extensive standard library – I thought I’d talk about how you might solve this problem with Factor.

As a reminder, IPv6 uses a 128-bit address space with more theoretical addresses than the older – but still quite commonly used – IPv4 32-bit address space.

The internal network address of your computer is sometimes referred to as localhost or a loopback address, and represented as 127.0.0.1 in IPv4, or ::1 in IPv6. We have an ip-parser vocabulary with words for parsing and manipulating IP addresses as well as IP network strings written in CIDR notation. We can use these words to show how to translate these addresses to their byte representation:

IN: scratchpad "127.0.0.1" parse-ipv4 .
B{ 127 0 0 1 }

IN: scratchpad "::1" parse-ipv6 .
{ 0 0 0 0 0 0 0 1 }

And, we could use that to sort a list of addresses pretty easily:

IN: scratchpad {
                   "127.0.0.1"
                   "1.1.1.1"
                   "8.8.8.8"
                   "192.168.10.40"
               } [ parse-ipv4 ] sort-by .
{
    "1.1.1.1"
    "8.8.8.8"
    "127.0.0.1"
    "192.168.10.40"
}

IN: scratchpad {
                   "2620:0:1cfe:face:b00c::3"
                   "2001:4860:4860::8844"
                   "2620:0:ccc::2"
                   "::1"
                   "2001:4860:4860::8888"
               } [ parse-ipv6 ] sort-by .
{
    "::1"
    "2001:4860:4860::8844"
    "2001:4860:4860::8888"
    "2620:0:ccc::2"
    "2620:0:1cfe:face:b00c::3"
}
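
To see why parsing matters, here is a small sketch comparing two of the addresses above, first as plain strings and then as parsed arrays. It only uses parse-ipv6 from ip-parser and the generic <=> from math.order:

```factor
USING: ip-parser kernel math.order prettyprint ;

"2620:0:ccc::2" "2620:0:1cfe:face:b00c::3"
[ <=> . ]                          ! +gt+ : as strings, "c" sorts after "1"
[ [ parse-ipv6 ] bi@ <=> . ] 2bi   ! +lt+ : as numbers, 0xccc < 0x1cfe
```

A plain string sort would therefore put these two addresses in the wrong order.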

And so, now that some great feedback encouraged us to do command-line eval with auto-use? enabled, we can run this easily as a one-line script:

# make a file full of unsorted IPv6 addresses
$ cat <<EOF > ips.txt
2620:0:1cfe:face:b00c::3
2001:4860:4860::8844
2620:0:ccc::2
::1
2001:4860:4860::8888
EOF

# show that you can parse the file as strings
$ cat ips.txt | ./factor -e="read-lines ."
{
    "2620:0:1cfe:face:b00c::3"
    "2001:4860:4860::8844"
    "2620:0:ccc::2"
    "::1"
    "2001:4860:4860::8888"
}

# sort and print the sorted output
$ cat ips.txt | ./factor -e="read-lines [ parse-ipv6 ] sort-by [ print ] each"
::1
2001:4860:4860::8844
2001:4860:4860::8888
2620:0:ccc::2
2620:0:1cfe:face:b00c::3

Pretty cool!

Faster Leap Year?

Wed, 21 May 2025 08:00:00 -0700

Last week, Falk Hüffner wrote about making a leap year check in three instructions:

With the following code, we can check whether a year 0 ≤ y ≤ 102499 is a leap year with only about 3 CPU instructions:
bool is_leap_year_fast(uint32_t y) {
    return ((y * 1073750999) & 3221352463) <= 126976;
}
How does this work? The answer is surprisingly complex. This article explains it, mostly to have some fun with bit-twiddling; at the end, I’ll briefly discuss the practical use.

This is how a leap year check is typically implemented:
bool is_leap_year(uint32_t y) {
    if ((y % 4) != 0) return false;
    if ((y % 100) != 0) return true;
    if ((y % 400) == 0) return true;
    return false;
}

It would be fun to see how that works in Factor and compare the relative performance between a simple version and the new super-fast-highly-optimized 3 instruction version. To do that, we can use the benchmark word to record execution time by calling it repeatedly and returning an average time-per-call in nanoseconds:

TYPED: average-benchmark ( n: fixnum quot -- nanos-per-call )
    over [ '[ _ _ times ] benchmark ] dip /f ; inline

Note: We are forcing the iteration loop above to be fixnum to reduce its overhead, and due to the design of the benchmark words below, are going to have code blocks with predictable inputs. Testing your program with random inputs is also important to see the impact of CPU optimizations such as cache and branch predictions, or across multiple CPU architectures. Performance is also impacted by use of code generation features such as inline and compiler steps such as dead-code elimination. Benchmarking is hard.

Simple implementation

The simple – and typical – implementation can be easily written as:

: leap-year? ( year -- ? )
    dup 100 divisor? 400 4 ? divisor? ;

And in fact, that’s how it is implemented in the standard library.

We can write a quick benchmarking word. This ensures we are using the optimizing compiler and also asserts that the result of the word is as expected:

: bench-leap-year ( n year ? -- nanos )
     '[ _ leap-year? _ assert= ] average-benchmark ;

And then call it one hundred million times, to see how long it takes each call on average:

IN: scratchpad 100,000,000 2028 t bench-leap-year .
10.53904317

Just under 11 nanoseconds, including the loop and the assert…

Fast implementation

The fast implementation suggested by Falk can be written directly as:

: fast-leap-year? ( year -- ? )
    1073750999 * 3221352463 bitand 126976 <= ;
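
One detail worth spelling out: the C version relies on uint32_t multiplication wrapping at 32 bits, while Factor integers do not wrap. The direct translation should still agree, because the mask 3221352463 fits in 32 bits, so bitand discards any higher bits of the product. A quick way to check this is to compare against leap-year? from the calendar vocabulary over the whole claimed range. This is a sketch, and it should print t if the claim holds:

```factor
USING: calendar kernel math prettyprint sequences ;

: fast-leap-year? ( year -- ? )
    1073750999 * 3221352463 bitand 126976 <= ;

! Compare with calendar's leap-year? for every year in 0..102499.
102500 <iota> [ [ leap-year? ] [ fast-leap-year? ] bi = ] all? .
```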

And then write a benchmarking word:

: bench-fast-leap-year ( n year ? -- nanos )
     '[ _ fast-leap-year? _ assert= ] average-benchmark ;

And see how long it takes:

IN: scratchpad 100,000,000 2028 t bench-fast-leap-year .
4.74783302

Just under 5 nanoseconds…

Faster implementation

Well, generally Factor supports arbitrarily large integers by allowing integers to implicitly promote from word-sized fixnum to overflow into bignum. And, as they say, you can write the C programming language in any language.

A faster implementation might check the input is a fixnum and then force math without overflow:

TYPED: faster-leap-year? ( year: fixnum -- ? )
    1073750999 fixnum*fast 3221352463 fixnum-bitand 126976 fixnum<= ;

And write a benchmark word:

: bench-faster-leap-year ( n year ? -- nanos )
     '[ _ faster-leap-year? _ assert= ] average-benchmark ;

It’s a bit faster:

IN: scratchpad 100,000,000 2028 t bench-faster-leap-year .
3.24267145

A little over 3 nanoseconds…

Fastest implementation

But, to make sure that we use the fewest instructions possible, we can make it slightly-less-safe by declaring the input to be a fixnum to avoid the run-time type checks. This could cause issues if it is called with other types on the stack.

: fastest-leap-year? ( year -- ? )
    { fixnum } declare
    1073750999 fixnum*fast 3221352463 fixnum-bitand 126976 fixnum<= ;

And write a benchmark word:

: bench-fastest-leap-year ( n year ? -- nanos )
     '[ _ fastest-leap-year? _ assert= ] average-benchmark ;

And then you can see it gets quite fast indeed:

IN: scratchpad 100,000,000 2028 t bench-fastest-leap-year .
2.82150476

Just under 3 nanoseconds!

But, is it also just 3 instructions?

IN: scratchpad \ fastest-leap-year? disassemble
000075f0afa19490: 89056a5bd1fe          mov [rip-0x12ea496], eax
000075f0afa19496: 498b06                mov rax, [r14]
000075f0afa19499: 48c1f804              sar rax, 0x4
000075f0afa1949d: 4869c0d7230040        imul rax, rax, 0x400023d7
000075f0afa194a4: bb0ff001c0            mov ebx, 0xc001f00f
000075f0afa194a9: 4821d8                and rax, rbx
000075f0afa194ac: 4881f800f00100        cmp rax, 0x1f000
000075f0afa194b3: b801000000            mov eax, 0x1
000075f0afa194b8: 48bb5c0e388cf0750000  mov rbx, 0x75f08c380e5c
000075f0afa194c2: 480f4ec3              cmovle rax, rbx
000075f0afa194c6: 498906                mov [r14], rax
000075f0afa194c9: 8905315bd1fe          mov [rip-0x12ea4cf], eax
000075f0afa194cf: c3                    ret

Pretty close!

There is an extra instruction near the beginning to untag our fixnum input. Due to the convention around handling booleans in Factor, there are a couple of extra instructions at the end for converting the result into a return value of either t or f.

And it could get even faster if either the assert= was removed, or the code was made inline so the function prologue and epilogue could be elided into the outer scope.

So much fun.

Raylib

Tue, 20 May 2025 08:00:00 -0700

Raylib is a very neat C library that has become popular as a “simple and easy-to-use library to enjoy videogames programming”. Originally released in 2013, it has seen lots of updates, with the latest version 5.5 representing 11 years of work since the original version 1.0 release.

You can sense the love this library has received from the extensive feature list:

NO external dependencies, all required libraries included with raylib

Multiplatform: Windows, Linux, MacOS, RPI, Android, HTML5… and more!

Written in plain C code (C99) using PascalCase/camelCase notation

Hardware accelerated with OpenGL (1.1, 2.1, 3.3, 4.3 or ES 2.0)

Unique OpenGL abstraction layer: rlgl

Powerful Fonts module (SpriteFonts, BMfonts, TTF, SDF)

Multiple texture formats support, including compressed formats (DXT, ETC, ASTC)

Full 3d support for 3d Shapes, Models, Billboards, Heightmaps and more!

Flexible Materials system, supporting classic maps and PBR maps

Animated 3d models supported (skeletal bones animation)

Shaders support, including Model shaders and Postprocessing shaders

Powerful math module for Vector, Matrix and Quaternion operations: raymath

Audio loading and playing with streaming support (WAV, OGG, MP3, FLAC, XM, MOD)

VR stereo rendering support with configurable HMD device parameters

Huge examples collection with +120 code examples!

Bindings to +60 programming languages!

Free and open source. Check [LICENSE].

In 2019, we first included support for version 2.5 of raylib. Over the years, we have updated our bindings to include new functions and features of new versions. Our most recent two releases, Factor 0.99 and Factor 0.100, included support for version 4.5. And in the current development cycle, we have updated to version 5.0 and then again updated to version 5.5.

It is possible to maintain support for multiple versions, but we have chosen for now to just target the most recent stable release as best we can. Generally, the raylib library has been quite stable, with occasional deprecations or breaking changes that seem to be easy to adjust for.

As a simple demonstration, we can start with the basic window example – an extremely simple program in the spirit of achieving a black triangle moment – to make sure everything works. We can directly translate this example into Factor using our Raylib bindings – which import the C functions using our convention for kebab-case word names.

USE: raylib

: basic-window ( -- )

    800 450 "raylib [core] example - basic window" init-window

    60 set-target-fps

    [ window-should-close ] [

        begin-drawing

        RAYWHITE clear-background

        "Congrats! You created your first window!"
        190 200 20 LIGHTGRAY draw-text

        end-drawing

    ] until close-window ;

And, if you run the example – you’ll end up with this:

Note: if you’re wondering why the background isn’t white, it’s because RAYWHITE is a special version of not-quite-white that matches the Raylib logo.

Of course, your examples can become more complex. For example, Joseph Oziel released a fun game called BitGuessr written in Factor using Raylib. And, given the extensive feature list, your simple program written quickly for a game jam might end up including great audio, images, keyboard, mouse, gamepad, 3d rendering, and other features relatively easily!

I’d love to see more demos written in Factor and Raylib. Happy coding!

Shuffle Syntax

Sun, 18 May 2025 08:00:00 -0700

Some might describe shuffle words as one of the fundamental building blocks of Factor. Others might describe them as a code smell and seek to use dataflow combinators or other higher-level words to reduce code complexity.

Whatever your opinion is, they are useful concepts in a concatenative language. Besides the basic shuffle words – like dup, swap, rot – we have had the shuffle vocabulary which provides some “additional shuffle words” for a while, as well as a syntax word that can perform arbitrary shuffles:

IN: scratchpad USE: shuffle

IN: scratchpad { 1 2 3 4 5 } [
                   shuffle( a b c d e -- d e c a b )
               ] with-datastack .
{ 4 5 3 1 2 }

This would be quite useful, except that it has had a fundamental issue – the way it is implemented uses a macro to curry the stack arguments into an array, and then pull the stack arguments back out of the array onto the stack in the requested order.

For example, we can look at a simple swap and a complex shuffle:

IN: scratchpad [ shuffle( x y -- y x ) ] optimized.
[
    2 0 <array> dup >R >R R> 3 set-slot
    R> >R R> dup >R >R R> 2 set-slot
    R> >R R> dup >R >R R> >R R> 3 slot R> >R R> >R R> 2 slot
]

IN: scratchpad [ shuffle( a b c d e -- b a d c e ) ] optimized.
[
    5 0 <array> dup >R >R R> 6 set-slot
    R> >R R> dup >R >R R> 5 set-slot
    R> >R R> dup >R >R R> 4 set-slot
    R> >R R> dup >R >R R> 3 set-slot
    R> >R R> dup >R >R R> 2 set-slot
    R> >R R> dup >R >R R> >R R> 3 slot
    R> dup >R >R R> >R R> 2 slot R> dup >R >R R> >R R> 5 slot
    R> dup >R >R R> >R R> 4 slot R> >R R> >R R> 6 slot
]

Not only is this less than efficient, it also turns literal arguments that were on the stack into run-time arguments, and can cause a Cannot apply 'call' to a run-time computed value error if one of the shuffled arguments is a quotation that the caller hopes to call.

This bug was described on our issue tracker and I spent some time recently looking into it.

It turns out that we can use the stack checker to indicate that a shuffle is taking place, and use some "special" machinery to allow the optimizing compiler to generate efficient and correct code for these arbitrary shuffles.

After applying a small fix, we can see that the earlier examples are now quite simple:

IN: scratchpad [ shuffle( x y -- y x ) ] optimized.
[ swap ]

IN: scratchpad [ shuffle( a b c d e -- b a d c e ) ] optimized.
[
    ( 10791205 10791206 10791207 10791208 10791209 -- 10791206 10791205 10791208 10791207 10791209 )
]

This is available in the latest development version.

Even More Brainf*ck

Sat, 17 May 2025 08:00:00 -0700

I was distracted a little by some recent explorations building a Brainfuck interpreter in Factor and had a couple of follow-ups to add to the conversation.

First, I realized my initial quick-and-dirty Brainfuck interpreter didn’t support nested loops. Specifically, the logic for beginning or ending a loop just searched for the nearest [ or ] character without considering nesting. This was fixed today so that will no longer be an issue.

Second, despite the Brainfuck compiler implicitly making an AST (abstract syntax tree) for Brainfuck by virtue of generating a quotation, I thought it would be more fun to build and generate one intentionally.

We can model the Brainfuck commands as operations using the following tuples and singletons:

TUPLE: ptr n ;
TUPLE: mem n ;
SINGLETONS: output input debug ;
TUPLE: loop ops ;

Next, we can build a parser using EBNF to convert the textual commands into our Brainfuck AST:

EBNF: ast-brainfuck [=[

inc-ptr  = (">")+     => [[ length ptr boa ]]
dec-ptr  = ("<")+     => [[ length neg ptr boa ]]
inc-mem  = ("+")+     => [[ length mem boa ]]
dec-mem  = ("-")+     => [[ length neg mem boa ]]
output   = "."        => [[ output ]]
input    = ","        => [[ input ]]
debug    = "#"        => [[ debug ]]
space    = [ \t\n\r]+ => [[ f ]]
unknown  = (.)        => [[ "Invalid input" throw ]]

ops   = inc-ptr|dec-ptr|inc-mem|dec-mem|output|input|debug|space
loop  = "[" {loop|ops}+ "]" => [[ second sift loop boa ]]

code  = (loop|ops|unknown)* => [[ sift ]]

]=]

This is interesting, because now we can more easily analyze a piece of Brainfuck code, such as the Hello, World example that I have been frequently using:

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " ast-brainfuck .
V{
    T{ mem { n 10 } }
    T{ loop
        { ops
            V{
                T{ ptr { n 1 } }
                T{ mem { n 7 } }
                T{ ptr { n 1 } }
                T{ mem { n 10 } }
                T{ ptr { n 1 } }
                T{ mem { n 3 } }
                T{ ptr { n 1 } }
                T{ mem { n 1 } }
                T{ ptr { n -4 } }
                T{ mem { n -1 } }
            }
        }
    }
    T{ ptr { n 1 } }
    T{ mem { n 2 } }
    output
    T{ ptr { n 1 } }
    T{ mem { n 1 } }
    output
    T{ mem { n 7 } }
    output
    output
    T{ mem { n 3 } }
    output
    T{ ptr { n 1 } }
    T{ mem { n 2 } }
    output
    T{ ptr { n -2 } }
    T{ mem { n 15 } }
    output
    T{ ptr { n 1 } }
    output
    T{ mem { n 3 } }
    output
    T{ mem { n -6 } }
    output
    T{ mem { n -8 } }
    output
    T{ ptr { n 1 } }
    T{ mem { n 1 } }
    output
    T{ ptr { n 1 } }
    output
}

And then we can implement those operations against a brainfuck state object, by deferring to words from our current implementation:

GENERIC: op ( brainfuck op -- brainfuck )
M: ptr op n>> (>) ;
M: mem op n>> (+) ;
M: output op drop (.) ;
M: input op drop (,) ;
M: debug op drop (#) ;
M: loop op [ get-memory zero? ] swap ops>> '[ _ [ op ] each ] until ;

And now this Brainfuck AST represents a hybrid execution model somewhere between the compiled and interpreted versions:

: hybrid-brainfuck ( code -- )
    [ <brainfuck> ] dip ast-brainfuck [ op ] each drop ;

And see that it works:

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " hybrid-brainfuck
Hello World!

We also gain some potential for building code optimization techniques that operate on an AST as a step before actual compilation or execution – for example, coalescing adjacent increment and decrement operations or some other more complex analysis.

That, however, is likely to remain an exercise for the reader!
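
As a starting point for that exercise, here is a minimal sketch of one such pass: merging neighbouring ptr or mem nodes, so that a run like +++-- ends up as a single mem node. It assumes the ptr and mem tuples from earlier in this post (each with a single n slot) are in scope. The names same-kind?, push-op, and coalesce-ops are hypothetical, and the pass neither descends into loop bodies nor drops nodes whose n ends up as zero.

```factor
USING: accessors combinators.short-circuit kernel locals math
sequences ;

! Assumption: ptr and mem are the tuples defined earlier in the post.
! Two nodes can be merged when both are ptr or both are mem.
: same-kind? ( op1 op2 -- ? )
    { [ [ ptr? ] both? ] [ [ mem? ] both? ] } 2|| ;

:: push-op ( accum op -- accum )
    accum ?last op same-kind? [
        ! Fold op into a copy of the previous node instead of appending.
        accum pop clone [ op n>> + ] change-n accum push
    ] [
        op accum push
    ] if accum ;

: coalesce-ops ( ops -- ops' )
    V{ } clone [ push-op ] reduce ;
```

Applied to the parsed form of "+++-->><", this should leave one mem node with n set to 1 followed by one ptr node with n set to 1.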

More Brainf*ck

Fri, 16 May 2025 08:00:00 -0700

Almost 16 years ago, I wrote about implementing the Brainfuck programming language in Factor. It is a curious programming language, sometimes considered one of the most famous esoteric programming languages.

In any event – and encouraged by a question I was asked recently – I spent some time thinking about the current process of “compiling” the Brainfuck into quotations versus how an interpreter might work instead.

As a quick reminder, our current implementation expands a program written in Brainfuck into an equivalent form in Factor, and then allows it to be run:

IN: scratchpad USE: brainfuck

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " run-brainfuck
Hello World!


IN: scratchpad [
                   "
                   ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
                   >++.>+.+++++++..+++.>++.<<+++++++++++++++
                   .>.+++.------.--------.>+.>.
                   " run-brainfuck
               ] expand-macros .
[
    <brainfuck> 10 (+)
    [ get-memory zero? ] [
        1 (>) 7 (+) 1 (>) 10 (+) 1 (>) 3 (+) 1 (>) 1 (+) 4 (<)
        1 (-)
    ] until 1 (>) 2 (+) (.) 1 (>) 1 (+) (.) 7 (+) (.) (.) 3 (+)
    (.) 1 (>) 2 (+) (.) 2 (<) 15 (+) (.) 1 (>) (.) 3 (+)
    (.) 6 (-) (.) 8 (-) (.) 1 (>) 1 (+) (.) 1 (>) (.) drop flush
]

Notice the coalescing that it performs to collapse multiple identical operators into a single action.

But, this got me curious about a couple of things:

How could we cleanly write a Factor word that is implemented in Brainfuck?
How could we write a Factor interpreter for Brainfuck, and what benefits would it have?

Building a syntax word

Thankfully, the answer to the first question is simple using parsing words.

SYNTAX: BRAINFUCK:
    scan-new-word ";" parse-tokens concat
    '[ _ run-brainfuck ] ( -- ) define-declared ;

Now, we can define a Hello, World, complete with inline comments:

BRAINFUCK: hello
    +++++ +++               ! Set Cell #0 to 8
    [
        >++++               ! Add 4 to Cell #1; this will always set Cell #1 to 4
        [                   ! as the cell will be cleared by the loop
            >++             ! Add 4*2 to Cell #2
            >+++            ! Add 4*3 to Cell #3
            >+++            ! Add 4*3 to Cell #4
            >+              ! Add 4 to Cell #5
            <<<<-           ! Decrement the loop counter in Cell #1
        ]                   ! Loop till Cell #1 is zero
        >+                  ! Add 1 to Cell #2
        >+                  ! Add 1 to Cell #3
        >-                  ! Subtract 1 from Cell #4
        >>+                 ! Add 1 to Cell #6
        [<]                 ! Move back to the first zero cell you find; this will
                            ! be Cell #1 which was cleared by the previous loop
        <-                  ! Decrement the loop Counter in Cell #0
    ]                       ! Loop till Cell #0 is zero

    ! The result of this is:
    ! Cell No :   0   1   2   3   4   5   6
    ! Contents:   0   0  72 104  88  32   8
    ! Pointer :   ^

    >>.                     ! Cell #2 has value 72 which is 'H'
    >---.                   ! Subtract 3 from Cell #3 to get 101 which is 'e'
    +++++ ++..+++.          ! Likewise for 'llo' from Cell #3
    >>.                     ! Cell #5 is 32 for the space
    <-.                     ! Subtract 1 from Cell #4 for 87 to give a 'W'
    <.                      ! Cell #3 was set to 'o' from the end of 'Hello'
    +++.----- -.----- ---.  ! Cell #3 for 'rl' and 'd'
    >>+.                    ! Add 1 to Cell #5 gives us an exclamation point
    >++.                    ! And finally a newline from Cell #6
    ;

And, it works!

IN: scratchpad hello
Hello World!

Note: we are using ! as a comment character which is the convention in Factor. Some Brainfuck implementations use that character to indicate embedded program inputs.

That’s pretty cool, and a neat example of using the lexer, the parser, and macros.

Building an interpreter

The answer to the second question might be more complex and nuanced, but thankfully we can re-use some of the current implementation to make a quick-and-dirty interpreter:

: end-loop ( str i -- str j/f )
    CHAR: ] swap pick index-from dup [ 1 + ] when ;

: start-loop ( str i -- str j/f )
    1 - CHAR: [ swap pick last-index-from dup [ 1 + ] when ;

: interpret-brainfuck-from ( str i brainfuck -- str next/f brainfuck )
    2over swap ?nth [ 1 + ] 2dip {
        { CHAR: > [ 1 (>) ] }
        { CHAR: < [ 1 (<) ] }
        { CHAR: + [ 1 (+) ] }
        { CHAR: - [ 1 (-) ] }
        { CHAR: . [ (.) ] }
        { CHAR: , [ (,) ] }
        { CHAR: # [ (#) ] }
        { CHAR: [ [ get-memory zero? [ [ end-loop ] dip ] when ] }
        { CHAR: ] [ get-memory zero? [ [ start-loop ] dip ] unless ] }
        { f [ [ drop f ] dip ] }
        [ blank? [ "Invalid input" throw ] unless ]
    } case ;

: interpret-brainfuck ( str -- )
    0 <brainfuck> [ interpret-brainfuck-from over ] loop 3drop ;

And give it a try:

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " interpret-brainfuck
Hello World!

It works! But now I’m curious about relative performance. Let’s build a silly benchmark equivalent to cat to redirect the contents of an input-stream to an output-stream. We can compare our compiled run-brainfuck macro to a version that uses the interpreter we made above and then to a native version implemented using stream-copy.

: cat1 ( -- ) ",[.,]" run-brainfuck ;

: cat2 ( -- ) ",[.,]" interpret-brainfuck ;

: cat3 ( -- ) input-stream get output-stream get stream-copy* ;

First, we should make sure all three work:

IN: scratchpad "Compiled" [ cat1 ] with-string-reader
Compiled

IN: scratchpad "Interpreted" [ cat2 ] with-string-reader
Interpreted

IN: scratchpad "Factor" [ cat3 ] with-string-reader
Factor

Okay, so they all seem to work!

For quick performance testing, let’s compare them outputting to a null stream:

IN: scratchpad [
                   1,000,000 CHAR: a <string> [
                       [ cat1 ] with-null-writer
                   ] with-string-reader
               ] time
Running time: 0.059820291 seconds

IN: scratchpad [
                   1,000,000 CHAR: a <string> [
                       [ cat2 ] with-null-writer
                   ] with-string-reader
               ] time
Running time: 0.13840325 seconds

IN: scratchpad [
                   1,000,000 CHAR: a <string> [
                       [ cat3 ] with-null-writer
                   ] with-string-reader
               ] time
Running time: 0.015008417 seconds

The compiled one is a bit more than 2x faster than the interpreted version, but both are slower than the native version.

Let’s try comparing our “Hello, World” example – where the operator coalescing that the compiled version does might help:

: hello1 ( -- )
   "
   ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
   >++.>+.+++++++..+++.>++.<<+++++++++++++++
   .>.+++.------.--------.>+.>.
   " run-brainfuck ;

: hello2 ( -- )
   "
   ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
   >++.>+.+++++++..+++.>++.<<+++++++++++++++
   .>.+++.------.--------.>+.>.
   " interpret-brainfuck ;

We can see that the compiled one is now more like 7x faster:

IN: scratchpad [ [ 10,000 [ hello1 ] times ] with-null-writer ] time
Running time: 0.018075292 seconds

IN: scratchpad [ [ 10,000 [ hello2 ] times ] with-null-writer ] time
Running time: 0.133718416 seconds

Obviously, this is somewhat comparing apples and oranges because it’s ignoring compilation time in the comparison, and I didn’t spend any time on optimizing the interpreted version – for example, stripping blanks or validating inputs before doing the interpreter loop – but it’s a useful starting point for understanding tradeoffs.
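
One more gap worth flagging in the quick-and-dirty interpreter: end-loop and start-loop jump to the nearest bracket, so they only pair up correctly when loops are not nested. A minimal sketch of a depth-aware replacement for end-loop might look like this. The name end-loop* is hypothetical, and a matching start-loop* would do the same scan backwards.

```factor
USING: kernel locals math sequences ;

! i is the index just after a "[". The result j is the index just
! after its matching "]", or f if there is none.
:: end-loop* ( str i -- str j/f )
    0 :> depth!
    i str [| ch |
        ch CHAR: [ = [ depth 1 + depth! ] when
        ch CHAR: ] = [ depth 1 - depth! ] when
        ! The matching "]" is the first one that closes more loops
        ! than were opened since i.
        depth 0 <
    ] find-from drop :> j
    str j [ j 1 + ] [ f ] if ;
```

For example, "[a[b]c]d" 1 end-loop* skips over the inner loop and leaves the string and 7 on the stack.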

How long does it currently take to compile?

IN: scratchpad [
                    [
                       gensym [
                           "
                           ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
                           >++.>+.+++++++..+++.>++.<<+++++++++++++++
                           .>.+++.------.--------.>+.>.
                           " run-brainfuck
                       ] ( -- ) define-declared
                   ] with-compilation-unit
               ] time
Running time: 0.02294075 seconds

…a bit more time than it takes to call it 10,000 times. Interesting.

Roman Sort

Mon, 12 May 2025 08:00:00 -0700

Factor has included support for Roman numerals since at least 2008. Sometimes recent events – such as the election of Pope Leo XIV or Super Bowl LIX – revive modern interest in how they work and how to work with them computationally.

There was a blog post a few days ago about sorting Roman numerals which pointed out that sorting them alphabetically worked pretty well. Given that we have a pretty good roman vocabulary, I thought we could explore different methods of sorting a sequence of strings representing Roman numbers.

Let’s first try the mostly-but-not-entirely-correct method suggested in the blog post:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad sort .
{ "I" "II" "III" "IV" "IX" "V" "VI" "VII" "VIII" }

Well, that’s almost correct, but the number IX – the number 9 – sorts in the middle, rather than at the end. Could we fix this by using sort-by to convert the string to a number before calling compare to produce a sorted output?

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

That fixes it, but how often do we end up calling the conversion function?

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad SYMBOL: calls

IN: scratchpad [ calls inc roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

IN: scratchpad calls get .
40

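Wow, 40 times to sort 9 elements – that seems like a lot! The key quotation is applied to both elements of every comparison, so the calls add up quickly.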

Perhaps we could try the Schwartzian transform – also known as decorate-sort-undecorate – at the expense of using intermediate storage by saving the keys for each element, then sorting, and then returning only the values:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ [ roman> ] keep ] map>alist sort-keys values .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

That seems like a lot of code to do a simple thing. Instead, we can use the map-sort abstraction which implements the same method:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ roman> ] map-sort .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

Does this make much of a difference? Let’s compare each method:

: roman-sort1 ( seq -- sorted-seq ) [ roman> ] sort-by ;

: roman-sort2 ( seq -- sorted-seq ) [ roman> ] map-sort ;

We can time sorting a list of 100,000 random Roman numbers under 1,000:

IN: scratchpad 100,000 1,000 [1..b] randoms [ >ROMAN ] map

IN: scratchpad [ [ roman-sort1 ] time ]
               [ [ roman-sort2 ] time ] bi
Running time: 4.164076625 seconds
Running time: 0.154227583 seconds

You can see that the decorate-sort-undecorate pattern is quite a bit faster in this case. This is not always true: the benefit generally depends on how much work the key function is doing compared to the cost of the intermediate storage.
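
To see where the difference comes from, the same counting trick from earlier can be applied to map-sort. This is a rough sketch: key-calls and count-key-calls are hypothetical names, and it assumes map-sort lives in the sorting.extras vocabulary.

```factor
USING: kernel namespaces roman sorting.extras ;

SYMBOL: key-calls

! Sort with map-sort and report how many times the key quotation ran.
: count-key-calls ( seq -- n )
    0 key-calls [
        [ key-calls inc roman> ] map-sort drop
        key-calls get
    ] with-variable ;

IN: scratchpad 10 [1..b) [ >ROMAN ] map count-key-calls .
```

Since the keys are computed once up front, the count should equal the number of elements rather than the 40 calls seen with sort-by.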

Recamán’s Sequence

Tue, 06 May 2025 10:00:00 -0700

I love reading John D. Cook’s blog which covers various mathematical concepts, code snippets, and other curious explorations. Today, he celebrated his 5,000th post which is a pretty incredible milestone. In honor of that, let’s implement Recamán’s sequence, which he wrote about yesterday:

I recently ran into Recamán’s sequence. N. J. A. Sloane, the founder of the Online Encyclopedia of Integer Sequences calls Recamán’s sequence one of his favorites. It’s sequence A005132 in the OEIS.

This sequence starts at 0 and the nth number in the sequence is the result of moving forward or backward n steps from the previous number. You are allowed to move backward if the result is positive and a number you haven’t already visited. Otherwise you move forward.

Here’s Python code to generate the first N elements of the sequence.

def recaman(N):
    a = [0]*N
    for n in range(1, N):
        proposal = a[n-1] - n
        if proposal > 0 and proposal not in set(a):
            a[n] = proposal
        else:
            a[n] = a[n-1] + n
    return a

For example, recaman(10) returns

[0, 1, 3, 6, 2, 7, 13, 20, 12, 21]

Direct Implementation

The code for this subtract if you can, add if you can’t sequence is not particularly complex, but it is often fun to see what an algorithm looks like when implemented in a different language. So, we’ll try a direct implementation in Factor first and then talk about some variations.

Using local variables, we can keep our version very similar to the original:

:: recaman ( N -- seq )
    N 0 <array> :> a
    N [1..b) [| n |
        n 1 - a nth n - :> proposal
        proposal 0 > proposal a member? not and [
            proposal n a set-nth
        ] [
            n 1 - a nth n + n a set-nth
        ] if
    ] each a ;

And, show that it works:

IN: scratchpad 10 recaman .
{ 0 1 3 6 2 7 13 20 12 21 }

Short-Circuit Logic

One subtle difference is that Python’s and operation is short-circuiting, so if the first expression returns False, the second expression is not evaluated. We can make that change ourselves:

- proposal 0 > proposal a member? not and [
+ proposal 0 > [ proposal a member? not ] [ f ] if [

Or use short-circuit combinators to do that for us:

- proposal 0 > proposal a member? not and [
+ proposal { [ 0 > ] [ a member? not ] } 1&& [

Reduce Repetitions

We could also notice repetitions that occur, for example both branches of the if set the nth value of the array. In addition, we could keep the proposal on the stack instead of naming it as a local variable, and then just replace it when the branch evaluates to f.

Incorporating these changes results in this version:

:: recaman ( N -- seq )
    N 0 <array> :> a
    N [1..b) [| n |
        n 1 - a nth n -
        dup { [ 0 > ] [ a member? not ] } 1&& [
            drop n 1 - a nth n +
        ] unless n a set-nth
    ] each a ;

There is still repetition, as an iteration can look up the previous value twice. We could fix that by storing the value in a local variable:

:: recaman ( N -- seq )
    N 0 <array> :> a
    N [1..b) [| n |
        n 1 - a nth :> prev
        prev n - dup { [ 0 > ] [ a member? not ] } 1&&
        [ drop prev n + ] unless n a set-nth
    ] each a ;

Sometimes it is simpler to extract an inner loop and then see how it looks:

:: next-recaman ( prev n a -- next )
    prev n - dup { [ 0 > ] [ a member? not ] } 1&&
    [ drop prev n + ] unless ;

:: recaman ( N -- seq )
    N 0 <array> :> a
    N [1..b) [
        [ 1 - a nth ] [ a next-recaman ] [ a set-nth ] tri
    ] each a ;

The retrieval of the previous value is unnecessary work on each iteration. We could remove that by keeping the previous value on the stack and using the make vocabulary to accumulate values. We also start from index 0 instead of 1 as a simplification:

:: next-recaman ( prev n -- next )
    prev n - dup { [ 0 > ] [ building get member? not ] } 1&&
    [ drop prev n + ] unless ;

:: recaman ( N -- seq )
    [ 0 N <iota> [ next-recaman dup , ] each drop ] { } make ;

Constant-time Lookups

But, there is a big performance issue. Searching the previous values to see whether a number has been generated before takes linear time. It would be much faster to use a bit-vector to track the previously seen numbers, which gives a constant-time lookup:

:: next-recaman ( prev n seen -- next )
    prev n - dup { [ 0 > ] [ seen nth not ] } 1&&
    [ drop prev n + ] unless t over seen set-nth ;

:: recaman ( N -- seq )
    0 N dup <bit-vector> '[ _ next-recaman dup ] map-integers nip ;

This allows us to generate, for example, one million values in the sequence in a small amount of time:

IN: scratchpad [ 1,000,000 recaman last . ] time
1057164

Running time: 0.049441708 seconds

Or even calculate the nth value without saving the sequence:

:: nth-recaman ( N -- elt )
    0 N dup <bit-vector> '[ _ next-recaman ] each-integer ;

Using Generators

We could even use the generators vocabulary to make an infinite stream that we can sample from:

:: next-recaman ( prev n seen -- next )
    prev n - dup { [ 0 > ] [ seen nth not ] } 1&&
    [ drop prev n + ] unless t over seen set-nth ;

GEN: recaman ( -- gen )
    0 0 ?V{ } clone '[
        [ _ next-recaman dup yield ] [ 1 + ] bi t
    ] loop 2drop ;

And then take some values from it:

IN: scratchpad recaman 10 take .
{ 0 1 3 6 2 7 13 20 12 21 }

It is often pleasantly surprising to explore seemingly simple topics – for example, looking at this algorithm and thinking about different time complexity, data structures, or programming language libraries that might be useful.

I have added this today to the Factor development version.

Filtering Errors

Mon, 05 May 2025 08:00:00 -0700

There was a discussion today on the Factor Discord server about how one might filter or reject objects in a sequence based on whether a quotation throws an error or not when applied to each item. We’re going to implement that now in Factor and hopefully learn a few things.

First, we need a way to see if a quotation throws an error or not. One way would be to call the quotation and use recover to return a different boolean if an error was thrown. That might look something like this:

: throws? ( quot -- ? )
    '[ @ f ] [ drop t ] recover ; inline

And that kinda works:

IN: scratchpad 10 20 [ + ] throws?

--- Data stack:
30
f

IN: scratchpad 10 "20" [ + ] throws?

--- Data stack:
10
"20"
t

But you can see that in the working first case, it did not throw an error and the stack has the single output of the quotation and f, but in the failed second case it has the two inputs to the quotation and a t indicating that an error was thrown.

So, ideally we could modify this so that when the quotation succeeds, we drop the outputs, and when the quotation fails, we drop the inputs, and then can consistently produce a single boolean output for any quotation this applies to.

Luckily for us, we can use the drop-outputs and drop-inputs words, which infer the stack effect and handle each case correctly. This would change our implementation slightly:

: throws? ( quot -- ? )
    [ '[ _ drop-outputs f ] ]
    [ '[ drop _ drop-inputs t ] ] bi recover ; inline

And now we can see that it handles both cases nicely:

IN: scratchpad 10 20 [ + ] throws?

--- Data stack:
f

IN: scratchpad 10 "20" [ + ] throws?

--- Data stack:
t

Now, we can combine that with our sequence operations and make these useful words:

: filter-errors ( ... seq quot -- ... subseq )
    '[ _ throws? ] filter ; inline

: reject-errors ( ... seq quot -- ... subseq )
    '[ _ throws? ] reject ; inline

And then see how they work:

IN: scratchpad { t "5" 123 } [ string>number ] filter-errors .
{ t 123 }

IN: scratchpad { t "5" 123 } [ string>number ] reject-errors .
{ "5" }

These words were added recently to the development version of Factor.

Human Sorting Improved

Sun, 30 Mar 2025 08:00:00 -0700

Factor has had human-friendly sorting since December 2007. It is unclear if it is related, but Ned Batchelder wrote about “human sorting” around the same time with links to a couple of other blog posts discussing similar topics, so perhaps it was in the zeitgeist of the time.

In any event, Ned recently wrote about “human sorting improved” which deals with the topic of how to sort two strings that are human-equivalent but are different and should probably have an ordering. This was the result of fixing a problem that actually happened in the coverage.py project.

For example, comparing "x1y" and "x001y" using the original algorithm would consider these to be equal given the same human keys: { "x" 1 "y" }.

You can see this in Factor 0.100:

IN: scratchpad USE: sorting.human

IN: scratchpad "x1y" "x001y" human<=> .
+eq+

Ned suggests that instead – if two strings are equal using the human sorting method – they should be compared for lexicographic ordering as a tie-breaker.
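
A minimal sketch of that tie-breaker, written as a wrapper around the existing comparison. The name human-tiebreak<=> is hypothetical, and this is not necessarily how the change was made inside sorting.human.

```factor
USING: kernel math.order sorting.human ;

! Compare with human<=> first, and fall back to plain <=> only when
! the human keys are equal.
: human-tiebreak<=> ( str1 str2 -- <=> )
    2dup human<=> dup +eq+ eq? [ drop <=> ] [ 2nip ] if ;

IN: scratchpad "x1y" "x001y" human-tiebreak<=> .
+gt+
```

Strings whose human keys already differ, such as "a10" and "a9", never reach the fallback and keep their human ordering.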

I have made that change in the latest development version of Factor so that the behavior is changed:

IN: scratchpad "x1y" "x001y" human<=> .
+gt+

Sports Betting

Wed, 26 Mar 2025 08:00:00 -0700

I have lately been curious about the Zig Programming Language and how it might be useful to Factor. In the process of doing some research, I bumped into a blog post about Converting decimal odds to probabilities with Zig, which shares some Zig source code for calculating sports betting odds:

As sports betting expands across the world, more and more players don’t understand tha math behind it. So, the purpose of this article it’s to clarify how to convert from decimal odds to the probability behind them. The calculations are implemented in Zig, a system programming language perfect to create CLI tools because it’s fast and simple.

It is particularly interesting given the growth and popularity of both sports betting on platforms like DraftKings and prediction markets like Polymarket and Manifold. You can read about some background information in Sports Betting Odds: How They Work and How To Read Them.

Note: I am not encouraging gambling, and in addition you should do your own research on the various platforms as there can be controversy about potential manipulation and transparency of the markets.

I don’t know much about (and do not do any) sports betting, but I thought it would be fun to learn about and then to translate some of these concepts to Factor.

Decimal

The “decimal odds” are popular in Europe and give the total return for each $1 bet if you win, including the original stake. These “decimal odds” effectively represent inverse probabilities. That is, if the odds are 4.5, then the probability is about 22% (which is 1/4.5).

Note: a variant of this is “fractional odds” popular in Britain which might instead quote those odds as 7/2 or 7-2 and describe the amount won ($7) in addition to the original bet ($2).

We can convert decimal odds to probabilities:

: odds>probs ( m -- n ) recip 100 * ;

: probs>odds ( n -- m ) 100 / recip ;
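
As a quick check of the inverse relationship, here are the 4.5 odds from above run through both words. Using the exact ratio 9/2 instead of the float 4.5 is my own choice here, so the round trip is exact:

```factor
IN: scratchpad 9/2 odds>probs .
22+2/9

IN: scratchpad 22+2/9 probs>odds .
4+1/2
```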

Sometimes the odds are quoted based on different outcomes – for example, in many team games there are three outcomes: win, lose, or tie. These probabilities should sum to 1, and seem to be often quoted with the payout included (typically less than 100%). The example given in the original blogpost uses decimal odds of 1.27, 6.00, and 10.25 for the three outcomes.

We can calculate the payout for a group of odds:

: compute-payout ( odds -- k )
    [ recip ] map-sum recip ;

And given the example from the blogpost, the payout is about 95%:

IN: scratchpad { 1.27 6.00 10.25 } compute-payout .
0.9509054938365546
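
To see where that number comes from, we can look at the intermediate sum that compute-payout inverts. This is just the same arithmetic spelled out, formatted with printf:

```factor
IN: scratchpad USING: formatting math sequences ;

IN: scratchpad { 1.27 6.00 10.25 } [ recip ] map-sum "%.4f\n" printf
1.0516
```

The raw inverse odds add up to about 105% rather than 100%, and the payout is the factor that scales them back down so they sum to 1.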

Which means, you could convert odds to probabilities using this payout number:

: compute-probs ( odds -- probs )
    dup compute-payout '[ odds>probs _ * ] map ;

This shows the different probabilities of the outcomes, using printf to format the sequence:

IN: scratchpad { 1.27 6.0 10.25 } compute-probs "%[%.2f, %]" printf
{ 74.87, 15.85, 9.28 }

Moneyline

The “moneyline odds” are more popular in the United States. The original blogpost that I mention above did not go into details, but I was curious, so we will explore how moneyline payouts work. These are quoted as positive (underdogs) or negative (favorites) values: a positive value is the amount you would win on a $100 bet, and a negative value is the amount you would need to bet to win $100.

Here is the NFL example from the Investopedia article:

Let’s say a betting website (also known as an online sportsbook) priced an NFL game between the Pittsburgh Steelers and the Kansas City Chiefs with the following moneyline odds.

Steelers: +585

Chiefs: -760

We can convert moneyline odds to decimal odds:

: moneyline>odds ( moneyline -- odds )
    100 / dup 1 < [ recip neg ] when 1 + ;

: odds>moneyline ( odds -- moneyline )
    1 - dup 1 < [ recip neg ] when 100 * ;
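
Working the Steelers and Chiefs lines through these two words by hand gives exact ratios, and shows that the conversion round-trips:

```factor
IN: scratchpad 585 moneyline>odds .
6+17/20

IN: scratchpad -760 moneyline>odds .
1+5/38

IN: scratchpad 6+17/20 odds>moneyline .
585

IN: scratchpad 1+5/38 odds>moneyline .
-760
```

The underdog line of +585 becomes decimal odds of 6.85 (the 5.85 won per unit staked plus the stake itself), while the favorite line of -760 becomes roughly 1.13.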

And use that to calculate the implied relative probabilities of that game’s winner:

IN: scratchpad { 585 -760 } [ moneyline>odds ] map
               compute-probs "%[%.2f, %]" printf
{ 14.18, 85.82 }

Parlay

There are also “parlay odds” which are multiple bets linked together as one.

Let’s say you want to bet three heavy favorites on the moneyline because you’re confident each team will win, but not sure if they’ll cover the spread.

So you bet Packers -300 against the Lions, Patriots -200 vs. the Jets, and Eagles -150 at the Washington Commanders.

We can convert these to odds:

IN: scratchpad { -300 -200 -150 } [ moneyline>odds ] map .
{ 1+1/3 1+1/2 1+2/3 }

And write a word to convert it to a parlay outcome:

: compute-parlay ( moneyline -- parlay )
    [ moneyline>odds ] map product odds>moneyline ;

So, this example would have a parlay payout of 233:

IN: scratchpad { -300 -200 -150 } compute-parlay .
233+1/3

And a probability of 30%:

IN: scratchpad 233+1/3 moneyline>odds odds>probs .
30

There are probably aspects of this that I did not cover above, but it’s kinda fun to explore new topics that you don’t know much about and to learn!

Base256Emoji

Sat, 22 Mar 2025 08:00:00 -0700

While looking into the multibase group of self-identifying base encodings, I discovered Base256Emoji, an encoding format that specifies an emoji to use for each byte of an input buffer. This spec is implemented, for example, in both Go and Rust.

Despite replacing each byte with a typically 3-byte or 4-byte UTF-8 sequence – which is unusual for byte encodings (they often seek to reduce the length of an input sequence) – there are some nice use cases.

We’re going to implement this in Factor and then discuss some variants.

Implementation

First, we define a 256 item sequence of emojis, one for each byte:

CONSTANT: base256>emoji "🚀🪐☄🛰🌌🌑🌒🌓🌔🌕🌖🌗🌘🌍🌏🌎🐉☀💻🖥\
💾💿😂❤😍🤣😊🙏💕😭😘👍😅👏😁🔥🥰💔💖💙😢🤔😆🙄💪😉☺👌🤗💜😔😎😇\
🌹🤦🎉💞✌✨🤷😱😌🌸🙌😋💗💚😏💛🙂💓🤩😄😀🖤😃💯🙈👇🎶😒🤭❣😜💋\
👀😪😑💥🙋😞😩😡🤪👊🥳😥🤤👉💃😳✋😚😝😴🌟😬🙃🍀🌷😻😓⭐✅🥺🌈😈\
🤘💦✔😣🏃💐☹🎊💘😠☝😕🌺🎂🌻😐🖕💝🙊😹🗣💫💀👑🎵🤞😛🔴😤🌼😫⚽🤙\
☕🏆🤫👈😮🙆🍻🍃🐶💁😲🌿🧡🎁⚡🌞🎈❌✊👋😰🤨😶🤝🚶💰🍓💢🤟🙁🚨💨\
🤬✈🎀🍺🤓😙💟🌱😖👶🥴▶➡❓💎💸⬇😨🌚🦋😷🕺⚠🙅😟😵👎🤲🤠🤧📌🔵💅🧐\
🐾🍒😗🤑🌊🤯🐷☎💧😯💆👆🎤🙇🍑❄🌴💣🐸💌📍🥀🤢👅💡💩👐📸👻🤐🤮🎼🥵\
🚩🍎🍊👼💍📣🥂"

And then we compute the reverse mapping:

CONSTANT: emoji>base256 $[ base256>emoji H{ } zip-index-as ]

With those two data blocks, we can define words to convert into and out of base256emoji:

: >base256emoji ( bytes -- str )
    [ base256>emoji nth ] "" map-as ;

: base256emoji> ( str -- bytes )
    [ emoji>base256 at ] B{ } map-as ;

You can try it out:

IN: scratchpad "Hello, Factor!" >base256emoji .
"😄✋🍀🍀😓💪😅💓🤤💃😈😓🥺👏"

IN: scratchpad "😄✋🍀🍀😓💪😅💓🤤💃😈😓🥺👏" base256emoji> "" like .
"Hello, Factor!"

Use Cases

The most interesting use case for base256emoji seems to be the ERC-7673: Distinguishable base256emoji Addresses proposal for Ethereum. This proposal seeks to “address spoofing attacks [that] have mislead tens of thousands of ether, and countless other tokens” by using visual emoji-based strings – a similar justification to the Drunken Bishop algorithm.

We can try base256emoji out with checksums to yield a similar benefit, displaying 16 emojis instead of a 32-character hex string:

IN: scratchpad "resource:license.txt" md5 checksum-bytes
               bytes>hex-string .
"ebb5ab617e3a88ed43f7d247c6466d95"

IN: scratchpad "resource:license.txt" md5 checksum-bytes
               >base256emoji .
"💌💨🤨🤤😠✨😹🥀😏🎼🤠🤩⬇💓🌷🤙"

Visual Collisions

Given a desire to use visually dissimilar emojis in identities, it would be useful to think about the chosen emoji set and how that might be interpreted in a visual context. Some criticism of this particular group of emojis might focus on visual similarities, which are magnified at smaller font sizes:

Similar “globe” emojis (🌍 🌏 🌎)
Similar “face” emojis (🙂 😐 😑 🙁)
Similar “kiss” emojis (😙 😚 😗 😘)
Similar “star” emojis (⭐ 🌟)
Similar “grin” emojis (😀 😃 😄 😁 😆 😅)
Similar “heart” emojis (💔 💗 💕 💖 💘 💙 💜 💚 💛 🖤)
Similar “hand” emojis (🤞 👋 ✋ 👊 ✊ 🤝 🤲)

It might be desirable to choose emojis based not on community membership or common usage, but on how visually dissimilar they are, to make it even harder for scammers to deliberately pick lookalike emojis and rely on small text sizes, platform font differences, or user inattention.

A couple of other emoji sets can be found, for example @Equim-chan/base256 (which uses one of my favorite emojis: 👾), npm/base-emoji, or even the KittenMoji crate.

Fun!

This is available on my GitHub.

RDAP

Tue, 18 Mar 2025 08:00:00 -0700

ICANN recently posted an update on Launching RDAP; Sunsetting WHOIS which got some discussion on Hacker News. Andy Newton, one of the creators of RDAP, has published A Guide to the Registration Data Access Protocol (RDAP), which is a pretty useful resource for understanding how it works. More information comes from the main RDAP website which describes it as:

The Registration Data Access Protocol (RDAP) is the successor to WHOIS. Like WHOIS, RDAP provides access to information about Internet resources (domain names, autonomous system numbers, and IP addresses). Unlike WHOIS, RDAP provides:

A machine-readable representation of registration data;

Differentiated access;

Structured request and response semantics;

Internationalisation;

Extensibility.

The WHOIS protocol that it replaces is super simple, being described by RFC 3912 in a few paragraphs. And, in fact, you can test it out on the command-line of most computers:

$ echo -e "factorcode.org\r\n" | nc -i 1 whois.cloudflare.com 43
Domain Name: FACTORCODE.ORG
Registry Domain ID: c49c93dee3304f39b081383262d320c6-LROR
Registrar WHOIS Server: whois.cloudflare.com
Registrar URL: https://www.cloudflare.com
Updated Date: 2025-01-15T22:46:54Z
Creation Date: 2005-12-01T04:54:37Z
Registrar Registration Expiration Date: 2025-12-01T04:54:37Z
Registrar: Cloudflare, Inc.
Registrar IANA ID: 1910
Domain Status: clienttransferprohibited https://icann.org/epp#clienttransferprohibited
...

The RDAP protocol must be equally simple, right? Well, not so fast. Instead of a few paragraphs, and simple queries over sockets, you get many RFCs describing it:

These, along with things like the RDAP Extension registry, and the protocol reliance on HTTP/HTTPS, JSON, and JSONPath considerably increase the complexity of RDAP implementations.

Below we are going to start an implementation using Factor.

Bootstrapping

The first concept we have to implement is RDAP Bootstrapping which uses 5 IANA files to redirect searches to the correct upstream RDAP servers.

Type	Link
Forward DNS	https://data.iana.org/rdap/dns.json
IPv4 Addresses	https://data.iana.org/rdap/ipv4.json
IPv6 Addresses	https://data.iana.org/rdap/ipv6.json
Autonomous System Numbers	https://data.iana.org/rdap/asn.json
Object Tags	https://data.iana.org/rdap/object-tags.json

We can abstract these by making a word to convert a type to a URL:

: bootstrap-url ( type -- url )
    "https://data.iana.org/rdap/" ".json" surround ;

We don’t want to retrieve these files all the time, so let’s cache them for 30 days:

INITIALIZED-SYMBOL: bootstrap-cache [ 30 days ]

: bootstrap-get ( type -- data )
    bootstrap-url cache-directory bootstrap-cache get
    download-outdated-into path>json ;
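
To make the caching idea concrete, here is a rough Python sketch of the same approach. The function names, the cache layout (one file per bootstrap type), and the use of the file modification time to decide staleness are all assumptions for illustration, not how the Factor words are implemented.

```python
import json
import os
import time
import urllib.request

BOOTSTRAP_MAX_AGE = 30 * 24 * 60 * 60  # 30 days, in seconds

def bootstrap_url(kind: str) -> str:
    return "https://data.iana.org/rdap/" + kind + ".json"

def is_outdated(path: str, max_age: float = BOOTSTRAP_MAX_AGE) -> bool:
    # A missing file counts as outdated so it is downloaded the first time.
    if not os.path.exists(path):
        return True
    return time.time() - os.path.getmtime(path) > max_age

def bootstrap_get(kind: str, cache_dir: str) -> dict:
    # Assumed cache layout: "<cache_dir>/<kind>.json".
    path = os.path.join(cache_dir, kind + ".json")
    if is_outdated(path):
        os.makedirs(cache_dir, exist_ok=True)
        urllib.request.urlretrieve(bootstrap_url(kind), path)
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

The network is only touched when the cached copy is missing or older than the maximum age.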

And provide a way to force delete the cached bootstrap files:

CONSTANT: bootstrap-files { "asn" "dns" "ipv4" "ipv6" "object-tags" }

: reset-bootstrap ( -- )
    [ bootstrap-files [ ".json" append ?delete-file ] each ] with-cache-directory ;

Each bootstrap file is described in RFC 9224, but basically we want to extract and manipulate the "services" block, modifying the keys of the assoc for convenient searching:

: parse-services ( data quot: ( key -- key' ) -- services )
    [ "services" of ] dip '[ [ _ map ] dip ] assoc-map ; inline

: search-services ( services quot: ( key -- ? ) -- urls )
    '[ drop _ any? ] assoc-find drop nip ; inline

We can then provide bootstrap data structures that are used for searching. For example, we find the longest subdomain that has an entry in the dns bootstrap list to handle both SLD and TLD:

: dns-bootstrap ( -- services )
    "dns" bootstrap-get "services" of ;

: split-domain ( domain -- domains )
    "." split dup length <iota> [ tail "." join ] with map ;

: domain-endpoints ( domain -- urls )
    split-domain dns-bootstrap [ swap member? ] with search-services ;

You can see that different domain names are directed to different RDAP endpoints:

IN: scratchpad "factorcode.org" domain-endpoints .
{ "https://rdap.publicinterestregistry.org/rdap/" }

IN: scratchpad "google.com" domain-endpoints .
{ "https://rdap.verisign.com/com/v1/" }
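
If it helps to see the search outside of Factor, here is a small Python sketch of the same idea. It assumes each entry in "services" is a pair of lists, the keys followed by the base URLs. The sample data and URLs are made up, not taken from the real dns.json.

```python
from typing import List, Optional

def split_domain(domain: str) -> List[str]:
    # "www.example.com" -> ["www.example.com", "example.com", "com"]
    labels = domain.split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

def domain_endpoints(domain: str, services: list) -> Optional[List[str]]:
    # Return the URLs of the first service with a key matching any suffix.
    candidates = split_domain(domain)
    for keys, urls in services:
        if any(key in candidates for key in keys):
            return urls
    return None

# Made-up sample in the shape of the "services" block.
SAMPLE_SERVICES = [
    [["com", "net"], ["https://rdap.example/com/"]],
    [["org"], ["https://rdap.example/org/"]],
]

print(domain_endpoints("factorcode.org", SAMPLE_SERVICES))
```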

Or, find the correct endpoint for a given IPv4 address from the ipv4 bootstrap list:

: ipv4-bootstrap ( -- services )
    "ipv4" bootstrap-get [ >ipv4-network ] parse-services ;

: ipv4-endpoints ( ipv4 -- urls )
    ipv4-aton ipv4-bootstrap [ ipv4-contains? ] with search-services ;
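
The IPv4 case is the same search with a different membership test. As a hedged sketch in Python, using the standard ipaddress module and made-up sample data (the prefixes and URLs below are illustrative, not copied from ipv4.json):

```python
import ipaddress
from typing import List, Optional

def ipv4_endpoints(address: str, services: list) -> Optional[List[str]]:
    # Return the URLs of the first service with a prefix containing the address.
    ip = ipaddress.IPv4Address(address)
    for prefixes, urls in services:
        if any(ip in ipaddress.IPv4Network(prefix) for prefix in prefixes):
            return urls
    return None

# Made-up sample in the shape of the "services" block.
SAMPLE_IPV4_SERVICES = [
    [["1.0.0.0/8", "14.0.0.0/8"], ["https://rdap.example/one/"]],
    [["3.0.0.0/8"], ["https://rdap.example/two/"]],
]

print(ipv4_endpoints("1.1.1.1", SAMPLE_IPV4_SERVICES))
```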

Lookup

The RDAP data is typically available from HTTP or HTTPS web servers as JSON, but it uses a custom MIME type, application/rdap+json. We can write a simple word to make the request and convert the response:

: accept-rdap ( request -- request )
    "application/rdap+json" "Accept" set-header ;

: rdap-get ( url -- response rdap )
    <get-request> accept-rdap http-request
    dup string? [ utf8 decode ] unless json> ;

And now we can build a word to look up a domain:

: lookup-domain ( domain -- results )
    [ domain-endpoints random ]
    [ "domain/%s" sprintf derive-url rdap-get nip ] bi ;

Or to look up an IPv4 address:

: lookup-ipv4 ( ipv4 -- results )
    [ ipv4-endpoints random ]
    [ "ip/%s" sprintf derive-url rdap-get nip ] bi ;
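
The request side is small enough to sketch in Python with only the standard library. The function names here are made up, and the URL join assumes the endpoint ends with a "/", as the endpoints shown earlier do.

```python
import json
import urllib.request
from urllib.parse import urljoin

def rdap_request(url: str) -> urllib.request.Request:
    # Ask for the RDAP media type explicitly.
    return urllib.request.Request(url, headers={"Accept": "application/rdap+json"})

def rdap_get(url: str) -> dict:
    with urllib.request.urlopen(rdap_request(url), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

def domain_url(endpoint: str, domain: str) -> str:
    # Assumes the endpoint ends with "/", so a relative join keeps its path.
    return urljoin(endpoint, "domain/" + domain)

def ip_url(endpoint: str, address: str) -> str:
    return urljoin(endpoint, "ip/" + address)
```

A lookup is then rdap_get(domain_url(endpoint, "factorcode.org")) with an endpoint chosen from the bootstrap data.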

And we can try it out, getting an extensive response:

IN: scratchpad "factorcode.org" lookup-domain

--- Data stack:
LH{ { "rdapConformance" ~array~ } { "notices" ~array~ } { ...

Output

It would be nice to print the output in a more human-readable format. For now, we will just print these as a nested tree of keys and values:

GENERIC: print-rdap-nested ( padding key value -- )

M: linked-assoc print-rdap-nested
    [ over write write ":" print "  " append ] dip
    [ swapd print-rdap-nested ] with assoc-each ;

M: array print-rdap-nested
    [ print-rdap-nested ] 2with each ;

M: object print-rdap-nested
    present [ 2drop ] [ [ ": " [ write ] tri@ ] dip print ] if-empty ;

: print-rdap ( results -- )
    [ "" -rot print-rdap-nested ] assoc-each ;

This could probably be improved a fair bit – for example, the keys could be made more readable, and it doesn’t handle vCard entries very well.
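
For readers who think more easily in Python, the same three cases (mapping, list, everything else) can be sketched like this. It collects lines instead of printing directly, which is a choice made here to keep it easy to test, and the sample data is made up.

```python
def rdap_lines(padding, key, value, out):
    if isinstance(value, dict):
        # Nested object: print the key, then indent its members.
        out.append(f"{padding}{key}:")
        for k, v in value.items():
            rdap_lines(padding + "  ", k, v, out)
    elif isinstance(value, list):
        # Arrays repeat the key once per element.
        for item in value:
            rdap_lines(padding, key, item, out)
    else:
        text = str(value)
        if text:  # skip empty values
            out.append(f"{padding}{key}: {text}")

def format_rdap(results: dict) -> str:
    lines = []
    for key, value in results.items():
        rdap_lines("", key, value, lines)
    return "\n".join(lines)

sample = {
    "rdapConformance": ["rdap_level_0"],
    "ldhName": "factorcode.org",
    "nameservers": [{"ldhName": "carl.ns.cloudflare.com", "status": ["associated"]}],
    "port43": "",
}
print(format_rdap(sample))
```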

Try it out!

You can try this out by looking up a domain name:

IN: scratchpad "factorcode.org" lookup-domain print-rdap
rdapConformance: rdap_level_0
rdapConformance: icann_rdap_response_profile_0
rdapConformance: icann_rdap_technical_implementation_guide_0
ldhName: factorcode.org
unicodeName: factorcode.org
nameservers:
  ldhName: carl.ns.cloudflare.com
  unicodeName: carl.ns.cloudflare.com
  objectClassName: nameserver
  handle: c34bedeccd8e4514b917e9e82a052077-LROR
  status: associated
nameservers:
  ldhName: kay.ns.cloudflare.com
  unicodeName: kay.ns.cloudflare.com
  objectClassName: nameserver
  handle: 7fc12bf413944de088f27f837349a8da-LROR
  status: associated
...

Or, look up an IP address:

IN: scratchpad "1.1.1.1" lookup-ipv4 print-rdap
rdapConformance: history_version_0
rdapConformance: nro_rdap_profile_0
rdapConformance: cidr0
rdapConformance: rdap_level_0
events:
  eventAction: registration
  eventDate: 2011-08-10T23:12:35Z
events:
  eventAction: last changed
  eventDate: 2023-04-26T22:57:58Z
name: APNIC-LABS
status: active
type: ASSIGNED PORTABLE
endAddress: 1.1.1.255
ipVersion: v4
startAddress: 1.1.1.0
objectClassName: ip network
handle: 1.1.1.0 - 1.1.1.255
...

Pretty cool!

This was recently committed to the development version of Factor.

Two Sum

Sat, 22 Feb 2025 08:00:00 -0700

A few days ago, Ryan Petermen wrote a tweet about switching to Python from C++ in a solution for the Two Sum problem from LeetCode:

Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

You can return the answer in any order.

Spoilers below!

They offered this iterative C++ solution:

std::vector<int> twoSum(std::vector<int>& nums, int target) {
    std::unordered_map<int, int> hashMap;
    std::vector<int> result;

    for (int i = 0; i < nums.size(); ++i) {
        int complement = target - nums[i];
        if (hashMap.find(complement) != hashMap.end()) {
            result.push_back(hashMap[complement]);
            result.push_back(i);
            return result;
        }
        hashMap[nums[i]] = i;
    }
    return result;
}

Followed by this improved Python solution:

def two_sum(nums, target):
    hashmap = {}

    for i, num in enumerate(nums):
        complement = target - num
        if complement in hashmap:
            return [hashmap[complement], i]
        hashmap[num] = i

Of course, I was curious what a Factor solution would look like.

Direct translation

We can start by making a mostly direct translation using local variables:

:: two-sum ( nums target -- i j )
    H{ } clone :> hashmap
    nums <enumerated> [
        first2 :> ( i num )
        target num - :> complement
        complement hashmap at [
            i num hashmap set-at f
        ] unless*
    ] map-find ?first ;

And a few test cases to show that it works:

{ 0 1 } [ { 2 7 11 15 } 9 two-sum ] unit-test
{ 1 2 } [ { 3 2 4 } 6 two-sum ] unit-test
{ 0 1 } [ { 3 3 } 6 two-sum ] unit-test

Using map-find-index

Sometimes a higher-level word exists that can simplify logic, for example map-find-index:

:: two-sum ( nums target -- i j )
    H{ } clone :> hashmap
    nums [| num i |
        target num - :> complement
        complement hashmap at [
            i num hashmap set-at f
        ] unless*
    ] map-find-index drop ;

Implicit arguments

In the spirit of concatenative thinking, we could reduce the amount of named variables we have slightly by not naming the complement internal variable:

:: two-sum ( nums target -- i j )
    H{ } clone :> hashmap
    nums [| num i |
        target num - hashmap at [
            i num hashmap set-at f
        ] unless*
    ] map-find-index drop ;

Fried quotations

And that works, but perhaps we could try some alternative syntax features like fried quotations:

: two-sum ( nums target -- i j )
    H{ } clone dup '[
        swap _ over - _ at [ 2nip ] [ _ set-at f ] if*
    ] map-find-index drop ;

Is that better? Perhaps.

How else might you solve this problem?

Can you do it in one line?

Random Derangement

Mon, 16 Dec 2024 08:00:00 -0700

I had a lot of fun writing code to compute derangements recently. I thought I was done with that topic until I bumped into a question on StackOverflow asking how to generate a random derangement of a list. Being nerd sniped is a real thing, and so I started looking at solutions.

There’s a paper called “An analysis of a simple algorithm for random derangements” that has an, ahem, simple algorithm. The basic idea is to generate a random permutation of indices, breaking early if the random permutation is obviously not a derangement.

One way to take a random permutation would be to use our permutations virtual sequence:

IN: scratchpad "ABCDEF" <permutations> random .
"FCEBDA" ! is a derangement

IN: scratchpad "ABCDEF" <permutations> random .
"DFBCEA" ! is NOT a derangement

And so you could loop until a derangement of indices is found:

: random-derangement-indices ( n -- seq )
    f swap <iota> <permutations>
    '[ drop _ random dup derangement? not ] loop ;

But, since only about 37% of permutations are derangements (the fraction approaches 1/e), perhaps it would be faster and better to implement the algorithm from that paper – making our own random permutation of indices and breaking early if obviously not a derangement:

:: random-derangement-indices ( n -- indices )
    n <iota> >array :> seq
    f [
        dup :> v
        n 1 (a..b] [| j |
            j 1 + random :> p
            p v nth j = [ t ] [ j p v exchange f ] if
        ] any? v first zero? or
    ] [ drop seq clone ] do while ;
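
The same early-rejection idea reads like this in Python. It is a sketch of the approach as I understand it, not a transcription of the paper: shuffle from the last index down, and start over as soon as an element would land in its original position. The handling of n = 0 and n = 1 is my own addition.

```python
import random

def random_derangement_indices(n: int) -> list:
    if n == 0:
        return []
    if n == 1:
        raise ValueError("a single element has no derangement")
    while True:
        v = list(range(n))
        for j in range(n - 1, -1, -1):
            p = random.randint(0, j)  # inclusive on both ends
            if v[p] == j:
                break  # j would stay at position j, so reject and retry
            v[j], v[p] = v[p], v[j]
        else:
            return v

def random_derangement(seq):
    return [seq[i] for i in random_derangement_indices(len(seq))]

print("".join(random_derangement("ABCDEF")))
```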

We can use that to build a random-derangement word:

: random-derangement ( seq -- seq' )
    [ length random-derangement-indices ] [ nths ] bi ;

And then, for example, get a random derangement of the alphabet – of which there are one hundred and forty-eight septillion derangements, give or take – in under a millisecond:

IN: scratchpad "ABCDEFGHIJKLMNOPQRSTUVWXYZ" random-derangement .
"CZFABMSUXRQDEHGYJLTPVOIKWN"

We could check that we generate all derangements with equal probability using a simple test case:

IN: scratchpad 1,000,000 [
                   "ABCD" random-derangement
               ] replicate histogram sort-keys .
{
    { "BADC" 111639 }
    { "BCDA" 110734 }
    { "BDAC" 110682 }
    { "CADB" 111123 }
    { "CDAB" 111447 }
    { "CDBA" 111147 }
    { "DABC" 111215 }
    { "DCAB" 111114 }
    { "DCBA" 110899 }
}

Looks good to me!

Derangements

Sun, 08 Dec 2024 08:00:00 -0700

Derangements, also sometimes known as deranged permutations, are described as:

In combinatorial mathematics, a derangement is a permutation of the elements of a set in which no element appears in its original position. In other words, a derangement is a permutation that has no fixed points.

There is a fun online derangements generator tool that you can use to play with computing the derangements of a sequence as well as calculating the number of derangements for a given sequence size.

As an example, we can use the math.combinatorics vocabulary to generate all the permutations of the sequence { 0 1 2 }:

IN: scratchpad { 0 1 2 } all-permutations .
{
    { 0 1 2 }
    { 0 2 1 }
    { 1 0 2 }
    { 1 2 0 }
    { 2 0 1 }
    { 2 1 0 }
}

Since a derangement is a permutation that requires each element to be in a different slot, we could write a word to check the permuted indices to see if that is true:

: derangement? ( indices -- ? )
    dup length <iota> [ = ] 2any? not ;

These would be the two derangements of the indices { 0 1 2 }:

IN: scratchpad { 0 1 2 } all-permutations [ derangement? ] filter .
{
    { 1 2 0 }
    { 2 0 1 }
}

The number of derangements is the subfactorial of the length of the sequence:

: subfactorial ( n -- m )
    [ 1 ] [ factorial 1 + e /i ] if-zero ;
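
As a sanity check on that formula, here is a small Python sketch comparing it with the standard integer recurrence !n = (n - 1) * (!(n - 1) + !(n - 2)). The function names are made up, and the closed form depends on floating-point precision, so I would only trust it for small n.

```python
import math

def subfactorial(n: int) -> int:
    # Recurrence with !0 = 1 and !1 = 0, using exact integer arithmetic.
    a, b = 1, 0
    if n == 0:
        return a
    for k in range(2, n + 1):
        a, b = b, (k - 1) * (a + b)
    return b

def subfactorial_closed_form(n: int) -> int:
    # floor((n! + 1) / e) for n >= 1, computed with floats.
    return 1 if n == 0 else int((math.factorial(n) + 1) / math.e)

print([subfactorial(n) for n in range(6)])
```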

We can build a <derangement-iota> that is a sequence as long as that number:

: <derangement-iota> ( seq -- <iota> )
    length subfactorial <iota> ; inline

And we can build a next-derangement word that calculates the next permutation that is a derangement:

: next-derangement ( seq -- seq )
    [ dup derangement? ] [ next-permutation ] do until ;

We can then build upon some of the code for iterating permutations, designing an internal derangements-quot word that is similar in form to the existing permutations-quot word:

: derangements-quot ( seq quot -- seq quot' )
    [ [ <derangement-iota> ] [ length <iota> >array ] [ ] tri ] dip
    '[ drop _ next-derangement _ nths-unsafe @ ] ; inline

And then use it to build a series of words that can provide iteration across derangements:

: each-derangement ( ... seq quot: ( ... elt -- ... ) -- ... )
    derangements-quot each ; inline

: map-derangements ( ... seq quot: ( ... elt -- ... newelt ) -- ... newseq )
    derangements-quot map ; inline

: filter-derangements ( ... seq quot: ( ... elt -- ... ? ) -- ... newseq )
    selector [ each-derangement ] dip ; inline

: all-derangements ( seq -- seq' )
    [ ] map-derangements ;

: all-derangements? ( ... seq quot: ( ... elt -- ... ? ) -- ... ? )
    derangements-quot all? ; inline

: find-derangement ( ... seq quot: ( ... elt -- ... ? ) -- ... elt/f )
    '[ _ keep and ] derangements-quot map-find drop ; inline

: reduce-derangements ( ... seq identity quot: ( ... prev elt -- ... next ) -- ... result )
    swapd each-derangement ; inline

And, now we can use this to find the nine derangements for "ABCD":

IN: scratchpad "ABCD" all-derangements .
{
    "BADC"
    "BCDA"
    "BDAC"
    "CADB"
    "CDAB"
    "CDBA"
    "DABC"
    "DCAB"
    "DCBA"
}
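
A brute-force cross-check is easy to write in Python with itertools. This sketch filters every permutation of the indices, so it is only practical for short sequences, and it is not how the Factor words above iterate.

```python
from itertools import permutations

def is_derangement(indices) -> bool:
    # No index may map to itself.
    return all(i != x for i, x in enumerate(indices))

def all_derangements(seq):
    # Permute positions rather than values so repeated elements are handled.
    for perm in permutations(range(len(seq))):
        if is_derangement(perm):
            yield [seq[i] for i in perm]

print(["".join(d) for d in all_derangements("ABCD")])
```

Because itertools.permutations yields index tuples in lexicographic order, the results come out in the same order as the list above.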

This is available on my GitHub.

Zen of Factor

Thu, 05 Dec 2024 08:00:00 -0700

Years ago, I remember reading a blog post called “Why is Groovy so big?” by Slava Pestov, the original author of Factor. In it, he talks about lines of code and sets the stage for how concatenative thinking can lead to properties like conciseness and readability, ending with this:

I tend to think the majority of code people write is overly complicated, full of redundancy, and designed for such flexibility that in practice is not needed at all. I hope one day this trend reverses.

I’ve been thinking a lot recently about 20 years of Factor and I thought it would be fun to write a Zen of Factor. Perhaps inspired by the Zen of Programming book written in 1987 by Geoffrey James, there are a couple of examples I wanted to point to first.

Python

The Python programming language was one of the first languages that I am aware of to get a Zen, contributed at least as far back as February 2002 in a kind of easter egg fashion encrypted by ROT13:

$ python -c "import this" 
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Zig

Quite a bit more recently, the Zig programming language introduced a Zen through a series of commits starting in August 2017. You can see the current version by running zig zen:

$ zig zen

 * Communicate intent precisely.
 * Edge cases matter.
 * Favor reading code over writing code.
 * Only one obvious way to do things.
 * Runtime crashes are better than bugs.
 * Compile errors are better than runtime crashes.
 * Incremental improvements.
 * Avoid local maximums.
 * Reduce the amount one must remember.
 * Focus on code rather than style.
 * Resource allocation may fail; resource deallocation must succeed.
 * Memory is a resource.
 * Together we serve the users.

Factor

In that spirit, I thought it would be fun to put together some ideas for what a Zen of Factor might look like, incorporating aspects of the language that I enjoy, some thematic elements that might make you more successful learning and developing in it, as well as pointing out the strong community that we have and hope to grow.

Here is the current draft:

The Zen of Factor

The REPL is your playground.
Working code beats perfect theory.
Words are better than paragraphs.
Stack effects tell the story.
Any syntax you need, you can create.
Simple primitives yield powerful combinations.
First make it work, then make it beautiful.
Make it beautiful, then make it fast.
Quick hacks grow into robust solutions.
When in doubt, factor it out.
Every word should do one thing well.
Let the stack guide your logic.
Write less, compose more.
If it works, ship it.
If it doesn't work, fix it.
If you don't like it, change it.
Today's beginner is tomorrow's core developer.
Questions encouraged, PRs welcome.

I really appreciate hearing about programs and problems that developers work on, getting feedback that allows us to iteratively improve, and knowing that every commit leads us towards a better future.

Thank you!

Watching Code

Tue, 03 Dec 2024 08:00:00 -0700

Factor has a watching words feature that allows you to see the inputs and outputs of a word when it is called. I have wanted a way to define a watched code block that we could use to see the stack before and after some inner part of a word.

It turns out that it’s pretty simple to make this syntax using our existing watch implementation from the tools.annotations vocabulary:

DEFER: WATCH>

SYNTAX: <WATCH
    \ WATCH> parse-until >quotation dup (watch) append! ;

Using this, we can now watch the inner part of a word, for example this word that increments the input by 1:

: foo ( x -- y ) 1 <WATCH + WATCH> ;

And see it used:

IN: scratchpad 10 foo
--- Entering [ + ]
x 10
x 1
--- Leaving [ + ]
x 11

--- Data stack:
11

One thing that I’d like to improve someday is giving this watch syntax a prettyprint implementation that allows it to be rendered as it is typed.

I have added this to the latest developer version if you’d like to update and give it a try!

Listener Font Sizes

Wed, 13 Nov 2024 08:00:00 -0700

Factor contains a REPL – called the Listener – available on the command-line and graphically as part of the UI developer tools. For many users, this is their main interface to programming in the Factor programming language.

We sometimes get requests to better support styling the user interface. This has led to improvements such as support for light and dark themes, adjustable font sizes, and other customizations. There have been a few existing ways to style the UI listener including support for keyboard commands to increase or decrease font sizes. But, until recently this only affected new output or new Listener sessions.

Today, I improved this to make adjusting the font size much more dynamic, using traditional keyboard shortcuts of Ctrl – or Cmd on macOS – combined with + or -.

Give it a try, and please let us know other ways we can make improvements!

Finding Subsequences

Tue, 12 Nov 2024 08:00:00 -0700

Recently, I’ve been inspired by conversations taking place on our Factor Discord server. This sometimes reflects areas of interest from new contributors, curiosity exploring similarities and differences between Factor and other programming languages, or even early learning moments when exploring concatenative languages in general.

Today, someone asked about how to think about “accumulation of values in an array… to find all occurences (their position) of a subseq in a seq”. The solution to this might have this word name and stack effect:

: subseq-indices ( seq subseq -- indices ) ... ;

Before answering, I wanted to make sure they wanted to find overlapping indices vs. non-overlapping indices, and they clarified that they expect it to find this result – allowing overlapping subsequences:

IN: scratchpad "abcabcabc" "abcabc" subseq-indices .
{ 0 3 }

So, now that we have a reasonable specification, how do we think about solving this problem when we are at the same time learning to solve problems in stack languages and trying to see what features of Factor’s standard library would help?

There are a lot of ways to think about this, and I often recommend one of three approaches:

starting inside-out (working on the inner part of the loop)
starting outside-in (modeling the outer loop and then figuring out what’s inside it)
using local variables (helpful when coming from an applicative language background)

So let’s look at each approach in turn:

Inside Out

The inner logic is going to require something like “take an index to start from and find the next matching subseq index”, which looks an awful lot like subseq-index-from – except you also want to increment the found index afterwards to make sure you are progressing through the sequence.

: next-subseq-index ( index seq subseq -- next-index/f found-index/f )
    subseq-index-from [ [ 1 + ] keep ] [ f f ] if* ;

Then you could use it like so in a loop with an accumulator:

: subseq-indices ( seq subseq -- indices )
    [ V{ } clone 0 ] 2dip '[
        _ _ next-subseq-index dup [ [ pick push ] keep ] when
    ] loop drop ;

But that feels like we had to work hard to do that, directly using an accumulator, conditionals, and some stack shuffling. Luckily we have some higher level words that might help, for example the make vocabulary which has an implicit accumulator that we can use , or % to push into:

: subseq-indices ( seq subseq -- indices )
    [ 0 ] 2dip '[
        [ _ _ next-subseq-index dup [ , ] when* ] loop
    ] { } make nip ;

Or even using a while* loop, which is less code:

: subseq-indices ( seq subseq -- indices )
    [ 0 ] 2dip '[
        [ _ _ next-subseq-index ] [ , ] while*
    ] { } make nip ;

But that feels like a lot too. Simpler might be produce:

: subseq-indices ( seq subseq -- indices )
    [ 0 ] 2dip '[ _ _ next-subseq-index dup ] [ ] produce 2nip ;

Or using follow, adjusting our start index and increment:

: subseq-indices ( seq subseq -- indices )
    [ -1 ] 2dip '[ 1 + _ _ subseq-index-from ] follow rest ;

Outside In

The outer logic approach would be something like “we need to loop from the start of the sequence, finding the next match, and accumulating it, until we hit some exit condition and then return a result” which you could write in a kind of non-functional stack pseudocode:

: subseq-indices ( seq subseq -- indices )
    0 [ find-next-match ] [ accumulate-match ] while ;

Then you have to kind of figure out what goes into those blocks:

: find-next-match ( seq subseq n -- found-index/f )
    -rot subseq-index-from ;

And also something like:

: accumulate-match ( accum found-index -- accum next-index )
    [ suffix! ] keep 1 + ;

Taking those, and maybe thinking about what items should be on the stack and in what order to reduce stack shuffling, becomes something like:

: subseq-indices ( seq subseq -- indices )
    [ V{ } clone 0 ] 2dip
    '[ _ _ subseq-index-from ] [ [ suffix! ] keep 1 + ] while* ;

It is true that [ suffix! ] keep 1 + is also [ suffix! ] [ 1 + ] bi, with varying aesthetics and ease of understanding. But when learning a new language, especially a stack language with combinators, it is sometimes easier to start with stack shuffling and then learn about these forms later to see if they can improve your code.

Locals

Instead of those two stack approaches, we could instead use our local variables and write one big word in a manner similar to applicative languages, stepping back and focusing on the result we want:

:: subseq-indices ( seq subseq -- indices )
    V{ } clone :> accum
    0 :> i!

    [ i seq subseq subseq-index-from ]
    [ dup accum push 1 + i! ] while*

    accum ;
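
If you are coming from an applicative language, the locals version maps almost line for line onto something like this Python sketch, which uses str.find and restarts one position past each match so that overlapping matches are found. It is written for strings only.

```python
def subseq_indices(seq: str, subseq: str) -> list:
    indices = []
    i = seq.find(subseq)
    while i != -1:
        indices.append(i)
        # Restart one past the match start to allow overlaps.
        i = seq.find(subseq, i + 1)
    return indices

print(subseq_indices("abcabcabc", "abcabc"))
```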

When working on this stuff, it’s nice to remember you can put a B to set a breakpoint in places to examine the stack at some inner point, or perhaps write a comment showing the incoming stack and optionally the outgoing stack that a piece of code is expected to have so that you understand what is happening in the next few lines:

! the next block of code finds the next index
! ( index seq subseq -- found-index )

! and pushes it into an accumulator
! ( accum found-index -- accum )

This was added to the developer branch in the sequences.extras vocabulary.

We love to hear questions and it’s even better when we can provide answers or guidance for learning and solving problems. Feel free to join our conversations and explore learning Factor!

Removing Subdomains

Tue, 05 Nov 2024 08:00:00 -0700

There was an interesting question on the Unix & Linux StackExchange asking how to remove subdomains or existing domains. I thought it would be fun to show a few different approaches to solving this using Factor.

Our first step should be to understand what is a subdomain:

A subdomain is a prefix added to a domain name to separate a section of your website. Site owners primarily use subdomains to manage extensive sections that require their own content hierarchy, such as online stores, blogs, job boards or support platforms.

Common Subdomains

If we’re curious about what common subdomains are, we can turn to the SecLists project – described as a “security tester’s companion” – which maintains lists of the top 5,000, 20,000, and 110,000 common subdomains that were generated in 2015, as well as a combined subdomains list that has some additional ones added.

You can download the top 5,000 common subdomains using memoization to cache the result:

MEMO: top-5000-subdomains ( -- subdomains )
    "https://raw.githubusercontent.com/danielmiessler/SecLists/refs/heads/master/Discovery/DNS/subdomains-top1million-5000.txt"
    cache-directory download-once-into utf8 file-lines ;

And then see what the “top 10” are:

IN: scratchpad top-5000-subdomains 10 head .
{
    "www"
    "mail"
    "ftp"
    "localhost"
    "webmail"
    "smtp"
    "webdisk"
    "pop"
    "cpanel"
    "whm"
}

You could remove “common subdomains” – adding a dot to make sure we only strip a full subdomain – by repeatedly trying to clean the hostname until it stops changing:

: remove-common-subdomains ( host -- host' )
    top-5000-subdomains [ "." append ] map '[ _ [ ?head ] any? ] loop ;

And try it out:

IN: scratchpad "www.mail.ftp.localhost.factorcode.org"
               remove-common-subdomains .
"factorcode.org"

That works pretty well, but it’s reliant on a scraped list of subdomains that might not be exhaustive, and could become stale over time as the tools and techniques that developers use change.
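
The stripping loop itself is language-agnostic. As a sketch in Python, taking the list of common subdomains as an argument (the short list in the example is made up, standing in for the downloaded one):

```python
def remove_common_subdomains(host: str, subdomains) -> str:
    # Add the dot so only whole labels are stripped ("www." but not "wwwfoo").
    prefixes = [s + "." for s in subdomains]
    changed = True
    while changed:
        changed = False
        for prefix in prefixes:
            if host.startswith(prefix):
                host = host[len(prefix):]
                changed = True
                break
    return host

common = ["www", "mail", "ftp", "localhost"]
print(remove_common_subdomains("www.mail.ftp.localhost.factorcode.org", common))
```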

Observed Subdomains

Similarly, another technique we could use would be to use our own observations about domains, and if we observe a domain being used and then subsequently see a subdomain of it, we can ignore the subdomain.

First, we write a word to remove any item that is prefixed by another, sorting to make sure we see the prefix before the item prefixed by it:

: remove-prefixed ( seq -- seq' )
    sort V{ } clone [
        dup '[
            [ _ [ head? ] with none? ] _ push-when
        ] each
    ] keep ;

Second, we can remove the subdomains by using a kind of Schwartzian transform:

reverse the domain names
remove the ones that are prefixed by another
un-reverse the domain names

: remove-observed-subdomains ( hosts -- hosts' )
    [ "." prepend reverse ] map remove-prefixed [ reverse rest ] map ;

And then see it work:

IN: scratchpad { "a.b.c" "b.c" "c.d.e" "e.f" }
               remove-observed-subdomains .
V{ "b.c" "c.d.e" "e.f" }
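
The reverse, filter, un-reverse trick can be sketched in Python as well. The leading dot added before reversing becomes a trailing dot, which keeps "b.c" from being treated as a parent of something like "ab.c".

```python
def remove_observed_subdomains(hosts):
    # Reverse each dotted name so a parent domain becomes a string prefix.
    keys = sorted(("." + h)[::-1] for h in hosts)
    kept = []
    for key in keys:
        # Sorting guarantees a prefix is seen before anything it prefixes.
        if not any(key.startswith(k) for k in kept):
            kept.append(key)
    # Un-reverse and drop the dot that was added.
    return [k[::-1][1:] for k in kept]

print(remove_observed_subdomains(["a.b.c", "b.c", "c.d.e", "e.f"]))
```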

Resolving Domains

And, finally, another technique might be to use the Domain Name System to find the rootiest domain name.

First, we use our dns vocabulary to check that a host resolves to an IP address:

: valid-domain? ( host -- ? )
    {
        [ dns-A-query message>a-names empty? not ]
        [ dns-AAAA-query message>aaaa-names empty? not ]
    } 1|| ;

And try it out:

IN: scratchpad "re.factorcode.org" valid-domain? .
t

IN: scratchpad "not-valid.factorcode.org" valid-domain? .
f

Second, we write a word to split a domain into chunks to be tested:

: split-domain ( host -- hosts )
    "." split dup length 1 [-] <iota> [ tail "." join ] with map ;

And try it out:

IN: scratchpad "a.b.c.com" split-domain .
{ "a.b.c.com" "b.c.com" "c.com" }

Third, we find the rootiest domain that is valid:

: remove-subdomains ( host -- host' )
    split-domain [ valid-domain? ] find-last nip ;

And try it out:

IN: scratchpad "a.b.c.d.factorcode.org" remove-subdomains .
"factorcode.org"

IN: scratchpad "sorting.cr.yp.to" remove-subdomains .
"cr.yp.to"

This is available on my GitHub.
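
For a rough idea of how the resolving approach looks elsewhere, here is a Python sketch. Note the assumption: it uses socket.getaddrinfo, which goes through the system resolver (and so also sees things like the hosts file), rather than issuing A and AAAA queries directly. The validity check is passed in as a parameter so the logic can be tried without a network.

```python
import socket

def split_domain(host: str) -> list:
    # "a.b.c.com" -> ["a.b.c.com", "b.c.com", "c.com"] (the bare TLD is skipped)
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(max(len(labels) - 1, 0))]

def resolves(host: str) -> bool:
    # Assumption: "valid" means the system resolver returns any address.
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False

def remove_subdomains(host: str, is_valid=resolves):
    # Try the shortest candidate first and return the first valid one.
    for candidate in reversed(split_domain(host)):
        if is_valid(candidate):
            return candidate
    return None
```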

It’s fun to explore these kinds of problems!

A Language A Day

Mon, 04 Nov 2024 08:00:00 -0700

Andrew Shitov recently published a book called “A Language A Day”, which is a collection of brief overviews of 21 programming languages – including Factor!

This book provides a concise overview of 21 different programming languages. Each language is introduced using the same approach: solving several programming problems to showcase its features and capabilities. Languages covered in the book: C++, Clojure, Crystal, D, Dart, Elixir, Factor, Go, Hack, Hy, Io, Julia, Kotlin, Lua, Mercury, Nim, OCaml, Raku, Rust, Scala, and TypeScript.

Each chapter covers the essentials of a different programming language. To make the content more consistent and comparable, I use the same structure for each language, focusing on the following mini projects:

Creating a ‘Hello, World!’ program.

Implementing a Factorial function using recursion or a functional-style approach.

Creating a polymorphic array of objects (a ‘zoo’ of cats and dogs) and calling methods on them.

Implementing the Sleep Sort algorithm—while impractical for real-word use, it’s a playful demonstration of language’s concurrency capabilities.

Each language description follows—where applicable—this pattern:

Installing a command-line compiler and running a program.

Creating and using variables.

Defining and using functions.

Exploring object-oriented features.

Handling exception.

Introducing basic concurrency and parallelism.

You can find all the code examples in this book on GitHub: https://github.com/ash/a-language-a-day.

You can buy it on Amazon or LeanPub as an electronic or Kindle edition, or as a paper hardcover or paperback version. More information with the links to the shops.

Check it out!

Constants

Tue, 29 Oct 2024 08:00:00 -0700

Factor has programmable syntax, a feature that allows for concise source code, reducing repetition and allowing the programmer to express forms and intent with minimal tokens. As an example of this, today I want to discuss constants.

You can define a word with a constant value, using syntax like this:

CONSTANT: three 3

Someone on our Factor Discord server asked if it was possible to define multiple constants in one syntax expression, to avoid the line noise of defining them one-by-one.

So, instead of these four definitions:

CONSTANT: foo 1
CONSTANT: bar $[ 2 sqrt ]
CONSTANT: baz $ bar
CONSTANT: qux \ foo

We could instead make this syntax:

SYNTAX: CONSTANTS:
    ";" [
        create-word-in
        [ reset-generic ]
        [ scan-object define-constant ] bi
    ] each-token ;

Breaking that down into steps:

SYNTAX: indicates we’re defining new syntax
CONSTANTS: is the name of our new syntax word
";" defines the terminator that will end our constant definitions
each-token will process each token until it hits the terminator

For each constant definition, it performs these steps:

create-word-in creates a new word in the current vocabulary
reset-generic clears any generic word properties
scan-object reads and parses the next value
define-constant makes it a constant with the parsed value

And now this expression works, reducing the visual noise in our source code:

CONSTANTS:
   foo 1
   bar $[ 2 sqrt ]
   baz $ bar
   qux \ foo
;

As an aside, the different syntaxes used above are:

1 is just a token parsed as a number literal
$[ ... ] evaluates the code inside at parse time
$ gets the value of another constant
\ gets the word object itself rather than its value
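
A quick check in the listener, assuming the CONSTANTS: syntax and the four definitions above have been loaded, should show the same values that the one-by-one CONSTANT: definitions would give:

```factor
USING: prettyprint ;

foo .   ! 1
bar .   ! 1.4142135623730951
baz .   ! 1.4142135623730951, the value of bar looked up at parse time
```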

Factor’s syntax parsing words allow a great deal of flexibility in making custom DSL-style syntax forms work nicely to reduce repetition, and generate code with less effort.

I’m not sure if this is worth adding to the standard library or not, but it’s neat!

Base16 Themes

Thu, 17 Oct 2024 08:00:00 -0700

Over a decade ago, Chris Kempson created the Base16 theme framework for creating color palettes of 16 colors that can be used to provide theming of user interfaces. These have been commonly supported by many text editors, with some developers gravitating toward setting their favorite theme in every user interface that supports it.

A few years ago, this framework and the many themes that became popular in it were forked into the Tinted Theming project described in a post called Base16 Project Lives On. You can view their gallery of Base16 themes which gives a good sense of the variety and utility of these color schemes having commonly recognizable names such as dracula, mocha, solarized, and more.

I was reminded of this recently in a discussion around a recent contribution to change the scrollbar and button implementations to not use images, but to draw the scrollbars using the colors configured in the user’s theme.

Since 2021, the ui.theme.base16 vocabulary has allowed theming the Factor user interface by choosing a base16-theme-name and setting base16-mode. We have just improved our Base16 theme support by adding all the current styles from the Tinted Theming schemes list.

So, now you can try solarized-dark:

IN: scratchpad "solarized-dark" base16-theme-name set-global

IN: scratchpad base16-mode

Or perhaps greenscreen:

IN: scratchpad "greenscreen" base16-theme-name set-global

IN: scratchpad base16-mode

Or any of the other 270 named color schemes now available!
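
If you switch themes often, the two steps can be wrapped in one word. This is only a sketch: use-base16-theme is a hypothetical name that is not part of ui.theme.base16, and it assumes that base16-mode takes no inputs, as the listener examples above suggest.

```factor
USING: namespaces ui.theme.base16 ;

! Hypothetical convenience word: select a Base16 scheme by name,
! then switch the UI over to Base16 theming.
: use-base16-theme ( name -- )
    base16-theme-name set-global base16-mode ;

"solarized-dark" use-base16-theme
```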

Enjoy!

Emit

Sun, 13 Oct 2024 08:00:00 -0700

One of the interesting aspects of a concatenative language like Factor is that blocks of logic can be easily extracted and easily reused since they apply logic to objects on the stack.

For example, if this was a word that operated on stack values:

: do-things ( a b -- c d )
    [ sqrt * ] [ swap sqrt + ] 2bi ;

One change we could easily make is to extract and name the two pieces of logic:

: calc-c ( a b -- c ) sqrt * ;
: calc-d ( a b -- d ) swap sqrt + ;

: do-things ( a b -- c d )
    [ calc-c ] [ calc-d ] 2bi ;

We could also convert it to operate on local variables:

:: do-things ( a b -- c d )
    a b sqrt * a sqrt b + ;

And extract those same two pieces of logic:

:: calc-c ( a b -- c ) a b sqrt * ;
:: calc-d ( a b -- d ) a sqrt b + ;

:: do-things ( a b -- c d )
    a b calc-c a b calc-d ;

But, notice that we have to specify that the local variables a and b have to be put back on the stack before we can call our extracted words that make the computations.

Hypothetical Syntax

Today, someone on the Factor Discord server asked about this very issue, wanting to have extractable pieces of logic that would effectively be operating on nested local variables, wherever they are used. The request was inspired by the goal of “don’t repeat yourself” and by the convenience of extracting logic that operates on the data stack.

Specifically, they wanted to be able to take blocks of logic that operate on named variables, and extract them in a similar manner to the logic blocks that operate on the stack – offering this hypothetical syntax as the goal:

EMIT: calc-c ( a b -- c ) a b sqrt * ;
EMIT: calc-d ( a b -- d ) a sqrt b + ;

:: do-things ( a b -- c d )
    calc-c calc-d ;

Let’s try and build real syntax that allows this hypothetical syntax to work.

Building the Syntax

First, we make a tuple to hold a lazy variable binding:

TUPLE: lazy token ;
C: <lazy> lazy

Then, we need a way to generate temporary syntax words in a similar manner to temporary words:

: define-temp-syntax ( quot -- word )
    [ gensym dup ] dip define-syntax ;

We create temporary syntax words to convert each named reference to a lazy variable:

: make-lazy-vars ( names -- words )
    [ dup '[ _ <lazy> suffix! ] define-temp-syntax ] H{ } map>assoc ;

Given a quotation that we have parsed in an emit description, we can build a word to replace all these lazy variables by looking them up in the current vocabulary manifest:

: replace-lazy-vars ( quot -- quot' )
    [ dup lazy? [ token>> parse-word ] when ] deep-map ;

And, finally, create our emit syntax word that parses a definition, making lazy variables that are then replaced when the emit word is called in the nested scope:

SYNTAX: EMIT:
    scan-new-word scan-effect in>>
    [ make-lazy-vars ] with-compilation-unit
    [ parse-definition ] with-words
    '[ _ replace-lazy-vars append! ] define-syntax ;
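
In effect, an emit word splices its body into whatever definition is being parsed, and the names a and b are resolved against the locals in scope at that point. As a sketch of what that amounts to for the do-things example from the hypothetical syntax section, the result should behave like this hand-written version:

```factor
USING: locals math math.functions ;

! do-things as if calc-c and calc-d had been pasted in by hand.
:: do-things ( a b -- c d )
    a b sqrt *      ! body of calc-c
    a sqrt b + ;    ! body of calc-d
```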

Using the Syntax

Now, let’s go back to our original example:

EMIT: calc-c ( a b -- c ) a b sqrt * ;
EMIT: calc-d ( a b -- d ) a sqrt b + ;

:: do-things ( a b -- c d )
    calc-c calc-d ;

Does it work?

IN: scratchpad 1 2 do-things

--- Data stack:
1.4142135623730951
3.0

Yep! That’s kind of a neat thing to build.

I have added this syntax in the locals.lazy vocabulary, if you want to try it out.

I’m not sure how useful it will be in general, but it is always fun to build something new with Factor!

Battlesnake

Sun, 15 Sep 2024 08:00:00 -0700

Battlesnake is “a competitive game where your code is the controller”. In particular, in answering the question “What is Battlesnake?”, the documentation says:

In this game, each Battlesnake is controlled in real-time by a live web server, responding to the Battlesnake API. It navigates the game board based on your algorithm, trying to find food, avoid other Battlesnakes, and survive as long as possible. Battlesnakes can be built using any tech stack you’d like, and we encourage you to step outside of your comfort zone.

It is also the subject of a very neat set of episodes of “Coding Badly” from almost two years ago that talk about building battlesnakes using Factor. In particular, they use a live-coding style to explore the development environment, build web servers using the furnace web framework, and learn how to use and deploy their program!

I did not know about these videos until today, but I thought it makes a nice series to share with the world. I love it when people build things using Factor and am always glad to hear about it!

More information is also available on the @BattlesnakeOfficial GitHub organization, as well as an archive of the Coding Badly implementation and a different Factor battlesnake library by another contributor.

Episode 1

Episode 2

Episode 3

Factor 0.100 now available

Wed, 11 Sep 2024 08:00:00 -0700

“Life can only be understood backwards; but it must be lived forwards.” — Kierkegaard

I’m very pleased to announce the release of Factor 0.100!

OS/CPU	Windows	Mac OS	Linux
x86	0.100		0.100
x86-64	0.100	0.100	0.100

Source code: 0.100

This release is brought to you with over 1400 commits by the following individuals:

Aditya Aryaman Das, Alex null Maestas, Alexander Ilin, Andy Kluger, Bhargav Shirin Nalamati, Charlie Weismann, Dave Carlton, David Enders, Doug Coleman, Evgenii Petrov, Giftpflanze, Ikko Eltociear Ashimine, J. Ryan Stinnett, Jean-Marc Lugrin, John Benediktsson, Keldan Chapman, Limnanthes Serafini, Marc Michael, Michael Raitzam, Michael Thies, Pragya Pant, Raghu Ranganathan, Rebecca Kelly, Rudi Grinberg, Sandesh Pyakurel, Sebastian Strobl, Shruti Sen, Surav Shrestha, Val Packett, @Capital-EX, @Smoothieewastaken, @TheWitheredStriker, @TryAngle, @chunes3, @inivekin, @nomennescio, @olus2000.

Besides some bug fixes and library improvements, I want to highlight the following changes:

Upgraded to Unicode 15.1
Fix some xmlns that were accidentally changed to https
Improved the printing of shortest decimal representation of floating-point numbers
Some early support for ARM64 in the non-optimizing compiler, more to do for full support
Automatic light/dark theme detection works on Microsoft Windows
Support for compressed images, useful when reducing file size is important

Some possible backwards compatibility issues:

ui: focusable-child* now returns f to indicate parent should be focused
peg: change to compile-time PEG: and PARTIAL-PEG: forms, not delay to first invocation
system: renamed macosx to macos
math.trig: moved deg>rad and rad>deg to math.functions vocabulary
math.functions: fix divisor? to support mixed numbers (floats and integers)
math.functions.integer-logs: moved integer-log10 and integer-log2 to math.functions vocabulary
ranges: fixed exclusive range to be more correct for non-integer types
http.client: moved some download words to http.download vocabulary
rosetta-code: moved solutions to the factor-rosetta-code git repository
json: read-json returns a single object, use read-jsons to read multiple
base32: now contains all of the words from the base32-crockford and base32hex vocabularies
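
For most of these, the fix is a small edit to a USING: line. A minimal sketch for the math.trig move, based only on the entry in the list above:

```factor
! Before 0.100 this would have been: USING: math.trig ;
USING: math.functions prettyprint ;

90 deg>rad .   ! a quarter turn, in radians
```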

I would also like to bring particular recognition to Raghu Ranganathan, also known as @razetime, who was an incredible developer with an incredibly good attitude and contributing member to many technical communities including code golfing and various programming languages including Factor. We are very sad that he passed away a couple of months ago and would like to have this moment dedicated in his memory.

What is Factor

New libraries:

base45: adding support for Base45 Data Encoding (RFC 9285)
bend: features from the Bend programming language
checksums.khash: implementation of k-hash
command-line.parser: experimental command-line argument parser
containers: experimental high-level container words across sequences, sets, dlists, and assocs.
drunken-bishop: implements OpenSSH Drunken Bishop algorithm
editors.chime: support Chime editor
editors.focus: support Focus Editor
editors.notepadnext: support NotepadNext editor
gilded-rose: adds the Gilded Rose Refactoring Kata
golden-section: recovers an old UI demo that uses processing for rendering
hex-strings: words for changing bytes to hex strings and back
http.download: collection of various download words
io.streams.tee: adding a “tee’ing” stream utility
images.viewer.scaling: adding a “scaling image” gadget
kaggle: wrapper for the Kaggle API
lazy: renamed from promises
leb128: implements LEB128 or Little Endian Base 128 encoding
math.statistics.running: implement “running statistics” and “running regression” tuples
math.vectors.ranges: more efficient vector operations on range objects
models.combinators: some extensions to models
models.combinators.templates: a functor for generating model objects
persistency: simple wrapper for db.tuples, possibly a little redundant
project-euler.061: solution for Cyclical Figurate Numbers
project-euler.098: solution for Anagramic Squares
quoted-printable.rfc2047: implementation of RFC2047 strings
random.xorshift: simple Xorshift random number generator
recipes: recovers an old UI demo of a recipe browser
scryfall: extensive library for working with MtG cards on Scryfall
sequences.prefixed: virtual “prefixed” sequence
sequences.suffixed: virtual “suffixed” sequence
stomp: client library for the STOMP protocol
stomp.cli: command-line interface for STOMP
sudokus: a graphical Sudoku game and solver
tools.image: common code for working with Factor image files
tools.image.analyzer: renamed tools.image-analyzer
tools.image.compressor: working with compressed Factor image files
tools.image.uncompressor: working with compressed Factor image files
ui.gadgets.alert: UI elements to prompt user and popup alerts
ui.gadgets.cartesian: three-axis coordinate system
ui.gadgets.comboboxes: UI elements for a combo box
ui.gadgets.controls: recovers some old UI support code
ui.gadgets.layout: recovers some old UI support code
ui.gadgets.slate: recovers some old UI support code
msgpack.rpc: implementation of MsgPack-RPC

Improved libraries:

alien.c-types: added ssize_t
alien.data: adding stream-read-c-ptr and read-c-ptr
assocs: merged some useful words from assocs.extras, removed with-assoc
assocs.extras: demoted set-of from assocs
base85: added Ascii85, Adobe85, and Z85 variants
calendar: added sunrise, sunset, and solar-noon
classes.algebra: fix class<= for anonymous-predicate
cocoa.statusbar: some cleanup, additional words
codebase-analyzer: handle owner and version files, print clickable links
colors.contrast: adding contrast-text-color to select white/black text on dark/light backgrounds
combinators.extras: adding sequence-case, fix 3tri*
command-line: adding command-line-options for easy options parsing
contributors: added contributors. and make the changelog respect .mailmap file
crontab: adding support for “~” randomization
curses: added nodelay
db.tuples: adding LIKE" column" syntax
debugger: fixed error description for bad-escape
discord: some improvements to the Discord bot library
dns: adding some more DNS types
formatting: split out a format-directive EBNF word
game-of-life: small performance improvements
geo-ip: fix database download link
github: implement more of the GitHub REST API
globs: added rglob for recursive glob
gpu.shaders: change two tuples used in errors to be error classes
html.templates.chloe: improve <t:meta> tag to be able to specify any meta attributes
http.server.responses: adding all the real and joke response codes
http2.hpack.huffman: simplify implementation using nths
interpolate: adding I" interpolated string syntax, allow format directives to be used
inverse: adding under
io.directories: added ?move-file
io.directories.windows: fixed move-file to properly replace existing files
io.files: added if-file-exists combinators and (file-writer-secure)
io.files.temp.macosx: allow default-cache-directory to work in MacPorts environment
io.files.unique: added safe-replace-file and safe-modify-file
io.pipes: added <connected-pair>
io.streams.256color: formatting ANSI string tables properly
io.streams.ansi: formatting ANSI string tables properly
io.streams.escape-codes: added strip-ansi-escapes and format-ansi-tables
json: renamed read-json to read-jsons, added read-json that reads a single object
ldcache: fix to work on platforms that don’t have /etc/ld.so.cache (such as NixOS)
libc: added some setlocale support
libclang: working on header parsing to factor source
machine-learning.functions: added gelu, stable-softmax, and stable-log-softmax
mason.release.archive: build tar file without xattrs
math.bits: adding a binary-bits tuple
math.combinatorics: fix <k-permutations> for k=0
math.distances: added squared-euclidian-distance and normalized-squared-euclidian-distance, aliases for taxicab-distance and chessboard-distance
math.extras: added weighted-randoms-as
math.functions: moved math.trig words in (deg>rad and rad>deg), math.functions.integer-logs, and integer-sqrt, added fma (fused-multiply-add)
math.matrices: fixed stack effect for <matrix-by>
math.parser: implement Dragonbox algorithm, added >digits and digits>
mirrors: changed to display all assocs like hashtables, all sets like hash-sets.
modern.html: added some XML traversal words
msgpack: added ?read-msgpack and read-msgpacks
oauth2: simplify using the json.http vocabulary
openai: adding <cheapest-chat-completion> for ease-of-use with “gpt-4o-mini”, add timestamps to the list-models api
opengl: adding OpenGL 4+ extensions and demos, better support for multi-texture scaling
peg.ebnf: reset ebnf words properly
pong: fix issue when pressing N for new game a lot
prettyprint.config: adding qualified-names? to allow word names to be prettyprinted as fully-qualified
quotations: added compose-all
random: rename random-bits* to random-bits-exact, rename the *-random-float distributions to *-random, add *-distribution types, added more of them, defined a base-random that allows a better not-a-random-generator error to be produced in some cases
random.c: fixed to guarantee rand() is used to generate full range of 32-bit numbers
ranges: support some set operations more efficiently
raylib: added extensive documentation
reddit: fix domain-stats
sequences.extras: adding count=, faster longest-subseq
sequences.generalizations: adding lastn, ?lastn, set-lastn
sequences.parser: rename get+increment to consume, and change next to return the next element
sequences.product: making product-each, product-map, and product-find significantly faster
serialize: added deep-clone
shuffle: added dupdd
splitting: break on more line separators \r\n\v\f\x1c\x1d\x1e\x85\u002028\u002029
system: renamed macosx to macos
syntax: adding VOCAB: syntax
toml: fix issue with sub-tables being defined first
tools.completion: added completion for VOCAB: syntax
tools.disassembler.capstone: support Capstone 4 and Capstone 5
ui: adding close-all-windows
ui.backend.gtk2: check for non-empty DISPLAY os-var for graphical capability
ui.backend.windows: support horizontal mouse wheel, detect light/dark theme on startup
ui.backend.x11: check for non-empty DISPLAY os-var for graphical capability
ui.commands: adding update-command-map
ui.gadgets: fix f focusable-child busy loop
ui.tools.listener.history: saving and restoring UI listener history to a ~/.factor-history file
units.imperial: add area units, change “fingerbreadth” unit
units.si: adding km^2 and more aliases
vocabs.loader: make vocab-exists? no longer throw bad-vocab-name
vocabs.refresh.monitor: fix warnings for non-vocab-like files
xml: fix xml namespace
webapps.mason: improve build farm dashboard
websites.concatenative: allow use of <t:meta> in child templates
words: adding uninterned-word predicate, undefined-word error class
xmode.catalog: adding qdoc and sparql modes
zeromq: change zmq-error to be an error class
zoneinfo: updated to tzdata version 2024b

VM Improvements:

Improved ARM64 bootstrap assembly to allow small forms to execute successfully and natively in the non-optimizing compiler. This continues to be a work-in-progress to fully support ARM64.
Tentative support for compressed images, allowing Factor images to be as much as 8x smaller in size with run-time uncompression overhead.
Improved console I/O on Windows to work in environments such as Cygwin and GitHub Actions.

Cash Register

Thu, 08 Aug 2024 05:00:00 -0700

Building a “cash register” is an often-used example project, from places like freeCodeCamp’s JavaScript Algorithms and Data Structures Project “Cash Register” or Codecademy’s “Building a Cash Register” along with other examples like the Simple Cash Register in Python.

I thought it would be fun to write about building something similar, but not the same, in Factor.

We are going to make a few assumptions:

We handle one currency – the “buck”.
We can make change in various units – from the penny to the Benjamin.
Despite still being legal tender, we do not support $500, $1,000, $5,000, or $10,000 bills.
Despite being rare, we include the two-dollar bill.

Here are our units of change along with their descriptions:

CONSTANT: COINS {
    { 10000 "$100" }
    { 5000 "$50" }
    { 2000 "$20" }
    { 1000 "$10" }
    { 500 "$5" }
    { 200 "$2" }
    { 100 "$1" }
    { 25 "quarters" }
    { 10 "dimes" }
    { 5 "nickels" }
    { 1 "pennies" }
}

If we want to make change, we can generate it using something like the greedy algorithm to find the minimum number of coins, starting with the largest denomination possible and iterating to smaller ones:

: make-change ( n -- assoc )
    COINS [ [ /mod swap ] dip ] assoc-map swap 0 assert= ;
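
The trick in make-change is that the running remainder stays on the stack underneath each pair while assoc-map runs. Here is the same idea as a small self-contained sketch over a plain sequence of denominations. The word greedy-counts is a hypothetical helper, not part of the cash register code. The second call is a reminder that the greedy approach is not guaranteed to give the fewest coins for arbitrary denominations.

```factor
USING: kernel math prettyprint sequences ;

! Hypothetical helper: greedy counts for denominations given largest
! first. The remainder is threaded through map underneath each element.
: greedy-counts ( n denominations -- counts remainder )
    [ /mod swap ] map swap ;

! $74.40 in pennies, using the same units as COINS.
7440 { 10000 5000 2000 1000 500 200 100 25 10 5 1 } greedy-counts drop .
! { 0 1 1 0 0 2 0 1 1 1 0 }

! With denominations 4, 3, and 1, greedy pays 6 as 4+1+1 (three coins),
! although 3+3 would need only two.
6 { 4 3 1 } greedy-counts drop .
! { 1 0 2 }
```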

For convenience, we can make a formatting word to format our coins into dollars:

: $. ( n -- )
    100 /f "$%.2f\n" printf ;

And now a word to print out the change we made:

: change. ( n -- )
    "CHANGE: " write dup $. make-change [
        '[ _ "%d of %s\n" printf ] unless-zero
    ] assoc-each ;

We can store the amount owed and the amount paid in dynamic variables:

INITIALIZED-SYMBOL: owed [ 0 ]

INITIALIZED-SYMBOL: paid [ 0 ]
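
As a small illustration of how such a variable accumulates, here is a sketch using a throwaway symbol. The name demo-total is hypothetical and is used so that the example does not disturb owed or paid:

```factor
USING: math namespaces prettyprint ;

! Hypothetical symbol, for illustration only. It starts at 0.
INITIALIZED-SYMBOL: demo-total [ 0 ]

! change-global applies the quotation to the current value
! and stores the result back in the variable.
1023 demo-total [ + ] change-global
1537 demo-total [ + ] change-global

demo-total get-global .   ! 2560
```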

Using that, we can make a word to display the balance due:

: balance. ( -- )
    "OWED: " write owed get-global $.
    "PAID: " write paid get-global $. ;

A word to add a charge, increasing the amount owed:

: charge ( n -- )
    "CHARGE: " write dup $.
    owed [ + ] change-global balance. ;

A word to make a payment, providing change if the amount paid is greater than the amount owed:

: pay ( n -- )
    "PAY: " write dup $.
    paid [ + ] change-global balance.
    paid get-global owed get-global - dup 0 >=
    [ change. 0 owed set-global 0 paid set-global ] [ drop ] if ;

And a word to cancel a transaction, refunding any paid amounts:

: cancel ( -- )
    "CANCEL" print
    0 owed set-global
    paid [ change. 0 ] change-global ;

Using a word that parses input into a number of pennies:

: parse-$ ( args -- n )
    "$" ?head drop string>number 100 * round >integer ;

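The round in parse-$ matters because decimal amounts are parsed as floating-point numbers, and multiplying by 100 can land just below the intended whole number of pennies. A short sketch of the difference, using an amount picked because it shows the effect:

```factor
USING: math math.functions math.parser prettyprint ;

! Without rounding, the product is slightly below 29 and a penny is lost.
"0.29" string>number 100 * >integer .         ! 28

! Rounding first gives the intended number of pennies.
"0.29" string>number 100 * round >integer .   ! 29
```
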
We can then define a set of commands using the command-loop vocabulary:

CONSTANT: COMMANDS {
    T{ command
        { name "balance" }
        { quot [ drop balance. ] }
        { help "Display current balance." }
        { abbrevs { "b" } } }
    T{ command
        { name "charge" }
        { quot [ parse-$ charge ] }
        { help "Charge an item." }
        { abbrevs { "c" } } }
    T{ command
        { name "pay" }
        { quot [ parse-$ pay ] }
        { help "Pay with money." }
        { abbrevs { "p" } } }
    T{ command
        { name "cancel" }
        { quot [ drop cancel ] }
        { help "Cancel transaction." }
        { abbrevs { "x" } } }
}

And then define the loop that we run as MAIN:

: cash-register-main ( -- )
    "Welcome to the Cash Register!" "$>"
    command-loop new-command-loop
    COMMANDS [ over add-command ] each
    run-command-loop ;

MAIN: cash-register-main

And you can see an example from running it:

Welcome to the Cash Register!
$> c 10.23
CHARGE: $10.23
OWED: $10.23
PAID: $0.00

$> c 15.37
CHARGE: $15.37
OWED: $25.60
PAID: $0.00

$> p 100.00
PAY: $100.00
OWED: $25.60
PAID: $100.00
CHANGE: $74.40
1 of $50
1 of $20
2 of $2
1 of quarters
1 of dimes
1 of nickels

It could be fun to extend this example to have an inventory of purchasable items, allow users to ring up these items instead of a series of charges, maybe implement taxable items and discounts, display and print receipts, handle refunds, handle available bills and coins when making change, support other currencies, and other features that you might find in a more “complete” or “real-world” cash register.

The code for this is on my GitHub.

Reflecting on 20 Years

Fri, 02 Aug 2024 14:00:00 -0700

As close as I can tell, Factor is the result of contributions from around 180 contributors over the past 20 years.

Recently, I was reminded of a tool that can produce graphs showing some aspects of contributions to git repositories. The tool is called Git of Theseus; it is written in the Python programming language and can be used to generate a series of interesting plots showing statistics over time about a project. A similar tool is Hercules, which claims to be a bit faster and is written in the Go programming language.

We can look at code written in each year which, aside from the first few years, mostly continues to exist in the latest version. In addition, we see a healthy and increasing chart over time:

When wondering about the half-life of code, we might want to see how long a particular line of code continues to exist in the project. We can see that 50% of them still exist after 5 years:

Many of those early contributions came from Slava Pestov, Doug Coleman, Chris Double, and others, and we continue to benefit from the impressive early work that they did for the Factor programming language. We can plot author statistics, and see the large blocks of contributions over time by various authors:

And, as a percentage of lines of code, see that beginning with almost 100% of the source code contributed by Slava Pestov, more recently we have around 20% each from Slava Pestov, Doug Coleman, and myself as well as several other significant authors:

We can generate a more detailed breakdown of these lines of code by language using tokei, but in particular see that we have over 388,000 lines of Factor source code in our latest development version:

$ tokei .

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language            Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Factor               4952       503452       388845        25717        88890
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

It was pretty great to generate these graphs, and to reminisce about all the vocabularies available so far, and ponder all those still yet to be written.

Happy coding!

Deploy Issues on MacOS

Thu, 18 Jul 2024 03:00:00 -0700

While trying to help get the BitGuessr game deployed on macOS, I ran into a few issues that were interesting, and I wanted to discuss the process of troubleshooting them.

Sometimes, using the deploy tool is easy, and sometimes it is not-so-easy. There are some challenges around choosing the right level of reflection for the features used in the application you are trying to deploy – we suggest starting with Full environment and then reducing until the program breaks – but besides that it is typically one command:

IN: scratchpad "bitguessr" deploy

That command results in a per-platform executable, which on macOS is an .app bundle that includes the Factor executable, a deployed image, any resources the deployed image uses, and any libraries that the deployed image depends on:

$ find bitguessr.app 
bitguessr.app
bitguessr.app/Contents
bitguessr.app/Contents/Frameworks
bitguessr.app/Contents/Frameworks/libraylib.dylib
bitguessr.app/Contents/Info.plist
bitguessr.app/Contents/MacOS
bitguessr.app/Contents/MacOS/bitguessr
bitguessr.app/Contents/Resources
bitguessr.app/Contents/Resources/bitguessr
bitguessr.app/Contents/Resources/bitguessr/_resources
bitguessr.app/Contents/Resources/bitguessr/_resources/bitguessr_icon.png
bitguessr.app/Contents/Resources/bitguessr/_resources/bitguessr_soundtrack.wav
bitguessr.app/Contents/Resources/bitguessr/_resources/button-0.png
bitguessr.app/Contents/Resources/bitguessr/_resources/button-1.png
bitguessr.app/Contents/Resources/bitguessr/_resources/correct.wav
bitguessr.app/Contents/Resources/bitguessr/_resources/wrong.wav
bitguessr.app/Contents/Resources/bitguessr.image
bitguessr.app/Contents/Resources/Icon.icns

After building this application, checking that it works for me, and uploading it to the server, of course we got a bug report when someone else tried to run it:

$ ./bitguessr.app/Contents/MacOS/bitguessr
...
Cannot resolve C library function
Library: DLL" libraylib.dylib"
Symbol: InitWindow
DlError: none
See https://concatenative.org/wiki/view/Factor/Requirements

Is the InitWindow symbol in the library?

$ nm -gU ./bitguessr.app/Contents/Frameworks/libraylib.dylib | grep InitWindow
0000000000017920 T _InitWindow

Yes, it is.

Is it loading the correct libraylib.dylib file?

$ DYLD_PRINT_LIBRARIES=1 ./bitguessr.app/Contents/MacOS/bitguessr
...
dyld[69951]: <B5534AF8-58E9-3F59-A5DE-F33164570F6B> ./bitguessr.app/Contents/Frameworks/libraylib.dylib

Yes, it seems to be.

Let’s learn more about how dynamic libraries work. There is a nice thread on dynamic library identification that goes into some details about how these are identified and then loaded.

Let’s start with the library – we get our Raylib from Homebrew:

$ cd $(brew --prefix raylib)

$ otool -l libraylib.dylib | grep -A 2 LC_ID_DYLIB
          cmd LC_ID_DYLIB
      cmdsize 72
         name /usr/local/opt/raylib/lib/libraylib.450.dylib (offset 24)

Okay, so this probably needs to be relative to a “runtime path” or rpath, which you can either set:

$ install_name_tool -id "@rpath/libraylib.dylib" libraylib.dylib

Or, fix by downloading a Raylib release that is already set properly for embedding.

Did it change?

$ otool -l libraylib.dylib | grep -A 2 LC_ID_DYLIB
          cmd LC_ID_DYLIB
      cmdsize 48
         name @rpath/libraylib.dylib (offset 24)

Yes, it did!

Now that we have that, we can re-deploy and see if it works:

$ ./bitguessr.app/Contents/MacOS/bitguessr
...
Cannot resolve C library function
Library: DLL" libraylib.dylib"
Symbol: InitWindow
DlError: none
See https://concatenative.org/wiki/view/Factor/Requirements

Nope.

Okay, maybe the rpath that is used to look up dynamic libraries isn’t set properly:

$ otool -l ./bitguessr.app/Contents/MacOS/bitguessr | grep -A 2 LC_RPATH

Hmm, it is not set at all. The dynamic linker maintains a list of these “runtime path” directories. Maybe we can make sure it looks in the right place by adding one:

$ cd ./bitguessr.app/Contents/MacOS

$ install_name_tool -add_rpath "@executable_path/../Frameworks" bitguessr

Okay, now it looks right:

$ otool -l ./bitguessr.app/Contents/MacOS/bitguessr | grep -A 2 LC_RPATH
          cmd LC_RPATH
      cmdsize 48
         path @executable_path/../Frameworks (offset 12)

Let’s try again… and, it works!

I pushed a change to set the rpath directory properly for future deploys, switched to distributing the game as an Apple Disk Image, and made sure to codesign the application so that it launches easily after being downloaded. I also gave Joseph Oziel an updated macOS build of BitGuessr, which includes an Icon.icns file in the Apple Icon Image format for the application icon.

Neat!

Rosetta Code Downloaded

Wed, 17 Jul 2024 07:00:00 -0700

I have been quite distracted by Rosetta Code shenanigans in the past few days. For anyone that is curious about the backstory, I removed the rosetta-code vocabulary from the main Factor git repository, and then decided to archive all the Factor solutions to a separate factor-rosetta-code repository for posterity, utility, and analysis.

I thought it would be fun to talk about different ways to do web scraping for the Rosetta Code website, ultimately choosing to write a Factor vocabulary to do it, which I’ll go into below.

Public Datasets

There are a couple of public datasets and scraper tools that you can use:

Hugging Face has a @christopher/rosetta-code dataset, but it looks like it was updated “about 2 years ago”, so perhaps doesn’t contain recently contributed solutions.
The @acmeism/RosettaCodeData repository seems to be quite up-to-date and uses a RosettaCode CPAN module and the MediaWiki API to synchronize their data files periodically. At the moment, this is over 600 MB to clone.
The @brollb/rosetta-code-scraper repository is a Rust scraper that uses the reqwest and scraper crates to parse the web pages and extract task descriptions and solutions. This required some minor tweaks to get working with a recent Rust version, and had some issues with the newer HTML being generated.

Using Factor

I started with the previous approaches, but then I realized that I kinda wanted to build my own program that only grabbed the Factor solutions and weaved them into solution files with the task description for the factor-rosetta-code repository.

After considering using the rendered HTML from the Rosetta Code website, I decided it would be a lot simpler to use the MediaWiki API and our mediawiki.api vocabulary to extract the tasks. That vocabulary requires an endpoint to be specified, so we define a simple combinator that sets it before running a quotation.

: with-rosetta-code ( quot -- )
    [ "https://rosettacode.org/w/api.php" endpoint ] dip
    with-variable ; inline

The Rosetta Code solutions consist of a list of pages as well as sub-categories with their own lists of pages. We need a list-category word that will get the members of a given category, memoized in case pages reference each other or to speed up subsequent calls through caching:

MEMO: list-category ( title -- members )
    'H{ { "list" "categorymembers" } { "cmtitle" _ } }
    query [ "title" of ] map ;

And a list-categories word that will recursively resolve categories containing other categories:

: list-categories ( title -- tasks )
    list-category [ "Category:" head? ] partition swap
    [ list-categories ] map concat append harvest members sort ;
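
The recursion is easier to see with the network taken out. Below is a small Python sketch of the same idea, assuming a hypothetical `fetch_members` callable in place of the real MediaWiki query and a made-up two-category wiki as test data. Like the Factor words, it memoizes the per-category lookup, flattens sub-categories, and then removes empties and duplicates before sorting; it makes no attempt to guard against categories that contain each other in a cycle.

```python
from functools import lru_cache


def make_lister(fetch_members):
    """Build a recursive category lister around `fetch_members(title)`.

    `fetch_members` is a hypothetical stand-in for the API query.
    """

    @lru_cache(maxsize=None)
    def list_category(title):
        # Memoized: each category is fetched at most once.
        return tuple(fetch_members(title))

    def list_categories(title):
        pages = []
        for member in list_category(title):
            if member.startswith("Category:"):
                # Sub-category: resolve it recursively.
                pages.extend(list_categories(member))
            else:
                pages.append(member)
        # Drop empty titles and duplicates, then sort.
        return sorted({page for page in pages if page})

    return list_categories


# Made-up data for illustration only.
FAKE_WIKI = {
    "Category:Tasks": ["Sieve", "Category:Sorting", "A+B"],
    "Category:Sorting": ["Quicksort", "Sieve"],
}
calls = []


def fake_fetch(title):
    calls.append(title)
    return FAKE_WIKI.get(title, [])


list_categories = make_lister(fake_fetch)
tasks = list_categories("Category:Tasks")
tasks_again = list_categories("Category:Tasks")  # served from the cache
```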

Using these, we can retrieve all tasks and draft tasks:

: all-tasks ( -- tasks )
    "Category:Solutions_by_Programming_Task" list-categories ;

: draft-tasks ( -- tasks )
    "Category:Draft_Programming_Tasks" list-categories ;

Each task page is a series of sections, beginning with the task description, and then a series of solutions in different programming languages. Using page-content, we can see what one of these pages looks like:

IN: scratchpad [ "Sieve_of_Eratosthenes" page-content ] with-rosetta-code

We can build a word that extracts a section that is specified by a begin text and an end text, searching for them using subseq-index to find where they occur in the page:

:: extract-section ( page begin end -- section/f )
    page begin subseq-index [
        begin length +
        dup page end subseq-index-from
        [ page length ] unless*
        page subseq
    ] [ f ] if* ;
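
As a cross-check of the logic, here is roughly the same extraction in Python, using `str.find` where the Factor word uses subseq-index. The sample page and its markers are invented for the example; the behavior mirrored here is that a missing begin marker yields no result, while a missing end marker means "take everything to the end of the page".

```python
def extract_section(page, begin, end):
    """Return the text between `begin` and the next `end`, or None."""
    start = page.find(begin)
    if start < 0:
        return None  # begin marker not present
    start += len(begin)
    stop = page.find(end, start)
    if stop < 0:
        stop = len(page)  # no end marker: run to the end of the page
    return page[start:stop]


# Invented sample page for illustration.
PAGE = (
    "Find the answer.\n"
    "=={{header|Factor}}==\n"
    '<syntaxhighlight lang="factor">42 .</syntaxhighlight>\n'
    "=={{header|Python}}==\n"
    '<syntaxhighlight lang="python">print(42)</syntaxhighlight>\n'
)

intro = extract_section(PAGE, "", "=={{header")
factor_code = extract_section(
    PAGE, '<syntaxhighlight lang="factor">', "</syntaxhighlight>"
)
missing = extract_section(PAGE, '<syntaxhighlight lang="cobol">', "</syntaxhighlight>")
unterminated = extract_section("abc[def", "[", "]")
```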

The description is everything before the first header section:

: get-description ( page -- description/f )
    "=={{header" over subseq? [
        "" "=={{header" extract-section
    ] [ drop f ] if ;

The solution code is the first <syntaxhighlight> block for our desired language:

: get-code ( page lang -- code/f )
    "<syntaxhighlight lang=\"" "\">" surround
    "</syntaxhighlight>" extract-section ;

We can use those words to weave the commented-out description with the Factor source code:

: get-solution ( task -- solution/f )
    page-content [ get-description ] keep over empty?
    [ 2drop f ] [
        [ string-lines [ "! " prepend ] map "\n" join ]
        [ "factor" get-code "\n\n" glue "\n" append ] bi*
    ] if ;
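
The weaving step itself is small enough to restate in Python. This sketch takes the description and code as plain arguments rather than fetching a page, and returning None when the code is missing is my assumption rather than something the Factor word spells out.

```python
def weave_solution(description, code):
    """Comment out the description with "! " and put the code underneath."""
    if not description or code is None:
        return None  # assumption: nothing to save without both parts
    commented = "\n".join("! " + line for line in description.splitlines())
    return commented + "\n\n" + code + "\n"


woven = weave_solution("Task:\n\nFind it.", "42 .")
```

Note that blank description lines come out as "! " with a trailing space, since every line gets the same prefix.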

That works great. You can try it by printing out one of the draft tasks:

IN: scratchpad [ "10001th_prime" get-solution print ] with-rosetta-code
! Task:
!
! Find and show on this page the 10001st prime number.

USING: math math.primes prettyprint ;

2 10,000 [ next-prime ] times .

Now we want a way to save a task, and since the tasks have names that aren’t all valid in filenames or vocabulary names, we do a little cleanup to turn a task name into a path:

: task-path ( task -- path )
    [ dup { [ Letter? ] [ digit? ] } 1|| [ drop CHAR: - ] unless ] map
    >lower R/ --+/ "-" re-replace [ CHAR: - = ] trim ".factor" append ;

Saving a task is getting the solution and then saving to a file:

: save-task ( task -- )
    "vocab:rosetta-code/solutions" [
        [ get-solution ]
        [ task-path '[ _ utf8 set-file-contents ] when* ] bi
    ] with-directory ;
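
The path cleanup and the save step can be sketched in Python as well. This is only an approximation of the Factor words: it takes the solution text as an argument instead of downloading it, the `directory` parameter replaces with-directory, and I am assuming that Letter? and digit? mean ASCII letters and digits.

```python
import re
import tempfile
from pathlib import Path


def task_path(task):
    """Turn a task title into a lowercase, dash-separated .factor filename."""
    # Anything that is not an ASCII letter or digit becomes a dash.
    dashed = "".join(ch if ch.isascii() and ch.isalnum() else "-" for ch in task)
    # Lowercase, collapse runs of dashes, and trim dashes from both ends.
    collapsed = re.sub(r"--+", "-", dashed.lower())
    return collapsed.strip("-") + ".factor"


def save_task(directory, task, solution):
    """Write `solution` under `directory`; skip tasks with no solution."""
    if solution is None:
        return None
    path = Path(directory) / task_path(task)
    path.write_text(solution, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    saved = save_task(tmp, "10001th_prime", "! Task:\n\n42 .\n")
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
    skipped = save_task(tmp, "No solution here", None)
```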

With that, we can finally save all the tasks, or all the draft tasks:

: save-all-tasks ( -- )
    all-tasks [ save-task ] each ;

: save-draft-tasks ( -- )
    draft-tasks [ save-task ] each ;

I used this with some minor changes: ignoring certain categories that do not contain solutions, and using Pandoc to convert the MediaWiki markup before embedding it in the solution files.

Anyway, pretty cool!

Rosetta Code Revisited

Tue, 16 Jul 2024 07:00:00 -0700

I recently wrote about removing the Rosetta Code solutions from our main Factor repository. We only had 62 solutions out of 1,276 tasks, and I didn’t really want to maintain a subset of the solutions, nor mirror them ineffectively into the main git repository.

As it turns out – and pretty much immediately afterwards – I got curious enough to try and download all of the Factor solutions so we could maybe do some analysis of all the various solutions that have been contributed to the Rosetta Code project. And, it’s not a small amount of code – it’s 12,461 lines of Factor code!

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Language            Files        Lines         Code     Comments       Blanks
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Factor               1663        74221        12461        55885         5875
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Not all of those are fully solved, but almost 1,000 of them seem to be!

This is available in the newly created factor-rosetta-code git repository, if anyone else is as curious as I was. I don’t think we are going to be able to consistently keep this in sync with the Rosetta Code website, but at least it represents a checkpoint today for quite a lot of nice Factor source code.

Some of these were written for older versions of Factor, but most of it is usable as is, or with minor edits.

Check it out!

Rosetta Code

Sun, 14 Jul 2024 15:00:00 -0700

The Rosetta Code project is quite interesting:

The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another. Rosetta Code currently has 1,276 tasks, 401 draft tasks, and is aware of 948 languages, though we do not (and cannot) have solutions to every task in every language.

There is a page for Factor with some programming language history, notes, and a list of tasks not implemented in Factor. If you are new to learning Factor, this might be a fun way to learn and contribute.

We had been maintaining a rosetta-code vocabulary in the main development repository with various solutions. However, I noticed recently that it had only 62 solutions, and was not being kept in sync with the ones on their website where many more were available. We debated moving these and the general feeling was if we aren’t able to maintain the “authoritative” solutions, it would be best to move this code to the factor-unmaintained repository where we keep old source code that we aren’t actively maintaining.

That move is now complete, but don’t let that stop you from solving the unsolved ones!

BitGuessr

Sat, 13 Jul 2024 15:00:00 -0700

The Raylib library is a “simple and easy-to-use library to enjoy videogames programming” that has seen increasing popularity for making quick visual demos and games.

We released support for it in Factor 0.99 and have been keeping it up-to-date. Currently, we support version 4.5 – although I just noticed that version 5.0 came out somewhat recently and seems to represent an exciting future that we should probably update to soon.

Joseph Oziel, one of our recent contributors, was learning Factor and Raylib by making a fun little game called BitGuessr. The game presents two buttons, a 0 and a 1, and you have to guess a series of randomly generated bits and try to achieve a high score! It supports keyboard and mouse inputs and has sounds and music as well.

Pretty fun – I love this use of Factor! – and particularly neat that they have used our deployment system to create standalone executables as a way of distributing it.

You can read more about it and see the source code as well.

Check it out!

Stack Complexity

Thu, 11 Jul 2024 07:00:00 -0700

We had a recent discussion on the Factor Discord server about how to deal with stack shuffling, and when and how you might use the various dataflow combinators that are available in Factor. Some of the impetus for this discussion was my article about intersecting ranges and the contrasting implementations that it presented.

The question that started it all:

Overall very complex stack states.

How do you write code like that?

Are you in the listener or debugger making these changes one by one?

When I write even simple code I get lost when there’s 3+ things on the stack I have to work with and spend more time figuring out stack order and correct combinators than with the actual problem. I wonder if there’s a good workflow where I can write the code iteratively.

Let’s go over some of the advice that came up in that conversation:

Mockups

I often write Factor code iteratively lately. Usually I come up with a tiny increment that preserves the stack effect, then reload the code to compile it.

For example, if I need to create a word with effect ( a b -- ), I will start with the minimal implementation like
: word ( a b -- )
    2drop ;
Mocking stuff every step of the way, compiling after each step.

For stuff inside a word that seems to be complex from the outset, I would create a mock (word) with defined stack effect. I will inline it later if it turns out to be simpler than expected (because I find something in the standard library that would do most of the work).

Combinators

Some things come with practice, that’s true. But some tips:

Keep the stack shallow, 1-3 things and it often works best.

More than that and don’t worry about using named local variables.

Try and use spread / apply / cleave combinators to represent your logic as dataflow.

Monoliths

I like writing monoliths, so sometimes i write a :: version first and refactor to : later, especially if I don’t know how the data will flow exactly.

When I’m porting code from other languages, I start with locals to keep the code blocks similar. Structure starts to become obvious once you have a big block working.

Practice

Sometimes writing inline comments showing what the stack effect is after each block of code, is useful to visualize the stack effects of the intermediate parts.

Sometimes there are words like vector operations that clean up an iterative piece of code.

But mostly, keep practicing. Ask questions. Maybe get feedback on your code from us here or other devs you like to work with.

And sometimes despite trying; it stays a big block of code that works, you shrug and move on to the next thing.

Variables

One way to reduce stack depth is, for example, to put some state on the namestack as a variable. That is how our stream words work (e.g., read, write, print). It’s how some words that accumulate output work – either using the make vocabulary or an equivalent vector or hash-set variable to accumulate into.

Reordering

Oh, and one advanced tip: sometimes reordering the arguments on the stack to avoid shuffles or allow using with / map / curry / fry things helps a lot.

It takes a long time to “get” it, but there are some rules that help; the order of your arguments to your words matter a lot. If you find yourself juggling a lot, consider a different argument order of your word. Stuff new context “under” your arguments with dip, e.g. a vector to collect stuff in. Look at accessors and see why they’re designed with a specific word order; typically things that are “parameters” go on the top of the stack (e.g. quotations). Also assocs have a certain logical order to them. If you understand that, it becomes more natural, with the occasional stack shuffle still needed.

And don’t forget about tuples; that quickly reduces the amount of things on the stack

Good luck and happy coding!

Random Distributions

Tue, 09 Jul 2024 10:00:00 -0700

As was the case when I got distracted by color support, I recently was distracted by random probability distributions. Other programming languages have these – for example the Python module numpy.random as well as the Julia module Distributions.jl.

In particular, I wanted to make sure we supported a bunch of the commonly used distributions, both continuous and discrete. I have added a few recently, and we now support quite a few in the random vocabulary:

TUPLE: bernoulli-distribution p ;
TUPLE: beta-distribution alpha beta ;
TUPLE: binomial-distribution n p ;
TUPLE: cauchy-distribution median scale ;
TUPLE: chi-square-distribution dof ;
TUPLE: exponential-distribution lambda ;
TUPLE: f-distribution dof-num dof-den ;
TUPLE: gamma-distribution alpha beta ;
TUPLE: geometric-distribution p ;
TUPLE: gumbel-distribution loc scale ;
TUPLE: inv-gamma-distribution shape scale ;
TUPLE: laplace-distribution mean scale ;
TUPLE: logistic-distribution loc scale ;
TUPLE: lognormal-distribution < normal-distribution ;
TUPLE: logseries-distribution p ;
TUPLE: normal-distribution mean sigma ;
TUPLE: pareto-distribution k alpha ;
TUPLE: poisson-distribution mean ;
TUPLE: power-distribution alpha ;
TUPLE: rayleigh-distribution mode ;
TUPLE: student-t-distribution dof ;
TUPLE: triangular-distribution low high ;
TUPLE: uniform-distribution min max ;
TUPLE: von-mises-distribution mu kappa ;
TUPLE: wald-distribution mean scale ;
TUPLE: weibull-distribution alpha beta ;
TUPLE: zipf-distribution a ;

For each of these, we define a convenient foo-random word, an implementation foo-random* that takes a random-generator, and a foo-distribution tuple that can be passed to randoms to draw a number of samples more quickly. For example, using the binomial distribution:

IN: scratchpad 100,000 [ 10 0.6 binomial-random ] replicate histogram .
H{
    { 0 21 }
    { 1 157 }
    { 2 1058 }
    { 3 4228 }
    { 4 11202 }
    { 5 20002 }
    { 6 25151 }
    { 7 21537 }
    { 8 11981 }
    { 9 4045 }
    { 10 618 }
}

IN: scratchpad 100,000
               T{ binomial-distribution { n 10 } { p 0.6 } }
               randoms histogram .
H{
    { 0 6 }
    { 1 164 }
    { 2 1044 }
    { 3 4255 }
    { 4 11169 }
    { 5 19928 }
    { 6 25103 }
    { 7 21414 }
    { 8 12335 }
    { 9 3973 }
    { 10 609 }
}
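
To sanity-check histograms like these outside of Factor, a rough Python equivalent is easy to write. The sketch below is not how the random vocabulary samples; it simply adds up n Bernoulli trials, and the helper names and the fixed seed are my own choices. With n = 10 and p = 0.6 the sample mean should land close to n·p = 6, and 6 should be the most common outcome.

```python
import random
from collections import Counter


def binomial_random(n, p, rng):
    """One binomial sample: count the successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def sample_histogram(count, n, p, seed=0):
    """Draw `count` samples and tally how often each outcome occurs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(binomial_random(n, p, rng) for _ in range(count))


hist = sample_histogram(100_000, 10, 0.6)
total = sum(hist.values())
mean = sum(k * c for k, c in hist.items()) / total
mode = max(hist, key=hist.get)
```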

I would love to get some additional per-distribution generic methods to support calculating things like mean, variance, skewness, kurtosis, entropy, probability density/mass functions, etc. And, of course, would love more distributions to be available in Factor.

There’s always things to add!

Magic Forest

Sat, 29 Jun 2024 06:00:00 -0700

About ten years ago, there was a blog post about a Goats, Wolves, and Lions puzzle that was problem #30 from the 2014 Austrian “Math Kangaroo” contest. The puzzle was kind of fun and the implementation prompted some decent follow-up at the time.

There are three species of animals in a magic forest: lions, wolves and goats. Wolves can devour goats, and lions can devour wolves and goats. ("The stronger animal eats the weaker one".) As this is a magic forest, a wolf, after having devoured a goat, is transmuted into a lion; a lion, after having devoured a goat, is transmuted into a wolf; and a lion having devoured a wolf becomes a goat.

At the very beginning, there are 17 goats, 55 wolves and 6 lions in the forest. After every meal, there is one animal fewer than before; therefore after some time, there is no devouring possible any more.

What is the maximum number of animals who can live in the forest then?

There are two versions of this puzzle: one that doesn’t give any possible answers and the kangaroo version that gives these multiple choice options:

(A) 1
(B) 6
(C) 17
(D) 23
(E) 35

The original post followed up with a description of three ways to solve the Goats, Wolves, and Lions problem, then a brute-force solution in Fortran, and then a comparison of solutions in different programming languages, of which the fastest was a C++ version. Someone else then described some experiments and improvements in some other languages. The best answer, maybe, was the one that described using linear programming with lpsolve for an algorithmic solution that beats all the previous versions and even reduces to a simple formula that you can calculate by hand!

I had created a solution in Factor at the time, but had never written about it. Below, I want to go over that approach and compare its performance with that of the “fastest” C++ version.

We are going to store our forest state as a tuple representing the number of goats, wolves, and lions.

TUPLE: forest goats wolves lions ;

C: <forest> forest

: >forest< ( forest -- goats wolves lions )
    [ goats>> ] [ wolves>> ] [ lions>> ] tri ;

We can build words to represent each possible next state, or f if that next state is not possible:

: wolf-devours-goat ( forest -- forest/f )
    >forest< { [ pick 0 > ] [ over 0 > ] } 0&&
    [ [ 1 - ] [ 1 - ] [ 1 + ] tri* <forest> ] [ 3drop f ] if ;

: lion-devours-goat ( forest -- forest/f )
    >forest< { [ pick 0 > ] [ dup 0 > ] } 0&&
    [ [ 1 - ] [ 1 + ] [ 1 - ] tri* <forest> ] [ 3drop f ] if ;

: lion-devours-wolf ( forest -- forest/f )
    >forest< { [ dup 0 > ] [ over 0 > ] } 0&&
    [ [ 1 + ] [ 1 - ] [ 1 - ] tri* <forest> ] [ 3drop f ] if ;

We can track the set of next states by adding them to a set:

: next-forests ( set forest -- set' )
    [ wolf-devours-goat [ over adjoin ] when* ]
    [ lion-devours-goat [ over adjoin ] when* ]
    [ lion-devours-wolf [ over adjoin ] when* ] tri ;

Given a sequence of forest states, we can produce the next states after a meal has occurred:

: meal ( forests -- forests' )
    [ length 3 * <hash-set> ] keep [ next-forests ] each members ;

A forest is stable if there are no goats and either no wolves or no lions, or goats and no wolves and no lions:

: stable? ( forest -- ? )
    >forest< rot zero? [ [ zero? ] either? ] [ [ zero? ] both? ] if ;

We can say that devouring is possible if there are no stable forests:

: devouring-possible? ( forests -- ? )
    [ stable? ] none? ;

And maybe we want to be able to get all the stable forests:

: stable-forests ( forests -- stable-forests )
    [ stable? ] filter ;

So, to find our answer, we can just iterate, until we find our first stable forest state:

: find-stable-forests ( forest -- forests )
    1array [ dup devouring-possible? ] [ meal ] while stable-forests ;

And, now we can answer the original question – which is (D):

IN: scratchpad T{ forest f 17 55 6 } find-stable-forests .
{ T{ forest { goats 0 } { wolves 0 } { lions 23 } } }

We can compare the performance of Factor with the C++ version from that earlier blog post and see that ours is around 5.5x slower:

IN: scratchpad [ T{ forest f 317 355 306 } find-stable-forests ] time .
Running time: 4.625607999 seconds
{ T{ forest { goats 0 } { wolves 0 } { lions 623 } } }

versus

$ time ./magic_forest 317 355 306
0, 0, 623
./magic_forest 317 355 306  0.80s user 0.04s system 99% cpu 0.848 total

One of the reasons for that is we’re doing a lot of generic dispatch, not specifying types anywhere, and passing tuples around that we are accessing and allocating frequently. We could fix some of these issues and make ours faster…

But, approaching this as a linear programming exercise beats all the iterative approaches anyway:

IN: scratchpad T{ forest f 17 55 6 } >forest< min + .
23

IN: scratchpad T{ forest f 317 355 306 } >forest< min + .
623

IN: scratchpad T{ forest f 900006 900055 900017 } >forest< min + .
1800023
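
For anyone who wants to poke at the puzzle without Factor, the breadth-first search and the shortcut can be put side by side in a short Python sketch. The names below are mine, and the search mirrors the words above only loosely (a set per generation, stopping at the first generation that contains a stable forest). The shortcut is the same goats + min(wolves, lions) expression used in the listener examples; it agrees with the search for the original puzzle, but I have not checked that it holds for every possible starting forest.

```python
from typing import NamedTuple


class Forest(NamedTuple):
    goats: int
    wolves: int
    lions: int


def next_forests(forest):
    """Yield every forest reachable from `forest` with one meal."""
    g, w, l = forest
    if g > 0 and w > 0:  # wolf devours goat, becomes a lion
        yield Forest(g - 1, w - 1, l + 1)
    if g > 0 and l > 0:  # lion devours goat, becomes a wolf
        yield Forest(g - 1, w + 1, l - 1)
    if w > 0 and l > 0:  # lion devours wolf, becomes a goat
        yield Forest(g + 1, w - 1, l - 1)


def is_stable(forest):
    """No meal is possible: at most one species remains."""
    g, w, l = forest
    if g == 0:
        return w == 0 or l == 0
    return w == 0 and l == 0


def find_stable_forests(start):
    """Advance one meal at a time until some forest is stable."""
    forests = {start}
    while forests and not any(is_stable(f) for f in forests):
        forests = {n for f in forests for n in next_forests(f)}
    return sorted(f for f in forests if is_stable(f))


def shortcut(forest):
    """The stack expression from the post: goats + min(wolves, lions)."""
    return forest.goats + min(forest.wolves, forest.lions)


start = Forest(17, 55, 6)
stable = find_stable_forests(start)
```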

Still, there are probably performance lessons to be learned here…

Man or Boy

Wed, 26 Jun 2024 08:00:00 -0700

The man or boy test was proposed by Donald Knuth:

“There are quite a few ALGOL60 translators in existence which have been designed to handle recursion and non-local references properly, and I thought perhaps a little test-program may be of value. Hence I have written the following simple routine, which may separate the man-compilers from the boy-compilers.”

— Donald Knuth

The Rosetta Code project has a list of implementations to compare with, and today we are going to contribute one in Factor. While the original was framed in ALGOL 60, let’s look at one contributed in Python, which is probably more readable for most of the audience:

#!/usr/bin/env python
import sys
sys.setrecursionlimit(1025)

def a(k, x1, x2, x3, x4, x5):
    def b():
        b.k -= 1
        return a(b.k, b, x1, x2, x3, x4)
    b.k = k
    return x4() + x5() if b.k <= 0 else b()

x = lambda i: lambda: i
print(a(10, x(1), x(-1), x(-1), x(1), x(0)))

In particular, this is challenging because it involves creating a “tree of B call frames that refer to each other and to the containing A call frames, each of which has its own copy of k that changes every time the associated B is called.” And, as a result, many languages have difficulty calculating for large values of k due to recursion or depth limits.

In Factor, we have the ability to create inner computations by making quotations, which is essentially an anonymous version of the inner B function. These quotations are allowed to access the variables defined in their outer scope. Thus, we can implement the word reasonably simply by making a B quotation, binding it to a variable for reference, and then calling it:

:: a ( k! x1 x2 x3 x4 x5 -- n )
    k 0 <= [
        x4 call( -- n ) x5 call( -- n ) +
    ] [
        f :> b!
        [ k 1 - dup k! b x1 x2 x3 x4 a ] b!
        b call( -- n )
    ] if ;

The k argument is an integer, the others are quotations with stack effect of ( -- n ), meaning they take no arguments, and produce a number when called. We can call it and demonstrate that it works and produces the first few values of the output of Knuth’s “man or boy” test for varying k:

IN: scratchpad 13 [0..b] [
                   [ 1 ] [ -1 ] [ -1 ] [ 1 ] [ 0 ] a .
               ] each
1
0
-2
0
1
0
1
-1
-10
-30
-67
-138
-291
-642

Presently, larger values of k produce a “retain stack overflow” with the default Factor settings. I looked briefly into using our bend vocabulary, but it currently doesn’t support accessing the outer variable scope, and requires passing arguments on the stack. That would be a nice feature, and theoretically would then look like this simple example:

:: a ( k! x1 x2 x3 x4 x5 -- n )
    k 0 <= [
        x4 call( -- n ) x5 call( -- n ) +
    ] [
        BEND[ k 1 - dup k! [ fork ] x1 x2 x3 x4 a ]
    ] if ;

However, that doesn’t work at the moment. Maybe sometime in the future!

Quit

Wed, 12 Jun 2024 08:00:00 -0700

There is a funny recurring meme that we are all living inside a simulation or maybe even a matrix-in-a-matrix. Some people have even taken this as far as selling funny computer simulation t-shirts:

Well, I have resisted for a long time adding a special end program – quit – command to Factor. We have always supported Ctrl-D to end the listener session (which works in both the UI listener and in the command-line on most POSIX systems). You have also been able to exit with an error code, or even to stop after the last window has been closed. And if you really needed something, you could define your own end program word in your startup initialization file.

This came up again recently in a discussion with @nomennescio, one of our contributors, who has been pushing for a quit command that would be kind of like the Python version of How to Exit a Python Program in the Terminal:

Python 3.11.9 (main, Apr  2 2024, 08:25:04) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit
>>> quit()

This has become more important since they added saving listener history to the UI tools (currently saved to a history file when the listener window closes). This was paired with a couple of changes to make Factor attempt to close-all-windows and cleanup nicely using a shutdown hook and include the system vocabulary in the default list of interactive vocabularies.

This means that this works now, in both the command-line listener as well as the UI listener:

Factor 0.100 x86.64 (2271, heads/master-250db4215b, Jun  4 2024 18:08:03)
[Clang (GCC Homebrew Clang 18.1.6)] on macosx
IN: scratchpad quit

If you’re curious how it works, quit is an alias for calling exit with an error code of 0:

IN: system

: quit ( -- * ) 0 exit ;

This is available in a development version of Factor and will be in the next release.

Bend

Wed, 05 Jun 2024 08:00:00 -0700

The Bend programming language is a massively parallel, high-level programming language. It has some really neat dataflow concepts that make it inherently parallel because it infers the flow of variables and then naturally performs divide and conquer to utilize all available computing resources, both CPU and GPU.

This results in some impressive speedups:

Yet, since it uses a divide-and-conquer approach, which is inherently parallel, Bend will run it multi-threaded. Some benchmarks:

CPU, Apple M3 Max, 1 thread: 12.15 seconds

CPU, Apple M3 Max, 16 threads: 0.96 seconds

GPU, NVIDIA RTX 4090, 16k threads: 0.21 seconds

That’s a 57x speedup by doing nothing.

One of our current challenges in Factor is that we use green threads and are effectively single-threaded for computations. We have a non-blocking I/O implementation as well as integration with UI event loops, so it ends up feeling more concurrent than it actually is.

We hope to solve that in the future, both by supporting native threads and also by a multi-vm approach that is somewhat equivalent to the Python multiprocessing module. Some early ideas have been contributed towards this goal, but so far it is not complete.

Bend Example

One of the Bend examples that we are going to try and implement in Factor uses bend and fork operations, which create concurrency points in a kind of fork-join model:

# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      width = depth / 2
      color = shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)

Factor Syntax

A few days ago, Keldan Chapman contributed an early version of a bend vocabulary. And then, we spent some time working together on the Factor Discord server on some alternative implementations, one of which I want to go over below.

We are going to create a BEND[ quotation that provides support for a fork word that implicitly recurses (but not currently through CPU or GPU parallelism) and then joins the results together. We create an uninterned word to hold our computation and use with-words to make the fork word only valid in the parsing scope to recurse.

SYNTAX: BEND[
    gensym dup
    dup "fork" associate [ parse-quotation ] with-words
    dup infer define-declared suffix! ;

A simple use-case of this might be to compute the factorial through recursion:

: factorial ( n -- n! )
    BEND[ dup 1 > [ dup 1 - fork * ] when ] ;

Or, perhaps computing Fibonacci numbers through recursion:

: fibonacci ( n -- fib )
    BEND[ dup 1 > [ [ 1 - fork ] [ 2 - fork ] bi + ] when ] ;
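
If the parsing-word machinery is unfamiliar, the core trick (an anonymous computation that can call itself through a name that only exists inside it) can be imitated in Python with a closure. The `bend` helper below is hypothetical and purely sequential; it mirrors the two Factor examples above, including the detail that inputs of 1 or less are returned unchanged.

```python
def bend(body):
    """Return a function whose body receives `fork`, a handle to itself."""

    def fork(*args):
        # Recursing through `fork` re-enters the same anonymous body.
        return body(fork, *args)

    return fork


# Same shape as the Factor words: recurse only while n > 1.
factorial = bend(lambda fork, n: n * fork(n - 1) if n > 1 else n)
fibonacci = bend(lambda fork, n: fork(n - 1) + fork(n - 2) if n > 1 else n)
```

A parallel implementation would be free to evaluate the two `fork` calls in `fibonacci` independently, which is where the fork-join idea comes from.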

Factor Example

With that syntax defined, we can now translate the above example to the following form:

: render ( depth shader -- color )
    0 0 BEND[
        [let :> ( depth shader d i )
            d depth < [
                depth shader d 1 + i 2 * fork
                depth shader d 1 + i 2 * 1 + fork
                2array
            ] [
                i depth 2/ /mod swap shader call( x y -- color )
            ] if
        ]
    ] ;

:: demo-shader ( x y -- color )
    0 BEND[ dup 5000 < [ 1 + fork ] [ drop 0x000001 ] if ] ;

: main ( -- color )
    16 [ demo-shader ] render ;

Obviously, this fork doesn’t (currently) increase performance, and it might shadow the fork from the unix.process vocabulary, but it could represent the start of a new computational approach in Factor.

I can’t wait to see more from the Bend programming language and I also hope to see more of these ideas appearing and improving in Factor!

Interpolate Formatting

Tue, 04 Jun 2024 06:00:00 -0700

I wrote almost a decade ago about some minor improvements to the interpolate vocabulary in Factor. This vocabulary provides support for “interpolating variable values into strings” – using words such as interpolate as well as the more recently added interpolate string syntax I" ".

Recently, I added support for format directives such as those used in our formatting vocabulary. This required a change to split out a format-directive EBNF and then I could allow format directives to be used in interpolate forms.

The result is something that works a lot more like formatted string literals in Python:

IN: scratchpad USE: interpolate

IN: scratchpad 1.2345 "${:011.5f}" interpolate
00001.23450

IN: scratchpad 1 ..= 11 [| x |
                   x dup x * dup x *
                   "${:2d} ${:3d} ${:4d}\n" interpolate
               ] each
 1   1    1
 2   4    8
 3   9   27
 4  16   64
 5  25  125
 6  36  216
 7  49  343
 8  64  512
 9  81  729
10 100 1000
11 121 1331

IN: scratchpad H{ { "n" 42 } } [
                   "Hello, Worker #${n:06x}!" interpolate
               ] with-variables
Hello, Worker #00002a!
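
For comparison, these are the Python formatted string literals that the three examples above correspond to, using the same format specs. The last line shows the name= style of directive, which Python spells as a trailing `=` inside the braces.

```python
# Zero-padded float, width 11, five decimal places.
padded = f"{1.2345:011.5f}"

# One row per x: x, x squared, x cubed, right-aligned in 2/3/4 columns.
rows = [f"{x:2d} {x * x:3d} {x ** 3:4d}" for x in range(1, 12)]

# Named variable formatted as six-digit zero-padded hexadecimal.
n = 42
greeting = f"Hello, Worker #{n:06x}!"

# Self-documenting form: prints the expression text and its value.
debug = f"{n=}"
```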

Perhaps we can add some support for the name= directive, for inline Factor code rather than only stack and namespace variables, and for other features that might be useful from f-string literals and other format libraries.

This is available in a recent nightly build!

Transducers

Mon, 03 Jun 2024 08:00:00 -0700

One of the distinctive elements of Clojure that gets discussed on occasion is Clojure Transducers (not to be confused with Clojure Reducers). Specifically, transducers are a type of transformation that allows for composable algorithms using two primary concepts:

A reducing function is the kind of function you’d pass to reduce - it is a function that takes an accumulated result and a new input and returns a new accumulated result, with a reducing function signature of:

whatever, input -> whatever

A transducer (sometimes referred to as xform or xf) is a transformation from one reducing function to another, with a transducer signature of:

(whatever, input -> whatever) -> (whatever, input -> whatever)

In Clojure, the transducer is defined to have 3 different arity functions:

Init (arity 0) - should call the init arity on the nested transform rf, which will eventually call out to the transducing process.

Step (arity 2) - this is a standard reduction function but it is expected to call the rf step arity 0 or more times as appropriate in the transducer. For example, filter will choose (based on the predicate) whether to call rf or not. map will always call it exactly once. cat may call it many times depending on the inputs.

Completion (arity 1) - some processes will not end, but for those that do (like transduce), the completion arity is used to produce a final value and/or flush state. This arity must call the rf completion arity exactly once.

Similarly to Haskell’s stream fusion, this allows for a benefit of eliding intermediate sequences when multiple operations are applied. However, in some ways, they are also quite a bit more powerful.
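
To make the two signatures concrete, here is a minimal Python sketch of the Clojure-style formulation, not of the Factor design discussed in this post. It only models the step arity (no init or completion), and every name in it is my own. Each transducer takes a reducing function and returns a new one, so a filter and a map can be fused into a single pass with no intermediate list, and a `Reduced` wrapper signals an early exit.

```python
class Reduced:
    """Wrapper meaning: stop reducing now and return `value`."""

    def __init__(self, value):
        self.value = value


def mapping(fn):
    def transducer(rf):
        return lambda acc, item: rf(acc, fn(item))
    return transducer


def filtering(pred):
    def transducer(rf):
        # Skip the inner reducing function when the predicate fails.
        return lambda acc, item: rf(acc, item) if pred(item) else acc
    return transducer


def stopping_above(limit):
    def transducer(rf):
        def step(acc, item):
            result = rf(acc, item)
            return Reduced(result) if result > limit else result
        return step
    return transducer


def compose(*transducers):
    """Compose so that the first transducer sees each item first."""
    def composed(rf):
        for transducer in reversed(transducers):
            rf = transducer(rf)
        return rf
    return composed


def transduce(xform, rf, init, items):
    step = xform(rf)
    acc = init
    for item in items:
        acc = step(acc, item)
        if isinstance(acc, Reduced):
            return acc.value  # early exit
    return acc


def add(acc, item):
    return acc + item


even_squares = compose(filtering(lambda x: x % 2 == 0), mapping(lambda x: x * x))
total = transduce(even_squares, add, 0, range(10))
capped = transduce(compose(even_squares, stopping_above(50)), add, 0, range(10))
```

Here `total` sums 0, 4, 16, 36, and 64, while `capped` stops as soon as the running sum passes 50.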

I wanted to discuss one potential way to build a transducer-like thing in Factor.

Building a transducing function

Our reducing functions are any quotations with the stack effect ( prev elt -- next ), the kind used with reduce.

Our transducing functions have a similar stack effect, but with several rules:

if elt is null, skip applying the reducing function
if next is reduced, we can early exit and return it
if next is null, we keep the previous result

Note: we have a simple reduced wrapper type to indicate an early exit that returns obj:
TUPLE: reduced obj ;

We can build a word that converts a reducing function to a transducing function:

: xf ( rf: ( prev elt -- next ) -- xf )
    '[ { [ over reduced? ] [ dup null eq? ] } 0|| [ drop ] _ if ] ;

A common transducing function is a mapping operation:

: xmap ( xf quot: ( elt -- newelt ) -- xf' )
    '[ @ dup { [ reduced? ] [ null eq? ] } 1|| _ unless ] xf ;

Building transduce

That’s enough definition for us to build our (transduce) word, operating on an identity and then applying the transducing function until a result is achieved.

: (transduce) ( seq identity xf: ( prev elt -- next ) -- result )
    swapd '[
        _ keepd over null eq? [ nip f ] [ drop dup reduced? ] if
    ] find 2drop dup reduced? [ obj>> ] when ; inline

For convenience, we build a transduce word that takes a quotation. The quotation is called to compose a transducing function (using make to build up an initialization quotation) and returns a step quotation. The transduce word then expands to a block of code that runs the initialization and calls (transduce) with the null identity.

MACRO: transduce ( quot: ( xf -- xf' ) -- result )
    [ [ nip ] swap call ] [ ] make swap '[ @ null _ (transduce) ] ;

Note: currently this does not implement a completion quotation, but perhaps some future transducing function will require that and we can add it.

This word only operates on sequences, but one could imagine adding support to apply transducers to other kinds of streams of objects including infinite lazy lists.

Building transducers

Using this logic, we can build a bunch of useful transducers with a small amount of code.

A filter transducer skips certain values:

: xfilter ( xf quot: ( elt -- ? ) -- xf' )
    '[ dup @ [ drop null ] unless ] xmap ;

A sum transducer adds up a series of numbers:

: xsum ( xf -- xf' )
    [let f :> n! [ 0 n! ] % [ n + n! n ] xmap ] ;

A prettyprint transducer prints out intermediate values for debugging:

: xpprint ( xf -- xf' ) [ dup . ] xmap ;

A collector transducer collects all results into a vector:

: xcollect ( xf -- xf' )
    [let f :> v! [ V{ } clone v! ] % [ v [ push ] keep ] xmap ] ;

A histogram transducer counts values:

: xhistogram ( xf -- xf' )
    [let f :> h! [ H{ } clone h! ] % [ h [ inc-at ] keep ] xmap ] ;

A unique transducer passes through only unique values:

: xunique ( xf -- xf' )
    [let f :> s! [ HS{ } clone s! ] %
        '[ [ s ?adjoin ] keep null ? ] xmap
    ] ;

And a lot more…

Performance

Some quick performance tests will show the benefit of fusing multiple operations together. The artificial test case is a word that operates on a sequence, returning the sum of the squares of its prime numbers, counting only the squares that are bigger than 5,000.

Using normal sequence operations:

: foo ( seq -- n )
    [ prime? ] filter
    [ sq ] map
    [ 5,000 > ] filter
    sum ;

Using our new transducers:

: bar ( seq -- n )
    [
        [ prime? ] xfilter
        [ sq ] xmap
        [ 5,000 > ] xfilter
        xsum
    ] transduce ;

And then see how it performs!

IN: scratchpad 1,000,000 100 randoms
               [ [ foo ] time ]
               [ [ bar ] time ] bi assert=
Running time: 0.037218208 seconds
Running time: 0.028653417 seconds

In this case, it’s about 25% faster. Depending on your use-case and how many intermediate sequence operations the transducer is able to eliminate, it could represent a nice speedup as well as potentially a more elegant method.
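To make "eliminating intermediate sequences" concrete, here is a hand-fused sketch of the same computation as a single reduce. The word name baz is hypothetical and not part of the transducer code; it only shows the kind of one-pass loop that the transducer version is approximating.

```factor
USING: kernel math math.primes prettyprint sequences ;
IN: scratchpad

! Hypothetical hand-fused equivalent of foo and bar:
! one pass over the input, no intermediate sequences.
: baz ( seq -- n )
    0 [
        dup prime? [
            ! keep the square only if it is bigger than 5,000
            sq dup 5,000 > [ + ] [ drop ] if
        ] [ drop ] if
    ] reduce ;

! 71*71 + 73*73 = 5041 + 5329 = 10370
{ 2 3 71 73 100 } baz .   ! 10370
```

Writing this by hand is fast but does not compose; the appeal of transducers is getting something close to this loop while keeping each step separate.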

Some things we can improve upon:

provide length hints for cases where we are doing map operations
change the reduced and null logic to simplify transducer implementations
investigate using with-return to early exit instead of reduced objects
consider doing initialization differently than using local variables
add completion logic to do things like free up memory at the end of computation

One of our contributors who uses Codewars experimented with a different approach to transducer words in Factor that is worth exploring as well.

This is available on my GitHub.

Deep Clone

Tue, 28 May 2024 08:00:00 -0700

Someone on the Factor Discord server asked if we have an existing word in Factor to deep clone an object. Specifically, a word that will recursively descend an object’s tree, cloning every visited object to make sure that no reference remains in common between the deep clone and the original.

The answer is: not exactly.

In particular, we have had a clone word that creates a practical shallow clone of an object using the (clone) primitive word – which generates a “byte-by-byte copy of the given object” – and sometimes makes sure to additionally clone certain slots for separate mutation of the cloned object.

But, we also have a serialize vocabulary that could be used (or abused?) to serialize an object and then deserialize it as a kind of poor man’s deep clone:

: deep-clone ( object -- object' )
    object>bytes bytes>object ;

This works, and supports the deep cloning of most objects. There are still a few edge cases that are not a full clone, for example:

The f object is a singleton, and is the deep clone of itself.
Words are not deep cloned, but return as references to themselves.
Integers that are fixnum return as themselves.
Continuations are not supported.

But, other than that, it is a very practical deep clone. And, because it transitions through a byte-array, it is in some ways more defensive against errors introduced by a type-specific deep-clone vocabulary, at some cost in space and time. Some discussion is taking place on what a deeper clone might look like, but this is quite useful as-is.
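To see how this differs from the shallow clone, here is a small sketch that checks whether a container and its copy still share the same inner object. The helper shares-first? is a hypothetical name introduced only for this illustration, and deep-clone is defined locally exactly as above.

```factor
USING: arrays kernel prettyprint sequences serialize ;
IN: scratchpad

: deep-clone ( object -- object' )
    object>bytes bytes>object ;

! Hypothetical helper: do a container and its copy refer
! to the very same first element?
: shares-first? ( obj copy -- ? ) [ first ] bi@ eq? ;

! An array holding one mutable vector.
V{ 1 2 3 } clone 1array
[ dup clone shares-first? . ]        ! t (shallow: inner vector is shared)
[ dup deep-clone shares-first? . ]   ! f (deep: inner vector is copied)
bi
```

The deep clone still compares equal to the original with =, it just no longer shares mutable structure with it.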

This is available in the latest development version.

Compressed Images

Mon, 27 May 2024 08:00:00 -0700

A recent contribution by @nomennescio enables support for loading compressed images in Factor!

I’m not talking about graphical images, but rather about the binary image that Factor uses to load from. Specifically, the binary image includes mainly the data and code heaps as well as some special objects that are used to initialize the Factor libraries.

The compressed image support uses the image_header to communicate that a newer compressed version of the Factor binary image should be loaded, instead of an uncompressed one. We currently use the Zstandard compression method, which offers a reasonable balance of speed and compressibility.

Compressibility

The released Factor binary image containing a reasonable default list of vocabularies to be loaded is around 127 megabytes (compressed to 20 megabytes).

127M    factor.image
20M     factor.image.compressed

One of the criticisms that we have received in the past is that a load-all image that loads the over 300,000 lines of Factor code in the main Factor repository can be almost 500 megabytes. When compressed, that shrinks to 66 megabytes!

483M    factor.load-all.image
66M     factor.load-all.image.compressed

Performance

This is not without some cost: there is a small runtime delay when starting the Factor binary using a compressed image. For example, we can compare uncompressed and compressed results of loading a default image and doing nothing:

$ time ./factor -i=factor.image -e=""

real    0m0.105s
user    0m0.048s
sys     0m0.057s

$ time ./factor -i=factor.image.compressed -e=""

real    0m0.281s
user    0m0.230s
sys     0m0.050s

Or compare the results when using a load-all image:

$ time ./factor -i=factor.load-all.image -e=""

real    0m0.515s
user    0m0.258s
sys     0m0.257s

$ time ./factor -i=factor.load-all.image.compressed -e=""

real    0m1.042s
user    0m0.809s
sys     0m0.233s

That is not quite an apples-to-apples comparison: the uncompressed version uses mmap and likely does not fully cache or page everything in, whereas the compressed image is fully decompressed. However, it gives you a sense of where this feature is heading.

Deploy

If you run "hello-world" deploy you can create a relatively small deployed binary that prints Hello world when run. This can then be compressed manually, to see the difference in size (~25%) with negligible differences in runtime:

$ du -h hello-world*
1.8M    hello-world
1.3M    hello-world-compressed

$ time ./hello-world
Hello world

real    0m0.005s
user    0m0.001s
sys     0m0.004s

$ time ./hello-world-compressed
Hello world

real    0m0.005s
user    0m0.001s
sys     0m0.003s

Some additional work needs to be done to add support in the deploy tools for a checkbox to create binaries using compression. However, this already represents a big win for anyone who is more concerned about file sizes than startup latency.

Compression is currently supported using the tools.image.compressor vocabulary and uncompression using the tools.image.uncompressor vocabulary. This is a new feature and might change as it evolves, but this is a neat preview of things to come in the next release.

Give it a try!

Intersecting Ranges

Thu, 23 May 2024 12:00:00 -0700

After Factor 0.99, we have continued to add new features to what is likely to be an upcoming release. One of those that I would like to discuss today was a patch to add set methods to ranges that was contributed by Keldan Chapman. It was a really nice addition that makes our ranges support efficient set operations.

And that brings us to the topic of today’s blog post: intersecting ranges.

Specifically, a discussion occurred over code style, and I thought it would be interesting to see 3 different approaches to writing a reasonably complex intersecting ranges word in Factor.

You can see this in action, where intersect returns a range in the latest development version:

! Factor 0.99
IN: scratchpad 2 40 2 <range>
               1 40 3 <range> intersect .
V{ 4 10 16 22 28 34 40 }

! Factor 0.100 (git)
IN: scratchpad 2 40 2 <range>
               1 40 3 <range> intersect .
T{ range { from 4 } { length 7 } { step 6 } }
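As a sanity check on that result, we can compare it against a brute-force intersection that simply keeps the elements of the first range that are members of the second. This is only a sketch: brute-intersect is a hypothetical helper, and it assumes Factor 0.99 or later, where ranges live in the ranges vocabulary.

```factor
USING: arrays fry kernel prettyprint ranges sequences sets ;
IN: scratchpad

! Hypothetical helper: keep the elements of range1 that also
! appear in range2, visiting every element of range1.
: brute-intersect ( range1 range2 -- array )
    '[ _ member? ] filter >array ;

2 40 2 <range> 1 40 3 <range>
[ intersect >array . ]   ! { 4 10 16 22 28 34 40 }
[ brute-intersect . ]    ! { 4 10 16 22 28 34 40 }
2bi
```

Converting with >array makes the two results comparable whether intersect returns a vector or a range.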

Some inspiration came from looking at the Julia programming language and their intersect function.

Local Variables

The first version used local variables and after a smidge of cleanup, looked pretty good:

:: intersect-range ( range1 range2 -- range3 )
    range1 empty? range2 empty? or [ empty-range ] [
        range1 >forward-range< :> ( start1 stop1 step1 )
        range2 >forward-range< :> ( start2 stop2 step2 )
        step1 step2 gcd :> ( x g )
        start1 start2 - g /mod :> ( z y )
        y zero? not [ empty-range ] [
            start1 x z step1 * * - :> b
            step1 step2 lcm :> a
            start1 start2 [ b over - a rem + ] bi@ max :> m
            stop1  stop2  [  dup b - a rem - ] bi@ min :> n
            m n a <range>
        ] if
    ] if ;

Stack Shuffling

Due to a bug with using local definitions in a bootstrap image, we temporarily switched to a version that used stack shuffling. As you can see – and is often the case when you are dealing with a large number of items on the stack – it isn’t particularly readable.

: intersect-range ( range1 range2 -- range3 )
    2dup [ empty? ] either? [ 2drop empty-range ] [
        [ >forward-range< ] bi@ [ -rot ] [ swap ] [ ] tri* [
            [ reach reach - ] 2dip gcd swap [ /mod ] dip swap
        ] 2keep rot zero? not [
            4drop 4drop empty-range
        ] [
            [ reach ] 4dip dupd lcm [ * * - ] dip [
                [ '[ [ _ over - _ rem + ] bi@ max ] 2dip ]
                [ '[ dup _ - _ rem - ] bi@ min ] 2bi
            ] keep <range>
        ] if
    ] if ;

Small Words

When stack shuffling doesn’t help, extracting pieces of logic into words and naming them can often result in a simpler or cleaner version. It did turn out a bit easier to read, but is still overly shuffly.

: explode-ranges ( range1 range2 -- start1 start2 stop1 stop2 step1 step2 )
    [ >forward-range< ] bi@ [ -rot ] [ swap ] [ ] tri* ;

: compute-z-y-x ( start1 start2 step1 step2 -- z y x )
    gcd [ - ] 2dip swap [ /mod ] dip ;

: compute-b-a ( start1 x z step1 step2 -- b a )
    dupd lcm [ * * - ] dip ;

: intersected-range ( start1 start2 stop1 stop2 b a -- range )
    [
        [ '[ [ _ over - _ rem + ] bi@ max ] 2dip ]
        [ '[ dup _ - _ rem - ] bi@ min ] 2bi
    ] keep <range> ;

: intersect-range ( range1 range2 -- range3 )
    2dup [ empty? ] either? [ 2drop empty-range ] [
        explode-ranges [
            [ reach reach ] 2dip compute-z-y-x swap
        ] 2keep rot zero? not [
            4drop 4drop empty-range
        ] [
            [ reach ] 4dip compute-b-a intersected-range
        ] if
    ] if ;

Well, likely the first version was the nicest, but it’s an interesting exercise to think about readability, verbosity, and concatenative thinking. Sometimes you need to puzzle through a piece of code to find the one you like the best.

Argument Parser

Tue, 21 May 2024 10:00:00 -0700

Recently, some discussions on our Factor Discord server reminded me that we were missing an important feature for parsing command-line arguments: an argument parser.

An argument parser is something that parses structured command-line arguments and is often helpful in printing command-line usage information. I thought it would be useful to go over a few different ways to parse command-line arguments using Factor.

Version 0

In our most recent release, command-line parsing was kind of manual and idiosyncratic. For example, this is how arguments were parsed for the STOMP command-line interface that I built recently:

: stomp-options ( args -- )
    [
        unclip >lower {
            { [ dup { "-h" "--host" } member? ] [ unclip stomp-host set-global ] }
            { [ dup { "-p" "--port" } member? ] [ unclip string>number stomp-port set-global ] }
            { [ dup { "-u" "--username" } member? ] [ unclip stomp-username set-global ] }
            { [ dup { "-w" "--password" } member? ] [ unclip stomp-password set-global ] }
        } cond stomp-options
    ] unless-empty ;

Version 1

The Factor binary already does some amount of argument parsing, and so I thought I would extend the command-line vocabulary to provide some support for simple options parsing. Specifically, I added a command-line-options word that could be used like so:

: stomp-options ( args -- )
    command-line-options drop
    "host" get "127.0.0.1" or stomp-host set-global
    "port" get [ string>number ] [ 61613 ] if* stomp-port set-global
    "username" get stomp-username set-global
    "password" get stomp-password set-global ;

This uses our existing parameter parsing code, but stores the parsed options as string keys and string or boolean values in the namespace, doesn’t easily allow for boolean values that default to true, and doesn’t do any conversion or validation of arguments.

Version 2

I was finally able to build something better that could serve our users. It is modeled after the Python argparse module. Using the newly developed command-line.parser vocabulary, we can now define some option objects to create the original example:

{
    T{ option
        { name "--host" }
        { help "set the hostname" }
        { type ipv4 }
        { variable stomp-host }
        { default T{ ipv4 f "127.0.0.1" } }
    }
    T{ option
        { name "--port" }
        { help "set the port" }
        { type integer }
        { variable stomp-port }
        { default 61613 }
    }
    T{ option
        { name "--username" }
        { help "set the username" }
        { variable stomp-username }
    }
    T{ option
        { name "--password" }
        { help "set the password" }
        { variable stomp-password }
    }
} [
    stomp-host get .
    stomp-port get .
    stomp-username get .
    stomp-password get .
] with-options

This now allows for better default values and automatic --help output:

$ ./factor -run=stomp.cli --help
Usage:
    factor -run=stomp.cli [options] [arguments]

Options:
    --help                 show this help and exit
    --host HOST            set the hostname (default: 127.0.0.1)
    --port PORT            set the port (default: 61613)
    --username USERNAME    set the username
    --password PASSWORD    set the password

And some automatic error checking:

$ ./factor -run=stomp.cli --port
ERROR: Expected more arguments for option 'port'

$ ./factor -run=stomp.cli --port asdf
ERROR: Invalid value 'asdf' for option 'port'

$ ./factor -run=stomp.cli --prot
ERROR: Unknown option 'prot'

$ ./factor -run=stomp.cli a b c d
ERROR: Unrecognized arguments: a b c d

$ ./factor -run=stomp.cli -h
ERROR: The option 'h' matches more than one (host, help)

Additionally, this includes some features such as fuzzy matching of options, validation that all required options are provided, constant values when an option is specified, support for type coercion and type validation of option values, support for mixing optional and positional arguments, as well as specifying the number of expected arguments that an option requires.

Some things that I would still like to build include support for short option codes, exit codes when argument parsing errors occur, and perhaps support for docopt style declarations.

This is available in the development version – give it a try!

STOMP

Thu, 09 May 2024 08:00:00 -0700

Over the last few days, I implemented some support for STOMP, the “Simple Text Oriented Messaging Protocol”:

STOMP provides an interoperable wire format so that STOMP clients can communicate with any STOMP message broker to provide easy and widespread messaging interoperability among many languages, platforms and brokers.

There are three versions of the protocol available:

STOMP 1.0
STOMP 1.1
STOMP 1.2 (the latest version)

In the interest of learning Factor, I thought I would write a bit about parsing the STOMP protocol, and then about how to implement a client library using connection-oriented networking, interacting with it using mailboxes, and then building a command-line interface using the command-loop vocabulary.

There are many STOMP servers and clients available in different languages. I tried a few and decided that Apache ActiveMQ was one of the most convenient to set up and reliable to work with, but others are available as well. On my macOS laptop, this can be accomplished by:

$ brew install activemq

$ brew services start activemq

Protocol

The STOMP protocol consists of frames that are sent and received between the STOMP client and the STOMP server. Each frame consists of a command, some headers, and a body:

TUPLE: frame command headers body ;

: <frame> ( command -- frame ) LH{ } clone f frame boa ;

: set-header ( frame header-value header-name -- frame )
    pick headers>> set-at ;

An example SEND message might look like this, with the ^@ indicating a NUL byte to end the message:

SEND
destination:/queue/a

hello queue a
^@

We will begin by implementing words to read each of these sections. The command is the first line, followed by a series of name:value headers before a blank line, and then a body (specified either by a content-length header, or reading until a NUL byte is encountered):

: read-command ( -- command )
    readln ;

: read-headers ( -- headers )
    [ readln dup empty? not ] [ ":" split1 2array ] produce nip ;

: read-body ( content-length/f -- body )
    [ read read1 ] [ B{ 0 } read-until ] if* 0 assert= ;

And then implement a read-frame word that uses those to build up a frame tuple:

: read-frame ( -- frame )
    read-command
    read-headers
    dup "content-length" of string>number
    read-body frame boa ;

We can implement a write-frame word writing it out in the expected structure:

: write-frame ( frame -- )
    [ command>> print ]
    [ headers>> [ ":" swap [ write ] tri@ nl ] assoc-each nl ]
    [ body>> [ write ] when* 0 write1 ] tri flush ;
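To check the wire format without a server, one option is to write a frame into a string. The sketch below repeats the frame definitions from above so that it stands alone, and uses with-string-writer to capture the output; the destination and body are just the values from the earlier SEND example.

```factor
USING: accessors assocs io io.streams.string kernel
linked-assocs prettyprint ;
IN: scratchpad

TUPLE: frame command headers body ;

: <frame> ( command -- frame ) LH{ } clone f frame boa ;

: set-header ( frame header-value header-name -- frame )
    pick headers>> set-at ;

: write-frame ( frame -- )
    [ command>> print ]
    [ headers>> [ ":" swap [ write ] tri@ nl ] assoc-each nl ]
    [ body>> [ write ] when* 0 write1 ] tri flush ;

! Capture the serialized frame as a string instead of
! sending it over the network.
[
    "SEND" <frame>
        "hello queue a" >>body
        "/queue/a" "destination" set-header
    write-frame
] with-string-writer .
```

The captured string holds the command line, one name:value line per header, a blank line, the body, and a trailing NUL character.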

The CONNECT frame is typically the first one sent to the server:

SYMBOL: stomp-username
SYMBOL: stomp-password

: stomp-connect ( -- frame )
    "CONNECT" <frame>
        stomp-username get [ "login" set-header ] when*
        stomp-password get [ "passcode" set-header ] when* ;

The server will respond to CONNECT with a CONNECTED frame, which we can wait for:

: wait-for-connected ( -- frame )
    f [ drop read-frame dup command>> "CONNECTED" = not ] loop ;

: stomp-connect-and-wait ( -- frame )
    stomp-connect write-frame wait-for-connected ;

The SEND frame contains a body that is sent to a destination:

:: stomp-send ( destination body -- frame )
    "SEND" <frame>
        body >>body
        destination "destination" set-header ;

The SEND frame can also be used to send a file (using MIME types for automatic content encoding):

:: stomp-sendfile ( destination path -- frame )
    "SEND" <frame>
        destination "destination" set-header
        path dup mime-type
        [ mime-type-encoding file-contents >>body ]
        [ "content-type" set-header ] bi ;

The destination is a message queue, which we can SUBSCRIBE to or UNSUBSCRIBE from:

:: stomp-subscribe ( destination -- frame )
    "SUBSCRIBE" <frame>
        destination "destination" set-header ;

:: stomp-unsubscribe ( destination -- frame )
    "UNSUBSCRIBE" <frame>
        destination "destination" set-header ;

There are also words to support transactions (BEGIN, COMMIT, ABORT), indicate message receipt (ACK, NACK), as well as to DISCONNECT from the server.

Client

Using those words, we can move on to the networking component that will enable connecting to a STOMP server and interacting with it. An inner loop takes a mailbox that a client can use to enqueue frames to be sent, and a quotation that will be called with each received frame from the server.

:: stomp-loop ( mailbox quot: ( frame -- ) -- )
    stomp-connect-and-wait drop

    [ mailbox mailbox-get write-frame t ]
    "stomp writer" spawn-server drop

    [ read-frame quot call t ] loop ; inline

The client library can use this to connect and print out any received frames, for example:

"127.0.0.1" 61613 utf8 [
    <mailbox> [ [ . flush ] with-global ] stomp-loop
] with-client

Command-Line

To build a simple command-line interface, we first need to create our mailbox and store a reference to it:

CONSTANT: stomp-mailbox $[ <mailbox> ]

And then make a word to put frames into it:

: put-frame ( frame -- )
    stomp-mailbox mailbox-put ;

Using the command-loop vocabulary, we can define some supported commands:

CONSTANT: COMMANDS {
    T{ command
        { name "send" }
        { quot [ " " split1 stomp-send put-frame ] }
        { help "Send a message to a destination in the messaging system." } }
    T{ command
        { name "sendfile" }
        { quot [ " " split1 stomp-sendfile put-frame ] }
        { help "Send a file to a destination in the messaging system." } }
    T{ command
        { name "subscribe" }
        { quot [ stomp-subscribe put-frame ] }
        { help "Subscribe to a destination." } }
    T{ command
        { name "unsubscribe" }
        { quot [ stomp-unsubscribe put-frame ] }
        { help "Unsubscribe from a destination." } }
}

Before we run our command-loop, we start a thread to connect to the STOMP server, and configure it to send frames queued in the mailbox and print out the frames that are received:

INITIALIZED-SYMBOL: stomp-host [ "127.0.0.1" ]
INITIALIZED-SYMBOL: stomp-port [ 61613 ]

: start-stomp-client ( -- )
    [
        stomp-host get stomp-port get <inet4> utf8 [
            stomp-mailbox [ [ nl . flush ] with-global ] stomp-loop
        ] with-client
    ] in-thread ;

And then a simple word to start the command-loop:

: stomp-main ( -- )
    "Welcome to STOMP!" "STOMP>" <command-loop>
    COMMANDS [ over add-command ] each
    start-stomp-client run-command-loop ;

MAIN: stomp-main

And then you can try running it!

$ ./factor -run=stomp.cli
Welcome to STOMP!
STOMP> subscribe /queue/test
STOMP> send /queue/test hello world
T{ frame
    { command "MESSAGE" }
    { headers
        {
            { "expires" "0" }
            { "destination" "/queue/test" }
            { "subscription" "1" }
            { "priority" "4" }
            {
                "message-id"
                "ID\\chostname-59660-1715273573218-3\\c3\\c-1\\c1\\c1"
            }
            { "timestamp" "1715276926454" }
        }
    }
    { body "hello world" }
}

In addition to this, support was added for all the other client messages, all three versions of the STOMP protocol, automatic heartbeats, graceful disconnect using message receipts, words to make using transactions easier, as well as a debug mode to print sent and received frames from the network, and command-line options for configuring the hostname, port, username, and password.

As always, there are a few things it would be nice to add – for example: better support for SSL/TLS connections, automatic reconnect attempts with backoff algorithm, better display of connection status in the command-line, and additional protocol-level support like automatic generation of ACK and NACK, and testing with additional STOMP compliant message brokers.

This is available in the development version. Take a look!

Time My Meeting

Wed, 01 May 2024 20:00:00 -0700

Recently, I bumped into Time My Meeting, a cute website that runs a timer for how long a meeting has run and then shows you a fun comparison versus something memorable that has taken a similar amount of time.

I thought it might make a nice demo in Factor:

Our program starts with a list of things that take time and how many milliseconds they take:

CONSTANT: THINGS-THAT-TAKE-TIME {
    ! <10 seconds
    { "A single frame of a film" 100 }
    { "It would take light to go around the Earth" 133 }
    { "A blink of an eye" 400 }
    { "The time it takes light to reach Earth from the moon" 1255 }
    { "The fastest Formula 1 pit stop" 1820 }
    { "The fastest 1/4 mile drag race time" 3580 }
    { "The fastest Rubik's cube solve" 4221 }
    { "The fastest 40-yard time at the NFL Combine" 4240 }
    { "The fastest 1 liter beer chug" 4370 }
    { "A skippable Youtube ad" 5000 }
    { "A full bull ride" 8000 }
    { "The fastest 100m sprint" 9580 }

    ! 10 Seconds
    { "The Wright Brothers first flight" 12000 }
    { "The fastest 200m sprint" 19190 }
    { "The fastest 50m freestyle swim lap" 21300 }
    { "The Westminster Kennel Club dog agility record" 28440 }
    { "A typical television ad" 30000 }
    { "The fastest NASCAR lap at Daytona" 40364 }
    { "The fastest 400m sprint" 43030 }
    { "The fastest NASCAR lap at Talladega" 44270 }
    { "The fastest 100m freestyle swim lap" 47050 }
}

We need a small word to turn those milliseconds into a useful string:

: human-time ( milliseconds -- string )
    1000 / dup 60 <
    [ "%.1f seconds" sprintf ]
    [ seconds duration>human-readable ] if ;

We may also need to know what the next thing that takes time will be, based on the total elapsed time:

: next-thing-that-takes-time ( elapsed-millis -- elt )
    THINGS-THAT-TAKE-TIME [ second < ] with find nip ;
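A quick sketch of how that lookup behaves, using a shortened table so the example stands alone. SAMPLE-THINGS and next-sample-thing are hypothetical names for this illustration; the logic is the same find as above.

```factor
USING: kernel math prettyprint sequences ;
IN: scratchpad

! A shortened, illustrative copy of the table.
CONSTANT: SAMPLE-THINGS {
    { "A single frame of a film" 100 }
    { "A blink of an eye" 400 }
    { "The fastest Formula 1 pit stop" 1820 }
}

! Find the first entry whose time is greater than the elapsed time.
: next-sample-thing ( elapsed-millis -- elt/f )
    SAMPLE-THINGS [ second < ] with find nip ;

250 next-sample-thing .    ! { "A blink of an eye" 400 }
5000 next-sample-thing .   ! f
```

Note that once the elapsed time passes the last entry, the lookup returns f, so a caller that destructures the result needs to account for that case.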

Command-Line

First, we are going to make a simple word to run this on the command-line: it iterates through the things that take time, sleeps until each one has elapsed, and prints each one out as it passes:

: time-my-meeting. ( -- )
    now THINGS-THAT-TAKE-TIME [
        [ milliseconds pick time+ sleep-until ]
        [ human-time "%s (%s)\n" printf flush ] bi
    ] assoc-each drop ;

You can run it and get something like this:

IN: scratchpad time-my-meeting.
A single frame of a film (0.1 seconds)
It would take light to go around the Earth (0.1 seconds)
A blink of an eye (0.4 seconds)
The time it takes light to reach Earth from the moon (1.3 seconds)
The fastest Formula 1 pit stop (1.8 seconds)
The fastest 1/4 mile drag race time (3.6 seconds)
The fastest Rubik's cube solve (4.2 seconds)
The fastest 40-yard time at the NFL Combine (4.2 seconds)
The fastest 1 liter beer chug (4.4 seconds)
A skippable Youtube ad (5.0 seconds)
A full bull ride (8.0 seconds)
The fastest 100m sprint (9.6 seconds)
The Wright Brothers first flight (12.0 seconds)
...

User Interface

We are also going to build the interface shown above, starting with a gadget that stores a timer, a total elapsed time in milliseconds, and a meeting start timestamp.

TUPLE: meeting-gadget < track timer total start ;

There are different strategies for building user interfaces, depending on the data model, and how composable or how separate the elements being displayed are from each other.

For the purposes of this tutorial, I want to demonstrate one strategy below that uses local variables to bind the elements to each other, allowing them to be updated in a kind of reactive manner. It is a long word, but the structure of the code roughly matches the rendered output that we are going for.

:: <meeting-gadget> ( -- gadget )
    vertical meeting-gadget new-track dup :> meeting
        COLOR: #f7f08b <solid> >>interior
        0 >>total

        "" <label> :> current-text
        "" <label> :> current-time

        "" <label> :> total-time
        "" <label> :> start-time

        THINGS-THAT-TAKE-TIME first first2 human-time
        [ <label> ] bi@ :> ( next-text next-time )

        [
            meeting total>>
            meeting [ now dup ] change-start drop swap time- duration>milliseconds +
            dup meeting total<<

            dup next-thing-that-takes-time first2
            over next-text string>> = [ 2drop ] [
                next-text string>> current-text string<<
                next-time string>> current-time string<<
                human-time
                next-time string<<
                next-text string<<
            ] if

            human-time total-time string<<
        ] f 100 milliseconds <timer> >>timer

        vertical <track>
            current-text f track-add
            current-time f track-add
        "This meeting is longer than..." <labeled-gadget> f track-add

        vertical <track>
            total-time f track-add
            start-time f track-add
        "It has been going on for..." <labeled-gadget> f track-add

        vertical <track>
            next-text f track-add
            next-time f track-add
        "The next milestone is..." <labeled-gadget> f track-add

        "Start" <label> :> start-label
        "Reset" <label> :> reset-label

        horizontal <track>
            start-label [
                drop
                meeting
                dup start>> [
                    0 >>total now timestamp>hms
                    "Started at " prepend start-time string<<
                ] unless
                now >>start
                timer>> dup thread>>
                [ stop-timer "Resume" start-label string<< ]
                [ start-timer "Pause" start-label string<< ] if
            ] <border-button> f track-add

            reset-label [
                drop
                meeting 0 >>total f >>start timer>> stop-timer
                "Start" start-label string<<
                "" current-text string<<
                "" current-time string<<
                "" total-time string<<
                "" start-time string<<
                THINGS-THAT-TAKE-TIME first first2 human-time
                next-time string<<
                next-text string<<
            ] <border-button> f track-add
        f track-add ;

And then a main entry point to open a window when the vocabulary is run:

MAIN-WINDOW: time-my-meeting
    { { title "Time My Meeting" } }
    <meeting-gadget> >>gadgets ;

With a smidge of improved fonts and better gadget spacing, this is now available on my GitHub.

You can try it out!

Factor Language Tutorial

Sun, 28 Apr 2024 09:00:00 -0700

A few days ago, one of our Factor Discord server members posted a video tutorial that they made for Factor. It is a pretty neat hour-long introduction going over a lot of features that users new to the language might be interested in:

This is an introductory tutorial for a stack-based (concatenative) programming language Factor. It covers some basic language constructs and a few features of the interactive development environment that is shipped with Factor.

You can watch it here:

Reverse Vowels

Mon, 12 Feb 2024 12:00:00 -0700

Our task today is to “reverse vowels of a string”. This sounds like (and probably is) a coding interview question as well as a LeetCode problem, a Codewars kata, and the second task in the Perl Weekly Challenge #254.

If you don’t want spoilers, maybe stop reading here!

We are going to use Factor to solve this problem as well as a variant that is a bit more challenging.

Let’s Reverse The Vowels

One of the benefits of the monorepo approach that we have taken to building the extensive Factor standard library is developing higher-level words that solve specific kinds of tasks.

One of those is arg-where – currently in the miscellaneous sequences.extras vocabulary – which we can use to find all the indices in a string that contain a vowel?:

IN: scratchpad "hello" [ vowel? ] arg-where .
V{ 1 4 }

We’ll want to group the beginning and ending indices, ignoring the middle index if the number of indices is odd since it would not change:

: split-indices ( indices -- head tail )
    dup length 2/ [ head-slice ] [ tail-slice* ] 2bi ;

We can then build a word to reverse specified indices:

: reverse-indices ( str indices -- str )
    split-indices <reversed> [ pick exchange ] 2each ;

And then use it to reverse the vowels:

: reverse-vowels ( str -- str )
    dup >lower [ vowel? ] arg-where reverse-indices ;

And see how it works:

IN: scratchpad "factor" reverse-vowels .
"foctar"

IN: scratchpad "concatenative" reverse-vowels .
"cencitanetavo"

Pretty cool!
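
As a cross-check outside of Factor, the same pairing idea can be sketched in Python. This is only an illustration that assumes plain ASCII vowels; the name reverse_vowels is made up for this sketch and is not part of any library:

```python
VOWELS = frozenset("aeiou")


def reverse_vowels(text: str) -> str:
    """Reverse the vowels of text, leaving every other character in place."""
    chars = list(text)
    # Indices of the vowels, compared case-insensitively.
    indices = [i for i, ch in enumerate(chars) if ch.lower() in VOWELS]
    # Pair the first half with the reversed second half. With an odd count
    # the middle index is skipped, since it would only swap with itself.
    half = len(indices) // 2
    head = indices[:half]
    tail = indices[len(indices) - half:]
    for i, j in zip(head, reversed(tail)):
        chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


print(reverse_vowels("factor"))         # foctar
print(reverse_vowels("concatenative"))  # cencitanetavo
```

The head and tail slices play the role of split-indices, and the swap inside the loop stands in for exchange.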

Let’s Reverse The Vowels, Maintain The Case

A somewhat more challenging task is to reverse the vowels while also swapping their letter case, so that each position keeps the case it started with.

Let’s start by building a word to swap the case of two letters:

: swap-case ( a b -- a' b' )
    2dup [ letter? ] bi@ 2array {
        { { t f } [ [ ch>upper ] [ ch>lower ] bi* ] }
        { { f t } [ [ ch>lower ] [ ch>upper ] bi* ] }
        [ drop ]
    } case ;

And then another word to exchange two indices, but also swap their case:

: exchange-case ( i j seq -- )
    [ '[ _ nth ] bi@ swap-case ]
    [ '[ _ set-nth ] bi@ ] 3bi ; inline

A word to reverse the indices, but also swap their case:

: reverse-indices-case ( str indices -- str )
    split-indices <reversed> [ pick exchange-case ] 2each ;

And, finally, a word to reverse the vowels, but also swap their case:

: reverse-vowels-case ( str -- str )
    dup >lower [ vowel? ] arg-where reverse-indices-case ;

And then see how it works:

IN: scratchpad "FActor" reverse-vowels-case .
"FOctar"

A pretty fun problem!
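
For comparison, here is a small Python sketch of the case-preserving variant. It takes a slightly different route than the Factor words above: instead of swapping pairs, it reads the vowels in reverse order and re-cases each one to match the position it lands in. The function name is hypothetical, and only ASCII vowels are assumed:

```python
VOWELS = frozenset("aeiou")


def reverse_vowels_keep_case(text: str) -> str:
    """Reverse the vowels, but keep the upper/lower case of each position."""
    chars = list(text)
    indices = [i for i, ch in enumerate(chars) if ch.lower() in VOWELS]
    # Collect the vowels in reverse order before anything is overwritten.
    reversed_vowels = [chars[i] for i in reversed(indices)]
    for i, vowel in zip(indices, reversed_vowels):
        # chars[i] still holds the original letter here, so its case
        # tells us how the incoming vowel should be written.
        chars[i] = vowel.upper() if chars[i].isupper() else vowel.lower()
    return "".join(chars)


print(reverse_vowels_keep_case("FActor"))  # FOctar
```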

Dragonbox

Sun, 11 Feb 2024 19:30:00 -0700

One of the challenging problems in computer science is to efficiently take a binary representation of a floating-point number and convert it to the “shortest decimal representation” that will roundtrip back to the same floating-point number when it is parsed.

A few days ago, one of the members of the Factor Discord server posted about an issue they were having where three separate floating-point numbers printed as the same decimal value:

IN: scratchpad 0x1.1ffffffffffffp7 .
144.0

IN: scratchpad 0x1.2p7 .
144.0

IN: scratchpad 0x1.2000000000001p7 .
144.0

Well, that’s not ideal!

And you can see that in other languages like Python, they parse properly into three distinct values:

>>> float.fromhex('0x1.1ffffffffffffp7')
143.99999999999997

>>> float.fromhex('0x1.2p7')
144.0

>>> float.fromhex('0x1.2000000000001p7')
144.00000000000003

In the process of investigating this issue, I re-discovered a few algorithms that have been developed to do this. There is a neat project called Drachennest that investigates the relative performance of several of these algorithms and claims:

Grisu3, Ryu, Schubfach, and Dragonbox are optimal, i.e. the output string

rounds back to the input number when read in,

is as short as possible,

is as close to the input number as possible.
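
To make the "shortest representation that round-trips" idea concrete, here is a brute-force Python sketch. It is emphatically not Dragonbox or any of the algorithms listed above: it simply tries 1 to 17 significant digits and keeps the first string that parses back to the same float. The helper name shortest_roundtrip is an assumption made for this example:

```python
def shortest_roundtrip(x: float) -> str:
    """Return the shortest %g-style decimal string that parses back to x."""
    # 17 significant digits are always enough for an IEEE 754 double.
    for digits in range(1, 18):
        text = "%.*g" % (digits, x)
        if float(text) == x:
            return text
    raise ValueError("no round-tripping representation found")


below = float.fromhex("0x1.1ffffffffffffp7")
middle = float.fromhex("0x1.2p7")
above = float.fromhex("0x1.2000000000001p7")

for value in (below, middle, above):
    print(shortest_roundtrip(value))
```

The real algorithms reach the same kind of answer without the trial-and-error loop, which is where their speed comes from.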

Well, it turns out that the “Dragonbox” algorithm is one of the current best. It is described in a paper called A New Floating-Point Binary-to-Decimal Conversion Algorithm and has a fantastic reference implementation in C++.

I was able to quickly fix the bug by temporarily using a “modern formatting library” called {fmt} that works in C++11 and provides a version of the C++20 function std::format, but thought it would be a good idea to implement the Dragonbox algorithm someday in pure Factor code and filed an issue to track that idea.

Well, one of our awesome contributors, Giftpflanze, jumped in and implemented Dragonbox in Factor – providing a very readable and understandable and nicely concatenative version – and it was merged today!

Not only does this solve the issue of decimal representation of floats, but it provides quite a large speedup to our float-parsing benchmark:

Currently in Factor 0.99:

IN: scratchpad gc [ parse-float-benchmark ] time
Running time: 3.181906583 seconds

And now after the patch:

IN: scratchpad gc [ parse-float-benchmark ] time
Running time: 0.378132792 seconds

Very impressive!

I’m excited to say that this is now available in the development version of Factor.

Divmods

Fri, 02 Feb 2024 08:00:00 -0700

There’s a discussion in the Python community about supporting multiple divisors in divmod().

So instead of

minutes, seconds = divmod(t, 60)
hours, minutes = divmod(minutes, 60)
days, hours = divmod(hours, 24)
weeks, days = divmod(days, 7)

you could write:

weeks, days, hours, minutes, seconds = divmod(t, 7, 24, 60, 60)

Sample implementation:

def new_divmod(dividend, *divisors):
    if not divisors:
        raise TypeError('required at least one divisor')
    remainders = []
    for divisor in reversed(divisors):
        dividend, remainder = old_divmod(dividend, divisor)
        remainders.append(remainder)
    return (dividend, *remainders[::-1])

Along with the sample implementation in Python above, the original author provides some thoughts on whether the order of arguments should be reversed or not, and some of the comments in the thread discuss various implementation details and some other use-cases for this approach.

You can see how it might work by trying the code:

>>> new_divmod(1234567, 7, 24, 60, 60)
(2, 0, 6, 56, 7)
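
The sample above refers to old_divmod, which is not defined in the snippet. A self-contained version that you can paste into a Python session, assuming old_divmod is just the built-in divmod, might look like this:

```python
def new_divmod(dividend, *divisors):
    """divmod() with several divisors, most significant unit first."""
    if not divisors:
        raise TypeError("required at least one divisor")
    remainders = []
    # Work from the least significant divisor (seconds) upwards.
    for divisor in reversed(divisors):
        dividend, remainder = divmod(dividend, divisor)
        remainders.append(remainder)
    # What is left of the dividend is the most significant value.
    return (dividend, *remainders[::-1])


weeks, days, hours, minutes, seconds = new_divmod(1234567, 7, 24, 60, 60)
print(weeks, days, hours, minutes, seconds)  # 2 0 6 56 7
```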

Okay, so how might we do this in Factor?

Well, our version of divmod is /mod and we could just run it a few times to get the result:

IN: scratchpad 1234567 60 /mod swap 60 /mod swap 24 /mod swap 7 /mod swap

--- Data stack:
7
56
6
0
2

Alternatively, we could pass the arguments as a sequence and return the result as a sequence:

IN: scratchpad 1234567 { 60 60 24 7 } [ /mod ] map swap suffix

--- Data stack:
{ 7 56 6 0 2 }

Or, perhaps, we could make a macro, taking the input argument as a sequence, but generating code to put the result onto the stack:

MACRO: /mods ( seq -- quot )
    [ '[ _ /mod swap ] ] map concat ;

And then use it:

IN: scratchpad 1234567 { 60 60 24 7 } /mods

--- Data stack:
7
56
6
0
2

Kind of an interesting idea!

Crontab

Wed, 31 Jan 2024 08:00:00 -0700

Cron might be the latest, greatest, and coolest “next-generation calendar”, and is now a product called Notion Calendar. But in the good ol’ days, cron was instead known as:

The cron command-line utility is a job scheduler on Unix-like operating systems. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed times, dates, or intervals. It typically automates system maintenance or administration—though its general-purpose nature makes it useful for things like downloading files from the Internet and downloading email at regular intervals.

There are implementations of crond – the cron daemon – on most operating systems. Many of them have standardized on a crontab format that looks something like this:

# ┌───────────── minute (0–59)
# │ ┌───────────── hour (0–23)
# │ │ ┌───────────── day of the month (1–31)
# │ │ │ ┌───────────── month (1–12)
# │ │ │ │ ┌───────────── day of the week (0–6) (Sunday to Saturday;
# │ │ │ │ │                                   7 is also Sunday on some systems)
# │ │ │ │ │
# │ │ │ │ │
# * * * * * <command to execute>

At first (and sometimes second and third and fourth) glance, this looks a bit inscrutable, and so websites such as crontab guru pop up to help you unpack and explain when a cronentry is expected to be run.

I thought it would be fun to build a parser for these cronentries in Factor.

Let’s start by defining a cronentry type:

TUPLE: cronentry minutes hours days months days-of-week command ;

For each component, there is a variety of allowed inputs:

all values in the range: *
list of values: 3,5,7
range of values: 10-15
step values: 1-20/5
random value in range: 10~30

We build a parse-value word that will take an input string, a quot to parse the input, and a seq of possible values, as well as a parse-range word to help with optional starting and ending input values.

:: parse-range ( from/f to/f quot: ( input -- value ) seq -- from to )
    from/f [ seq first ] quot if-empty
    to/f [ seq last ] quot if-empty ; inline

:: parse-value ( input quot: ( input -- value ) seq -- value )
    input {
        { [ dup "*" = ] [ drop seq ] }

        { [ CHAR: , over member? ] [
            "," split [ quot seq parse-value ] map concat ] }

        { [ CHAR: / over member? ] [
            "/" split1 [
                quot seq parse-value dup length 1 =
                [ seq swap first seq index seq length ]
                [ 0 over length ] if 1 -
            ] dip string>number <range> swap nths ] }

        { [ CHAR: - over member? ] [
            "-" split1 quot seq parse-range [a..b] ] }

        { [ CHAR: ~ over member? ] [
            "~" split1 quot seq parse-range [a..b] random 1array ] }

        [ quot call 1array ]
    } cond members sort ; inline recursive

We can then make parse-cronentry to parse the entry description, handling days and months differently to allow their abbreviations to be passed as input (e.g., sun for Sunday or jan for January).

: parse-day ( str -- n )
    [ string>number dup 7 = [ drop 0 ] when ] [
        >lower $[ day-abbreviations3 [ >lower ] map ] index
    ] ?unless ;

: parse-month ( str -- n )
    [ string>number ] [
        >lower $[ month-abbreviations [ >lower ] map ] index
    ] ?unless ;

: parse-cronentry ( entry -- cronentry )
    " " split1 " " split1 " " split1 " " split1 " " split1 {
        [ [ string>number ] T{ range f 0 60 1 } parse-value ]
        [ [ string>number ] T{ range f 0 24 1 } parse-value ]
        [ [ string>number ] T{ range f 1 31 1 } parse-value ]
        [ [ parse-month ] T{ range f 1 12 1 } parse-value ]
        [ [ parse-day ] T{ circular f T{ range f 0 7 1 } 1 } parse-value ]
        [ ]
    } spread cronentry boa ;

We can try using it to see what a parsed cronentry looks like:

IN: scratchpad "20-30/5 5 */5 * * /path/to/command" parse-cronentry .
T{ cronentry
    { minutes { 20 25 30 } }
    { hours { 5 } }
    { days { 1 6 11 16 21 26 31 } }
    { months { 1 2 3 4 5 6 7 8 9 10 11 12 } }
    { days-of-week { 0 1 2 3 4 5 6 } }
    { command "/path/to/command" }
}
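
If the Factor above is hard to follow, the field expansion can be sketched in Python as well. This is a simplified illustration, not a port: it only handles numeric values (no sun or jan abbreviations), and parse_field is a name invented for this example:

```python
import random


def parse_field(text, values):
    """Expand one crontab field into a sorted list of allowed values."""
    values = list(values)
    if text == "*":
        result = values
    elif "," in text:
        # A list: expand every part and merge the results.
        result = [v for part in text.split(",") for v in parse_field(part, values)]
    elif "/" in text:
        base, step = text.split("/", 1)
        selected = parse_field(base, values)
        if len(selected) == 1:
            # A single start value means "from here to the end of the range".
            selected = values[values.index(selected[0]):]
        result = selected[::int(step)]
    elif "-" in text:
        low, high = text.split("-", 1)
        first = int(low) if low else values[0]
        last = int(high) if high else values[-1]
        result = list(range(first, last + 1))
    elif "~" in text:
        low, high = text.split("~", 1)
        first = int(low) if low else values[0]
        last = int(high) if high else values[-1]
        result = [random.randint(first, last)]
    else:
        result = [int(text)]
    return sorted(set(result))


print(parse_field("20-30/5", range(60)))  # [20, 25, 30]
print(parse_field("*/5", range(1, 32)))   # [1, 6, 11, 16, 21, 26, 31]
```

The order of the checks matters in the same way as in the cond above: lists are split first, then steps, then ranges, so that an input like 1-20/5 is treated as a stepped range rather than a plain one.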

Now that we have that working, we can use it to calculate the next-time-after a given timestamp that the cronentry will trigger, applying a waterfall to roll over the timestamp until a valid one is found:

:: (next-time-after) ( cronentry timestamp -- )

    f ! should we keep searching for a matching time

    timestamp month>> :> month
    cronentry months>> [ month >= ] find nip
    dup month = [ drop ] [
        [ cronentry months>> first timestamp 1 +year drop ] unless*
        timestamp 1 >>day 0 >>hour 0 >>minute month<< drop t
    ] if

    timestamp day-of-week :> weekday
    cronentry days-of-week>> [ weekday >= ] find nip [
        cronentry days-of-week>> first 7 +
    ] unless* weekday - :> days-to-weekday

    timestamp day>> :> day
    cronentry days>> [ day >= ] find nip [
        cronentry days>> first timestamp days-in-month +
    ] unless* day - :> days-to-day

    cronentry days-of-week>> length 7 =
    cronentry days>> length 31 = 2array
    {
        { { f t } [ days-to-weekday ] }
        { { t f } [ days-to-day ] }
        [ drop days-to-weekday days-to-day min ]
    } case [
        timestamp 0 >>hour 0 >>minute swap +day 2drop t
    ] unless-zero

    timestamp hour>> :> hour
    cronentry hours>> [ hour >= ] find nip
    dup hour = [ drop ] [
        [ cronentry hours>> first timestamp 1 +day drop ] unless*
        timestamp 0 >>minute hour<< drop t
    ] if

    timestamp minute>> :> minute
    cronentry minutes>> [ minute >= ] find nip
    dup minute = [ drop ] [
        [ cronentry minutes>> first timestamp 1 +hour drop ] unless*
        timestamp minute<< drop t
    ] if

    [ cronentry timestamp (next-time-after) ] when ;

: next-time-after ( cronentry timestamp -- timestamp )
    [ dup cronentry? [ parse-cronentry ] unless ]
    [ 1 minutes time+ 0 >>second ] bi*
    [ (next-time-after) ] keep ;

This is great, because we can find the next time that a cronentry will trigger. For example, if we wanted to specify something to trigger at midnight on every leap day:

IN: scratchpad "0 0 29 2 *" now next-time-after timestamp>rfc822 .
"Thu, 29 Feb 2024 00:00:00 -0800"

Or even, the next several times that the cronentry will trigger:

IN: scratchpad "0 0 29 2 *" now 5 [
                   dupd next-time-after [ timestamp>rfc822 . ] keep
               ] times 2drop
"Thu, 29 Feb 2024 00:00:00 -0800"
"Tue, 29 Feb 2028 00:00:00 -0800"
"Sun, 29 Feb 2032 00:00:00 -0800"
"Fri, 29 Feb 2036 00:00:00 -0800"
"Wed, 29 Feb 2040 00:00:00 -0800"

This is available in the crontab vocabulary including some features such as support for aliases (e.g., @daily and @weekly) and some higher-level words for working with crontabs and cronentries.
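
For readers who want to check the leap-day results independently, here is a rough Python sketch of the same rollover idea. It is deliberately simpler than the Factor version: it advances one day, hour, or minute at a time instead of jumping straight to the next candidate, it takes already-expanded sets rather than a crontab string, and all names are made up for this example:

```python
from datetime import datetime, timedelta


def day_matches(t, days, weekdays):
    """Apply the day-of-month / day-of-week rule to a datetime."""
    day_ok = t.day in days
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    weekday_ok = (t.weekday() + 1) % 7 in weekdays
    if len(weekdays) == 7:   # day-of-week is unrestricted
        return day_ok
    if len(days) == 31:      # day-of-month is unrestricted
        return weekday_ok
    return day_ok or weekday_ok


def next_time_after(entry, start):
    """Return the first matching minute strictly after start.

    Loops forever if the entry can never match (for example, February 30).
    """
    minutes, hours, days, months, weekdays = entry
    t = start.replace(second=0, microsecond=0) + timedelta(minutes=1)
    while True:
        if t.month not in months or not day_matches(t, days, weekdays):
            t = (t + timedelta(days=1)).replace(hour=0, minute=0)
        elif t.hour not in hours:
            t = (t + timedelta(hours=1)).replace(minute=0)
        elif t.minute not in minutes:
            t += timedelta(minutes=1)
        else:
            return t


# "0 0 29 2 *" expanded by hand into sets
leap_day = ({0}, {0}, {29}, {2}, set(range(7)))
first = next_time_after(leap_day, datetime(2024, 1, 31, 8, 15))
print(first)                              # 2024-02-29 00:00:00
print(next_time_after(leap_day, first))   # 2028-02-29 00:00:00
```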

Codewars

Mon, 29 Jan 2024 08:00:00 -0700

Codewars is an online platform for learning programming languages by solving small programming exercises called “kata” and subsequently increasing your degree of proficiency via levels of “kyu”. It has useful features such as extensive unit tests, leaderboards, allies for allowing friendly competition, and discussion boards.

It supports an incredible number of programming languages – albeit some of these are in “beta” status – including Factor!

I wanted to draw attention to the Codewars website and point out that it has newly released support for Factor 0.99 due to great community support and some work on the Codewars test vocabulary that was developed specifically for use with the Codewars system.

It’s pretty fun to complete the katas and then see the solutions that other users have contributed.

Give it a try!

Special Numbers

Mon, 15 Jan 2024 08:00:00 -0700

Lots of numbers are special in various definitions of specialness. This often forms the basis of different programming challenges. In the case of the most recent Perl Weekly Challenge #252, the problem statement declares that a number is “special” in this way:

You are given an array of integers, @ints.

Write a script to find the sum of the squares of all special elements of the given array.

An element $int[i] of @ints is called special if i divides n, i.e. n % i == 0, where n is the length of the given array. Also the array is 1-indexed for the task.

And it gives two examples, which we can use as test cases later when we solve this in Factor.

Spoiler Alert: The deadline for this weekly challenge is a few days from now (on January 21, 2024 at 23:59). This blog post provides some solutions to this challenge. Please don’t read on if you intend to complete the challenge on your own.

Solution

Let’s find the special indices – which are just the divisors of the length of the input sequence – and then take the elements at those special indices:

: special-numbers ( ints -- ints' )
    [ length divisors 1 v-n ] [ nths ] bi ;

And so, we can solve this problem for both provided examples:

{ 21 } [ { 1 2 3 4 } special-numbers sum-of-squares ] unit-test

{ 63 } [ { 2 7 1 19 18 3 } special-numbers sum-of-squares ] unit-test
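
The same two examples can be double-checked with a few lines of Python. This is just a sketch for comparison, with function names chosen for this example; it spells out the 1-indexed divisor test directly instead of using a divisors helper:

```python
def special_numbers(ints):
    """Elements whose 1-based index divides the length of the list."""
    n = len(ints)
    return [ints[i - 1] for i in range(1, n + 1) if n % i == 0]


def sum_of_squares(ints):
    return sum(x * x for x in ints)


print(sum_of_squares(special_numbers([1, 2, 3, 4])))          # 21
print(sum_of_squares(special_numbers([2, 7, 1, 19, 18, 3])))  # 63
```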

And a “script”, if we wanted to take input from the command-line, as requested:

MAIN: [
    [ readln ] [
        split-words harvest [ string>number ] map
        dup special-numbers sum-of-squares
        "%u => %u\n" printf
    ] while*
]

Building Hangman

Tue, 26 Dec 2023 08:00:00 -0700

Recently, Jon Fincher published an interesting Python tutorial describing steps to build a hangman game for the command line in Python. It provides for a nice demo of different programming language features including taking user input, printing to the screen, storing game state, and performing some logic until the game is completed.

A game in progress might look like this – with hangman as the word being guessed:


      ┌───┐
      │   │
      O   │
     ─┼─  │
    / │   │
      │   │
          │
          │
          │
          │
   └──────┘

Your word is: _ a n _ _ a n
Your guesses: a e r s n

I thought it would be fun to show how to build a similar hangman game in Factor, using similar steps.

Step 1: Set Up the Hangman Project

We need to create the hangman vocabulary to store our work. We can use the scaffold-vocab word to create a new vocabulary. It will prompt for which vocab-root to place the new vocabulary into.

IN: scratchpad USE: tools.scaffold

IN: scratchpad "hangman" scaffold-vocab

And then open the vocab in your favorite text editor:

IN: scratchpad "hangman" edit-vocab

We can start the file with all these imports, which we will be using in the implementation below, and two symbols that we will use to hold the state of our game.

USING: combinators.short-circuit io io.encodings.utf8 io.files
kernel make math multiline namespaces random sequences
sequences.interleaved sets sorting unicode ;

IN: hangman

SYMBOL: target-word  ! the word being guessed
SYMBOL: guesses      ! all of the guessed letters

Step 2: Select a Word to Guess

Let’s create a vocab:hangman/words.txt file containing all of the possible word choices. The original tutorial had a list of words that you can reference – you are welcome to copy my file or create your own:

IN: scratchpad USE: http.client

IN: scratchpad "https://raw.githubusercontent.com/mrjbq7/re-factor/master/hangman/words.txt"
               "vocab:hangman/words.txt" download-to

Now we can add this word to read the file into memory and then choose a random line.

: random-word ( -- word )
    "vocab:hangman/words.txt" utf8 file-lines random ;

Step 3: Get and Validate the Player’s Input

We will use the readln word to read the player’s input, and we will validate it by making sure the line contains a single lowercase character that is not already in our guesses:

: valid-guess? ( input -- ? )
    {
        [ length 1 = ]
        [ lower? ]
        [ first guesses get ?adjoin ]
    } 1&& ;

Reading the player input is then just looping until we get a valid guess:

: player-guess ( -- ch )
    f [ dup valid-guess? ] [ drop readln ] do until first ;

Step 4: Display the Guessed Letters and Word

We can display the guessed letters as a sorted, space-separated list:

: spaces ( str -- str' )
    CHAR: \s <interleaved> ;

: guessed-letters ( -- str )
    guesses get members sort spaces ;

And the target word is also space-separated with blanks for letters we have not guessed, or the actual letters if they have been guessed successfully:

: guessed-word ( -- str )
    target-word get guesses get '[
        dup _ in? [ drop CHAR: _ ] unless
    ] map spaces ;
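
For comparison with the original Python tutorial, the two display helpers might look roughly like this in Python. The function names mirror the Factor words but are otherwise my own, and the state is passed in as arguments rather than held in dynamic variables:

```python
def guessed_letters(guesses):
    """All guessed letters as a sorted, space-separated string."""
    return " ".join(sorted(guesses))


def guessed_word(target, guesses):
    """The target word with unguessed letters shown as underscores."""
    return " ".join(ch if ch in guesses else "_" for ch in target)


guesses = set("aersn")
print("Your word is:", guessed_word("hangman", guesses))  # _ a n _ _ a n
print("Your guesses:", guessed_letters(guesses))          # a e n r s
```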

Step 5: Draw the Hanged Man

We first calculate the number of wrong guesses, by set difference between the guesses and our target word:

: #wrong-guesses ( -- n )
    guesses get target-word get diff cardinality ;

Displaying the “hanged man” requires a few more lines of code than the rest of the program, using the number of wrong guesses to pick which one to output:

CONSTANT: HANGED-MAN {
[[
      ┌───┐
      │   │
          │
          │
          │
          │
          │
          │
          │
          │
   └──────┘
]] [[
      ┌───┐
      │   │
      O   │
          │
          │
          │
          │
          │
          │
          │
   └──────┘
]] [[
      ┌───┐
      │   │
      O   │
     ─┼─  │
      │   │
      │   │
          │
          │
          │
          │
   └──────┘
]] [[
      ┌───┐
      │   │
      O   │
     ─┼─  │
    / │   │
      │   │
          │
          │
          │
          │
   └──────┘
]] [[
      ┌───┐
      │   │
      O   │
     ─┼─  │
    / │ \ │
      │   │
          │
          │
          │
          │
   └──────┘
]] [[
      ┌───┐
      │   │
      O   │
     ─┼─  │
    / │ \ │
      │   │
     ─┴─  │
    /     │
    │     │
          │
   └──────┘
]] [[
      ┌───┐
      │   │
      O   │
     ─┼─  │
    / │ \ │
      │   │
     ─┴─  │
    /   \ │
    │   │ │
          │
   └──────┘
]]
}

: hanged-man. ( -- )
    #wrong-guesses HANGED-MAN nth print ;

Step 6: Figure Out When the Game Is Over

The game is lost when the player has too many wrong guesses:

: lose? ( -- ? )
    #wrong-guesses HANGED-MAN length 1 - >= ;

The game is won when the word has no unknown letters:

: win? ( -- ? )
    target-word get guesses get diff null? ;

And the game is over when it is won or lost:

: game-over? ( -- ? )
    { [ win? ] [ lose? ] } 0|| ;
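
The win and lose conditions are plain set operations, which a short Python sketch can make explicit. The constant MAX_WRONG is an assumption derived from the seven drawings above (the last one is shown after six wrong guesses), and the function names are made up for this example:

```python
MAX_WRONG = 6  # seven drawings, so the last one appears after six misses


def wrong_guesses(target, guesses):
    """Number of guessed letters that do not occur in the target word."""
    return len(set(guesses) - set(target))


def lost(target, guesses):
    return wrong_guesses(target, guesses) >= MAX_WRONG


def won(target, guesses):
    """True when every letter of the target has been guessed."""
    return set(target) <= set(guesses)


def game_over(target, guesses):
    return won(target, guesses) or lost(target, guesses)


print(wrong_guesses("hangman", "aersn"))  # 3
```

With the guesses from the example game (a e r s n), three letters are wrong, which matches the drawing with a head, a torso, and one arm.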

Step 7: Run the Game Loop

It is frequently useful in Factor to build helper words that, for example, set up some of the state that our program will use and then run a provided quotation:

: with-hangman ( quot -- )
    [
        random-word target-word ,,
        HS{ } clone guesses ,,
    ] H{ } make swap with-variables ; inline

And then we can use that to build and run the game:

: play-hangman ( -- )
    [
        "Welcome to Hangman!" print

        [ game-over? ] [
            hanged-man.
            "Your word is: " write guessed-word print
            "Your guesses: " write guessed-letters print

            nl "What is your guess? " write flush

            player-guess target-word get in?
            "Great guess!" "Sorry, it's not there." ? print
        ] until

        hanged-man.
        lose? "Sorry, you lost!" "Congrats! You did it!" ? print
        "Your word was: " write target-word get print
    ] with-hangman ;

One last thing we can do is set this word as the main entry point of our vocabulary:

MAIN: play-hangman

Next Steps

Well, that’s kind of fun. You can run this in the listener:

IN: scratchpad "hangman" run

Or at the command-line:

$ ./factor -run=hangman

The source code is available on my GitHub.

JavaScript Arrays

Wed, 22 Nov 2023 08:00:00 -0700

JSON or “JavaScript Object Notation” is widely used as a data format for storing, transmitting, and retrieving data objects. It is language-independent and has parsers in most modern programming languages. Factor is no exception, containing the json vocabulary.

I wanted to go over some wtfjs that relates to JavaScript Arrays and JavaScript Objects, and then see how something similar might work in Factor!

In JavaScript

You can define an array:

const person = ["John", "Doe", 46];

Or you can define an object:

const person = {firstName: "John", lastName: "Doe", age: 46};

You can start with an array and set its values by index:

const person = [];
person[0] = "John";
person[1] = "Doe";
person[2] = 46;

You can start with an object and set its values by key:

const person = {};
person["firstName"] = "John";
person["lastName"] = "Doe";
person["age"] = 46;

Or, using “dot notation” on an object:

const person = {};
person.firstName = "John";
person.lastName = "Doe";
person.age = 46;

In JavaScript, arrays are indexed by number, and objects are indexed by name. But, you can mix these and create associative arrays that can be… both?

> const list = ["A", "B", "C"];
> list.length
3

> list[0]
"A"

> list["0"]
"A"

> list.key = "value";
> list.length
3

> list.key
"value"

> Object.assign({}, list);
{0: "A", 1: "B", 2: "C", key: "value"}

That’s kinda weird.

In Factor

Maybe Factor needs something like that? What if we define a type that has both a sequence and an assoc, and supports both the sequence protocol and the assoc protocol? How might that look?

TUPLE: js-array seq assoc ;

: <js-array> ( -- js-array )
    V{ } clone H{ } clone js-array boa ;

INSTANCE: js-array sequence

CONSULT: sequence-protocol js-array seq>> ;

INSTANCE: js-array assoc

CONSULT: assoc-protocol js-array assoc>> ;

And now we can do something kinda similar:

IN: scratchpad <js-array>
               { "A" "B" "C" } [ suffix! ] each
               "value" "key" pick set-at

IN: scratchpad dup first .
"A"

IN: scratchpad dup length .
3

IN: scratchpad dup members .
V{ "A" "B" "C" }

IN: scratchpad dup >alist .
{ { "key" "value" } }

IN: scratchpad "key" of .
"value"

Well, it doesn’t handle converting string keys so that "0" of would return the same value as first. And it doesn’t handle combining all the number indexed keys and name indexed keys to an alist output like the Object.assign({}, ...) call above. And probably a few other idiosyncrasies of the JavaScript associative array that I’m not familiar with…
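
To get a feel for what those missing pieces would involve, here is a hypothetical Python sketch of such a hybrid type. It is not a translation of the Factor tuple above; the class name, the _as_index helper, and to_dict are all invented for illustration, and it only covers the two behaviors just mentioned:

```python
class JsArray:
    """A list plus a dict behind one indexing interface (illustrative only)."""

    def __init__(self, items=()):
        self.seq = list(items)  # number-indexed part
        self.assoc = {}         # name-indexed part

    @staticmethod
    def _as_index(key):
        # Treat 0 and "0" alike, as JavaScript does for array indices.
        if isinstance(key, int):
            return key
        if isinstance(key, str) and key.isdecimal():
            return int(key)
        return None

    def __len__(self):
        # Like list.length in JavaScript, named keys are not counted.
        return len(self.seq)

    def __getitem__(self, key):
        index = self._as_index(key)
        if index is None:
            return self.assoc[key]
        return self.seq[index]

    def __setitem__(self, key, value):
        index = self._as_index(key)
        if index is None:
            self.assoc[key] = value
        else:
            self.seq[index] = value

    def to_dict(self):
        """Merge both parts, roughly like Object.assign({}, list)."""
        merged = {str(i): item for i, item in enumerate(self.seq)}
        merged.update(self.assoc)
        return merged


items = JsArray(["A", "B", "C"])
items["key"] = "value"
print(len(items), items[0], items["0"], items["key"])  # 3 A A value
```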

But do we like this? I dunno yet.

Magic Dict

Tue, 21 Nov 2023 08:00:00 -0700

The Raku programming language community comes up with interesting modules, and I enjoy getting emails from the Rakudo Weekly News. This week, there was a link to a post from a few days ago about making a weird data structure in Raku:

I came up with a weird data structure. It’s a hash, but you can also add functions that receive the hash as input so you can do math with it (if you squint, it’s vaguely like a spreadsheet). Something like:

m = MagicDict()
m["a"] = 1
m["b"] = 2
m["sum"] = lambda self: self["a"] + self["b"]

print(m["sum"])

Ideally, I’d want to do this in #RakuLang. (I know it’s possible because I did something much weirder once (I gave Str a CALL-ME method).)

I thought it would be fun to build this in Factor – starting with a magic-dict class that:

wraps an assoc which we will use to store all of the items
has a constructor using a hashtable by default
is marked as an instance of assoc, so we support generic words defined on assocs
supports the assoc protocol using the delegate vocabulary to defer to the wrapped assoc

TUPLE: magic-dict assoc ;

: <magic-dict> ( -- magic-dict ) H{ } clone magic-dict boa ;

INSTANCE: magic-dict assoc

CONSULT: assoc-protocol magic-dict assoc>> ;

And the main piece of logic we need is to implement the at* lookup word and, if the value is a callable, call it with the assoc on the stack as an argument.

M: magic-dict at*
    swap over assoc>> at* over callable?
    [ drop call( assoc -- value ) t ] [ nipd ] if ;

This allows us to make a Factor version of the original example:

IN: scratchpad <magic-dict>
               1 "a" pick set-at
               2 "b" pick set-at
               [ [ "a" of ] [ "b" of ] bi + ] "sum" pick set-at

IN: scratchpad "sum" of .
3

Pretty cool!
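
Since the quoted example is written in Python-like syntax, it is worth noting that a minimal Python version is only a few lines. This is one possible sketch, assuming that any callable value should be invoked with the dict itself; it is not taken from the original post:

```python
class MagicDict(dict):
    """A dict whose callable values are computed on lookup."""

    def __getitem__(self, key):
        value = super().__getitem__(key)
        # Call functions with the dict itself, return plain values as-is.
        return value(self) if callable(value) else value


m = MagicDict()
m["a"] = 1
m["b"] = 2
m["sum"] = lambda self: self["a"] + self["b"]

print(m["sum"])  # 3
```

Because the function runs on every lookup, changing m["a"] afterwards changes the result of m["sum"] too, which is the spreadsheet-like behavior the original post describes.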

Stack Effects

Thu, 16 Nov 2023 09:00:00 -0700

Factor word definitions have required stack effects. These are used by the stack checker to verify that word definitions are consistent with their stack effects, have internal branches that have similar stack effects, and other validity checks.

Long ago, these stack effects were optional – words might have looked something like this:

: add2 2 + ;

At some point over a decade ago, we made stack effects required, and now it looks like this:

: add2 ( m -- n ) 2 + ;

That change has been generally positive for the Factor standard library – helping readability, improving natural word documentation, and improving the locality of stack checker errors. Every once in a while, I think fondly back to those simpler days.

Well, it’s still possible to have the good ol’ days back – check this out!

The stack-checker has an infer word to apply the stack checker algorithm to a quotation, returning the inferred stack effect. You can try it in the Listener using the Ctrl-I shortcut or calling it directly:

IN: scratchpad [ 2 + ] infer .
( x -- x )

IN: scratchpad [ [ + ] with map ] infer .
( x x -- x )

We can use this to make new syntax that defines a word with the stack effect that was inferred:

USING: effects.parser kernel parser stack-checker words ;

SYNTAX: INFER:
    [ scan-new-word parse-definition ] with-definition
    dup infer define-declared ;

And now we can just write our words without bothering to write their stack effects:

INFER: add2 2 + ;

And then use it:

IN: scratchpad 1 add2 .
3

Pretty neat!

Anonymous Predicates

Wed, 15 Nov 2023 08:00:00 -0700

Factor has several types of classes that can be used to specialize methods on generic words and to disambiguate objects from other objects. Some examples are predicate classes, union classes, intersection classes, and maybe classes.

Today, we are going to discuss predicate classes and a recent feature that was contributed by @Capital-Ex to support anonymous predicates. A typical definition of a predicate class might look like this definition which describes all positive integers:

PREDICATE: positive < integer 0 > ;

You can use it in the listener to identify instances:

IN: scratchpad 12 positive? .
t

IN: scratchpad -5 positive? .
f

You can also dispatch on them:

GENERIC: wat ( obj -- str )
M: object wat drop "object" ;
M: positive wat drop "positive integer" ;

And see how that works:

IN: scratchpad 12 wat .
"positive integer"

IN: scratchpad f wat .
"object"

The new anonymous predicates feature allows us to rewrite that word with inline predicate definitions:

GENERIC: wat ( obj -- str )
M: object wat drop "object" ;
M: predicate{ integer [ 0 > ] } wat drop "positive integer" ;

And, in fact, we could extend it with some of the other anonymous classes to create this monstrosity:

GENERIC: wat ( obj -- str )
M: object wat drop "object" ;
M: predicate{ integer [ 0 > ] } wat drop "positive integer" ;
M: intersection{ bignum positive } wat drop "five" ;
M: union{ fixnum bignum } wat drop "integer" ;
M: maybe{ string } wat drop "maybe a string" ;

And then test most (all? who really knows?) of the cases:

IN: scratchpad 5 wat .
"positive integer"

IN: scratchpad 5 >bignum wat .
"five"

IN: scratchpad -5 wat .
"integer"

IN: scratchpad "hello" wat .
"maybe a string"

IN: scratchpad B{ } wat .
"object"

Pretty cool! This is available in the development version of Factor.

Factor is faster than Zig!

Thu, 09 Nov 2023 08:00:00 -0700

Recently, I was looking at the Zig programming language. As I often do, I started implementing a few typical things in new languages to learn them. Well, one of them was super slow and Zig is supposed to be super fast, so I was trying to understand where the disconnect was and compare it to Factor!

I was able to reduce the issue to a small test case and it turns out that there is a behavioral issue in their implementation of HashMap that makes their HashMaps get slow over time. The test case performs these steps:

creates a HashMap of 2 million items
decrements the map values, removing an item every third loop
inserts a replacement new item to maintain 2 million items in the HashMap
for a total of 250 million actions, then
deletes the remaining items from the HashMap
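
As a language-neutral reference for those steps, here is a Python sketch of the same churn loop using a plain dict. The function name and its parameters are my own, the defaults match the sizes described above, and in practice you would want to pass much smaller numbers since pure Python is slow at this scale:

```python
import random
import time


def churn(items=2_000_000, actions=250_000_000, block=1_000_000, seed=0):
    """Run the insert/decrement/replace workload and time each block."""
    rng = random.Random(seed)
    table = {i: 3 for i in range(items)}  # every item starts with a count of 3
    keys = list(range(items))             # lets us pick a live key at random
    timings = []
    start = time.perf_counter()
    for i in range(items, actions):
        index = rng.randrange(len(keys))
        key = keys[index]
        count = table[key]
        if count == 1:
            # Third visit: remove the item and insert a replacement.
            del table[key]
            table[i] = 3
            keys[index] = i
        else:
            table[key] = count - 1
        if i % block == 0:
            now = time.perf_counter()
            timings.append((i, now - start))
            start = now
    size_before_cleanup = len(table)
    # Finally, delete the remaining items.
    while keys:
        del table[keys.pop()]
    return timings, size_before_cleanup, len(table)


timings, size, remaining = churn(items=1_000, actions=20_000, block=5_000)
for i, seconds in timings:
    print(f"{i} block took {seconds * 1000:.1f} ms")
```

The number of live items stays constant throughout, so any growth in the per-block time comes from the hash table implementation rather than from the workload getting bigger.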

We record the total time each block of 1 million actions takes:

Something is very wrong!

Zig

This is the simple test case implemented in Zig using the std.HashMap:

const std = @import("std");

pub fn main() !void {
    var map = std.AutoHashMap(u64, u64).init(std.heap.page_allocator);
    defer map.deinit();

    var list = std.ArrayList(u64).init(std.heap.page_allocator);
    defer list.deinit();

    var prng = std.rand.DefaultPrng.init(0);
    const random = prng.random();

    var start = std.time.milliTimestamp();

    var i: u64 = 0;
    while (i < 2_000_000) : (i += 1) {
        try map.put(i, 3);
        try list.append(i);
    }

    while (i < 250_000_000) : (i += 1) {
        var index = random.uintLessThan(usize, list.items.len);
        var j = list.items[index];
        var k = map.get(j).?;
        if (k == 1) {
            _ = map.remove(j);
            try map.put(i, 3);
            list.items[index] = i;
        } else {
            try map.put(j, k - 1);
        }

        if (i % 1_000_000 == 0) {
            var end = std.time.milliTimestamp();
            std.debug.print("{} block took {} ms\n", .{ i, end - start });
            start = std.time.milliTimestamp();
        }
    }

    while (list.items.len > 0) {
        var j = list.pop();
        _ = map.remove(j);
    }
}

We can run it using ReleaseFast to get the best performance and see that, over time, it gets super slow – so slow that it isn’t even able to really finish the test case:

$ zig version
0.11.0

$ zig run -O ReleaseFast maptest.zig
2000000 block took 156 ms
3000000 block took 122 ms
4000000 block took 127 ms
5000000 block took 133 ms
6000000 block took 138 ms
7000000 block took 141 ms
8000000 block took 143 ms
9000000 block took 145 ms
10000000 block took 147 ms
11000000 block took 148 ms
12000000 block took 151 ms
13000000 block took 153 ms
14000000 block took 155 ms
15000000 block took 157 ms
16000000 block took 159 ms
17000000 block took 164 ms
18000000 block took 167 ms
19000000 block took 171 ms
20000000 block took 173 ms
21000000 block took 180 ms
22000000 block took 186 ms
23000000 block took 190 ms
24000000 block took 195 ms
25000000 block took 205 ms
26000000 block took 213 ms
27000000 block took 221 ms
28000000 block took 234 ms
29000000 block took 247 ms
30000000 block took 264 ms
31000000 block took 282 ms
32000000 block took 301 ms
33000000 block took 320 ms
34000000 block took 346 ms
35000000 block took 377 ms
36000000 block took 409 ms
37000000 block took 448 ms
38000000 block took 502 ms
39000000 block took 550 ms
40000000 block took 614 ms
41000000 block took 694 ms
42000000 block took 767 ms
43000000 block took 853 ms
44000000 block took 961 ms
45000000 block took 1088 ms
46000000 block took 1250 ms
47000000 block took 1420 ms
48000000 block took 1612 ms
49000000 block took 1826 ms
50000000 block took 2056 ms
51000000 block took 2320 ms
52000000 block took 2688 ms
53000000 block took 3015 ms
54000000 block took 3467 ms
55000000 block took 3971 ms
56000000 block took 4618 ms
57000000 block took 5377 ms
58000000 block took 6172 ms
59000000 block took 7094 ms
60000000 block took 8173 ms
61000000 block took 9469 ms
62000000 block took 11083 ms
63000000 block took 12737 ms
64000000 block took 14000 ms
65000000 block took 16243 ms
66000000 block took 17912 ms
67000000 block took 20452 ms
68000000 block took 24356 ms
...

We can switch the example above to use std.ArrayHashMap which does not have this problem, although it is a bit slower with each block taking roughly 250 milliseconds.

Factor

We can compare that to a simple implementation in Factor:

USING: assocs calendar formatting io kernel math random
sequences ;

:: maptest ( -- )
    H{ } clone :> m
    V{ } clone :> l

    now :> start!
    0   :> i!

    [ i 2,000,000 < ] [
        3 i m set-at
        i l push
        i 1 + i!
    ] while

    [ i 250,000,000 < ] [
        l length random :> j
        j l nth :> k
        k m at 1 - [
            k m delete-at
            3 i m set-at
            i j l set-nth
        ] [
            k m set-at
        ] if-zero

        i 1,000,000 mod zero? [
            i now start time- duration>milliseconds
            "%d block took %d ms\n" printf flush
            now start!
        ] when
        i 1 + i!
    ] while

    [ l empty? ] [
        l pop m delete-at
    ] until ;

We can run it in Factor and see how long it takes. There are notably some long delays in the first few blocks which I’d like to understand better – possibly due to excessive rehashing or some allocation pattern with the Factor garbage collector – and then it quickly reaches a steady state where each block takes about 250 milliseconds.

$ factor maptest.factor
2000000 block took 855 ms
3000000 block took 198 ms
4000000 block took 205 ms
5000000 block took 3579 ms
6000000 block took 4438 ms
7000000 block took 3624 ms
8000000 block took 2996 ms
9000000 block took 232 ms
10000000 block took 243 ms
11000000 block took 248 ms
12000000 block took 298 ms
13000000 block took 233 ms
14000000 block took 238 ms
15000000 block took 298 ms
16000000 block took 233 ms
17000000 block took 521 ms
18000000 block took 231 ms
19000000 block took 236 ms
20000000 block took 280 ms
21000000 block took 235 ms
22000000 block took 235 ms
23000000 block took 281 ms
24000000 block took 231 ms
25000000 block took 236 ms
26000000 block took 294 ms
27000000 block took 231 ms
28000000 block took 236 ms
29000000 block took 506 ms
30000000 block took 234 ms
31000000 block took 237 ms
32000000 block took 277 ms
33000000 block took 232 ms
34000000 block took 239 ms
35000000 block took 279 ms
36000000 block took 235 ms
37000000 block took 239 ms
38000000 block took 275 ms
39000000 block took 234 ms
40000000 block took 514 ms
41000000 block took 231 ms
42000000 block took 236 ms
43000000 block took 282 ms
44000000 block took 235 ms
45000000 block took 235 ms
46000000 block took 282 ms
47000000 block took 231 ms
48000000 block took 233 ms
49000000 block took 280 ms
50000000 block took 234 ms
51000000 block took 238 ms
52000000 block took 507 ms
53000000 block took 231 ms
54000000 block took 236 ms
55000000 block took 276 ms
56000000 block took 231 ms
57000000 block took 238 ms
58000000 block took 278 ms
59000000 block took 234 ms
60000000 block took 235 ms
61000000 block took 278 ms
62000000 block took 237 ms
63000000 block took 239 ms
64000000 block took 510 ms
65000000 block took 234 ms
66000000 block took 284 ms
...

Not bad!

What’s the Bug?

So, Zig could be super fast, but the default std.HashMap implementation uses tombstone buckets to mark slots as being deleted, and over time these tombstone buckets create fragmentation in the HashMap, which causes their linear probing to trend towards the worst case examination of every bucket in the HashMap when looking for a key.
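
To make the failure mode concrete, here is a minimal Python sketch of a fixed-capacity table with linear probing and tombstones. It is not Zig's implementation: the ToyMap class, its method names, and the tiny sizes are assumptions made for illustration. It also leans on CPython hashing small integers to themselves, so the slot layout is easy to follow.

```python
FREE = object()       # slot that has never held an entry
TOMBSTONE = object()  # slot whose entry was deleted


class ToyMap:
    """Fixed-capacity open-addressing map with linear probing (illustrative only)."""

    def __init__(self, capacity):
        assert capacity & (capacity - 1) == 0, "capacity must be a power of two"
        self.mask = capacity - 1
        self.keys = [FREE] * capacity
        self.values = [None] * capacity

    def _probe(self, key):
        """Return (index of key or None, first reusable slot or None, buckets examined)."""
        idx = hash(key) & self.mask
        reusable = None
        for probes in range(1, len(self.keys) + 1):
            slot = self.keys[idx]
            if slot is FREE:
                # Only a never-used slot proves that the key is absent.
                return None, (idx if reusable is None else reusable), probes
            if slot is TOMBSTONE:
                if reusable is None:
                    reusable = idx
            elif slot == key:
                return idx, reusable, probes
            idx = (idx + 1) & self.mask
        # Every bucket was examined without finding the key or a free slot.
        return None, reusable, len(self.keys)

    def put(self, key, value):
        """Insert or update a key and return the number of buckets examined."""
        found, reusable, probes = self._probe(key)
        target = found if found is not None else reusable
        if target is None:
            raise RuntimeError("table is full")
        self.keys[target] = key
        self.values[target] = value
        return probes

    def remove(self, key):
        """Replace the key's slot with a tombstone, if the key is present."""
        found, _, _ = self._probe(key)
        if found is not None:
            self.keys[found] = TOMBSTONE
            self.values[found] = None


def churn(capacity=64, live=32, steps=64):
    """Keep `live` keys in the map while repeatedly swapping old keys for new ones."""
    m = ToyMap(capacity)
    for key in range(live):
        m.put(key, 3)
    costs = []
    for step in range(steps):
        m.remove(step)                        # delete the oldest key
        costs.append(m.put(live + step, 3))   # insert a brand-new key
    return costs


costs = churn()
print(costs[0], costs[-1])  # buckets examined by the first and the last insert
```

In this toy run the map never holds more than 32 keys in 64 slots, yet once every free slot has been turned into either a live entry or a tombstone, each insert of a new key has to walk all 64 buckets before it can conclude the key is absent.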

We can implement a rehash() method on the HashMap that performs an in-place rehashing of all the elements, without allocations. Ideally, this would be done when the number of filled and deleted slots reaches some capacity threshold. But, for now, we can just run map.rehash() once per block, and see how that improves performance:

diff --git a/lib/std/hash_map.zig b/lib/std/hash_map.zig
index 8a3d78283..7192ba733 100644
--- a/lib/std/hash_map.zig
+++ b/lib/std/hash_map.zig
@@ -681,6 +681,11 @@ pub fn HashMap(
             self.unmanaged = .{};
             return result;
         }
+
+         /// Rehash the map, in-place
+         pub fn rehash(self: *Self) void {
+             self.unmanaged.rehash(self.ctx);
+         }
     };
 }
 
@@ -1505,6 +1510,92 @@ pub fn HashMapUnmanaged(
             return result;
         }
 
+         /// Rehash the map, in-place
+         pub fn rehash(self: *Self, ctx: anytype) void {
+             const mask = self.capacity() - 1;
+
+             var metadata = self.metadata.?;
+             var keys_ptr = self.keys();
+             var values_ptr = self.values();
+             var curr: Size = 0;
+
+             // While we are re-hashing every slot, we will use the
+             // fingerprint to mark used buckets as being used and either free
+             // (needing to be rehashed) or tombstone (already rehashed).
+
+             while (curr < self.capacity()) : (curr += 1) {
+                 metadata[curr].fingerprint = Metadata.free;
+             }
+
+             // Now iterate over all the buckets, rehashing them
+
+             curr = 0;
+             while (curr < self.capacity()) {
+                 if (!metadata[curr].isUsed()) {
+                     assert(metadata[curr].isFree());
+                     curr += 1;
+                     continue;
+                 }
+
+                 var hash = ctx.hash(keys_ptr[curr]);
+                 var fingerprint = Metadata.takeFingerprint(hash);
+                 var idx = @as(usize, @truncate(hash & mask));
+
+                 // For each bucket, rehash to an index:
+                 // 1) before the cursor, probed into a free slot, or
+                 // 2) equal to the cursor, no need to move, or
+                 // 3) ahead of the cursor, probing over already rehashed
+
+                 while ((idx < curr and metadata[idx].isUsed()) or
+                     (idx > curr and metadata[idx].fingerprint == Metadata.tombstone))
+                 {
+                     idx = (idx + 1) & mask;
+                 }
+
+                 if (idx < curr) {
+                     assert(metadata[idx].isFree());
+                     metadata[idx].fingerprint = fingerprint;
+                     metadata[idx].used = 1;
+                     keys_ptr[idx] = keys_ptr[curr];
+                     values_ptr[idx] = values_ptr[curr];
+
+                     metadata[curr].used = 0;
+                     assert(metadata[curr].isFree());
+                     keys_ptr[curr] = undefined;
+                     values_ptr[curr] = undefined;
+
+                     curr += 1;
+                 } else if (idx == curr) {
+                     metadata[idx].fingerprint = fingerprint;
+                     curr += 1;
+                 } else {
+                     assert(metadata[idx].fingerprint != Metadata.tombstone);
+                     metadata[idx].fingerprint = Metadata.tombstone;
+                     if (metadata[idx].isUsed()) {
+                         var tmpkey = keys_ptr[idx];
+                         var tmpvalue = values_ptr[idx];
+
+                         keys_ptr[idx] = keys_ptr[curr];
+                         values_ptr[idx] = values_ptr[curr];
+
+                         keys_ptr[curr] = tmpkey;
+                         values_ptr[curr] = tmpvalue;
+                     } else {
+                         metadata[idx].used = 1;
+                         keys_ptr[idx] = keys_ptr[curr];
+                         values_ptr[idx] = values_ptr[curr];
+
+                         metadata[curr].fingerprint = Metadata.free;
+                         metadata[curr].used = 0;
+                         keys_ptr[curr] = undefined;
+                         values_ptr[curr] = undefined;
+
+                         curr += 1;
+                     }
+                 }
+             }
+         }
+
         fn grow(self: *Self, allocator: Allocator, new_capacity: Size, ctx: Context) Allocator.Error!void {
             @setCold(true);
             const new_cap = @max(new_capacity, minimal_capacity);
@@ -2218,3 +2309,35 @@ test "std.hash_map repeat fetchRemove" {
     try testing.expect(map.get(2) != null);
     try testing.expect(map.get(3) != null);
 }
+
+test "std.hash_map rehash" {
+    var map = AutoHashMap(u32, u32).init(std.testing.allocator);
+    defer map.deinit();
+
+    var prng = std.rand.DefaultPrng.init(0);
+    const random = prng.random();
+
+    const count = 6 * random.intRangeLessThan(u32, 100_000, 500_000);
+
+    var i: u32 = 0;
+    while (i < count) : (i += 1) {
+        try map.put(i, i);
+        if (i % 3 == 0) {
+            try expectEqual(map.remove(i), true);
+        }
+    }
+
+    map.rehash();
+
+    try expectEqual(map.count(), count * 2 / 3);
+
+    i = 0;
+    while (i < count) : (i += 1) {
+        if (i % 3 == 0) {
+            try expectEqual(map.get(i), null);
+        } else {
+            try expectEqual(map.get(i).?, i);
+        }
+    }
+}
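
Stripped of the in-place bookkeeping, the effect of a rehash can be sketched in a few lines of Python. Unlike the patch above, this simplified version allocates fresh arrays instead of shuffling entries within the existing ones, so it only illustrates the outcome (no tombstones, every key back near its home slot). The FREE and TOMBSTONE sentinels and the function name are assumptions for this example.

```python
FREE = object()       # slot that has never held an entry
TOMBSTONE = object()  # slot whose entry was deleted


def rehash(keys, values):
    """Return new (keys, values) arrays of the same capacity with no tombstones."""
    capacity = len(keys)
    mask = capacity - 1  # capacity is assumed to be a power of two
    new_keys = [FREE] * capacity
    new_values = [None] * capacity
    for key, value in zip(keys, values):
        if key is FREE or key is TOMBSTONE:
            continue
        # Linear probing from the key's home slot in the fresh table.
        idx = hash(key) & mask
        while new_keys[idx] is not FREE:
            idx = (idx + 1) & mask
        new_keys[idx] = key
        new_values[idx] = value
    return new_keys, new_values


# Key 8 (home slot 0) was pushed to slot 2 by entries that have since been deleted.
keys = [TOMBSTONE, TOMBSTONE, 8, FREE, FREE, TOMBSTONE, 5, FREE]
values = [None, None, "a", None, None, None, "b", None]
new_keys, new_values = rehash(keys, values)
print(new_keys.index(8), new_keys.index(5))
```

After the rebuild, both keys sit in their home slots again and a lookup for a missing key stops at the first free slot it meets.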

We can apply that diff to lib/std/hash_map.zig and try again, now taking about 165 milliseconds per block including the time for map.rehash():

$ zig run -O ReleaseFast maptest.zig --zig-lib-dir ~/Dev/zig/lib
2000000 block took 155 ms
3000000 block took 147 ms
4000000 block took 154 ms
5000000 block took 160 ms
6000000 block took 163 ms
7000000 block took 164 ms
8000000 block took 165 ms
9000000 block took 166 ms
10000000 block took 166 ms
11000000 block took 165 ms
12000000 block took 166 ms
13000000 block took 165 ms
14000000 block took 166 ms
15000000 block took 172 ms
16000000 block took 165 ms
17000000 block took 167 ms
18000000 block took 165 ms
19000000 block took 167 ms
20000000 block took 169 ms
21000000 block took 168 ms
22000000 block took 167 ms
23000000 block took 166 ms
24000000 block took 167 ms
25000000 block took 167 ms
26000000 block took 165 ms
27000000 block took 166 ms
28000000 block took 166 ms
29000000 block took 165 ms
30000000 block took 165 ms
31000000 block took 165 ms
32000000 block took 166 ms
33000000 block took 165 ms
34000000 block took 167 ms
35000000 block took 170 ms
36000000 block took 165 ms
37000000 block took 166 ms
38000000 block took 166 ms
39000000 block took 164 ms
40000000 block took 165 ms
41000000 block took 167 ms
42000000 block took 166 ms
43000000 block took 167 ms
44000000 block took 169 ms
45000000 block took 166 ms
46000000 block took 165 ms
47000000 block took 166 ms
48000000 block took 166 ms
49000000 block took 166 ms
50000000 block took 166 ms
51000000 block took 166 ms
52000000 block took 164 ms
53000000 block took 165 ms
54000000 block took 167 ms
55000000 block took 165 ms
56000000 block took 166 ms
57000000 block took 166 ms
58000000 block took 165 ms
59000000 block took 166 ms
60000000 block took 169 ms
61000000 block took 165 ms
62000000 block took 165 ms
63000000 block took 166 ms
64000000 block took 166 ms
65000000 block took 165 ms
66000000 block took 166 ms
67000000 block took 176 ms
68000000 block took 166 ms
...

Well now, Zig is fast and everything is right again with the world – and Factor takes only about 50% more time than Zig’s std.HashMap with rehash() and about the same as std.ArrayHashMap, which is pretty good for a dynamic language.

I submitted a pull request adding a rehash() method to HashMap, and hopefully it gets into the upcoming Zig 0.12 release. Maybe for Zig 0.13 they can adjust it to rehash automatically when it gets sufficiently fragmented, consider using quadratic probing instead of linear probing, or perhaps switch to a completely different HashMap algorithm like Facebook’s F14 hash table, which doesn’t have this issue.

Maybe we should consider some of these improvements for Factor as well!

Open source is fun!

SMAC

Fri, 27 Oct 2023 07:00:00 -0700

Recently, I was looking into the Zig programming language and bumped into this video tutorial:

It presents an example – in the spirit of Fizz buzz – called SMAC, which is basically a program that converts the first million numbers into strings or "SMAC" (if the number is divisible by 7 or ends with a 7) and prints them out. This is then implemented in three different forms:

single-threaded
multi-threaded, and
networked using sockets.

We first start with the basics, a word to check if a number is SMAC or not:

: smac? ( n -- ? )
    { [ 7 mod 0 = ] [ 10 mod 7 = ] } 1|| ;

And then a word that converts a number into its string representation or SMAC:

: smac ( n -- str )
    dup smac? [ drop "SMAC" ] [ number>string ] if ;
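
For readers more at home in Python, the same two words can be sketched as plain functions. The names is_smac and smac are my own choice here and simply mirror the Factor definitions above.

```python
def is_smac(n):
    """True if n is divisible by 7 or its last decimal digit is 7."""
    return n % 7 == 0 or n % 10 == 7


def smac(n):
    """Return "SMAC" for SMAC numbers, otherwise the number as a string."""
    return "SMAC" if is_smac(n) else str(n)


for n in range(1, 21):
    print(smac(n))
```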

Single-Threaded

The simplest example – run in a single thread – iteratively prints to the output-stream.

: smac. ( n -- )
    [1..b] [ smac print ] each ;

Trying it out, you can see that it works:

IN: scratchpad 20 smac.
1
2
3
4
5
6
SMAC
8
9
10
11
12
13
SMAC
15
16
SMAC
18
19
20

Multi-Threaded

Slightly more complex, the multi-threaded example splits the numbers into four n-groups, computes them as a future in four background threads, and then waits for those computations to finish and iteratively prints them out.

: smac. ( n -- )
    [1..b] 4 <n-groups>
    [ '[ _ [ smac ] map ] future ] map
    [ ?future [ print ] each ] each ;

It produces the same output as the single-threaded smac. word above.
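
The same split, compute, and collect shape can be sketched in Python with concurrent.futures. This is only an analogy: the n_groups helper is a hypothetical stand-in for the grouping step (I am assuming contiguous groups of nearly equal size), and CPython threads will not actually speed up CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def smac(n):
    return "SMAC" if n % 7 == 0 or n % 10 == 7 else str(n)


def n_groups(items, n):
    """Split items into n contiguous groups whose sizes differ by at most one."""
    size, extra = divmod(len(items), n)
    groups, start = [], 0
    for g in range(n):
        end = start + size + (1 if g < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def convert(group):
    return [smac(n) for n in group]


def smac_lines(n, workers=4):
    numbers = list(range(1, n + 1))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(convert, group) for group in n_groups(numbers, workers)]
        # Collect in submission order so the lines stay in numeric order.
        return [line for future in futures for line in future.result()]


print("\n".join(smac_lines(20)))
```

Waiting on the futures in the order they were submitted is what keeps the output identical to the single-threaded version.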

Networked

The simple networked example creates a server configured to print n results when a client connects:

: smac-server ( n -- server )
    utf8 <threaded-server>
        "smac" >>name
        7979 >>insecure
        swap '[ _ smac. ] >>handler ;

You can run it:

IN: scratchpad 20 smac-server start-server

And then try it out:

$ nc localhost 7979
1
2
3
4
5
6
SMAC
8
9
10
11
12
13
SMAC
15
16
SMAC
18
19
20

A fun example for learning about a few introductory concepts in Factor!

Memoization Syntax

Thu, 26 Oct 2023 09:00:00 -0700

A couple of days ago an interesting article was posted about the performance impact of the memoization idiom on modern Ruby. It covers some of the internal changes that have been happening in the last few releases of Ruby and how some “object shape” optimizations have impacted memoization performance.

If you look at the Wikipedia article for memoization, you can see it described as:

In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur again.

While I use the memoize vocabulary in some of the posts on this blog and it is used in over 100 words in the Factor standard library, there is an opportunity to talk a bit more about how it works in Factor.

The first example often provided to show the benefit is a recursive definition of the Fibonacci sequence, which could be implemented as:

: fib ( m -- n )
    dup 1 > [ [ 2 - fib ] [ 1 - fib ] bi + ] when ;

If you time this, it is quite slow:

IN: scratchpad [ 40 fib ] time .
Running time: 1.800567 seconds

102334155

In a similar manner to the functools.cache decorator in Python, you can easily improve this by caching the results of previous computations using the MEMO: syntax, which works with functions of any arity:

MEMO: fib ( m -- n )
    dup 1 > [ [ 2 - fib ] [ 1 - fib ] bi + ] when ;

And see that it is now much faster:

IN: scratchpad [ 40 fib ] time .
Running time: 0.005302375 seconds

102334155
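
For comparison, here is roughly what the functools.cache version mentioned above looks like in Python (it needs Python 3.9 or later). The function body is a straight translation of the Factor word.

```python
from functools import cache


@cache
def fib(m):
    # Same recursion as the Factor word: values of 0 and 1 are returned unchanged.
    return m if m <= 1 else fib(m - 2) + fib(m - 1)


print(fib(40))
```

Each distinct argument is computed once, so the call tree collapses from exponential to linear in m.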

If we wanted to memoize some internal part of a word, we have historically used words like cache or 2cache with a hashtable literal to compute a result by key, storing and returning if it was previously calculated. This is a simple form of memoization – and I realized that we could create a simpler syntax for this that uses the memoize implementation to support arbitrary quotations:

SYNTAX: MEMO[ parse-quotation dup infer memoize-quot append! ;

And then we use it in the middle of a word, pretending that something takes a long time to compute and we need to memoize the computation:

: answer ( a b -- )
    "The answer is: " write MEMO[ + 1 seconds sleep ] . ;

Trying it out shows that the first time is slow and the second time is fast:

IN: scratchpad [ 1 2 answer ] time
The answer is: 3
Running time: 1.011060583 seconds

IN: scratchpad [ 1 2 answer ] time
The answer is: 3
Running time: 0.00117075 seconds

That’s a cool syntax word – and it’s available in the memoize.syntax vocabulary!

SHA-256 from URL

Tue, 12 Sep 2023 08:00:00 -0700

Álvaro Ramírez wrote a blog post about generating a SHA-256 hash from URL, the easy way, where they describe wanting to download a file and easily generate a SHA-256 hash of its contents. Their solution involves copying a URL and then having some Emacs Lisp read the clipboard, download the file, generate the hash, and put it back on the clipboard.

I thought I’d show how this can be done in Factor, by breaking the problem into smaller parts.

USING: checksums checksums.sha http.client io.directories io.files.temp kernel
math.parser namespaces sequences ui.clipboards ;

The first step is downloading a file to a temporary file, returning the path of the downloaded file:

: download-to-temp ( url -- path )
    dup download-name temp-file [
        [ ?delete-file ] [ download-to ] bi
    ] keep ;

The next step is to build a word that applies a checksum to the downloaded file contents:

: checksum-url ( url checksum -- value )
    [ download-to-temp ] [ checksum-file ] bi* ;
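
As a point of comparison, a rough Python equivalent using only the standard library might look like this. It differs from the Factor words in one respect that I want to flag: it streams the response straight into the hash instead of saving a temporary file first. The function name and parameters are my own, and the demonstration uses a local file:// URL so it runs without network access.

```python
import hashlib
import pathlib
import tempfile
import urllib.request


def checksum_url(url, algorithm="sha256", chunk_size=1 << 16):
    """Hash the bytes served at url and return the hex digest."""
    digest = hashlib.new(algorithm)
    with urllib.request.urlopen(url) as response:
        # Read in chunks so large downloads are never held in memory at once.
        for chunk in iter(lambda: response.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Offline demonstration: a file:// URL goes through the same code path.
with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp) / "sample.txt"
    sample.write_bytes(b"abc")
    digest = checksum_url(sample.as_uri())

print(digest)
```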

The last step is to use the clipboard to access the URL that was copied – checking minimally that it looks like an http or https URL – and then putting the checksum value back onto the clipboard:

: checksum-clipboard ( checksum -- )
    clipboard get clipboard-contents
    dup "http" head? [ throw ] unless
    swap checksum-url bytes>hex-string
    clipboard get set-clipboard-contents ;

This could be improved with better error checking, and maybe cleaning up the temporary file that was downloaded after running the checksum.

Give it a try!

IN: scratchpad sha-256 checksum-clipboard

Sequence Case

Sun, 10 Sep 2023 08:00:00 -0700

Some languages allow ranges to be used in switch statements, as in this Swift code:

let count = 3_000_000_000_000
let countedThings = "stars in the Milky Way"
var naturalCount: String
switch count {
case 0:
    naturalCount = "no"
case 1...3:
    naturalCount = "a few"
case 4...9:
    naturalCount = "several"
case 10...99:
    naturalCount = "tens of"
case 100...999:
    naturalCount = "hundreds of"
case 1000...999_999:
    naturalCount = "thousands of"
default:
    naturalCount = "millions and millions of"
}
print("There are \(naturalCount) \(countedThings).")

Let’s build this functionality in Factor!

Range Syntax

First, let’s look at how we can construct a range.

1000 999,999 [a..b]

If we wanted to use that in a case statement, we could wrap it with a literal:

{ $[ 1000 999,999 [a..b] ] [ "thousands of" ] }

But that’s not very elegant. Instead, let’s define a syntax word to construct our range:

SYNTAX: ..= dup pop scan-object [a..b] suffix! ;

This is much cleaner:

{ 1000 ..= 999,999 [ "thousands of" ] }

Combinators

In Factor, we have combinators which are, at some level, just words that take code as input. We sometimes refer to these as higher-level concepts, but they allow us to be more expressive with fewer tokens.

Let’s build our combinator using a macro to transform some input cases:

MACRO: sequence-case ( assoc -- quot )
    [
        dup callable? [
            [ first dup set? [ in? ] [ = ] ? '[ dup _ @ ] ]
            [ second '[ drop @ ] ] bi 2array
        ] unless
    ] map [ cond ] curry ;

Now we can write a Factor version of the original Swift example above:

3,000,000,000,000 {
    { 0 [ "no" ] }
    { 1 ..= 3 [ "a few" ] }
    { 4 ..= 9 [ "several" ] }
    { 10 ..= 99 [ "tens of" ] }
    { 100 ..= 999 [ "hundreds of" ] }
    { 1000 ..= 999,999 [ "thousands of" ] }
    [ drop "millions and millions of" ]
} sequence-case "There are %s stars in the Milky Way.\n" printf

Cool!

This is available in the combinators.extras vocabulary in the development version of Factor.
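
The dispatch rule the macro generates (membership test for a range, equality for anything else, first match wins) can be sketched in Python. This is an illustration of the idea only: the function name and the explicit default argument are assumptions, and it runs at call time rather than expanding at compile time like the Factor macro.

```python
def sequence_case(value, cases, default):
    """Return the result of the first case whose pattern matches value."""
    for pattern, result in cases:
        if isinstance(pattern, range):
            matched = value in pattern   # range membership is O(1) for integers
        else:
            matched = value == pattern
        if matched:
            return result
    return default


natural_count = sequence_case(
    3_000_000_000_000,
    [
        (0, "no"),
        (range(1, 4), "a few"),
        (range(4, 10), "several"),
        (range(10, 100), "tens of"),
        (range(100, 1000), "hundreds of"),
        (range(1000, 1_000_000), "thousands of"),
    ],
    "millions and millions of",
)
print(f"There are {natural_count} stars in the Milky Way.")
```

Note that Python's range excludes its upper bound, so the closed range 1000 ..= 999,999 becomes range(1000, 1_000_000).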

ASCII Art

Fri, 08 Sep 2023 08:00:00 -0700

Raymond Hettinger, an active contributor to Python and responsible for many useful improvements to the language, likes to tweet short and sweet bits of useful Python knowledge. I stumbled into one fun one-liner from long ago:

I thought I would show how it might translate into Factor. To translate this code into a concatenative language, we are going to work from the inside and move out.

Step 1

We start with the sum of the cartesian product of eight sequences of the numbers 0 through 5:

map(sum, product(range(6), repeat=8))

vs.

8 6 <iota> <repetition> [ sum ] product-map

Step 2

Next, we see that it counts each element, and produces a sorted list of items:

sorted(Counter(...).items())

vs.

histogram sort-keys

Step 3

And finally, create a string of lines of stars and print it:

print "\n".join('*'*(c//2000) for i,c in ...)

vs.

values [ 2000 /i CHAR: * <string> ] map "\n" join print

Solution

Putting it all together:

IN: scratchpad USING: assocs io math math.statistics sequences
               sequences.product sorting strings ;

IN: scratchpad 8 6 <iota> <repetition> [ sum ] product-map
               histogram sort-keys values
               [ 2000 /i CHAR: * <string> ] map "\n" join print

It makes this nice ASCII Art visualization:

*
***
*****
********
************
******************
*************************
********************************
*****************************************
*************************************************
********************************************************
**************************************************************
******************************************************************
*******************************************************************
******************************************************************
**************************************************************
********************************************************
*************************************************
*****************************************
********************************
*************************
******************
************
********
*****
***
*
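
For anyone who wants to run the Python side today, the three fragments from the steps above can be reassembled for Python 3 roughly like this. The reassembly is mine, pieced together from those fragments, with print written as a function call.

```python
from collections import Counter
from itertools import product

# Step 1 and 2: count how often each total occurs across all 6**8 combinations.
counts = Counter(map(sum, product(range(6), repeat=8)))

# Step 3: one line of stars per total, scaled down by 2000.
lines = ["*" * (c // 2000) for _, c in sorted(counts.items())]
print("\n".join(lines))
```

Totals that occur fewer than 2000 times produce empty lines, so this prints a few blank lines before and after the bell shape.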

LEB128

Sun, 03 Sep 2023 08:00:00 -0700

LEB128 – Little Endian Base 128 – is a variable-length encoding format designed to store arbitrarily large integers in a small number of bytes. There are two variations: unsigned LEB128 and signed LEB128. These vary slightly, so a user program that wants to decode LEB128 values would explicitly choose the appropriate unsigned or signed methods.

Recently, I implemented these in Factor and wanted to describe the implementation:

Decoding

For decoding unsigned LEB128 values, we accumulate the integer 7 bits at a time:

: uleb128> ( byte-array -- n )
    0 [ [ 7 bits ] [ 7 * shift ] bi* + ] reduce-index ;

For decoding signed LEB128 values, we do the same thing, but the last byte indicates if the value was negative, and if it was we bring the sign bit back in:

: leb128> ( byte-array -- n )
    [ uleb128> ] keep dup last 6 bit?
    [ length 7 * 2^ neg bitor ] [ drop ] if ;
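
The two decoders translate almost line for line into Python, which may help if the stack shuffling is hard to follow. The function names are my own, and like the Factor words this sketch assumes the input holds exactly one encoded value.

```python
def uleb128_decode(data):
    """Decode an unsigned LEB128 value from a bytes-like object."""
    n = 0
    for i, byte in enumerate(data):
        # Each byte contributes its low 7 bits, least significant group first.
        n |= (byte & 0x7F) << (7 * i)
    return n


def leb128_decode(data):
    """Decode a signed LEB128 value from a bytes-like object."""
    n = uleb128_decode(data)
    if data[-1] & 0x40:
        # Bit 6 of the last byte is the sign bit: extend it over the higher bits.
        n -= 1 << (7 * len(data))
    return n


print(uleb128_decode(bytes([0xE5, 0x8E, 0x26])))
print(leb128_decode(bytes([0xC0, 0xBB, 0x78])))
```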

Encoding

For encoding unsigned LEB128 values, we output in 7-bit segments, setting the eighth bit on every byte except the final one to indicate that more bytes follow.

:: >uleb128 ( n -- byte-array )
    BV{ } clone :> accum
    n assert-non-negative [
        [ -7 shift dup zero? not ] [ 7 bits ] bi
        over [ 0x80 bitor ] when accum push
    ] loop drop accum B{ } like ;

For encoding signed LEB128 values, we output in 7-bit segments, but our exit condition depends on reaching the end of the integer (a remainder of 0 or -1) and on whether bit 6, the sign bit of that segment, agrees with it.

:: >leb128 ( n -- byte-array )
    BV{ } clone :> accum
    n [
        [ -7 shift dup ] [ 7 bits ] bi :> ( i b )
        {
            { [ i  0 = ] [ b 6 bit? not ] }
            { [ i -1 = ] [ b 6 bit? ] }
            [ f ]
        } cond b over [ 0x80 bitor ] when accum push
    ] loop drop accum B{ } like ;
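
Here is a sketch of the same two encoders in Python, with names of my own choosing. It relies on Python's right shift being arithmetic for negative integers, which is what makes the -1 exit condition work.

```python
def uleb128_encode(n):
    """Encode a non-negative integer as unsigned LEB128."""
    if n < 0:
        raise ValueError("non-negative number expected")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n == 0:
            out.append(byte)          # last byte: continuation bit clear
            return bytes(out)
        out.append(byte | 0x80)       # more bytes follow


def leb128_encode(n):
    """Encode any integer as signed LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7                       # arithmetic shift keeps the sign
        sign_bit = bool(byte & 0x40)
        done = (n == 0 and not sign_bit) or (n == -1 and sign_bit)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


print(uleb128_encode(624485).hex())
print(leb128_encode(-123456).hex())
```

The signed encoder needs the extra byte for 0x1fffff because its top 7-bit group has bit 6 set, which a decoder would otherwise read as a negative sign.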

Testing

Some test cases for unsigned LEB128:

[ -1 >uleb128 ] [ non-negative-number-expected? ] must-fail-with
{ B{ 255 255 127 } } [ 0x1fffff >uleb128 ] unit-test
{ 0x1fffff } [ B{ 255 255 127 } uleb128> ] unit-test
{ B{ 0xe5 0x8e 0x26 } } [ 624485 >uleb128 ] unit-test
{ 624485 } [ B{ 0xe5 0x8e 0x26 } uleb128> ] unit-test

Some test cases for signed LEB128:

{ B{ 255 255 255 0 } } [ 0x1fffff >leb128 ] unit-test
{ 0x1fffff } [ B{ 255 255 255 0 } leb128> ] unit-test
{ B{ 0xc0 0xbb 0x78 } } [ -123456 >leb128 ] unit-test
{ -123456 } [ B{ 0xc0 0xbb 0x78 } leb128> ] unit-test

Performance

This is available in a recent nightly build, with some support for reading and writing LEB128 values to streams, and some performance improvements by specifying some type information so the compiler can understand we are working with integers and byte-arrays and produce more optimal code.

My current laptop decodes and encodes around 40 million values per second, which is pretty good for a dynamic language. The fast decoding section references papers that present SIMD techniques called “Masked VByte” and “Stream VByte” for significantly improving upon this simple scalar implementation.

Periodic Table

Fri, 01 Sep 2023 16:00:00 -0700

Despite the difficulty that I had with freshman chemistry, I’ve always enjoyed learning about the periodic table, writing blog posts about making words with element symbols, and I do think it would be awesome to have a giant wall-mounted periodic table like Bill Gates has in his office.

The other day I wanted to demonstrate the UI framework by making another simple UI gadget. It occurred to me that we could make a simple periodic table as an example.

The code is about 140 lines to define all the elements and their groups, 10 lines to organize them into a table, and about 40 lines to build the element boxes, put them into a gadget, and add a legend at the bottom. Each element is a <roll-button> that opens the Wikipedia page for that element.

I was reminded of this recently by a funny comment on the Factor Discord:

Factor crushing the competition thanks to periodic-table vocab

https://codegolf.stackexchange.com/questions/264714/is-it-an-element/264723#264723

You can view the source code for the periodic table vocabulary or try it yourself:

IN: scratchpad "periodic-table" run

RGBA Clock

Tue, 29 Aug 2023 07:00:00 -0700

Today, I’d like to build a little UI gadget clock that changes color as the time changes in Factor.

Unix time is measured as the time – usually seconds – since the Unix epoch at 00:00:00 UTC on January 1, 1970. It is used on many systems and, perhaps, will cause Year 2038 problems on some when it exceeds the maximum value of a signed 32-bit integer.

This is perfect for what we need, a way to convert a timestamp into an RGBA color: we can calculate a Unix time in integer seconds and then take each 8-bit segment as the red, green, blue, and alpha value.

: timestamp>rgba ( timestamp -- color/f )
    timestamp>unix-time >integer
    24 2^ /mod 16 2^ /mod 8 2^ /mod
    [ 255 /f ] 4 napply <rgba> ;
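
The byte-splitting is easy to check by hand with a small Python sketch. The function name is hypothetical, and it takes the Unix time as a plain number rather than a timestamp object.

```python
def timestamp_to_rgba(unix_seconds):
    """Split integer Unix seconds into four bytes and scale each to 0.0-1.0."""
    n = int(unix_seconds)
    red, rest = divmod(n, 1 << 24)     # highest byte
    green, rest = divmod(rest, 1 << 16)
    blue, alpha = divmod(rest, 1 << 8) # lowest byte becomes alpha
    return tuple(channel / 255 for channel in (red, green, blue, alpha))


# 0x64ED5A30 splits into the bytes 0x64, 0xED, 0x5A, 0x30.
print(timestamp_to_rgba(0x64ED5A30))
```

Because the low byte is the alpha channel, the opacity cycles every 256 seconds while the red channel changes only about every 194 days.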

We can then extend a UI label to have a timer that starts when the gadget becomes visible and updates its background colors based on the current time and chooses an appropriate foreground text color to match. For that we can use the contrast-text-color word from the colors.contrast vocabulary to select either white-over-background or black-over-background depending on the relative luminance of the background color.

TUPLE: rgba-clock < label timer ;

M: rgba-clock graft*
    [ timer>> start-timer ] [ call-next-method ] bi ;

M: rgba-clock ungraft*
    [ timer>> stop-timer ] [ call-next-method ] bi ;

: update-colors ( color label -- )
    [ [ contrast-text-color ] dip font>> foreground<< ]
    [ [ <solid> ] dip interior<< ] 2bi ;

: <rgba-clock> ( -- gadget )
    "99:99:99" rgba-clock new-label
        monospace-font >>font
        dup '[
            _ now
            [ timestamp>hms >>string ]
            [ timestamp>rgba swap update-colors ] bi
        ] f 1 seconds <timer> >>timer ;

And then we can try it out!

IN: scratchpad <rgba-clock> gadget.

This is on my GitHub.

Drunken Bishop

Sun, 27 Aug 2023 08:00:00 -0700

The OpenSSH project is a widely available tool for working with the SSH protocol in a variety of ways on a variety of operating systems. Their project description states:

OpenSSH is the premier connectivity tool for remote login with the SSH protocol. It encrypts all traffic to eliminate eavesdropping, connection hijacking, and other attacks. In addition, OpenSSH provides a large suite of secure tunneling capabilities, several authentication methods, and sophisticated configuration options.

One of the interesting features that it contains is a method of visualizing public key fingerprints to allow a user to more easily see that a key has changed by examining a visual output that looks something like this:

+----[RSA 2048]---+
|        . o.+o  .|
|     . + *  +o...|
|      + * .. ... |
|       o + .   . |
|        S o   .  |
|         o     . |
|          .     o|
|               .o|
|               Eo|
+------[MD5]------+

This is the Drunken Bishop algorithm, a variant of a technique called random art that was originally described in the paper Hash Visualization: a New Technique to improve Real-World Security. You can see more information about it in The drunken bishop: An analysis of the OpenSSH fingerprint visualization algorithm.

This OpenSSH feature is controlled by the VisualHostKey flag:

VisualHostKey

If this flag is set to yes, an ASCII art representation of the remote host key fingerprint is printed in addition to the fingerprint string at login and for unknown host keys. If this flag is set to no (the default), no fingerprint strings are printed at login and only the fingerprint string will be printed for unknown host keys.

This can be enabled by adding to your ~/.ssh/config:

VisualHostKey yes

Or by adding this option in your ssh command:

$ ssh -o VisualHostKey=yes your.host.name

Implementation

We are going to be implementing this in the Factor programming language.

The algorithm begins by defining a visual board – by default 9 rows by 17 columns – and a starting position in the middle of the board. Each 8-bit byte of input is split into 2-bit groups which indicate where the bishop moves:

Value	Direction
00	↖
01	↗
10	↙
11	↘

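As the bishop moves around the board diagonally, it increments a counter in each cell it lands on. However, the bishop cannot move through the walls on the edge of the board: when a move would cross a wall, the blocked coordinate is clamped to the edge while the other coordinate still changes, so the bishop slides along the wall (or stays put in a corner).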
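As a small illustration of the 2-bit split described above, here is a sketch that turns one byte into its four moves, least significant pair first. The word byte>moves is a hypothetical helper introduced only for this example:

```factor
USING: kernel math math.bitwise prettyprint sequences ;
IN: scratchpad

! Hypothetical helper (not part of the drunken-bishop vocabulary):
! split a byte into its four 2-bit moves, least significant pair first.
: byte>moves ( byte -- moves )
    { 0 -2 -4 -6 } [ shift 2 bits ] with map ;

0xfc byte>moves .
! => { 0 3 3 3 }
```

So the byte 0xfc (binary 11111100) encodes one ↖ move followed by three ↘ moves.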

CONSTANT: WIDTH 17
CONSTANT: HEIGHT 9

:: drunken-bishop ( bytes -- board )
    HEIGHT [ WIDTH 0 <array> ] replicate :> board
    HEIGHT 2/ :> y!
    WIDTH 2/ :> x!

    15 x y board nth set-nth ! starting position

    bytes [
        { 0 -2 -4 -6 } [
            shift 2 bits {
                { 0b00 [ -1 -1 ] }
                { 0b01 [ -1  1 ] }
                { 0b10 [  1 -1 ] }
                { 0b11 [  1  1 ] }
            } case :> ( dy dx )
            dy y + 0 HEIGHT 1 - clamp y!
            dx x + 0 WIDTH 1 - clamp x!
            x y board nth [ dup 14 < [ 1 + ] when ] change-nth
        ] with each
    ] each

    16 x y board nth set-nth ! ending position

    board ;
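To see how the walls behave, here is a minimal sketch of the clamping step on a single coordinate. The word bounded-step is a hypothetical helper for illustration, using the same clamp word as the implementation above:

```factor
USING: kernel math math.order prettyprint ;
IN: scratchpad

! Hypothetical helper: move one coordinate by a delta while
! staying within the range [0, size-1].
: bounded-step ( pos delta size -- pos' )
    [ + 0 ] dip 1 - clamp ;

0 -1 9 bounded-step .   ! top row, moving up: stays at 0
0  1 9 bounded-step .   ! top row, moving down: moves to 1
8  1 9 bounded-step .   ! bottom row, moving down: stays at 8
```

Since each coordinate is clamped independently, a bishop against the top wall can still move left or right along it.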

The output is rendered using the " .o+=*BOX@%&#/^" alphabet, indexed by each cell’s visit count (a space means the cell was never visited), with S for the starting position and E for the ending position rendered specially:

CONSTANT: SYMBOLS " .o+=*BOX@%&#/^SE"

: drunken-bishop. ( bytes -- )
    drunken-bishop [ SYMBOLS nths print ] each ;
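As a quick sketch of the rendering step, we can look up a hypothetical row of counters in the same symbol string. The row values here are made up for the example:

```factor
USING: prettyprint sequences ;

! A made-up row: counts 0 through 3, then the start (15) and end (16) markers.
{ 0 1 2 3 15 16 } " .o+=*BOX@%&#/^SE" nths .
! => " .o+SE"
```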

We can try this out and see that it works:

IN: scratchpad "fc94b0c1e5b0987c5843997697ee9fb7"
               hex-string>bytes drunken-bishop.
       .=o.  .   
     . *+*. o    
      =.*..o     
       o + ..    
        S o.     
         o  .    
          .  . . 
              o .
               E.

This is available in the drunken-bishop vocabulary in a recent development version.

Next Combination

Fri, 25 Aug 2023 08:00:00 -0700

Eleven years ago, I blogged about computing the next permutation of a sequence.

As part of some performance improvements to the math.combinatorics vocabulary that I was making around the same time, I needed a way of calculating the indices of the next combination of a given sequence.

Implementation

Without going into great detail, we split the problem into a few steps that operate on a sequence of indices and modify it in place – incrementing either the indices after the “max index” or the last index.

: find-max-index ( seq n -- i )
    over length - '[ _ + >= ] find-index drop ; inline

: increment-rest ( i seq -- )
    [ nth-unsafe ] [ swap index-to-tail <slice-unsafe> ] 2bi
    [ drop 1 + dup ] map! 2drop ; inline

: increment-last ( seq -- )
    [ index-of-last [ 1 + ] change-nth-unsafe ] unless-empty ; inline

:: next-combination ( seq n -- )
    seq n find-max-index [
        1 [-] seq increment-rest
    ] [
        seq increment-last
    ] if* ; inline

We can show how it works by using clone to see the intermediate results of “5 choose 3”:

IN: scratchpad { 0 1 2 } 10 [
                   [ clone 5 next-combination ] keep
               ] replicate nip .
{
    { 0 1 2 }
    { 0 1 3 }
    { 0 1 4 }
    { 0 2 3 }
    { 0 2 4 }
    { 0 3 4 }
    { 1 2 3 }
    { 1 2 4 }
    { 1 3 4 }
    { 2 3 4 }
}

This is used to implement our various combinations words: each-combination, map-combinations, filter-combinations, all-combinations, find-combination, reduce-combinations, etc.
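For example, a small sketch using all-combinations on a four-element array (the input here is just an illustration):

```factor
USING: math.combinatorics prettyprint sequences ;

! There are six ways to choose 2 of 4 items.
{ "a" "b" "c" "d" } 2 all-combinations length .
! => 6

! The first combination uses the first two items.
{ "a" "b" "c" "d" } 2 all-combinations first .
! => { "a" "b" }
```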

Benchmarking

The benchmark I have chosen intentionally allocates a large array of almost 65 million items: all combinations of 200 items taken 4 at a time.
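The “almost 65 million” figure is the binomial coefficient “200 choose 4”, which we can check without allocating anything by using nCk from math.combinatorics:

```factor
USING: math.combinatorics prettyprint ;

! 200 * 199 * 198 * 197 / 24
200 4 nCk .
! => 64684950
```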

We can run this benchmark on an Apple Mac mini (2018) with a 3.2 GHz 6-Core Intel Core i7 processor and 16 GB of memory that is used as part of the Factor nightly build farm.

Factor 0.99 takes 8.3 seconds:

IN: scratchpad [ 200 <iota> 4 all-combinations length ] time .
Running time: 8.269610590999999 seconds

Additional information was collected.
dispatch-stats.  - Print method dispatch statistics
gc-events.       - Print all garbage collection events
gc-stats.        - Print breakdown of different garbage collection events
gc-summary.      - Print aggregate garbage collection statistics
64684950

Looking at gc-summary. shows that more than half of the time (about 60%, or 4.9 seconds) is spent in garbage collection while allocating the almost 65 million results:

IN: scratchpad gc-summary.
Collections:          1,480
Cards scanned:        7,070,518
Decks scanned:        9,124
Code blocks scanned:  2
Total time:           4,910,722 µs
Card scan time:       3,256,711 µs
Code block scan time: 162 µs
Marking time:         0 µs
Data heap sweep time: 0 µs
Code heap sweep time: 0 µs
Data compaction time: 0 µs

Other Languages

On that same machine we can try some other programming languages and see how they compare, attempting to get the same behavior as the Factor example above.

Python 3.11.4 takes 8.4 seconds:

In [1]: from itertools import combinations

In [2]: %time print(len(list(combinations(range(200), 4))))
64684950
CPU times: user 5.64 s, sys: 2.77 s, total: 8.41 s
Wall time: 8.44 s

PyPy 7.3.12 (compatible with Python 3.10) takes 17 seconds:

>>>> from itertools import combinations
>>>> from time import time
>>>> t0 = time(); print(len(list(combinations(range(200), 4)))); time() - t0
64684950
17.00031805038452

Ruby 3.2.2 takes 37.4 seconds:

irb(main):001:0> measure
TIME is added.
=> nil

irb(main):002:0> (0...200).to_a.combination(4).to_a.length
processing time: 37.413635s
=> 64684950

Julia 1.9.2 takes 10.7 seconds:

julia> ]add Combinatorics

julia> using Combinatorics

julia> @time length(collect(combinations(range(1, 200), 4)))
 10.668982 seconds (129.37 M allocations: 8.193 GiB, 37.74% gc time)
64684950

Crystal 1.9.2 takes 515 seconds (interpreted) or 13.7 seconds (compiled):

# Run crystal interpreted
$ crystal i
icr:1> t0 = Time.utc;
       puts (0...200).to_a.combinations(4).to_a.size;
       Time.utc - t0
64684950 => 00:08:34.728442000

# Make a crystal source file
$ echo "t0 = Time.utc
puts (0...200).to_a.combinations(4).to_a.size
puts (Time.utc - t0)" > combo.cr

# Run crystal compiled
$ crystal combo.cr
64684950
00:00:13.739838000

Clojure 1.11.1 takes 10.7 seconds:

user=> (ns user (:use [clojure.math.combinatorics]))

user=> (time (count (combinations (range 0 200) 4)))
"Elapsed time: 10685.687736 msecs"
64684950

Rakudo v2023.08 takes 808.9 seconds:

$ time raku -e "print (1..200).combinations(4).Array.elems"
759.45s user 44.32s system 99% cpu 13:28.86 total
64684950

Considering that we only take the length of the large array of combinations, it is a bit artificial as a benchmark, with room for various optimizations, but it does seem to highlight both algorithmic and garbage collector issues in various languages.

Factor compares pretty favorably!

Factor 0.99 now available

Thu, 24 Aug 2023 08:00:00 -0700

“I hear and I forget. I see and I remember. I do and I understand.” - Confucius

I’m very pleased to announce the release of Factor 0.99!

OS/CPU	Windows	Mac OS	Linux
x86	0.99		0.99
x86-64	0.99	0.99	0.99

Source code: 0.99

This release is brought to you with over 4,100 commits by the following individuals:

Abtin Molavi, Ales Huzik, Alex null Maestas, Alexander Ilin, Alexandre Rousseau, Aleksander Sabak, Arnaut Daniel, Ashish Kurmi, Benjamin Pollack, Cat Stevens, Cecilia Knäbchen, Chris Double, Craig Allen, Dave Carlton, David Flores, David Mindlin, Doug Coleman, Dusk Banks, Fred Alger, Giftpflanze, Ikko Ashimine, Jack Lucas, John Benediktsson, Jon Harper, Justin Hill, KUSUMOTO Norio, Keldan Chapman, Kevin Cope, Konrad Hinsen, Kye Shi, Mark Sweeney, Mohamed Akram, Nandeeka Nayak, Niklas Larsson, Raghu Ranganathan, Rudi Grinberg, Samuel Tardieu, Sebastian Strobl, Sergii Fesenko, Silvio Mayolo, Steve Ayerhart, Zoltán Kéri, @Capital-EX, @inivekin, @mariari, @nicolas-p, @nomennescio, @timor

Besides some bug fixes and library improvements, I want to highlight the following changes:

Added a Guided Tour of Factor
Upgraded to Unicode 15
The fixups vocabulary makes upgrading easier when words are renamed
Windows binaries now include OpenSSL 3.1.2 and SQLite 3.42.0 for convenience
Re-added some support for FreeBSD
Improved non-English text entry on macOS
Removed support for 32-bit macOS
File editors are now specified using EDITOR: syntax
Switched to newer ucrtbase.dll on Windows
Support disassembly using Capstone in addition to Udis86
String literals must be separated by whitespace – "hello"length and "foo""bar"append are no longer accepted by the parser
The fry and locals syntax words are now in syntax for use in all vocabularies
Any word can be referred to by its fully-qualified name (e.g., math:+ or xml.writer:pprint-xml)
The Emacs “FUEL” and VIM plugins have been updated

Some possible backwards compatibility issues:

Moved colors.constants and colors.hex to colors vocabulary
Merged io.binary.fast into io.binary
Merged io.directories.{hierarchy,search} into io.directories
Merged io.encodings.utf16n into io.encodings.utf16
Renamed math.ranges to ranges
Renamed ranges words from [a,b] to [a..b]
Changed FUNCTION: syntax to not require a semi-colon at the end
Renamed exists? to file-exists?
Renamed vector dot product from v. to vdot
Renamed short to index-or-length
Renamed various sorting words to be simpler
Improved icons and other UI images on retina displays
URL query strings only split on ampersand (?a=b&c=d) not semi-colon (?a=b;c=d)
Renamed some words in interval-sets to prefix interval-…
Renamed contents to read-contents
Renamed lines to read-lines
Renamed selections to all-selections
Renamed intersection to intersect-all
Merged json.reader and json.writer into json vocabulary
Merged bson.reader and bson.writer into bson vocabulary
Moved talks to separate factor-talks repository
Renamed ui.backend.gtk to ui.backend.gtk2 to prepare for newer GTK support

What is Factor

New Libraries:

L-system: brought back from unmaintained
aws: support for the Amazon Web Services API
bare: support for Binary Application Record Encoding format
base16: implements “Base 16 encoding” from RFC 4648
base24: implements Base 24 encoding
base32: implements “Base 32 encoding” from RFC 4648
base32-crockford: implements Douglas Crockford’s Base 32 encoding
base32hex: implements “Base 32 Encoding with Extended Hex Alphabet” from RFC 4648
base36: implements Base 36 Encoding
base58: implements Base 58 encoding
base62: implements Base 62 encoding
base91: implements Base 91 encoding
bech32: implements Bech32 encoding
binhex: support BinHex encoding scheme used on classic Mac OS
bittorrent: beginning to support the BitTorrent protocol
brain-flak: implementation of brain-flak esoteric language
broadcast-server: network discovery udp broadcast client/server
build-from-source: allow building dependent libraries from their source
calendar.ranges: support for calendar ranges
cbor: implement Concise Binary Object Representation format
certs: for working with OpenSSL certificates
chrome-tools: for copying curl/fetch commands from Chrome DevTools
checksums.wyhash: implement wyhash hash function
cocoa.statusbar: support for system-wide menu bars on macOS
codebase-analyzer: tools for generating codebase statistics
colors.contrast: implement WCAG color contrast criteria
colors.hwb: support the HWB color model
color-picker-game: demo game where the user tries to match a color with sliders
command-loop: generic line-oriented command interpreters
compression.bzip3: support BZip3, the “better and stronger spiritual successor to BZip2”
compression.gzip: implementation of gzip algorithms
compression.zstd: support Zstandard
countries: various country codes and abbreviations
cpu.arm: assembler backend for ARM 32-bit and 64-bit
crypto.jwt: encode/decode of JSON Web Tokens
csexp: reading/writing from Canonical S-expressions
db.mysql: prototype of a MySQL database backend
did-you-mean: prototype of a “did you mean?” restarts when tokens aren’t found
discord: implement the Discord API
discord.chatgpt-bot: a ChatGPT bot for Discord
editors.10x: support 10x Editor
editors.acme: support Acme text editor
editors.aquamacs: support Aquamacs
editors.bluefish: support Bluefish editor
editors.cudatext: support CudaText code editor
editors.espresso: support Espresso web editor
editors.kakoune: support Kakoune code editor
editors.kate: support Kate text editor
editors.lapce: support Lapce text editor
editors.lite-xl: support Lite XL editor
editors.nova: support Nova code editor
editors.pulsar: support Pulsar text editor
editors.smultron: support Smultron text editor
editors.subethaedit: support SubEthaEdit
editors.visual-studio-code-exploration: support Visual Studio Code Exploration builds
editors.visual-studio-code-insiders: support Visual Studio Code Insiders
editors.visual-studio-codium: support VSCodium
editors.zed: support Zed text editor
elevate: cross-platform privilege escalation
escape-strings.ui: demo user interface for escape-strings vocabulary
fixups: help provide restarts for word renamings
format-using: experimental tool for formatting USING: blocks differently
gamelib: generic game library with some fluent demos including sokoban and tic-tac-toe
gemini: support for Project Gemini
gemini.cli: Project Gemini command-line interface
gemini.server: Project Gemini file server
gemini.ui: Project Gemini user interface
generators: prototype of generator routines
geohash: implement Geohash geocode system
gir: parser for GIR files
github: wrapper for the GitHub API
glfw: wrapper for GLFW library
gravatar: support Gravatar API
help.tour: added a Guided Tour of Factor
hetzner: wrapper for the Hetzner Cloud API
hipku: implement the Hipku algorithm
html5: beginning of HTML5 parser
http.websockets: support for HTTP WebSockets
http2: beginning of HTTP/2 implementation
images.jpeg: brought back from unmaintained
images.processing: brought back from unmaintained
io.streams.counting: implementation of stream protocol that counts read/write of elements
io.streams.escape-codes: support for faint, underline, and blink styles
iso-codes: support for getting Debian ISO files
itunes: implements iTunes API
json.http: utilities for working with JSON web services
libclang: wrapper for libclang
linux.input-events: support for /dev/input/event* devices for reading mouse and keyboard events
lint.vocabs: a vocabulary lint tool for detecting unused imports, etc
lists.circular: wrapper to allow circular objects to be used in lists
logic: implementation of logic programming techniques
long-urls: expand short URLs to their long versions
markov-chains: support for Markov chains
math.matrices.extras:
math.primes.brute-force: original “brute force” primality factoring
math.primes.pollard-rho-brent: faster Pollard-Rho-Brent primality factoring
math.runge-kutta: support for Runge-Kutta methods
mediawiki.api: support for the MediaWiki API
modern.html: prototype html parser
multisets: beginning to support multisets
notifications: support cross-platform notification APIs
npm: support for working with the npm package manager
openai: implement the OpenAI API
papier: demo sprite-based game
periodic-table: demo periodic table UI gadget
pcre2: bindings to libpcre2
proquint: implements Proquint encoding
process-autopsy: rename from ci.run-process
punycode: support for international domain names
quiz: flashcard-style quiz game
random.passwords: random password generator
random.pcg: support for PCG random number generators
random.xoshiro: support for xoroshiro random number generators
raygui: support for raygui user interface library
raylib: support for raylib game programming library
reservoir-sampling: implements reservoir sampling algorithm
retries: vocab for trying code blocks multiple times
rocksdb: support for RocksDB key-value store
rosetta-code.multisplit: solution for Multisplit
ryu: implementation of Ryū: fast float-to-string conversion
semver: support for Semantic Versioning
sequences.padded: virtual “padded” sequences
solr: beginnings of Apache Solr support
sorting.specification: rename from sorting.slots
stack-as-data: combinators for manipulating the data stack
string-server: server for stress testing large network payloads
syslog: client for the Syslog Protocol
syntax.terse: prototype some terse syntax words
tensors: prototype of typed-array programming library
tftp: support for the Trivial File Transfer Protocol
tinyvg: support for the TinyVG binary encoded vector graphics format
tldr: implementation of the tldr pages
tokencase: convert between various string tokenization methods
toml: readers and writers for Tom’s Obvious Markup Language
tools.disassembler.capstone: backend for Capstone disassembler
totp: support Time-based one-time passwords
ui.backend.cocoa.input-methods: support for typing letters in non-English languages
ui.gadgets.flex-borders: a border gadget that doesn’t dictate the size of the contained gadget
ui.theme.base16: support all base16 themes
ui.theme.wombat: support the “wombat” theme
ui.tools.button-list: a gadget with a list of active buttons
ui.tools.listener.log: helper words to log to a UI listener
ui.windows.drop-target: support dropping files on Windows
ulid: Universally Unique Lexicographically Sortable Identifier
unicode.control-pictures: convert ASCII control characters to their Unicode picture equivalents
unicode.flags: conversion to-and-from unicode flag emojis
unix.scheduler: some words from sched.h
unix.sysctl: support for sysctl primitives
unix.xattrs: support for extended file attributes
verbal-expressions: more readable regular expressions
vin: parsing VIN numbers
visionect: support for Visionect
vulkan: support for Vulkan library
wasm: beginning to support WASM opcodes
windows.drive-strings: support for GetLogicalDriveStrings
windows.hardware: support for enumerating display monitors
windows.powrprof: support for power management policies
windows.processes: support for snapshot APIs
windows.shcore: support for per-monitor DPI scaling
windows.version: support for file version info
wipe: utility to overwrite a file with random data
wipe.ui: windows user interface for wipe
wordlet: implements a game similar to wordle
yenc: support yEnc binary-to-text encoding format
zealot.help-lint: adding help-lint to the zealot nightly builder
zim: support for zim files
zim.builder: support to build zim files
zim.server: serve existing zim files from a web server
zim.tools: command-line tools for working with zim files
zoneinfo.update: make updating the zoneinfo files easier

Improved Libraries:

alien.c-types: defined u8, u16, u32, u64, s8, s16, s32, s64, f32, f64, isize, usize types
alien.libraries.finder.linux: more robust implementation using ld
alien.libraries.finder.windows: more robust implementation using GetModuleFileName
assocs: adding zip-with, collect-by, ?value-at, ?change-at
assocs.extras: more words and tests
base64: performance improvements and added urlsafe versions
base85: performance improvements
boolean-expr: add some more expression simplification rules
bootstrap.image: add a MAIN: for -run=bootstrap.image [platform...]
bootstrap.image.upload: more robust, use IPV4, use scp.exe on windows
bson: unify reader/writer vocabulary
cache: possibly workaround a UI issue using assoc-filter instead of assoc-filter!
calendar: improve docs, add words, improve consistency in cloning/modifying timestamps
calendar.holidays.us: support more holidays
classes.struct: improved new and boa on struct classes
classes.union: faster predicate checks
colors: merged in colors.constants and colors.hex
color-picker: support picking more colors
combinators.extras: more words and tests
combinators.smart: adding smart-loop
command-line: run scripts and eval with auto-use on, allow non-empty data stack
compiler.cfg.builder.alien.boxing: improved System V AMD64 API compliance
compression.huffman: support for Huffman codes
compression.inflate: adding gzip-inflate
contributors: additional Factor developers
core-foundation.fsevents: adding kFSEventStream flags
core-text: improve retina text layout
core-text.fonts: switch default monospace font from “Monaco” to “Menlo”
crontab: fix some leap day issues and non-standard day-of-week fractions
db.postgresql: improve unit testing when sharing a single database instance
db.tuples: adding reject-tuples
debugger: integrate fixups
decimals: better DECIMAL: prettyprinting
delegate: learned how to consult on HOOK: generics
documents.elements: adding paragraph-elt for paragraph level cursor movement
editors: allow loading all editors after selecting the one you want to use, add EDITOR: syntax
editors.ultraedit: support macOS version
endian: merge io.binary.fast versions
escape-strings: support a lot of wild string escaping techniques
eval: adding eval-with-stack words
farkup: only nofollow absolute urls
functors: fix to use with-words for more robust parsing
furnace.actions: support PUT and PATCH methods
furnace.recaptcha: update to reCAPTCHA2
game.input.gtk: support mouse and keyboard events on linux
gdbm: prefix words with gdbm-…
git: more features for working with git and GitHub repositories
gopher.server: adding MAIN:
gopher.ui: adding MAIN:
grouping.extras: more words and tests
hacker-news: more pages and better color support
hashcash: improved Hashcash cryptographic proof-of-work system
heaps: fix for heap-delete bug
help.html: support dark mode, some aesthetic improvements, qualified searching
help.syntax: adding easy help syntax
html.templates.chloe: adding <script> and <meta> tag templates
http.client: support PATCH requests
http.parsers: support more characters in cookie keys
images.loader.cocoa: support more image types on macOS 11+
images.loader.gdiplus: support RGBA pixel format
init: add STARTUP-HOOK: syntax
inverse: fix swapped inverse of + and /
io.crlf: adding stream-read-crlf, some ignoring-crlf words, and a crlf-stream
io.directories: merge in words from io.directories.search and io.directories.hierarchy
io.files.info: support mount points
io.monitors.macosx: support add/remove/rename file system events
io.pathnames: adding canonicalize-path, canonicalize-path-full, >windows-path
io.sockets.secure.openssl:
io.streams.ansi: faster by caching styles, support blink
io.streams.byte-array.fast: faster when writing a byte-array
io.streams.256color: faster by caching styles, support blink
io.styles: fix nested streams
ip-parser: adding ipv6-ntoa and ipv6-aton
json: unify reader/writer vocabulary
listener: adding -q quiet startup
lists: adding list literals L{, 2leach, lreduce, 2lreduce
lru-cache: adding fifo-cache, lifo-cache
machine-learning.data-sets: adding mnist data
mason: improvements for nightly builders, code signing, notarization
math: adding until-zero
math.bitwise: adding d>w/w, w>h/h, h>b/b
math.combinatorics: adding unique-permutations, combination-with-replacement, selection combinators
math.extras: more words and docs
math.floats.half: improve roundtrip of subnormal float16
math.functions: adding e^-1, logit, lgamma
math.intervals: cleanup and bug fixes
math.matrices: cleanup and interface improvements
math.parser: support underscore number separators, e.g. 1_000_000
math.primes.factors: switch to pollard-rho-brent-factors
math.statistics: new words: dcg, ndcg, quartiles, sum-of-squares, sum-of-cubes, sum-of-quads, spearman-corr, rank-by-avg, rank-by-min, rank-by-max
math.text.english: support larger numbers, AP style
math.vectors: adding l1-norm, cross, normal, proj, perp, angle-between
maze: support mouse clicks
metar: improvements to wind and visibility parsing, adding MAIN:
mime.multipart: improve error when decoding runs out of bytes
modern: improved experimental lexer
multiline: remove HEREDOC:, adding lua-style strings [=[
numbers-game: simplify for readability
peg: improve compile performance
peg.ebnf: adding EBNF-PARSER:, PARTIAL-EBNF: syntax
peg.javascript: support more javascript features
peg.parsers: make range-pattern more efficient for single characters
project-euler: solutions for problems 64, 87, 463, and 508
prettyprint.backend: print some special floats differently, like -0.0, 0.0, 1/0., -1/0., etc.
python: update to python 3+
random: implement binomial-random distribution, faster randoms using random* generic
random.mersenne-twister: slightly faster random-32*
readline-listener: support pathname completion
redis: support SCRIPT commands
regexp: fix case-insensitive lookahead and lookbehind
sequences.deep: adding flatten1
sequences.extras: more words and tests
sequences.generalizations: adding ?firstn
sequences.product: adding product-find
sequences.repeating: support repeating elements and repeating sequence
shuffle: adding more shufflers like 2pick, 5roll, 6roll, 7roll, 8roll, 2reach, nipdd…
smtp: fix issues sending email to some SMTP servers requiring HELO after TLS auth
sodium: support more crypto techniques
sorting: renamed natural-sort to sort, sort to sort-with, sort-with to sort-by, etc.
splitting.extras: adding split-head, split-tail
splitting.monotonic: performance improvements to monotonic-split
stack-checker.dependencies: fix depends on struct-class
stack-checker.transforms: implement boa on struct-class
strings.parser: support octal escapes
strings.tables: implement box format
system-info: adding username word
system-info.linux: implement username
system-info.macosx: new OS codenames, implement username
terminal: fix typo in name of terminal-height
tetris: some cleanup, background color showing running or paused
timers: improve timers to re-use threads when restarted
tools.annotations: optionally re-annotate silently, add reset-all convenience word
tools.completions: add pathname completions, improve qualified word name completions
tools.dns: handle dns aliases
tools.dns.public: add more known nameservers
tools.hexdump: adding M\ string hexdump.
tools.scaffold: improve scaffolding of unit tests
tools.test: adding long-unit-test, must-not-fail, test-root, refresh-and-test, MAIN: command-line
tools.wc: better support for files not found
ui.backend.cocoa: supporting dynamic light/dark theme switching
ui.backend.cocoa.views: supporting preedit input methods for language support
ui.backend.windows: some bug fixes and cleanup
ui.gadgets.editors: adding readline-bindings, preedit support
ui.gadgets.glass: fix popups off screen
ui.gadgets.panes: better nested stream style support
ui.gadgets.tabbed: modernized the tab style
ui.tools.inspector: improve non-printable string inspection
ui.tools.listener: improve font-size adjustments
ui.tools.operations: adding copy-object support
unicode: support Unicode 15.0.0
units: adding d^ and d-cube
units.si: more units (quetta, ronna, ronto, quecto)
unix.ffi.macosx: adding xattr support
unix.process: adding posix_spawn support
urls: adding redacted-url for log files, IPV6 support, lots of test cases
urls.encoding: removing “;” query string separators, adding encode-uri and decode-uri
uuid: adding support for versions 6, 7, and 8.
vocabs.cache: fix to not reset disk cache when forgetting a single vocab
vocabs.loader: fix a restarts issue
vocabs.metadata: adding vocab-metadata-paths
vocabs.platforms: adding some experimental syntax
webapps: some style improvements, light/dark mode support
webbrowser: adding MAIN:
websites.concatenative: support cgit.factorcode.org
wikipedia: adding MAIN:
windows.com: more COM interfaces
windows.errors: utility words for error handling
windows.kernel32: adding GetLogicalDriveStrings and GetDynamicTimeZoneInformation
windows.ole32: more constants
windows.psapi: adding GetProcessImageFileNameA and GetModuleFileNameExW
windows.registry: adding query-registry word
windows.shell32: add shell32.dll functions
windows.uniscribe: transparency improvements
windows.user32: adding DPI awareness words
windows.winsock: more socket constants
xdg: adding support for XDG_STATE_HOME
xml.data: better support CDATA tags
xmode: update XMODE files, support newer syntax
zoneinfo: update to 2023c, improved rule parsing

VM Improvements:

Increase codeheap default to 96 MB
Set current-directory when launching Factor.app
More work on ARM backend
Faster file-exists? on Windows
Some work on FreeBSD support

Vigenère Cipher

Mon, 21 Aug 2023 08:00:00 -0700

The Vigenère cipher is a historically interesting method of encrypting a piece of text using a Caesar cipher where each letter is encoded based on a different letter from a repeating input “key” text. If the recipient knows the key, they can recover the plaintext by reversing the encoding process.

For example, suppose that the plaintext to be encrypted is

attackatdawn

The person sending the message chooses a keyword and repeats it until it matches the length of the plaintext, for example, the keyword “LEMON”:

LEMONLEMONLE

We start by defining the available alphabet:

CONSTANT: LETTERS "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

We can implement this using an inner word and a flag for whether we are encrypting or decrypting a message. For convenience, we will use local variables and make sure both the key and the message are uppercase and drop all non-letters from the input:

:: vigenere ( msg key encrypt? -- str )
    msg >upper :> MSG
    key >upper :> KEY

    [
        ! the 0 is a running index into KEY
        0 MSG [| ch |
            ! non-letters have no index and are skipped by when*
            ch LETTERS index [| i |
                [ 1 + KEY length mod i ] keep
                KEY nth LETTERS index
                encrypt? [ + ] [ - ] if
                LETTERS length [ + ] [ mod ] bi
                LETTERS nth ,
            ] when*
        ] each drop
    ] "" make ;

: >vigenere ( msg key -- encrypted ) t vigenere ;

: vigenere> ( msg key -- decrypted ) f vigenere ;

We can make some test cases showing that it works:

{ "VBVTQBYOLCPB" } [
    "ATTACKATDAWN" "VICTORY" >vigenere
] unit-test

{ "ATTACKATDAWN" } [
    "VBVTQBYOLCPB" "VICTORY" vigenere>
] unit-test
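Returning to the “LEMON” example from earlier, here is a standalone sketch of the letter-by-letter shift, with the keyword already repeated to the length of the plaintext. The names ALPHABET and shift-letter are hypothetical and used only for this illustration:

```factor
USING: kernel math prettyprint sequences ;
IN: scratchpad

CONSTANT: ALPHABET "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

! Hypothetical helper: shift one plaintext letter by one key letter.
! Assumes both are uppercase letters A-Z.
: shift-letter ( ch key-ch -- ch' )
    [ ALPHABET index ] bi@ + ALPHABET length mod ALPHABET nth ;

"ATTACKATDAWN" "LEMONLEMONLE" [ shift-letter ] "" 2map-as .
! => "LXFOPVEFRNHR"
```

For instance, T (19) shifted by E (4) gives X (23), and T (19) shifted by M (12) wraps around to F (5).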

The Vigenère cipher gained a reputation for being exceptionally strong – even being described as “unbreakable” and “impossible of translation”. It turns out there is a big difference between difficult and impossible, and cryptanalysis ultimately found various weaknesses that enabled the cipher to be broken or attacked in certain ways.

Although he did not invent the cipher that bears his name, Vigenère did invent a stronger version called an autokey cipher, which extends the key during the encoding and decoding process. This avoids one of the weak points: when the key is shorter than the message, its letters are reused.

It is a smidge more complex to implement but shares the same structure as the version above:

:: autokey ( msg key encrypt? -- str )
    msg >upper :> MSG
    key >upper >sbuf :> KEY

    [
        0 MSG [| ch |
            ch LETTERS index [| i |
                [ 1 + i ] keep
                KEY encrypt? [ ch suffix! ] when nth
                LETTERS index
                encrypt? [ + ] [ - ] if
                LETTERS length [ + ] [ mod ] bi
                LETTERS nth
                encrypt? [ KEY over suffix! drop ] unless ,
            ] when*
        ] each drop
    ] "" make ;

: >autokey ( msg key -- encrypted ) t autokey ;

: autokey> ( msg key -- decrypted ) f autokey ;

And some more test cases:

{ "FWGLCDEJIKG" } [
    "AWESOMENESS" "FACTOR" >autokey
] unit-test

{ "AWESOMENESS" } [
    "FWGLCDEJIKG" "FACTOR" autokey>
] unit-test
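To make the key extension visible, here is a small sketch that builds the key stream used when encrypting, assuming the message contains only letters. The word autokey-stream is a hypothetical helper, not part of the implementation above:

```factor
USING: kernel prettyprint sequences ;
IN: scratchpad

! Hypothetical helper: the key followed by the plaintext,
! cut to the length of the message.
: autokey-stream ( msg key -- stream )
    swap [ append ] [ length ] bi head ;

"AWESOMENESS" "FACTOR" autokey-stream .
! => "FACTORAWESO"
```

Once the six letters of “FACTOR” are used up, the plaintext itself supplies the rest of the key, so no key letter is reused on a fixed cycle.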

Easy Help

Tue, 13 Jun 2023 09:00:00 -0700

One of the challenges with any software platform, and particularly for programming languages, is having enough well-written documentation to onboard new users as well as providing assistance to those that have been using the platform for longer.

In previous versions of Factor, the help system – which provides for writing documentation and browsing it on the command-line as well as in the UI browser – required a somewhat verbose syntax in which, for example, all text had to be written as literal strings. This made documentation both hard to write and hard to read as source code.

In the upcoming release of Factor 0.99, we have a kind of “easy help” syntax that allows some smart interpretation of documentation syntax so that the following example is now possible:

HELP: do-the-thing
{ $values
    word: word
    n: number
    quot: { $quotation ( x -- y ) }
    seq: { $sequence "things" }
    seq: { $sequence fixnum }
    obj/f: { $maybe object }
    this: "that other thing"
}
{ $description
    Does the thing. Words which \ execute an input
    parameter must be declared \ inline so
    that a caller which passes in a literal word can have a
    static stack effect.

    A second paragraph should work also. And we should be 
    able to refer to markup { $snippet "text" } .
}
{ $notes
    To execute a non-literal word, you can use
    \ execute( to check the stack effect before
    calling at runtime.
}
{ $examples
    [=[
        USING: kernel io words ;
        IN: scratchpad
        : twice ( word -- ) dup execute execute ; inline
        : hello ( -- ) "Hello" print ;
        \ hello twice
        "Hello\nHello"
    ]=]
} ;

This also applies to articles, which can now be written much more clearly:

ARTICLE: "roman" "Roman numerals"
The { $vocab-link "roman" } vocabulary can convert numbers to and from the
Roman numeral system and can perform arithmetic given Roman numerals as input.

A parsing word for literal Roman numerals:
{ $subsections POSTPONE: ROMAN: }

Converting to Roman numerals:
{ $subsections
    >roman
    >ROMAN
}

Converting Roman numerals to integers:
{ $subsections roman> }

Roman numeral arithmetic:
{ $subsections
    roman+
    roman-
    roman*
    roman/i
    roman/mod
} ;

We have periodically found edge cases that don’t quite format as expected, but most of these have likely been resolved.

Give it a try and let us know what you think!

Rainbows

Fri, 02 Jun 2023 08:00:00 -0700

Rainbows are awesome, especially double rainbows, which can be quite intense when they are a double rainbow all the way across the sky. They can also be awesome when they show up as rainbow flags, which are used to indicate that a place is welcoming, accepting, and safe for people.

Given that this is Pride Month, it might be fun to make some rainbows today using Factor.

I bumped into this nice tutorial on making annoying rainbows in JavaScript that has some background on color theory, including links to how color vision actually works, detailed science on light and the eye, and a “better rainbow method” called the sinebow.

After implementing a lot of color support for Factor, I get nerd sniped sometimes when it comes to colors. Instead of using HSL colors, let’s just use the more common RGB color model to make some rainbows!

:: rainbow-phase. ( str phase -- )
    str >graphemes :> chars
    2pi chars length / :> frequency

    chars [| s i |
        frequency i * 2 + phase + sin 0.5 * 0.5 +
        frequency i * 0 + phase + sin 0.5 * 0.5 +
        frequency i * 4 + phase + sin 0.5 * 0.5 +
        1.0 <rgba> :> color

        s "" like H{ { foreground color } } format
    ] each-index nl ;

: rainbow. ( str -- ) 0 rainbow-phase. ;

This supports Unicode by calling >graphemes to split a word by grouping on visual characters.
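
The constants 2, 0, and 4 added to the red, green, and blue channels are, as far as I can tell from the tutorial, rounded thirds of a full cycle, so the three sine waves stay roughly 120 degrees out of phase with each other. A quick check of that value, followed by hedged usage of the words above (shown as comments because they need the definitions loaded in a listener):

```factor
USING: math math.constants prettyprint ;

! One third of a full cycle, in radians.
2pi 3 / .
! prints roughly 2.094

! Usage, assuming rainbow. and rainbow-phase. are loaded:
! "Happy Pride Month!" rainbow.
! "Happy Pride Month!" pi rainbow-phase.
```

Passing a different phase shifts where in the color cycle the first character starts.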

And, it looks like this:

Happy Pride Month!

ZIM Builder

Wed, 17 May 2023 07:30:00 -0700

Apparently, it is just too much fun building tools to make an offline Wikipedia and the next thing we needed to build was a way to make offline Factor documentation. This documentation is available inside each Factor instance and generated by the Factor help system.

Since we implemented the zim vocabulary with support for reading the ZIM file format and the zim.server vocabulary with support for serving those files out as websites, the natural follow-up is the zim.builder vocabulary to make the ZIM files in the first place!

Yesterday, I wrote the build-zim word that can archive all of the files in the current-directory into a ZIM file at the specified output path. Just now, I generated some “offline Factor documentation” by running this command on a docs directory holding all the HTML files uploaded by a recent nightly build:

IN: scratchpad USE: zim.builder

IN: scratchpad "resource:docs" [ "resource:docs.zim" build-zim ] with-directory

You can then run a local Factor documentation server like so:

$ ./factor -run=zim.server docs.zim

It’s interesting that our Factor documentation is 1.4 GB of HTML files, 52 MB as a docs.tar.gz, and 47 MB as a docs.zim file using Zstandard compression. It’s a cool file format for serving this type of content.

I posted a ZIM snapshot of the Factor documentation if you’d like to download it and give this a try with a recent nightly build.

Offline Wikipedia

Tue, 16 May 2023 07:30:00 -0700

Pretty much everyone agrees that Wikipedia is awesome (except maybe during one of their controversial fundraising campaigns). In addition to Wikipedia, the Wikimedia Foundation operates a number of other projects.

Even though the official Wikipedia iOS app and Wikipedia Android app are both great, they still require access to the internet to be useful. I am not alone in wondering how to build your own Hitchhiker’s Guide with Wikipedia and looking through the options to download a Wikipedia database.

One way you can do this is to implement support for the ZIM file format, for example using the libzim project. There are many archives available to download as a ZIM file for Wikipedia and various popular websites like StackOverflow, Project Gutenberg, and even some open source projects. You can also build your own ZIM file if you want to archive custom content.

ZIM stands for “Zeno IMproved”, as it replaces the earlier Zeno file format. Its file compression uses LZMA2, as implemented by the xz-utils library, and, more recently, Zstandard. The openZIM project is sponsored by Wikimedia CH, and supported by the Wikimedia Foundation.

Let’s implement this using Factor!

Each ZIM file starts with a header in little endian format:

PACKED-STRUCT: zim-header
    { magic-number uint32_t }
    { major-version uint16_t }
    { minor-version uint16_t }
    { uuid uint64_t[2] }
    { entry-count uint32_t }
    { cluster-count uint32_t }
    { url-ptr-pos uint64_t }
    { title-ptr-pos uint64_t }
    { cluster-ptr-pos uint64_t }
    { mime-list-ptr-pos uint64_t }
    { main-page uint32_t }
    { layout-page uint32_t }
    { checksum-pos uint64_t } ;

In addition to 16-bit, 32-bit, and 64-bit little-endian numbers, we need to be able to read null-terminated strings typically stored as UTF-8. For example, when reading the mime-type list:

: read-string ( -- str )
    { 0 } read-until 0 assert= utf8 decode ;

: read-mime-types ( -- seq )
    [ read-string dup empty? not ] [ ] produce nip ;
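
A small in-memory check can make the framing concrete. This sketch repeats the two words above so it stands alone and feeds them a hand-built byte array through a byte reader instead of a real ZIM file; the sample MIME types are made up for illustration.

```factor
USING: io io.encodings.binary io.encodings.string
io.encodings.utf8 io.streams.byte-array kernel prettyprint
sequences ;

! Repeated from above so the example stands alone.
: read-string ( -- str )
    { 0 } read-until 0 assert= utf8 decode ;

: read-mime-types ( -- seq )
    [ read-string dup empty? not ] [ ] produce nip ;

! Two null-terminated strings, then an empty string as terminator.
"text/html\0image/png\0\0" utf8 encode
binary [ read-mime-types ] with-byte-reader .
! prints { "text/html" "image/png" }
```

The list ends at the first empty string, which is why the final extra null byte is needed.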

That’s enough to parse the file header, the list of mime-types, and the lists of pointers to urls, titles, and clusters used for indexing into the ZIM file.

TUPLE: zim path header mime-types urls titles clusters ;

: read-zim ( path -- zim )
    dup binary [
        zim-header read-struct dup {
            [ magic-number>> 0x44D495A assert= ]
            [
                mime-list-ptr-pos>> seek-absolute seek-input
                read-mime-types
            ] [
                dup url-ptr-pos>> seek-absolute seek-input
                entry-count>> [ 8 read le> ] replicate
            ] [
                dup title-ptr-pos>> seek-absolute seek-input
                entry-count>> [ 4 read le> ] replicate
            ] [
                dup cluster-ptr-pos>> seek-absolute seek-input
                cluster-count>> [ 8 read le> ] replicate
            ]
        } cleave zim boa
    ] with-file-reader ;

Entries

There are two types of directory entries:

content entries

TUPLE: content-entry mime-type parameter-len namespace
    revision cluster-number blob-number url title parameter ;

: read-content-entry ( mime-type -- content-entry )
    read1
    read1
    4 read le>
    4 read le>
    4 read le>
    read-string
    read-string
    f
    content-entry boa
    dup parameter-len>> read >>parameter ;

redirect entries

TUPLE: redirect-entry mime-type parameter-len namespace revision
    redirect-index url title parameter ;

: read-redirect-entry ( mime-type -- redirect-entry )
    read1
    read1
    4 read le>
    4 read le>
    read-string
    read-string
    f
    redirect-entry boa
    dup parameter-len>> read >>parameter ;

The mime-type indicates which type of entry we are reading:

: read-entry ( -- entry )
    2 read le> dup 0xffff =
    [ read-redirect-entry ] [ read-content-entry ] if ;

Now we can read the entry at index n in a ZIM file:

: read-entry-index ( n zim -- entry/f )
    urls>> nth seek-absolute seek-input read-entry ;

Clusters

Content is stored in clusters, where each cluster is a sequence of binary blobs, each located at an offset into the cluster. A cluster is stored either uncompressed or compressed (typically with LZMA or Zstandard).

We can read the “no compression” version:

: read-cluster-none ( -- offsets blobs )
    4 read le>
    [ 4 /i 1 - [ 4 read le> ] replicate ] [ prefix ] bi
    dup [ last ] [ first ] bi - read ;
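
To make the layout easier to picture, here is a tiny hand-built uncompressed cluster body holding two blobs, "ab" and "cde". This is only a sketch: the bytes are invented for the example, the word is repeated so the block stands alone, and I am assuming le> comes from the endian vocabulary as in recent development builds.

```factor
USING: endian io io.encodings.binary io.streams.byte-array
kernel math prettyprint sequences ;

! Repeated from above so the example stands alone.
: read-cluster-none ( -- offsets blobs )
    4 read le>
    [ 4 /i 1 - [ 4 read le> ] replicate ] [ prefix ] bi
    dup [ last ] [ first ] bi - read ;

! Three 4-byte little-endian offsets (12, 14, 17), then the blob
! bytes for "ab" and "cde". The first offset doubles as the size
! of the offset table, which is how the number of offsets is found.
B{ 12 0 0 0 14 0 0 0 17 0 0 0 97 98 99 100 101 }
binary [ read-cluster-none ] with-byte-reader [ . ] bi@
! prints { 12 14 17 }
! prints B{ 97 98 99 100 101 }
```

Blob n then spans from offset n to offset n+1, both taken relative to the first offset, so blob 1 here is bytes 2 through 5 of the blob data, or "cde".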

And then read the “Zstandard compression” version:

: read-cluster-zstd ( -- offsets blobs )
    zstd-uncompress-stream-frame dup uint32_t deref
    [ 4 /i uint32_t <c-direct-array> ] [ tail-slice ] 2bi
    2dup [ [ last ] [ first ] bi - ] [ length assert= ] bi* ;

The cluster can then be read by checking the compression type in use:

: read-cluster ( -- offsets blobs )
    read1 [ 5 bit? f assert= ] [ 4 bits ] bi {
        { 1 [ read-cluster-none ] }
        { 2 [ "zlib not supported" throw ] }
        { 3 [ "bzip2 not supported" throw ] }
        { 4 [ "lzma not supported" throw ] }
        { 5 [ read-cluster-zstd ] }
    } case ;

To read the blob at index n, we read the entire cluster, then offset into the blobs data:

:: read-cluster-blob ( n -- blob )
    read-cluster :> ( offsets blobs )
    0 offsets nth :> zero
    n offsets nth :> from
    n 1 + offsets nth :> to
    from to [ zero - ] bi@ blobs subseq ;

Now we can read the blob by index into a given cluster in a ZIM file:

: read-blob-index ( blob-number cluster-number zim -- blob )
    clusters>> nth seek-absolute seek-input read-cluster-blob ;

And we can read the entry content from each entry type or index:

GENERIC#: read-entry-content 1 ( entry zim -- blob mime-type )

M:: content-entry read-entry-content ( entry zim -- blob mime-type )
    entry blob-number>>
    entry cluster-number>>
    zim read-blob-index
    entry mime-type>>
    zim mime-types>> nth ;

M: redirect-entry read-entry-content
    [ redirect-index>> ] [ read-entry-content ] bi* ;

M: integer read-entry-content
    [ read-entry-index ] keep '[ _ read-entry-content ] [ f f ] if* ;

Reading the “main page” content is simple using the index stored in the ZIM header:

: read-main-page ( zim -- blob/f mime-type/f )
    [ header>> main-page>> ] [ read-entry-content ] bi ;

We can find an entry by searching using a namespace and url, taking advantage of the fact that the entries are sorted by <namespace><url> to perform a binary search. Some common namespaces include:

A - Article
C - User Content
M - ZIM metadata
W - Well known entries
X - Search indexes

:: find-entry-url ( namespace url zim -- entry/f )
    f zim header>> entry-count>> <iota> [
        nip zim read-entry-index
        namespace over namespace>> <=>
        dup +eq+ = [ drop url over url>> <=> ] when
    ] search 2drop dup {
        [ ] [ namespace>> namespace = ] [ url>> url = ]
    } 1&& [ drop f ] unless ;
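
The search word comes from the binary-search vocabulary, and its quotation compares the target against each probed element. A minimal sketch on a plain sorted array, with made-up numbers, shows the shape of the call:

```factor
USING: arrays binary-search kernel math.order prettyprint ;

! The quotation answers "how does the target compare to this element?"
{ 1 3 5 7 9 } [ 5 swap <=> ] search 2array .
! prints { 2 5 }   (index and element)

! A missing target still yields some nearby index and element,
! so the caller has to verify the match itself.
{ 1 3 5 7 9 } [ 6 swap <=> ] search 2array .
```

That second behaviour is the reason find-entry-url re-checks the namespace and url of whatever entry the search lands on before returning it.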

If we find the entry after searching, we can read its content:

: read-entry-url ( namespace url zim -- blob/f mime-type/f )
    [ find-entry-url ] keep '[ _ read-entry-content ] [ f f ] if* ;

Web Server

This is all kinda awesome, but basically these ZIM files hold HTML data for an offline instance of the various wiki-type servers. So, wouldn’t it be awesome to make an HTTP server responder that loads a ZIM file and then returns data from it on a local Factor HTTP server?

Yes!

TUPLE: zim-responder zim ;

: <zim-responder> ( path -- zim-responder )
    read-zim zim-responder boa ;

M: zim-responder call-responder*
    [
        dup { [ length 1 > ] [ first length 1 = ] } 1&&
        [ unclip-slice first ] [ CHAR: A ] if swap "/" join
    ] dip [
        zim>> dup path>> binary [
            over empty? [ 2nip read-main-page ] [ read-entry-url ] if
        ] with-file-reader
    ] bi* 2dup and [
        <content> binary >>content-encoding
    ] [
        2drop <404>
    ] if ;

We use that to make a little entry point that creates a zim-responder and then sets it as the main-responder and calls httpd to start a web server. Using the latest development version, we can run it like so:

$ ./factor -run=zim.server /path/to/wiki.zim [port]

There are a few features that would be nice to add – like searching URLs, titles, and content, or dealing with split ZIM files (when over 4GB on file systems like FAT32) – but this is a pretty neat new tool that is available now in a nightly build and will be released soon in Factor 0.99.

Calendar Ranges

Tue, 09 May 2023 09:00:00 -0700

A recent post titled Python’s Missing Batteries: Essential Libraries You’re Missing Out On caught my eye. One of my favorite parts about Factor is the large standard library that we ship with. Looking at blogs like these sometimes helps me notice functionality that we are missing.

One of the provided examples from the timeutils module is the daterange function, which provides an iterator between a start and stop date:

start_date = date(year=2023, month=4, day=9)
end_date = date(year=2023, month=4, day=30)

for day in timeutils.daterange(start_date, end_date, step=(0, 0, 2)):
    print(repr(day))
    # datetime.date(2023, 4, 9)
    # datetime.date(2023, 4, 11)
    # datetime.date(2023, 4, 13)
    # ...

I realize that although we have numeric ranges, the current support for numbers doesn’t allow extending them so that timestamp arithmetic is implicitly supported. Some future version of Factor might fix this when we finish merging support for multiple dispatch, but in the meantime I added a timestamp-range object that works identically to range but with calendar objects.

The above Python example would look something like this:

IN: scratchpad USE: calendar.ranges

IN: scratchpad 2023 4 9 <date-utc>
               2023 4 30 <date-utc>
               2 days <timestamp-range> [ . ] each
T{ timestamp { year 2023 } { month 4 } { day 9 } }
T{ timestamp { year 2023 } { month 4 } { day 11 } }
T{ timestamp { year 2023 } { month 4 } { day 13 } }
...

The current implementation of <timestamp-range> works the same way as <range>: it assumes an inclusive range [from,to]. Give it a try!

Case Conversion

Fri, 05 May 2023 06:00:00 -0700

One aspect of exposure to different programming languages and programmers is differing opinions on proper case conventions for class names, variable names, and other attribute names. Sometimes you want to convert between them for various reasons.

Looking around at other programming languages, you can find modules such as Change Case for JavaScript, case-converter for Python, a code golf challenge, a regular expression approach to convert string to different case styles, and even a PHP module written by Jawira Portugal called Case Converter that handles quite a few, ahem, cases:

Convert strings between 13 naming conventions: Snake case, Camel case, Kebab case, Pascal case, Ada case, Train case, Cobol case, Macro case, Upper case, Lower case, Title case, Sentence case and Dot notation.

Examples of which might look something like:

snake_case
camelCase
kebab-case
PascalCase
Ada_Case
Train-Case
COBOL-CASE
MACRO_CASE
UPPER CASE
lower case
Title Case
Sentence case
dot.case

I thought it would be an interesting example to make a Unicode-aware case conversion library for Factor that handles all of those same cases in a small amount of code (less than 35 lines of code!).

The first word looks for a lowercase grapheme, then finds the next one that is not lowercase:

: case-index ( str -- i/f )
    dup [ lower? ] find [
        swap [ lower? not ] find-from drop
    ] [ nip ] if ;

We can then use that word to split the graphemes at these case boundaries:

: split-case ( str -- words )
    >graphemes [ dup empty? not ] [
        dup [ case-index ] [ length or ] bi
        cut-slice swap concat
    ] produce nip ;

We split tokens first on the common token separators, and then on the case boundaries:

: split-tokens ( str -- words )
    " -_." split [ split-case ] map concat ;

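The separator step on its own is easy to try out. This small sketch uses only split from the splitting vocabulary, with an invented input string, to show what the pieces look like before any case splitting happens:

```factor
USING: prettyprint splitting ;

! Any character in the second string counts as a separator.
"foo-bar_baz.qux quux" " -_." split .
! prints { "foo" "bar" "baz" "qux" "quux" }

! Consecutive separators produce empty pieces.
"a--b" "-" split .
! prints { "a" "" "b" }
```

As far as I can tell, those empty pieces are harmless here, since splitting an empty string at case boundaries yields no tokens and the final concat simply drops them.
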
And now the core of the algorithm that splits an input string into tokens, with two variants (one that applies a quotation to each token and another that handles the first token differently than the rest) before joining the tokens using a provided glue character.

: case1 ( str quot glue -- str' )
    [ split-tokens ] [ map ] [ join ] tri* ; inline

: case2 ( str first-quot rest-quot glue -- str' )
    {
        [ split-tokens 0 over ]
        [ change-nth dup rest-slice ]
        [ map! drop ]
        [ join ]
    } spread ; inline

Now that’s everything we need to implement all the case conversions!

: >camelcase ( str -- str' ) [ >lower ] [ >title ] "" case2 ;
: >pascalcase ( str -- str' ) [ >title ] "" case1 ;
: >snakecase ( str -- str' ) [ >lower ] "_" case1 ;
: >adacase ( str -- str' ) [ >title ] "_" case1 ;
: >macrocase ( str -- str' ) [ >upper ] "_" case1 ;
: >kebabcase ( str -- str' ) [ >lower ] "-" case1 ;
: >traincase ( str -- str' ) [ >title ] "-" case1 ;
: >cobolcase ( str -- str' ) [ >upper ] "-" case1 ;
: >lowercase ( str -- str' ) [ >lower ] " " case1 ;
: >uppercase ( str -- str' ) [ >upper ] " " case1 ;
: >titlecase ( str -- str' ) [ >title ] " " case1 ;
: >sentencecase ( str -- str' ) [ >title ] [ >lower ] " " case2 ;
: >dotcase ( str -- str' ) [ >lower ] "." case1 ;

These are available in the tokencase vocabulary and are included in the latest nightly builds.

Unicode

Thu, 04 May 2023 09:00:00 -0700

The Rust programming language is pretty cool. I’ve enjoyed many aspects of the Rewrite It In Rust meme that appears as a part of the Rust Evangelism Strike Force. The Rust documentation includes a pretty awesome Rust book that is probably a gold standard for programming language documentation.

In the Rust book, there is a section on Storing UTF-8 Encoded Text with Strings. It contains a neat example that I would like to use to show how Factor string objects work, how we handle Unicode and other character encodings, and show how we probably can make some improvements in the future. At the time of this blog post, we support Unicode 15.0.0 which was released in September 2022.

Factor strings are sequences of Unicode code points, which we can explore to see how they work.

Bytes

If we look at the Hindi word “नमस्ते” written in the Devanagari script, it is stored as a vector of u8 values that looks like this:

IN: scratchpad "नमस्ते" utf8 encode .
B{
    224 164 168 224 164 174 224 164 184 224 165 141 224 164 164
    224 165 135
}

Or, as a series of hex values:

IN: scratchpad "नमस्ते" utf8 encode .h
B{
    0xe0 0xa4 0xa8 0xe0 0xa4 0xae 0xe0 0xa4 0xb8 0xe0 0xa5 0x8d
    0xe0 0xa4 0xa4 0xe0 0xa5 0x87
}

You could instead print them as octal or binary quite easily.

Code Points

That’s 18 bytes and is how computers ultimately store this data. If we look at them as Unicode scalar values, which are what Rust’s char type is, those bytes look like this:

IN: scratchpad "नमस्ते" [ 1string ] { } map-as .
{ "न" "म" "स" "्" "त" "े" }

You can see what the code point numeric values are:

IN: scratchpad "नमस्ते" >array .
{ 2344 2350 2360 2381 2340 2375 }

Or even see what the code point names are:

IN: scratchpad "नमस्ते" [ char>name ] { } map-as .
{
    "devanagari-letter-na"
    "devanagari-letter-ma"
    "devanagari-letter-sa"
    "devanagari-sign-virama"
    "devanagari-letter-ta"
    "devanagari-vowel-sign-e"
}

Characters

There are six char values here, but the fourth and sixth are not letters: they’re diacritics that don’t make sense on their own. Finally, if we look at them as grapheme clusters, we’d get what a person would call the four letters that make up the Hindi word:

IN: scratchpad "नमस्ते" >graphemes [ >string ] map .
{ "न" "म" "स्" "ते" }

These graphemes are code points grouped like so:

IN: scratchpad "नमस्ते" >graphemes [ >array ] map .
{ { 2344 } { 2350 } { 2360 2381 } { 2340 2375 } }
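
Putting the three views side by side, the same word has three different "lengths" depending on what is being counted. This is a small recap using the numbers already shown above; I am assuming >graphemes is exported from the unicode vocabulary, as in recent builds.

```factor
USING: io.encodings.string io.encodings.utf8 prettyprint
sequences unicode ;

"नमस्ते" utf8 encode length .   ! prints 18 (bytes)
"नमस्ते" length .               ! prints 6  (code points)
"नमस्ते" >graphemes length .    ! prints 4  (grapheme clusters)
```

Which of these is the right answer depends entirely on whether the program is sizing a buffer, indexing into a string, or moving a cursor.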

Encodings

Rust provides different ways of interpreting the raw string data that computers store so that each program can choose the interpretation it needs, no matter what human language the data is in.

Factor supports many encodings which can be used for interacting with other computer systems. These include ASCII, many legacy 8-bit encodings (including MacRoman, EBCDIC, and others), other Unicode variants (such as UTF-7, UTF-16, and UTF-32), ISO-2022, and several others.

There are a couple of space optimizations to save memory when only small code points are used, which is common in English as well as formats such as Base64. Looking at the Rust standard library, the improvements made to the Python unicode support, or other languages such as Strings and Characters in Swift, there are likely improvements we can make when working with text in Factor.

Color Picker Game

Wed, 03 May 2023 09:00:00 -0700

In the SwiftUI by Tutorials book by Ray Wenderlich, there is a tutorial on building RGBullsEye, which is a game for adjusting RGB colors using sliders to match a provided random color and providing a “color score” to the user showing how well they matched it. Some users have even posted their solutions on GitHub.

I thought it would be fun to build a version of this example using Factor.

We could generate a color by using random-unit to make three random values for the red, green, and blue slots. Instead, we can pick randomly from the standard color database.

: random-color ( -- color )
    named-colors random named-color ;

Comparing two colors can use the rgba-distance word from the colors.distances vocabulary, returning an integer score out of 100 points:

: color-score ( color1 color2 -- n )
    rgba-distance 1.0 swap - 100.0 * round >integer ;
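
As a quick sanity check on the scoring, a color compared with itself should have a distance of zero and therefore a perfect score. The sketch below repeats color-score so it stands alone and uses a plain red built with <rgba>; it relies only on rgba-distance returning zero for identical colors.

```factor
USING: colors colors.distances kernel math math.functions
prettyprint ;

! Repeated from above so the example stands alone.
: color-score ( color1 color2 -- n )
    rgba-distance 1.0 swap - 100.0 * round >integer ;

! Identical colors: distance 0, so the score is the full 100.
1.0 0.0 0.0 1.0 <rgba> dup color-score .
! prints 100
```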

We can define a gadget type that can be used to find our object in a gadget hierarchy.

TUPLE: color-picker-game < track ;

Given a child of the color-picker-game instance, we can pull out the color-preview gadgets in a slightly fragile way by knowing where they are in the layout:

: find-color-previews ( gadget -- preview1 preview2 )
    [ color-picker-game? ] find-parent
    children>> first children>> first2 ;

Using that, we can make a button that, when clicked:

finds the two color-preview objects
grabs the latest color value from their models
calculates the “color score”
displays it by modifying the button text

: <match-button> ( -- button )
    "Match Color" [
        dup find-color-previews
        [ model>> compute-model ] bi@
        color-score "Your score: %d" sprintf
        over children>> first text<< relayout
    ] <border-button> ;

Another button can be used to reset the color we are trying to match against to a new random color, setting it on the model used by the left color-preview:

: <reset-button> ( -- button )
    "Random" [
        find-color-previews drop model>>
        random-color swap set-model
    ] <border-button> ;

Using these two buttons, and some gadgets from the color picker vocabulary, we can build up our interface, choosing a random color to start, and then laying out the other components we need:

:: <color-picker-game> ( -- gadget )
    vertical color-picker-game new-track { 5 5 } >>gap

    random-color <model>     :> left-model
    \ <rgba> <color-sliders> :> ( sliders right-model )

    horizontal <track>
        left-model <color-preview> 1/2 track-add
        right-model <color-preview> 1/2 track-add
    1 track-add

    sliders                     f track-add
    right-model <color-status>  f track-add
    <match-button>              f track-add
    <reset-button>              f track-add ;

We can make a main entry point, constructing the game and providing it as the main gadget:

MAIN-WINDOW: color-picker-game-window
    { { title "Color Picker Game" } }
    <color-picker-game> >>gadgets ;

This is available in the development version and includes some additional features such as support for additional color spaces along with some improvements to our tabbed gadgets. Give it a try!

OpenAI

Tue, 25 Apr 2023 00:15:00 -0800

It’s been pretty hard to avoid all of the incredible stories about artificial intelligence over the past few months. There seem to be incredible applications to the area of generative AI occurring on a daily basis. Image generation with Midjourney is pretty next-level. Code generation using GitHub Copilot seems pretty amazing. Interacting with large language models like GPT-4 or Bard or Bing Chat or Facebook LLaMA or StableLM and so many others seems like science fiction. Audio models like Whisper used for audio transcription even make popular audio assistants look pretty dated.

With all of the hype, it seemed inevitable that Factor would gain some kind of AI functionality.

Despite the non-profit vs for-profit controversy of OpenAI, they do seem to have a momentary lead in the race to make tools that others can build upon. One of those tools is the OpenAI API, which is made available using JSON and HTTP. Besides the OpenAI API Reference, there is an OpenAI Cookbook and popular libraries such as OpenAI Python for building systems using it.

Recently, I contributed the openai vocabulary which allows using all the methods currently made available by OpenAI. You will need an OpenAI API Key.

Once you obtain that, you can set it in the listener.

IN: scratchpad USE: openai

IN: scratchpad "sk-....................." openai-api-key set-global

And now you have enough to try it out…

IN: scratchpad "what is the factor programming language"
               "text-davinci-003" <completion>
               100 >>max_tokens create-completion
               "choices" of first "text" of print

Factor is a stack-oriented programming language, designed for creating
flexible, reusable software components. It combines elements from both
object-oriented and functional programming, and provides powerful features,
including static typing and static type checking, an interactive program
development environment, built-in automated testing, and a wide range of
built-in data types. The language is designed to be easy to use, yet provide
a high degree of flexibility.

Cool!

We even have a Discord bot using OpenAI that answers on the Factor Discord server.

ASCII Table PDF

Tue, 07 Mar 2023 09:00:00 -0700

Vasudev Ram has a blog with many different posts about various programming topics including Python, Linux, SQL, and PDFs. On the topic of PDF generation, they have a blog post about making an ASCII Table to PDF with xtopdf.

Recently, I had the need for an ASCII table lookup, which I searched for and found, thanks to the folks here:

www.ascii-code.com

That gave me the idea of writing a simple program to generate an ASCII table in PDF. Here is the code for a part of that table - the first 32 (0 to 31) ASCII characters, which are the control characters:

It might not be widely known, but Factor has built-in support for writing to PDF Streams using the formatted output protocol. This supports text styles including changing font names, bold and italic styles, foreground and background colors, etc.

We start by defining the symbols and descriptions of the first 32 ASCII characters. These are all non-printable control characters, which is why we use this array of strings to render them in a table.

CONSTANT: ASCII {
    "NUL Null char"
    "SOH Start of Heading"
    "STX Start of Text"
    "ETX End of Text"
    "EOT End of Transmission"
    "ENQ Enquiry"
    "ACK Acknowledgment"
    "BEL Bell"
    "BS Back Space"
    "HT Horizontal Tab"
    "LF Line Feed"
    "VT Vertical Tab"
    "FF Form Feed"
    "CR Carriage Return"
    "SO Shift Out / X-On"
    "SI Shift In / X-Off"
    "DLE Data Line Escape"
    "DC1 Device Control 1 (oft. XON)"
    "DC2 Device Control 2"
    "DC3 Device Control 3 (oft. XOFF)"
    "DC4 Device Control 4"
    "NAK Negative Acknowledgement"
    "SYN Synchronous Idle"
    "ETB End of Transmit Block"
    "CAN Cancel"
    "EM End of Medium"
    "SUB Substitute"
    "ESC Escape"
    "FS File Separator"
    "GS Group Separator"
    "RS Record Separator"
    "US Unit Separator"
}

The core printing logic is a header, followed by rows for each character, formatted into a table of decimal, octal, hexadecimal, and binary values along with their symbol and description from the array above:

: ascii. ( -- )
    "ASCII Control Characters - 0 to 31" print nl
    ASCII [
        swap [
            {
                [ >dec ]
                [ >oct 3 CHAR: 0 pad-head ]
                [ >hex 2 CHAR: 0 pad-head ]
                [ >bin 8 CHAR: 0 pad-head ]
            } cleave
        ] dip " " split1 6 narray
    ] map-index {
        "DEC" "OCT" "HEX" "BIN" "Symbol" "Description"
    } prefix format-table unclip
    H{ { font-style bold } } format nl
    [ print ] each ;
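
The numeric columns are the part most worth trying in isolation. In this sketch, number-columns is a hypothetical helper that mirrors the cleave inside ascii. and collects the four strings into an array, using 10 (line feed) as the sample value:

```factor
USING: arrays combinators math.parser prettyprint sequences ;

! Hypothetical helper mirroring the cleave used in ascii. above.
: number-columns ( n -- seq )
    {
        [ >dec ]
        [ >oct 3 CHAR: 0 pad-head ]
        [ >hex 2 CHAR: 0 pad-head ]
        [ >bin 8 CHAR: 0 pad-head ]
    } cleave 4array ;

10 number-columns .
! prints { "10" "012" "0a" "00001010" }
```

Padding with a leading zero character keeps the octal, hex, and binary columns at a fixed width, so the table lines up in a monospace font.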

Since the UI listener supports formatted streams, you can see it from the listener:

Outputting this to a PDF file is now easy. We make sure to set the font to monospace and then run ascii. with our PDF writer, saving the generated PDF output into a file.

: ascii-pdf ( path -- )
    [
        H{ { font-name "monospace" } } [ ascii. ] with-style
    ] with-pdf-writer pdf>string swap utf8 set-file-contents ;

We also support writing to HTML streams in a similar manner, so it would be pretty easy to create an ascii-html word to output an HTML file with the same printing logic above but instead using our HTML writer.
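
A rough, untested sketch of what that ascii-html word might look like is below. It assumes ascii. from above is loaded, that with-html-writer from html.streams outputs the generated XML, and that xml>string from xml.writer can serialize it; treat those details as assumptions to check against the documentation.

```factor
USING: html.streams io.encodings.utf8 io.files kernel
xml.writer ;

! Untested sketch: same printing logic, different output stream.
: ascii-html ( path -- )
    [ ascii. ] with-html-writer xml>string
    swap utf8 set-file-contents ;
```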

Short UUID

Fri, 03 Mar 2023 09:00:00 -0800

The shortuuid project is a “simple python library that generates concise, unambiguous, URL-safe UUIDs”. I thought it would be a fun exercise to implement this in Factor.

What is a “short UUID”?

You can read the original announcement, but basically it is a string representation of a number using a reduced alphabet that can be used in places like URLs where conciseness is desirable. The author mentions that it provides security by “not divulging information (such as how many rows there are in that particular table, the time difference between one item and the next, etc.)”. However, I think it is more security through obscurity than real security.

In any event, the alphabet consists of these 57 characters:

CONSTANT: alphabet
"23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

We encode a numeric input by repeatedly applying “divmod” with the alphabet length, using each remainder to index into the alphabet, until the input is exhausted.

: encode-uuid ( uuid -- shortuuid )
    [ dup 0 > ] [
        alphabet [ length /mod ] [ nth ] bi
    ] "" produce-as nip reverse ;

We decode using the reverse process: for each character in the shortuuid, we look up its position in the alphabet and fold it back into the number.

: decode-uuid ( shortuuid -- uuid )
    0 [
        alphabet index [ alphabet length * ] dip +
    ] reduce ;
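
A small worked example, using a made-up input rather than a real UUID: 12345 is 3·57² + 45·57 + 33, and positions 3, 45, and 33 in the alphabet are the characters 5, o, and b. The definitions are repeated so the block stands alone.

```factor
USING: kernel math prettyprint sequences ;

CONSTANT: alphabet
"23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

: encode-uuid ( uuid -- shortuuid )
    [ dup 0 > ] [
        alphabet [ length /mod ] [ nth ] bi
    ] "" produce-as nip reverse ;

: decode-uuid ( shortuuid -- uuid )
    0 [
        alphabet index [ alphabet length * ] dip +
    ] reduce ;

12345 encode-uuid .   ! prints "5ob"
"5ob" decode-uuid .   ! prints 12345
```

Note that with this loop an input of 0 encodes to the empty string, so a real implementation would presumably pad to a fixed length.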

This is available on my GitHub, including features to deal with legacy values generated before version 1.0.0 as well as supporting different alphabets being used.

Geo Timezones

Wed, 01 Mar 2023 09:00:00 -0700

Brad Fitzpatrick wrote a Go package called latlong which efficiently maps a latitude/longitude to a timezone. The original post describing it was on Google+ and is likely lost forever – unless it made it into the Google+ archive before Google+ joined the Google Graveyard.

It tries to have a small binary size (~360 KB), low memory footprint (~1 MB), and incredibly fast lookups (~0.5 microseconds). It does not try to be perfectly accurate when very close to borders.

It’s available in other languages, too!

Huon Wilson ported the library to the Rust Programming Language, making the code available on GitHub and installable via Cargo. There is even a wrapper made for Node.js that is installable via NPM that uses a command-line executable written in Go.

When it was announced in 2015, I had ported the library to Factor, but missed the opportunity to blog about it. Below we discuss some details about the implementation, starting with its use of a shapefile of the TZ timezones of the world to divide the world into zones that are assigned timezone values – looking something like this:

The world is divided into 6 zoom levels of tiles (represented by a key and an index value) that allow us to search from a very large area first, then down to the more specific geographic area. Note: we represent the struct as a big endian struct with structure packing to minimize wasted space in the files.

The zoom levels are then cached using literal syntax into a zoom-levels constant.

BE-PACKED-STRUCT: tile
    { key uint }
    { idx ushort } ;

SPECIALIZED-ARRAY: tile

CONSTANT: zoom-levels $[
    6 <iota> [
        number>string
        "vocab:geo-tz/zoom" ".dat" surround
        binary file-contents tile cast-array
    ] map
]

Each of the zoom levels references indexes into a leaves data structure that contains 14,110 items – each represented by one of three data types:

Type S is a string.
Type 2 is a one bit tile.
Type P is a pixmap that is 128 bytes long.

These we load and cache into a unique-leaves constant.

CONSTANT: #leaves 14110

BE-PACKED-STRUCT: one-bit-tile
    { idx0 ushort }
    { idx1 ushort }
    { bits ulonglong } ;

CONSTANT: unique-leaves $[
    "vocab:geo-tz/leaves.dat" binary [
        #leaves [
            read1 {
                { CHAR: S [ { 0 } read-until drop utf8 decode ] }
                { CHAR: 2 [ one-bit-tile read-struct ] }
                { CHAR: P [ 128 read ] }
            } case
        ] replicate
    ] with-file-reader
]

The core logic involves looking up a leaf (which is one of three types, loaded above), given an (x, y) coordinate. If it is a string type, we are done. If it is a one-bit-tile, we defer to the appropriate leaf specified by idx0 or idx1. And if it is a pixmap, we have a smidge more logic to detect oceans or defer again to a different leaf.

CONSTANT: ocean-index 0xffff

GENERIC#: lookup-leaf 2 ( leaf x y -- zone/f )

M: string lookup-leaf 2drop ;

M:: one-bit-tile lookup-leaf ( leaf x y -- zone/f )
    leaf bits>> y 3 bits 3 shift x 3 bits bitor bit?
    [ leaf idx1>> ] [ leaf idx0>> ] if
    unique-leaves nth x y lookup-leaf ;

M:: byte-array lookup-leaf ( leaf x y -- zone/f )
    y 3 bits 3 shift x 3 bits bitor 2 * :> i
    i leaf nth 8 shift i 1 + leaf nth +
    dup ocean-index = [ drop f ] [
        unique-leaves nth x y lookup-leaf
    ] if ;
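
Both methods treat a leaf as an 8×8 grid and build a cell index from the low three bits of y and x. As a small sketch, using a hypothetical pixel at (1842, 1670) chosen only for illustration:

```factor
USING: math math.bitwise tools.test ;

! x mod 8 = 2 and y mod 8 = 6, so the cell is row 6, column 2
! of the 8x8 grid, which is index 6 * 8 + 2.
{ 50 } [ 1670 3 bits 3 shift 1842 3 bits bitor ] unit-test

! A pixmap stores two bytes per cell (64 cells in 128 bytes),
! so the byte offset is twice the cell index.
{ 100 } [ 1670 3 bits 3 shift 1842 3 bits bitor 2 * ] unit-test
```

For the one-bit-tile that index selects a single bit of the 64-bit bits field, and for the pixmap it selects a big-endian 16-bit leaf index.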

We’re almost done! Given a zoom level, a tile-key helps us find a specific tile that we can then look up the leaf for, hopefully finding the timezone associated with the coordinate.

:: lookup-zoom-level ( zoom-level x y tile-key -- zone/f )
    zoom-level [ key>> tile-key >=< ] search swap [
        dup key>> tile-key = [
            idx>> unique-leaves nth x y lookup-leaf
        ] [ drop f ] if
    ] [ drop f ] if ;

Each coordinate is effectively a pixel in the image, so our logic searches from the outermost zoom level to the innermost, trying to look up a timezone in each one using the coordinate and level as a tile-key.

:: tile-key ( x y level -- tile-key )
    level dup 3 + neg :> n
    y x [ n shift 14 bits ] bi@
    { 0 14 28 } bitfield ;

:: lookup-pixel ( x y -- zone )
    6 <iota> [| level |
        level zoom-levels nth
        x y 2dup level tile-key
        lookup-zoom-level
    ] map-find-last drop ;

Finally, we have enough to implement our public API, converting a given latitude/longitude coordinate to a pixel value, deferring to the word we just defined above to do the work.

CONSTANT: deg-pixels 32

:: lookup-zone ( lat lon -- zone )
    lon 180 + deg-pixels * 0 360 deg-pixels * 1 - clamp
    90 lat - deg-pixels * 0 180 deg-pixels * 1 - clamp
    [ >integer ] bi@ lookup-pixel ;

And then a couple of test cases to show it’s working:

{ "America/Los_Angeles" } [ 37.7833 -122.4167 lookup-zone ] unit-test

{ "Australia/Sydney" } [ -33.8885 151.1908 lookup-zone ] unit-test
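
To see the coordinate conversion on its own, here is a small sketch that works through the San Francisco test case by hand, with the deg-pixels value of 32 written out as a literal:

```factor
USING: math tools.test ;

! x = (lon + 180) * 32, truncated to an integer
{ 1842 } [ -122.4167 180 + 32 * >integer ] unit-test

! y = (90 - lat) * 32, truncated to an integer
{ 1670 } [ 90 37.7833 - 32 * >integer ] unit-test
```

With 32 pixels per degree the whole world is an 11520 × 5760 image, which is why lookup-zone clamps to those bounds before truncating.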

Performance is pretty good: we can perform over 3 million lookups per second, putting our cost per lookup around 0.33 microseconds. And all of that in less than 70 lines of code.

This is available on my GitHub.

Reference Server

Tue, 28 Feb 2023 07:00:00 -0700

Phil Eaton made a repository of Barebones UNIX socket servers with this description:

I find myself writing this server in some language every few months. Each time I have to scour the web for a good reference. Use this as a reference to write your own bare server in C or other languages with a UNIX API (Python, OCaml, etc).

Many developers learning network programming will encounter Beej’s Guide to Network Programming which uses the sockets API, has been ported to many platforms, and explains the intricacies of making computers talk to each other in this manner.

C

We can take a look at his C implementation of a server that listens on port 15000, accepts client connections, reads up to 1024 bytes which are printed to the screen, then writes hello world back to the client and disconnects them:

#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int server, client;
    socklen_t addrlen;

    int bufsize = 1024;
    char *buffer = malloc(bufsize);
    struct sockaddr_in address;

    server = socket(AF_INET, SOCK_STREAM, 0);

    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(15000);

    bind(server, (struct sockaddr *) &address, sizeof(address));

    while (1) {
        listen(server, 10);
        client = accept(server, (struct sockaddr *) &address, &addrlen);
        recv(client, buffer, bufsize, 0);
        printf("%s\n", buffer);
        write(client, "hello world\n", 12);
        close(client);
    }

    close(server);
    return 0;
}

Factor

A direct Factor translation – without any error checking, like in the original example – using the C library interface might look something like this:

USING: accessors alien.c-types alien.data byte-arrays
classes.struct io io.encodings.string io.encodings.utf8 kernel
sequences unix.ffi unix.types ;

:: reference-server ( -- )
    1024 <byte-array> :> buffer
    AF_INET SOCK_STREAM 0 socket :> server
    sockaddr-in malloc-struct
        AF_INET >>family
        0 >>addr
        15000 htons >>port :> address

    server address sockaddr-in heap-size bind drop

    [
        server 10 listen drop
        server address 0 socklen_t <ref> accept :> client
        client buffer 1024 0 recv
        buffer swap head-slice utf8 decode print flush
        client $[ "hello world\n" >byte-array ]
        dup length unix.ffi:write drop
        client close drop
        t
    ] loop

    server close drop ;

I noticed that some of his examples are more idiomatic to the language, so we could rewrite this using threaded servers – gaining the benefit of working on Windows as well as error handling and logging – using a handler quotation to implement the read/print/write/disconnect logic.

USING: accessors io io.encodings.binary io.encodings.string
io.encodings.utf8 io.servers kernel namespaces ;

: reference-server ( -- )
    binary <threaded-server>
        15000 >>insecure
        [
            1024 read-partial [
                [ utf8 decode print flush ] with-global
                $[ "hello world\n" >byte-array ] write flush
            ] when*
        ] >>handler
    start-server wait-for-server ;

This is available on my GitHub.

Weighted Random

Sun, 26 Feb 2023 09:00:00 -0700

Some time ago, I implemented a way to generate weighted random values from a discrete distribution in Factor. It ended up being a pretty satisfyingly simple word that builds a cumulative probability table, generates a random probability, then searches the table to find which value to return:

: weighted-random ( histogram -- obj )
    unzip cum-sum [ last >float random ] keep bisect-left swap nth ;
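
To make the two steps concrete, here is a small sketch of the cumulative table and the search for the weights 1, 2, 3, and 4. It assumes bisect-left is found in the sorting.extras vocabulary, and the draws of 5.2 and 0.5 are made-up values standing in for the random number:

```factor
USING: math.statistics sorting.extras tools.test ;

! Weights 1 2 3 4 become the cumulative table 1 3 6 10.
{ { 1 3 6 10 } } [ { 1 2 3 4 } cum-sum ] unit-test

! A draw of 5.2 lands in the third bucket (index 2).
{ 2 } [ 5.2 { 1 3 6 10 } bisect-left ] unit-test

! A draw below the first cumulative weight selects index 0.
{ 0 } [ 0.5 { 1 3 6 10 } bisect-left ] unit-test
```

Each bucket is as wide as its weight, so a uniform draw between 0 and the total picks an index in proportion to the weights.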

Is It Fast?

We can define a simple discrete distribution of values:

CONSTANT: dist H{ { "A" 1 } { "B" 2 } { "C" 3 } { "D" 4 } }

And it seems to work – we can make a few random values from it:

IN: scratchpad dist weighted-random .
"C"
IN: scratchpad dist weighted-random .
"C"
IN: scratchpad dist weighted-random .
"D"
IN: scratchpad dist weighted-random .
"B"

After generating a lot of random values, we can see the histogram matches our distribution:

IN: scratchpad 10,000,000 [ dist weighted-random ] replicate histogram .
H{
    { "A" 998403 }
    { "B" 2000400 }
    { "C" 3001528 }
    { "D" 3999669 }
}

But, how fast is it?

IN: scratchpad [ 10,000,000 [ dist weighted-random ] replicate ] time 
Running time: 3.02998325 seconds

Okay, so it’s not that fast… generating around 3.3 million per second on one of my computers.

Improvements

We can make two quick improvements to this:

First, we can factor out the initial step from the random number generation.
Second, we can take advantage of a recent improvement to the random vocabulary: the random word, which was previously implemented separately for different types, now gets the current random-generator and passes it to a random* implementation. This allows a few speedups where we can look up the dynamic variable once and then use it many times.

That results in this definition:

: weighted-randoms ( length histogram -- seq )
    unzip cum-sum swap
    [ [ last >float random-generator get ] keep ] dip
    '[ _ _ random* _ bisect-left _ nth ] replicate ;

That gives us a nice speedup, just over 10 million per second:

IN: scratchpad [ 10,000,000 dist weighted-randoms ] time histogram .
Running time: 0.989039625 seconds

H{
    { "A" 1000088 }
    { "B" 1999445 }
    { "C" 3000688 }
    { "D" 3999779 }
}

That’s pretty nice, but it turns out that we can do better.

Vose Alias Method

Keith Schwarz wrote a fascinating blog post about some better algorithms for sampling from a discrete distribution. One of those algorithms is the Vose Alias Method which creates a data structure of items, probabilities, and an alias table that is used to return an alternate choice:

TUPLE: vose
    { n integer }
    { items array }
    { probs array }
    { alias array } ;

We construct a vose tuple by splitting the distribution into items and their probabilities, and then processing the probabilities into lists of small (less than 1) or large (greater than or equal to 1), iteratively aliasing the index of smaller items to larger items.

:: <vose> ( dist -- vose )
    V{ } clone :> small
    V{ } clone :> large
    dist assoc-size :> n
    n f <array> :> alias

    dist unzip dup [ length ] [ sum ] bi / v*n :> ( items probs )
    probs [ swap 1 < small large ? push ] each-index

    [ small empty? not large empty? not and ] [
        small pop :> s
        large pop :> l
        l s alias set-nth
        l dup probs [ s probs nth + 1 - dup ] change-nth
        1 < small large ? push
    ] while

    1 large [ probs set-nth ] with each
    1 small [ probs set-nth ] with each

    n items probs alias vose boa ;
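
The small and large split is easier to see with numbers. As a sketch of just the scaling step, using the weights 1, 2, 3, and 4 from the earlier example:

```factor
USING: kernel math math.vectors sequences tools.test ;

! Each weight is multiplied by n / sum = 4/10, so the scaled
! values average exactly 1. The first two are small (< 1) and
! the last two are large (>= 1).
{ { 2/5 4/5 6/5 8/5 } } [
    { 1 2 3 4 } dup [ length ] [ sum ] bi / v*n
] unit-test
```

Because the inputs are integers, Factor keeps the scaled values as exact ratios, and the loop then moves the excess of the large entries over to the small ones.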

We can implement the random* generic to select a random item from the vose tuple – choosing a random item index, checking its probability against a random number between 0.0 and 1.0, and if the random number is over that threshold returning the aliased item instead:

M:: vose random* ( obj rnd -- elt )
    obj n>> rnd random*
    dup obj probs>> nth rnd (random-unit) >=
    [ obj alias>> nth ] unless
    obj items>> nth ;

It’s much faster, over 14.4 million per second:

IN: scratchpad [ 10,000,000 dist <vose> randoms ] time 
Running time: 0.693588458 seconds

This is available now in the math.extras vocabulary in the current development version, along with a few tweaks that bring the performance to over 21.7 million per second.

DuckDuckGo

Wed, 15 Feb 2023 09:00:00 -0700

The conversation around the current quality of web search engines, the doomsday prediction about various incumbents, and the equal parts inspiring and challenging rollout of large language models to improve search has been fascinating to watch. There are many challengers in the search engine space, including companies like Kagi and Neeva among many other search engine startups. One privacy-focused startup that has been fun to follow for a while is DuckDuckGo.

You can see an example of the DuckDuckGo API that is available on api.duckduckgo.com. This does not provide access to their full search results, but instead provides access to their instant answers. Regardless, I thought it would be neat if we could use this from Factor.

We can take a search query and turn it into a URL object:

: duckduckgo-url ( query -- url )
    URL" http://api.duckduckgo.com"
        swap "q" set-query-param
        "json" "format" set-query-param
        "1" "pretty" set-query-param
        "1" "no_redirect" set-query-param
        "1" "no_html" set-query-param
        "1" "skip_disambig" set-query-param ;

Using the http.client vocabulary and the json vocabulary we can retrieve a result set:

: duckduckgo ( query -- results )
    duckduckgo-url http-get nip utf8 decode json> ;

We can make a word that prints out the abstract response with clickable links:

: abstract. ( results -- )
    dup "Heading" of [ drop ] [
        swap {
            [ "AbstractURL" of >url write-object nl ]
            [ "AbstractText" of print ]
            [ "AbstractSource" of "- " write print ]
        } cleave nl
    ] if-empty ;

And then a word that prints out a result response, parsing the HTML using the html.parser vocabulary and printing it as text using the html.parser.printer vocabulary:

: result. ( result -- )
    "Result" of [
        "<a href=\"" ?head drop "\">" split1 "</a>" split1
        [ swap >url write-object ]
        [ parse-html html-text. nl ] bi*
    ] when* ;

There are more aspects to the response from the API, but we can initially print out the abstract, the results, and the related topics:

: duckduckgo. ( query -- )
    duckduckgo {
        [ abstract. ]
        [ "Results" of [ result. ] each ]
        [ "RelatedTopics" of [ result. ] each ]
    } cleave ;

We can try it out on a topic that this particular blog likes to discuss:

IN: scratchpad "factorcode" duckduckgo.
Factor (programming language)
Factor is a stack-oriented programming language created by Slava
Pestov. Factor is dynamically typed and has automatic memory
management, as well as powerful metaprogramming features. The
language has a single implementation featuring a self-hosted
optimizing compiler and an interactive development environment.
The Factor distribution includes a large standard library.
- Wikipedia

Official site - Factor (programming language)
Concatenative programming languages
Stack-oriented programming languages
Extensible syntax programming languages
Function-level languages
High-level programming languages
Programming languages
Software using the BSD license

This is available on my GitHub.

Magic

Sun, 12 Feb 2023 09:00:00 -0800

Ever wonder what the type of a particular binary file is? Or wonder how a program knows that a particular binary file is in a compatible file format? One way is to look at the magic number used by the file format in question. You can see some examples in a list of file signatures.

The file command on Unix systems is commonly backed by the libmagic library (Apple macOS being an exception, with its own implementation), which uses magic numbers and other techniques to identify file types. You can see how it works through a few examples:

$ file vm/factor.hpp
vm/factor.hpp: C++ source text, ASCII text

$ file Factor.app/Contents/Info.plist 
Factor.app/Contents/Info.plist: XML  document text

$ file factor
factor: Mach-O 64-bit executable x86_64

$ file factor.image
factor.image: data

Wrapping the C library

I am going to show how to wrap a C library using the alien vocabulary which provides an FFI capability in Factor. The man pages for libmagic show us some of the functions available in magic.h.

The libmagic library needs to be made available to the Factor instance:

"magic" {
    { [ os macosx? ] [ "libmagic.dylib" ] }
    { [ os unix? ] [ "libmagic.so" ] }
} cond cdecl add-library

We start by defining an opaque type for magic_t:

TYPEDEF: void* magic_t

Some functions are available for opening, loading, and then closing the magic_t:

FUNCTION: magic_t magic_open ( int flags )

FUNCTION: int magic_load ( magic_t magic, c-string path )

FUNCTION: void magic_close ( magic_t magic )

It is convenient to wrap the close function as a destructor for use in a with-destructors form.

DESTRUCTOR: magic_close

We also need the function that “returns a textual description of the contents of the filename argument”, which gives us the file command ability above:

FUNCTION: c-string magic_file ( magic_t magic, c-string path )

That should be everything we need to continue…

Using the C library

Now that we have the raw C library made available as Factor words, we can create a simpler interface by wrapping some of the words into a simple word that guesses the type of a file:

: guess-file ( path -- result )
    [
        normalize-path
        0 magic_open &magic_close
        [ f magic_load drop ]
        [ swap magic_file ] bi
    ] with-destructors ;

And we can then try it on a few files:

IN: scratchpad "vm/factor.hpp" guess-file .
"C++ source, ASCII text"

IN: scratchpad "Factor.app/Contents/Info.plist" guess-file .
"XML 1.0 document, Unicode text, UTF-8 text"

IN: scratchpad "factor" guess-file .
"symbolic link to Factor.app/Contents/MacOS/factor"

IN: scratchpad "factor.image" guess-file .
"data"

This has been available for a while in the magic vocabulary with improved error checking and some options to guess the MIME type of files.

Hipku

Wed, 08 Feb 2023 09:00:00 -0800

Once upon a time, there was a JavaScript project called Hipku. The original post that described it was lost somewhere in the series of tubes, but thankfully the “full documentation and a working demo” was saved by the Wayback Machine. It is also still available on npm for installation.

Hipku is a small javascript library to encode IP addresses as haiku. It can express any IPv4 or IPv6 address as a Western-style 5/7/5 syllable haiku.

An implementation in Python was created called PyHipku. It is still available on PyPI for installation, but the website associated with it was also lost to history and not even the Great Wayback Machine seems able to recover it. I think of programming as aspiring to a kind of poetic result – and wonder what kind of a language could run Waka Waka Bang Splat – so the haiku style caught my interest, and I ported the hipku algorithm to the Factor programming language.

At its core, we encode an IPv4 address or IPv6 address into a series of numerical values and then make a poem by looking up each word from a word list. Some symbols are defined to help us know to start a sentence with an uppercase letter or end a sentence with a period:

SYMBOLS: Octet octet octet. ;

For example, an IPv4 key specifies the word lists to use for each octet and an IPv4 schema specifies how the octets form into a hipku – an f indicates a newline:

CONSTANT: ipv4-key ${
    animal-adjectives animal-colors animal-nouns animal-verbs
    nature-adjectives nature-nouns plant-nouns plant-verbs
}

CONSTANT: ipv4-schema ${
    "The" octet octet octet f
    octet "in the" octet octet. f
    Octet octet.
}

To create the hipku, we iterate across the key, choosing words numerically by looking up the octet value, and then composing them into the ordering specified by the schema.

You can see a couple examples below:

IN: scratchpad "127.0.0.1" >hipku print
The hungry white ape
aches in the ancient canyon.
Autumn colors crunch.

IN: scratchpad "2001:db8:3333:4444:5555:6666:7777:8888" >hipku print
Chilled apes and blunt seas
clap dear firm firm grim grim gnomes.
Large holes grasp pained mares.

We support both encoding into a hipku as well as decoding back into an IPv4/IPv6 address. This is available as the hipku vocabulary in a recent nightly build.

Proquint

Tue, 07 Feb 2023 08:00:00 -0800

A few days ago, Ciprian Dorin Craciun wrote a binary to text encoding blog post about the “state of the art and missed opportunities” in various encoding schemes. In that post, I was introduced to the Proquint encoding which stands for “PRO-nouncable QUINT-uplets”.

In the Factor programming language, we have enjoyed implementing many encoding/decoding methods including: base16, base24, base32, base32hex, base32-crockford, base36, base58, base62, base64, base85, base91, uu, and many others. I thought it would be fun to add a quick implementation of Proquint.

Like other encodings, it makes use of an alphabet – grouped as consonants and vowels:

CONSTANT: consonant "bdfghjklmnprstvz"

CONSTANT: vowel "aiou"

Numbers are grouped into 5-character blocks representing a 16-bit number, with alternating consonants representing 4 bits and vowels representing 2 bits:

: >quint16 ( m -- str )
    5 [
        even? [
            [ -4 shift ] [ 4 bits consonant nth ] bi
        ] [
            [ -2 shift ] [ 2 bits vowel nth ] bi
        ] if
    ] "" map-integers-as reverse nip ;

Encoding a 32-bit number is made by joining two 16-bit blocks:

: >quint32 ( m -- str )
    [ -16 shift ] keep [ 16 bits >quint16 ] bi@ "-" glue ;

Decoding numbers looks up each consonant or vowel, skipping separators:

: quint> ( str -- m )
    0 [
        dup $[ consonant alphabet-inverse ] nth [
            nip [ 4 shift ] [ + ] bi*
        ] [
            dup $[ vowel alphabet-inverse ] nth [
                nip [ 2 shift ] [ + ] bi*
            ] [
                CHAR: - assert=
            ] if*
        ] if*
    ] reduce ;
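
It can help to work one value through by hand. As a sketch, assuming the words above are loaded, the 16-bit number 0x7f00 splits into groups of 4, 2, 4, 2, and 4 bits, and each group indexes into the consonant or vowel string:

```factor
USING: tools.test ;

! 0x7f00 = 0111 11 1100 00 0000 -> l u s a b
{ "lusab" } [ 0x7f00 >quint16 ] unit-test

! 0x0001 = 0000 00 0000 00 0001 -> b a b a d
{ "babad" } [ 0x0001 >quint16 ] unit-test

! 127.0.0.1 as a 32-bit number, encoded and decoded
{ "lusab-babad" } [ 0x7f000001 >quint32 ] unit-test
{ 0x7f000001 } [ "lusab-babad" quint> ] unit-test
```

Note that >quint16 peels bits off the low end first, which is why it reverses the string at the end.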

We can use this to make a random password that might be more memorable – and one that could be made more secure by using more random-bits:

: quint-password ( -- quint )
    32 random-bits >quint32 ;

And we could use our ip-parser vocabulary to make IPv4 addresses more memorable:

: ipv4>quint ( ipv4 -- str )
    ipv4-aton >quint32 ;

: quint>ipv4 ( str -- ipv4 )
    quint> ipv4-ntoa ;

You can see how this might work by building a test suite to show roundtrips work:

{ t } [
    {
        { "127.0.0.1"       "lusab-babad" }
        { "63.84.220.193"   "gutih-tugad" }
        { "63.118.7.35"     "gutuk-bisog" }
        { "140.98.193.141"  "mudof-sakat" }
        { "64.255.6.200"    "haguz-biram" }
        { "128.30.52.45"    "mabiv-gibot" }
        { "147.67.119.2"    "natag-lisaf" }
        { "212.58.253.68"   "tibup-zujah" }
        { "216.35.68.215"   "tobog-higil" }
        { "216.68.232.21"   "todah-vobij" }
        { "198.81.129.136"  "sinid-makam" }
        { "12.110.110.204"  "budov-kuras" }
    } [
        [ quint>ipv4 = ] [ swap ipv4>quint = ] 2bi and
    ] assoc-all?
] unit-test

This is now available as the proquint vocabulary in a recent nightly build.

Semantic Versioning

Tue, 24 Jan 2023 11:00:00 -0800

Semantic Versioning (or “semver” for short) is a specification for handling version numbers, and providing a way to sort and specify compatibility using a MAJOR.MINOR.PATCH structure with optional “pre-release” and “build” information.

Some examples of semantic version numbers:

1.0.0-alpha
1.0.0-beta+win32
1.0.0-rc.1
1.0.0

For a long time, I thought it might be funny to follow the upcoming release of Factor version 0.99 with version 0.100. Well, if we wanted to be consistent with “semver”, it might instead have to be something like 0.100.0-joke+haha.

There is now a semver vocabulary that provides some words for sorting and working with semantic version numbers. Here’s an example using it:

IN: scratchpad USE: semver

IN: scratchpad "0.99.0" >semver bump-alpha semver.
0.99.1-alpha.0

IN: scratchpad "0.99.0" >semver bump-preminor bump-rc semver.
0.100.0-rc.0

IN: scratchpad "0.99.0" "0.100.0" semver<=> .
+lt+

IN: scratchpad "0.100.0-joke+haha" >semver bump-major semver.
1.0.0

The Semantic Versioning 2.0.0 specification suggests using the version numbers to represent compatibility with previous versions. And many languages have package managers that use these compatibility guarantees with “semver ranges” to manage project dependencies.

Five Questions

Sat, 21 Jan 2023 11:00:00 -0800

Many years ago, there was a blog post containing five programming problems every software engineer should be able to solve in less than 1 hour. I had bookmarked it at the time and didn’t notice the controversy it created on Reddit. The original link seems to be down – you can view a copy of it on the Wayback Machine – but there are various solutions posted online, including a solution in Python.

I finally got around to looking at it and writing up some solutions to the problems listed. Apparently, instead of solving this in 1 hour in Factor, it took me almost 8 years:

IN: scratchpad 2015 05 09 <date> days-since >integer .
2814

Problem 1

Write three functions that compute the sum of the numbers in a given list using a for-loop, a while-loop, and recursion.

In idiomatic Factor, this is just sum, and we would typically use sequence combinators, but instead here are a few solutions using lexical variables.

Using a for-loop, iterating forwards:

:: solve-1 ( seq -- n )
    0 seq length [ seq nth-unsafe + ] each-integer ;

Using a while-loop, iterating forwards:

:: solve-1 ( seq -- n )
    0 0 seq length :> n
    [ dup n < ] [
        [ seq nth-unsafe + ] [ 1 + ] bi
    ] while drop ;

Using recursion, iterating backwards:

:: (solve-1) ( accum i seq -- accum' )
    accum i [
        1 - seq [ nth-unsafe + ] 2keep (solve-1)
    ] unless-zero ;

: solve-1 ( seq -- n )
    0 swap [ length ] keep (solve-1) ;

Some test cases to confirm behavior:

{ 0 } [ { } solve-1 ] unit-test

{ 1 } [ { 1 } solve-1 ] unit-test

{ 6 } [ { 1 2 3 } solve-1 ] unit-test

Problem 2

Write a function that combines two lists by alternatively taking elements. For example: given the two lists [a, b, c] and [1, 2, 3], the function should return [a, 1, b, 2, c, 3].

We can alternately choose items from each list:

: solve-2 ( seq1 seq2 -- newseq )
    [ min-length 2 * ] 2keep '[
        [ 2/ ] [ even? ] bi _ _ ? nth-unsafe
    ] { } map-integers-as ;

Some test cases to confirm behavior:

{ { "a" 1 "b" 2 "c" 3 } } [
    { "a" "b" "c" } { 1 2 3 } solve-2
] unit-test

{ { "a" 1 "b" 2 "c" 3 } } [
    { "a" "b" "c" "d" } { 1 2 3 } solve-2
] unit-test

Problem 3

Write a function that computes the list of the first 100 Fibonacci numbers. By definition, the first two numbers in the Fibonacci sequence are 0 and 1, and each subsequent number is the sum of the previous two.

There are many approaches, including using memoization, but instead we’ll just iterate from the starting values and use replicate to build up an output array.

: solve-3 ( n -- seq )
    [ 0 1 ] dip [ dup rot [ + ] keep ] replicate 2nip ;

Some test cases to confirm behavior:

{ { } } [ 0 solve-3 ] unit-test

{ { 0 } } [ 1 solve-3 ] unit-test

{ { 0 1 } } [ 2 solve-3 ] unit-test

{ { 0 1 1 2 3 5 8 13 21 34 } } [ 10 solve-3 ] unit-test

{ 573147844013817084100 } [ 100 solve-3 sum ] unit-test

Problem 4

Write a function that given a list of non negative integers, arranges them such that they form the largest possible number. For example, given [50, 2, 1, 9], the largest formed number is 95021.

We can try each-permutation of the input numbers, looking for their largest numerical value when the digits are concatenated:

: solve-4 ( seq -- n )
    0 swap [ number>string ] map
    [ concat string>number max ] each-permutation ;

Some test cases to confirm behavior:

{ 95021 } [ { 50 2 1 9 } solve-4 ] unit-test

{ 5523 } [ { 52 5 3 } solve-4 ] unit-test

Problem 5

Write a program that outputs all possibilities to put + or - or nothing between the numbers 1, 2, …, 9 (in this order) such that the result is always 100. For example: 1 + 2 + 34 – 5 + 67 – 8 + 9 = 100.

This one is more complicated than the previous ones, but we can build it up piece by piece, using a test-case on each step to show how it works.

First, we want a word to interleave numbers amongst operators using solve-2:

: insert-numbers ( operators -- seq )
    [ length [1..b] ] [ solve-2 ] [ length 1 + suffix ] tri ;

{ { 1 f 2 + 3 f 4 } } [ { f + f } insert-numbers ] unit-test

Next, we want a word that will join adjacent digits – separated by f:

GENERIC: digits, ( prev intermediate -- next )
M: number digits, [ 10 * ] [ + ] bi* ;
M: object digits, [ [ , 0 ] dip , ] when* ;

: join-digits ( seq -- seq )
    [ [ ] [ digits, ] map-reduce , ] { } make ;

{ { 12 + 34 } } [ { 1 f 2 + 3 f 4 } join-digits ] unit-test

Since Factor is a kind of Reverse Polish notation, we’ll want to swap from infix to postfix:

: swap-operators ( seq -- seq )
    dup rest-slice 2 <groups> [ 0 1 rot exchange ] each ;

{ { 12 34 + } } [ { 12 + 34 } swap-operators ] unit-test

The solution, then, is to use all-selections of addition, subtraction, and adjacency – interleaving the numbers, joining adjacent digits, swapping operators, and then calling each sequence as a quotation, filtering for the ones that return 100:

: solve-5 ( -- solutions )
    { + - f } 8 all-selections
    [ insert-numbers join-digits swap-operators ] map
    [ >quotation call( -- x ) 100 = ] filter ;
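
To see a single candidate go through the whole pipeline, here is a small sketch using the shorter operator list from the earlier test cases, assuming the words above are loaded:

```factor
USING: kernel math quotations tools.test ;

! { f + f } -> { 1 f 2 + 3 f 4 } -> { 12 + 34 } -> { 12 34 + } -> 46
{ 46 } [
    { f + f } insert-numbers join-digits swap-operators
    >quotation call( -- x )
] unit-test
```

The full solution does the same thing for each of the 3^8 = 6561 selections of eight operators and keeps the ones that evaluate to 100.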

We can print the formula out by swapping the operators back to infix and printing them out:

: print-formula ( solutions -- )
    [ present ] map swap-operators " " join print ;

: solve-5. ( -- )
    solve-5 [ print-formula ] each ;

Spoilers! The printed solutions:

IN: scratchpad solve-5.
1 + 2 + 3 - 4 + 5 + 6 + 78 + 9
1 + 2 + 34 - 5 + 67 - 8 + 9
1 + 23 - 4 + 5 + 6 + 78 - 9
1 + 23 - 4 + 56 + 7 + 8 + 9
12 + 3 + 4 + 5 - 6 - 7 + 89
12 + 3 - 4 + 5 + 67 + 8 + 9
12 - 3 - 4 + 5 - 6 + 7 + 89
123 + 4 - 5 + 67 - 89
123 + 45 - 67 + 8 - 9
123 - 4 - 5 - 6 - 7 + 8 - 9
123 - 45 - 67 + 89

GetPercentageRounds

Thu, 19 Jan 2023 08:10:00 -0800

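There was a funny post on Twitter a couple of days ago about a recent event where the “Dutch government was forced to release the source code of their DigiD digital authentication iOS app”, highlighting a piece of its C# code: a GetPercentageRounds function that returns a row of ten blue and white circles for a given percentage.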

Some very funny discussions continued, with comments about how good or bad this code is, and how one might rewrite it in various ways. I thought it would be a fun opportunity to show a few variations of this simple function in Factor.

Implementations

A direct translation of this code might use cond, which is basically a sequence of if statements:

: get-percentage-rounds ( percentage -- str )
    {
        { [ dup 0.0 <= ] [ drop "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ dup 0.0 0.1 between? ] [ drop "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ dup 0.1 0.2 between? ] [ drop "🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ dup 0.2 0.3 between? ] [ drop "🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ dup 0.3 0.4 between? ] [ drop "🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪" ] }
        { [ dup 0.4 0.5 between? ] [ drop "🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪" ] }
        { [ dup 0.5 0.6 between? ] [ drop "🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪" ] }
        { [ dup 0.6 0.7 between? ] [ drop "🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪" ] }
        { [ dup 0.7 0.8 between? ] [ drop "🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪" ] }
        { [ dup 0.8 0.9 between? ] [ drop "🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪" ] }
        [ drop "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵" ]
    } cond ;

But since this is a series of if statements checked sequentially, you can just check the upper bounds. And since we only care about the argument for the comparison, we can use cond-case:

: get-percentage-rounds ( percentage -- str )
    {
        { [ 0.0 <= ] [ "⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ 0.1 <= ] [ "🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ 0.2 <= ] [ "🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ 0.3 <= ] [ "🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪" ] }
        { [ 0.4 <= ] [ "🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪" ] }
        { [ 0.5 <= ] [ "🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪" ] }
        { [ 0.6 <= ] [ "🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪" ] }
        { [ 0.7 <= ] [ "🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪" ] }
        { [ 0.8 <= ] [ "🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪" ] }
        { [ 0.9 <= ] [ "🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪" ] }
        [ drop "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵" ]
    } cond-case ;

One suggestion was to generate a substring based on the input – with the somewhat negative aspect that it allocates memory for the returned string when called:

: get-percentage-rounds ( percentage -- str )
    10 * 10 swap - >integer dup 10 +
    "🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪" subseq ;

But another way would be to just index into the possible results, using quoted words to reduce the number of tokens involved – resulting in this fairly aesthetic result:

: get-percentage-rounds ( percentage -- str )
    10 * ceiling >integer qw{
        ⚪⚪⚪⚪⚪⚪⚪⚪⚪⚪
        🔵⚪⚪⚪⚪⚪⚪⚪⚪⚪
        🔵🔵⚪⚪⚪⚪⚪⚪⚪⚪
        🔵🔵🔵⚪⚪⚪⚪⚪⚪⚪
        🔵🔵🔵🔵⚪⚪⚪⚪⚪⚪
        🔵🔵🔵🔵🔵⚪⚪⚪⚪⚪
        🔵🔵🔵🔵🔵🔵⚪⚪⚪⚪
        🔵🔵🔵🔵🔵🔵🔵⚪⚪⚪
        🔵🔵🔵🔵🔵🔵🔵🔵⚪⚪
        🔵🔵🔵🔵🔵🔵🔵🔵🔵⚪
        🔵🔵🔵🔵🔵🔵🔵🔵🔵🔵
    } nth ;

It’s always fun to see different ways to solve problems. In the Twitter thread, that includes using binary search, building the output character-by-character, generating solutions using ChatGPT, one-liners in Python, pattern matching, unit testing, and discussions of edge cases and naming conventions.
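As one illustration of the character-by-character idea and of the edge cases mentioned in the thread, here is a minimal sketch that builds the two halves with <string> and clamps the count, so inputs below 0.0 or above 1.0 cannot index out of range. The word name and the clamping behavior are my own assumptions, not something the original function specified:

```factor
USING: kernel math math.functions math.order sequences strings ;

! Hypothetical variant: n blue dots followed by (10 - n) white dots,
! where n is the percentage scaled to 0..10 and rounded up.
: percentage-rounds-clamped ( percentage -- str )
    10 * ceiling >integer 0 10 clamp
    [ CHAR: 🔵 <string> ]
    [ 10 swap - CHAR: ⚪ <string> ] bi append ;
```

Like the substring version, this allocates a new string on every call, so it trades the table lookup for tolerance of out-of-range input.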

Project Gemini

Mon, 16 Jan 2023 08:10:00 -0800

Project Gemini is a neat modern take on the Gopher protocol. You can read the Gemini FAQ or the Gemini specification to learn more details, but the home page has a nice summary:

Gemini is a new internet protocol which

Is heavier than gopher

Is lighter than the web

Will not replace either

Strives for maximum power to weight ratio

Takes user privacy very seriously

There are some nice Gemini clients implemented in various languages, for both the command-line and with nice user interfaces. I happen to enjoy using AV-98 and Lagrange, but many others are also great.

In a similar manner to my Gopher implementation in Factor, I recently implemented the Gemini protocol as well as a Gemini server and a Gemini user interface.

Instead of going into how the protocol or the user interface is implemented, I wanted to go over the Gemini command-line interface. In the spirit of Python’s cmd module, I contributed the command-loop vocabulary to support generic line-oriented command interpreters.

We start by making a sequence of commands that our Gemini interpreter will support:

CONSTANT: COMMANDS {
    T{ command
        { name "back" }
        { quot [ drop gemini-back ] }
        { help "Go back to the previous Gemini URL." }
        { abbrevs { "b" } } }
    T{ command
        { name "forward" }
        { quot [ drop gemini-forward ] }
        { help "Go forward to the next Gemini URL." }
        { abbrevs { "f" } } }
    T{ command
        { name "history" }
        { quot [ drop gemini-history ] }
        { help "Display recently viewed Gemini URLs." }
        { abbrevs { "h" "hist" } } }
    T{ command
        { name "less" }
        { quot [ drop gemini-less ] }
        { help "View the most recent Gemini URL in a pager." }
        { abbrevs { "l" } } }
    T{ command
        { name "ls" }
        { quot [ gemini-ls ] }
        { help "List the currently available links." }
        { abbrevs f } }
    T{ command
        { name "go" }
        { quot [ gemini-go ] }
        { help "Go to a Gemini URL" }
        { abbrevs { "g" } } }
    T{ command
        { name "gus" }
        { quot [ drop "gemini://gus.guru/search" gemini-go ] }
        { help "Submit a query to the GUS search engine." }
        { abbrevs f } }
    T{ command
        { name "up" }
        { quot [ drop gemini-up ] }
        { help "Go up one directory from the recent Gemini URL." }
        { abbrevs { "u" } } }
    T{ command
        { name "url" }
        { quot [ drop gemini-url ] }
        { help "Print the most recent Gemini URL." }
        { abbrevs f } }
    T{ command
        { name "reload" }
        { quot [ drop gemini-reload ] }
        { help "Reload the most recent Gemini URL." }
        { abbrevs { "r" } } }
    T{ command
        { name "root" }
        { quot [ drop gemini-root ] }
        { help "Navigate to the most recent Gemini URL's root." }
        { abbrevs f } }
    T{ command
        { name "shell" }
        { quot [ gemini-shell ] }
        { help "'cat' the most recent Gemini URL through a shell." }
        { abbrevs { "!" } } }
    T{ command
        { name "quit" }
        { quot [ drop gemini-quit ] }
        { help "Quit the program." }
        { abbrevs { "q" "exit" } } }
}

Then we define a custom command-loop that numbers the links on a Gemini page. Typing a number is not a known command, so we detect it as a “missing command” and navigate to the corresponding link:

TUPLE: gemini-command-loop < command-loop ;

M: gemini-command-loop missing-command
    over string>number [ 1 - LINKS ?nth ] [ f ] if* [
        gemini-go 3drop
    ] [
        call-next-method
    ] if* ;
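The number-to-link lookup at the heart of that method can be sketched on its own. This is a simplified illustration, assuming the typed text is a whole number and that the links are passed in as a plain sequence; pick-link is a hypothetical name and is not part of the gemini.cli vocabulary:

```factor
USING: kernel math math.parser sequences ;

! Hypothetical helper: map typed text such as "2" to the second
! element of links, or f when the text is not a number or the
! number does not correspond to a link.
: pick-link ( str links -- link/f )
    swap string>number                 ! links n/f
    [ 1 - swap ?nth ] [ drop f ] if* ;
```

Because ?nth returns f for an out-of-range index, typing "0" or a number past the last link falls through to the normal missing-command handling instead of raising an error.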

Finally, we make a simple MAIN: word to run it:

: gemini-main ( -- )
    "Welcome to Project Gemini!" "GEMINI>"
    gemini-command-loop new-command-loop
    COMMANDS [ over add-command ] each
    run-command-loop ;

MAIN: gemini-main

You can see it in action:

$ ./factor -run=gemini.cli
Welcome to Project Gemini!
GEMINI> go gemini.circumlunar.space/news/

Official Project Gemini news feed

[1] Atom feed

2023 News

[2] 2023-01-14 - Tidying up gemini.circumlunar.space user capsules
[3] 2023-01-08 - Changing DNS server

2022 News

[4] 2022-06-20 - Three years of Gemini!
[5] 2022-01-30 - Minor specification update (0.16.1)
[6] 2022-01-22 - Mailing list archives, Atom feed for official news
[7] 2022-01-16 - Mailing list downtime, official news feed

Guided Tour of Factor

Sun, 15 Jan 2023 08:10:00 -0800

Many years ago, Andrea Ferretti created a Factor tutorial which I had written about at the time.

One of our new core developers, Raghu Ranganathan, got permission to include the tutorial in the main Factor repository, and has reformatted it and updated it for the not-yet-released version 0.99 as the Guided tour of Factor.

You can access it in a recent nightly build by doing:

IN: scratchpad "tour" help

Check it out!

Speedrun Feedback

Fri, 04 Feb 2022 08:10:00 -0800

Recently, Tomasz Wegrzanowski chose Factor for his 100 Languages Speedrun: Episode 71: Factor and it encouraged us to make some improvements that I wanted to describe.

Many of our users use the Factor environment through the UI developer tools or on the command-line with the listener. Another important use case is being able to eval and run scripts – and this is where much of Tomasz’ criticism was focused.

We now do command-line eval and run scripts with auto-use? enabled. This will be available in the nightly builds and as part of an upcoming 0.99 release.

So this works now:

$ ./factor -e="1 2 + ."
3

$ cat foo.factor
USE: io
"Hello World" print
12

$ ./factor foo.factor
Hello World

--- Data stack:
12

Previously, the first example would error with a “No word named “+” found in current vocabulary search path” and the second example would complain that the “Quotation’s stack effect does not match call site” because the script did not have a ( -- ) stack effect.

I appreciate that some users approach Factor differently than I do, and we love getting feedback. I wish we could solve the name conflict with factor(1), but that is more challenging.

We may adjust this slightly as it just landed last night, and if anyone has further suggestions, please keep them coming!

Factor 0.98 now available

Tue, 31 Jul 2018 10:09:00 -0700

“Even though you’re growing up, you should never stop having fun.” - Nina Dobrev

I’m very pleased to announce the release of Factor 0.98!

OS/CPU	Windows	Mac OS	Linux
x86	0.98	0.98	0.98
x86-64	0.98	0.98	0.98

Source code: 0.98

This release is brought to you with almost 4,300 commits by the following individuals:

Alexander Ilin, Arkady Rost, Benjamin Pollack, Björn Lindqvist, Cat Stevens, Chris Double, Dimage Sapelkin, Doug Coleman, Friedrich von Never, John Benediktsson, Jon Harper, Mark Green, Mark Sweeney, Nicolas Pénet, Philip Dexter, Robert Vollmert, Samuel Tardieu, Sankaranarayanan Viswanathan, Shane Pelletier, @catb0t, @hyphz, @thron7, @xzenf

Besides several years of bug fixes and library improvements, I want to highlight the following changes:

Improved user interface with light and dark themes and new icons
Fix GTK library issue affecting some Linux installations
Support Cocoa TouchBar on MacOS
Factor REPL version banner includes build information.
Bindings for ForestDB
New graphical demos including Minesweeper, Game of Life, Bubble Chamber, etc.
Better handling of “out of memory” errors
Improved VM and compiler documentation, test fixtures, and bug fixes
Much faster Heaps and Heapsort
Support for Adobe Brackets, CotEditor, and Microsoft Visual Studio Code editors
On Mac OS, allow use of symlinks to factor binary
Lots of improvements to FUEL (Factor’s emacs mode)

Some possible backwards compatibility issues:

Flattened unicode namespace (either USE: ascii or USE: unicode).
Unified CONSTRUCTOR: syntax to include generated word name
Returning a struct by value with two register-sized values on 64bit now works correctly
Since shutdown hooks run first, calling exit will now unconditionally exit even if there is an error
On Windows, don’t call cd to change directory when launching processes; there is another mechanism for that
In libc, renamed (io-error) to throw-errno
In match, renamed ?1-tail to ?rest
In sequences, renamed iota to <iota>
In sequences, renamed start/start* to subseq-start/subseq-start-from
In syntax, renamed GENERIC# to GENERIC#:
Improve command-line argument parsing of “executable”
Make buffered-port not have a length, because of problem with Linux virtual files and TCP sockets
Fix broken optimization that made floats work for integer keys in case statements
Growable sequences expand by factor of 2 (instead of 3) when growing
Removed support for “triple-quote” strings

What is Factor

New Libraries:

backticks: process backtick syntax for processes
bencode: support for bencoding
boolean-expr: simple boolean expression evaluator and simplifier
bubble-chamber: variation of Jared Tarbell’s Bubble Chamber
calendar.elapsed: rendering elapsed time to text
calendar.english: separated out English localization
checksums.crc16: CRC16 checksum implementation
checksums.metrohash: MetroHash checksum implementation
checksums.multi: support multiple checksums in one pass
checksums.process: support for using command-line checksums
checksums.ripemd: RIPEMD checksum implementation
checksums.sodium: Sodium crypto library checksum implementation
cocoa.touchbar: support for the Apple Touchbar
colors.flex-hex: implement “flex hex” color algorithm
crontab: parsing the cron format
cuckoo-filters: implementation of Cuckoo Filter data structure
dbf: parsers for DBase file format
editors.brackets: support for Adobe Brackets
editors.cot: support for CotEditor
editors.visual-studio-code: support for Microsoft Visual Studio Code
emojify: add emojis to text ("I :heart: Factor :+1:")
english: tools for English language words
enigma: implementation of Enigma cipher machine
escape-strings: minimal string escaping algorithm
etc-hosts: cross-platform /etc/hosts file parser
file-monitor: cross-platform file change event monitor
file-picker: cross-platform file picker user interface
file-server: cross-platform file server web interface
flip-text: fun with “uʍop ǝpᴉsdn” text flipping
forestdb: bindings for ForestDB
game-of-life: implementation of John Conway’s Game of Life
google.gmail: adding Google GMail API support
gopher: library for querying Gopher servers
gopher.server: cross-platform Gopher file server
gopher.ui: interface for viewing Gopher servers
ifaddrs: list network interfaces
ip-parser: parsing IPv4 and IPv6 addresses
ldcache: parsing /etc/ld.so.cache file
libtls: wrapper for using libtls functions
linked-sets: sets that yield items in insertion order
lru-cache: least recently used cache algorithm
machine-learning: experiments with some machine learning algorithms
math.factorials: adding reverse-factorial
math.functions.integer-logs: support for integer log2 and log10
math.primes.erato.fast: faster Sieve of Eratosthenes
metar: parsers for METAR and TAF weather reports
midi: reading MIDI files and writing MIDI files
minesweeper: playing the classic game of Minesweeper
named-tuples: using tuples as a sequence and assoc
oauth1: support OAuth 1.0
oauth2: support OAuth 2.0
odbc: support ODBC database query
picomath: implement picomath.org small math words
robohash: adding a robot-based hashing tool
sequences.frozen: virtual “frozen” sequence
sequences.interleaved: virtual “interleaved” sequence
shapefiles: parser for ESRI Shapefiles
shell.parser: parser for shell expressions
snake-game: implementation of the snake video game
sodium: wrapper for libsodium
sorting.bubble: adding Bubblesort
stream.extras: few helper words
subrip-subtitles: parser for SubRip .SRT files
successor: algorithm to find successor to a given string
text-analysis: analyze the complexity of English text
text-to-pdf: simple “Text to PDF” utility
text-to-speech: cross-platform “Text to Speech” utility
tools.cal: command-line “cal” tool
tools.cat: command-line “cat” tool
tools.copy: command-line “copy” tool
tools.echo: command-line “echo” tool
tools.grep: command-line “grep” tool
tools.image-analyzer: examine Factor image files
tools.move: command-line “move” tool
tools.seq: command-line “seq” tool
tools.tree: command-line “tree” tool
tools.uniq: command-line “uniq” tool
tools.wc: command-line “wc” tool
ui.gadgets.charts: UI gadget for rendering line charts
ui.gadgets.frame-buffer: UI gadget that supports a couple games
ui.theme: support for light and dark themes
windows.comdlg32: using comdlg32.dll functions
windows.crypt32: using crypt32.dll functions
windows.dragdrop-listener: allow dropping files into the Listener
windows.dropfiles: implementing “file drop” gesture on Windows
windows.surface-dial: support for Microsoft Surface Dial
xdg: support for XDG Base Directory Specification
zealot: continuous build and testing library

Improved Libraries:

boids: adding Cocoa TouchBar buttons
cap: support for screenshot on retina displays
checksums: cleanup and improved checksum API
color-table: add columns for filled color and hex code
cpu.architecture: adding Bit Test primitive
cpu.x86.64: fix for return register on x86.64
colors.hex: support varying length hex notations
combinators.extras: adding swap-when
compiler.cfg: fix scheduling issue
concurrency.distributed: fix serializing of remote threads
elf: cleanup and performance improvements
formatting: support “space” prefix for numbers
furnace.utilities: improve template path resolution
heaps: performance improvements to the heap algorithm
help: default word help (if not provided)
html.parser.printer: improved plain text printer
http.client: support HTTP proxies and bug fixes
http.server: fix use of port remapping
http.server.static: support sorting by columns
images.png: support reading color profiles
images.tiff: fix bugs exposed by AFL test suite
interpolate: simplify and some minor improvements
io.directories.search: simplify interface and allow BFS and DFS searches
io.encodings.8-bit: more encodings and simplify hierarchy
io.files.info.windows: fix file-readable? to check current user’s permissions
io.files.unique: create multiple unique files at same time
io.launcher: cleanup interface and support hidden processes on Windows
io.streams.c: faster M\ c-reader stream-read-until
json.writer: better support for non-string keys, unicode and special floats
lexer: support universal comments
logging.server: simplify code
macho: updated structures and tests
mason.release: code signing on macOS and Windows
math.binpack: faster binpack and map-binpack
math.combinatorics: performance improvements
math.extras: adding Möbius function and Kelly criterion
math.functions: adding logn
math.parser: faster format-float for known format strings
math.primes.erato: performance improvements
math.primes.factors: command-line support
math.primes.solovay-strassen: adding Solovay-Strassen primality test
math.statistics: fixes and new words
math.text.french: more proper use of French
math.transforms.bwt: performance improvements
math.vectors: adding v>integer
mime.types: update for new mime types
multiline: support “Lua-style” strings
peg.ebnf: handle escapes in strings better
pong: bug fixes and a bit fancier graphics
readline-listener: adding vocab word completions
reddit: fix for Reddit API changes
sequences.extras: some possibly useful new words
simple-tokenizer: consider TAB, CR, LF as spaces
smalltalk: cleanup grammar and fix underscore bug
splitting.monotonic: faster monotonic-split.
sorting: faster sort-keys and sort-values on hashtables
sorting.extras: faster map-sort
strings.parser: allow both \u{snowman} and \u{2603}
terminfo: look in multiple directories for terminfo files
tools.disassembler: allow disassemble of compose and curry
tools.dns: enable use from command-line
tools.hexdump: support stdin hexdump
tools.scaffold: Better examples scaffold. Add more types.
tools.test: adding with-test-file and with-test-directory helper words
tools.which: enable use from command-line
ui.tools.browser: adding Cocoa TouchBar buttons
ui.tools.inspector: improved performance with large arrays and hashtables
ui.tools.listener: adding Cocoa TouchBar buttons, Ctrl-Break support on Windows, and vocab word completions
unicode: simplified hierarchy
urls: improved parsing of scheme component
wrap: performance improvements using better algorithm
xkcd: fix mouseover text.
xml.data: make tags support assoc protocol
z-algorithm: faster z-values

Minesweeper

Mon, 12 Feb 2018 13:39:00 -0800

Minesweeper is a fun game that was probably made most popular by its inclusion in various Microsoft Windows versions since the early 1990s.

I thought it would be fun to build a simple Minesweeper clone using Factor.

You can run this by updating to the latest code and running:

IN: scratchpad "minesweeper" run

Game Engine

We are going to represent our game grid as a two-dimensional array of “cells”.

Each cell contains the number of mines contained in the (up to eight) adjacent cells, whether the cell contains a mine, and a “state” flag showing whether the cell was +clicked+, +flagged+, or marked with a +question+ mark.

SYMBOLS: +clicked+ +flagged+ +question+ ;

TUPLE: cell #adjacent mined? state ;

Making a (rows, cols) grid of cells:

: make-cells ( rows cols -- cells )
    '[ _ [ cell new ] replicate ] replicate ;

We can lookup a particular cell using its (row, col) index:

:: cell-at ( cells row col -- cell/f )
    row cells ?nth [ col swap ?nth ] [ f ] if* ;

Placing a number of mines into cells just looks for a certain number of unmined cells at random, and then marks them as mined:

: unmined-cell ( cells -- cell )
    f [ dup mined?>> ] [ drop dup random random ] do while nip ;

: place-mines ( cells n -- cells )
    [ dup unmined-cell t >>mined? drop ] times ;

We can count the number of adjacent mines for each cell, by looking at its neighbors:

CONSTANT: neighbors {
    { -1 -1 } { -1  0 } { -1  1 }
    {  0 -1 }           {  0  1 }
    {  1 -1 } {  1  0 } {  1  1 }
}

: adjacent-mines ( cells row col -- #mines )
    neighbors [
        first2 [ + ] bi-curry@ bi* cell-at
        [ mined?>> ] [ f ] if*
    ] with with with count ;
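The same neighbor-counting idea can be tried in isolation. The sketch below is a simplified stand-in that uses a grid of plain booleans rather than cell tuples and locals instead of the curried form; offsets, mine-at?, and count-adjacent are hypothetical names chosen for this example:

```factor
USING: kernel locals math sequences ;

! The eight (row, col) offsets around a cell.
CONSTANT: offsets {
    { -1 -1 } { -1  0 } { -1  1 }
    {  0 -1 }           {  0  1 }
    {  1 -1 } {  1  0 } {  1  1 }
}

! ?nth returns f for out-of-range indices (including negative
! ones), so cells on the border need no special handling.
:: mine-at? ( grid row col -- ? )
    row grid ?nth [ col swap ?nth ] [ f ] if* ;

:: count-adjacent ( grid row col -- n )
    offsets [
        first2 :> ( dr dc )
        grid row dr + col dc + mine-at?
    ] count ;
```

For the grid { { t f f } { f f f } { f f t } }, the center cell at (1, 1) has two adjacent mines and the corner cell at (0, 0) has none.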

The each-cell word looks at all the cells, helping us update the “adjacent mines” counts:

:: each-cell ( ... cells quot: ( ... row col cell -- ... ) -- ... )
    cells [| row |
        [| cell col | row col cell quot call ] each-index
    ] each-index ; inline

:: update-counts ( cells -- cells )
    cells [| row col cell |
        cells row col adjacent-mines cell #adjacent<<
    ] each-cell cells ;

Since we aren’t storing the number of rows and columns, we can get it from the array of cells:

: cells-dim ( cells -- rows cols )
    [ length ] [ first length ] bi ;

We can get the number of mines contained in the grid by counting them up:

: #mines ( cells -- n )
    [ [ mined?>> ] count ] map-sum ;

We can reset the game by making new cells and then placing the same number of mines in them:

: reset-cells ( cells -- cells )
    [ cells-dim make-cells ] [ #mines place-mines ] bi update-counts ;

The player wins if they click on all cells that aren’t mines:

: won? ( cells -- ? )
    [ [ { [ state>> +clicked+ = ] [ mined?>> ] } 1|| ] all? ] all? ;

The player loses if they click on any cell that’s a mine:

: lost? ( cells -- ? )
    [ [ { [ state>> +clicked+ = ] [ mined?>> ] } 1&& ] any? ] any? ;

And then the game is over if the player either wins or loses:

: game-over? ( cells -- ? )
    { [ lost? ] [ won? ] } 1|| ;

We can tell this is a new game if no cells are clicked on:

: new-game? ( cells -- ? )
    [ [ state>> +clicked+ = ] any? ] any? not ;

When we click on a cell, if it is not adjacent to any mines, we click on all the “clickable” (non-mined) cells around it:

DEFER: click-cell-at

:: click-cells-around ( cells row col -- )
    neighbors [
        first2 [ row + ] [ col + ] bi* :> ( row' col' )
        cells row' col' cell-at [
            mined?>> [
                cells row' col' click-cell-at
            ] unless
        ] when*
    ] each ;

Handle clicking a cell. If it’s the first click and the cell is mined, we move it to another random cell, then continue with the click. The click is ignored if the cell was already clicked or flagged. Continue clicking around any cells that have no adjacent mines and are not themselves mined.

:: click-cell-at ( cells row col -- )
    cells row col cell-at [
        cells new-game? [
            ! first click shouldn't be a mine
            dup mined?>> [
                cells unmined-cell t >>mined? drop f >>mined?
                cells update-counts drop
            ] when
        ] when
        dup state>> { +clicked+ +flagged+ } member? [ drop ] [
            +clicked+ >>state
            { [ mined?>> not ] [ #adjacent>> 0 = ] } 1&& [
                cells row col click-cells-around
            ] when
        ] if
    ] when* ;

Handle marking a cell, either by flagging it as a likely mine or by marking it with a question mark to come back to later. If the cell is not clicked, we just cycle through flagged, question, and unmarked.

:: mark-cell-at ( cells row col -- )
    cells row col cell-at [
        dup state>> {
            { +clicked+ [ +clicked+ ] }
            { +flagged+ [ +question+ ] }
            { +question+ [ f ] }
            { f [ +flagged+ ] }
        } case >>state drop
    ] when* ;

Graphical Interface

Our graphical interface is going to consist of a gadget with an array of cells and a cache of OpenGL texture objects that can be easily drawn on the screen.

TUPLE: grid-gadget < gadget cells textures ;

When you make a new grid-gadget, it initializes the game to a specified number of rows, columns, and number of mines:

:: <grid-gadget> ( rows cols mines -- gadget )
    grid-gadget new
        rows cols make-cells
        mines place-mines update-counts >>cells
        H{ } clone >>textures
        COLOR: gray <solid> >>interior ;

When ungraft* is called to indicate the gadget is no longer visible on the screen, we clean up the cached textures:

M: grid-gadget ungraft*
    dup find-gl-context
    [ values dispose-each H{ } clone ] change-textures
    call-next-method ;

Our images are going to be 32 x 32 squares, so the preferred dimension is number of rows and columns times 32 pixels for each square.

M: grid-gadget pref-dim*
    cells>> cells-dim [ 32 * ] bi@ swap 2array ;

Some slightly complex logic to decide which image to display for each cell, taking into account whether the game is over so we can show the positions of all the mines and whether the player was correct in flagging a cell as mined, etc:

:: cell-image-path ( cell game-over? -- image-path )
    game-over? cell mined?>> and [
        cell state>> +clicked+ = "mineclicked.gif" "mine.gif" ?
    ] [
        cell state>>
        {
            { +question+ [ "question.gif" ] }
            { +flagged+ [ game-over? "misflagged.gif" "flagged.gif" ? ] }
            { +clicked+ [
                cell mined?>> [
                    "mine.gif"
                ] [
                    cell #adjacent>> 0 or number>string
                    "open" ".gif" surround
                ] if ] }
            { f [ "blank.gif" ] }
        } case
    ] if "vocab:minesweeper/_resources/" prepend ;

Drawing a cached texture is a matter of looking up the image in our texture cache and then rendering to the screen:

: draw-cached-texture ( path gadget -- )
    textures>> [ load-image { 0 0 } <texture> ] cache
    [ dim>> ] [ draw-scaled-texture ] bi ;

Drawing our gadget is basically drawing all of the cells at their proper locations on the screen:

M:: grid-gadget draw-gadget* ( gadget -- )
    gadget cells>> game-over? :> game-over?
    gadget cells>> [| row col cell |
        col row [ 32 * ] bi@ 2array [
            cell game-over? cell-image-path
            gadget draw-cached-texture
        ] with-translation
    ] each-cell ;

Basic handling for the gadget being left-clicked on:

:: on-click ( gadget -- )
    gadget hand-rel first2 :> ( w h )
    h w [ 32 /i ] bi@ :> ( row col )
    gadget cells>> :> cells
    cells game-over? [
        cells row col click-cell-at
    ] unless gadget relayout-1 ;

Basic handling for the gadget being right-clicked on:

:: on-mark ( gadget -- )
    gadget hand-rel first2 :> ( w h )
    h w [ 32 /i ] bi@ :> ( row col )
    gadget cells>> :> cells
    cells game-over? [
        cells row col mark-cell-at
    ] unless gadget relayout-1 ;
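Both handlers convert a pixel location into a (row, col) index the same way. As a small sketch of just that step, assuming a location given as an { x y } pair and the 32-pixel squares used above (pixel>cell is a hypothetical name, not a word in the minesweeper vocabulary):

```factor
USING: kernel math sequences ;

! x maps to the column and y maps to the row, so swap before
! dividing; /i is integer division, which floors each coordinate
! to the index of the 32 x 32 square that contains it.
: pixel>cell ( loc -- row col )
    first2 swap [ 32 /i ] bi@ ;
```

For example, a click at { 70 40 } lands in row 1, column 2.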

Logic for creating new games of varying difficulties: easy, medium, and hard:

: new-game ( gadget rows cols mines -- )
    [ make-cells ] dip place-mines update-counts >>cells
    relayout-window ;

: com-easy ( gadget -- ) 7 7 10 new-game ;

: com-medium ( gadget -- ) 15 15 40 new-game ;

: com-hard ( gadget -- ) 15 30 99 new-game ;

We set our gesture handlers for keyboard and mouse inputs:

grid-gadget {
    { T{ key-down { sym "1" } } [ com-easy ] }
    { T{ key-down { sym "2" } } [ com-medium ] }
    { T{ key-down { sym "3" } } [ com-hard ] }
    { T{ button-up { # 1 } } [ on-click ] }
    { T{ button-up { # 3 } } [ on-mark ] }
    { T{ key-down { sym " " } } [ on-mark ] }
} set-gestures

And a main word that creates an easy game in our grid-gadget and opens it in a new window:

MAIN-WINDOW: run-minesweeper {
        { title "Minesweeper" }
        { window-controls
            { normal-title-bar close-button minimize-button } }
    } 7 7 10 <grid-gadget> >>gadgets ;

The implementation above is about 200 lines of code and contains the full game logic. The final version is just under 300 lines of code, and adds:

support for a toolbar to easily start new games
the traditional counter of the number of mines remaining
display of the number of seconds elapsed
a smiley face showing a funny “uh-oh!” face when you are about to click as well as winning and losing smileys
support for retina displays using 2x images

$7.11

Sat, 11 Feb 2017 14:05:00 -0800

Today, someone blogged about a fun problem:

“A mathematician purchased four items in a grocery store. He noticed that when he added the prices of the four items, the sum came to $7.11, and when he multiplied the prices of the four items, the product came to $7.11.”

In some ways, this is similar to the SEND + MORE = MONEY problem that I blogged about a while ago. You can always approach this problem with a direct and iterative solution, but instead we will use the backtrack vocabulary to solve this problem with less code.

We’ll be solving this exactly, using integer “numbers of cents”, progressively restricting the options, and then calling fail if the solution is not found, so we check the next. The first valid solution will be returned:

:: solve-711 ( -- seq )
    711 <iota> amb-lazy :> w
    711 w - <iota> amb-lazy :> x
    711 w - x - <iota> amb-lazy :> y
    711 w - x - y - :> z

    w x * y * z * 711,000,000 = [ fail ] unless

    { w x y z } ;

Using it, we get our answer:

IN: scratchpad solve-711 .
{ 120 125 150 316 }

And that is: $1.20, $1.25, $1.50, and $3.16.
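The target of 711,000,000 comes from working in cents: each of the four prices is scaled by 100, so the product is scaled by 100^4 = 100,000,000, and $7.11 becomes 7.11 × 100,000,000. A quick sanity check of the answer, using a hypothetical helper word:

```factor
USING: kernel sequences ;

! Hypothetical helper: return both the sum and the product of a
! sequence of prices expressed in cents.
: check-711 ( seq -- sum product )
    [ sum ] [ product ] bi ;
```

Calling { 120 125 150 316 } check-711 leaves 711 and 711000000 on the stack, matching both conditions of the puzzle.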

Dirty Money: Code Challenge

Sun, 05 Feb 2017 15:59:00 -0800

There’s a fun coding challenge to follow the dirty money that I discovered recently.

A shady Internet business has been discovered.

The website has been made public by a whistle blower. We have enough evidence about the dirty deals they did. But to charge them we need to get hands on precise numbers about the transactions that happened on their platform.

Unfortunately no record of the transactions could be seized so far. The only hint we have is this one transaction:

fd0d929f-966f-4d1a-89cd-feee5a1c5347.json

What we need is the total of all transactions in Dollar. Can you trace down all other transactions and get the total?

Be careful to count each transaction only once, even if it is linked multiple times. You can use whatever tool works best.

We need a way to extract the dollar amount from the transaction text. The dollars might be specified with a period or a comma to represent the decimal point. We use regular expressions to look for the dollar amount and then convert to a number.

: dollars ( str -- $ )
    R/ \$\d*[,.]\d+/ first-match rest
    "," "." replace string>number ;

We will use a hash-set of visited links, and only if the link has not been visited will we http-get the contents of the URL, parse the JSON, and extract the dollar amount of both the transaction and any links it contains. The set of visited links will tell us how many total transactions we traced.

:: transaction ( url visited -- dollars )
    url visited ?adjoin [
        url http-get nip json> :> data
        data "content" of dollars
        data "links" of [ visited transaction ] map-sum +
    ] [ 0 ] if ;

: transactions ( url -- dollars #transactions )
    HS{ } clone [ transaction ] [ cardinality ] bi ;
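The count-each-transaction-once behavior can be checked without any network access. The sketch below swaps http-get for a small in-memory table with made-up data (ledger and trace-total are hypothetical names); each entry holds an amount and the ids it links to, including a cycle back to the start:

```factor
USING: assocs kernel locals math sequences sets ;

! Made-up data for illustration: id -> { amount links }.
CONSTANT: ledger H{
    { "a" { 10 { "b" "c" } } }
    { "b" { 20 { "c" } } }
    { "c" { 5 { "a" } } }
}

! ?adjoin returns t only the first time an id is added, so an
! id reached through several links contributes its amount once.
:: trace-total ( id visited -- n )
    id visited ?adjoin [
        id ledger at first2 :> ( amount links )
        amount links [ visited trace-total ] map-sum +
    ] [ 0 ] if ;
```

Starting from "a" with an empty hash-set gives 35, and the set ends up with three members, even though "c" is linked twice and "a" is linked from "c".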

That’s all we need to solve the problem. We can run this with the initial URL and get the answer:

$9064.79 in 50 transactions.

The Twelve Days of Christmas

Sat, 24 Dec 2016 11:21:00 -0800

Programming Praxis posted a task to write a program to print the words to The Twelve Days of Christmas song. We are going to solve it in Factor.

We start off by defining all the gifts received on each day:

CONSTANT: gifts {
    { "first" "a partridge in a pear tree" }
    { "second" "two turtle doves and " }
    { "third" "three French hens, " }
    { "fourth" "four calling birds, " }
    { "fifth" "five golden rings, " }
    { "sixth" "six geese a-laying, " }
    { "seventh" "seven swans a-swimming, " }
    { "eighth" "eight maids a-milking, " }
    { "ninth" "nine ladies dancing, " }
    { "tenth" "ten lords a-leaping, " }
    { "eleventh" "eleven pipers piping, " }
    { "twelfth" "twelve drummers drumming, " }
}

Then we iterate through the days, gathering all the gifts in reverse for each day, and formatting them, and wrapping to 72 columns of text for display.

gifts [
    [ first ] [ 1 + gifts swap head values reverse concat ] bi*
    "On the %s day of Christmas my true love gave to me %s." sprintf
    72 wrap-string print nl
] each-index
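To see how the gift list for a single day is assembled, here is a reduced sketch using only the first three entries; sample-gifts and gifts-through are hypothetical names for this example. It takes the entries up to and including a zero-based day index, keeps the gift text, reverses the order, and joins the pieces:

```factor
USING: assocs kernel math sequences ;

CONSTANT: sample-gifts {
    { "first" "a partridge in a pear tree" }
    { "second" "two turtle doves and " }
    { "third" "three French hens, " }
}

! n is a zero-based day index, so take n + 1 entries.
: gifts-through ( n -- str )
    1 + sample-gifts swap head values reverse concat ;
```

Calling 2 gifts-through yields "three French hens, two turtle doves and a partridge in a pear tree". The trailing separators stored with each gift are what make a plain concat sufficient.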

Which gives us these words:

On the first day of Christmas my true love gave to me a partridge in a pear tree.

On the second day of Christmas my true love gave to me two turtle doves and a partridge in a pear tree.

On the third day of Christmas my true love gave to me three French hens, two turtle doves and a partridge in a pear tree.

On the fourth day of Christmas my true love gave to me four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the fifth day of Christmas my true love gave to me five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the sixth day of Christmas my true love gave to me six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the seventh day of Christmas my true love gave to me seven swans a-swimming, six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the eighth day of Christmas my true love gave to me eight maids a-milking, seven swans a-swimming, six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the ninth day of Christmas my true love gave to me nine ladies dancing, eight maids a-milking, seven swans a-swimming, six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the tenth day of Christmas my true love gave to me ten lords a-leaping, nine ladies dancing, eight maids a-milking, seven swans a-swimming, six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the eleventh day of Christmas my true love gave to me eleven pipers piping, ten lords a-leaping, nine ladies dancing, eight maids a-milking, seven swans a-swimming, six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

On the twelfth day of Christmas my true love gave to me twelve drummers drumming, eleven pipers piping, ten lords a-leaping, nine ladies dancing, eight maids a-milking, seven swans a-swimming, six geese a-laying, five golden rings, four calling birds, three French hens, two turtle doves and a partridge in a pear tree.

AnyBar

Mon, 28 Nov 2016 12:58:00 -0800

AnyBar is a macOS status indicator that displays a “colored dot” in the menu bar that can be changed programmatically. What it means and when it changes is entirely up to the user.

You can easily install it with Homebrew-cask:

$ brew cask install anybar

The README lists a number of alternative clients in different programming languages. I thought it would be fun to show how to use it from Factor. Since AnyBar responds to AppleScript (and I added support for AppleScript a few years ago), we could do this:

USE: cocoa.apple-script

"tell application \"AnyBar\" to set image name to \"blue\""
run-apple-script

The AnyBar application also listens to a UDP port (default: 1738) and can be instructed to change from a Terminal using a simple echo | nc command:

$ echo -n "blue" | nc -4u -w0 localhost 1738

Using our networking words similarly is pretty simple:

"blue" >byte-array "127.0.0.1" 1738 <inet4> send-once

But if we wanted to get fancier, we could use symbols to configure which AnyBar instance to send to, with default values to make it easy to use, and resolve-host to look up hostnames:

SYMBOL: anybar-host
"localhost" anybar-host set-global

SYMBOL: anybar-port
1738 anybar-port set-global

: anybar ( str -- )
    ascii encode
    anybar-host get resolve-host first
    anybar-port get with-port send-once ;
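A short usage sketch, assuming the anybar word and its two symbols above are loaded. The second AnyBar instance on port 1739 is hypothetical; with-variable rebinds the port only for the duration of the quotation:

```factor
USING: namespaces ;

! Send to the default instance (localhost, port 1738).
"red" anybar

! Send to a hypothetical second instance listening on port 1739.
1739 anybar-port [ "green" anybar ] with-variable
```

Because the word reads its settings with get, the global defaults apply whenever no enclosing scope overrides them.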

AnyBar is a neat little program!

Reverse Factorial

Sat, 26 Nov 2016 22:28:00 -0800

A few years ago, I wrote about implementing various factorials using Factor. Recently, I came across a programming challenge to implement a “reverse factorial” function to determine which number's factorial produces a given number, or none if it is not a factorial.

To do this, we examine each factorial in order, checking against the number being tested:

: reverse-factorial ( m -- n )
    1 1 [ 2over > ] [ 1 + [ * ] keep ] while [ = ] dip and ;

And some unit tests:

{ 10 } [ 3628800 reverse-factorial ] unit-test
{ 12 } [ 479001600 reverse-factorial ] unit-test
{ 3 } [ 6 reverse-factorial ] unit-test
{ f } [ 18 reverse-factorial ] unit-test
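
It can help to trace the stack by hand. This is a sketch of my reading of the loop for an input of 6, followed by two edge cases, assuming the definition above is loaded:

```factor
USING: tools.test ;

! The stack holds ( m fact n ) at the top of each iteration.
! 6 1 1   6 > 1, so continue: n becomes 2, fact becomes 1 * 2
! 6 2 2   6 > 2, so continue: n becomes 3, fact becomes 2 * 3
! 6 6 3   6 > 6 is false, so the loop stops
! [ = ] dip compares m with fact, leaving ( t 3 ), and "and" returns 3.

! 1 is reported as 1!, although it is also 0!.
{ 1 } [ 1 reverse-factorial ] unit-test

! 0 is never a factorial: the loop does not run and 0 = 1 is false.
{ f } [ 0 reverse-factorial ] unit-test
```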

Gopher Server

Thu, 27 Oct 2016 14:14:00 -0700

A few days ago, I noticed a post about building a Gopher Server in Perl 6. I had already implemented a Gopher Client in Factor, and thought it might be fun to show a simple Gopher Server in Factor in around 50 lines of code.

Using the io.servers vocabulary, we will define a new multi-threaded server that has a directory to serve content from and a hostname that it can be accessed at:

TUPLE: gopher-server < threaded-server
    { serving-hostname string }
    { serving-directory string } ;

When a file is requested, it can be streamed back to clients:

: send-file ( path -- )
    binary [ [ write ] each-block ] with-file-reader ;

The Gopher protocol is defined in RFC 1436 and lists a few differentiated file types. We use the mime.types vocabulary to return the correct one.

: gopher-type ( entry -- type )
    dup directory? [
        drop "1"
    ] [
        name>> mime-type {
            { [ dup "text/" head? ] [ drop "0" ] }
            { [ dup "image/gif" = ] [ drop "g" ] }
            { [ dup "image/" head? ] [ drop "I" ] }
            [ drop "9" ]
        } cond
    ] if ;

When a directory is requested, we send a listing of the sub-directories and files it contains, giving each path relative to the root directory being served so the client can request it properly:

:: send-directory ( server path -- )
    path [
        [
            [ gopher-type ] [ name>> ] bi
            dup path prepend-path
            server serving-directory>> ?head drop
            server serving-hostname>>
            server insecure>>
            "%s%s\t%s\t%s\t%d\r\n" sprintf utf8 encode write
        ] each
    ] with-directory-entries ;
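
Each line of the listing follows the menu format from RFC 1436: a one-character type, the display name, then the selector, host, and port separated by tabs and terminated by CRLF. As a sketch, here is the same format string applied to a hypothetical entry (a directory named docs on localhost, port 70):

```factor
USING: formatting tools.test ;

! Hypothetical values; only the format string comes from the code above.
{ "1docs\t/docs\tlocalhost\t70\r\n" } [
    "1" "docs" "/docs" "localhost" 70
    "%s%s\t%s\t%s\t%d\r\n" sprintf
] unit-test
```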

To know which path was requested, we read a line and split it at the first tab, carriage return, or newline character we see:

: read-gopher-path ( -- path )
    readln [ "\t\r\n" member? ] split1-when drop ;
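
To illustrate the splitting step in isolation, here is a small sketch that applies the same quotation to a few hypothetical request strings instead of reading from a connection:

```factor
USING: kernel sequences splitting tools.test ;

! A selector followed by a tab and a search string: keep only the selector.
{ "/docs" } [ "/docs\tfactor" [ "\t\r\n" member? ] split1-when drop ] unit-test

! A selector with a trailing carriage return.
{ "/about.txt" } [ "/about.txt\r" [ "\t\r\n" member? ] split1-when drop ] unit-test

! An empty selector contains no separator and stays empty.
{ "" } [ "" [ "\t\r\n" member? ] split1-when drop ] unit-test
```

An empty selector is how a Gopher client asks for the top-level listing, so appending it to the serving directory leaves that directory as the requested path.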

With all of that built, we can now implement a word to handle a client request:

M: gopher-server handle-client*
    dup serving-directory>> read-gopher-path append-path
    dup file-info directory? [
        send-directory
    ] [
        send-file drop
    ] if flush ;

Finally, we initialize a gopher-server instance and provide a convenience word to start one:

: <gopher-server> ( directory port -- server )
    utf8 gopher-server new-threaded-server
        "gopher.server" >>name
        swap >>insecure
        binary >>encoding
        "localhost" >>serving-hostname
        swap resolve-symlinks >>serving-directory ;

: start-gopher-server ( directory port -- server )
    <gopher-server> start-server ;
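
A minimal usage sketch, assuming the definitions above are loaded. The directory and port are hypothetical; the standard Gopher port is 70, but a high port avoids needing elevated privileges:

```factor
USING: io.servers ;

! Serve a hypothetical directory on port 7070.
"/var/gopher" 7070 start-gopher-server

! The running server is left on the stack; stop it when finished.
stop-server
```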

This is available in the gopher.server vocabulary with a few improvements such as:

Support for .gophermap files for alternate results when content is requested.
Support for .gopherhead files to print headers above directory listings.
Navigation to parent directories using .. links.
Display file modified timestamp and file sizes.
Improved error handling.