Chiroptical’s Blog

Why I re-wrote my Slack bot in Gleam

2025-02-13T00:00:00+00:00

Introduction

I started working on Music Monday a while back. It is currently only a Slack bot which, when installed into a channel, selects a user at random to suggest a musical album to share. This blog post isn’t about the functionality, but the language change I made during implementation. Feel free to check it out though; I need beta testers.

The bot was originally implemented in Erlang. I really like Erlang. The language is quirky and simple. The BEAM is a gorgeous piece of technology to build applications on. However, I’m a static types enjoyer. I write Haskell professionally and I really really like having sum and product types. To summarize, I want strong static types and I want Erlang/OTP.

Enter Gleam. I want to go over some specific examples where I feel Gleam made my developer experience better.

Records

In Erlang, you can share records via header files, .hrl, e.g.,

-record(slack_channel, {id, music_monday_id, channel_id}).

I used this record to denote a return type from a database query in pgo. You can pattern match or access elements, e.g.,

-include_lib("{filename}.hrl").

#slack_channel{id = Id} = function_that_returns_slack_channel(),
SlackChannel = function_that_returns_slack_channel(),
Id = SlackChannel#slack_channel.id,

I don’t think this pattern is great across modules. You can give the fields types and they can be checked with dialyzer/eqwalizer. That just doesn’t provide me enough; I’m not a smart man. A compiler with expressive types that are checked and easily shareable across modules saves me a lot of stress and brainpower.
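For reference, a typed version of that record might look like the sketch below. The field types are assumptions for illustration (the post does not say how pgo returns the UUID columns), and the module name is hypothetical.

```erlang
-module(slack_channel_sketch).
-export([id/1]).
-export_type([slack_channel/0]).

%% Field types are assumptions; adjust them to whatever the driver returns.
-record(slack_channel, {
    id :: binary(),
    music_monday_id :: binary(),
    channel_id :: binary()
}).

-type slack_channel() :: #slack_channel{}.

%% dialyzer or eqwalizer can check callers against this spec,
%% but nothing forces you to write it.
-spec id(slack_channel()) -> binary().
id(#slack_channel{id = Id}) -> Id.
```

Sharing this across modules still means moving the record into a .hrl file and including it everywhere it is used.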

In Gleam, this record is defined as,

import youid/uuid

pub type SlackChannel {
  SlackChannel(id: uuid.Uuid, music_monday_id: uuid.Uuid, channel_id: String)
}

I can import this type from anywhere via import path/to/module.{type SlackChannel}. I can use it qualified via import path/to/module with module.SlackChannel. It is easy to pass this type around and it works for both Erlang and JavaScript targets.

Database Queries

With pgo, here is how I complete a unique insert into slack_channel,

create_unique(ChannelId) ->
    #{command := insert, rows := Rows} = pgo:transaction(fun() ->
        pgo:query(
            <<
                "insert into slack_channel (channel_id)"
                " values ($1)"
                " on conflict (channel_id) do update"
                " set channel_id = excluded.channel_id"
                " returning id"
            >>,
            [ChannelId]
        )
    end),
    case Rows of
        [{Id}] -> {ok, Id};
        _ -> {error, impossible}
    end.

Is ChannelId a UUID or String? Is Id a UUID or String? Is this query even sound? In my Erlang application, I explicitly tested every query because of the number of mistakes I made. I could add -spec annotations to this to inform the reader but it doesn’t mean they are correct! Additionally, PostgreSQL already knows this information! Why not just let PG figure out the types and write the {ok, Id} and {error, impossible} logic ourselves? In Gleam, we can use squirrel for this. You add your query to a SQL file in the module tree and squirrel will autogenerate the requisite Gleam code for you! For the above query (with a slight modification) it will generate,

pub type CreateUniqueRow {
  CreateUniqueRow(id: Uuid)
}

pub fn create_unique(
  db: pog.Connection,
  arg_1: Uuid,
  arg_2: String,
) -> Result(pog.Returned(CreateUniqueRow), pog.QueryError) {
  // ... generated code here
}

In my production application I needed an additional Uuid to create a slack_channel (the internal Slack team identifier). This was partially why I rewrote the application which I’ll explain in the next section. Here, I need a Uuid and String to call this function and I’ll get back, effectively, Result(List(CreateUniqueRow), pog.QueryError). The pog.Returned type also has a count field. You need to understand what arg_1 and arg_2 are supposed to be, but the shape is generated automatically. You can, and probably should, create a usable API around this function. For example, the logic above expects only a single entry to come back from the query. Squirrel also provides helpful error messages when your queries are broken.

Refactoring

The initial Slack developer experience is okay if you are installing into a single Slack team, but Music Monday is intended to be installed in many Slacks. This requires OAuth credentials. I hadn’t built applications like this before, so I assumed my bot was basically the same entity across every Slack team, but that isn’t true. Each team has its own credentials and even its own bot id.

When I built the Erlang application I was too tightly coupled with my development Slack team. I needed to do a huge refactor to support Slack team ids and OAuth credentials. In Erlang, there is no requirement to add dialyzer specs so… I didn’t. I was in a hell hole of refactoring with the tests I had (which were non-zero but far from full coverage) and debugging everything else at runtime. It was pain.

After a few hours of this, I had enough. You can say, “skill issue” or “bad tests and use of specs” and… you are right. To me, this is why strong static types are the way. I am forced to do this and the compiler will help me.

Using the example above, I added the slack_team table and modified the slack_channel table via migrations, re-ran gleam run -m squirrel, and ran gleam build. Now, all the places I need to change are revealed to me. No magic, no remembering. I just need to line up the new types.

This is true of internal blocks of code as well. In Erlang, when I pull out a chunk of code, I have to figure out what the spec is supposed to be. In Gleam, it was known before and it is known now. There is even a code action to do it.

Programming the Happy Path

Let’s start with an example using maybe from Erlang,

    maybe
        {~"text", Arguments} ?= proplists:lookup(~"text", Proplist),
        {~"channel_id", ChannelId} ?= proplists:lookup(~"channel_id", Proplist),
        {~"user_id", UserId} ?= proplists:lookup(~"user_id", Proplist),
        {ok, {slack_command, Arguments, UserId, ChannelId}}
    else
        none -> {error, <<"Unable to construct slack_command from_proplist">>}
    end.

Here, I’m trying to look up some keys in a “proplist” (a list of key-value tuples). They all need to be present to succeed. If proplists:lookup succeeds it returns {Key, Value}; if it fails it returns none. This API is actually quite friendly for maybe expressions; others aren’t.

First, the ?= syntax says: if the value on the right matches the pattern on the left, continue; otherwise go to the else block and start pattern matching there. Let’s imagine that an update to the proplists API caused proplists:lookup to return error instead of none, or {error, not_a_proplist} if you don’t provide a proplist. In either of those update scenarios, none of my else clauses would match and the maybe expression would fail at runtime. Tools like dialyzer won’t catch this, but I believe other projects are working on full compilers for Erlang, e.g. https://github.com/etylizer/etylizer.
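On the Erlang side, one way to soften that failure is a catch-all else clause. The sketch below is an assumption about how that could look, not the bot’s actual code: the module name and error atoms are hypothetical, and it uses plain binaries instead of the ~"…" sigil so it does not depend on the newest OTP release. It only turns the crash into an error tuple at runtime; nothing is checked at compile time.

```erlang
-module(slack_command_sketch).
%% Needed on OTP 25 and 26, where maybe is not enabled by default.
-feature(maybe_expr, enable).
-export([from_proplist/1]).

from_proplist(Proplist) ->
    maybe
        {<<"text">>, Arguments} ?= proplists:lookup(<<"text">>, Proplist),
        {<<"channel_id">>, ChannelId} ?= proplists:lookup(<<"channel_id">>, Proplist),
        {<<"user_id">>, UserId} ?= proplists:lookup(<<"user_id">>, Proplist),
        {ok, {slack_command, Arguments, UserId, ChannelId}}
    else
        %% The documented "key not found" result.
        none ->
            {error, missing_key};
        %% Anything else, e.g. a changed return shape, lands here
        %% instead of raising an else_clause error.
        Unexpected ->
            {error, {unexpected_lookup_result, Unexpected}}
    end.
```

With all three keys present this returns {ok, {slack_command, …}}; with an empty list it returns {error, missing_key}.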

In Gleam, I just don’t have this problem because I am using a compiler with exhaustiveness checking. I have a few options for coding this in Gleam, e.g.

// Note: I could use type to distinguish between parse_query failure and lookup failure
// Nil is used for simplicity
// https://hexdocs.pm/gleam_stdlib/0.69.0/gleam/uri.html#parse_query
case uri.parse_query(a_string) {
  Error(Nil) -> Error(Nil)
  Ok(a_proplist) -> {
    case
      list.key_find(a_proplist, "text"),
      list.key_find(a_proplist, "channel_id"),
      list.key_find(a_proplist, "user_id")
    {
      Ok(args), Ok(channel_id), Ok(user_id) -> Ok(#(args, channel_id, user_id))
      _, _, _ -> Error(Nil)
    }
  }
}

I personally find the Result use style more readable, but I’ll elide that for simplicity. The key is that my pattern match has to be exhaustive. I couldn’t write,

    Ok(args), Ok(channel_id), Ok(user_id) -> Ok(#(args, channel_id, user_id))
    Error(Nil), Error(Nil), Error(Nil) -> Error(Nil)

the compiler will tell me I goofed up. I like that. If I want to use some other error type, e.g. a validation monad (I had to sneak the ‘m’ word in here), the compiler will help me refactor. Likewise, if the shape is updated I’ll get a compiler error. Note: be careful using blanket pattern matches, e.g. _, _, _ -> above, because you could miss API updates!

Adding a Front-end

There is only a small footprint of front-end code for Music Monday today. Essentially an install button, some frequently asked questions, and a page to describe how to use the bot after it is installed. However, if I want to do something more interesting there aren’t many friendly and maintained Erlang frameworks to build with. I think Nova looks interesting though.

The only times I have ever really enjoyed writing front-end are with Elm. TypeScript is a tedious language and doesn’t really have a language server outside of VSCode. Editing in Helix with the LSP is basically useless. React’s hooks are getting easier for me, but the number of times I’ve had to think incredibly deeply about the runtime is brutal. I want a simpler language with a better editor experience. Enter Lustre.

It was very easy for me to get a server-side rendered application together. The types were easy to figure out, examples exist, and a decent amount of documentation is available. I can use the editor I prefer with actually useful LSP functionality. I already have a Tailwind Plus subscription so it was easy to drop the HTML into this converter (thanks Louis!) and get the Lustre representation.

Erlang/OTP

I’m not going to be able to convince you that Erlang/OTP rocks. There are better posts that cover that in way more detail. You are just going to have to believe me for now. With a combination of factory supervisors (in Erlang, simple_one_for_one supervisors) and crew I was able to introduce services and back-pressure into my system with little effort. Slack has team based API limits and I’d like to be able to build with this in mind.

Conclusion

Gleam gave me all the tools I needed to be successful. The language is simple and can be picked up quickly. The community is stellar and super helpful. You can build full-stack applications with one language. If you are looking for a strong statically typed language, check out Gleam. You’ll also, eventually, learn about Erlang/OTP which has really nice patterns to help build robust software.

Feel free to send me a direct message on BlueSky if you have any questions, corrections, or want to tell me I’m wrong. I’m always happy to learn new things.

Leex and yecc by example: part 2

2024-12-04T00:00:00+00:00

Welcome to Part 2!

In part 1, we went over our first example of using leex and yecc. In this blog post, I’ll lex/parse Advent of Code 2024 day 4 and simplify day 3 using an observation from day 4. I am still learning too.

Day 4: Ceres Search

In this problem, we are given a matrix of X, M, A, and S characters. We are basically doing a word search puzzle for XMAS horizontally, vertically, or diagonally. It can also appear forward or backward. An abbreviated input,

MMMSXX
MSAMXM
AMXSXM
MSAMAS

When I see problems like this I usually reach for maps because I need to search in the vicinity of every element. Essentially, for the first M in that input I want {{1, 1}, m}. Conveniently, that is the exact format I need to use maps:from_list/1.

First, I’ll drop the leex file and then discuss the various sections,

Definitions.

XMAS    = [XMAS]
NEWLINE = \n

Rules.

{XMAS}    : {token, {xmas, TokenLoc, to_atom(TokenChars)}}.
{NEWLINE} : skip_token.

Erlang code.

to_atom("X") ->
	x;
to_atom("M") ->
	m;
to_atom("A") ->
	a;
to_atom("S") ->
	s.

In the definitions section we only have XMAS which is any one of our characters. Note, you would use [A-Z] if you wanted any capital letter but we only care about the specific letters. We also have newlines which you’ve seen before.

The rules show two nice things we have at our disposal. TokenLoc will tell us exactly the row and column where our token was found, in the format {Row, Column}. Additionally, skip_token is simply dropping the token entirely. We don’t really care about the newlines because we get the position from TokenLoc. We can also run to_atom/1 on our token to convert it into an atom. Amazing! The output of our lexer looks like,

[{xmas,{1,1},m},
 {xmas,{1,2},m},
 {xmas,{1,3},m},
 ...]

Much simpler than day 3. Next our yecc file which I’ll discuss below,

Nonterminals chars.

Terminals xmas.

Rootsymbol chars.

chars -> xmas       : [reshape('$1')].
chars -> xmas chars : [reshape('$1')|'$2'].

Erlang code.

reshape({xmas, {Row, Col}, Xmas}) ->
	{{Row, Col}, Xmas}.

We only have one terminal this time, xmas. Additionally, we only have one non-terminal which I called chars. A chars can either be an xmas (base case) or xmas followed by chars (recursive case). Finally, we just reshape/1 to get our desired shape for maps:from_list/1 and we are done!
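To show why that shape is convenient, here is a minimal sketch of searching the resulting map. It is not the linked solution; the module and function names are hypothetical, and it assumes the parser output is a list of {{Row, Col}, Atom} tuples as described above.

```erlang
-module(grid_sketch).
-export([from_cells/1, word_at/4, count_xmas/1]).

%% Build a lookup map from a list of {{Row, Col}, Atom} tuples.
from_cells(Cells) ->
    maps:from_list(Cells).

%% Collect Len cells starting at {Row, Col}, stepping by {DRow, DCol}.
%% Positions outside the grid come back as the atom none.
word_at(Grid, {Row, Col}, {DRow, DCol}, Len) ->
    [maps:get({Row + I * DRow, Col + I * DCol}, Grid, none)
     || I <- lists:seq(0, Len - 1)].

%% Count every start position and direction that spells x, m, a, s.
count_xmas(Grid) ->
    Directions = [{DR, DC} || DR <- [-1, 0, 1], DC <- [-1, 0, 1], {DR, DC} =/= {0, 0}],
    length([found
            || Pos <- maps:keys(Grid),
               Dir <- Directions,
               word_at(Grid, Pos, Dir, 4) =:= [x, m, a, s]]).
```

Because missing positions default to none, there is no bounds checking to write: a word that runs off the edge simply fails to equal [x, m, a, s].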

My solution is here. I plan on doing some refactoring but it is currently working.

Refactor Day 3

If you haven’t read part 1 the following explanation isn’t going to make a lot of sense.

In the previous example, we completely ignored the newlines. In day 3, we also don’t care about the newlines! The “memory” described in the problem is just a contiguous string of characters. We can make some basic changes to our lexer and parser to simplify our application code from dealing with a list of list of inputs.

In the lexer, we can change the newline rule to either,

1. {NEWLINE} : {token, {skip, TokenLine}}.
2. {NEWLINE} : skip_token.

I chose “1.” because I don’t really care about a few extra skip tokens. Ideally, I could change the else definition to ELSE = [^all_unimportant_characters]. The [^abc] syntax means inverse i.e. all characters but a, b, c. The regex in leex is not powerful enough to say “anything except a number, do, don’t, parentheses, or comma”. I could just add a bunch of symbols to a list, e.g. [%^&*$#\s\n] but I don’t like brute forcing the possible symbols in the input. I only point this out because it is a limitation you may want to know about. Documentation is here for reference.

In the parser, we remove the non-terminal computer, the terminal newline, and change the Rootsymbol to memory. That is literally it. You’ll end up with an input to your application like,

[{operands,{2,4}},
 {operands,{5,5}},
 {operands,{11,8}},
 {operands,{8,5}},
 ...]

With this simplification, I was able to remove about 20 lines from my application code that just folded over that outer list. Links to the new files for reference,

Conclusion

I hope leex and yecc are already getting easier to use. More examples coming. If you have any suggestions, questions, or corrections hit me up on bluesky.

Leex and yecc by example: part 1

2024-12-03T00:00:00+00:00

Background

Advent of code is here and every year I use a parser combinator library to handle the inputs. Lately, I have been into Erlang but I couldn’t find a parser combinator library that I liked. However, Erlang does have leex and yecc. Leex is a lexical analyzer generator and yecc is a LALR-1 parser generator. I have never used tools like these before and it seemed like a great opportunity. However, the documentation and examples are a little sparse for getting started. Therefore, I decided to start this series as a medium to present examples of lexers and parsers in Erlang using leex and yecc!

I am not going to dive into the theory of these tools. I really just want to jam some examples into your brain piece so you can play with them. If you want to dive into theory later, be my guest.

Day 3: Mull It Over

This is the first example which isn’t just numbers separated by spaces. It is a great lexing and parsing example. The problem is on Advent of Code. The basic idea is you have an input,

xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))

In the text, you’ll see some mul(A,B). We want to find all of these in the input, multiply the numbers A and B together, then sum all of the multiplications. There are corrupted versions which you aren’t allowed to count, e.g. mul(A, B), mul(A,B, ?(A,B). I think most people will use regex to solve this problem, i.e. mul\([0-9]+,[0-9]+\). How would we do this with leex and yecc?

Lexing

Lexing is a process of breaking a string into tokens. In our case, the tokens would be mul, (, ,, ), or a number. In part 2, we also need do, don't. Finally, we also need to represent the rest of the input i.e. space, newline, skip. The skip token is going to represent any token that isn’t relevant to us. For example, the tokenized version of xmul(2,4)%& is approximately ['skip', 'mul', '(', '2', ',', '4', ')', skip, skip]. We’ll feed the tokenization to the parser in the next step. First, let’s discuss how to describe the lexer using leex.

Definitions.

INT         = [0-9]+
MULTIPLY    = mul
DO          = do\(\)
DONT        = don't\(\)
OPEN_PAREN  = \(
CLOSE_PAREN = \)
COMMA       = ,
SPACE       = \s+
NEWLINE     = \n
ELSE        = .

The first section of the leex file describes our tokens. It uses a relatively simplistic regex language to describe the tokens. More documentation here. INT = [0-9]+ means an INT is described as one or more of any single number between 0 and 9. SPACE = \s+ means a SPACE is described as one or more space characters. ELSE = . means ELSE is described as any character.

Rules.

{INT}                 : {token, {int, TokenLine, list_to_integer(TokenChars)}}.
{MULTIPLY}            : {token, {mul, TokenLine}}.
{DO}                  : {token, {do, TokenLine}}.
{DONT}                : {token, {dont, TokenLine}}.
{OPEN_PAREN}          : {token, {open_paren, TokenLine}}.
{CLOSE_PAREN}         : {token, {close_paren, TokenLine}}.
{COMMA}               : {token, {comma, TokenLine}}.
{SPACE}               : {token, {space, TokenLine}}.
{NEWLINE}             : {token, {newline, TokenLine}}.
{ELSE}                : {token, {skip, TokenLine}}.  

When more than one rule matches, leex takes the longest match; the top to bottom order breaks ties between matches of the same length. All of these rules are generating tokens, but you could also : skip_token or return an error : {error, "..."}. In {token, X}, X must be a tuple and it can be any length but must start with an atom. We’ll use that atom in the parser shortly. You can also use Erlang code here! For example, we really want 2 and not "2" so we use a builtin list_to_integer/1 to convert. TokenLine is a leex variable which is filled in with the line number from the input. There is also TokenLoc which gives you {Line, Column}.

Erlang code.

This is the final section. I don’t need any special Erlang code here, but we’ll use this section in the parser.

If you are using rebar3, and you stick these sections into a single file like src/lexer.xrl it will auto-generate a file src/lexer.erl which contains lexer:lex/1. If you pass that a string, it will try to lex it. For the following example (the test input),

lexer:lex("xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))").

you get (truncated for article length),

[{skip,1},
 {mul,1},
 {open_paren,1},
 {int,1,2},
 {comma,1},
 {int,1,4},
 {close_paren,1},
 {skip,1},
 {skip,1},
 ...]

See how it is just a list of tokens! Let’s parse it!
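A note on lex/1: as far as I know, a leex-generated module exports string/1 (plus token and tokens variants), and string/1 returns {ok, Tokens, EndLine} rather than a bare list. A lex function like the one used here is presumably a thin wrapper. The sketch below is an assumption about what such a wrapper could look like; the module name is hypothetical and it takes the lexer module as an argument so it works with any generated lexer.

```erlang
-module(lex_sketch).
-export([lex/2]).

%% Lexer is the name of a leex-generated module, e.g. lexer.
%% Returns the bare token list and raises on a lexing error.
lex(Lexer, String) ->
    case Lexer:string(String) of
        {ok, Tokens, _EndLine} ->
            Tokens;
        {error, ErrorInfo, _EndLine} ->
            error({lex_failed, ErrorInfo})
    end.
```

Usage would be lex_sketch:lex(lexer, "xmul(2,4)"). The standard erl_scan module happens to return the same shape from string/1, so lex_sketch:lex(erl_scan, "foo.") is a quick way to try the wrapper without generating a lexer first.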

Parser

Parsing will take the list of tokens and interpret them into a data structure convenient for our computation. In our case, we only care about the numbers (int), do and don't. In part 1, we only need the numbers. For example, given the above lexed output we want [{operands, {2, 4}}] to come out of parser. operands is arbitrary, we could have used spaceship. Our program decides what to do with the output of the parser.

Nonterminals instruction memory computer else.

Terminals mul int open_paren close_paren comma space newline skip do dont.

Rootsymbol computer.

The terminals are just the atoms from our lexer, hopefully you recognize them. The non-terminals describe the pieces we want to pull out,

instruction: {operand, {X, Y}}, enable, disable
memory: a list of instructions
computer: a list of memorys
else: everything we want to skip

I chose to translate do -> enable, don't -> disable because it seemed more obvious when writing the solution. The Rootsymbol determines the starting point. In our case, we are parsing the computer as a list of list of instructions. The next section of the yecc file is a bit large in this case but I prefer to show code before the example. I like to put a blank line in between each non-terminal for readability.

instruction -> mul open_paren int comma int close_paren : {ok, operands, {extract_integer('$3'), extract_integer('$5')}}.
instruction -> mul open_paren int comma int             : incomplete.
instruction -> mul open_paren int comma                 : incomplete.
instruction -> mul open_paren int                       : incomplete.
instruction -> mul open_paren                           : incomplete.
instruction -> do                                       : {ok, enable}.
instruction -> dont                                     : {ok, disable}.
instruction -> else                                     : incomplete.

memory -> instruction             : remove_incomplete('$1').
memory -> instruction memory      : remove_incomplete_rec('$1', '$2').

computer -> memory           : ['$1'].
computer -> memory computer  : ['$1'|'$2'].

else -> mul.
else -> int.
else -> open_paren.
else -> close_paren.
else -> comma.
else -> space.
else -> newline.
else -> skip.

This section describes all the rules. A single rule is constructed with a non-terminal instruction ->, its definition mul open_paren int comma int close_paren, and what we do with it : {ok, operands, {extract_integer('$3'), extract_integer('$5')}}.. The definition is literally the atoms from the tokenized output in that exact order. If you have a space in between any of those the rule will fail. Each token is identified using this odd syntax, i.e. '$3' is the first int token. The return is just an Erlang value. We can use built-ins or add code in an Erlang code. section (which I use in this case). After the matching rule, we have to describe all the incomplete sequences. If you don’t do this, you get very weird errors. Those errors are trying to explain to you that you must have these 6 tokens in order and I only found e.g. mul open_paren, but I can’t find an int after it. Try commenting those out and check the error message, it isn’t great. Finally, we have do, dont, and anything else.

The else non-terminals are just any singular token not matching our incomplete mul(1,2) example. Note how I don’t include do and dont there, I always want to parse those.

The memory non-terminal is recursive and so you need a base case. Our base case is just an instruction and the recursive case is an instruction followed by memory. You’ll see these recursive definitions a lot with advent of code problems. The computer is very similar but replace instruction with memory.

Erlang code.

extract_integer({_Token, _Line, Value}) -> Value.

remove_incomplete({ok, operands, {A, B}}) ->
	[{operands, {A, B}}];
remove_incomplete({ok, enable}) ->
	[enable];
remove_incomplete({ok, disable}) ->
	[disable];
remove_incomplete(incomplete) ->
	[].

remove_incomplete_rec({ok, operands, {A, B}}, Rest) ->
	[{operands, {A, B}} | Rest];
remove_incomplete_rec({ok, enable}, Rest) ->
	[enable | Rest];
remove_incomplete_rec({ok, disable}, Rest) ->
	[disable | Rest];
remove_incomplete_rec(incomplete, X) ->
	X.

Finally, our Erlang code to process. We are just removing the incompletes from our final result and reshaping the operands, enable, and disable to avoid dealing with the oks in our application code.

If you are using rebar3 you will get auto-generation here too. A file src/parser.yrl will generate src/parser.erl which has parse/1.

{ok, Lex} = lexer:lex("xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))"),
parser:parse(Lex).

will output

[[{operands,{2,4}},
  {operands,{5,5}},
  {operands,{11,8}},
  {operands,{8,5}}]]

Amazing! Solving the problem is trivial from here!
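For completeness, a minimal sketch of that last step, assuming the parser output is the list of lists shown above with enable and disable atoms mixed in for part 2. The module and function names are hypothetical.

```erlang
-module(day3_sketch).
-export([part1/1, part2/1]).

%% Part 1: multiply every operand pair and sum the products.
%% The generator pattern silently skips enable and disable.
part1(Computer) ->
    lists:sum([A * B || {operands, {A, B}} <- lists:append(Computer)]).

%% Part 2: same sum, but only while multiplication is enabled.
part2(Computer) ->
    {_Enabled, Total} =
        lists:foldl(fun step/2, {true, 0}, lists:append(Computer)),
    Total.

step(enable, {_Enabled, Total}) -> {true, Total};
step(disable, {_Enabled, Total}) -> {false, Total};
step({operands, {A, B}}, {true, Total}) -> {true, Total + A * B};
step({operands, _}, {false, Total}) -> {false, Total}.
```

For the parser output above, part1 gives 2*4 + 5*5 + 11*8 + 8*5 = 161.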

Conclusion

This was just our first example using leex and yecc. We are going to go over a lot more examples in the series. For example, 2023 day 1 makes use of a neat leex feature. If you have any suggestions, questions, or corrections hit me up on bluesky.

Setting up Erlang CI with PostgreSQL

2024-08-07T00:00:00+00:00

Overview

In this blog post, I’ll quickly discuss how I got Erlang and PostgreSQL set up in GitHub CI. Additionally, I discuss how I set up single use databases for my test infrastructure.

Assumptions

You at least have some familiarity with GitHub Actions and Erlang.

Erlang in GitHub Actions

This is actually very easy thanks to the Erlang Ecosystem Foundation’s setup-beam action. Here is the top of my ci.yaml file,

on: push

jobs:
  test:
    runs-on: ubuntu-22.04
    name: Erlang/OTP ${{matrix.otp}} / rebar3 ${{matrix.rebar3}}
    strategy:
      matrix:
        otp: ['26.2.2']
        rebar3: ['3.22.1']
    steps:
      - uses: actions/checkout@v4
      - uses: erlef/setup-beam@v1
        with:
          otp-version: ${{matrix.otp}}
          rebar3-version: ${{matrix.rebar3}}

I am building and testing an application here, so having an intensive matrix isn’t particularly helpful. Next, the actual checks and tests I want to run,

      # The Erlang files should be formatted
      - run: rebar3 fmt --check
      # Run eunit
      - run: rebar3 eunit
      # Run common test
      - run: rebar3 ct

This is essentially all you need to run both eunit and common test suites.

Single use databases

With common test, I can have build-up and tear-down callbacks for every suite and every test. Using these callbacks, I can create, migrate, and destroy a single use database for each test that needs them. Let’s look at an example Erlang common test suite. Set up our Erlang module, include eunit macros, and export all the functions in the module so common test can run them,

-module(temporary_database_SUITE).
-include_lib("eunit/include/eunit.hrl").
-compile(export_all).

The all/0 callback tells common test which tests to run. We’ll look at the actual test last.

all() ->
    [the_current_database_is_the_temporary_one_and_contains_public_tables].

These callbacks initialize state for this test suite. In this case, I start pgo and ensure all of the environment variables are set up for pgo. This Config variable can be used to store state for your tests or explain to the end_per_suite/1 function what to destroy. This isn’t really a tutorial on common test.

init_per_suite(Config) ->
    application:ensure_all_started(pgo),
    environment:setup_application_variables(),
    Config.

end_per_suite(_Config) ->
    application:stop(pgo),
    ok.

This is the meaty goodness, here we create a migrated database from a template (I’ll explain in the next section). Then initialize a connection to that database and store the details in Config. Again, this is how you pass information to your end step.

init_per_testcase(_TestCase, Config) ->
    #{temporary_database_atom := DatabaseAtom, temporary_pool_id := PoolId} =
        util_tests:create_migrated_database_pool(),
    [{temporary_database_atom, DatabaseAtom}, {temporary_pool_id, PoolId} | Config].

Finally, we pull out the Config information and destroy the temporary database and the connection.

end_per_testcase(_TestCase, Config) ->
    DatabaseAtom = proplists:get_value(temporary_database_atom, Config),
    PoolId = proplists:get_value(temporary_pool_id, Config),
    util_tests:drop_database_and_pool(DatabaseAtom, PoolId).

Voila, a single use database per test, set up and torn down. In this suite, I just test that the migrated database contains seven or more tables. That was just how many tables I had when I set this up. Seven isn’t at all magical.

the_current_database_is_the_temporary_one_and_contains_public_tables(Config) ->
    #{command := select, num_rows := 1, rows := [{NumberPublicTables}]} = pgo:query(
        <<"select count(*) from information_schema.tables where table_schema = 'public'">>
    ),
    ?assert(NumberPublicTables >= 7),

    DatabaseAtom = proplists:get_value(temporary_database_atom, Config),
    #{command := select, num_rows := 1, rows := [{CurrentDatabase}]} = pgo:query(
        <<"select current_database()">>
    ),
    ?assert(CurrentDatabase =:= atom_to_binary(DatabaseAtom)).

Getting PostgreSQL in GitHub Actions CI

I’ll go over more specific Erlang bits in the next section. First, let’s get PostgreSQL in GitHub actions.

      - uses: ikalnytskyi/action-setup-postgres@v6
        id: postgres
        with:
          username: my_app
          password: my_app
          database: my_app_template
          port: 5432

Conveniently, there is already a GitHub action for this. The interesting part is that I am defining the database as my_app_template which is just a separate copy of my migrated database. This is because, locally, my_app is used to test and I don’t want to copy any data from it every time I create a temporary database. Migrating the template database is also easy,

      - uses: cachix/install-nix-action@v27
        with:
          nix_path: nixpkgs=channel:nixos-unstable
          github_access_token: ${{ secrets.GITHUB_TOKEN }}
      - run: nix-shell -p dbmate --run "unset PGSERVICEFILE && dbmate up"
        env:
          DATABASE_URL: ${{ steps.postgres.outputs.connection-uri }}?sslmode=disable

Here, I use nix to get dbmate. Both of these tools are nice to use. One of them is a lot harder than the other.

Internal Erlang Functions

I showed you the high level common test earlier. Some of the internals are in this section,

create_migrated_database_pool() ->
    % Create a temporary database and initialize a connection to it
    DatabaseAtom = util_pgo:create_temporary_database(),
    {ok, PoolId} = util_pgo:start_default_pool_with_name(DatabaseAtom),
    #{temporary_database_atom => DatabaseAtom, temporary_pool_id => PoolId}.

The create_temporary_database/0 function that the per-test database and pool initializer calls,

create_temporary_database() ->
    DatabaseName = get_random_database_name(),
    DatabaseAtom = list_to_atom(DatabaseName),
    Cmd = "createdb " ++ DatabaseName ++ " -T my_app_template",
    Output = ?cmd(Cmd),
    logger:notice(#{
        action => create_temporary_database, cmd => Cmd, output => Output
    }),
    DatabaseAtom.

I literally just shell out to createdb. The command macro is pretty slick. Also, check out my YouTube video on the logger module. Finally, we can start our connection via pgo,

start_default_pool_with_name(Atom) ->
    pgo:start_pool(
        default,
        #{
            pool_size => 1,
            host => environment:get_application_variable(pgo_host),
            user => environment:get_application_variable(pgo_user),
            database => atom_to_binary(Atom),
            password => environment:get_application_variable(pgo_password)
        }
    ).

At our end step, we just shell out to dropdb and kill the pool with,

    ok = supervisor:terminate_child(pgo_sup, PoolId),
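The helper get_random_database_name/0 and the dropdb call are not shown above. The sketch below is one hypothetical way to fill them in, not the project’s actual code; the my_app_test_ prefix and the module name are assumptions. It only builds strings, so the shelling out via ?cmd stays where it is in the functions above.

```erlang
-module(util_pgo_sketch).
-export([get_random_database_name/0, createdb_command/1, dropdb_command/1]).

%% Combine wall-clock time with a per-node unique integer so names
%% do not collide within a run or with leftovers from a crashed run.
get_random_database_name() ->
    Time = erlang:system_time(millisecond),
    Unique = erlang:unique_integer([positive]),
    lists:flatten(io_lib:format("my_app_test_~b_~b", [Time, Unique])).

%% Same command string as in create_temporary_database/0 above.
createdb_command(DatabaseName) ->
    "createdb " ++ DatabaseName ++ " -T my_app_template".

%% Counterpart for the end step; takes the atom stored in Config.
dropdb_command(DatabaseAtom) ->
    "dropdb " ++ atom_to_list(DatabaseAtom).
```

The generated name contains only lowercase letters, digits, and underscores, so it can be passed to createdb and dropdb without quoting.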

Summary

This was a super interesting thing to set up myself. It isn’t a ton of work and there are GitHub actions to help you out. I am really enjoying my time with Erlang but testing even simple database actions is really important. I found that almost all of my query code was wrong at first. The project I set this up for is currently private but feel free to ask me questions on BlueSky.

Getting started with Erlang’s `maybe_expr`

2024-03-04T00:00:00+00:00

Assumptions

You are using rebar3 to build your project. You are using OTP 25.

Introduction

I am an Erlang beginner and I am currently building a Slack bot to learn more. Here is some code I wrote recently,

{ok, ChannelId} = map_utils:recursive_get([<<"channel">>, <<"id">>], Payload),
{ok, UserId} = map_utils:recursive_get([<<"user">>, <<"id">>], Payload),
{ok, TeamId} = map_utils:recursive_get([<<"team">>, <<"id">>], Payload),
{ok, ResponseUrl} = map_utils:recursive_get([<<"response_url">>], Payload),

The Payload here is a decoded JSON body from Slack. The map_utils:recursive_get/2 function takes the path to a JSON entry and extracts it if possible. Given this JSON,

{
  "channel": {
    "id": "value"
  }
}

If we ran this JSON through my HTTP handlers, this code would succeed,

{ok, <<"value">>} = map_utils:recursive_get([<<"channel">>, <<"id">>], Payload),
{error, not_found} = map_utils:recursive_get([<<"hello">>, <<"world">>], Payload),
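The Erlang source of map_utils:recursive_get/2 isn't shown in this post. As a rough illustration of the behaviour described above, here is a minimal Python sketch; the function name, the tuple-based return values, and the dict payload are assumptions made for the example, not the real implementation.

```python
def recursive_get(path, payload):
    """Follow `path` through nested dicts.

    Returns ("ok", value) on success and ("error", "not_found") otherwise,
    mirroring the {ok, Value} / {error, not_found} tuples used in the post.
    """
    current = payload
    for key in path:
        # Stop as soon as a key is missing or we hit a non-dict value.
        if not isinstance(current, dict) or key not in current:
            return ("error", "not_found")
        current = current[key]
    return ("ok", current)


payload = {"channel": {"id": "value"}}
print(recursive_get(["channel", "id"], payload))   # ('ok', 'value')
print(recursive_get(["hello", "world"], payload))  # ('error', 'not_found')
```

Returning a tagged tuple instead of raising keeps the caller in charge of what a missing key means, which is the property the rest of this post relies on.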

When the ChannelId, UserId, etc. are all extracted from the Payload properly everything is great. However, if any of the pattern matches fails, the process crashes with a badmatch error and everything seems to get dropped into the void. This is obviously problematic when you are building an application. Thankfully, I discovered maybe_expr!

With maybe_expr, the code will look more like this,

-record(interact_payload, {channel_id, user_id, team_id, response_url}).

% ...

maybe
  {ok, ChannelId} ?= map_utils:recursive_get([<<"channel">>, <<"id">>], Payload),
  {ok, UserId} ?= map_utils:recursive_get([<<"user">>, <<"id">>], Payload),
  {ok, TeamId} ?= map_utils:recursive_get([<<"team">>, <<"id">>], Payload),
  {ok, ResponseUrl} ?= map_utils:recursive_get([<<"response_url">>], Payload),
  {ok, #interact_payload{channel_id = ChannelId, user_id = UserId, team_id = TeamId, response_url = ResponseUrl}}
else
  {error, not_found} ->
    logger:error(...),
    {error, not_found};
  {error, not_found, Reason} ->
    logger:error(...),
    {error, not_found}
end,

Instead of dropping anything into the void, the else clause can be used to pattern match out any failure cases. Here, we match {error, not_found} and {error, not_found, Reason} and log that we had an unexpected error.

This feature is currently “experimental” in OTP 25. However, it is becoming standardized over the next few OTP releases. See this highlight for more information.

Setting up rebar3

Credit to this forum entry for the details. With OTP 25, first create the file config/vm.args if it doesn’t exist and add,

-enable-feature maybe_expr

Then set this in your environment,

export ERL_FLAGS="-args_file config/vm.args"

Or, prepend ERL_FLAGS="-args_file config/vm.args" to your rebar3 commands. Reminder, you can skip this step in OTP 26.

Enabling the feature

In your Erlang files you only need to add (after your module definition),

-module(...).
-feature(maybe_expr, enable).

Done. Supposedly in OTP 27 you won’t need to do either of these steps!

More information

You can read the Erlang Enhancement Process proposal here.

If you’ve ever written Haskell before, this is essentially ExceptT e IO a where the e can be literally anything. It is your job in Erlang to catch all the cases. You can add type checking to your Erlang code with something like eqWAlizer and type annotations via dialyzer. I was first exposed to type annotations in Learn You Some Erlang’s dialyzer introduction.

If there is anything you are curious about in Erlang, please ask me about it on BlueSky or the Erlanger’s Slack. I would like to write more blog posts and learn more about Erlang.

First day with typst, a markup based typesetting system

2023-03-31T00:00:00+00:00

I came across typst recently, which looks like an interesting replacement for LaTeX. I don’t really do much collaborative editing anymore, but I really enjoy plain text presentations. I tried pollen as well, but I didn’t like the unicode symbols. What was my first presentation like using typst?

typst is available on the unstable nix channel and you can likely get it with nix-shell -p typst or follow the instructions on their github.

One annoying thing about LaTeX is you have to compile a bunch of times for your PDF to be correct. With typst, you can typst -w document.typ and it will watch the document for changes and recompile automatically. This is a really nice productivity boost.

Setting up the presentation,

#set page(                                                                 
  paper: "presentation-16-9",    
  margin: (    
    rest: 25pt    
  )    
)   

Here, we are setting parameters for the page. # denotes a “code expression”. I believe I could also build my own template to define margins, spacing, font, etc. in a separate file.

Next, the font,

#set text(    
  font: "JetBrains Mono",    
  size: 22pt    
)

This syntax is a bit odd, but it is syntax and LaTeX isn’t necessarily nicer in any way. You can lay out a slide like so,

= The slides title

// The slides content

#pagebreak()

The = denotes a header, you can generate smaller headers with additional =, i.e. ===. The // denotes a comment, most of what you need is from markdown, see the syntax guide. Interestingly, you can make this into a named function,

#let slide(title, content) = [
  = #title
  #content
  #pagebreak()
]

The function syntax is a bit weird to me, but I also don’t fully understand the type system yet. As an example, here is another function,

  #let fig(location, width, gap, caption) = [
    #figure(
      image(location, width: width),
      numbering: none,
      gap: gap,
      caption: caption
    )
  ]

Note the difference in how I refer to the parameters in the body. I think the former #x are inserting “content blocks” and the latter are plain values and don’t require the #. Not exactly sure yet.

From here, you could generate a slide with a figure pretty easily,

#slide(
  [The slides title],
  [
    - Some unordered list item
    - Some other unordered list item
    #fig(
      "figures/image.png", 
      350pt, // the width of the image, see function definition
      -2pt, // the captions are a bit far away from the images by default
      [ The caption for the figure. ]
    )
  ]
)

From here, you can build a basic presentation! Pretty cool.

I also wrote two other functions for links:

#let l(location) = link(location)[#text(blue)[#location]]
#let ld(location, description) = link(location)[#text(blue)[#description]]

I did try a two-column #grid but the alignment was a bit wonky. I would like to spend a bit more time handling columnar layouts before attempting to show some code. Let me know what you think on BlueSky.

Discarding monadic results in Haskell

2021-09-01T00:00:00+00:00

Discarding Monadic Results in Haskell

I recently ran this poll on Twitter. The original poll and results,

I wasn’t expecting much from this poll but the comments turned out to be fantastic! Let’s summarize the problem, options, and discuss them a bit. The focus of the discussion will be whether I would use each option in my personal projects. It isn’t a suggestion.

Motivation

module Main where    
    
f :: IO Int    
f = pure 1    
    
main :: IO ()    
main = do    
  -- business...    
  f    
  -- more business...    
  pure ()

The following warning is generated when you compile this with -Wall,

src/Main.hs:9:3: warning: [-Wunused-do-bind]
    A do-notation statement discarded a result of type ‘Int’
    Suppress this warning by saying ‘_ <- f’
  |
9 |   f
  |   ^

Typically, I also use -Werror and therefore the warning becomes an error. What are our options in this case?

Options

We need to discard the result of f. Here are all the suggested solutions (attributed to the first suggester),

Disable the warning -Wno-unused-do-bind @TechnoEmpress
void f (in the poll)
() <$ f @alex_pir
_ <- f (in the poll)
_ ← f @toastal
_descriptiveName <- f @MxLambda
(_ :: ResultType) <- f , in this case Int @vincenthz

Breakdown

Let’s look at a few of the options a bit.

void

void :: Functor f => f a -> f ()

I have always used this in my personal projects. It gets the job done, but isn’t particularly satisfying (hence the poll). The win here is that it only requires a Functor constraint and can be used beyond do notation. I wonder if void would be more compelling if it was named differently? Maybe discard or ignore?

The const equivalent

(<$) :: Functor f => a -> f b -> f a
(<$) = fmap . const

This is const lifted into a functorial context. It is more flexible than void and useful for the same reasons. It is provided, for free, by the Functor typeclass and is one I often forget about. That being said, I don’t feel particularly compelled to start using () <$ ... over void.

Underscores

The options,

_ <- f
(_ :: Int) <- f
_descriptiveName <- f

The first line is saying “match something, but I don’t care what”. This is equivalent to void but preferred by more respondents. There is one exception to this preference (expressed in the responses as well),

do
  -- business...
  _ <- finalMonadicComputation
  pure ()

I personally think this should be void in almost all cases. The latter two lines are much more interesting to consider and make context even more important. Specifying the underscore’s type, i.e. _ :: Int, does add some additional type safety if the monadic computation changes. However, in most cases, changing the monadic computation would at least point me to the underscore (thanks GHC) so I can reconsider my choices. Adding a descriptive name is never a bad thing, but sometimes it is difficult to come up with a good name or the function names are clear enough. I think both of these are interesting and I will probably use some variation of them in the future.

Bonus: with ScopedTypeVariables you can remove the parentheses.

Unicode

Honestly, I don’t even know how to enter a unicode arrow on my keyboard. Cool suggestion nonetheless.

Disable the warning

Here I am appeasing the compiler for -Wall -Werror and Hécate is playing an entirely different game. I think this is interesting and I might try it out in my personal projects. However, you do lose a signal that the monadic computation returns something. In Haskell, we often use descriptiveFunctionName_ to indicate that a function returns () and if you follow that convention you could use that as a signal. Do I really need this signal? I am not so sure anymore.

Wrapping up

This poll generated a surprising response. The results were fun and interesting, and will hopefully make me think more carefully about context. I hope you enjoyed it as much as I did.

Find typos or have suggestions? My DMs are always open @chiroptical.dev.

Like the content? Follow me on Twitch and subscribe on YouTube.

Simple Scaleable Preprocessing with PyTorch and Ray - 0

2020-05-20T00:00:00+00:00

Simple Scaleable Preprocessing With Pytorch and Ray

Background

I have been using PyTorch for a few months now and I really like the Dataset and DataLoader workflow (see torch.utils.data). I realized I might be able to use this workflow for every step in my Machine Learning pipeline, i.e. preprocessing, training, and inference. I further realized I could use Ray to coordinate multi-node parallelism with little changes to my original code.

Escape Hatch: if you would rather explore the code with no explanation there is a Jupyter Notebook on Github

I believe most folks are using Dataset/DataLoader to handle training and inference pipelines but let’s consider a more general preprocessing workflow. A data scientist needs to write a function which processes their entire data set, the function has the approximate signature:

InputFile -> (OutputFiles, Metadata)

Here, InputFile is an input file in your dataset. The function may produce one, or more, OutputFiles and some Metadata related to the operation performed. As a practical example, I often have to split large audio files into multiple audio files of a fixed size and retain some metadata (source audio, destination audio, labels).

In this blog post, I’ll discuss how to get PyTorch’s Dataset and DataLoader workflow running in parallel for this general use case. I will also go over some of the mistakes I made while first exploring this workflow. I will assume the reader knows basic Python.

Why should you care?

I believe this workflow is really easy to teach to beginners. A user only needs to know how to write a function to process an input file and the relationship between batches and parallelism. With the exception of the collate_fn (explained later) the code is essentially boilerplate. If you can implement a Dataset the parallelism comes almost for free which is a massive win for beginners.

Up and Running

I am going to build an example data set which mimics the audio splitting example I introduced. I will have a dataset.csv file which contains the following:

input
a.txt
b.txt
c.txt
d.txt

Each TXT file will contain a word (simple, scaleable, preprocessing, and pytorch respectively). The files will be located in an inputs/ directory. The goal is to split each word into parts of a certain number of characters and overlap, e.g.

a = "hello"
b = split_word(a, num_chars=2, overlap=1)
assert b == ["he", "el", "ll", "lo"]
c = split_word(a, num_chars=3, overlap=2)
assert c == ["hel", "ell", "llo"]
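The post uses split_word without showing its body. Here is a minimal sketch that is consistent with the examples above. It assumes the window advances by num_chars - overlap characters and that a trailing chunk shorter than num_chars is dropped; the implementation in the notebook may differ.

```python
def split_word(word, num_chars=2, overlap=1):
    """Split `word` into chunks of `num_chars` characters.

    Consecutive chunks share `overlap` characters. Chunks shorter than
    `num_chars` at the end of the word are dropped (an assumption).
    """
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    if not 0 <= overlap < num_chars:
        raise ValueError("overlap must satisfy 0 <= overlap < num_chars")

    step = num_chars - overlap  # how far the window moves each time
    last_start = len(word) - num_chars
    return [word[i:i + num_chars] for i in range(0, last_start + 1, step)]


print(split_word("hello", num_chars=2, overlap=1))  # ['he', 'el', 'll', 'lo']
print(split_word("hello", num_chars=3, overlap=2))  # ['hel', 'ell', 'llo']
```

Validating overlap up front avoids a zero or negative step, which would otherwise make range raise or silently return nothing.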

We can build a Dataset which performs this action on all of the input files. First, let’s generate a list of input files. I’ll use the built-in CSV library:

import csv

with open("dataset.csv", "r") as csv_file:
    reader = csv.DictReader(csv_file)
    input_files = [f"inputs/{row['input']}" for row in reader]

assert input_files == ["inputs/a.txt", "inputs/b.txt", "inputs/c.txt", "inputs/d.txt"]

To use Dataset, you’ll need PyTorch (e.g. pip3 install torch==1.5.0)

from torch.utils.data import Dataset

class WordSplitter(Dataset):
    def __init__(self, inputs, num_chars=2, overlap=1):
        self.inputs = inputs
        self.num_chars = num_chars
        self.overlap = overlap
        
    def __len__(self):
        return len(self.inputs)
    
    def __getitem__(self, idx):
        filename = self.inputs[idx]
        
        with open(filename, "r") as f:
            word = f.read().strip()
        
        return split_word(
            word,
            num_chars=self.num_chars,
            overlap=self.overlap
        )

For the Dataset to work, we need to define 3 “dunder” methods: __init__, __len__, and __getitem__. The __init__ function stores the input files and parameters needed to run split_word. The __len__ function returns the length of input_files. The __getitem__ function is where the computation happens. First, we extract the file at the given index. Second, we read the word from the file and remove any whitespace surrounding the word. Finally, we feed our word to split_word with the appropriate parameters. Let’s see if it works:

word_splitter = WordSplitter(input_files, num_chars=3, overlap=2)
assert word_splitter[0] == ['sim', 'imp', 'mpl', 'ple']

Awesome. It is really important to make sure your Dataset works before moving on to the next steps. Remember our signature from before:

InputFile -> (OutputFiles, Metadata)

Think of the __getitem__ method in WordSplitter as inputting an InputFile, not writing any OutputFiles, and producing Metadata related to the operation. In the realistic audio splitting example the OutputFiles could be written to an outputs/ directory. We can now wrap this into a DataLoader and run our analysis in parallel!

from torch.utils.data import DataLoader

loader = DataLoader(
    word_splitter,
    batch_size=1,
    shuffle=False,
    num_workers=len(word_splitter),
)

The DataLoader bundles our work into batches to be operated on. The DataLoader takes in the word_splitter Dataset object we initialized previously. When we set batch_size=1, the loader will split our work into 4 total batches where each batch contains 1 file (batch_size=2 means 2 batches each with 2 files). With 4 batches it is possible to split the work over 4 cores on our machine by setting num_workers=len(word_splitter). Important: with batch_size=4 there is only 1 batch to process and therefore no parallelism can be extracted (i.e. setting num_workers will have no effect). The shuffle=False argument asks the loader to process inputs in order (the default). The loader object behaves like other iterators, i.e. we can print the results in a for loop:

for metadata in loader:
    print(metadata)

Let’s look at the output:

[('sim',), ('imp',), ('mpl',), ('ple',)]
[('sca',), ('cal',), ('ale',), ('lea',), ('eab',), ('abl',), ('ble',)]
[('pre',), ('rep',), ('epr',), ('pro',), ('roc',), ('oce',), ('ces',), ('ess',), ('ssi',), ('sin',), ('ing',)]
[('pyt',), ('yto',), ('tor',), ('orc',), ('rch',)]

Hmm… Something looks weird, each string is embedded in a tuple. The issue is PyTorch uses a collation function which is designed for their Tensor type. It doesn’t work great in this case. Luckily, we can define our own to fix this! In the following code I will use ... to represent code shown above. First, we need to figure out what the input to collate_fn even looks like. Add the collate_fn to WordSplitter:

class WordSplitter(Dataset):
    ...

    @classmethod
    def collate_fn(*batch):
        print(f"BATCH: {batch}")
        return []

The @classmethod decorator allows us to call WordSplitter.collate_fn (you’ll see it in a moment). I use *batch to tuple up all of the inputs if the arity is greater than one. The collate_fn isn’t complete but this allows us to inspect our inputs to the function. Second, we add our new function to the DataLoader:

loader = DataLoader(
    ...,
    collate_fn=WordSplitter.collate_fn,
)

Note, you don’t want to run this test over your entire data set. I would suggest doing this on a small subset of inputs. If we loop over the loader again,

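BATCH: (<class '__main__.WordSplitter'>, [['sim', 'imp', 'mpl', 'ple']])
BATCH: (<class '__main__.WordSplitter'>, [['sca', 'cal', 'ale', 'lea', 'eab', 'abl', 'ble']])
BATCH: (<class '__main__.WordSplitter'>, [['pre', 'rep', 'epr', 'pro', 'roc', 'oce', 'ces', 'ess', 'ssi', 'sin', 'ing']])
BATCH: (<class '__main__.WordSplitter'>, [['pyt', 'yto', 'tor', 'orc', 'rch']])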
[]
[]
[]
[]

Let’s modify batch_size=2 in the loader and see what happens when there is actual batching,

BATCH: (<class '__main__.WordSplitter'>, [['sim', 'imp', 'mpl', 'ple'], ['sca', 'cal', 'ale', 'lea', 'eab', 'abl', 'ble']])
BATCH: (<class '__main__.WordSplitter'>, [['pre', 'rep', 'epr', 'pro', 'roc', 'oce', 'ces', 'ess', 'ssi', 'sin', 'ing'], ['pyt', 'yto', 'tor', 'orc', 'rch']])
[]
[]

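Okay, so batch looks like (WordSplitter, [metadata0, metadata1, ...]). The first element is the WordSplitter class itself: @classmethod passes it as the first positional argument and *batch collects it along with the list of samples PyTorch hands to collate_fn. All we need to do is extract the list of metadata from the tuple and return it, i.e.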

@classmethod
def collate_fn(*batch):
    return batch[1]
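Why index 1? The DataLoader calls collate_fn with a single argument, the list of samples in the batch. With @classmethod, Python also passes the class as the first positional argument, so *batch ends up holding both. The torch-free sketch below shows this; Demo, collate_fn, and collate_static are hypothetical names, and the @staticmethod variant is an alternative shown for comparison, not what the post uses.

```python
class Demo:
    @classmethod
    def collate_fn(*batch):
        # batch == (Demo, samples): the class is bound as the first argument.
        return batch

    @staticmethod
    def collate_static(batch):
        # No implicit first argument, so `batch` is just the list of samples.
        return batch


samples = [["he", "el"], ["lo"]]

as_classmethod = Demo.collate_fn(samples)
print(as_classmethod[0] is Demo)       # True
print(as_classmethod[1])               # [['he', 'el'], ['lo']]
print(Demo.collate_static(samples))    # [['he', 'el'], ['lo']]
```

Either form returns the untouched list of samples to the loop; the static version just avoids the extra tuple layer.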

In the for loop we need to additionally loop over the returned list of metadata, i.e.

for metadatas in loader:
    for metadata in metadatas:
        print(metadata)

Result with batch_size=1,

['sim', 'imp', 'mpl', 'ple']
['sca', 'cal', 'ale', 'lea', 'eab', 'abl', 'ble']
['pre', 'rep', 'epr', 'pro', 'roc', 'oce', 'ces', 'ess', 'ssi', 'sin', 'ing']
['pyt', 'yto', 'tor', 'orc', 'rch']

With batch_size=2,

['sim', 'imp', 'mpl', 'ple']
['sca', 'cal', 'ale', 'lea', 'eab', 'abl', 'ble']
['pre', 'rep', 'epr', 'pro', 'roc', 'oce', 'ces', 'ess', 'ssi', 'sin', 'ing']
['pyt', 'yto', 'tor', 'orc', 'rch']

With batch_size=4,

['sim', 'imp', 'mpl', 'ple']
['sca', 'cal', 'ale', 'lea', 'eab', 'abl', 'ble']
['pre', 'rep', 'epr', 'pro', 'roc', 'oce', 'ces', 'ess', 'ssi', 'sin', 'ing']
['pyt', 'yto', 'tor', 'orc', 'rch']

Heck yes, this is exactly what we want! You could easily write this metadata somewhere for further use. The key thing to remember here is that the parallelism happens over batches, in this case the maximum possible cores used with varying batch sizes:

`batch_size`	cores
1	4
2	2
4	1
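The table follows from a ceiling division. Here is a small sketch, assuming each worker processes one whole batch at a time and that the last partial batch is kept (the DataLoader default); num_batches and max_busy_workers are hypothetical helper names.

```python
import math


def num_batches(num_items, batch_size):
    """Number of batches when the final, possibly smaller, batch is kept."""
    return math.ceil(num_items / batch_size)


def max_busy_workers(num_items, batch_size, num_workers):
    """Upper bound on workers that can be busy at once: one batch per worker."""
    return min(num_workers, num_batches(num_items, batch_size))


for batch_size in (1, 2, 4):
    print(batch_size, max_busy_workers(4, batch_size, num_workers=4))
# 1 4
# 2 2
# 4 1
```

For a real data set this is a quick way to check that batch_size is small enough to keep every requested worker busy.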

The full code is available in a Jupyter Notebook on Github. This concludes part 0. Next time we’ll look into Ray and let it coordinate the Dataset/DataLoader workflow over multiple nodes!

If you have any suggestions or improvements please message me on BlueSky @chiroptical.dev or submit an issue on Github.

Edits

05/20/2020: Use snake-case over camel-case for wordSplitter

Path to Beginnery in Functional Programming with Haskell - 1

2018-10-18T00:00:00+00:00

Path To Beginnery in Functional Programming with Haskell

See the first post in this series for an introduction. Quick recap: I am trying to track my path completing Haskell programming projects from books I am reading. Feel free to message me on BlueSky @chiroptical.dev with any corrections or suggestions on new topics.

Project 1

This is a short problem, but I was getting stuck on a foldr implementation. I wanted to write down the problem, reductions, correct solution, and some alternate implementations to increase my understanding.

Definition of Problem

Implement myMaximumBy using a fold. myMaximumBy takes a comparison function, of type (a -> a -> Ordering), and returns the greatest element of the list based on the last value in the list which returned GT. Some examples:

Prelude> myMaximumBy (\_ _ -> GT) [1..10]
1
Prelude> myMaximumBy (\_ _ -> LT) [1..10]
10
Prelude> myMaximumBy compare [1..10]
10

Solving The Problem

The base case, or accumulator, is simply the first value in the list. My initial thought is that given an empty list our function should return an error. Side note: after additional thought I decided to implement a version which returns Maybe a, but I will show that in the Practical Considerations section. If given a list with one element, simply return that element. Next we need to define our folding function (for a foldr),

folder :: (a -> a -> Ordering) -> a -> a -> a
folder f x acc = if f x acc == GT then x else acc

and the full foldr with pattern matches for empty and single-item lists,

myMaximumBy :: (a -> a -> Ordering) -> [a] -> a
myMaximumBy _ [] = error "Cannot myMaximumBy on empty list!"
myMaximumBy _ [x] = x
myMaximumBy f xs = foldr (folder f) (head xs) $ tail xs

For a novice, this might look like working code as it will type check! However, it doesn’t work correctly. A foldr breaks down like this for [a] with 3 items:

foldr g acc [a, a', a'']
-- ==
-- g a (g a' (g a'' acc))

Let’s take the example where,

-- Omitting types
g = folder f
f = \_ _ -> GT

-- myMaximumBy f [a, a', a''] = foldr g a [a', a'']
-- Reduction (g x acc = x, always!)
-- 1. g a' (g a'' a)
-- 2. g a' a''
-- 3. a'

Which is not what we are looking for! The examples say we should get a, the first element: myMaximumBy (\_ _ -> GT) [1..10] should be 1, but this version returns 2. To do that, we need foldl,

folder :: (a -> a -> Ordering) -> a -> a -> a
folder f acc x = if f acc x == GT then acc else x

myMaximumBy :: (a -> a -> Ordering) -> [a] -> a
myMaximumBy _ [] = error "Cannot myMaximumBy on empty list!"
myMaximumBy _ [x] = x
myMaximumBy f xs = foldl (folder f) (head xs) $ tail xs

-- myMaximumBy f [a, a', a''] = foldl g a [a', a'']
-- Reduction (g acc x = acc, always!)
-- 1. g (g a a') a''
-- 2. g a a''
-- 3. a
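The difference between the two folds can also be checked mechanically. The sketch below ports both versions to Python as a sanity check against the examples in "Definition of Problem"; foldr, maximum_by_foldr, maximum_by_foldl, and the string constants standing in for Ordering are all names made up for this illustration.

```python
from functools import reduce

GT, LT, EQ = "GT", "LT", "EQ"  # stand-ins for Haskell's Ordering


def foldr(g, acc, xs):
    """Right fold: g(x0, g(x1, ... g(xn, acc)))."""
    for x in reversed(xs):
        acc = g(x, acc)
    return acc


def maximum_by_foldr(f, xs):
    # Port of the first attempt: folder f x acc.
    return foldr(lambda x, acc: x if f(x, acc) == GT else acc, xs[0], xs[1:])


def maximum_by_foldl(f, xs):
    # Port of the corrected version: folder f acc x (reduce is a left fold).
    return reduce(lambda acc, x: acc if f(acc, x) == GT else x, xs[1:], xs[0])


def compare(a, b):
    return GT if a > b else LT if a < b else EQ


nums = list(range(1, 11))
print(maximum_by_foldl(lambda _a, _b: GT, nums))  # 1, as the examples require
print(maximum_by_foldl(lambda _a, _b: LT, nums))  # 10
print(maximum_by_foldl(compare, nums))            # 10
print(maximum_by_foldr(lambda _a, _b: GT, nums))  # 2, not the expected 1
```

Note that compare alone does not expose the bug: both folds return 10 for it, which is why the constant comparison functions are the useful test cases.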

Practical Considerations

Implement `... -> Maybe a` Version

Let’s replace the version of myMaximumBy which errors out with one that returns Nothing when given an empty list and Just the result otherwise.

folder :: (a -> a -> Ordering) -> Maybe a -> a -> Maybe a
folder f (Just acc) x = if f acc x == GT then (Just acc) else (Just x)
folder _ _ _ = Nothing

myMaximumBy :: (a -> a -> Ordering) -> [a] -> Maybe a
myMaximumBy _ [] = Nothing
myMaximumBy _ [x] = Just x
myMaximumBy f xs = foldl (folder f) (Just $ head xs) $ tail xs

I don’t think the catch-all folder _ _ _ pattern is necessary, but it definitely doesn’t hurt.

Implement `myMinimumBy`

For myMinimumBy you simply replace GT in folder with LT. With a little abstraction, you can write both in a nice point-free style.

folder :: Ordering -> (a -> a -> Ordering) -> Maybe a -> a -> Maybe a
folder o f (Just acc) x = if f acc x == o then (Just acc) else (Just x)
folder _ _ _ _ = Nothing

myOrderingBy :: Ordering -> (a -> a -> Ordering) -> [a] -> Maybe a
myOrderingBy _ _ [] = Nothing
myOrderingBy _ _ [x] = Just x
myOrderingBy o f as = foldl (folder o f) (Just $ head as) $ tail as

myMaximumBy :: (a -> a -> Ordering) -> [a] -> Maybe a
myMaximumBy = myOrderingBy GT

myMinimumBy :: (a -> a -> Ordering) -> [a] -> Maybe a
myMinimumBy = myOrderingBy LT

Wrapping Up

This wasn’t a particularly difficult problem or solution, but it was one of the first cases where my code looked correct, type-checked, and failed. It is really important to understand the difference between foldr and foldl. I am starting to really enjoy point-free style in Haskell. When understood, it is terse and beautiful.

Edits made on 10/18/18 cleaning up patterns with unnecessary named parameters. Replaced (x:[]) with [x].

C++ Recursive Template Metaprogramming: Fibonacci Numbers

2018-07-01T00:00:00+00:00

Background

After a brief dive into Scala, I am back to writing C++. However, I do have a much better appreciation for functional programming and recursion. I am far from an expert at either, but I am interested in increasing my programming skills. I decided to revive my blog and try to post things I find fun or interesting. I am currently reading “Effective Modern C++” by Scott Meyers and continually come across Metaprogramming online. I was poking around Stack Overflow and I found this post which asks about tail recursion in Template Metaprogramming (TMP). I thought this was interesting and decided to see if I could write the naive recursive Fibonacci number generator using TMP.

I had already written this in Scala, which looks like:

import scala.annotation.tailrec
def fib(n: Int): Int = {
    @tailrec
    def loop(iter: Int, prev: Int, next: Int): Int = {
        if (iter >= n) prev
        else loop(iter + 1, next, prev + next)
    }
    loop(0, 0, 1)
}
fib(10)

However, fib(10) will execute at runtime and the Java Virtual Machine incurs additional runtime overhead each time you run the program. A neat benefit of TMP in C++ is the compiler can compute fib(10) and then each invocation of the program is as simple as printing an integer. My first implementation in C++ looked like:

#include <cstdint>
#include <iostream>

namespace impl {

    template<int64_t n, bool isPositive>
    struct fib_impl {
        static constexpr int64_t val = fib_impl<n - 1, isPositive>::val + fib_impl<n - 2, isPositive>::val;
    };

    template<>
    struct fib_impl<1, true> {
        static constexpr int64_t val = 1;
    };

    template<>
    struct fib_impl<0, true> {
        static constexpr int64_t val = 0;
    };

    // If calling fib<-1>::val it will try to do the recursion infinitely
    // -> this template short circuits that recursion
    template<int64_t n>
    struct fib_impl<n, false> {
        static constexpr int64_t val = -1;
    };

} // namespace impl

template<int64_t n>
struct fib {
    static_assert(n >= 0, "Error: fib can't be called with a negative integer");
    static constexpr int64_t val = impl::fib_impl<n, (n >= 0)>::val;
};

int main() {
//    static_assert(fib<-1>::val); // This will fail.
//    static_assert(fib<10>::val == 55); // Make sure it works at compile time!
    std::cout << fib<91>::val << '\n';
    return 0;
}

I want the interface of fib to accept only a non-negative integer, therefore we abstract away whether or not the integer is non-negative with impl::fib_impl. In this implementation, you need 3 template specializations. Two are the termination conditions: 0 and 1; the other provides protection from an infinite recursion when you give a negative number to fib. Without that protection, even though you get an error from the static_assert for fib<-1>::val, the compiler still tries to create infinite templates. Luckily, your compiler will protect you from creating literally infinite templates (GCC 7.2.1 allowed 900 to be generated, use -ftemplate-depth= to change it). This implementation isn’t tail recursive because the recursion isn’t in the tail position. The recursive call,

fib_impl<n - 1, isPositive>::val + fib_impl<n - 2, isPositive>::val

is shaped like recursive_template(...) + recursive_template(...), but must look like: recursive_template(...) to be tail recursive. You can verify this by modifying the Scala code. In C++, I believe the only way to find out if tail recursion is actually applied is looking at the assembly for loops. Unfortunately, this is done at compile time and you can’t review the compile time assembly (to my knowledge). The tail recursive implementation is:

#include <cstdint>
#include <iostream>

namespace impl {

    template <int64_t n, int64_t prev, int64_t next, bool isPositive>
    struct fib_impl {
        static constexpr int64_t val = fib_impl<n - 1, next, prev + next, isPositive>::val;
    };

    template <int64_t prev, int64_t next>
    struct fib_impl<0, prev, next, true> {
        static constexpr int64_t val = prev;
    };

    template <int64_t n, int64_t prev, int64_t next>
    struct fib_impl<n, prev, next, false> {
        static constexpr int64_t val = -1;
    };

} // namespace impl


template <int64_t n>
struct fib {
    static_assert(n >= 0, "Error: fib can't be called with negative numbers!");
    static constexpr int64_t val = impl::fib_impl<n, 0, 1, (n >= 0)>::val;
};

int main() {
//    static_assert(fib<-1>::val); // This will fail.
//    static_assert(fib<10>::val == 55); // Make sure it works at compile time
    std::cout << fib<91>::val << '\n';
    return 0;
}

Great, now the recursive call is in the tail position. Additionally, we only need 2 template specializations: the one where n = 0 and the infinite template recursion protection for negative integers. I compiled both versions with GCC 7.2.1 using the C++11 standard (which is necessary for constexpr) 10 times and measured the average compile time. It was essentially the same (about 0.2s). The tail recursive version has a major downside though: it overflows an int64_t sooner than the non-tail recursive version. The largest value of n for the non-tail recursive version was 92 and for the tail recursive version was 91. This is because the template recursion for fib<92>::val contains a prev + next equal to fib(93), which is too large to fit in int64_t.
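The 92 versus 91 limits can be double-checked with arbitrary-precision integers. This sketch assumes the only difference between the two versions is that the tail recursive one also has to form the next Fibonacci number as a template argument; fib and largest_n are hypothetical helper names.

```python
INT64_MAX = 2**63 - 1


def fib(n):
    """Iterative Fibonacci with fib(0) == 0 and fib(1) == 1."""
    prev, nxt = 0, 1
    for _ in range(n):
        prev, nxt = nxt, prev + nxt
    return prev


def largest_n(lookahead):
    """Largest n such that fib(n + lookahead) still fits in a signed 64-bit int."""
    n = 0
    while fib(n + 1 + lookahead) <= INT64_MAX:
        n += 1
    return n


print(largest_n(lookahead=0))  # 92: the naive version only needs fib(n)
print(largest_n(lookahead=1))  # 91: the tail recursive version also forms fib(n + 1)
```

Python integers never overflow, so the comparison against INT64_MAX is what models the int64_t template parameter here.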

This code was an academic exercise, but I think it is neat. This is my first experience with TMP and I am very interested to learn more. Feel free to message me, or follow me, on BlueSky with constructive criticism or for future blog posts.
