Really, you mean you don’t want to read my essay of a blog post? I don’t blame you – I generally have the attention span of a small puppy, so here goes with the overview for those other ADHD folks out there:
And for those who, rather than go on to read the following summation, would prefer trawling through the (very interesting!) conversations on the mailing list, you can go and search through the erlware questions group mailing list for things around package management. Here are some pointed conversations to browse at your leisure:
– original idea from Eric:
https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/GtFBTQtgeng
– overview: https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/omunsj8pfs4
– some of Joe’s questions from the erlang-questions package management thread that we visited:
https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/ZbRdDAkFQPo
– repository design questions:
https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/vNHjrvIScGE
– erlang/repository namespace issues:
https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/cav3oK_D8sw
– code signing:
https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/1esqRJU11EE
The idea of using git fetch-pack is illustrated towards the bottom of this post: https://groups.google.com/forum/?fromgroups#!topic/erlware-questions/js06abXa8Mk.
And so to the little details…
In this post, I repeatedly refer to the packaging tool, as I’m very lazy. In fact, in the Erlware discussions, we’ve generally agreed that each of the following tasks could potentially be solved by a different program, with the tool chain working in an integrated fashion to provide a complete workflow:
The first tenet of this proposed tool chain is that every aspect of the workflow must be automated, whether you're consuming software, publishing it, or even just trying to build someone else's kit. The command line should suffice for everything, and the level of configuration required to make the tool(s) work should be minimal.
If I’m going to consume an OTP library/application, then I’d like this process to be *really* simple. If it’s a matter of fetching the software to my local machine, then I want the command to be something really simple, like ${tool} fetch {thing} or whatever. If I’m building a project and want to have this ‘dependency management tool’ integrated into my build, then I basically want a simple sequence like
Anybody who wants to build my project from source, should perform (2) and (3), or just (3) if the build tool is integrated nicely. That’s how it should be – simple. As a user wanting to deal with stage (1), most people will happily settle for one of
Either way, the file in which dependencies are declared must be human readable, and should not be hard to write by hand, which pretty much rules out JSON or XML or anything like that.
Obviously gathering dependencies requires that I know the application/library name and version. Some tools (like Cabal and RubyGems) support specifying a version based on some kind of range – that’s nice, but for now let’s put it to one side and assume the version number is going to be fixed. So to get hold of the lager logging framework, what should I declare?
%% dependencies.cfg
{lager, "0.9.4"}.
That’s nice and simple – no URL to worry about or anything like that. So assuming that lager-0.9.4 is not already available to me locally – we’ll cover what this means in practice later – how should a dependency manager resolve it to a viable list of packages? This is where the next assumption comes in – it shouldn’t. Perhaps OOTB some default source might be available – possibly provided by Erlware, or ProcessOne, Erlang Solutions or even Ericsson!? – but assuming it isn’t, the dependency manager should puke. You need to configure at least one source repository, so that the index of available packages can be downloaded/updated and searched for candidates.
One thing that many package management solutions have got right is providing the ability to source software from multiple places. For the developer of a packaging solution, it is better not to have to maintain a canonical repository of source code (or built artefacts), simply because verifying the origin of submissions requires some degree of manual intervention. This leads us to the second kind of user: those who're packaging and publishing their software. Their experience ought to look something like this:
The package is now ‘released’ into the local repository and ready to be uploaded so that others can benefit from it. This process should work with any project that uses an OTP layout (and with the help of some command line flags, also those that don’t) and therefore can be used by a developer to install any source code package or pre-packaged artefact into their local repository, even if they originally had to download it from bittorrent because it was never published anywhere else (properly).
So how does the consumer get hold of this package which is now installed on the publisher’s local machine? This part ought to be relatively easy. The packaging tool will obviously support a kind of `publish` operation, and the implementation of this can be incredibly simple.
Let’s assume that the local repository is implemented as a git repository. Let’s also assume that the index meta-data is stored in a simple file system layout (the details of which we’ll revisit later), and only this index meta-data – which describes the artefact, version, digital signature, MD5 for comparison post download, etc – is present in the ‘master’ branch. No other data lives in the main branch of the repository.
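To make that concrete, the index meta-data for an artefact might look something like the following plain Erlang terms. Every field name here is an assumption (the format was never finalised), and the elided values are placeholders:

```erlang
%% index/lager/0.9.4/index.meta - hypothetical contents
{artefact, lager}.
{version, "0.9.4"}.
{publisher, basho}.
{md5, "..."}.                    %% compared post download
{signature, {rsa_sha1, "..."}}.  %% verified with the publisher's public key
```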
Assuming this is true, and given an artefact lager-0.9.4, writing the index meta-data might be as simple as:
$ cd $REPO
$ ls    # only one directory and a readme in here
.git  index  README.txt
$ mkdir -p index/lager/0.9.4
$ cat $LAGER_METADATA >> index/lager/0.9.4/index.meta
$ git add index/lager/0.9.4
$ git commit -m $LAGER_COMMIT_MSG
Clearly if we did a git push origin master now, we’d have an index that ‘claimed’ a lager-0.9.4 was present in the repository, when this isn’t the case. Now the underlying work the packaging tool must do looks something like:
$ git checkout -b lager_0.9.4
$ git rm -r index
$ cp $LAGER_EZ_ARTEFACT .
$ git add $LAGER_EZ_ARTEFACT
$ git commit -m $LAGER_COMMIT_MSG
$ git tag -a lager-0.9.4 -m $LAGER_COMMIT_MSG
$ git push origin lager_0.9.4 --tags
$ git push origin master
And now there is both an index and an artefact uploaded to the remote repository.
If you’ve been following this carefully, you’ll now be thinking about how the consumer does not want to download an entire repository full of applications/libraries, and many potential versions of each! It turns out that fetching only the parts that you want is easy enough.
$ cd $REPO_INSTALL_AREA
$ mkdir -p $REQUIRED_ARTEFACT   # lager-0.9.4 in our case
$ cd $REQUIRED_ARTEFACT
$ git init
Initialized empty Git repository in /private/tmp/scratch-dir/.git/
$ git fetch-pack --include-tag -v git@github.com:hyperthunk/gitfoo.git refs/tags/lager-0.9.4
Server supports multi_ack_detailed
Server supports side-band-64k
Server supports ofs-delta
want c1bba117cc28e3c839a21d69e56af5768856930b (refs/tags/lager-0.9.4)
done
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 7 (delta 0), reused 7 (delta 0)
Unpacking objects: 100% (7/7), done.
c1bba117cc28e3c839a21d69e56af5768856930b refs/tags/lager-0.9.4
$ git archive c1bba117cc28e3c839a21d69e56af5768856930b >> lager-0.9.4.ez
$ ls -la
total 280
drwxr-xr-x   4 t4    st     136 28 May 21:55 .
drwxrwxrwt  11 root  st     374 28 May 21:52 ..
drwxr-xr-x  10 t4    st     340 28 May 21:54 .git
-rw-r--r--   1 t4    st  143360 28 May 21:55 lager-0.9.4.ez
$ unzip -l lager-0.9.4.ez
Archive:  lager-0.9.4.ez
warning [lager-0.9.4.ez]:  1536 extra bytes at beginning or within zipfile
  (attempting to process anyway)
  Length     Date   Time    Name
 --------    ----   ----    ----
        0  05-28-12 21:51   ebin/
    16168  05-28-12 21:51   ebin/error_logger_lager_h.beam
      937  05-28-12 21:51   ebin/lager.app
    10220  05-28-12 21:51   ebin/lager.beam
     3500  05-28-12 21:51   ebin/lager_app.beam
     3704  05-28-12 21:51   ebin/lager_console_backend.beam
    11228  05-28-12 21:51   ebin/lager_crash_log.beam
    14600  05-28-12 21:51   ebin/lager_file_backend.beam
    23060  05-28-12 21:51   ebin/lager_format.beam
     3384  05-28-12 21:51   ebin/lager_handler_watcher.beam
     1284  05-28-12 21:51   ebin/lager_handler_watcher_sup.beam
     3720  05-28-12 21:51   ebin/lager_mochiglobal.beam
    22096  05-28-12 21:51   ebin/lager_stdlib.beam
     2928  05-28-12 21:51   ebin/lager_sup.beam
     8244  05-28-12 21:51   ebin/lager_transform.beam
    12580  05-28-12 21:51   ebin/lager_trunc_io.beam
    13920  05-28-12 21:51   ebin/lager_util.beam
        0  11-07-11 11:20   include/
     3048  11-07-11 11:20   include/lager.hrl
    10175  11-07-11 11:20   LICENSE
     7639  11-07-11 11:20   README.org
 --------                   -------
   172435                   21 files
$
We can clearly see that in the scratch directory where we ran git init, we have acquired only the data from the tag the packaging tool created, which lives in its own artefact/version specific branch and is thus isolated from everything else in the repository. Each artefact version can be kept separate by the tool in this way, and all of them held apart from the repository’s searchable index, which is stored and maintained in the master branch.
This approach also solves the security conundrum, because only persons with ssh access to your github repository will be able to push changes. Those who wish to consume the data (i.e., check out the master-index and potentially fetch-pack some of the branches to obtain artefacts) may do so at will, but they cannot write back to the repository unless you’ve authorised them yourself.
One key issue we wanted to address was the tendency of open source projects – particularly those hosted on github – to be forked by multiple authors/maintainers. When consuming an OTP library or application, you may not care about this, but in order for more than one source repository to exist, we need to be able to distinguish between publishers!
If I have forked the lager application to my hyperthunk git account, and you have both my index/repository and the Basho Technologies repository listed as potential sources for resolving dependencies, then you’re going to have to get specific about which published version of lager-0.9.4 you actually want. We assume that you will do this by specifying the publisher/organisation along with each dependency, like so:
%% dependencies.cfg
{esl, parse_trans, latest}.
%{basho, lager, "0.9.4"}.
{hyperthunk, lager, "0.9.4"}.
{hyperthunk, annotations, "0.0.2"}.
The great thing about this approach is that I (hyperthunk) do not need to actually fork the lager repository in order to publish it under my name! All I need to do is build the artefact and publish + push it to my repository. The code in my repository is signed with my private key, so if you trust me (and git’s ssh based security) then you can use my version of lager-0.9.4 if you wish. You can always obtain my public key (from the repository or elsewhere) in order to verify the integrity of the signed package, which is of course what the packaging tool will do for you when you fetch and install something.
Why is this so great? Well if some author decides not to publish a repository of their own, you can still rely on their code for your project, and treat it just like any other dependency. The mechanism for this kind of 3rd party signing is simple, and works like this:
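The gist can be sketched as an index entry in my repository that records both the original author and me as the signing publisher (all field names below are assumptions, as before):

```erlang
%% hypothetical index entry for a third-party signed build of basho's lager
{artefact, lager}.
{version, "0.9.4"}.
{author, basho}.                 %% wrote the code, never published a package
{publisher, hyperthunk}.         %% built, signed and published this artefact
{signature, {rsa_sha1, "..."}}.  %% made with hyperthunk's private key
```

The consumer's tool verifies the signature against the publisher's public key, so trusting the package reduces to trusting the person who signed it, not the original author.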
So obviously the local repository needs to be split in two, one part which contains your own published stuff, another which contains dependencies you’ve fetched from other publishers and installed onto your local machine. If you never publish anything of your own, that part of the repo will simply be ignored.
The other reason why this is necessary, is that you might want to build Package-A which depends on hyperthunk/lager for one project, and Package-B which depends on basho/lager for another. When these projects are built (individually) they must have an isolated (clean) environment, so the following constraints need to be handled
I have a strong preference for publishing binary artefacts instead of source code that must be built. In a development environment, where there may well be multiple versions of Erlang/OTP installed, there is no getting around the fact that the beam emulator offers only two versions of forward compatibility, and none backwards. If you’ve compiled with R13, you’re probably ok up to R15, but you cannot use code compiled with Rn with any version Rx where x < n.
Because of this constraint, fetching source packages and building them locally doesn’t do you much good in practice, because you still have to track what erts version they’ve been built against to ensure they’re compatible. Sure you can rebuild a package once you realize that you need compatibility with an earlier or later runtime, but once you’ve dealt with this issue then you’re quite a way towards handling binary version compatibility (between erts releases) anyway.
Adding the erts version you built a binary package with to the publication meta-data is easy, and once that information is in the index, the solver can notify a user of R13 that the only published packages available are built for >= R14. The user can then contact the publisher and ask for R13 packages to be provided, or resort to building the sources themselves and 3rd party signing them (or looking to see if someone else they trust has already done so). Either way, if the package meta-data carries information about the source repository and build command(s), this can easily be automated.
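Sketching those extra meta-data fields (names assumed, as with the earlier index examples), the entry would carry the erts version plus enough information to rebuild from source:

```erlang
%% hypothetical additions to the publication meta-data
{erts_vsn, "5.9"}.    %% i.e., built against R15B
{source, {git, "git://github.com/basho/lager.git", {tag, "0.9.4"}}}.
{build, ["make"]}.    %% build command(s) assumed for illustration
```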
Having binary artefacts gets more involved when dealing with dependencies that contain native source code (for ports, linked-in drivers, NIFs and BIFs, etc). This has been discussed at length on erlware-questions and basically it boils down to the same issue as supporting multiple erts versions, except that you’ve got OS plus different kinds of architecture to deal with, leaving you with a more complex scheme for searching indexes and/or putting items into repositories:
erts-vsn/os/arch/32|64
In practice this does add complexity, and it’s highly likely many publishers will not bother to produce artefacts for various platforms/architectures. This is, of course, where 3rd party signing really shines once again.
Finally, I’d like to address the kind of ‘package management’ tools we’ve been discussing on the Erlware mailing list. The conversation(s) inevitably started out with expressed dissatisfaction about the current solutions that are available to solve the problem of getting code onto your machine. We quickly noted that most of the discussion centered around activities that take place at build/development time, the focal point being how to obtain working dependencies when building a complex software project in Erlang/OTP. To my mind, this immediately takes us out of the traditional ‘package management’ territory, where the primary concern has to do with installing version X of package Y into the local environment. This also puts us outside of the ideas Joe Armstrong has put forward about remote code loading and importing code from URLs and the like. These are very good ideas – just go look at Smalltalk to see how well they can work – but they’re out of scope for most of today’s tools and probably not going to surface in the near/immediate future.
The two major players at the moment appear to be rebar and agner, although rebar is of course a build tool at heart and not a package manager. The approach that rebar takes is definitely closer to what I’d call a ‘dependency manager’, in that it supports the declaration of software component dependencies in static configuration, and provides a command line interface for fetching, updating and/or removing these from the local source tree of the project in which the declaring config resides. Once these dependencies have been fetched, they are thereafter treated like part of the project’s own source tree and are built (i.e., compiled, tested, cleaned, etc) along with the project itself. As rebar is a tool for building OTP compliant software packages, any dependency must also be a valid OTP application (or library application) in order for this mechanism to work. The approach that agner takes is similar, fetching, building and installing OTP applications/libraries into the code:lib_dir of the current Erlang/OTP installation, or an alternative site. It [agner] also supports upgrading and removing them if required. The mechanism agner uses is more complete than rebar’s simple approach based on VCS URL and optional commit/tag/branch, allowing the publisher to specify details about the package that ease the pain of consumption. In order to support applications that aren’t rebar compatible, agner allows an explicit build command to be specified, which is executed in an external shell process.
What else is out there right now? There is Jacob Vorreuter’s Erlang Package Manager. This is actually a great bit of kit, but suffers from problems with rate limiting due to its use of the github search API instead of an explicit package index. Jake published another package management solution later on, in the form of sutro, which is inspired by Homebrew (and works in a similar vein).
The key issue we saw with rebar‘s dependency handling approach was that it only works for rebar – so it is of no use to projects using sinan or some other build system such as waf, fw-erlang or more traditional autotools and/or make based projects such as Ejabberd and RabbitMQ. The use of a local directory to store dependencies also makes this approach a menace when you’ve got dozens of little projects which all depend on the same libraries, as they end up littering the machine. This approach however, was put in place for a good reason, and it does avoid running into problems where globally installed components can lead to unexpected clashes on the code path, incorrect versions being resolved or other problems inherent in shared/global environments. This idea of isolation is very important to maintaining a clean development environment for each build of each project you work on, as evidenced by the excellent virtualenv tool for Python and similar tools for Ruby (such as rvm and rbenv) and Haskell’s virtual-env clone.
Clean, isolated build environments are essential to maintaining a productive development life-cycle and even more vital for things like CI.
The main reason we found ourselves not using agner was the dual cost of maintaining indexes and searching them. The former is a relatively minor pain, but the latter is excruciating due to general slowness.
My reasons for doing this are several. Firstly, as I mentioned, I wanted a chance to check out the Octopress wrap around Jekyll, and I must admit that so far I’m finding it nice and high level. Secondly, I wanted better handling of code highlighting than the free wordpress account gives me, and the pygments integration in Jekyll does the job very nicely. I also wanted to be able to provide sample code for each of the posts, and by publishing the series using github pages, I can use a single git repository to manage both the sample code and the gh-pages publication branch. All in all, it seems like a pretty neat solution.
From now on, rebar plugin tutorials will be published to http://hyperthunk.github.com/rebar-plugin-tutorial/.
There are so many caveats to this, I hardly know where to start. Here’s a short-list to begin with:
I’ve written quite a few plugins by now, and have even submitted a few patches relating to them. It’s fair to say that I’m quite opinionated about how plugins work and how they should work, but it’s also worth remembering that I’m only a very minor contributor and my opinions are just that – my private opinions.
Currently rebar supports two kinds of extensibility mechanism: Plugin Extensions and Hooks. Hooks are a lightly documented feature, with the only real explanation being the sample rebar config file. We’re not going to cover hooks in much detail, as they are simple enough to understand and are only really applicable to simple scripting tasks that don’t require, for example, cross platform support or complex logic. Plugin extensions on the other hand, are documented (to some extent anyway), and provide a much greater degree of extensibility to developers.
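For reference, hooks are declared in rebar.config roughly like this (a minimal sketch modelled on the sample config file; the script names are made up):

```erlang
%% rebar.config - run a shell command before/after a built-in command
{pre_hooks,  [{compile, "priv/generate_version.sh"}]}.
{post_hooks, [{clean, "rm -f include/version.hrl"}]}.
```

The hook body is just a shell command, which is exactly why hooks fall over for anything needing cross platform support or complex logic.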
Before we can talk sensibly about plugins, we need to take a look at some of the fundamentals behind rebar, especially its handling of build configuration files and processing of commands. For any given command, say foo, rebar understands the command if (and only if) one of the modules it knows about exports a function with a signature that matches:
We’ll be covering how rebar knows about modules later on, but for now we’ll just assume it’s magic. For the command foo to have any meaning then, we’d need at least one module with at least one of the following signatures exported:
-module(foo).
-compile(export_all).
pre_foo(_, _) ->
io:format("we're running just before foo!~n").
foo(_, _) ->
io:format("we're in foo!~n").
post_foo(_, _) ->
io:format("we're running just after foo!~n").
Another essential is how rebar handles build configuration. There are four ways that rebar handles configuration settings. Firstly, rebar loads the global config file from $HOME/.rebar/config if it actually exists, otherwise it creates an empty config set. Secondly, rebar loads configurations for any directory by either (a) examining the terms in the local file rebar.config if it exists, or (b) creating an empty config set. Thirdly, when first executing in the current directory (known as base_dir), rebar will check for a special global variable (passed as config= or alternatively -C <config-name> instead) which overrides the name of the config file it should search in. This latter technique is only applied to the configuration in the base_dir.
The fourth approach to configuration handling is not just for initialising new configurations. As rebar executes user commands (e.g., clean, compile, eunit) in a given directory, it uses two special commands to obtain a list of directories that need to be processed before the current one, and, provided the current directory is processed without error, afterwards as well. These commands, preprocess and postprocess, can be exported by any module.
When rebar executes, it builds up a list of modules that understand the current command. For each of these modules it tries to call preprocess and postprocess, then it traverses any pre- directories before handling the current command in the current directory. Once all the pre-processing is done, each module that exports one of the three function signatures compatible with the current command is called (for one or more of the pre_<command>/2, <command>/2 and post_<command>/2 exports) to handle the actual command. The directories returned by any postprocess calls are handled last of all.
What is vital to understand about all of this, is that as rebar traverses the file system, recursively handling any pre- directories, in each new dir it executes with a brand new rebar config set. This config set inherits the parent configuration (i.e., the config set for the parent dir) but can override certain configuration variables by providing its own rebar.config file. This is how dependencies and applications stored in sub-directories are handled. The salient points about this mechanism are that
Point #3 is a bit scary if you’re new to rebar, but essentially it is the result of rebar’s config handling module exporting multiple config handling functions, some of which get the local (i.e., the most locally scoped) value, some a list of possible values and others the combined list of all values. Depending on which of these functions a particular command/module uses when reading the configuration, you can potentially see a number of things happen:
I strongly recommend spending some time looking at rebar’s config module if you’re planning on writing plugins (or using complex plugins written by others), as it’ll save you a lot of head scratching time if you understand this up front.
As far as rebar is concerned, plugins are simply Erlang modules that it knows something about. There are essentially two ways that rebar knows about modules:
Modules which are registered in the rebar.app configuration file are basically part of rebar itself. Plugins on the other hand, are modules which the build configuration (somewhere in the tree) knows about via the plugins configuration element. This configuration is built up to include every branch, including the global config, so just because you’ve got no local plugins configuration, doesn’t mean plugins won’t get run in your subdirectories. In practice, this means that plugins registered up top (e.g., globally or in the base_dir configuration) will get run in all your sub-directories, including of course dependencies. Bear this in mind when using plugins, and take advantage of skip_deps, apps= and skip_apps= where necessary to avoid unexpected things happening in your sub-dirs and deps folders.
To (hopefully) make the differences between plugin extensions and built-in modules a bit clearer, we’re going to classify plugin extensions into three groups, and will hereafter refer to them simply as plugins:
Let’s look at what these classifications mean in practice, and hopefully get an understanding of the terminology I’ve chosen. Internal (or built-in) modules come bundled as part of rebar itself, and as per the documentation, these are registered in the rebar application config file. The functionality exposed by these modules is available to every rebar user, so they work Out Of The Box. These plugins are the least likely to be used for extending rebar however, because in practice they require you to either (a) maintain a custom fork of rebar or (b) submit a pull request in order for your extension(s) to be accepted as part of the main source tree. It is the other two types of plugin we will be looking at in this post.
Pre-packaged plugins are bundled as separate Erlang/OTP libraries to be installed globally, or included in a project using rebar’s dependency handling mechanism. The latter technique is more useful, as it ensures that someone who fetches your source code to build/install it, will be able to obtain the right plugins without going outside of the project source tree.
The key thing to understand here is that the plugin must be installed somehow in order for rebar to pick it up. We’ve mentioned that rebar knows about plugins because they’re in the {plugins, ListOfPlugins} configuration element, but in practice things aren’t quite that simple. In order for a plugin to actually get executed (in response to a specific command, its pre/post hooks or indeed the special preprocess and postprocess commands), it needs to be on the code path! This is fine if the plugin is installed globally into the user’s erl environment (for example by putting its application root directory somewhere on the ERL_LIBS environment variable), but not so fine if you’re fetching it into the project’s dependencies. If the dependency is a direct one, then the preprocess handler in rebar_deps will nicely update the code path for all commands, so as long as you’re not trying to make the plugin run before rebar’s built-in modules (which is, in fact, impossible) then it’ll be on the path. This once again doesn’t always work in practice however, because the function that builds up the code path makes no attempt to deal with transitive dependencies. I keep meaning to do a pull request for this, but I’m waiting for others to get through the queue first.
You probably recall that I mentioned plugins need to be on the code path in order to be executed by rebar? Well thanks to a nifty pull request from yours truly, there is in fact another way. If rebar cannot find a module on the code path matching the name given to the plugins configuration element, it will attempt to locate a source file for the module in either the base_dir or the directory indicated by the plugin_dir config element. If it finds a source file with a matching name, it attempts to compile it on the fly and load the beam code into the running emulator, thereby making the plugin available dynamically.
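In config terms, that looks something like this (the paths and plugin name are illustrative):

```erlang
%% rebar.config - local plugins compiled on the fly
{plugin_dir, "build/plugins"}.   %% where rebar looks for plugin sources
{plugins, [my_project_tasks]}.   %% compiled and loaded if not already on the code path
```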
The aim of local plugins is to provide a mechanism for scripting complex tasks/hooks that apply only to your specific project. This is in contrast with the idea of external/pre-packaged plugins, which provide add-on re-usable features to rebar that can be used across projects.
Next time we’ll be looking at the structure of the plugin callback functions and how to use them in practice. We’ll also be taking a whirlwind tour of some of the commonly (re)used rebar modules such as rebar_config, rebar_utils and rebar_log, as well as discussing some of the pros and cons of using plugins and what the current workarounds look like. We’ll finish with a working example of an external plugin that adds new functionality to rebar with all the source code available on github.
The plugin comes with some pre-defined assemblies (which are the plugin’s unit of configuration) for packaging up a rebar generated release, or project (i.e., the ebin, include and priv directories). Future releases will add other pre-packaged options such as sources, docs and so on.
Using the plugin is pretty simple, and there is some documentation on the project wiki which is mostly up to date.
Plugins are just Erlang modules that you reference in rebar.config, and they get hooked into the build at execution time. Naturally rebar (which is executed via escript) needs to be able to find the beam code for these (plugin) modules on the code path, so if you’re putting one together specifically for a project, you’ll need to take advantage of rebar’s sub_dirs support in order to pre-compile them before the rest of your code. The sample project does just that, by compiling the build project prior to the rest of the sources. Including it in your lib_dirs also ensures it is on the code path.
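A minimal sketch of that arrangement (the directory name is assumed to match the sample project's layout):

```erlang
%% rebar.config
{sub_dirs, ["build"]}.   %% compile the plugin project before the main sources
{lib_dirs, ["build"]}.   %% and make sure its ebin ends up on the code path
```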
So what can you do with your plugins? Plugins do not participate in rebar’s preprocess stage, so they cannot run in isolation from the core (internal) rebar modules – edit: as of a while back, plugins do in fact participate in the pre and post processing via the same callbacks as built in modules. Check out some of my later posts, or better still head over to http://hyperthunk.github.com/rebar-plugin-tutorial/ for more details.
In practice, this means that your plugin can do one of two things:
The second approach comes with (yet more) caveats though: new, custom commands cannot run in isolation. I suspect this is because plugins do not participate in the preprocess stage, or that they’re excluded from the code that identifies modules willing to handle a given command, or both. This means that the rebar_frobble plugin from the example project, runs in two contexts:
In practice, this means you can run `rebar compile frobble`, but not `rebar frobble`. [UPDATE] If you referenced the plugin in the top level rebar.config, it would remedy this situation, but you don’t always want to do that. This isn’t very intuitive and I suspect the developers may decide to clarify (or change) this behaviour in future. Despite the slightly confusing execution profile, rebar plugins are a very neat way of customising your rebar build. With full access to all the rebar internal modules, as well as the current (local and global) configuration, the plugin author has a lot of flexibility and power at their fingertips. Naturally with great power comes great responsibility, and plugin authors should consider carefully the use of exports besides published command names and their command/2 function interfaces.
Basically, I choose a common folder, which on OSX tends to be ~/Library/Erlang and somewhere similar on other *nixes. Under this directory I keep a subdirectory into which multiple ERTS versions can be installed and another site directory into which common/shared libraries and applications can be installed.
~/Library/Erlang/Current -> /Users/t4/Library/Erlang/Versions/R13B04
~/Library/Erlang/Versions
~/Library/Erlang/Versions/R12B02
~/Library/Erlang/Versions/R12B05
~/Library/Erlang/Versions/R13B04
~/Library/Erlang/Site
~/Library/Erlang/Site/erlsom-1.2.1
~/Library/Erlang/Site/gen_server2-1.0.0
~/Library/Erlang/Site/mochiweb-1.7.1
~/Library/Erlang/Site/protobuffs-0.5.0
~/Library/Erlang/Site/webmachine-1.8.0
I then set my $ERL_LIBS environment variable to the site directory and symlink the current folder as I wish. I also configure tools like epm and/or sutro to use the site directory as their target install folder, giving me a consistent way to install things.
The main thing lacking from this approach is that I have no control over which libs/apps in the site directory are compatible with which installed versions of ERTS. A good solution to this that doesn’t force me to use an entire tool-chain in order to take advantage of it sounds very promising.
t4$ gcc -o exampledrv \
-arch x86_64 \
-fPIC -bundle \
-flat_namespace \
-undefined suppress \
$CFLAGS complex.c port_driver.c
This will generally still fail at runtime unless you rename (or symlink to) the .dylib you’ve created so that your shared library has the .so extension, for which the erts code is explicitly looking. Caveat: this last point may have been fixed in recent Erlang/OTP releases, but I’m a little out of touch! Using rebar to build your port driver sources circumvents this naming issue either way.
It’s all in a very early stage (apart from my Erlang implementation of hamcrest, which is coming along nicely now) and the source code is available on github here. Here are a couple of samples to whet the appetite:
?assertThat(registered_name(Mod), observed_message_from(self(), Msg)),
?assertThat(registered_name(Slave, ProcessName), observed_message(Msg)),
?assertThat(Pid, observed_message({message, "hello"})),
?assertThat(?COLLECTOR,
categorises(observed_message({message, "woo hoo"}),
as(category1))).
Stubbing of OTP behaviours will almost certainly be based on emock, and I’m planning on integrating nicely with PropEr as I’ve started using this library quite a lot myself. The mechanics of replacing registered gen_servers and the like (using runtime code generation and/or re-registration/config), I’ll probably leave alone as there are plenty of good libraries out there that do this already.
@forall_lazy(searchbase=
                 sets(items=stubs(cls=Identifiable,
                                  salsaId=strings(low=1)),
                      size=(1, 3011)),
             subject=integers(low=0, high=1010))
def check_detect_using_bisect(searchbase, subject):
    searchbase = list(searchbase)
    item = searchbase[max(0, min(subject, len(searchbase) - 1))]
    universe = sorted(searchbase, key=salsa_key)
    search_result = detect(cmp, item, universe, key=salsa_key)
    reason = "item (%i) mapped_to (%i)" % (universe.index(item),
                                           universe.index(search_result))
    assert_that(search_result, is_(not_none()), reason)
    assert_that(search_result.salsaId, is_(equal_to(item.salsaId)), reason)

def test_detect_using_bisect():
    [ (yield testcase) for testcase in check_detect_using_bisect() ]