Here they are:
The goal here was really just to give the absolute most basic examples of how to use the tool, for people who use tcpdump or dig infrequently (or have never used it before!) and don’t remember how it works.
So far saying “hey, I want to write an examples section for beginners and infrequent users of these tools” has been working really well. It’s easy to explain, I think it makes sense from everything I’ve heard from users about what they want from a man page, and maintainers seem to find it compelling.
Thanks to Denis Ovsienko, Guy Harris, Ondřej Surý, and everyone else who reviewed the docs changes, it was a good experience and left me motivated to do a little more work on man pages.
I’m interested in working on tools’ official documentation right now, for a few reasons.
One thing I learned along the way: when you’re writing to a file with tcpdump -w out.pcap, it’s useful to pass -v to print a live summary of how many packets have been captured so far. That’s really useful, I didn’t know it, and I don’t think I ever would have noticed it on my own.

It’s kind of a weird place for me to be because honestly I always kind of assume documentation is going to be hard to read, and I usually just skip it and read a blog post or Stack Overflow comment or ask a friend instead. But right now I’m feeling optimistic, like maybe the documentation doesn’t have to be bad? Maybe it could be just as good as reading a really great blog post, but with the benefit of also being actually correct? I’ve been using the Django documentation recently, and it’s really good! We’ll see.
The tcpdump man page is written in the roff language, which is kind of hard to use and which I really did not feel like learning.
I handled this by writing a very basic script to convert Markdown to roff, using similar conventions to what the man page was already using. I could maybe have just used pandoc, but the output pandoc produced seemed pretty different from the existing man page, so I thought it might be better to write my own script instead. Who knows.
I did think it was cool to be able to just use an existing Markdown library’s ability to parse the Markdown AST and then implement my own code-emitting methods to format things in a way that seemed to make sense in this context.
I went on a whole rabbit hole learning about the history of roff, how it’s
evolved since the 70s, and who’s working on it today, inspired by learning about
the mandoc project that BSD systems (and some Linux
systems, and I think Mac OS) use for formatting man pages. I won’t say more
about that today though, maybe another time.
In general it seems like there’s a technical and cultural divide in how documentation works on BSD and on Linux that I still haven’t really understood, but I have been feeling curious about what’s going on in the BSD world.
I’ve spent a lot of time writing cheat sheets for tools (tcpdump, git, dig, etc) which have a man page as their primary documentation. This is because I often find the man pages hard to navigate to get the information I want.
Lately I’ve been wondering – could the man page itself have an amazing cheat sheet in it? What might make a man page easier to use? I’m still very early in thinking about this but I wanted to write down some quick notes.
I asked some people on Mastodon for their favourite man pages, and here are some examples of interesting things I saw on those man pages.
If you’ve read a lot of man pages you’ve probably seen something like this in
the SYNOPSIS:

ls [-@ABCFGHILOPRSTUWabcdefghiklmnopqrstuvwxy1%,]
grep [-abcdDEFGHhIiJLlMmnOopqRSsUVvwXxZz]

Once you’re listing almost the entire alphabet, it’s hard to get much out of the SYNOPSIS.
The rsync man page has a solution I’ve never seen before: it keeps its SYNOPSIS very terse, like this:
Local:
rsync [OPTION...] SRC... [DEST]
and then has an “OPTIONS SUMMARY” section with a 1-line summary of each option, like this:
--verbose, -v increase verbosity
--info=FLAGS fine-grained informational verbosity
--debug=FLAGS fine-grained debug verbosity
--stderr=e|a|c change stderr output mode (default: errors)
--quiet, -q suppress non-error messages
--no-motd suppress daemon-mode MOTD
Then later there’s the usual OPTIONS section with a full description of each option.
The strace man page organizes its options by category (like “General”, “Startup”, “Tracing”, “Filtering”, and “Output Format”) instead of alphabetically.
As an experiment I tried to take the grep man page and make an
“OPTIONS SUMMARY” section grouped by category; you can see the results
here. I’m not
sure what I think of it, but it was a fun exercise. When I was writing
that I was thinking about how I can never remember the name of the -l grep
option. It always takes me what feels like forever to find it in the man page
and I was trying to think of what structure would make it easier for me to find.
Maybe categories?
A couple of people pointed me to the suite of Perl man pages (perlfunc, perlre, etc), and one thing I
noticed was man perlcheat, which has
cheat sheet sections like this:
SYNTAX
foreach (LIST) { } for (a;b;c) { }
while (e) { } until (e) { }
if (e) { } elsif (e) { } else { }
unless (e) { } elsif (e) { } else { }
given (e) { when (e) {} default {} }
I think this is so cool and it makes me wonder if there are other ways to write condensed ASCII 80-character-wide cheat sheets for use in man pages.
A common comment was something to the effect of “I like any man page that has examples”. Someone mentioned the OpenBSD man pages, and the OpenBSD tail man page ends with examples of the exact 2 ways I use tail.
I think I’ve most often seen the EXAMPLES section at the end of the man page, but some man pages (like the rsync man page from earlier) start with the examples. When I was working on the git-add and git rebase man pages I put a short example at the beginning.
This isn’t a property of the man page itself, but one issue with man pages in the terminal is it’s hard to know what sections the man page has.
When working on the Git man pages, one thing Marie and I did was to add a table of contents to the sidebar of the HTML versions of the man pages hosted on the Git site.
I’d also like to add more hyperlinks to the HTML versions of the Git man pages at some point, so that you can click on “INCOMPATIBLE OPTIONS” to get to that section. It’s very easy to add links like this in the Git project since Git’s man pages are generated with AsciiDoc.
I think adding a table of contents and adding internal hyperlinks is kind of a nice middle ground where we can make some improvements to the man page format (in the HTML version of the man page at least) without maintaining a totally different form of documentation. Though for this to work you do need to set up a toolchain like Git’s AsciiDoc system.
It would be amazing if there were some kind of universal system to make it easy
to look up a specific option in a man page (“what does -a do?”).
The best trick I know is to use the pager to search for something like ^ *-a
but I never remember to do it and instead just end up going through
every instance of -a in the man page until I find what I’m looking for.
The curl man page has examples for every option, and there’s also a table of contents on the HTML version so you can more easily jump to the option you’re interested in.
For instance the example for --cert makes it easy to see that you likely also want to pass the --key option, like this:
curl --cert certfile --key keyfile https://example.com
The way they implement this is that there’s one file for each option (for example https://github.com/curl/curl/blob/dc08922a61efe546b318daf964514ffbf4158325/docs/cmdline-opts/append.md) and there’s an “Example” field in that file.
Quite a few people said that man ascii was their favourite man page, which looks like this:
Oct Dec Hex Char
───────────────────────────────────────────
000 0 00 NUL '\0' (null character)
001 1 01 SOH (start of heading)
002 2 02 STX (start of text)
003 3 03 ETX (end of text)
004 4 04 EOT (end of transmission)
005 5 05 ENQ (enquiry)
006 6 06 ACK (acknowledge)
007 7 07 BEL '\a' (bell)
010 8 08 BS '\b' (backspace)
011 9 09 HT '\t' (horizontal tab)
012 10 0A LF '\n' (new line)
Obviously man ascii is an unusual man page but I think what’s cool about this man page (other than the fact that it’s always
useful to have an ASCII reference) is that it’s very easy to scan to find the
information you need because of the table format. It makes me wonder if there
are more opportunities to display information in a “table” in a man page to make
it easier to scan.
When I talk about man pages it often comes up that the GNU coreutils man pages (for example man tail) don’t have examples, unlike the OpenBSD man pages, which do have examples.
I’m not going to get into this too much because it seems like a fairly political topic and I definitely can’t do it justice here, but here are some things I believe to be true:
- the GNU project has historically preferred info manuals to man pages, which is part of why the coreutils man pages are so terse
- I’ve heard from some Emacs users that they like the Emacs info browser, but I don’t think I’ve ever talked to anyone who uses the standalone info tool
- after a certain level of complexity a man page gets really hard to navigate: while I’ve never used the coreutils info manual and probably won’t, I would almost certainly prefer to use the GNU Bash reference manual or The GNU C Library Reference Manual via their HTML documentation rather than through a man page
Here are some tools I think are interesting:
- tldr, which shows a short list of examples for a command (for example tldr grep). Lots of people have told me they find it useful.
Man pages are such a constrained format and it’s fun to think about what you can do with such limited formatting options.
Even though I’m very into writing, I’ve always had a bad habit of never reading documentation, so it’s a little bit hard for me to think about what I actually find useful in man pages. I’m not sure whether most of the things in this post would improve my experience or not. (Except for examples, I LOVE examples.)
So I’d be interested to hear about other man pages that you think are well designed and what you like about them, the comments section is here.
For a long time I’ve thought it would be cool to learn a popular web framework like Rails or Django or Laravel, but I’d never really managed to make it happen. A few months back I started learning Django to make a website; I’ve been liking it so far, and here are a few quick notes!
I spent some time trying to learn Rails in 2020,
and while it was cool and I really wanted to like Rails (the Ruby community is great!),
I found that if I left my Rails project alone for months, when I came
back to it it was hard for me to remember how to get anything done because
(for example) if it says resources :topics in your routes.rb, on its own
that doesn’t tell you where the topics routes are configured, you need to
remember or look up the convention.
Being able to abandon a project for months or years and then come back to it is really important to me (that’s how all my projects work!), and Django feels easier to me because things are more explicit.
In my small Django project it feels like I just have 5 main files (other
than the settings files): urls.py, models.py, views.py, admin.py, and
tests.py, and if I want to know where something else is (like an HTML template)
then it’s usually explicitly referenced from one of those files.
For this project I wanted to have an admin interface to manually edit or view some of the data in the database. Django has a really nice built-in admin interface, and I can customize it with just a little bit of code.
For example, here’s part of one of my admin classes, which sets up which fields to display in the “list” view, which field to search on, and how to order them by default.
from django.contrib import admin

# (assuming the Zine model is defined in this app's models.py)
from .models import Zine

@admin.register(Zine)
class ZineAdmin(admin.ModelAdmin):
    list_display = ["name", "publication_date", "free", "slug", "image_preview"]
    search_fields = ["name", "slug"]
    readonly_fields = ["image_preview"]
    ordering = ["-publication_date"]
In the past my attitude has been “ORMs? Who needs them? I can just write my own SQL queries!”.
I’ve been enjoying Django’s ORM so far though, and I think it’s cool how Django
uses __ to represent a JOIN, like this:
Zine.objects.exclude(product__order__email_hash=email_hash)
This query involves 5 tables: zines, zine_products, products, order_products, and orders.
To make this work I just had to tell Django that there’s a ManyToManyField
relating “orders” and “products”, and another ManyToManyField relating
“zines” and “products”, so that it knows how to connect zines, products, and orders.
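Here’s a rough sketch of what those model definitions could look like – the field names and lengths here are made up for illustration, not the real ones – with the ManyToManyFields placed so that Django’s default reverse relation names make the product__order__email_hash lookup above work:

from django.db import models

class Zine(models.Model):
    name = models.CharField(max_length=200)

class Product(models.Model):
    name = models.CharField(max_length=200)
    # the reverse relation from Zine to Product is queried as "product"
    zines = models.ManyToManyField(Zine)

class Order(models.Model):
    email_hash = models.CharField(max_length=64)
    # the reverse relation from Product to Order is queried as "order"
    products = models.ManyToManyField(Product)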
I definitely could write that query, but writing product__order__email_hash is
a lot less typing, it feels a lot easier to read, and honestly I think it would
take me a little while to figure out how to construct the query
(which needs to do a few other things than just those joins).
I have zero concern about the performance of my ORM-generated queries so I’m pretty excited about ORMs for now, though I’m sure I’ll find things to be frustrated with eventually.
The other great thing about the ORM is migrations!
If I add, delete, or change a field in models.py, Django will automatically
generate a migration script like migrations/0006_delete_imageblob.py.
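For reference, the two commands involved are makemigrations (which generates the migration script from your model changes) and migrate (which applies it to the database):

python manage.py makemigrations
python manage.py migrate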
I assume that I could edit those scripts if I wanted, but so far I’ve just been running the generated scripts with no change and it’s been going great. It really feels like magic.
I’m realizing that being able to do migrations easily is important for me right now because I’m changing my data model fairly often as I figure out how I want it to work.
I had a bad habit of never reading the documentation but I’ve been really enjoying the parts of Django’s docs that I’ve read so far. This isn’t by accident: Jacob Kaplan-Moss has a talk from PyCon 2011 on Django’s documentation culture.
For example the intro to models lists the most important common fields you might want to set when using the ORM.
After having a bad experience trying to operate Postgres and not being able to
understand what was going on, I decided to run all of my small websites with
SQLite instead. It’s been going way better, and I love being able to backup by
just doing a VACUUM INTO and then copying the resulting single file.
I’ve been following these instructions for using SQLite with Django in production.
I think it should be fine because I’m expecting the site to have a few hundred writes per day at most, much less than Mess with DNS, which has a lot more writes and has been working well (though the writes are split across 3 different SQLite databases).
Django seems to be very “batteries-included”, which I love – if I want CSRF
protection, or a Content-Security-Policy, or I want to send email, it’s all
in there!
For example, I wanted to save the emails Django sends to a file in dev mode (so that it didn’t send real email to real people), which was just a little bit of configuration.
I just put this in settings/dev.py:
EMAIL_BACKEND = "django.core.mail.backends.filebased.EmailBackend"
EMAIL_FILE_PATH = BASE_DIR / "emails"
and then set up the production email like this in settings/production.py:
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = "smtp.whatever.com"
EMAIL_PORT = 587
EMAIL_USE_TLS = True
EMAIL_HOST_USER = "xxxx"
EMAIL_HOST_PASSWORD = os.getenv('EMAIL_API_KEY')
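To check that an email backend is wired up, you can send yourself a test message from python manage.py shell – with the file-based backend above it just ends up as a file in the emails/ directory (the addresses here are made up):

from django.core.mail import send_mail

send_mail(
    "hello",               # subject
    "just testing",        # message body
    "me@example.com",      # from address
    ["you@example.com"],   # recipient list
)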
That made me feel like if I want some other basic website feature, there’s likely to be an easy way to do it built into Django already.
I’m still a bit intimidated by the settings.py file: Django’s settings system
works by setting a bunch of global variables in a file, and I feel a bit
stressed about… what if I make a typo in the name of one of those variables?
How will I know? What if I type WSGI_APPLICATOIN = "config.wsgi.application"
instead of WSGI_APPLICATION?
I guess I’ve gotten used to having a Python language server tell me when I’ve made a typo and so now it feels a bit disorienting when I can’t rely on the language server support.
I haven’t really successfully used an actual web framework for a project before (right now almost all of my websites are either a single Go binary or static sites), so I’m interested in seeing how it goes!
There’s still lots for me to learn: I still haven’t really gotten into Django’s form validation tooling or authentication systems.
Thanks to Marco Rogers for convincing me to give ORMs a chance.
(we’re still experimenting with the comments-on-Mastodon system! Here are the comments on Mastodon! tell me your favourite Django feature!)
So Marie and I made a few changes to the Git documentation!
After a while working on the documentation, we noticed that Git uses the terms “object”, “reference”, or “index” in its documentation a lot, but that it didn’t have a great explanation of what those terms mean or how they relate to other core concepts like “commit” and “branch”. So we wrote a new “data model” document!
You can read the data model here for now. I assume at some point (after the next release?) it’ll also be on the Git website.
I’m excited about this because understanding how Git organizes its commit and branch data has really helped me reason about how Git works over the years, and I think it’s important to have a short (1600 words!) version of the data model that’s accurate.
The “accurate” part turned out to not be that easy: I knew the basics of how Git’s data model worked, but during the review process I learned some new details and had to make quite a few changes (for example how merge conflicts are stored in the staging area).
I also worked on updating the introductions to some of Git’s core man pages, like git push and git pull. I quickly realized that “just try to improve it according to my best judgement” was not going to work: why should the maintainers believe me that my version is better?
I’ve seen a problem a lot when discussing open source documentation changes where 2 expert users of the software argue about whether an explanation is clear or not (“I think X would be a good way to explain it! Well, I think Y would be better!”)
I don’t think this is very productive (expert users of a piece of software are notoriously bad at being able to tell if an explanation will be clear to non-experts), so I needed to find a way to identify problems with the man pages that was a little more evidence-based.
I asked for test readers on Mastodon to read the current version of documentation and tell me what they find confusing or what questions they have. About 80 test readers left comments, and I learned so much!
People left a huge amount of great feedback, for example:
Most of the test readers had been using Git for at least 5-10 years, which I think worked well – if a group of test readers who have been using Git regularly for 5+ years find a sentence or term impossible to understand, it makes it easy to argue that the documentation should be updated to make it clearer.
I thought this “get users of the software to comment on the existing documentation and then fix the problems they find” pattern worked really well and I’m excited about potentially trying it again in the future.
We ended up updating these 4 man pages:

- git add (before, after)
- git checkout (before, after)
- git push (before, after)
- git pull (before, after)

The git push and git pull changes were the most interesting to me: in addition to updating the intro to those pages, we also ended up writing some new sections.
Making those changes really gave me an appreciation for how much work it is
to maintain open source documentation: it’s not easy to write things that are
both clear and true, and sometimes we had to make compromises, for example the sentence
“git push may fail if you haven’t set an upstream for the current branch,
depending on what push.default is set to.” is a little vague, but the exact
details of what “depending” means are really complicated and untangling that is
a big project.
It took me a while to understand Git’s development process. I’m not going to try to describe it here (that could be a whole other post!), but a few quick notes:
I also found the mailing list archives on lore.kernel.org hard to navigate, so I hacked together my own git list viewer to make it easier to read the long mailing list threads.
Many people helped me navigate the contribution process and review the changes: thanks to Emily Shaffer, Johannes Schindelin (the author of GitGitGadget), Patrick Steinhardt, Ben Knoble, Junio Hamano, and more.
(I’m experimenting with comments on Mastodon, you can see the comments here)
I’ve been using the Helix text editor for 3 months now and here are a few notes.
I think what motivated me to try Helix is that I’ve been trying to get a working language server setup (so I can do things like “go to definition”) and getting a setup that feels good in Vim or Neovim just felt like too much work.
After using Vim/Neovim for 20 years, I’ve tried both “build my own custom configuration from scratch” and “use someone else’s pre-built configuration system”, and even though I love Vim I was excited about having things just work without having to work on my configuration at all.
Helix comes with built-in language server support, and it feels nice to be able to do things like “rename this symbol” in any language.
One of my favourite things about Helix is the search! If I’m searching all the files in my repository for a string, it lets me scroll through the potential matching files and see the full context of the match, like this:
For comparison, here’s what the vim ripgrep plugin I’ve been using looks like:
There’s no context for what else is around that line.
One thing I like about Helix is that when I press g, I get a little help popup
telling me places I can go. I really appreciate this because I don’t often use
the “go to definition” or “go to reference” feature and I often forget the
keyboard shortcut.
A few other notes on how I’ve adapted:

- Helix doesn’t have vim-style marks (ma, 'a); instead I’ve been using Ctrl+O and Ctrl+I to go back (or forward) to the last cursor location
- to change a lot of things at once, I press % (to highlight everything), then s to select (with a regex) the things I want to change, then I can just edit all of them as needed
- there are no tabs, but there’s a buffer picker (<space>b) I can use to switch to the buffer I want. There’s a pull request here to implement neovim-style tabs. There’s also a setting bufferline="multiple" which can act a bit like tabs, with gp, gn for prev/next “tab” and :bc to close a “tab”.

Here’s everything that’s annoyed me about Helix so far:

- :reflow works much less well than how vim reflows text with gq: it doesn’t work as well with lists. (github issue)
- if files change on disk, I have to run :reload-all (:ra<tab>) to manually reload them. Not a big deal.

The “markdown list” and reflowing issues come up a lot for me because I spend a lot of time editing Markdown lists, but I keep using Helix anyway so I guess they can’t be making me that mad.
I was worried that relearning 20 years of Vim muscle memory would be really hard.
It turned out to be easier than I expected, I started using Helix on a vacation for a little low-stakes coding project I was doing on the side and after a week or two it didn’t feel so disorienting anymore. I think it might be hard to switch back and forth between Vim and Helix, but I haven’t needed to use Vim recently so I don’t know if that’ll ever become an issue for me.
The first time I tried Helix I tried to force it to use keybindings that were more similar to Vim and that did not work for me. Just learning the “Helix way” was a lot easier.
There are still some things that throw me off: for example w in vim and w in
Helix don’t have the same idea of what a “word” is (the Helix one includes the
space after the word, the Vim one doesn’t).
For many years I’d mostly been using a GUI version of vim/neovim, so switching to actually using an editor in the terminal was a bit of an adjustment.
I ended up deciding on a pretty simple setup, and it works pretty well – I might actually like it better than my previous workflow.
I appreciate that my configuration is really simple, compared to my neovim configuration which is hundreds of lines. It’s mostly just 4 keyboard shortcuts.
theme = "solarized_light"
[editor]
# Sync clipboard with system clipboard
default-yank-register = "+"
[keys.normal]
# I didn't like that Ctrl+C was the default "toggle comments" shortcut
"#" = "toggle_comments"
# I didn't feel like learning a different way
# to go to the beginning/end of a line so
# I remapped ^ and $
"^" = "goto_first_nonwhitespace"
"$" = "goto_line_end"
[keys.select]
"^" = "goto_first_nonwhitespace"
"$" = "goto_line_end"
[keys.normal.space]
# I write a lot of text so I need to constantly reflow,
# and missed vim's `gq` shortcut
l = ":reflow"
There’s a separate languages.toml configuration where I set some language
preferences, like turning off autoformatting.
For example, here’s my Python configuration:
[[language]]
name = "python"
formatter = { command = "black", args = ["--stdin-filename", "%{buffer_name}", "-"] }
language-servers = ["pyright"]
auto-format = false
Three months is not that long, and it’s possible that I’ll decide to go back to Vim at some point. For example, I wrote a post about switching to nix a while back but after maybe 8 months I switched back to Homebrew (though I’m still using NixOS to manage one little server, and I’m still satisfied with that).
You can get it for $12 here: https://wizardzines.com/zines/terminal, or get a 15-pack of all my zines here.
Here’s the cover:
Here’s the table of contents:
I’ve been using the terminal every day for 20 years but even though I’m very confident in the terminal, I’ve always had a bit of an uneasy feeling about it. Usually things work fine, but sometimes something goes wrong and it just feels like investigating it is impossible, or at least like it would open up a huge can of worms.
So I started trying to write down a list of weird problems I’ve run into in the terminal and I realized that the terminal has a lot of tiny inconsistencies like:
- sometimes you press the left arrow key and ^[[D gets printed instead of the cursor moving

If you use the terminal daily for 10 or 20 years, even if you don’t understand exactly why these things happen, you’ll probably build an intuition for them.
But having an intuition for them isn’t the same as understanding why they happen. When writing this zine I actually had to do a lot of work to figure out exactly what was happening in the terminal to be able to talk about how to reason about it.
It turns out that the “rules” for how the terminal works (how do
you edit a command you type in? how do you quit a program? how do you fix your
colours?) are extremely hard to fully understand, because “the terminal” is actually
made of many different pieces of software (your terminal emulator, your
operating system, your shell, the core utilities like grep, and every other random
terminal program you’ve installed) which are written by different people with different
ideas about how things should work.
So I wanted to write something that would explain those rules.
Terminal internals are a mess. A lot of it is just the way it is because someone made a decision in the 80s and now it’s impossible to change, and honestly I don’t think learning everything about terminal internals is worth it.
But some parts are not that hard to understand and can really make your experience in the terminal better, like:
- that if cating a binary to stdout messes up your terminal, you can just type reset and move on

When I wrote How Git Works, I thought I
knew how Git worked, and I was right. But the terminal is different. Even
though I feel totally confident in the terminal and even though I’ve used it
every day for 20 years, I had a lot of misunderstandings about how the terminal
works and (unless you’re the author of tmux or something) I think there’s a
good chance you do too.
A few things I learned that are actually useful to me:
- how reset works under the hood (it does the equivalent of stty sane; sleep 1; tput reset) – basically I learned that I don’t ever need to worry about remembering stty sane or tput reset and I can just run reset instead
- that I can use unbuffer to capture a program’s output as if it were going to a terminal (unbuffer program > out; less out)
- why some REPLs like sqlite3 are so annoying to use (they use libedit instead of readline)

As usual these days I wrote a bunch of blog posts about various side quests, including asking whether the terminfo database is serving us well today.

A long time ago I used to write zines mostly by myself but with every project I get more and more help. I met with Marie Claire LeBlanc Flanagan every weekday from September to June to work on this one.
The cover is by Vladimir Kašiković, Lesley Trites did copy editing, Simon Tatham (who wrote PuTTY) did technical review, our Operations Manager Lee did the transcription as well as a million other things, and Jesse Luehrs (who is one of the very few people I know who actually understands the terminal’s cursed inner workings) had so many incredibly helpful conversations with me about what is going on in the terminal.
Here are some links to get the zine again:
As always, you can get either a PDF version to print at home or a print version shipped to your house. The only caveat is print orders will ship in August – I need to wait for orders to come in to get an idea of how many I should print before sending it to the printer.
For a long time, my approach to compiling C programs was basically “run make, and if
it doesn’t work, either try to find a binary someone has compiled or give up”.
“Hope someone else has compiled it” worked pretty well when I was running Linux but since I’ve been using a Mac for the last couple of years I’ve been running into more situations where I have to actually compile programs myself.
So let’s talk about what you might have to do to compile a C program! I’ll use a couple of examples of specific C programs I’ve compiled and talk about a few things that can go wrong. The three programs that come up in this post are paperjam, sqlite, and qf.

First, you need a C compiler. This is pretty simple: on an Ubuntu system if I don’t already have a C compiler I’ll install one with:
sudo apt-get install build-essential
This installs gcc, g++, and make. The situation on a Mac is more
confusing but it’s something like “install xcode command line tools”.
Unlike some newer programming languages, C doesn’t have a dependency manager. So if a program has any dependencies, you need to hunt them down yourself. Thankfully because of this, C programmers usually keep their dependencies very minimal and often the dependencies will be available in whatever package manager you’re using.
There’s almost always a section explaining how to get the dependencies in the README, for example in paperjam’s README, it says:
To compile PaperJam, you need the headers for the libqpdf and libpaper libraries (usually available as libqpdf-dev and libpaper-dev packages).
You may need a2x (found in AsciiDoc) for building manual pages.
So on a Debian-based system you can install the dependencies like this:
sudo apt install -y libqpdf-dev libpaper-dev
If a README gives a name for a package (like libqpdf-dev), I’d basically
always assume that they mean “in a Debian-based Linux distro”: if you’re on a
Mac brew install libqpdf-dev will not work. I still have not 100% gotten
the hang of developing on a Mac yet so I don’t have many tips there yet. I
guess in this case it would be brew install qpdf if you’re using Homebrew.
Some C programs come with a Makefile, and some instead come with a script called
./configure that you may need to run first. For example, if you download sqlite’s source code, it has a ./configure script in
it instead of a Makefile.
My understanding of this ./configure script is:

- you run it, and it either generates a Makefile or fails because you’re missing some dependency
- the ./configure script is part of a system called autotools that I have never needed to learn anything about beyond “run it to generate a Makefile”

I think there might be some options you can pass to get the ./configure script to produce a different Makefile but I have never done that.
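So the usual flow looks something like this (--prefix is just one common example of a ./configure option – it controls where make install will eventually put things – not something these programs necessarily need):

./configure --prefix=$HOME/.local   # checks your system and generates a Makefile
make                                # builds the program using that Makefile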
The next step is to run make to try to build the program. Some notes about make:

- you can usually run make -j8 to parallelize the build and make it go faster

Here’s an error I got while compiling paperjam on my Mac:
/opt/homebrew/Cellar/qpdf/12.0.0/include/qpdf/InputSource.hh:85:19: error: function definition does not declare parameters
85 | qpdf_offset_t last_offset{0};
| ^
Over the years I’ve learned it’s usually best not to overthink problems like
this: if it’s talking about qpdf, there’s a good chance it just means that
I’ve done something wrong with how I’m including the qpdf dependency.
Now let’s talk about some ways to get the qpdf dependency included in the right way.
Before we talk about how to fix dependency problems: building C programs is split into 2 steps:

1. compiling (where source files get turned into object files, using a compiler like gcc or clang)
2. linking (where the object files get combined into a binary, using a linker like ld)

It’s important to know this when building a C program because sometimes you need to pass the right flags to the compiler and linker to tell them where to find the dependencies for the program you’re compiling.
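As a rough sketch, the two steps look something like this (the file and program names here are made up):

c++ -c pdf.cc -o pdf.o           # compile: turn a source file into an object file
c++ -o myprogram pdf.o -lqpdf    # link: combine object files and libraries into a binary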
make uses environment variables to configure the compiler and linker. If I run make on my Mac to build paperjam, I get this error:
c++ -o paperjam paperjam.o pdf-tools.o parse.o cmds.o pdf.o -lqpdf -lpaper
ld: library 'qpdf' not found
This is not because qpdf is not installed on my system (it actually is!). But
the compiler and linker don’t know how to find the qpdf library. To fix this, we need to:
"-I/opt/homebrew/include" to the compiler (to tell it where to find the header files)"-L/opt/homebrew/lib -liconv" to the linker (to tell it where to find library files and to link in iconv)And we can get make to pass those extra parameters to the compiler and linker using environment variables!
To see how this works: inside paperjam’s Makefile you can see a bunch of environment variables, like LDLIBS here:
paperjam: $(OBJS)
$(LD) -o $@ $^ $(LDLIBS)
Everything you put into the LDLIBS environment variable gets passed to the
linker (ld) as a command line argument.
Makefiles sometimes define their own environment variables that they pass to
the compiler/linker, but make also has a bunch of “implicit” environment
variables which it will automatically pass to the C compiler and linker. There’s a full list of implicit environment variables here,
but one of them is CPPFLAGS, which gets automatically passed to the C compiler.
(technically it would be more normal to use CXXFLAGS for this, but this
particular Makefile hardcodes CXXFLAGS so setting CPPFLAGS was the only
way I could find to set the compiler flags without editing the Makefile)
I learned thanks to @zwol that there are actually two ways to pass environment variables to make:

- CXXFLAGS=xyz make (the usual way)
- make CXXFLAGS=xyz

The difference between them is that make CXXFLAGS=xyz will override the
value of CXXFLAGS set in the Makefile but CXXFLAGS=xyz make won’t.
I’m not sure which way is the norm but I’m going to use the first way in this post.
Now that we’ve talked about how CPPFLAGS and LDLIBS get passed to the
compiler and linker, here’s the final incantation that I used to get the
program to build successfully!
CPPFLAGS="-I/opt/homebrew/include" LDLIBS="-L/opt/homebrew/lib -liconv" make paperjam
This passes -I/opt/homebrew/include to the compiler and -L/opt/homebrew/lib -liconv to the linker.
Also I don’t want to pretend that I “magically” knew that those were the right arguments to pass, figuring them out involved a bunch of confused Googling that I skipped over in this post. I will say that:
- the -I compiler flag tells the compiler which directory to find header files in, like /opt/homebrew/include/qpdf/QPDF.hh
- the -L linker flag tells the linker which directory to find libraries in, like /opt/homebrew/lib/libqpdf.a
- the -l linker flag tells the linker which libraries to link in: -liconv means “link in the iconv library”, and -lm means “link math”

Yesterday I discovered this cool tool called qf which you can use to quickly open files from the output of ripgrep.
qf is in a big directory of various tools, but I only wanted to compile qf.
So I just compiled qf, like this:
make qf
Basically if you know (or can guess) the output filename of the file you’re
trying to build, you can tell make to just build that file by running make $FILENAME.
I sometimes write 5-line C programs with no dependencies, and I just learned
that if I have a file called blah.c, I can just compile it like this without creating a Makefile:
make blah
It gets automatically expanded to cc -o blah blah.c, which saves a bit of
typing. I have no idea if I’m going to remember this (I might just keep typing
gcc -o blah blah.c anyway) but it seems like a fun trick.
If you’re having trouble building a C program, maybe other people had problems building it too! Every Linux distribution has build files for every package that they build, so even if you can’t install packages from that distribution directly, maybe you can get tips from that Linux distro for how to build the package. Realizing this (thanks to my friend Dave) was a huge ah-ha moment for me.
For example, this line from the nix package for paperjam says:
env.NIX_LDFLAGS = lib.optionalString stdenv.hostPlatform.isDarwin "-liconv";
This is basically saying “pass the linker flag -liconv to build this on a
Mac”, so that’s a clue we could use to build it.
That same file also says env.NIX_CFLAGS_COMPILE = "-DPOINTERHOLDER_TRANSITION=1";. I’m not sure what this means, but when I try
to build the paperjam package I do get an error about something called a
PointerHolder, so I guess that’s somehow related to the “PointerHolder
transition”.
Once you’ve managed to compile the program, probably you want to install it somewhere!
Some Makefiles have an install target that lets you install the tool on your
system with make install. I’m always a bit scared of this (where is it going
to put the files? what if I want to uninstall them later?), so if I’m compiling
a pretty simple program I’ll often just manually copy the binary to install it
instead, like this:
cp qf ~/bin
Once I figured out how to do all of this, I realized that I could use my new
make knowledge to contribute a paperjam package to Homebrew! Then I could
just brew install paperjam on future systems.
The good thing is that even though the details of all of the different packaging systems differ, they fundamentally all use C compilers and linkers.
I think all of this is an interesting example of how it can be useful to understand some basics of how C programs work (like “they have header files”) even if you’re never planning to write a nontrivial C program in your life.
It feels good to have some ability to compile C/C++ programs myself, even
though I’m still not totally confident about all of the compiler and linker
flags and I still plan to never learn anything about how autotools works other
than “you run ./configure to generate the Makefile”.
Two things I left out of this post:

- LD_LIBRARY_PATH / DYLD_LIBRARY_PATH (which you use to tell the dynamic linker at runtime where to find dynamically linked files) – I can’t remember the last time I ran into an LD_LIBRARY_PATH issue, so I couldn’t find an example
- pkg-config, which I think is important but I don’t understand yet

For a long time I was vaguely aware of ANSI escape codes (“that’s how you make text red in the terminal and stuff”) but I had no real understanding of where they were supposed to be defined or whether or not there were standards for them. I just had a kind of vague “there be dragons” feeling around them. While learning about the terminal this year, though, I’ve learned that there actually are some standards for them.
So I wanted to put together a list for myself of some standards that exist around escape codes, because I want to know if they have to feel unreliable and frustrating, or if there’s a future where we could all rely on them with more confidence.
Have you ever pressed the left arrow key in your terminal and seen ^[[D?
That’s an escape code! It’s called an “escape code” because the first character
is the “escape” character, which is usually written as ESC, \x1b, \E,
\033, or ^[.
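A couple of quick ways to poke at escape codes yourself: run cat with no arguments and press the left arrow key to see the code your arrow key sends get echoed back, or use printf to send a colour code to the terminal:

$ cat          # press the left arrow key, then Ctrl+C to quit
^[[D
$ printf '\x1b[31mhi\x1b[0m\n'   # \x1b[31m turns text red, \x1b[0m resets it
hi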
Escape codes are how your terminal emulator communicates various kinds of information (colours, mouse movement, etc) with programs running in the terminal. There are two kinds of escape codes:

- input codes, which the terminal emulator sends to the program: for example the left arrow key is ESC[D, “Ctrl+left arrow” might be ESC[1;5D, and clicking the mouse might be something like ESC[M :3.
- output codes, which the program sends to the terminal emulator to do things like change colours or move the cursor

Now let’s talk about standards!
The first standard I found relating to escape codes was ECMA-48, which was originally published in 1976.
ECMA-48 does two things:

- it defines some general formats for escape codes (like “CSI” codes, which are ESC[ + something, and “OSC” codes, which are ESC] + something)
- it defines some specific escape codes, like how “move the cursor left” is ESC[D, or “turn text red” is ESC[31m. In the spec, the “cursor left” one is called CURSOR LEFT and the one for changing colours is called SELECT GRAPHIC RENDITION.

The formats are extensible, so there’s room for others to define more escape codes in the future. Lots of escape codes that are popular today aren’t defined in ECMA-48: for example it’s pretty common for terminal applications (like vim, htop, or tmux) to support using the mouse, but ECMA-48 doesn’t define escape codes for the mouse.
There are a bunch of escape codes that aren’t defined in ECMA-48, for example the codes for mouse support mentioned above, or the code for setting the terminal window’s title. I believe (correct me if I’m wrong!) that these and some others came from xterm, are documented in XTerm Control Sequences, and have been widely implemented by other terminal emulators.
This list of “what xterm supports” is not a standard exactly, but xterm is extremely influential and so it seems like an important document.
In the 80s (and to some extent today, but my understanding is that it was MUCH more dramatic in the 80s) there was a huge amount of variation in what escape codes terminals actually supported.
To deal with this, there’s a database of escape codes for various terminals called “terminfo”.
It looks like the standard for terminfo is called X/Open Curses, though you need to create an account to view that standard for some reason. It defines the database format as well as a C library interface (“curses”) for accessing the database.
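The tput command is an easy way to see terminfo in action: it looks up a capability for whatever terminal is named in your TERM environment variable and prints the escape code (setaf, sgr0, and clear are standard capability names):

tput setaf 1       # the "set foreground colour 1" (usually red) code for this terminal
tput sgr0          # the "reset attributes" code
tput clear | xxd   # the "clear screen" code, piped through xxd so you can see the bytes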
For example you can run this bash snippet to see every possible escape code for “clear screen” for all of the different terminals your system knows about:
for term in $(toe -a | awk '{print $1}')
do
echo $term
infocmp -1 -T "$term" 2>/dev/null | grep 'clear=' | sed 's/clear=//g;s/,//g'
done
On my system (and probably every system I’ve ever used?), the terminfo database is managed by ncurses.
I think it’s interesting that there are two main approaches that applications take to handling ANSI escape codes:

1. use the terminfo database to look up the escape codes for the terminal named in the TERM environment variable. Fish does this, for example.
2. hardcode a “common set” of escape codes that works in most terminals, and don’t use terminfo at all

Some other programs/libraries take approach #2 (“don’t use terminfo”).
I got curious about why folks might be moving away from terminfo and I found this very interesting and extremely detailed rant about terminfo from one of the fish maintainers, which argues that:
[the terminfo authors] have done a lot of work that, at the time, was extremely important and helpful. My point is that it no longer is.
I’m not going to do it justice so I’m not going to summarize it, I think it’s worth reading.
I was just talking about the idea that you can use a “common set” of escape codes that will work for most people. But what is that set? Is there any agreement?
I really do not know the answer to this at all, but from doing some reading it seems like it’s some combination of:
and maybe ultimately “identify the terminal emulators you think your users are going to use most frequently and test in those”, the same way web developers do when deciding which CSS features are okay to use
I don’t think there are any resources like Can I use…? or Baseline for the terminal though. (in theory terminfo is supposed to be the “caniuse” for the terminal but it seems like it often takes 10+ years to add new terminal features when people invent them which makes it very limited)
I also asked on Mastodon why people found terminfo valuable in 2025 and got a few reasons that made sense to me:
- some people use the TERM environment variable to
control how programs behave (for example with TERM=dumb), and there’s
no standard for how that should work in a post-terminfo world

The way that ncurses uses the TERM environment variable to decide which
escape codes to use reminds me of how webservers used to sometimes use the
browser user agent to decide which version of a website to serve.
It also seems like it’s had some of the same results – the way iTerm2 reports itself as being “xterm-256color” feels similar to how Safari’s user agent is “Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.3 Safari/605.1.15”. In both cases the terminal emulator / browser ends up changing its user agent to get around user agent detection that isn’t working well.
On the web we ended up deciding that user agent detection was not a good practice and to instead focus on standardization so we can serve the same HTML/CSS to all browsers. I don’t know if the same approach is the future in the terminal though – I think the terminal landscape today is much more fragmented than the web ever was as well as being much less well funded.
A few more documents and standards related to escape codes, in no particular order:
I sometimes see people saying that the unix terminal is “outdated”, and since I love the terminal so much I’m always curious about what incremental changes might make it feel less “outdated”.
Maybe if we had a clearer standards landscape (like we do on the web!) it would be easier for terminal emulator developers to build new features and for authors of terminal applications to more confidently adopt those features so that we can all benefit from them and have a richer experience in the terminal.
Obviously standardizing ANSI escape codes is not easy (ECMA-48 was first published almost 50 years ago and we’re still not there!). I don’t even know what all of the challenges are. But the situation with HTML/CSS/JS used to be extremely bad too and now it’s MUCH better, so maybe there’s hope.
A lot of instructions for adding a directory to your PATH just say “add this to ~/.bashrc”, but what if you’re not using bash? What if your
bash config is actually in a different file? And how are you supposed to figure
out which directory to add anyway?
So I wanted to try to write down some more complete directions and mention some of the gotchas I’ve run into over the years.
Here’s a table of contents:
If you’re not sure what shell you’re using, here’s a way to find out. Run this:
ps -p $$ -o pid,comm=
- if you’re using bash, it’ll print out something like 97295 bash
- if you’re using zsh, it’ll print out something like 97295 zsh
- if you’re using fish, it’ll print out an error ($$ isn’t valid syntax in fish, but in any case the error message tells you that you’re using fish, which you probably already knew)

Also bash is the default on Linux and zsh is the default on Mac OS (as of 2024). I’ll only cover bash, zsh, and fish in these directions.
Now you need to find your shell’s config file:

- zsh: ~/.zshrc
- bash: usually ~/.bashrc, but it’s complicated, see the note in the next section
- fish: ~/.config/fish/config.fish (you can run echo $__fish_config_dir if you want to be 100% sure)

Bash has three possible config files: ~/.bashrc, ~/.bash_profile, and ~/.profile.
If you’re not sure which one your system is set up to use, I’d recommend testing this way:
1. add echo hi there to your ~/.bashrc
2. restart your terminal
3. if you see “hi there”, that means ~/.bashrc is being used! Hooray!
4. otherwise remove it, add it to ~/.bash_profile instead, and test again
5. try ~/.profile if the first two options don’t work.

(there are a lot of elaborate flow charts out there that explain how bash decides which config file to use but IMO it’s not worth it to internalize them and just testing is the fastest way to be sure)
Let’s say that you’re trying to install and run a program called http-server
and it doesn’t work, like this:
$ npm install -g http-server
$ http-server
bash: http-server: command not found
How do you find what directory http-server is in? Honestly in general this is
not that easy – often the answer is something like “it depends on how npm is
configured”. A few ideas:
- often when you install a tool with a package manager (cargo, npm, homebrew, etc),
when you first set it up it’ll print out some directions about how to update
your PATH. So if you’re paying attention you can get the directions then.
- some installers will update your PATH for you automatically
- for npm, you can run npm config get prefix (then append /bin/)
- for Go, you can run go env GOPATH (then append /bin/)
- for asdf, you can run asdf info | grep ASDF_DIR (then append /bin/ and /shims/)

Once you’ve found a directory you think might be the right one, make sure it’s
actually correct! For example, I found out that on my machine, http-server is
in ~/.npm-global/bin. I can make sure that it’s the right directory by trying to
run the program http-server in that directory like this:
$ ~/.npm-global/bin/http-server
Starting up http-server, serving ./public
It worked! Now that you know what directory you need to add to your PATH,
let’s move to the next step!
Now we have the 2 critical pieces of information we need:
1. the directory you’re trying to add (like ~/.npm-global/bin/)
2. your shell’s config file (like ~/.bashrc, ~/.zshrc, or ~/.config/fish/config.fish)

Now what you need to add depends on your shell:
bash instructions:
Open your shell’s config file, and add a line like this:
export PATH=$PATH:~/.npm-global/bin/
(obviously replace ~/.npm-global/bin with the actual directory you’re trying to add)
zsh instructions:
You can do the same thing as in bash, but zsh also has some slightly fancier syntax you can use if you prefer:
path=(
$path
~/.npm-global/bin
)
fish instructions:
In fish, the syntax is different:
set PATH $PATH ~/.npm-global/bin
(in fish you can also use fish_add_path, some notes on that further down)
Now, an extremely important step: updating your shell’s config won’t take effect if you don’t restart it!
Two ways to do this:
- open a new terminal window or tab
- run bash to start a new shell (or zsh if you’re using zsh, or fish if you’re using fish)

I’ve found that both of these usually work fine.
And you should be done! Try running the program you were trying to run and hopefully it works now.
If not, here are a couple of problems that you might run into:
If the wrong version of a program is running, you might need to add the directory to the beginning of your PATH instead of the end.
For example, on my system I have two versions of python3 installed, which I
can see by running which -a:
$ which -a python3
/usr/bin/python3
/opt/homebrew/bin/python3
The one your shell will use is the first one listed.
If you want to use the Homebrew version, you need to add that directory
(/opt/homebrew/bin) to the beginning of your PATH instead, by putting this in
your shell’s config file (it’s /opt/homebrew/bin/:$PATH instead of the usual $PATH:/opt/homebrew/bin/)
export PATH=/opt/homebrew/bin/:$PATH
or in fish:
set PATH /opt/homebrew/bin $PATH
All of these directions only work if you’re running the program from your shell. If you’re running the program from an IDE, from a GUI, in a cron job, or some other way, you’ll need to add the directory to your PATH in a different way, and the exact details might depend on the situation.
If you’re running the program from a cron job, some options:

- use the program’s full path in the crontab, for example /home/bork/bin/my-program
- set PATH explicitly at the top of the crontab (you can print out a line to copy with echo "PATH=$PATH")

I’m honestly not sure how to handle it in an IDE/GUI because I haven’t run into that in a long time, will add directions here if someone points me in the right direction.
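As a rough sketch, setting PATH at the top of a crontab looks something like this (the directories and program name here are made up):

PATH=/usr/local/bin:/usr/bin:/bin:/home/bork/bin
0 * * * * my-program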
If you edit your PATH and start a new shell by running bash (or zsh, or
fish), you’ll often end up with duplicate PATH entries, because the shell
keeps adding new things to your PATH every time you start your shell.
Personally I don’t think I’ve run into a situation where this kind of
duplication breaks anything, but the duplicates can make it harder to debug
what’s going on with your PATH if you’re trying to understand its contents.
Some ways you could deal with this:
- the next time you add something to your PATH, open a new terminal to do it in so you get
a “fresh” state. This should avoid the duplication.
- deduplicate your PATH at the end of your shell’s config (for example in
zsh apparently you can do this with typeset -U path)
- check whether a directory is already in your PATH when adding it (for
example in fish I believe you can do this with fish_add_path --path /some/directory)

How to deduplicate your PATH is shell-specific and there isn’t always a
built-in way to do it, so you’ll need to look up how to accomplish it in your
shell.
Here’s a situation that’s easy to get into in bash or zsh:

1. you run a command to add a directory to your PATH
2. you run bash to reload your config
3. later you search your shell history for the command you ran to edit your PATH, and it isn’t there

This happens because in bash, by default, history is not saved until you exit the shell.
Some options for fixing this:
- instead of running bash to reload your config, run source ~/.bashrc (or
source ~/.zshrc in zsh). This will reload the config inside your current
session.

When you install cargo (Rust’s installer) for the first time, it gives you
these instructions for how to set up your PATH, which don’t mention a specific
directory at all.
This is usually done by running one of the following (note the leading DOT):
. "$HOME/.cargo/env" # For sh/bash/zsh/ash/dash/pdksh
source "$HOME/.cargo/env.fish" # For fish
The idea is that you add that line to your shell’s config, and their script
automatically sets up your PATH (and potentially other things) for you.
This is pretty common (for example Homebrew suggests you eval brew shellenv), and there are
two ways to approach this:
. "$HOME/.cargo/env" to your shell’s config). "$HOME/.cargo/env" in my shell (or the fish version if using fish)echo "$PATH" | tr ':' '\n' | grep cargo to figure out which directories it added/Users/bork/.cargo/bin and shorten that to ~/.cargo/bin~/.cargo/bin to PATH (with the directions in this post)I don’t think there’s anything wrong with doing what the tool suggests (it might be the “best way”!), but personally I usually use the second approach because I prefer knowing exactly what configuration I’m changing.
fish has a handy function called fish_add_path that you can run to add a directory to your PATH like this:
fish_add_path /some/directory
This is cool (it’s such a simple command!) but I’ve stopped using it for a couple of reasons:
- sometimes fish_add_path will update the PATH for every session in the
future (with a “universal variable”) and sometimes it will update the PATH
just for the current session, and it’s hard for me to tell which one it will
do. In theory the docs explain this but I could not understand them.
- if you ever want to remove the directory from your PATH a few weeks or
months later because maybe you made a mistake, it’s kind of hard to do
(there are instructions in the comments of this github issue though).

Hopefully this will help some people. Let me know (on Mastodon or Bluesky) if there are other major gotchas that have tripped you up when adding a directory to your PATH, or if you have questions about this post!
A while back I ran a survey asking: “What’s the most frustrating thing about using the terminal for you?”
1600 people answered, and I decided to spend a few days categorizing all the responses. Along the way I learned that classifying qualitative data is not easy but I gave it my best shot. I ended up building a custom tool to make it faster to categorize everything.
As with all of my surveys the methodology isn’t particularly scientific. I just posted the survey to Mastodon and Twitter, ran it for a couple of days, and got answers from whoever happened to see it and felt like responding.
Here are the top categories of frustrations!
I think it’s worth keeping in mind while reading these comments that they aren’t coming from total beginners.
Here are the categories of frustrations! The number in brackets is the number of people with that frustration. I’m mostly writing this up for myself because I’m trying to write a zine about the terminal and I wanted to get a sense for what people are having trouble with.
People talked about struggles remembering:
One example comment:
There are just so many little “trivia” details to remember for full functionality. Even after all these years I’ll sometimes forget whether it’s 2 or 1 for stderr, or forget which is which for
> and >>.
People talked about struggling with switching systems (for example home/work computer or when SSHing) and running into differences between them,
as well as differences inside the same system like pagers being not consistent with each other (git diff pagers, other pagers).
One example comment:
I got used to fish and vi mode which are not available when I ssh into servers, containers.
There were lots of problems with color.
This comment felt relatable to me:
Getting my terminal theme configured in a reasonable way between the terminal emulator and fish (I did this years ago and remember it being tedious and fiddly and now feel like I’m locked into my current theme because it works and I dread touching any of that configuration ever again).
Half of the comments on keyboard shortcuts were about how on Linux/Windows, the keyboard shortcut to copy/paste in the terminal is different from in the rest of the OS.
Some other issues with keyboard shortcuts other than copy/paste:
- accidentally pressing Ctrl-W in a browser-based terminal and closing the window
- not having enough modifier keys to work with (no Ctrl-Shift-, no Super, no Hyper, lots of ctrl- shortcuts aren’t possible like Ctrl-,)
- programs using shortcuts like Ctrl+left arrow for something else
There were lots of comments about this, which all came down to the same basic complaint – it’s hard to discover useful tools or features! This comment kind of summed it all up:
How difficult it is to learn independently. Most of what I know is an assorted collection of stuff I’ve been told by random people over the years.
A lot of comments about it generally having a steep learning curve. A couple of example comments:
After 15 years of using it, I’m not much faster than using it than I was 5 or maybe even 10 years ago.
and
That I know I could make my life easier by learning more about the shortcuts and commands and configuring the terminal but I don’t spend the time because it feels overwhelming.
Some issues with shell history:
One example comment:
It wasted a lot of time until I figured it out and still annoys me that “history” on zsh has such a small buffer; I have to type “history 0” to get any useful length of history.
People talked about:
Here’s a representative comment:
Finding good examples and docs. Man pages often not enough, have to wade through stack overflow
A few issues with scrollback:
One example comment:
When resizing the terminal (in particular: making it narrower) leads to broken rewrapping of the scrollback content because the commands formatted their output based on the terminal window width.
Lots of comments about how the terminal feels hampered by legacy decisions and how users often end up needing to learn implementation details that feel very esoteric. One example comment:
Most of the legacy cruft, it would be great to have a green field implementation of the CLI interface.
Lots of complaints about POSIX shell scripting. There’s a general feeling that shell scripting is difficult but also that switching to a different less standard scripting language (fish, nushell, etc) brings its own problems.
Shell scripting. My tolerance to ditch a shell script and go to a scripting language is pretty low. It’s just too messy and powerful. Screwing up can be costly so I don’t even bother.
Some more issues that were mentioned at least 10 times:
- the terminal getting into a broken state (Ctrl-S, cating a binary, etc)

There were also 122 answers to the effect of “nothing really” or “only that I can’t do EVERYTHING in the terminal”
One example comment:
Think I’ve found work arounds for most/all frustrations
I’m not going to make a lot of commentary on these results, but here are a couple of categories that feel related to me:
Trying to categorize all these results in a reasonable way really gave me an appreciation for social science researchers’ skills.
]]>There are so many pieces to having a modern terminal experience. I wish it all came out of the box.
My immediate reaction was “oh, getting a modern terminal experience isn’t that hard, you just need to….”, but the more I thought about it, the longer the “you just need to…” list got, and I kept thinking about more and more caveats.
So I thought I would write down some notes about what it means to me personally to have a “modern” terminal experience and what I think can make it hard for people to get there.
Here are a few things that are important to me, with which part of the system is responsible for them:
- p in vim to paste (text editor, maybe the OS/terminal emulator too)
- ls (shell config)
- Ctrl+left arrow to work (shell or application)
- less: (terminal emulator and applications)

There are a million other terminal conveniences out there and different people value different things, but those are the ones that I would be really unhappy without.
My basic approach is:
fish shell. Mostly don’t configure it, except to:

- set the EDITOR environment variable to my favourite terminal editor
- alias ls to ls --color=auto

neovim, with a configuration that I’ve been very slowly building over the last 9 years or so (the last time I deleted my vim config and started from scratch was 9 years ago)

A few things that affect my approach:
What if you want a nice experience, but don’t want to spend a lot of time on configuration? Figuring out how to configure vim in a way that I was satisfied with really did take me like ten years, which is a long time!
My best ideas for how to get a reasonable terminal experience with minimal config are:
- use fish or zsh with oh-my-zsh
- set the EDITOR environment variable to your favourite terminal text editor
- alias ls to ls --color=auto
- use micro as a terminal text editor: the usual GUI shortcuts (Ctrl-C to copy, Ctrl-V to paste, Ctrl-A to select all) work in micro and they do what you’d expect. I would probably try switching to helix except that retraining my vim muscle memory seems way too hard. Also helix doesn’t have a GUI or plugin system yet.

Personally I wouldn’t use xterm, rxvt, or Terminal.app as a terminal emulator, because I’ve found in the past that they’re missing core features (like 24-bit colour in Terminal.app’s case) that make the terminal harder to use for me.
I don’t want to pretend that getting a “modern” terminal experience is easier than it is though – I think there are two issues that make it hard. Let’s talk about them!
bash and zsh are by far the two most popular shells, and neither of them provide a default experience that I would be happy using out of the box, for example:
And even though I love fish, the fact that it isn’t POSIX does make it hard for a lot of folks to make the switch.
Of course it’s totally possible to learn how to customize your prompt in bash
or whatever, and it doesn’t even need to be that complicated (in bash I’d
probably start with something like export PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ ', or maybe use starship).
But each of these “not complicated” things really does add up and it’s
especially tough if you need to keep your config in sync across several
systems.
An extremely popular solution to getting a “modern” shell experience is oh-my-zsh. It seems like a great project and I know a lot of people use it very happily, but I’ve struggled with configuration systems like that in the past – it looks like right now the base oh-my-zsh adds about 3000 lines of config, and often I find that having an extra configuration system makes it harder to debug what’s happening when things go wrong. I personally have a tendency to use the system to add a lot of extra plugins, make my system slow, get frustrated that it’s slow, and then delete it completely and write a new config from scratch.
In the terminal survey I ran recently, the most popular terminal text editors
by far were vim, emacs, and nano.
I think the main options for terminal text editors are:
- micro or helix, which seem to offer a pretty good out-of-the-box experience, though you might occasionally run into issues with using a less mainstream text editor
- a GUI editor: some people just set code as their EDITOR in the terminal.

The last issue is that sometimes individual programs that I use are kind of
annoying. For example on my Mac OS machine, /usr/bin/sqlite3 doesn’t support
the Ctrl+Left Arrow keyboard shortcut. Fixing this to get a reasonable
terminal experience in SQLite was a little complicated, I had to:
I find that debugging application-specific issues like this is really not easy and often it doesn’t feel “worth it” – often I’ll end up just dealing with various minor inconveniences because I don’t want to spend hours investigating them. The only reason I was even able to figure this one out at all is that I’ve been spending a huge amount of time thinking about the terminal recently.
A big part of having a “modern” experience using terminal programs is just
using newer terminal programs, for example I can’t be bothered to learn a
keyboard shortcut to sort the columns in top, but in htop I can just click
on a column heading with my mouse to sort it. So I use htop instead! But discovering new more “modern” command line tools isn’t easy (though
I made a list here),
finding ones that I actually like using in practice takes time, and if you’re
SSHed into another machine, they won’t always be there.
Something I find tricky about configuring my terminal to make everything “nice” is that changing one seemingly small thing about my workflow can really affect everything else. For example right now I don’t use tmux. But if I needed to use tmux again (for example because I was doing a lot of work SSHed into another machine), I’d need to think about a few things, like:
and probably more things I haven’t thought of. “Using tmux means that I have to change how I manage my colours” sounds unlikely, but that really did happen to me and I decided “well, I don’t want to change how I manage colours right now, so I guess I’m not using that feature!”.
It’s also hard to remember which features I’m relying on – for example maybe my current terminal does have OSC 52 support and because copying from tmux over SSH has always Just Worked I don’t even realize that that’s something I need, and then it mysteriously stops working when I switch terminals.
Personally even though I think my setup is not that complicated, it’s taken me 20 years to get to this point! Because terminal config changes are so likely to have unexpected and hard-to-understand consequences, I’ve found that if I change a lot of terminal configuration all at once it makes it much harder to understand what went wrong if there’s a problem, which can be really disorienting.
So I usually prefer to make pretty small changes, and accept that changes
might take me a REALLY long time to get used to. For example I switched from
using ls to eza a year or two ago and
while I like it (because eza -l prints human-readable file sizes by default)
I’m still not quite sure about it. But also sometimes it’s worth it to make a
big change, like I made the switch to fish (from bash) 10 years ago and I’m
very happy I did.
Trying to explain how “easy” it is to configure your terminal really just made me think that it’s kind of hard and that I still sometimes get confused.
I’ve found that there’s never one perfect way to configure things in the terminal that will be compatible with every single other thing. I just need to try stuff, figure out some kind of locally stable state that works for me, and accept that if I start using a new tool it might disrupt the system and I might need to rethink things.
]]>whatever program you happen to be running (like top or vim or cat)

The first three (your operating system, shell, and terminal emulator) are all kind of known quantities – if you’re using bash in GNOME Terminal on Linux, you can more or less reason about how all of those things interact, and some of their behaviour is standardized by POSIX.
But the fourth one (“whatever program you happen to be running”) feels like it could do ANYTHING. How are you supposed to know how a program is going to behave?
This post is kind of long so here’s a quick table of contents:
- noninteractive programs should quit when you press Ctrl-C
- TUIs should quit when you press q
- REPLs should quit when you press Ctrl-D on an empty line
- stick to the 16 base ANSI colours
- support readline keybindings
- Ctrl-W should delete the last word
- disable colours when writing to a pipe
- "-" means stdin/stdout

As far as I know, there are no real standards for how programs in the terminal should behave – the closest things I know of are:
- POSIX, which covers how basic tools like cp should work but AFAIK it doesn’t have anything to say about how for example htop should behave.

But even though there are no standards, in my experience programs in the terminal behave in a pretty consistent way. So I wanted to write down a list of “rules” that in my experience programs mostly follow.
My goal here isn’t to convince authors of terminal programs that they should follow any of these rules. There are lots of exceptions to these and often there’s a good reason for those exceptions.
But it’s very useful for me to know what behaviour to expect from a random new terminal program that I’m using. Instead of “uh, programs could do literally anything”, it’s “ok, here are the basic rules I expect, and then I can keep a short mental list of exceptions”.
So I’m just writing down what I’ve observed about how programs behave in my 20 years of using the terminal, why I think they behave that way, and some examples of cases where that rule is “broken”.
There are a bunch of common conventions that I think are pretty clearly the program’s responsibility to implement, like:
- config files go in ~/.BLAHrc or ~/.config/BLAH/FILE or /etc/BLAH/ or something
- --help should print help text

But in this post I’m going to focus on things that it’s not 100% obvious are
the program’s responsibility. For example it feels to me like a “law of nature”
that pressing Ctrl-D should quit a REPL, but programs often
need to explicitly implement support for it – even though cat doesn’t need
to implement Ctrl-D support, ipython does. (more about that in “rule 3” below)
Understanding which things are the program’s responsibility makes it much less surprising when different programs’ implementations are slightly different.
Ctrl-C

The main reason for this rule is that noninteractive programs will quit by
default on Ctrl-C if they don’t set up a SIGINT signal handler, so this is
kind of a “you should act like the default” rule.
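As a tiny illustration of “acting like the default” (just a sketch in Python, not code from any particular tool): a noninteractive script dies on Ctrl-C unless it installs its own SIGINT handler, and as soon as it does, it has opted out of that default.

import signal
import time

def handler(signum, frame):
    # once this handler is installed, Ctrl-C no longer kills the program
    print("got SIGINT, ignoring it")

signal.signal(signal.SIGINT, handler)
while True:
    time.sleep(1)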
Something that trips a lot of people up is that this doesn’t apply to
interactive programs like python3 or bc or less. This is because in
an interactive program, Ctrl-C has a different job – if the program is
running an operation (like for example a search in less or some Python code
in python3), then Ctrl-C will interrupt that operation but not stop the
program.
As an example of how this works in an interactive program: here’s the code in prompt-toolkit (the library that iPython uses for handling input)
that aborts a search when you press Ctrl-C.
q

TUI programs (like less or htop) will usually quit when you press q.
This rule doesn’t apply to any program where pressing q to quit wouldn’t make
sense, like tmux or text editors.
Ctrl-D on an empty line

REPLs (like python3 or ed) will usually quit when you press Ctrl-D on an
empty line. This rule is similar to the Ctrl-C rule – the reason for this is
that by default if you’re running a program (like cat) in “cooked mode”, then
the operating system will return an EOF when you press Ctrl-D on an empty
line.
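For example, here’s a minimal cat-like Python sketch: it never mentions Ctrl-D anywhere, but because the terminal is in cooked mode, pressing Ctrl-D on an empty line makes the read return EOF and the program exits on its own.

import sys

# no Ctrl-D handling here: in cooked mode the OS turns Ctrl-D on an empty
# line into end-of-file, so this loop just ends
for line in sys.stdin:
    sys.stdout.write(line)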
Most of the REPLs I use (sqlite3, python3, fish, bash, etc) don’t actually use cooked mode, but they all implement this keyboard shortcut anyway to mimic the default behaviour.
For example, here’s the code in prompt-toolkit that quits when you press Ctrl-D, and here’s the same code in readline.
I actually thought that this one was a “Law of Terminal Physics” until very recently because I’ve basically never seen it broken, but you can see that it’s just something that each individual input library has to implement in the links above.
Someone pointed out that the Erlang REPL does not quit when you press Ctrl-D,
so I guess not every REPL follows this “rule”.
Terminal programs rarely use colours other than the base 16 ANSI colours. This
is because if you specify colours with a hex code, it’s very likely to clash
with some users’ background colour. For example if I print out some text as
#EEEEEE, it would be almost invisible on a white background, though it would
look fine on a dark background.
But if you stick to the default 16 base colours, you have a much better chance that the user has configured those colours in their terminal emulator so that they work reasonably well with their background color. Another reason to stick to the default base 16 colours is that it makes fewer assumptions about what colours the terminal emulator supports.
The only programs I usually see breaking this “rule” are text editors, for example Helix by default will use a purple background which is not a default ANSI colour. It seems fine for Helix to break this rule since Helix isn’t a “core” program and I assume any Helix user who doesn’t like that colorscheme will just change the theme.
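If you want to see how the 16 base colours actually look against your own background, here’s a tiny Python snippet (just a sketch) that prints them using the standard ANSI escape codes:

# the 8 normal (30-37) and 8 bright (90-97) foreground colours
for code in list(range(30, 38)) + list(range(90, 98)):
    print(f"\033[{code}m colour {code} \033[0m")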
Almost every program I use supports readline keybindings if it would make
sense to do so. For example, here are a bunch of different programs and a link
to where they define Ctrl-E to go to the end of the line:
None of those programs actually uses readline directly, they just sort of
mimic emacs/readline keybindings. They don’t always mimic them exactly: for
example atuin seems to use Ctrl-A as a prefix, so Ctrl-A doesn’t go to the
beginning of the line.
Also all of these programs seem to implement their own internal cut and paste
buffers so you can delete a line with Ctrl-U and then paste it with Ctrl-Y.
The exceptions to this are:
- some programs (like git, cat, and nc) don’t have any line editing support at all (except for backspace, Ctrl-W, and Ctrl-U)

I wrote more about this “what keybindings does a program support?” question in entering text in the terminal is complicated.
I’ve never seen a program (other than a text editor) where Ctrl-W doesn’t
delete the last word. This is similar to the Ctrl-C rule – by default if a
program is in “cooked mode”, the OS will delete the last word if you press
Ctrl-W, and delete the whole line if you press Ctrl-U. So usually programs
will imitate that behaviour.
I can’t think of any exceptions to this other than text editors but if there are I’d love to hear about them!
Most programs will disable colours when writing to a pipe. For example:
- rg blah will highlight all occurrences of blah in the output, but if the output is to a pipe or a file, it’ll turn off the highlighting.
- ls --color=auto will use colour when writing to a terminal, but not when writing to a pipe

Both of those programs will also format their output differently when writing
to the terminal: ls will organize files into columns, and ripgrep will group
matches with headings.
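Here’s roughly what that decision looks like inside a program (a Python sketch, not how rg or ls actually implement it):

import sys

def maybe_green(text):
    # only emit colour escape codes if stdout is really a terminal
    if sys.stdout.isatty():
        return f"\033[32m{text}\033[0m"
    return text

print(maybe_green("hello"))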
If you want to force the program to use colour (for example because you want to
look at the colour), you can use unbuffer to force the program’s output to be
a tty like this:
unbuffer rg blah | less -R
I’m sure that there are some programs that “break” this rule but I can’t think
of any examples right now. Some programs have an --color flag that you can
use to force colour to be on, in the example above you could also do rg --color=always | less -R.
- means stdin/stdout

Usually if you pass - to a program instead of a filename, it’ll read from
stdin or write to stdout (whichever is appropriate). For example, if you want
to format the Python code that’s on your clipboard with black and then copy
it, you could run:
pbpaste | black - | pbcopy
(pbpaste is a Mac program, you can do something similar on Linux with xclip)
My impression is that most programs implement this if it would make sense and I can’t think of any exceptions right now, but I’m sure there are many exceptions.
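Implementing the convention is usually just a special case when opening the input, something like this Python sketch:

import sys

def open_input(filename):
    # "-" is a convention, not magic: the program has to check for it itself
    if filename == "-":
        return sys.stdin
    return open(filename)

# usage: python3 prog.py somefile (or "-" to read from stdin)
text = open_input(sys.argv[1]).read()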
These rules took me a long time to learn because I had to:

- learn each rule in the first place ("Ctrl-C will exit programs")
- notice the exceptions ("Ctrl-C will exit find but not less")
- eventually figure out the pattern ("Ctrl-C will generally quit noninteractive programs, but in interactive programs it might interrupt the current operation instead of quitting the program")

A lot of my understanding of the terminal is honestly still in the “subconscious pattern recognition” stage. The only reason I’ve been taking the time to make things explicit at all is because I’ve been trying to explain how it works to others. Hopefully writing down these “rules” explicitly will make learning some of this stuff a little bit faster for others.
]]>tail -f /some/log/file | grep thing1 | grep thing2
If log lines are being added to the file relatively slowly, the result I’d see is… nothing! It doesn’t matter if there were matches in the log file or not, there just wouldn’t be any output.
I internalized this as “uh, I guess pipes just get stuck sometimes and don’t
show me the output, that’s weird”, and I’d handle it by just
running grep thing1 /some/log/file | grep thing2 instead, which would work.
So as I’ve been doing a terminal deep dive over the last few months I was really excited to finally learn exactly why this happens.
The reason why “pipes get stuck” sometimes is that it’s VERY common for programs to buffer their output before writing it to a pipe or file. So the pipe is working fine, the problem is that the program never even wrote the data to the pipe!
This is for performance reasons: writing all output immediately as soon as you can uses more system calls, so it’s more efficient to save up data until you have 8KB or so of data to write (or until the program exits) and THEN write it to the pipe.
In this example:
tail -f /some/log/file | grep thing1 | grep thing2
the problem is that grep thing1 is saving up all of its matches until it has
8KB of data to write, which might literally never happen.
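You can reproduce the effect with a toy program: run this Python sketch as python3 slow.py | cat and nothing shows up for a long time because every line is sitting in the buffer, but run it without the pipe and the lines appear one per second.

import time

# when stdout is a pipe, these prints get saved up in a buffer and only
# written out when the buffer fills or the program exits
for i in range(30):
    print("tick", i)
    time.sleep(1)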
Part of why I found this so disorienting is that tail -f file | grep thing
will work totally fine, but then when you add the second grep, it stops
working!! The reason for this is that the way grep handles buffering depends
on whether it’s writing to a terminal or not.
Here’s how grep (and many other programs) decides to buffer its output:
- check whether stdout is a terminal, using the isatty function
- if it’s a terminal, write each line out as soon as it’s ready
- if it’s a pipe or a file, save output up (8KB or so) and write it in blocks
So if grep is writing directly to your terminal then you’ll see the line as
soon as it’s printed, but if it’s writing to a pipe, you won’t.
Of course the buffer size isn’t always 8KB for every program, it depends on the implementation. For grep the buffering is handled by libc, and libc’s buffer size is
defined in the BUFSIZ variable. Here’s where that’s defined in glibc.
(as an aside: “programs do not use 8KB output buffers when writing to a terminal” isn’t, like, a law of terminal physics, a program COULD use an 8KB buffer when writing output to a terminal if it wanted, it would just be extremely weird if it did that, I can’t think of any program that behaves that way)
One annoying thing about this buffering behaviour is that you kind of need to remember which commands buffer their output when writing to a pipe.
Some commands that don’t buffer their output:
I think almost everything else will buffer output, especially if it’s a command where you’re likely to be using it for batch processing. Here’s a list of some common commands that buffer their output when writing to a pipe, along with the flag that disables block buffering.
--line-buffered)-u)fflush() function)-l)-u)-u)

Those are all the ones I can think of, lots of unix commands (like sort) may
or may not buffer their output but it doesn’t matter because sort can’t do
anything until it finishes receiving input anyway.
Also I did my best to test both the Mac OS and GNU versions of these but there are a lot of variations and I might have made some mistakes.
Also, here are a few programming languages where the default print statement will buffer output when writing to a pipe, and some ways to disable buffering if you want:

- C (setvbuf)
- Python (python -u, or PYTHONUNBUFFERED=1, or sys.stdout.reconfigure(line_buffering=False), or print(x, flush=True))
- Ruby (STDOUT.sync = true)
- Perl ($| = 1)

I assume that these languages are designed this way so that the default print function will be fast when you’re doing batch processing.
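For Python specifically, here’s what using one of those escape hatches looks like (a sketch): with flush=True every line goes out immediately, even when stdout is a pipe.

import time

# without flush=True (or python -u / PYTHONUNBUFFERED=1) these lines would
# sit in the buffer when stdout is a pipe
for i in range(5):
    print("line", i, flush=True)
    time.sleep(1)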
Also whether output is buffered or not might depend on how you print, for
example in C++ cout << "hello\n" buffers when writing to a pipe but cout << "hello" << endl will flush its output.
when you press Ctrl-C on a pipe, the contents of the buffer are lost

Let’s say you’re running this command as a hacky way to watch for DNS requests
to example.com, and you forgot to pass -l to tcpdump:
sudo tcpdump -ni any port 53 | grep example.com
When you press Ctrl-C, what happens? In a magical perfect world, what I would
want to happen is for tcpdump to flush its buffer, grep would search for
example.com, and I would see all the output I missed.
But in the real world, what happens is that all the programs get killed and the
output in tcpdump’s buffer is lost.
I think this problem is probably unavoidable – I spent a little time with
strace to see how this works and grep receives the SIGINT before
tcpdump anyway so even if tcpdump tried to flush its buffer grep would
already be dead.
After a little more investigation, there is a workaround: if you find
tcpdump’s PID and kill -TERM $PID, then tcpdump will flush the buffer so
you can see the output. That’s kind of a pain but I tested it and it seems to
work.
It’s not just pipes, this will also buffer:
sudo tcpdump -ni any port 53 > output.txt
Redirecting to a file doesn’t have the same “Ctrl-C will totally destroy the
contents of the buffer” problem though – in my experience it usually behaves
more like you’d want, where the contents of the buffer get written to the file
before the program exits. I’m not 100% sure whether this is something you can
always rely on or not.
Okay, let’s talk solutions. Let’s say you’ve run this command:
tail -f /some/log/file | grep thing1 | grep thing2
I asked people on Mastodon how they would solve this in practice and there were 5 basic approaches. Here they are:
Historically my solution to this has been to just avoid the “command writing to pipe slowly” situation completely and instead run a program that will finish quickly like this:
cat /some/log/file | grep thing1 | grep thing2 | tail
This doesn’t do the same thing as the original command but it does mean that you get to avoid thinking about these weird buffering issues.
(you could also do grep thing1 /some/log/file but I often prefer to use an
“unnecessary” cat)
You could remember that grep has a flag to avoid buffering and pass it like this:
tail -f /some/log/file | grep --line-buffered thing1 | grep thing2
Some people said that if they’re specifically dealing with a multiple greps
situation, they’ll rewrite it to use a single awk instead, like this:
tail -f /some/log/file | awk '/thing1/ && /thing2/'
Or you would write a more complicated grep, like this:
tail -f /some/log/file | grep -E 'thing1.*thing2'
(awk also buffers, so for this to work you’ll want awk to be the last command in the pipeline)
stdbuf

stdbuf uses LD_PRELOAD to turn off libc’s buffering, and you can use it to turn off output buffering like this:
tail -f /some/log/file | stdbuf -o0 grep thing1 | grep thing2
Like any LD_PRELOAD solution it’s a bit unreliable – it doesn’t work on
static binaries, I think won’t work if the program isn’t using libc’s
buffering, and doesn’t always work on Mac OS. Harry Marr has a really nice How stdbuf works post.
unbuffer

The unbuffer program will force the program’s output to be a TTY, which means
that it’ll behave the way it normally would on a TTY (less buffering, colour
output, etc). You could use it in this example like this:
tail -f /some/log/file | unbuffer grep thing1 | grep thing2
Unlike stdbuf it will always work, though it might have unwanted side
effects, for example grep thing1’s matches will also be coloured.
If you want to install unbuffer, it’s in the expect package.
It’s a bit hard for me to say which one is “best”, I think personally I’m
mostly likely to use unbuffer because I know it’s always going to work.
If I learn about more solutions I’ll try to add them to this post.
I think it’s not very common for me to have a program that slowly trickles data into a pipe like this, normally if I’m using a pipe a bunch of data gets written very quickly, processed by everything in the pipeline, and then everything exits. The only examples I can come up with right now are:
- tail -f
- kubectl logs

I think it would be cool if there were a standard environment variable to turn
off buffering, like PYTHONUNBUFFERED in Python. I got this idea from a
couple of blog posts by Mark Dominus
in 2018. Maybe NO_BUFFER like NO_COLOR?
The design seems tricky to get right; Mark points out that NetBSD has environment variables called STDBUF, STDBUF1, etc which give you a
ton of control over buffering but I imagine most developers don’t want to
implement many different environment variables to handle a relatively minor
edge case.
I’m also curious about whether there are any programs that just automatically flush their output buffers after some period of time (like 1 second). It feels like it would be nice in theory but I can’t think of any program that does that so I imagine there are some downsides.
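The simplest version of that idea I can imagine is something like this Python sketch (hypothetical, I don’t know of any real program that works this way): a background thread that flushes stdout every second.

import sys
import threading
import time

def flush_periodically(interval=1.0):
    # hypothetical: flush stdout once a second so buffered output still
    # trickles out even when it's being written to a pipe
    def loop():
        while True:
            time.sleep(interval)
            sys.stdout.flush()
    threading.Thread(target=loop, daemon=True).start()

flush_periodically()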
Some things I didn’t talk about in this post since these posts have been getting pretty long recently and seriously does anyone REALLY want to read 3000 words about buffering?
Luckily at this point I’ve mostly learned how to navigate this situation and either successfully use the library or decide it’s too difficult and switch to a different library, so here’s the guide to importing Javascript libraries that I wish I’d had years ago.
I’m only going to talk about using Javascript libraries on the frontend, and only about how to use them in a no-build-system setup.
In this post I’m going to talk about:
There are 3 basic types of Javascript files a library can provide:
- the “classic” type, which you can include with a <script src> and it’ll Just Work. Great if you can get it but not always available
- ES modules (which use import / export syntax)
- CommonJS modules (which use require() and module.exports, and are meant for Node)

I’m not sure if there’s a better name for the “classic” type but I’m just going to call it “classic”. Also there’s a type called “AMD” but I’m not sure how relevant it is in 2024.
Now that we know the 3 types of files, let’s talk about how to figure out which of these the library actually provides!
Every Javascript library has a build which it uploads to NPM. You might be thinking (like I did originally) – Julia! The whole POINT is that we’re not using Node to build our library! Why are we talking about NPM?
But if you’re using a link from a CDN like https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.1/chart.umd.min.js, you’re still using the NPM build! All the files on the CDNs originally come from NPM.
Because of this, I sometimes like to npm install the library even if I’m not
planning to use Node to build my library at all – I’ll just create a new temp
folder, npm install there, and then delete it when I’m done. I like being able to poke
around in the files in the NPM build on my filesystem, because then I can be
100% sure that I’m seeing everything that the library is making available in
its build and that the CDN isn’t hiding something from me.
So let’s npm install a few libraries and try to figure out what types of
Javascript files they provide in their builds!
First let’s look inside Chart.js, a plotting library.
$ cd /tmp/whatever
$ npm install chart.js
$ cd node_modules/chart.js/dist
$ ls *.*js
chart.cjs chart.js chart.umd.js helpers.cjs helpers.js
This library seems to have 3 basic options:
option 1: chart.cjs. The .cjs suffix tells me that this is a CommonJS
file, for using in Node. This means it’s impossible to use it directly in the
browser without some kind of build step.
option 2: chart.js. The .js suffix by itself doesn’t tell us what kind of
file it is, but if I open it up, I see import '@kurkle/color'; which is an
immediate sign that this is an ES module – the import ... syntax is ES
module syntax.
option 3: chart.umd.js. “UMD” stands for “Universal Module Definition”,
which I think means that you can use this file either with a basic <script src>, CommonJS,
or some third thing called AMD that I don’t understand.
When I was using Chart.js I picked Option 3. I just needed to add this to my code:
<script src="./chart.umd.js"> </script>
and then I could use the library with the global Chart variable.
Couldn’t be easier. I just copied chart.umd.js into my Git repository so that
I didn’t have to worry about using NPM or the CDNs going down or anything.
the dist directory

A lot of libraries will put their build in the dist directory, but not
always! The build files’ location is specified in the library’s package.json.
For example here’s an excerpt from Chart.js’s package.json.
"jsdelivr": "./dist/chart.umd.js",
"unpkg": "./dist/chart.umd.js",
"main": "./dist/chart.cjs",
"module": "./dist/chart.js",
I think this is saying that if you want to use an ES Module (module) you
should use dist/chart.js, but the jsDelivr and unpkg CDNs should use
./dist/chart.umd.js. I guess main is for Node.
chart.js’s package.json also says "type": "module", which according to this documentation
tells Node to treat files as ES modules by default. I think it doesn’t tell us
specifically which files are ES modules and which ones aren’t but it does tell
us that something in there is an ES module.
@atcute/oauth-browser-client

@atcute/oauth-browser-client
is a library for logging into Bluesky with OAuth in the browser.
Let’s see what kinds of Javascript files it provides in its build!
$ npm install @atcute/oauth-browser-client
$ cd node_modules/@atcute/oauth-browser-client/dist
$ ls *js
constants.js dpop.js environment.js errors.js index.js resolvers.js
It seems like the only plausible root file in here is index.js, which looks
something like this:
export { configureOAuth } from './environment.js';
export * from './errors.js';
export * from './resolvers.js';
This export syntax means it’s an ES module. That means we can use it in
the browser without a build step! Let’s see how to do that.
Using an ES module isn’t as easy as just adding a <script src="whatever.js">. Instead, if
the ES module has dependencies (like @atcute/oauth-browser-client does) the
steps are:
- set up an import map in your HTML (example below)
- put import { configureOAuth } from '@atcute/oauth-browser-client'; in your JS code
- load your code with <script type="module" src="YOURSCRIPT.js"></script>

The reason we need an import map instead of just doing something like import { BrowserOAuthClient } from "./oauth-client-browser.js" is that internally the module has more import statements like import {something} from @atcute/client, and we need to tell the browser where to get the code for @atcute/client and all of its other dependencies.
Here’s what the importmap I used looks like for @atcute/oauth-browser-client:
<script type="importmap">
{
"imports": {
"nanoid": "./node_modules/nanoid/bin/dist/index.js",
"nanoid/non-secure": "./node_modules/nanoid/non-secure/index.js",
"nanoid/url-alphabet": "./node_modules/nanoid/url-alphabet/dist/index.js",
"@atcute/oauth-browser-client": "./node_modules/@atcute/oauth-browser-client/dist/index.js",
"@atcute/client": "./node_modules/@atcute/client/dist/index.js",
"@atcute/client/utils/did": "./node_modules/@atcute/client/dist/utils/did.js"
}
}
</script>
Getting these import maps to work is pretty fiddly, I feel like there must be a tool to generate them automatically but I haven’t found one yet. It’s definitely possible to write a script that automatically generates the importmaps using esbuild’s metafile but I haven’t done that and maybe there’s a better way.
I decided to set up importmaps yesterday to get github.com/jvns/bsky-oauth-example to work, so there’s some example code in that repo.
Also someone pointed me to Simon Willison’s download-esm, which will download an ES module and rewrite the imports to point to the JS files directly so that you don’t need importmaps. I haven’t tried it yet but it seems like a great idea.
I did run into some problems with using importmaps in the browser though – it needed to download dozens of Javascript files to load my site, and my webserver in development couldn’t keep up for some reason. I kept seeing files fail to load randomly and then had to reload the page and hope that they would succeed this time.
It wasn’t an issue anymore when I deployed my site to production, so I guess it was a problem with my local dev environment.
Also one slightly annoying thing about ES modules in general is that you need to
be running a webserver to use them, I’m sure this is for a good reason but it’s
easier when you can just open your index.html file without starting a
webserver.
Because of the “too many files” thing I think actually using ES modules with importmaps in this way isn’t actually that appealing to me, but it’s good to know it’s possible.
If the ES module doesn’t have dependencies then it’s even easier – you don’t need the importmaps! You can just:
- add <script type="module" src="YOURCODE.js"></script> in your HTML. The type="module" is important.
- use import {whatever} from "https://example.com/whatever.js" in YOURCODE.js

If you don’t want to use importmaps, you can also use a build system like esbuild. I talked about how to do that in Some notes on using esbuild, but this blog post is about ways to avoid build systems completely so I’m not going to talk about that option here. I do still like esbuild though and I think it’s a good option in this case.
CanIUse says that importmaps are in
“Baseline 2023: newly available across major browsers” so my sense is that in
2024 that’s still maybe a little bit too new? I think I would use importmaps
for some fun experimental code that I only wanted like myself and 12 people to
use, but if I wanted my code to be more widely usable I’d use esbuild instead.
@atproto/oauth-client-browser

Let’s look at one final example library! This is a different Bluesky auth
library than @atcute/oauth-browser-client.
$ npm install @atproto/oauth-client-browser
$ cd node_modules/@atproto/oauth-client-browser/dist
$ ls *js
browser-oauth-client.js browser-oauth-database.js browser-runtime-implementation.js errors.js index.js indexed-db-store.js util.js
Again, it seems like the only real candidate file here is index.js. But this is a
different situation from the previous example library! Let’s take a look at
index.js:
There’s a bunch of stuff like this in index.js:
__exportStar(require("@atproto/oauth-client"), exports);
__exportStar(require("./browser-oauth-client.js"), exports);
__exportStar(require("./errors.js"), exports);
var util_js_1 = require("./util.js");
This require() syntax is CommonJS syntax, which means that we can’t use this
file in the browser at all, we need to use some kind of build step, and
ESBuild won’t work either.
Also in this library’s package.json it says "type": "commonjs" which is
another way to tell it’s CommonJS.
Originally I thought it was impossible to use CommonJS modules without learning a build system, but then someone on Bluesky told me about esm.sh! It’s a CDN that will translate anything into an ES Module. skypack.dev does something similar, I’m not sure what the difference is but one person mentioned that if one doesn’t work sometimes they’ll try the other one.
For @atproto/oauth-client-browser using it seems pretty simple, I just need to put this in my HTML:
<script type="module" src="script.js"> </script>
and then put this in script.js.
import { BrowserOAuthClient } from "https://esm.sh/@atproto/[email protected]"
It seems to Just Work, which is cool! Of course this is still sort of using a build system – it’s just that esm.sh is running the build instead of me. My main concerns with this approach are:
I also learned that you can also use esbuild to convert a CommonJS module
into an ES module, though there are some limitations – the import { BrowserOAuthClient } from syntax doesn’t work. Here’s a github issue about that.
I think the esbuild approach is probably more appealing to me than the
esm.sh approach because it’s a tool that I already have on my computer so I
trust it more. I haven’t experimented with this much yet though.
Here’s a summary of the three types of JS files you might encounter, options for how to use them, and how to identify them.
Unhelpfully a .js or .min.js file extension could be any of these 3
options, so if the file is something.js you need to do more detective work to
figure out what you’re dealing with.
- the “classic” type
  - how to use it: load it with <script src="whatever.js"></script>
  - how to identify it: maybe a .umd.js extension, or just try a <script src=... tag and see if it works
- ES modules
  - how to use it: import {whatever} from "./my-module.js" directly in your code, or import {whatever} from "my-module" with an import map
  - how to identify it: an import or export statement (not module.exports = ..., that’s CommonJS), a .mjs extension, or "type": "module" in package.json (though it’s not clear to me which file exactly this refers to)
- CommonJS modules
  - how to use it: through a CDN like https://esm.sh/@atproto/[email protected], or with a build step
  - how to identify it: require() or module.exports = ... in the code, a .cjs extension, or "type": "commonjs" in package.json (though it’s not clear to me which file exactly this refers to)

The main difference between CommonJS modules and ES modules from my perspective is that ES modules are actually a standard. This makes me feel a lot more confident using them, because browsers commit to backwards compatibility for web standards forever – if I write some code using ES modules today, I can feel sure that it’ll still work the same way in 15 years.
It also makes me feel better about using tooling like esbuild because even if
the esbuild project dies, because it’s implementing a standard it feels likely
that there will be another similar tool in the future that I can replace it
with.
A lot of the time when I talk about this stuff I get responses like “I hate javascript!!! it’s the worst!!!”. But my experience is that there are a lot of great tools for Javascript (I just learned about https://esm.sh yesterday which seems great! I love esbuild!), and that if I take the time to learn how things works I can take advantage of some of those tools and make my life a lot easier.
So the goal of this post is definitely not to complain about Javascript, it’s to understand the landscape so I can use the tooling in a way that feels good to me.
Here are some questions I still have, I’ll add the answers into the post if I learn the answer.
atcute-client.js), but so that in the browser I can still import multiple
different paths from that file (like both @atcute/client/lexicons and
@atcute/client)?

Here’s a list of every tool we talked about in this post:
Writing this post has made me think that even though I usually don’t want to
have a build that I run every time I update the project, I might be willing to
have a build step (using download-esm or something) that I run only once
when setting up the project and never run again except maybe if I’m updating my
dependency versions.
Thanks to Marco Rogers who taught me a lot of the things in this post. I’ve probably made some mistakes in this post and I’d love to know what they are – let me know on Bluesky or Mastodon!
]]>One kind of thing I like to post on Mastodon/Bluesky is “hey, here’s a cool thing”, like the great SQLite repl litecli, or the fact that cross compiling in Go Just Works and it’s amazing, or cryptographic right answers, or this great diff tool. Usually I don’t want to write a whole blog post about those things because I really don’t have much more to say than “hey this is useful!”
It started to bother me that I didn’t have anywhere to put those things: for example recently I wanted to use diffdiff and I just could not remember what it was called.
So I quickly made a new folder called /til/, added some
custom styling (I wanted to style the posts to look a little bit like a tweet),
made a little Rake task to help me create new posts quickly (rake new_til), and
set up a separate RSS Feed for it.
I think this new section of the blog might be more for myself than anything, now when I forget the link to Cryptographic Right Answers I can hopefully look it up on the TIL page. (you might think “julia, why not use bookmarks??” but I have been failing to use bookmarks for my whole life and I don’t see that changing ever, putting things in public is for whatever reason much easier for me)
So far it’s been working, often I can actually just make a quick post in 2 minutes which was the goal.
My page is inspired by Simon Willison’s great TIL blog, though my TIL posts are a lot shorter.
This came about because I spent a lot of time on Twitter, so I’ve been thinking about what I want to do about all of my tweets.
I keep reading the advice to “POSSE” (“post on your own site, syndicate elsewhere”), and while I find the idea appealing in principle, for me part of the appeal of social media is that it’s a little bit ephemeral. I can post polls or questions or observations or jokes and then they can just kind of fade away as they become less relevant.
I find it a lot easier to identify specific categories of things that I actually want to have on a Real Website That I Own:
and then let everything else be kind of ephemeral.
I really believe in the advice to make email lists though – the first two (blog posts & comics) both have email lists and RSS feeds that people can subscribe to if they want. I might add a quick summary of any TIL posts from that week to the “blog posts from this week” mailing list.
]]>Ctrl-A, Ctrl-C, Ctrl-W, etc. What’s
the deal with all of them?
Here’s a table of all 33 ASCII control characters, and what they do on my machine (on Mac OS), more or less. There are about a million caveats, but I’ll talk about what it means and all the problems with this diagram that I know about.
You can also view it as an HTML page (I just made it an image so it would show up in RSS).
The first surprising thing about this diagram to me is that there are 33 control codes, split into (very roughly speaking) these categories:
- codes that the OS terminal driver turns into a signal: for example when your terminal gets the byte 3 (Ctrl-C), it’ll send a SIGINT signal to the current program
- codes for keys on your keyboard (Enter, Tab, Backspace). For example when you press Enter, your terminal gets sent 13.
- codes that are usually handled by readline: “the application can do whatever it wants” often means “it’ll do more or less what the readline library does, whether the application actually uses readline or not”, so I’ve labelled a bunch of the codes that readline uses
- codes where the application really can do whatever it wants: Ctrl-X has no standard meaning in the terminal in general but emacs uses it very heavily
(If you’re curious about readline, I wrote more about readline in entering text in the terminal is complicated, and there are a lot of cheat sheets out there)
Something else that I find a little surprising is that there are only 33 control codes –
A to Z, plus 7 more (@, [, \, ], ^, _, ?). This means that if you want to
have for example Ctrl-1 as a keyboard shortcut in a terminal application,
that’s not really meaningful – on my machine at least Ctrl-1 is exactly the
same thing as just pressing 1, Ctrl-3 is the same as Ctrl-[, etc.
Also Ctrl+Shift+C isn’t a control code – what it does depends on your
terminal emulator. On Linux Ctrl-Shift-X is often used by the terminal
emulator to copy or open a new tab or paste for example, it’s not sent to the
TTY at all.
Also I use Ctrl+Left Arrow all the time, but that isn’t a control code,
instead it sends an ANSI escape sequence (ctrl-[[1;5D) which is a different
thing which we absolutely do not have space for in this post.
This “there are only 33 codes” thing is totally different from how keyboard
shortcuts work in a GUI where you can have Ctrl+KEY for any key you want.
Each of these 33 control codes has a name in ASCII (for example 3 is ETX).
When all of these control codes were originally defined, they weren’t being
used for computers or terminals at all, they were used for the telegraph machine.
Telegraph machines aren’t the same as UNIX terminals so a lot of the codes were repurposed to mean something else.
Personally I don’t find these ASCII names very useful, because 50% of the time the name in ASCII has no actual relationship to what that code does on UNIX systems today. So it feels easier to just ignore the ASCII names completely instead of trying to figure which ones still match their original meaning.
Another thing that’s a bit weird is that Ctrl-M is literally the same as
Enter, and Ctrl-I is the same as Tab, which makes it hard to use those two as keyboard shortcuts.
From some quick research, it seems like some folks do still use Ctrl-I and
Ctrl-M as keyboard shortcuts (here’s an example), but to do that
you need to configure your terminal emulator to treat them differently than the
default.
For me the main takeaway is that if I ever write a terminal application I
should avoid Ctrl-I and Ctrl-M as keyboard shortcuts in it.
While writing this I needed to do a bunch of experimenting to figure out what various key combinations did, so I wrote this Python script echo-key.py that will print them out.
There’s probably a more official way but I appreciated having a script I could customize.
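The actual script is linked above, but the general shape of that kind of script is something like this sketch (not the real echo-key.py): put the terminal in raw mode and print the bytes each keypress sends.

import sys
import termios
import tty

fd = sys.stdin.fileno()
old = termios.tcgetattr(fd)
try:
    tty.setraw(fd)  # raw mode: keypresses arrive as raw bytes, nothing is echoed
    while True:
        ch = sys.stdin.buffer.read(1)
        print(repr(ch), end="\r\n")  # raw mode doesn't translate \n, so print \r\n
        if ch == b"\x03":            # byte 3 is Ctrl-C, quit on it ourselves
            break
finally:
    termios.tcsetattr(fd, termios.TCSADRAIN, old)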
Two of these codes (Ctrl-W and Ctrl-U) are labelled in the table as
“handled by the OS”, but actually they’re not always handled by the OS, it
depends on whether the terminal is in “canonical” mode or in “noncanonical mode”.
In canonical mode,
programs only get input when you press Enter (and the OS is in charge of deleting characters when you press Backspace or Ctrl-W). But in noncanonical mode the program gets
input immediately when you press a key, and the Ctrl-W and Ctrl-U codes are passed through to the program to handle any way it wants.
Generally in noncanonical mode the program will handle Ctrl-W and Ctrl-U
similarly to how the OS does, but there are some small differences.
Some examples of programs that use canonical mode:
- grep or cat
- git, I think

Examples of programs that use noncanonical mode:

- python3, irb and other REPLs
- less or vim

stty

I said that Ctrl-C sends SIGINT but technically this is not necessarily
true, if you really want to you can remap all of the codes labelled “OS
terminal driver”, plus Backspace, using a tool called stty, and you can view
the mappings with stty -a.
Here are the mappings on my machine right now:
$ stty -a
cchars: discard = ^O; dsusp = ^Y; eof = ^D; eol = <undef>;
eol2 = <undef>; erase = ^?; intr = ^C; kill = ^U; lnext = ^V;
min = 1; quit = ^\; reprint = ^R; start = ^Q; status = ^T;
stop = ^S; susp = ^Z; time = 0; werase = ^W;
I have personally never remapped any of these and I cannot imagine a reason I
would (I think it would be a recipe for confusion and disaster for me), but I
asked on Mastodon and people said the most common reasons they used
stty were:
- stty sane
- stty erase ^H to change how Backspace works
- stty ixoff
- mapping SIGINT to a different key, like their DELETE key

Two signals caveats:
- if the ISIG terminal mode is turned off, then the OS won’t send signals. For example vim turns off ISIG
- on BSD/Mac OS there’s an extra code (Ctrl-T) which sends SIGINFO

You can see which terminal modes a program is setting using strace like this,
terminal modes are set with the ioctl system call:
$ strace -tt -o out vim
$ grep ioctl out | grep SET
here are the modes vim sets when it starts (ISIG and ICANON are
missing!):
17:43:36.670636 ioctl(0, TCSETS, {c_iflag=IXANY|IMAXBEL|IUTF8,
c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST, c_cflag=B38400|CS8|CREAD,
c_lflag=ECHOK|ECHOCTL|ECHOKE|PENDIN, ...}) = 0
and it resets the modes when it exits:
17:43:38.027284 ioctl(0, TCSETS, {c_iflag=ICRNL|IXANY|IMAXBEL|IUTF8,
c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD,
c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE|PENDIN, ...}) = 0
I think the specific combination of modes vim is using here might be called “raw mode”, man cfmakeraw talks about that.
Related to “there are only 33 codes”, there are a lot of conflicts where
different parts of the system want to use the same code for different things,
for example by default Ctrl-S will freeze your screen, but if you turn that
off then readline will use Ctrl-S to do a forward search.
Another example is that on my machine sometimes Ctrl-T will send SIGINFO
and sometimes it’ll transpose 2 characters and sometimes it’ll do something
completely different depending on:
- whether the current program has ISIG set
- whether the current program uses readline / imitates readline’s behaviour

In this diagram I’ve labelled code 127 as “backspace” and 8 as “other backspace”. Uh, what?
I think this was the single biggest topic of discussion in the replies on Mastodon – apparently there’s a LOT of history to this and I’d never heard of any of it before.
First, here’s how it works on my machine:
1. I press the Backspace key
2. my terminal emulator sends the byte 127, which is called DEL in ASCII
3. the OS and readline both have 127 mapped to “backspace” (so it works both in canonical mode and noncanonical mode)

If I press Ctrl+H, it has the same effect as Backspace if I’m using
readline, but in a program without readline support (like cat for instance),
it just prints out ^H.
Apparently Step 2 above is different for some folks – their Backspace key sends
the byte 8 instead of 127, and so if they want Backspace to work then they
need to configure the OS (using stty) to set erase = ^H.
There’s an incredible section of the Debian Policy Manual on keyboard configuration
that describes how Delete and Backspace should work according to Debian
policy, which seems very similar to how it works on my Mac today. My
understanding (via this mastodon post)
is that this policy was written in the 90s because there was a lot of confusion
about what Backspace should do in the 90s and there needed to be a standard
to get everything to work.
There’s a bunch more historical terminal stuff here but that’s all I’ll say for now.
I’ve probably missed a bunch more ways that “how it works on my machine” might be different from how it works on other people’s machines, and I’ve probably made some mistakes about how it works on my machine too. But that’s all I’ve got for today.
Some more stuff I know that I’ve left out: according to stty -a Ctrl-O is
“discard”, Ctrl-R is “reprint”, and Ctrl-Y is “dsusp”. I have no idea how
to make those actually do anything (pressing them does not do anything
obvious, and some people have told me what they used to do historically but
it’s not clear to me if they have a use in 2024), and a lot of the time in practice
they seem to just be passed through to the application anyway so I just
labelled Ctrl-R and Ctrl-Y as
readline.
Also I want to say that I think the contents of this post are kind of interesting
but I don’t think they’re necessarily that useful. I’ve used the terminal
pretty successfully every day for the last 20 years without knowing literally
any of this – I just knew what Ctrl-C, Ctrl-D, Ctrl-Z, Ctrl-R,
Ctrl-L did in practice (plus maybe Ctrl-A, Ctrl-E and Ctrl-W) and did
not worry about the details for the most part, and that was
almost always totally fine except when I was trying to use xterm.js.
But I had fun learning about it so maybe it’ll be interesting to you too.
]]>This hasn’t been a big priority for me: usually it just goes down for a few minutes while it restarts, and it only happens once a day at most, so I’ve just been ignoring it. But last week it started actually causing a problem so I decided to look into it.
This was kind of a winding road where I learned a lot so here’s a table of contents:
I run Mess With DNS on a VM with about 465MB of RAM, which according to
ps aux (the RSS column) is split up something like:
That leaves about 110MB of memory free.
A while back I set GOMEMLIMIT to 250MB to try to make sure the garbage collector ran if Mess With DNS used more than 250MB of memory, and I think this helped but it didn’t solve everything.
A few weeks ago I started backing up Mess With DNS’s database for the first time using restic.
This has been working okay, but since Mess With DNS operates without much extra
memory I think restic sometimes needed more memory than was available on the
system, and so the backup script sometimes got OOM killed.
This was a problem because
There’s probably more than one solution to this, but I decided to try to make Mess With DNS use less memory so that there was more available memory on the system, mostly because it seemed like a fun problem to try to solve.
I’d run a memory profile of Mess With DNS a bunch of times in the past, so I knew exactly what was using most of Mess With DNS’s memory: IP addresses.
When it starts, Mess With DNS loads this database where you can look up the
ASN of every IP address into memory, so that when it
receives a DNS query it can take the source IP address like 74.125.16.248 and
tell you that IP address belongs to GOOGLE.
This database by itself used about 117MB of memory, and a simple du told me
that was too much – the original text files were only 37MB!
$ du -sh *.tsv
26M ip2asn-v4.tsv
11M ip2asn-v6.tsv
The way it worked originally is that I had an array of these:
type IPRange struct {
StartIP net.IP
EndIP net.IP
Num int
Name string
Country string
}
and I searched through it with a binary search to figure out if any of the ranges contained the IP I was looking for. Basically the simplest possible thing and it’s super fast, my machine can do about 9 million lookups per second.
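Here’s a sketch of that search in Python (the real code is Go, and the addresses/ASNs below are just made-up examples): keep the ranges sorted by start address, bisect to find the last range that starts at or before the query IP, and then check that the range actually contains it.

import bisect

# example data, not the real database: (start, end, asn), sorted by start
ranges = [
    (0x4A7D1000, 0x4A7D1FFF, 15169),  # 74.125.16.0 - 74.125.31.255
    (0x4A7D2000, 0x4A7D2FFF, 13335),
]
starts = [start for start, _, _ in ranges]

def find_asn(ip_int):
    i = bisect.bisect_right(starts, ip_int) - 1
    if i >= 0 and ranges[i][0] <= ip_int <= ranges[i][1]:
        return ranges[i][2]
    return None

print(find_asn(0x4A7D10F8))  # 74.125.16.248 -> 15169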
I’ve been using SQLite recently, so my first thought was – maybe I can store all of this data on disk in an SQLite database, give the tables an index, and that’ll use less memory.
So I:
This did solve the initial memory goal (after a GC it now hardly used any memory at all because the table was on disk!), though I’m not sure how much GC churn this solution would cause if we needed to do a lot of queries at once. I did a quick memory profile and it seemed to allocate about 1KB of memory per lookup.
Let’s talk about the issues I ran into with using SQLite though.
SQLite doesn’t have support for big integers and IPv6 addresses are 128 bits,
so I decided to store them as text. I think BLOB might have been better, I
originally thought BLOBs couldn’t be compared but the sqlite docs say they can.
I ended up with this schema:
CREATE TABLE ipv4_ranges (
start_ip INTEGER NOT NULL,
end_ip INTEGER NOT NULL,
asn INTEGER NOT NULL,
country TEXT NOT NULL,
name TEXT NOT NULL
);
CREATE TABLE ipv6_ranges (
start_ip TEXT NOT NULL,
end_ip TEXT NOT NULL,
asn INTEGER,
country TEXT,
name TEXT
);
CREATE INDEX idx_ipv4_ranges_start_ip ON ipv4_ranges (start_ip);
CREATE INDEX idx_ipv6_ranges_start_ip ON ipv6_ranges (start_ip);
CREATE INDEX idx_ipv4_ranges_end_ip ON ipv4_ranges (end_ip);
CREATE INDEX idx_ipv6_ranges_end_ip ON ipv6_ranges (end_ip);
Also I learned that Python has an ipaddress module, so I could use
ipaddress.ip_address(s).exploded to make sure that the IPv6 addresses were
expanded so that a string comparison would compare them properly.
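For example, here’s what that looks like (Python’s ipaddress module is in the standard library):

import ipaddress

addr = ipaddress.ip_address("2607:f8b0:4006:824::200e")
# the exploded form pads every group, so comparing the strings orders
# addresses the same way as comparing them numerically
print(addr.exploded)  # 2607:f8b0:4006:0824:0000:0000:0000:200e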
I ran a quick microbenchmark, something like this. It printed out that it could look up 17,000 IPv6 addresses per second, and similarly for IPv4 addresses.
This was pretty discouraging – being able to look up 17k addresses per second is kind of fine (Mess With DNS does not get a lot of traffic), but I compared it to the original binary search code and the original code could do 9 million per second.
ips := []net.IP{}
count := 20000
for i := 0; i < count; i++ {
// create a random IPv6 address
bytes := randomBytes()
ip := net.IP(bytes[:])
ips = append(ips, ip)
}
now := time.Now()
success := 0
for _, ip := range ips {
_, err := ranges.FindASN(ip)
if err == nil {
success++
}
}
fmt.Println(success)
elapsed := time.Since(now)
fmt.Println("number per second", float64(count)/elapsed.Seconds())
I’d never really done an EXPLAIN in sqlite, so I thought it would be a fun opportunity to see what the query plan was doing.
sqlite> explain query plan select * from ipv6_ranges where '2607:f8b0:4006:0824:0000:0000:0000:200e' BETWEEN start_ip and end_ip;
QUERY PLAN
`--SEARCH ipv6_ranges USING INDEX idx_ipv6_ranges_end_ip (end_ip>?)
It looks like it’s just using the end_ip index and not the start_ip index,
so maybe it makes sense that it’s slower than the binary search.
I tried to figure out if there was a way to make SQLite use both indexes, but I couldn’t find one and maybe it knows best anyway.
At this point I gave up on the SQLite solution, I didn’t love that it was slower and also it’s a lot more complex than just doing a binary search. I felt like I’d rather keep something much more similar to the binary search.
A few things I tried with SQLite that did not cause it to use both indexes:
- ANALYZE
- INTERSECT to intersect the results of start_ip < ? and ? < end_ip. This did make it use both indexes, but it also seemed to make the query literally 1000x slower, probably because it needed to create the results of both subqueries in memory and intersect them.
My next idea was to use a trie, because I had some vague idea that maybe a trie would use less memory, and I found this library called ipaddress-go that lets you look up IP addresses using a trie.
I tried using it (here's the code), but I think I was doing something wildly wrong, because compared to my naive array + binary search it came out worse.
I’m not really sure what went wrong here but I gave up on this approach and decided to just try to make my array use less memory and stick to a simple binary search.
One thing I learned about memory profiling is that you can use the runtime
package to see how much memory is currently allocated in the program. That’s
how I got all the memory numbers in this post. Here’s the code:
import (
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func memusage() {
runtime.GC()
var m runtime.MemStats
runtime.ReadMemStats(&m)
fmt.Printf("Alloc = %v MiB\n", m.Alloc/1024/1024)
// write mem.prof
f, err := os.Create("mem.prof")
if err != nil {
log.Fatal(err)
}
pprof.WriteHeapProfile(f)
f.Close()
}
Also I learned that if you use pprof to analyze a heap profile there are two
ways to analyze it: you can pass either --alloc_space or --inuse_space to
go tool pprof. I don't know how I didn't realize this before, but
alloc_space will tell you about everything that was ever allocated, and
inuse_space will just include memory that's currently in use.
Anyway I ran go tool pprof -pdf --inuse_space mem.prof > mem.pdf a lot. Also
every time I use pprof I find myself referring to my own intro to pprof, it’s probably
the blog post I wrote that I use the most often. I should add --alloc_space
and --inuse_space to it.
I was storing my ip2asn entries like this:
type IPRange struct {
StartIP net.IP
EndIP net.IP
Num int
Name string
Country string
}
I had 3 ideas for ways to improve this:
- deduplicate the Name and the Country, because a lot of IP ranges belong to the same ASN
- net.IP is an []byte under the hood, which felt like it involved an unnecessary pointer, was there a way to inline it into the struct?
- remove the end IP from the struct
I figured I could store the ASN info in an array, and then just store the index into the array in my IPRange struct. Here are the structs so you can see what I mean:
type IPRange struct {
StartIP netip.Addr
EndIP netip.Addr
ASN uint32
Idx uint32
}
type ASNInfo struct {
Country string
Name string
}
type ASNPool struct {
asns []ASNInfo
lookup map[ASNInfo]uint32
}
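The ASNPool is basically a little interning table. I'd guess the methods look something like this sketch (the real code is linked below, and the method names here are made up):
// Add returns the index for info, adding it to the pool if it's new.
// Since lots of IP ranges share an ASN, most calls just return an existing index.
func (p *ASNPool) Add(info ASNInfo) uint32 {
	if idx, ok := p.lookup[info]; ok {
		return idx
	}
	idx := uint32(len(p.asns))
	p.asns = append(p.asns, info)
	p.lookup[info] = idx
	return idx
}

// Get returns the ASNInfo for an index stored in an IPRange.
func (p *ASNPool) Get(idx uint32) ASNInfo {
	return p.asns[idx]
}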
This worked! It brought memory usage from 117MB to 65MB – a 50MB savings. I felt good about this.
Here’s all of the code for that part.
As an aside – I'm storing the ASN in a uint32, is that right? I looked in the ip2asn
file and the biggest one seems to be 401307, though there are a few lines that
say 4294901931, which is much bigger but still just inside the range of a
uint32 (the max is 4,294,967,295). So a uint32 is definitely enough.
59.101.179.0 59.101.179.255 4294901931 Unknown AS4294901931
netip.Addr instead of net.IP
It turns out that I'm not the only one who felt that net.IP was using an
unnecessary amount of memory – in 2021 the folks at Tailscale released a new
IP address library for Go which solves this and many other issues. They wrote a great blog post about it.
I discovered (to my delight) that not only does this new IP address library exist and do exactly what I want, it’s also now in the Go
standard library as netip.Addr. Switching to netip.Addr was
very easy and saved another 20MB of memory, bringing us to 46MB.
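Presumably part of why the switch was easy is that netip.Addr is a small comparable value type (no slice header pointing at a separate byte array) with a Compare method, so range checks look almost identical to before. A tiny illustration, not code from Mess With DNS:
ip := netip.MustParseAddr("74.125.16.248")
start := netip.MustParseAddr("74.125.0.0")
end := netip.MustParseAddr("74.125.255.255")
// Compare returns -1, 0, or 1, like bytes.Compare
inRange := start.Compare(ip) <= 0 && ip.Compare(end) <= 0
fmt.Println(inRange) // true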
I didn’t try my third idea (remove the end IP from the struct) because I’d already been programming for long enough on a Saturday morning and I was happy with my progress.
It’s always such a great feeling when I think “hey, I don’t like this, there must be a better way” and then immediately discover that someone has already made the exact thing I want, thought about it a lot more than me, and implemented it much better than I would have.
Even though I tried to explain this in a simple linear way “I tried X, then I tried Y, then I tried Z”, that’s kind of a lie – I always try to take my actual debugging process (total chaos) and make it seem more linear and understandable because the reality is just too annoying to write down. It’s more like: partway through, realize I can use the runtime package to check how much memory everything is using, start doing that, and so on.
Someone asked why I don’t just give the VM more memory. I could very easily afford to pay for a VM with 1GB of memory, but I feel like 512MB really should be enough (and really that 256MB should be enough!) so I’d rather stay inside that constraint. It’s kind of a fun puzzle.
Folks had a lot of good ideas I hadn’t thought of. Recording them as inspiration if I feel like having another Fun Performance Day at some point.
- storing a pointer instead of an index into the ASNPool. Someone tried this and it uses more memory, probably because Go’s pointers are 64 bits
- building with GOARCH=386 to use 32-bit pointers to save space (maybe in combination with using unique!)
I deployed the new version and now Mess With DNS is using less memory! Hooray!
A few other notes:
I’m honestly not sure if this will solve all my memory problems, probably not! But I had fun, I learned a few things about SQLite, I still don’t know what to think about tries, and it made me love binary search even more than I already did.
So yesterday I decided to try to upgrade Hugo. There’s no real reason to do this – I’ve been using Hugo version 0.40 to generate this blog since 2018, it works fine, and I don’t have any problems with it. But I thought – maybe it won’t be as hard as I think, and I kind of like a tedious computer task sometimes!
I thought I’d document what I learned along the way in case it’s useful to anyone else doing this very specific migration. I upgraded from Hugo v0.40 (from 2018) to v0.135 (from 2024).
Here are most of the changes I had to make:
template "theme/partials/thing.html" is now partial thing.html
I had to replace a bunch of instances of {{ template "theme/partials/header.html" . }} with {{ partial "header.html" . }}.
This happened in v0.42:
We have now virtualized the filesystems for project and theme files. This makes everything simpler, faster and more powerful. But it also means that template lookups on the form {{ template “theme/partials/pagination.html” . }} will not work anymore. That syntax has never been documented, so it’s not expected to be in wide use.
.Data.Pages is now site.RegularPages
This seems to be discussed in the release notes for 0.57.2
I just needed to replace .Data.Pages with site.RegularPages in the template on the homepage as well as in my RSS feed template.
.Next and .Prev got flipped
I had this comment in the part of my theme where I link to the next/previous blog post:
“next” and “previous” in hugo apparently mean the opposite of what I’d think they’d mean intuitively. I’d expect “next” to mean “in the future” and “previous” to mean “in the past” but it’s the opposite
It looks like they changed this in ad705aac064 so that “next” actually is in the future and “prev” actually is in the past. I definitely find the new behaviour more intuitive.
Figuring out why/when all of these changes happened was a little difficult. I ended up hacking together a bash script to download all of the changelogs from github as text files, which I could then grep to try to figure out what happened. It turns out it’s pretty easy to get all of the changelogs from the GitHub API.
So far everything was not so bad – there was also a change around taxonomies that I can't quite explain, but it was all pretty manageable. But then we got to the really tough one: the markdown renderer.
The blackfriday markdown renderer (which was previously the default) was removed in v0.100.0. This seems pretty reasonable:
It has been deprecated for a long time, its v1 version is not maintained anymore, and there are many known issues. Goldmark should be a mature replacement by now.
Fixing all my Markdown changes was a huge pain – I ended up having to update 80 different Markdown files (out of 700) so that they would render properly, and I'm not totally sure I caught everything.
The obvious question here is – why bother even trying to upgrade Hugo at all if I have to switch Markdown renderers? My old site was running totally fine and I think it wasn’t necessarily a good use of time, but the one reason I think it might be useful in the future is that the new renderer (goldmark) uses the CommonMark markdown standard, which I’m hoping will be somewhat more futureproof. So maybe I won’t have to go through this again? We’ll see.
Also it turned out that the new Goldmark renderer does fix some problems I had (but didn’t know that I had) with smart quotes and how lists/blockquotes interact.
The hard part of this Markdown change was even figuring out what changed. Almost all of the problems (including #2 and #3 above) just silently broke the site, they didn’t cause any errors or anything. So I had to diff the HTML to hunt them down.
Here’s what I ended up doing:
- build the site with the old Hugo version and save the HTML output in public_old
- build the site with the new Hugo version, which puts the HTML output in public
- diff public/ and public_old with this diff.sh script and put the results in a diffs/ folder
- run find diffs -type f | xargs cat | grep -C 5 '(31m|32m)' | less -r over and over again to look at every single change until I found something that seemed wrong
(the grep 31m|32m thing is searching for red/green text in the diff)
This was very time consuming but it was a little bit fun for some reason so I kept doing it until it seemed like nothing too horrible was left.
Here’s a list of every type of Markdown change I had to make. It’s very possible these are all extremely specific to me but it took me a long time to figure them all out so maybe this will be helpful to one other person who finds this in the future.
This doesn’t work anymore (it doesn’t expand the link):
<small>
[a link](https://example.com)
</small>
I need to do this instead:
<small>

[a link](https://example.com)

</small>
This works too:
<small> [a link](https://example.com) </small>
<< is changed into «
I didn’t want this so I needed to configure:
markup:
  goldmark:
    extensions:
      typographer:
        leftAngleQuote: '<<'
        rightAngleQuote: '>>'
This doesn’t render as a nested list anymore if I only indent by 2 spaces, I need to put 4 spaces.
1. a
* b
* c
2. b
The problem is that the amount of indent needed depends on the size of the list markers. Here’s a reference in CommonMark for this.
Previously the > quote here didn’t render as a blockquote, and with the new renderer it does.
* something
> quote
* something else
I found a bunch of Markdown that had been kind of broken (which I hadn’t noticed) that works better with the new renderer, and this is an example of that.
Lists inside blockquotes also seem to work better.
Previously this didn’t render as a heading, but now it does. So I needed to
escape the # (for example by writing \# instead) so it wouldn't be treated as a heading.
* # passengers: 20
+ or 1) at the beginning of the line makes it a list
I had something which looked like this:
`1 / (1
+ exp(-1)) = 0.73`
With Blackfriday it rendered like this:
<p><code>1 / (1
+ exp(-1)) = 0.73</code></p>
and with Goldmark it rendered like this:
<p>`1 / (1</p>
<ul>
<li>exp(-1)) = 0.73`</li>
</ul>
Same thing if there was an accidental 1) at the beginning of a line, like in this Markdown snippet
I set up a small Hadoop cluster (1 master, 2 workers, replication set to
1) on
To fix this I just had to rewrap the line so that the + wasn’t the first character.
The Markdown is formatted this way because I wrap my Markdown to 80 characters a lot and the wrapping isn’t very context sensitive.
There were a bunch of places where the old renderer (Blackfriday) was doing
unwanted things in code blocks like replacing ... with … or replacing
quotes with smart quotes. I hadn’t realized this was happening and I was very
happy to have it fixed.
The way this gets rendered got better:
"Oh, *interesting*!"
Before there were two left smart quotes, now the quotes match.
images are no longer wrapped in a p tag
Previously if I had an image like this:
<img src="https://jvns.ca/images/rustboot1.png">
it would get wrapped in a <p> tag, now it doesn’t anymore. I dealt with this
just by adding a margin-bottom: 0.75em to images in the CSS, hopefully
that’ll make them display well enough.
<br> is now wrapped in a p tag
Previously this wouldn’t get wrapped in a p tag, but now it seems to:
<br><br>
I just gave up on fixing this though and resigned myself to maybe having some extra space in some cases. Maybe I’ll try to fix it later if I feel like another yakshave.
I also needed to turn off syntax highlighting for code fences, allow raw HTML in my Markdown, and keep the old Blackfriday-style heading IDs.
Here’s what I needed to add to my config.yaml to do all that:
markup:
  highlight:
    codeFences: false
  goldmark:
    renderer:
      unsafe: true
    parser:
      autoHeadingIDType: blackfriday
Maybe I’ll try to get syntax highlighting working one day, who knows. I might prefer having it off though.
I also wrote a little program to compare the Blackfriday and Goldmark output for various markdown snippets, here it is in a gist.
It’s not really configured the exact same way Blackfriday and Goldmark were in my Hugo versions, but it was still helpful to have to help me understand what was going on.
My approach to themes in Hugo has been:
So I just need to edit the theme files to fix any problems. Also I wrote a lot of the theme myself so I’m pretty familiar with how it works.
Relying on someone else to keep a theme updated feels kind of scary to me, I think if I were using a third-party theme I’d just copy the code into my site’s github repo and then maintain it myself.
I asked on Mastodon if anyone had used a static site generator with good backwards compatibility.
The main answers seemed to be Jekyll and 11ty. Several people said they’d been using Jekyll for 10 years without any issues, and 11ty says it has stability as a core goal.
I think a big factor in how appealing Jekyll/11ty are is how easy it is for you to maintain a working Ruby / Node environment on your computer: part of the reason I stopped using Jekyll was that I got tired of having to maintain a working Ruby installation. But I imagine this wouldn’t be a problem for a Ruby or Node developer.
Several people said that they don’t build their Jekyll site locally at all – they just use GitHub Pages to build it.
Overall I’ve been happy with Hugo – I started using it because it had fast build times and it was a static binary, and both of those things are still extremely useful to me. I might have spent 10 hours on this upgrade, but I’ve probably spent 1000+ hours writing blog posts without thinking about Hugo at all so that seems like an extremely reasonable ratio.
I find it hard to be too mad about the backwards incompatible changes, most of
them were quite a long time ago, Hugo does a great job of making their old
releases available so you can use the old release if you want, and the most
difficult one is removing support for the blackfriday Markdown renderer in
favour of using something CommonMark-compliant which seems pretty reasonable to
me even if it is a huge pain.
But it did take a long time and I don’t think I’d particularly recommend moving 700 blog posts to a new Markdown renderer unless you’re really in the mood for a lot of computer suffering for some reason.
The new renderer did fix a bunch of problems so I think overall it might be a good thing, even if I’ll have to remember to make 2 changes to how I write Markdown (4.1 and 4.3).
Also I’m still using Hugo 0.54 for https://wizardzines.com so maybe these notes will be useful to Future Me if I ever feel like upgrading Hugo for that site.
Hopefully I didn’t break too many things on the blog by doing this, let me know if you see anything broken!
So I asked people on Mastodon what problems they’ve run into with colours in the terminal, and I got a ton of interesting responses! Let’s talk about some of the problems and a few possible ways to fix them.
One of the top complaints was “blue on black is hard to read”. Here’s an
example of that: if I open Terminal.app, set the background to black, and run
ls, the directories are displayed in a blue that isn’t that easy to read:
To understand why we’re seeing this blue, let’s talk about ANSI colours!
Your terminal has 16 numbered colours – black, red, green, yellow, blue, magenta, cyan, white, and a “bright” version of each of those.
Programs can use them by printing out an “ANSI escape code” – for example if you want to see each of the 16 colours in your terminal, you can run this Python program:
def color(num, text):
    return f"\033[38;5;{num}m{text}\033[0m"

for i in range(16):
    print(color(i, f"number {i:02}"))
This made me wonder – if blue is colour number 4, who decides what hex color that should correspond to?
The answer seems to be “there’s no standard, terminal emulators just choose colours and it’s not very consistent”. Here’s a screenshot of a table from Wikipedia, where you can see that there’s a lot of variation:
Bright yellow on white is even worse than blue on black, here’s what I get in a terminal with the default settings:
That’s almost impossible to read (and some other colours like light green cause similar issues), so let’s talk about solutions!
If you’re annoyed by these colour contrast issues (or maybe you just think the default ANSI colours are ugly), you might think – well, I’ll just choose a different “blue” and pick something I like better!
There are two ways you can do this:
Way 1: Configure your terminal emulator: I think most modern terminal emulators have a way to reconfigure the colours, and some of them even come with some preinstalled themes that you might like better than the defaults.
Way 2: Run a shell script: There are ANSI escape codes that you can print
out to tell your terminal emulator to reconfigure its colours. Here’s a shell script that does that,
from the base16-shell project.
You can see that it has a few different conventions for changing the colours –
I guess different terminal emulators have different escape codes for changing
their colour palette, and so the script is trying to pick the right style of
escape code based on the TERM environment variable.
I prefer to use the “shell script” method, though there are also some advantages to configuring colours in your terminal emulator instead.
This is what my shell has looked like for probably the last 5 years (using the
solarized light base16 theme), and I’m pretty happy with it. Here’s htop:
Okay, so let’s say you’ve found a terminal colorscheme that you like. What else can go wrong?
Here’s what some output of fd, a find alternative, looks like in my
colorscheme:
The contrast is pretty bad here, and I definitely don’t have that lime green in my normal colorscheme. What’s going on?
We can see what color codes fd is using by using the unbuffer program to
capture its output including the color codes:
$ unbuffer fd . > out
$ vim out
^[[38;5;48mbad-again.sh^[[0m
^[[38;5;48mbad.sh^[[0m
^[[38;5;48mbetter.sh^[[0m
out
^[[38;5;48 means “set the foreground color to color 48”. Terminals don’t
only have 16 colours – many terminals these days actually have 3 ways of
specifying colours:
- the original 16 ANSI colours
- an extended 256-colour palette
- 24-bit “true colour”, where you can use any hex colour like #ffea03
So fd is using one of the colours from the extended 256-color set. bat (a
cat alternative) does something similar – here’s what it looks like by
default in my terminal.
This looks fine though and it really seems like it’s trying to work well with a variety of terminal themes.
I think it’s interesting that some of these newer terminal tools (fd, bat,
delta, and probably more) have support for arbitrary custom themes. I guess
the downside of this approach is that the default theme might clash with your
terminal’s background, but the upside is that it gives you a lot more control
over theming the tool’s output than just choosing 16 ANSI colours.
I don’t really use bat, but if I did I’d probably use bat --theme ansi to
just use the ANSI colours that I have set in my normal terminal colorscheme.
A bunch of people on Mastodon mentioned a specific issue with grays in the Solarized theme: when I list a directory, the base16 Solarized Light theme looks like this:
but iTerm’s default Solarized Light theme looks like this:
This is because in the iTerm theme (which is the original Solarized design), colors 9-14 (the “bright blue”, “bright
red”, etc) are mapped to a series of grays, and when I run ls, it’s trying to
use those “bright” colours to color my directories and executables.
My best guess for why the original Solarized theme is designed this way is to make the grays available to the vim Solarized colorscheme.
I’m pretty sure I prefer the modified base16 version I use where the “bright” colours are actually colours instead of all being shades of gray though. (I didn’t actually realize the version I was using wasn’t the “original” Solarized theme until I wrote this post)
In any case I really love Solarized and I’m very happy it exists so that I can use a modified version of it.
If my vim theme has a different background colour than my terminal theme, I get this ugly border, like this:
This one is a pretty minor issue though and I think making your terminal background match your vim background is pretty straightforward.
A few people mentioned problems with terminal applications setting an unwanted background colour, so let’s look at an example of that.
Here ngrok has set the background to color #16 (“black”), but the
base16-shell script I use sets color 16 to be bright orange, so I get this,
which is pretty bad:
I think the intention is for ngrok to look something like this:
I think base16-shell sets color #16 to orange (instead of black)
so that it can provide extra colours for use by base16-vim.
This feels reasonable to me – I use base16-vim in the terminal, so I guess I’m
using that feature and it’s probably more important to me than ngrok (which I
rarely use) behaving a bit weirdly.
This particular issue is maybe an obscure clash between ngrok and my colorscheme, but I think this kind of clash is pretty common when a program sets an ANSI background color that the user has remapped for some reason.
A bunch of terminals (iTerm2, tabby, kitty’s text_fg_override_threshold, and folks tell me also Ghostty and Windows Terminal) have a “minimum contrast” feature that will automatically adjust colours to make sure they have enough contrast.
Here’s an example from iTerm. This ngrok accident from before has pretty bad contrast, I find it pretty difficult to read:
With “minimum contrast” set to 40 in iTerm, it looks like this instead:
I didn’t have minimum contrast turned on before but I just turned it on today because it makes such a big difference when something goes wrong with colours in the terminal.
TERM being set to the wrong thing
A few people mentioned that they’ll SSH into a system that doesn’t support the
TERM environment variable that they have set locally, and then the colours
won’t work.
I think the way TERM works is that systems have a terminfo database, so if
the value of the TERM environment variable isn’t in the system’s terminfo
database, then it won’t know how to output colours for that terminal. I don’t
know too much about terminfo, but someone linked me to this terminfo rant that talks about a few other
issues with terminfo.
I don’t have a system on hand to reproduce this one so I can’t say for sure how
to fix it, but this stackoverflow question
suggests running something like TERM=xterm ssh instead of ssh.
People also mentioned a couple of problems with designing and finding terminal colorschemes.
Another problem people mentioned is using a program like nethack or midnight commander which you might expect to have a specific colourscheme based on the default ANSI terminal colours.
For example, midnight commander has a really specific classic look:
But in my Solarized theme, midnight commander looks like this:
The Solarized version feels like it could be disorienting if you’re very used to the “classic” look.
One solution Simon Tatham mentioned to this is using some palette customization ANSI codes (like the ones base16 uses that I talked about earlier) to change the color palette right before starting the program, for example remapping yellow to a brighter yellow before starting Nethack so that the yellow characters look better.
If I run fd | less, I see something like this, with the colours disabled.
In general I find this useful – if I pipe a command to grep, I don’t want it
to print out all those color escape codes, I just want the plain text. But what if you want to see the colours?
To see the colours, you can run unbuffer fd | less -r! I just learned about
unbuffer recently and I think it’s really cool, unbuffer opens a tty for the
command to write to so that it thinks it’s writing to a TTY. It also fixes
issues with programs buffering their output when writing to a pipe, which is
why it’s called unbuffer.
Here’s what the output of unbuffer fd | less -r looks like for me:
Also some commands (including fd) support a --color=always flag which will
force them to always print out the colours.
turning off colour in ls and other commands
Some people mentioned that they don’t want ls to use colour at all, perhaps
because ls uses blue, it’s hard to read on black, and maybe they don’t feel like
customizing their terminal’s colourscheme to make the blue more readable or
just don’t find the use of colour helpful.
Some possible solutions to this one:
- run ls --color=never, which is probably easiest
- set LS_COLORS to customize the colours used by ls. I think some other programs other than ls support the LS_COLORS environment variable too.
- set NO_COLOR=true (there’s a list here)
Here’s an example of running LS_COLORS="fi=0:di=0:ln=0:pi=0:so=0:bd=0:cd=0:or=0:ex=0" ls:
I used to have a lot of problems with configuring my colours in vim – I’d set up my terminal colours in a way that I thought was okay, and then I’d start vim and it would just be a disaster.
I think what was going on here is that today, there are two ways to set up a vim colorscheme in the terminal:
20 years ago when I started using vim, terminals with 24-bit hex color support were a lot less common (or maybe they didn’t exist at all), and vim certainly didn’t have support for using 24-bit colour in the terminal. From some quick searching through git, it looks like vim added support for 24-bit colour in 2016 – just 8 years ago!
So to get colours to work properly in vim before 2016, you needed to synchronize
your terminal colorscheme and your vim colorscheme. Here’s what that looked like,
the colorscheme needed to map the vim color classes like cterm05 to ANSI colour numbers.
But in 2024, the story is really different! Vim (and Neovim, which I use now)
support 24-bit colours, and as of Neovim 0.10 (released in May 2024), the
termguicolors setting (which tells Vim to use 24-bit hex colours for
colorschemes) is turned on by default in any terminal with 24-bit
color support.
So this “you need to synchronize your terminal colorscheme and your vim colorscheme” problem is not an issue anymore for me in 2024, since I don’t plan to use terminals without 24-bit color support in the future.
The biggest consequence for me of this whole thing is that I don’t need base16
to set colors 16-21 to weird stuff anymore to integrate with vim – I can just
use a terminal theme and a vim theme, and as long as the two themes use similar
colours (so it’s not jarring for me to switch between them) there’s no problem.
I think I can just remove those parts from my base16 shell script and totally
avoid the problem with ngrok and the weird orange background I talked about
above.
I think there are a lot of issues around the intersection of multiple programs, like using some combination tmux/ssh/vim that I couldn’t figure out how to reproduce well enough to talk about them. Also I’m sure I missed a lot of other things too.
I’ve personally had a lot of success with using
base16-shell with
base16-vim – I just need to add a couple of lines to my
fish config to set it up (+ a few .vimrc lines) and then I can move on and
accept any remaining problems that that doesn’t solve.
I don’t think base16 is for everyone though – some limitations I’m aware of might make it not work for you. For example, it sets colours 16-21 to give base16-vim access to more colours, which might not be relevant if you always use a terminal with 24-bit color support, and can cause problems like the ngrok issue above.
Apparently there’s a community fork of base16 called tinted-theming, which I haven’t looked into much yet.
Just one so far but I’ll link more if people tell me about them:
We talked about a lot in this post and while I think learning about all these details is kind of fun if I’m in the mood to do a deep dive, I find it SO FRUSTRATING to deal with it when I just want my colours to work! Being surprised by unreadable text and having to find a workaround is just not my idea of a good day.
Personally I’m a zero-configuration kind of person and it’s not that appealing to me to have to put together a lot of custom configuration just to make my colours in the terminal look acceptable. I’d much rather just have some reasonable defaults that I don’t have to change.
My one big takeaway from writing this was to turn on “minimum contrast” in my terminal, I think it’s going to fix most of the occasional accidental unreadable text issues I run into and I’m pretty excited about it.
I’ve never felt motivated to learn any of the Go routing libraries (gorilla/mux, chi, etc), so I’ve been doing all my routing by hand, like this.
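// (for context: p here is presumably r.URL.Path split into segments, and n is len(p))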
// DELETE /records:
case r.Method == "DELETE" && n == 1 && p[0] == "records":
if !requireLogin(username, r.URL.Path, r, w) {
return
}
deleteAllRecords(ctx, username, rs, w, r)
// POST /records/<ID>
case r.Method == "POST" && n == 2 && p[0] == "records" && len(p[1]) > 0:
if !requireLogin(username, r.URL.Path, r, w) {
return
}
updateRecord(ctx, username, p[1], rs, w, r)
But apparently as of Go 1.22, Go now has better support for routing in the standard library, so that code can be rewritten something like this:
mux.HandleFunc("DELETE /records/", app.deleteAllRecords)
mux.HandleFunc("POST /records/{record_id}", app.updateRecord)
Though it would also need a login middleware, so maybe something more like
this, with a requireLogin middleware.
mux.Handle("DELETE /records/", requireLogin(http.HandlerFunc(app.deleteAllRecords)))
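For completeness, here's a sketch of the two pieces that pattern assumes – a handler that reads the {record_id} wildcard with r.PathValue (added in Go 1.22), and a requireLogin middleware. This is just my guess at the shape (the app type and the session-cookie check are made up), not the actual code:
func (app *App) updateRecord(w http.ResponseWriter, r *http.Request) {
	recordID := r.PathValue("record_id") // matches the {record_id} wildcard
	// ... look up and update the record ...
	fmt.Fprintf(w, "updated record %s\n", recordID)
}

func requireLogin(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err != nil {
			http.Error(w, "login required", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}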
One annoying gotcha I ran into was: if I make a route for /records/, then a
request for /records will be redirected to /records/.
I ran into an issue with this where sending a POST request to /records
redirected to a GET request for /records/, which broke the POST request
because it removed the request body. Thankfully Xe Iaso wrote a blog post about the exact same issue which made it
easier to debug.
I think the solution to this is just to use API endpoints like POST /records
instead of POST /records/, which seems like a more normal design anyway.
I got a little bit tired of writing so much boilerplate for my SQL queries, but I didn’t really feel like learning an ORM, because I know what SQL queries I want to write, and I didn’t feel like learning the ORM’s conventions for translating things into SQL queries.
But then I found sqlc, which will compile a query like this:
-- name: GetVariant :one
SELECT *
FROM variants
WHERE id = ?;
into Go code like this:
const getVariant = `-- name: GetVariant :one
SELECT id, created_at, updated_at, disabled, product_name, variant_name
FROM variants
WHERE id = ?
`
func (q *Queries) GetVariant(ctx context.Context, id int64) (Variant, error) {
row := q.db.QueryRowContext(ctx, getVariant, id)
var i Variant
err := row.Scan(
&i.ID,
&i.CreatedAt,
&i.UpdatedAt,
&i.Disabled,
&i.ProductName,
&i.VariantName,
)
return i, err
}
What I like about this is that if I’m ever unsure about what Go code to write for a given SQL query, I can just write the query I want, read the generated function and it’ll tell me exactly what to do to call it. It feels much easier to me than trying to dig through the ORM’s documentation to figure out how to construct the SQL query I want.
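Calling the generated code then looks roughly like this (sqlc also generates a New constructor that wraps a database handle; the driver name and the variant ID here are just placeholders):
db, err := sql.Open("sqlite3", "app.db")
if err != nil {
	log.Fatal(err)
}
queries := New(db) // New is generated by sqlc
ctx := context.Background()
variant, err := queries.GetVariant(ctx, 1)
if err != nil {
	log.Fatal(err)
}
fmt.Println(variant.ProductName, variant.VariantName)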
Reading Brandur’s sqlc notes from 2024 also gave me some confidence that this is a workable path for my tiny programs. That post gives a really helpful example of how to conditionally update fields in a table using CASE statements (for example if you have a table with 20 columns and you only want to update 3 of them).
Someone on Mastodon linked me to this post called Optimizing sqlite for servers. My projects are small and I’m not so concerned about performance, but my main takeaways were:
- have a dedicated database handle for writes and call db.SetMaxOpenConns(1) on it (see the sketch below). I learned the hard way that if I don't do this then I'll get SQLITE_BUSY errors from two threads trying to write to the db at the same time.
There are more tips in that post that seem useful (like “COUNT queries are slow” and “Use STRICT tables”), but I haven’t done those yet.
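Here's a minimal sketch of that two-handle setup (the driver name and DSN are placeholders, assuming something like mattn/go-sqlite3):
// one handle for reads, one for writes; limiting the write handle to a single
// connection avoids SQLITE_BUSY errors from two concurrent writers
readDB, err := sql.Open("sqlite3", "app.db")
if err != nil {
	log.Fatal(err)
}
writeDB, err := sql.Open("sqlite3", "app.db")
if err != nil {
	log.Fatal(err)
}
writeDB.SetMaxOpenConns(1)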
Also sometimes if I have two tables where I know I’ll never need to do a JOIN
between them, I’ll just put them in separate databases so that I can connect
to them independently.
I run all of my Go projects in VMs with relatively little memory, like 256MB or 512MB. I ran into an issue where my application kept getting OOM killed and it was confusing – did I have a memory leak? What?
After some Googling, I realized that maybe I didn’t have a memory leak, maybe I just needed to reconfigure the garbage collector! It turns out that by default (according to A Guide to the Go Garbage Collector), Go’s garbage collector will let the application allocate memory up to 2x the current heap size.
Mess With DNS’s base heap size is around 170MB and the amount of memory free on the VM is around 160MB right now, so if its memory doubled, it’ll get OOM killed.
In Go 1.19, they added a way to tell Go “hey, if the application starts using this much memory, run a GC”. So I set the GC memory limit to 250MB and it seems to have resulted in the application getting OOM killed less often:
export GOMEMLIMIT=250MiB
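You can also set the same limit from inside the program with the runtime/debug package instead of an environment variable (this is just the standard-library API, nothing specific to Mess With DNS):
import "runtime/debug"

func init() {
	// equivalent to GOMEMLIMIT=250MiB (the limit is in bytes)
	debug.SetMemoryLimit(250 << 20)
}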
I’ve been making tiny websites (like the nginx playground) in Go on and off for the last 4 years or so and it’s really been working for me. I think I like it because:
- it's easy to build and deploy: I can apt-get install golang-go or whatever and then a go build will build my project
- handlers are just functions like Serve(w http.ResponseWriter, r *http.Request) which read the request and send a response. If I need to remember some detail of how exactly that's accomplished, I just have to read the function!
- net/http is in the standard library, so you can start making websites without installing any libraries at all. I really appreciate this one.
- if I need to do something lower level like an ioctl or something, that's easy to do
For contrast, I’ve tried to learn Rails a couple of times and I really want to love Rails – I’ve made a couple of toy websites in Rails and it’s always felt like a really magical experience. But ultimately when I come back to those projects I can’t remember how anything works and I just end up giving up. It feels easier to me to come back to my Go projects that are full of a lot of repetitive boilerplate, because at least I can read the code and figure out how it works.
some things I haven’t done much of yet in Go:
- HTML templating: I've used html/template a lot in Hugo (which I've used for this blog for the last 8 years) but I'm still not sure how I feel about it.
In general I'm not sure how to implement security-sensitive features so I don't start projects which need login/CSRF/etc. I imagine this is where a framework would help.
Both of the Go features I mentioned in this post (GOMEMLIMIT and the routing)
are new in the last couple of years and I didn’t notice when they came out. It
makes me think I should pay closer attention to the release notes for new Go
versions.