Sequoia McDowell's Blog Feed
https://sequoia.makes.software/
Occasional posts about web programming, software engineering, and JavaScript.

Keep Callm and Cairry On
https://sequoia.makes.software/keep-callm-and-cairry-on/
Wed, 13 Mar 2024

ChatGPT: please write a blog post on the perils of LLM misuse from the perspective of a crotchety tech-skeptic. Incorporate the theme 'Invent the disease and you can sell the cure.' Add a hopeful reflection at the end so it's not too negative-sounding.

Many companies and individuals today are experimenting with LLMs. Proponents of GenAI suggest that it can improve our work by helping with tasks like authoring quarterly updates, writing emails, or even reading and summarizing emails you receive. Having AI both write and read emails raises the question of whether the AI is actually adding any value, but let's assume that someone is ultimately reading the AI-generated prose.

While this would seem to create value for the author of the text by saving them time in authorship, what is the impact, and the cost, on the readers of the text? The following questions jump to mind:

  1. What is the cost of the additional time spent reading?

    For example: a multi-paragraph email that might have been one or two sentences without GenAI augmentation.

  2. How do we guard against degeneration of the quality and relevancy of written output?
  3. How do we avoid implicitly punishing people who take the time to write meaningful texts "by hand"?

    Imagine that it takes thirty minutes to write one detailed, short, and meaningful document, but you can generate a similar but longer and less relevant update using an LLM in just five minutes. Assuming we judge people by "productivity," this penalizes the person who takes the time to do a good job and rewards the one who sends an email filled with junk an LLM barfed up.

  4. How do we ensure important human-written "needles" aren't lost as the "haystack" of text grows?

    LLMs will surely increase the volume of text we are expected to read, but the time we can devote to reading is fixed. This is likely to result in more "skimming" and more skipping, i.e. simply not reading things. This increases the risk of readers missing important pieces of information.

  5. How do we ensure we don't use LLMs to create problems we then need LLMs to solve?

    "LLMs can summarize email threads and documents for you!" If the reason I need the document summarized is because the author created an overly-long document using an LLM, then GenAI's contribution to that interaction will have been all cost, no value.1

A picture from Dr. Seuss's "sneeches on beaches" story. Rather than Sylvester McMonkey McBean charging sneeches to add stars then charging them again to remove those stars, it's OpenAI charging users to write long emails from a sentence, then charging users to summarize the long email down to a sentence.

Reflections

As long as we keep the bar for quality high, insist that communications be meaningful and relevant, and have a feedback mechanism to let people know if their communications need improvement so they can correct course, then there's absolutely no problem with using LLMs. At the moment, however, it's not clear to me that we apply these standards of quality & "high signal-to-noise ratio" to internal communications–there's generally no consequence to writing excessively long emails or documents, because why would you criticize someone for taking the time to write things out? However, this calculus changes when it no longer requires time or effort to produce reams of text.

In my view, what's needed to keep LLM-spam from flooding our lives is raising the bar for written communication and cracking down on long-winded, irrelevant fluff in emails and other documents. Ultimately, as long as people are producing high quality work it doesn't matter where it came from.

If you have a plan for how we can avoid drowning in LLM slurry, please shoot me a note and I'll include it below!

Footnotes

1 Personally, I am prejudiced against reading prose generated by an LLM. If it wasn't worth your time to write the email, why the heck should I waste my time reading it?

Kubernetes Resource Optimization: Just The Basics
https://sequoia.makes.software/kubernetes-resource-optimization-just-the-basics/
Mon, 25 Jan 2021

How do you write about optimizing Kubernetes clusters without getting into the weeds? The whole thing is just weeds. Nonetheless, there are things you can do to reduce your Kubernetes spend and maintain performance without an advanced degree in Kubernetology, and I'll go over some of those things in this post!

One of the promises of container orchestration (e.g. Kubernetes) is that you can automatically scale up and down as needed and save money. This can be true with Kubernetes, but that automation doesn't happen automatically! When you set up your own cluster, you take on the responsibility for tuning it for performance and cost. If you don't tune, you don't save!

How much can running a Kubernetes cluster cost? As a single example, a friend I spoke with recently at a mid-sized company was spending 250,000 US dollars per month running applications in Kubernetes (not including storage & network costs!). A company like this has 250,000 reasons (per month) to pay attention to resource usage and work on reducing it. But how?

Pre(r)amble on Accuracy and Generalization

How do you write about optimizing Kubernetes clusters without getting into the weeds? The whole thing is just weeds.

Basically, every sentence in this post has one or more caveats. I have chosen to omit the "except in the following cases..." and "as long as the following is true..." statements. To get a more realistic picture, close your eyes and imagine a Kubernetes expert saying, "actually it's a bit more complicated than that" after every sentence.

But fear not! There are some "rules of thumb" that can help you realize significant savings without having to become a Kubernetes expert.

High-Level Goals

  • Goal 1: Cost: Use as few resources as possible to reduce how much we spend renting computers (where our Kubernetes cluster ultimately runs)
  • Goal 2: Availability: Keep our application availability high and keep response times within acceptable ranges
  • Goal 3: Scaling: Handle spikes in traffic gracefully (sort of a sub-goal of Goal 2: Availability)

Background: How does Kubernetes Allocate Resources to Containers?

ℹ️ Skip this section if you're already familiar with how requests, limits, and HPAs work.

In a Kubernetes Cluster, each "workload" (for example, a web server application) runs in one or more pods (generally a wrapper around a single docker container). In a Kubernetes "deployment," we tell Kubernetes how many instances (pods) of our application to run, and what sort of resources we expect each of those pods to need (requests). We can also tell Kubernetes not to let a pod exceed a certain resource usage limit (limits) and to shut down any pod that does.

For example, we can tell Kubernetes, "Hey Kubernetes, run 3 Nginx instances; I expect each will use 1 CPU and 512Mi of RAM, but do not let it use more than 1Gi of RAM."

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  replicas: 3                # ⭐️ Run three instances
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx:1.14.2
        resources:
          requests:          # ⭐️ request = "this is how much my application uses typically"
            cpu: 1000m       #    1000 "millicpus" = 1 CPU
            memory: 512Mi
          limits:            # ⭐️ limit = "don't let the pod use more than this"
            memory: 1Gi

Telling Kubernetes an exact number of pods to run isn't very auto-scale-y, though, is it? We want more pods when we need them and fewer when we don't. We can do this by using a Horizontal Pod Autoscaler (HPA), which automatically increases and decreases the number of pods a deployment has.

How does the HPA know when to add or remove pods? Good question! The simplest way is for it to look at the average CPU usage across the pods in the deployment. When average CPU usage gets too high, add pods. When it gets too low, shut some pods down.
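Concretely, the HPA computes a desired replica count from the ratio of current to target utilization. Here's a quick sketch of that arithmetic (the pod counts and percentages below are made-up example numbers):

```shell
# Hypothetical: 8 pods currently averaging 95% CPU, with an HPA target of 70%.
current_replicas=8
current_util=95
target_util=70

# desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
# (integer ceiling via the "add divisor minus one" trick)
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "$desired"   # -> 11
```

So at 95% average utilization against a 70% target, the HPA would scale the deployment from 8 pods up to 11.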

"Hey Kubernetes, run an HPA to scale the deployment we just made above. If the average CPU usage is well above 90%, bring more pods online. If it's much lower, shut some down. Also, don't let the total number of pods go below 2 or above 10."

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-nginx   # (any name will do)
spec:
  # ⭐️ Scale up or down to keep the pods running at ~90% CPU utilization
  targetCPUUtilizationPercentage: 90
  maxReplicas: 10  # ⭐️ No more than 10 pods
  minReplicas: 2   # ⭐️ No fewer than 2 pods
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx

The key to optimizing our resource consumption is to run our pods as "hot" as possible, i.e. use all available CPU & memory. In an ideal world, this would mean allocating exactly as much memory as needed and no more, and consuming approximately 100% of the CPU requested.

So why not tell the HPA to do this ("Scale up to infinity, scale down to 0")? We could do this thus:

  • targetCPUUtilizationPercentage: 100
  • minReplicas: 1
  • maxReplicas: 100000
  • requests.cpu: 1m ("one millicpu", i.e. 1/1000th of a CPU)
  • requests.memory: 1Mi
  • No CPU or RAM limit (infinity CPU & RAM!)

We've done it! Automatic scaling from 0 to infinity with 100% efficiency––problem solved!

Unfortunately, It's Not That Simple...

Why It's Not That Simple

There are a couple reasons:

1. Your Containers Ultimately Run On Physical Computers With Physical Limits

The machines in a Kubernetes cluster are called "Nodes."

The requests you set for your containers tell Kubernetes how much CPU/RAM your application claims it needs to operate; Kubernetes uses this information to "schedule" your container to a node that can meet those requirements. It finds a node (physical or virtual machine) with the resources available to meet your container's requests and starts the container in that node.

If you only request cpu: 1m (one millicpu) for your application and Kubernetes has a node with 1 CPU (one thousand millicpu), Kubernetes will say "aha! This node can safely fit one thousand instances of your application!" When your 1000 pods start up on that node and each one actually wants 100m rather than 1m, the node will not be able to meet the pods' demands. The pods will run really slowly or crash repeatedly, and Kubernetes will do nothing about it.
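A back-of-the-envelope illustration of that mismatch (all numbers hypothetical):

```shell
node_cpu_m=1000   # a node with 1 CPU = 1000m allocatable (ignoring system reserves)
request_m=1       # what each pod *claims* it needs
actual_m=100      # what each pod *actually* uses

# How many pods the scheduler will happily place on this node, going by requests:
echo $(( node_cpu_m / request_m ))   # -> 1000
# How many pods the node's CPU can actually sustain:
echo $(( node_cpu_m / actual_m ))    # -> 10
```

A hundred-fold gap between what was promised and what's possible: that's how you get a node full of starving pods.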

For this reason, it's important to tell Kubernetes roughly how much CPU & RAM your application needs so it can "set aside" an adequate amount.

2. Deployments Take Time To Scale

Assume each instance of your application can serve 1000 requests per second (rps). Your traffic is steady around 10,000rps, and you have exactly 10 pods handling it, all of them running at 100% CPU (maximum efficiency, baby!!).

All of a sudden traffic increases to 15,000rps. No problem, bring another 5 pods online.

Please wait 2-3 minutes to bring additional pods online

Two to three minutes?! Uh-oh! What are you going to do with that extra load for two to three minutes? Latency goes up, overloaded pods start to crash, and you've got an incident on your hands.

To avoid this problem, your deployments should have enough buffer (extra resources allocated) to handle spikes in traffic for two or three minutes. This means setting your HPA's target CPU utilization far enough below 100% to be able to absorb spikes in traffic long enough for Kubernetes to bring more pods online.
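One rough way to size that buffer (a sketch only; the 50% spike figure is an assumption you'd replace with your own traffic data): if traffic can spike to 150% of steady state before new pods are ready, steady-state utilization must stay at or below 100/1.5 ≈ 66%.

```shell
spike_factor=150   # assumed worst-case spike: 150% of steady traffic

# Highest safe HPA target: pods must absorb the spike with the CPU they have spare
echo $(( 100 * 100 / spike_factor ))   # -> 66 (targetCPUUtilizationPercentage)
```

Bigger or faster spikes, or slower pod startup, push that number lower.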

What Things Can You Fiddle With to Increase Efficiency?

A brief review of what requests and limits mean:

  • requests = "please make sure this much schedulable RAM/CPU is available on a node before putting this pod on that node" (how much your application typically uses). This is a target (e.g. p50 behavior), not a maximum.
  • limits = "don't let my pod use more than this amount."
    • In the case of memory, immediately kill the pod
    • In the case of CPU, throttle how much time it gets on the CPU

For a more detailed description, check out this video from Google, especially the first half.

With those basics out of the way, let's look at which settings you can change to improve your efficiency:

Per Container: CPU (requests.cpu and limits.cpu)

Very High Level/General Goal: looking at a Grafana chart of pod resource usage for a specific deployment such as the example below, you want the "used" line (blue) to sit at or above the "requested" line (green). (NB: the particulars of this chart are specific to one cluster, but they are generally derived from cadvisor)

Grafana chart of CPU usage with blue "used" line below green "requested" line

Used should be at or above requested?! That doesn't make sense! How do I use more than I requested?

It's complicated, but basically there's usually extra CPU available on the node beyond the total amount of CPU requested. As explained to me by a smart Kubernetes expert:

Remember, a pod is not a virtual machine with a fixed amount of physical CPU and memory; it is a group of "containerized processes" that run on a shared virtual machine with other pods.

Anyway just try to make the blue line sit at or above the green line.

Other Rules of thumb:

  1. Don't request above 1 CPU aka "1000m" (unless you have an application specifically designed to make use of multiple cores); more info under "CPU"
  2. Don't request below 1 either: assuming your workload is CPU bound (a single process can scale up 'til it runs out of CPU), it probably doesn't make sense to set this below 1 CPU. Instead, consider increasing the targetCPUUtilizationPercentage on the HPA (see below).
  3. Set the limit to 1.5 times whatever the request is (e.g. requests.cpu: 1; limits.cpu: 1.5)
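Put together, those rules of thumb might look like this in a container spec (a sketch, not a drop-in config):

```yaml
resources:
  requests:
    cpu: 1000m    # rules 1 & 2: request exactly 1 CPU for a CPU-bound workload
  limits:
    cpu: 1500m    # rule 3: limit = 1.5 × the request
```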

Per Container: Memory (requests.memory and limits.memory)

You must be a bit more careful with memory. If a container doesn't have quite enough CPU, it may run slowly. If it requires more memory than allowed by its limit, it will be killed.

Very High Level/General Goal: looking at the graph of RAM usage like the one below, you want the blue line (used) to sit at the green line (requested). (It's OK for "used" to go over "requested" for short periods of time but not all the time.)

Grafana chart of RAM usage with blue "used" line far below green "requested" line

Rules of thumb:

  1. Set requests at or just slightly above what you typically observe a container to use
  2. Set limits above the request value to give your pod some room to handle periods of time where it needs a bit more memory, while protecting the node from any one process attempting to use all of the available memory on the node. Without a RAM limit, the Kubernetes controller will not prevent a container from using 100% of the RAM on a node, which would cause all other pods on the node to crash.
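For example, if you observe a container typically using around 400Mi, those memory rules of thumb might translate to something like this (illustrative numbers):

```yaml
resources:
  requests:
    memory: 450Mi   # rule 1: at or slightly above typical observed usage
  limits:
    memory: 768Mi   # rule 2: headroom for spikes, while still protecting the node
```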

Horizontal Pod Autoscaler: Minimum Replicas

"How many pods do I need running during overnight hours?"

  1. Turn this very low so your deployment can scale down overnight to a bare minimum (assuming your business is primarily diurnal and your cluster serves a limited number of adjacent timezones)
  2. Consider scale-up time! If your application takes several minutes to start up, you should keep more pods online at minimum traffic (higher minReplicas value) to handle any unexpected load.
  3. Scaling below 2 is probably unwise as it leaves no redundancy at all.

Horizontal Pod Autoscaler: Maximum Replicas

"Beyond what point of scaling is my deployment obviously malfunctioning and running out of control?"

The maxReplicas setting allows you to throttle your deployment's horizontal scaling in order to control resource consumption. If yours is a business-to-consumer company, you probably don't want to do this with customer-facing services. If you were to run a TV ad that spiked traffic and your website deployment wants to jump from 250 pods up to 650, you probably want to let it do that. Availability (Goal 2) is typically much more important than platform cost (Goal 1) when it comes to serving customers.

For this reason, Maximum Replicas should be set to the highest number of pods you've observed your deployment needing, plus 50 to 100%. Increasing this number doesn't cost anything directly.

Another consideration is resource constraints of service(s) your service connects to. For example, if your database can only handle a maximum of 1000 connections you probably want to set your max application replicas so that you do not exceed that capacity.
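As a sketch of that constraint (the connection-pool size here is a made-up example):

```shell
db_max_connections=1000
conns_per_pod=10   # assumed connection-pool size per application pod

# Ceiling for maxReplicas so the database is never overwhelmed:
echo $(( db_max_connections / conns_per_pod ))   # -> 100
```

In practice you'd want to stay comfortably below that ceiling to leave connections for other clients (migrations, admin tools, etc.).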

Horizontal Pod Autoscaler: CPU Utilization Target

This is one of the most important settings, especially for large clusters. You want this value to be as close to 100% as possible: the "hotter" your pods run, the less idle CPU you're paying for.

On the other hand, if you have _no_ idle CPU, you are unlikely to be able to handle a spike in traffic (Goal 3) while waiting for the deployment to scale up.

Interaction between requests.cpu and targetCPUUtilizationPercentage

Earlier we said "look at CPU efficiency and reduce requests.cpu if you are requesting more than you're using." That advice can be misleading when you're using an HPA that scales your cluster up and down based on average CPU utilization.

Example: If it looks like you're only running at 70% CPU efficiency, you may think "requests.cpu is 30% higher than needed, I should turn it down." But if targetCPUUtilizationPercentage is set to 70, your CPU isn't overprovisioned; the deployment is simply scaled up every time the average CPU across pods in the deployment goes much over 70%*, so the average usage always hovers around 70%! If your pods each request 1 CPU, the deployment will scale so they each use approximately 70% of 1 CPU. If your pods each request 600m, the deployment will scale so they each use approximately 70% of 600m.

Turn down requests.cpu all you want, utilization will continue to hover at 70%. In this case, you would need to increase targetCPUUtilizationPercentage, not decrease requests.cpu in order to increase efficiency.

  * I'm not sure exactly how much "much over" is... you set it to "70" and the HPA figures out when to scale up or down to keep the average close to 70.
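That feedback loop can be sketched numerically. Suppose the service needs 7000m of CPU in total (an assumed figure): whatever you set requests.cpu to, the HPA keeps adding pods until average utilization falls back to the target.

```shell
demand_m=7000   # total CPU the service actually needs (assumption)
target=70       # targetCPUUtilizationPercentage

for req in 1000 600; do
  # per-pod CPU consumption the HPA will tolerate at the target
  per_pod_at_target=$(( req * target / 100 ))
  # replicas = ceil(total demand / per-pod capacity at the target)
  replicas=$(( (demand_m + per_pod_at_target - 1) / per_pod_at_target ))
  util=$(( demand_m * 100 / (replicas * req) ))
  echo "requests.cpu=${req}m -> ${replicas} pods, avg utilization ~${util}%"
done
```

Lowering the request from 1000m to 600m just means more, smaller pods, all still hovering around the 70% target.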

How High Can You Go?

That's the $64 question. The formula for this is basically:

100% - (however much buffer you need to handle traffic spikes for the time it takes additional pods to come online and be ready to serve).

For more information on how to set this, see two places in this post:

  1. The introduction to Horizontal Pod Autoscalers in this section
  2. The bullet point titled "Fine-tune the HPA utilization target"

As mentioned above, the interplay between requests.cpu and hpa.targetCpuUtilizationPercentage can be hard to grok, but you should make sure you consider both settings when setting either. As a rule of thumb, increase targetCpuUtilizationPercentage as much as possible before reducing requests.cpu. It's hard to know how much CPU your container can make use of if it's being aggressively throttled by a low targetCpuUtilizationPercentage.

You'll know you've set targetCpuUtilizationPercentage too high if response latency starts to climb when you get traffic spikes (or if your pods start crashing and you have an outage 😄).

Other General Tips

Start Small!

There's no need to make big infrastructure changes all at once, and in many ways it's inadvisable! Instead of turning targetCpuUtilizationPercentage from 70 to 95, consider taking a stepwise approach: step it up to 80 and observe for a few days or a week, then try 90, observe a while, rinse and repeat. This is a safe and easy way to get started if you're not sure what settings are best!

Conclusion

Hopefully these "rules of thumb" give you enough information to get started right-sizing your deployments. There is of course more fine-tuning you can do, but following these rules should get you at least 50% of the way there–there's lots of low-hanging fruit!

If you want to read more about this fascinating topic, please see the links below.

Further Reading

Reducing Docker Image Size (Particularly for Kubernetes Environments)
https://sequoia.makes.software/reducing-docker-image-size-particularly-for-kubernetes-environments/
Tue, 05 Jan 2021

While a few hundred megabytes in an application image seems a small concern in this day and age, when you're running scores or hundreds of instances in a cluster environment such as Kubernetes, those megabytes start to add up. This can lead to spending more than needed on infrastructure, CI builds taking longer, and performance issues during cluster scaling. Read this post for a few simple tricks to slim down your docker image.

One Day on Slack...

1.3gb for a web app?! The size of your Docker image is getting out of control!

Uh-oh... The infrastructure team is calling you out for your Docker image size! Larger images mean spending more than needed on infrastructure, longer CI builds, and performance issues during cluster scaling.

All of these are small problems but they add up! So your image is too big–don't panic! Following a few simple steps, you can cut your Docker image down to size in next to no time.

*this post assumes you are running Docker images in Kubernetes.

Contents of This Post

  1. Analyzing your image to see why it's big
  2. Chopping your image in two
  3. Cleaning up image contents

Analyzing Your Image

How big is your image? Assuming you've run docker build to build your image locally, this is easy to check with docker images:

➜ docker images
REPOSITORY                           TAG      IMAGE ID      CREATED       SIZE
gcr.io/ns-1/toodle-app               d82c28d  e4f0fd00de6d  4 months ago  1.32GB
gcr.io/ns-2/go-af                    v0.12.1  d665db43eb95  4 months ago  911MB

Our toodle-app image is 1.32 GB. But why is it so big? To figure that out, we'll use a handy tool called dive to analyze the image layer by layer.

➜ dive gcr.io/ns-1/toodle-app:d82c28d
Image Source: docker://gcr.io/ns-1/toodle-app:d82c28d
Fetching image... (this can take a while for large images)

When it completes it will show a view like this:

dive command output

There's a lot going on here!

  • The top-left panel shows you layers, each of which corresponds to a Dockerfile command. (If the command is truncated, find the full command below in the "Layer Details" section.)
  • The right column shows the filesystem tree for the currently selected layer–more on this later
  • The bottom left is Image Details and does not change as you navigate through the layers

Use the arrow keys to navigate up and down in the currently selected pane. Use tab to switch from the Layers pane to Current Layer Contents and back. Here I've pressed the down arrow several times to get to the 309 MB RUN make build/bin/server layer, then used tab to switch focus to the Current Layer Contents panel:

dive command output: "RUN make build/bin/server" layer

By default, the Current Layer Contents shows you a full tree of the filesystem up to and including the selected layer. What's typically more useful when analyzing your image size by layer is to see what files were added by that layer. Use ctrl+u (see "^U Unmodified" in the bottom right of the screenshot) to toggle that option off, which hides files unmodified by the current layer. This leaves visible only files that were Added, Removed, or Modified by this layer:

dive command output: "RUN make build/bin/server" layer with unmodified files hidden

Hello, what's this? This layer (which runs go build to build the actual toodle-app binary) adds 309MB, but 237MB of that is go module cache, which we do not need after the binary has been built!

Now we know why this layer is larger than it should be and we can see about cleaning it up (we'll do this below). Repeat the process for other large layers, or just poke around and see what each layer is adding or modifying.

Now that we know how to figure out why it's big, let's look at some strategies to cut down an image's size...

Chopping Your Image in Twain

When we build a project inside a docker image, each of the things we pull or copy into that image falls into one of two categories:

  1. Stuff we need to build the application
  2. Stuff we need to run the application

Some of the things we add to our toodle-app image, above:

  • make: needed to build the application
  • gcc: needed to build the application
  • go modules: needed to build the application
  • nginx: needed to run the application
  • ./build/client/strings: needed to run the application
  • the build/bin/server binary we create: needed to run the application

The stuff we need only at build time (make, gcc, etc.) does not need to be shipped as part of the image because it is not needed at runtime. We could uninstall make gcc etc. after running the build, but there is an even cleaner way: create one image just for building the application and one image just for running the application.

This has become a common pattern, and there are two ways to do this:

Two Separate Docker Files ❌ (old approach, should not be needed anymore)

With this approach you have one "builder" image and a separate "runtime" image. From a high level:

  1. A Dockerfile.builder Dockerfile defines your "builder" image, which contains all of your build-time dependencies
  2. A separate runtime Dockerfile contains only runtime dependencies

Your CI step (e.g. on Google Cloud Build) loads the "Builder" image and runs docker inside that image to produce your runtime image.

Multi-Stage Builds ✅ (current approach: use this one 😄)

Multi-Stage Builds vastly simplify this process! A multi-stage docker file has multiple FROM commands, the first one for the "builder" and the second one for the "runtime." Basically you install all the build dependencies in your builder, run your build, then in the runtime build you COPY the build artifact into your runtime image which you can then deploy.

# Base image for our "builder" contains the go binary which we
# do NOT need at runtime (only to build the server application binary)
FROM golang:1.7.3 AS sequoiasbuilder
WORKDIR /tmp/foo
# copy from host into builder
COPY src/main.go .

# build our go binary
RUN go build -o my-application ./main.go

# The second FROM starts a new image!
# (our "runtime" image)
# using a stripped down linux (no go!)
FROM alpine:latest
WORKDIR /root/

# This has _nothing_ from the builder unless we copy it in
COPY --from=sequoiasbuilder /tmp/foo/my-application .
CMD ["./my-application"]

Now only those things necessary at runtime will be shipped to Kubernetes, and the go binary (and all the go modules that go build pulled in) etc. are discarded! Read this short article for more.

Which one to use? A note on layer caching

The main reason to use the "multiple dockerfiles" approach is that the underlying "builder" image can be built once and reused across many builds. But Docker caches image layers by default, so why would you need this? You would need it if your (CI) build environment discards Docker image layers after each build, as Google Cloud Platform does by default. Discarding layers after each build means building from scratch each time.

There is a simple fix for this, however: the Kaniko builder allows layers to be stored, cached, and reused.

❗️ On GCB, using Kaniko is recommended for both builder and multi-stage patterns. Read more.

Cleaning Up Image Contents

Assuming you don't go the Multi-Stage route (above), or even if you did, you may be able to reduce your image size by removing stuff you don't actually need.

Ensure you actually need everything you've added

Did you start building your dockerfile by copying an existing one? If so, perhaps you have a command like this near the top

RUN apk add --no-cache make git curl bash nginx pkgconfig zeromq-dev \
     gcc musl-dev autoconf automake build-base libtool python

Check that you actually need all these things! Some may be cruft from another project, or the dependency may have been replaced. This is especially important if you're building off a shared "base" image file. When using a shared base image, it's very likely that there's stuff in there you don't need. Easy money!

Remove build tools and assets after the build completes

As we saw above using dive, the toodle-app go build was downloading and caching 237 MB of go modules, which were needed during the build but not after:

│ Current Layer Contents ├──────────────────────────────────────
Permission     UID:GID       Size  Filetree
drwxr-xr-x         0:0      72 MB  ├── mosmos
drwxr-xr-x         0:0      72 MB  │   └── toodle-app
drwxr-xr-x         0:0      72 MB  │       └── build
drwxr-xr-x         0:0      72 MB  │           └── bin
-rwxr-xr-x         0:0      72 MB  │               └── server
drwx------         0:0     237 MB  └── root
drwxr-xr-x         0:0     237 MB      └── .cache
drwxr-xr-x         0:0     237 MB          └── go-build

The following change fixed this problem in toodle-app:

- RUN make build/bin/server
+ RUN make build/bin/server && go clean -cache

Other examples of this are removing gcc/make/webpack or removing dev-dependencies for a JavaScript project.
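For a JavaScript project, that cleanup might be a one-line change like the following (assuming an npm-based build; npm prune --production removes packages listed in devDependencies):

```
- RUN npm install
+ RUN npm install && npm prune --production
```

As with the go clean example, the key is to do the cleanup in the same RUN instruction as (or after) the step that created the files, so the dev dependencies never end up baked into a shipped layer.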

Remove static assets when possible

You may have static assets in your image that rarely change and are not actually needed within the application. For example, the toodle-app image contains various reports and media assets:

-rw-r--r--         0:0      12 MB                  ├── MarketReport.pdf
-rw-r--r--         0:0      12 MB                  ├── EconReport.pdf
-rw-r--r--         0:0      34 MB                  ├── Toodle-MediaKit.zip
drwxr-xr-x         0:0     4.3 MB                  ├── press-releases

It's not huge, but this is 62MB that gets pulled by the Kubernetes controller for every deployment and copied into every container (the image upon which this post is based was running on 268 containers at the time of writing), all of which need garbage collection... it adds up!!

Conclusion

Making your images smaller is easy, it improves infrastructure performance and it saves money. What's not to like? If you've got more tips for shaving bits off your image size, drop me a line & I'll add them below!

Parsing JSON at the CLI: A Practical Introduction to `jq` (and more!)
https://sequoia.makes.software/parsing-json-at-the-cli-a-practical-introduction-to-jq-and-more/
Mon, 21 Dec 2020

JSON is everywhere you look these days. The `jq` tool makes it easy to slice, dice, and transform JSON from the command line. It can be hard to map the official manual to real-world applications, so let's look at some practical examples of `jq` and its cousins that handle YAML & HTML!

jq is a command line tool for parsing and modifying JSON. It is useful for extracting relevant bits of information from tools that output JSON, or REST APIs that return JSON. Mac users can install jq using homebrew (brew install jq); see here for more install options.

In this post we'll examine a couple "real world" examples of using jq, but let's start with...

jq Basics

The most basic use is just tidying & pretty-printing your JSON:

$ USERX='{"name":"duchess","city":"Toronto","orders":[{"id":"x","qty":10},{"id":"y","qty":15}]}'
$ echo $USERX | jq '.'

outputs

{
  "name": "duchess",
  "city": "Toronto",
  "orders": [
    {
      "id": "x",
      "qty": 10
    },
    {
      "id": "y",
      "qty": 15
    }
  ]
}

I like this pretty-printing/formatting capability so much, I have an alias that formats JSON I've copied (in my OS "clipboard") & puts it back in my clipboard:

alias jsontidy="pbpaste | jq '.' | pbcopy"
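The pbpaste/pbcopy commands are macOS-specific, but the same trick works on files anywhere with a tiny shell function. This is just a sketch (the jsonfmt name and the .tmp suffix are my own inventions, not from any tool):

```shell
# format a JSON file in place: pretty-print to a temp file, then swap it in
# (jsonfmt is a hypothetical helper; sponge from moreutils would be tidier)
jsonfmt() {
  jq '.' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

Then `jsonfmt package.json` tidies the file where it sits.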

The '.' in the jq '.' command above is the simplest jq "filter." The dot takes the input JSON and outputs it as is. You can read more about filters here, but the bare minimum to know is that .keyname will filter the result to a property matching that key, and [index] will match an array value at that index:

$ echo $USERX | jq '.name'
"duchess"
$ echo $USERX | jq '.orders[0]'
{
  "id": "x",
  "qty": 10
}
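These filters chain, so you can drill straight down to a nested value in a single expression (reusing the same USERX payload):

```shell
# key filter + index filter + key filter, chained in one expression
USERX='{"name":"duchess","city":"Toronto","orders":[{"id":"x","qty":10},{"id":"y","qty":15}]}'
echo "$USERX" | jq '.orders[1].qty'   # 15
```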

And [] will match each item in an array:

$ echo $USERX | jq '.orders[].id'
"x"
"y"

Filtering output by value is also handy! Here we use | to output the result of one filter into the input of another filter and select(.qty>10) to select only orders with qty value greater than 10:

$ echo $USERX | jq '.orders[]|select(.qty>10)'
{
  "id": "y",
  "qty": 15
}

One more trick: filtering by key name rather than value:

$ ORDER='{"user_id":123,"user_name":"duchess","order_id":456,"order_status":"sent","vendor_id":789,"vendor_name":"Abe Books"}'
$ echo $ORDER | jq '.'
{
  "user_id": 123,
  "user_name": "duchess",
  "order_id": 456,
  "order_status": "sent",
  "vendor_id": 789,
  "vendor_name": "Abe Books"
}
$ echo $ORDER | jq 'with_entries(select(.key|match("order_")))'
{
  "order_id": 456,
  "order_status": "sent"
}

(cheat sheet version: with_entries(select(.key|match("KEY FILTER VALUE"))))

Check out more resources below to learn about other stuff jq can do!

A Use Case: Debugging Some Prometheus Metrics

I have a prometheus metric showing up locally that doesn't look quite right:

async_task_total{task_name="/Users/duchess/charmoffensive/toodle-app/pkg/web/page/globals.go(189):(*GlobalsPopulator).Populate"} 6

The fact that the task_name value is a filename is a red flag–it's bad to have labels with high cardinality and I'm not sure how many of these there are. I want to find out:

  1. What do these task_name labels look like in production?
  2. How many unique values are there for these labels?

1. Getting the label values in production

At my company there is a CLI tool we'll call pquery that allows Prometheus metrics to be queried from the command line, and it outputs JSON–how convenient! I use this tool in the following examples. You don't have this tool, but fear not: this wonderful post explains how to query Prometheus using curl, which is essentially what pquery does.

Using pquery we can view prometheus metrics from our various clusters. But even if we filter for this exact metric name, it's more data than we can easily look at. We'll use wc -l (wordcount: count lines) to get a rough idea of how much data we're working with:

$ pquery 'async_task_total' | wc -l
316117

316,117 lines of JSON! Oof! We want to iterate over the metrics. But what jq filter do we need to access the array of metrics? I find head useful for figuring out what the top-level keys are for a large JSON structure:

$ pquery 'async_task_total' | head -n 20
{
    "data": {
        "result": [
            {
                "metric": {
                    "__name__": "async_task_total",
                    "app": "toodle-app-alpha",
                    "instance": "10.55.55.55:9393",
                    "job": "toodle-app-alpha",
                    "kubernetes_pod_name": "toodle-app-b446b7ccd-6mls6",
                    "namespace": "noweb",
                    "netpol": "toodle-app",
                    "node_name": "gke-production-04-3455c6df-j526",
                    "release": "toodle-app",
                    "task_name": "/charmoffensive/toodle-app/pkg/core/user/user.go(67):GetAccountDetails"
                },
                "value": [
                    1600981630.344,
                    "2"

You can also use jq 'keys' if you just want the key names:

$ pquery 'async_task_total' | jq 'keys'
[
  "data",
  "status"
]

Anyway, we can see from above that .data.result is the "filter" path for the metrics themselves. Let's get the first result ([0]) of this array so we can see what one metric looks like:

$ pquery 'async_task_total' | jq '.data.result[0]'
{
  "metric": {
    "__name__": "async_task_total",
    "app": "toodle-app-alpha",
    "instance": "10.55.55.55:9393",
    "job": "toodle-app-alpha",
    "kubernetes_pod_name": "toodle-app-b446b7ccd-6mls6",
    "namespace": "noweb",
    "netpol": "toodle-app",
    "node_name": "gke-production-04-3455c6df-j526",
    "release": "toodle-app",
    "task_name": "/charmoffensive/toodle-app/pkg/core/user/user.go(67):GetAccountDetails"
  },
  "value": [
    1600981906.069,
    "2"
  ]
}

Oops! That app value (toodle-app-alpha) indicates a mistake: I'm only interested in results from the toodle-app app, not from other apps that may also emit this metric (such as the alpha deployment we see here). We could select for this using jq, but PromQL already lets us filter by label value so we'll do that instead: pquery 'async_task_total{app="toodle-app"}'.

We're interested in the task_name value in the metric object, so let's pluck that from each item in the array above:

$ pquery 'async_task_total{app="toodle-app"}' \
| jq '.data.result[].metric.task_name'
"/charmoffensive/toodle-app/pkg/core/guides/guides.go(411):generateGuideFromDefinition"
"/charmoffensive/toodle-app/pkg/core/place/place.go(122):FetchPlaceDetailForCollection"
"/charmoffensive/toodle-app/pkg/core/place/place.go(132):FetchPlaceDetailForCollection"
"/charmoffensive/toodle-app/pkg/core/user/user.go(67):GetAccountDetails"
"/charmoffensive/toodle-app/pkg/core/user/user.go(73):GetAccountDetails"
"/charmoffensive/toodle-app/pkg/web/page/area.go(160):(*areaView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area.go(166):(*areaView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area.go(172):(*areaView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area_category.go(140):(*areaCategoryView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area_category.go(146):(*areaCategoryView).fetchData"
{... + 18009 more lines}

📝 Update: It was pointed out to me that as this is a post about jq, not about promql, a jq solution is more appropriate here. I'd originally used promql because it's more efficient to filter on the server when possible. Here's the jq version which uses the select filter:

$ pquery 'async_task_total' \
| jq '.data.result[].metric | select(.app == "toodle-app").task_name'

Back to the post...

Eighteen thousand values for that label!? That's bad!! But wait a tic–if other labels are varying, some of these may actually be duplicates. Let's sort them and see:

$ pquery 'async_task_total{app="toodle-app"}' \
| jq '.data.result[].metric.task_name' | sort | head -n10
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/guides/guides.go(411):generateGuideFromDefinition"
"/charmoffensive/toodle-app/pkg/core/guides/guides.go(411):generateGuideFromDefinition"
"/charmoffensive/toodle-app/pkg/core/guides/guides.go(411):generateGuideFromDefinition"

Yep: most of these are actually not unique names. uniq to the rescue!

$ pquery 'async_task_total{app="toodle-app"}' \
| jq '.data.result[].metric.task_name' | sort | uniq
"/charmoffensive/toodle-app/pkg/core/collection/resolvers/query.go(221):(*queryResolver).Verticals"
"/charmoffensive/toodle-app/pkg/core/guides/guides.go(411):generateGuideFromDefinition"
"/charmoffensive/toodle-app/pkg/core/place/place.go(122):FetchPlaceDetailForCollection"
"/charmoffensive/toodle-app/pkg/core/place/place.go(132):FetchPlaceDetailForCollection"
"/charmoffensive/toodle-app/pkg/core/user/user.go(67):GetAccountDetails"
"/charmoffensive/toodle-app/pkg/core/user/user.go(73):GetAccountDetails"
"/charmoffensive/toodle-app/pkg/web/page/area.go(160):(*areaView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area.go(166):(*areaView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area.go(172):(*areaView).fetchData"
"/charmoffensive/toodle-app/pkg/web/page/area_category.go(140):(*areaCategoryView).fetchData"
{... more}

Now I've got a full list of all the distinct values for this label, which answers my first question.

How many unique values are there for these labels?

Well that's pretty easy at this point...

$ pquery 'async_task_total{app="toodle-app"}' \
| jq '.data.result[].metric.task_name' | sort | uniq | wc -l
92

Ninety-two! Not so bad. Mystery solved, and I can say with reasonable confidence "the cardinality of these labels isn't terribly high, I'm leaving this alone 😅"
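As an aside, jq can replace the sort | uniq | wc -l pipeline all by itself: wrap the stream in [...] to collect it into an array, then pipe through unique and length. Here's the idea on a small stand-in payload (I can't share real pquery output, so the values are made up):

```shell
# count distinct task_name values entirely inside jq:
# [...] collects the stream, unique dedupes (and sorts), length counts
SAMPLE='{"data":{"result":[
  {"metric":{"task_name":"a.go(1):Foo"}},
  {"metric":{"task_name":"b.go(2):Bar"}},
  {"metric":{"task_name":"a.go(1):Foo"}}]}}'
echo "$SAMPLE" | jq '[.data.result[].metric.task_name] | unique | length'   # 2
```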

More jq Use Cases

Getting The Statuses of a Kubernetes Deployment

Techniques and features used in this task:

  • Concatenating different fields as strings!
  • Using -r to emit raw output rather than escaped/quoted strings
$ kubectl get deployments toodle-app -o json \
| jq '.status.conditions[]|(.reason + ": " + .message)' -r
NewReplicaSetAvailable: ReplicaSet "toodle-app-545b65cfd4" has successfully progressed.
MinimumReplicasAvailable: Deployment has minimum availability.

Getting All Kubernetes Annotations with the `prometheus.` Prefix

$ kubectl get service toodle-app -o json \
| jq '.metadata.annotations | with_entries(select(.key|match("prometheus")))'
{
  "prometheus.io/path": "/varz",
  "prometheus.io/port": "9393",
  "prometheus.io/scrape": "true"
}

There's a Version for YAML as Well!!

$ cat cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
spec:
  schedule: "*/1 * * * *" # once per minute
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: deployment-scanner
              image: deployment-scanner:38

$ brew install yq
$ yq '.spec.jobTemplate.spec.template.spec.containers[0].image' cronjob.yaml
"deployment-scanner:38"

I used this to build a docker image with the new tag each time I incremented the image value in cronjob.yaml, before applying the configuration (while I was developing a Kubernetes cronjob locally):

docker build -t $(yq '.spec.jobTemplate.spec.template.spec.containers[0].image' cronjob.yaml -r) . && kubectl apply --filename=cronjob.yaml

And a similar tool for HTML?!

➜ curl -sL https://postmates.com/feed | pup 'head title'
<title>
  postmates: Food Delivery, Groceries, Alcohol - Anything from Anywhere
</title>
➜ curl -sL https://postmates.com/feed | pup 'head meta[charset]'
<meta charset="UTF-8">
➜ curl -sL https://postmates.com/feed | pup 'head meta[charset] json{}'
[
 {
  "charset": "UTF-8",
  "tag": "meta"
 }
]

The End

What do you use jq or yq for? Will you be adding pup to your workflow? Sound off in the comments, which is to say "drop me a line!"

More Resources

  • jq play: a jq playground to try stuff out
  • TFM: The Friendly Manual
  • yq: like jq for yaml
  • pup: like JQ for HTML!

Comments

I needed this tutorial 6 months ago (and 6 months before that, and 6 months before that). :D Highly recommend looking at and maybe including gron in this as a very nice complement to jq. It fills in some use cases in a very straightforward way that are pretty cumbersome in jq, such as finding a field deeply nested in an optional parent.

- heleninboodler

Thanks helen, I didn't know about that tool & it does look quite useful! I'd probably add it into the "figuring out the structure of the data" step in the workflow described above, to complement head. Thanks for the tip!

More Comments

👉 Some good discussion & lots of tips & links to similar articles on hackernews.

]]>
A Fond Farewell to My Greatest Fanshttps://sequoia.makes.software/a-fond-farewell-to-my-greatest-fans/I am removing the comment forms from my website due to spam. But are spam comments all bad? Not necessarily! Some of my best comments have been spam.https://sequoia.makes.software/a-fond-farewell-to-my-greatest-fans/Wed, 13 May 2020 04:00:00 GMTI've been using Simple Form to power the comment boxes on this (static) site for several years. I was JAMStack before it was cool, who knew? Unfortunately, even with Akismet configured, I still get heaps of spam comments. To be honest, it's just about 100% spam. It turns out writing infrequently, deleting one's twitter account, and doing nothing to promote one's writing does not result in a huge amount of user engagement! But I digress.

While the spam comments are bothersome overall, I feel the need to give credit where credit is due: some of the best comments I've received have been spam. There have been times when I'm having a bad day, then I check my email and see something like this:

Superb, what a web site it is! This weblog provides helpful data to us, keep it up.

- similar internet page, 2017

Wow! Thank you similar internet page!! I was feeling discouraged the day I got this, and I can't lie–this comment made me feel a bit better. I know you're only a bot, but I choose to take the encouraging words at face value, and they make me feel appreciated.

So, as a farewell to my loyal spam bots, I will feature and respond to the best comments here.

The Best Spam Comments I've Received

Thanks a lot for giving everyone such a brilliant chance to discover important secrets from this site. It is often very useful plus packed with a great time for me and my office co-workers to visit your web site more than three times per week to see the latest issues you will have. Not to mention, I am also always amazed for the spectacular secrets served by you. Selected 3 facts on this page are completely the simplest I've had.

- buy cial𝚒s, 2017

Hi buy cial𝚒s! Thank you for the kind words, and please give my warm regards to your coworkers. I must say I'm surprised to hear you visit three times weekly! I'll have to start posting a lot more frequently to keep the content fresh.

What i do not understood is in reality how you're no longer really much more well-appreciated than you may be now. You're so intelligent. You recognize thus significantly in terms of this topic, produced me in my opinion consider it from so many numerous angles. Its like women and men don't seem to be fascinated until it's something to accomplish with Lady gaga! Your individual stuffs great. Always care for it up!

- https://[removed]/sildenafil-generique-forum/, 2017

Oh my goodness... this comment almost makes me tear up. I do feel underappreciated at times, and it really means a lot to me to have someone notice the care and work I put into the content here. As for why I'm not "much more well-appreciated" despite being "so intelligent" (flattery will get you nowhere!), the fact is it takes much more than intelligence or even good work to succeed. Marketing and branding are hugely important, and for better or worse, those are things I'm not terribly interested in.

It's not unlike the joke about the traveller trying to smuggle coffee into Haifa without paying the required duties. The customs official looks in his sacks and asks him what it is he's carrying. "Birdseed," the traveller replies. "Since when do birds eat coffee?" asks the incredulous customs officer.

Whereupon the traveller replies with a shrug "they'll eat if they want; if they don't want to, they won't."

It's in a similar spirit that I publish my posts.

Howdy! This post could not be written any better! Looking through this post reminds me of my previous roommate! He constantly kept talking about this. I will send this article to him. Pretty sure he'll have a great read. Thanks for sharing!

- lirik lagu doa suci imam s arifin, 2017

Hi lirik lagu doa suci imam s arifin! I am happy to share, and be sure to say "hi" to your roommate for me!!

you are really a good webmaster. The web site loading speed is amazing. It seems that you are doing any unique trick. Furthermore, The contents are masterwork. you've performed a wonderful task on this topic!

- finance, 2018

Thanks finance! It's a static site served from Github pages so I am not surprised to learn it's fast! Furthermore, I wrote my own CSS (with a couple bits copy/pasted from a framework) and there's almost no JavaScript on the site. It's quite simple really, just keep it light & static. I'm glad the performance does not go unnoticed! As for the contents being a "masterwork," well, I'll have to leave that for the Nobel committee to judge.

At this moment I am going away to do my breakfast, later than having my breakfast coming over again to read other news.

- student loans, 2018

Thank you for the update student loans! I was wondering why you hadn't finished reading, but it makes sense now. Nothing is more important than your health, so (this goes for all my readers) if you find yourself excessively hungry, thirsty, or tired while reading one of my posts, take a break! It will be here when you get back, I promise.

This site was... how do you say it? Relevant!! Finally I've found something which helped me.

- buycialis, 2018

Ha! Thanks buycialis!!

You're so cool! I do not believe I have read through a single thing like this before. So great to discover someone with genuine thoughts on this topic. Seriously. many thanks for starting this up. This site is one thing that's needed on the internet, someone with a bit of originality!

- importance of education, 2019

This type of feedback is what keeps me doing this when it seems "pointless," which it does at times. In particular I appreciate what you said about "someone with a bit of originality." Seriously. Thank you importance of education!

This is really fascinating, You're an excessively professional blogger. I've joined your feed and stay up for looking for more of your excellent post. Also, I have shared your website in my social networks

- student loans, 2019

student loans! You're back!! How was breakfast? You'll be happy to know, incidentally, that I have added an RSS feed (at the request of a reader!), which you can find here.

My brother recommended I might like this blog. He was totally right. This post actually made my day. You can not imagine just how much time I had spent for this info! Thanks!

- Recommended website, 2020

You know what, Recommended website? Your comment made my day. Thank you.

Do you mind if I quote a couple of your posts as long as I provide credit and sources back to your webpage? My blog is in the very same niche as yours and my visitors would truly benefit from a lot of the information you provide here. Please let me know if this okay with you. Thanks!

- best sandwich in North Carolina

Not at all, best sandwich in North Carolina! Provided you include attribution, I've no issue with quoting my work. If you can avoid copying posts from start to finish I would appreciate it, but short of that, go ham!

Speaking of pork, I am thrilled to get comments from the Old North State, my home sweet home! Go Heels!!


With that, I'll wrap up this little tribute to my loyal and enthusiastic spam commenters. Thanks again to everyone, and if you wish to contact me it's still possible! My email address can be found on the contact page.

Happy spamming!

]]>
Two Painful Ways to Misuse JavaScript's Symbol Descriptionshttps://sequoia.makes.software/two-painful-ways-to-misuse-javascripts-symbol-descriptions/ES2019 introduced a new way to access the (non-unique) description property of (unique) Symbol objects. As with any new JS feature, every developer's first question is "how can I shoot myself in the foot with this?" Read and find out! https://sequoia.makes.software/two-painful-ways-to-misuse-javascripts-symbol-descriptions/Tue, 06 Aug 2019 04:00:00 GMTI was reading an article on new features in ES2019 earlier today, and one jumped out at me: Symbol.prototype.description. "Wow," I thought, "this feature will be really easy to misuse!" In this post, we'll look at a couple of ways you can start misusing this cutting edge JavaScript feature today!

Background

Symbols were introduced in ECMAScript 6 (ES2015) as a way to create truly unique values in JavaScript applications. They have several cool features, but the main point of Symbols is that they are unique. Although multiple Symbols can be created with identical descriptions (e.g. x = Symbol('a'); y = Symbol('a')), the Symbols themselves are different. The description is just a helpful label, almost like a comment: it cannot be directly accessed from the Symbol once it's created.

Until ES2019! Now the Symbol's description property can be directly accessed via mySymbol.description. Why is this useful? Who cares!1 This blog post is not about what's useful, it's about misusing JavaScript for pain and heartache! So without further ado,

Method 1: Comparing Symbols by Description

As mentioned, Symbols are unique.2 This means if one is created by a vendor:

// vendor/x.js
const catalog_id = Symbol('cat_id');
module.exports = catalog_id;

...and then another by me...

// lib/y.js
const cat_id = Symbol('cat_id');
module.exports = cat_id;

they will be unique values:

const catalog_id = require('./vendor/x.js');
const cat_id = require('./lib/y.js');

const item = {};
item[catalog_id] = 123;

// Check if catalog id is set:

// 1. get the object keys that are symbols:
const symbolProps = Object.getOwnPropertySymbols(item);

// 2. see if that array contains catalog id
const hasCatalogId = symbolProps.includes(cat_id);

hasCatalogId is false! What gives?? The Symbol I defined in lib/y.js is supposed to reference the same property as that referenced by the Symbol created in vendor/x.js! I created mine to match theirs (they have the same description). There must be a way to see that they are actually "the same"... Symbol.prototype.description to the rescue:

//... require(), const item etc.

// 1. get the DESCRIPTION of object keys that are symbols:
const symbolPropDescriptions = Object.getOwnPropertySymbols(item)
  .map(symb => symb.description);

// 2. see if that array contains catalog id
const hasCatalogId = symbolPropDescriptions.includes(cat_id.description);

Problem solved: hasCatalogId is now (correctly) true!

Method 2: Serializing Using Description

In this case, I have Symbols representing the unique roles my users might have (author, admin, etc.).

const admin = Symbol('admin');
const author = Symbol('author');

I also have a collection of users with their roles defined:

const users = [
    {name: 'vimukt', role: admin},
    {name: 'danilo', role: admin}
];

console.log(users[0].role === admin); // true
console.log(users[0].role.description); // "admin"

I want to serialize these for some reason:

usersJSON = JSON.stringify(users);

But when I deserialize, my roles are gone:

deserialized = JSON.parse(usersJSON);
console.log(deserialized[0].role); // undefined

JSON.stringify is refusing to convert my Symbol values to strings! Don't worry, with a little trickery, we can get around this limitation:

function serializeWithRoles(users){
    return JSON.stringify(
        users.map(user => {
            // convert the role Symbols to strings so they serialize
            user.role = user.role.description;
            return user;
        })
    )
}

function deserializeWithRoles(userJSON){
    return JSON.parse(userJSON)
        .map(user => {
            // convert role strings back to symbols
            user.role = Symbol(user.role);
            return user;
        });
}

Let's try it:

const usersJSON = serializeWithRoles(users);
const deserialized = deserializeWithRoles(usersJSON);

console.log(deserialized[0].role); // Symbol(admin)
console.log(deserialized[0].role.description); // "admin"

Et voilà! Serializing & deserializing with our roles "works", and we have Symbols at the finish, just as we did at the start.

Spoilers: Why These Methods Are Bad

Comparing Symbols by Description

This is bad because it breaks a major feature of Symbols: the fact that they're unique. The proper way to use a Symbol defined elsewhere is to import that Symbol and use it directly. If it's not exported, it probably is not meant to be used externally. If it is meant to be used externally but was not exported, that's a bug.

If you don't care about using the exact same copy of a Symbol object property or Symbol value, or you want to define such values in multiple places and compare them, a string is probably more appropriate. If you want to use the same Symbol but access it from multiple places using the description, use Symbol.for (note the caveats about namespacing this type of Symbol!).

Serializing Using Description

The fact that the built-in JSON.stringify method refuses to convert Symbols to a string (JSON) representation gives us a hint that doing this is probably not a good idea. In fact, it's impossible to convert a Symbol into a string and then back into the same Symbol because a) the Symbol exists uniquely only within the context of a running application and b) while the Symbol description may be a string which can be serialized (as we did above), the description is not the symbol.

"The Treachery of Images" by René Magritte

Attempting to serialize and deserialize Symbols, which exist only in the context of a running application, cannot work. In our example above, while the admin Symbol is "serialized" by description string then deserialized by passing the string to Symbol(), each of the Symbols created in the deserialization is unique. This means that while users[0].role === users[1].role was true before serializing & deserializing, it is false after. You could use Symbol.for to get around this, but at that point the Symbol is no more reliable or unique than its description, in which case why not just use the description?

Conclusion

When I read of the introduction of Symbol.prototype.description, the antipatterns it would make easier were the first thing that came to mind. I am sure both of the methods I describe above will exist in the wild soon, so when you come across one of them remember: you heard it here first!

Footnotes

1 If you do want to learn more about the uses of Symbols, see this informative article.
2 With the exception of global symbols find-or-create'ed using Symbol.for, but these will never have the same value as a Symbol created using Symbol().

]]>
Avoiding Ticket Scope Creep While Improving Codehttps://sequoia.makes.software/avoiding-ticket-scope-creep-while-improving-code/We all want to "improve as we go" when writing code, but how do we do this while also getting the feature development task at hand done and keeping our PRs small?https://sequoia.makes.software/avoiding-ticket-scope-creep-while-improving-code/Wed, 05 Jun 2019 00:00:00 GMT"Leave code better than you found it," the idea that one should make minor improvements and refactors in the course of feature development rather than leave improvements for later, is an important strategy for staying on top of technical debt & keeping your code clean. There is such a thing as “too much of a good thing,” however!

In the course of developing a feature, you might notice a library that needs upgrading, which requires some minor refactors, some repeated code to extract into functions, a small change that could result in a performance improvement, and a dozen other issues. If you attempt to address them all in the moment, two things will happen:

  1. The feature you were working on will take a week or more to finish rather than the "one or two days" you estimated
  2. Your pull request will become enormous, making review arduous and time-consuming, further delaying feature delivery (and likely frustrating the reviewer!)

The following strategies can help you stay on track with the task at hand and keep your pull requests manageable while ensuring you don't lose track of the issues you uncovered in the course of development.

“To Do Later” List

Upon starting a task, make a list for “To Do Later” actions. When, in the course of working on Issue-A, you come across something that should be improved but a) will take time and b) is not necessary to complete Issue-A, put this item on the “To Do Later” list.

After completing Issue-A, go over your “To Do Later” items and do them (small items) or create tickets for them (larger items), as appropriate. Noting items for later allows you to stay focused on your current task without losing track of the improvement you’d like to make.

This idea was adapted from the “Parking Lot” concept for meetings.

Improvements Budget

Upon starting a coding task (“Issue-A”), make a numbered “Improvements” list (1,2,3) for code improvements. When you find a small issue you’d like to address (outside the ticket scope), fix it, and add it to the “Improvements” list as item one. Do this again for the second and third small issues you fix, then stop.

Once you’ve made three small improvements, your “Improvement Budget” has been spent, and no more out-of-scope improvements should be worked on as part of Issue-A. Any additional out-of-scope issues must be put on the “To Do Later” list.

This strategy is a compromise between “focus only on the issue at hand and don’t improve anything” and “fix every issue you find, as you find it, even if this means Issue-A takes weeks to complete rather than days.” The “budget” can of course be adjusted to a number of items besides 3.

This idea was inspired by the Most Important Tasks strategy, which also has the concept of a budget of three items.

What Else?

Do you use these strategies, and if so are they useful to you? Do you have other strategies to balance code improvement and feature delivery? If so please let me know in the comment box below. Happy coding!

]]>
When to Use a Programming Frameworkhttps://sequoia.makes.software/when-to-use-a-programming-framework/Frameworks are good when they speed up stuff you already know how to do; if they are just magic incantations and you have no idea what they’re actually doing, that’s when you run into trouble. https://sequoia.makes.software/when-to-use-a-programming-framework/Fri, 15 Feb 2019 05:00:00 GMTAgainst my own best judgement, I recently read a clickbait article with a title like "Why You Shouldn't Use Web Frameworks." While I do hold a general wariness of web application frameworks, I disagree that frameworks are bad in all cases. In fact, in many cases you'd be downright crazy not to use a framework.

And what cases are those, pray tell?

Read on and find out!

When to Use a Framework

In short, it makes sense to use a framework for:

  1. Prototyping: Getting an idea to a usable piece of software very quickly
  2. Expediency: You could solve the problem the framework addresses, but someone has already come up with a good solution, and/or
  3. Extending your reach: The framework helps you do something outside your core skills, which you have no interest in learning how to do

Prototyping

If you're making a website for a business idea, using Rails & Bootstrap can get you up and running with layouts, login/auth, routing, admin screens, callout boxes, modals, etc. in less than a day, provided you're familiar with the tools. This is pretty amazing!

Expediency

A friend was building a site with Drupal and he wanted to implement search with Solr. There was a Drupal plugin to do it, so he used that plugin. He was a skilled developer and could have written a CMS from scratch and implemented Solr search to boot, but this would have been a waste of time because someone had already done it. He wasn't using a framework out of ignorance or inability to complete the task without one, but out of expediency.

Likewise, writing basic HTTP routing logic and query string parsing is not a herculean task, but if someone has done it already, why reinvent the wheel? A key component of this rationale, however, is that you could do the task by hand if you wanted to. We'll discuss why this is important later...

Extending Your Reach

Sometimes...

  1. you don't know how to solve a problem from scratch
  2. you have no interest in learning how to do so, and
  3. a framework offers a ready-made solution

This is a great time to use a framework!

If you're a Python developer and you want a nicely laid out website that works well on mobile phones and looks professional, you can get there with Bootstrap without spending time learning a lot of HTML, CSS, and other browser technologies. A framework is a very powerful tool in this case, extending your ability to create things far beyond your realm of expertise.

But you should learn the underlying technologies!!

- Strawman Who Hates Frameworks

If writing HTML based user interfaces is a core skill of yours (or you wish it to be), yes, you should learn the underlying technologies. But if it's not a core skill and you don't wish it to be, then learning how flexbox works is a waste of your time. It is best to focus your time on those things you _do_ wish to be an expert in, and use out-of-the-box solutions for the rest.

When Not to Use a Framework

There are at least four situations where it's not appropriate to use a framework:

  1. When you're learning: When you are just starting on your journey to becoming a professional programmer
  2. Because it's all you know: Not everything is a nail; you should have more tools than just a hammer
  3. When it's overkill: When the framework has ten features and you only need one
  4. When it makes absolutely no sense: More common than you'd think!

Learning

This is the counterpoint to the "use a framework if you don't care to learn the underlying technologies" argument. If you _do_ want to learn and develop expertise in the underlying technologies, a framework is not a good way to start, in my opinion.

For example, if you are just getting started with web programming, and want to become a professional JavaScript programmer, do not start out by using Angular or React/Redux/Webpack!! These tools assume a high degree of familiarity with JavaScript. They are built for professionals to speed development and scaling of complex applications. They are not built to help beginners learn JavaScript & HTML.

Starting your learning journey with a big framework has many disadvantages:

  1. It's likely to be overwhelming and confusing
  2. You're dependent on the framework, and if you need to augment or extend its behavior you won't know how
  3. You'll need to rely on "experts" to help you when you get stuck, because you'll lack the skills needed to actually understand the framework internals once it becomes necessary to do so
  4. It doesn't teach you the fundamentals of the language or environment, so you'll be stuck using that framework until you bite the bullet and actually learn the underlying tools
  5. You won't know why the framework does what it does–this understanding only comes from working without a framework

Instead of starting with a framework, just start with HTML, JavaScript, and a tab pinned to https://devdocs.io/. Try stuff out! Read the docs! Don't be afraid to write "bad" code–doing so is essential to learning.

When certain tasks become tedious, you'll know it's time to pull in a library. Eventually you'll get to a point where you say "gee, I wish there were an easier way to do X", for example, making HTTP requests. At that point you pull in a library to do that task. Whereas a framework gives you a full toolbox and a set of instructions, writing by hand and pulling in libraries as needed will help you understand why it is useful to use that tool, which is crucial to programming effectively, with or without a framework!

It's All You Know

If all you know is React/Webpack, you will struggle to solve problems that React was not designed to solve. Ideally, you should analyze a business problem first, then decide what the best tool is to solve that problem. If all you have is one tool, you are not capable of doing this.

Having only one tool that you know frequently leads to the next two framework-use-antipatterns...

Overkill

Imagine you have a bunch of IoT thermometers, and they need a server to periodically send data to, which will write that data to a CSV file. This server needs exactly 1 endpoint: record_temperature.

If all you know is Ruby on Rails, you will probably create a new Rails app with a database, a complex & powerful ORM, models, controllers, flexible authentication options, admin routes, json marshalling, HTML templating, and dozens of other features. This is overkill! Furthermore, the tool isn't even built to do what you need it to do (Rails is designed to work on a database, not a single CSV file). If you learned "Ruby" to start, rather than "Ruby on Rails", you would be able to easily build a tiny server, probably with one single file and zero dependencies, and this is guaranteed to be cheaper to run and easier to maintain.

When It Makes No Sense

Once, for a coding test, my employer asked engineering job candidates to build a sample application that took text as input (from the command line), did some processing, and output some other text. The candidate could choose whatever language they were most comfortable with. A typical solution might contain two or three source files of Java, Python, or JavaScript (Node.js).

I was reviewing one candidate's submission, and found a half dozen directories, config files for eslint, vscode and webpack, several web-font files, an image optimization pipeline, all of React.js and far more.

It Doesn't Make Sense

The candidate had clearly learned to use create-react-app to start projects, and had learned no other way. That led them to submit a solution one hundredfold more complex than was needed, and that didn't meet the requirements–we didn't ask for a web app! This is an extreme case but it illustrates the fact that if you only know one tool, you will invariably attempt to use it to solve problems it's not well suited for.

Conclusion

Programming frameworks can be useful tools, but they can only be deployed appropriately if you've learned enough to be able to pick the right tool for the job. To learn this skill, you must first learn to work without frameworks.

Put another way, the best way to ensure you use frameworks properly (as a beginner) is to not use them at all. Does that make sense?

Comments

I remember that my first contact with the world of web apps was using Ruby on Rails. It surely felt like magic and it was amazing working with the framework, but years later I started struggling to understand simple HTTP requests and MVC concepts. Therefore, I couldn't have put it better: the best way to start learning how to build professional web apps is to start with just a piece of wood, a hammer and a nail - but not with an IKEA box with a book containing complicated instructions. Thanks Sequoia!

- Leonardo Lima,

Thank you for the generous feedback, and I'm glad to hear this post reflects your experience well. I like your "Ikea" analogy–I kept struggling for an analogy around a gas-powered ditch digger vs. a shovel, but Ikea furniture is much better. Cheers!

Great blog post! If I could contribute one thing to this article it would be that with new frameworks hooking up a debugger and watching the entire stack from the beginning of a request to the end is a tremendous learning experience. This is a great way to get exposed to more complex topics when you are at a more junior level.

- Ori Zigindere,

]]>
The Rabbi and the Open Source Projecthttps://sequoia.makes.software/the-rabbi-and-the-open-source-project/A rabbi, an élite open source project maintainer, and a workaday developer walk into a tech conference... https://sequoia.makes.software/the-rabbi-and-the-open-source-project/Tue, 30 Oct 2018 04:00:00 GMTA rabbi finds himself at a tech conference, perusing the vendor booths, when he is approached by a Frustrated User who has a pressing need to vent.

"Rabbi, can you believe how stuck up and unfriendly these Programming Élites are?" The User asks. "I am struggling to keep my head above water with all the new frameworks and tools coming out daily, and the documentation for these libraries is terrible to non-existent, but when I post a polite question on Github about a problem I'm having, someone closes my issue, tells me to learn programming basics & says I should read the source and improve the docs myself!

"I'm trying to understand the thing, how am I supposed to write the docs? And what an insulting thing to say, 'go read a book on programming.' Isn't this a terrible way to treat a beginner trying to ask a question & learn?"

The rabbi considers the matter for a few moments, then responds: "You're right."

The User leaves satisfied, but the rabbi is approached by a second person. "Rabbi," says The Maintainer, "I overheard your conversation, and I want to tell you a different story.

"I maintain a very popular library, for free, and every day I get feature demands, people getting angry at me, people expecting to be spoon-fed answers without reading the documentation or the source code, and thinking someone else is going to do the work of fixing bugs, writing documentation, creating new features, and all the other work of maintaining an open source library, when in fact it's their job just as much as mine. Don't you agree these users are horribly entitled?"

The rabbi thinks for a moment then looks at The Maintainer and says "You're right." The Maintainer, satisfied he's won the rabbi to his way of thinking, walks off towards the speaker lounge.

A vendor, having overheard both exchanges, calls the rabbi over to his booth. "Rabbi, you just told The User he was right, but then you turned around and told The Maintainer he was right—they can't both be right!"

The rabbi thinks a minute. "You're right!"

Nota Bene

I drafted this post in January of 2016 and didn't get around to posting it for a few years. If you'd like it to be topical for 2018, pretend one party is the developer of a popular open source project who's frustrated about people making money off the software and not contributing, and the other is a SAAS vendor who feels that if the software is free it's free, and the developer needs to live with that choice, or any other two parties who can't both be right.

]]>
Conference Tipshttps://sequoia.makes.software/conference-tips/Some thoughts on getting the most out of tech conferences.https://sequoia.makes.software/conference-tips/Mon, 15 Jan 2018 05:00:00 GMTI've travelled to a number of tech conferences over the years and learned a few things along the way. Here are some of the strategies I've developed for getting the most out of a conference:

  1. 🥗 Bring Food
  2. 🏃‍ Pace Yourself!
  3. 👩‍💻 Put Your Laptop Away
  4. 📝 Take Notes on Paper
  5. 🚶‍♀️ Don't Be Afraid to Leave A Talk
  6. ⌛️ Don't Linger
  7. 💬 Hallway Track

🥗 Bring Food

Conferences are usually catered, but many catering services, well, leave something to be desired. In particular, breakfasts can be rough: pastries, fruit, more pastries... you get the idea. Eating a healthy breakfast that makes you feel good is more than just a luxury when travelling: it can be the difference between having a good morning and a crummy one. Take care of your body at meal-time and it will take care of you later!

When I arrive in a new town for a conference where I will be spending more than one night, immediately after checking into the hotel I head to the grocery store for:

  • Fruit
  • Bread
  • Cheese
  • Apples
  • Yogurt

Plane travel, being in a new place, eating out three times a day—this is all hard on your body! Not to mention your wallet and brain. Sometimes sitting quietly and eating a simple meal beats another hour of socializing over beers & burgers. That's where the fruit, bread, and cheese come in: when you need food and a quiet break, you have a meal ready in your hotel room.

To summarize:

  1. Don't be 100% reliant on conference catering
  2. Have a meal or two on hand to save money and brainwaves when you're tired & hungry

🏃‍♀️ Pace Yourself!

The first couple conferences I went to, I had my whole day planned out: this talk at 9:00am, that one at 10:00, then 11:00, 12:00, 1:00, 2:00... this is crazy! Remember: there's a limit to how much you can take in. If you do attend eight talks in a day, it's unlikely you'll be able to really focus on what you're hearing at all of them. Rather than power through eight talks in a day, pick four and give each of them your full attention—you'll get more out of it overall.

The FOMO is strong, but no matter how hard you try I guarantee you'll be missing something. So take my advice and stop worrying about it! Rather than stressing about everything you "missed," you'll have a much better time if you set realistic expectations and give yourself breaks. So:

  1. Limit the number of talks per day to something reasonable (for me this is about 4 talks)
  2. Don't worry about missing talks! There are always more talks. 😊
  3. Take a break when you're feeling tired or frazzled. You'll enjoy the rest of the day a lot more!

What to do when you're not in talks?

  • Get some exercise at the hotel gym or walking outside
  • If you're tired: take a nap
  • Follow up on notes or questions gathered at talks
  • Hallway track

👨‍💻 Put Your Laptop Away

It's tempting to google the topic being presented, check twitter, email, try to run the code examples, etc. You do not need to travel to a conference to do this. As such, it is (in my opinion) a poor use of conference time. If you're watching a talk, tune in and pay attention! If the talk is really so uninteresting that you don't feel the need to pay attention, why stay? There are usually better places to sit and work than an auditorium seat.

📝 Take Notes on Paper

Taking notes is a great way to remember what you heard, note things you want to look up later, and to capture follow-up questions. So take out your laptop and fire up Evernote, right? Wrong! When you get out your computer, it's really hard to stick to just taking notes. When you're taking notes on your computer, and you have a question, and the talk is a bit slow, and it will only take just a second to find the answer on google...

Carrying a pen or pencil and a notebook is a great strategy for those of us who are easily distracted by the wide internet. If you've never tried this remove-the-temptation strategy, you may be surprised how much more you get out of a talk when you give it 100% of your attention. Paper notes may help you retain the information better, and they're useful whilst chatting at the after-party: peek at your notebook to easily recall insights and questions. (Bonus: it makes you look super organized! 😄)

🚶‍♀️ Don't Be Afraid to Leave A Talk

I used to feel like I had to "commit to" a session I was watching, or that if I missed the beginning of a session it was pointless to join late. Not true! If you get five minutes into watching a talk and realize it's not what you expected or you just change your mind, quietly slip out & try another talk.

I'll admit that as a speaker, it's not my favorite thing to see people walking out of a talk I'm giving. However, I know that you're at the conference to learn new things, not to flatter conference speakers. You don't owe it to anyone to sit through a talk. Furthermore, if my talk is bad and everyone sits through it just to be polite, I'll never know it's bad. Do speakers the courtesy of giving them honest feedback: if a talk is bad, don't sit through it, be honest and walk out! You'll be doing yourself and the speaker a favor.

⌛️ Don't Linger

This one really comes down to personal preference, but I've found that sticking around for the conference closing party or staying an extra day to visit the host city is rarely worth the extra time in the hotel. I used to extend trips a night or two for this reason, but I am usually so exhausted after a conference it's hard to enjoy being a tourist, and I've found that the value of sleeping in my own bed sooner almost always outweighs the value of another evening of schmoozing.

There is definitely a point of diminishing returns at the end of a conference: the crowd starts to thin out, the vendors pack up, and eventually there's naught left but a lone nerd, tinkering with a new framework at an empty buffet table, or wandering the vendor floor aimlessly with her sponsor shirt and enormous backpack. It's a bit depressing, frankly. 😛 Don't feel the need to stay 'til the bitter end!

These days I try to get a flight out as close to after-closing-ceremonies as possible, or a bit before if the other option is staying an extra night.

💬 Hallway Track

You've probably heard this old chestnut, but it bears repeating: there's a lot more value in conferences than what you get from attending talks. Networking, trading tips, finding job leads, and making new friends: these are all things you'll find on "the hallway track," i.e. by hanging out in the hallway, chatting with your peers. Conferences are the best venue for networking (read: "finding work") I've found, but you won't access this value if you're in talks all day. To get the most out of your experience, be sure to make time for the hallway track!

The End

I hope at least one of these tips has been useful to you! If you have feedback or can think of one of the many points I missed here, please do send a comment and I'll add it below. Happy conferencing!

]]>
Session Management with Microserviceshttps://sequoia.makes.software/session-management-with-microservices/Microservices make some tasks easier and introduce some challenges where they didn't exist before. In this post we'll look at sharing sessions across microservices on the `now` platform. https://sequoia.makes.software/session-management-with-microservices/Mon, 29 May 2017 04:00:00 GMTThe microservice architecture is the New Hot Thing in server application architecture and it presents various benefits, including ease of scaling and the ability to use multiple programming languages across one application. As we know, however, there's no such thing as a free lunch! This flexibility comes with costs and presents some challenges that are not present in classic "monolith" applications. In this post we'll examine one such challenge: sharing sessions across services.

Sharing Sessions

When we split authentication off from a "monolith" application, we have two challenges to contend with:

  1. Sharing cookies between the auth server(s) and application server(s). On one server on one domain, this was not an issue. With multiple servers on multiple domains, it is. We'll address this challenge by running all servers under one domain and proxying to the various servers. (Don't worry, it's easier than it sounds!)
  2. Sharing a session store across server(s). With a single monolith, we can write sessions to disk, store them in memory, or write them to a database running on the same container. This won't work if we want to be able to scale our application server to many instances, as they will not share memory or a local filesystem. We'll address this challenge by externalizing our session store and sharing it across instances.

For the purposes of demonstrating session sharing, we'll be creating two simple servers: writer, our "auth" server that sets and modifies sessions, and reader, our "application" server that checks login and reads sessions. Code for this demo can be found here: https://github.com/Sequoia/sharing-cookies.

NB: You may be thinking "let's use JWTs! They are stateless and circumvent the cookie sharing issue completely." Using JWTs to reimplement sessions is a bad idea for various reasons, so we won't be doing it here.

Setting up "Auth" Server

In order to share sessions across servers, we'll use an external redis server to store session info. I'm using a free redis instance from https://redislabs.com/ for this demo.

Setup

Here we set up an express server with redis-based session tracking and run our server on port 8090.

// writer/index.js
const express = require('express');
const session = require('express-session');
const RedisStore = require('connect-redis')(session);
const app = express();

const redisOptions = {
  url : process.env.REDIS_SESSION_URL
}

const sessionOptions = {
  store: new RedisStore(redisOptions),
  secret: process.env.SESSION_SECRET,
  logErrors: true,
  unset: 'destroy'
}

app.use(session(sessionOptions));

app.listen(8090, function(){
  console.log('WRITE server listening');
});

Environment Variables

Our application relies on REDIS_SESSION_URL and SESSION_SECRET being available as environment variables. These are externalized both for security and to allow us to share these values across different application instances.

Routes

For our demo, our express-based auth server will have three paths:

  1. /login: set a user session.

    app.get('/login', function(req, res){
      // .. insert auth logic here .. //
      if(!req.session.user){
        req.session.user = {
          id : Math.random()
        };
      }
    
      res.json({
        message : 'you are now logged in',
        user : req.session.user
      });
    });
    
  2. /increment: increment a counter on the session (update session data)
    app.get('/increment', function incrementCounter(req, res){
      if(req.session.count){
        req.session.count++;
      }else{
        req.session.count = 1;
      }
      res.json({
        message : 'Incremented Count',
        count: req.session.count
      });
    });
    
  3. /logout: destroy a session
    app.get('/logout', function destroySession(req, res){
      if(req.session){
        req.session.destroy(function done(){
          res.json({
            message: 'logged out : count reset'
          });
        });
      }
    });
    

Running the Server

Our server is set up to run via npm start in our package.json file:

...
  "scripts" : {
    "start" : "node index.js"
  }
...

We start by running npm start with the appropriate environment variables set. There are many ways to set environment variables; here we will simply pass them at startup time:

$ REDIS_SESSION_URL=redis://hostname:port?password=s3cr3t SESSION_SECRET='abc123' npm start

Now, assuming redis connected properly, we can start testing our URLs:

GET localhost:8090/login:

{
  "message": "you are now logged in",
  "user": {
    "id": 0.36535326065695717
  }
}

GET localhost:8090/increment

{
  "message": "Incremented Count",
  "count": 1
}

It works! To verify that the session is independent of the server instance, you can try shutting down the server, restarting it, and checking that your user.id and count remain intact.

Checking our Session

We can see our sessions in redis by connecting with the redis-cli:

$ redis-cli -h <host> -p <port> -a <password>
host:43798> keys *
1) "sess:q5t7q67lzOsCJDca-kvT63Yk6n6kVvpL"
host:43798> get "sess:q5t7q67lzOsCJDca-kvT63Yk6n6kVvpL"
"{\"cookie\":{\"originalMaxAge\":null,\"expires\":null,\"httpOnly\":true,\"path\":\"/\"},\"user\":{\"id\":0.36535326065695717},\"count\":1}"

Setting up Our "App" Server

The application (reader) server has one single path:

  1. /: read current count.

The server setup code is the same as above, with the exception that our server is run on 8080 rather than 8090 so we can run both locally at the same time.

Requiring Login

In order to ensure users who hit our "application" server have logged in, we'll add a middleware that checks that the session is set and it has a user key:

// reader/index.js
app.use(function checkSession(req, res, next){
  if(!req.session.user){
    //alternately: res.redirect('/login')
    return res.status(403).json({
      'message' : 'Please go "log in!" (set up your session)',
      'login': '/login'
    });
  }else{
    next();
  }
});

Then we'll add our single route:

// reader/index.js
app.get('/', function displayCount(req, res){
  res.json({
    user : req.session.user,
    count: req.session.count
  })
});

Running the server

Start this server as we started the other:

  1. Pass the appropriate environment variables
  2. npm start

Now we can check that it works:

GET localhost:8080

{
  "user": {
    "id": 0.36535326065695717
  },
  "count": 1
}

Try it from a private tab or different browser, where we haven't yet logged in:

GET localhost:8080

{
  "message": "Please go \"log in!\" (set up your session)",
  "login":"/login"
}

It works!

I thought Cookies Couldn't Be Shared?!

In fact, browsers do not take port number into consideration when determining what the host is and what cookies belong to that host! This means that we can run our auth server locally on :8090 and the app server on :8080 and they can share cookies, as long as we use the hostname localhost for both!

Deploying

This works fine locally; now let's see it in The Cloud. We'll be using https://zeit.co/now for hosting. now is a microservice-oriented hosting platform that allows us to easily deploy Node.js applications and compose application instances to work together, so it's a great choice for this demo!

now expects Node.js applications to start with npm start; luckily, we've already configured our application to do that, so all that's left to do is to deploy it!

$ cd writer
$ now     # missing environment variables...
> Deploying ~/projects/demos/sharing-cookies/writer under sequoia
> Using Node.js 7.10.0 (default)
> Ready! https://writer-xyz.now.sh (copied to clipboard) [1s]
> Synced 2 files (1.19kB) [2s]
> Initializing…
> Building
...

This will deploy our application to now, but it won't actually work, because the application will not have the environment variables it needs. We can fix this by putting the environment variables in a file called .env (that we do not check in to git!!!) and passing that file as a parameter to now. It will read the file and load those variables into the environment of our deployment.

# .env
REDIS_SESSION_URL="your redis url here"
SESSION_SECRET="abc123"
$ echo '.env' >> ../.gitignore  # important!!
$ now --dotenv=../.env
> Deploying ~/projects/demos/sharing-cookies/writer under sequoia
> Using Node.js 7.10.0 (default)
> Ready! https://writer-gkdldldejq.now.sh (copied to clipboard) [1s]
> Synced 2 files (1.19kB) [2s]
> Initializing…
> Building

Once the command finishes, we can load that URL in our browser:

GET https://writer-gkdldldejq.now.sh/login

{
"message": "you are now logged in",
  "user": {
    "id": 0.31483764592524177
  }
}

Deploying the Application Server (reader)

We repeat the above steps in our /reader directory, passing the same .env file to now --dotenv...

$ cd ../reader
$ now --dotenv=../.env
> Deploying ~/projects/demos/sharing-cookies/reader under sequoia
> Using Node.js 7.10.0 (default)
> Ready! reader-irdrsmayqv.now.sh (copied to clipboard) [1s]
> Synced 2 files (1.19kB) [2s]
> Initializing…
> Building
...

Once it's done we check via our browser...

GET https://reader-irdrsmayqv.now.sh

{
  "message": "Please go \"log in!\" (set up your session)",
  "login": "/login"
}

We're not logged in! What happened?

We noted above that in order to share sessions, we needed to share two things:

  1. A session store (redis)
  2. Cookies

Because our servers run on different domains now, we're not sharing cookies. We'll fix that with a simple reverse-proxy setup that the now platform calls "aliases."

Aliases

We want both of our applications running on the same domain so they can share cookies (as well as other reasons including avoiding extra DNS lookups and obviating the need for CORS headers). now allows aliasing to any arbitrary subdomain under now.sh, and I've chosen counter-demo.now.sh for this post.

We want routing to work as follows:

  • /: application server (https://reader-irdrsmayqv.now.sh/)
  • login, increment, logout: "auth" server (https://writer-gkdldldejq.now.sh/)

To configure multiple forwarding rules for one "alias" (domain), we'll first define them in a json file:

{
  "rules" : [
    { "pathname" : "/login", "dest" : "writer-gkdldldejq.now.sh" },
    { "pathname" : "/increment", "dest" : "writer-gkdldldejq.now.sh" },
    { "pathname" : "/logout", "dest" : "writer-gkdldldejq.now.sh" },
    { "dest" : "reader-irdrsmayqv.now.sh" }
  ]
}

We pass these to now alias using the --rules switch, along with our desired subdomain:

$ now alias counter-demo.now.sh --rules=./now-aliases.json
> Success! 3 rules configured for counter-demo.now.sh [1s]

Now to try it out:

[Screenshot: logging into counter-demo.now.sh]

It works! Two servers running two separate applications, each sharing sessions and cookies.

Next Steps

This is a rudimentary reverse-proxy setup, but with this in place we can...

  1. Deploy new versions of our application and switch the alias to point to them, allowing us to switch back if there's a problem
  2. Run any number of applications in different containers (yes, docker containers!) while still presenting one face to the client
  3. Proxy requests through to external servers (a server run by your IT department) or external services (AWS lambda etc.), still presenting a single domain to the client
  4. Scale our application server up or down (now scale reader-irdrsmayqv.now.sh 2) without breaking our session management system
  5. Turn the whole thing off and turn it back on again without disrupting user sessions.

Now go try it out! https://github.com/Sequoia/sharing-cookies

Comments

Really nice article, but I think the last part (aliases) should be longer and more in-depth. The current implementation is dependent on now.sh's particular feature and the actual mechanism isn't detailed. It would be great if you provided more implementations for the aliasing with different servers (like apache or nginx), so we could build a production environment without using now.sh. What do you think?

- Semmu

Thanks, Semmu! It's true, the approach described here is dependent on now.sh's aliasing feature, and yes, there are certainly other ways to do it! I featured now.sh here in part because it is very simple to use and explain. An explanation of how to tie this together with nginx (I'd pick it over Apache for this use-case) would be useful! I don't have such an explanation on hand but I'll try to write a blog post in the future describing reverse proxying with nginx. Thanks for the comment!

Hi Sequoia, Great article! A downside I see from sharing the same session storage is the coupling between the services. In your example, if someone decides to use a different web framework (like Rails) or even a different version of express js, the session format created by the services might not be compatible anymore. In other words, we would be giving up on the tech-agnostic benefit that microservices are supposed to provide. I see two possible solutions to this problem:

  1. Make the session format standard across all the microservices and implement the standard in libraries for each language (instead of using express-session)
  2. Have every microservice use a sidecar container that writes and reads sessions from the shared session storage. This sidecar container will return the session in a standard JSON format.

Please, let me know if you see other solutions or if I have any faulty assumption on my analysis. Thanks,

Arturo

- Arturo

Thank you Arturo for the thoughtful feedback! I would agree that any time you share any data between systems, each system will need to be designed to accommodate that format of data, and sessions are no exception. For this example of sharing session data, I think creating a "standard" format for session data would be overkill, as the concept is the same regardless of the format of the session data or the specific tools used in each service.

Even if this were production, I would use out-of-the-box Express session to start and keep the system as simple as possible. I would consider making a cross-framework session format only at the point where that became an actual requirement, and not a minute sooner! At that point, I'd tweak the easiest-to-tweak system to fit the format of the other one. Only once there were three or four different systems that all needed to share sessions would I consider a system as complex as a sidecar container (which, incidentally, would force you off the Node deploys on Now.sh and onto Docker deploys).

Thank you again for your well-considered feedback, and don't forget to Keep It Simple!

]]>
What is "JavaScript?" Part 2: Solutionshttps://sequoia.makes.software/what-is-javascript-part-2-solutions/"Bleeding edge" is all well and good, but can't we at least opt-in? Here's one weird trick to bring some transparency into the JavaScript module ecosystem with regard to new language features… https://sequoia.makes.software/what-is-javascript-part-2-solutions/Thu, 02 Mar 2017 05:00:00 GMTIn my last post I outlined my concerns about lack of visibility into the incorporation of experimental features into popular JavaScript libraries. In short, the problems are:

  1. The lack of a clear, standard indicator of when a library incorporates experimental language features
  2. The inability to consciously opt in to using these features

In this post I'll outline my proposals for addressing these issues.

Proposal 1a: minimum-proposal-stage

This idea is lifted from Composer, which has a minimum-stability property for projects. My idea is as follows:

  1. Libraries indicate if they are using experimental language features and from what stage
  2. Project authors specify the lowest proposal stage they're comfortable with
  3. npm install warns if you are installing a library with features newer than desired

For example, if you only want "finished" (Stage 4) or higher features in your project, you add the following to your package.json:

 "minimum-proposal-stage" : 4

Aurelia would indicate that it incorporates a Stage 2 proposed/experimental feature (decorators) by adding the following to its package.json files:

"lowest-proposal-stage" : 2

Upon attempting to install Aurelia, npm would warn you that the library's lowest-proposal-stage is lower than your minimum-proposal-stage. Basically: "hey! You're about to install a library with language features more experimental than you might be comfortable with!"
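The warning logic could be sketched like so. To be clear, this is a hypothetical sketch: the `minimum-proposal-stage` and `lowest-proposal-stage` fields are this post's proposal, not real npm features, and `checkProposalStage` is a made-up helper name.

```javascript
// Hypothetical install-time check (these package.json fields are proposed
// in this post, not implemented by npm)
function checkProposalStage(projectPkg, libraryPkg) {
  const min = projectPkg['minimum-proposal-stage'];
  const lowest = libraryPkg['lowest-proposal-stage'];
  if (min === undefined || lowest === undefined) return null; // nothing to check
  // a LOWER stage number means a MORE experimental feature
  if (lowest < min) {
    return `warning: library uses Stage ${lowest} features; ` +
           `your minimum-proposal-stage is ${min}`;
  }
  return null;
}

console.log(checkProposalStage(
  { 'minimum-proposal-stage': 4 },  // project: finished features only
  { 'lowest-proposal-stage': 2 }    // library: uses a Stage 2 feature (decorators)
));
```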

Pros

  • More granularity: This solution allows me to say "I am comfortable with Stage 4 (finalized, but not yet released) features, but nothing below that."

Cons

  • Requires users to learn more about the TC39 feature proposal process (perhaps this is actually a "pro"?)
  • Would require libraries to update their lowest-proposal-stage property as features are adopted into the language

Proposal 1b: maximum-ecmascript-version

This is like above, but pegged to ECMAScript versions.

Example: in my project, I don't want code newer than ES7 (the current released standard at the time of this writing), i.e. I don't want unreleased features:

    "maximum-ecmascript-version" : 7

In the library's package file, they indicate that the library incorporates features that do not exist in any current version of ECMAScript:

    "ecmascript-version" : "experimental"

npm would warn me before installing this package. This one, on the other hand, would install without complaint:

    "ecmascript-version" : 5

because the ecmascript-version is lower than my maximum.
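The corresponding check for this variant might look like the following sketch (again hypothetical: these fields and the `exceedsMax` helper are proposals for illustration, not real npm behavior). The only wrinkle is that the string "experimental" must sort above every released version number:

```javascript
// Hypothetical maximum-ecmascript-version check; "experimental" is treated
// as newer than any released ECMAScript version
function exceedsMax(projectPkg, libraryPkg) {
  const max = projectPkg['maximum-ecmascript-version'];
  const libVersion = libraryPkg['ecmascript-version'];
  if (max === undefined || libVersion === undefined) return false;
  const effective = libVersion === 'experimental' ? Infinity : libVersion;
  return effective > max; // true means the installer should warn
}

console.log(exceedsMax(
  { 'maximum-ecmascript-version': 7 },
  { 'ecmascript-version': 'experimental' }
)); // true: warn before installing

console.log(exceedsMax(
  { 'maximum-ecmascript-version': 7 },
  { 'ecmascript-version': 5 }
)); // false: install without complaint
```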

Pros

  • Simpler, easier to understand
  • Released ES versions do not change, no need to update ecmascript-version if it's set to a released version

Cons

  • Not possible to allow some proposal stages but forbid others with this system

The two systems could also be used in conjunction with one another; I won't go into that possibility here.

Possible Solution 2: Badges

Add badges to README.md files to indicate whether experimental features are used in the library. Here are some sample badges that use the proposal stage names rather than numbers:

(Please excuse the largeness of these badges)

Alternately, the language version could be used:

Pros

  • Does not require tooling (npm) updates—authors can start doing this right away
  • Human readable

Cons

  • Not machine readable: npm cannot alert you if you attempt to install something with features less stable than you prefer

Conclusion

Change is good, but stability is also good. Everyone should be able to easily choose to use or not use the latest and greatest JavaScript features and proposed features. Increasing visibility into experimental feature dependencies will...

  1. Give users a better understanding of the JavaScript feature development process and what it means for something to be "JavaScript"
  2. Allow users to consciously opt-in to using experimental language features
  3. Allow those who prioritize stability to opt-out of using experimental language features
  4. Give frazzled, overwhelmed users a bit of solid ground to stand on in the Churning Sea of JavaScript
  5. Make JavaScript look a bit less scary to enterprise organizations

Please let me know what you think with a comment (below) or on hackernews.

]]>
What is "JavaScript?" Part 1: The Problemhttps://sequoia.makes.software/what-is-javascript-part-1-the-problem/In the age of Babel, babel-plugins, and widespread transpilation, what does it mean for something to be JavaScript, and why does that question matter? https://sequoia.makes.software/what-is-javascript-part-1-the-problem/Thu, 02 Mar 2017 05:00:00 GMTAs Babel took over the JavaScript scene, it became possible to use features from the newest ECMAScript specification before browsers (or Node) had implemented them. It also became possible to use proposed ECMAScript features before they'd been finalized and officially incorporated into ECMAScript. While this allowed lots of exciting new developments, it introduced a good bit of confusion as well. Can you tell me which of the following is "just JavaScript/ECMAScript"?

  1. let n = { x, y, ...z };
  2. Promise.resolve(123).then(::console.log);
  3. Promise.resolve(2).finally(() => {})
  4. @observable title = "";
  5. '[1...10]'.forEach(num => console.log(num))

If you said "none of these are 'just JavaScript'," you were right! The first four are proposed features. Number five is a feature from another language, but you can use it with babel!

Feature Proposals

In order for new features to land in the ECMAScript specification, they must go through several proposal stages, as described here. The difference between JS and most other ecosystems is that in most ecosystems, language features must exist in the specification before they are incorporated into userland code. Not so JavaScript! With babel, you can start using Stage 3 ("Candidate"), Stage 1 ("Proposal"), or even Stage 0 ("Strawman") features in production right away, before they are finalized.

What does it mean for a feature proposal to be at Stage 2 ("Draft")? According to the TC39, it means the feature implementations are "experimental," and "incremental" changes to the feature can be expected. Basically, this means the behavior of the proposed feature may change before the feature is finalized.

This is great for those who want to live on the edge, but what about those of us who must prioritize stability over bleeding-edgeness? Can we just stick to finalized features and avoid experimental ones? It is possible, but it's not as simple as you might expect...

Libraries and Feature-Confusion

The fuzzy boundary between what "is JavaScript" and what are "JavaScript feature proposals" creates a lot of ambiguity and confusion. It's common to mistakenly refer to any "new JavaScript feature" as ES6, ES7, ES.Next or ES2016, more or less interchangeably. Unfortunately, authors of many popular JavaScript libraries do just this, exacerbating the misunderstanding. I'll pick on two lovely library authors here because they are very cool people & I'm sure they know I don't mean it as a personal criticism. ^_^

Exhibit A: Mobx

I recently found myself looking into new JavaScript libraries and I encountered some syntax I was not familiar with in JavaScript:

class Todo {
    id = Math.random();
    @observable title = "";
    @observable finished = false;
}

@observable? Huh! That looks like an annotation from Java. I didn't know those existed in the current language specification. It took looking it up to find out that it is in fact not JavaScript as currently specified, but a proposed feature. (In fairness, Mobx does explain that this feature is "ES.Next", but that term is vaguely defined and often used to refer to ES6 or ES7 features as well.)

Exhibit B: Aurelia

From the website (emphasis added):

What is it?

Well, it's actually simple. Aurelia is just JavaScript. However, it's not yesterday's JavaScript, but the JavaScript of tomorrow. By using modern tooling we've been able to write Aurelia from the ground up in ECMAScript 2016. This means we have native modules, classes, decorators and more at our disposal...and you have them too.

Well, now we know: decorators were added to JavaScript in the ES2016 language specification. Just one problem... no they weren't!!! Decorators are still a Stage 2 feature proposal. Aurelia is not "just JavaScript," it's "JavaScript [plus some collection of experimental language features]"

So What?

This matters because, as anyone involved in the JavaScript ecosystem these days knows, "it's hard to keep up with all the latest developments" is probably the #1 complaint about the ecosystem. This causes users to throw their hands up, exasperated, and it causes enterprise organizations to avoid JavaScript altogether. Why invest in a platform where it's difficult to even ascertain what the boundaries of the language are?

Also, as mentioned above, these features are not officially stable. This means that if you write code depending on the current (proposed) version of the feature, that code may stop working when the feature is finalized. While you may consider this an acceptable risk, I assure you there are many users and organizations that do not. Currently, making an informed decision to opt-in to using these experimental features is difficult and requires a high level of expertise—users must be able to identify each new feature & manually check where it is in the proposal or release phase. This is especially challenging for organizations for whom JavaScript is not a core-competency.

Finally, (this is my own opinion) it's just plain annoying to constantly encounter unfamiliar language syntax and be left wondering "Is this JavaScript? Is this Typescript? Is this JSX? Is this..." I don't want to have to google "javascript ::" to figure out what the heck that new syntax is and whether it's current JavaScript, a feature proposal, a super-lang, or Just Some Random Thing Someone Wrote a Babel Transform For.

Why Does it Matter if a Lib Uses Experimental Features Internally?

This probably does not matter if the exposed interfaces do not use or require the use of experimental language features. A library could be written in JavaScript, CoffeeScript, or Typescript as long as the dist/ is plain JavaScript. Annotations are an example of an experimental feature that some libraries encourage or require the use of in user code. Further, some libraries do not distribute a build artifact, instead directing users to npm install the source and build locally. In these cases, there is the potential for breakage if draft specifications of experimental features change, and warning users of this is warranted (IMO).

Are You Saying No One Should Use Experimental Features?

No! By all means, use them! All I'm saying is that it would be useful to be able to make an informed choice to opt-in to using experimental features. That way, organizations that prefer stability can say "no thank you" and users who want to be on the bleeding edge can keep bleeding just as they're doing today.

Composer has a mechanism to allow users to allow or disallow unstable versions of dependencies from being installed; it does not prevent people from using unstable releases, it merely gives them the choice to opt in or out.

An added benefit of increasing visibility into experimental feature use would be to help users understand the TC39 process. Currently there is not enough understanding of what it means for something to be ES6, or ES7, or Proposal Stage 2, as evidenced by the way these terms are thrown around willy-nilly.

In my next post I'll go over my proposals for addressing this issue.

Comments

Thank you for this post. You are doing the Lord's work

- Uzo Olisemeka

Thanks Uzo! I really like your post on the subject as well, especially this point: "On a language level, overlap is a problem. If there’s more than 3 ways of doing a thing, I must know all three and everyone I’m working with must know all three."

]]>
Let's Code It: Static Site Generator with Rx.jshttps://sequoia.makes.software/lets-code-it-static-site-generator-with-rxjs/Watching a filesystem for changes and building an input (markdown) file to output (HTML) on each change? This sounds like a job for... Observables!https://sequoia.makes.software/lets-code-it-static-site-generator-with-rxjs/Thu, 19 Jan 2017 05:00:00 GMTLast post, we went over building a Static Site Generator (SSG) in Node.js. We used Promises for flow control, and that worked for reading each Markdown input file, transforming it into HTML, and writing that to disk, once per file. What if, instead of running this build process once per input file, we want it to run every time that input file is created or changed?

If our goal is to map a sequence of events over time (file creation or modification) to one or more operations (building Markdown to HTML and writing to disk), it's very likely Observables are a good fit! In this blog post, we'll look at how to use Observables and RX.js to create a SSG with built-in, incremental watch rebuilds, and with multiple output streams (individual posts and blog index page).

This post loosely follows the demo project here, so if you prefer to look at all the code at once (or run it) you can do so.

What is RX.js

Observables can be confusing so reading a more detailed intro is advisable. Here I'll give a simplified explanation of Observables that is inadequate to understand them fully, but will hopefully be enough for this blog post!

As alluded to, Observables are a tool for modeling and working with events over time. One way to conceptualize Observables if you are familiar with Promises is to think of an Observable as a Promise that can emit multiple values. A Promise has the following "things it can do:"

  1. Emit an error (and settle the Promise)
  2. Emit a value (and settle the Promise)

The things an Observable can do are:

  1. Emit an error (like reject)
  2. Emit a value
  3. Emit a "complete" notification

The key difference is that emitting a value and "settling" (called "completing" in RX.js Observables) are split into separate actions in Observables, and because emitting a value does not "complete" an Observable, it can be done over and over.
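This "emit many values, then complete" contract can be illustrated with a toy, hand-rolled observable. This is an analogy only, not the real RxJS implementation:

```javascript
// Toy observable: subscribing runs a function that may call next() many
// times before calling complete() once (unlike a Promise, which settles
// on its first resolution)
function createObservable(subscribeFn) {
  return { subscribe: observer => subscribeFn(observer) };
}

const numbers$ = createObservable(observer => {
  [1, 2, 3].forEach(n => observer.next(n)); // emitting a value does NOT settle it
  observer.complete();                      // completion is a separate, final action
});

const seen = [];
numbers$.subscribe({
  next: n => seen.push(n),
  error: e => console.error(e),
  complete: () => console.log('complete; saw:', seen.join(', '))
});
// prints "complete; saw: 1, 2, 3"
```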

To illustrate this comparison further, with code, let us imagine a new utility method for constructing Promises called Promise.create. It behaves the same as the Promise constructor, but the signature of its function argument is slightly different.

// Promise constructor
const p1 = new Promise(function settle(resolve, reject){
  if(foo){ resolve(value); }
  else{ reject('Error!'); } 
});

p1.then(console.log);

// Promise.create (imaginary API)
const p2 = Promise.create(function settle(settler){
  if(foo){ settler.resolve(value); }
  else{ settler.reject('Error!'); } 
});

p2.then(console.log);

As you can see, Promise.create takes a settle function which receives an object with resolve and reject methods, rather than separate resolve and reject functions. From here it is a short step to Rx.Observable.create:

const o = Rx.Observable.create(function subscribe(subscriber) {
  try{
    let foo;
    while((foo = getNextFoo())){
      subscriber.next(foo);   // emit next value 
    }
    subscriber.complete();    // emit "complete"
  }
  catch(e){
    subscriber.error(e);      // emit "error"
  }
});

When you want to use the results of a Promise, you attach a function via .then. With an Observable, when you wish to use the results to produce side effects, you attach an "Observer" via .subscribe:

myPromise.then(foo => console.log('foo is %s', foo))
 .catch(e => console.error(e));

myObservable.subscribe({
  next: foo => console.log('next foo is %s', foo),
  error: e => console.error(e),
  complete: () => console.log('All done! No more foo.')
})

Observers and Subscribers

In the style of programming with Observables that RX.js allows, tasks are often conceptualized as being composed of two parts: Observables (inputs), and Subscriptions or side effects (outputs). Side effects describe what you want to ultimately do with the values from an observable.

If you describe your goals as side effects, you can work backward to figure out what sort of Observables you need to create to provide the values those side effects need. For example, if you wish a counter to be incremented each time a button is clicked, "increment the counter" is the side effect, and you need an Observable of button clicks for that increment function to "subscribe" to.

For our static site generator, we have the following high-level goals:

  1. Write to disk an updated HTML version of each post:
    1. When the post input file is created
    2. When the post input file is changed and it results in different output
  2. Write to disk an updated blog index page:
    1. When we start our script
    2. Each time a post's metadata (title, description, etc.) is changed

The Observables we can map to each of these goals are:

  1. parsedAndRenderedPosts$: emits post output each time a post is created or changed. Subscribe to this and write the new post contents to disk on each emit.
  2. latestPostMetadata$: emits a collection of the latest metadata when the script starts or the metadata for a post changes. Subscribe to this and write the rendered index page to disk on each emit.

Each of these two Observables will be composed of or created as a result of other Observables. As we build these up, we'll learn about different methods Rx.js has for creating and transforming Observables. Let's begin!

Goal 1: Write Posts

We know each of our Observables should emit based on file changes and additions, so we'll start by creating an Observable of file changes and additions called changesAndAdditions$. The chokidar module can be used to create an event emitter that emits change and add events on filesystem changes, so let's start there:

const chokidar = require('chokidar');
const dirWatcher = chokidar.watch('./_posts/*.md');

We want to create Observables of file changes & additions so we can manipulate & combine them with Rx.js. Rx.js provides a utility method to create an Observable from EventEmitter by event name. We are interested in the add event and the change events, so let's use fromEvent to create Observables of them:

const Rx = require('rxjs');
// note: `add` is emitted for each file on startup, when chokidar first scans the directory
const newFiles$     = Rx.Observable.fromEvent(dirWatcher, 'add');
const changedFiles$ = Rx.Observable.fromEvent(dirWatcher, 'change');

Now newFiles$ will emit a new value (a filename) when dirWatcher emits an add event, and changedFiles$ behaves similarly with change events. We can create an observable of both of these events by using .merge.

const changesAndAdditions$ = newFiles$.merge(changedFiles$);

Mapping Filename to File Contents

To get the file contents, we can .map the name of the file to the contents of that file by using a function that reads files. Reading a file is (typically) an asynchronous operation. If we were using Promises, we might write a function that takes a filename and returns a Promise that will emit the file contents. Similarly, using Observables, we use a function that takes a filename and returns an Observable that will emit the file's contents.

Just as Promise.promisify will convert a callback based function to one that returns a Promise, Rx.Observable.bindNodeCallback converts a callback based function to one that returns an Observable:

const fs = require('fs');
const readFileAsObservable = Rx.Observable.bindNodeCallback(fs.readFile);

const fileContents$ = changesAndAdditions$
  .map(readFileAsObservable) // map filename to observable of file contents
  .mergeAll();               // Unwrap Observable<"file contents"> to get "file contents"

fileContents$
  .subscribe(content => console.log(content)); // log contents of each file

Now we'll log the contents of each file as it is created or changed. Let's take a closer look at our use of .mergeAll: readFileAsObservable is a function that takes a String (filename) as input and returns an Observable<String> (an observable of the "file contents" string).

This means that by mapping changesAndAdditions$ over readFileAsObservable, we took an Observable<String> (an observable of strings, namely, file names) and converted each String value to a new Observable<String>. This means we have Observable<Observable<String>>: an Observable of Observables of Strings.

We don't actually want an Observable of file contents, we want filename in, file contents out. For this reason we use .mergeAll to "unwrap" the file contents strings from the inner Observables as they are emitted. If you are confused by this, don't worry: it is in fact confusing! For now it's only important to understand that .mergeAll converts Observable<Observable<String>> to Observable<String>, so we can process the string (in this case file contents).
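A plain-array analogy may help (an analogy only; arrays are collections in space, observables are collections over time): mapping each name to a *collection* of contents gives a nested collection, which must be flattened one level to get back to plain values.

```javascript
const filenames = ['a.md', 'b.md'];

// map each name to a collection of contents: Array<Array<String>>,
// just as .map(readFileAsObservable) yields Observable<Observable<String>>
const nested = filenames.map(name => [`contents of ${name}`]);

// flatten one level back to Array<String>, as .mergeAll does for observables
const flattened = nested.reduce((acc, inner) => acc.concat(inner), []);

console.log(flattened); // [ 'contents of a.md', 'contents of b.md' ]
```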

NB: Mapping a value to an observable then unwrapping that inner observable as we’ve done here is an extremely common operation in Rx.js, and can be achieved using the .mergeMap(fn) shorthand, which is the equivalent of .map(fn).mergeAll().

Emitting Only When Contents is Changed

When our script starts, newFiles$ will emit each filename once when chokidar first scans our _posts directory, and this will be merged into changesAndAdditions$. While editing a post in your text editor, each time you "save" the Markdown file, changedFiles$ will emit the filename, regardless of whether the contents of the file actually changed. If you hit ^S ten times in a row, changesAndAdditions$ will emit that filename 10 times and we'll read the file 10 times.

If the file contents hasn't changed, we don't want to send it down the pipe to be parsed, templated, and written as an updated HTML file-- we only want to do this latter processing (right now just console.log(contents)) if the contents are actually different from the last contents that were emitted. Luckily, Rx.js has a method for this built in: .distinctUntilChanged will emit a value one time, but will not emit again until the value changes. That means if a file is saved 10 times with the same contents, it will emit the file contents the first time and drop the rest.
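The semantics can be sketched synchronously with a small function (a toy illustration, not the RxJS source): a value is dropped only when it equals the immediately previous one.

```javascript
// drop only CONSECUTIVE duplicates, like distinctUntilChanged
function dropConsecutiveDuplicates(values) {
  const out = [];
  let last;
  for (const v of values) {
    if (v !== last) out.push(v);
    last = v;
  }
  return out;
}

console.log(dropConsecutiveDuplicates(['a', 'a', 'a', 'b', 'a']));
// [ 'a', 'b', 'a' ] -- the final 'a' is emitted again, because only
// consecutive duplicates are dropped
```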

const latestFileContents$ = fileContents$.distinctUntilChanged();

latestFileContents$.subscribe(content => console.log(content));

Now we'll only see file contents logged if it's different from the last contents that were emitted. There's a logic problem here, however. Consider the following scenario:

  1. Save foo.md
    1. changesAndAdditions$ emits "foo.md"
    2. fileContents$ emits "contents of foo.md"
    3. last value (null) is distinct from "contents of foo.md"
    4. last value updated to "contents of foo.md"
    5. latestFileContents$ emits "contents of foo.md"
  2. Save bar.md
    1. changesAndAdditions$ emits "bar.md"
    2. fileContents$ emits "contents of BAR.md"
    3. last value ("contents of foo.md") is distinct from "contents of BAR.md"
    4. last value updated to "contents of BAR.md"
    5. latestFileContents$ emits "contents of BAR.md"
  3. Save foo.md again
    1. changesAndAdditions$ emits "foo.md"
    2. fileContents$ emits "contents of foo.md"
    3. last value ("contents of BAR.md") is distinct from "contents of foo.md"
    4. last value updated to "contents of foo.md"
    5. latestFileContents$ emits "contents of foo.md"

As you can see, the contents of the two files never changes, but the latestFileContents$ considers it "changed" because it's different from the last value, which was from the other file. The solution is to create an observable of file contents that is distinct until changed for each file, so the new contents of foo.md are compared to the last contents of foo.md, regardless of whether bar.md was changed since then. This is a bit more complicated than merging the newFiles$ and changedFiles$ Observables, but it's doable!

Because we want one observable of file changes per file, we must perform the "read file & see if it changed" per file, not on a merged stream of all files. The plan of attack is as follows: For each add event (new file created or read on startup)...

  1. Create an Observable of change events for this file only
  2. Start that Observable by emitting the filename once (for the add event)
  3. Map the filename to an Observable of the file contents (as above)
  4. Use .mergeAll to unwrap Observable from step 3
  5. Emit contents only if it's distinct from the last contents

// for each added file...  
const latestFileContents$ = newFiles$.map(addedName => {

  // 1. create Observable of file changes...
  const singleFileChangesAndAdditions$ = changedFiles$
  // ...only taking those for THIS file
    .filter(changedName => changedName === addedName)
  // 2. emit filename once to start (on "add")
    .startWith(addedName);

  const singleFileLatestContents$ = singleFileChangesAndAdditions$
  // 3. map the filename to an observable of the file contents
    .map(filename => readFileAsObservable(filename, 'utf-8'))
  // 4. Merge the Observable<Observable<file contents>> to Observable<file contents>
    .mergeAll()
  // 5. don't emit unless the file contents actually changed
    .distinctUntilChanged();

  // 6. return an observable of changes per added filename 
  return singleFileLatestContents$;

})
.mergeAll(); //unwrap per-file Observable of changes 

We're using .mergeAll twice because we're mapping strings to Observables twice:

  1. filename string from changedFiles$ mapped to an observable of file contents in step 3
  2. filename string from newFiles$ mapped to an observable returned in step 6

Because we go Observable<String> to Observable<Observable<String>> twice, we have to reverse the process with .mergeAll twice.

Since we have one singleFileChangesAndAdditions$ observable per file added, we are able to perform the "map filename to contents and compare with last value" check per file. latestFileContents$ can still be consumed as it was before.

Templating and Writing HTML to Disk

That was a lot, but it's the bulk of the Rx.js logic for our "write blog posts" goal. Now that we have an Observable that emits the contents of our Markdown blog posts each time they change, we can map that over our frontmatter, markdown parsing, template, and write-to-disk functions much as we did before with Promises. We'll start by creating a few utility functions as before:

const md = require('markdown-it')();
const frontmatter = require('frontmatter');
const pug = require('pug');
const path = require('path');

const writeFileAsObservable = Rx.Observable.bindNodeCallback(fs.writeFile);
const renderPost = pug.compileFile(`${__dirname}/templates/post.pug`);

// IN:  { content, data : { title, description, ...} }
// OUT: { content, title, description, ... }
function flattenPost(post){
  return Object.assign({}, post.data, { content : post.content });
}

// parse markdown to HTML then send the whole post object to the template function
function markdownAndTemplate(post){
  post.body = md.render(post.content);
  post.rendered = renderPost(post);   //send `post` to pug render function for post template
  return post;
}

// take post object with:
// 1. `slug` (e.g. "four-es6-tips") to build file name and
// 2. `rendered` contents: the finished HTML for the post
// write this to disk & output error or success message
function writePost(post){
  var outfile = path.join(__dirname, 'out', `${post.slug}.html`);
  writeFileAsObservable(outfile, post.rendered)
    .subscribe({
      next: () => console.log('wrote ' + outfile),
      error: console.error
    });
}

NB: see the previous post for details on frontmatter, md.render, etc.

We use our Observable utility functions to string them together:

latestFileContents$
  .map(frontmatter)        // trim & parse frontmatter
  .map(flattenPost)        // format the post for Pug templating
  .map(markdownAndTemplate)// render markdown & render template
  .subscribe(writePost);

Now we have a working, Rx.js version of our Static Site Generator that does the same as it did with Promises, but with built-in file watch and rebuild!

animated gif illustrating running application live-updating output HTML on markdown edits

On to our next goal, the index page...

Goal 2: Write Index Page with Latest Post Metadata

Our index page template, index.pug:

html
  head
    title Welcome to my blog!
  body
    h1 Blog Posts:

    //- Output h2 with link & paragraph tag with description
    for post in posts
      h2: a.title(href='/' + post.slug + '.html')= post.title
      p= post.description

The data our template expects must be structured thus:

{
  posts : [
    { title:"Intro To Rx.js", slug: "intro-to-rx-js", description: "..."},
    { title:"Post Two", slug: "post-2", description: "..."},
    //...
  ]
}

An Observable of Post Metadata

Earlier, we mapped the latestFileContents$ over the frontmatter function. We need to use that Observable for our index page as well, so let's modify our code from above to capture that Observable and set it aside:

const postsAndMetadata$ = latestFileContents$
  .map(frontmatter);

postsAndMetadata$ //same as before:
  .map(flattenPost)
  .map(markdownAndTemplate)
  .subscribe(writePost);

The frontmatter function returns an object with data and content keys, but we only need the value of data for our index page template so we'll pluck that property from the object:

const metadata$ = postsAndMetadata$
  .pluck('data');

At this point we have an Observable that emits the latest metadata for a file when that file is created or saved. We need to transform our Observable ("collection over time") to an array ("collection over space"). Rx.js has a reduce method that can do this, but it waits for an Observable to "complete" before emitting one final "reduced" value, and our file-watching metadata$ Observable never "completes."
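The difference can be sketched with plain arrays (an analogy, not RxJS itself): reduce produces one final value at the end, while what we need emits every intermediate accumulator as it goes.

```javascript
const nums = [1, 2, 3];

// reduce: one value, available only once the collection "completes"
const total = nums.reduce((acc, n) => acc + n, 0);
console.log(total); // 6

// scan-like behavior: emit each intermediate accumulator
const runningTotals = [];
nums.reduce((acc, n) => {
  const next = acc + n;
  runningTotals.push(next); // "emit" the accumulator at every step
  return next;
}, 0);
console.log(runningTotals); // [ 1, 3, 6 ]
```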

We need a way to aggregate values into an accumulator like reduce does, but that emits the new accumulator value on each iteration so we don't have to wait for the "complete" that will never come. Rx.js has a method called .scan that does just this:

const metadataMap$ = metadata$
  .scan(function(acc, post){
    acc[post.slug] = post;
    return acc;
  }, {});

By using each post's slug as its key in the acc object, there will be only one entry per post. When we first start our script, we'll get a post object from metadata$ with the slug post-2 and add it to the accumulator as acc['post-2'].

When post two is updated and saved, its metadata will be sent to .scan again, but it won't add a new key to acc: it will overwrite the existing acc['post-2']. In this way, metadataMap$ will emit an object containing the latest metadata for all posts, with one key per post. The output will look thus:

{
  'intro-to-rx-js' : { title, description, slug },
  'post-2' : { title, description, slug }
}

Transforming the Data for Pug

We now have an object with an entry for each post, but this does not match the format we outlined above ({ posts : [ post, post, post ] }). In the next two steps we can transform the object into an array and then insert it into a wrapper object:

const indexTemplateData$ = metadataMap$
  .map(function getValuesAsArray(postsObject){ // (or Object.values in ES2017)
    // IN: { 'slug' : postObj, 'slug2' : postObj2, ... }
    return Object.keys(postsObject)
      .reduce(function(acc, key){
        acc.push(postsObject[key]);
        return acc;
      }, []);
    // OUT: [postObj, postObj2, ...]
  })
  .map(function formatForTemplate(postsArray){
    return {
      posts : postsArray
    };
  })

Reducing Repetition and Noise

Now we have an Observable that emits the latest listing of post metadata, formatted for the index.pug template, on each add or change event. This isn't quite what we want, however, for two reasons.

First: in the course of editing a post, most of your changes will be to content, not to the metadata. Content changes don't affect the index page, so we want to drop any emissions from indexTemplateData$ where the data is the same as the previous emission. This is another case where .distinctUntilChanged comes in handy:

const distinctITD$ = indexTemplateData$
  .distinctUntilChanged(function compareStringified(last, current){
    //true = NOT distinct; false = DISTINCT 
    return JSON.stringify(last) === JSON.stringify(current);
  });

We pass a comparator function to distinctUntilChanged this time because formatForTemplate (above) returns a newly created object each time-- the new object it emits will always be "distinct" from the last one, even if their contents are identical. We stringify the last and current objects in order to compare their contents and emit only when they differ.
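A quick illustration of why the default comparison isn't enough: two structurally identical objects are still distinct by reference, while their stringified forms match.

```javascript
// Reference equality vs. structural equality:
const last = { posts: [{ title: 'Post 2' }] };
const current = { posts: [{ title: 'Post 2' }] };

console.log(last === current);                                 // false
console.log(JSON.stringify(last) === JSON.stringify(current)); // true
```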

Second: when we first start our script, it reads each file and emits its contents once. This means that while files are initially being read, indexTemplateData$ will emit a bunch of incomplete objects consisting of whatever posts have been read so far. If we have 4 posts, emissions will look like this:

  1. { post1 }
  2. { post1, post2 }
  3. { post1, post2, post3 }
  4. { post1, post2, post3, post4 }

Only the last version represents a collection of metadata from all pages; the others can be ignored. In order to get around the flood of events on indexTemplateData$ on startup, we'll use .debounceTime, which will wait until an Observable stops emitting for a fixed amount of time before emitting the latest result:

const distinctDebouncedITD$ = distinctITD$
  .debounceTime(100); // wait 'til the observable STOPS emitting for 100ms, then emit latest

This probably isn't the most graceful solution, but when indexTemplateData$ gets its initial flood of emissions, distinctDebouncedITD$ will emit only once, after the flood has finished.

Writing the Index Page

The only thing left is to pass the values from distinctDebouncedITD$ to the template rendering function then write the results to disk:

const renderIndex = pug.compileFile(`${__dirname}/templates/index.pug`);

function writeIndexPage(indexPage){
  var outfile = path.join(__dirname, 'out', 'index.html');
  writeFileAsObservable(outfile, indexPage)
    .subscribe({
      next: () => console.log('wrote ' + outfile),
      error: console.error
    });
}

distinctDebouncedITD$
  .map(renderIndex)
  .subscribe(writeIndexPage);

Now index.html will be rewritten when we edit a post, but only if the metadata changed:

animated gif illustrating running application live-updating index page but only with metadata changes

That's it!

Conclusion and Next Steps

If you made it this far, congratulations! My goal was not to explain each Rx.js concept introduced herein in detail, but to walk through the process of using Rx.js to complete a real-world programming task. I hope this was useful! If this post has piqued your interest, I highly recommend running the full version of the code this post was based on, which you can find in this repository. As always, if you have any questions or Rx.js corrections please feel free to contact me. Happy coding!

Comments

This is a great overview - thank you! What are you using to do those awesome terminal gifs?

- Mark,

Thanks, Mark! I use LICEcap to make gifs on my Macintosh. Tips to keep the sizes/zoom consistent:

  • Use a tiling tool like Spectacle to tile to e.g. ¼ screen
  • Use the same screen resolution when recording these gifs. If you switch from desktop to laptop you'll get different zoom

See my post on avoiding livecoding in demos for more such tricks!

]]>
Let's Code It: Static Site Generatorhttps://sequoia.makes.software/lets-code-it-static-site-generator/Markdown in, HTML out... do we really need a framework for this? I Don't Think So!!https://sequoia.makes.software/lets-code-it-static-site-generator/Thu, 05 Jan 2017 05:00:00 GMTTraditionally, if you wanted to create a blog or website that you can update easily without having to directly edit HTML, you'd use a tool like WordPress. The basic flow for serving a website from a CMS like WordPress is as follows:

  1. Store content (e.g. "posts") in a database
  2. Store display configuration (templates, CSS, etc.) separately
  3. When a visitor requests a page, run a script to...
    1. Pull the content from the database
    2. Read the appropriate template
    3. Put them together to build page HTML
    4. Send HTML to the user

Enter Static Site Generators

It occurred to some people that it didn't make sense to run step three every single time someone hit a page on their site. If step three (combining template with page content) were done in batch beforehand, all of the site's pages could be stored on disk and served from a static server! An application that takes this approach, generating "static" webpages and storing them as flat HTML files, is referred to as a Static Site Generator (or SSG). An SSG has the following benefits over a CMS:

  1. It eliminates the need to run a database server
  2. It eliminates the need to execute PHP or any application logic on the server
  3. It allows the site to be served from a highly performant file server like NGINX...
  4. ...or any service that offers free static hosting (namely Github Pages)
  5. Content written as flat markdown files can easily be tracked in a git repo & collaborated on thus

Points one and two dramatically reduce the attack surface of a web server, which is great for security. Point three (in conjunction with one and two) allows for greater site reliability and allows a server to handle much more traffic without crashing. Point four is very attractive from a cost perspective (as are one, two, and three if you're paying for hosting). The benefits of static site generators are clear, which is why many organizations and individuals are using them, including the publisher of this blog and the author of this post!

OK, Let's Use an SSG

There are many SSG tools available; at the time of writing, one site that tracks such tools lists one hundred and sixty-two of them. One of the reasons there are so many options is that building an SSG isn't terribly complicated. The core functionality is:

  1. Read markdown files (content)
  2. Parse frontmatter (we'll look at this more later)
  3. Convert markdown to HTML
  4. Highlight code snippets
  5. Insert the content into the appropriate template and render page HTML
  6. Write this HTML content to disk

I've simplified the process a bit here, but overall, this is a pretty straightforward programming task. Given libraries to do the heavy lifting of parsing markdown, highlighting code, etc., all that's left is the "read input files, process, write output files."

So can we write our own static site generator in Node.js? In this blog post we'll step through each of the steps outlined above to create the skeleton of an SSG. We'll skip over some non-page-generation tasks such as organizing images & CSS, but there's enough here to give you a good overview of what an SSG does. Let's get started!

Building an SSG

1. Read Markdown Files

No Wordpress means no WYSIWYG editor, so we'll be authoring our posts in a text editor. Like most static site generators, we will store our page content as Markdown files. Markdown is a lightweight markup alternative to HTML that's designed to be easy to type, human readable, and typically used to author content that will ultimately be converted to and published as HTML, so it's ideal for our purpose here. A post written in markdown might look like this:

# The Hotdog Dilemma

*Are hotdogs sandwiches*? There are [many people](https://en.wikipedia.org/wiki/Weasel_word) who say they are, including:

* Cecelia
* Donald
* James

## Further Evidence
... etc. ...

We'll put our posts in a directory called _posts. This will be like the "Posts" table in a traditional CMS, in the sense that it's where we'll look up our content when it's time to generate the site.

To read each file in the _posts directory, we need to list all the files, then read each one in turn. The node-dir package does this for us, but its API isn't quite what we need: it's callback based and oriented towards getting file names rather than compiling an array of all file contents. Creating a wrapper function that returns a Bluebird promise containing an array of all file contents is tangential to the topic of this post, but let's imagine we've done so and we have an API that looks like this:

getFiles('_posts', {match: /.*\.md/})
  .then(function(posts){
    posts.forEach(function(contents){
      console.log('post contents:');
      console.log(contents);
    });
  });

Because we're using Bluebird promises and our Promise result is an array, we can map over it directly:

getFiles('_posts', {match: /.*\.md/})
  .map(function processPost(content){
    // ... process the post
    // ... return processed version
  })
  .map(nextProcessingFunction)
  //...

This setup will make it easy to write functions that transform our input to our output step by step, and to apply those functions, in order, to each post.

2. Parse Frontmatter

In a traditional CMS, the Posts table holds not just the contents of the post, but also metadata such as its title, author, publish date, and perhaps a permanent URL or canonical link. This metadata is used both on the post page itself (for example in the page <title>) and on index pages. In our flat-file system, all the information for a post must be contained in the markdown file for that post. We'll use the same solution for this challenge that is used by Jekyll and others: YAML frontmatter.

YAML is a data serialization format that's basically like JSON but lighter weight. It looks like this:

key: value
author: Sequoia McDowell
Object:
  key: http://example.com
  wikipedia: https://wikipedia.com
List:
  - First
  - Second
  - Third
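For comparison, here is the same data as a JavaScript object literal (what a YAML parser would hand back):

```javascript
// The YAML above, parsed into a plain object:
const parsed = {
  key: 'value',
  author: 'Sequoia McDowell',
  Object: {
    key: 'http://example.com',
    wikipedia: 'https://wikipedia.com'
  },
  List: ['First', 'Second', 'Third']
};
```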

"Frontmatter" on Markdown files is an idea borrowed from Jekyll. Very simply, it means putting a block of YAML at the top of your markdown file containing metadata for that file. The SSG separates this YAML data from the rest of the file (the contents) and parses it for use in generating the page for that post. With YAML frontmatter, our post looks like this:

---
title: The Hotdog Dilemma
author: Sequester McDaniels
description: Are hotdogs sandwiches? You won't believe the answer!
path: the-hotdog-dilemma.html
---

*Are hotdogs sandwiches*? There are [many people](https://en.wikipedia.org/wiki/Weasel_word) who say they are, including:

...

Trimming this bit of YAML from the top of our post and parsing it is easy with front-matter, the node package that does exactly this! That means this step is as simple as npm installing the library and adding it to our pipeline:

const getFiles = require('./lib/getFiles');
const frontmatter = require('front-matter');

getFiles('_posts', {match: /.*\.md/})
  .map(frontmatter) // => { data, content }
  .map(function(post){
    console.log(post.data.title);   // "The Hotdog Dilemma"
    console.log(post.data.author);  // "Sequester McDaniels"
    console.log(post.content);      // "*Are hotdogs sandwiches*? There are [many people](https: ..."
  });
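Under the hood, the separation step amounts to splitting on the --- fences before handing the YAML block to a parser. A simplified, hand-rolled version might look like this (a sketch only; YAML parsing itself is omitted):

```javascript
// Split a raw post into its YAML frontmatter and markdown body.
// (Simplified sketch of what front-matter does; the `yaml` string
// would still need to be run through a YAML parser.)
function splitFrontmatter(raw) {
  const match = /^---\n([\s\S]*?)\n---\n([\s\S]*)$/.exec(raw);
  if (!match) return { yaml: '', content: raw };
  return { yaml: match[1], content: match[2] };
}
```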

Now that our metadata is parsed and removed from the rest of the markdown content, we can work on converting the markdown to HTML.

3. Convert Markdown to HTML

As mentioned, Markdown is a markup language that provides an easy, flexible, human-readable way to mark up document text. It was created by John Gruber in 2004 and introduced in a blog post that serves as the de-facto standard for the markdown format. This blog post would go on to be referenced by others who wished to build markdown parsers in Ruby, JavaScript, PHP, and other languages.

The problem with having only a "de-facto" standard for a format like markdown is that this means there is no actual, detailed standard. The result is that over the years different markdown parsers introduced their own quirks and differences in parsing behavior, as well as extensions for things like checklists or fenced code blocks. The upshot is this: there is no single "markdown" format-- the markdown you write for one parser may not be rendered the same by another parser.

In response to this ambiguity, the CommonMark standard was created to provide "a strongly defined, highly compatible specification of Markdown." This means that if you use a CommonMark compatible parser in JavaScript and later switch to a CommonMark compatible parser in Ruby, you should get the exact same output.

The main JavaScript implementation of CommonMark is markdown-it, which is what we'll use:

const getFiles = require('./lib/getFiles');
const frontmatter = require('front-matter');
const md = require('markdown-it')('commonmark');

function convertMarkdown(post){
  post.content = md.render(post.content);
  return post;
}

getFiles('_posts', {match: /.*\.md/})
  .map(frontmatter) // => { data, content:md }
  .map(convertMarkdown) // => { data, content:html }
  .map(function(post){
    console.log(post.content);
    // "<p><em>Are hotdogs sandwiches</em>? There are <a href="proxy.php?url=https://en.wikipedia.org/wiki/Weasel_word">many people</a> who..."
  });

Now our markdown is HTML!

4. Highlight Code Snippets

We're writing a technical blog, so we want to display code with syntax highlighting. If I write:

Here's a *pretty good* function:

```js
function greet(name){
  return "Hello " + name;
}
```

It should be output thus:

<p>Here's a <em>pretty good</em> function:</p>

<pre><code class="language-js"><span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">greet</span>(<span class="hljs-params">name</span>)</span>{
  <span class="hljs-keyword">return</span> <span class="hljs-string">"Hello "</span> + name;
}
</code></pre>

These classes allow us to target each piece of the code (keywords, strings, function parameters, etc.) separately with CSS, as is being done throughout this blog post. The markdown-it docs suggest using highlight.js so that's what we'll do:

const getFiles = require('./lib/getFiles');
const frontmatter = require('front-matter');
const hljs = require('highlight.js');
const md = require('markdown-it')('commonmark', {
  highlight: function (str, lang) {
    // "language" is specified after the backticks:
    // ```js, ```html, ```css etc.
    // "str" is the contents of each fenced code block
    return hljs.highlight(lang, str).value;
  }
});

// ... unchanged ...

Now we can use fenced code blocks as above. We're almost there!

5. Templating

There are plenty of templating libraries in JavaScript; we'll use Pug (formerly "Jade") here. First we'll create a template for posts:

//templates/post.pug
- var thisYear = (new Date()).getFullYear();
doctype html
html(lang='en')
  head
    title= title
    meta(name='description', content=description)
  body
    h1= title 
    | !{content}
    footer &copy; #{author} #{thisYear}

We won't dwell on the Pug syntax, but the important bits here are where our data is injected into the template. Note in particular:

  1. title= title for the <title> tag
  2. h1= title for the page header
  3. | !{content} to output page contents, directly in the body, without escaping HTML

Next we must create a function that uses this template file to render a "post object" to HTML.

//...
const pug = require('pug');
const postRenderer = pug.compileFile('./templates/post.pug');

//function for our posts promise pipeline:
function renderPost(post){
  post.content = postRenderer(post);
  return post;
}

We'll also need a function to flatten the post object for Pug's consumption:

// IN:  { content, data : { title, description, ...} }
// OUT: { content, title, description, ... }
function flattenPost(post){
  return Object.assign({}, post.data, { content : post.content });
}

Now we can plug these two new functions into our pipeline:

//...

getFiles('_posts', {match: /.*\.md/})
  .map(frontmatter) // => { data, content:md }
  .map(convertMarkdown) // => { data, content:html }
  .map(flattenPost)
  .map(renderPost)
  .map(post => {
    console.log(post.content); // '<!DOCTYPE html><html lang="en"><head><title> ...'
    console.log(post.path);    // 'the-hotdog-dilemma.html'
  })

Finally we're at the last step: writing posts to an output directory.

6. Writing HTML output

We're going to write our HTML files to a directory named out. This will contain the final output, ready to publish to a web server. Our function should, for each post, write the post.content to a path specified by post.path. Since we're using Bluebird already, we'll use the promisified version of the file system API.

//...
const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs'));
const path = require('path');
const outdir = './out';

function writeHTML(post){
  return fs.writeFileAsync(path.join(outdir, post.path), post.content);
} 

Putting it All Together

Now we have a script that fulfills all of our original goals.

// requires...
// utility functions...

//Read posts & generate HTML:

getFiles('_posts', {match: /.*\.md/}) // 1
  .map(frontmatter)                   // 2
  .map(convertMarkdown)               // 3
  .map(flattenPost)
  .map(renderPost)                    // 4, 5
  .map(writeHTML)                     // 6
  .then(function(){
    console.log('done!');
  })
  .catch(function(e){
    console.error('there was an error!')
    console.error(e)
  });

That's it!

Conclusion and Next Steps

There is a lot we did not go over in this post, such as generating an index page, file watching, automatic re-running, and publishing*, but this post shows the basics of static site generation, and how the main logic can be captured in just a few dozen lines. (Admittedly, my production version is a bit more complex.)

By writing your own tool you miss out on the reusability of existing tools, but you gain full control over your blog build and reduce your reliance on third-party tools you don't control. For me, the tradeoff of effort for control was worth it. Perhaps it is for you too!

* My next post will go over those features and more, so stay tuned!

]]>
Interactive Debugging with Node.jshttps://sequoia.makes.software/interactive-debugging-with-nodejs/Interactive Debuggers are familiar to every Java developer, but they are much less well known in the JavaScript world. If you're a JavaScript developer who hasn't experienced the power of step-thru debugging, read this post to find out what you're missing!https://sequoia.makes.software/interactive-debugging-with-nodejs/Wed, 09 Nov 2016 05:00:00 GMTA "step through debugger" is a powerful tool that is very handy when your application isn't behaving the way you expect it to. A step through debugger (a.k.a. "interactive debugger" or just "debugger") allows you to pause code execution in your application in order to:

  • inspect or alter application state
  • see the code execution path ("call stack") that led to the currently executing line of code, and
  • inspect the application state at earlier points on that path

Interactive Debuggers* are familiar to every Java developer (among others), but they are much less well known in the JavaScript world. This is unfortunate, both because debuggers can be so helpful in diagnosing logic issues, and because the debugging tools in JavaScript today are the best & easiest to use they've ever been! This post will introduce the Node.js debugging tools in VS Code in a way that's accessible to programmers who have never used a debugger before.

Why Use a Debugger?

Debuggers are useful when writing your own, original code, but they really show their value when you're working with an unfamiliar codebase. Being able to step through the code execution line by line and function by function can save hours of poring over source code, trying to step through it in your head.

The ability to change the value of a variable at runtime allows you to play through different scenarios without hardcoding values or restarting your application. Conditional breakpoints let you halt execution upon encountering an error to figure out how you got there. Even if using a debugger isn't part of your everyday process, knowing how they work adds a powerful tool to your toolbox!

Setting Up

We'll be using VS Code, which has built-in Node.js debugging capabilities. In order to run code in the context of the VS Code debugger, we must first make VS Code aware of our Node.js project. For this demo I'll create a simple express app with express-generator; you can follow along by running the commands below:

$ npm install express-generator -g
$ express test-app   # create an application
$ cd test-app        
$ npm install
$ code .             # start VS Code

With VS Code open, we need to open the Debug pane by clicking the bug icon in the left sidebar menu. With the debug pane open, you may note that the gear icon at the top of the pane has a red dot over it. This is because there are currently no "launch configurations": configuration objects that tell VS Code how to run your application. Click the gear icon, select "Node.js," and VS Code will generate a boilerplate launch configuration for you.

There are two ways to attach the VS Code debugger to your application:

  1. Set up a launch configuration and launch your app from within VS Code
  2. Start your app from the console with node --debug-brk your-app.js and run the "Attach" launch configuration

The first approach normally requires some setup (which is beyond the scope of this post but you can read about here), but because our package.json has a run script, VS Code automagically created a launch configuration based on that script. That means we should be able to simply click the green "run" button to start our application in the debugger.

debug "launch" button with mouse pointer over it

If all went well, you should see a new toolbar at the top of your screen with pause and play buttons (among others). The Debug Console at the bottom of the screen should tell you what command it ran & what output that command yielded, something like this:

node --debug-brk=18764 --nolazy bin/www 
Debugger listening on port 18764

When we load http://localhost:3000 in a browser, we can see the Express log messages in this same Debug Console:

GET / 304 327.916 ms - -
GET /stylesheets/style.css 304 1.313 ms - -

Now that we have the code running with the VS Code debugger attached, let's set some breakpoints and start stepping through our code!

Breakpoints

A breakpoint is a marker on a line of code that tells the debugger "pause execution here." To set a breakpoint in VS Code, click the gutter just to the left of the line number. I've opened routes/index.js in order to set a breakpoint in the root request listener:

setting breakpoint on line 6 of routes/index.js

Note that the breakpoints pane at the bottom left has a listing for this new breakpoint (along with entries for exceptions, which we'll talk about momentarily). Now, when I hit http://localhost:3000 in a browser again, VS Code will pause at this breakpoint & allow me to examine what's going on at that point:

VS Code paused on breakpoint

With the code paused here, we can examine variables and their values in the variables pane and see how we got to this point in the code in the call stack pane. You may also have noticed that the browser has not loaded the page-- that's because it's still waiting for our server to respond! We'll take a look at each of the sidebar panes in turn, but for now, I'll press the play button to allow code execution to continue.

VS Code play button press

Now the server should send the finished page to the browser. Note that with code execution resumed, the "play" button is no longer enabled.

Other types of breakpoints

In addition to breaking on a certain line each time it's executed, you can add dynamic breakpoints that pause execution only in certain circumstances. Here are a few of the more useful ones:

  1. Conditional Breakpoint: After setting a breakpoint, right click on it & select "edit breakpoint," this will allow you to enter an expression to conditionally activate a breakpoint. For example, if you wanted to activate a breakpoint only if the user is an admin, you might add user.role === "admin" to your conditional breakpoint.
  2. Uncaught Exception: This is enabled by default. With this enabled, you don't have to set any breakpoints in order to locate errors, the debugger will pause on any (uncaught) exceptions.
  3. All Exceptions: If you have robust error handling in your application, but you still want to see where errors are coming from before they're caught and handled, enable this setting. Be warned, however, that many libraries throw and catch errors internally in the normal course of their execution, so this can be pretty noisy.

Variable pane

In this pane, you can examine and change variables in the running application. Let's edit our homepage route in routes/index.js to make the title a variable:

/* GET home page. */
var ourTitle = 'Express';
router.get('/', function(req, res, next) {
  res.render('index', { title: ourTitle });
});

After editing our code, we'll need to restart the debugger so it picks up the new code. We can do this by clicking the green circle/arrow button in the top toolbar. After editing a file with a breakpoint already set and restarting the debugger (as we just did), you'll also want to check that your breakpoints are still in the right spot. VS Code does a pretty good job of keeping the breakpoint on the line you expect but it's not perfect.

With our breakpoint on what's now line 7 and with the debugger restarted, let's refresh our browser. The debugger should stop on line seven. We don't see ourTitle in the variable pane right away, because it's not "Local" to that function, but expand the "Closure" section just below the "Local" section and there it is!

variable pane with the closure section expanded showing variable named "ourTitle" with value "Express"

Double-clicking ourTitle in the Variables Pane allows us to edit it. This is a great way to tinker with your application and see what happens if you switch a flag from true to false, change a user's role, or do something else-- all without having to alter the actual application code or restart your application!

The variable pane is also a great way to poke around and see what's available in objects created by libraries or other code. For example, under "Local" we can see the req object, see that its type is IncomingMessage, and by expanding it we can see the originalUrl, headers, and various other properties and methods.

Stepping

Sometimes, rather than just pausing the application, examining or altering a value, and setting it running again, you want to see what's happening in your code line by line: what function is calling which, and how that's changing the application state. This is where the "Debug Actions" menu comes in: it's the bar at the top of the screen with the playback buttons. We've used the continue (green arrow) and restart (green circle arrow) buttons so far, and you can hover over the others to see the names and associated keyboard shortcuts for each. The buttons are, from left to right:

  • Continue/Pause: Resume execution (when paused) or pause execution.
  • Step over: Execute the current line and move to the next line. Use this button to step through a file line by line.
  • Step in: When paused on a function call, you can use this button to step into that function. This can get a bit confusing if there are multiple function calls on one line, so just play around with it.
  • Step out: Run the current function to its return statement & step out to the line of code that invoked that function.
  • Restart: Stop your debugging session (kill your application) and start it again from the beginning. Use this after altering code.
  • Stop: Kill your application.

Watch Expressions

While stepping through your code, there may be certain values whose current state you always want to keep an eye on. A "watch expression" will run (in the current scope!) at each paused/stopped position in your code & display the return value of that expression. Hover over the Watch Expression pane and click the plus to add an expression. I want to see the user agent header of each request as well as ourTitle, whether the response object has had headers sent, and the value of 1 + 1, just for good measure, so I'll add the following watch expressions:

req.headers['user-agent']
ourTitle
res._headerSent
1 + 1

When I refresh the browser the debugger pauses once again at the breakpoint on line 7 and we can see the result of each expression:

watch expressions with values

Call Stack

The Call Stack Pane shows us the function calls that got us to the current position in the code when execution is paused, and allows us to step back up that stack and examine the application state in earlier "frames." By clicking the frame below the current frame you can jump to the code that called the current function. In our case, the current frame is labeled (anonymous function) in index.js [7], and the one before that is the handle function in layer.js, which is a component of the Express framework:

call stack with "handle" frame selected

Note that the request handling function is unnamed, hence "(anonymous function)." "Anonymous function?!" What's that? Who knows! Moral: always name your functions!

Stepping down into the Express framework is not something I do every day, but when you absolutely need to understand how you got to where you are, the Call Stack Pane is very useful!

One especially interesting use of the Call Stack Pane is to examine variables at earlier points in your code's execution. By clicking up through the stack, you can see what variables those earlier functions had in their scope, as well as see the state of any global variables at that point in execution.

All This and More...

There are many more features of the interactive debugger than I went over here, but this is enough to get you started. If you want to learn more, take a look at the excellent documentation from Microsoft on the VS Code Debugger and using it with Node.js. Oh, and I should probably mention that all the debugging features outlined here (and more) are built-in to Firefox as well as Chrome, should you wish to use them on browser-based code. Happy Debugging!

* There's no specific term I've found for this common collection of application debugging tools so I'm using the term "interactive debugging" in this article.

]]>
The Node.js Debug Module: Advanced Usagehttps://sequoia.makes.software/the-nodejs-debug-module-advanced-usage/So you're familiar with the `debug` node module. Let's take a look at some more advanced uses and useful tricks!https://sequoia.makes.software/the-nodejs-debug-module-advanced-usage/Wed, 12 Oct 2016 04:00:00 GMTIn a previous post, I mentioned having used the debug module to help me understand some complex interactions between events in Leaflet & Leaflet.Editable. Before we go over that, however, let's lay the groundwork with a couple of organizational tips that make debug easier to use. This post assumes you have either used debug or read the previous post, so please do one of those first!

Namespacing Debug Functions

The debug module has a great namespacing feature which allows you to enable or disable debug functions in groups. It is very simple-- namespaces are separated by colons:

debug('app:meta')('config loaded')
debug('app:database')('querying db...');
debug('app:database')('got results!', results);

Enable debug functions in Node by passing the name to the process via the DEBUG environment variable. The following would enable the database debug function but not meta:

$ DEBUG='app:database' node app.js

To enable both, list both names, comma separated:

$ DEBUG='app:database,app:meta' node app.js

Alternately, use a "splat" (*) to enable any debugger in that namespace. The following enables any debug function whose name starts with app::

$ DEBUG='app:*' node app.js

You can get as granular as you want with debug namespaces...

debug('myapp:thirdparty:identica:auth')('success!');
debug('myapp:thirdparty:twitter:auth')('success!');

...but don't overdo it. Personally, I try not to go deeper than two or sometimes three levels.

More Namespace Tricks

The "splat" character * can match a namespace at any level when enabling a debug function. Given the two debug functions above, you can enable both thus:

$ DEBUG='myapp:thirdparty:*:auth' node app.js

The * here will match identica, twitter, or any other string.

It's frequently useful to enable all debug functions in a namespace with the exception of one or two. Let's assume we have separate debug functions for each HTTP status code that our app responds with (a weird use of debug, but why not!):

const OK = debug('HTTP:200');
const MOVED = debug('HTTP:301');
const FOUND = debug('HTTP:302');
const UNAUTHORIZED = debug('HTTP:403');
const NOTFOUND = debug('HTTP:404');
// etc.

We can turn them all on with HTTP:*, but it turns out that 200 comes up way too frequently so we want it turned off. The - prefix operator can be used to explicitly disable a single debugger. Here, we'll enable all debuggers in this namespace then disable just HTTP:200:

$ DEBUG='HTTP:*,-HTTP:200' node app.js
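Under the hood, this enable/skip behavior amounts to simple pattern matching. Here's a rough sketch (not the actual debug source, just an illustration of the idea): comma-separated patterns, `*` treated as a wildcard, and a `-` prefix marking namespaces to skip.

```javascript
// Sketch of how a DEBUG string could be interpreted (illustrative only):
// enable patterns go in `names`, `-`-prefixed patterns go in `skips`,
// and a name is enabled if it matches some name and no skip.
function createEnabledCheck(spec){
  const names = []; // patterns to enable
  const skips = []; // patterns to disable
  for(const part of spec.split(',')){
    if(!part){ continue; }
    const pattern = part.replace(/\*/g, '.*?');
    if(part[0] === '-'){
      skips.push(new RegExp('^' + pattern.slice(1) + '$'));
    }else{
      names.push(new RegExp('^' + pattern + '$'));
    }
  }
  return name =>
    !skips.some(re => re.test(name)) &&
    names.some(re => re.test(name));
}

const enabled = createEnabledCheck('HTTP:*,-HTTP:200');
enabled('HTTP:404'); // true: matched by HTTP:*
enabled('HTTP:200'); // false: explicitly skipped
```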

Externalizing Debug Functions

debug() is a factory function: calling it returns another function, which can be called to actually write to the console (more specifically, to STDERR in Node.js):

debug('abc');        // creates function, doesn't write anything 
debug('foo')('bar'); // writes `foo: bar` (assuming that debugger is enabled)

If we want to reuse this debugger, we can assign the function to a variable:

var fooLogger = debug('foo');

fooLogger('bar');                    // writes `foo: bar`
fooLogger('opening pod bay door...') // writes `foo: opening pod bay door...`

While it's easy to create one-off debug functions as needed as in the first example, it's important to remember that the debug module does not write anything unless that particular debugger is enabled. If your fellow developer does not know you created a debugger with the name foo, she cannot know to turn it on! Furthermore, she may create a debugger with the name foo as well, not knowing you're already using that name. For these reasons (read: discoverability), it's useful to group all such debug logging functions in one file, and export them from there:

// lib/debuggers.js
const debug = require('debug');

const init = debug('app:init');
const menu = debug('app:menu');
const db = debug('app:database');
const http = debug('app:http');

module.exports = {
  init, menu, db, http
};

NB: using ES2015 object property shorthand above

This way we can discover all available debuggers and reuse debuggers across files. For example, if we access the database in customer.js & we wish to log the query, we can import that debugger & use it there:

// models/customer.js
const debugDB = require('../lib/debuggers').db;
// ...

debugDB(`looking up user by ID: ${userid}`);
db.Customer.findById(userid)
  .tap(result => debugDB('customer lookup result', result))
  .then(processCustomer)
//.then(...)

NB: using the Bluebird promises library's tap above.

We can later use the same debugger in another file, perhaps with other debuggers as well:

// config.js
const debugDB = require('../lib/debuggers').db;
const debugInit = require('../lib/debuggers').init;
// ...

debugInit('configuring application...');

if(process.env.NODE_ENV !== 'DEV'){
  debugInit('env not DEV, loading configs from DB');
  debugDB('reading site config from database');
  db.Config.find()
    .tap(debugDB)
    .then(config => configureApp(config));
}else{
  debugInit('local environment: reading config from file');
  // ...
}

Then when we're confused why the app fails on startup on our local machine, we can enable app:init (or app:*) and see the following in our console...

app:init env not DEV, loading configs from DB +1ms

...and quickly discover that a missing environment variable is what's causing our issue.

Debugging All (known) Events on an Event Emitter

Background

My goal was to run my newFeatureAdded function whenever a user created a new "feature" on the map. (This example is browser-based, but the approach works just as well with Node.js EventEmitters.)

When I started, I attached my newFeatureAdded function to editable:created:

map.on('editable:created', function(e){
  newFeatureAdded(e.layer);
});

But it wasn't firing when I expected, so I added a debug function call to see what was going on:

map.on('editable:created', function(e){
  eventDebug('editable:created', e.layer);
  newFeatureAdded(e.layer);
});

This revealed that the event was fired when the user clicked "create new feature", not when they placed the feature on the map. I fixed the issue, but I found myself adding debug function calls all over the place, with almost every event handler function:

map.on('editable:drawing:commit', function(e){
  eventDebug('FIRED: editable:drawing:commit');
  handleDrawingCommit(e);
});

map.on('click', function(e){
  eventDebug('FIRED: click');
  disableAllEdits();
});

map.on('editable:vertex:clicked', function(e){
  eventDebug('FIRED: editable:vertex:clicked');
  handleVertexClick(e);
});

This is starting to look redundant, and doubly bad as it's forcing us to wrap our handler calls in extra anonymous functions rather than delegating to them directly, i.e. map.on('click', disableAllEdits). Furthermore, not knowing the event system well, I wanted to discover other events that fire at times that might be useful to me.

Another Approach...

In order to build my UI, I needed to understand the interactions between Leaflet's 35 events and Leaflet.Editable's 18 events, which overlap, trigger one another, and have somewhat ambiguous names (layeradd, dragend, editable:drawing:dragend, editable:drawing:end, editable:drawing:commit, editable:created etc.).

We could pore over the docs and source code to find the exact event we need for each eventuality... or we could attach debug loggers to all events and see what we see!

The approach is as follows:

  1. Create an array of all known events
  2. Create a debug function for each event
  3. Attach that function to the target event emitter using .on
// 1. Create list of events
const leafletEditableEvents = [
  'editable:created',
  'editable:enable',
  'editable:drawing:start',
  'editable:drawing:end',
  'editable:vertex:contextmenu',
// ...
];

const leafletEvents = [
  'click',
  'dblclick',
  'mousedown',
  'dragend',
  'layeradd',
  'layerremove',
// ...
];

Because we want to be able to use our event debugging tool on any event emitter, we'll make a function that takes the target object and events array as arguments:

function debugEvents(target, events){
  events
    // 2. Create debug function for each
    // (but keep the function name as well! we'll need it below)
    // return both as { name, debugger }
    .map(eventName => { return { name: eventName, debugger: debug(eventName) }; })
    // 3. Attach that function to the target
    .map(event => target.on(event.name, event.debugger));
}

debugEvents(mapObject, leafletEditableEvents);
debugEvents(mapObject, leafletEvents);

Assuming we set localStorage.debug='*' in our browser console, we will now see a debug statement in the console when any of the Leaflet.Editable events fire on the map object!

debugger output

Note that whatever data is passed to an event handler attached with .on() is passed to our debug functions. In this case it's the event object created by Leaflet, shown above in the console as ▶ Object.

mousemove etc. are not in any namespace above, and it's best to always namespace debug functions so they don't collide, to add context, and to allow enabling/disabling by namespace. Let's improve our debugEvents function to use a namespace:

function debugEvents(target, events, namespace){
  events
    .map(eventName => { return {
      name: eventName,
      debugger: debug(`${namespace}:${eventName}`)
    } } )
    .map(event => target.on(event.name, event.debugger));
}

//editable events already prefixed with "editable", so they're "event:editable:..."
debugEvents(mapObject, leafletEditableEvents, 'event');
//map events not prefixed, so we'll add "map": they're "event:map:..."
debugEvents(mapObject, leafletEvents, 'event:map');

We can enable all event debuggers in our console, or just editable events, or just core map events, thus:

> localStorage.debug = 'event:*'
> localStorage.debug = 'event:editable:*'
> localStorage.debug = 'event:map:*'

Conveniently, the Leaflet.Editable events are all already "namespaced" & colon separated, just like our debug namespaces!

> localStorage.debug = 'event:editable:*' //enable all editable
> localStorage.debug = 'event:editable:drawing:*'  //just editable:drawing events

Fine-Tuning the Output

Let's enable all event debuggers and see what some interactions look like...

gif of debugger output with rapidly flowing debug statements during user interaction with map. Lots and lots of "event:map:mousemove" events.

Looks nice, but the mousemove events are coming so fast they push everything else out of the console, i.e. they are noise. Some trial and error taught me that drag events are equally noisy and that I don't need to know the core map events most of the time, just the editable events.

With this info we can tune our logging down to just what we need, enabling only editable: events & ignoring all drag & mousemove events:

> localStorage.debug = 'event:editable:*,-event:*:drag,-event:*:mousemove'

gif of debugger output with a smaller number of events. Console screen does not overflow

Looks good!

Conclusion

While debug is a very small & easy-to-get-started-with module, it can be tuned in very granular ways and is a powerful development tool. By attaching debug statements to all events, outside of our application code, we can trace the path of an event system & better understand how events interact, without adding any debug statements to our application code itself. If you've found another novel use of this library or have any questions about my post, let me know. Happy logging!

NB: I use the term "debugger function" and "debug logging" rather than "debugger" and "debugging" in this post advisedly. A "debugger" typically refers to a tool that can be used to pause execution & alter the code at runtime, for example the VSCode debugger. What we're doing here is "logging."

]]>
Let's Code It: The `debug` Modulehttps://sequoia.makes.software/lets-code-it-the-debug-module/What if, instead of commenting out or deleting our useful log statements when we're not using them, we could turn them on when we need them and off when we don't? The `debug` module lets us do that-- but how does it work? Let's find out!https://sequoia.makes.software/lets-code-it-the-debug-module/Thu, 15 Sep 2016 04:00:00 GMTI did some fun stuff with the debug module recently for a web map project. I needed to understand the somewhat complex interactions between events in Leaflet.js in order to figure out what events to attach to... but that's the next post. Before I get to that, I want to go over the debug module itself.

A trip down memory lane...

console.log: the JavaScript programmer's oldest friend*. console.log was probably one of the first things you learned to use to debug JavaScript, and while there are plenty of more powerful tools, console.log is still useful to say "event fired", "sending the following query to the database...", etc.

So we write statements like console.log(`click fired on ${event.target}`). But then we're not working on that part of the application anymore and those log statements just make noise, so we delete them. But then we are working on that bit again later, so we put them back-- and this time when we're finished, we just comment them out, instead of moving them. Before we know it our code looks like this:

fs.readFile(usersJson, 'utf-8', function (err, contents){
  // console.log('reading', usersJson);
  if(err){ throw err; }
  var users = JSON.parse(contents);
  // console.log('User ids & names :');
  // console.log(users.map(user => [user.id, user.name]));
  users.forEach(function(user){
    db.accounts.findOne({id: user.id}, function(err, address){
      if(err){ throw err; }
      var filename = 'address' + address.id + '.json';
      // console.log(JSON.parse('address'));
      // console.log(`writing address file: ${filename}`)
      fs.writeFile(filename, 'utf-8', address, function(err){
        if(err){ throw err; }
        // console.log(filename + ' written successfully!');
      });
    });
  });
});

"There's got to be a better way!"

What if, instead of commenting out or deleting our useful log statements when we're not using them, we could turn them on when we need them and off when we don't? This is a pretty simple fix:

function log(...items){   //console.log can take multiple arguments!
  if(typeof DEBUG !== 'undefined' && DEBUG === true){
    console.log(...items)
  }
}

NB: Using ES6 features rest parameters and spread syntax in this function

Now we can replace our console.log() statements with log(), and by setting DEBUG=true or DEBUG=false in our code, we can turn logging on or off as needed! Hooray! Well, actually, there are still a couple problems...

Problem 1: Hardcoding

In our current system, DEBUG must be hardcoded, which is bad because

  1. it can't be enabled or disabled without editing the codebase
  2. it can accidentally be checked into our code repository enabled

We can fix that by setting DEBUG to true or false somewhere outside our script, and reading it in. In node it would make sense to use an environment variable:

const DEBUG = process.env.DEBUG; // read from environment

function log(...items){
// ...

Now we can export DEBUG=true on our dev machine to turn it on all the time. Alternately, we can turn it on for just one process by setting the environment variable when we launch it (shell command below):

$ DEBUG=true node my-cool-script.js

If we want to use our debugger in the browser, we don't have process.env, but we do have localStorage:

var localEnv; //where do we read DEBUG from?

if(typeof process !== 'undefined' && process.env){              //node
  localEnv = process.env;
}else if(typeof window !== 'undefined' && window.localStorage){ //browser
  localEnv = window.localStorage;
}

const DEBUG = localEnv.DEBUG;

function log(...items){
  // ...

Now we can set DEBUG in localStorage using our browser console...

> window.localStorage.DEBUG = true;

...reload the page, and debugging is enabled! Set window.localStorage.DEBUG to false & reload and it's disabled again.

Problem 2: All or Nothing

With our current setup, we can only choose "all log statements on" or "all log statements off." This is OK, but if we have a big application with distinct parts and we're having a database problem, it would be nice to turn on just the database-related debug statements, but not others. If we only have one debugger and one debug on/off switch (DEBUG), this isn't possible, so we need:

  1. Multiple debug functions
  2. Multiple on/off switches

Let's tackle the second problem first. Instead of a boolean, let's make debug an array of keys, each representing a debugger we want turned on:

DEBUG = ['database'];        // just enable database debugger
DEBUG = ['database', 'http'];// enable database & http debuggers
DEBUG = undefined;           // don't enable any debuggers

We can't set arrays as environment variables, but we can set DEBUG to a string...

$ DEBUG=database,http node my-cool-script.js

...and it's easy to build an array from a string...

// process.env.DEBUG = 'database,http'
DEBUG = localEnv.DEBUG.split(',');

DEBUG; // => ['database', 'http']

Now we have an array of keys for debuggers we want enabled. The simplest way to allow us to enable just http or just database debugging would be to add an argument to the log function, specifying which "key" each debug statement should be associated with:

function log(key, ...items){
  if(typeof DEBUG !== 'undefined' && DEBUG.includes(key)){ 
    console.log(...items)
  }
}

log('database','results received');             // using database key
log('http','route not found', request.url);     // using http key

NB: Array.prototype.includes only exists in newer environments.

Now we can enable and disable http and database debug logging separately! Passing a key each time is a bit tedious however, so let's revisit the proposed solution above, "Multiple debug functions." To create a logHttp function, we basically need a pass-through that takes a message and adds the http "key" before sending it to log:

function logHttp(...items){
  log('http', ...items);
}

logHttp('foo'); // --> log('http', 'foo');

Using higher-order functions (in this case a function that returns a function), we can make a "factory" to produce debugger functions bound to a certain key:

function makeLogger(fixedKey){
  return function(...items){
    log(fixedKey, ...items)
  }
}

Now we can easily create new "namespaced" log functions and call them separately:

const http = makeLogger('http');
const dbDebug = makeLogger('database');

dbDebug('connection established');     // runs if "database" is enabled
dbDebug('results received');           // runs if "database" is enabled

http(`Request took ${requestTime}ms`); // runs if "http" is enabled 

That's it!

That gets us just about all the way to the debug module! It has a couple more features than what we created here, but this covers the main bits. I use the debug module in basically all projects & typically start using it from day 1: if you never put console.log statements in your code you have nothing to "clean up," and those debug log statements you make during active development can be useful later on, so why not keep them?
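Putting the pieces from this post together, a minimal sketch of what we built (not the real debug module, which does more) might look like:

```javascript
// Minimal sketch assembling the pieces from this post: read DEBUG from
// the environment, split it into keys, and hand out namespaced log
// functions from a factory.
const DEBUG = (process.env.DEBUG || '').split(',').filter(Boolean);

function makeLogger(fixedKey){
  return function(...items){
    if(DEBUG.includes(fixedKey)){
      console.log(fixedKey + ':', ...items);
    }
  };
}

const dbDebug = makeLogger('database');
dbDebug('connection established'); // prints only when DEBUG includes "database"
```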

Next steps: go check out the debug module. In the next post I'll go over some advanced usage. Thanks for reading!

*second oldest ;)

]]>
Type Hinting in JavaScipthttps://sequoia.makes.software/type-hinting-in-javascipt/What type is this object, what properties does it have, what arguments does this function take... there's a lot I don't miss about writing Java full-time, but boy do I miss this! Can we get these type hints in JavaScript? Let's find out!https://sequoia.makes.software/type-hinting-in-javascipt/Tue, 28 Jun 2016 04:00:00 GMTFor small projects, the amount of overhead that goes into documenting every function parameter, return value, and variable can be overkill. If your program fits in one or two files, you can just pull up that other file & check whether that function returns a string or an array. When, however, your application starts to span dozens or hundreds of files, or the number of developers working on it begins to climb, this system can quickly lead to a huge mess. When you get to this point, it's very helpful to offload some of this "checking that function signature" to your IDE or text editor.

example of tooltips and type hinting in JavaScript using VisualStudio Code

On projects of any size, code hinting reduces typos, makes coding easier, and obviates the need to check a module's documentation every few minutes. Programmers who use strongly typed languages like Java and IDEs like Eclipse take this sort of automated code-assistance for granted. But what about programmers who use JavaScript?

JavaScript is weakly typed, so when you declare var animals;, there's no way to know whether animals will be an array, a string, a function, or something else. If your IDE or editor doesn't know that animals will eventually be an array, there's no way for it to helpfully tell you that animals has the property length and the method map, among others. There's no way for the IDE to know it's an array... unless you tell it!

In this post we'll look at a couple ways to clue your IDE in to the types of the variables, function parameters, and return values in your program so it can clue you in on how they should be used. We'll go over two ways to "tell" your IDE (and other developers) what types things are, and see how to load type information for third party libraries as well. Before we start writing type annotations, however, let's make sure we have a tool that can read them.

Setting up The Environment

The first thing we'll need is a code editor that recognizes & supports the concept of "types" in JavaScript. You can either use a JavaScript oriented IDE such as Webstorm or VisualStudio Code, or if you already have a text-editor you like, you can search the web to find out if it has a type hinting plugin that supports JavaScript. There's one for Sublime and Atom, among others.

If the goal is getting type hinting in JavaScript (and it is here), I use & recommend Visual Studio Code, for the following reasons:

  • It has code-hinting for JavaScript built in, no plugins needed
  • It's from Microsoft, which has ample experience creating IDEs
  • Microsoft is also the creator of Typescript so it has excellent support for Typescript definitions, one of the tools we'll use herein
  • It's Open Source
  • It's free!

With VS Code installed, let's create a new project and get started!

Built-in Types

I've used npm init to start a new JavaScript project. At this point, we already get quite a bit from our IDE, which has JavaScript APIs (Math, String, etc.) and browser APIs (DOM, Console, XMLHttpRequest etc.) built in.

Here's some of what we get out of the box:

demo of type hinting for String, Math, Console, and Document

Nice! But we're more interested in Node.js annotations and sadly, VS Code does not ship with those. Type declarations for Node.js core APIs do exist, however, in the form of Typescript declaration files. We just need a way to add them to our workspace so VS Code can find them. Enter Typings.

Typings

Typings is a "Typescript Definition Manager", which means it helps us install the Typescript Definitions (or "Declarations") we need for our IDE to know what the JavaScript APIs we're working with look like. We'll look more at the format of Typescript Declarations later; for now we'll stay focused on our goal of getting our IDE to recognize Node.js core APIs.

Install typings thus:

$ npm install --global typings

With typings installed on our system, we can add those Node.js core API type definitions to our project. From the project root:

$ typings install dt~node --global --save

Let's break that command down:

  1. install the node package...
  2. ...from dt~, the DefinitelyTyped repository, which hosts a huge collection of typescript definitions
  3. we add the --global switch because we want access to definitions for process and modules from throughout our project
  4. Finally, the --save switch causes typings to save this type definition as a project dependency in typings.json, which we can check into our repo so others can install these same types. (typings.json is to typings install what package.json is to npm install.)

Now we have a new typings/ directory containing the newly downloaded definitions, as well as our typings.json file.
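For reference, the file typings writes is small. Assuming the install command above, typings.json will look roughly like this (the exact registry path and version string will differ):

```json
{
  "globalDependencies": {
    "node": "registry:dt/node#6.0.0+20160831021119"
  }
}
```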

One More Step...

We now have these type definitions in our project, and VS Code loads all type definitions in your project automatically. However, it identifies the root of a JavaScript project by the presence of a jsconfig.json file, and we don't have one yet. VS Code can usually guess if your project is JavaScript based, and when it does it will display a little green lightbulb in the status bar, prompting you to create just such a jsconfig.json file. Click that button, save the file, start writing some Node and...

demo of looking up core node.js api properties & methods using external node.js typescript definition file

It works! We now get "Intellisense" code hints for all Node.js core APIs. Our project won't just be using Node core APIs, though; we'll be pulling in some utility libraries, starting with lodash. typings search lodash reveals that there's a lodash definition from the npm source as well as global and dt. We want the npm version since we'll be consuming lodash as a module included with require('lodash') and it will not be globally available.

$ typings install --save npm~lodash
[email protected]
└── (No dependencies)

$ npm install --save lodash
[email protected] /Users/sequoia/projects/typehinting-demo
└── [email protected]

Now we can require lodash and get coding:

example demonstrating property and method lookup of a application dependency (lodash)

So far we've seen how to install and consume types for Node and third party libraries, but we're going to want these annotations for our own code as well. We can achieve this by using JSDoc comments, writing our own Typescript Declaration files, or a combination of both.

JSDoc Annotations

JSDoc is a tool that allows us to describe the parameters and return types of functions in JavaScript, as well as variables and constants. The main advantages of using JSDoc comments are:

  1. They're lightweight & easy to get started with (just add comments to your JS)
  2. The comments are human-readable, so the comments are useful even if you're reading the code on github or in a simple text editor
  3. The syntax is very similar to Javadoc and for the most part fairly intuitive.

There are many annotations JSDoc supports, but you can get a long way just by learning a few, namely @param and @return. Let's annotate this simple function, which checks whether one string contains another string:

function contains(input, search){
  return RegExp(search).test(input);
}

contains('Everybody loves types. It is known.', 'known'); // => true

With a function like this, it's easy to forget the order of arguments or their types. Annotations to the rescue!

/**
 * Checks whether one string contains another string
 * 
 * @param {string} input   - the string to test against
 * @param {string} search  - the string to search for
 * 
 * @return {boolean}
 */
function contains(input, search){
  return RegExp(search).test(input);
}

While writing this, we realized that this function actually works with regular expressions as the search parameter as well as strings. Let's update that line to make clear that both types are supported:

/** 
 * ...
 * @param {string|RegExp} search  - the string or pattern to search for
 * ...
 */

We can even add examples & links to documentation to help the next programmer out:

/**
 * Checks whether one string contains another string
 * 
 * @example 
 * ```
 * contains("hello world", "world"); // true
 * ```
 * @example
 * ```
 * const exp = /l{2}/;
 * contains("hello world", exp);  // true
 * ```
 * @see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
 * 
 * @param {string} input          - the string to test against
 * @param {string|RegExp} search  - the string or pattern to search for
 * 
 * @return {boolean}
 */

...and away we go!

example of hinting function parameters & types based on JSDoc comments

JSDoc works great and we've only scratched the surface of what it can do, but for more complex tasks or cases where you're documenting a data structure that exists e.g. in a configuration file, typescript declaration files are often the better choice.

Typescript Declarations

A typescript declaration file uses the extension .d.ts and describes the shape of an API, but does not contain the actual API implementation. In this way, they are very similar to the Java or PHP concept of an Interface. If we were writing Typescript, we would declare the types of our function parameters and so on right in our code, but JavaScript's lack of types makes this impossible. The solution: declare the types of a JavaScript library in a Typescript (definition) file that can be installed alongside the JavaScript library. This is the reason we installed the lodash type definitions separately from lodash.

Setting up external type definitions for an API you plan to publish and registering them on the typings repository is a more involved task than we'll cover today, but you can read up about it here. For now, let's consider the case of a complex configuration file.

Imagine we have an application that creates a map and allows users to add features to that map. We'll be deploying these editable maps to different client sites, so we want to be able to configure, on a per-site basis, the types of features users can add and the coordinates to center the map on.

Our config.json looks like this:

{
  "siteName": "Strongloop",
  "introText":  {
    "title": "<h1> Yo </h1>",
    "body": "<strong>Welcome to StrongLoop!</strong>"
  },
  "mapbox": {
    "styleUrl": "mapbox://styles/test/ciolxdklf80000atmd1raqh0rs",
    "accessToken": "pk.10Ijoic2slkdklKLSDKJ083246ImEiOi9823426In0.pWHSxiy24bkSm1V2z-SAkA"
  },
  "coords": [73.153,142.621],
  "types": [
    {
      "name": "walk",
      "type": "path",
      "lineColor": "#F900FC",
      "icon": "test-icon-32.png"
    },
    {
      "name": "live",
      "type": "point",
      "question": "Where do you live?",
      "icon": "placeLive.png"
    }
    ...

We don't want to have to go read over this complex JSON file each time we want to find the name of a key or remember the type of a property. Furthermore, it's not possible to document this structure in the file itself because JSON does not allow comments.* Let's create a Typescript Declaration file called config.d.ts to describe this config object, and put it in a directory in our project called types/.

declare namespace Demo{
  export interface MapConfig {
      /** Used as key to ID map in db  */
      siteName: string;
      mapbox: {
        /** @see https://www.mapbox.com/studio/ to create style */
        styleUrl: string;
        /** @see https://www.mapbox.com/mapbox.js/api/v2.4.0/api-access-tokens/ */
        accessToken: string;
      };
      /** @see https://www.mapbox.com/mapbox.js/api/v2.4.0/l-latlng/ */
      coords: Array<number>;
      types : Array<MapConfigFeature>;
  }

 interface MapConfigFeature {
    type : 'path' | 'point' | 'polygon';
    /** hex color */
    lineColor?: string;
    name : string;
    /** Name of icon.png file */
    icon: string;
  }  
}

You can read more in the Typescript docs about what all is going on here, but in short, this file:

  1. Declares the Demo namespace, so we don't collide with some other MapConfig interface
  2. Declares two interfaces, essentially schemas describing the structure and purpose of our JSON
  3. Defines the types property of the first interface as an array whose members are MapConfigFeatures
  4. Exports MapConfig so we can reference it from outside the file.

VS Code will load the file automatically because it's in our project, and we'll use the @type annotation to mark our conf object as a MapConfig when we load it from disk:

/** @type {Demo.MapConfig} */
const conf = require('./config.json');

Now we can access properties of the configuration object & get the same code-completion, type info, and documentation hints! Note how in the following gif, VS Code identifies not only that conf.types is an array, but when we call .filter on it, knows that each element in the array is a MapConfigFeature type object:

example demonstrating looking up properties on an object based on a local Typescript Definition
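The behavior in that gif is easy to try yourself. Here's a minimal sketch you can paste into a file alongside the declaration above; the inline object and its values stand in for the real config file, and are entirely made up:

```javascript
/** @type {Demo.MapConfig} */
const conf = {
  siteName: 'demo-site',
  mapbox: { styleUrl: 'mapbox://styles/demo', accessToken: 'pk.xxxx' },
  coords: [44.47, -73.21],
  types: [
    { type: 'point', name: 'Cafe', icon: 'cafe.png' },
    { type: 'path', name: 'Trail', icon: 'trail.png', lineColor: '#00ff00' }
  ]
};

// Inside the .filter callback, VS Code infers that `feature` is a
// MapConfigFeature, so feature.type autocompletes to 'path' | 'point' | 'polygon':
const points = conf.types.filter(feature => feature.type === 'point');
console.log(points.map(p => p.name)); // ['Cafe']
```

Note that the `Demo` namespace exists only at design time; the JSDoc comment is ignored at runtime, so this runs as plain JavaScript.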

I have been very much enjoying the benefits of JSDoc, Typescript Declarations, and the typings repository in my work. Hopefully this article will help you get up and running with type hinting in JavaScript. If you have any questions or corrections, or if this post was useful to you, please let me know!

* There is in fact a way to document the properties of json files, I hope to write about it in the future!

]]>
Techniques to Avoid Live Coding, Part 1https://sequoia.makes.software/techniques-to-avoid-live-coding-part-1/The demo-gods are unmerciful and typing is hard with hundreds of people looking on. Learn to write code without writing code and your presentations will go off without a hitch!https://sequoia.makes.software/techniques-to-avoid-live-coding-part-1/Mon, 23 May 2016 04:00:00 GMTYou’re on stage. You’re about to demo your Cool New Thing. Just gotta add a few… lines… of code… and [drumroll] ERROR. Oops, heh… here’s what we did wrong. Just fix this and… [drumroll][drumroll] hmm. [drumroll] Looks like the wifi isn’t working…

"Well trust me, it works."

Live coding! The worst! We’ve all seen the previous scenario play out in a presentation, perhaps you’ve even been a victim. But what can a presenter do? The code demos have multiple steps, there are terminal interactions and the tool relies on an internet connection. How can we fit all that into our slides?? Fear not! I’ll go over some tips to keep your presentation running smoothly without sacrificing a calf to the Demo Gods beforehand. In this post, we’ll look at coding demos: demos where you’re writing code for attendees to see on a projector.

It’s not uncommon, when demoing a programming practice or tool, to do a “build-it-up” type demo, where you start with something simple & add functionality through the presentation. This is a great pattern for demoing frameworks & libraries, but many steps means many opportunities for mistakes. I lean on two main approaches to handling these issues.

Solution 1: Tagged Steps

The first solution is the simplest: tagged steps in a git repo. By coding beforehand & tagging each step, you have the flexibility to live code if you want (you can always jump to the next step if you get off track), or to skip coding entirely and just check out step after step.

This approach requires some preparation! While you can retrofit this approach onto an existing repo, it’s much easier if you plan things out ahead of time. Here’s my approach:

  1. Create the outline of the steps you’re going to go over in code examples
  2. As you create your example app, commit at each step from your outline
  3. Tag each commit with a step name or number (don’t do this ’til you’re done!!)
  4. Optionally: link to the tags in your slides
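The tagging workflow above is just plain git. Here's a self-contained sketch that builds a tiny demo repo with one tag per step (the step names are illustrative, not from the talk):

```shell
# demo: build a throwaway repo with one tagged commit per step
set -e
cd "$(mktemp -d)"
git init -q
for step in step-1-setup step-2-tables step-3-search; do
  git -c user.name=demo -c user.email=demo@example.com \
      commit -q --allow-empty -m "$step"
  git tag "$step"
done

# on stage, if live coding goes sideways, jump straight to a prepared step:
git checkout -q step-2-tables
git describe --tags   # -> step-2-tables
```

In a real talk repo each commit would contain that step's code rather than being empty, and `git push --tags` makes the steps browsable on GitHub afterwards.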

When you’re done, you’ll have something like this:

List of named tags on a repository on github. Tags have names like "test," "tables," "search," "progress-bars" etc.

Now you can still code in your demos, but if something goes awry, you have a safety net: git checkout the next step & you’re back to working code and ready to move on! Bonus: attendees can now peruse your examples at their leisure after the talk.

Example: in this talk, each of the titles in the “steps” slides are links to a tag on github.

Solution 2: Code "Fade-ins” (and Highlighting)

Depending on the nature of your presentation, you may not have the ability or the time to actually step through the code and look at it in an editor. For fast-paced talks, it can be useful to have everything in your slides, code included.

Putting code in slides can go very wrong, however: if you dump a big block of code on the screen, attendees don’t know where to look and it can be hard to tell what’s important. This, for example, is a mess:

Slide about building progress bars. There are many many lines of code visible.

Where am I supposed to look? Which part of this is important? I can’t read all that!!!

My solution to this is two-fold:

  1. Highlight the important bits
  2. Fade lines of code in to match the order of your explanation

By fading code in line by line or in blocks, it simulates “writing” the code, and it presents lines of code at a rate people can actually process. Highlighting simply calls attention to the important bits, so people know where to look. In the example above, which relates to adding progress bars to a file-download feature, the steps I was going over were:

  1. Make request & pipe it to disk
  2. Add a response listener
  3. Set up the progress bar
  4. Update it when data arrives
  5. Finally, output the filename in cyan

With transitions in place, I was able to go thru each of the steps above one by one, “building up” the example in bite-sized pieces:

Slide explaining how to create progress bars with portions of code fading in bit by bit

NB: This gif cycles thru the steps much more quickly than you would on stage.

This is one of my favorite approaches to live-coding (fake it! :p), but it can be difficult to set up. In the example above, there is a separate text-box for each of the portions to fade in, painstakingly positioned to appear as one big file. Another challenge was getting highlighted code into Google Slides at all: when you copy code from most editors or IDEs, they don’t bring syntax highlighting along. I found that PHPStorm did allow you to copy code with syntax highlighting, so I opened files there any time I needed to copy with highlighting. Yes, all this was time consuming. 🙂

This approach is also possible in tools like Reveal.js, but not using markdown. In order to achieve line-by-line fade-ins with Reveal.js, you’d need to first convert the code examples to HTML, then manually add the fragment class to elements you wish to fade in. I haven’t tried it, but I believe it would work; if you have done this, please let me know!

That’s all for today! In part II of this post, we’ll look at how to demo command-line tools and interactions on stage… without opening a terminal. Stay tuned!

]]>
Migrating a Legacy System to a Modern API Frameworkhttps://sequoia.makes.software/migrating-a-legacy-system-to-a-modern-api-framework/LoopBack makes it easy to develop greenfield APIs, but in real business environments, data isn’t always tidy & cooperative. In this case study we'll see what it takes to migrate a complex legacy application to LoopBack.https://sequoia.makes.software/migrating-a-legacy-system-to-a-modern-api-framework/Thu, 03 Mar 2016 05:00:00 GMTWe know LoopBack makes it a breeze to create APIs that expose SQL & NoSQL databases (among others), but in real business environments, data isn’t always this tidy. As business systems develop over years, it’s not uncommon for information to be scattered across databases, flat files, or even third party servers outside of your control. Given a complex situation like this, is it still possible to build an API using LoopBack? Yes it is, and that’s just what we’ll do now!

Background

Al’s Appliances is a retail chain that specializes in ACME products. Al’s has a website where customers can order products, but getting replacement parts for products is more complicated.

To wit:

  1. Customer calls Sales Rep with product name.
  2. Sales Rep looks up product number in the products database.
  3. Sales Rep uses product number to find the proper parts list, supplied by ACME.
  4. Customer asks about one or more parts.
  5. Sales Rep goes to ACME’s wholesaler portal and looks up price and availability of the parts, one by one.

Yikes! Al’s IT team would like to build an interface to support the following workflow:

  1. Customer looks up product on website.
  2. Customer clicks to see list of associated parts with up-to-date price and availability info.
  3. Alternately, Customer can enter a part number and get up-to-date info for that part.

Their team can build the web UI, but it’s up to us to tie the data together & expose it via an API.

Lay of the Land

We have the following assets to work with:

  1. Products Database: a SQL database with a Product table listing name, description, and ID of each product.
  2. Parts lists (CSVs): ACME delivers CSV files, named by product number & containing a list of part names & SKUs. For example:

    //mvwave_0332.csv//
    door handle,8c218
    rotator base,f74af
    rotator axel,15b4c
    ...,...
    

These CSVs are the “single source of truth” for parts info and they’re sometimes updated or replaced. Business processes for other departments rely on them, so unfortunately we must do the same (we must use the CSVs; moving the data to a database is not an option).

  3. Parts API: We’re in luck: ACME exposes a rudimentary API to access part information, so we don’t have to scrape the website! Unfortunately, it’s very simple and only exposes one endpoint to look up a single part at a time:

    //GET api.acme.com/parts/f74af
    {
      "name": "rotator base",
      "sku": "f74af",
      "qty_avail": 0,
      "price": "2.32"
    }
    

IT has requested an API that exposes the following endpoints:

  1. /v1/products → Array of products
  2. /v1/products/{id} → Object representing a single product
  3. /v1/products/{id}/parts → Array of parts for a product
  4. /v1/parts/{sku} → Object representing a single part

Why LoopBack?

Given this nonstandard, somewhat complicated data architecture, why not build a 100% custom solution instead of using LoopBack, which best shows its strengths with more structured data? Using LoopBack here will require us to go “off the beaten path” a bit, but in return we get…

  • easy extension of our application with any of the components in the LoopBack ecosystem
  • API discovery and exploration tools
  • powerful configuration management
  • loose coupling to the current data sources

That last point is important, as it will allow us to eventually replace the directory of CSVs with a database table, once the business is ready for this, without major rewrites. Plugging into the LoopBack ecosystem gives us access to ready solutions for auth, data transformation, logging, push notifications, throttling, etc. when our requirements grow or change. Broadly speaking, we’ll be building a highly extensible, highly maintainable application that can serve as a foundation for future projects, and this makes LoopBack a good choice.

Setting Up

To get started we’ll install the StrongLoop tools

$ npm install -g strongloop

and scaffold a new LoopBack application in a new directory.

$ slc loopback als-api

Now we can switch to the new als-api directory and generate our models. We’ll keep them server-only for now; we can easily change that later.

$ cd als-api
$ slc loopback:model
? Enter the model name: Product
? Select the data-source to attach Product to: db (memory)
? Select model's base class PersistedModel
? Expose Product via the REST API? Yes
? Custom plural form (used to build REST URL): n
? Common model or server only? server

Let’s add some Product properties now.

? Property name: name
invoke loopback:property
? Property type: string
? Required? Yes

...etc...

NB: You can see a detailed example of this process here.

Once we finish this process, we have models for Product, Part, and PartsList, with corresponding js and json files in server/models/. The PartsList is a join model that connects a Product to its Parts. That model requires some custom code, so we’ll save that bit for last and start by wiring the Product and Part models to their datasources.

Product

Our generated server/models/product.json:

{
  "name": "Product",
  "properties": {
    "name": {
      "type": "string",
      "required": true
    },
    "description": {
      "type": "string",
      "required": true
    },
    "id": {
      "type": "string",
      "required": true
    }
  },
. . .
}

The products are in a SQL database (SQLite for our example). There are three steps to connecting the model to its data:

  1. Install the appropriate connector. LoopBack has many data connectors but only the “in memory” database is bundled. The list of StrongLoop supported connectors doesn’t include SQLite, but the list of community connectors indicates that we should install “loopback-connector-sqlite”:

    $ npm install --save loopback-connector-sqlite
    
  2. Create a datasource using that connector. To create a sqlite datasource called “products,” we add the following to server/datasources.json:

    "products": {
      "name": "products",
      "connector": "sqlite",
      "file_name": "./localdbdata/
      local_database.sqlite3",
      "debug": true
    }
    

    In our local setup, our sqlite database resides in ./localdbdata/; we can later add another configuration for the production environment.

  3. Connect the model to the datasource. /server/model-config.json manages this:

    "Product": {
      "dataSource": "products",
      "public": true
    },
    

    There is an additional step for this particular connector, specifying which field is the primary key. We do this by adding "id": true to a property in /server/models/product.json:

    . . .
    "properties": {
      . . .
      "id": {
        "type": "string",
        "id": true,
        "required": true
      }
    },
    . . .
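Step 2 mentioned adding another configuration for the production environment later. LoopBack supports environment-specific config files, so a production override could look something like the sketch below (the filename follows LoopBack's `datasources.<env>.json` convention; the path is illustrative):

```json
//server/datasources.production.json:
{
  "products": {
    "name": "products",
    "connector": "sqlite",
    "file_name": "/vol/prod_data/products.sqlite3",
    "debug": false
  }
}
```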
    

Before we start our server to see if this works, let’s update the server configuration to expose the API on /v1/ rather than the default path (/api/) in server/config.json:

. . .
  "restApiRoot": "/v1",
  "host": "0.0.0.0",
. . .

The API will now be served from /v1/ per IT’s specifications. Now we can start our server…

$ npm start

and start querying products from http://localhost:3000/

//GET /v1/products
[
  {
    "name": "Microwelle Deluxe",
    "description": "The very best microwave money can buy",
    "id": null
  },
  {
    "name": "Microwelle Budget",
    "description": "The most OK microwave money can buy",
    "id": null
  },
. . .
]

Uh-oh! Our ids are strings, but with idInjection on, LoopBack injects its own numeric id property. Let’s disable that in server/models/product.json:

. . .
  "idInjection": false,
. . .

Now let’s try again:

//GET /v1/products
[
  {
    "name": "Microwelle Deluxe",
    "description": "The very best microwave money can buy",
    "id": "microwelle_010"
  },
  {
    "name": "Microwelle Budget",
    "description": "The most OK microwave money can buy",
    "id": "microwelle_022"
  },
. . .
]

//GET /v1/products/microwelle_010
{
  "name": "Microwelle Deluxe",
  "description": "The very best microwave money can buy",
  "id": "microwelle_010"
}

That’s better! Our Products are now being served so Endpoints 1 (/v1/products) and 2 (/v1/products/{id}) are working. Now let’s configure our Parts datasource and set up Endpoint 4 (/v1/parts/{sku}).

Part

Our generated server/models/part.json:

{
  "name": "Part",
  "properties": {
    "sku": {
      "type": "string",
      "required": true
    },
    "qty_avail": {
      "type": "number",
      "required": true
    },
    "price": {
      "type": "number",
      "required": true
    },
    "name": {
      "type": "string",
      "required": true
    }
  }
. . .
}

We’ll need to follow the same three steps to connect the Part model to its datasource, a remote server this time.

  1. Install connector:

    $ npm install --save loopback-connector-rest
    
  2. Create Datasource: Because there’s no universal standard for what parameters REST endpoints take, how they take them (query, post data, or part of URL), or what sort of data they return, we must configure each method manually for a REST datasource.

    //server/datasources.json:
    . . .
      "partsServer": {
        "name": "partsServer",
        "connector": "rest"
        "operations": [{
          "template": {
            "method": "GET",
            "url": "http://api.acme.com/parts/{sku}",
            "headers": {
              "accepts": "application/json",
              "contenttype": "application/json"
            }
          },
          "functions": {
            "findById": ["sku"]
          }
        }]
      }
    . . .
    

    This will create a method called findById on any model attached to this datasource. That method takes one parameter (sku) that will be plugged into the url template. Everything else here is default.

    We named the “operation” findById to conform to LoopBack convention. Because it has this name, LoopBack will know to expose the method on /v1/parts/{id}.

  3. Connect the model to the datasource. /server/model-config.json:

    . . .
      "Part": {
        "dataSource": "partsServer",
        "public": true
      },
    . . .
    

Let’s restart the server and try it out:

//GET /v1/parts/f74af
{
  "name": "rotator base",
  "sku": "f74af",
  "qty_avail": 0,
  "price": "2.11"
}

Endpoint 4 (/v1/parts/{sku}) is now working! It’s just a passthrough to the ACME API right now, but this has advantages: we can set up logging, caching, etc., we don’t have to worry about CORS, and if ACME makes a breaking API change, we can fix it in one place in our server code and clients are none the wiser.

With the easy parts out of the way, it’s time to tackle our CSVs…

PartsList

Although the parts list CSVs contain part names, we’re relying on the remote server for those, so the CSVs are being used as simple many-to-many join tables. Many-to-many tables don’t generally need their own model, so why are we creating one in this case? There are two reasons:

  1. Rather than a normal join table filled with product_id, sku pairs, we have a bunch of files named like {product_id}.csv that contain lists of skus. This will require custom join logic, and,
  2. We want to encapsulate this logic in one place so the Product and Part models are not polluted with CSV and file-reading concerns.

If we stop using CSVs in the future we can delete this model and update the relationship configurations on Product, and that model can continue working without changes.
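To make the "custom join logic" concrete: each {product_id}.csv is really a stand-in for a set of conventional join-table rows. Here's a hypothetical helper (not part of the app, names invented for illustration) showing that mapping:

```javascript
// Hypothetical illustration only: turn one "{product_id}.csv" file's
// contents into the { productId, sku } pairs a normal join table would hold.
function csvToJoinRows(productId, csvText) {
  return csvText
    .trim()
    .split('\n')
    .map(function (line) {
      var cols = line.split(',');                  // ['part name', 'sku']
      return { productId: productId, sku: cols[1] };
    });
}

var rows = csvToJoinRows('mvwave_0332',
  'door handle,8c218\nrotator base,f74af\nrotator axel,15b4c');
// rows[0] -> { productId: 'mvwave_0332', sku: '8c218' }
```

The PartsList model we build below does essentially this, plus the file reading and Part lookups.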

We’re going to use a hasManyThrough relationship to tie Products to their Parts, and because we’re not concerned with the part name in the PartsList, our partslist.json does not specify any properties:

{
  "name": "PartsList",
  "base": "PersistedModel",
  "properties": {
  },
. . .
}

We’re not exposing PartsLists directly via the API, just using them for Endpoint 3 (/v1/products/{id}/parts), so we’ll just set it up to support this relationship. The first step here is to add the relationship from Product to Part, which we can do using the relationship generator:

$ slc loopback:relation
? Select the model to create the relationship from: Product
? Relation type: has many
? Choose a model to create a relationship with: Part
? Enter the property name for the relation: parts
? Optionally enter a custom foreign key:
? Require a through model? Yes
? Choose a through model: PartsList

Now when we hit /v1/products/thing_123/parts, LoopBack will attempt to figure out what Parts are related to our Product by calling find on the join model, more or less like this:

PartsList.find(
  {
    where: { productId: 'thing_123' },
    include: 'part',
    collect: 'part'
  },
  {},
  function callback(err, res){ /*...*/ }
);

How will we make this work? We’ll definitely need to read CSVs from the filesystem, so let’s get that configuration out of the way.

Configuration

Our PartsList CSVs exist in /vol/NAS_2/shared/parts_lists but of course we don’t wish to hardcode this path in our model. Instead, we’ll put it into a local config file where it can easily be overridden in other environments:

//server/config.local.json:
{
  "partsListFilePath": "/vol/NAS_2/shared/parts_lists"
}

Overriding PartsList.find

We know that when querying related models, LoopBack will call find on the “through” model (aka join model), so we’ll override PartsList.find and make it:

  1. read thing_123.csv
  2. get the skus
  3. call Part.findById on each sku
  4. pass an array of Parts to the callback

We’ll need to override the method in server/models/partslist.js. To override a data access method like this, we listen for the attached event to fire, then overwrite the method on the model. We’ll be using two node modules to help: async, to wait for multiple async calls (the calls to the ACME API) to finish and then call our done callback with the results, and csv-parse, to parse our CSVs:

//server/models/partslist.js:
var fs = require('fs');
var path = require('path');
var async = require('async');        //npm install!
var csvParse = require('csv-parse'); //npm install!

module.exports = function(PartsList) {
  PartsList.on('attached', function(app){

    PartsList.find = function(){
      //variable arguments: filter is always first, callback always last
      var filter = arguments[0];
      var done = arguments[arguments.length - 1];

      //0. build the filename
      var filename = filter.where.productId + '.csv';
      var csvPath = path.join(app.get('partsListFilePath'), filename);

      //1. read the csv
      fs.readFile(csvPath, 'utf-8', function getParts(err, res){
        if(err) return done(err);

        //parse the csv contents
        csvParse(res, function(err, partlist){
          if(err) return done(err);

          //2. get the skus from ['part name', 'sku'] tuples
          var skus = partlist.map(function getSku(partTuple){
            return partTuple[1];
          });

          //3. call Part.findById on each sku
          async.map(skus, app.models.Part.findById, function (err, parts){
            if(err) return done(err);

            //4. pass an array of Parts to the callback
            done(null, parts);
          });
        });
      });
    };
  });
};

This could certainly be broken up into named functions for easier reading, but it works, and for our purposes that’s good enough! One issue, however, is that the repeated calls to Part.findById are a “code smell”: we have Part logic (get all Parts by list of skus) in the PartsList model. It would be much better to pass our array of skus to a Part method and let it handle the details. Let’s change step (3) above so it looks like this:

//3. pass our list of SKUs and `done` callback to Part.getAll
app.models.Part.getAll(skus, done);

//4. pass an array of Parts to the callback
//   ^-- this happens in Part.getAll

Now we add this new method to Part:

//server/models/part.js:
var async = require('async');

module.exports = function(Part) {
  Part.getAll = function(skus, cb) {
    async.map(skus, Part.findById, function (err, parts){
      if(err) return cb(err);
      cb(null, parts);
    });
  };
};

Now our Parts logic is nicely encapsulated in the Part model & the logic in our PartsList model is a bit simpler. Let’s give our last API endpoint a try:

//GET /v1/Products/mvwave_0332/parts
[
  {
    "name": "door handle",
    "sku": "8c218",
    "qty_avail": 0,
    "price": "1.22"
  },
  {
    "name": "rotator base",
    "sku": "f74af",
    "qty_avail": 0,
    "price": "8.35"
  },
  {
    "name": "rotator axel",
    "sku": "15b4c",
    "qty_avail": 0,
    "price": "2.32"
  }
]

It works!

Next Steps

We managed to tie together a motley collection of data sources, represent them with LoopBack models, and expose them on an API built to IT’s specifications. That’s a good stopping point for now. Obvious next steps would be to disable unused methods (this API is read-only, after all), build a client to interact with our API, and to set up auth if needed. By using LoopBack to build our API, we’ve positioned ourselves to be able to complete these tasks easily. We can now answer my initial question with greater confidence: yes, LoopBack can do it!

Want to see all this stuff actually work? Check out the demo app!

]]>
Higher Order Functions in ES6: Easy as a => b => c;https://sequoia.makes.software/higher-order-functions-in-es6-easy-as-a-b-c/New language features can make an expression that was cumbersome to write in ES5 easy in ES6, enabling and encouraging the use of this type of expression. We’re going to look at one such case here: how arrow functions make it easier to write higher-order functions in ES6.https://sequoia.makes.software/higher-order-functions-in-es6-easy-as-a-b-c/Mon, 11 Jan 2016 05:00:00 GMTES6 is nigh! As more and more libraries & Thought Leaders start incorporating ES6 into their code, what used to be nice-to-know ES6 features are becoming required knowledge. And it’s not just new syntax – in many cases, new language features can make an expression that was cumbersome to write in ES5 easy in ES6, enabling and encouraging the use of this type of expression. We’re going to look at one such case here: how arrow functions make it easier to write higher-order functions in ES6.

A higher order function is a function that does one or both of the following:

  1. takes one or more functions as arguments
  2. returns a function as its result.
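Both halves of that definition fit in a couple of lines (the names here are my own, for illustration):

```javascript
// (1) takes a function as an argument AND (2) returns a function:
const twice = f => x => f(f(x));

const addOne = x => x + 1;
const addTwo = twice(addOne); // pass a function in, get a function back
addTwo(5); // 7
```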

The purpose of this post is not to convince you to adopt this new style right away, although I certainly encourage you to give it a try! The purpose is to familiarize you with this style, so that when you run into it in someone’s ES6-based library, you won’t sit scratching your head wondering what you’re looking at as I did the first time I saw it. If you need a refresher in arrow syntax, check out this post first.

Hopefully you’re familiar with arrow functions that return a value:

const square = x => x * x;

square(9) === 81; // true

But what’s going on in the code below?

const has = p => o => o.hasOwnProperty(p);
const sortBy = p => (a, b) => a[p] > b[p];

What’s this “p returns o returns o.hasOwnProperty…”? How can we use has?

Understanding the syntax

To illustrate writing higher order functions with arrows, let’s look at a classic example: add. In ES5 that would look like this:

function add(x){
  return function(y){
    return y + x;
  };
}

var addTwo = add(2);
addTwo(3);          // => 5
add(10)(11);        // => 21

Our add function takes x and returns a function that takes y which returns y + x. How would we write this with arrow functions? We know that…

  1. an arrow function definition is an expression, and
  2. an arrow function implicitly returns the results of a single expression

…so all we must do is make the body of our arrow function another arrow function, thus:

const add = x => y => y + x;
// outer function: x => [inner function, uses x]
// inner function: y => y + x;

Now we can create inner functions with a value bound to x:

const add2 = add(2);// returns [inner function] where x = 2
add2(4);            // returns 6: exec inner with y = 4, x = 2
add(8)(7);          // 15

Our add function isn’t terribly useful, but it should illustrate how an outer function can take an argument (x) and reference it in a function it returns.

Sorting our users

So you’re looking at an ES6 library on github and encounter code that looks like this:

const has = p => o => o.hasOwnProperty(p);
const sortBy = p => (a, b) => a[p] > b[p];

let result;
let users = [
  { name: 'Qian', age: 27, pets : ['Bao'], title : 'Consultant' },
  { name: 'Zeynep', age: 19, pets : ['Civelek', 'Muazzam'] },
  { name: 'Yael', age: 52, title : 'VP of Engineering'}
];

result = users
  .filter(has('pets'))
  .sort(sortBy('age'));

What’s going on here? We’re calling the Array prototype’s sort and filter methods, each of which take a single function argument, but instead of writing function expressions and passing them to filter and sort, we’re calling functions that return functions, and passing those to filter and sort.

Let’s take a look, with the expression that returns a function underlined in each case.

Without higher order functions

result = users
  .filter(x => x.hasOwnProperty('pets')) //pass Function to filter
  .sort((a, b) => a.age > b.age);        //pass Function to sort

With higher order functions

result = users
  .filter(has('pets'))  //pass Function to filter
  .sort(sortBy('age')); //pass Function to sort

In each case, filter is passed a function that checks if an object has a property called “pets.”

Why is this useful?

This is useful for a few reasons:

  • It reduces repetitive code
  • It allows for easier reuse of code
  • It increases clarity of code meaning

Imagine we want only users with pets and with titles. We could add another function in:

result = users
  .filter(x => x.hasOwnProperty('pets'))
  .filter(x => x.hasOwnProperty('title'))
  ...

The repetition here is just clutter: it doesn’t add clarity, it’s just more to read and write. Compare with the same code using our has function:

result = users
  .filter(has('pets'))
  .filter(has('title'))
  ...

This is shorter and easier to write, and that makes for fewer typos. I consider this code to have greater clarity as well, as it’s easy to understand its purpose at a glance.

As for reuse, if you have to filter to pet owners or people with job titles in many places, you can create functions to do this and reuse them as needed:

const hasPets = has('pets');
const isEmployed = has('title');
const byAge = sortBy('age');

let workers = users.filter(isEmployed);
let petOwningWorkers = workers.filter(hasPets);
let workersByAge = workers.sort(byAge);

We can use some of our functions for single values as well, not just for filtering arrays:

let user = {name: 'Assata', age: 68, title: 'VP of Operations'};
if(isEmployed(user)){   // true
  //do employee action
}
hasPets(user);          // false
has('age')(user);       //true

A Step Further

Let’s make a function that will produce a filter function that checks that an object has a key with a certain value. Our has function checked for a key, but to check value as well our filter function will need to know two things (key and value), not just one. Let’s take a look at one approach:

//[p]roperty, [v]alue, [o]bject:
const is = p => v => o => o.hasOwnProperty(p) && o[p] == v;

// broken down:
// outer:  p => [inner1 function, uses p]
// inner1: v => [inner2 function, uses p and v]
// inner2: o => o.hasOwnProperty(p) && o[p] == v;

So our new function called “is” does three things:

  1. Takes a property name and returns a function that…
  2. Takes a value and returns a function that…
  3. Takes an object and tests whether the object has the property specified with the value specified, finally returning a boolean.

Here is an example of using this is to filter our users:

const titleIs = is('title');
// titleIs == v => o => o.hasOwnProperty('title') && o['title'] == v;

const isContractor = titleIs('Contractor');
// isContractor == o => o.hasOwnProperty('title') && o['title'] == 'Contractor';

let contractors = users.filter(isContractor);
let developers  = users.filter(titleIs('Developer'));

let user = {name: 'Viola', age: 50, title: 'Actress', pets: ['Zak']};
isEmployed(user);   // true
isContractor(user); // false

A note on style

Scan this function, and note the time it takes you to figure out what’s going on:

const i = x => y => z => h(x)(y) && y[x] == z;

Now take a look at this same function, written slightly differently:

const is = prop => val => obj => has(prop)(obj) && obj[prop] == val;

There is a tendency when writing one line functions to be as terse as possible, at the expense of readability. Fight this urge! Short, meaningless names make for cute-looking, hard to understand functions. Do yourself and your fellow coders a favor and spend the extra few characters for meaningful variable and function names.

One more thing. . .

What if you want to sort by age in descending order rather than ascending? Or find out who’s not an employee? Do we have to write new utility functions sortByDesc and notHas? No we do not! We can wrap our functions, which return Booleans, with a function that inverts that boolean, true to false and vice versa:

//take args, pass them thru to function x, invert the result of x
const invert = x => (...args) => !x(...args);
const noPets = invert(hasPets);

let petlessUsersOldestFirst = users
  .filter(noPets)
  .sort(invert(sortBy('age')));

Conclusion

Functional programming has been gaining momentum throughout the programming world and ES6 arrow functions make it easier to use this style in JavaScript. If you haven’t encountered FP style code in JavaScript yet, it’s likely you will in the coming months. This means that even if you don’t love the style, it’s important to understand the basics of this style, some of which we’ve gone over here. Hopefully the concepts outlined in this post have helped prepare you for when you see this code in the wild, and maybe even inspired you to give this style a try!

Comments

I think you did a very good (difficult to do a great) job, particularly with your examples. Your post motivates me to dive deeper into FP.

- Brent Enright,

Thanks Brent!! Feedback like this makes my day!

Thanks. This gave me a bit to think about and mull over. It's not something that I have spent a lot of time on, but it's pretty important as I try to better grok the functional programming thing.

- Brian,

Nice!! FP is fun, I hope you keep it up!

type IPropertyKey = string | number | symbol;
const h = Object.prototype.hasOwnProperty;
export const hasOwnProperty = (obj, property: IPropertyKey): boolean => h.call(obj, property);

Another reason to use Object.prototype.hasOwnProperty.call() https://github.com/jquery/jquery/issues/4665

More and more people are using const a = Object.create(null); a.something = 'abc'. There is no prototype... so hasOwnProperty is undefined there. But you can use Object.prototype.hasOwnProperty.call(a, 'something');

- Darcy,

That is a good point and one I had not considered. Your comment should serve as ample warning to those considering copy/pasting from this post pell-mell, methinks. Thank you for pointing this out! Here's the JavaScript version for readers not versed in TypeScript:

const hasOwnProperty = (obj, propname) =>
  Object.prototype.hasOwnProperty.call(obj, propname);
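To make Darcy's warning concrete, here's a quick sketch of the failure mode (the cache object and helper name below are mine, for illustration):

```javascript
// A null-prototype object inherits nothing — not even .hasOwnProperty
const cache = Object.create(null);
cache['hasOwnProperty'] = 'some user-supplied value';

// cache.hasOwnProperty('x') would throw: it's a string, not a function.
// Borrowing the method from Object.prototype works on any object:
const hasProp = (obj, propname) =>
  Object.prototype.hasOwnProperty.call(obj, propname);

hasProp(cache, 'hasOwnProperty'); // true — the key we just set
hasProp(cache, 'constructor');    // false — nothing inherited
```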

Darcy: why are "more and more people" using a = Object.create(null)? Is there some perceived performance benefit? Strikes me as a bit... well it would certainly be better if it weren't necessary.

Follow-up from Darcy:

To answer your question about why more and more people are using Object.create(null):

Object.create() is useful for some types of prototype composition. (ES6 class provides nicer syntax for many types of composition... but there is still use for Object.create() for special cases. mixins being one example.) One use case for Object.create(null) is for when you don't want OOTB methods from Object.prototype on your object. But I wouldn't do so without thinking about the consequences first. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/create has some discussion on it.

A popular use case is when the object is being used as a hash map where keys are strings (and ES6 Map is not available). This is what JQuery was doing (it was used as a cache). By having no prototype, the cache is safe for keys like 'constructor', and other values on prototype chain. In some use cases the object may store user generated keys and someone could create a key like 'hasOwnProperty' that blocks the 'hasOwnProperty' method on prototype chain. In this case, if cache.hasOwnProperty() is called, it would throw an error. Of course it really depends on the use cases of the hash map. If you know there won't be any key collisions, it does not matter.

Performance is arguably better too because with a null prototype, there is one less prototype in the chain to resolve a key's value. And it could help prevent the need for even using hasOwnProperty in some cases. But I don't think performance is really the motivating reason for Object.create(null). JS engines are pretty fast (relative to other work) at resolving a value by looking up a key in prototype chain.

Thank you for the follow-up, Darcy! As I suspected, Object.create(null) is being used for "something weird" (to wit: trying to create an object that behaves like Map where Map is not available). Regarding performance, I will quibble slightly here and insist that unless one has actually tested and measured the performance impact, this is not worth doing "for performance" (I know you're not strictly suggesting it, but still).

  1. Some things we think improve performance actually make no difference
  2. The JS JITC & engine are good at optimizing code (so we don't need to do so much manually, and manual tweaks may even get in the way of its optimizations) and
  3. The performance benefit must be weighed against the cost, such as, in this case, breaking normal object behavior (to wit: .hasOwnProperty is not a function).

Thank you for the suggestion and the follow-up!!

Building to Github Pages with Grunt
https://sequoia.makes.software/building-to-github-pages-with-grunt/
Wherein I document my Grunt+github-pages+bookmarklet build which required templating, switching branches, rebasing, committing, and all sorts of craziness.
Mon, 02 Dec 2013 00:00:00 GMT

A while ago I created a bookmarklet to anonymize Facebook for screenshots, the Afonigizer. To distribute it, I chose to use Github Pages, Github's free hosting service. To automate the process of getting updates to my bookmarklet from a JavaScript file in my repository to a page on github.io, I used Grunt. Besides building and distributing bookmarklets, I am sure there are other reasons to build to github pages (or another branch on your repo), so I'm sharing my workflow here.

The following example assumes you are familiar with bookmarklets, Github Pages, and that you've used Grunt. We'll use a modified version of my bookmarklet build as an example.

So, we have our bookmarklet written, linted, minified, and committed to the master branch, and we're ready to publish it to the web. From a high level, this means checking out the gh-pages branch, rebasing onto master to get the latest javascript, interpolating the javascript file into the template, committing the new index.html file, and checking out master again to wrap up. Switching branches and rebasing are peculiar tasks for a build, but it can be done (even if it shouldn't be) and the following Gruntfile snippet explains how.

Prerequisites and setup

We need a minified javascript file:

(function(){var a="@",b="_",c="sequoia";alert(a+b+c);})();

And a template file to serve the bookmarklet:

<!-- filename: index.html.tpl -->
<html><body> 
    <a href='javascript:void(<%= marklet %>);'>Bookmarklet!</a>
</body></html>

The Gruntfile

abridged; see full version here

//we'll need `fs` to read the bookmarklet file
fs = require('fs');
module.exports = function(grunt) {

  // Project configuration.
  grunt.initConfig({

/* ... */

    gitcheckout: {
      //note that (non-"string") object keys cannot contain hyphens in javascript
      ghPages : { options : { branch : 'gh-pages' } },
      master : { options : { branch : 'master' } }
    },
    gitcommit: {
      bookmarkletUpdate : {
        //add <config:pkg.version> or something else here
        //for a more meaningful commit message
        options : { message : 'updating marklet' },
        files :  { src: ['index.html'] }
      }
    },
    gitrebase: {
      master : { options : { branch : 'master' } }
    },
    template : {
      'bookmarkletPage' : {
        options : {
          data : function(){
            return {
              //the only "data" are the contents of the javascript file
              marklet : fs.readFileSync('dist/afonigizer.min.js','ascii').trim()
            };
          }
        },
        files : {
          'index.html' : ['index.html.tpl']
        }
      }
    }
  });

/* ... */

  grunt.loadNpmTasks('grunt-git');
  grunt.loadNpmTasks('grunt-template');

  //git rebase will not work if there are uncommitted changes,
  //so we check for this before getting started
  grunt.registerTask('assertNoUncommittedChanges', function(){
    var done = this.async();

    grunt.util.spawn({
      cmd: "git",
      args: ["diff", "--quiet"]
    }, function (err, result, code) {
      if(code === 1){
        grunt.fail.fatal('There are uncommitted changes. Commit or stash before continuing\n');
      }
      if(code <= 1){ err = null; } //codes 0 & 1 are expected, not errors
      done(!err);
    });
  });


  //this task is a wrapper around the gitcommit task which
  //checks for updates before attempting to commit.
  //Without this check, an attempt to commit with no changes will fail
  //and exit the whole task.  I didn't feel this state (no changes) should
  //break the build process, so this wrapper task just warns & continues.
  grunt.registerTask('commitIfChanged', function(){
    var done = this.async();
    grunt.util.spawn({
      cmd: "git",
      args: ["diff", "--quiet", //just exits with 1 or 0 (change, no change)
        '--', grunt.config.data.gitcommit.bookmarkletUpdate.files.src]
    }, function (err, result, code) {
      //only attempt to commit if git diff picks something up
      if(code === 1){
        grunt.log.ok('committing new index.html...');
        grunt.task.run('gitcommit:bookmarkletUpdate');
      }else{
        grunt.log.warn('no changes to index.html detected...');
      }

      if(code <= 1){ err = null; } //code 0,1 => no error
      done(!err);
    });
  });

  grunt.registerTask('bookmarklet', 'build the bookmarklet on the gh-pages branch',
    [ 'assertNoUncommittedChanges',    //exit if working directory's not clean
      'gitcheckout:ghPages',           //checkout gh-pages branch
      'gitrebase:master',              //rebase for new changes
      'template:bookmarkletPage',      //(whatever your desired gh-pages update is)
      'commitIfChanged',               //commit if changed, otherwise warn & continue
      'gitcheckout:master'             //finish on the master branch
    ]
  );

/* ... */

};

That's it! 😊

Additional Notes

Grunt tasks used here were grunt-template and grunt-git (the latter of which I contributed the rebase task to, for the purpose of this build).

Why use rebase?: We're using rebase here instead of merge because it keeps all the gh-pages changes at the tip of the gh-pages branch, which makes the changes on that branch linear and easy to read. The drawback is that it requires --force every time you push your gh-pages branch, but it allows you to easily roll back your gh-pages changes (e.g. to the last version of your index.html.tpl), and this branch is never shared or merged back into master, so it seems a worthwhile trade.

Is it really a good idea to be switching branches, rebasing, etc. as part of an automated build? Probably not. :) But it's very useful in this case!

Please let me know if you found this post useful or if you have questions or feedback.

Fun With Toilet
https://sequoia.makes.software/fun-with-toilet/
Shell feeling shabby? Spice things up with Toilet, the ASCII generator!
Sat, 03 Nov 2012 00:00:00 GMT

Shell feeling shabby? Let Toilet spice it up! Toilet is a shell utility from caca labs that outputs text in large block type. It is modeled after figlet, adding color and Unicode support.
Note: some examples given here are non-POSIX. I use bash and have confirmed the examples work in zsh.

I installed toilet on my (Ubuntu) system using the following command. Use your package manager or get the source.

sudo apt-get install toilet toilet-fonts

Toilet comes with a number of fonts by default, installed to /usr/share/figlet on my system. The following command, run from the directory containing the fonts, will create a file that contains the name of each available font followed by an example. Note that while the name of the font file with the extension is used in the following command, the extension is not necessary.

for font in *; do
    echo "$font" && toilet Hello -f "$font";
done > ~/toilet_fonts.txt

Now you have a file with all the fonts, useful for reference.

$ head ~/toilet_fonts.txt
ascii12.tlf

 mm    mm            mmmm      mmmm
 ##    ##            ""##      ""##
 ##    ##   m####m     ##        ##       m####m
 ########  ##mmmm##    ##        ##      ##"  "##
 ##    ##  ##""""""    ##        ##      ##    ##
 ##    ##  "##mmmm#    ##mmm     ##mmm   "##mm##"
 ""    ""    """""      """"      """"     """"

Toilet also comes with options to further transform or decorate your text, called "filters." The following command will output the name of each filter followed by an example, as above. This command outputs to the terminal rather than a file because the filters that add color may not come thru in a saved file.

while read -r filt;
    do echo "$filt";
    toilet -f mono12 $USER -F "$filt";
done < <(toilet -F list | sed -n 's/\"\(.*\)\".*/\1/p')

I like border, flip, and left, but the best filter is of course "gay".

the word "diamonds" in block text with the "gay" filter applied

Mix and match fonts and filters to come up with a combination you like. Note that the filter switch can take a colon-separated list of filters, e.g.

toilet "07734" -F gay:180 -f smblock

Excepting the "metal" and "gay" filters, Toilet does not add colors to your text. This is as it should be, as there are already utilities to add color to text in the terminal. I know what you're thinking: "but those terminal escape sequences are a nightmare!" and I was thinking the same thing 'til the fine folks of #bash set me straight. To wit, there is a tool called tput which handles color more gracefully than escape sequences. I encourage you to check out the examples of using tput to color terminal text. If you just want to get started, use tput setaf x and tput setab x to color your foreground and background, respectively, substituting x with a number 0-9 for different colors. See man tput and man terminfo ("Color Handling" section) for more.

examples of using tput with toilet

So as for what you can actually do with toilet... that will be an exercise left to the reader. A friendly greeting in bashrc or a big red warning message are two uses that spring to mind. Drop me a line and let me know how you use it. Have fun!

LoLshield Sequencer
https://sequoia.makes.software/lolshield-sequencer/
How do you encode animations for a 9x14 LED matrix? By building a web-app, of course!
Thu, 19 Jul 2012 00:00:00 GMT

The goal of this project was to create a tool to make it easier to create "animations" on the LoL Shield, an Arduino shield with a bunch of LEDs. The existing tool that was offered to assist in mapping out the shield states was a Google spreadsheet which I could never figure out how to use. I wanted a tool that:

  • was point & click (& generally easy to use)
  • made it easy to visualize how a state would look on the actual board
  • helped one understand the way the board states are encoded (more on this below)
  • allowed one to sequence out and play thru an entire animation
  • allowed nontechnical, hardware-curious users to control the blinkenlights right away, for immediate gratification (came up with this later)

The impetus for the project was as follows: I wanted to do an Arduino project and thought it'd be cool to have an LED sign that said "diamonds" and did little diamonds animations so I could hang it around my neck on a chain like some post-modern jewelry. I ordered a LoL Shield and while I was waiting for it to come I took a look at the code and found this:
//Horizontal swipe
{1, 1, 1, 1, 1, 1, 1, 1, 1} ,
{3, 3, 3, 3, 3, 3, 3, 3, 3},
{7, 7, 7, 7, 7, 7, 7, 7, 7},
{15, 15, 15, 15, 15, 15, 15, 15, 15},
{31, 31, 31, 31, 31, 31, 31, 31, 31},
{63, 63, 63, 63, 63, 63, 63, 63, 63},
{127, 127, 127, 127, 127, 127, 127, 127, 127},
{255, 255, 255, 255, 255, 255, 255, 255, 255},
{511, 511, 511, 511, 511, 511, 511, 511, 511},
{1023, 1023, 1023, 1023, 1023, 1023, 1023, 1023, 1023},
{2047, 2047, 2047, 2047, 2047, 2047, 2047, 2047, 2047},
{4095, 4095, 4095, 4095, 4095, 4095, 4095, 4095, 4095},
{8191, 8191, 8191, 8191, 8191, 8191, 8191, 8191, 8191},
{16383, 16383, 16383, 16383, 16383, 16383, 16383, 16383, 16383},
{16382, 16382, 16382, 16382, 16382, 16382, 16382, 16382, 16382},
{16380, 16380, 16380, 16380, 16380, 16380, 16380, 16380, 16380},
{16376, 16376, 16376, 16376, 16376, 16376, 16376, 16376, 16376},
{16368, 16368, 16368, 16368, 16368, 16368, 16368, 16368, 16368},
{16352, 16352, 16352, 16352, 16352, 16352, 16352, 16352, 16352},
{16320, 16320, 16320, 16320, 16320, 16320, 16320, 16320, 16320},
{16256, 16256, 16256, 16256, 16256, 16256, 16256, 16256, 16256},
{16128, 16128, 16128, 16128, 16128, 16128, 16128, 16128, 16128},
{15872, 15872, 15872, 15872, 15872, 15872, 15872, 15872, 15872},
{15360, 15360, 15360, 15360, 15360, 15360, 15360, 15360, 15360},
{14336, 14336, 14336, 14336, 14336, 14336, 14336, 14336, 14336},
{12288, 12288, 12288, 12288, 12288, 12288, 12288, 12288, 12288},
{8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192},
{0, 0, 0, 0, 0, 0, 0, 0, 0}, 
{18000}

What was all this?! The numbers go up and down and somehow this turns the lights on and off. I had to learn more. I looked into it a bit and it turns out it's actually not that complicated: there are 9 rows of 14 lights, so each number represents one row, and each array of 9 numbers represents one shield state. Each row of lights on the shield is just a binary number with the least significant bit on the left! So to turn the first light on, flip the first bit (1); to turn the last light on, flip the 14th bit (8192); etc. (if the dec->bin conversion isn't clear, read here).

Sequencing out these animations manually, row by row, light by light, was obviously impractical, so I set out building a tool to make it easier. I also wanted an excuse to use bitwise operators, which I never have occasion to use at work. :p
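The row encoding can be sketched in a few lines of JavaScript (the helper names are mine, not part of the LoL Shield library):

```javascript
// Light n (counting from the left, starting at 0) contributes 2^n when lit:
// the least significant bit is the leftmost light.
const rowToNumber = lights =>
  lights.reduce((num, lit, i) => (lit ? num | (1 << i) : num), 0);

const numberToRow = (num, width = 14) =>
  Array.from({ length: width }, (_, i) => Boolean(num & (1 << i)));

rowToNumber([true]);                 // 1     — first light only
rowToNumber(Array(14).fill(true));   // 16383 — all fourteen lights on
numberToRow(8192).lastIndexOf(true); // 13    — 8192 is the last light alone
```

Running rowToNumber over each row as the swipe advances reproduces exactly the 1, 3, 7, 15... sequence in the sketch above.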

Please take a break now to look at the LoL Shield Sequencer, if you haven't already.

Well, I didn't get the diamonds animations done in time for HOPE, but I did get the sequencer working. I switched tack and decided to make the browser tool drive the physical shield directly, rather than requiring one to cut & paste into the sketch and load it onto the Arduino. This required transmitting the shield states to the Arduino via the USB port. I was following this tutorial, which showed how to read a file with Processing and transmit the data to the Arduino with the serial library. So I needed to write to the file, which obviously the browser can't do. My steps were now:

  1. Send shield state from browser (ajax-wise)
  2. Receive the state on server (same box) and write it to a file (I used PHP in the interest of expediency)
  3. Read the file with Processing and write the state to the serial port
  4. Receive the state on the Arduino and draw it to the shield.

That's a lot of steps! On a lark, I tried writing some text directly to the USB device (echo "1" > /dev/ttyUSB0) and what's this? It worked! It turns out the Linux kernel writes to the USB port at 9600 baud by default (I'm not sure exactly where this default is set but it is set by U-Boot; see man termios for info on changing it). So I can cut step 3 and write directly from PHP to the USB port. The initial PHP script, in its entirety:

<?php
  file_put_contents('/dev/ttyUSB0', $_POST['frame']);
?>

Much simpler! I let people mess around with it and they did! I was very happy.

Young man using the LSS in the browser with the Arduino shield updating in real time
A small group of people messing with the LSS

What heartwarming hacker-con moments! Later I added a localStorage component so people could make little animations & save them, then see what others had done. I'll have that up and running next con for sure.

Tools & Technologies I used

  • Arduino & soldering iron
  • Arduino IDE & C
  • underscore, jquery & require.js & Jam.js
  • Mousetrap.js (ok not yet but soon!)
  • PHP & Linux

What I learned

  • Serial programming is much harder than I expected! It turns out the software can read faster than the hardware can write so if you write your code wrong your reads will actually "overtake" the serial buffer on the Arduino.
  • Require.js seems useful but jam.js doesn't really offer much. The latter doesn't keep an up-to-date version of the former, which makes it much less useful (lost a lot of time figuring out why my shims didn't work)
  • Linux writes to serial port at 9600 baud by default
  • Learned to solder

Next steps

  • Add a few more functions: shift right, left, up, & down; invert (yay more bitwise math!)
  • Abstract the DOM interactions (writing to DOM, listening for events) to its own module so one could conceivably use the LSS module without the browser
  • Get Jimmie Rodgers to link to the tool instead of the Google doc!!!
  • Tests? ehh... we'll see :)