Getaround Engineering

JPEG and EXIF Data Manipulation in Javascript

Cédric Patchane — Mon, 11 Sep 2023 00:00:00 +0000

The Exchangeable Image File Format (EXIF) is a standard that specifies formats for images and sounds. It stores technical details through metadata, data that describes other data, such as the camera make and model and the date and time the image was taken.

Initially, EXIF was used for two image formats, JPEG and TIFF. But today, other file formats such as PNG, WEBP, or HEIC also support EXIF for metadata.

This article will focus on the JPEG format. In the first part, we will explore its structure before seeing how to read and update associated metadata through Javascript in a browser environment.

Before moving on, it is essential to review some key concepts:

📌 What is the 0x notation? 0x indicates that the following number is in hexadecimal format, which uses a base-16 number system (as opposed to the base-10 decimal system). This notation is case-insensitive, meaning that 0XFF and 0xff are exactly the same.

📌 What is a bit or a byte? In computer science, a bit is the smallest and the most basic unit of information. It is a binary digit (base 2) representing 0 or 1. A byte (or octet) is a group of eight bits. Since eight bits allow 256 possible combinations, a byte can be expressed as a two-digit hexadecimal number. For example:

The byte 0x00 represents 0 in decimal and corresponds to 0000 0000 in binary, which is the minimum 8-bit value.
The byte 0xD8 represents 216 and corresponds to 1101 1000.
The byte 0xFF represents 255 and corresponds to 1111 1111, which is the maximum 8-bit value.

For multiple-byte words, the hex numbers are just combined: 0xFFD8 is a two-byte word, and 0x45786966 is a four-byte word.
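As a quick check of the values above, here is a small sketch that converts a byte between the three notations using only built-in number methods. The helper name `describeByte` is ours, chosen for illustration.

```javascript
// Hypothetical helper: describe one byte in the three notations used above
function describeByte(byte) {
  return {
    hex: "0x" + byte.toString(16).toUpperCase().padStart(2, "0"),
    decimal: byte,
    binary: byte.toString(2).padStart(8, "0"),
  }
}

console.log(describeByte(0xd8))
// { hex: '0xD8', decimal: 216, binary: '11011000' }

// A two-byte word is the high byte shifted left by 8 bits, combined with the low byte
const word = (0xff << 8) | 0xd8
console.log(word === 0xffd8) // true
```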

📌 What is Endianness? This is how a set of bytes is stored in memory. In big-endian, the most significant byte (leftmost) comes first, while in little-endian, the least significant byte (rightmost) comes first.

For example, let’s consider the two-byte word 0x0124. In a big-endian system, it will be written as 01 24, whereas in a little-endian one, it will be written as 24 01. Knowing whether an image has been written on a big or little-endian device is essential to read its data correctly.
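The same two bytes can be read both ways with a DataView, which is the object used later in this article. This is only a minimal sketch; the variable names are illustrative.

```javascript
// Two bytes in memory, in this order: 01 24
const bytes = new Uint8Array([0x01, 0x24])
const endianView = new DataView(bytes.buffer)

// DataView reads big-endian by default
const asBigEndian = endianView.getUint16(0) // 0x0124
// Passing true as the second argument switches to little-endian
const asLittleEndian = endianView.getUint16(0, true) // 0x2401

console.log(asBigEndian, asLittleEndian) // 292 9217
```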

The EXIF segment in the JPEG structure

Segment delimitations

The structure of a JPEG image is divided into segments delimited by two-byte markers, each starting with a 0xFF byte. Below is a list of key markers, described on pages 20 and 21 of the JPEG compression specification:

0xFFD8: SOI (Start of Image); indicates the beginning of the image structure.
0xFFE*n*: APPn (Application-related tags); following the SOI marker, with n between 0 and F (full list). For example, APP11 (or 0xFFEB) is for HDR data, APP13 (or 0xFFED) for Photoshop and APP1 (or 0xFFE1) for EXIF.
0xFFDA: SOS (Start of Scan); indicates the beginning of the image-related data.
0xFFD9: EOI (End of Image); indicates the end of the image.

The first four file bytes, here FF D8 FF E0 for JPEG, are also known as magic numbers and are used by software to identify the file type.

Segment size

The size of a segment can be determined by reading the two bytes following its marker. For example, if the segment starts with FFE1 0124 XXXXXXX, then the APP1 segment size is 292 bytes, with 0124 being the size’s hexadecimal representation.
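To make the marker and size rules concrete, here is a sketch that lists the segments found before the image data. The `listSegments` helper and the hand-made byte sequence are assumptions for illustration; the sample is not a decodable image. Note that the size word counts its own two bytes but not the marker.

```javascript
// Hypothetical helper: list the segments found before the image data
function listSegments(view) {
  const SOI = 0xffd8
  const SOS = 0xffda
  if (view.getUint16(0) !== SOI) throw new Error("not a valid JPEG")

  const segments = []
  let offset = 2 // skip the SOI marker
  // Each iteration needs 4 readable bytes: the marker and the size word
  while (offset + 4 <= view.byteLength) {
    const marker = view.getUint16(offset)
    if (marker === SOS) break // the image data starts here
    const size = view.getUint16(offset + 2) // includes the size word itself
    segments.push({ marker: "0x" + marker.toString(16).toUpperCase(), offset, size })
    offset += 2 + size // marker (2 bytes) + rest of the segment
  }
  return segments
}

// A tiny hand-made byte sequence
const sample = new Uint8Array([
  0xff, 0xd8, // SOI
  0xff, 0xe0, 0x00, 0x04, 0x00, 0x00, // APP0, size 4
  0xff, 0xe1, 0x00, 0x08, 0x45, 0x78, 0x69, 0x66, 0x00, 0x00, // APP1, size 8
  0xff, 0xda, 0x00, 0x02, // SOS
])
console.log(listSegments(new DataView(sample.buffer)))
```

The bounds check in the loop condition is a design choice for this sketch: it stops quietly on truncated input instead of throwing a RangeError.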

IFD: Image File Directory

Data in the JPEG structure is grouped into directories called IFDs. For example, IFD0 is located in the APP1 segment, and IFDExif is a sub-IFD of IFD0.

The IFD dataset includes a two-byte word indicating the number of tags, followed by the tags data and ending with the four-byte offset of the next IFD (or 0 if none).

IFD Tag

An IFD tag, like all EXIF tags, is a twelve-byte sequence made up of:

Tag ID (bytes 0-1): A two-byte word identifying the tag
Tag type (bytes 2-3): A two-byte word indicating the type. For example, a value of 1 for a BYTE (one-byte integer), 3 for a SHORT (two-byte integer), or 4 for a LONG (four-byte integer). For further details, see pages 25 and 26 of the JPEG compression specification.
Tag count (bytes 4-7): A four-byte word indicating the number of values (usually 1)
Tag value or value offset (bytes 8-11): For SHORT values, two bytes are read; for LONG values, four bytes are read. If the value is longer than four bytes (e.g., RATIONAL type), these four bytes store the offset needed to reach the actual value.
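The twelve-byte layout above can be decoded with a few DataView reads. The following is only a sketch: the `readIfdTag` helper and its field names are hypothetical, and the last four bytes are always read as a LONG for simplicity.

```javascript
// Hypothetical helper: decode the twelve bytes of one IFD tag
function readIfdTag(view, tagOffset, isLittleEndian) {
  return {
    id: view.getUint16(tagOffset, isLittleEndian), // bytes 0-1
    type: view.getUint16(tagOffset + 2, isLittleEndian), // bytes 2-3
    count: view.getUint32(tagOffset + 4, isLittleEndian), // bytes 4-7
    // bytes 8-11, read here as a LONG; a SHORT would use getUint16 instead
    valueOrOffset: view.getUint32(tagOffset + 8, isLittleEndian),
  }
}

// ExifImageWidth (0xA002), type LONG (4), count 1, value 1920 (0x780), big-endian
const tagBytes = new Uint8Array([
  0xa0, 0x02, 0x00, 0x04, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x07, 0x80,
])
console.log(readIfdTag(new DataView(tagBytes.buffer), 0, false))
// { id: 40962, type: 4, count: 1, valueOrOffset: 1920 }
```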

IFD tag example: the ExifImageWidth tag

Locate the EXIF part

From image to bytes

Time to code! The FileReader API is used here to read the image as a buffer, which is then wrapped in a DataView for easier byte manipulation.

The next step is to examine the start of the JPEG structure, which should be the SOI marker:

// Where the final image with updated metadata will be stored
let finalImageBlob = null

const reader = new FileReader()
reader.addEventListener("load", ({ target }) => {
  if (!target) throw new Error("no blob found")
  const { result: buffer } = target
  if (!buffer || typeof buffer === "string") throw new Error("not a valid JPEG")

  const view = new DataView(buffer)
  let offset = 0
  const SOI = 0xFFD8
  if (view.getUint16(offset) !== SOI) throw new Error("not a valid JPEG")
  // Here will happen the image metadata manipulation
})
// Image given as a Blob, but readAsArrayBuffer can also take a File
reader.readAsArrayBuffer(imageBlob)

Note: The getUint16 method of DataView reads two bytes (2 × 8 = 16 bits), and there is a similar method for four bytes, getUint32.

Segments reading

From here, the code can loop through the image data to locate the EXIF section. The EXIF segment uses the APP1 marker, and its size word is immediately followed by a special six-byte code: the ASCII string Exif and two null bytes (0x457869660000).

Reaching the SOS marker means reaching the start of the image data, and therefore the end of the metadata section.

const SOS = 0xFFDA
const APP1 = 0xFFE1
// Skip the last two bytes 0000 and just read the four first bytes
const EXIF = 0x45786966

let marker = null
// The first two bytes (offset 0-1) were the SOI marker
offset += 2
while (marker !== SOS) {
  marker = view.getUint16(offset)
  const size = view.getUint16(offset + 2)
  if (marker === APP1 && view.getUint32(offset + 4) === EXIF) {
    // EXIF segment found!
    // Following code will be here
    // Then leave the loop, since the remaining segments are not needed
    break
  }
  // Skip the entire segment (marker of 2 bytes + size of the segment)
  offset += 2 + size
}

The last thing to do here is to determine the endianness used to encode the image. In the JPEG structure, it is given by the two-byte word following the Exif special code: 0x4949 means little-endian and 0x4D4D means big-endian. This endianness word must be followed by the two bytes 0x002A (42 in decimal).

Note: From now on, always provide the endianness to the getUint16/getUint32 functions to correctly read the bytes.

const LITTLE_ENDIAN = 0x4949
const BIG_ENDIAN = 0x4d4d

// The APP1 here is at the very beginning of the file
// So at this point offset = 2,
// + 10 to skip to the bytes after the Exif word
offset += 10

let isLittleEndian = null
if (view.getUint16(offset) === LITTLE_ENDIAN) isLittleEndian = true
else if (view.getUint16(offset) === BIG_ENDIAN) isLittleEndian = false
else throw new Error("invalid endian")
// From now on, the endianness must be specified each time bytes are read
// The 42 word
if (view.getUint16(offset + 2, isLittleEndian) !== 0x2a) throw new Error("invalid TIFF header")

If APP1 appears at the very beginning of the image structure (which is usually the case), then the structure should be as follows:

JPEG starting structure

Read and replace EXIF tags

All the necessary information is now known to search for the EXIF tags:

Orientation, located in the IFD IFD0
ExifImageWidth or PixelXDimension tag, located in the IFD IFDExif, provided by the ExifOffset tag of IFD0
ExifImageHeight or PixelYDimension tag, also located in IFDExif

IFD0

To locate IFD0, read the four-byte word that immediately follows the 42 word: it gives the IFD0 offset, counted from the start of the endianness word.

This sequence that includes the endianness two-byte word, 42, and the IFD0 offset four-byte word is commonly referred to as the “TIFF (Tagged Image File Format) Header”:

The TIFF header
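The eight bytes of this header can be parsed in isolation. Below is a minimal sketch, assuming a hypothetical `parseTiffHeader` helper that receives the position of the endianness word; the sample bytes are hand-made.

```javascript
// Hypothetical helper: read the eight-byte TIFF header starting at tiffStart
function parseTiffHeader(view, tiffStart) {
  const byteOrder = view.getUint16(tiffStart)
  let isLittleEndian
  if (byteOrder === 0x4949) isLittleEndian = true // ASCII "II"
  else if (byteOrder === 0x4d4d) isLittleEndian = false // ASCII "MM"
  else throw new Error("invalid byte order")

  if (view.getUint16(tiffStart + 2, isLittleEndian) !== 0x002a) {
    throw new Error("invalid TIFF header")
  }
  // Offset of IFD0, counted from the start of the TIFF header
  const ifd0Offset = view.getUint32(tiffStart + 4, isLittleEndian)
  return { isLittleEndian, ifd0Offset }
}

// Little-endian header: 49 49, then 42 as 2A 00, then offset 8 as 08 00 00 00
const header = new Uint8Array([0x49, 0x49, 0x2a, 0x00, 0x08, 0x00, 0x00, 0x00])
console.log(parseTiffHeader(new DataView(header.buffer), 0))
// { isLittleEndian: true, ifd0Offset: 8 }
```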

At this point, there are two tags that need to be found through the IFD0 data:

The Orientation tag (hex 0x0112) which is a SHORT value that must be replaced by 1
The EXIF-specific IFD offset, provided by the ExifOffset tag (hex 0x8769), which is a LONG value used to find the EXIF IFD tags

As mentioned earlier, the first two-byte word of the IFD indicates the number of tags in the IFD. Since each tag is 12 bytes long, multiplying the number of tags by 12 gives the size of all the IFD tags, allowing for looping through them.

const TAG_ID_EXIF_SUB_IFD_POINTER = 0x8769
const TAG_ID_ORIENTATION = 0x0112
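const newOrientationValue = 1
// Filled in while looping through IFD0, and read again in the next section
let exifSubIfdOffset = null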

// Here offset = 12
// IFD0 offset given by the 4 bytes after 42
const ifd0Offset = view.getUint32(offset + 4, isLittleEndian)
const ifd0TagsCount = view.getUint16(offset + ifd0Offset, isLittleEndian)
// IFD0 ends after the two-byte tags count word + all the tags
const endOfIFD0TagsOffset = offset + ifd0Offset + 2 + ifd0TagsCount * 12

for (
  let i = offset + ifd0Offset + 2;
  i < endOfIFD0TagsOffset;
  i += 12
) {
  // First 2 bytes = tag ID
  const tagId = view.getUint16(i, isLittleEndian)

  // If Orientation tag
  if (tagId === TAG_ID_ORIENTATION) {
    // Skipping the 2 bytes tag ID, 2 bytes tag type and 4 bytes tag count
    // Type is SHORT, so 2 bytes to write
    view.setUint16(i + 8, newOrientationValue, isLittleEndian)
  }

  // If ExifIFD offset tag
  if (tagId === TAG_ID_EXIF_SUB_IFD_POINTER) {
    // Type is LONG, so 4 bytes to read
    exifSubIfdOffset = view.getUint32(i + 8, isLittleEndian)
  }
}

Note: Following the same logic as for reading, the setUint16/setUint32 functions are used to respectively write two or four bytes.

EXIF Sub-IFD

Once the offset of the EXIF sub-IFD is found, a new loop must be executed through that IFD’s data to find the remaining height and width tags.

Here is information about the two tags that need to be replaced:

ExifImageWidth or PixelXDimension tag (hex 0xa002), a LONG value that must be replaced by 1920
ExifImageHeight or PixelYDimension tag (hex 0xa003), a LONG value that must be replaced by 1080

As a reminder of what was previously stated, an IFD tag is composed of 2 bytes for the ID, 2 for the type, 4 for the count, and 4 for the value or its offset.

const TAG_ID_EXIF_IMAGE_WIDTH = 0xa002
const TAG_ID_EXIF_IMAGE_HEIGHT = 0xa003
const newWidthValue = 1920
const newHeightValue = 1080

if (exifSubIfdOffset) {
  const exifSubIfdTagsCount = view.getUint16(offset + exifSubIfdOffset, isLittleEndian)
  // This IFD also ends after the two-byte tags count word + all the tags
  const endOfExifSubIfdTagsOffset =
    offset +
    exifSubIfdOffset +
    2 +
    exifSubIfdTagsCount * 12
  for (
    let i = offset + exifSubIfdOffset + 2;
    i < endOfExifSubIfdTagsOffset;
    i += 12
  ) {
    // First 2 bytes = tag ID
    const tagId = view.getUint16(i, isLittleEndian)

    // Skipping the 2 bytes tag ID, 2 bytes tag type and 4 bytes tag count
    // The two types are LONG, so 4 bytes to write
    if (tagId === TAG_ID_EXIF_IMAGE_WIDTH) {
      view.setUint32(i + 8, newWidthValue, isLittleEndian)
    } else if (tagId === TAG_ID_EXIF_IMAGE_HEIGHT) {
      view.setUint32(i + 8, newHeightValue, isLittleEndian)
    }
  }
}

Write back the image

Getting the final image is as simple as building a new Blob from the updated buffer data:

// The Blob constructor expects an array of parts
finalImageBlob = new Blob([view], { type: "image/jpeg" })

In the end, the updated blob can be converted to a file or downloaded, depending on the application’s needs.

Conclusion

This article covers the basics of reading and updating tags, but you can expand the code by adding more tags. All the hex codes for the tags can be found on exiftools.org or this tags reference site.

It’s worth noting that there are existing libraries like exif-js or piexifjs for manipulating EXIF data, but they may be larger than what is needed here and do not seem to be actively maintained.

If you want to see the full code used to write this article, feel free to check out this gist.

Babel, JavaScript Transpiling And Polyfills

Cédric Patchane — Thu, 17 Nov 2022 00:00:00 +0000

BabelJS or Babel is a prevalent tool in the JavaScript ecosystem. Most developers know that it is essential when developing using brand new JavaScript features. But how does this system work?

In this article, we’ll explain what Babel is doing under the hood to allow the use of state-of-the-art JavaScript features, and even TypeScript, without manually dealing with older browsers’ version compatibility.

State-of-the-art JavaScript Features

Let’s take a step back and look at the context.

JavaScript is a language that has evolved and is still evolving, especially in the last few years. It is based on a specification named ECMAScript, provided by TC39 (Technical Committee 39).

Note: It’s a common mistake to think that ECMAScript is the “new JavaScript” or a “standardized JavaScript”. ECMAScript is a specification for creating a scripting language, whereas JavaScript is a scripting language that implements it. It may even happen that rare features do not follow the ECMAScript specification in the experimental versions of some browsers.

To be included in the ECMAScript specification, a new feature passes through a specific process with five phases. So it could take some time before it is integrated into the specifications and then implemented in the browsers.

While we, developers, can’t wait to use new exciting features that improve our ~~life~~ code, most browsers don’t support them yet, so we can’t just deliver the code as we wrote it.

Babel To The Rescue

Babel is a tool that allows you to write code in the latest (or even experimental) version of JavaScript. Because not all browsers currently support those hot features, it transforms the cutting-edge source code into code supported by older browsers. Babel has two primary purposes: JavaScript transpiling and polyfills handling.

Let’s take a look at the key points of interest regarding Babel:

Its configuration is defined in a babel.config.js file located at the project’s root.
Babel uses plugins to be as modular as possible (example: @babel/plugin-transform-arrow-functions). Each plugin is often related to one functionality or a minimal scope of functionalities.
You can create a preset from a configuration to easily share it between projects, like @babel/preset-env to manage Babel plugins, @babel/preset-typescript for TypeScript usage, or @babel/preset-react for React applications.
By providing a list of target Node environments or browsers (using browserslist syntax) as an option to @babel/preset-env, you let it automatically decide which plugins and polyfills (thanks to core-js) have to be applied when processing the code.

Below is a simple Babel configuration file that we will use as an example throughout this article:

babel.config.js

module.exports = {
  presets: [
    // Babel plugin/preset options are passed using an array syntax
    [
      "@babel/preset-env",
      {
        targets: [
          "> 0.25%",
        ],
        // Specify the version of core-js used, including the minor version
        corejs: "3.26",
        // Specify how to handle polyfills, see polyfills handling section below
        useBuiltIns: "usage", // "entry", "usage" or false by default
      }],
    "@babel/preset-react",
    "@babel/preset-typescript"
  ],
  // Specify some plugins enabled in any cases
  plugins: [
      "dynamic-import-webpack",
  ]
}

Note: core-js is an NPM package that contains all polyfills for every possible ECMAScript feature. It must be installed for Babel to be able to inject its polyfills. We’ll learn more about its usage in the polyfills handling section below.

JavaScript Transpiling

A code transpiler, slightly different from a compiler, reads source code written in one language (here, modern JavaScript) and produces equivalent code in another language (here, older and more widely supported JavaScript). Afterward, a bundler, like Webpack, is still needed to collect, optimize and build a project’s final output(s).

Example: Let’s say we are writing a piece of code using arrow functions and template literals:

const MyFunc = arg => `Using my argument: ${arg}`

With the configuration from the previous section, Babel will output the code as follows:

var MyFunc = function MyFunc(arg) {
  return "Using my argument: ".concat(arg)
}

As a result, the arrow function is transformed into the basic function syntax supported by every browser. The same goes for the template literal, replaced by a call to .concat(), which is more widely supported by browsers. Finally, const is transformed to var, mainly for IE 11.

To do this transformation, Babel creates an AST (Abstract syntax tree), a tree representation of the code structure. Then it applies plugins that use this AST to transform and output the code. In this example, the plugin @babel/plugin-transform-arrow-functions was used to transform arrow functions, but there are a lot of other Babel plugins to handle any transformation.
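To give an intuition of what "using the AST to output code" means, here is a toy sketch. It is not Babel's plugin API nor its actual AST: the node shapes and the `generate` function are simplified assumptions made for illustration, covering only the template literal from the example above.

```javascript
// Toy AST for the expression `Using my argument: ${arg}`
// (node shapes are simplified for illustration, not Babel's actual AST)
const templateAst = {
  type: "TemplateLiteral",
  quasis: ["Using my argument: ", ""], // static text parts
  expressions: [{ type: "Identifier", name: "arg" }], // interpolated parts
}

// Hypothetical "plugin": generate older-style code for the node types above
function generate(node) {
  switch (node.type) {
    case "Identifier":
      return node.name
    case "TemplateLiteral": {
      const [first, ...rest] = node.quasis
      // Interleave expressions and the remaining static parts, dropping empty strings
      const args = []
      node.expressions.forEach((expression, i) => {
        args.push(generate(expression))
        if (rest[i] !== "") args.push(JSON.stringify(rest[i]))
      })
      return `${JSON.stringify(first)}.concat(${args.join(", ")})`
    }
    default:
      throw new Error(`unsupported node type: ${node.type}`)
  }
}

console.log(generate(templateAst)) // "Using my argument: ".concat(arg)
```

A real transformation works on a far richer tree, but the principle is the same: walk the nodes, and emit equivalent code for each node type the target environments do not support.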

The good news is that it’s not necessary to know all of them to transform the code correctly, thanks to the @babel/preset-env preset.

Indeed, this preset has a built-in list of plugins matching browser versions. So, according to the browser versions list provided, it knows precisely which plugins need to be applied.

Now that we know how to transform the code, there is still something to tackle: how to provide implementations of very recent JavaScript functions that browsers do not support yet.

Polyfills Handling

According to the MDN:

A polyfill is a piece of code used to provide modern functionality on older browsers that do not natively support it.

Let’s take an example. Here is a code using Array.prototype.find to find the first element matching a condition in an array:

const garage = [
  {name: 'Model 3', electric: true},
  {name: 'Punto', electric: false},
  {name: '208', electric: false}
];

function isElectric(car) {
  return car.electric === true
}

const myFirstElectricCar = garage.find(isElectric);

This code works well on recent browsers, but when running on Internet Explorer 11, it throws an error Object doesn't support property or method 'find'. Indeed, the find() method for arrays doesn’t exist for this browser and won’t exist since this browser is not updated anymore.

The solution is to ~~drop IE 11 support~~ provide a polyfill. In this case, it could be as simple as copying/pasting this polyfill from the MDN directly into the code to make it work.
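For reference, a hand-written polyfill usually follows the pattern sketched here: implement the missing method, then install it only when the native one is absent. This is a simplified stand-in written for this article, not the MDN or core-js implementation, and the name `findPolyfill` is an assumption.

```javascript
// Simplified stand-in for Array.prototype.find (assumption: not the MDN
// or core-js implementation, which handle more edge cases)
function findPolyfill(predicate, thisArg) {
  if (typeof predicate !== "function") {
    throw new TypeError("predicate must be a function")
  }
  for (var i = 0; i < this.length; i++) {
    if (predicate.call(thisArg, this[i], i, this)) return this[i]
  }
  return undefined
}

// Only install the polyfill when the native method is missing
if (!Array.prototype.find) {
  Object.defineProperty(Array.prototype, "find", {
    value: findPolyfill,
    configurable: true,
    writable: true,
  })
}

const cars = [
  { name: "Model 3", electric: true },
  { name: "Punto", electric: false },
]
// Called directly here so the sketch can be tried on a modern engine too
console.log(findPolyfill.call(cars, (car) => car.electric).name) // Model 3
```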

But it is more complicated to do that for every feature used in a codebase. It’s easy to forget some polyfills or to duplicate them in the code, and the result is hard to test and monitor. This is where the NPM package core-js, full of ECMAScript polyfills, comes in.

As for code transpiling, the preset @babel/preset-env has a built-in list of core-js polyfill names that match browser versions. According to the targeted environments, it knows which polyfills to include. From this point on, there are three ways to handle polyfills, selected with the useBuiltIns option of @babel/preset-env.

`useBuiltIns: "entry"`

This option requires the core-js module to be imported (and only once) at the entry point of the project. According to the standard level targeted, many import options are available:

// Must be at the root, the very beginning of the code, before anything else

// polyfill all `core-js` features
import "core-js"
// OR polyfill only stable `core-js` features - ES and web standards
import "core-js/stable"
// OR polyfill only stable ES features
import "core-js/es"
// OR any other module/folder from core-js

Then Babel will parse the code, and when it finds the core-js import, it will transform this one-line import into multiple imports of unit modules from core-js. As a result, it’ll only import the polyfills necessary for the targeted environments, whether or not the features are used. Here’s what that looks like by importing core-js/es:

require("core-js/modules/es.symbol");
require("core-js/modules/es.symbol.description");
require("core-js/modules/es.symbol.async-iterator");
require("core-js/modules/es.symbol.has-instance");
require("core-js/modules/es.symbol.is-concat-spreadable");
require("core-js/modules/es.symbol.iterator");
require("core-js/modules/es.symbol.match");
require("core-js/modules/es.symbol.replace");
require("core-js/modules/es.symbol.search");
// ... and all other polyfills that exist in core-js/es...

`useBuiltIns: "usage"`

This option tells Babel to automatically write the polyfill imports related to a feature each time it encounters it.

Thus, this code:

/* We keep the previous example with the garage of cars */

const myFirstElectricCar = garage.find(isElectric);
const haveMyElectricCar = garage.includes(myFirstElectricCar);

will be transformed by Babel to this:

require("core-js/modules/es.array.includes");
require("core-js/modules/es.array.find");

var myFirstElectricCar = garage.find(isElectric);
var haveMyElectricCar = garage.includes(myFirstElectricCar);

It’s important to understand that core-js imports no longer need to be written by hand. Polyfill imports are automatically added to every file that needs one or more polyfills. It also means that if a modern feature is used in several places, the same polyfills are imported multiple times. Indeed, Babel assumes that a bundler (like Webpack) will collect and deduplicate imports so that polyfills are only included once in the final output(s).

This is the most optimized and automatic way to include only the polyfills that are needed and remove them when they become unnecessary (whether they are not used anymore or the list of targeted environments has evolved to more recent ones).

`useBuiltIns: false`

This tells Babel not to handle polyfills at all. It is still possible to import core-js manually, but every polyfill from these imports is included as is, without any filtering according to the targeted environments. Only the choice of imported modules limits what ends up in the code:

// polyfill everything from `core-js`
import "core-js"
// polyfill only array ES features
import "core-js/es/array"
// polyfill only array.includes ES feature
import "core-js/es/array/includes"

// ...Your highly up to date JavaScript code here...

When using this approach, be sure never to import a polyfill twice; otherwise, it will throw an error.

Transpiling: The Case Of TypeScript

The documentation states, “TypeScript is a typed superset of JavaScript that compiles to plain JavaScript.” This means that any JavaScript code is a valid TypeScript code, but TypeScript is not necessarily valid JavaScript and, therefore, not supported by browsers.

At Getaround, we use TypeScript to develop our front-end features. Consequently, we need to transform our TypeScript code to “classical JavaScript” before deploying it.

To do so, TypeScript comes with a code transpiler. It transforms TypeScript code into ECMAScript 3 code, so it could be wrongly thought that we don’t need to transpile with Babel anymore.

But there are some points that we need to highlight here:

You’ll have two different configurations to handle JavaScript-related files in the project: Babel for .js files and TypeScript for .ts files.
TypeScript doesn’t handle polyfills as Babel does, and Babel doesn’t do type-checking as TypeScript does.
Babel is much more extensible and has a more extensive plugin ecosystem than TypeScript.
There are some incompatibilities between the two tools (you can read this article from Microsoft).

So a solution here would be to keep using Babel for both cases, thanks to a dedicated preset named @babel/preset-typescript that allows Babel to transform TypeScript code correctly. And for the type-checking, we can still rely on the tsc CLI provided by TypeScript.

Our open-source preset configuration

At Getaround, we use a custom preset to share our configuration across all front-end apps. It is publicly available on our drivy/frontend-configs GitHub repository, along with all of our other front-end configurations.

You can also find it on NPM: @getaround-eu/babel-preset-app.

Don’t hesitate to take a look and to use it!

Building a modular multiple flows wizard in Ruby

Rémy Hannequin — Mon, 12 Sep 2022 00:00:00 +0000

Wizards are a common component in a lot of applications, whether for signing up new users, creating products, purchasing items, and many more.

They can be tricky to manage once they get bigger and more complex. At Getaround, we have several wizards which don’t share their architecture. A common architecture cannot fit every use case with different needs, flow, and user experience.

To list new cars on our platform, hosts provide multiple pieces of information on the vehicle, themselves, and their needs. We may ask for more information or skip some steps. Such constraints lead to complexity and difficulty in handling and testing every variation.

After multiple iterations, we ended up with a modular architecture that was less strict than a decision tree and allowed us to design wizards with complex or simple logic.

In this article, I’ll try to guide you through building such a modular architecture. We’ll use a form object for each step and a Manager to orchestrate everything.

Form object interface

ActiveModel provides convenient modules to create custom form objects and manipulate attributes. We are going to use the following modules:

ActiveModel::Model so that our form object behaves like a regular model
ActiveModel::Attributes to access submitted fields as attributes
ActiveModel::Validations to benefit from convenient attribute validations

Using these modules will also allow us to use Rails form helpers as if the manipulated object were an actual model. Many thanks to Intrepidd for sharing code that led to this base form.

Our BaseForm would then look like this:

require "active_model"

class BaseForm
  include ActiveModel::Model
  include ActiveModel::Attributes
  include ActiveModel::Validations

  attribute :car

  def complete?
    raise NotImplementedError
  end

  def next_step
    raise NotImplementedError
  end

  def submit(params)
    params.each { |key, value| public_send("#{key}=", value) }

    perform if valid?
  end

  private

  def perform
    raise NotImplementedError
  end
end

Let’s take the time to explain this code. First, we’re declaring attribute :car because our form objects will be initialized with a Car model. This object will be the source of truth, the one we will fill with new data and rely on to determine what’s missing from it.

complete? will be the method called to know if a step has been successfully completed. In this method we can for example check if a particular attribute has been filled in on our car record.

next_step handles the logic to compute what the next step will be. A step knows what the next one is because it will rely on what was submitted previously.

submit is the method to submit our params from the associated form. It will only call perform, supposed to be implemented on each form object, if all validations passed.

With this public interface, we can create as many steps as we want and they will create the flow by themselves using perform to save data, and then complete? and next_step to handle going from one step to another.

Manager

Having steps handling themselves is great, but we still need some logic to initiate the wizard and determine which step the user is currently on.

Our Manager object will handle this logic. It can also manage available and non-available steps. For instance, say we have steps A, B and C, with step A already submitted. The next step to submit is B, but I should be allowed to access step A again if I want to correct what I submitted. Step C is not accessible as long as step B is not complete; it shouldn’t even be visible to the user.

Our Manager could then look like this:

class Manager
  FIRST_STEP = :country

  STEP_FORMS = {
    country: CountryForm,
    insurance_provider: InsuranceProviderForm,
    mileage: MileageForm,
  }.freeze

  def initialize(car:)
    @car = car
    @instantiated_forms = {}
    @possible_steps = compute_possible_steps
  end

  def current_step
    @possible_steps.find do |step|
      return nil if step.nil?

      form = find_or_instantiate_form_for(step)
      !form.complete?
    end
  end

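  def form_for(step)
    STEP_FORMS.fetch(step)
  end

  # Used by the controller to reject steps that are not reachable yet
  def possible_step?(step)
    @possible_steps.include?(step)
  end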

  private

  def compute_possible_steps
    steps = []
    steps_path(FIRST_STEP, steps)
    steps
  end

  def steps_path(starting_step, steps_acc)
    steps_acc.push(starting_step)

    return if starting_step.nil?

    form = find_or_instantiate_form_for(starting_step)
    steps_path(form.next_step, steps_acc) if form.complete?
  end

  def find_or_instantiate_form_for(step)
    @instantiated_forms.fetch(step) do
      form_for(step)
        .new(car: @car)
        .tap { |form| @instantiated_forms[step] = form }
    end
  end
end

Every step is declared with its associated form object. At initialization, all possible steps are computed using the public complete?, calculating one step after another with next_step. The method form_for will allow the controller to manipulate the right form object from the manager.

As we support multiple flows, the last step may not be the last one defined in the list. We then expect next_step to return nil when there’s no step left.

Steps

In the manager, I mentioned three steps, country, insurance_provider and mileage. Let’s build them and see how with only 3 steps we can already have multiple flows.

Country

This step will simply save the selected country on the car record. However, its next step will depend on what country was selected.

class CountryForm < BaseForm
  ALLOWED_COUNTRIES = %w[CA ES PK JP].freeze
  COUNTRY_REQUIRING_INSURANCE_PROVIDER = %w[ES].freeze

  attribute :country, :string

  validates_inclusion_of :country, in: ALLOWED_COUNTRIES

  def perform
    car.update!(country: country)
  end

  def complete?
    !car.country.nil?
  end

  def next_step
    if COUNTRY_REQUIRING_INSURANCE_PROVIDER.include?(car.country)
      :insurance_provider
    else
      :mileage
    end
  end
end

First we can see how readable the attributes and validations are thanks to ActiveModel. country is a string attribute and we expect it to be one of the allowed countries, defined in a constant. The perform method will only be called if the requirements are met, that is, if valid? returns true.

complete? only checks if country has been successfully saved on the car record.

Finally, next_step depends on the country selected. If an insurance provider is required in the country, then we’ll need the user to provide this data. If not, we decide to go directly to the next one, mileage.

Of course we could improve things here, especially in perform. We probably don’t want to use update! which raises when it fails, but we still want to be sure what’s inside this method is properly executed. For the sake of simplicity I didn’t add such a logic here, but we can easily play with errors available thanks to ActiveModel::Validations.

Insurance provider

Nothing particular to say about this one, except that it will be displayed only if the car’s country requires an insurance provider.

The next step is defined as mileage, but from here we could imagine another branch in the decision tree, multiple flows, multiple possibilities.

class InsuranceProviderForm < BaseForm
  attribute :insurance_provider, :string

  def perform
    car.update!(insurance_provider: insurance_provider)
  end

  def complete?
    !car.insurance_provider.nil?
  end

  def next_step
    :mileage
  end
end

Mileage

In our example, mileage is the last step. Once a mileage integer is submitted, validated and saved, there’s no other step to go to. next_step returns nil to announce that the wizard is finished.

class MileageForm < BaseForm
  attribute :mileage, :integer

  validates :mileage, numericality: { greater_than: 0 }

  def perform
    car.update!(mileage: mileage)
  end

  def complete?
    !car.mileage.nil?
  end

  def next_step
    nil
  end
end

Controller and routes

We only need two routes to support this wizard: show and update. show will display the step to the user while update will handle the step submission.

# config/routes.rb

resources :car_wizards, only: %i[show update]

On the controller side, we’re supposed to let all the logic come from the manager and only handle rendering, form submission and redirections.

class CarWizardController < ApplicationController
  before_action :initialize_variables, only: %i[show update]

  def show
    if @step == :current
      redirect_to car_wizard_path(@car, @step_manager.current_step)
    elsif @step_manager.possible_step?(@step)
      set_form
      render @step
    else
      render "errors/not_found", status: :not_found
    end
  end

  def update
    if !@step_manager.possible_step?(@step)
      return render("errors/not_found", status: :not_found)
    end

    set_form

    if @form.submit(params[:car])
      redirect_to car_wizard_path(@car, @form.next_step)
    else
      render @step
    end
  end

  private

  def initialize_variables
    @car = current_user.cars.incomplete.find(params[:car_id])
    @step = params[:id].to_sym
    @step_manager = Manager.new(car: @car)
  rescue ActiveRecord::RecordNotFound
    redirect_to root_path, error: "Car not found"
  end

  def set_form
    @form ||= @step_manager.form_for(@step).new(car: @car)
  end
end

View

Finally, each step has its associated view. A step view only needs a form helper instance, based on the form object, to display form fields. At Getaround, we also have an associated presenter for each step, which allows us to share information between web, web mobile and mobile apps.

simple_form_for @form, as: :car, url: car_wizard_path(@car, @step), method: :put do |f|
  = f.select :mileage, car_mileage_options_for_select, label: t("car_wizard.steps.mileage.attributes.mileage.label")

Pros and cons

With this architecture, we can build a complex wizard, with multiple flows. A user can stop and resume it any time, and it is possible to have many flows with many rules without having to write the entire logic in one single file, which would be much harder to understand and maintain.

Each step having its own logic allows us to test the flow step by step, independently. The simple public API helps us to test the service-performing logic and the attribute validation separately.

It is easy to integrate into our MVC pattern with a very simple controller and basic views. The wizard manager itself is only a simple algorithm to compute possible steps.
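To give an idea of how small such an algorithm can be, here is a rough sketch. It is written in TypeScript rather than the Ruby used by the forms above, and it is not the actual manager: the step table, the `possibleSteps` function and the list of countries are all hypothetical. It only assumes what the forms expose, namely a completion check and a next step.

```typescript
// Hypothetical car shape, limited to the three fields used by the example forms.
interface Car {
    country: string | null
    insuranceProvider: string | null
    mileage: number | null
}

// Minimal model of a step: is it complete, and which step follows it?
interface Step {
    complete: (car: Car) => boolean
    nextStep: (car: Car) => string | null
}

// Made-up value, standing in for the constant used by the country form.
const COUNTRIES_REQUIRING_INSURANCE_PROVIDER = ["FR"]

const steps: Record<string, Step> = {
    country: {
        complete: (car) => car.country !== null,
        nextStep: (car) =>
            car.country !== null && COUNTRIES_REQUIRING_INSURANCE_PROVIDER.includes(car.country)
                ? "insurance_provider"
                : "mileage",
    },
    insurance_provider: {
        complete: (car) => car.insuranceProvider !== null,
        nextStep: () => "mileage",
    },
    mileage: {
        complete: (car) => car.mileage !== null,
        nextStep: () => null,
    },
}

// Walks the flow from the first step and collects every step the user may visit:
// all completed steps on the path, plus the first incomplete one.
function possibleSteps(car: Car, firstStep: string = "country"): string[] {
    const visited: string[] = []
    let name: string | null = firstStep
    while (name !== null) {
        visited.push(name)
        const step: Step = steps[name]
        if (!step.complete(car)) break
        name = step.nextStep(car)
    }
    return visited
}
```

In this sketch the current step is simply the last element of the returned list, and a step is reachable only if it appears in that list.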

However, having most logic inside the form objects forces us to read each step to understand how the flows work. If the whole wizard gets too complicated, computing each step could begin to take some time, so this is something to watch out for in the long term.

Also, the step completion is based on saving things on database records. Having informational steps is a challenge to handle because we need to find other ways to store state to indicate that they have been seen, or rely on the state of adjacent steps.

For the simplicity of the article, we haven’t shown all the features we have built on top of the car wizard. For instance, we have logic to handle tracking on each step automatically. Also, the mobile wizard is driven by the backend, based on a dedicated API. With this in mind, such an architecture allows us to reorder, add or remove any step without having to deploy a new version of our mobile apps. Well, to be honest, it’s a little bit more complicated than that depending on what the mobile app supports, but you get the idea.

Conclusion

This modular flow is one solution to the wizard problem. It won’t suit every need, but a similar architecture has its advantages if you seek to manage multiple flows with complex decision trees.

Feel free to comment and let us know what you think of this architecture and how we could improve it.

Cheers.

JavaScript smooth API with named-arguments and TypeScript

Thibaud Esnouf — Wed, 04 May 2022 00:00:00 +0000

As a JavaScript developer, you have surely encountered functions that require a lot of arguments to be called. Because the argument list is an Array-like object, values are matched by position, so all of them need to be set, and it may have given you a headache to understand the order and purpose of each argument.

Let’s see an approach to define developer-friendly function signatures with the named-arguments pattern and TypeScript.

The original issues with functions and arguments

Passing multiple arguments to a function can lead to several issues:

Example with the infamous null in the middle

    getItem(itemId, null, null, true)

Unless some variables with good naming are used, you have no clue of what the values stand for.
itemId is fine, but what is behind the null and true values?
The order is important. Miss it and your code is broken.
You can’t skip optional parameters if they are defined in the middle of the argument list.
You’ll have to pass the full requested list of arguments, using default values like null.

Those issues apply to both the developer calling the function and the one reviewing the resulting code.

The more arguments you pass to a function, the faster it will become a nightmare to call it.

TypeScript can help you by providing an IDE integration displaying the function input naming.
Type checking can also help you in some cases with the order.

Example of documentation based on TypeScript in VSCode

But you still have to pass the full argument list even though you only want to use 2 of them. Plus, the TypeScript “auto documentation” may not be available when performing a read-only review of some code, on GitHub for example.

So let’s jump into a solution that resolves all those issues.

`Named-arguments` pattern (with TypeScript)

The trick is to "replace" the argument list with a single JavaScript object that embeds all the arguments as properties. This pattern is called `named-arguments`.

This pattern has always been available, but it has been made easy by ES6 object destructuring and TypeScript.

Single Object as an argument

Using a single object as a wrapper and using properties to pass arguments will resolve all the issues:

If a parameter is optional and you don’t want to define it, just omit the related property
You can define the properties in any order. It doesn’t matter.
With nicely named properties, you’ll always have a clue about the purpose of a value.

getItem({
    id: itemId,
    disabled: true,
})

The true value makes perfect sense now! Plus, we no longer need to pass extra values for optional parameters. No more extra null values!

Calling the function with some extra parameters

getItem({
    id: itemId,
    createdAfter: date1,
    createdBefore: date2,
    disabled: true,
})

If we want to add some optional parameters, we just need to define the related properties (createdAfter & createdBefore).

So now, let’s have a look at such a function declaration.

Won’t using a single object as an argument make it harder to guess the input and to retrieve the values?

function getItem(data) { // no idea of the data structure :(

    // we have to retrieve the values one by one….
    const id = data.id
    const createdAfter = data.createdAfter || null  // setting a default value
    ...

}

Not at all, thanks to ES6 object destructuring, which can be applied directly to function parameters.

Object destructuring

Object destructuring to the rescue

function getItem({ id, createdAfter = null, createdBefore = null, disabled = false }) {
    // our arguments are listed back, with some default values !
    console.log("my input id", id)
    ...
}

It looks like the classic Array-like argument list (positional arguments) with some extra curly braces {} around it.

Notice that we are using default parameters to define optional arguments:

createdAfter = null, createdBefore = null, disabled = false

If the createdAfter or createdBefore property is missing (undefined) the null value will be set automatically.
If the disabled argument is omitted, it will be set to false.
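As a small sketch of how these defaults behave, the snippet below returns the resolved arguments so they can be inspected. The `GetItemOptions` type and the `resolveGetItemArgs` name are made up for the illustration; only the property names and default values come from the example above.

```typescript
// Hypothetical options type mirroring the getItem example.
interface GetItemOptions {
    id: string
    createdAfter?: Date | null
    createdBefore?: Date | null
    disabled?: boolean
}

// Returns the resolved arguments so the defaults are easy to inspect.
function resolveGetItemArgs({
    id,
    createdAfter = null,
    createdBefore = null,
    disabled = false,
}: GetItemOptions) {
    return { id, createdAfter, createdBefore, disabled }
}

// Omitted properties fall back to their defaults.
const omitted = resolveGetItemArgs({ id: "42" })

// A property explicitly set to undefined also triggers the default...
const explicitUndefined = resolveGetItemArgs({ id: "42", disabled: undefined })

// ...while any provided value overrides it.
const overridden = resolveGetItemArgs({ id: "42", disabled: true })

console.log(omitted.disabled, explicitUndefined.disabled, overridden.disabled)
```

One detail worth keeping in mind: destructuring defaults only apply to undefined. A property explicitly set to null keeps the null value.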

If you are familiar with React, you may have noticed that this mechanism is used for Component props, taking advantage of the fact that adding attributes to an HTML (JSX) element behaves the same way as adding properties to an object:

No order.
Optional argument can be omitted.
Defining the attribute name allows you to understand the nature & purpose of the data.

TypeScript enhancement

Although all the issues have been resolved using our object, the developer experience becomes really smooth when coupled with TypeScript. Your function API will be auto-documented, showing the developer the available property names and their related types.

Sometimes, the name of a property is not enough to understand the type of the value that should be used. Should a date argument be a Date object? A formatted string? Or a timestamp?
TypeScript will give you this information.

interface ApiInterface {
    id: string, // oh ! So the id should be a string and not a number
    createdAfter?: Date | null // date is a Date object ! +optional
    createdBefore?: Date  | null //optional
    disabled?: boolean // optional
}

// just by reading the interface we get a clear understanding of the argument natures
function getItem({ id, createdAfter = null, createdBefore = null, disabled = false } : ApiInterface) {
    ...
}

The input interface gives a good understanding of the arguments used by the function.
At a glance we can see which arguments are optional thanks to the ?

If you omit a mandatory argument, TypeScript will notify you.

So TypeScript will help both the developer calling the function and a reviewer reading the code of the function itself.
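To make that notification concrete, here is a minimal sketch. The function body is a stub added for the illustration, and the `// @ts-expect-error` comments are only there to show which calls the compiler rejects: each one marks a line that would otherwise fail type checking.

```typescript
interface ApiInterface {
    id: string
    createdAfter?: Date | null
    createdBefore?: Date | null
    disabled?: boolean
}

// Stub body for illustration only: it just echoes the resolved arguments.
function getItem({ id, createdAfter = null, createdBefore = null, disabled = false }: ApiInterface) {
    return { id, createdAfter, createdBefore, disabled }
}

// OK: only the mandatory property is provided.
const item = getItem({ id: "42" })

// Rejected at compile time: the mandatory "id" property is missing.
// @ts-expect-error
getItem({ disabled: true })

// Also rejected: "id" must be a string, not a number.
// @ts-expect-error
getItem({ id: 42 })
```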

Shorthand property names

The cherry on the cake: shorthand property names can ease the process of calling such a function.

If you’re passing values using variable names matching the expected properties, you’ll just have to call the function wrapping the variables inside curly braces.

getItem({ id, disabled })

It is the same as

getItem({ id: id, disabled: disabled })

If one of your variable names doesn’t match an expected property name, you can of course mix shorthand property names with classic declarations.

getItem({ id: itemId, disabled })

The nice thing with the named-arguments pattern is that the more arguments are required for a function, the more sense it will make to use it.

Use it wisely

Just because this pattern has some advantages doesn’t mean you should use it everywhere:

If your function accepts a single argument, it doesn’t make sense to wrap it in an object.
You should just ensure that the naming gives a good hint of the input’s nature. With TypeScript, this won’t even be an issue.
You could be tempted to use named-arguments because of potential future evolution of your API, but you shouldn’t anticipate it by sacrificing simplicity. If you need additional arguments later, refactor it at that point.

In my opinion, if your function only accepts 2 arguments, it is still excessive to use this pattern, especially if the second one is optional.
But it could be a good call to use named-arguments if the 2 properties have the same types.

For example, building a range of values (min/max) or a range of dates (start/end).
The order may be important and TypeScript won’t help in this case.

isValidRange(startDate, endDate) // should startDate be the first argument ?
isValidRange({ start: startDate, end: endDate}) // no more ambiguity
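As an illustration, here is one way the named-arguments version could be declared. The `DateRange` type and the validity rule (start must not be after end) are assumptions made for the sketch.

```typescript
// Hypothetical argument type: both properties share the same type,
// so only their names tell them apart.
interface DateRange {
    start: Date
    end: Date
}

// Assumed rule: a range is valid when start is not after end.
function isValidRange({ start, end }: DateRange): boolean {
    return start.getTime() <= end.getTime()
}

const startDate = new Date("2022-05-01T00:00:00Z")
const endDate = new Date("2022-05-04T00:00:00Z")

// Property order does not matter: both calls describe the same range.
const sameOrder = isValidRange({ start: startDate, end: endDate })
const otherOrder = isValidRange({ end: endDate, start: startDate })

// Swapping the values is still possible, but the mistake is visible at the call site.
const swapped = isValidRange({ start: endDate, end: startDate })
```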

Keep in mind that if you have a lot of properties in your argument object, this can be a good hint that you are doing too many things at once.
For example, if you return an object using the RORO (Receive Object / Return Object) pattern, you could split your function into multiple sub tasks using composition.
As always, when your single input and output share the same type, it can be a good hint that you can enable composing, splitting the manipulation on your input data on multiple functions.
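A minimal sketch of that idea, assuming a hypothetical `ItemQuery` object that is both received and returned by each function. The `pipe` helper and the step names are made up for the example.

```typescript
// Hypothetical query object used as both input and output (RORO).
interface ItemQuery {
    createdAfter: Date | null
    createdBefore: Date | null
    disabled: boolean
}

type QueryStep = (query: ItemQuery) => ItemQuery

// Each step does one thing and returns a new object of the same type.
const onlyEnabled: QueryStep = (query) => ({ ...query, disabled: false })
const since = (date: Date): QueryStep => (query) => ({ ...query, createdAfter: date })

// Because input and output share a type, steps can be chained left to right.
const pipe = (...steps: QueryStep[]): QueryStep =>
    (query) => steps.reduce((current, step) => step(current), query)

const base: ItemQuery = { createdAfter: null, createdBefore: null, disabled: true }
const buildQuery = pipe(onlyEnabled, since(new Date("2022-01-01T00:00:00Z")))
const result = buildQuery(base)
```

Each step stays small and testable on its own, and the original object is left untouched because every step returns a copy.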

GDPR compliance and account deletion

Eric Favre — Thu, 02 Dec 2021 00:00:00 +0000

The GDPR has been around for several years now, and as advocates of data privacy, we are convinced by the legitimacy of such a regulation. However, as good as this measure is from a user’s perspective, it comes with its own puzzles and challenges for an online service provider… Here we’ll try to describe the solution we implemented to deal with the user’s data deletion, which is one of the rights granted by the GDPR (General Data Protection Regulation) to any European user of a service collecting personal data. As a result, this piece does not try to cover all the implications of the GDPR, nor does it claim to bring a one-size-fits-all solution to deal with user data deletion.

GDPR specifies different roles and responsibilities. As an online service provider directly dealing with the end users, Getaround falls into the controller category. And as such we must comply with some obligations, notably:

Collect data limited to what is necessary for the purposes for which they are processed, and keep it for the time strictly necessary
Collect explicit consent over personal data collection and usage
Ensure data security
Upon user request, provide:
- Access to personal data (right of access)
- Means to delete personal data (right to erasure)
- Export of personal data (right to data portability)
- Change inaccurate or incomplete personal data (right to rectification)
- Means to object to the data treatment (Right to object)
- Enable limitation of personal data treatment (Right to restriction of processing)
Address and communicate data breaches

Data Retention

In the description of the data usage, we define its processing and why the data is needed. The data retention is subject to 3 phases:

active database
intermediate archiving
deletion

We can keep the data in an active database for the time necessary to fulfill the specific purpose for which it was collected. Afterwards, the data can be kept in an intermediate archive for a legitimate purpose (essentially to serve a legal obligation or when it corresponds to a user’s legal right), provided that only the data necessary for that purpose is archived and that access is strictly limited. Finally, we have to erase the data, which we can do through full anonymization. These principles prevent abusive data retention for an undetermined period “just in case”, and they also allow damage control in case of data leaks: once data is fully removed from the system, it is “unleakable”. Similarly, the users are the primary owners of their data, so they can spontaneously ask for their personal data to be deleted.

This basically means the system must offer a way to delete a user’s personal details, whether upon their own request, or according to a time-based data expiration rule. While a manual solution can be acceptable for smaller systems, a large scale product such as ours requires a real technical solution.

Getaround Context

Our service requires personal data from our users. We collect names, birth dates, id documents for driver vetting, etc. This is legitimate data to collect in a car rental context, and can be kept as long as it’s relevant (user is active, has some recent or upcoming rentals, has an ongoing claim, etc), but we must make sure that it’s thoroughly deleted when it’s no longer deemed relevant.

We also rely on 3rd parties (as per GDPR terminology) which process some of this data (for email campaigns, identity document authentication, customer support, etc.). The personal data communicated to these 3rd parties needs to be deleted when the user’s data is removed from our platform. Fortunately, most of these services also provide an API to automatically remove all data related to a user.

Finally, in the case where a user requests their account deletion, and some legal constraints force us to delay the actual data deletion, we must still make their account unusable and invisible to the other users or administrators.

The User Lifecycle

Now that the legal context is laid out, let’s dive into the implementation. First, we tried to materialize the different needs under a formalized user account lifecycle.

User lifecycle

As illustrated, we have 2 paths to a user account’s deletion. A main passive one, and a spontaneous one when the user requests their own account deletion. The passive one is the nominal user lifecycle path:

User is active: they use our service
User is inactive: in the database, the user status is still active, however they haven’t had any recent activity logged in our system. When the latest monitored activity reaches a certain age (in our current configuration, 3 years), we send a notification to warn the user that their account will be automatically deleted if they do not log back in soon. If they do come back and create new activity, then they’re back to the active stage.
User gets archived if the user didn’t create any new activity when the time comes. Once archived, an account is unusable. From any user’s perspective, the personal data of an archived user (and their car, reviews…) are not viewable. At the time of archival, a deletion date is also determined. Most of the time (unless specific criteria apply) this deletion date is the next day. The date is stored along with the user account.
The user account becomes deleted when the deletion date is reached. The anonymization of the user data is performed. We chose to keep a fully anonymized record rather than completely deleting the record out of concern for referential integrity and for statistical analysis.

The spontaneous account deletion happens upon the user’s own request. When they do so, the inactivity phase is skipped, and they are directly processed through the archiving phase, with the determination of the deletion date. Then the same process applies.

Once this lifecycle logic is laid out, the only remaining matter is the technical implementation.

Technical Implementation

Flow Management

We chose to define a dedicated model that holds the archiving and deletion logic. Let’s call it UserDeletionFlow, and define its attributes like this:

create_table :user_deletion_flows do |t|
    t.references :user, null: false
    t.string :state, null: false, index: true
    t.datetime :archive_notice_email_sent_at
    t.datetime :archive_after, null: false
    t.datetime :archived_at
    t.datetime :delete_after, null: false
    t.datetime :deleted_at

    t.timestamps null: false

    t.index [:archive_after, :state]
    t.index [:delete_after, :state]
end

A User has many UserDeletionFlow, but only one can be active at any time. The state column stores the state machine step where the UserDeletionFlow is. It applies the following sequence:

UserDeletionFlow state machine

When a user has been inactive for 3 years, a related UserDeletionFlow is created with a state archive_eligible. An archive notice is sent to the user, and the current date is stored in archive_notice_email_sent_at. The archive_after date is set to 1 month later and stored.
When the archive_after date is passed, the user’s latest activity is reassessed. If there was new activity, the UserDeletionFlow is discarded. Otherwise, the UserDeletionFlow state is set to deletion_eligible, and the delete_after date is computed based on several parameters and stored in the UserDeletionFlow.
Once the delete_after date is passed, the UserDeletionFlow state is set to completed, and an anonymization process takes over to erase the user’s data.
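The sequence above can be sketched as a pure transition function. This is only an illustration, written in TypeScript rather than as the Rails model described here: the `advance` function, the `discarded` state used to represent a cancelled flow, and the one-day deletion delay are all assumptions.

```typescript
// State names follow the article; "discarded" is an assumed way to model a cancelled flow.
type FlowState = "archive_eligible" | "deletion_eligible" | "completed" | "discarded"

interface UserDeletionFlow {
    state: FlowState
    archiveNoticeEmailSentAt: Date
    archiveAfter: Date
    deleteAfter: Date
}

const DAY_MS = 24 * 60 * 60 * 1000

// Given the current time and the user's latest activity, returns the flow
// in its next state, or unchanged if nothing is due yet.
function advance(flow: UserDeletionFlow, now: Date, lastActivityAt: Date): UserDeletionFlow {
    switch (flow.state) {
        case "archive_eligible": {
            if (now.getTime() < flow.archiveAfter.getTime()) return flow
            // Activity after the notice was sent means the user came back.
            if (lastActivityAt.getTime() > flow.archiveNoticeEmailSentAt.getTime()) {
                return { ...flow, state: "discarded" }
            }
            // Assumption: deletion is due the next day (described as the most common case).
            return { ...flow, state: "deletion_eligible", deleteAfter: new Date(now.getTime() + DAY_MS) }
        }
        case "deletion_eligible":
            if (now.getTime() < flow.deleteAfter.getTime()) return flow
            return { ...flow, state: "completed" }
        default:
            // "completed" and "discarded" are terminal in this sketch.
            return flow
    }
}
```

Keeping the transition free of side effects makes it easy to test; sending emails and triggering the anonymization would happen around it, in the jobs that call it.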

These steps are performed by nightly cron jobs that query the database to retrieve all impacted accounts. If a user spontaneously requests the deletion of their data, their related UserDeletionFlow is immediately created in the deletion_eligible state, and the delete_after column is populated similarly to the passive deletion flow.

Anonymization Process

When a user account gets deleted, we immediately create N DataDeletionAttempt for N user “areas”, and trigger asynchronous jobs to actually perform these DataDeletionAttempt. We have designed several data erasers, each taking care of anonymizing a specific area of the user’s data. They fall into 2 categories:

Internal erasers

Each of these bears the responsibility to anonymize one specific area of the user data stored in our database. For instance, there is an eraser for the user’s identity (users.first_name, users.last_name, users.birth_date,…), another one for the user’s cars (cars.registration_plate, cars.vehicle_identification_number, etc.). The erasers anonymize the data by replacing them with placeholders, either static or randomly generated, so that the database constraints and referential integrity are respected.

And since we’re using PaperTrail on some models to keep an audit trail of the changes that are applied, these erasers also have the responsibility to anonymize the versions that tracked some personal data changes.

Finally, some of these erasers remove the possible files that were stored for the deleted account.

Third party erasers

These erasers are clients of our 3rd party providers’ APIs, and call their users endpoints to request the deletion of the user’s data.

All erasers are run asynchronously, and some of the 3rd party erasers need a personal identifier from our database. For instance, to erase a user’s history on Zendesk, we first need their Zendesk identifier, which we can get by searching for the user’s email on Zendesk API. But it can happen that the user’s email has already been erased when the Zendesk eraser runs. To address this situation, we denormalize some deletion arguments into the related DataDeletionAttempt. When the eraser succeeds, this denormalized data is of course nullified to guarantee the full removal of personal data.
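As a rough sketch of that denormalization (in TypeScript, with hypothetical field and function names, and a synchronous callback standing in for the third-party API client):

```typescript
// Hypothetical shape of a deletion attempt; field names are illustrative.
interface DataDeletionAttempt {
    area: string
    status: "pending" | "succeeded" | "failed"
    // Copied at creation time so the eraser does not depend on data
    // that another eraser may already have anonymized.
    deletionArguments: { email: string } | null
}

function createZendeskAttempt(user: { email: string }): DataDeletionAttempt {
    return { area: "zendesk", status: "pending", deletionArguments: { email: user.email } }
}

// eraseRemote stands in for the third-party client; it returns true on success.
function runAttempt(
    attempt: DataDeletionAttempt,
    eraseRemote: (email: string) => boolean,
): DataDeletionAttempt {
    if (attempt.deletionArguments === null) return attempt
    const succeeded = eraseRemote(attempt.deletionArguments.email)
    // On success the denormalized personal data is nullified.
    // Assumption: on failure it is kept so the attempt can be retried.
    return succeeded
        ? { ...attempt, status: "succeeded", deletionArguments: null }
        : { ...attempt, status: "failed" }
}
```

The attempt carries everything the eraser needs, so the order in which the asynchronous erasers run no longer matters.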

If any data eraser fails for any reason, we are notified on our bug tracking system, and we make sure to address the situation.

This solution took some time to implement and still has room for improvement, but we are satisfied with the upside it already brings. It’s fully automated, flexible and easy to maintain. More importantly, we take some pride in continuously working on improving our compliance with European regulation requirements and make sure we provide a platform which is respectful of our users’ privacy.

MySQL 8 Features

Michael Bensoussan — Fri, 05 Nov 2021 00:00:00 +0000

MySQL 8 was released in 2018 and is the next release after 5.7.
Since 2018, there have been 27 minor versions, bringing the latest version to 8.0.27 as of today. This is important because MySQL brought a lot of feature enhancements in these minor versions, as we’ll see in the next part.

Common Table Expressions (CTEs)

A Common Table Expression (also known as WITH query) is a named temporary result set.
It exists only in the scope of a single statement and can be later referred within that statement.
You create a CTE using a WITH query, then reference it within a SELECT, INSERT, UPDATE, or DELETE statement.

CTEs make a query more readable, allow you to better organize long queries, and better reflect human logic (like functions do). They are particularly useful when you need to reference a derived table multiple times in a single query. There is also a specific category of CTEs called recursive CTEs that are allowed to reference themselves. These CTEs can solve problems that cannot be addressed with other queries.

This feature is available as of MySQL 8.0 and some edge cases have been handled in 8.0.19 (recursive SELECT with LIMIT clause).
Documentation is here.

Window Functions

PostgreSQL’s documentation does an excellent job of introducing the concept of Window Functions:

A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities. Behind the scenes, the window function is able to access more than just the current row of the query result.

Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:

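MySQL comes with the following window functions: CUME_DIST(), DENSE_RANK(), FIRST_VALUE(), LAG(), LAST_VALUE(), LEAD(), NTH_VALUE(), NTILE(), PERCENT_RANK(), RANK() and ROW_NUMBER().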

Window functions probably deserve an article of their own. You can find one here. This feature is available as of MySQL 8.0.
Documentation is here.

Expressions as Default Values

MySQL now supports the use of “expressions” as default values. Expressions are distinguished by the use of parentheses. BLOB, TEXT, GEOMETRY, and JSON data types can be assigned a default value only if the value is written as an expression, even if the expression value is a literal.

This feature is available starting MySQL 8.0.13. Before 8.0.13, the only expression supported was CURRENT_TIMESTAMP.
Documentation is here.

Indexing key parts

MySQL now supports functional key parts, which index expression values rather than column values or column prefixes. Using functional key parts allows you to index values that are not directly stored in the table.

This feature is available starting MySQL 8.0.13. Prior to 8.0.13, you could achieve the same result by using virtual columns and indexing them but this is clearly way more straightforward.
Documentation is here.

Descending Indexes

MySQL now supports descending indexes (DESC). Previously, indexes could be scanned in reverse order but at a performance penalty. A descending index can be scanned in forward order, which is more efficient.

Descending indexes also make it possible for the optimizer to use multiple-column indexes when the most efficient scan order mixes ascending order for some columns and descending order for others.

This feature is available as of MySQL 8.0.
Documentation is here.

Invisible indexes

MySQL now supports invisible indexes. An invisible index is not used by the optimizer, but is otherwise maintained normally.

It’s basically a toggle for indexes.

An invisible index can be used to quickly test the impact of removing an index on query performance, without actually dropping and rebuilding it. If the index turns out to be needed, it can simply be made visible again. This is especially useful on large tables, where dropping and re-adding an index is expensive and can even affect the normal operation of the table.

This feature is available as of MySQL 8.0.
Documentation is here.

EXPLAIN ANALYZE Statement

This statement provides expanded information about the execution of SELECT statements in TREE format. This includes startup cost, total cost, number of rows returned by iterator, and the number of loops executed.

The cost is an arbitrary unit but it is consistent between queries and usually a good proxy to answer the question “is this query faster than this other query?”.

This feature is available in 8.0.18.
Documentation is here.

What else?

MySQL 8 comes with tons of other features like:

Better UTF8 support
New “role” system allowing you to grant/revoke permissions for groups of people
User comments and user attributes
JSON enhancements ( ->> operator, JSON_PRETTY(), merge function, aggregation functions, JSON schema validation draft 4 …)
Better regexp support
…

And of course performance and stability improvements.

The full list of changes is here.

Your job is not just to write good code

Michael Bensoussan — Fri, 25 Jun 2021 00:00:00 +0000

This is an opinion piece.

I usually write about technical subjects but I recently wanted to formalize my opinion on the role of the software engineer and decided to publish it in the hope it could benefit some readers or trigger interesting discussions.

TL;DR

You have two jobs:

Write good and maintainable code
Make it easy to work with you

Writing good code is the easy part; it has challenges of course but most engineers struggle with the second part. That’s what I’ll be talking about.

There’s a famous quote attributed to Phil Karlton:

There are only two hard things in Computer Science: cache invalidation and naming things.

That quote is certainly true while purely discussing software engineering but if we expand to areas that use computer science, it’s clear that communication wins as the hardest part.

Let’s try to illustrate that point with some examples.

Code reviews

Reviewing someone’s work is a personal process, and criticism, whether it’s constructive or not, can be difficult to digest. But it’s not just about practicing good “etiquette”—this is the easy part.

It’s also important to convey intention and adapt to your audience. Let’s imagine a scenario with Junior Billy writing some code and Senior Bob doing a review using the following code:

if ok {
  return 32
} else {
  return 36
}

Senior Bob: These numbers are a bit puzzling!

To which Junior Billy replies with the updated code:

if ok {
  return "OK"
} else {
  return "NOT_OK"
}

And a whole bunch of updates everywhere the function was called.

Was it necessary? Maybe. Was it needed for this PR? Probably not.

Senior Bob could have replied with:

Senior Bob: These numbers are a bit puzzling. Maybe to make it a bit clearer you could introduce either a named variable to convey the meaning in this function or a constant to be used elsewhere in the code at a later stage or in a follow-up PR if you have the time.

Senior Bob could even give a code example; it would cost him less than a minute and would help Junior Billy grow and not lose hours of his/her time on a wrong rewrite.

Going the extra mile in code reviews always pays and is another example where making sure everyone is on the same page is beneficial.

Note: Some people like to use an Emoji legend to convey intention. This is especially useful when you or the person being reviewed have not completely mastered the language you’re doing the review in. In this case, codifying intention is a good approach to avoid misunderstandings.

Meetings

A lot of things can go wrong in a meeting, so communication is key here. I won’t create a list of all the mistakes we, as developers, make. Instead, let’s focus on three points I see a lot of people struggling with:

Your audience matters

The importance of identifying the target audience is true for all kinds of meetings. But as developers, we have to learn to simplify concepts and keep things simple.

The next time you are in a meeting, take a step back and put yourself in the other person’s shoes. Think about how someone with no technical knowledge would perceive the information you are about to share with them.

🙅‍♂️ Don’t say, “This will probably raise the mean time by 2000ms for the whole website because the transaction will generate a table lock.”

👍 Say, “This is going to lead to a significant performance issue, which could slow down the system.”

This comes with some practice but you can practice with your spouse, friends, and family. This is key to being an effective communicator.

You can and probably should still teach your non-tech coworkers some “crucial” tech terms and, in some companies it is part of the company glossary or part of the onboarding (terms like “front-end”, “backend”, “iterate” or “mvp”, for example). Similarly developers should probably learn some business vocabulary to understand their coworkers’ jobs and challenges like “churn” or “ltv”.

Listening

This may sound ridiculous to some, but communication is not just about presenting ideas. Careful listening requires energy and can be difficult and, in certain situations, boring or unpleasant, but that doesn’t mean you can’t do it.

Avoid multitasking even if you can multitask or think you can. Be patient and hold your tongue.
And try to interpret what’s being said.

Based on the purpose of your discussion, what the speaker cares most about, or what’s been said before, what does this mean? Use what you know to interpret what they’re trying to tell you, and ask questions if you need more details.

Rephrase your understanding if needed.

Also, assume the best. Assume that the intention of the person opposite you is good and clever, even if at first you understand it as stupid and offensive. Again, repeat and rephrase.

Meetings would go faster if people took the time to listen.

Careful listening also applies to written communication: read twice before answering, put the message in context and don’t hesitate to ask questions before forming a (strong) opinion.
Double-check what you wrote before hitting send as well; clearing up an ambiguity is often more painful in a written discussion.

Making sure everyone is on the same page

Because of a failure to take the first two points into account, but also for countless other reasons, I have often witnessed people leaving a meeting with a different understanding of the next steps. And you might discover this days, weeks or even months later!

A good reflex is to always have a “Scribe”, someone taking notes during a meeting with a specific section for action items and their owners in the meeting notes. I also like to keep the last part of meetings to focus on the recap and the next steps:

So we only have five minutes left, let’s stop here. What are the next steps?

And enumerate each of them with the people who own them.

Sometimes, it also makes sense to repeat and rephrase something to make sure everyone has the same understanding.

Let’s imagine a scenario with Developer Jim and Product Manager Laura:

Developer Jim: It should take two days.
Product manager Laura: Alright, so it’s Tuesday, I guess it should be ready by Thursday night for a release Friday morning?
Developer Jim: Ah, no, I meant two days of full work. But Wednesday afternoon I do pair programming with Junior Billy on his assigned task and Thursday afternoon I have my performance review. I’d therefore like to avoid releasing on Friday. It should probably be released Monday morning!

Here, Laura had the right reflex: rather than silently assuming that two days meant two calendar days, she rephrased it out loud, which made sure she and Jim had the same understanding of the situation.

Communication is hard but going the extra mile always pays. Communication helps to build strong and healthy relationships, so don’t underestimate its power.

Most of the difficult problems engineers face include both technical and human components, and the greatest engineers can address both.

What's a good team process?

Eric Favre — Thu, 03 Jun 2021 00:00:00 +0000

You may have heard about the five monkeys experiment, a cautionary tale sometimes used to illustrate how we can get locked up in an organisational harness without sufficient hindsight, power or leeway to change the way things are.

This comparison may sound unflattering, but I have seen similar situations in previous lives, where a process I don’t understand is enforced, and I simply end up complying and assimilating it as normal. No one is really responsible for that process, it may not even be relevant anymore, but it’s still there and is so much part of the habits that nobody thinks of questioning it, or the ones who do get dismissed with a “that’s how it is”.

Of course not all processes are like this. All organisations have frictions, and introducing processes or rituals is a sensible answer to some of them. When well thought out, good processes can avoid outages, enforce continuous improvement, ease newcomers’ onboarding, pay down tech debt… Bottom line, they can help save time, secure the future and improve customer satisfaction.

So we’ve tried to formalise below a few basic rules that should help build a good team process.

A good process is fully understood

People will apply it more easily if they understand where it comes from. Making sure everyone knows what the process achieves, for whom, and the context in which it emerged will greatly ease the team’s adherence to it. It’s therefore the manager’s responsibility to carefully onboard a newcomer on the existing processes, so they can adhere to them.

For instance, in a consulting company, when you’ve gotten a glimpse at the work of an accountant who needs to bill the clients at the end of each month, you’ll be much more inclined to fill your timesheets thoroughly.

A good process emerges from the team who’ll apply it

A corollary to the previous point is that a process will be even better understood if the team that ends up applying it contributes to its definition. Even if the initial need is not theirs, it’s much more empowering and more likely to succeed if the team members actually come up with the solution by themselves. The underlying need must first be stripped of any assumed solution. It can then be well explained and discussed beforehand, so that the team can fully appreciate what it’s about, and come up with the best solution to address it in their day-to-day context. Sometimes the solution may even appear to be much easier and more definitive than expected.

Team retrospectives are often a good opportunity to identify a friction that could be addressed by a new or updated process. It’s also ok to get inspiration from elsewhere, but beware of cargo cult. Don’t parachute new tools or processes into your organisation because you’ve heard about the results they’ve achieved somewhere else. Understand what they’re trying to achieve and how to adapt them to your specific context.

A good process is challengeable

A process addresses a need at a given time. This need, and its context, will very likely evolve, possibly to the point where the process is no longer the best answer to a changed situation. So every now and then, it’s healthy to discuss it, reassess its relevance, improve it or remove it altogether. A newcomer’s arrival in the team, providing a fresh perspective and some different experience, is often a good opportunity to challenge the status quo and make sure the team still has the best-fitting processes with regard to their tasks.

For instance, we have daily stand-ups with the whole dev team, so we can keep up with the other squads’ ongoing work, identify efforts that can be shared, and ask for / offer help. This used to be done in a big conference room with the few remote workers connected to a Meet, and that worked pretty well at that time. After the first Covid lockdown and related furlough, though, some of us were working part time, most of us fully remotely, and this format soon proved ill-suited to the new situation. Instead of forcibly maintaining this ritual as is, we iterated on new formats until we reached the current one, where we have synchronous, fully online stand-ups twice a week and async written stand-ups on the other days. That works well now, and may change again as the situation evolves.

A good process is well tooled

It can already feel cumbersome enough to comply with some processes, so you should make sure you do everything possible to make it as seamless as possible. Automate everything you can, make sure the team members are reminded of the process when the time is right, document all useful resources, links and details in the reminder, and provide all the possible tooling that can help complete the process in an automated fashion. The automation also saves the manager or the stakeholder the painful task of manually chasing up the people involved in the process. Ideally the tooling is also flexible and accessible enough for the team to update and improve it by themselves as they get more familiar with the process.

The tooling must be helpful, and thought out to support the process; the process must not be bent to suit the tool. Most recent software exposes APIs to ease custom tooling. As an example, we built many custom integrations between Slack, Github, Bugsnag, New Relic, Google Suite… so that most unnecessary overhead is automated, and only the human added value is required from the teams.

The key take-away here is for the team to own the processes, instead of letting them own the team. Start with the need, get the people who will end up addressing this need to help, figure out a solution and its tooling together, and iterate over it whenever the necessity arises.

What I learned in two years at Getaround

Rémy Hannequin — Tue, 20 Apr 2021 00:00:00 +0000

I joined Getaround, which was still named Drivy back then, two years ago. My previous and most extended professional experience had an internal organization that did not allow me to code full time, so many of my technical projects were actually side projects working alone.

Although I could choose my topics and constraints, working alone does not always help to learn good practices and tips that make a developer efficient and aware of the different technical challenges.

A few weeks ago, I took the time to understand how working in a (brilliant) team made me progress so much, not only as a Ruby developer but as a “Tech”. Here are a few topics that I learned or progressed on in the past two years.

`tap` and `then`

I love Kernel#tap because it lets me compose objects with conditions without having to add multiple conditional blocks.

def elements
  arr = [first_element]
  arr << second_element if available?
  arr
end

# versus

def elements
  [first_element]
    .tap { |arr| arr << second_element if available? }
end

We can argue that this is neither necessary nor more performant, but most of us love Ruby because it lets us write concise and straightforward code. I feel more comfortable with less procedural code.

On the same topic, we also have Kernel#then which is comparable to tap but returns the result of the block. This is very helpful when building conditional requests without having to add big if blocks:

Order
  .where(country: :fr)
  .then { |relation| completed? ? relation.order(:completed_at) : relation }
  .last

then is just an alias for yield_self: yield_self was introduced in Ruby 2.5, and the then alias followed in Ruby 2.6.
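
To make the difference between the two concrete, here is a small standalone sketch in plain Ruby (no Rails). The values and the `descending` flag are made up for the illustration.

```ruby
# tap yields the receiver to the block and returns the receiver itself,
# so the block is only useful for its side effects.
tapped = [1, 2].tap { |arr| arr << 3 }

# then yields the receiver and returns the block's result instead.
summed = [1, 2].then { |arr| arr.sum }

# A conditional step in a chain: hand the receiver back untouched when
# the condition is false. `descending` is a hypothetical flag.
descending = true
sorted = [3, 1, 2]
  .sort
  .then { |arr| descending ? arr.reverse : arr }

p tapped # [1, 2, 3]
p summed # 3
p sorted # [3, 2, 1]
```

Forgetting the `: arr` branch in the last block would make the chain return nil when the condition is false, which is the main thing to watch for when replacing tap with then.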

Active Record Transactions

Transactions enforce the integrity of the database by wrapping several SQL statements into one atomic action. I find them not only useful but sometimes even essential. In some cases, you have to ensure several changes were made successfully or to cancel them all.

The following example is quite explicit about how convenient transactions are:

ActiveRecord::Base.transaction do
  @order.cancel!
  @car.available!
  create_ticket(@car)
end

If ticket creation were to fail by raising an exception, I am sure not to leave the car available or the order canceled, since the transaction will roll back.

Design Patterns

Command pattern

A lot of articles exist about this topic on the web. We even wrote about it a while ago in our Code Simplicity series by Nicolas.

The command pattern is a great way to extract business logic from controllers or even models, stick to the single-responsibility principle and share a common API for service objects.

Like any pattern, though, this one must be used carefully because it cannot resolve every situation. Jason Swett even has an interesting point of view about the use of this pattern in the Rails community.
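
As an illustration, a service object following this pattern could look like the sketch below. The `CancelOrder` class, the `call` entry point, the `Result` struct and the `Order` struct are all assumptions chosen for this example; they are not taken from the articles mentioned above.

```ruby
# A minimal command (service object) sketch in plain Ruby.
# Order, CancelOrder and Result are hypothetical names.
Order = Struct.new(:state)

class CancelOrder
  Result = Struct.new(:success, :error) do
    def success?
      success
    end
  end

  # Shared entry point: every command is invoked the same way.
  def self.call(*args)
    new(*args).call
  end

  def initialize(order)
    @order = order
  end

  # The single responsibility of this object: cancel one order.
  def call
    return Result.new(false, "already canceled") if @order.state == :canceled

    @order.state = :canceled
    Result.new(true, nil)
  end
end

order = Order.new(:pending)
first = CancelOrder.call(order)
second = CancelOrder.call(order)

puts first.success? # true
puts second.error   # already canceled
```

Returning a small result object rather than raising is only one possible convention; the point is that every command exposes the same `call` interface.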

Form objects pattern

A form object is a simple class that handles logic from a form submission. This class can be associated with the command pattern to share a common API with multiple form objects in your app.

Not only does this pattern allow you to extract business code from the controller and make it more testable, but it is also a great way to have different validations for the same model. You cannot always share a common form or even common validations depending on your action, for instance, when handling a user account. The rules applied to form parameters in a user registration are not the same as an account update.

Take the terms of service for instance. You probably want to ensure a terms_of_service parameter is present and true when signing up, but this requirement is unnecessary for a user updating her account. Having multiple form objects depending on the feature is a great help for this.

Jean also wrote about it on our blog a few years ago.
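
The terms-of-service case can be sketched in plain Ruby as follows. The class and attribute names are hypothetical, and the validation is hand-rolled to keep the example self-contained; a Rails application would more likely rely on ActiveModel validations.

```ruby
# Two form objects validating the same user data with different rules.
# BaseUserForm, SignUpForm and AccountUpdateForm are hypothetical names.
class BaseUserForm
  attr_reader :params, :errors

  def initialize(params)
    @params = params
    @errors = []
  end

  def valid?
    @errors = []
    validate
    @errors.empty?
  end

  private

  # Rules shared by every user form.
  def validate
    errors << "email is missing" if params[:email].to_s.empty?
  end
end

class SignUpForm < BaseUserForm
  private

  # Signing up additionally requires accepting the terms of service.
  def validate
    super
    errors << "terms of service must be accepted" unless params[:terms_of_service] == true
  end
end

class AccountUpdateForm < BaseUserForm
  # No extra rule: terms_of_service is not required when updating.
end

params = { email: "alice@example.com" }
puts SignUpForm.new(params).valid?        # false
puts AccountUpdateForm.new(params).valid? # true
```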

Facade pattern

The Facade pattern is well suited to (among other things) decoupling business code from third-party code.

Let’s take the example of a third-party web API providing its own gem.

car = GreatApi::Car.fetch(id)

Using it directly can sometimes be less maintainable, as you don’t own its public API and are vulnerable to changes. What if you need to update this great_api gem for security reasons, but the gem changed its Car::fetch method to Vehicle::get? You would need to change every occurrence of GreatApi::Car::fetch in your business code to handle this breaking change.

Building a gateway around the gem ensures that you own the interface your code depends on and that third-party code is encapsulated in one single place.

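class Getaround::Gateway::GreatApi
  class << self
    def get_car(id)
      # The leading :: always targets the gem's top-level GreatApi,
      # never this gateway class that shares its name
      ::GreatApi::Car.fetch(id)
    end
  end
end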

car = Getaround::Gateway::GreatApi.get_car(id)

Tell, don’t ask

I try to remember the “Tell, don’t ask” principle when designing a brand new object to keep in mind what OOP is about: designing objects that are able to interact. Therefore an object should itself describe its behavior, rather than having a program ask what it is composed of in order to predict its behavior.

This example from thoughtbot’s blog is quite explicit:

# Instead of asking the system monitor for temperature
# in order to trigger an alarm

def check_for_overheating(system_monitor)
  if system_monitor.temperature > 100
    system_monitor.sound_alarms
  end
end

# Let it internally handle the rules (attributes)
# and trigger the alarm (behaviour)

class SystemMonitor
  def check_for_overheating
    if temperature > 100
      sound_alarms
    end
  end
end

system_monitor.check_for_overheating

Gems

Delayed deprecations

Sometimes we want to ship fast, but we still want to ship well. There are some cases where we want to release code that is meant to be temporary, or to be reminded to monitor some behaviors once a feature has been live for a few weeks.

Temporary code is often associated with forgotten code and then technical debt, if not bugs. But delayed deprecations are a great way to keep a codebase clean month after month.

DelayedDeprecation.new("Only for April fools day",
  reconsider_after: Date.new(2021, 4, 1),
  owner: "Alice",
)

Deprecations trigger notifications to their owners for both Ruby and JavaScript code. This can be useful to remind you to clean up a piece of code.

I enjoy using them because they help me stay efficient while maintaining a clean codebase.

Feature flipper

Another game-changer for our velocity and confidence when shipping new features is the feature flipper. It enables us to make some features available for a percentage of users or a percentage of time.

This is particularly useful to test changes and measure their impact without risking changing habits for all our users. If we need to urgently cancel a feature - because Murphy’s Law is always lurking - we can do so without deploying an urgent fix to hide it.

It doesn’t prevent us from being cautious and striving to ship high-quality tested code. Still, we are far more confident in ourselves when we know we can quickly handle unpredicted behaviors.
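
The article does not describe how the feature flipper is implemented, so the following is only a sketch of one common way to enable a feature for a percentage of users. The `FeatureFlag` class and its `enabled_for?` method are hypothetical names; the technique shown is hash-based bucketing with `Zlib.crc32` from the standard library.

```ruby
require "zlib"

# Hypothetical percentage-based feature flag. Hashing the feature name
# together with the user id gives each user a stable bucket in 0..99,
# so the same user gets the same answer on every request.
class FeatureFlag
  def initialize(name, percentage:)
    @name = name
    @percentage = percentage
  end

  def enabled_for?(user_id)
    bucket = Zlib.crc32("#{@name}:#{user_id}") % 100
    bucket < @percentage
  end
end

flag = FeatureFlag.new("new_checkout", percentage: 25)
enabled = (1..1000).count { |id| flag.enabled_for?(id) }
puts "enabled for #{enabled} of 1000 users"

# A percentage of 0 acts as a kill switch: nobody sees the feature.
off = FeatureFlag.new("new_checkout", percentage: 0)
all_off = (1..1000).none? { |id| off.enabled_for?(id) }
puts all_off # true
```

For the kill switch to work without a deploy, the percentage would have to be read from a runtime store (a database row or a cache key, for instance) rather than hard-coded as it is here.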

Timecop

Quite often, we have to test behaviors in the past or the future. Sometimes we also want to test a feature without time variation, for example, to avoid test flakiness.

Timecop is the perfect tool for this with a simple and comprehensive API.

context "when rental is ended for a month" do
  it "still respects some fundamentals rules" do
    Timecop.freeze(1.month.from_now) do
      # ...
    end
  end
end

context "when booking failed 5 minutes ago" do
  before do
    Timecop.travel(5.minutes.ago) do
      booking.failure!
    end
  end

  it "created a notification" do
    # ...
  end
end

Good practices

Zero downtime migrations

Zero downtime migrations are a pretty common thing, but to be honest, I never had the chance to work with this process before working at Getaround.

The rule of thumb is to ensure any migration being deployed is compatible with the code already running. For instance, you cannot deploy at once a migration renaming a table’s column and the code handling the new column name. There is a very high chance that someone will run the app while the migrations haven’t been run yet and the column name doesn’t refer to anything yet.

Simple caution must be taken with multiple deployments such as:

Add a new column with the new name
Ensure both old and new columns are equally filled
Back-fill data from the old column to the new one
Stop using and referring to the old column
Ignore the old column
Remove the old column

Specs to cover future changes

Writing decoupled and reusable code is great. Ensuring this code will be properly used by others is even better. When I write code that can be shared or with variable data, I try to make sure nobody can add use cases that would break my code.

Let’s take an example: I am adding a state attribute to Car and I want to localize each state.

en:
  activerecord:
    attributes:
      car:
        states:
          active: Active
          deactivated: Not available

This is great, now I am able to use I18n.t("activerecord.attributes.car.states.#{car.state}").

But what if two months later, another developer adds a pending state? It would break when somebody runs my code with a pending car, and I would only be warned about it when facing the bug itself.

To avoid this situation, when adding the new state attribute, I also add specs to ensure all states have an associated translation:

Car.states.each do |state, _|
  it "has an associated translation for #{state}" do
    # raise_error needs a block, and raise: true makes I18n raise on a missing key
    expect {
      I18n.t("activerecord.attributes.car.states.#{state}", raise: true)
    }.not_to raise_error
  end
end

RSpec mocking

Some people love it, some people don’t; in either case, we have to admit RSpec mocking is quite powerful. My perspective on the subject is to avoid mocking in feature specs as we want to stay as close as possible to a real-world example. When I have too much mocking to do to test a method, I probably need to think about the method/class dependencies.

Anyway, if you decide to use mocking, RSpec is a sweet candy. It helps you write difficult test cases with complex dependencies without instantiating tons of real objects and data.

Let’s take an example where I need an object to return a particular value. But this method has complex rules to return this value:

def pro?
  validated? && bank_account.allowed? && country.enables_pro? && electric_vehicles.any? # && ...
end

It is expensive to write and compute all requirements for this method, and I may even want it to return different results. I don’t want to be validating this pro? method either; this is not the purpose of my test. With mocking, I can allow this object to receive this method and return the value I need for my test, instead of the value it would have returned with its default state.

context "when owner is pro" do
  before do
    allow(owner).to receive(:pro?).and_return(true)
  end

  # ...
end

Once again, let’s not forget that this powerful tool must be used cautiously; if an object is hard to unit test, maybe it is too tightly coupled with another object, or the abstraction is wrong.

Auto document a base class with `NotImplementedError`

Finally, you want to ensure the next developer, who could be yourself 6 months from now, uses your base class properly. When creating a base class meant for inheritance, you may need its children to implement a method.

Raising the built-in NotImplementedError exception is a good way to ensure the method is implemented and to document it as necessary for child instances.

class Car
  def engine
    raise NotImplementedError, "#engine must be implemented on Car's children instances"
  end
end

class UrbanCar < Car
end

UrbanCar.new.engine
# => NotImplementedError is raised

Conclusion

I could add many more topics, even some simpler ones. With these tips, I am more confident now than I was 2 years ago, and I am looking forward to learning more in the years to come.

Of course, some may seem like common sense to you, or even unnecessary. I may also have forgotten good practices that look like a must-have to you. If so, please feel free to reach out to us and debate.

How we handle incidents at Getaround

Miguel Torres — Mon, 05 Apr 2021 00:00:00 +0000

At Getaround, like at any other company, we sometimes experience incidents that negatively affect our product.

A couple of weeks ago from the time of writing, I released a feature that contained a seemingly harmless SQL query that returned the total balance of a user. This calculation was previously made on the fly with Ruby every time a user loaded a page where it was needed. This was particularly problematic for owners with many cars, for whom the calculation was slow, sometimes causing timeouts. So the commit that I had deployed was meant to counter this problem by using a table that was built exactly for this purpose. Instead, it pushed CPU utilisation above 90% and slowed down all database queries, causing timeouts all over the site.

Finding out and responding

We use Grafana for monitoring, and we use a Slack webhook integration that lets us know when certain events happen. In this case, 13 minutes after my code went live, we got a notification in a dedicated Slack channel letting us know that there was a problem.

Automated Slack alert to let the team know there is a problem

After looking at the CPU utilization graph and some further investigation, my commit was rolled back.

Red dotted line just before 15:30 indicates the faulty deploy

Despite the rollback, MySQL was still busy running the existing queries and the CPU utilization did not diminish. But after some back and forth, and after telling the rest of the company what was going on in the perfectly named #war-room channel in Slack, the issue was under control in less than an hour 🎉

Red dotted line just before 15:30 indicates the faulty deploy

After the incident

At Getaround we keep a record of all the technical incidents that have happened, and each entry on the list contains a couple of things:

A description of the problem
The timeline of events
The root cause

This is also called a Postmortem, and it is an important step after an incident. The goal is to share knowledge with your colleagues and to prevent the incident from happening again as much as we can, all while acknowledging that incidents are a normal part of software development. It is essential that a blameless culture exists in the company, so that everyone can write freely and in detail about what went wrong and we can learn from our mistakes. The Postmortem for this incident in particular would look similar to this:

Post Mortem Example

Timeline

15:24 - A release was made containing the commit which included the slow query

15:37 - Team was alerted on Slack about a high CPU load

15:44 - The team identified the issue (high CPU load) to be related with the release at 15:24

15:46 - Commit rolled back

15:52 - After noticing that the CPU usage does not decrease, even after the rollback, it is identified that the db is still busy running the queries that it had enqueued

16:00 - Incident opened in New Relic (monitoring tool used at Getaround)

16:15 - Command launched to kill lingering db queries

16:23 - CPU load back to normal

23:48 - Incident on New Relic closed

Impact

User searches started timing out and there was an uptick of incidents on Bugsnag

What caused the incident

The combination of an underperforming query and the fact that it was used across many different places caused the overload.

The offending code was:

Balance
  .where(entity_type: 'User')
  .where(target_type: 'User')
  .where(entity_id: user_id)
  .group(:country, :entity_id)

The problem was not obvious at first, but after trying to understand what the query was doing with EXPLAIN, it turned out that the query was not taking full advantage of the indexes we had in place, which means it scanned far more rows than it needed to. After the query was optimised to take advantage of the indexes, the number of examined rows reported by EXPLAIN came down from 3468 to 4. So… yeah, big improvement.

Although we are able to objectively point towards the code that caused the incident, there are also other, more subtle factors that contributed to this code being overlooked and committed. For example:

This query is only used in the back office used only by administrators, where the volume is not as high as production and performance is not as big of a concern as it can be for other user facing code
The table used (Balances) was born as an attempt to speed up balance calculations, since previously balances were calculated every single time, and it was assumed that the fact that we were replacing a long ruby calculation with a SQL query would inherently be a performance boost

Mitigation

The rollback of the offending code caused the queries to stop enqueuing themselves on an already stressed database and the killing of lingering processes managed to solve the incident completely. After finding the ids of the processes to kill, the following command was executed:

cat ~/Downloads/ids-to-kill | while read -r a; do echo $a |  mysql --user <USER>  --host <HOST> --password=<PASSWORD> ; done

Permanent solution

After finding out that using a where condition for entities was overkill, the query was rewritten to take advantage of the existing indexes. A promising indicator that it was a good solution was that the rows to be examined dropped from 3468 to only 4.

Lessons learned and possible action items

Even for “low volume” queries (i.e. queries used only in admin), it is very important to make sure that queries are performant.
Tweak the internal Developer’s Checklist and/or the PR draft document to include soft reminders to think about performance (for example: running EXPLAIN on new queries and scanning through the output, or making sure that the queries used can leverage existing indexes, creating new ones if they don’t exist and will be heavily read).

Links and images

In the original Post-Mortem I added the links to relevant places like the Slack, or the New Relic incident link, but in this public version I’ll omit some of them 😬

First dotted line just before 15:30 indicates the faulty commit deploy and the one at 15:50 is the rollback deploy

Search Controller response time during the incident

Takeaways

Postmortems are a great practice that helps make the best of bad situations when they happen. Incidents are not a matter of whether they will happen, but of when. The best way to minimize their potential negative impact is for the team to be aware of potential pitfalls, and this requires that everyone feels free to go into detail about how their actions led to an undesirable outcome.

How we ran our last hack day

Michael Bensoussan — Wed, 24 Mar 2021 00:00:00 +0000

For the last few years, we did about one hackday a year where the whole team gathered together in Paris from different areas of France.

For one day, participants were given creative freedom to create a demo-able, team-based Getaround-related project. People try new technologies, explore new ideas, get coffee together and we all end up debriefing with a cold beverage and a cheese board 🧀🍻.

Writeup of our first hackday.

Despite the pandemic and the fact that we’re currently all working remotely, we didn’t see any reason to miss out on the fun. Thus, the first fully-remote Hackday was born!

The organization

A day is short and to come up with something meaningful, you need some organization.
When I say meaningful, I don’t mean some code you could put in production at the end of the day, but really anything that would make this day different from the others: work with a coworker you never work with and learn they like cats, discover and play with a new technology, or plant a seed for a future feature.

So, to make sure the day is used to its fullest the objective was mostly to have groups and ideas ready for d-day.
One month before the event, we created a slack channel for people to pitch ideas and find their crew.

We also had a spreadsheet accessible to the whole company - not only people participating - to submit ideas.

The format

We had 32 people participating from the tech and data teams split into 12 groups of 2 to 5 people.

We met in the morning on a Google Meet and, while people finished their breakfast, did a small round table to explain what we’d be working on during the day.

People then gathered in their own Meet to work together during the day. There also was a global “breakroom” to chitchat during breaks.

Later in the evening, we had a virtual beer to debrief the day and chill out 🍻.

The results

Some projects warrant a blog post in their own right, but in the meantime here is a quick overview and some demos.

📱 iOS App Clip

Jean-Élie created a proof-of-concept allowing anyone to scan QR codes off of in-street-cars and book them instantly 😍
He created an iOS App Clip so that users don’t need to download the full application.

🚗 Android Auto Companion application

Quentin created an Android Auto companion displaying trip information (return date and place, kms & fuel at checkin, assistance, …), navigation and notifications (missing check-in information, fuel before return reminder, …).

🖼 Cobalt Web IDE

Romain and Thibaud worked on a no-code React app connected to Android to build a full Android native page based on our API Driven UI and our design system ✨

♻️ “Green Search”

Emily, Alice, Rémy, Benjamin, Hugo and Camille (😅) built a car ecoscore and implemented it in our search pages to direct demand towards greener supply.

💎 Type signatures with Ruby

Miguel, Howard and Eric spent the day exploring type signatures with Ruby.
They played with Sorbet and rbs.

But also, all these other projects:

🧬 Use of a graph database dgraph to detect risky profiles
👀 Use of computer vision to autofill car listings
🤑 Create a car listing in the Blockchain with Ethereum
🍕 Foodaround, an app to share recipes between employees
💬 A notification center for our Android & iOS apps
🐦 A system to analyse, categorise and apply sentiment analysis on Twitter messages
🔬 NLP on user-generated content; categorize reviews and support tickets, and recommend macros to answer our clients

Slides from different projects presented at our Demo Day

Skills were sharpened, bonds were strengthened, beers were enjoyed - it was a blast!

Thanks for reading 🍻

Predicting slack emoji reactions with machine learning

Adrien Siami — Mon, 01 Jun 2020 00:00:00 +0000

Every year at Getaround, we (the engineering team) take part in what we call a “Hack Day”. We can work on a subject of our choice for a day, in a team of developers.

We can work on pretty much anything we want as long as it is remotely related to Getaround. It could be exciting beta features, or tooling to make our lives better. It does not necessarily need to be shippable.

This time, I wanted to work on something both fun and challenging. I always wanted to look into machine learning but never got the chance, so it didn’t take me long to find a fun topic to work on.

As a Slack emoji reactions power user, I thought building a bot that could react to Slack messages with a relevant emoji would be very ~~useful~~ funny.

Disclaimer: This approach is most likely far from good, this is the result of 3 full-stack engineers working for 8 hours on a topic they didn’t know anything about beforehand.

What are we building?

In the contextual menu of a message, we want a new action to trigger an emoji reaction.
The emoji reaction has to be relevant to the message being reacted to.
To build relevance, we are going to feed our team’s existing messages and emoji reactions into a machine learning model.

What tools are we going to use?

Ruby, our favorite Swiss Army knife.
A neural network, which after a bit of research seemed to be an easy and “good enough” solution.
ruby-fann to build the neural network in Ruby
stopwords-filter to remove stop words from the sentences
pragmatic_tokenizer to turn our sentences into a list of words, without punctuation.
ruby-stemmer to find the “stem” of the words in our sentences

Neural networks crash course

The following video helped us a lot to grasp the concepts of neural networks:

What I retained from this video, which may not be 100% correct but was enough to build this project, is as follows:

A neural network makes use of a graph data structure to predict a set of outputs from a set of inputs. You have to choose the number of inputs (only rational numbers) and outputs (between 0 and 1). Those are the input layer and the output layer. There are also one or more hidden layers in the middle, where the “magic happens”.

My rough understanding is that when you train a neural network, you basically try to find mathematical correlations between the inputs and the outputs: you more or less brute-force the coefficients that will transform your inputs into your outputs. There are also things such as the activation function that are taken into account.

Choosing the right number of inputs and outputs is essential, as is the number of hidden layers, and it depends a lot on the shape of your data and what you expect to get from it.

For our project, as inputs we have a list of words, and as output we want one emoji.

Since the number of inputs and outputs has to be fixed, here is what we decided to do:

We took the 200 most used stems as inputs; the value of each input is the number of times the stem appears in the message.
We took the 50 most used emoji reactions as outputs; each expected output is either 1 (the emoji was used as a reaction) or 0 (the emoji was not used as a reaction).
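
To make this encoding more concrete, here is a minimal JavaScript sketch of it. It is not the Ruby code from the project: the function names are hypothetical, and the stem and emoji lists are made up and much shorter than the real top 200 and top 50.

```javascript
// Hypothetical, shortened lists (the project used 200 stems and 50 emoji).
const topStems = ["deploy", "bug", "coffee", "thank"];
const topEmoji = ["rocket", "bug", "coffee", "pray"];

// One input per known stem: how many times it appears in the message.
function encodeInput(messageStems, vocabulary) {
  return vocabulary.map(
    (stem) => messageStems.filter((s) => s === stem).length
  );
}

// One output per known emoji: 1 if it was used as a reaction, 0 otherwise.
function encodeOutput(reactions, emojiList) {
  return emojiList.map((emoji) => (reactions.includes(emoji) ? 1 : 0));
}

// Stems and reactions that are not in the lists are simply ignored.
const input = encodeInput(["deploy", "bug", "bug", "friday"], topStems);
const output = encodeOutput(["bug", "tada"], topEmoji);

console.log(input); // => [ 1, 2, 0, 0 ]
console.log(output); // => [ 0, 1, 0, 0 ]
```

Whatever the message length, the vectors always have the same size, which is what the network needs.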

Importing the data from slack

The first step is to get enough data to work with. We used the slack-ruby-client gem to fetch messages from a selected list of channels, and we only kept the messages with emoji reactions.

We stored the message content and the emoji reactions in a JSON file.

Training the neural network

The interesting code is available here.

One important thing to do when working with machine learning is to check how well your model is doing. A common approach is to keep a certain amount of your data for testing purposes. That’s what we do in train.rake. We set 10% of the data aside in another file, and we don’t use this data for training. Later on, we can apply our model to this data and see if the results make sense.
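
The hold-out idea can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the content of train.rake: the helper names, the seeded shuffle and the sample data are all hypothetical.

```javascript
// Small seeded pseudo-random generator so that the split is reproducible.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

// Shuffle a copy of the samples, then keep `testRatio` of them apart.
function holdOutSplit(samples, testRatio = 0.1, seed = 42) {
  const rng = makeRng(seed);
  const shuffled = [...samples];
  // Fisher-Yates shuffle
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const testSize = Math.round(shuffled.length * testRatio);
  return {
    test: shuffled.slice(0, testSize),
    training: shuffled.slice(testSize),
  };
}

const samples = Array.from({ length: 50 }, (_, i) => ({ id: i }));
const { training, test } = holdOutSplit(samples);

console.log(training.length, test.length); // => 45 5
```

Shuffling before splitting avoids testing only on the most recent messages, which could be biased towards a few channels or topics.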

To be quite honest, we didn’t obtain a very good result statistically speaking. However, the emoji predictions were quite hilarious, so we decided to stick with it.

Plug it into slack

Then it’s just a matter of plumbing: we created a Slack app and made use of message shortcuts.

Each Slack message now has a link in the contextual menu. When clicking on this link, a request is sent to our app from Slack, with a payload containing information about the clicked message. Then, we clean the message with the same process we used for training, and we run it through the machine learning model.

Finally, we add a reaction to the message, using our bot API key.

Conclusion

Writing this bot was very fun and informative, and made us realize that machine learning concepts, although obscure to the uninformed eye, can be grasped pretty quickly.

If you know about machine learning, I’d love to know what the best way to do this would have been. Feel free to leave a comment in the gist I shared.

Bookkeeping

Nicolas Zermati — Sun, 01 Mar 2020 00:00:00 +0000

All companies handle money transactions. The law requires companies to maintain and publish records of those transactions. Those records are called books and the act of maintaining them is called bookkeeping.

At Getaround, our business model generates a lot of transactions. We receive and route customers’ money to various partners. For that part, we decided to build digital books into our platform so that it does its part in the company’s bookkeeping.

In this article, I’ll tell you about bookkeeping techniques from the 15th century and how close they are to programming concepts we use every day.

Double-entry-bookkeeping

This technique was formalized in the 15th century by Luca Pacioli. It had been in use well before that, though. Pacioli’s work has been translated by Jeremy Cripps in a text available online. In a nutshell, it is a simple, powerful, flexible, and mature accounting system. It describes various documents, called books:

the inventory,
the memorandum,
the journal, and
the ledger.

Those documents provide guidelines to record and control economic activities.

I’ll skip the inventory and introduce only the last three books.

Memorandum

This book keeps track of business operations in a chronological order. Each entry includes all the details related to the operation: « The who, what, why, how, when, and where need to be answered ».

An entry, page 53 of the Memorandum, could look like this:

You can see three entries:

John Doe paying for an upcoming rental,
Nicolas Z. being paid for this rental when it is done, and
Insurance being paid commissions for a bunch of rentals.

Each entry matches a different business operation.

Because merchants were not always around, their employees would write those entries in the Memorandum. The knowledge from the Memorandum will be transferred to the Journal and to the Ledger later.

Journal

Each Memorandum entry leads to at least one entry in the Journal. Each entry in the Journal represents a transaction. It has multiple lines and each line has:

an account, prefixed by To for debits and by By for credits,
a reference to the Memorandum page number (M-53),
a reference to the Ledger Account page number (L-000001-1),
an amount, either in the credit or in the debit column, and
more details about the transactions.

Here is page 34 of the Journal:

We have four entries:

John receives €100 from Stripe, following his credit-card payment,
John splits this €100 between each party involved in Rental #5043,
Nicolas Z. gives €60 to Stripe, to receive a bank transfer, and
Insurance gives €1,000 to Stripe too, to receive a bank transfer.

When an account gives money, we record a credit. When an account receives money we record a debit.

It can be a bit surprising to see that John starts by receiving €100. From John’s perspective, he just gave something. But from the merchant’s perspective, John’s internal account just received €100! Same when we want to pay €60 to Nicolas. It is €60 that are going out from Nicolas Z.’s internal account.

To ensure consistency, there is a single rule: for a transaction, the credits must be equal to the debits. This rule makes it easy to avoid mistakes like disappearance of resources.
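
As an illustration, this rule is easy to express in code. The sketch below is a hypothetical JavaScript model (the object shape and the isBalanced name are assumptions, not our actual implementation), using the split of John’s €100 with amounts in cents.

```javascript
// A transaction is valid only if its debits equal its credits.
// Amounts are in cents to avoid floating-point rounding issues.
function isBalanced(transaction) {
  const total = (side) =>
    transaction.lines
      .filter((line) => line.side === side)
      .reduce((sum, line) => sum + line.amountCents, 0);
  return total("debit") === total("credit");
}

// John splits his €100 between the parties involved in Rental #5043.
const split = {
  lines: [
    { account: "John Doe", side: "credit", amountCents: 10000 },
    { account: "Nicolas Z.", side: "debit", amountCents: 6000 },
    { account: "Insurance", side: "debit", amountCents: 2000 },
    { account: "Assistance", side: "debit", amountCents: 1000 },
    { account: "Getaround", side: "debit", amountCents: 1000 },
  ],
};
console.log(isBalanced(split)); // => true

// Forgetting one party makes €10 disappear, and the check catches it.
const broken = { lines: split.lines.slice(0, 4) };
console.log(isBalanced(broken)); // => false
```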

Ledger

The Ledger has many sections representing Ledger Accounts. Each account has a reference. In this example, I used a number (#123456) and a name.

The goal of a Ledger Account is to keep track of operations that happen on that specific account. The choices of the accounts will have an impact on what we’ll be able to monitor with the Ledger.

Each entry in the Ledger is coming from a line in the Journal. It has:

a reference to the Journal page number (J-34),
a description mentioning the other party from the Journal transaction and
an amount, either in the credit or in the debit column.

The sum of debits, from all Ledger Accounts, must be the same as the sum of credits.

Here are our Ledger Accounts:

John Doe received €100 from Stripe and then gave it all to other accounts.
Nicolas Z. received €60 from John Doe and then gave it all to Stripe.
Stripe gave €100 to John Doe, received €60 from Nicolas Z., and received €1,000 from the Insurance.
The Insurance received €20 and then gave €1,000 to Stripe.
Assistance and Getaround each received €10 from John Doe.

With that information, we can tell that:

we are good with John Doe and Nicolas Z., we don’t owe them, they don’t owe us,
we owe Stripe €960 (€60 + €1,000 received, minus €100 given), that’s most likely an anomaly because transactions are missing,
Insurance owes us €980, but that’s the same kind of anomaly,
we owe €10 to the Assistance provider, they haven’t been paid, and
we owe €10 to ourselves, Getaround, they haven’t been paid either.
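
To show how the Ledger can be derived from the Journal, here is a small JavaScript sketch. The journal lines are reconstructed from the four entries described earlier, and the data shape and the ledgerBalances name are hypothetical. A positive balance (more debits than credits) means we owe money to the account, a negative one means the account owes us.

```javascript
// Journal lines, in euros, reconstructed from the four Journal entries.
const journalLines = [
  // John receives €100 from Stripe
  { account: "John Doe", debit: 100, credit: 0 },
  { account: "Stripe", debit: 0, credit: 100 },
  // John splits the €100 between the parties involved in the rental
  { account: "John Doe", debit: 0, credit: 100 },
  { account: "Nicolas Z.", debit: 60, credit: 0 },
  { account: "Insurance", debit: 20, credit: 0 },
  { account: "Assistance", debit: 10, credit: 0 },
  { account: "Getaround", debit: 10, credit: 0 },
  // Nicolas Z. gives €60 to Stripe
  { account: "Nicolas Z.", debit: 0, credit: 60 },
  { account: "Stripe", debit: 60, credit: 0 },
  // Insurance gives €1,000 to Stripe
  { account: "Insurance", debit: 0, credit: 1000 },
  { account: "Stripe", debit: 1000, credit: 0 },
];

// Project the Journal into one balance per Ledger Account (debits - credits).
function ledgerBalances(lines) {
  const balances = {};
  for (const { account, debit, credit } of lines) {
    balances[account] = (balances[account] ?? 0) + debit - credit;
  }
  return balances;
}

const balances = ledgerBalances(journalLines);
// Debits and credits must cancel out across all accounts.
const total = Object.values(balances).reduce((sum, b) => sum + b, 0);

console.log(balances["John Doe"], balances["Assistance"], total); // => 0 10 0
```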

Much more

There is much more to say about this system but that’s more or less all I need for the rest of this article. If you’re interested there is still Jeremy Cripps’ interpretation of Luca Pacioli’s writing.

What about tech?

Frameworks

The first thing that I admire in that work is that it is generic. No matter the nature of your business, those guidelines can help you to solve a technical problem. The solution will adapt to the specifics of your business.

It really looks like a framework like Ruby on Rails. If you follow the conventions, you will benefit from the work of a whole community and from the wisdom of those who tried to solve the same problems before you.

Databases

Those books I described are very close to a database.

The reference system that ties entries from different books together is comparable to foreign keys. Information is split between books, sometimes denormalized, sometimes referenced.

A Ledger represents the same information that we have in the Journal but optimized for different use-cases. A Ledger is like a projection, like a materialized view. It creates read-models optimized for certain use-cases, certain consistency checks.

Decoupling

The Memorandum and the Journal are used by different people and for different purposes. There is a context for business operations and another for the accounting. That looks like some Domain-Driven-Design to me.

We don’t want the Memorandum to be impacted by the Journal’s unavailability. We don’t want to need the Memorandum when searching for information in the Journal. And finally, we don’t want to have them synchronized all the time. We’re talking about asynchrony and eventual consistency here!

Immutability

One of the important properties of those books is that entries were immutable. Each book is an append-only log of events and transactions. This looks very much like an event-driven architecture.

When there is some mistake or something to update, it has to be through compensating transactions rather than amending past transactions.
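
A compensating transaction can be sketched as follows. This is a hypothetical JavaScript model (record, compensate and the entry shape are assumptions): a wrong entry is never edited, it is reversed by a new entry and then recorded again correctly.

```javascript
const journal = [];

// Entries are frozen and only ever appended to the journal.
function record(entry) {
  const frozen = Object.freeze({ id: journal.length + 1, ...entry });
  journal.push(frozen);
  return frozen;
}

// A mistake is fixed by appending the opposite entry, not by editing the past.
function compensate(entry) {
  return record({
    debit: entry.credit,
    credit: entry.debit,
    amount: entry.amount,
    reverses: entry.id,
  });
}

// Oops: €600 was recorded instead of €60.
const wrong = record({ debit: "Nicolas Z.", credit: "John Doe", amount: 600 });
compensate(wrong);
record({ debit: "Nicolas Z.", credit: "John Doe", amount: 60 });

console.log(journal.length); // => 3
console.log(journal[1].reverses); // => 1
```

The history keeps both the mistake and its correction, which is what makes the log auditable.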

Simplicity

Double-entry-bookkeeping was in use way before the 15th century. Being relatively simple was one of the keys to its longevity. A complex solution would be dead by now, but this is still the foundation of most financial systems today!

Another key is certainly its robustness. The process leaves little room for error. It exposes a clear set of rules to follow and provides guarantees about the information the system will deliver.

Conclusion

Italian merchants knew a lot about how to build information systems. I wonder if there are other disciplines that, like accounting, refined their systems over centuries, without any computer, and ended up with such mature models. If you have other examples, in any other field, please reach out to me!

Discovering such simple, effective, and flexible tools is truly inspiring. In my opinion, it is totally relevant to guide the design of modern software systems as we’re still solving the same problems!

Writing JavaScript like it's 2020

Clement Bruno — Wed, 08 Jan 2020 00:00:00 +0000

JavaScript (JS) has long been criticized for being verbose and quirky. But the recent additions made to the language allow us to cope nicely with some of the debatable design decisions that were made and even benefit from a truly enjoyable development experience. In fact JS boasts a vast ecosystem, is present in a wide array of development use cases and is improved each and every year with excellent features.

In the following article we explore some of these great features that were added to the language with ES2019 and ES2020.

NB: The following list does not aim at being exhaustive but merely at describing some of the new stuff I am enthusiastic about. Additionally, most of these new features are not yet supported by the major browsers but are already usable if you use the Babel transpiler or a recent version of TypeScript (>= 3.7).

ES2020

Optional chaining operator

This one is probably among my favorites because it vastly reduces the amount of code needed when dealing with complex objects whose structure we are unsure about.

For instance, if we take the following example:

const userInfos = {
  name: 'Bob',
  email: '[email protected]',
  familyMembers: {
    brother: {
      name: 'Bobby',
      age: 16
    },
    mother: {
      name: 'Constance',
      age: 55
    },
    father: {
      name: 'John',
      age: 60
    }
  }
}

console.log(
  userInfos.familyMembers
    && userInfos.familyMembers.mother
    && userInfos.familyMembers.mother.age
);

As you can see above, retrieving the mother’s age was a pain, and the nesting level is not even that deep. Developers have tried to solve this issue in the past, and some interesting solutions such as the delve utility function were developed. But the optional chaining operator is now a native implementation of the desired behaviour and removes the dependency on external packages, which is always appreciated. Using this new operator allows us to reduce the code to:

console.log(userInfos.familyMembers?.mother?.age); // => 55

If the value before ?. is null or undefined, the program won’t crash: the expression short-circuits and evaluates to undefined instead.

console.log(userInfos.familyMembers?.grandPa?.age); // => undefined
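
The operator is not limited to property access. As a short complementary sketch (the user object below is made up for the example), it also works with array indices and function calls:

```javascript
const user = {
  name: "Bob",
  pets: [{ name: "Rex" }],
  greet() {
    return `Hi, I am ${this.name}`;
  },
};

// Optional element access with ?.[ ]
console.log(user.pets?.[0]?.name); // => 'Rex'
console.log(user.pets?.[1]?.name); // => undefined

// Optional call with ?.( ): the method is only called if it exists
console.log(user.greet?.()); // => 'Hi, I am Bob'
console.log(user.sayBye?.()); // => undefined

// It also protects against a null or undefined root object
const nobody = null;
console.log(nobody?.name); // => undefined
```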

BigInt

The addition of BigInt to JS primitives is a good thing since, for a long time, manipulating large numbers in JS was considered hazardous. The Number type can only safely represent integers up to 2**53-1, which can be really limiting.

console.log(Number.MAX_SAFE_INTEGER) // => 9007199254740991

This addition provides real built-in support for manipulating large numbers and using this new type is trivial. Just adding n at the end of a number makes it a BigInt.

// BigInt numbers can be declared like that:
const bigIntNum = 100n;
const bigIntNum2 = BigInt(100);
const bigIntNum3 = BigInt("100");

There are some limitations though:

Operations between numbers are only possible if they share the same type:

100n / 2 // => TypeError: Cannot mix BigInt and other types, use explicit conversions
100n / 2n // => 50n
typeof(100n / 2n) // => 'bigint'

Since the output of operations involving BigInt numbers is itself a BigInt, fractional values are truncated:

25n / 2n // => 12n
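
To illustrate why the safe integer limit matters, here is a small sketch comparing the two types just above that limit, followed by the explicit conversions needed to mix them:

```javascript
const max = Number.MAX_SAFE_INTEGER; // 2**53 - 1

// Beyond the safe range, distinct integers collapse to the same Number.
console.log(max + 1 === max + 2); // => true

// The same computation with BigInt stays exact.
const bigMax = BigInt(max);
console.log(bigMax + 1n === bigMax + 2n); // => false
console.log(bigMax + 2n); // => 9007199254740993n

// Mixing the two types requires an explicit conversion, in either direction.
console.log(Number(100n) / 2); // => 50
console.log(100n / BigInt(2)); // => 50n
```

Converting a BigInt back to a Number is only safe when the value fits in the safe integer range.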

Nullish coalescing Operator

I am really looking forward to this one becoming widespread because it is also a great feature that fixes the flaws of the infamous ||. As a reminder, the || operator could produce surprising results because it falls back on its right-hand side for every falsy value…

const someValue = null || "some value" // => "some value"

The above example is perfectly valid but I am way less comfortable with the following 2 lines:

const someNumber = 0 || 300 // => 300 
const someStr = "" || "some string" // => "some string"

This is due to the fact that, unlike in some other languages (Ruby, for example), in JavaScript:

!!0 // => false
!!"" // => false

Using the new ?? operator solves this issue since it only falls back on null and undefined values instead of all the falsy ones:

const someValue = null ?? "some value" // => "some value"
const someNumber = 0 ?? 300 // => 0
const someStr = "" ?? "some string" // => ""

ES2019

I’d also like to mention two handy features brought by ES2019 that aren’t yet widely used.

Array.flat()

flat, as its name indicates, progressively reduces the nesting of nested lists:

const nestedList = ["first level", ["second level", ["third level"]]];

console.log(nestedList.flat()); // => [ 'first level', 'second level', [ 'third level' ] ]

It takes an optional parameter that specifies how many levels of nesting should be unnested.

console.log(nestedList.flat(2)); // => [ 'first level', 'second level', 'third level' ]

When no parameter is provided the function defaults to 1. Therefore the following two lines are equivalent:

console.log(nestedList.flat()); // => [ 'first level', 'second level', [ 'third level' ] ]
console.log(nestedList.flat(1)); // => [ 'first level', 'second level', [ 'third level' ] ]

If you are dealing with a data structure with an unknown level of nesting and are sure you want everything unnested you can provide Infinity as an argument to the flat function.

console.log(nestedList.flat(Infinity)); // => [ 'first level', 'second level', 'third level' ]

NB: Please note that a flatMap function was also introduced. As its name indicates, it combines a map with the flat described above (with a depth of 1).
It is also worth mentioning that, in the functional programming spirit, these new methods do not mutate the array on which they are called but create a new one.
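
Here is a short sketch of flatMap on made-up data, showing that it behaves like map followed by flat(1) and leaves the original array untouched:

```javascript
const sentences = ["hello world", "flat map"];

// map + flat(1) in a single call
const words = sentences.flatMap((sentence) => sentence.split(" "));
console.log(words); // => [ 'hello', 'world', 'flat', 'map' ]

// Equivalent, but builds an intermediate nested array first
const sameWords = sentences.map((sentence) => sentence.split(" ")).flat();
console.log(sameWords); // => [ 'hello', 'world', 'flat', 'map' ]

// flatMap only flattens one level
const pairs = [1, 2].flatMap((n) => [[n, n * 2]]);
console.log(pairs); // => [ [ 1, 2 ], [ 2, 4 ] ]

// The original array is left untouched
console.log(sentences); // => [ 'hello world', 'flat map' ]
```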

Object.fromEntries()

This new feature makes it easy to switch from one data structure to another to best fit our development needs.
For instance, given a nested list data structure:

const someList = [['age', 55], ['name', 'bob'], ['email', '[email protected]']];

if I needed to access the “bob” value, I’d have to write something like:

console.log(someList[1][1]); // => "bob"

This works, but it is flawed because it relies on the ordering of the list content and it does not provide any context regarding the fact that I want to access the name property. With the new Object.fromEntries() function we can adopt a much cleaner approach:

const someList = [['age', 55], ['name', 'bob'], ['email', '[email protected]']];
const someObject = Object.fromEntries(someList);
console.log(someObject); // => { age: 55, name: 'bob', email: '[email protected]' }
console.log(someObject.name); // => "bob"

NB: The counterpart feature is Object.entries(), which transforms an Object into a nested array structure.

const someObject = { age: 55, name: 'bob', email: '[email protected]' }
const someList = Object.entries(someObject);
console.log(someList); // => [['age', 55], ['name', 'bob'], ['email', '[email protected]']];
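
Combining the two functions gives a convenient way to filter or transform an object with the usual array methods. A minimal sketch, using a made-up prices object:

```javascript
const prices = { apple: 1.5, bread: 3, milk: 0.9 };

// Object -> entries -> filtered entries -> object
const cheap = Object.fromEntries(
  Object.entries(prices).filter(([, price]) => price < 2)
);
console.log(cheap); // => { apple: 1.5, milk: 0.9 }

// Same round trip, transforming the values this time
const inCents = Object.fromEntries(
  Object.entries(prices).map(([name, price]) => [name, Math.round(price * 100)])
);
console.log(inCents); // => { apple: 150, bread: 300, milk: 90 }

// Object.fromEntries accepts any iterable of pairs, such as a Map
const fromMap = Object.fromEntries(new Map([["age", 55]]));
console.log(fromMap); // => { age: 55 }
```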

Conclusion

The features described above are just a subset of the recent additions made to the language but these are very much likable because they allow us to write code that is much more readable and concise. As a consequence the whole development experience feels less hacky and really enjoyable overall.

Migrating a Live IoT Telemetry Backend

Khwaab Dave — Tue, 10 Dec 2019 00:00:00 +0000

In order to provide a magical experience for our carsharing customers, Getaround vehicles are equipped with Connect® hardware that communicates with the Getaround network. That magic is powered by an entire IoT backend which we recently migrated. Some might call that magical. As our platform grew quickly, the initial infrastructure experienced stability issues and hindered our ability to scale or improve telemetry features. Migrating an internal service which powers the business is a bit tricky. A single mistake has the potential to shut down the business for hours if not days. With this in mind, we used an iterative and parallel approach to carefully migrate each feature from the old to the new.

IoT at Getaround

We believe the best carsharing experience should be seamless, from booking to finding and unlocking, and then returning the car. Getaround’s Connect® hardware powers this experience of finding and accessing the car and allows us to monitor our fleet’s health.

When Getaround started, IoT was still a relatively new field, and off-the-shelf services were limited at best. We created a proprietary telemetry backend and protocol. It was a single cloud server, running a service written in Erlang, receiving telemetry and routing commands and configurations to all vehicles on the platform.

At a high level our system looked like this.

For years, with a smaller number of cars on our platform, this was functional and required minimal maintenance. However, once we started rapidly scaling, minor issues became huge problems:

The service and protocol did not scale with thousands of devices
As we scaled, edge cases became a more frequent problem we could no longer ignore
Hiring a specialized Erlang team to fix and work on a system is very difficult

Choosing an IoT Service

IoT as a field had made leaps of progress in the years since we created our initial infrastructure. We decided the best way to scale our system was to switch away from our proprietary solution and use one of the newer off-the-shelf IoT products.

We began research and landed on four services to evaluate:

AWS IoT Core
Google Cloud IoT Core
Samsung Artik
ThingsBoard

As we looked into these offerings we had a few metrics we were comparing:

Stability / Age / Development Cycle / Adoption
Scaling
Protocol - MQTT or CoAP
Flexibility
Setup Ease
Maintenance Ease and Skillset
Integration Ease
Cost of Service
Bandwidth cost of service (over GSM)
Limits

First we eliminated ThingsBoard. Their software was interesting and open source, but it was geared towards running your own instance, like we had been. They had a SaaS offering as well, but we decided it was risky to use a relatively unknown and small company for a system which our entire business relied upon.

Next we eliminated Samsung Artik. Artik had similar issues to AWS in moving the data from Samsung to our platform in Google Cloud, but it was a much newer and less adopted system than AWS. Samsung Artik did offer CoAP and MQTT, and we looked into it specifically to evaluate CoAP, but we finally chose MQTT due to its more mature libraries and better TLS support. Samsung Artik had the same feature limitations as Google Cloud and the same integration problems as AWS.

Finally, the decision ended up being between Google and AWS. In evaluating the basic MQTT Telemetry (offering, cost, and limits), they both seemed similar. Since we had a working relationship with Google through our other backend systems, we decided it was beneficial to try and keep our services within Google Cloud Platform. The integration seemed simpler even though AWS was more established, flexible, and feature-rich.

Our first design thoughts were to replicate our current features falling largely into three categories:

Receive telemetry from Connect® devices
Send commands to Connect® devices
Modify configuration of Connect® devices

As we began designing the commands and configuration infrastructure, we started to see the limits of Google’s Cloud IoT Core:

MQTT topics were predefined, making it harder to organize the system for our purposes
Implementing a robust device configuration system required a large amount of work

When sending a configuration request to a device, we wanted a system in which we could guarantee the configuration is sent whenever the device comes online, and monitored even while it is offline. We previously lacked this ability which hindered monitoring the state of any device. To do this with Google’s Cloud IoT Core, we would’ve had to implement another service with a database maintaining these configurations. As we started to design this system, it was almost as much work as implementing the entire IoT backend from scratch. We took a step back and started to rethink our decision regarding Google versus Amazon.

AWS IoT Core eventually won out because of the additional features offered along with the basic MQTT telemetry system:

Flexible MQTT topics
Device Shadows
Jobs

As we looked deeper into Device Shadows, we realized the configuration system we were designing with Google Cloud IoT Core was a rudimentary version of AWS’ Device Shadows. Using AWS, we didn’t need to create supplementary services to achieve the functionality we desired. The only downside was that we needed an ETL from AWS to the rest of our company infrastructure in Google, but we decided the work to implement the ETL was far less than the work required to have Google Cloud IoT Core perform the way we needed. As much as we would’ve liked to have kept our systems contained within Google, their IoT product was too new and didn’t offer as many features as AWS.
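
The idea behind such a configuration system can be sketched as a comparison between a desired state and the state last reported by the device. The JavaScript below is a simplified model of that idea only: it does not use the AWS SDK, it only handles flat values, and the field and function names are hypothetical.

```javascript
// Keys whose desired value differs from what the device last reported.
function shadowDelta(desired, reported) {
  return Object.fromEntries(
    Object.entries(desired).filter(([key, value]) => reported[key] !== value)
  );
}

const shadow = {
  desired: { reportIntervalSec: 30, gpsEnabled: true },
  reported: { reportIntervalSec: 60, gpsEnabled: true },
};

// The backend can inspect pending changes even while the device is offline.
const pending = shadowDelta(shadow.desired, shadow.reported);
console.log(pending); // => { reportIntervalSec: 30 }

// When the device comes back online, it applies the delta and reports back.
function onDeviceConnected(deviceShadow) {
  const delta = shadowDelta(deviceShadow.desired, deviceShadow.reported);
  deviceShadow.reported = { ...deviceShadow.reported, ...delta };
  return delta;
}

onDeviceConnected(shadow);
const remaining = shadowDelta(shadow.desired, shadow.reported);
console.log(remaining); // => {}
```

Because the desired state is stored on the backend side, a configuration change is never lost when a vehicle is out of coverage.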

Iterative and Parallel Migration

When we designed the migration process we took into account how we release new Connect® features. When we develop a new feature for the Connect®, it is difficult to test all the real world situations the Connect® devices are exposed to. To verify these features, we roll out firmware releases iteratively to an increasing percentage of the fleet, and with feature flags on all the new features. Feature flags allow code to be exercised with the end result mocked out, making it possible to test new features in parallel with a functioning system.
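
One common way to implement this kind of percentage-based rollout is to map each device to a stable bucket. The sketch below is an assumption about how it could be done, not a description of our firmware: the hash function, the device ids and the function names are all hypothetical.

```javascript
// Map a device id to a stable bucket between 0 and 99 (simple FNV-1a hash).
function bucketFor(deviceId) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < deviceId.length; i++) {
    hash ^= deviceId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % 100;
}

// A device is part of the rollout when its bucket is below the percentage.
function isEnabled(deviceId, rolloutPercent) {
  return bucketFor(deviceId) < rolloutPercent;
}

const fleet = Array.from({ length: 1000 }, (_, i) => `connect-${i}`);
const at10 = fleet.filter((id) => isEnabled(id, 10));
const at50 = fleet.filter((id) => isEnabled(id, 50));

// Raising the percentage only adds devices, nobody is switched back.
console.log(at10.every((id) => at50.includes(id))); // => true
console.log(fleet.filter((id) => isEnabled(id, 100)).length); // => 1000
```

Since the bucket only depends on the device id, a device keeps the same behaviour between two checks as long as the percentage does not decrease.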

During the migration, our system looked like this:

The sequential steps of our migration workflows

Activate the connection between Connect® devices to AWS and move data from AWS to our telemetry database in Google Cloud.
Implement sending commands from our web platform in Google Cloud to AWS IoT Core, receiving the commands on the Connect®, and replying back.
Shut down the connection between the Connect® and our old backend.

Step 1: Telemetry

The first migration we implemented was our main telemetry feed from each vehicle. We rolled this out iteratively based on a flag that maintained the connection to the old backend, but when enabled, sent the data to AWS instead.

Ideally, we would have duplicated this data and sent it through both backends to our database for comparison, but the format of our database made this a complex task. Instead, after the firmware was released, we closely observed about 10 vehicles which had been switched over to AWS. Each of these vehicles could immediately revert back to the original network if required, and could also be accessed via SMS in an emergency. Once we identified and fixed some minor issues with these vehicles, we slowly (over the course of a month) increased the number of cars until all capable devices were sending telemetry through AWS.

Step 2: Commands

We focused on commands once our telemetry was stable and fully released. Fortunately, commands did not have the same database complexity as our telemetry system, so we could run both AWS and legacy commands in parallel.

We implemented a feature flag which allowed AWS commands to function normally up until the point a Connect® executed a command (e.g. lock, unlock, set configuration). At this point, the device replied to AWS with a message saying it would’ve performed the action if the feature flag had been enabled. This allowed us to test the communication with AWS without unintentional consequences from any bugs in our implementation.
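
This “dry run” behaviour can be sketched as follows. It is a hypothetical JavaScript model rather than our firmware code: handleCommand, the executeAwsCommands flag and the actuator object (which stands in for the real hardware) are assumptions made for the example.

```javascript
// Hypothetical device-side handler for commands received through AWS.
function handleCommand(command, flags, actuator) {
  if (!flags.executeAwsCommands) {
    // Exercise the whole path, but only report what would have happened.
    return { id: command.id, status: "dry-run", wouldHaveRun: command.action };
  }
  actuator[command.action]();
  return { id: command.id, status: "executed" };
}

// Fake actuator recording the actions that were really performed.
const calls = [];
const actuator = {
  lock: () => calls.push("lock"),
  unlock: () => calls.push("unlock"),
};

const dryRun = handleCommand({ id: "c1", action: "unlock" }, { executeAwsCommands: false }, actuator);
console.log(dryRun); // => { id: 'c1', status: 'dry-run', wouldHaveRun: 'unlock' }

const real = handleCommand({ id: "c2", action: "unlock" }, { executeAwsCommands: true }, actuator);
console.log(real); // => { id: 'c2', status: 'executed' }

console.log(calls); // => [ 'unlock' ]
```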

Using this feature flag and the same iterative rollout process, we were able to fully update the fleet and monitor the health of AWS commands while still using our older backend to do the actual work. This allowed us to verify that AWS commands were working as expected, and to roll out any bug fixes before actually using the feature. When our analytics confirmed that AWS commands were performing as well as before (it was actually better), we started another iterative rollout to flip the feature flag which actually acted on the AWS command. We still kept the previous commands enabled, but had deduplication logic in the Connect® to ensure we didn’t execute the command twice.
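
The deduplication step can be illustrated with a small sketch. It assumes that a command carries the same identifier on both backends, which is a hypothesis made for the example, as are the function names and the size of the memory.

```javascript
// Remember recently seen command ids so a command is executed only once.
function createDeduplicator(maxRemembered = 100) {
  const seen = new Set();
  return function shouldExecute(commandId) {
    if (seen.has(commandId)) return false;
    seen.add(commandId);
    if (seen.size > maxRemembered) {
      // Sets iterate in insertion order, so this drops the oldest id.
      seen.delete(seen.values().next().value);
    }
    return true;
  };
}

const shouldExecute = createDeduplicator();
const first = shouldExecute("unlock-42"); // arrives through AWS
const second = shouldExecute("unlock-42"); // same command through the old backend
const other = shouldExecute("lock-43");

console.log(first, second, other); // => true false true
```

Bounding the number of remembered ids keeps memory usage predictable, which matters on an embedded device.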

Finally, once we had sufficiently verified all commands running in parallel, we stopped sending commands through the old system if a device was registered with AWS, completing the migration of commands and no longer relying on the old infrastructure.

Step 3: Close the connection

Now that our AWS implementation had reached feature parity, we had the option to disable the connection from the Connect® to our old server and complete the migration. Because of a number of legacy devices in our fleet, we still run the old system to ensure those vehicles are still operational, but the load is so minimal it takes virtually no effort.

Conclusion

The complete migration of our telemetry system, from choosing a new service to full migration, took just under a year. Each step had to be carefully considered to avoid any negative impact on our users. The migration would’ve been nearly impossible if it were not iterative and done in parallel with our older system. Every release was verified before roll out and we assured our operations teams that there would be no changes in day-to-day business during migration. Everything had to work the exact same way it always had.

There were a few hiccups along the way, but because we had iteratively rolled out all of our features in parallel with our old system, it was simple to revert when we encountered issues. And with the iterative release, any issues we did encounter were confined to small portions of the fleet. Fixing a couple of mistakes by hand is feasible, but a bug affecting the entire fleet would have been disastrous.

As a result, we finally have a fully scalable, distributed and flexible backend which no longer is the blocker in expanding the functionality of the Connect®. Now we just have to explore how to use all this new data!

Improving Performance with Flame Graphs

Howard Wilson — Tue, 29 Oct 2019 00:00:00 +0000

Recently we had reason to investigate the performance of one of our most commonly used endpoints. We find New Relic to be a great tool for identifying and exploring performance problems, but what about when performance isn’t exactly a problem per se, but we’d like to optimize it all the same?

Often the key to understanding can be in effective visualization of the problem, so in this short post we’ll explore how to do just that using Flame Graphs.

Generating the Graph

We won’t go through this in detail, because it’s very well covered in this good blog post from 2016. We found the following development environment setup process effective:

# Gemfile
gem "ruby-prof-flamegraph"

# development.rb
config.cache_classes = true

# Controller action
profile = RubyProf.profile do
  # Action code here
end

File.open("ruby-prof-profile", "w+") do |file|
  RubyProf::FlameGraphPrinter.new(profile).print(file)
end

Fire the action locally and then convert the output to a graph using FlameGraph:

cat ruby-prof-profile | ./flamegraph.pl --countname=ms --width=1600 > flame.svg

Note: Stackprof is a popular alternative to ruby-prof.

Reading the Graph

Here’s part of the graph for the endpoint in question:

Flame Graph

Time is represented horizontally, and the call stack is represented from bottom to top. This block of code is iterating over an array of cars and serializing a photo URL for each one, as well as some other attributes.

The parts of interest are highlighted in blue. Oddly enough, they were all calls to a simple #config method which was necessary to generate each of our photo URLs. This gets called hundreds of times by this endpoint, but it’s just a hash so that shouldn’t be a problem, right? Here’s the code:

def self.config
  SHARED_CONFIG.deep_merge(ENV_SPECIFIC_CONFIG)
end

Unfortunately, it is a problem because it’s doing a #deep_merge of our increasingly large environment config into our system-wide config every time it’s called.

Once we’ve spotted it, the memoization fix is simple:

def self.config
  @_config ||= SHARED_CONFIG.deep_merge(ENV_SPECIFIC_CONFIG)
end

Note: Take care when memoizing class methods in this way in a threaded environment: some other thread-safe store might be more appropriate for you.

Caveats

In development, application performance isn’t necessarily going to behave the same as it will in production. For example, we noticed local performance issues with Time#utc.
Bear in mind the impact that adding a call-stack profiler to your production environment might have. Tools like rbspy might be a better way to go.
Absolute timing isn’t going to be as useful as the relative time spent in methods, since the profiler itself has an impact.

What is my job like at Getaround EU

Nicolas Zermati — Tue, 29 Oct 2019 00:00:00 +0000

My job title is officially backend engineer but this is pretty vague. I wanted to explain a bit what I do on a daily basis. First, it is a good reflective exercise for myself. Then, if readers like what I do, maybe some of you will want to join us!

If you’re in a hurry, this is a quick recap of the main points of the article:

I mostly maintain an existing system, which is both challenging and rewarding.
I’ll build a long-term vision in order to guide incremental refactoring rather than big rewrites.
I’m responsible for a key piece of the product.
Some of the main features I work on can take months to get done.
I mostly deliver internal APIs, with very little UI.
I have a lot of freedom and very few deadlines.
The feedback loop on some decisions is really long.

Of course, all this only reflects my own beliefs and perceptions. I’m happy to discuss anything you read here and encourage you to say Hi! on Twitter.

Disclaimer: Sometimes I’ll say I, sometimes I’ll say we. I’m not doing anything completely by myself. All of what I do greatly relies on and involves the work of the team.

I think the easiest way to share what my job is like would be to tell the story of the main areas I work on. I’m in what we call the finance squad so my work is mostly focused on how the company accepts money from our customers, how we keep track of it, and how we dispatch it to our various partners. The scope is a bit broader than this but it is a good starting point…

There is no particular order here.

Payments

We support credit cards via Stripe. We also accept Paypal in some countries. We used to use other providers, so our integration aims to be provider-agnostic. This integration style works quite well and I can’t thank enough the people that set it up.

I’ve been busy keeping up with regulations. For instance, I worked on being PCI-DSS-compliant. Thanks to Stripe, becoming compliant was a smooth process. We use Stripe Elements and Sources so we don’t have to handle sensitive credit card information.

As you may have noticed, the European Commission released PSD2, a directive aiming at securing online payments with two-factor authentication. All the payment providers operating in the European Union had to work really hard to be ready for this. Account managers at Stripe did their best to help us: they made themselves very available, invited us to workshops, and so on. On my side, the main challenges were:

releasing the new API incrementally, and
keeping our provider-agnostic code agnostic.

The new (Payment Intent and Payment Method) APIs introduced some pressure on our existing abstractions. For instance, the system was designed in such a way that a call to the payment provider should either end up in a successful transaction or a failed one. This change introduced another state: pending_action when the bank requires a two-factor authentication to complete the transaction. How to handle that? What’s the impact on our apps, on our data, etc?

In both situations, when moving to Sources and then to Payment Methods, my role was to adapt and extend our existing abstractions to isolate those new concepts, specific to Stripe, as much as possible from the rest of the system to finally plug the new APIs in. I attended Stripe’s workshop, worked with other squads to clarify what the impact was going to be on our product, did a lot of Q&A, monitored the releases, challenged our testing strategy for Stripe’s integration, tuned the integration during the releases, fixed bugs and so on.

As I mentioned, in addition to receiving money, we also dispatch it to our partners. Each payment must match certain transactions between actors.

We need to pay or debit each of those actors. The squad maintains and scales the payout system. We do batches of bank transfers to the owners almost every working day. The system has to decide what we should pay to whom and what needs to be reviewed by the finance department. With the company’s growth, this part of the system was often subject to technical and operational scaling issues.

In order to be able to expand to other countries easily we integrated Stripe Connect. With that came a lot of work around KYC (Know Your Customer) compliance. We needed to collect identity documents, billing information, legal information, to add rules in the payouts, etc. Introducing Connect triggered the need to introduce new concepts and internal processes. The impact of such a thing is far-reaching: we needed to make changes to our on-boarding flow to communicate with our partners, ensure that customer support were ready, etc.

It is better to identify the consequences of the new constraints as soon as possible. Fortunately, this is a team effort. Most of this effort is done by the product owner; coworkers will help through kick-offs and reviews too. I need to be involved in the product in order to help to spot those consequences. When we miss something, it is no big deal, we find out and adapt.

Moving forward

Our application and its accounting system were centered around a single concept: the rental. The assumption that we’re dealing with a rental had firmly taken root throughout the app and coupled things together.

With Drivy Open, we were actually selling something else entirely: subscriptions to a service. Whilst Open was still a startup inside the startup, we operated a separate system for everything. Because it was a great success, we incrementally merged Drivy Open to the main application. Everyone worked hard to make Drivy Open the future of Drivy. On my side, it was a challenge to untangle all the rental-coupled logic that was everywhere in the application in order to build more flexible sub-systems.

To accomplish this, we needed to establish a long-term vision. Refactoring a whole system in one go would be a very big investment. Also, by doing it slowly, we can learn along the way. Each feature becomes a good opportunity to advance toward the vision. If a feature doesn’t fit that vision well, both the vision and the feature get the opportunity to get reworked.

Not rushing through the vision could get frustrating. It would solve most of our current issues, be more efficient, easier to maintain and so on. I said frustrating because moving on to that vision is definitely subject to the needs of the company. As long as the users are happy, and we can move fast enough, we don’t need to rush into building it. Thus we need to be patient. We wait for the opportunities to make progress on it. When planning and discussing with the squad, we can elaborate that vision together. We also have to make trade-offs between short-term and long-term investments.

Supporting other squads

We’re dealing with a majestic monolith. A given functional scope matches a certain area in the code. Many scopes require interaction with the payments, or with the accounting. Team organization evolved quickly. It evolved quicker than the code itself. We’re organized in cross-functional teams with a given scope (inspired by Spotify’s squads).

Many squads will at some point need something from the finance squad. It could be some assistance for a given task, or it could be a feature that isn’t yet supported and that we’ll need to add. This brings its fair share of effort in terms of planning and prioritisation. It also encourages interaction and cross-squad collaboration, which I enjoy. So in addition to working on the system itself, I’m often in a support role to other squads. It’s very rewarding, but sometimes this dependency becomes a bottleneck for the team’s outcomes. Fortunately, the long-term vision solves it!

To avoid that bottleneck, we aim for a modular monolith. We need to expose clear boundaries for the subsystems that the finance squad must provide, in terms of scope and APIs. I, obviously, would like to reduce our scope. But wait, it’s not that simple… Features are built on the scope I would like to get rid of. It means that if we move a boundary between A and B, the scope of subsystem A will shrink and leave a hole between A and B. So my job would be to deprecate that scope, to prevent further features from relying on it. Then comes thinking about how to fill the void for the existing features. Maybe by adding stuff to B? Maybe by filling the void with another subsystem? Maybe the void belongs to subsystem A after all?

To achieve that, I must stay alert, keep the vision in mind, say no, offer alternatives, make it clear that it is a long-term investment, … Deprecating scope provides no short-term business value. But, the more the team grows, the greater the value of this investment when it pays off.

Reporting

One of the most important responsibilities of the squad is to produce reporting for the finance department to use. Seems boring, doesn’t it? Not to me it doesn’t! The needs of a finance department can create a lot of constraints that are fun to play around with.

Over time, this reporting pushed the system to be more robust. At some point in the past, the finance department was using a single report with all the figures for each rental the company did, since forever. In order to get a monthly vision, the technique was to calculate the difference between the report from month N and N-1. This clever way was found in order to cope with the fact that some changes could happen to past data, for instance an adjustment on an old rental. Because of the volumes, such a technique wasn’t working anymore. The extract was too big to work with, and too long to generate.

It took months of work to take the system to a point where it could generate many small monthly extracts with only the data of what happened that month. We needed to find a clear way of cutting exports, to detect modifications of the past, to find ways of avoiding those modifications, to be confident that all we did was equivalent to the previous technique, and to do the migration correctly. We went from rental-based extracts to invoice-based ones. Maybe this doesn’t seem like much to you, but I’ve faced lots of technical implications while trying to figure out reporting.

A good thing is that I mostly care about keeping the accounting system sound. The reports are, in the end, a dozen relatively simple SQL queries that don’t move too much.

Small features

Alongside the big projects and the long-term vision, there are a lot of other features that I work on. Those are smaller and can be done in less than a few days. Usually those are features that improve the team’s efficiency. Here are a few examples: new administration tools, financial processes that get automated, or updates on existing facilities.

What I like about those is that, in contrast to the other features, they provide predictable, easy, and quick satisfaction. A feature is usually done quickly, shipped, and instantly useful. Many of those features are internal tools. This context is pretty easy to work with:

We only support English in the tools thus we can avoid the internationalization flow.
We usually don’t care about supporting old versions of Internet Explorer.
We don’t do complex CSS things: we use an internal bootstrap mixed with our design system.
We can iterate very fast.

I also like having those small features because they have a direct impact on my coworkers’ workload, so I can be a hero :-)

Data analysis

We are dealing with a database that has been holding the company’s data for almost a decade. Still more challenging is the fact that we collect data from our providers, unstructured JSON fields, and even other companies’ data that we imported. Given the timescale and the diversity of sources, data isn’t always consistent with today’s happy path. We’ve got some safety nets in place to help us gain confidence that the data stays as we expect over time.

Still, before adding a feature, we often need to dig in the data a little bit. It helps understand the volumes we’re dealing with, to be sure we have no holes in our thinking, … We don’t have a data-analyst like other squads could have thus we need to do that digging ourselves. This routine is very useful. It allows us to know the data well, and to keep SQL skills sharp.

We’re also blessed with a product owner that understands, tweaks, and writes SQL requests. When I have an unexpected result, she can proofread my requests and spot missing bits!

Writing SQL is a must as we sometimes have CPU-expensive logic that we need to translate from Ruby to SQL for efficiency purposes. With our growth and with the diversity of our customers, the performance truly matters. We’re constantly trying to find better ways than duplicating logic from our application code to SQL. To do that, many solutions are used such as introducing immutability, storing more information in the database, building specialized tables, caching, and more.

We also have a dedicated data department that makes available hundreds of tables with all the information you can imagine. We’re in the process of gaining the ability to maintain our own ETL pipelines in order to suit our specific needs. I’m really excited by that prospect!

I didn’t mention the way we manage bugs, how we deliver software, how we support the rest of the team when they have a question about the system, … There are many aspects of my job that I left out here but I think I pictured the biggest part.

If you find all this interesting, if you want to know more or dig into specific points, I encourage you to reach out to me. It would be my pleasure to discuss all this even more!

More tips and tricks for junior developers

Emily Fiennes — Wed, 18 Sep 2019 00:00:00 +0000

This article follows on from Clement’s post in which he details useful tips and tricks learnt while working at Drivy. Like Clement, I undertook Le Wagon’s intensive 9-week bootcamp. The program was great for a rapid overview of the key elements of full-stack engineering.

Since then, the learning curve has been a steep and stimulating one - at Drivy, I’m surrounded by real dev-warriors. As you can imagine, I relish every pull request I submit or review, for the opportunity to learn new things from my colleagues.

In this article, I will explore just a handful of the many useful tips and tricks that they have shared with me, in the hope that they will be useful to other junior developers.

POST or PUT?….or PATCH ?

At Drivy we implement RESTful API design, where CRUD actions map to HTTP verbs. I found the difference between POST, PUT and PATCH difficult to grasp, before I encountered concrete examples in the Drivy codebase:

POST

POST is used to create a new resource on the server, and maps to the create controller action. To create a car, i.e. a new row in the database, we need to gather and post to the server the data that corresponds to the columns in the cars table. This might be:

{
  is_open: false,
  make_id: 53,
  model_id: 11,
  plate_number: 'L87hYQJ',
  registration_year: 2019,
  user_id: @user.id,
}

We send this information, or payload, to the server at www.drivy.com/cars. The server then decides the location, or URI, for the resource - which will also correspond to the resource’s unique ID - and creates the row in our database corresponding to that location. For example, www.drivy.com/cars/1. So far, so good…

PATCH and PUT

Hang on, both verbs correspond to the update in CRUD? Yes, but there is a subtle difference.

PUT overwrites the whole resource at an existing location. If we send the following payload to www.drivy.com/cars/123…

{
  mileage: 9
}

…the entire resource at location /123 will be overwritten, i.e. our car’s only attribute will now be its mileage. By the way, if a resource is not found on the server at the given location, the server can create a new one there.

On the other hand, PATCH overwrites only the attributes included in the payload. If the attribute is a new one, it is added to the resource. If we send this payload to our original resource, which now resides at www.drivy.com/cars/1…

{
  mileage: 5,
  is_open: true
}

… the is_open attribute will be overwritten, and the mileage attribute added. So once these changes have been applied by the server, we will end up with a Car resource at location www.drivy.com/cars/1 that looks like this:

{
  is_open: true,
  make_id: 53,
  mileage: 5,
  model_id: 11,
  plate_number: 'L87hYQJ',
  registration_year: 2019,
  user_id: @user.id,
}
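
To make the difference concrete, here is a minimal sketch that models the two verbs as operations on a plain Ruby hash. The put and patch helpers are hypothetical illustrations, not part of Rails or of the Drivy codebase, and the user_id of 42 is a placeholder for @user.id.

```ruby
# Hypothetical in-memory model of the two update semantics.

# PUT: the payload becomes the whole resource.
def put(_resource, payload)
  payload.dup
end

# PATCH: only the attributes present in the payload are overwritten or added.
def patch(resource, payload)
  resource.merge(payload)
end

car = {
  is_open: false,
  make_id: 53,
  model_id: 11,
  plate_number: 'L87hYQJ',
  registration_year: 2019,
  user_id: 42 # placeholder for @user.id
}

replaced = put(car, { mileage: 9 })               # only the mileage survives
patched = patch(car, { mileage: 5, is_open: true }) # other attributes are kept

puts replaced.inspect
puts patched.inspect
```

A real server would of course also validate the payload and persist the result; the sketch only shows which attributes are left after each verb.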

Manipulating data structures: benchmarking `flat_map` vs `map.flatten`

One of the first Ruby tools in the toolbox that I encountered during Le Wagon was Enumerable#map, and this method can be usefully combined with Array#flatten, to return a useable array of resources that might otherwise be nested. Take the following example:

# A `car` has many `car_photos`. Imagine we want to get all the photos of all the cars for one of our pro owners. This owner has 56 cars, and each car has at least 3 car_photos. I could:

@user.cars.map(&:car_photos)

With this request, I end up with a data structure that looks something like this

[[car_photo_1, car_photo_2, car_photo_3, car_photo_4, car_photo_5]]

It’s an array of arrays of Ruby objects. To be able to use it, I must first .flatten the array because the values are nested.

@user.cars.map(&:car_photos).flatten

But I know I’ll be using this request a lot, for users with a lot of cars. Maybe I’ll even be rendering all the photos at once. I’m going to need a faster way I can do this, so I’ll benchmark the performance of .flatten compared to .flat_map. I’ll do this in a temporary rake task (for easy access to database connection) but you might also run it as a script. My rake task requires the Benchmark module and looks like this:

  task benchmark_flat_map_vs_map: :drivy_environment do
    require 'benchmark'

    user = User.includes(cars: :car_photos).find(977)

    Benchmark.bmbm do |x|
      x.report('flatten') { user.cars.map(&:car_photos).flatten }
      x.report('flat_map') { user.cars.flat_map(&:car_photos) }
    end
  end

I’m loading all the user data into memory first, so we can just focus on the map comparison without including time needed for database roundtrips. I’ve chosen to use the bmbm method which does a “rehearsal” run to get a stable runtime environment and eliminate other factors.

(By the way, this RailsConf 2019 talk is a really useful and accessible intro to how and when to profile and benchmark your code.)

So, the results were:

Rehearsal --------------------------------------------
flatten    0.006082   0.001934   0.008016 (  0.010056)
flat_map   0.000158   0.000015   0.000173 (  0.000188)
----------------------------------- total: 0.008189sec

               user     system      total        real
flatten    0.000316   0.000003   0.000319 (  0.000313)
flat_map   0.000147   0.000002   0.000149 (  0.000145)

You’ll notice that flat_map is more than twice as fast as map followed by flatten. The latter creates an intermediary array, so the collection has to be iterated over twice.

This, my friends, is where flat_map is useful. Like map it takes a block:

@user.cars.flat_map(&:car_photos)

but doesn’t create that intermediary array.

I used to rely on these out-of-the-box methods, without ever interrogating what was going on. Imagine running the same request involving 1000 power users, each with 50 cars, and each car with 10 photos. Using flat_map can significantly help improve performance.
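
As a quick sanity check, the two approaches can be compared on plain Ruby objects. This is a minimal sketch that assumes a hypothetical Car struct in place of the ActiveRecord models; it also shows one behavioural difference worth knowing: flat_map flattens a single level, while flatten without an argument is recursive.

```ruby
# Hypothetical stand-in for the ActiveRecord model used in the article.
Car = Struct.new(:car_photos)

cars = [
  Car.new(%w[photo_1 photo_2 photo_3]),
  Car.new(%w[photo_4 photo_5])
]

nested = cars.map(&:car_photos)            # array of arrays (intermediary array)
via_flatten = nested.flatten               # second pass over the data
via_flat_map = cars.flat_map(&:car_photos) # single pass, no intermediary array

# flat_map flattens one level only; flatten with no argument goes all the way down.
deep = [[1, [2]], [3]]
one_level = deep.flat_map { |element| element }
all_levels = deep.flatten

puts via_flat_map == via_flatten # both approaches give the same flat list here
puts one_level.inspect
puts all_levels.inspect
```

For a one-level structure like cars and their photos the results are identical, so flat_map is a drop-in replacement.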

`&.` Safe Navigation operator… part 2

Clement discussed the use of the &. safe navigation operator to safely navigate through layers of object relations.

In the Owner Success squad, we learned the hard way that method chaining using the safe navigation operator can also be a sure-fire way to introduce bugs - and precisely because navigation is safe, i.e. no errors are raised, they can be excruciatingly difficult to debug.

When we install an Open box in a car, we use the provider_device_id given by the box provider. A device id is considered valid on the provider’s api if it is:

in ALL CAPS;
with no whitespaces (internal, leading or trailing).

However, on our side, the device ids are entered manually in the backoffice. To cover our backs against human error, we format the device number at the time of form submission by chaining strip! and upcase! with the safe navigation operator.

The &. allows us to safely chain the methods, so that in the event that strip! returns nil, upcase! will not raise an error. Great!

What we didn’t realise is that whilst strip always returns a string (a stripped copy of the receiver), even if no changes are made, strip! returns nil in the event that the receiver was not modified. Plus strip and strip! only deal with trailing and leading whitespaces, not internal ones.

So for a provider_device_id looking something like this:

'A dEVice ID 123'

…strip! returned nil and upcase! was never run on the original object. We didn’t know about it because nil&.upcase! was not raising an error. We ended up with lots of device numbers in an invalid state, leading to errors on our external provider’s api, and had to correct them manually with a rake task.

It’s always worth checking the documentation for the subtle differences between methods with and without the !. We generally try to avoid chaining them to avoid introducing bugs like this one.
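
The behaviour described above is easy to reproduce in plain Ruby. The following is a minimal sketch, not the original backoffice code: normalize_device_id is a hypothetical helper showing one way to satisfy both provider rules with non-bang methods.

```ruby
# .dup gives a mutable copy, so the bang methods can modify it in place.
raw = 'A dEVice ID 123'.dup
result = raw.strip!&.upcase!
# strip! found no leading or trailing whitespace and returned nil,
# so upcase! never ran: result is nil and raw is unchanged.

# Hypothetical helper: non-bang methods always return a string,
# and gsub also removes internal whitespace, which strip never does.
def normalize_device_id(value)
  value.gsub(/\s+/, '').upcase
end

normalized = normalize_device_id(' A dEVice ID 123 ')

puts result.inspect
puts raw
puts normalized
```

Because gsub and upcase return new strings, the result can be chained safely and the caller decides whether to assign it back.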

Arrays: concat, prepend, + and <<

I find it useful to remind myself with clear examples of the precise output and side effects of each of these methods. Methods that modify the original receiver can also be a source of well-hidden bugs.

Array#concat

array_1 = ['Volkswagen', 'Vauxhall', 'Renault']
array_2 = ['Tesla', 'BMW']

array_1.concat(array_2)
#=> ['Volkswagen', 'Vauxhall', 'Renault', 'Tesla', 'BMW']

array_1
#=>  ['Volkswagen', 'Vauxhall', 'Renault', 'Tesla', 'BMW']

array_2
#=> ['Tesla', 'BMW']

Moral of the story: array_1 is modified, array_2 is unchanged.

Array#+

array_1 = ['Volkswagen', 'Vauxhall', 'Renault']
array_2 = ['Tesla', 'BMW']

array_1 + array_2
#=> ['Volkswagen', 'Vauxhall', 'Renault', 'Tesla', 'BMW']

array_1
#=>  ['Volkswagen', 'Vauxhall', 'Renault']

array_2
#=> ['Tesla', 'BMW']

Moral of the story: Neither array is modified.

Array#prepend

array_1 = ['Volkswagen', 'Vauxhall', 'Renault']
array_2 = ['Tesla', 'BMW']

array_1.prepend(array_2)
#=> [['Tesla', 'BMW'], 'Volkswagen', 'Vauxhall', 'Renault']

array_1
#=> [['Tesla', 'BMW'], 'Volkswagen', 'Vauxhall', 'Renault']

array_2
#=> ['Tesla', 'BMW']

Moral of the story: array_1 is modified, array_2 is unchanged.

Array#<<

array_1 = ['Volkswagen', 'Vauxhall', 'Renault']
array_2 = ['Tesla', 'BMW']

array_1 << array_2
#=>  ['Volkswagen', 'Vauxhall', 'Renault', ['Tesla', 'BMW']]

array_1
#=> ['Volkswagen', 'Vauxhall', 'Renault', ['Tesla', 'BMW']]

array_2
#=>  ['Tesla', 'BMW']

Moral of the story: array_1 is modified, array_2 is unchanged.

Delegating methods with Module#delegate

You can use delegate to expose the methods of objects on another class. For example, a cancellation belongs to a rental - as indeed you might expect it to in the real world.

class Cancellation

  belongs_to :rental

  #[...]

end

By the way, this database relationship is incidental and not a strict criterion for the use of delegate. It is a hint though, that there might be some overlap in the implementation of these two classes, and thus that there may be scope to delegate.

The cancellations table might look something like this in the database:

 t.string "state"
 t.integer "rental_id"
 t.integer "some_other_id_field"
 t.decimal "some_refund_field"
 t.decimal "some_other_refund_field"

As you can see, it does not have its own currency column.

So, what if you need to access the currency of a cancellation? You could:

cancellation.rental.currency

But this means that the cancellation object has to know that a rental object has a currency column. That violates the Law of Demeter, which is the principle that objects should know as little as possible about each other. If our cancellation object knows too much about the rental object, or is coupled too closely, then any future changes to the implementation of the Rental class become hard to maintain.

You can use delegate to avoid chaining objects in this way. On the Cancellation class you can do:

class Cancellation

  belongs_to :rental

  delegate :currency, to: :rental

  #[...]

end

Then, you might call cancellation.currency elsewhere in the code. For example:

invoices = Invoice.where(currency: @cancellation.currency)

This helps keep your code DRYer, avoids object-chaining and respects the law of Demeter. Hoorah!
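
Module#delegate comes from ActiveSupport. Outside of Rails, a comparable effect can be obtained with Forwardable from Ruby’s standard library. The sketch below is only an illustration of that swapped-in technique: the Rental struct and the constructor are hypothetical stand-ins for the ActiveRecord models.

```ruby
require 'forwardable'

# Hypothetical stand-in for the Rental model: it owns the currency.
Rental = Struct.new(:currency)

class Cancellation
  extend Forwardable

  # Forward #currency to the object held in @rental.
  def_delegator :@rental, :currency

  def initialize(rental)
    @rental = rental
  end
end

cancellation = Cancellation.new(Rental.new('EUR'))
puts cancellation.currency
```

Callers only talk to the cancellation, so the place where the currency is actually stored can change later without touching them.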

CapybaraScreenshot

The Capybara-Screenshot gem will automatically capture a screenshot for each failure in your test suite. But did you know that you can also manually capture photos? Just pop one of the following directly in your code:

Capybara::Screenshot.screenshot_and_save_page
Capybara::Screenshot.screenshot_and_open_image

and a lovely screenshot will be taken of the current step in your integration spec at that point in time. This has helped me countless times to debug my integration specs. You get to see what the user would see at that stage in the flow, and check that all information is correct and displaying as it should. Plus you’ll get a more digestible error output and stacktrace.

to_sql

9 weeks didn’t leave a whole lot of time to cover SQL in any detail. As I start to work on more complex projects, I need to make data-based decisions or include SQL in my requests for performance reasons. Chaining to_sql to an ActiveRecord relation returns the SQL statement run by the database adapter against the database to retrieve the results.

For example:

OpenDevice.where("id < ?", 500).to_sql

returns

SELECT `open_devices`.* FROM `open_devices` WHERE (id < 500)

Little by little, this is helping to improve my understanding of the underlying SQL syntax, rather than relying on the magical layer between me and the database that is provided by ActiveRecord.

Whether you are setting out on your full-stack adventure, or you already have a bit of experience, I hope this summary of some tips and tricks has been helpful. Don’t hesitate to reach out with comments or feedback :)

A basic decision tree in Ruby

Jean Anquetil — Fri, 23 Aug 2019 00:00:00 +0000

Recently, we did a rework of the user’s profile completion flow in our Drivy web and mobile applications. We went from a basic single screen form to a multi-steps one. The idea was to simplify the flow and ask only for the information needed depending on the user’s answers. As we had to deal with multiple possible paths, we decided to work on a little decision tree algorithm.

Fig 1. The new profile flow

What do we need to define our decision tree?

Let’s say that our decision tree is made up of multiple steps, and for every step there are one or several possible answers that we will call outcomes.

Fig 2. First step

Fig 3. Complete flow made up of outcomes

If we know the path, which is made up of outcomes, we will be able to find the next step.

Fig 4. The path made up of outcomes

Decision tree declaration

The idea is to have a collection where the first element is a step class name and the second one is an object composed of the possible step’s outcomes. For each step’s outcome, we define a collection where the first element is a step class name and the second one is an object…and so on.

Let’s imagine that we need to ask if the user wants to rent a car or if they own a car and want to add it on the platform.

DECISION_TREE = [
  RoleStep, {
    RoleStep::DRIVER => [
      DriverStep, {
        DriverStep::EligibleLicenseYears => [
          ...
        ],
        DriverStep::IneligibleLicenseYears => [
          ...
        ]
      }
    ],
    RoleStep::OWNER => [
      OwnerStep, {
        OwnerStep::CarHasManualTransmission => [
          ...
        ],
        OwnerStep::CarHasAutomaticTransmission => [
          ...
        ]
      }
    ]
  }
]

Step class declaration

The step class would gather its possible outcomes and everything else related to it. For instance, in almost all of our steps we had to deal with a form, so this is where we defined it. We could define a #can_skip? method that checks whether the step has already been filled in or can be skipped for some other reason. Doing it this way, it becomes really convenient to define specific rules on steps.

class RoleStep
  DRIVER = :driver
  OWNER = :owner

  def self.form_class
    RoleStepForm
  end
end

Finding the next step

Once we have our decision tree declared with all of its steps, we need to build a small recursive method that will find the next step according to the path (cf figure 4).

def next_step(path: [], steps: DECISION_TREE)
  return steps if path.empty?

  next_steps = steps.last[path.shift]

  next_step(path: path, steps: Array(next_steps))
end

And here you are: you can now walk through the DECISION_TREE, with or without a path, to find the corresponding next step.

puts next_step.first
# => RoleStep

puts next_step(path: [RoleStep::OWNER]).first
# => OwnerStep
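
For readers who want to run the idea end to end, here is a self-contained sketch. It is a simplified variant under stated assumptions: the elided branches are replaced with empty outcome hashes, DriverStep and OwnerStep are empty placeholder classes, and next_step destructures the path instead of calling shift, so the caller’s array is not mutated.

```ruby
# Placeholder step classes; only RoleStep declares outcomes in this sketch.
class RoleStep
  DRIVER = :driver
  OWNER = :owner
end

class DriverStep; end
class OwnerStep; end

# Simplified tree: the deeper branches are left empty on purpose.
DECISION_TREE = [
  RoleStep, {
    RoleStep::DRIVER => [DriverStep, {}],
    RoleStep::OWNER => [OwnerStep, {}]
  }
].freeze

def next_step(path: [], steps: DECISION_TREE)
  return steps if path.empty?
  return [] if steps.empty? # unknown outcome earlier in the path

  outcome, *rest = path # does not modify the caller's array
  next_step(path: rest, steps: Array(steps.last[outcome]))
end

puts next_step.first
puts next_step(path: [RoleStep::OWNER]).first
```

An outcome that is not declared in the tree yields an empty array, so calling first on the result returns nil rather than raising.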

Sharing React components with rollup.js

Thibaud Esnouf — Wed, 24 Jul 2019 00:00:00 +0000

At Drivy, we have defined our very own design system. This system describes our visual guidelines and rules, and is composed of visual web components. For each component, we have created a React implementation that could easily be used by our design team to build a web site documentation (thanks to MDX).

Having a documented design system was an achievement in itself. But to fully take advantage of it, the final step was to use our React components in other frontend projects.

In a Node.js world, that means importing them as a node module dependency.

The main steps are:

bundle the components in a useable manner
publish the result through NPM

Project characteristics

Key points of the design system project to bundle:

React components (tsx files)
TypeScript (ts files)
SVG assets
Sass classes and utilities
Design tokens (single source of truth variables stored in JSON files, used to propagate our design decisions)

Notice that some of the sources have a specific syntax (tsx/ts files) and have to be transformed (transpiled) to be read by a browser. The bundling process must produce outputs that can be seamlessly imported in a third-party project without extra configuration or processing.

Building the project with rollup.js

We chose rollup.js as the bundler for our library because it is well suited to the job: it’s efficient and easy to configure. Other module bundlers like Webpack and Parcel provide advanced features for a developer (a dev-server with hot module replacement, for example) but those things are not required for our purposes.

We want the following outputs:

React components as ES modules (preferred to CommonJs as it is more future proof and allows tree-shaking)
TypeScript declarations files
SVGs
Source maps
Sass files
Design tokens

Rollup configuration

Rollup is configured through a rollup.config.js file at the root of our project.

Plugins

Rollup has a bunch of plugins. For our needs we use:

rollup-plugin-typescript2 (to transpile TypeScript files and generate declarations)
rollup-plugin-json (to convert our token .json files to ES6 modules)
rollup-plugin-svgo (to export SVGs through JavaScript)

Note: currently, we don’t use a plugin to convert our Sass files to CSS ones. It’s a deliberate choice as we want to output our design system variables and mixins to use them in other projects using Sass. But it would be great if we could support both Sass and CSS, in order to stick with our “ready-to-use” principle. So we’ve scheduled this task in our roadmap.

Entry point and output

Rollup’s config requires an entry point that will be used to resolve the dependencies

{
  "input": "src/index.ts"
}

This file will export all our components

export { BasicCell } from "./components/BasicCell/"
export { BulletList, BulletListItem } from "./components/BulletList/"
export { Button, ButtonGroup } from "./components/Button/"
...

Then we define how the build result should be output

{
  "output": {
    "file": "dist/index.js",
    "format": "esm",
    "sourcemap": true
  }
}

The result will be put in a dist folder
We tell Rollup to generate the sourcemaps
We tell Rollup to generate ES modules. This will allow Tree Shaking

Exclude external dependencies

Our project is based on React and so uses some external dependencies (react, react-dom, classnames …). We don’t want such libraries to be resolved and bundled with our project. So, all dependencies external to the project (described in package.json dependencies/peerDependencies) are configured to be excluded from the build process

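{
    // pkg is the parsed content of package.json (for example, imported at the top of rollup.config.js)
    external: [
        ...Object.keys(pkg.dependencies || {}),
        ...Object.keys(pkg.peerDependencies || {})
    ],
}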
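One detail is worth checking when building this list. To our knowledge, string entries in external are compared against the exact import id, so a deep import such as react-dom/server would not be covered by the react-dom entry. Below is a minimal sketch of a function-based variant, under that assumption. The inlined pkg object, its version numbers and the isExternal name are illustrative and not part of the configuration described here.

```javascript
// Hypothetical package.json content, inlined so the sketch runs on its own.
const pkg = {
  dependencies: { classnames: "^2.2.6" },
  peerDependencies: { react: "^16.8.0", "react-dom": "^16.8.0" },
};

// Same list as in the config above: every declared dependency name.
const dependencyNames = [
  ...Object.keys(pkg.dependencies || {}),
  ...Object.keys(pkg.peerDependencies || {}),
];

// A module id is external if it is a dependency or a sub-path of one.
const isExternal = (id) =>
  dependencyNames.some((name) => id === name || id.startsWith(`${name}/`));

console.log(isExternal("react-dom/server")); // true
console.log(isExternal("./components/Button")); // false
```

Rollup’s external option also accepts a function, so isExternal could be passed as the value of external instead of the array.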

TypeScript declarations (types)

Since our project is set up with TypeScript, we want to export our declaration files (*.d.ts) so that any other TypeScript project can consume our library’s API smoothly.

We have 2 options:

Put declaration files with the same name and at the same level as your JavaScript modules. So for the hello.js module, TypeScript will look for a hello.d.ts file in the same directory.
Declare the entry point for your declaration file in your package.json

Example:

{
  "types": "types/index.d.ts"
}

We recommend the latter. It allows you to separate the types (TypeScript-related) from your JavaScript source code.

Going further with the rollup configuration

To obtain this result, you can configure the TypeScript compiler (tsconfig.json) to generate the declaration files in a dedicated directory:

{
    "compilerOptions": {
        "declaration": true,
        "declarationDir": "dist/types"
    }
}

Then you can configure rollup to use this setting for its TypeScript plugin

{
    plugins: [
        typescript({
            typescript: require("typescript"),
            useTsconfigDeclarationDir: true
        }),
        ...
    ]
}

Build task

We launch rollup to build our project:

yarn rollup -c

The result is available in the dist folder (as defined in the rollup output config)

Note: Our full build script launches rollup, then copies our package.json to the dist folder.

Publishing

We are now ready to publish our package to NPM so it will be available as a dependency to another project (using npm install / yarn add). NPM has different mechanisms to specify the files to package and publish:

Blacklist strategy (.gitignore + .npmignore )
Whitelist strategy: files configuration in package.json

These two approaches are somewhat different, and one is not necessarily preferable to the other. However, the blacklist strategy tends to be riskier: any sensitive or irrelevant file you forget to ignore gets published. Keep in mind that some files are always included, regardless of settings:

package.json
README
CHANGES / CHANGELOG / HISTORY
LICENSE / LICENCE
NOTICE
The file in the “main” field

In package.json:

{
    "files": [
        "tokens/**/*",
        "**/*.{scss,d.ts,js.map,svg,png,woff,woff2}",
        ".stylelintrc.js"
    ]
}

Tip: It’s not easy to visualize the files that are packaged by NPM (npmjs.com doesn’t list the files of a module). To do so with your local project, execute the following command in the directory containing your package.json file:

npm pack && tar -xvzf *.tgz && rm -rf package *.tgz

It will output the files that will be included in the publishing process

Testing/using locally

When importing our components into a third-party project and using them in a real context, we often encounter conflicts (with the host project’s CSS rules, for example), unintended behavior, or simply bugs. We can’t afford to wait for our components to be published to discover those issues: we need a way to test them in a third-party project before publishing to NPM.

We can use the npm/yarn link mechanism to symlink our component project and add it to the node_modules of another one. However, this mechanism doesn’t perfectly reflect how our project will be packaged by NPM and installed in another project. Plus, some configuration files that would not be present in node_modules (because the NPM publishing process filters them out) can interfere. To get closer to the real process (how our package will be published and installed), we use a tool named yalc.

yalc lets us package a project and add it as a node module like NPM would do, but from a local store.

Conclusion

We can now use our design system assets in any frontend projects. There are natural advantages that come with using a centralised library: single source of truth, reduced maintenance cost, etc. But using a centralised library also enables us to invest more in our design system, which in turn makes it easier for the design system to be adopted company-wide.

How Kotlin's Coroutines help us to deal with Bluetooth

Romain Guefveneu — Tue, 16 Jul 2019 00:00:00 +0000

At Drivy, we want to enable users to open the car even if it’s on the bottom floor of the deepest underground car park. Since we can’t rely on a GSM connection that deep underground, we need to use a Bluetooth connection.
But communicating with a Bluetooth device is easier said than done, due to the fact that it’s low-level and requires many asynchronous calls. Let’s see how we can improve this.

Bluetooth 101

Bluetooth communication is not exactly like HTTP communication. We don’t have URLs or ports. All we have are services and characteristics. And UUIDs, lots of UUIDs.

According to the official doc, Bluetooth GATT services are collections of characteristics and relationships to other services that encapsulate the behavior of part of a device. So basically a service is a set of characteristics.

According to the same official doc, “Characteristics are defined attribute types that contain a single logical value.”
Characteristics are where the data is, that’s what we want to read or write.

Last thing, services and characteristics are identified by UUIDs.

Bluetooth callbacks

On Android, a Bluetooth device communicates with us via a BluetoothGattCallback:

abstract class BluetoothGattCallback {
    fun onConnectionStateChange(gatt: BluetoothGatt, status: Int, newState: Int) {}

    fun onServicesDiscovered(gatt: BluetoothGatt, status: Int) {}

    fun onCharacteristicRead(gatt: BluetoothGatt, characteristic: BluetoothGattCharacteristic, status: Int) {}
    
    fun onCharacteristicWrite(gatt: BluetoothGatt, characteristic: BluetoothGattCharacteristic, status: Int) {}
    
    [...]
}

Here is our issue: when we write a characteristic to the Bluetooth device to send a command, we want to wait for the device’s acknowledgement before continuing. In other words, we want to communicate synchronously with the device.
To do so, we need to block the execution until onCharacteristicWrite is called back for our characteristic.

Kotlin’s coroutines and channels

Coroutines are a great tool for dealing with asynchronous calls. Combined with channels, we have here the perfect tools to communicate synchronously with a Bluetooth device.

Here is a simple “Hello World!” using a channel:

import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel

suspend fun main() = coroutineScope {
    val channel = Channel<String>()

    launch {
        delay(3000L)
        channel.offer("World!")
    }

    println("Hello ${channel.receive()}")
}

channel.receive() will wait for the channel to have something to offer. In this way, “Hello World!” will be displayed 3 seconds later.

Bluetooth callbacks, Coroutines and Channels

What we need is a way to wait for the device acknowledgment before sending another command. We’ll use a coroutine and a channel to achieve this.

The channel setup

We’ll use a channel of BluetoothResults, a data class composed of the characteristic’s UUID and value, and the event status:

data class BluetoothResult(val uuid: UUID, val value: ByteArray?, val status: Int)

Each call to onCharacteristicRead or onCharacteristicWrite will offer to the channel a BluetoothResult:

private val channel = Channel<BluetoothResult>()

private val gattCallback = object : BluetoothGattCallback() {
    
    override fun onCharacteristicRead(gatt: BluetoothGatt, characteristic: BluetoothGattCharacteristic, status: Int) {
      channel.offer(BluetoothResult(characteristic.uuid, characteristic.value, status))
    }

    override fun onCharacteristicWrite(gatt: BluetoothGatt, characteristic: BluetoothGattCharacteristic, status: Int) {
      channel.offer(BluetoothResult(characteristic.uuid, characteristic.value, status))
    }

}

We now need a function that will wait for the channel to have a matching BluetoothResult to offer:

private suspend fun waitForResult(uuid: UUID): BluetoothResult {
    return withTimeoutOrNull(TimeUnit.SECONDS.toMillis(3)) {
        var bluetoothResult: BluetoothResult = channel.receive()
        while (bluetoothResult.uuid != uuid) {
            bluetoothResult = channel.receive()
        }
        bluetoothResult
    } ?: throw BluetoothTimeoutException()
}

This waitForResult function waits up to 3 seconds for a matching result on the channel; if none arrives in time, it throws a custom BluetoothTimeoutException.

Then, we’ll use a new BluetoothGatt.readCharacteristic function that will wait for the response, via the channel:

private suspend fun BluetoothGatt.readCharacteristic(serviceUUID: UUID, characteristicUUID: UUID): BluetoothResult {
    val characteristic = getService(serviceUUID).getCharacteristic(characteristicUUID)
    readCharacteristic(characteristic)
    
    return waitForResult(characteristicUUID)
}

Et voilà! Now we can communicate synchronously with a Bluetooth device:

val gatt: BluetoothGatt = connectToBluetoothDevice()
try {
    val result = gatt.readCharacteristic(MY_SERVICE_UUID, MY_CHARACTERISTIC_UUID)
} catch (e: BluetoothTimeoutException) {
    Log.e("Bluetooth", "Can't communicate with the device.", e)
}

Things to consider when choosing a third-party API

Christophe Yammouni — Tue, 02 Jul 2019 00:00:00 +0000

Third-party service APIs are useful: they allow you to benefit from the expertise and knowledge that others have acquired on a specific subject, one which is not your area of expertise and not the problem at hand. It would take too much time and effort to build and maintain such a service yourself.

However, choosing an API isn’t always an easy task.
Indeed, your choice will have an impact on your codebase and database architecture, and even your service itself. Imagine if your third-party payment-service went down: your customers wouldn’t be able to buy your products.

Many similar services provide APIs with comparable pricing and features.
So, how can you be sure you are making the right choice?
I’ll try to list all the different questions you need to ask yourself before implementing an API, covering documentation, libraries/SDKs, support, pricing, data privacy, and maintenance.

Documentation

Most of the time, when you need to choose a third-party library and there’s no documentation available, you can have a look at the source code to understand how it works and evaluate the code quality.

For a third-party-service API, as you won’t have access to the source code, you need to find a way to understand how it works, and to evaluate the implementation complexity.

Proper documentation should give you an idea of how your implementation will look. Most of your questions should be answered.

What’s the data format (JSON, XML etc.)?
Which authentication mechanism is required (basic auth, API token, etc.)?
How are successes and errors rendered?
Which and how many calls do I need to complete my task?
For inputs and outputs, which attributes are required? Which type should they be, what do they mean and are there any examples?
If some string values have a specific format, such as dates, times, or countries, which convention do they follow?

Some documentation also provides live querying tools, enabling you to run tests to ensure documentation and code are aligned. If yours does not, be sure to find a way to test the API before implementing it, because you can find some significant differences between documentation and production APIs.

Libraries

Having an SDK to consume an API can save you a lot of time, but be cautious about them.

Is the SDK available in your language?
Does it follow the latest API version?
Does it handle all the features you want?
Does it provide useful feedback for errors?
Does it have any dependencies? Are they up-to-date? Beware that they might conflict with yours.
What size is it? For example, this can be a red flag for mobile apps: you don’t want to double the size of your app, just because of a third-party SDK.
If it has not been written by the company providing the API, be sure it’s mature enough and well maintained. Have a look at open issues, too.

Consistency

While you’re having a look at documentation and the library, make sure everything is consistent. Lack of consistency can mean the underlying code quality is poor, and you’ll have trouble implementing it.

If it’s a REST API, make sure it follows the principles.
If the output is formatted in JSON, make sure all outputs are in JSON - even errors.
Check the case convention of attributes (e.g. snake case, camel case), and make sure all endpoints follow the same one.
Does the naming of endpoints and attributes/parameters make sense to you?

Support

Technical support can help a lot when you need your questions answered, whether it’s for implementation questions, or when you think there’s an outage.

Does your pricing plan include any phone support?
If there’s phone support, is it available 24/7? If not, is it compatible with your time zone?
If there’s no phone support, do they answer mail quickly?
Are the answers relevant?
Are there any community forums? Are they active?
Does a status page exist? Are outages frequent, and if so, are they well explained?

Reputation

Doing some research on the Internet can help you avoid surprises: find out the frequency of breaking changes and outages, or get news about the selling of the company providing the service and save yourself a migration.

Can you find any resources on the internet aside from the provider’s page?
Do the articles about the service tend to be negative or positive?
Is it used by well-known companies?

Pricing

This is a tough one because when you compare API pricing, you could think one is cheaper than the others. However, digging deeper can make you realize the cheaper one is actually more expensive on higher volumes.

For a start, look beyond the pricing plan that suits your needs: you’ll scale eventually, and you’ll quickly realize that adding a new user or making a few extra requests will become so expensive that you’ll need to migrate to another service anyway.

Is the price per month or per request?
If it’s per request:
- What’s the request limit? Does it fit your expected usage?
- How much are you charged per additional request?
- Can you monitor usage?
How many calls are you allowed to make per hour/day/month? It can differ depending on the plan.
If there’s phone support, make sure it’s included in your plan.
If there’s a free plan, don’t hesitate to try it before you pay for it.

Data privacy

Whether you are concerned by the GDPR or not, if any personal data transits between your service and the provider, you should have a look at the data privacy policy.

Does it have a privacy policy page?
For how long will they keep your data on their servers?
Will your data be shared with third parties?
If they provide an SDK, does it have dependencies on tracking libraries? If this SDK is implemented on the client side, it means that you need to ensure no data that can be used to identify a person is given.
Is communication between you and the provider secured enough? Take a look at authentication and communication protocol.

Maintenance

Even if all the lights are green and you are confident with your choice, keep in mind that the third-party API you are integrating will be replaced eventually.

To ease the replacement, design your integration with the migration process in mind.

Encapsulate all the provider’s related code in abstraction layers (Ex: Remove the company reference from the method names).
Remove any provider’s related naming from your table/column names.
If you have some extra time, add multi-provider support, to do a progressive migration or fallback support.
- For example, if you have to integrate a push notification service’s API, design your system so it can support having two providers at the same time.
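
The abstraction-layer and multi-provider advice above can be sketched as follows. This is a minimal illustration, not a reference design: the PushNotifier class, the shape of the provider objects and their send method are all hypothetical names, and the two providers are fakes standing in for real vendor SDKs.

```javascript
// Provider-agnostic facade: callers only know about PushNotifier, never about
// a specific vendor. The provider objects and their send() method are hypothetical.
class PushNotifier {
  constructor(providers) {
    this.providers = providers; // ordered by preference
  }

  // Try each provider in turn and return the first successful result.
  async notify(userId, message) {
    const failures = [];
    for (const provider of this.providers) {
      try {
        const receipt = await provider.send(userId, message);
        return { provider: provider.name, receipt };
      } catch (error) {
        failures.push(`${provider.name}: ${error.message}`);
      }
    }
    throw new Error(`All push providers failed (${failures.join("; ")})`);
  }
}

// Two fake providers standing in for real vendor SDKs.
const primary = {
  name: "primary",
  send: async () => {
    throw new Error("service unavailable");
  },
};
const backup = {
  name: "backup",
  send: async (userId, message) => `sent "${message}" to ${userId}`,
};

const notifier = new PushNotifier([primary, backup]);
notifier.notify("user-42", "Your rental starts soon").then((result) => {
  console.log(result.provider); // "backup"
});
```

Because no vendor name leaks into the facade, swapping or reordering providers only touches the array passed to the constructor, which is what makes a progressive migration possible.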

To sum up, this is by no means an exhaustive list of the problems you might encounter when integrating third-party services. Spoiler: APIs are always tricky. This article feeds off our own experiences and struggles. It explores the trade-offs to be made, between investing time and energy in building and maintaining a service yourself, or choosing to benefit from the expertise and knowledge of a third-party service when you integrate their API - warts and all.

Design system and API-Driven UI

Renaud Boulard — Wed, 05 Jun 2019 00:00:00 +0000

Why?

For some time now, we have been relying heavily on our API to display formatted content in our apps (see “API Driven Apps”). It enables us to be more agile, by shipping new features faster and iterating on them easily without app updates. Recently, we pushed this paradigm even further, generating complete native views from the API with our design system.

There are several advantages to this. You can:

Build a brand new screen with almost no mobile development if all the visual components already exist
Easily add or remove components on a given screen without the need of an app update
Run A/B testing on your screens to see what works best, by moving or adding/removing components

Plus, you build your new features based on native views to keep the best experience for the end users. Let’s see how it works.

Design system

A design system is a collection of components that can be reused in different combinations to build your UI. It also includes colors, spacing and typography specifications. Design systems allow you to manage design at scale in order to build consistent websites and applications. Every component has its own purpose, and can itself show or hide a subset of information depending on the context.

Here’s an example of what the components in the Play Store App would be:

Play Store App components

At Drivy we have built our own design system in order to develop consistent UI across all screens of the application. Our design team has worked on a series of components for both iOS and Android in order to respect the specific guidelines of each OS. Every component has a name and an associated custom view in our code base.

Here is a sneak peek of what it looks like:

Mobile design system component

API Driven UI

The purpose of these components is to display information to our users. This information will most of the time come from a call to our API. So we decided to associate a JSON schema to every component of our design system.

Let’s take a simple component:

`basic_subtitled` component

There are 3 pieces of information in this component:

Icon
Title: Drivy Open
Subtitle: This is a self-service car.

The associated JSON schema will look like this:

{
  "type": "basic_subtitled",
  "title": "Drivy Open",
  "subtitle": "This is a self-service car.",
  "icon_url": "https://drivy-assets.imgix.net/icons/open_badge.png"
}

Each component has a type property, defining what kind of component it is. Here comes the magic: we can now have a list of components returned by the API, in order to build an entire screen. The type is used to deserialize the appropriate object and bind it to the custom view associated with that component.

Let’s take an example with our car details screen:

Each component has a dedicated type, which gives us the ability to build an entire screen from the API. Also, `Components` are grouped by `Sections` to help organize content hierarchy.
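
The type-based dispatch can be sketched as a registry that maps each type to a renderer. This is only an illustration in JavaScript (our apps use native views): the sections payload shape, the van_dimensions type, the string output of the renderer and the choice to skip unknown types are all assumptions. Only the basic_subtitled fields come from the schema above.

```javascript
// Registry from component type to a renderer. Only basic_subtitled comes from
// the article; the renderer output and the sections payload are assumptions.
const renderers = new Map([
  ["basic_subtitled", (c) => `${c.title} - ${c.subtitle} (icon: ${c.icon_url})`],
]);

// Turn the API payload into a list of rendered sections, skipping any
// component type this client version does not know about.
function renderScreen(payload) {
  return payload.sections.map((section) => ({
    title: section.title,
    views: section.components
      .filter((component) => renderers.has(component.type))
      .map((component) => renderers.get(component.type)(component)),
  }));
}

const payload = {
  sections: [
    {
      title: "About this car",
      components: [
        {
          type: "basic_subtitled",
          title: "Drivy Open",
          subtitle: "This is a self-service car.",
          icon_url: "https://drivy-assets.imgix.net/icons/open_badge.png",
        },
        { type: "van_dimensions", height_cm: 190 }, // unknown to this client
      ],
    },
  ],
};

console.log(renderScreen(payload)[0].views);
```

Skipping unknown types is one possible policy: it lets the API introduce a new component without breaking clients that have not been updated yet.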

Concrete example

We have commercial vans on our platform. A few months ago, we decided to add all the dimensions of the vans to help our users know exactly what kind of furniture they can carry with the van.

As you can see below we were able to add a complex component to the screen:

This change only required an update of the API. It’s a win-win for everyone:

Customer: All of our customers can now see the van dimensions, no matter which version of the application they have
App developer: No mobile update is required, which means no iOS or Android development
Your Team: Only one backend developer is required, it’s time saved for the team
The App: It’s a native feature

Drawbacks

As with every software solution, there are some drawbacks, and this will not be the perfect solution for all of your screens:

When updating a component, you should take care not to break the API for the previous version of the component
As usual when it comes to offline mode, you must save the data for the information, but you should also save the list of components returned by the API, otherwise your screen will be empty
You cannot have a completely different layout for the landscape mode or tablet

Conclusion

We have been using this technique for a few months, and we have already seen the benefits for some features. There is a small start-up cost to set it up in your app and API, but we definitely think that in the long term it’s a valuable and powerful tool for quickly building new features in a scalable way.

Not all of the screens in your app will be a good match for the API/design system technique, but when it comes to a new screen you can ask yourself the following questions:

Will this screen need frequent updates/iterations?
Do we want to run A/B tests?
Does this feature get its information from an API?

If the answer to all of these questions is YES, it could be a good solution to go for.

Embracing or banishing randomness

Nicolas Zermati — Wed, 15 May 2019 00:00:00 +0000

Writing tests is becoming a big part of our job. If it isn’t yet, I strongly encourage you to push your organization down that path. Why could be the topic of another article.

I think there is a tremendous value in having an efficient test-suite. By efficient, I mean that it doesn’t give much extra work when refactoring and it gives accurate information when something is broken. And by accurate, I mean having as few false positives as possible, as many defects caught as possible, and as few tests failing as possible for a single defect.

As important as tests are to me, I don’t give as much attention to tests as I give to production code… In my reviews, I tend to have lower standards when looking at the tests. For instance, I won’t ask for a refactoring of the tests as long as they seem to be testing the behavior that just changed. It leads to heterogeneous practices. And on some topics, we simply disagree!

This article is about a controversial topic and will try to show the benefits of using randomness in your tests. I will also cover some of the downsides, and if you have more points you would like to add, please ping me on Twitter.

Context

The examples in this article will follow a feature and its testing journey. Here is a description of the feature:

We consider the duration of the rental to be the number of 24-hour chunks between its start time and its end time. When a trip spans across more calendar days than its number of 24-hour chunks, we would like to use the pricing of the car for the most relevant days. For instance: if a trip starts at 2pm and finishes at 8am the next day, we would like to consider the pricing of the car for the first day to be from 2pm to midnight.

Here we’ll look at the development of the tests written to test the #date_range method. This method gives the relevant days we should consider in order to price the trip.

Use-case based approach

In this context, in order to clarify things between the product owner and the development team, some examples were created and agreed upon before the code was created. Those examples were translated into the following test-cases by the developer:

subject(:date_range) do
  described_class.date_range(starts_at, ends_at)
end

let(:starts_on) { starts_at.to_date }
let(:ends_on) { ends_at.to_date }

let(:ends_at) { starts_at + duration }

context "when the duration is less than 24 hours" do
  context "when the start and the end time are on the same day" do
    let(:starts_at) { Time.zone.parse("2018-06-01 07:00") }
    let(:duration)  { 13.hours }

    it "returns a range including only the day the trip started" do
      is_expected.to eq starts_on..starts_on
    end
  end

  context "when the trip spans across 2 calendar days" do
    context "when the majority of the trip happens on the first day" do
      let(:starts_at) { Time.zone.parse("2018-06-03 11:00") }
      let(:duration)  { 22.hours }

      it "considers only the first day" do
        is_expected.to eq starts_on..starts_on
      end
    end

    context "when the majority of the trip happens on the second day" do
      let(:starts_at) { Time.zone.parse("2018-06-01 20:00") }
      let(:duration)  { 23.hours }

      it "considers only the second day" do
        is_expected.to eq ends_on..ends_on
      end
    end
  end
end

context "when the duration is between 24 and 48 hours" do
  context "when the trip spans across 2 calendar days" do
    let(:starts_at) { Time.zone.parse("2018-06-01 11:00") }
    let(:duration)  { 36.hours }

    it { is_expected.to eq starts_on..ends_on }
  end

  context "when the trip spans across 3 calendar days" do
    context "when a majority of time is spent on the last day compared to the first day" do
      let(:starts_at) { Time.zone.parse("2018-06-01 18:00") }
      let(:duration)  { 45.hours }

      it "excludes the first day" do
        is_expected.to eq (starts_on + 1)..ends_on
      end
    end

    context "when a majority of the rental's total time is on the first day rather than on the last day" do
      let(:starts_at) { Time.zone.parse("2018-06-01 10:00") }
      let(:duration)  { 48.hours }

      it "excludes the last day" do
        is_expected.to eq starts_on..(ends_on - 1)
      end
    end
  end
end

I rewrote the test names as the ones we had were Example 1, Example 2, and so on. They were extracted from a spreadsheet of use-cases the product team gave us.

What you may see here is that those examples describe some use-cases that we believed would be enough to ensure that the implementation was correct, i.e. to cover all cases. They actually covered the given specifications correctly, and the implementation made all tests green. Unfortunately, the whole team forgot about this one:

context "when the duration is less than 24 hours" do
  context "when the trip spans on 3 days (because of daylight savings)" do
    let(:starts_at) { "2018-03-24 23:30".in_time_zone("Europe/Paris") }
    let(:duration)  { 24.hours } # Produces this time: 2018-03-26 00:30

    it "excludes the first and last days" do
      is_expected.to eq (starts_on + 1)..(ends_on - 1)
    end
  end
end

Because of daylight savings in some time zones, we could have one trip that spans across more than N + 1 calendar days, where N is the number of 24-hour chunks between starts_at and ends_at. The first lesson here is to be really careful about the edge cases.
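
To make the arithmetic concrete, here is a small sketch that counts the calendar days touched by the trip from the test above. It is written in JavaScript rather than Ruby so that it runs without extra libraries; the localDate and calendarDaySpan helpers are hypothetical names, not part of our code base.

```javascript
// Format an instant as a YYYY-MM-DD calendar date in a given IANA time zone.
function localDate(instant, timeZone) {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const get = (type) => parts.find((part) => part.type === type).value;
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// Number of calendar days touched by [start, end] in that time zone.
function calendarDaySpan(start, end, timeZone) {
  const DAY = 86400000;
  const first = Date.parse(localDate(start, timeZone)); // parsed as UTC midnight
  const last = Date.parse(localDate(end, timeZone));
  return Math.round((last - first) / DAY) + 1;
}

const HOUR = 3600000;
// 2018-03-24 23:30 in Paris (CET, UTC+1) is 22:30 UTC.
const startsAt = new Date(Date.UTC(2018, 2, 24, 22, 30));
const endsAt = new Date(startsAt.getTime() + 24 * HOUR);

console.log(localDate(startsAt, "Europe/Paris")); // 2018-03-24
console.log(localDate(endsAt, "Europe/Paris")); // 2018-03-26
console.log(calendarDaySpan(startsAt, endsAt, "Europe/Paris")); // 3
```

The trip lasts exactly one 24-hour chunk (N = 1), yet it touches three calendar days because the clock jumps forward by one hour on 2018-03-25 in Paris.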

While in this example it does look like an edge case, it was actually a bit more common. We have an extra rule that allows a trip starting from 10am and finishing at 11am the next day to be considered as a one - rather than two - day trip.

Approaching tests from a different angle

The point of the article is to show that without being more clever, we could leverage another strategy to explore the expected behavior and detect that missing use-case from earlier.

subject(:date_range) do
  described_class.date_range(@starts_at, @ends_at)
end

context "when the trip spans over the same number of days as its duration" do
  add_constraint { @trip_span_size == @number_of_days }
  it { is_expected.to eq @starts_on..@ends_on }
end

context "when the trip spans over one more day than its duration" do
  add_constraint { @trip_span_size == @number_of_days + 1 }
  
  context "when the lowest amount of time is spent on the last day" do
    add_constraint { time_spent_on(@starts_on) >= time_spent_on(@ends_on) }
    it { is_expected.to eq @starts_on..(@ends_on - 1) }
  end

  context "when the lowest amount of time is spent on the first day" do
    add_constraint { time_spent_on(@starts_on) < time_spent_on(@ends_on) }
    it { is_expected.to eq (@starts_on + 1)..@ends_on }
  end
end

context "when the trip spans over two more days than its duration" do
  add_constraint { @trip_span_size == @number_of_days + 2 }
  it { is_expected.to eq (@starts_on + 1)..(@ends_on - 1) }
end

# This method is called for each test until the result meets all the constraints.
# If a context doesn't meet any branch of the constraint tree, then it raises an
# error telling you what context you may be missing.
def generate_context
  @starts_at = random_datetime
  @duration = random_trip_duration
  @ends_at = @starts_at + @duration
  @number_of_days = Rational(@duration.to_i, 1.day.to_i).ceil
  @starts_on = @starts_at.to_date
  @ends_on = @ends_at.to_date
  @trip_span_size = (@ends_on - @starts_on + 1)
end

def time_spent_on(day)
  day = day.in_time_zone(@starts_at.time_zone)
  from_time = [day.beginning_of_day, @starts_at].max
  to_time = [@ends_at, day.end_of_day].min
  to_time - from_time
end

# Below are some shared helpers that could be reused everywhere.

def random_datetime
  time_zone = ActiveSupport::TimeZone::MAPPING.values.uniq.sample
  datetime = ActiveSupport::TimeZone[time_zone].local(
    rand(2010..(Time.zone.now.year + 2)), # year
    rand(1..12),                          # month
    1,                                    # day
    rand(0..23),                          # hour
    [0, 30].sample,                       # minute
    0,                                    # second
  )
  day_offset = (0...(datetime.end_of_month.day)).to_a.sample # randomize the day
  datetime + day_offset.days
end

def random_trip_duration
  rand(1.second..30.days)
end

Here the add_constraint and generate_context are features that don’t exist yet. If you’re interested in working on implementing them, let me know!
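
One way such a helper could work is rejection sampling: draw random contexts until one satisfies every constraint of the current example, and fail loudly if none does. Here is a minimal sketch of that idea, in JavaScript so that it runs standalone. Everything in it is an assumption: the generateContext and randomTrip names, the seeded mulberry32 generator (used so a run can be replayed), and the UTC-only dates, which means the "two more days" case cannot occur here.

```javascript
// Small seeded pseudo-random generator (mulberry32), so a run can be replayed.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const DAY = 86400000;
const HALF_HOUR = 1800000;

// Random trip in UTC: start on a half-hour in 2018, duration from 1 second to about 30 days.
function randomTrip(random) {
  const startsAt = Date.UTC(2018, 0, 1) + Math.floor(random() * 365 * 48) * HALF_HOUR;
  const duration = 1000 + Math.floor(random() * 30 * DAY);
  const endsAt = startsAt + duration;
  const numberOfDays = Math.ceil(duration / DAY);
  const tripSpanSize = Math.floor(endsAt / DAY) - Math.floor(startsAt / DAY) + 1;
  return { startsAt, endsAt, numberOfDays, tripSpanSize };
}

// Draw contexts until one satisfies every constraint, or give up.
function generateContext(generate, constraints, maxAttempts = 10000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const context = generate();
    if (constraints.every((holds) => holds(context))) return context;
  }
  throw new Error(`No context met all constraints after ${maxAttempts} attempts`);
}

const random = mulberry32(42);
const spansOneMoreDay = (trip) => trip.tripSpanSize === trip.numberOfDays + 1;
const trip = generateContext(() => randomTrip(random), [spansOneMoreDay]);
console.log(trip.tripSpanSize - trip.numberOfDays); // 1
```

Printing the seed when a spec fails would be enough to regenerate the same context locally.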

Using that kind of approach leads to fewer examples, and to ones that are more meaningful. Now, the team needs to find properties that the subject under test should respect given a certain context.

The product owner and the developer must, together, come up with both those contexts and properties. They force us to clarify our thinking. Here it means that we reformulate relevant days from the original specification. The contexts and properties force us to extract the domain-related concepts of number_of_days, trip_span_size and time_spent_on, which could help to model the problem and maybe lead to a clearer solution.

Random generators can be shared across the application. Custom generators for any value of your domain must be available, very much like factories would be.

If it was that great, everyone would be doing it, right?

Caveats and workarounds

Coding the logic twice

In this approach, we need to use elements from the context (such as @starts_on, @ends_on) to compute the expected results. What prevents me from making a mistake in both the expected value computation and the production code?

The use-cases approach is simpler to setup and less risky to write because it focuses on a single and fixed context. Even when the context isn’t fixed, we could use constraints on it in order to reduce the complexity of the expected result computation.

In the examples, the arithmetic on start and end dates is the same in terms of complexity.

Too much generalization

The obvious difference between the two approaches is that the use-cases are really close to reality, while the one using randomness forces us to come up with well-structured rules and a more generalized approach. Driving the implementation from the use-cases may be more natural for TDD practitioners. The use-cases are needed in order to find relevant properties and contexts. Thus, use-cases are still mandatory in the process.

Not bad but… what about determinism

Using randomness is something that many people are afraid of. They may feel that they are losing control, that their test suite is gonna start slowing them down. Here are two remarks that are deep enough to, maybe, make you reconsider:

The tests are random as soon as impure functions are used such as Date.current.
The tests are random since they are randomized at programming-time by the developer *.

Those remarks imply that there are various classes of randomness. One comes from impure functions, either in the tests or in the production code. Those could lead to flaky tests.

Another one is introduced purposefully: it is here to help us discover failures, reveal inconsistencies in our thinking, and detect unexpected behaviour as soon as possible.

Reproducing failures

Your tests will run on CI and will give you failures. Once a spec fails, it isn’t obvious what the generated inputs were. Being able to understand and reproduce a failure is critical.

In the example, the context is lost upon failure. It is simple to get that context and it would give us a good hint as to what’s going on. Here is an example:

def must_equal(value)
  expect(subject).to eq(value), <<~MSG
    Expected #{subject} to eq #{value} while using:
    - Starts at: #{@starts_at}
    - Ends at: #{@ends_at}
  MSG
end

# Replace this:
it { is_expected.to eq (@starts_on + 1)..(@ends_on - 1) }

# With:
it { must_equal (@starts_on + 1)..(@ends_on - 1) }

I’m also experimenting with a custom pseudo-random generator that would use a different seed for each test and, in case of a failure, would display that specific seed to you. This experiment is a bit raw at the moment but lives in the nicoolas25/fuzzier repository on GitHub. It would look like this:

def random_datetime
  time_zone = Fuzzier.sample(ActiveSupport::TimeZone::MAPPING.values.uniq)
  datetime = ActiveSupport::TimeZone[time_zone].local(
    Fuzzier.rand(2010..2020),
    Fuzzier.rand(1..12),
    1,
    Fuzzier.rand(1..23),
    Fuzzier.rand(1..59),
    Fuzzier.rand(1..59),
  )
  day_offset = Fuzzier.rand(0...(datetime.end_of_month.day))
  datetime + day_offset.days
end
  
def random_trip_duration
  Fuzzier.rand(1.second..30.days)
end

When an error occurs, it will output an integer, let’s say 12345, that can be used to reproduce the same randomness:

it "has as many days as the number of days of the trip", fuzzier: 12345 do
  # ...
end

The faker gem provides something similar with Faker::Config.random.rand.
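
The mechanism behind seed-based reproduction can be sketched with Ruby’s standard Random class, without Fuzzier or Faker. This is only a minimal illustration: random_trip and its seed parameter are hypothetical names, and the point is simply that the same seed rebuilds the same generated context.

```ruby
require "date"

# Hypothetical generator: every random choice goes through one seeded Random
# instance, so a given seed always produces the same trip.
def random_trip(seed)
  rng = Random.new(seed)
  first_of_month = Date.new(rng.rand(2010..2020), rng.rand(1..12), 1)
  starts_on = first_of_month + rng.rand(0..27) # stay inside the shortest month
  ends_on = starts_on + rng.rand(1..30)        # the trip lasts 1 to 30 days
  { seed: seed, starts_on: starts_on, ends_on: ends_on }
end

seed = Random.new_seed % 100_000 # a fresh seed for this run, shortened for display
trip = random_trip(seed)
replay = random_trip(seed)       # replaying the reported seed rebuilds the same context

puts "seed=#{seed} trip=#{trip[:starts_on]}..#{trip[:ends_on]} replayed=#{trip == replay}"
```

Printing the seed alongside a failure message is what makes a randomized failure replayable on a developer machine.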

Using only one generation

This approach is very similar to property-based testing. The difference is mostly that we don’t try many input sets on those examples; only one. But because tests run quite often, we end up with way more use-cases over time. Solutions like Rantly fully embrace property-based testing and provide more tools, including the ability to run a test against many input generations.

Because I see this approach more like an exploration tool, we could try to run a given test many times to be more confident that nothing could go wrong. It would look like this:

1_000.times do
  it "has as many days as the number of days of the trip" do
    # ...
  end
end

Doing that exploration may show you some use-cases you missed and give you more confidence that the properties you specified truly match the requirements.
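
Outside of RSpec, the same exploration can be sketched as a plain loop that checks one property against many generated inputs. In this sketch, trip_days is a hypothetical subject under test and the fixed seed 2019 is an arbitrary choice that keeps the run repeatable.

```ruby
require "date"

# Hypothetical subject under test: the calendar days covered by a trip.
def trip_days(starts_on, ends_on)
  (starts_on..ends_on).to_a
end

rng = Random.new(2019) # fixed seed so the exploration is repeatable
failures = []

1_000.times do
  starts_on = Date.new(2019, 1, 1) + rng.rand(0..364)
  ends_on = starts_on + rng.rand(0..30)
  expected_count = (ends_on - starts_on).to_i + 1
  days = trip_days(starts_on, ends_on)
  # Property: one entry per calendar day, both bounds included.
  ok = days.size == expected_count && days.first == starts_on && days.last == ends_on
  failures << [starts_on, ends_on] unless ok
end

puts "#{failures.size} failing inputs out of 1000"
```

Collecting the failing inputs, rather than stopping at the first one, gives a list of concrete use-cases that can be turned into regular examples.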

When to use it

I think using this kind of approach has multiple benefits:

Conciseness & expressiveness of the specifications, as we don’t test samples but we specify the expected behavior using the language of the problem.
Adaptive and dynamic examples over the life of the test suite, as the test will run against new domain values as they are introduced in the application over time.
Better maintainability, as we can reason about properties rather than a long list of examples.

I wouldn’t recommend this approach for integration testing, where the goal is to secure well-known paths rather than explore all the possible cases. Also, UI tests are a place where I wouldn’t like randomness. You may want to compare screenshots of your application and that would be harder if the content was changing.

But, for components whose behavior needs to be fully described, I would consider this approach. I would consider it in addition to the usual use-cases for some edge cases. It forces me to think more about the problem and to have deeper discussions with the business. It can also point me to cases I didn’t think of.

As I said before, this technique can be a bit controversial and I invite you to talk about this with your team and share your opinion!

Your JavaScript can reveal your secrets

Adrien Siami — Thu, 02 May 2019 00:00:00 +0000

Security is hard. It’s often very easy to overlook things, and one small mistake can have a very big impact.

When writing JavaScript, it’s easy to forget that you’re writing code that will be sent in plain text to your users.

Recently I have been doing a bit of offensive security, with a special interest in JavaScript files, to see what kind of information could be retrieved from them.

Here’s what I’ve learned.

Business logic and other business leaks

It’s not uncommon to see some business logic in JavaScript files, especially for frontend-heavy websites.

While this is not a direct security problem, it can tell a great deal about your internals.

It could be a secret pricing function, a list of states that reveal an upcoming feature, or an array of translation strings that uncover some internal tools.

You wouldn’t want your secret algorithms exposed to the face of the world, would you?

Internal API paths

Another interesting find in JavaScript files is API paths.

Frontend-heavy applications need to make calls to an internal API, and often the list of API endpoints is conveniently stored in an Object in one of the JavaScript files.

This makes the work of security researchers very easy as they have access to all endpoints at once. Some endpoints may be deprecated but still show up in the list: this is more attack surface for a security researcher.

Access tokens

This one is really bad, but is really not that uncommon.

In JavaScript files, I’ve found the following:

AWS S3 id and secret key giving anyone full control over an S3 bucket
Cloudinary credentials giving anyone full control over the bucket
A CircleCI token, allowing me to launch builds, view commit history, and more
Various other third party API keys

Those are often found in the admin / internal JS files. Developers may think these files won’t be served to regular users so it’s fine to put sensitive information inside, but more often than not, it’s easy to get access to those files.

Getting to the interesting files

The interesting files are often the ones not intended for regular users: it can be an admin part, some internal tools, etc.

Every website has a different JS architecture. Some will load all the JS on every page; more modern ones will have different entry points depending on the page you are visiting.

Let’s consider the following:

<script src="/assets/js/front.js"></script>

It’s very trivial, but in this case, one could try to load back.js, or admin.js.

Let’s consider another example:

<script src="/static/compiled/homepage.d1239afab9972f0dbeef.js"></script>

Now this is a bit more complicated: the file has a hash in its name, so basic enumeration is impossible.

What if we try to access this url: https://website/static/compiled/manifest.json?

{
  "assets": {
    "admin.js": "admin.a8240714830bbf66efb4.js",
    "homepage.js": "homepage.d1239afab9972f0dbeef.js"
  },
  "publicPath": "/static/compiled/"
}

Oops! In this case the website is using webpack, a famous asset bundler. It is often used with a plugin that generates a manifest.json file containing the links to all assets, and this file is often served by the web server.
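
To check what your own manifest gives away, a short script can rebuild the full asset paths from it. The sketch below parses the example manifest shown above with Ruby’s standard JSON library; it assumes the manifest has exactly this "assets" and "publicPath" shape, which varies between plugins.

```ruby
require "json"

# The manifest from the example above, as the server would return it.
manifest_body = <<~JSON
  {
    "assets": {
      "admin.js": "admin.a8240714830bbf66efb4.js",
      "homepage.js": "homepage.d1239afab9972f0dbeef.js"
    },
    "publicPath": "/static/compiled/"
  }
JSON

manifest = JSON.parse(manifest_body)

# Join the public path with each hashed file name to get fetchable paths.
asset_paths = manifest["assets"].transform_values do |hashed_name|
  manifest["publicPath"] + hashed_name
end

asset_paths.each { |name, path| puts "#{name} -> #{path}" }
```

Every entry printed here is a file that anyone can request, hashed name or not.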

If you manage to find which tools a website is using, it’s easier to find this kind of vulnerability.

How to protect yourself

Here are a few tips to avoid being vulnerable to this kind of attack:

Consider your JavaScript code public, all of it
If you really need access tokens in the front-end, get them via a (secure & authenticated) API
Know your front-end toolbelt well to avoid basic attacks (manifest.json example)
Regularly audit your front-end code and look for specific keywords:
- secret
- token, accessToken, access_token, etc
- your domain name, for possible API urls
- your company name, for possible 3rd party credentials
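
Such an audit can start as a very small script. The following sketch scans a JavaScript source string for the keywords listed above and reports the matching lines; KEYWORDS, suspicious_lines and the sample bundle are all hypothetical, and a real audit would read your compiled files and add your domain and company names to the pattern.

```ruby
# "token" also covers accessToken and access_token; matching is case-insensitive.
KEYWORDS = /secret|token/i

# Returns [line_number, line] pairs for every line that mentions a keyword.
def suspicious_lines(source)
  numbered = source.each_line.with_index(1)
  matches = numbered.select { |line, _number| line.match?(KEYWORDS) }
  matches.map { |line, number| [number, line.strip] }
end

# Made-up bundle content, for illustration only.
bundle = <<~JS
  const apiBase = "/api/v1";
  const s3SecretKey = "not-a-real-key";
  fetch(apiBase + "/cars", { headers: { accessToken: userToken } });
JS

suspicious_lines(bundle).each { |number, line| puts "#{number}: #{line}" }
```

A keyword match is only a hint: each reported line still needs a human to decide whether it exposes something sensitive.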

Conclusion

Security issues can come from a lot of unexpected spots. When writing any kind of code, when pasting sensitive data, it’s always good to ask yourself who will have access to this code, to avoid leaking all your secrets!

Sorbet: A Ruby type checker

Antoine Lyset — Fri, 19 Apr 2019 00:00:00 +0000

This article is aimed at beginner Rubyists who want to understand what the fuss around type checking is all about. It can also be relevant for more experienced developers who might be interested in using Sorbet and learning why it’s a bit special.

First I need to say that Sorbet has not been released yet (a preview version is available). Stripe is improving it internally and some other companies are testing it. We can still talk about it because it should be open-sourced in the near future (they said summer 2019) and it’s nonetheless very interesting. This blogpost is the result of watching talks, and reading articles, Twitter feeds and the official website. It may contain some small mistakes and some parts may be obsolete when Sorbet is released.

What’s a type checker and what do I need to know?

To understand Sorbet we first need to understand what a type is. A type is a definition applied to a part of our program (this part can be a variable or a function for example). This definition usually says something like “this variable is a String” or “this function returns an Integer”. A type checker will enforce these definitions by raising an exception if it finds an incoherence. An incoherence can be something like “this variable is of type String and you try to call the method #map on it but this method does not exist on type String so this is incoherent”, and then it will raise an exception. This exception can be raised at runtime when the program is launched (this is called dynamic typing) or just by analysing the source code without executing it (this is called static typing). The tool that will enforce these types is called a type checker.
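
The #map example can be reproduced in plain Ruby, without any type checker, to see when the problem surfaces. In this small sketch, shout_each is a hypothetical method: nothing stops us from passing it a String, and the mistake only shows up as a NoMethodError once the faulty line actually runs.

```ruby
# Hypothetical method that expects an Array of Strings.
def shout_each(words)
  words.map(&:upcase)
end

shouted = shout_each(%w[hello world]) # fine: Array responds to #map

begin
  shout_each("hello world") # wrong type: String has no #map method
rescue NoMethodError => error
  # Without a type checker, we only learn about the mistake at this point.
  runtime_message = error.message
end

puts shouted.inspect
puts runtime_message
```

A static type checker aims to report this kind of call before the program is ever executed.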

There are a lot of different type checkers and it’s a large research field. We don’t need to understand Type Theory (one of the mathematical theories used by type checkers) to enjoy their use. I will just focus on Sorbet and describe what you can do with it.

Gradual type checking with runtime checks

Sorbet is both a static and a dynamic type checker. It will catch wrong definitions as early as possible by analysing the source code (you should run it in your editor and/or before releasing your code). This is particularly useful because Sorbet is fast: it can analyze 100kloc/sec (RuboCop is around 1kloc/s for comparison), so it will find bugs almost instantly, before you even launch your tests.

The more interesting and specific side of Sorbet is that it will run side by side with your Ruby code, verifying types at runtime. Sorbet’s creator decided to implement this because Ruby is a very dynamic language and a lot of Rubyists write code that will generate code.¹ Plus, Sorbet is a gradual type checker.

A gradual type checker is a special kind of type checker because you don’t need to add type annotations to all your code to use it. You can start small, just use it in some parts of your code, then extend its usage gradually when you feel the need. Stripe even added a tool to Sorbet to find which parts of your code you should type check to have the most impact. You may think that these runtime checks are costly, but that does not seem to be the case ², and you can be sure that since Stripe is using it in production, performance problems are taken very seriously.

How to use it

First some typing.

# typed: true
extend T::Sig

sig do
  params(time: Time)
    .returns(String)
end
def format_time(time)
  label = "Time is : "
  formatted_time = time.strftime("%H:%M")
  label + formatted_time
end


As you can see, Sorbet is just plain Ruby. First you add a # typed: true comment to instruct Sorbet that it’s a typed file (there are other values than true for different levels of strictness). Then you extend the object where you want to use it. Finally, you can call sig (short for signature) to define which types are your params and what the type of your returned value would be. This signature is applied to the definition of the next method.

sig takes a block as a parameter and in this block you use params that you chain with returns. These params and returns methods are the core of Sorbet.

Here I defined the params of my method format_time to be a Time and the return type to be a String. As you can see, I didn’t have to type label because Sorbet can infer types, and this makes it way more practical and less verbose than some other type systems.

Bye “NoMethodError:”

In the next bit of code we have an ActiveRecord-like Model with a .find and a #plate_number. This example simulates a common use-case where you query a record and ask for one of its attributes.

# typed: true
class Car
  extend T::Sig
  
  sig do
    params(attributes: {id: Integer, plate_number: String})
      .void
  end
  def initialize(attributes)
    @attributes = attributes
  end
  
  sig do
    params(id: Integer)
      .returns(T.nilable(Car))
  end
  def self.find(id)
    # We are simulating some kind of Database Query
    if id == 1
      new({
        id: 1,
        plate_number: "1234"
      })
    end
  end
  
  sig { returns(String) }
  def plate_number
    @attributes[:plate_number]
  end
end


car = Car.find(1)
car.plate_number

Result:

editor.rb:35: Method plate_number does not exist on NilClass component of T.nilable(Car) 
    35 |car.plate_number
        ^^^^^^^^^^^^^^^^
  Autocorrect: Use `-a` to autocorrect
    editor.rb:35: Replace with T.must(car)
    35 |car.plate_number
        ^^^
Errors: 1


When we type check it with Sorbet, it warns us that we didn’t handle the case where we don’t find a Car. The message is pretty clear and it even recommends that we use a special method, T.must. This will enforce at runtime that we always have a Car. This may not be what we want and we can handle the case ourselves by adding something like:

car = Car.find(1)
if car
  car.plate_number
else
  "A plate number's placeholder"
end


And now Sorbet is happy. It understands the if ... else and there is no more risk of errors.
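
For comparison, the other route, enforcing at runtime that a value is present, can be imitated in plain Ruby. The sketch below is not Sorbet’s API: must and find_plate_number are hypothetical stand-ins, and the error class raised by the real T.must may differ.

```ruby
# Hypothetical stand-in for the idea behind T.must: return the value when it
# is present, raise immediately when it is nil.
def must(value)
  raise TypeError, "expected a value, got nil" if value.nil?

  value
end

# Stand-in lookup that, like Car.find above, may return nil.
def find_plate_number(id)
  id == 1 ? "1234" : nil
end

puts must(find_plate_number(1)) # the value is present, so it is passed through

begin
  must(find_plate_number(2))    # nothing found: fail right here
rescue TypeError => error
  puts "Failed fast: #{error.message}"
end
```

Failing at the lookup keeps the error close to its cause, whereas the if ... else version lets the program continue with a fallback value.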

More than a type checker

Code autocomplete thanks to Sorbet

Sorbet is not only a type checker, it’s a tool suite around types. For example, there is an LSP server, which enables developers to easily implement code autocomplete, go-to-definition and all kinds of nice things for different editors (Visual Studio Code, Atom, Sublime Text, Vim, Emacs…). So if you’re using Sorbet in your code and in your editor, you will have an always-accurate source of documentation at hand.

A lot more to learn and to come

These are pretty basic examples, but it can go further with Generics or Interfaces. It can even warn us of dead code.

I think Sorbet will really shine in large projects: it will reduce the fear of refactoring by providing instant feedback, it acts as documentation and it helps reuse someone else’s code. Sure, it won’t remove testing, but it can reduce some of it and will let us focus on what’s important (and not “What will happen if I put a String instead of an Array here?”).

The Ruby community is very lucky to have such a big company investing so much effort in a type checker and willing to give it to the community (we are talking about more than 9 months of work by 3 very skilled people). If you want to know more, I really encourage you to check https://sorbet.org/ and to watch this video from RubyKaigi:

Footnotes

[1]: To handle some common Ruby metaprogramming techniques (code that generate code), Sorbet is able to “unroll” Ruby code, creating the metaprogrammed methods and type checking them. ↩

[2]: Tweet by Dmitry Petrashko (@darkdimius), June 4, 2018: ↩

1) we run it in production;
2) @nelhage measured overhead and the worst case(for method that does nothing) IIRC was under 5%;
3) `sig` supports one more builder method: `.checked(false)` to disable runtime checking;
4) runtime type system erases generics.

From translator to developer

Jordan Jalabert & Emily Fiennes — Tue, 19 Feb 2019 00:00:00 +0000

After working as a teacher and translator for several years, Emily embarked on a new phase in her career by learning a different kind of language: programming.

Emily has worked at Drivy for the past year and a half as a Full-Stack Engineer, after attending the intensive Le Wagon bootcamp in Paris. Here, she shares how she began coding and what life is like for a developer at Drivy.

If you’re considering a career change to become a software engineer, hopefully Emily’s story will inspire you to go for it.

What were you doing before you became a software engineer?

I studied modern languages and translation, and after university worked as a teacher and translator. My freelance translation work led me to work with some startups based in Bordeaux. I was intrigued by what the developers were doing and they pointed me in the direction of sites like Pluralsight, CodeCademy and TryRuby. I was pretty certain that I was going to be completely useless - I hadn’t studied maths or science since the age of 15! But initially, it was like learning another language.

Then I heard about Le Wagon, a 9-week intensive bootcamp to learn to code. After Le Wagon I was determined to find an internship because I knew that I still had lots to learn. I did a 6-month unofficial internship with another startup, and then, after reading Jean’s Story of a Junior Developer, applied to Drivy.

Describe what you do as a Full-Stack Engineer. What’s a typical day like?

The tech-product team is organised into squads, and I’m in the squad that builds features for owners. Our features focus on acquiring and onboarding new owners, improving fleet quality, and providing tooling for owners. I’m a Full-Stack engineer so often I’ll work on a feature from start to finish, implementing both the client- and server-side code.

Starting out, I was apprehensive about being stuck behind a computer all day but in fact that’s rarely the case… there’s a lot of variety in my days at Drivy. True, some days I might spend the whole day writing code. But there’s always room to ask for help, or to discuss choices and strategies with colleagues.

Other days are a mixture of meetings with product managers to help define feature specs; with members of the design team to discuss implementation of design specs; with my squad to define the roadmap for the coming quarter; or simply with the developers in my squad to elaborate on the technical specs for a feature.

Sometimes I do pair-programming with a more senior developer, which is always an opportunity to learn new tricks. We might refactor something I’ve been working on, build specific skills like writing tests or reviewing pull requests, or do coding exercises.

Half of my team works remotely, so communication is either face-to-face or via Slack and video calls. In short, there’s no ‘typical’ day - the only thing that is typical is that I’m constantly learning, and that’s really stimulating.

What do you enjoy the most about your job at Drivy?

There are so many things I enjoy about my job: I genuinely go home on a Friday excited to come back on Monday. I enjoy the balance between human interaction, and losing myself in the feature I’m working on. I enjoy solving problems, big and small, and having something to show for my work at the end of the day.

The pleasure I take from coding is similar to the one I take from translation. I find abstractions to map on to real-world problems and create a vocabulary that represents these abstractions in the digital world, respecting coherence within the codebase and the syntax and grammar of Ruby¹. I have to think about how the code I write might be interpreted by future developers, and resolve the problem in such a way that leaves future options open. I was surprised to learn that writing code is very creative work.

Working at Drivy is very rewarding because we are encouraged to take responsibility for our work. When I work on a feature, I’m not just integrating technical specs that a senior developer has written for me: I’m encouraged to think about the end needs of the user and the unexpected edge cases that might arise, and challenge the product specs if necessary.

I’m surrounded by very talented and experienced people: it really motivates me every day to think that one day I might be able to write code that is as sophisticated (which often means beautifully simple!) as theirs. That said, everyone is very humble: sometimes I’ll ask a question or challenge something for being too complex, simply because my repertoire of methods - my ‘vocabulary’ - is less broad. I’m never made to feel that my opinion is less valid because I’m junior, and sometimes my suggestion even gets accepted!

Technology is often described as a male-dominated field. Do you feel that women have the same opportunities as men in tech?

I can’t speak for all women, and I’m not sure binary male-female is helpful in a bid to be more inclusive. The question of equal opportunities for men and women is huge and complex. Suffice to say: the men vastly outnumber the women in the tech team, and that speaks volumes about the inequality of opportunities.

But for me the problem came much earlier. I can remember being told aged 13 that I was good at HTML - but I was never told that it would be a career option for me. I was also good at languages, and instead it was suggested I would make a good teacher! I’m sure that those messages, received at such a formative age, contributed to the self-doubt I experienced (and continue to experience!) in my work.

How many women are there in the Engineering Team?

Too few - I’m the only female developer. There are two women in the data team, one of whom is the Head of Data.

How do you keep yourself informed about the latest trends?

I’ve attended a couple of meetups and conferences since I’ve been in Paris. In all honesty, I haven’t attended as many as I would have liked to. This is for two reasons. Firstly, I love my work but by the end of the day I’m exhausted - the learning curve is still steep. But it’s ok that I have other passions and other things I like doing beyond coding - it took me a long time to accept that.

Secondly, even though everyone at the events I’ve been to has been nothing less than welcoming, I still suffer from a lot of self-doubt! It’s easy to worry that you won’t be taken seriously.

There are also a couple of blogs and newsletters I receive regularly and other resources I’ve come across online when trying to find a solution to a problem. And I read the Drivy engineering blog of course!

What advice would you give to a woman considering a career in the tech industry? What do you wish you had known?

I wish I had known that the insecurity and self-doubt will never leave me, but that everyone suffers from that. In fact, I think that can even help make you a good developer, because it means you are constantly asking questions and seeking to improve your knowledge. Being aware of the things you don’t know is really important in keeping up with the changes in tech and sharing knowledge with colleagues.

I worried a lot about my fundamental capacities because of not having a mathematical and scientific background and because of being dyspraxic. I wish I had taken confidence from the fact that my atypical background means I have a unique skill-set that I can bring to my work.

[1]: Videla, A., “Programming as translation”, February 2019, https://increment.com/internationalization/programming-as-translation/ ↩

Rails 6 unnoticed features

Alexandre Ferraille — Fri, 15 Feb 2019 00:00:00 +0000

Rails 6.0.0.beta1 is out and you may have already tested it. We all have heard about the main features such as multi-database connectivity, Action Mailbox & Action Text merge, parallelized testing, Action Cable testing etc. But there’s also a ton of other cool features that I found interesting.

Requirements change

With each major release come new requirements, starting with Ruby, whose minimum required version is now 2.5.0 instead of 2.2.2 for Rails 5.2. The minimum database versions are also raised: 5.5.8 for MySQL, 9.3 for PostgreSQL and 3.8 for SQLite.

Webpacker as default

Webpacker was merged in Rails 5.1 and provides a modern asset pipeline with the integration of Webpack for your JavaScript files. Before Rails 6 you had to generate your app with the --webpacker option to use it; now Webpacker is the default. It’s a good first step towards a modern asset pipeline on Rails in replacement of Sprockets, which you currently still need to load your CSS and images.

Environment support for encrypted credentials

Rails 5.2 introduced encrypted credentials: a file containing your passwords, API keys etc., which can be safely shared. All you need to do is safely store ENV['RAILS_MASTER_KEY']. This created a problem: when you wanted different credentials per environment, you were stuck with one file shared across all your environments. This is solved now: you can have a specific encrypted file per environment.

DNS Guard: hosts whitelist

Rails 6 added a new middleware called ActionDispatch::HostAuthorization, allowing you to whitelist some hosts for your application and prevent Host header attacks. You can easily configure it with a String, IPAddr, Proc and Regexp (useful when dealing with wildcard domains).

Translations and `_html`

If you’re using the _html suffix a lot for your translation keys, you can refactor a group of keys on the same level by adding _html to the parent and removing it from the children.

Filtering sensitive parameters

If you’re dealing with sensitive data you want to hide from logs, console etc., you can configure ActiveRecord::Base::filter_attributes with a list of String and Regexp values which match sensitive attributes.

Time comparisons

Date, DateTime and Time received a bunch of methods allowing us to do comparisons without traditional operators - easier to read:

rental_1.starts_at.after? rental_2.ends_at # => true
rental_1.starts_at.before? rental_2.ends_at # => false

Private `delegate`

You can now delegate a method without exposing it publicly with the new option private.

delegate :full_name, to: :current_user, private: true
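
Outside of Rails, a comparable result can be sketched with Ruby’s standard Forwardable module followed by an explicit private call. This is only a rough plain-Ruby equivalent for illustration, not how the Rails option is implemented; Greeter, User and the sample name are hypothetical.

```ruby
require "forwardable"

User = Struct.new(:full_name)

class Greeter
  extend Forwardable

  # Forward full_name to @current_user, then hide it from outside callers.
  def_delegator :@current_user, :full_name
  private :full_name

  def initialize(current_user)
    @current_user = current_user
  end

  def greeting
    "Hello, #{full_name}!" # still callable from inside the class
  end
end

greeter = Greeter.new(User.new("Ada Lovelace"))
puts greeter.greeting
puts greeter.respond_to?(:full_name) # the delegated method is not part of the public API
```

Keeping the delegation private lets the class use the shortcut internally without widening its public interface.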

`Relation#pick`

If you need to select a column from an ActiveRecord::Relation you can use pluck which will trigger the following query:

SELECT `articles`.`title` FROM `articles`

But if you want the first value you’ll need to add a .limit(1), that’s what pick is doing for you:

SELECT `articles`.`title` FROM `articles` LIMIT 1

Deprecate `update_attributes!` & `update_attributes`

Should we prefer update_attributes or update? update_attributes! or update!? These methods have always been confusing, yet update_attributes and update_attributes! were nothing more than aliases of update and update!. They are now deprecated: no more confusion or consistency issues!

Utf8mb4 as default for MySQL

Users are putting emojis 😀 everywhere, I’m 💯% sure you already got the issue when trying to insert them in your database. Setting Utf8mb4 as default instead of Utf8 solves the problem. It also helps if you need to handle Asian characters, mathematical characters etc. Note that you still need to migrate your old tables manually.

Change system database

Changing database from default SQLite to PostgreSQL (for example) is something that you might need to do at an early stage of your project, and could be painful if you don’t have the proper config files and templates in your Rails app. Now, you no longer need to generate a new Rails app with the proper database to grab the files you need, rails db:system:change is here.

`create_table`: `:if_not_exists` option for migrations

When running, rolling back and updating migrations, things can get messy, and sometimes you need to manually clean your database in order to run a create_table for a table which already physically exists. Now, you can bypass a create_table block in this case using the :if_not_exists option.

Navigating in the session object

Using and navigating in the session object can be painful if the keys/sub-keys you’re looking for are not defined. Fortunately, Rails 6 is adding a Hash-like method called dig, allowing you to safely navigate in your session object.
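
The behaviour is the same as Hash#dig in plain Ruby. In the sketch below, a regular Hash stands in for the session object, and the keys are made up for the example.

```ruby
# A plain Hash standing in for the session object (hypothetical keys).
session = { user: { preferences: { locale: "fr" } } }

found = session.dig(:user, :preferences, :locale) # every key exists
missing = session.dig(:user, :settings, :locale)  # :settings is absent, dig returns nil

# Without dig, the same lookup raises as soon as an intermediate key is absent.
begin
  session[:user][:settings][:locale]
rescue NoMethodError
  puts "chained [] fails on the missing key"
end

puts found.inspect
puts missing.inspect
```

Returning nil instead of raising lets the caller decide on a fallback in a single expression.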

`Enumerable#index_with`

New method allowing you to create a Hash from an Enumerable:

%w(driver owner drivy).index_with(nil)
# => { 'driver' => nil, 'owner' => nil, 'drivy' => nil }

%w(driver owner drivy).index_with { |role| delta_amount_for(role) }
# => { 'driver' => '...', 'owner' => '...', 'drivy' => '...' }
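
For readers without ActiveSupport at hand, the same Hash can be built in plain Ruby with to_h and a block (available since Ruby 2.6). This is an equivalent for illustration, not the Rails implementation; role.length stands in for the delta_amount_for(role) call of the example above.

```ruby
roles = %w[driver owner drivy]

# Same shape as index_with(nil): every element becomes a key with a nil value.
with_nil = roles.to_h { |role| [role, nil] }

# Same shape as the block form: the block computes the value for each key.
with_block = roles.to_h { |role| [role, role.length] }

puts with_nil.inspect
puts with_block.inspect
```

index_with saves building the [key, value] pair by hand, which is the only difference in the calling code.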

`Array#extract!`

This is a new method added to Array which works like reject!, but instead of returning the array of non-rejected values, it returns the values for which the block returns true.

my_array = [1,2,3,4]
my_array.extract! { |v| v > 2 } # => [3,4]
my_array == [1,2] # => true
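
The closest plain-Ruby tool is partition, which makes the difference easier to see. In this sketch, partition returns both groups without mutating the receiver, so the remainder has to be reassigned by hand, which is the step extract! performs in place.

```ruby
my_array = [1, 2, 3, 4]

# partition returns [matching, non_matching] and leaves the receiver untouched,
# so we reassign my_array to the non-matching part ourselves.
extracted, my_array = my_array.partition { |v| v > 2 }

puts extracted.inspect
puts my_array.inspect
```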

Starting your app

You get an error when starting your Rails server? The rails server command no longer passes the server name as a Rack argument but uses the -u option instead. You will start your server this way: rails server -u puma.

Of course, there’s more

This is a non-exhaustive list of things I found fun and/or useful and I encourage you to read the full changelog. Many deprecations have been added, previously deprecated methods have been dropped, and there’s much more that you might find useful.

Handle disabled mobile data setting on iOS

Thomas Senkman — Wed, 13 Feb 2019 00:00:00 +0000

For an unknown reason, a significant number of users disable the mobile data for our iOS app. This is not a problem when they are booking a car from their couch at home with Wi-Fi, but can quickly become a major issue when they try to unlock their Drivy Open car in the street. At this specific moment, there is very little chance that they remember they disabled this setting, so it very often leads to a call to Customer Services that could have been avoided.

Drivy's settings with mobile data switched off

If we do nothing about this, the users’ requests would always fail when not on Wi-Fi, and they would see a default error. In our case, they would see the message “An error has occurred”, which doesn’t help them to understand what’s wrong. However, it’s our job to let the users know what they can do to fix the issue.

iOS native implementation

To try to solve this, according to our tests, iOS shows an alert when all these conditions are met:

network is not reachable
mobile data setting is disabled for the current app
mobile data setting value has changed since last app opening

iOS native alert

Its UX is nice, because it redirects the user to the correct screen to update the setting with a single tap. But this doesn’t cover all the cases: if a first Wi-Fi call is successful, even with the mobile data setting disabled, the user will never see the alert.

Because we think it’s not sufficient, we decided to reimplement this alert for each network call that fails because of this specific setting.

Let’s code this

Requirements

The only requirement is to have an iOS 9 app target, as we will use CTCellularData which is only available from this version. To our knowledge there is unfortunately no way to check the mobile data setting value on earlier versions.

Used tools

So since iOS 9, Apple provides in CTCellularData a listener to check the value of the mobile data switch for the current app.

We’ll also need to be able to check for reachability to know if we are in our specific error case. As we use Alamofire in our app, we used their NetworkReachabilityManager, but you can also use another solution like Reachability.swift or even add Apple’s SCNetworkReachability class to your app for this.

Getting the current mobile data setting value

Implementing the listener to get the current value is pretty straightforward:

import CoreTelephony

//...

let cellularData = CTCellularData()
cellularData.cellularDataRestrictionDidUpdateNotifier = { (state) in
  print(state)
}

Getting the reachability status

Same logic here, we just have to explicitly start the listener after setting it:

import Alamofire

// ...

let networkReachabilityManager = NetworkReachabilityManager()
networkReachabilityManager?.listener = { [weak self] state in
  print(state)
}    
networkReachabilityManager?.startListening()

Data availability

Checking reachability can be done at any moment, but is not instant. So if you init the NetworkReachabilityManager and directly try to get the current status, this will probably fail. To avoid this, and because this does not consume a great deal of memory, we can have our own manager that stores the current value whenever it changes:

import Alamofire

class ApiReachabilityManager {

  // Stored as a property so the manager stays alive after init;
  // a local variable would be released and the listener would stop being called
  private let networkReachabilityManager = NetworkReachabilityManager()
  private var currentNetworkReachabilityState: NetworkReachabilityManager.NetworkReachabilityStatus = .unknown

  init() {
    networkReachabilityManager?.listener = { [weak self] state in
      self?.currentNetworkReachabilityState = state
    }
    networkReachabilityManager?.startListening()
  }
}

When to make the checks

We strongly advise checking both statuses, especially reachability, only when network errors happen. You should never check reachability before a network call in order to decide whether to make it. If there is an issue with the reachability check, this would result in blocking all the network calls of your app. This is even the first “important thing” Alamofire says in their documentation:

Do NOT use Reachability to determine if a network request should be sent. You should ALWAYS send it.

Final implementation

Here is our singleton manager, which contains:

2 public functions to:
- start listeners
- check current statuses
1 private function to present the same alert as the native one

import UIKit
import CoreTelephony
import Alamofire

class ApiReachabilityManager {
  static let shared = ApiReachabilityManager()
  
  private let networkReachabilityManager = NetworkReachabilityManager()
  private let cellularData = CTCellularData()
  
  private var currentNetworkReachabilityState: NetworkReachabilityManager.NetworkReachabilityStatus = .unknown
  private var currentCellularDataState: CTCellularDataRestrictedState = .restrictedStateUnknown
  
  func start() {
    cellularData.cellularDataRestrictionDidUpdateNotifier = { [weak self] (state) in
      self?.currentCellularDataState = state
    }
    
    networkReachabilityManager?.listener = { [weak self] state in
      self?.currentNetworkReachabilityState = state
    }
    
    networkReachabilityManager?.startListening()
  }
  
  func checkApiReachability(viewController: UIViewController?, completion: (_ restricted: Bool) -> Void) {
    let isRestricted = currentNetworkReachabilityState == .notReachable && currentCellularDataState == .restricted
    
    guard !isRestricted else {
      if let viewController = viewController {
        presentReachabilityAlert(on: viewController)
      }
      completion(true)
      return
    }
    
    completion(false)
  }
  
  private func presentReachabilityAlert(on viewController: UIViewController) {
    let alertController = UIAlertController(
      // TODO: replace YOUR-APP by your app's name
      title: "Mobile Data is Turned Off for \"YOUR-APP\"",
      message: "You can turn on mobile data for this app in Settings.",
      preferredStyle: .alert
    )
    if let settingsUrl = URL(string: UIApplication.openSettingsURLString), UIApplication.shared.canOpenURL(settingsUrl) {
      alertController.addAction(
        UIAlertAction(title: "Settings", style: .default, handler: { action in
            UIApplication.shared.open(settingsUrl)
        })
      )
    }
    
    let okAction = UIAlertAction(title: "OK", style: .default, handler: nil)
    alertController.addAction(okAction)
    alertController.preferredAction = okAction
    
    viewController.present(alertController, animated: true, completion: nil)
  }
}

We need to start the listeners at some point. We’ve chosen to do it at app launch, in AppDelegate, since our app makes network calls right away:

ApiReachabilityManager.shared.start()

Then, in your view controller, you can simply call the checkApiReachability method in case of error:

func handleError(_ error: Error) {
  ApiReachabilityManager.shared.checkApiReachability(viewController: self) { (restricted) in
    if !restricted {
      // TODO: continue to handle error, there is no network-data issue
    }
    // No need to handle else case as alert has been presented if needed
  }
}

Conclusion

It’s kind of strange to reimplement a native alert, but we were really surprised by iOS’s ~~incomplete~~ basic version, which isn’t that much of a help in our case. We only did this recently so we don’t have enough data to draw conclusions, but we hope this will avoid some calls to Customer Services.

And don’t forget to never rely on reachability before making the actual network call: it should always be an error handling helper.

Ruby tricks for junior developers

Clement Bruno — Tue, 22 Jan 2019 00:00:00 +0000

As a junior developer who started his professional coding journey fairly recently I realized that I only used a limited number of methods and ruby capabilities. Since I got started at Drivy, I have discovered several ruby tricks that helped me make my code more readable and efficient.

The dig method

The dig method can be used on hashes (and Arrays) to, as its name suggests, dig through the potential multiple layers in the object and retrieve the value corresponding to the argument provided to the method.
By using dig you find yourself able to nicely shorten your code and improve overall readability:

# Let's consider the following hash structure
user_info = {
  name: 'Bob',
  email: '[email protected]',
  family_members: {
    brother: {
      name: 'Bobby',
      age: 16
    },
    mother: {
      name: 'Constance',
      age: 55
    },
    father: {
      name: 'John',
      age: 60
    }
  }
}

Formerly, if I wanted to access our user’s brother name, I would have written something like that:

user_info[:family_members][:brother][:name]

But that assumes I was SURE that each key actually existed in the hash structure.
For instance, if I executed the following statement, my program would crash because of a non-existent key:

user_info[:family_members][:grand_father][:name]
# => `NoMethodError: undefined method `[]' for nil:NilClass`

Therefore, if I wanted to be safe while navigating in my hash structure, I should have written something like that:

user_info[:family_members] && user_info[:family_members][:grand_father] && user_info[:family_members][:grand_father][:name]
# => nil

This is way too long and annoying to write… With the dig method I can simplify this statement a lot:

user_info.dig(:family_members, :brother, :name)
#=> "Bobby"
user_info.dig(:family_members, :grand_father, :name)
#=> nil
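
Since dig also works on arrays, the same call can walk through a mix of hashes and arrays. Here is a minimal sketch, using a hypothetical booking structure that is not part of the example above:

```ruby
# Hypothetical nested structure mixing hashes and arrays
booking = {
  car: { model: 'Clio' },
  drivers: [
    { name: 'Bob', phones: ['0102030405'] },
    { name: 'Alice', phones: [] }
  ]
}

# Integer arguments are used as array indexes
first_driver_name = booking.dig(:drivers, 0, :name)
# An index that is out of range yields nil instead of raising
missing_phone = booking.dig(:drivers, 1, :phones, 0)
# A missing key anywhere in the chain also yields nil
missing_owner = booking.dig(:car, :owner, :name)

puts first_driver_name.inspect
puts missing_phone.inspect
puts missing_owner.inspect
```

Note that dig only protects against nil: if an intermediate value is something that does not respond to dig (a String, for example), it raises a TypeError.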

Protected methods

Everyone knows about public and private methods but I’ve found that not many people use protected ones.

Neither private nor protected methods can be called from outside the class in which they are defined.
For code that holds an instance to reach these methods, they must be called from within a public method defined in the class.

# Let's consider the following classes
class Animal
  def current_activity
    puts "I am #{define_activity}"
  end

  private

  def define_activity
    @activity ||= ["eating", "hunting", "sleeping"].sample
  end
end

Above, the current_activity method can be called on any instance of the Animal class (or of any class inheriting from Animal). The private define_activity method can also be called from within a subclass:

# For instance
class Cat < Animal
  def preferred_activity
    "My favorite activity is #{define_activity}"
  end
end

felix = Cat.new
felix.preferred_activity
# => "My favorite activity is sleeping"

However, before Ruby 2.7, a private method could not be called with an explicit receiver, not even self, within a class… For instance, this would raise an error on those versions:

class Animal
  def current_activity
    puts "I am #{self.define_activity}"
  end

  private

  def define_activity
    @activity ||= ["eating", "hunting", "sleeping"].sample
  end
end

kong = Animal.new.current_activity
# => NoMethodError: private method `define_activity' called for #<Animal:0x007ff6881d08b8>

Being able to do so could be especially handy if you wanted to call it on other instances of your class passed as method arguments. For instance it would be useful to be able to do that:

class Animal
  ...
  def same_activity_as?(other_animal)
    define_activity == other_animal.define_activity
  end

  private

  def define_activity
    @activity ||= ["eating", "hunting", "sleeping"].sample
  end
end

fido = Animal.new
bobby = Animal.new

fido.same_activity_as?(bobby)
# => NoMethodError: private method `define_activity' called for #<Animal:0x007ff688142f90>

In order for the above code not to break we can make the define_activity method protected instead of private and everything will work just fine:

class Animal
  ...
  def same_activity_as?(other_animal)
    define_activity == other_animal.define_activity
  end

  protected

  def define_activity
    @activity ||= ["eating", "hunting", "sleeping"].sample
  end
end

fido = Animal.new
bobby = Animal.new

fido.same_activity_as?(bobby)
# => true or false, depending on the activity randomly picked for each animal

NB: Please note that the protected method can here be called on instances of the class but only within the class definition body. These methods cannot be directly called on an instance of the class such as: fido.define_activity which would return an error NoMethodError: protected method 'define_activity' called for #<Animal:0x007ff688846120>
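
A common use for protected methods is comparing two instances without exposing the compared value publicly. Here is a minimal sketch, using a hypothetical Account class that is not part of the examples above:

```ruby
class Account
  include Comparable

  def initialize(balance)
    @balance = balance
  end

  # Comparable builds <, >, == and friends on top of <=>
  def <=>(other)
    balance <=> other.balance
  end

  protected

  # Readable by other Account instances, but not from outside the class
  attr_reader :balance
end

small = Account.new(10)
big = Account.new(500)

puts big > small

# The reader stays hidden from outside callers
begin
  big.balance
rescue NoMethodError => e
  puts e.message
end
```

With a private reader, other.balance would fail inside <=>. With a public one, anyone could read the balance. Protected sits in between.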

||= assignment

When I got started I didn’t know about ruby’s double pipe equals: ||=. The concept is fairly simple and can be very useful in a variety of situations. Basically using ||= allows you to perform a variable assignment if and only if the variable is not yet defined or if its value is currently falsey (nil or false).

number = 10

nil_variable = nil
nil_variable ||= number
# => 10 --> nil_variable is now set to 10

false_variable = false
false_variable ||= number
# => 10 --> false_variable is now set to 10

not_defined_variable ||= number
# => 10 --> not_defined_variable is now set to 10

content = "I already have some content"
content ||= number
# => "I already have some content" --> The content variable is not reassigned and keeps its initial value
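
One frequent use of ||= is memoization, and the "falsey" rule above has a consequence there: a false or nil result is never cached. Here is a minimal sketch, where the Settings class and its expensive_lookup method are hypothetical stand-ins for a slow call:

```ruby
class Settings
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # Memoized with ||=: the lookup is skipped once a truthy value is stored
  def timeout
    @timeout ||= expensive_lookup(30)
  end

  # Caveat: a false (or nil) result is never "remembered" by ||=
  def beta_enabled
    @beta_enabled ||= expensive_lookup(false)
  end

  private

  # Stand-in for a slow call (hypothetical); counts how often it runs
  def expensive_lookup(value)
    @lookups += 1
    value
  end
end

settings = Settings.new
2.times { settings.timeout }
puts settings.lookups # the truthy value was computed only once

2.times { settings.beta_enabled }
puts settings.lookups # the false value was recomputed on each call
```

When a memoized value can legitimately be false or nil, checking with defined?(@beta_enabled) before assigning is one way around this.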

The safe navigation operator (&.)

This operator lets you navigate safely through layers of object relations. Let’s say that we have a company with only one employee and that this employee has a name and an email address:

class Company
  attr_reader :employee

  def initialize(employee)
    @employee = employee
  end
end

class Person
  attr_reader :name, :email

  def initialize(name, email)
    @name = name
    @email = email
  end
end

bobby = Person.new('Bobby', '[email protected]')
drivy = Company.new(bobby)

In this context, if I wanted to access Drivy’s employee name I would probably do the following:

puts drivy.employee.name
# => Bobby

However, this only works in an environment where none of the elements in the chain (except possibly for the last one) can be nil. Now, let’s imagine a case where the company does not really have any employee. The drivy object would be instantiated as follows and the above code would raise an error:

drivy = Company.new(nil)

puts drivy.employee.name
# => NoMethodError: undefined method `name' for nil:NilClass

In order to prevent this behaviour, ruby has the &. operator, also known as the safe navigation operator (since version 2.3), which behaves a bit like the try method in rails. It calls the method on its right only if the receiver is not nil, so the whole chain returns nil if any element in it is nil. For instance:

drivy = Company.new(nil)

puts drivy&.employee&.name
# => nil

google = Company.new(bobby)
puts google&.employee&.name
# => Bobby
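
Two details are worth keeping in mind when chaining with &.. The sketch below uses a plain local variable rather than the Company and Person classes, purely for illustration:

```ruby
name = nil

# The whole call is skipped when the receiver is nil
length = name&.length

# Caveat: the result of &. can itself be nil, so the rest of the chain
# needs protection as well, otherwise + is called on nil
begin
  name&.length + 1
rescue NoMethodError => e
  error_class = e.class
end

# &. only guards against nil: false is a regular receiver
false_as_string = false&.to_s

puts length.inspect
puts error_class
puts false_as_string
```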

Struct

Struct is kind of a shortcut class which acts in a way more “liberal” manner. It allows for faster development when you are not willing to create a new class with an initialize method and other behavioural methods.

Struct is created as follows:

animal = Struct.new(:name, :type, :number_of_legs)
fido = animal.new("fido", "dog", 4)

And the parameters passed to the new method become attributes, with a reader and a writer, directly accessible from the instance:

puts fido.name
# => "fido"
fido.type = 'cat'
# => "cat"

# I said that Struct are more liberal because they do not enforce the presence of the arguments that you have to provide to create new instances:
bob = animal.new("bob")

In addition to Struct, ruby provides the ability to use OpenStruct. The main difference is that Struct creates a new class whereas OpenStruct directly creates a new object.

require 'ostruct'

dinosaur = OpenStruct.new(name: "t-rex", number_of_legs: 2)

puts dinosaur.number_of_legs
# => 2

These tools can be very useful to stub simple objects or classes while testing, since they let you replicate basic behaviours very quickly without much hassle.
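
A Struct can also carry a few methods and take keyword arguments. Here is a minimal sketch, assuming Ruby 2.5 or later for keyword_init; the description method is a made-up example:

```ruby
# keyword_init: true makes the constructor take keyword arguments
Animal = Struct.new(:name, :type, :number_of_legs, keyword_init: true) do
  # Methods defined in the block are available on every instance
  def description
    "#{name} is a #{type} with #{number_of_legs} legs"
  end
end

fido = Animal.new(name: 'fido', type: 'dog', number_of_legs: 4)
twin = Animal.new(name: 'fido', type: 'dog', number_of_legs: 4)

puts fido.description
# Structs compare by value and convert to a hash
puts fido == twin
puts fido.to_h
```

Value equality and to_h come for free, which is part of what makes a Struct convenient as a lightweight test double.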

Symbol#to_proc

Finally, I’d like to share a ruby idiom which lets you nicely shorten some statements and improve readability :)

You may have already seen things such as:

[1,2,3].reduce(&:+)
# => 6

[1,2,3].map(&:to_s)
# => ["1", "2", "3"]

When ruby sees the & followed by a symbol, it calls the to_proc method on the symbol and passes the proc as a block to the method.

The above examples are equivalent to writing:

[1,2,3].reduce do |sum, num|
  sum + num
end
# => 6

[1,2,3].map do |n|
  n.to_s
end
# => ["1", "2", "3"]
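
To see what happens behind the &, the conversion can be done by hand. The sketch below is only an illustration of the mechanism; the variable names are arbitrary:

```ruby
# Calling to_proc by hand gives a proc that sends the symbol to its argument
upcase = :upcase.to_proc
shout = upcase.call('hello')

# With extra arguments, the first one is the receiver and the rest are passed along
sum = :+.to_proc.call(1, 2)

# & performs the same conversion for any object responding to to_proc,
# such as a Method object
parsed = ['1', '2', '3'].map(&method(:Integer))

puts shout
puts sum
puts parsed.inspect
```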

Lambda composition in ruby 2.6

David Bourguignon — Tue, 15 Jan 2019 00:00:00 +0000

What are we talking about?

We recently updated a sizeable application to ruby 2.5, which opened up some nice features for us, such as the yield_self feature.

But I also wanted to have a quick look at 2.6 for comparison purposes, and I found a small feature that can easily be overlooked: the new proc composition operators: << and >>.

You can find the original request (from 2012!) here.

This is a way to compose a new proc by “appending” several other procs together.

Note: all the code that is present in this article can also be found in this gist

A simple example

# This lambda takes one argument and returns the same prefixed by "hello"
greet = ->(val) { "hello #{val}" }
# This lambda takes one argument and returns the upcased version
upper = ->(val) { val.upcase }

# So you can do
puts greet["world"]    # => hello world
puts upper["world"]    # => WORLD

present = greet >> upper

puts present["world"]  # => HELLO WORLD

present = greet << upper

puts present["world"]  # => hello WORLD

Lines 2 and 4 declare 2 simple lambdas, taking 1 argument each.

Lines 10 and 14 are where the magic happens.

This works like a shell-redirect operator: it takes the “output” from one lambda and sets it as the input of the other.

In line 10, we take the “world” input, pass it to greet, then take the output to pass it to upper.

The equivalent would be doing: upper[greet["world"]]

Line 14 is the same in reverse order. The equivalent is greet[upper["world"]]
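
The two equivalences can be checked directly, and the operators also accept Method objects. Here is a minimal sketch, assuming Ruby 2.6 or later; the exclaim method is a made-up helper:

```ruby
greet = ->(val) { "hello #{val}" }
upper = ->(val) { val.upcase }

# f >> g calls f first, then g; f << g calls g first, then f
left_to_right = (greet >> upper).call('world') == upper.call(greet.call('world'))
right_to_left = (greet << upper).call('world') == greet.call(upper.call('world'))

# Hypothetical helper, used to show that Method objects compose too
def exclaim(val)
  "#{val}!"
end

pipeline = greet >> upper >> method(:exclaim)
result = pipeline.call('world')

puts left_to_right
puts right_to_left
puts result
```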

Let’s dive deeper

That was a simplistic example. Let’s try something more useful.

Let’s say we have a transformation-rules directory, and some pipeline definition that would define which rules we should use in a particular case.

Let’s define some pricing rules:

# List of our individual pricing rules
TAX           = ->(val) { val + val*0.05 }
FEE           = ->(val) { val + 1 }
PREMIUM       = ->(val) { val + 10 }
DISCOUNT      = ->(val) { val * 0.90 }
ROUND_TO_CENT = ->(val) { val.round(2) }
# One presenter
PRESENT       = ->(val) { val.to_f }

# Pre-define some rule sets for some pricing scenarios
REGULAR_SET    = [FEE, TAX, ROUND_TO_CENT, PRESENT]
PREMIUM_SET    = [FEE, PREMIUM, TAX, ROUND_TO_CENT, PRESENT]
DISCOUNTED_SET = [FEE, DISCOUNT, TAX, ROUND_TO_CENT, PRESENT]

Now we can define a price calculator:

def apply_rules(rules:, base_price:)
  rules.inject(:>>).call(base_price)
end

At this point we can easily calculate the pricing for a given scenario:

require 'bigdecimal'

amount = BigDecimal(100)

puts "regular:    #{apply_rules(rules: REGULAR_SET, base_price: amount)}"    # => 106.05
puts "premium:    #{apply_rules(rules: PREMIUM_SET, base_price: amount)}"    # => 116.55
puts "discounted: #{apply_rules(rules: DISCOUNTED_SET, base_price: amount)}" # => 95.45

Again, this is quite a naive implementation.

Here we can see that the result depends on both the order of the rules and the operator we use. For instance, in our case we want to apply taxes on the final amount (including discount, premium or other). Rounding as a last step is important too.
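
To make the effect of ordering concrete, here is a minimal sketch with two rules applied in both orders. The FEE_RULE and DISCOUNT_RULE names and the apply helper are hypothetical, and Rational is used instead of BigDecimal only to keep the arithmetic exact:

```ruby
FEE_RULE      = ->(val) { val + 1 }
DISCOUNT_RULE = ->(val) { val * Rational(9, 10) }

# Same idea as apply_rules, repeated so the snippet runs on its own
def apply(rules, base_price)
  rules.inject(:>>).call(base_price)
end

fee_then_discount = apply([FEE_RULE, DISCOUNT_RULE], 100) # (100 + 1) * 0.9
discount_then_fee = apply([DISCOUNT_RULE, FEE_RULE], 100) # 100 * 0.9 + 1

puts fee_then_discount.to_f
puts discount_then_fee.to_f
```

The same two rules give two different prices, which is why each rule set above lists its rules in a deliberate order.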

Is using these composition operators a good idea?

I’ll let you make your own mind up.

This is one more tool in the ruby toolbox. It’s one important step toward a more functional style, like yield_self in 2.5 (now aliased as then).

Both allow for some kind of pipeline, yield_self for transforming values, and <</>> for transforming lambdas.

Why we've chosen Snowflake ❄️ as our Data Warehouse

Faouz EL FASSI — Mon, 07 Jan 2019 00:00:00 +0000

In the first of this series of blog posts about Data-Warehousing, I talked about how we use and manage our Amazon Redshift cluster at Drivy.

One of the most significant issues we had at this time was: how to isolate the compute from the storage to ensure maximum concurrency on read in order to do more and more data analysis and on-board more people in the team.

I briefly introduced Amazon Spectrum and promised to talk about how we were going to use it in a second blog post… But that turned out not to be the case, because we ultimately decided to choose another data-warehousing technology (Snowflake Computing), which addresses the issue mentioned above, among other things that I’ll explain here.

Why are we changing our Data Warehouse?

In Redshift and most Massively Parallel Processing SQL DBMSs, the underlying data architecture is a mix of two paradigms:

Shared nothing: chunks of a table are spread across the worker nodes with no overlaps;
Shared everything: a full copy of a table is available on every worker node.

This approach is convenient for homogeneous workloads, but less so for mixed ones: a system configuration that is ideal for bulk loading (high I/O, light compute) is a poor fit for complex analytical queries (low I/O, heavy compute) and vice versa.

When you deal with many consumers with different volumes and treatments you usually tend towards a multi-cluster organization of your data warehouse, where each cluster is dedicated to a workload category: I/O intensive, storage-intensive or compute-intensive.

This design gives more velocity to the teams. You can decide to have one cluster for each team, for example, one for the finance, one for the marketing, one for the product, etc. They generally no longer have resource related issues, but new kinds of problems could emerge: data freshness and consistency across clusters.

Indeed, multi-clustering involves synchronization between clusters to ensure that the same complete data is available on every cluster on time. It complicates the overall system, and thus results in a loss of agility.

In our case we have thousands of queries running on a single Redshift cluster, so very different workloads can occur concurrently:

a Drivy fraud application frequently requires the voluminous web and mobile app tracking data to detect fraudulent devices,
the main business-reporting runs a large computation on multiple tables,
the ETL pipeline of production DB dump and enrichment is running,
the ETL pipeline responsible for the tracking is running,
an exploration software extracts millions of records.

In order to improve the overall performance, to reduce our SLAs and make room for every analyst who wants to sandbox a complex analysis, we were looking for a solution that would increase the current capabilities of the system without adding new struggles.

It has to ensure the following:

ANSI SQL support and ACID transactions.
Petabyte scale.
A fully managed solution.
Seamless scaling capability, ideally the ability to scale compute and storage independently.
Cost effective.

Snowflake Computing meets all those requirements. It has a cloud-agnostic (could be Azure or AWS) shared-data architecture and elastic on-demand virtual warehouses that access the same data layer.

The Snowflake Elastic data warehouse

Snowflake is a pure software as a service, which supports ANSI SQL and ACID transactions. It also supports semi-structured data such as JSON and AVRO.

The most important aspect is its elasticity.

Storage and computing resources can be scaled independently in seconds. To achieve that, virtual warehouses can be created and decommissioned on the fly. Each virtual warehouse has access to the shared tables directly on S3, without the need to physically copy the data.

Multi-Cluster, Shared Data Architecture. Source: https://www.snowflake.com

They also have two really interesting features: auto-suspend and auto-scale. Every time a cluster is not used for more than 10 minutes, it is automatically put in sleep mode with no additional fees. The “Enterprise” plan also gives the auto-scale feature that adapts the size of the virtual warehouse according to the workload (horizontal scaling). I haven’t tested this feature yet since we have the lower “Premier” plan.

From Redshift to Snowflake

The data engineering team at Drivy is composed of two engineers. We dedicated a full quarter to the migration on top of the day-to-day operations, and it’s not finished yet. During this migration, we took the opportunity to pay some of our technical debt and modernize some of our ETL processes.

One of the greatest improvements we made was to version on S3 all the data involved before and after a transformation. At every run of every ETL pipeline (for instance, the bulk loading of the production DB), a copy of the raw data and of the transformed data is stored on S3.

That gives us many new capabilities: reproducibility, auditing and easier operations (when backfilling or when updating a table schema).

The biggest blocks of the migration were:

MySQL to Snowflake: Production DB bulk loading and transformations, with three kinds of ingestion: incremental append-only, incremental upsert, and full dump. We made a questionable choice here: our intermediate format is CSV, and we had many formatting issues.
Captur: Our internal tracking framework. It’s a pipeline that loads raw events from S3 (sent by the web and the mobile apps through a Kinesis stream) and splits them into a backend and a frontend schema holding different tables (one for each event). It also automatically detects changes and adapts the schema (new columns, new tables) when needed.
API integrations: spreadsheets, third-party APIs… straightforward but numerous.
Security and Grants management.

Virtual Warehouses mix

We want to group similar workloads in the same warehouses and to tailor the resources to the complexity of the computations, so we made the following choice in our first iteration:

quantity	size	users	description	usage per day	usage per week
1	S	ETL + Viz	Main warehouse for bulk loading, ETL and visualizations software.	∞	7d/7
1	L	Exploration	Used early in the morning for ~100 high I/O extractions for an exploration software.	0 - 4h	7d/7
1	XS	Analysts + Business users	Main warehouse for analysts, ~200 daily complex analytical queries.	0 - 10h	5d/7
1	L	Machine Learning + Ops	Compute intensive warehouse for occasional heavy computations.	0 - 2h	N.A.

Every warehouse has the default auto-suspend set to 10min of inactivity.

What’s next

Once we finish our migration, I’ll share my thoughts with you about the overall performance of the new system. I’ll also iterate on the mix of strategies presented above to ensure maximum velocity and convenience while minimizing the costs. Also, I’ll tell you more about how we do grant management.

Meanwhile, don’t hesitate of course to reach out to me if you have any feedback!

Airflow Architecture at Drivy

Eloïse Gomez — Wed, 21 Nov 2018 00:00:00 +0000

Drivy has been using Airflow to orchestrate tasks for 2 years now. We thought it was the best tool on the market when we wanted to start digging into data. The purpose was to understand how well our features were performing. We didn’t really know how the data was going to be used, and by whom. We wanted something easy to use and set up. We set up everything on an EC2 instance. 75 workflows later, we wanted to upgrade our Airflow version and move from the LocalExecutor to the CeleryExecutor mode. In local mode there is only one worker (which is also the webserver and the scheduler). With the CeleryExecutor, on the contrary, there are several workers which can execute tasks in parallel. Our number of DAGs is constantly growing and Celery mode is the best choice to handle this growth.

What is Airflow?

Airflow is Airbnb’s baby. It is an open-source project which schedules DAGs. DAG stands for Directed Acyclic Graph. Basically, a DAG is an organized collection of tasks. Thanks to Airflow’s nice UI, it is possible to look at how DAGs are currently doing and how they perform. If a DAG fails, an email is sent with its logs. It can be manually re-triggered through the UI. DAGs can combine a lot of different types of tasks (bash, python, sql…) and interact with different datasources. Airflow is a really handy tool to transform and load data from a point A to a point B. You can check their documentation over here.

A simple Airflow DAG with several tasks:

Airflow components

An Airflow cluster has a number of daemons that work together: a webserver, a scheduler and one or several workers.

Webserver

The airflow webserver accepts HTTP requests and allows the user to interact with it. It provides the ability to act on the DAG status (pause, unpause, trigger). When the webserver is started, it starts gunicorn workers to handle different requests in parallel.

Scheduler

The Airflow scheduler monitors DAGs. It triggers the task instances whose dependencies have been met. It monitors and stays in synchronisation with a folder for all DAG objects, and periodically inspects tasks to see if they can be triggered.

Worker

Airflow workers are daemons that actually execute the logic of tasks. They manage one to many CeleryD processes to execute the desired tasks of a particular DAG.

How do they interact?

Airflow daemons don’t need to register with each other and don’t need to know about each other. They each take care of a specific job, and when they are all running, everything works as expected.

The scheduler periodically polls to see if any registered DAGs need to be executed. If a specific DAG needs to be triggered, the scheduler creates a new DagRun instance in the metastore and starts to trigger the individual tasks in the DAG. It does so by pushing messages into the queuing service. A message contains information about the task to execute (DAG_id, task_id..) and what function needs to be performed. In some cases, the user will interact with the webserver and manually trigger a DAG to be run. A DagRun is created and the scheduler starts triggering individual tasks the same way as described before.

Celeryd processes, controlled by workers, periodically pull from the queuing service. When a celeryd process pulls a task message, it updates the task instance in the metastore to a running state and begins executing the code provided. When the task ends (in a success or fail state), it updates the state of the task.
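
The flow described above can be modelled in a few lines. This is only a toy model written in Ruby: it is not Airflow code, and every name in it (metastore, queue, the task names) is made up for the illustration. A hash stands in for the metastore, an in-memory queue for the queuing service, and threads for the workers:

```ruby
# Toy model only: none of these names come from Airflow
metastore = {}    # task_id => state, standing in for the metadata DB
queue = Queue.new # standing in for the queuing service
lock = Mutex.new

# "Scheduler": records each task as queued and pushes one message per task
tasks = { 'extract' => -> { 1 + 1 }, 'load' => -> { raise 'boom' } }
tasks.each do |task_id, code|
  metastore[task_id] = :queued
  queue << { dag_id: 'demo_dag', task_id: task_id, code: code }
end

# "Workers": pull messages, mark the task running, run it, store the final state
workers = 2.times.map do
  Thread.new do
    loop do
      message = begin
        queue.pop(true) # non-blocking pop, raises ThreadError when empty
      rescue ThreadError
        break
      end
      lock.synchronize { metastore[message[:task_id]] = :running }
      state = begin
        message[:code].call
        :success
      rescue StandardError
        :failed
      end
      lock.synchronize { metastore[message[:task_id]] = state }
    end
  end
end
workers.each(&:join)

puts metastore
```

The scheduler part never references the workers and the workers never reference each other: they only share the queue and the metastore, which is what lets workers be added or removed freely.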

Airflow architecture

Single-node architecture

In a single-node architecture all components are on the same node. To use a single node architecture, Airflow has to be configured with the LocalExecutor mode.

The single-node architecture is widely used when there is a moderate number of DAGs. In this mode, the worker pulls tasks to run from an IPC (Inter Process Communication) queue. This mode doesn’t need any external dependencies. It scales up well until all resources on the server are used. This solution works pretty well. However, to scale out to multiple servers, the Celery executor mode has to be used. The Celery executor uses Celery (and a message-queuing server) to distribute the load on a pool of workers.

Multi-node Architecture

In a multi-node architecture, daemons are spread across different machines. We decided to colocate the webserver and the scheduler. To use this architecture, Airflow has to be configured with the Celery Executor mode.

In this mode, a Celery backend has to be set (Redis in our case). Celery is an asynchronous task queue based on distributed message passing. Airflow uses it to execute several tasks concurrently on several worker servers using multiprocessing. This mode makes it really easy to scale out the Airflow cluster by adding new workers.

Why did we choose to use the multi-node architecture?

Multi-node architecture provides several benefits:

Higher availability: if one of the worker nodes goes down, the cluster will still be up and DAGs will still be running.
Dedicated workers for specific tasks: we have a workflow where some of our DAGs are CPU intensive. As we have several workers, we can dedicate some of them to these kinds of DAGs.
Scaling horizontally: since workers don’t need to register with any central authority to start processing tasks, we can easily scale our cluster by adding new workers. Nodes can be turned on and off without any downtime on the cluster.

In the next episode, we’ll look at how we automate Airflow cluster deployment :).

Open-sourcing checker jobs

Nicolas Zermati — Mon, 24 Sep 2018 00:00:00 +0000

We’ve recently extracted the checker_jobs gem from our codebase. It’s a simple alerting tool with a very specific purpose which this article will explain.

Over time, we update the rules that our data has to comply with. Making sure our data is always what we expect it to be is hard, especially when old constraints change, new constraints come along, new fields are added, backfill isn’t always possible…

Even with a careful team behind it, the system can produce corrupted data for weeks, months, or years before anyone notices. By that time, it could be too late or just impossible to fix. In comparison, crashes are noticed faster and can be corrected quickly, whereas a data issue can spread and impact many parts of the system, making it way more expensive to fix.

The checker_jobs are here to be sure that when this sneaky data corruption happens, you notice it right away.

What was the problem?

Imagine we’ve got a users table with a terms_of_services_accepted_at column. This column could be set for new users but not for old ones. We need the user to accept the ToS before they can book a trip on our platform. Unfortunately, old trips aren’t subject to that rule since the column didn’t exist back then.

We’ll do the best we can to be sure that we update all our user’s paths to take that new requirement into account. Even with our nice test suite, we don’t cover all the code paths, especially with all the production data. That data isn’t fresh from a testing factory, but testing on legacy, old, and sparse data is a different topic!

So to get some peace of mind, we would like to be sure that there are no recent trips booked where the driver didn’t accept the ToS. What we could do is write a piece of code verifying that we have no trips with users having the users.terms_of_services_accepted_at unset.

How this gem is useful

The gem offers you a quick way to get alerted when this piece of code finds such a trip. You can basically get notifications (emails, bug trackers, …) when a trip doesn’t honor the ToS rule.

It would look like this:

class TripChecker
  include CheckerJobs::Base

  notify :email, to: "[email protected]"

  ensure_no :trip_without_users_terms_of_service_being_accepted do
    Trip.joins(:users).merge(User.terms_of_service_not_yet_accepted)
  end

  ensure_no :trip_with_deactivated_car do
    # ...
  end
end

Then you would have to enqueue that TripChecker as often as you want to do that verification. In our case, because we use a Ruby tasks scheduler and Sidekiq, it looks like this:

every(1.day, 'trip_checker', at: ['00:10', '12:05'], tz: "Paris") do
  Sidekiq::Client.enqueue(TripChecker)
end
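To illustrate the idea behind `ensure_no`, here is a toy, dependency-free sketch. It is not the checker_jobs implementation: `ToyChecker`, the in-memory `TRIPS` list, and the `driver_accepted_tos_at` attribute are all hypothetical names made up for this example. Each check is a block that returns the offending records, and any check returning a non-empty result is reported as a failure.

```ruby
# Toy illustration only: NOT the checker_jobs implementation.
class ToyChecker
  Check = Struct.new(:name, :block)

  # Each subclass keeps its own list of registered checks.
  def self.checks
    @checks ||= []
  end

  # Registers a check; the block must return the offending records.
  def self.ensure_no(name, &block)
    checks << Check.new(name, block)
  end

  # Runs every check and returns { check_name => offenders } for failures.
  def self.perform
    checks.each_with_object({}) do |check, failures|
      offenders = check.block.call.to_a
      failures[check.name] = offenders unless offenders.empty?
    end
  end
end

# Hypothetical in-memory data standing in for database records.
Trip = Struct.new(:id, :driver_accepted_tos_at)
TRIPS = [Trip.new(1, Time.now), Trip.new(2, nil)].freeze

class ToyTripChecker < ToyChecker
  ensure_no :trip_without_terms_of_service_accepted do
    TRIPS.select { |trip| trip.driver_accepted_tos_at.nil? }
  end
end

failures = ToyTripChecker.perform
```

In this sketch `failures` maps the name of each failed check to the records it found, which is the information a notifier would need in order to send an alert.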

Here is an example of what we see in Bugsnag when one of our checkers is triggered:

What are the other ways of solving this?

There are other solutions to this issue, such as:

code that is more defensive and crashes if the preconditions aren’t met,
some database features such as triggers, foreign keys, or checks, or
a better test suite that can work with production data.

We try to use those when it makes sense, and we advise you to do the same. Still, the checker_jobs are different from all of those solutions:

they are safer than code defensiveness, since they don’t impact your production system,
they are cheaper to create, maintain, and, most of all, delete than database constraints, and
they are easier to set up than regression testing on production data.

Of course, they don’t provide the same guarantees as the other solutions, so the comparison isn’t entirely fair.

What’s next?

You could give checker_jobs a go: follow the instructions on GitHub and tell us how it went!

In the future, there are many things that we would like to see, things such as:

More job processors, ActiveJob is a good candidate,
More notifiers, I’m talking about PagerDuty, Bugtrackers, SMS, etc.
checker_jobs-web, an extra gem that allows you to publish the results of the checks on a dedicated web UI, and of course
Contributions from the community!

We intend to extract and release more libraries of this kind and we hope others will find them useful.

Exporting significant SQL reports with ActiveRecord

Nicolas Zermati — Wed, 29 Aug 2018 00:00:00 +0000

A few months ago we faced a memory issue on some of our background jobs. Heroku was killing our dyno because it was exceeding its allowed memory. Thanks to our instrumentation of Sidekiq, it was easy to spot the culprit. The job was running a fairly complex SQL request and outputting the query’s result into a CSV file before archiving this file.

In this article, I’ll explain what happened and detail the method we used to solve the problem. I had never seen or used this technique before, so I thought it would be nice to share.

More context

We run a tiny framework, something more like a convention, to run SQL queries and archive the results. If I remove the noise of the framework, we had code like:

rows = with_replica_database do
  ActiveRecord::Base.connection.select_rows(query)
end

CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end

In this simplified example, there are:

with_replica_database: a helper that helps us run a piece of code using a replica database,
query: our SQL query, as a String, and
header: a placeholder for the Array of our column names.

We used select_rows as the results of the query didn’t really match any of our models. It is a reporting query that does too many join, group by, and subqueries. The query takes dozens of minutes to run. We could, and probably should, integrate that into our ETL but that’s not the point…

The resulting CSV file wasn’t that big, maybe a hundred megabytes.

The issue

The memory consumption of this came from the many rows returned by the select_rows method. Each row is an array containing many entries, as our CSV has many columns. Each entry could be a complex datatype converted by ActiveRecord into even more complex Ruby objects. We had many instances of Time with their TimeZone, BigDecimal, …

Since the query returns millions of rows, the memory consumption is too high, even though it only grows linearly with the number of rows.

An impossible approach

At first I thought about paginating the results much in the same way that find_each works. The problem with that was that for 10000 rows, if I paginated by 1000, it would take 10 times as long as the same request without pagination.

Our query looked like this:

SELECT t.a, u.b, SUM(v.c) as c
FROM t
JOIN u ON u.id = t.u_id
JOIN v ON v.id = u.v_id
GROUP BY t.a, u.b

Just imagine t, u, v being subqueries with unions, OR conditions, other GROUP BY clauses and more poorly performing stuff. The sad part is the GROUP BY, which requires the engine to go through all results in order to group rows correctly. Using pagination on this would be something like:

SELECT t.a, u.b, SUM(v.c) as c
FROM t
JOIN u ON u.id = t.u_id
JOIN v ON v.id = u.v_id
GROUP BY 1, 2
ORDER BY 1, 2
LIMIT 10000
OFFSET 1000000

So the fewer entries on a page, the less memory used on the client side but the more time spent in the database, because more requests will be made. The more entries on a page, the more memory used on the client side but the less time spent in the database, because fewer requests will be made.

In the end, this approach wouldn’t have been future-proof.

Focusing more on the problem

It was easy to try to find solutions to the “results do not fit in memory” problem because it is a known one. It is common with Rails that long lists and association preloading will cause you memory issues. The quick fix is to use the find_each or in_batches methods.

I realized that I didn’t actually need to load everything in memory. I’m only interested in getting one line at a time in order to write it into the CSV and then forgetting about it, thanks to the garbage collector.

Solving the right problem

After acknowledging what the true issue was, it was possible to find something more efficient: streaming APIs.

CSV.generate do |csv|
  csv << header
  with_replica_database do
    mysql2 = ActiveRecord::Base.connection.instance_variable_get(:@connection)
    rows = mysql2.query(query, stream: true, cache_rows: false)
    rows.each { |row| csv << row }
  end
end

The idea was to bypass ActiveRecord and use the underlying MySQL client which was providing the stream option. I’m sure there are similar options for other databases.

With that implementation, we only do one request, so no pagination, but we won’t have all the results in memory. We never needed to have all those results in memory in the first place anyway.
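The consuming side of that pattern can be sketched without a database. In the minimal example below, `each_streamed_row` is a hypothetical stand-in for a streamed result set (it simply yields a few rows one at a time), and `write_report` is an illustrative helper name. The sketch also writes to an IO object instead of using `CSV.generate`, which is an assumption on my part: `CSV.generate` builds the whole CSV as a String in memory, whereas writing to an IO keeps only the current row around.

```ruby
require "csv"
require "stringio"

# Hypothetical stand-in for a streamed result set: yields one row at a time
# and never builds an array of all rows.
def each_streamed_row
  return enum_for(:each_streamed_row) unless block_given?

  3.times { |i| yield [i, "name-#{i}"] }
end

# Writes the header and then each row to the given IO as soon as it arrives.
# Returns the number of data rows written.
def write_report(io, header, rows)
  csv = CSV.new(io)
  csv << header
  count = 0
  rows.each do |row|
    csv << row
    count += 1
  end
  count
end

# A StringIO keeps the example self-contained; a File or Tempfile opened for
# writing would be used the same way.
io = StringIO.new
written = write_report(io, %w[id name], each_streamed_row)
```

With a real streamed result, `rows` would be the object returned by the database client, and each row would become eligible for garbage collection right after it is written.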

Conclusion

I would be very interested to use this feature with ActiveRecord’s ability to return models rather than rows. Maybe it is already possible but I didn’t find it. If you have any further information on the subject, please let me know!

I hope you won’t have to use these lower level APIs. But, if you do encounter the same kind of memory issues, don’t throw money at it right away. Try this first ^^

And obviously, most of this could be avoided by tweaking the layout of data and their relations. In our case, denormalization could make this easier but we’re not ready to pay that cost - yet.

Edit: As nyekks mentioned on Reddit, sequel seems to be better at this out of the box.

Security tips for rails apps

Adrien Siami — Mon, 27 Aug 2018 00:00:00 +0000

As your application gets larger and larger, the surface area for security issues expands accordingly, and security bugs become more and more problematic.

Here are a few tips to avoid some common pitfalls regarding security for Rails apps.

Use I18n with html tags properly

It is quite common to want to mix I18n translation keys with HTML tags. I’d recommend against doing that as much as possible, but sometimes you can’t really avoid it.

Let’s take the following example:

# en.yml
en:
  hello: "Welcome <strong>%{user_name}</strong>!"

<%= t('hello', user_name: current_user.first_name) %>

We have a problem here, because this will produce the following output:

Welcome <strong>John</strong>!

Oops! Indeed, our string was never marked as HTML-safe, so Rails escapes the HTML entities and the tags are displayed literally instead of rendering the name in bold.

One (bad) way to fix it would be to do the following:

<% # Don't do this! %>
<%= t('hello', user_name: current_user.first_name).html_safe %>

While it works, we just exposed ourselves to a nasty XSS. Indeed, our user can now change their name with some pesky JavaScript in it and the JavaScript will be executed.

XSSes are often underrated as benign security issues, but they can be fatal if exploited properly.

🎉

Note that this is pretty much the same as doing this:

<% # Don't do this either! %>
<%= t('hello', user_name: h(current_user.first_name)).html_safe %>

One good way to avoid XSSes is to really try to avoid using html_safe (or raw) as much as possible, and when forced, double check that you have full control of the content displayed.

Be defensive by default

You can’t trust user params; you most likely already know that. But there are different ways to implement sanitization of user params.

Let’s pretend we have a form, and we want to use one of two different Form Objects depending on a param:

class FooForm; end
class BarForm; end

form_klass = "#{params[:kind].camelize}Form".constantize # Don't do that
form_klass.new.submit(params)

Here, we get the right form class by constantizing a string that is controllable by the user. This is very bad practice and can lead to terrible side effects (imagine sending make_user_admin instead of foo or bar).

One solution could be to do this:

if params[:kind] == 'foo' || params[:kind] == 'bar'
  form_klass = "#{params[:kind].camelize}Form".constantize # Still, don't do that
  form_klass.new.submit(params)
end

Here we are ‘safe’. We check that the params are one of the two expected values and only constantize if needed. While this works fine, we haven’t corrected the root security issue (which is the use of constantize over user input).

Code grows old and evolves, developers copy and paste parts constantly, and at some point your offending line could end up outside of its guard.

Now let’s have a look at this alternative:

klasses = {
  'foo' => FooForm,
  'bar' => BarForm
}

klass = klasses[params[:kind]]
if klass
  klass.new.submit(params)
end

We have the same behaviour as above, except this time we don’t use constantize.

By being defensive and keeping a close eye on user input, we can avoid many basic security issues.

Beware of arrays or hashes

Consider the following code:

  # POST /delete_user?id=xxx

  def can_delete?(user_id)
    other_user = User.find_by(id: user_id)
    current_user.can_delete?(other_user)
  end

  user_id = params[:id]

  if can_delete?(user_id)
    User.where(id: user_id).update(deleted: true)
  end

This code is deliberately weird-looking in its structure so that it is vulnerable to the security issue, but trust me, I’ve seen it in the wild ;)

Everything works ok here until we start messing a bit with the params.

Let’s imagine we send the following request:

POST /delete_user?id[]=42&id[]=43&id[]=44&id[]=45..

Rails will parse params[:id] as an array of strings: ["42", "43", "44", "45"], which ActiveRecord casts to integers in the queries below.

  User.find_by(id: [42, 43, 44, 45]) # This will return a single user, typically the one with id 42 (lowest id)

  User.where(id: [42, 43, 44, 45]).update(deleted: true) # This will update all of those records!

Thanks to some weirdness in find_by and messing with the params, here we managed to act on records we may not have access to.

It’s always good to remember that params can also be arrays (or hashes!) as that can pose some security risks.
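One way to guard against this, sketched below in plain Ruby, is to coerce the param to a single scalar id before using it anywhere. `scalar_id` is a hypothetical helper name, not a Rails API; in a controller you would call it on `params[:id]` and stop early when it returns nil.

```ruby
# Hypothetical helper: returns a positive Integer id, or nil for anything else
# (arrays, hashes, nil, non-numeric strings).
def scalar_id(value)
  id = case value
       when Integer then value
       when String then Integer(value, 10) # raises ArgumentError if not numeric
       end
  id if id && id.positive?
rescue ArgumentError
  nil
end

single = scalar_id("42")
from_array = scalar_id(%w[42 43 44 45])
from_hash = scalar_id({ "admin" => "1" })
from_garbage = scalar_id("abc")
```

Because the same coerced value is then passed to both the permission check and the update, the two queries can no longer disagree about which records they target.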

Remember that evil input is not always where we think it is

Most of us are very wary when dealing with user params, or values coming from the database.

However, there are some attack vectors that can be forgotten, including (but not limited to) the following:

Cookies: they are 100% editable by the user
Other headers in general: Referer, User-Agent, etc.
User IP: easily spoofable on misconfigured apps (using X-Forwarded-For)
Local files: here’s an interesting example of poisoning the ssh auth.log file in order to perform a remote code execution.

It’s always good to think about where any given input comes from and wonder if it can be tampered with.

Implementing Up Navigation on Android

Romain Guefveneu — Fri, 03 Aug 2018 00:00:00 +0000

Parent Navigation has always been a tough topic on Android. There are not a lot of apps that implement the guidelines correctly, maybe because they are hard to understand or complicated to implement. Even the Google apps don’t implement them: it’s always frustrating to take a screenshot, press Up on the preview and not be redirected to the Google Photo app 😞.

Unlike the Back button, which should come back to the previous screen – even if that screen was not from the same app –, the Up button should stay in the same app.

Let’s see how to implement this navigation.

A Simple App

Here is a simple music app, with 3 activities:

a Main activity, with a list of albums
an Album activity, with a list of tracks
a Track activity, with the track’s name and that of the album.

Forward navigation is pretty obvious:

On MainActivity, users can go to AlbumActivity or TrackActivity.
Nothing special here. So what about back navigation?

Since we’re in the same task, Up navigation and Back navigation do the same thing: they come back to the previous activity. But if we don’t start from the main activity (for instance from a notification or a widget), Up and Back won’t have the same behavior:

Back will still dismiss the current activity (or fragment), so users will come back to the previous app
Up should redirect to the app’s parent activity.

How to implement

No need to write anything in the Activity#onOptionsItemSelected method! Everything is already done in the Android SDK. We mainly just need to edit the AndroidManifest.xml file to add some attributes to our activities.

Declare a `parentActivityName`

First, we need to declare a parent activity for each child activity: TrackActivity will come back to AlbumActivity, which itself comes back to MainActivity.

<activity
    android:name=".AlbumActivity"
    android:parentActivityName=".MainActivity"
    ... />

<activity
    android:name=".TrackActivity"
    android:parentActivityName=".AlbumActivity"
    ... />

Declare a `launchMode`

By declaring parent activities’ launchMode as singleTop, we prevent the system from creating a new activity each time we press Up.

<activity
    android:name=".MainActivity"
    android:launchMode="singleTop"
    ...>
    ...
</activity>
<activity
    android:name=".AlbumActivity"
    android:launchMode="singleTop"
    ... />

The parent activity is no longer recreated. 🎉

Now we have the desired behavior, but we don’t cover all the cases. What if I start TrackActivity from outside the app, and press Up? I want to be redirected to AlbumActivity.

TrackActivity is not redirected to AlbumActivity when pressing Up. Mildly frustrating, to say the least.

To do that, we have to declare a taskAffinity.

Declare a `taskAffinity`

Declaring a taskAffinity for TrackActivity allows a new task to be created when starting this activity from outside the app. Thanks to that, Up navigation will switch to the main task and create an AlbumActivity.
Curious about how it works? See Activity#onNavigateUp

<activity
    android:name=".TrackActivity"
    android:taskAffinity=".Track"
    ... />

One issue here: AlbumActivity needs to know which album we want to display.

Override `onPrepareSupportNavigateUpTaskStack`

On TrackActivity, we need to override onPrepareSupportNavigateUpTaskStack to edit the intent that will start the parent activity when pressing Up:

override fun onPrepareSupportNavigateUpTaskStack(builder: TaskStackBuilder) {
    super.onPrepareSupportNavigateUpTaskStack(builder)
    val albumId = intent.getLongExtra(TrackActivity.EXTRA_ALBUM_ID, -1L)
    val albumIntent = AlbumActivity.create(this, albumId)
    builder.editIntentAt(builder.intentCount - 1)?.putExtras(albumIntent)
}

See also onCreateNavigateUpTaskStack.

TrackActivity is now redirected to AlbumActivity when pressing Up. 👌

One last thing: because we create a new task when starting TrackActivity from outside the app, TrackActivity will remain on the Recents screen when pressing Up. It would be great to remove it automatically.

TrackActivity is still visible in the Recents screen. Not very useful.

Declare `autoRemoveFromRecents`

With autoRemoveFromRecents, the activity will be removed from the Recents screen when its task is completed, for instance when coming back to the parent activity.

<activity
    android:name=".TrackActivity"
    android:autoRemoveFromRecents="true"
    .../>

TrackActivity is no longer visible in the Recents screen. 🙌

Summary

To sum up, this is what the AndroidManifest.xml looks like:

<activity
    android:name=".MainActivity"
    android:label="@string/app_name"
    android:launchMode="singleTop">
    <intent-filter>
        <action android:name="android.intent.action.MAIN" />
        <category android:name="android.intent.category.LAUNCHER" />
    </intent-filter>
</activity>
<activity
    android:name=".AlbumActivity"
    android:label="@string/album_title"
    android:launchMode="singleTop"
    android:parentActivityName=".MainActivity" />
<activity
    android:name=".TrackActivity"
    android:autoRemoveFromRecents="true"
    android:label="@string/track_title"
    android:parentActivityName=".AlbumActivity"
    android:taskAffinity=".Track" />

Conclusion

Now that you know how to tune the AndroidManifest to get a satisfying parent navigation, please don’t just finish() the activity when pressing Up 😀

Quick wins to deal with users' broken email addresses

Jean Anquetil — Thu, 05 Jul 2018 00:00:00 +0000

If a user signs up to Drivy, we want to welcome them. If a driver has an upcoming booked trip, we would like to send them the needed information. If they want to reset their password, they need to receive a confirmation email and so on.

In other words, transactional emails are very important for a successful experience. So, how do we deal with broken email addresses?

Regex-ing the format

First of all we decided to check the email-address format of a new user during her sign-up flow. To do so, we compare it with a very simple regex.

\A\S+@\S+\.\S+\z

Here is our assumption. An email address must:

start with one or more non-whitespace characters,
followed by an at symbol,
then one or more non-whitespace characters,
followed by a dot symbol,
and end with one or more non-whitespace characters.

And that’s it. We don’t want to define a complex pattern such as the RFC 5322 one, since each email provider has its own syntax rules and we don’t want to block potentially valid addresses.
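As a quick illustration, here is how that regex behaves on a few sample addresses in plain Ruby. The `plausible_email?` helper and the sample addresses are made up for this example; only the pattern itself comes from the article.

```ruby
# The pattern from the article: non-whitespace, "@", non-whitespace, ".", non-whitespace.
EMAIL_FORMAT = /\A\S+@\S+\.\S+\z/

# Hypothetical helper name, used here only for illustration.
def plausible_email?(address)
  EMAIL_FORMAT.match?(address)
end

samples = [
  "john@example.com",     # accepted
  "john@example",         # rejected: no dot after the at symbol
  "john doe@example.com", # rejected: contains whitespace
  "a@@b.c"                # accepted: the check is deliberately permissive
]

results = samples.to_h { |address| [address, plausible_email?(address)] }
```

The last sample shows the trade-off: the pattern filters out obvious typos but leaves real validation to the delivery step.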

Transliterating

Later down the line, we faced some delivery issues with email addresses containing special characters (e.g. àéèù…), so we decided to transliterate the email addresses of some specific domains.

For instance, we know that Gmail supports addresses with accents. But they don’t differentiate between an address with or without accents: they are the same. We therefore decided to transliterate the email addresses from the following domain names: Gmail, Outlook, Hotmail and Live.

Using a custom coercion with Virtus in our form object, this is done really smoothly. (However, Virtus is now deprecated so if we were to start from scratch today, we would use something else.)
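A rough, dependency-free sketch of domain-scoped transliteration is shown below. It is not our Virtus coercion: it swaps in Unicode NFD normalization from the Ruby standard library instead of I18n.transliterate, and the exact domain list, `strip_accents` and `normalize_email` are assumptions made for the example.

```ruby
# Assumed list of exact domains; the real list may include other variants.
TRANSLITERATED_DOMAINS = %w[gmail.com outlook.com hotmail.com live.com].freeze

# Decomposes accented characters (NFD) and drops the combining marks.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

# Transliterates the local part only for the listed domains;
# every other address is returned untouched.
def normalize_email(address)
  local, domain = address.split("@", 2)
  return address unless domain && TRANSLITERATED_DOMAINS.include?(domain.downcase)

  "#{strip_accents(local)}@#{domain}"
end

gmail = normalize_email("jérôme@gmail.com")
other = normalize_email("jérôme@example.org")
```

Note that stripping combining marks only covers accented Latin letters; characters such as "ø" or "ß" have no decomposition and would pass through unchanged with this approach.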

However, transliteration is not without its limits. The day we open a country without a Latin alphabet we will not be able to transliterate the email addresses anymore:

I18n.transliterate("日本語")
# => "???"

Using an external service

To go further, we could carry out many more checks using an external service. For instance, Mailgun released a library called Flanker.

It carries out the following checks:

DNS lookup: checks that the @domain.com domain exists.
MX check: checks that @domain.com has a Mail Exchange record. In other words, that the domain is configured to receive emails.
Validation rules: checks that the email address complies with general validation rules but also provider-specific ones. For instance, for @gmail.com they check that the address length is between 6 and 30 characters.

Conclusion

There will always be a lot of different ways to prevent or sanitize broken email addresses but it will remain difficult to handle all the use cases. Maybe another way to fight this would be by not relying too much on emails: using the phone number to verify a profile and using push or browser notifications to talk to a user.

Usage of Sidekiq middleware

David Bourguignon — Thu, 31 May 2018 00:00:00 +0000

At Drivy, we use a lot of background jobs, called from service objects, API calls, cron, etc.
A time came when we needed to add some context data across several of these code layers.

For instance, we have some context data we need to keep for auditing reasons. This data can originate from several points in the application: maybe from some part of the web application, from the mobile app, or from a service object.

We tried to find a way to keep this new context data through all code layers and jobs without having to resort to adding context data arguments everywhere.

We decided to use Thread.current objects to host this data for the current process.

CAVEAT: Using this kind of global data in this way is usually considered to be bad practice. I will not discuss it here, but you can look at this discussion for more detail.
We use global data with caution, in a limited scope and only after having really thought about it. All interactions with the global data are tightly contained in service objects to limit the risk of using the data outside of its intended scope.

module ProcessContext
  module_function

  def reset
    self.attributes = {}
    attributes
  end

  def attributes
    Thread.current["process_context"] || reset
  end

  def attributes=(new_attributes)
    Thread.current["process_context"] = new_attributes
  end
end

It works well, up to the point where we delegate some of this processing to background jobs. The jobs run on a different thread (even on a different machine).

We use Sidekiq to manage our jobs. Sidekiq works in the following way (a simplified version):

The client side enqueues a job into a Redis database;
On the server side the workers:
- read the database to pick a job in the queue;
- run it.

Conveniently, Sidekiq provides a way to add some code around job processing, on the client side, the server side or both. So we used these middlewares to propagate the context information from the client (our Rails application) to the Sidekiq server.

Client side

The Sidekiq middleware client API is:

class Drivy::MyClientMiddleware
  def call(worker_class, job, queue, redis_pool)
    # custom code
    yield
    # custom code
  end
end

And you add it to Sidekiq configuration in this way:

# config/initializers/sidekiq.rb
Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add Drivy::MyClientMiddleware
  end
end

Note: You may want to add this client middleware to the server middleware pipe, see below

In our case, we want to enrich the job with some metadata. Sidekiq lets us add information to the job that will be available on the server side:

module Drivy::Sidekiq::Middleware::Client

  class AddProcessContext
    def call(_, job, _, _)
      process_context(job)
      yield
    end

    private

    def process_context(job)
      if ProcessContext.attributes.present?
        job['process_context'] = ProcessContext.attributes.to_json
      end
    rescue => e
      # Log/notify error as we do not want to fail the job in this case
      puts e
    end
  end
end

We only need the job argument here. It’s basically a regular Hash. We just add our own information to it (be careful to store only data that can be serialised to JSON).

Server Side

The Sidekiq middleware server API is:

class Drivy::MyServerMiddleware
  def call(worker, job, queue)
    # custom code
    yield
    # custom code
  end
end

And you add it to Sidekiq configuration in this way:

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add Drivy::MyServerMiddleware
  end
end

In our usage, we need to retrieve the metadata from the job and set it in the current process:

module Drivy::Sidekiq::Middleware::Server
  class AddProcessContext

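    def call(_, job, _)
      process_metadata(job)
      yield
    ensure
      # Always clean up, even if the job raises, since Sidekiq reuses threads
      reset_metadata
    end

    private

    def process_metadata(job)
      if job['process_context']
        # The client middleware stored the context as a JSON string
        ProcessContext.attributes = JSON.parse(job['process_context'])
      end
    rescue => e
      # Log/notify error as we do not want to fail the job in this case
      puts e
    end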

    def reset_metadata
      ProcessContext.reset
    end
  end
end

We simply restore the data from the serialised version.

Each middleware is executed in the same thread as the main job process, so we know the context data will be available to the Ruby job.

A word of caution

Thread reuse

Sidekiq will reuse threads for different jobs in some cases, so we must be very careful to clean up our ProcessContext to ensure we do not pollute the context of other jobs.
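The whole round trip, including the cleanup, can be simulated without Sidekiq or Redis. The sketch below is a simplified illustration under assumptions: `ClientMiddleware` and `ServerMiddleware` are stand-in names, the job is a plain Hash, and the worker is simulated with a Ruby thread. It uses `ensure` so that the context is reset even when the job raises.

```ruby
require "json"

# Thread-local store, same shape as the ProcessContext module shown earlier.
module ProcessContext
  module_function

  def reset
    Thread.current["process_context"] = {}
  end

  def attributes
    Thread.current["process_context"] || reset
  end

  def attributes=(new_attributes)
    Thread.current["process_context"] = new_attributes
  end
end

# Client side: copy the current context into the job payload as JSON.
class ClientMiddleware
  def call(_worker_class, job, _queue, _redis_pool)
    job["process_context"] = ProcessContext.attributes.to_json unless ProcessContext.attributes.empty?
    yield
  end
end

# Server side: restore the context, run the job, then always clean up.
class ServerMiddleware
  def call(_worker, job, _queue)
    ProcessContext.attributes = JSON.parse(job["process_context"]) if job["process_context"]
    yield
  ensure
    ProcessContext.reset
  end
end

# "Client": set some context and enqueue a job (here, just a Hash).
ProcessContext.attributes = { "source" => "mobile_app" }
job = { "class" => "SomeJob", "args" => [] }
ClientMiddleware.new.call("SomeJob", job, "default", nil) { }

# "Server": a separate thread runs the job, which fails on purpose.
seen_in_job = nil
leaked = nil
Thread.new do
  begin
    ServerMiddleware.new.call(nil, job, "default") do
      seen_in_job = ProcessContext.attributes
      raise "job failed"
    end
  rescue RuntimeError
    # The job failed; the context must still have been reset.
  end
  leaked = ProcessContext.attributes
end.join
```

Inside the job the context set by the client is visible, and once the middleware returns the thread-local store is empty again, so the next job picked up by that thread starts from a clean state.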

Middleware client on the server side

Sometimes, jobs running on the server can enqueue jobs, and act as a client. In this case, you’ll want to add the client middleware to the server configuration as well:

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  config.client_middleware do |chain|
    chain.add Drivy::MyClientMiddleware
  end
  config.server_middleware do |chain|
    chain.add Drivy::MyServerMiddleware
  end
end

Conclusion

Middlewares are a useful tool; we mainly use them for logging and monitoring. You can find some interesting plugins using middleware on the Sidekiq Wiki.

And again, do not use global states if you can avoid it.

Rails 5.2: ActiveStorage highlight

Alexandre Ferraille — Thu, 17 May 2018 00:00:00 +0000

Rails 5.2 was released a few weeks ago and comes with awesome features like the new credentials vault, HTTP/2 early hints, Redis cache store and ActiveStorage, which I’m going to focus on in this blog post. The project was initiated by DHH in mid-2017 and has been merged into Rails core. It’s a built-in way to deal with uploads without extra dependencies like Paperclip, Carrierwave or Shrine.

How does it work?

ActiveStorage comes with a complete DSL which allows you to attach and detach one or multiple files to a model. By default, ActiveStorage isn’t installed in a new rails project, you have to run:

rails active_storage:install

This command simply copies a migration into your project. ActiveStorage needs two models/tables: active_storage_blobs, where each record represents a file (which is not stored in the database of course), and active_storage_attachments, which is a polymorphic bridge between your models and your uploaded files.

In your model, you need to declare that you’re attaching files:

class Car < ApplicationRecord
  has_one_attached :photo
end

Then, you can add a file input into your form:

<%= image_tag url_for(car.photo) if car.photo.attachment.present? %>
<%= form.label :photo %>
<%= form.file_field :photo %>

In your controller, you must specify that you want to attach a file:

@car.photo.attach(params[:car][:photo]) if params[:car][:photo]

And that’s all you need to do for a basic file uploader. If you attach a new file or you delete your record, ActiveStorage will remove the old one from your storage and clean up your database.

ActiveStorage goes further still

With a lot of nice little “cherries on the cake”, ActiveStorage covers most of the cases:

Storage: A few lines of configuration are enough to store/mirror your files into AWS S3, Microsoft Azure or Google Cloud. And if you’re using a more funky storage provider, you can extend the ActiveStorage::Service class.
Direct upload: A complete JavaScript library has been written for ActiveStorage which allows you to directly upload to your storage, bypassing the Rails backend. It comes with a lot of JavaScript events to easily plug this feature into common libraries such as TinyMCE, DropZoneJS…
Image post-processing: A common need when uploading images is to create resized variants from the original. ActiveStorage works by default with MiniMagick, which is a Ruby wrapper for ImageMagick.
Video and PDF previews: With external libraries (ffmpeg and mutool), you can get a preview from a file without downloading it entirely.

What’s the difference with other upload managers?

Maintainability: Since ActiveStorage has been merged into Rails, all the features described above are built-in and don’t require any extra dependencies, so less maintenance needs to be scheduled.

Structure: Most popular gems like Shrine, Paperclip, etc. don’t provide ready-to-use tables and require a migration to add a few fields where you want to store your file information. Even if you feel free to do what you want and you’re not stuck to the ActiveStorage way, from my experience you’ll certainly recreate a polymorphic Asset model.

Form: As we saw above, ActiveStorage attaches and detaches files outside ActiveRecord transactions. You need to do it by yourself when you need to, independently from a save, whereas common gems store your files using ActiveRecord callbacks directly from your params. In my opinion, ActiveStorage provides a better way to handle file attachments by separating two concepts: attributes which go in the database and files which depend on your storage.

Missing features: There are a few advanced features which are not handled by ActiveStorage (yet?) and you’ll need to develop them if you choose to go with ActiveStorage. For instance, Shrine (currently the most advanced competitor) provides a way to cache uploaded files and avoid re-upload when your form has errors. Shrine also provides a simple way to manipulate and post-process files in the background. And last but not least, the implementation of the TUS protocol allows you to do multi-part uploads.

What does the future have in store for them?

It’s really interesting to see that thoughtbot (the Paperclip maintainers) just announced the deprecation of Paperclip in favor of ActiveStorage, and it’s a good example of what makes the Ruby on Rails community so strong.

Major kudos to @thoughtbot for all the work on Paperclip over the years! It was one of the premiere file attachment solutions for Rails for a very long time. The work helped inform and inspire Active Storage 🙏❤️ https://t.co/DGoCDAZS0N
— DHH (@dhh) May 15, 2018

We are deprecating Paperclip in favor of ActiveStorage. Learn what this means for you. https://t.co/b4MpPhKXaN
— thoughtbot (@thoughtbot) May 14, 2018

We can be sure they’ll use their experience by contributing to ActiveStorage, as did janko-m (Shrine maintainer), who already improved S3 storage and integrated his own ImageProcessing gem into ActiveStorage.

How we handle file uploads at Drivy

At Drivy we handle file uploads in a similar way to ActiveStorage. We have our own DSL for models and a lot of methods for controllers and views. We also have some JavaScript for direct upload to our cloud storage. We don’t use post processing libraries for our images and we delegate all these tasks to a third party in order to reduce the impact on our CPUs.

So, should we move to ActiveStorage? We already have all the features we need and there’s currently no need to move to ActiveStorage. Sure it might reduce the maintenance cost and we could take advantage of security fixes and evolution with Rails upgrades, but with a 6-year-old codebase and thousands of attachments the migration would be huge!

In my opinion, ActiveStorage is a really good choice for a new project.

EDIT: janko-m (Shrine maintainer) commented on this post via Reddit and raised interesting points related to ActiveStorage and future changes for Shrine.

Android Makers 2018 Key takeaways

Renaud Boulard — Thu, 17 May 2018 00:00:00 +0000

Android Makers is the largest Android event in France, organized by the PAUG and BeMyApp. This year the event was held at Le Beffroi de Montrouge - what a great place to enjoy a conference. There was much more space than at the previous Android Makers event: we were really comfortable, with all we needed to be fully focussed on the conferences.

Below are some key takeaways of these 2 days of great conferences.

Modern Android development

Romain Guy & Chet Haase, Google, Video

They gave us a big overview of the improvements of the last few years to Core Android development, from the view (ConstraintLayout) to the programming language (Kotlin) through the architecture, the tools and more.

Key takeaways:

Use Layout Inspector instead of the deprecated Hierarchy Viewer
Don’t use AbsoluteLayout, GridLayout and RelativeLayout any more. The best performance is provided by LinearLayout, FrameLayout and ConstraintLayout
Use SurfaceView, not TextureView
Handling the life cycle should be much simpler than before, with the help of Architecture Components
Architecture Components are not necessarily the way you must build your app: they are a recommended way, but you must build what matches your needs
Google recommends libraries such as Glide, Picasso and Lottie, and will never build equivalent tools in the Android SDK as they are already really good
Use systrace instead of traceview to profile your code, and the Android Profiler, which contains a nice view including the touch events at the top (purple dots; the long dots are swipe events)
Remember that devices are resource-constrained: you must take care with the resources you use because there is a direct link to your user’s battery life

Travelling across Asia - Our journey from Java to Kotlin

Amal Kakaiya, Deliveroo, Video

A talk about their switch from Java to Kotlin at Deliveroo

Key takeaways:

Start small, like the test or the data class, and then increase slightly the amount of Kotlin code in your project. Everyone in your team must be confident about using it, and not feel the need to rush.
Use comments on PRs, to learn the language from concrete examples in your own code base - it’s always better than a HelloWorld example
Android Studio is your friend: it offers you alternatives to improve your Kotlin code
You can use the Android Studio converter, but it doesn’t always give optimal results.
At Deliveroo they schedule a weekly meeting called Kotlin Hour where all the developers share what they have learned about Kotlin.
I learned of the existence of Kotlin Island (an island in Russia), far far away from the island of Java

Better asynchronous programming with Kotlin Coroutines

Erik Hellman, Hellsoft, Slides, Video

A good starting point if you have never used coroutines: Erik told us about coroutines and how we can use them to end up with simple code like the excerpt below for async calls:

load {
    loadTweets("#AndroidMakers")
} then {
    showTweets(it)
}

Key takeaways:

Channel: basically a queue, it’s a way of passing multiple values to a coroutine
Coroutines don’t replace RxJava; they can work together
Retrofit has a coroutine adapter
Coroutines are still in an experimental phase

Design Tools and Handoffs

Taylor Ling, Fabulous, Slides, Video

During his talk Taylor answered a really good question: what should a developer expect from a designer?

He divides his answer into 5 different parts:

Visual Design
- Always design on Small Screen (recommended size 640x360, Nexus 5) because this is where you have all the constraints
- Use consistent design across all of your app
Design Assets
- Proper size, proper format (SVG, PNG, JPG), and optimized assets to avoid increasing your APK size (the new App Bundle will also help a lot on this point)
Design Specification
- To avoid wasting both your designer’s and developer’s time, use tools like Zeplin to easily communicate the complete design specification (The new Gallery tools would be a good alternative)
Interaction Design
- Provide not only screens but also a screen flow diagram
- Animations should be defined with a high quality video and even an animation specification (delay, scale, duration etc…) and a timeline
- Define the right interpolator depending on the animation purpose (FastOutSlowIn, LinearOutSlowIn, etc…)
Animation Design
- Today Lottie and After Effects are the best ways to easily add complex animations to a mobile application
- For simple icon transition you could also use Shape Shifter

He then turned the question the other way around: what should a designer expect from a developer?

Communication: ensure we all speak the same language (dp, sp, FAB, Snackbar, etc…)
Explain to them what the constraints are, what could be implemented, and what could be not, and moreover suggest alternative solutions
Understand the business goal of design

Themes, style & more: from zero to hero

Cyril Mottier, Zenly, Slides, Video

Overall a really great overview of themes, styles, text appearance and more. He explained how you should write them and organize them in your project. If you don’t feel confident with this part of the Android SDK, I definitely recommend that you watch this presentation.

Key takeaways:

How to properly handle themes regarding API version, to avoid re-writing them for every version in addition to the new attributes for each new version

Remote-control your application

Stan Kocken, Dashlane, Slides

Stan gave us his feedback on how they manage the remote configuration of the Dashlane app, without needing a new release on the store. He told us about five main concepts:

Web View: Load web content in your app
- Pros: It allows you to update the entire screen
- Cons: Bad user experience. This is ok for small parts of your app like the help center
Remote Screen config: Send screen content with json (text, icon)
- Pros: Good user experience
- Cons: Limited possibility for updating the layout
Remote Copy: Replace the strings.xml with a remotely downloaded JSON file to easily update the wording, with a custom layout inflater to manage strings in .xml files
- Pros: Fast wording updates in your app
- Cons: Hard to manage multi-language, plurals and text formatting
A/B test: Compare two experiences to choose the best
- Comparing local and server-side A/B test
- Server-side is more appropriate for remotely updating the A/B tests
Feature Flip: Remotely enable or disable features in your app
- At Dashlane they distinguish between App Release and Feature Release. They release their app every 2 weeks, no matter the state of the feature, and then enable the feature when it’s ready
- Good point from the audience: with this workflow you can no longer use the new release notes feature of the Play Store

ADB, Break On Through To the Other Side

Eyal Lezmy, Qonto, Slides

As I’m not a fan of “Magic programming”, it’s always interesting to understand how the tools you use work. This presentation about ADB was an in-depth understanding of how ADB works under the hood.

Key takeaways:

ADB is split in 3 parts:
- adb client (adb command)
- adb server (Multiplexer)
- adb daemon (Phone)
Android Studio does not use adb but ddmlib (a jar that provides APIs for talking with the Dalvik VM, both on Android phones and emulators)
I learned about the existence of the Google Sooner, the first ever Android phone, which has never been shown to the public.

Typesetting: designing and building beautiful text layout

Florina Muntenescu & Nick Butcher, Google, Video

Big overview of all the Typesetting you can include in your app

Key takeaways:

Android P will add baseline param in TextView

<TextView
android:firstBaselineToTopHeight="24dp"
android:lastBaselineToBottomHeight="24dp"
.../>

android:firstBaselineToTopHeight: Distance from the top of the TextView to the first text baseline. If set, this overrides the value set for paddingTop.
android:lastBaselineToBottomHeight: Distance from the bottom of the TextView to the last text baseline. If set, this overrides the value set for paddingBottom.

I learnt about the existence of the android:breakStrategy param: Break strategy (control over paragraph layout).
If you have a long piece of text, don’t use a single TextView: split it into paragraphs and put them in a RecyclerView. This way you get all the benefits of the RecyclerView when you scroll.

Gradle in Android Studio 3.2, 3.3 and beyond

Bradley Smith & Lucas Smaira, Google

This was a good overview of how Gradle and Android Studio work together. They highlighted the pain points and explained to us why it’s slow sometimes.

Key takeaways (new upcoming features):

Single Variant Sync (Sync only the selected variant, should be available in AS 3.3-canary)
PSD (Project structure Dialog) new graphical interface with nice suggestions, dependency graphs and more to help you manage your gradle file

What’s next (Ultimate goal):

Running Android test through Gradle
Remove need for manual Sync of Gradle

Tools of the Trade

Ty Smith, Uber

He gave us a big overview of how the Android team work at Uber, and the tools and architectures they use.

Key takeaways:

The Uber app is based on a mono-repository, mainly to avoid dependency issues (full article)
They introduced a system called Submit Queue, which rebases changes on master and runs a customizable set of tests before merging them, to avoid a broken master
They use Phacility/Phabricator as their main tool
They use Buck and the Exopackage; in Ty’s words: “It’s like instantApp but it works :)”
They have built a demo application with all the design components available, to keep a consistent design across the app

It’s quite impressive to see how many open-source libraries Uber have shared with the Android community:

OkBuck: OkBuck is a gradle plugin that lets developers utilize the Buck build system on a gradle project.
RIBs: Uber’s cross-platform mobile architecture framework.
Crumb: An annotation processor for breadcrumbing metadata across compilation boundaries.
AutoDispose: Automatic binding+disposal of RxJava 2 streams.
Artist: An artist creates views. Artist is a Gradle plugin that codegens a base set of Android Views.
NullAway: A tool to help eliminate NullPointerExceptions (NPEs) in your Java code with low build-time overhead
RAVE: A data model validation framework that uses java annotation processing.

Conclusion

This year Android Makers brought together 816 people, including 82 speakers, and was run by 34 organizers. Thanks to them and all the sponsors who make this event possible! We will definitely come back next year. You can already buy your ticket for Android Makers 2019!

As always, it’s good to meet the android community and learn from other developers. This year the event became more international - I only saw one talk in French, all the other talks were in English. It was great to see that we have a conference in France that is attractive for the Android community!

See you next year!

Ensuring consistent spacing in your UI

Tim Petricola — Thu, 19 Apr 2018 00:00:00 +0000

Drivy is growing, and the impact of this is particularly reflected in the evolution of our visual identity, conveyed by Drivy’s UI.

Having more and more people involved in new features (product managers, copywriters, designers, developers, …) means having more UI updates on the website. To make things easier, we’ve recently started working on Cobalt, Drivy’s internal design system.

Design systems are a broad topic. This post will only focus on dealing with whitespace across the website.

The problem

In the past, we didn’t have a process for deciding on what value we should use for a given whitespace. Sometimes, a designer would choose a specific value or the developer might decide to take the time to make it pixel perfect. Other times it would just be a question of getting a feeling for “what looked right” during implementation. And then we’d end up with feedback such as “try adding 1 or 2 pixels there”.

This would lead to two main frustrations:

developers and designers would spend time searching for the perfect value™
visual approximations and inconsistency would appear across pages

Choosing values

We knew we wanted to establish a fixed set of values that we could use for all whitespaces. But how to choose them?

We experimented with some of our components (a car card, new landing pages, our new booking page etc.) to see what could work. This also meant playing with typography to find the proper line-heights for every font style. In doing so, we ended up with a 4px baseline.

We then extracted the main values from this set of experimental components. When it made sense to do so, we homogenized and merged some of the values to end up with a reduced set. Several components and a bit of tweaking later, we now use this set of values:

Enforcing it

Having a theoretical set of values is great. But how do you make sure that they will be used?

This is not an issue for our UI team as they are at the core of the design-system project and know what values to use. But what about developers who had been less involved? And, is it now necessary for them to know the specific value of a given spacing as they implement it?

We don’t think so.

We got inspired by what Shopify are doing with Polaris and ended up writing a custom Scss function. Whenever someone needs to add some spacing, they can write the following:

.component {
  margin-bottom: spacing(tight);
}

And the implementation is quite simple:

$unit: 4px;

$spacing-data: (
  none: 0,
  unit: $unit,
  extra-tight: $unit * 2,
  tight: $unit * 4,
  base: $unit * 6,
  medium: $unit * 8,
  loose: $unit * 12,
  extra-loose: $unit * 16
);

@function spacing($variant: base) {
  $value: map-get($spacing-data, $variant);

  @if $value == null {
    @error "Spacing variant `#{$variant}` not found.";
  }

  @return $value;
}

Dealing with edge cases

Components with a border

But what if a component has a bottom border of 1px, setting the whole baseline off? There are various solutions to this issue, such as dropping borders in favor of box shadows. But our approach is quite simple: compensate by removing some spacing.

.component {
  $border-width: 1px;

  padding-bottom: spacing(tight) - $border-width;
  border-bottom: $border-width solid #fff;
}
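To make the arithmetic explicit, here is a minimal Ruby sketch that mirrors the spacing map and the border compensation above. The names `SPACING`, `spacing` and `compensated_padding` are made up for this illustration and are not part of Cobalt; all values are in px.

```ruby
# Hypothetical Ruby mirror of the Scss spacing map (values in px).
UNIT = 4

SPACING = {
  none: 0,
  unit: UNIT,
  extra_tight: UNIT * 2,
  tight: UNIT * 4,
  base: UNIT * 6,
  medium: UNIT * 8,
  loose: UNIT * 12,
  extra_loose: UNIT * 16
}.freeze

# Look up a spacing variant, failing loudly on unknown names like the Scss @error.
def spacing(variant = :base)
  SPACING.fetch(variant) { raise ArgumentError, "Spacing variant `#{variant}` not found." }
end

# Padding that keeps a bordered edge on the baseline: spacing minus border width.
def compensated_padding(variant, border_width)
  spacing(variant) - border_width
end

border_width = 1
padding = compensated_padding(:tight, border_width)
puts "padding-bottom: #{padding}px, padding + border: #{padding + border_width}px"
```

With a 1px border, `tight` becomes 15px of padding plus 1px of border, so the component still ends on the 16px mark and the 4px baseline is preserved.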

Component with user uploaded media

We have to accept that the baseline will not be respected. In our case here at Drivy, this occurs when we display pictures uploaded by users, where we want to respect the ratio.

The image has a height of 222px, which does not conform to the base 4 rule.

Conclusion

Our design system is still young and we have a lot of things to improve on and decide about. In the same way that we chose spacing values, we also have conventions for typography and colors.

Seeing as we have a lot of pages in our application, it’s not feasible to migrate the whole site to Cobalt at once. But we’re very excited to slowly roll out our new system throughout the application!

Running Our First Internal Hack Day

Emily Fiennes & Marc G Gauthier — Tue, 03 Apr 2018 00:00:00 +0000

Trying out new technologies, exploring new ideas, investigating potential solutions: these are all fundamental parts of the problem-solving element of a developer’s work. As with any form of creative work, it’s important to keep motivated and one step ahead of ever-changing technological advancements. This encourages a fresh approach, and above all, boosts the enjoyment we take from our work.

That’s why we’ve started holding regular hack days here at Drivy. Every few months, the tech, product and data teams can get together and pool their skills, experience and ideas. The brief? To plan and implement a Drivy-related MVP that could improve the product, but which you wouldn’t usually get the chance to explore or implement in your day-to-day work.

Format

We felt strongly that, if our hack days were to be a sustainable project, it was important to get the format right. This meant setting aside a specific day within our normal working week. Many team members with commitments outside of work wouldn’t have been able to show up, had we held it at the weekend. Furthermore, the direct benefits of the exercise - team cohesion, innovation, boosting motivation - could ultimately be invested back into our productivity and creativity in the long run.

Keeping focus

We decided to venture out of Drivy HQ, so as to focus purely on coding and avoid any unwanted interruptions. This made a nice change of pace from what can be a fast-moving and sometimes intense working environment. Plus, the change of scene made it feel different to a usual working day, and got those creative juices flowing.

Because that’s exactly what it was - another working day, just not at the office.

But there was one problem: we didn’t want to hold the hack day at the weekend, but we also couldn’t desert the office en masse and have no one available to respond to potential production issues either. We thus split the team, with each half hacking on different days, one week apart. Sure, it might have been better if the entire team had been present but this was a good middle ground that actually afforded a certain intimacy and didn’t disturb the usual operations of the website.

One ground rule was that all hackers would quit Slack. If something really critical happened in production, we would revert to a good old-fashioned phone call and a developer would stop hacking in order to address the issue. Thankfully it didn’t come to that, but it’s always reassuring to have a back-up plan.

Teams and projects

We decided on teams and projects beforehand, so were able to get straight down to business and fully capitalise on the 1-day timeframe.

Starting the discussion early also gave us plenty of time to improve our ideas and debate the most interesting, and most feasible, projects to pursue.

Quality time

Some members of our tech team work remotely, but they all came to Paris for the hack day. This was a great chance for everyone to collaborate with people they might not otherwise get to work with. There’s also something to be said for the team-building potential of a hack day in a wider sense. For one day, we collectively ventured outside of our zones of comfort, in terms of the projects we chose, the speed of delivery and the technologies used.

In that context, trust and communication between teammates becomes key to decision-making, and ultimately the experience can make for stronger, more cohesive teams. As the Drivy team continues to grow, and collaboration becomes more squad-oriented, team cohesion and communication will be increasingly important.

Project time!

With 17 developers, there were many projects. Some warrant a blog post in their own right, but in the meantime here is a quick overview:

Rebuild the search using ReactJS, Flux and styled components
Integrate Google Assistant to book cars using Google Home
Pay on Android using credit cards’ NFC capabilities
Build a Slack chatbot to book meeting rooms
Search and book cars with Alexa
Detect licence plates and more based on user-uploaded pictures
Try out a new way of doing caching
Use machine learning to better understand user behaviour with our new Instant Booking feature
Use iOS 11’s ARKit to locate Drivy cars in the street
Add a chat system to help users encountering problems
Reliably send SMS using Gammu, a sim card and a Raspberry Pi

Here’s a bonus extract of the demo day presentation for the Alexa project by Howard.

Crunch time

Once the two hack days were completed, we held a Demo Day together back at Drivy HQ. It lasted for an hour, where each group had the opportunity to present their objectives, work process and results.

We recorded the demo and shared it with the rest of the company, as many people were curious to learn about what we built. Interest had been sparked by the endless potential for future developments to the product that could be imagined, and it was cool to transcend any tech/non-tech frontiers in this way. Plus the positive feedback further proved the real sense of ownership and investment that all team members have here at Drivy.

With our first hack day successfully under our belt, we’re already eagerly anticipating the next. The day was a valuable opportunity to collaborate with different teammates, to explore novel ideas in a new context and to disrupt our usual working practices. After all, it’s through moving fast and breaking a few things along the way that you can come up with novel solutions to problems you maybe didn’t even know existed.

Redshift tips and tricks - part 1

Faouz EL FASSI — Wed, 21 Mar 2018 00:00:00 +0000

At Drivy we have been using Redshift extensively as our data warehouse since mid-2015: we store in it all the transformations and aggregations of our production database and 3rd-party data (spreadsheets, CSV files, APIs and so on). In this first blog post, we will discuss how we adapted our Redshift configuration and architecture as our usage changed over time.

This article targets a technical audience designing or maintaining Redshift data warehouses: architects, database developers or data engineers. It will aim to provide a simple overview, with a mix of tips to help you scale your Redshift cluster.

To recap, Amazon Redshift is a fully managed, petabyte-scale data warehouse deployed on AWS. It is based on PostgreSQL 8.0.2, uses columnar storage and massively parallel processing. It also has a very good query plan optimizer and strong compression capabilities.

Overview of Redshift's architecture. Source: https://docs.aws.amazon.com

In this first blog post, we will cover the following topics:

how engineers must adapt the default queue management strategy, called workload management (WLM), to fit their needs;
how to tweak Redshift’s distribution and sorting styles in order to tune table design and improve query performance, which is crucial for large tables (> ~100M rows).

Usage at Drivy

The big picture is that we have different usages with different SLA levels: from fast-running queries that must be highly available (near real-time reporting for fraud) to long-running batch jobs (e.g: propagating an ID on all the tracking records for all the sessions of all the users across all their devices 😅).

Prior to recent changes, Redshift was subject to roughly 50K requests per day:

~70% were ETL jobs and visualizations jobs, having a high reliability and availability requirement and various execution times [1min, 60min];
~10% were short running queries (< 15min) written by analysts, having no specific SLA;
~20% were very short queries (< 1min), metrics, health and stats (internals of Redshift).

Over the last few months our usage has changed slightly, as more analysts joined and a new set of exploratory tools came into use.

We’ve decided to deploy Tableau to all project managers and analysts to improve agility in data-driven decision making. They have started using it with their own credentials to ingest data from Redshift to Tableau.

This doubled the number of concurrent connections to Redshift and put a high load on the queue dedicated to analysts. Neither fitted the WLM strategy we had at the time, so our SLAs were broken.

We identified a few levers.

Design a better WLM strategy and monitor it thoroughly.
Improve our schema design:
- create pre-processing ETL pipelines for the frequent extractions that do a lot of aggregations and computations which are responsible for memory issues;
- reduce redistribution among worker nodes of the Redshift cluster for frequent computations with high cardinality;
- leverage AWS S3 if it is a simple extraction of large tables (relocate the data source).

Concurrency issues

Initially we had the following workload management strategy, in addition to the Short Query Acceleration queue set at a maximal timeout of 6 seconds:

queue	concurrency	timeout	RAM
etl	5	∞	50%
visualisations	5	900s	20%
analysts	3	900s	20%
default	2	1800s	10%

When Short Query Acceleration (SQA) is enabled, Redshift uses machine learning to predict short-running queries and assign them to this queue, so there is no need to define and manage a queue dedicated to short-running queries.

To work around the limitations introduced by analysts using Tableau with their own credentials, we created a dedicated Redshift user group called exploration, to which we added the Tableau user. This group uses the same Redshift queue as etl. We also adjusted the concurrency and timeouts of the other queues, giving the following configuration:

queue	concurrency	timeout	RAM
etl, exploration	5	∞	50%
visualisations	4	1500s	20%
analysts	4	1800s	20%
default	2	3600s	10%

We kept the SQA queue and increased its timeout to 20s. This avoids short queries getting stuck behind the long-running ones in the visualisations, analysts and default queues.

This new configuration reduced the high load on the analysts queue, which had been causing queued queries and frequent out-of-memory issues, but it added some lag on the ETL pipelines.
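One rough way to compare the two configurations above is to look at how much memory a single query slot gets. The Ruby sketch below assumes that a queue’s memory share is split equally between its slots, which is how manual WLM behaves as far as we know; `WlmQueue` and `ram_per_slot` are names invented for this example.

```ruby
# Hypothetical model of a WLM queue: name, number of slots, share of cluster memory (%).
WlmQueue = Struct.new(:name, :concurrency, :ram_percent) do
  # Memory share of one slot, ASSUMING the queue's memory is split equally between slots.
  def ram_per_slot
    (ram_percent.to_f / concurrency).round(2)
  end
end

# Values copied from the two configuration tables of this post.
INITIAL = [
  WlmQueue.new("etl", 5, 50),
  WlmQueue.new("visualisations", 5, 20),
  WlmQueue.new("analysts", 3, 20),
  WlmQueue.new("default", 2, 10)
].freeze

UPDATED = [
  WlmQueue.new("etl, exploration", 5, 50),
  WlmQueue.new("visualisations", 4, 20),
  WlmQueue.new("analysts", 4, 20),
  WlmQueue.new("default", 2, 10)
].freeze

# Print the per-slot memory share of every queue in a configuration.
def report(label, queues)
  puts "#{label} (#{queues.sum(&:concurrency)} slots in total)"
  queues.each do |queue|
    puts format("  %-18s %5.2f%% of memory per slot", queue.name, queue.ram_per_slot)
  end
end

report("initial", INITIAL)
report("updated", UPDATED)
```

Under that assumption, both configurations keep 15 slots in total; a visualisations slot goes from 4% to 5% of the memory, while an analysts slot goes from roughly 6.67% to 5%.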

We wanted to monitor badly designed queries, as well as queries that suffer from a bad distribution of the underlying data, which significantly impacts query execution time. WLM gives us the possibility to define rules for logging, re-routing or aborting queries when specific conditions are met.

We decided to log all the queries that may contain errors, such as badly designed joins requiring a nested loop (cartesian product between two tables).

Here is an example of our current logging strategy:

Logging rules for the etl & exploration queue

When the rules are met, the query ID is logged in the STL_WLM_RULE_ACTION internal table. Here is a view to locate the culprit: the query text, the user or system who ran it and the name of the rule that it violates (defined in the WLM JSON configuration file).

CREATE OR REPLACE VIEW admin.v_wlm_rules_violations AS
SELECT
    usename,
    "rule",
    "database",
    querytxt,
    MAX(recordtime) AS last_record_time
FROM STL_WLM_RULE_ACTION w
INNER JOIN STL_QUERY q
ON q.query = w.query
INNER JOIN pg_user u
ON u.usesysid = q.userid
GROUP BY 1, 2, 3, 4;

Note that query rules are executed bottom-up: if 3 rules are defined (log, hop and abort), the query will be logged, then re-routed to the next available queue (⚠️ only for SELECT and CREATE statements), before being aborted.

Now that we have a suitable workload configuration and a few monitoring tools to log badly designed queries, let’s see how to improve query performance to shorten the ETL pipelines!

Schema optimizations

One of the most important aspects of a columnar storage database such as Redshift is to decrease the amount of redistribution needed to perform a specific task.

The only way of achieving this is to define the correct combination of distribution and sort keys.

Here is a recipe for choosing the best sort keys, adapted from AWS documentation:

if recent data is queried most frequently, specify the timestamp column as the leading column for the sort key;
if you do frequent range filtering or equality filtering on one column, specify that column as the sort key;
if you frequently join a (dimension) table, specify the join column as the sort key;
if one of your fact tables has more than ~100M rows and has many dimensions, use an interleaved sort key.

And, for distribution keys:

distribute the fact table and one dimension table on their common columns;
choose the largest dimension based on the size of the filtered data set;
choose a column with high cardinality in the filtered result set;
change some dimension tables to use ALL distribution (copy the whole table to all compute nodes).

The explain command gives us the opportunity to test different distribution styles by measuring the query cost.

To summarize, when reading the output of explain, it’s really important to keep the following points in mind.

Avoid NESTED LOOP in all your queries.
Limit HASH JOINS: when the join columns are defined as both distribution and sort keys, the join is turned into a MERGE JOIN, the fastest join style.
Maximize DS_DIST_NONE in your long-running queries: this means that the records are collocated on the same node, thus no redistribution is needed.

You should also be careful about the skew ratio across the slices of your worker nodes, particularly with a key-based distribution style: when the data is evenly distributed, the load is split evenly across the slices of each worker.

Skewed distribution resulting in starvation, computations are as long as the slowest slice (containing most of the data). Source: https://aws.amazon.com/fr/blogs/big-data/top-10-performance-tuning-techniques-for-amazon-redshift/

Bonus tip: this view gives you a full overview of all the tables in your database, with the following information on each table:

its size in MB;
if it has a distribution key;
if it has a sortkey;
its skew ratio.

CREATE OR REPLACE VIEW admin.v_tables_infos as
SELECT
  SCHEMA schemaname,
  "table" tablename,
  table_id tableid,
  size size_in_mb,
  CASE
    WHEN diststyle NOT IN ('EVEN','ALL') THEN 1
    ELSE 0
  END has_dist_key,
  CASE
    WHEN sortkey1 IS NOT NULL THEN 1
    ELSE 0
  END has_sort_key,
  CASE
    WHEN encoded = 'Y' THEN 1
    ELSE 0
  END has_col_encoding,
  CAST(max_blocks_per_slice - min_blocks_per_slice AS FLOAT) / GREATEST(NVL (min_blocks_per_slice,0)::int,1) ratio_skew_across_slices,
  CAST(100*dist_slice AS FLOAT) / (SELECT COUNT(DISTINCT slice) FROM stv_slices) pct_slices_populated
FROM svv_table_info ti
JOIN (
  SELECT
    tbl,
    MIN(c) min_blocks_per_slice,
    MAX(c) max_blocks_per_slice,
    COUNT(DISTINCT slice) dist_slice
    FROM (
      SELECT
        b.tbl,
        b.slice,
        COUNT(*) AS c
      FROM STV_BLOCKLIST b
      GROUP BY b.tbl, b.slice)
    WHERE tbl IN (SELECT table_id FROM svv_table_info)
    GROUP BY tbl) iq
ON iq.tbl = ti.table_id;
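To get a feel for the ratio_skew_across_slices column, the Ruby sketch below reproduces its formula on block counts per slice. The method name `skew_ratio` and the sample block counts are made up for illustration; they are not real measurements.

```ruby
# Same formula as ratio_skew_across_slices in the view:
# (max blocks per slice - min blocks per slice) / GREATEST(min blocks per slice, 1)
def skew_ratio(blocks_per_slice)
  min, max = blocks_per_slice.minmax
  (max - min).to_f / [min, 1].max
end

even   = [100, 100, 100, 100] # hypothetical table spread evenly over 4 slices
skewed = [50, 60, 55, 400]    # hypothetical table with one overloaded slice

puts "even:   #{skew_ratio(even)}"
puts "skewed: #{skew_ratio(skewed)}"
```

A ratio of 0 means that every populated slice holds the same number of blocks. The skewed sample gets 7.0 because its largest slice holds 350 more blocks than its smallest one, which holds 50.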

Conclusion

This not-too-long blog post highlighted some of the straightforward ways to scale a Redshift cluster: configuring the best WLM setup, leveraging query rules monitoring and improving query performance by limiting redistribution.

You should also bear the following list of various points in mind when designing your data warehouse:

You’ll need at least 3 times the size of your largest table as available disk space to be able to perform basic maintenance operations;
Use distribution keys to avoid redistribution, and use ALL distribution on small dimensions;
Reduce the use of the leader node as much as possible by leveraging COPY/UNLOAD;
Compress your columns. Pro-tip: don’t compress sort key columns, because there will be more data in each zone map and the SCAN operation will take more time;
Increase batch size as much as possible;
Gain half the IO time in your ETL pipelines by creating temporary tables for pre-processing instead of disposable regular tables: temporary tables are not replicated!

With the last major update of Redshift, Amazon came up with Redshift Spectrum. It runs on dedicated Amazon Redshift servers that are independent from the main cluster, so many compute-intensive tasks can be pushed down to the Redshift Spectrum layer, which uses Amazon S3 as its storage. It uses much less of the cluster’s processing and storage resources and provides unlimitedish read concurrency!

We will take a deep dive into Redshift Spectrum in the second part of this blog post series.

Meanwhile, don’t hesitate to reach out to me with any feedback!

Rails 5.1 Change Tracking in Callbacks

Howard Wilson — Mon, 12 Feb 2018 00:00:00 +0000

After recently upgrading to Rails 5.1, we noticed that certain model changes were no longer getting logged properly by PaperTrail. After a bit of digging, this turned out to be due to a subtle difference in the way that Rails now tracks changes.

It’s a little contrived, but let’s say we have a model that becomes active once info is present, like this:

class Car < ApplicationRecord
  after_save do
    self.state = "active" if info_was.blank? && info.present? && state != "active"
    puts changes # Let's see how changes are tracked
    save!
  end
end

In Rails 5, we get something like:

> car.update!(info: "foo")
{
  "info" => [nil, "foo"],
  "active" => [false, true],
  "updated_at" => [Fri, 19 Jan 2018 15:48:12 CET +01:00, Fri, 19 Jan 2018 15:48:15 CET +01:00]
}

However, after upgrading to Rails 5.1, we’ll get:

> car.update!(info: "foo")
{
  "info" => [nil, "foo"],
  "updated_at" => [Fri, 19 Jan 2018 15:48:12 CET +01:00, Fri, 19 Jan 2018 15:48:15 CET +01:00]
}

The state transition is now missing from the tracked changes.

NOTE: As of Rails 5.1, we’ll now also see the following warning, but it doesn’t give us any indication of this small change in behavior.

DEPRECATION WARNING: The behavior of changes inside of after callbacks will be changing in the next version of Rails. The new return value will reflect the behavior of calling the method after save returned (e.g. the opposite of what it returns now). To maintain the current behavior, use saved_changes instead.
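
To make the distinction named in the warning more concrete, here is a toy model in plain Ruby. It is not ActiveRecord and does not reproduce Rails internals; the `TrackedRecord` class is hypothetical and only mimics the idea that `changes` holds unsaved edits while `saved_changes` holds what the last save wrote.

```ruby
# Toy dirty-tracking model, NOT ActiveRecord.
class TrackedRecord
  attr_reader :saved_changes

  def initialize(attributes)
    @persisted = attributes.dup
    @current = attributes.dup
    @saved_changes = {}
  end

  def write(name, value)
    @current[name] = value
  end

  # Pending (unsaved) edits, as { attribute => [old, new] }.
  def changes
    @current.each_with_object({}) do |(name, value), diff|
      diff[name] = [@persisted[name], value] unless @persisted[name] == value
    end
  end

  # Persisting moves the pending edits from `changes` to `saved_changes`.
  def save
    @saved_changes = changes
    @persisted = @current.dup
    true
  end
end

car = TrackedRecord.new("info" => nil, "state" => "pending")
car.write("info", "foo")
car.save

pending_after_save = car.changes   # nothing is pending once the save is done
written_by_save = car.saved_changes # what the last save actually wrote
```

In this model, code that runs after a save and wants to know what was just written has to read `saved_changes`, which is the direction the deprecation warning points to.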

Admittedly this is a small and isolated change, and this illustration is something of an anti-pattern, but this might just help clarify the new behavior for others seeing similar issues!

How we documented our API using unit testing

Christophe Yammouni — Tue, 23 Jan 2018 00:00:00 +0000

At Drivy, we have an internal API to communicate with our native apps available on both iOS and Android. One of the main pain points we experienced is documentation.

As our product is constantly evolving, we need to have up-to-date documentation in order to help both mobile and backend developers stay aware of what each endpoint is expecting and returning.

Static documentation is known to be hard to maintain: it’s easy to forget to update it from time to time. What is more, you quickly end up with differences between documentation and actual behaviour.

The solution we are going to see is something between a static and a live documentation.

Backend stack

Our backend is written in Rails and we use RSpec as part of our test suite. For our API, we use active_model_serializers to handle the view component of the MVC pattern. Here, views are called serializers. We tried other options like RABL, but felt that active_model_serializers was the best choice. For example, its DSL is inspired by Rails, so a Ruby developer does not need special training to learn how to use it. It also makes unit testing simpler.

We tend to have a serializer per action and some nested nodes are generated by shared ones.

Our solution

Here’s an example of one of our serializer specs.

let(:expected_json) do
  {
    "id": user.id,
    "first_name": "Ben",
    "last_name": "Driver",
    "email": "[email protected]",
    "phone_number_national": "0612345678",
    "phone_country": "FR",
    "address_line1": "28 rue des paquerettes",
    "address_line2": "appt B",
    "city": "Suresnes",
    "postal_code": "92150",
    "country": "FR",
    "about_me": "I love cars",
    "license_number": "1234567890",
    "license_first_issue_date": "2001-01-01",
    "license_country": "FR",
    "birth_date": "1981-01-01",
    "birth_place": "macon",
    "created_at": "2016-01-01T11:00:00Z",
    "avatar": {
      "thumb_url": "https://drivy-test.imgix.net/uploads/originals/ed3585a06f3ad9a1c945456953cb9ed7.jpeg?auto=format%2Ccompress&crop=faces&dpr=2.0&fit=crop&fm=png&h=100&mask=https%3A%2F%2Fdrivy-prod-static.s3.amazonaws.com%2Fimgix%2Favatar_mask_circle.png&w=100"
    },
    "license_release_date": "2001-01-01",
    "stats": {
      "ratings_average": ratings_average,
      "ratings_count": 2,
      "owner_ended_rentals_count": 1,
      "driver_ended_rentals_count": 1
    }
  }
end

it_behaves_like "a serializer"

shared_examples_for "a serializer" do
  it "matches the expected output" do
    expect(JSON.pretty_generate(generated_json)).to eq JSON.pretty_generate(expected_json)
  end
end

Under the hood, we generate the JSON output from the serializer, and compare it to our expected_json variable. If any difference is spotted, the spec will fail.
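
The comparison itself can be tried outside of RSpec. The sketch below is a simplified stand-in, not our actual setup: the `UserSerializer` class and the `matches_expected?` helper are hypothetical, and the serializer is just an object returning a Hash.

```ruby
require "json"

# Hypothetical stand-in for a serializer: any object that returns a Hash.
class UserSerializer
  def initialize(user)
    @user = user
  end

  def as_json
    { "id" => @user[:id], "first_name" => @user[:first_name] }
  end
end

# True when the generated output matches the documented one.
# Comparing pretty-printed strings makes the check sensitive to key order too.
def matches_expected?(generated, expected)
  JSON.pretty_generate(generated) == JSON.pretty_generate(expected)
end

user = { id: 1, first_name: "Ben" }
expected_json = { "id": 1, "first_name": "Ben" }
generated_json = UserSerializer.new(user).as_json

matches = matches_expected?(generated_json, expected_json)
# Simulate a serializer change that was not reflected in the expected output.
drifted = matches_expected?(generated_json.merge("nickname" => "B"), expected_json)
```

Comparing the pretty-printed strings rather than the hashes also gives a readable line-by-line diff when the two differ.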

What’s really interesting here is that this is an actual test. This means that any change to the serializer will break the spec, preventing a release to production and therefore forcing developers to actually update the test.

So what we see in this file will always be synchronised with the production generated content.

From a quick look, you can learn attribute names and possible values/types. If some changes are tied to a specific app version, you can add different contexts that expose those behaviors.

Conclusion

We’ve been using this solution for more than a year, with new developers joining the team, and it still fits our needs. Contexts and expected outputs are easy to read by a non-Ruby developer.

Of course, there is room for improvement: for example, it’s missing descriptions of attributes, endpoint paths, etc. But it’s a win-win solution, because we get all the benefits of unit testing and it acts as documentation.

With some additional work, we could easily generate standalone documentation on a regular basis, gathering all serializer outputs in a single page.

So far, we’ve only looked at documentation for what a client can expect to receive when requesting our API. But what about inputs? What does a client need to send as parameters to properly receive those answers? In a future article, we’ll see how we managed to document request inputs.

Pro tips for productivity

Emily Fiennes — Tue, 16 Jan 2018 00:00:00 +0000

Every 2 weeks we hold Tech Talks here at Drivy. These are an opportunity for our now 17-strong team of developers to gather for 1 hour in a non-squad setting. The Tech Talks are emphatically not about tackling issues or solving problems, but are instead an opportunity to explore new ideas, ask questions, or to return to discussions started in a different context.

As a recent new hire to Drivy in a junior capacity, I quickly recognised that the Tech Talks were going to be a valuable addition to my onboarding experience. Any junior, joining an established tech team with rigorous working practices in place, will likely feel daunted as I did. One way I combatted this was by trying to learn as much from my colleagues as possible.

Imagine my glee then, when during a recent edition of the Tech Talks, we were invited to share our productivity tips 🎁. I’ve rounded up a selection of the tips and tricks that were shared, which covered a broad range of topics including Git, shell config and debugging. These tips may be highly valuable to other juniors out there - or indeed anyone looking to streamline their personal working practices. Plus it’s really interesting — and reassuring — to take a step back and see that everyone has their own way of working.

Disclaimer: these are tips that were shared by members of the dev team at Drivy. Many of the points have themselves been sifted from the Internet - no point reinventing the wheel as they say, especially when you’re aiming to move fast and break things 😉.

Git

force-with-lease

Renaud reminded us that the standard

git push -f

overwrites the remote branch with your local branch, thereby potentially overwriting any work that may have been built on top of your original remote branch. The alternative is to

git push --force-with-lease

which affords more flexibility: if any new commits have been added to the remote branch in the meantime, the force-with-lease option will not update the remote branch.

To quote the documentation, ‘it is like taking a “lease” on the ref without specifically locking it, and the remote ref is updated only if the “lease” is still valid’. This is important if it’s possible that another team member has checked out your branch locally and continued working on it.
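
The difference between the two options can be pictured with a toy model in Ruby. This is not how Git is implemented: `RemoteBranch` and both functions are hypothetical, the commit ids are made up, and the remote is reduced to the single commit its branch points at.

```ruby
# Toy model of a remote ref: it only stores the commit the branch points at.
RemoteBranch = Struct.new(:tip)

# Plain force push: always overwrites the remote tip.
def force_push(remote, new_tip)
  remote.tip = new_tip
  true
end

# Force-with-lease: overwrite only if the remote still points at the commit
# we last saw (the "lease"); otherwise refuse and leave the remote untouched.
def force_push_with_lease(remote, new_tip, expected_tip)
  return false unless remote.tip == expected_tip

  remote.tip = new_tip
  true
end

remote = RemoteBranch.new("abc123")
last_seen = remote.tip          # what our local copy thinks the remote is
remote.tip = "def456"           # a teammate pushed in the meantime

accepted = force_push_with_lease(remote, "fff999", last_seen)
```

Here the push is refused and the teammate's commit survives, whereas `force_push` would have silently replaced it.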

git_dig

Nico has the following in his .bashrc file

function git_dig {
  git log --pretty=format:'%Cred%h%Creset - %Cgreen(%ad)%Creset - %s %C(bold blue)<%an>%Creset' --abbrev-commit --date=short -G"$1" -- $2
}

With the following command:

$ git_dig rental_agreement

he can subsequently see all commits that contain the string rental_agreement in their diff, whether that code has since been deleted or not. Sometimes you might want to find dead code, or to see the evolution over time of a given method.

Octobox for GitHub notifications

Nico also shared this tool with us, to help keep on top of GitHub notifications in your inbox. Octobox ‘adds an extra “archived” state to each notification so that you can mark it as done.’ Plus, if any new activity occurs on an archived item, it is moved back to your inbox so it won’t get forgotten.

Aliases

Aliases were a new discovery for me, right on Day One of my Drivy adventure. There just wasn’t time to cover this kind of setup in depth at Le Wagon, the 9-week intensive bootcamp where I started to code.

So, for anyone who is in the dark as I was, an alias basically lets you abbreviate a system command or add default arguments to a command you use regularly. For example, to start up the server or to drop and recreate the database, you can create a shorter alias. You write them in the config file of whatever framework you are using for managing your z-shell config (usually ~/.zshrc or ~/.zshenv).

Here’s a quick tour of some of the useful ones that came up. This can give you an idea of how you might use aliases, although of course you’ll need to customise for your own setup and tools:

To start or restart the server, or open your CI:

alias dstart='foreman start -f Procfile.dev'
alias drestart='pgrep unicorn | xargs kill -USR2'

alias circle='open https://circleci.com/gh/drivy/drivy-rails'

Tim suggested the following to quickly migrate and rollback:

alias rdm='rake db:migrate && rake db:rollback && rake db:migrate'

Drivy operates in 6 countries, so there’s a lot of country-specific config. Jean suggested the following alias to quickly open all country files for editing in Sublime:

alias countries='subl lib/drivy/countries/*.rb'

Romain showed how to set aliases for various workspaces:

alias dr='cd ~/Workspaces/drivy-rails'
alias da='cd ~/Workspaces/drivy-android'
alias di='cd ~/Workspaces/drivy-ios'
alias weh='~/Workspaces/drivy-android/scripts/write_device_hosts'

Mike uses the following aliases. The first asks for confirmation before file deletion. The second searches your code for any ‘TO DO’ or a ‘FIX ME’ left as a note-to-self awaiting action.

alias rm='rm -i'
alias todos="ack -n --nogroup '(TODO|XXX|FIX(ME)?):'"

Whatever aliases you choose to set, remember to commit the file somewhere!

Git aliases

Git aliases go in the global .gitconfig file, and can save you time and effort on regularly used commands. The following were suggested by Christophe and Romain:

[alias]
      co = checkout
      ci = commit
      st = status
      br = branch
      di = diff
      lg = log --color --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit

Plus, less typing means less chance of making a mistake 😇.

Plugins

The Fuck

If you do make mistakes often though, Adrien has a remedy. The Fuck can be installed via Homebrew and is, in their own words, ‘a magnificent app which corrects your previous console command’.

It tries to match misspelled commands, and then creates a new command using the matched rule.

BitBar (Mike)

Mike recommended BitBar to us, for putting the output from any script or program directly in your menu bar. You can either browse their plugins, or write your own if you’ve got what it takes 💪.

He uses it to easily check the status of external tools, or to stop and start Homebrew services.

Sublime plugins

A real medley of text-editor plugins were mentioned. Some useful ones for Sublime were:

Click to Partial

With ⌘ + shift + click on a Rails partial path, you can open that file in a new tab, as explained by Adrien.

Trailing Spaces

Jean recommended Trailing Spaces. This plugin lets you highlight and delete any redundant trailing spaces in your code, either automatically upon saving or by hand at any time.

BracketHighlighter

Another handy Sublime plugin is BracketHighlighter. It will match a variety of brackets, helping you to avoid unclosed or unmatched brackets. It too is fully customizable.

Debugging

Lastly, a quick word on debugging. We use pry_remote for debugging at Drivy, because running our processes on Foreman means we can’t interact with a regular Pry session.

Jean reminded us that when pry hits its breakpoint,

$ @

can be used in the pry console as a shortcut for

$ whereami

Adrien shared a tip for pry_remote too. You can call:

$ ls

on an object within the pry console, and get a list of associated methods.

This has just been a sample of the points touched upon, and of course there were other, more sophisticated ideas too - beyond the scope of a junior. But if there’s one thing that emerged from the veritable smörgåsbord of tips and tricks shared, it’s that there is no one standardized way of working. Joining an established tech team in a junior capacity can be very overwhelming, so it can be reassuring to remember that sometimes.

Highlights from the 2017 dotJS

Victor Mours — Thu, 11 Jan 2018 00:00:00 +0000

Last month was the dotJS conference in Paris, which we attended, along with some 1400 developers from all over the world. As you could expect, the speakers were stellar. Some talks made us want to ponder our assumptions about how we write JavaScript and come to a more elevated understanding of our front-end ways, and some of them made us want to grab the nearest keyboard and tinker frantically with WebAssembly, TypeScript, or WebGL.

If you don’t have a full day of free time ahead of you to watch all the talks, here are some highlights of the ones we recommend checking out first.

Wes Bos - Async & Await

If you haven’t had time to play with async and await, you could take a look at the documentation by yourself, or you could just kick back, relax, and let Wes Bos do the explaining. He lays out really nicely how you can simplify your code when you’re chaining synchronous promises, and how to handle errors in this case.

Trent Willis - Working Well: The Future of Web Testing

Trent’s talk explores the consequences of the arrival of Headless Chrome in our testing toolkit. Chrome headless now comes with Puppeteer, an API for controlling the browser and its DevTools, which makes for a much better developer experience than the old school Selenium-driven scripts.

Along with the DevTools profiler and the accessibility testing library axe-core, these open up the possibility of a shift in what we expect from our tests. We can now go from “does this code work?” to “how well does this code work?”. This allows a more nuanced, yet measurable way of seeing our code.

I think what really lies beyond this is the topic of code metrics. While we have a fairly established way of taking code validity into account in the standard development workflow - either the test suite runs a green build and the code can be merged, or it is red and must be fixed before merging - there doesn’t seem to be a standard way of taking code quality metrics into account.

One way some teams deal with this is to consider the build to be green if it improves code quality metrics overall, and red if it doesn’t. But that process can break down easily. What if a change to the codebase were to considerably improve accessibility, while degrading performance a bit?

Would it be acceptable to merge it? Obviously the answer is “It depends”. If we want to make informed decisions based on code quality, we should also be considering other factors that are harder to measure, such as code readability and maintainability.

I feel like these are still open problems, and I’m excited to see what comes out of the more widespread use of these tools in the years to come.

The video of the talk is not online yet, but this article from Trent’s blog is worth a read.

Adrian Holovaty - A framework author’s case against frameworks

As we’re all three node packages away from irrecoverable JavaScript fatigue, this was one of the most refreshing talks of the conference. Adrian gave us a solid reminder that we may want to chill out about keeping up with every framework out there, maybe focus a bit more on the patterns than on the frameworks, and take it easy. He’s building Soundslice in plain vanilla JavaScript, and it’s an impressive app, so he’s probably got a fair point.

…And if you’ve still got time, check these out:

Feross Aboukhadijeh built the best annoying website to send to spammers. And it’s hilarious.
Suz Hinton told an inspiring story about the design of the accessibility icon. The video is not yet available, but in the meantime, I encourage you to find out more here.

dotCSS 2017 highlights

Jean Anquetil — Thu, 21 Dec 2017 00:00:00 +0000

dotCSS is the largest CSS conference in Europe. The 2017 edition took place in late November in Paris and it was a great opportunity to exchange with and learn from the community. As you can play back the talks on the dotConferences website, this post won’t dive into the details of every talk; it is a digest of what was most interesting for us.

A search engine built with CSS only

Tim Carry from Algolia told us how he built a search engine with CSS only. It uses the power of selectors, as the language is super powerful at targeting elements in the HTML and applying style to them.

The search bar is a simple input tag which has a value attribute. Thus, a CSS selector will be able to match this input when it has a specific value.

Actually the search engine cheats somewhat, as the value attribute has to be updated each time something is typed with a JavaScript statement. But this is the only line of JS to make it work 🙂

Link to the talk

The `grid` CSS property

Benjamin De Cock from Stripe showed us how cool the grid CSS property is. This property is mainly used by Stripe on their website’s backgrounds and often combined with a skewY transform.

Quickly, here’s how it works: the grid area is made of multiple span tags where each one can receive a specific style. In comparison, Flexbox is great when working on a single axis but grids are easier to deal with on two axes.

We are also using a grid on the Drivy homepage

Unfortunately the browser support is not that good yet: depending on your target audience, decide if you need a fallback style or if it is reasonable to drop support for older browsers. Anyway using the grid CSS property with simple rules gives a bunch of possibilities.

Link to the talk

Write less CSS

Adam Detrick, Web Engineer & Design Systems Lead at Meetup talked about the difficulty of maintaining a sustainable style codebase.

He explained that as reading CSS is too complicated we are used to writing new components and to adding new styles. We should write less of it and break the vicious circle by framing the problem correctly and thinking more about the developer experience.

For instance, he mentioned that using these kinds of utility classes could be a step in the right direction:

at[breakpoint]_[property]--[variant]

These are small sharp tools
You style by memory, these classes are easy to understand and to remember
This leads to a quick implementation

An example would be atLarge_align--center where the text inherits the default left alignment but at large viewports the text is centered.
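
Because the naming pattern is so regular, such class names can be derived mechanically. The helper below is a hypothetical illustration in Ruby, not something shown in the talk; the keyword names simply mirror the placeholders of the pattern.

```ruby
# Builds a class name following the at[breakpoint]_[property]--[variant] pattern.
def utility_class(breakpoint:, property:, variant:)
  "at#{breakpoint}_#{property}--#{variant}"
end

class_name = utility_class(breakpoint: "Large", property: "align", variant: "center")
puts class_name
```

This regularity is what makes the classes easy to remember: knowing the breakpoint, the property and the variant is enough to guess the name.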

Finally, he also mentioned that the documentation should be kept as close to the code as possible. To that end, using a documentation generator could be useful.

Media queries level 4

Member of the CSS Working Group, Florian Rivoal presented the Media Queries Level 4 which offers some syntax improvements regarding the range, the boolean logic and shortcuts.

He also explained that using Media Types is not a good idea anymore. For instance, @media screen {...} applies to a smartphone, a TV screen, a laptop… We may want different behaviors on each one: using Media Features, we can match a precise kind of device such as a Wii, a phone, an e-ink display and so on.

Finally, he gave us some best practices.

Don’t:

use wrong media features as proxy,
set breakpoints based on popular devices,
be too specific,
use px to define size.

Do:

try not to use Media Queries that much, there is Flexbox, grid, etc,
use all tools in the box,
set breakpoints for where your design would break,
use em to define size.

Link to the talk

Typography - Axis-Praxis

Laurence Penney created Axis-Praxis which is an environment for playing with Variable Fonts.

You choose fonts, then you can adjust sliders and play with precise settings on the variation axes built into them. It relies on the font-variation-settings CSS property and lets us experiment with what the future of Variable Fonts may be.

Check it out on Axis-Praxis.org 😉

Link to the talk

Embulk: move data easily across datasources

Antoine Augusti — Mon, 11 Dec 2017 00:00:00 +0000

At Drivy, we heavily use Embulk for our data needs. Embulk is an open-source data loader that helps transfer data between various databases, storage systems, file formats, and cloud services. It can automatically guess file formats, distribute execution to deal with big datasets, offer transactions and resume stuck tasks, and it is modular thanks to plugins.

Embulk is written in JRuby and the configuration is specified in YAML. You then execute Embulk configuration files through the command line. It’s possible to inject environment variables and other configuration files can be embedded thanks to the Liquid template engine.

Overview of Embulk's architecture and its various components

The available components are the following:

Input: specify where the data is coming from (MySQL, AWS S3, Jira, Mixpanel etc.)
Output: specify the destination of the data (BigQuery, Vertica, Redshift, CSV etc.)
File parser: to parse specific input files (JSON, Excel, Avro, XML etc.)
File decoder: to deal with compressed files
File formatter: to format specific output files (similar to parsers)
Filter: to keep only some rows from the input
File encoder: to compress output file (similar to decoders)
Executor: where Embulk tasks are executed (locally or on Hadoop)

The plugins are listed on the Embulk website and are usually available on GitHub. If needed, you can write your own plugin.

Usage at Drivy

At Drivy, we currently have a bit fewer than 150 Embulk configuration files and we perform nearly 1,200 Embulk tasks every day for our ETL needs, running on Apache Airflow. Our main usage is to replicate tables coming from MySQL to Amazon Redshift, our data warehouse.

From MySQL to Redshift

For example, here is the Embulk configuration file we use to pull data about push notifications from MySQL to Redshift, incrementally.

This is stored in push_notifications.yml.liquid:

{% include 'datasources/in_mysql_read_only' %}
  table: push_notifications
  incremental: true
  incremental_columns: [updated_at, id]
{% include 'datasources/out_redshift' %}
  table: push_notifications
  mode: merge
  merge_keys: [id]

This short configuration file uses powerful concepts. First, it leverages incremental loading to load records inserted (or updated) after the latest execution. In our case, we will load or update records according to the value of the latest updated_at and id columns. Records will be merged according to the id column, which is a primary key. Secondly, we use the Liquid template engine to pull two partials. datasources/in_mysql_read_only is used to specify the common MySQL configuration for the input mode and datasources/out_redshift is used to specify the Redshift configuration for the output mode.

Here is what the file datasources/out_redshift.yml.liquid looks like:

out:
  type: redshift
  host: {{ env.REDSHIFT_HOST }}
  user: {{ env.REDSHIFT_USER }}
  password: {{ env.REDSHIFT_PASSWORD }}
  ssl: enable
  database: {{ env.REDSHIFT_DB }}
  aws_access_key_id: {{ env.S3_ACCESS_KEY_ID }}
  aws_secret_access_key: {{ env.S3_SECRET_ACCESS_KEY }}
  iam_user_name: {{ env.S3_IAM_USER_NAME }}
  s3_bucket: {{ env.S3_BUCKET }}
  s3_key_prefix: {{ env.S3_KEY_PREFIX }}
  default_timezone: {{ env.REDSHIFT_DEFAULT_TIMEZONE }}

Basically, it describes how to connect to our Redshift cluster and it respects the format defined by the Redshift output plugin for Embulk. Note that we reference almost only environment variables that will be injected at runtime. This is used to keep secrets out of the codebase and gives us the ability to switch easily between several environments (production and staging for instance).

Running the script is then as straightforward as executing the Bash command

embulk run push_notifications.yml.liquid -c diffs/push_notifications.yml

after setting the required environment variables. Embulk will keep the last values for the updated_at and id columns in diffs/push_notifications.yml for future executions.

The diffs/push_notifications.yml file looks like this:

in:
  last_record: ['2017-12-11T13:51:41.000000', 11230196]
out: {}
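
To illustrate how incremental loading and the merge mode fit together, here is a small in-memory sketch in Ruby. It is not Embulk's code: the `Row` struct, the helper names and the sample rows are assumptions made for the example, and timestamps are kept as ISO 8601 strings so that they compare correctly as plain strings.

```ruby
# In-memory sketch of incremental loading plus merge; not Embulk's actual code.
Row = Struct.new(:id, :updated_at, :title)

# Keep rows strictly after the saved cursor, ordered by (updated_at, id).
def incremental_rows(source, last_record)
  source
    .select { |row| ([row.updated_at, row.id] <=> last_record) == 1 }
    .sort_by { |row| [row.updated_at, row.id] }
end

# Merge mode: insert new ids, overwrite rows whose id already exists.
def merge!(destination, rows)
  rows.each { |row| destination[row.id] = row }
  destination
end

source = [
  Row.new(1, "2017-12-11T13:00:00", "old"),
  Row.new(2, "2017-12-11T14:00:00", "edited"),
  Row.new(3, "2017-12-11T14:00:00", "new")
]
destination = { 1 => source[0], 2 => Row.new(2, "2017-12-10T09:00:00", "stale") }
last_record = ["2017-12-11T13:51:41", 1]

rows = incremental_rows(source, last_record)
merge!(destination, rows)
# Remember where we stopped, like the last_record entry of the diff file.
last_record = [rows.last.updated_at, rows.last.id] unless rows.empty?
```

Only rows 2 and 3 are transferred: row 2 replaces its stale copy, row 3 is inserted, and the cursor moves forward for the next run.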

From a CSV file to Redshift

Here is how we import CSV files into Redshift.

Embulk ships with a CSV guesser, that can automatically build a configuration file from a CSV file.

If we start from a sample configuration file like this one that we will write in cr_agents.yml.liquid:

in:
  type: file
  path_prefix: {{ env.EMBULK_PATH_PREFIX }}
  parser:
{% include 'datasources/out_redshift' %}
  table: cr_agents
  mode: truncate_insert

and by running the Bash command

EMBULK_PATH_PREFIX=/tmp/agents.csv embulk guess cr_agents.yml.liquid

Embulk will then generate the appropriate CSV boilerplate like this, after parsing our CSV file.

in:
  type: file
  path_prefix: {{ env.EMBULK_PATH_PREFIX }}
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ','
    quote: '"'
    escape: '"'
    null_string: 'NULL'
    trim_if_not_quoted: false
    skip_header_lines: 1
    allow_extra_columns: false
    allow_optional_columns: false
    stop_on_invalid_record: true
    columns:
    - {name: id, type: long}
    - {name: drivy_id, type: long}
    - {name: zendesk_user_id, type: long}
    - {name: talkdesk_user_name, type: string}
    - {name: first_name, type: string}
    - {name: last_name, type: string}
    - {name: country, type: string}
    - {name: second_country, type: string}
    - {name: third_country, type: string}
    - {name: fourth_country, type: string}
    - {name: is_drivy, type: boolean}
    - {name: is_active, type: boolean}
{% include 'datasources/out_redshift' %}
  table: cr_agents
  mode: truncate_insert

You can then manually adjust the configuration for the CSV parser if needed.

Finally, we’re now ready to import our CSV file into Redshift. This can be done thanks to the Bash command

EMBULK_PATH_PREFIX=/tmp/agents.csv embulk run cr_agents.yml.liquid

Because we specified that we want to use the truncate_insert mode for the output plugin, Embulk will first delete every record in the destination table cr_agents before inserting rows from the CSV file.

Conclusion

I hope you now have a quick grasp of what Embulk is and how it can speed up your data import and export tasks. With simple configuration files and a good plugin ecosystem, it is our go-to solution almost every time we need to perform data transfers in our ETL.

Sending an e-mail to millions of users

Adrien Siami — Mon, 20 Nov 2017 00:00:00 +0000

Recently, we had to send an e-mail to all our active users. For cost reasons, we decided to invest a bit of tech time and to go with transactional e-mails instead of using an e-mail marketing platform.

While it would certainly be quite straightforward for, say, hundreds or even thousands of users, it starts to get a bit more complicated for larger user bases.

In our case, we had to send the e-mail to ~1.5 million e-mail addresses.

In this blog post, I’ll quickly explain why a standard approach is not acceptable and go through the solution we chose.

A naive solution

Let’s implement a very naive way to send an e-mail to all our users. We’re going to create a job that loops through all the users and enqueues an e-mail.

class MassEmailJob < ApplicationJob

  queue_as :default

  def perform
    User.find_each do |user|
      Notifier.some_email(user).deliver_later
    end
  end

end

Now let’s see what could go wrong.

Your job might get killed

Looping through millions of users is not free and it will most likely take a fair amount of time.

During this time, you may well deploy, which restarts your job manager and kills your job. Now you don’t know which users have received the e-mail and which have not.

One easy fix would be to run the code outside of your job workers, maybe in a rake task, but you have to make sure it won’t get killed, or that if it’s killed, you can resume it without any issue.

You are going to get blacklisted from e-mail providers

E-mail providers don’t like spam. If you send thousands of e-mails from the same IP in a short time, you’re guaranteed to get throttled or even blacklisted.

Therefore, it is necessary to space out the e-mails a bit, for example, adding a 30s delay every 100 e-mails.
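
A quick back-of-the-envelope sketch in Ruby shows what such a schedule implies. The helper names are hypothetical, and it assumes the first slice is also delayed by one step; the 1.5 million figure is the size of our mailing.

```ruby
# Delay (in seconds) before sending the e-mail at a given 0-based position,
# adding `step` seconds for every `slice_size` e-mails.
def delay_for(position, slice_size: 100, step: 30)
  (position / slice_size + 1) * step
end

# Time until the last e-mail of a batch goes out, in hours.
def total_hours(email_count, slice_size: 100, step: 30)
  delay_for(email_count - 1, slice_size: slice_size, step: step) / 3600.0
end

first_slice_delay = delay_for(99)    # still in the first slice of 100
second_slice_delay = delay_for(100)  # first e-mail of the second slice
hours_for_everyone = total_hours(1_500_000)
```

With these numbers, 1.5 million e-mails enqueued in one go would take 125 hours, a little over five days, before the last one leaves.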

You are going to congest your job queue

Every e-mail to be sent equals a job run in your job queue: if you enqueue millions of jobs in the same queue you use for other operations, you’re going to create a lot of congestion.

Therefore, you’d probably want to have a special queue only for your sending with a dedicated worker.

Our solution

First, let’s list the requirements we had in mind:

We wanted to be able to enqueue as many or as few e-mails as we liked, so we could test the waters first (check deliverability, congestion) and then scale up.
We wanted to easily be able to establish those users for whom we had scheduled an email, and those users who were still waiting.
We had to be able to stop sending emails quickly in case something went wrong, and we had to be able to resume it without losing data.

Redis to the rescue

Redis is an amazing multi-purpose tool: it can be used for storing short-lived data such as a cache, as a session store, etc. It has a multitude of useful data structures; the one we’re going to use today is the Sorted Set.

A sorted set is a bit like a hash / dictionary / associative array. It contains a collection of unique values, and each of these values has a score.

Redis offers very useful functions to deal with sorted sets, let’s have a look at one in particular.

`ZRANGEBYSCORE`

This function returns a range of n elements from the sorted set, with a score between min and max (inclusive). Can you see where this is going? :)

We’re going to store all our user ids in a sorted set, with a score of 0, and change that score to 1 when we enqueue an e-mail for them.

Then, it’s really easy to ask for any number of users for whom we haven’t enqueued the e-mail, using ZRANGEBYSCORE.
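
The idea can be tried without a Redis server using a toy in-memory stand-in. The `ToySortedSet` class below is hypothetical and is not the redis-rb API; it only imitates the three commands we rely on, and it orders members with equal scores numerically, whereas Redis compares them as strings.

```ruby
# Toy in-memory stand-in for a Redis sorted set (member => score).
class ToySortedSet
  def initialize
    @scores = {}
  end

  # Like ZADD: set (or overwrite) the score of a member.
  def zadd(score, member)
    @scores[member] = score
  end

  # Like ZRANGEBYSCORE with LIMIT 0 count: members with min <= score <= max,
  # ordered by score, then by member.
  def zrangebyscore(min, max, count)
    @scores
      .select { |_member, score| score.between?(min, max) }
      .sort_by { |member, score| [score, member] }
      .first(count)
      .map(&:first)
  end

  # Like ZCOUNT: number of members with min <= score <= max.
  def zcount(min, max)
    @scores.count { |_member, score| score.between?(min, max) }
  end
end

set = ToySortedSet.new
(1..5).each { |user_id| set.zadd(0, user_id) }  # score 0: nothing enqueued yet

batch = set.zrangebyscore(0, 0, 2)              # pick two pending users
batch.each { |user_id| set.zadd(1, user_id) }   # score 1: e-mail enqueued

remaining = set.zcount(0, 0)
scheduled = set.zcount(1, 1)
```

Since the score doubles as a status flag, asking for "pending users" and counting progress are both simple range queries.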

Building the sorted set

Let’s create a rake task to populate a sorted set with our user ids.

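task :populate_users_zset => :environment do
  redis = Redis.new(YOUR_CONFIG)
  # Walk the users table in batches and add the ids 100 at a time.
  User.select('id').find_each.each_slice(100) do |users|
    # One MULTI/EXEC transaction per slice of 100 ids.
    redis.multi do |transaction|
      users.each do |user|
        # Score 0 means "no e-mail enqueued yet".
        transaction.zadd('mass_email_user_ids', 0, user.id)
      end
    end
  end
end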

Here I’m using MULTI to add the user ids 100 by 100 to the set in transactions, to go easy on redis CPU.

While this task may take quite some time, it is safe to re-launch if killed.

Enqueuing a number of e-mails for sending

Now that we have our sorted set, let’s write another task. This one will pick a given number of user ids from the set and enqueue an e-mail for them, while spacing out the sends in time a bit.

task :send_email_batch, [:batch_size] => :environment do |t, args|
  redis = Redis.new(YOUR_CONFIG)
  ids = redis.zrangebyscore('mass_email_user_ids', 0, 0, limit: [0, args.batch_size])
  delay = 30.seconds

  ids.each_slice(100) do |ids_slice|
    ids_slice.each do |id|
      Notifier.some_email(User.find(id)).deliver_later(wait: delay)
      redis.zadd('mass_email_user_ids', 1, id, xx: true)
    end
    delay += 30.seconds
  end
end

Here I get as many user ids as requested thanks to ZRANGEBYSCORE and its limit option. I then iterate over the ids and enqueue the jobs 100 by 100, while delaying the sending by 30 seconds each time.
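To make the spacing concrete, here is a small sketch of the delay each id ends up with. It is written in Python purely as an illustration of the arithmetic; `staggered_delays` is a hypothetical helper, and the slice size and step are the values used in the task above.

```python
from datetime import timedelta


def staggered_delays(ids, slice_size=100, step=timedelta(seconds=30)):
    """Yield (id, delay) pairs; every id in the same slice shares one delay."""
    delay = step
    for start in range(0, len(ids), slice_size):
        for user_id in ids[start:start + slice_size]:
            yield user_id, delay
        delay += step  # the next slice is sent one step later


schedule = list(staggered_delays(list(range(250))))
first_delay = schedule[0][1]     # ids 0..99 share the first delay
second_delay = schedule[100][1]  # ids 100..199 share the second one
last_delay = schedule[-1][1]     # ids 200..249 share the third one
print(first_delay, second_delay, last_delay)
```

With these values a batch of 10,000 e-mails is split into 100 slices, so the last slice is delayed by 50 minutes.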

And that’s it! Thanks to this system you can gradually increase your e-mail batches while keeping an eye on deliverability.

Send 100 mails to test it out:

rake 'your_namespace:send_email_batch[100]'

Everything looks good? Send 1000, then 10000, etc.

Then it’s easy to know how many e-mails are left to be scheduled: just open a Redis console and ask away using ZCOUNT!

Remaining e-mails to schedule:

ZCOUNT mass_email_user_ids 0 0

E-mails already scheduled or sent:

ZCOUNT mass_email_user_ids 1 1

Cons

Obviously there is no perfect solution, here are a few downsides:

Quite a few manual actions

This is clearly not a fire-and-forget solution: it needs the attention of a dev for a little bit of time for enqueuing the sends, monitoring, waiting for a batch to finish and then sending another one, etc.

However, this kind of sending is usually rare but important, so having it done right is worth the effort.

Stopping the machine is possible, but at a cost

If you enqueue a lot of small batches, you’re going to be fine, but at some point you are going to enqueue batches of 100k e-mails or even more.

What if something goes wrong (deliverability dropping, etc.) and you want to stop everything to have a look? You would need to stop the dedicated worker, but the jobs are already enqueued. If you don’t resume for a long time, their scheduled delays will have elapsed, so when the worker starts again the jobs are going to run without any spacing and you may experience congestion or throttling from your e-mail provider.

This is a risk we were willing to take and that we mitigated with strong monitoring and cautious batching.

Conclusion

This solution worked well for our needs, but as always, your mileage may vary!

Sending millions of e-mails is tricky, but is an interesting problem to solve. Thanks to a bit of custom dev and redis, we were able to send our e-mail in a reasonable amount of time with excellent deliverability.

Multi-currency support in Java

Romain Guefveneu — Mon, 20 Nov 2017 00:00:00 +0000

For a few weeks, Drivy has been available in the United Kingdom. Unlike the other European countries where Drivy operates, the United Kingdom uses a different currency: the pound (£). We had to make some changes in our Android apps to support this.

Server-side or Client-side Formatting?

At Drivy, formatting is generally done server-side and the apps just display the values as they are:

Here, prices are formatted server-side, depending on the search place (London, so £), and the app’s locale (french).

But for some specific features we need client-side formatting, for instance an input field. Let’s dive into some Java APIs to see how they can help.

Formatting

First things first: how do we format a currency? The position of the currency symbol doesn’t depend on the currency itself, but on the country of the locale. That means we’ll display “1 234,50 GBP” in French, and “€1,234.50” in English:

	€	£	$
France	1 234,50 €	1 234,50 GBP	1 234,50 USD
Switzerland	EUR 1’234.50	GBP 1’234.50	USD 1’234.50
Italy	€ 1.234,50	GBP 1.234,50	USD 1.234,50
United-Kingdom	€1,234.50	£1,234.50	USD1,234.50
USA	EUR1,234.50	$1,234.50	$1,234.50

As you can see, the currency symbol is not always displayed. Since there are multiple currencies using the same symbol (e.g. United States dollar and Canadian dollar), we will instead display the currency code if there is any ambiguity (well, except for € in en_US 🤷‍).

Alright, how do we do that in Java? Pretty simple:

Locale countryLocale = Locale.FRANCE;
Locale currencyLocale = Locale.UK;

NumberFormat currencyFormat = NumberFormat.getCurrencyInstance(countryLocale);
Currency currency = Currency.getInstance(currencyLocale);
currencyFormat.setCurrency(currency);

System.out.println(currencyFormat.format(1234.5f));
//1 234,50 GBP

If we dig a bit deeper, we can extract the format pattern:

DecimalFormat currencyFormat = (DecimalFormat) NumberFormat.getCurrencyInstance(Locale.FRANCE);
System.out.println(currencyFormat.toPattern());
//#,##0.00 ¤

#,##0.00 ¤ is the French currency format, and it doesn’t depend on any currency. ¤ is the currency symbol and behaves like a placeholder for the currency symbol or code.

Input field

Building our own currency input field is not very complicated. The main issue to solve is to know where to draw the currency symbol. Indeed, depending on the currency format, we have seen that the symbol can be either before the value or after the value. As we certainly don’t want to parse the format pattern, these four DecimalFormat methods will be useful:

getNegativePrefix()
getPositivePrefix()
getNegativeSuffix()
getPositiveSuffix()

Here are some interesting values:

	NegativePrefix	PositivePrefix	NegativeSuffix	PositiveSuffix	Example
France	-		€	€	-1 234,50 €
Denmark	kr -	kr			kr -1.234,50
Netherlands	€	€	-		€ 1.234,50-
United-Kingdom	-£	£			-£1,234.50
USA	($	$	)		($1,234.50)

Now all we have to do is to draw the prefix and suffix!
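As a rough illustration of that last step, the sketch below wraps already formatted digits with a prefix and a suffix. It is written in Python rather than Java, and the affix values are simply transcribed from the table above (with approximate whitespace); in the Android code they would come from the four DecimalFormat getters. The names `AFFIXES` and `decorate` are hypothetical.

```python
# Affixes transcribed from the table above (whitespace is approximate).
# In the Java code they would come from DecimalFormat's getPositivePrefix(),
# getPositiveSuffix(), getNegativePrefix() and getNegativeSuffix().
AFFIXES = {
    "France": {"neg_prefix": "-", "pos_prefix": "",
               "neg_suffix": " €", "pos_suffix": " €"},
    "United-Kingdom": {"neg_prefix": "-£", "pos_prefix": "£",
                       "neg_suffix": "", "pos_suffix": ""},
    "USA": {"neg_prefix": "($", "pos_prefix": "$",
            "neg_suffix": ")", "pos_suffix": ""},
}


def decorate(digits, country, negative=False):
    """Wrap the already formatted digits with the country's prefix and suffix."""
    affixes = AFFIXES[country]
    sign = "neg" if negative else "pos"
    return affixes[sign + "_prefix"] + digits + affixes[sign + "_suffix"]


france = decorate("1 234,50", "France")
uk_negative = decorate("1,234.50", "United-Kingdom", negative=True)
usa_negative = decorate("1,234.50", "USA", negative=True)
print(france, uk_negative, usa_negative)
```

The input field only has to render the editable digits in the middle; everything locale-specific lives in the two affixes.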

Conclusion

With the help of just a few provided APIs, we have seen that formatting a currency can be easy. So don’t try and format currencies by hand!

Data quality checkers

Antoine Augusti — Thu, 09 Nov 2017 00:00:00 +0000

At Drivy, we store, process and analyse hundreds of gigabytes of data in our production systems and our data warehouse. Data is of utmost importance to us because it makes our marketplace run and we use it to continuously improve our service.

Making sure that the data we store and use is what we expect is a challenge. We use multiple techniques to achieve this goal such as high standard coding practices or checker jobs we run on production data to make sure that our assumptions are respected.

Defining data quality

There are several research papers discussing the data quality dimensions as professionals have a hard time agreeing on the terminology. I found that the article written by the DAMA UK Working Group successfully defines 6 key dimensions that I summarize as follows:

Completeness: are all data items recorded? If something is mandatory, 100% completeness will be achieved. Sampling data does not achieve completeness for example.
Consistency: can we match the same data across data stores?
Timeliness: do we store data when the event occurred? For example, if we know that an event occurred 6 hours ago and we stored it only 1 hour ago, it could break a timeliness constraint.
Uniqueness: do we have duplicate records? Nothing will be recorded more than once based upon how a record is identified.
Validity: do we store data conforming to the syntax (format, type, range of values) of its definition? Storing a negative integer for a user’s age breaks the validity of the record for example.
Accuracy: does the data describe the real world? For example, if a temperature sensor is misconfigured and reports wrong data points that are still within the accepted validity range, the data generated is not accurate.

For example in a log database, uniqueness does not always need to be enforced. However, in another table aggregating these logs we might want to enforce the uniqueness dimension.

Data quality in the context of data warehousing

My main goal was to enforce a high quality of data in our data warehouse, which we fill with standard ETL processes.

For our web application, we already have checker jobs (we talked about this in this blog post) in the context of a monolith Rails application with MySQL databases. They are somewhat simpler: they run on a single database and check the quality of data we have control over because we wrote code to produce it. We can also afford to perform migrations or backfill in case we detect a corruption and want to fix the data.

When working with ETL processes and in the end a data warehouse, we have different needs. The main issue we face is that we pull data from various databases, third parties, APIs, spreadsheets, unreliable FTP connections etc. Unfortunately, we have little or no control over what we fetch or collect from these external systems. Working with external datasources is a hard challenge.

We ingest raw data, we build aggregates and summaries, and we cross join data. Freshness depends on the source of the data and how we extract it. We don’t want alerts on data that is already corrupted upstream (this point is debatable), but we want to know if an upstream datasource gets corrupted. We usually want to compare datasets side by side (especially when pulling from another database) to make sure that the consistency dimension is respected.

Overall, I find it hard to enforce a strict respect of all data quality dimensions with 100% confidence, as data we pull upstream will never fully respect what was advertised. Data quality checkers can help us in improving our data quality, make sure preconditions hold true and aim for better data quality in the long run.

Abstractions

Now that we have a clearer idea about what data quality dimensions are and what we want to achieve, we can start building something. My goal was to be able to perform checks to prove that data quality dimensions are respected. I had to come up with high-level abstractions to have a flexible library to work with and this research article helped me.

My key components can be defined as follows:

Data quality checks are performed at a specified interval on one or multiple datasets that are coming from various datasources, using predicates we define. Checks have a tolerance and trigger alerts on alert destinations with an alert level defined by the severity of the found errors.

Let’s define each word used here:

Alert levels: define how important the error is
Alerters: alert people or systems when errors are detected
Checkers: perform predicate checks on datasets
Parsers: create datasets from a source (parse a CSV file, read database records, call an API etc.)
Tolerance levels: tolerate some errors on a check (number, percentage, known broken points)
Escalation policies: switch alert destination depending on alert level
Logger: logs failing datasets somewhere
Clock: defines when a checker should be executed
Scheduler: runs checks when they are up for execution
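To show how tolerance levels, alert levels and escalation policies could fit together, here is a minimal sketch. It is not the library described in this post: the function names, the thresholds and the destination names are all assumptions made for illustration.

```python
# Hypothetical escalation policy: alert level -> alert destination.
ESCALATION = {
    "warning": "#data-alerts",
    "error": "#data-alerts",
    "critical": "#data-oncall",
}


def alert_level(failing_count, error_above=2, critical_above=3):
    """Map a number of failing elements to a level (illustrative thresholds)."""
    if failing_count > critical_above:
        return "critical"
    if failing_count > error_above:
        return "error"
    return "warning"


def evaluate(failing_count, tolerated=0):
    """Return None when within tolerance, else a (level, destination) pair."""
    if failing_count <= tolerated:
        return None
    level = alert_level(failing_count)
    return level, ESCALATION[level]


print(evaluate(0), evaluate(1), evaluate(3), evaluate(5))
```

Keeping the tolerance separate from the alert level lets a check stay silent on a few known broken points while still escalating when the number of failures grows.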

Checkers

Checkers are the most important components of the system. They actually perform the defined data quality checks on datasets. When implementing a new checker, you write a subclass of one of the abstract checkers supporting the core functionalities (extraction types, alert destinations, alert levels, logging etc.)

Available checkers:

PredicateSingleDatasetChecker: check that each element of the dataset respects a predicate
OffsetPredicateSingleDatasetChecker: given a predicate and an offset, check that two elements separated by the given offset respect the predicate. This is very useful to compare time records for example
PredicateDoubleDatasetsChecker: iterate on 2 datasets at the same time and check that the 2 records respect a predicate
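The three kinds of checks can be sketched as plain functions. This is a simplified illustration under the assumption that a dataset is a list and a predicate is a callable; the function names are hypothetical and the real checkers are classes with alerting and logging on top.

```python
from datetime import datetime, timedelta


def check_single(dataset, predicate):
    """Return the elements that fail the predicate."""
    return [element for element in dataset if not predicate(element)]


def check_offset(dataset, predicate, offset=1):
    """Return the pairs (dataset[i], dataset[i + offset]) failing the predicate."""
    pairs = zip(dataset, dataset[offset:])
    return [(a, b) for a, b in pairs if not predicate(a, b)]


def check_double(left, right, predicate):
    """Walk two datasets in lockstep and return the failing pairs.

    zip() stops at the shorter dataset, so a length check may be needed too.
    """
    return [(a, b) for a, b in zip(left, right) if not predicate(a, b)]


ages = [31, 42, -1]
negative_ages = check_single(ages, lambda age: age >= 0)

start = datetime(2017, 11, 9, 8, 0)
timestamps = [start, start + timedelta(minutes=15), start + timedelta(hours=2)]
gaps = check_offset(timestamps, lambda a, b: b - a <= timedelta(minutes=30))

source_counts = [10, 20, 30]
warehouse_counts = [10, 19, 30]
mismatches = check_double(source_counts, warehouse_counts, lambda a, b: a == b)
print(negative_ages, len(gaps), mismatches)
```

Each function returns the failing elements rather than a boolean, which is what makes it possible to apply a tolerance and to show the offending data points in an alert.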

Scheduler

We rely on Apache Airflow to specify, schedule and run our tasks for our data platform. We therefore created a pipeline putting together the data quality checks library with Airflow tasks and scheduling capabilities to easily run checks.

The main pipeline is executed every 15 minutes. Each data-quality check is composed of 2 main tasks:

a task with a ShortCircuitOperator which determines if the quality check needs to be executed now or not. If the quality check is not up for running, the second task is skipped
a task with a SubDagOperator to actually run the check: extract the dataset, run the checker and perform any alerting if needed.
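The two-step structure (a gate, then the actual check) can be modelled in a few lines of plain Python. This sketch does not use the Airflow API; `is_due` and `run_pipeline` are hypothetical names, and a fixed interval stands in for whatever schedule a check declares.

```python
from datetime import datetime, timedelta


def is_due(last_run, interval, now):
    """Gate step: a check is due if it never ran or its interval has elapsed."""
    return last_run is None or now - last_run >= interval


def run_pipeline(checks, now):
    """Run the gate for every check and execute only the checks that are due."""
    executed = []
    for check in checks:
        if not is_due(check["last_run"], check["interval"], now):
            continue  # gate closed: the second task is skipped
        check["run"]()
        check["last_run"] = now
        executed.append(check["name"])
    return executed


now = datetime(2017, 11, 9, 12, 0)
checks = [
    {"name": "freshness", "interval": timedelta(minutes=30),
     "last_run": now - timedelta(minutes=45), "run": lambda: None},
    {"name": "row_counts", "interval": timedelta(hours=6),
     "last_run": now - timedelta(hours=1), "run": lambda: None},
]
executed = run_pipeline(checks, now)
print(executed)
```

Running the pipeline often and letting each check decide whether it is due keeps the schedule of every check independent from the schedule of the pipeline itself.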

Airflow directed acyclic graph in charge of running the various data quality checks

Airflow sub graph of a quality check running on two datasets side by side

Alerts

When a check is executed and detects a malfunction, we get alerted. For now we only use Slack, but there is room for other alerters such as text messages, PagerDuty or emails.

When an alert triggers, we get to know what the alert is, the purpose of the associated check, and how important the alert is, along with the number of failing elements. Remember that alerts can have a certain level of tolerance - some errors can be tolerated - and different alert levels to help triage alerts. We get a quick view of the data points which failed the check to have a rough idea about what’s going on, without jumping to the logs or looking immediately at the dataset.

Sample alert message on Slack showing a breach of SLA for data freshness

If we need to investigate further, we can look at the logs in Airflow or inspect the raw dataset. We find it convenient to have alerts in Slack so that we can start threads explaining why an alert triggered and if we need to take actions.

The future

We’ve been using these data quality checks over the last 3 months and we’ve been really happy to have them. They make us trust our data more, help us detect issues and let us prove that assumptions are indeed always respected. It’s also a good opportunity to step up our data quality level: we can lower thresholds over time, review SLAs and put more pressure on the level of confidence we have in critical parts of our storage.

For now, we plan to add more checkers (we currently have 20-30 checkers) to see if we’re happy with what we have, improve it and keep building on it.

Open source

We thought about open sourcing what we built, but we think that it’s a bit too soon and we want to gain more confidence before publishing it on GitHub.

Ideas and thoughts

If data quality is of interest to you and you want to react to this blog post, I would be thrilled to hear from you! Reach out on my Twitter.

Code sample

To get an idea of what a data quality checker looks like, here is a sample quality check which checks if data is fresh enough for various tables in our data warehouse (Redshift). This class can easily be tested, to have automated tests proving that alerts trigger with specific datasets.

This class is complete enough so that Airflow can know how to extract data from Redshift, transform and run the check automatically.

# -*- coding: utf-8 -*-
import datetime
from datetime import timedelta

from data_quality.alert_levels import FailingElementsThresholds
from data_quality.checkers import PredicateSingleDatasetChecker
from data_quality.tolerances import LessThan
from data_quality_checks.base_checkers.base import BaseQualityCheck
from data_quality_checks.base_checkers.base import DatasetTypes
from data_quality_checks.base_checkers.base import ExtractionTypes
from data_quality_checks.base_checkers.base import ScheduleTypes

class DataFreshness(BaseQualityCheck):
    # Run a query on Redshift
    EXTRACTION_TYPE = ExtractionTypes.REDSHIFT_QUERY
    # Dataset can be parsed from a CSV
    DATASET_TYPE = DatasetTypes.CSV
    SCHEDULE_TYPE = ScheduleTypes.CRON
    CRON_SCHEDULE = '20,50 7-22 * * *'

    def alert_level(self):
        # 0-2: warning
        # 2-3: error
        # > 3: critical
        return FailingElementsThresholds(2, 3)

    def tolerance(self):
        # Get notified as soon as we have a single issue
        return LessThan(0)

    def description(self):
        return 'Check that data is fresh enough in various tables'

    def checker(self, dataset):
        class Dummy(PredicateSingleDatasetChecker):
            def __init__(self, dataset, predicate, options, parent):
                super(Dummy, self).__init__(
                    dataset, predicate, options
                )
                self.parent = parent

            def checker_name(self):
                return self.parent.__class__.__name__

            def description(self):
                return self.parent.description()

        fn = lambda e: e['last_update'] >= self.target_time(e['table_name'])
        return Dummy(
            dataset,
            fn,
            self.checker_options(),
            self
        )

    def freshness_targets(self):
        conf = {
            5: config.FINANCE_TABLES,
            8: config.CORE_TABLES,
            24: config.NON_URGENT_TABLES
        }
        res = []
        for lag, tables in conf.items():
            for table in tables:
                res.append({'table': table, 'target': timedelta(hours=lag)})
        return res


    def freshness_configuration(self, table):
        targets = self.freshness_targets()
        table_conf = [e for e in targets if e['table'] == table]
        if len(table_conf) != 1:
            raise KeyError
        return table_conf[0]

    def target_time(self, table):
        now = datetime.datetime.now()
        lag = self.freshness_configuration(table)['target']
        return now - lag

    def query(self):
        parts = []
        for table_conf in self.freshness_targets():
            query = '''
                SELECT
                    MAX("{col}") last_update,
                    '{table}' table_name
                FROM "{table}"
            '''.format(
                col=table_conf.get('col', 'created_at'),
                table=table_conf['table'],
            )
            parts.append(query)
        the_query = ' UNION '.join(parts)
        return self.remove_whitespace(the_query)
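The comparison at the heart of this class can be exercised on its own, which is what makes it easy to test. The sketch below reproduces only the predicate from the `checker` method, assuming hypothetical table names and targets; it does not depend on the data quality library or on Redshift.

```python
from datetime import datetime, timedelta

# Hypothetical targets: table name -> maximum tolerated lag.
TARGETS = {
    "payments": timedelta(hours=5),
    "rentals": timedelta(hours=8),
}


def is_fresh(row, now):
    """Same comparison as the lambda in checker(): last_update >= now - lag."""
    return row["last_update"] >= now - TARGETS[row["table_name"]]


# A fixed "now" keeps the example deterministic.
now = datetime(2017, 11, 9, 12, 0)
dataset = [
    {"table_name": "payments", "last_update": now - timedelta(hours=1)},
    {"table_name": "rentals", "last_update": now - timedelta(hours=9)},
]
stale = [row["table_name"] for row in dataset if not is_fresh(row, now)]
print(stale)
```

Passing the current time as an argument, instead of calling `datetime.now()` inside the predicate, is a design choice made here so that a test can pin the clock.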

Sanitize your attributes through your form object

Jean Anquetil — Tue, 17 Oct 2017 00:00:00 +0000

At Drivy, we use the Virtus gem to build form objects in our codebase. This lets us:

Keep our business logic out of the Controller and Views
Deal with unpersisted attributes
Add specific validations instead of adding them directly in the model
Display custom data validation errors directly in the form
Use features from ActiveModel::Model by including it

Sometimes, we have to sanitize user input: format the data, remove whitespaces and so on. Here is a convenient way to handle it with Virtus.

Using #coerce

Let’s imagine that we want to remove all the whitespaces from a VAT number recorded as a string. This is a pretty simple use case, but the concepts will apply to more complex situations as well.

First we have to define a custom attribute object for the attribute we want to sanitize. It has to inherit from Virtus::Attribute in order to use the coerce method. Then, in this method, we just have to define the reformatting we want to perform.

class SanitizedVatNumber < Virtus::Attribute
  def coerce(value)
    value.respond_to?(:to_s) ? value.to_s.gsub(/\s+/, '') : value
  end
end

Next, in your Virtus form object, we specify the vat_number attribute - the one we want to update - as a SanitizedVatNumber:

class CompanyForm
  include Virtus.model

  attribute :vat_number, SanitizedVatNumber

  # ... rest of the form object
end

And there we have it! The vat_number will be sanitized once the form is submitted.

Testing it with RSpec

It is also easy to add basic tests on this custom Virtus attribute, for instance by using RSpec:

describe SanitizedVatNumber do
  let(:object) { described_class.build(described_class) }
  subject { object.coerce(value) }

  context 'when vat_number is nil' do
    let(:value) { nil }

    it { is_expected.to eq('') }
  end

  context 'when vat_number has white spaces' do
    let(:value) { 'EN XX 999 999 999' }

    it { is_expected.to eq('ENXX999999999') }
  end
end

Conclusion

This way you avoid giving too much responsibility to your form object, which is the risk when sanitizing attributes directly inside the form. Plus, Virtus custom coercions can be reused across multiple forms and lend themselves well to unit testing.

Evolution Of Our Continuous Delivery Process

Marc G Gauthier — Wed, 27 Sep 2017 00:00:00 +0000

We’ve always valued releasing quickly, as unreleased code is basically inventory. It slowly gathers dust and becomes outdated or costs time to be kept updated. Almost 3 years ago we published an article “Drivy, version 500!” on our main blog, so I feel now is the time to get more into the details of how we accomplish pushing a lot of new versions of the app to production.

Our Different Iterations

As in most cases, we keep 改善 (kaizen) in mind and go for continuous improvement over one huge definitive solution straight away. We don’t try to build the perfectly automated system that handles all cases when there are only 2 developers and no users. Probably obvious, but it’s something always worth repeating.

In this article I’ll try to explain chronologically all the steps we went through over the course of 5 years, so as to showcase the evolution of our tools and processes. Of course if you want to see how we do it now, go straight to the last step… but it might already be outdated by the time you read this!

Adding Tests & Releasing Manually

Drivy used to be in PHP, but for various reasons we decided to move away from it and use Ruby on Rails. At this point we started creating automated tests for everything we were doing, and have kept adding to our test suite since then. This is very important because without automated tests, you will never be able to release often whilst avoiding major bugs.

First specs added in 2012

As far as pushing to production was concerned, we tended to basically do nothing about automation, and would just release manually after running the specs on our machines. This was fine because there were not a lot of developers and therefore we didn’t move too quickly, and there were not too many specs yet so the build was fast.

Note that I don’t mention Capistrano or similar tools. This is because we are hosted on Heroku, and they provide their own simple toolbelt for deployment. However, if we were not using this provider, I feel like the minimal first step would be setting up something like Capistrano.

Improved Process

Quickly we developed a simple process loosely inspired by Kanban and based on Github tags and Huboard to be able to visualize progress. This would allow us to quickly see if a given commit could be deployed or not, and therefore to release faster without the need for additional back and forth.

To do so we started linking every commit to issues and used tags such as:

Backlog: This needs to be done eventually
Todo: This needs to be done soon
Going: Someone is currently working on it
On Staging: The ticket is mostly completed and is being tested on the staging environment
Ready For Prod: The ticket has been tested on staging and works as expected
On Prod: The ticket has been deployed to production

Closing an issue would mean “the ticket has been proved to be successful in production, with no bugs or regressions of any kind”.

Huboard a few years ago

Since then we moved from Huboard to waffle which was more stable and quicker, but there are tons of options out there nowadays including Zenhub or Github project boards.

Note On Documentation

At this point we were already half a dozen developers and we needed simple ways to share information with newcomers. We started adding and maintaining more documentation on the important parts of the release process.

Release documentation from the Drivy wiki

Getting Migrations Right

Releasing code might also imply changing the database schema, which can get tricky. Rails makes it easier since it uses migrations, but we’ve added a couple of ground rules in order to be effective:

Never edit/remove a migration file that has been merged into master. Add new migrations instead.
Commit the migration file (and schema.rb) in its own commit if the migration needs to be done in two steps. There shouldn’t be other files in the commit. This makes it easier to do a zero-downtime deployment.

There are a lot of articles and resources on how to do zero downtime deployments, so I won’t get into details here, but know that’s something required in order to ship fast.

Jenkins

After a bit we added Jenkins, a continuous integration server. It would run on a spare mac mini in the open space. This was a good improvement because it would help make sure we always ran tests and that any red build would be noticed.

Jenkins also had the great advantage of deploying to our staging environment right after a successful build.

Shell Deploy Script

Since we already had a documented flow to release and a Jenkins server, it was only a matter of time until we could automate it. The thing that made us decide to automate was the fact that, with people joining the team, we were afraid that it would slow down releases and create larger and therefore riskier releases.

To achieve this we created a simple shell script that could be run by Jenkins at the press of a button. It would go through all the documented steps to release, except automatically!

Excerpt from the release script

Note that all this time we would often look back at the data and see how we were doing in terms of the number of releases.

Releases per day in 2014 using git logs and Excel

Feature Flippers & Soft Releases

Releasing quickly proved to be steadily improving the way we worked, and reduced the risk of bugs. However there were cases where we could not deploy to all users right away. To deal with this issue, we added a feature flipper feature using the appropriately named flipper gem.

UI to decide how to make a feature available to our users

This allowed us to release code that we didn’t intend to use right away, or that we wanted to offer only to a subset of users. It was a great way to decorrelate the “technical” release from the “product” release.
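To illustrate how a feature can be offered to a subset of users only, here is a minimal sketch of a percentage-based gate. It is written in Python and is not the API of the flipper gem; the function names and the hashing scheme are assumptions chosen for the example.

```python
import hashlib


def bucket(feature, user_id):
    """Map (feature, user) to a stable number between 0 and 99."""
    key = f"{feature}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def enabled(feature, user_id, percentage=0, allowed_users=()):
    """Enable for explicitly allowed users, or for a stable share of all users."""
    if user_id in allowed_users:
        return True
    return bucket(feature, user_id) < percentage


# Users enabled at 10% stay enabled when the percentage is raised later.
rollout = [uid for uid in range(1000) if enabled("new_checkout", uid, percentage=10)]
still_enabled = all(enabled("new_checkout", uid, percentage=50) for uid in rollout)
print(len(rollout), still_enabled)
```

Hashing the feature name together with the user id keeps the decision stable between requests and avoids always picking the same users for every feature.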

CircleCI & the Drivy CLI

The Limits of Jenkins

After a while with this setup, we started to see some limitations. We didn’t want to invest a lot of time managing Jenkins, but the machine would sometimes go down, there would be random hardware issues, and applying software updates was a pain.

We decided not to invest more energy into Jenkins and instead move to a cloud solution. At this time CircleCI looked like a great option. So after properly benchmarking all other competitors, we dropped Jenkins and started using CircleCI.

CircleCI also provides additional functionalities out of the box. One particularly interesting feature is to add multiple containers in order to increase parallelization when running specs, thereby speeding up the test suite.

Adding a Command Line Tool

Since CircleCI didn’t provide a way to manually trigger a release, we had to build a bit of instrumentation and it took the form of a command line interface tool.

This was fairly easy to develop using clamp and we integrated it as a gem in our project so that every developer could release easily.

Adding More Developer Tools

When needed, we would build small tools like a Chrome extension to be able to better visualize in GitHub what was about to get released using a URL looking like this:

https://github.com/ORG/REPO/compare/last_release...master

Improving Bug Management

As we grew and shipped faster, we needed to make sure we weren’t introducing regressions. We were adding automated tests of course, but this doesn’t prevent every possible issue, so we improved the way we were monitoring and fixing bugs.

If you’d like to know more about this, we actually have an entire article dedicated to it: Managing Bugs at Drivy.

Improving the CLI

As time passed we improved the CLI with a Slack integration for notifications, a GitHub integration for automatic updates on issues and much more.

This proved to be very useful and made the act of releasing more or less painless.

Simplifying The Release Process

We were using git-flow, but it felt way too complicated compared to what we actually needed. We decided to streamline the process, making releases even simpler to understand. This was detailed in this article: “Simple Git Flow With Heroku Pipelines”.

We also worked on making even smaller incremental releases than before, splitting work into individual and releasable commits. This made releases easier, and if you’d like to know more you can check out this article: “Best Practices for Large Features”

Better Data

Since we prefer looking at data rather than staying in the dark, we added more information about the different steps of our release process. This way we now have a nice graph in Grafana with the number of releases we are making per week:

Improving Organization

As we grew, the engineering team got larger and it led to reduced agility. We worked on a new organisation based on Spotify’s squads and it helped us move even faster than before, which was visible in the number of releases.

Excerpt from Spotify's presentation video about Squads

CircleCI 2 & Docker

After years spent adding on to the test suite, we started to feel that the build time was slowing us down, clocking in at approximately 15 minutes. At this point CircleCI released a new version that allowed us to tweak our build better thanks to Docker, so we integrated it and saw great improvements on build speed.

Release Tool

Once again growth caused us to change our way of working. Now with more developers than ever in the team, there are more questions to be answered: about access rights, enforcing certain constraints, making internal contributions to the tooling easier…

This is why we introduced our new Drivy Tool app, strongly inspired by Shopify’s shipit. This allowed us to better control credentials and improve onboarding as there is nothing to install: just log in using GitHub and use the app!

What About Micro Services?

Splitting the app into a lot of micro services could improve release rate as well. However this is quite costly to do, so we plan on extracting services on an as-we-go and if-appropriate basis. We don’t feel any pain (yet), so there is no need to make a big technical move.

Current State Of Affairs

As you can see we went through a lot of iterations, involving new tools, changes in processes and more. Nowadays we release to production close to 10 times a day, with very few regressions.

I can’t say enough how central a good test suite is to any continuous integration process. If you can’t catch regressions or new bugs quickly, there is no chance you can implement such a process.

Same goes for good and simple processes. There’s no need to go overboard with red tape… but effective, structured and documented ways of doing things will help with productivity. It’s also great for onboarding new and more junior developers.

Setting up Vim for React development

Victor Mours — Tue, 12 Sep 2017 00:00:00 +0000

We’ve recently introduced Preact to our Rails stack at Drivy, and the results have been rather satisfying so far. As a Vim lover, I was curious to see how to go about setting up Vim for a React or React-like project. As usual, the wealth of plugins out there didn’t disappoint.

Here’s the setup that I came up with:

Syntax Highlighting

Let’s start with the basics, and get some syntax highlighting for JavaScript and JSX by adding these to the plugins section of your .vimrc:

Plug 'pangloss/vim-javascript'
Plug 'mxw/vim-jsx'

I’m assuming here that you’re using vim-plug as your plugin manager. If you’re unsure about how to install these with another plugin manager, check out the README from the respective repos.

Emmet for easier JSX

In a Rails environment, I’m mostly used to languages such as Slim or Haml which simplify writing HTML, so going back to writing closing tags in JSX felt a little tedious. Fortunately, you can get rid of some of the grunt work with Emmet-vim, which enables you to expand your CSS selectors into HTML (or JSX) on the fly.

For example, you could type

h2#tagline.hero-text

and then expand it to

<h2 id="tagline" className="hero-text"></h2>

in just two keystrokes.

Let’s install it:

Plug 'mattn/emmet-vim'

and then add this to your .vimrc:

let g:user_emmet_leader_key='<Tab>'
let g:user_emmet_settings = {
  \  'javascript.jsx' : {
    \      'extends' : 'jsx',
    \  },
  \}

Give it a try: in insert mode, type p.description, and then hit Tab followed by a comma (without leaving insert mode). It will expand to <p className="description"></p>. Note that this is using the JSX className syntax, thanks to the tweak on user_emmet_settings.

Syntax checking

Syntastic has been the go-to solution for syntax checking in Vim for a while, but it has the major flaw of being synchronous. That means that you can’t do anything - not even move your cursor - while it is running. For large files, this gets annoying rather quickly. The good news is that Vim now has support for async tasks, and you can switch to Ale, which is short for Asynchronous Lint Engine. You will never be interrupted by your linter again, hurray!

Arguably, this isn’t specific to React, but since you’ll need syntax checking for JSX, it’s a good opportunity to improve your overall setup.

Installing Ale is nothing unexpected:

Plug 'w0rp/ale'

Of course, Ale is only the glue between Vim and the actual syntax checker that runs under the hood, which in this case would be ESLint.

Here’s how to install ESLint:

$ yarn add --dev eslint babel-eslint eslint-plugin-react

and then configure it by running:

$ eslint --init

This will create an .eslintrc file, which you should check in to version control so that everybody is using the same style guide. You may want to have a chat with the other people working on your project, to make sure everybody agrees on which rules you’ll enforce.

Ale works out of the box with ESLint, so there’s no further setup needed. However, I found Ale more pleasant to use with a couple tweaks in my vimrc:

let g:ale_sign_error = '●' " Less aggressive than the default '>>'
let g:ale_sign_warning = '.'
let g:ale_lint_on_enter = 0 " Less distracting when opening a new file

Autoformatting

OK, this is the best part. You may know about Prettier, an “opinionated code formatter”, which will reformat your JavaScript code from scratch, much like gofmt does for Go.

Having Prettier run each time that you save a file is surprisingly satisfying: you’ll basically never have to think about formatting again. After using it for a couple hours, I even realized my way of writing was a bit different: I was just typing unformatted code, and trusting Prettier to make it look good. That’s its killer feature: you get to focus on what your code does, not how it’s written.

Once again, this will be useful for all your JS projects, not just React ones, so let’s get the setup going:

First, let’s install prettier:

$ yarn add --dev prettier eslint-config-prettier eslint-plugin-prettier

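Now, once the prettier plugin is enabled in your .eslintrc (the eslint-plugin-prettier README describes how), you should be able to run eslint --fix src/App.js, and src/App.js will be reformatted automatically.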

Good, now let’s make that happen in Vim each time you save a file. A naive way of doing this would be to set an autocommand to run ESLint, but that would have the downside of being synchronous. Rather than digging into Vim’s async job API, the easiest way of doing this is to use asyncrun, a plugin to easily run shell commands in the background.

Plug 'skywind3000/asyncrun.vim'

And then you can add that sweet sweet autocommand.

autocmd BufWritePost *.js AsyncRun -post=checktime ./node_modules/.bin/eslint --fix %

The -post=checktime option reloads the buffer from the file after the command is done running.

However, this does bring an issue: each time you tweak a file, the whole thing will be reformatted, which might make your git diff a bit unreadable. Here at Drivy, we’ve decided to bite the bullet and run prettier on our whole JS codebase, so that the styling would be up to date on the whole app. It was a big commit, but everything went smoothly, and we now have a consistent and pleasant style across the codebase.

Closing thoughts

This config is working pretty well for me, but as ever with Vim, it’s always possible to go deeper and find other improvements. If you do, or if you have any questions, feel free to ping me on Twitter.

Happy coding with Vim and (p)React!

Code simplicity - Reading levels

Nicolas Zermati — Fri, 01 Sep 2017 00:00:00 +0000

When we write code, who is it for? Is it for the machine? It is the machine that will parse and run your code. Is it for the next developer? It is that person that will spend time reading and updating the code. Is it for the business? After all, the code wouldn’t exist if it had no purpose. Obviously I think code targets all of them.

The goal is to make your code understandable not only to the machine, but also to your future self and to the business itself. This isn’t an easy thing. In this article I only aim to go past the machine-readable level to reach the next-developer level. To do that, let’s refactor a small fictional class…

Initial code

In the previous article of the series, I started extracting a chunk of code from a Rails controller to an external object. I did it in a very simple way and here is the result:

class ConfirmOrder < Command
  def initialize(order, payment_token, notify: true)
    @order = order
    @payment_token = payment_token
    @notify = notify
  end

  validate do
    payment = @order.payments.new(token: @payment_token)

    if !payment.valid?
      add_error(:payment_token_invalid)
    end

    if payment.amount != @order.sales_quote.amount || payment.currency != @order.sales_quote.currency
      add_error(:payment_amount_mismatch)
    end
  end

  perform do
    @order.payments.create!(token: @payment_token).capture!

    Order.transaction do
      invoice = @order.invoices.create!({
        amount: @order.sales_quote.amount,
        currency: @order.sales_quote.currency,
      })

      @order.sales_quote.items.find_each do |item|
        invoice.items.create!({
          product_id: item.product_id,
          quantity: item.quantity,
          unit_price: item.unit_price,
        })
      end

      @order.update(status: :confirmed)
    end

    if @notify
      OrderMailer.preparation_details(@order).deliver_async
      OrderMailer.confirmation(@order).deliver_async
      OrderMailer.available_invoice(@order).deliver_async
    end

  rescue Payment::CaptureError => error
    add_error(:payment_capture_error)
  end
end

It is machine-readable. Does that mean that you can read it easily? Nope… I explain a bit of what the code is doing in the other article, but it isn’t enough and, more importantly, do we want to maintain up-to-date documentation for everything?

Use the method Luke

One of the techniques I use is to abstract things into methods. As with variables, picking relevant names can create a narrative that’s much easier to follow.

For instance, let’s look at that validate block, starting with this:

if payment.amount != @order.sales_quote.amount || payment.currency != @order.sales_quote.currency
  add_error(:payment_amount_mismatch)
end

We don’t want to clutter the reader’s mind with the detail of the conditional. It doesn’t even fit on my screen! We could instead wrap this in a method:

add_error(:payment_amount_mismatch) unless payment_matches_sales_quote?

This doesn’t put all the details in the validate block. And I think it is for the best because when I read that block, I would like to have an overview of what’s validated, not every single implementation detail.

We could apply the same for the other conditional and get this:

validate do
  add_error(:payment_token_invalid) unless valid_payment_token?
  add_error(:payment_amount_mismatch) unless payment_matches_sales_quote?
end

To make this happen, we need to define 3 private methods:

private

def payment
  @payment ||= @order.payments.new(token: @payment_token)
end

def valid_payment_token?
  payment.valid?
end

def payment_matches_sales_quote?
  payment.amount == @order.sales_quote.amount &&
  payment.currency == @order.sales_quote.currency
end

After that we end up with more lines in the file but the validate block can now provide a faster understanding to anyone reading the class.

It is possible to go to a higher level of narrative with something like:

validate do
  payment_token_must_be_valid
  payment_amount_must_match_sales_quote
end

You’re the judge of the abstraction level you want to give. The first refactoring uses unless. It is understandable by any Ruby developer. The second version could be understood even by the business person asking for that feature.
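The higher-level version relies on two more small methods that wrap the add_error calls. Here is a minimal, self-contained sketch of how they could look. The PaymentValidation class and the FakePayment and FakeQuote structs are hypothetical stand-ins so the example runs without Rails; they are not part of the original code.

```ruby
# Hypothetical stand-ins for the payment and sales quote models.
FakeQuote = Struct.new(:amount, :currency)
FakePayment = Struct.new(:amount, :currency, :token) do
  # Assumption for this sketch: a payment is valid when it has a token.
  def valid?
    !token.to_s.empty?
  end
end

# Hypothetical plain-Ruby class holding the validation narrative.
class PaymentValidation
  attr_reader :errors

  def initialize(payment, sales_quote)
    @payment = payment
    @sales_quote = sales_quote
    @errors = []
  end

  # Reads like the list of business rules being checked.
  def validate
    payment_token_must_be_valid
    payment_amount_must_match_sales_quote
    errors.empty?
  end

  private

  def payment_token_must_be_valid
    add_error(:payment_token_invalid) unless valid_payment_token?
  end

  def payment_amount_must_match_sales_quote
    add_error(:payment_amount_mismatch) unless payment_matches_sales_quote?
  end

  def valid_payment_token?
    @payment.valid?
  end

  def payment_matches_sales_quote?
    @payment.amount == @sales_quote.amount &&
      @payment.currency == @sales_quote.currency
  end

  def add_error(code)
    errors << code
  end
end
```

Each rule now has a name that states the expectation, while the predicate methods keep the implementation details one level further down.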

Taking a few steps further

If we continue to apply this to the perform block, we could end up with something like:

perform do
  capture_payment!
  create_invoice_and_update_status
  send_notifications

rescue Payment::CaptureError
  add_error(:payment_capture_error)
end

If you’re interested in create_invoice_and_update_status, you’re free to dig deeper, but if you aren’t, you can choose not to bother with the details.

Conclusion

By creating different narratives you can optimize for different targets. Targeting the business forces you to think in the same mindset which has great communication benefits.

You can find the gist of the code here

Code simplicity - Command pattern

Nicolas Zermati — Fri, 01 Sep 2017 00:00:00 +0000

The command pattern is sometimes called a service object, an operation, an action, and probably more names that I’m not aware of. Whatever name we give it, the purpose of such a pattern is rather simple: take a business action and put it behind an object with a simple interface.

A controller’s action doing it all

One of the most common use cases I encounter for this pattern is getting business logic out of MVC controllers. For instance, in a Rails application, an action responds to a single HTTP call using the POST, PATCH, or PUT verb and semantics. It means that those actions are intended to update the application’s state.

The following example takes an action to illustrate the situation. The goal of the confirm action is to complete an order. Completing an order follows these steps:

validate that the payment amount is correct,
pay the order,
create an invoice using the existing sales quote,
update the state of the order, and
send notifications.

class OrdersController < ApplicationController
  def confirm
    order = current_user.orders.find(params.fetch(:id))
    payment = order.payments.create!(token: params.fetch(:payment_token))

    if payment.amount != order.sales_quote.amount || payment.currency != order.sales_quote.currency
      raise Payment::MismatchError.new(payment, order)
    end

    payment.capture!

    Order.transaction do
      invoice = order.invoices.create!({
        amount: order.sales_quote.amount,
        currency: order.sales_quote.currency,
      })

      order.sales_quote.items.find_each do |item|
        invoice.items.create!({
          product_id: item.product_id,
          quantity: item.quantity,
          unit_price: item.unit_price,
        })
      end

      order.update(status: :confirmed)
    end

    OrderMailer.preparation_details(order).deliver_async
    OrderMailer.confirmation(order).deliver_async
    OrderMailer.available_invoice(order).deliver_async

    flash[:success] = t("orders.create.success")
    redirect_to invoice_path(invoice)

  rescue Payment::MismatchError
    flash[:error] = t("orders.create.payment_amount_mismatch")
    redirect_to :back

  rescue Payment::CaptureError
    flash[:error] = t("orders.create.payment_capture_error")
    redirect_to :back

  rescue ActiveRecord::RecordInvalid
    flash[:error] = t("orders.create.payment_token_invalid")
    redirect_to :back
  end
end

There are more or less obvious issues in that implementation. Let’s see how much extracting that logic could help.

Moving out

The first step of the extracting process is simple: take the content of the action, put it in an object and call this object from the controller.

class OrdersController < ApplicationController
  def confirm
    ConfirmOrder.new(
      Order.find(params.fetch(:id)),
      params.fetch(:payment_token)
    ).perform

    flash[:success] = t("orders.create.success")
    redirect_to invoice_path(invoice)

  rescue Payment::MismatchError
    flash[:error] = t("orders.create.payment_amount_mismatch")
    redirect_to :back

  rescue Payment::CaptureError
    flash[:error] = t("orders.create.payment_capture_error")
    redirect_to :back

  rescue ActiveRecord::RecordInvalid
    flash[:error] = t("orders.create.payment_token_invalid")
    redirect_to :back
  end
end

class ConfirmOrder
  def initialize(order, payment_token)
    @order = order
    @payment_token = payment_token
  end

  def perform
    payment = @order.payments.create!(token: @payment_token)

    if payment.amount != @order.sales_quote.amount || payment.currency != @order.sales_quote.currency
      raise Payment::MismatchError.new(payment, @order)
    end

    payment.capture!

    Order.transaction do
      invoice = @order.invoices.create!({
        amount: @order.sales_quote.amount,
        currency: @order.sales_quote.currency,
      })

      @order.sales_quote.items.find_each do |item|
        invoice.items.create!({
          product_id: item.product_id,
          quantity: item.quantity,
          unit_price: item.unit_price,
        })
      end

      @order.update(status: :confirmed)
    end

    OrderMailer.preparation_details(@order).deliver_async
    OrderMailer.confirmation(@order).deliver_async
    OrderMailer.available_invoice(@order).deliver_async
  end
end

It seems to be more complicated than before. In some way it is since there is one extra level of indirection to the ConfirmOrder object now. Despite that, this basic extraction provides interesting benefits such as:

focusing the controller on fetching the parameters and handling the response,
reusing the ConfirmOrder in another context,
testing ConfirmOrder in isolation, which is an important enough context to mention on its own,
sharing behavior between commands, and
getting some privacy.

Supporting multiple contexts

Reusing this ConfirmOrder in a different context is easy. It is a small amount of work to get variations. Imagine that, because of the context, you want to confirm an order without sending the notifications…

class ConfirmOrder
  def initialize(order, payment_token, notify: true)
    @order = order
    @payment_token = payment_token
    @notify = notify
  end

  def perform
    # Same code as before...

    if @notify
      OrderMailer.preparation_details(@order).deliver_async
      OrderMailer.confirmation(@order).deliver_async
      OrderMailer.available_invoice(@order).deliver_async
    end
  end
end

We could use a state machine hook to deliver notifications and even to create the invoice. I tend to avoid callbacks as much as possible. Encapsulating the behavior in an object allows us to see what’s going on during that action in the same file. Also, it is easy to tweak the behavior if needed without impacting the rest of the system, as we just did.

Many business actions, such as completing an order, can be extracted using this pattern. Giving a clean API to all those commands gives some structure and consistency to the codebase.

In the example, the error mechanism uses exceptions, such as Payment::CaptureError, forcing the controller to know about each one of them. At Drivy we’ve built a validation layer allowing us to write:

class OrdersController < ApplicationController

  def confirm
    ConfirmOrder.new(
      Order.find(params.fetch(:id)),
      params.fetch(:payment_token)
    ).perform!

    flash[:success] = t("orders.create.success")
    redirect_to invoice_path(invoice)

  rescue Command::Error => error
    flash[:error] = t("orders.create.#{error.code}")
    redirect_to :back
  end
end


class ConfirmOrder < Command
  def initialize(order, payment_token, notify: true)
    @order = order
    @payment_token = payment_token
    @notify = notify
  end

  validate do
    payment = @order.payments.new(token: @payment_token)

    if !payment.valid?
      add_error(:payment_token_invalid)
    end

    if payment.amount != @order.sales_quote.amount || payment.currency != @order.sales_quote.currency
      add_error(:payment_amount_mismatch)
    end
  end

  perform do
    @order.payments.create!(token: @payment_token).capture!

    Order.transaction do
      invoice = @order.invoices.create!({
        amount: @order.sales_quote.amount,
        currency: @order.sales_quote.currency,
      })

      @order.sales_quote.items.find_each do |item|
        invoice.items.create!({
          product_id: item.product_id,
          quantity: item.quantity,
          unit_price: item.unit_price,
        })
      end

      @order.update(status: :confirmed)
    end

    if @notify
      OrderMailer.preparation_details(@order).deliver_async
      OrderMailer.confirmation(@order).deliver_async
      OrderMailer.available_invoice(@order).deliver_async
    end
  rescue Payment::CaptureError
    add_error(:payment_capture_error)
  end
end

You may be able to guess what’s in the Command class but there isn’t much. Also here I’m using the Ruby 2.5 rescue which will work inside blocks!
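The Command class itself isn’t shown in this article. To give an idea of how little is needed, here is a hypothetical minimal sketch in plain Ruby. It is an assumption about how such a base class could work (class-level validate and perform blocks, an errors list, and a perform! that raises), not Drivy’s actual implementation. The Greet subclass is made up purely to exercise it.

```ruby
# Hypothetical sketch of a Command base class, not the real one.
class Command
  # Raised by perform! and carries the first error code.
  class Error < StandardError
    attr_reader :code

    def initialize(code)
      @code = code
      super("command failed: #{code}")
    end
  end

  class << self
    attr_reader :validate_block, :perform_block

    # Class-level DSL: store the blocks for later execution.
    def validate(&block)
      @validate_block = block
    end

    def perform(&block)
      @perform_block = block
    end
  end

  def errors
    @errors ||= []
  end

  # Runs validations, then the perform block only if nothing failed.
  # Returns true when no error was recorded.
  def perform
    validate_block = self.class.validate_block
    perform_block = self.class.perform_block
    instance_exec(&validate_block) if validate_block
    instance_exec(&perform_block) if errors.empty? && perform_block
    errors.empty?
  end

  # Same as perform, but raises Command::Error on failure.
  def perform!
    perform || raise(Error.new(errors.first))
  end

  private

  def add_error(code)
    errors << code
  end
end

# Made-up command used only to demonstrate the sketch.
class Greet < Command
  attr_reader :greeting

  def initialize(name)
    @name = name
  end

  validate do
    add_error(:name_missing) if @name.to_s.empty?
  end

  perform do
    @greeting = "Hello, #{@name}!"
  end
end
```

Because the blocks run through instance_exec, they have access to the command’s instance variables and to the private add_error helper, which is what lets the validate and perform blocks above read like regular method bodies.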

Wait, what did you mean by privacy?

When all that code was in the controller, it was surrounded by other actions. It means that each private method that you would like to define would also be visible from within those other actions. Most of the time it doesn’t make sense. I’ve seen, and unfortunately written myself, controllers with too many private methods. I dodged the name clashes with prefixes, I grouped methods by the action they referred to, I added comments, and I even tried concerns. Nothing was really satisfying.

In that sense, a dedicated object makes things a lot simpler to organize. In the next article of the series, I’ll go deeper on how to use and abuse methods in order to offer the best documentation to the next developer. It’ll continue this example so be sure to check it out.

Conclusion

In this article nothing is especially new but this way of bundling business actions is getting more and more common. Hanami has Action and Trailblazer has Operation for instance. If you never thought of it, it is time to practice!

Best Practices for Large Features

Howard Wilson — Mon, 31 Jul 2017 00:00:00 +0000

As developers, we sometimes find ourselves faced with feature requests that will take weeks of work and touch many areas of the codebase. This comes with increased risk in terms of how we spend our time and whether things break when we come to release.

Examples might be moving from a single-tenant to a multi-tenant application (scoping everything by accounts), or supporting multiple currencies or time zones. This post brings together some tips that we find useful at Drivy for approaching these types of problems.

In general, our goals are to build the right thing, take the right implementation approach, and not to break anything. We’d like to try to do those things pretty quickly, too!

Building the Right Thing

When working on large features, the cost of building the wrong thing is higher than usual, since there is more time between receiving a feature request and presenting the completed feature back to the product team for validation. This makes up-front communication a very important part of the process, especially since more agile startups often don’t use formal specification documents.

Assumptions

One way to avoid misunderstandings is to make a list of assumptions. Check these with the product team and include them in pull requests so any reviewers are aware of them (and can challenge them). Assumptions might take the following form:

It’s not possible for a user to be in state X and Y at the same time
This feature will only be implemented for countries X and Y
There’s no need to update old data to be consistent with the new rules

Smaller Features

It’s always worth questioning whether a large feature really does need to be released all in one go. Are there smaller pieces which all add value incrementally? For example, Drivy now supports precise times for rental start and end, instead of just AM/PM time slots. But we didn’t need to make this change all in one go. We started with bookings, then moved on to booking changes, and eventually the state changes of the rentals themselves.

Taking the Right Implementation Approach

There are normally several ways to solve a problem. Taking the right implementation approach is the “tech side” of building the right thing. In other words, “will we solve this problem in a way that the tech team generally agrees is appropriate?”

Naming

Often, naming is a useful place to start. Getting a few developers together to talk about what any new entities or concepts will be called can help to identify the right abstractions in our code. Even if sometimes they feel like isolated implementation details, the abstractions developers select can strongly influence terminology and understanding across other teams in the company. For example, modeling an Order rather than a Request can have a profound impact on the perceived urgency of that user action.

Are data flows or processes changing? Even if not explicitly designing a state machine, it’s exactly these kinds of problems that typically take a lot longer to discuss and get right, than to implement. There’s nothing wrong with taking time up front to properly explore the different possibilities. Draw on the whiteboard and get the opinions of the rest of your team!

Spike / Prototype

The purpose of a spike is to gain the knowledge necessary to reduce the risk of a technical approach. It also often gives developers enough confidence to more accurately estimate how long the feature will take to develop.

Some general guidelines:

Aim to cover enough of the feature to model the majority use case, but also explore any worrying edge cases
Take plenty of notes when questions/concerns arise, possibly split into product/tech
Tests are not mandatory, but simple acceptance tests may be useful
Keep a release plan in mind and, if possible, split the spike into deployable commits (more on this later)

First Code Review

Reviewing code is hard. A reviewer is expected to make sure the code is correct and of a high quality before it gets merged into the release branch. In order to do this effectively, it’s usually best to keep the size of the pull request to a minimum. But how then will a reviewer be able to get an end-to-end perspective of your implementation?

One answer is to split code reviews which validate a general approach from code reviews which accept code into production. Here, we’re doing the first type of review. It’s a review best suited to a senior member of the team; ideally someone who has a broad knowledge of the codebase.

Here’s what we like to do:

Split the spike into commits which roughly represent separately releasable work areas. This helps to keep focused on the goal of minimizing risk when releasing.
Describe each one of these commits separately. A reviewer is then able to focus on smaller chunks of code, even if the overall pull request is quite large.
Create a simple release plan, paying special attention to any data migrations - these can be more time consuming and error prone than anticipated.
Include all the technical questions which arose while building the spike
Include screenshots

Again, the goal at this point is to validate the approach, but without losing sight of how we’ll structure the feature for release. This code isn’t going to be merged into the release branch in its current form.

Iterate on Feedback

This takes a little more time if the pull request has already been split into separate commits, but git helps us re-write our branch. There’s plenty of information on how to go about this in the related post: “Editing your git history with rebase for cleaner pull requests”.

Of course, there are lots of visual tools too (such as Gitup), which get the same job done without using the command line.

Once the branch is updated, we force push back to the same remote branch to preserve a clean commit history.

Minimizing Risk When Releasing

Test Preconditions

Before starting to release any code, it can be worth verifying that things expected to be true in production are actually true. Let’s say our new feature is going to introduce behavior which depends on the country of active cars. We can check in the database to ensure that the expectation “an active car always has a country” is true, but that doesn’t give 100% confidence. It may be true a few seconds after activation, but not immediately.

What we can do in cases like this is introduce some logging where our feature will go:

if car.active? && car.country.blank?
  Rails.logger.warn "Active car #{car.id} doesn't have a country, which is required by feature #123"
end

Once we have more confidence in this precondition, the logging can be replaced with a guard clause which raises an exception. Not only does this mean that we can be confident in our assumptions in production, but other developers will also understand immediately which preconditions are satisfied and benefit from the same confidence when coding.
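As an illustration, the guard clause that eventually replaces the logging could look like the sketch below. PreconditionError, ensure_active_car_has_country! and the Car struct are hypothetical names introduced for this example, and the blank check is written in plain Ruby so the snippet runs without Rails.

```ruby
# Hypothetical error class for violated preconditions.
class PreconditionError < StandardError; end

# Hypothetical stand-in for the car model.
Car = Struct.new(:id, :active, :country) do
  def active?
    active
  end
end

# Guard clause: raises as soon as the precondition is violated,
# instead of only logging a warning.
def ensure_active_car_has_country!(car)
  return unless car.active? && car.country.to_s.strip.empty?

  raise PreconditionError,
        "Active car #{car.id} doesn't have a country, which is required by feature #123"
end
```

Calling the guard at the top of the new code path documents the assumption for the next developer and fails loudly if it ever stops being true.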

Final Code Reviews

Now it’s time to get the feature into production. Typically, having split the original pull request, each commit can be cherry-picked (git cherry-pick) to a new branch in turn and then improved to a production-ready state:

Link to the original PR/issue for context
Add complete tests if they’re not already present
Ask the question: can this be rolled back if something goes wrong?
Split out migrations if they need to be run before the code is deployed
Soft release the feature behind a feature flipper if it shouldn’t actually be visible until later

This time, GitHub’s suggested reviewers facility is a good way to find someone on your team to review the code. The suggestions are based on git blame data, so they’ll be people who are familiar with the code being changed. Their goal is to confirm that the changes are safe to release. This should be a much quicker process, since the quantity of code is smaller, and the overall approach has already been agreed.

Checker Jobs / Monitoring

We often write small jobs to test the long term “postconditions” of a feature. Or in other words, ensuring ongoing data consistency. Usually this concerns aggregate data that is difficult to verify synchronously. For example, checker jobs might verify that there are no overlapping rentals for the same car, or that there are no gaps in the numbering of our tax documents.

These jobs usually send an email to the appropriate team if inconsistencies are detected. They’re cheap to create, and can help to catch unexpected outcomes before they become too problematic.
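To make the idea concrete, here is a minimal sketch of the core of an overlapping-rentals check, written in plain Ruby. The Rental struct and the overlapping_rentals method are hypothetical; a real job would load records from the database and email the report instead of returning an array.

```ruby
# Hypothetical stand-in for the rental model.
Rental = Struct.new(:id, :car_id, :starts_at, :ends_at)

# Returns pairs of rental ids [earlier_id, later_id] for the same car
# where the later rental starts before an earlier one has ended.
# Each offending rental is reported once, against the rental with the
# latest end time seen so far, so not every overlapping pair is listed.
def overlapping_rentals(rentals)
  rentals.group_by(&:car_id).flat_map do |_car_id, car_rentals|
    latest = nil # rental with the latest end time seen so far
    car_rentals.sort_by(&:starts_at).each_with_object([]) do |rental, overlaps|
      overlaps << [latest.id, rental.id] if latest && rental.starts_at < latest.ends_at
      latest = rental if latest.nil? || rental.ends_at > latest.ends_at
    end
  end
end
```

An empty result means the postcondition holds. Back-to-back rentals, where one starts exactly when the previous one ends, are not treated as overlapping in this sketch.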

Conclusion

Remember, this is just a selection of ideas that we find work well for us. Your mileage may vary!

How we are using member voice to improve UX

Julia Maunier & Marion Crosnier — Mon, 17 Jul 2017 00:00:00 +0000

Whether it’s on social media, on the app stores, through emails or phone calls, we receive hundreds of messages from our users every day, and answer each of them.

If answering questions is good, fixing the original problem is even better. We truly believe in this virtuous circle as a customer centric company.

Since customer service data is the first accessible and actionable “member voice” data in every company, we started by focusing on it. In collaboration between product, customer relationship & data, we wanted to turn our volumes of tickets & phone calls into clear contact reasons, to identify pain points on which we should focus to improve user experience.

Qualify customer service data

Here are the steps we used at Drivy to improve the way we deal with member voice. This is not a perfect process and might not work for you as it is, but we’ve been successful with this approach.

Step 1: Define a common tag referential

The very first step is to make a list of all the possible issues you can think of (replicating the customer journey helps), and group them under topics. If a topic seems too big, split it into smaller topics. Accept that you won’t be able to be exhaustive, but don’t end up with a giant “Other” section. Turn these topics into tags to apply to tickets or phone calls.

Step 2: Apply it manually

After checking with the Customer Service team that the tags made sense, we applied them on each ticket and phone call manually until we had enough data to run our analysis. Onboarding of front-line teams is key for the quality of the data as their input is the basis of our analysis.

We could then identify the contact reasons in each country ranked by volume, and understand clearly what pain points generated most contacts from our users.

Step 3: Automate the tagging as much as possible

Manual tagging can’t scale, and doesn’t enable us to know in advance why a user needs our help.

To address that, we revamped our contact form to reflect the contact reasons we observed during our manual test. This way, according to the contact reason the user selected, we could either push help articles, or let them contact us and prioritize their request.

Step 4: Import data from our tools

In order to perform our analysis, we needed to have access to the necessary data from our ticketing, phone calls and satisfaction tools.

We imported the data in Redshift thanks to an Extract Load and Transform process using Apache Airflow and some Python scripts. The tables are updated on a regular basis.

Step 5: Make it visual

Once the data was available in Redshift, we wanted to see at a glance which issues our users faced in each country, so that anyone in the company could know it without needing to be an SQL expert.

We created dashboards on Redash displaying the data we needed to understand our users better.

Step 6: Prioritization of pain points

However, it’s not all about volumes. A topic generating lots of contacts is not necessarily the top source of dissatisfaction. We also implemented satisfaction surveys sent automatically after each ticket resolution. This way, we can easily identify which contact reasons generate dissatisfaction, and work on improving our treatment processes and policies.

It’s the combination of volume and dissatisfaction that truly tells us where our focus should be.
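One way to make that combination concrete is to multiply the ticket volume by the share of dissatisfied survey answers and rank contact reasons by the result. The sketch below assumes a hypothetical ContactReason struct and uses made-up illustrative numbers; it is neither Drivy’s actual data nor its actual scoring formula.

```ruby
# Hypothetical structure: one row per contact reason (tag).
ContactReason = Struct.new(:tag, :tickets, :dissatisfied_share) do
  # Estimated number of dissatisfied contacts for this reason.
  def score
    tickets * dissatisfied_share
  end
end

# Highest score first: large volume AND high dissatisfaction.
def prioritize(reasons)
  reasons.sort_by { |reason| -reason.score }
end

# Made-up example values, for illustration only.
reasons = [
  ContactReason.new("account", 600, 0.05),
  ContactReason.new("payment", 400, 0.10),
  ContactReason.new("damage", 150, 0.40)
]

prioritize(reasons).each do |reason|
  puts "#{reason.tag}: #{reason.score.round(1)}"
end
```

In this made-up example, the topic with the fewest tickets ends up first because a large share of those contacts are dissatisfied, which is exactly the case a volume-only ranking would miss.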

Step 7: Need for emotional data

When focusing on a major pain point, we needed to understand precisely which issues our users faced. We frequently spend time reading tickets and listening to phone calls so that we can better capture our users’ perceptions and feelings, and pin down the pain points we need to address through product changes.

To infinity and beyond

We started with the data from our customer service, but there are lots of other sources we’d love to include (app reviews, social media posts…).

Including these other sources will help us catch more than just pain points. Feedback on the product helps us not only reduce dissatisfaction, but also increase satisfaction and user delight.

Running feature specs with Capybara and Chrome headless

Tim Petricola — Wed, 05 Jul 2017 00:00:00 +0000

At Drivy, we’ve been using Capybara and PhantomJS to run our feature specs for years. Even with its issues, PhantomJS is a great way to interact with a browser without starting a graphical interface. Recently, Chrome added support for a headless flag so it could be started without any GUI. Following this announcement, the creator of PhantomJS even announced that he would be stepping down as a maintainer.

Setting feature specs to run with a headless version of Chrome means that our features specs can be executed in the same environment most of our users are browsing with. It is also supposed to improve memory usage and stability.

Installing prerequisites

Assuming you already have Chrome (59 or more recent for macOS/Linux, 60 or more recent for Windows) on your machine, you’ll also need to install ChromeDriver. On macOS, you can install it with homebrew:

brew install chromedriver

If not already present in your application, add selenium-webdriver to your Gemfile:

group :test do
  gem 'selenium-webdriver'
end

Configuring Capybara

Capybara provides a simple API to register a custom driver. You can do so in your test/spec helper file.

Capybara.register_driver(:headless_chrome) do |app|
  capabilities = Selenium::WebDriver::Remote::Capabilities.chrome(
    chromeOptions: { args: %w[headless disable-gpu] }
  )

  Capybara::Selenium::Driver.new(
    app,
    browser: :chrome,
    desired_capabilities: capabilities
  )
end

As stated in the documentation, the disable-gpu flag is needed to run Chrome as headless.

Using Chrome headless

On an app running on Rails 5.1 with system test cases, use the provided DSL to use the driver:

class ApplicationSystemTestCase < ActionDispatch::SystemTestCase
  driven_by :headless_chrome
end

Otherwise, use the more generic way of setting a javascript driver for Capybara:

Capybara.javascript_driver = :headless_chrome

Troubleshooting

Empty screenshots

With Capybara, you can take a screenshot during your tests (or automatically on failure). This feature results in an empty gray image on headless Chrome 59, but the proper behavior is restored in Chrome 60 (in beta as of today).

`trigger` method

To prevent some issues in PhantomJS when elements would overlap, we had a lot of calls like this:

find('.clickable_element').trigger('click')

In Chrome, this raises the following error, as the trigger method is not supported:

Capybara::NotSupportedByDriverError: Capybara::Driver::Node#trigger

This can now safely be replaced by the straightforward click method:

find('.clickable_element').click

Conclusion

You can see an example app on drivy/rails-headless-capybara.

Even though we introduced Chrome headless very recently, we’re quite optimistic that it will lead to even fewer bugs in our application.

The Tech Recruitment Process At Drivy

Marc G Gauthier — Mon, 03 Jul 2017 00:00:00 +0000

I recently saw another article highlighting the many ways in which recruitment in software development is broken. Whiteboard coding, random trivia, poorly trained interviewers… it’s all very painful and it seems to be the situation in a lot of places.

However there are companies trying to turn this around. For instance I loved the “Companies that don’t have a broken hiring process” list, and I’m constantly working to make sure Drivy deserves its place in it.

Since this is still a major pain point, I decided to share how we handle recruitment for engineering positions at Drivy. I don’t think that it’s perfect or much out of the ordinary. I’m also convinced that it’s going to evolve as it has done in the past, but it’s been working well for us and we got good feedback so far!

The Interviewing Process

Our Vision

Overall there are two big things we want to check: making sure that the person has the technical capabilities to do the job, and then making sure that they will bring a lot to the company’s mission and culture.

Here is basically how it goes:

Phone screening
Take home assignment
“Resume” interview
Technical interview
Product interview
Interview with another team
Finalizing the hire

It might seem that there are a lot of steps… and maybe it’s true. However we feel that it’s good for both parties if they get a good look at what working together would be like.

In terms of timing, we try to be as fast as possible, so that even if you get to see a lot of people, it can be condensed in a very tight schedule, grouping some interviews together if needed. We all know that processes that last forever are a major pain for applicants. Also most interviews don’t last more than one hour, so overall it still seems reasonable.

The Process In More Detail

Let me explain the process for fullstack or backend developers. Note that your mileage may vary slightly: we don’t want to be completely rigid, as we’re still young and growing. However we follow this exact process in most cases.

Phone Screening

After applying, the applicant will quickly get a first Skype call with someone in charge of recruitment. This first contact is usually a good place for the applicant to ask for more information about the position and the company, and to explain why they’re interested in Drivy.

Take Home Assignment

If the applicant is still interested, they will be given an assignment to complete at home. We spent a lot of time trying to provide something that can be completed in an acceptable amount of time while still reflecting what the job will be. You can check it out on GitHub. It is based on our internal accounting system, which is a massive part of the app: we are a marketplace, so we have to deal with a lot of money moving around.

Once the applicant has done the assignment, it is reviewed by people in the engineering team. We mostly check if the applicant can write clear and simple object oriented code and is able to justify main decisions if necessary.

If it’s not considered good enough (the standard varies depending on the position), the process stops here and we try to give some insights on what to improve.

“Resume” Interview

The next step is an interview with one of the senior members of the team, most likely me. There we discuss what they’ve done in the past, the position, their motivation to work for Drivy and so on.

If we didn’t talk about salary range during the screening, it will be discussed here.

Technical Interview

If this goes well, they move to an onsite technical interview with a couple of developers from the team. Depending on the position, the exact process and the people involved can vary, but the main objective is to talk about code.

The applicant is asked to bring code written before. It can be open source code, a side project, client work or a small subset of the codebase from a previous position (if this is something the candidate is allowed to do). We’ll sit around the applicant’s computer and challenge the choices made and how it turned out. We believe that it’s a good way to discover what a candidate is capable of. Of course we know that all code in a codebase can’t be perfect, but there’s a lot to be learned in the tradeoffs and teachings of old code.

We’re proud of our own code and often show pieces of it to candidates at this point in the process to see how they react to it and grasp it. This also helps them to get some confidence that they won’t be working on spaghetti code.

Product Interview

After this there is an interview with someone working a lot with product management. This is important because we consider ourselves a product company, so we are looking for people interested in what we are building. It is also a good opportunity for applicants to ask more questions about the project.

Interview With Another Team

Finally the applicant has a short interview with someone from another team: it could be the person responsible for international expansion, the head of communication or someone at customer support. This is a way to make sure every department is aligned on who we hire, as well as to show the applicant what the rest of their future colleagues look like. It’s also an opportunity for the applicant to get insights about the company’s culture, not just the engineering team.

Finalizing The Hire

Once everybody agrees that the person would be a good fit, we ask for past references to contact. In every case so far it’s been a formality, but we prefer to be safe on this one.

If this goes well we discuss all the remaining topics, like finalizing the exact offer.

Our Experience With This Process

Personally I think that this process works quite well. We avoid the pitfalls of whiteboard interviews, but still get a great sense of the technical capabilities of applicants.

It is a bit time consuming, but hiring is too important to be cutting corners, and I feel like the amount of time we ask of applicants remains reasonable. We also had a lot of people, whether we hired them or not, telling us that the process was a good experience because the exercises and conversations were interesting. The current team is also a great example that this process helps us find the right people.

Of course, like everything we do, this process will evolve and this post will probably be outdated soon. However the guiding principles of trying to have an interview process close to the reality of the job will remain the same.

I mostly talked about the process after a candidate applied, so here is a little information about how we hope to get candidates’ attention.

All our job offers try to give a sense of what the position is going to be about. We don’t shy away from sharing info about our stack, our projects or our internal processes. We also want to be inclusive by not requiring a specific degree, opening certain positions to remote work and making ourselves visible to as many communities as we can.

Michael presenting MySQL 5.7 features during a ParisRB Meetup at Drivy

We are present in meetups so that anyone can meet a Drivy employee and ask questions about our open positions and have an informal chat. For instance lately we hosted ParisRB (large Ruby meetup), Women On Rails and Paris Ruby Workshop. Our developers also tend to go to various conferences and get to chat with a lot of people there as well.

Designing state machines

Adrien Di Pasquale — Mon, 26 Jun 2017 00:00:00 +0000

State machines are a very powerful tool but are often underused in web development. The design process forces you to think hard about how you want to model your data, about the lifecycles of your different objects, about the way you want to expose your data and communicate with your whole team, and about upcoming evolutions.

Going through this process takes a lot of effort but is worthwhile: it brings a lot of structure to your code and your team. Also, the actual implementation of a state machine is usually very simple.

Intro : State machines are simple

A simplified state machine for a Movie object can be represented like this :

And this diagram was generated with the following Ruby code:

# first, gem install 'state_machine'

class Movie
  state_machine :state, :initial => :in_production do
    event :finish_shooting do
      transition :in_production => :in_theaters
    end
  end
end

movie = Movie.new
movie.state # in_production
movie.finish_shooting!  # will raise if something goes wrong
movie.state  # in_theaters

# to generate the diagram : $ rake state_machine:draw CLASS=Movie FORMAT=svg

See the state_machine gem for more information.
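To show how little is needed, here is a minimal hand-rolled sketch of the same Movie machine in plain Ruby, without any gem. The TRANSITIONS table, the fire! method and the InvalidTransition error are names made up for this illustration; they are not part of the state_machine gem's API.

```ruby
# A hand-rolled sketch of the Movie state machine, without any gem.
class Movie
  class InvalidTransition < StandardError; end

  # event name => { state before => state after }
  TRANSITIONS = {
    finish_shooting: { in_production: :in_theaters }
  }.freeze

  attr_reader :state

  def initialize
    @state = :in_production
  end

  # Applies the event, or raises if it is not allowed from the current state.
  def fire!(event)
    target = TRANSITIONS.fetch(event, {})[state]
    raise InvalidTransition, "cannot #{event} from #{state}" if target.nil?

    @state = target
  end
end

movie = Movie.new
movie.state                    # :in_production
movie.fire!(:finish_shooting)
movie.state                    # :in_theaters
```

A gem still earns its place once you need callbacks, persistence or diagram generation, but the core idea fits in a lookup table.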

Design objectives

In the context of a fast-evolving product and a growing team, a state machine should aim to be:

simple : so it’s easy to understand and feels natural for everyone, not only developers.
useful : it should help developers build and maintain the app, not be an obstacle.
adaptable : it should be designed to evolve.

There is no one-size-fits-all solution and a lot of questions will have many valid solutions. An infinity of state machines could represent your data, and you could make your app work with them. You need to pick the one that makes the most sense for your needs and your vision.

Here are some tips to help you make these decisions:

Tip 1 : Talk with everyone

Designing a state machine should be a collaborative process. It is important that developers share their opinions and agree on a structure, so they will be willing to use it afterwards.

It is also extremely important to go talk to people with other roles in your team, to understand how they talk about the data and how they interact with it.

Here is a quick example to illustrate the diversity of viewpoints:

a movie seen by the Netflix Supply Team

a movie seen by the Netflix Marketing Team

Tip 2 : Accept that some choices are partial

Unfortunately, when designing state machines it is often hard to reach a unanimous and universal truth. As pointed out above, different teams’ opinions are all valid in their context. Also, as the product evolves, the truth evolves.

It is your responsibility to decide what to preserve from the different opinions and what you will go against. It is important that you have all the elements to decide, and make a conscious and reasoned choice, so you can justify it to other people.

Don’t rush, reaching consensus is a time-consuming process.

Tip 3 : Do not over-anticipate the future

In the context of a startup like Drivy, the product roadmap and the strategic directions are likely to change often. Some decisions will turn out to have been short-sighted, sub-objects may appear, you may have to add extra transitions for edge cases, etc.

It is useful to think about the degrees of freedom your design leaves open. You can orient and pick these degrees in the directions you think are more likely to happen.

When you do not feel extremely confident about the forecast evolutions, it is good advice to try and make the least binding choices. It often boils down to creating the fewest states possible, because it is easier to split them later than to merge them (from our experience at least, your mileage may vary).

Tip 4 : Store everything

Storing only the current state on objects is dangerous. Investigating objects in corrupt states is complicated, as you cannot understand how they ended up there. Also, when you have to make changes to your state machine, you have much less flexibility on how to migrate objects because you cannot distinguish them.

We strongly recommend you archive all the different states, transitions and events that objects go through. A versioning library like the paper_trail gem can help with that.
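As a rough sketch of the idea, independent of any versioning library, an object can append every transition to a history instead of only overwriting its state. The Transition struct, the history attribute and the transition! helper below are hypothetical names chosen for this example; in a real app the history would live in the database rather than in memory.

```ruby
# Sketch: keep every transition, not only the current state.
class Movie
  Transition = Struct.new(:event, :from, :to, :at)

  attr_reader :state, :history

  def initialize
    @state = :in_production
    @history = []
  end

  def finish_shooting!
    transition!(:finish_shooting, from: :in_production, to: :in_theaters)
  end

  private

  # Records the transition before switching state, or raises if the
  # object is not in the expected starting state.
  def transition!(event, from:, to:)
    raise ArgumentError, "cannot #{event} from #{state}" unless state == from

    @history << Transition.new(event, from, to, Time.now)
    @state = to
  end
end

movie = Movie.new
movie.finish_shooting!
movie.history.map { |t| [t.event, t.from, t.to] }
# => [[:finish_shooting, :in_production, :in_theaters]]
```

With the full trail available, an object found in an unexpected state can be traced back event by event, and a migration can treat objects differently depending on the path they took.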

This was initially presented as a talk at Paris.rb on 05/07/2016

Android Makers 2017 Highlights

Romain Guefveneu & Jean-Élie Le Corre — Tue, 23 May 2017 00:00:00 +0000

Android Makers is the biggest Android event in France, and it took place last month in Paris. It’s always a great time to connect with and learn from the Android community. For this first edition, a lot of great speakers from all around the world were in Paris. This post is not intended to dive into the details of each talk, but rather to give an overview, with enough insight to choose the right talks to watch.

Modules - Octo

Our friends at Octo talked about how they improved compilation time, test time and how to better split your app into reusable components.

They worked on the Meetic app. If you work on a big app or on multiple apps for the same company, you have certainly faced the same kind of issues:

Compilation time keeps growing
Tests become slower and slower to launch
Multiple apps use the same component and repeat code

They finally split the app into modules:

It dramatically improved the build time
They could share the message component between 2 apps

One year of Clean Architecture: the Good, the Bad and the Bob - Octo

Watch on YouTube

Octo again, on how they applied the principle of Uncle Bob’s clean architecture on Android.

By splitting the responsibility of each layer of your app, you improve the testability and flexibility.

Takeaways:

Easier to onboard new developers in big teams on big projects because of the conventions to follow
Be pragmatic, adapt the architecture to your need and team size, don’t over-engineer if it’s not necessary

Make your app work offline - Virtuo

Watch on YouTube

Virtuo is a new generation car rental agency. Your smartphone replaces the old rental agency.

The main feature of the app is to act as the virtual key that opens the car. As the cars can be parked in an underground car park, the app must work offline.

Takeaways:

You can use the HTTP Cache-Control directives max-age and max-stale to fine-tune your client cache
UX matters: instead of downloading the virtual key in the background without telling the user, they require the user to tap a big “Download the key” button. That way, you are sure the key is on the phone when it is offline in the car park.
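As a rough illustration of how max-age and max-stale interact, here is a minimal sketch written in Ruby for consistency with the other examples on this blog; it is not the language or the code of the Virtuo app. The method name cached_response_usable? and its parameters are hypothetical, and the logic deliberately ignores other directives such as no-cache or must-revalidate.

```ruby
# Sketch of how a client-side cache can interpret the two directives.
# age:       seconds since the cached response was received
# max_age:   seconds during which the response counts as fresh
# max_stale: extra seconds of staleness the client is willing to accept
def cached_response_usable?(age:, max_age:, max_stale: 0)
  age <= max_age + max_stale
end

cached_response_usable?(age: 30, max_age: 60)                        # => true  (still fresh)
cached_response_usable?(age: 3_600, max_age: 60)                     # => false (stale)
cached_response_usable?(age: 3_600, max_age: 60, max_stale: 86_400)  # => true  (stale but tolerated)
```

The last call is the offline case: the client accepts a stale copy rather than failing when there is no network.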

The Fabulous Journey to Material Design Award - Fabulous

Watch on YouTube

While I would prefer more conferences about UX and UI at Android Makers, this one by the co-founder of Fabulous was really great!

The particularity of Fabulous: be Android first. Why? There are a lot more users on Android, and they are willing to pay for great experiences.

Achieving such a great experience means that the whole company must be sensitive to design and user experience.

Takeaways:

Design first in the specs flow
Context is super important to re-engage users
Great illustrations give your app personality
You can run A/B tests in the app and on the Google Play page to improve conversion
Always take user feedback into account when you iterate on your product

The ART of organizing resources - Philips Hue

Watch on YouTube

All computer scientists know that naming things is hard. As your app grows, assets multiply and it can quickly become a mess if you don’t follow strict naming rules. Jeroen Mols, from Philips Hue, suggests this simple pattern:

For example:

activity_main for activities layouts
linearlayout_main_fragmentcontainer for views
all_infoicon_small for drawables

It makes everything clear!

Taking care of UI Test - Novoda

Watch on YouTube

Keep your tests clean! When using Espresso, UI tests look like this:

@Test
public void autoCompleteTextView_clickAndCheck() {
    // Type text into the text view
    onView(withId(R.id.auto_complete_text_view))
        .perform(typeTextIntoFocusedView("South "), closeSoftKeyboard());

    // Tap on a suggestion.
    onView(withText("South China Sea"))
        .inRoot(withDecorView(not(is(mActivity.getWindow().getDecorView()))))
        .perform(click());

    // By clicking on the auto complete term, the text should be filled in.
    onView(withId(R.id.auto_complete_text_view))
        .check(matches(withText("South China Sea")));
}

With PageObject Pattern, it’ll look like this:

@Test
public void autoCompleteTextView_clickAndCheck() {
    SearchScreen searchScreen = new SearchScreen();
    searchScreen.givenKeyword("South ")
                .tapOn("South China Sea")
                .assertTextMatches("South China Sea");
}

It’s easier to read, reuse and maintain!

Deep Android Integration - Uber

Watch on YouTube

Ty Smith from Uber reminds us that a good user experience also consists of deep system integration. It’s not only about having a UI, but also about using the right system APIs, such as:

Deep linking: redirect URLs to your app
SyncAdapter & AccountManager: save users’ settings and information in the cloud
ContentProvider: if you need to share data between apps

Pro Tip: You can listen for INSTALL_REFERRER broadcast to show the relevant screen after an install!

Conclusion

A lot of great talks for this first Android Makers conference in Paris. It’s always a pleasure to learn from other developers and other people in the industry. It’s also a great moment to meet passionate people and to connect with others at the after-hours events. See you next year!

Story of a junior developer at Drivy

Jean Anquetil — Thu, 18 May 2017 00:00:00 +0000

Hey, I’m Jean, a Junior Full Stack Developer at Drivy. I joined the company after graduating from a two-month full stack program at Le Wagon. Apart from being passionate about tech at large, I didn’t know anything about web development last year, but now I am coding full time and I love it! Here is my feedback after 6 months at Drivy.

Don’t panic if you don’t understand anything

The stack was definitely new to me: the only web dev experience I had was the bootcamp. I had almost never heard of things such as Sidekiq, RSpec, FactoryGirl, Webpack, Haml or the basic design patterns… But it doesn’t matter: I was here to gain experience, and giving in to panic wouldn’t have helped me learn step by step.

At first I was asked to work on basic static pages, which seems like nothing but actually quickly made me feel confident in my ability to contribute to the product. Besides that, I worked on asynchronous emails and then I completed my first product feature. Slowly but surely I’m discovering the codebase, the different services plugged into it, the good practices, and I’m finding my niche in the team!

After all, when you start working on a five-year-old project, everyone has to get their bearings, right?

Don’t be afraid to ask

I work with 13 brilliant developers, and while I have to keep in mind that I could make them lose time, I can’t expect them to keep an eye on me all the time either. In other words, I guess they expect me to ask for help when needed. It seemed pretty tricky to me at first: I often wondered whether my questions were relevant.

I think the most important thing is to be honest with yourself and your colleagues. If you feel stuck, take a look at Stack Overflow, look for something similar in the codebase and read the documentation. If you still have no answer, that’s not a big deal: just ask your colleagues. The key is to formulate your issue well.

Formulating my issue gives me a global view on it and it often highlights new tracks to look at.

Benefits of working on a high traffic app

Testing

Working on a high-traffic app such as Drivy also gives me the opportunity to face scalability matters. As we have millions of users, the smallest code update can introduce bugs. So one of the first skills I had to improve when arriving in the company was testing.

I don’t work alone on the codebase, so beyond the fact that my code has to be easily maintainable by others, adding tests is also a way to prevent another developer from breaking what I just did. And honestly, writing tests is, in my opinion, one of the best ways to fight the stress generated by deploying new features.

I also learnt to do benchmarks: what happens if I run this query on millions of records? Should I consider denormalizing this data? Benchmarks let me justify using one approach or another.

require "benchmark"

n = 1000
# Run each query n times and print the timings side by side.
b = Benchmark.bm do |x|
  x.report("Arel scope:") do
    n.times { Rental.find(3577388).reviews.readable.to_a }
  end
  x.report("Denormalized scope:") do
    n.times { Rental.find(3577388).reviews.is_readable.to_a }
  end
end;

Quickly comparing an Arel scope with a denormalized one.

Release Flow

Another concept I quickly learnt not to ignore in a feature development: what is my release flow?

This is super important to consider when you start working on a new feature. Should I ship my migration first, then my code? Am I doing a rollbackable migration or not? Could it lock the database? What if users are browsing the page I am updating?

Thus, I’m always wondering if the feature I’m working on is splittable into smaller ones: that will be easier to review, easier to test and less painful to release.

Releasing first the migration in a separate commit then the code makes it easier to do a zero-downtime deployment.

Communication

I realized that working as a developer doesn’t mean keeping to yourself. For instance, I like to make an effort to prepare my pull requests: clearly explain the context, which design patterns I used, what my release flow is, and maybe include some screenshots.

That sounds naive but this is to make the reviewer comfortable and let them focus on what I did and why I did it this way. That also helps me to see my work’s big picture and feel justified in asking for advice.

Pull Request with explanations.

Another important thing about communication is that you also work with non-developers.

As a junior dev I can’t afford to lose time working on misunderstood requirements. So there is a challenge in converting the requirements into well-defined specs through short, prepared and focused meetings, especially with the help of our product managers.

Conclusion

Finally, I strongly believe that working on a real project with a real team is the best way to keep learning and improving your skills: you are surrounded by smart people who let you focus on things that matter.

Day after day I feel more confident in bringing my contributions to the project I’m working on without neglecting the fact that I can have a serious impact on it.

Code Simplicity - Value Objects

Nicolas Zermati — Tue, 09 May 2017 00:00:00 +0000

Understanding the application’s state at a given point in time is valuable. You and your team must make efforts to keep the cognitive load required to reason about its state as low as possible.

Application’s state is often based on classes such as Numeric, String, Array, etc. In this article we’ll see how to abstract business-specific objects on top of those primitive types.

A simple specification

I need to model a car. A car is simply defined by its serial number and its mileage. In addition to this, a car will have an interface to record the distance that has been driven. When a car is created, the serial number is generated and the mileage is set to zero.

class Car
  def initialize
    @serial_number = SerialNumber.generate(self)
    @mileage = 0
  end

  def drive(distance)
    @mileage += distance
  end
end

Here I used another module to generate the serial number. It isn’t the purpose of the article, so let’s ignore it for now. To express the distance, I used a Numeric instance. Indeed, nothing was said explicitly about it in the specification.

Fighting implicitness

Here I implicitly expect the distance passed to the drive method to be positive. It obviously is, because a negative distance makes no sense!

However, something that looks obvious to you now might not look the same to someone else, or to you in the future. Code with a lot of implicit constraints is hard to trust, because for each change you’ll have to carry all those implicit constraints in your head and make sure they are still enforced. I don’t know about you, but this looks scary as hell to me.

There are different ways of fighting this implicitness. We could try to add safeties to our code to mitigate unexpected inputs. It would look like this:

def drive(distance)
  @mileage += distance.abs
end

Tada! No more problem having a negative number as argument!

This is, in my opinion, worse than the first version. Now there is some misplaced code in the Car class. It raises questions that make no sense.

Why would a distance be negative?
Is that #abs call really needed?

Those are hard questions, especially when it isn’t your code, when it is in a critical part of the application, and when it has been there forever. Those questions are hard because some would find it obvious that a distance must always be positive.

Other programming environments help you express this kind of constraint using advanced type systems. Ruby, on the other hand, is more permissive, and the responsibility for making things explicit rests on the design you come up with.

The right battlefield

The issue is that we have no place to express that implicit constraint about the distance being positive. The car shouldn’t be responsible for managing this. Let’s fight the distance battle on a more appropriate battlefield: in a Distance class.

class Distance
  def initialize(value)
    if value < 0
      raise ArgumentError, "A distance must be positive"
    end

    @value = value
  end

  def +(other)
    unless other.kind_of?(Distance)
      raise ArgumentError, "Only another distance can be added"
    end

    Distance.new(value + other.value)
  end

  attr_reader :value
  protected :value
end

class Car
  def initialize
    @serial_number = SerialNumber.generate(self)
    @mileage = Distance.new(0)
  end

  def drive(distance)
    @mileage += distance
  end
end

This Distance class isn’t perfect but the Car class is more robust than it was and even more expressive. Distance is a value object that we created in response to a primitive obsession code smell.

In this example, we made the concept of a distance explicit. It allowed us to express the constraints related to the concept itself.

One could argue that the implicit version was shorter. It was shorter to write, but code is read way more often than it is written. Once the Distance class is done, there is no need to read it each time you use it. And finally, if you only look at the Car class, the last version expresses more and is safer.

Going further

Value objects are not only good for giving a home to implicit constraints. They are also good to aggregate things that belong together. For instance, an amount of money will need a currency and an amount. A value object can tie them together and prevent operations mixing currencies.
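Here is a minimal sketch of what such a money value object could look like. The Money class below is written for this illustration only and is not an existing library: it assumes amounts are integers (cents, for instance) and currencies are plain strings.

```ruby
# Illustrative value object tying an amount to its currency.
class Money
  attr_reader :amount, :currency

  def initialize(amount, currency)
    @amount = amount
    @currency = currency
    freeze # a value object never changes after creation
  end

  # Adding returns a new Money and refuses to mix currencies.
  def +(other)
    unless other.is_a?(Money) && other.currency == currency
      raise ArgumentError, "Only money in #{currency} can be added"
    end

    Money.new(amount + other.amount, currency)
  end

  # Two Money objects are equal when their amount and currency are equal.
  def ==(other)
    other.is_a?(Money) && amount == other.amount && currency == other.currency
  end
  alias eql? ==

  def hash
    [amount, currency].hash
  end
end

total = Money.new(10, "EUR") + Money.new(5, "EUR")
total == Money.new(15, "EUR") # => true

begin
  Money.new(10, "EUR") + Money.new(5, "USD")
rescue ArgumentError => error
  error.message # => "Only money in EUR can be added"
end
```

Equality is based on the values rather than on object identity, which is what lets two separately built Money instances be compared or used as hash keys.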

The Internet is full of articles about value objects! Read them all, as each of them will give you a different perspective on this topic.

Code Simplicity - Introduction

Nicolas Zermati — Mon, 08 May 2017 00:00:00 +0000

This is an introduction to a series of articles about code simplicity. The concept itself is a bit abstract but don’t worry: I aim to provide some good examples and explanations for you to get something out of it!

In this first article I’ll share some context about that whole code simplicity thing.

Building applications

There are different kinds of applications. Most of the applications I wrote myself were web applications. They could be defined as state associated with a list of operations.

The state of the application includes obviously what’s in the database as well as the emails that have been sent, payments that have been made, logs, cached contents, etc.

The operations transform the current state of your application to the new one. For instance, when you Tweet something, you’re updating Twitter’s state. Even when you display your feed, Twitter updates its state regarding the ads it showed, the last tweets you’ve seen, etc.

Code simplicity relates to these definitions in that reading simple code helps you better understand the application’s state and how it evolves over time. If you acknowledge the difference between state and operations then you’ll tend to write code in an intention-revealing way that will contribute to the understanding of future readers.

Keep in mind that we spend more time reading than writing code.

Why does it matter?

If state and operations never needed to evolve we wouldn’t care so much about making the code expressing them as simply as possible. But…

applications grow: new features need to be added,
requirements change: old features need to be updated,
code isn’t perfect: bugs need to be fixed,
teams get bigger: new hires need to understand the application,
teams get smaller: remaining members need to be able to take over,
etc.

When either state or operations are poorly organized, dealing with the items in the previous list takes longer and gets harder. Time is lost trying to untangle the past, making those actions more expensive and sometimes impossible to achieve.

Code simplicity, finally!

Now that I gave more context, I can afford to give you my definition!

Code simplicity is a way to get changes in your application, through both years and people, at a low and constant cost.

I’m not saying that getting there is achieved only by code simplicity. I’m saying that without simple code, you’ll have a hard time.

What’s next?

As I said, this article was an introduction. In the next articles we’re going to explore examples showing situations where code simplicity could be enhanced. Here is the list of upcoming articles:

Express the application’s state better with value objects
Split state from operations with the Command pattern
Offer various levels of reading by (ab)using methods
Maintain invariants by raising often
Protect your state with preconditions
More to be announced!

I’ll update the list with links as they are published!

MySQL Evolution - From 5.6 to 8.0

Michael Bensoussan — Tue, 02 May 2017 00:00:00 +0000

Editing your git history with rebase for cleaner pull requests

Adrien Siami — Wed, 26 Apr 2017 00:00:00 +0000

At Drivy, we make extensive use of pull requests to ensure that our code is up to our standards and to catch possible issues.

Reviewing big pull requests can get tedious; that’s why we try to make them as readable as possible. This means splitting them into small commits that all make sense individually, so you can read the pull request commit by commit and understand the general direction of the code change.

It’s also useful if you want to only show a part of your PR to some people. For instance, you might want the front-end developer to only look at the front-related commits.

Split Your PR Commit by Commit

In a perfect world, you’d come up with a plan on how you want your PR to be split into commits, work on each commit sequentially, then submit your PR.

Unfortunately, this is not the world we live in, and more often than not, mistakes happen and your history quickly looks like this:

Thankfully, git is super powerful and allows us to rewrite history, thanks to the dreaded git rebase command.

Git rebase

Git rebase has many uses. The main idea is that git rebase is used to apply a bunch of commits on top of a different base.

In this article I’ll focus on one use case we can encounter when trying to submit a readable PR: editing past commits.

Editing commits

Let’s imagine we’re building a car sharing platform ;) We’re working on a big redesign of the “new car” form and our history looks like this:

So far so good! But we just realised that we forgot to update a wording. The commit is already far back in the history so we can’t use git commit --amend.

We could create another commit but wouldn’t it be much better to edit our “update wordings” commit as if we never forgot this wording to begin with? Let’s use git rebase to achieve just that.

Interactive rebase

We’re going to run git rebase -i master, meaning that we want to reapply our commits on top of master, but in interactive mode (-i). This will allow us to play around with each commit:

Here, git opens our favorite text editor and asks us what to do with each commit.

By default, pick will just apply the commit, but we can update each commit line to tell another story. We can also reorder the lines to have the commits applied in a different order.

Choosing a commit to edit

In our current use case, the command we want is edit. If we replace pick with edit on a commit line, git will halt when applying that commit and yield control to us so we can do whatever we want.

Let’s do it!

Now let’s just save the file and quit our editor.

We’re now back at the point where we made this first commit. The two others haven’t been applied yet, and we can now make our changes.

Applying our changes

When we’re happy with our changes, we can add them and run git commit --amend to update the commit.

Afterwards, we have to run git rebase --continue to continue with the rebase and apply the next commits.

In the end, we’ll keep our 3 commits, but the one we edited now contains our latest changes.

Our history is clean, ready for review!

Conclusion

git rebase is super powerful, especially with its interactive mode. You can use it to do many things: reorder commits, merge commits together, edit past commits, split commits into several commits, remove commits completely, etc. If you want to know more about it, have a look at the official documentation.

But as you know, with great power comes great responsibility. Rewriting the history could cause harm if you’re working on a shared branch and other developers are pulling your code. Keep that in mind!

Before ending this article, here’s a last piece of advice: if you find yourself lost in a git rebase -i session and just want to return to the state before ever trying to rebase, the command you’re looking for is git rebase --abort.

Happy rebasing!

Instrumenting Sidekiq

Michael Bensoussan — Thu, 20 Apr 2017 00:00:00 +0000

As Drivy continues to grow, we were interested in having more insights on the performance of our jobs, their success rate and the congestion of our queues. This would help us:

Better organize our queues.
Focus our performance work on the slow and high throughput jobs.
Eventually split some jobs into two or more jobs.
Add more background workers or scale our infrastructure so we stay ahead when our application is growing quickly.

It also helps detect high failure rates and look at overall usage trends.

To set up the instrumentation we used a Sidekiq middleware.
Sidekiq supports middlewares, quite similar to Rack, which let you hook into the job lifecycle easily. Server-side middlewares run around job processing and client-side middlewares run before pushing the job to Redis. In our case, we’ll use a server middleware so we can measure the time of the processing and know whether the job succeeded or failed.

We use InfluxDB to store our metrics, so we’ll use the InfluxDB Ruby client but you could easily adapt this code to use StatsD, Librato, Datadog or any other system.

Setting Up The InfluxDB Client

We instantiate our InfluxDB client with the following initialiser:

if ENV["INFLUX_INSTRUMENTATION"] == "true"
  require 'influxdb'

  INFLUXDB_CLIENT = InfluxDB::Client.new(ENV["INFLUX_DATABASE"], {
    udp: {
      host: ENV["INFLUX_HOST"],
      port: ENV["INFLUX_PORT"]
    },
    username: ENV["INFLUX_USERNAME"],
    password: ENV["INFLUX_PASSWORD"],
    time_precision: 'ns',
    discard_write_errors: true
  })
end

We’re using UDP because we chose performance over reliability. We also added the discard_write_errors flag because a UDP socket can report errors asynchronously, for example on a later write after a previous packet could not be delivered. We don’t want any kind of availability issue in our InfluxDB server to cause our instrumentation to fail. We added this flag to the official client in the following PR.

Note that we use a global InfluxDB connection to simplify our example, but if you’re in a threaded environment you might want to use a connection pool.

Once the connection is set up, we can start extracting the following metrics:

The time the job was enqueued for - this lets us identify queue congestion
The time the job took to perform
The success count
The fail count

InfluxDB also supports the concept of tags. Tags are indexed and are used to add metadata around the metrics. We’ll add the following tags:

worker class name
queue name
Rails environment

Creating The Sidekiq Middleware

The middleware interface is similar to the one used by Rack. You implement a single method named call, which takes three arguments, worker, msg, and queue:

worker holds the worker instance that will process the job
msg is a hash that holds information about the job: its arguments, the job ID, when the job was created and when it was enqueued, …
queue holds the queue name the job is fetched from

Here’s the skeleton of our middleware:

class Drivy::Sidekiq::Middleware::Server::Influxdb
  def initialize(options={})
    @influx = options[:client] || raise("influxdb :client is missing")
  end

  def call(worker, msg, queue)
    # code goes here
  end

  private

  def elapsed(start)
    ((Time.now - start) * 1000.0).to_i
  end
end

Our middleware is instantiated with the InfluxDB client and we added a private helper to measure the time difference in milliseconds.

Now to the call code:

def call(worker, msg, queue)
  worker_name = msg['wrapped'.freeze] || worker.class.to_s
  enqueued_for = elapsed(Time.at(msg['enqueued_at']))
  start = Time.now

  data = []
  tags = { worker: worker_name, queue: queue, env: Rails.env }

  begin
    data << {
      series: "sidekiq",
      tags: tags,
      values: { count: 1, enqueued_for: enqueued_for }
    }

    # Here the job is processing
    yield

    data << {
      series: "sidekiq",
      tags: tags,
      values: { success: 1, perform: elapsed(start) }
    }
    @influx.write_points(data)
  rescue Exception
    data << { series: "sidekiq", tags: tags, values: { failure: 1 } }
    @influx.write_points(data)
    raise
  end
end

If the job succeeds we send a success point with our tags and the time the job ran (perform). If it fails, we only send a failure point with tags.

To load our middleware, we add this code to our Sidekiq initialiser:

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    if ENV["INFLUX_INSTRUMENTATION"] == "true"
      chain.add Drivy::Sidekiq::Middleware::Server::Influxdb, client: INFLUXDB_CLIENT
    end
  end
end

Visualising Our Metrics

To visualise our metrics, we use Grafana. It’s a great tool allowing us to graph metrics from different time series databases. It can connect to different backends, and supports alerting, annotations and dynamic templating.

Templating allows us to add filters to our dashboards and using the tags we defined earlier we’ll be able to filter our metrics by environment, queue or job name.

This article is not about Grafana so we’ll only show you how to build one graph. Let’s say we want to graph the average and 99th percentile time our jobs are taking to process.

We’ll use this InfluxQL query to get the mean time:

SELECT mean("perform") FROM "drivy"."sidekiq"
WHERE "env" =~ /^$environment$/
AND "queue" =~ /^$queue$/
AND "worker" =~ /^$job$/
AND $timeFilter
GROUP BY time($period) fill(0)

Here we select the mean of the perform field that we filter by our tags then group it by time interval. We also use fill(0) to replace all the null points with 0.

To get the 99th percentile we use the same query but replace mean("perform") by percentile("perform", 99).

Here’s how this graph looks in Grafana:


And here’s our whole dashboard:

You'll find our Grafana config in the following gist.

What’s next

The next step would probably be to add some alerting. At Drivy, we alert on the rate of failure and on the queue congestion so we can reorganize our queues or scale our workers accordingly. Grafana supports alerting but that’s probably out of the scope of this article.

Instrumenting Sidekiq has been a big win for our team. It helps us better organize our queues, follow the performance of our jobs, detect anomalies or investigate some bugs.

API Driven Apps

Jean-Élie Le Corre — Tue, 18 Apr 2017 00:00:00 +0000

At Drivy, the product is often changing. To be as reactive as possible, we want to be fast and iterate on a lot of features. For mobile teams, it’s a challenge to keep up the pace for our iOS and Android apps. You have to deal with the release cycle of the App Store for iOS, and with the users who don’t update their app to the latest version on both platforms.

We use a lot of different technical solutions to make our apps flexible. Here we will describe one of them: how to make your content dynamic and let your users use your latest features even if they’re not yet implemented in your native app.

Dynamic Content

Recently we had to implement a new view helping our car owners to see all their requests at once. It’s quite a simple view:

A single cell looks like this:

To fill the content, the usual way of doing it would be the API sending an array of Requests with raw data:

{
	"avatar_url": "…assets/avatar.jpg",
	"created_at": "2017-02-11T14:03:23Z",
	"driver_name": "Jean-Élie Locataire",
	"car_title": "Tesla Model S",
	"start_date": "2013-07-01T14:03:23Z",
	"end_date": "2013-07-01T14:03:23Z",
	"mileage": 200,
	"price": 69,
	"state": "auto_cancel"
}

Then, the app would format the dates, the distances, the price and the state to display all of them with style.

Instead, we used a GenericItem model:

{
	"image_url": "…assets/avatar.jpg",
	"title": "Jean-Élie Locataire",
	"subtitle": "Tesla Model S",
	"top_right_detail": "02/11/2017",
	"detail_text_html": "from <strong>Wed, Feb 15, 2017</strong> at 07:00\nto <strong>Sat, Feb 18, 2017</strong> at 07:30\nto <strong>200 km</strong> included",
	"bottom_left_detail": "€69",
	"bottom_right_detail": "Automatically cancelled",
	"url": "https://www.drivy.com/requests/1"
}

From now on, the app doesn’t even know what it manipulates, it could be a Car, a Driver or a Request.

Here is a schematic representation of the generic cell, composed of a UIImageView and UILabels:

The server is responsible for formatting all the content like the dates and price. The apps send an Accept-Language header in every API call with the current locale of the device, so that the back end can localize the content accordingly.

We also use HTML with simplified tags in the middle details, allowing us to format the content dynamically. On the native side, almost all the fields are optional, and the cell adapts its height to the content.

Once we have this generic mechanism, we can leverage it for other parts of the app. Here is our message cell, in which you can find an image, a title, details and top right details. The layout is different though: we use a type property on the GenericItem to tell the app which layout it must use to display the cell.

Dynamic Features

You can see a url parameter attached to the generic item. Whenever a user taps the cell, the app’s router parses this url. If we have a native view responding to this path, it is pushed on the stack. If the path is not yet handled by the app, it’s opened in an in-app web browser.

That way, the app can handle new features, whether we haven’t implemented them natively yet or the user hasn’t updated the app yet.
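
As a rough illustration of that routing rule, here is a minimal sketch, written in Python for brevity rather than in Swift or Java. The route table, the RequestDetailView name and the resolve function are hypothetical and are not taken from the Drivy apps:

```python
import re
from urllib.parse import urlparse

# Hypothetical table of paths the installed app version can display natively.
NATIVE_ROUTES = [
    (re.compile(r"^/requests/(\d+)$"), "RequestDetailView"),
]


def resolve(url):
    """Return ("native", view, params) when a native view handles the path,
    otherwise ("web", url, ()) so the caller opens an in-app browser."""
    path = urlparse(url).path
    for pattern, view in NATIVE_ROUTES:
        match = pattern.match(path)
        if match:
            return ("native", view, match.groups())
    return ("web", url, ())
```

An older app version simply has a shorter route table, so the same url falls through to the web branch until the user updates.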

Conclusion

Making native apps flexible is useful if you want to be agile and iterate quickly on your app. Generic content and dynamic navigation are only two of the technical solutions to achieve that, and so far they have helped us a lot!

Send Rails console commands to Slack

Antoine Augusti — Thu, 16 Mar 2017 00:00:00 +0000

At Drivy, our main repository is a Ruby on Rails application that we run on Heroku. Sometimes, things don’t go as planned and we need to run one-off commands to fix a particular piece of data or to investigate a bit further about an issue. To do this, we use the rails console command in the production environment.

Let’s be clear: we reach for the console only if we have no other choice. We always deploy new code to fix bugs or use migrations if we need to update a large number of database rows. But as we still need it sometimes, we need to be sure this is a tool we can trust and control. Our rule is that if a command has been run more than twice, it needs to be automated in our back-office. We also have processes to limit the access to this feature to a group of people and rules in place to comply with our data privacy policy.

Reporting on console commands

We want commands typed by authorised developers or system administrators to be made public and available in real-time. This serves multiple purposes:

be aware when this happens: commands should be executed manually only under special conditions. If we often need to run commands to fix events, we need to build something that handles this kind of event automatically, to avoid at all costs the need to run console commands.
have a history of executed commands: if at some point we encounter an issue we had weeks ago, we can see how we fixed the issue by looking at the commands’ log.
let developers discover commands: because commands are made public, developers often discover interesting ways to fix an issue. This leads to discussions and code improvements later.

Hooking into the Rails console

We use pry locally, but the Rails console uses irb in staging or production. We needed a way to hook into irb, and since irb prints to the standard output, we override the behaviour of the STDOUT object. To know if we need to change the behaviour of the standard output, we check if we are running in a one-off Heroku dyno thanks to the environment variable DYNO set by Heroku.

We added an initializer which looks like this:

is_staging_or_prod = Rails.env.production? || Rails.env.staging?
dyno_in_run_mode = ENV.fetch('DYNO', 'nope').starts_with?('run')
dev_name = ENV.fetch('DEV_NAME', '')

if is_staging_or_prod && dyno_in_run_mode
  raise Drivy::Errors::DevNameNotSetError if dev_name.blank?

  # Override how printing to stdout works by also sending
  # the output of stdout to a Slack webhook.
  # When writing commands in irb, irb prints to stdout
  class << STDOUT
    include Drivy::Console::ReportCommand
    alias :usual_write :write

    def write(string)
      usual_write(string)
      # `def` starts a new scope, so the dev_name local above is not
      # visible here: read the developer name from the environment again.
      send_command_to_slack(ENV.fetch('DEV_NAME'), string)
    end
  end
end

The ReportCommand module does the actual work: it reads the last command from Readline::HISTORY and sends it, along with its output, to an external service (Slack for us). The code below gives the main logic; the complete code is available in a gist.

module Drivy::Console::ReportCommand

  def send_command_to_slack(developer_name, command_output)
    return unless has_command? && has_output?(command_output)

    # Documentation is at https://api.slack.com/docs/message-attachments
    fields = [
      {
        title: "Command",
        value: wrap_command(read_command),
        short: true,
      },
      {
        title: "Output",
        value: wrap_command(parse_output(command_output)),
        short: false,
      },
      {
        title: "Developer",
        value: developer_name,
        short: true,
      }
    ]

    env_color, env_title = ["#e74c3c", "production"]

    params = {
      attachments: [
        {
          fields: fields,
          color: env_color,
          footer: "Console #{env_title} spy",
          footer_icon: "https://drivy-prod-static.s3.amazonaws.com/slack/spy-small.png",
          ts: Time.zone.now.to_i,
          mrkdwn_in: ["fields"],
        }
      ]
    }

    response = slack_client.post('', params)

    raise "Failed to notify Slack of console command, status: #{response.status}" unless response.success?
  end

  private

  def has_command?
    Readline::HISTORY.length >= 1
  end

  def has_output?(command_output)
    return false unless command_output.instance_of? String
    command_output.strip.start_with? "=>"
  end

  def read_command
    Readline::HISTORY[Readline::HISTORY.length-1]
  end
end

We use the console through our homemade Drivy CLI and not directly through the Heroku CLI. We will likely talk about our CLI in upcoming posts. It is a tool we use to manage our day-to-day operations (running commands, releasing, handling database migrations, managing content…). After configuring the Slack webhook integration, the final result looks like this:

We’re pretty happy about this new tool because we gained a lot in visibility and confidence in our operations. We are always looking forward to improving our developers’ tooling.

Use Android's FileProvider to get rid of the Storage Permission

Romain Guefveneu — Tue, 14 Mar 2017 00:00:00 +0000

When you need to share a file with other apps, the easiest way could be to use the external storage as a temporary place to save this file. For example, if you need to take a picture with a camera app, you need to specify a file where the camera app will save the picture, and using external storage might be tempting.

However this solution has many drawbacks:

You lose control of your file

Because you put your file in a public directory, you can’t safely delete it nor control which app can read and modify it.

You have to ask WRITE_EXTERNAL_STORAGE permission

This could scare off many users and may lead to a bad UX with runtime permissions.

External storage quickly becomes a mess

Since you can’t safely delete your shared files, you leave them in the external storage forever.

Use a FileProvider

You may already use ContentProvider to share data with other apps. You can do the same with FileProvider to share files!

FileProvider is part of the Support Library, available for all Android versions starting from 2.3. The main goal of this API is to temporarily open a private file to some targeted apps: you keep the file in your private folder, and let some other apps read or even write it via a secured ContentProvider. Permissions are revoked when your activity is destroyed.

Implementation

Add support lib dependency

In your app build.gradle, add this dependency:

compile 'com.android.support:support-v4:<version>'

Specify available folders

Create an xml file (for example file_provider_paths.xml) in the xml resources folder:

<paths xmlns:android="http://schemas.android.com/apk/res/android">
    <files-path name="shared" path="shared/"/>
</paths>

Define a Provider

In your AndroidManifest.xml, add this provider inside the application node:

<provider
    android:name="android.support.v4.content.FileProvider"
    android:authorities="<your provider authority>"
    android:exported="false"
    android:grantUriPermissions="true">
  <meta-data
      android:name="android.support.FILE_PROVIDER_PATHS"
      android:resource="@xml/file_provider_paths"/>
</provider>

Just set your android:authorities, like com.drivy.android.myfileprovider, and link the created xml resource file in android:resource

ProTip: Use ${applicationId} in android:authorities to automatically use your package name: ${applicationId}.myfileprovider

First thing to do: get the shared file’s Uri

Uri sharedFileUri = FileProvider.getUriForFile(this, <your provider authority>, sharedFile);

Use the same provider authority as in your AndroidManifest.xml. The Uri will look like this: content://com.drivy.android.myfileprovider/shared/myfile.jpg

You can now create a chooser intent:

ShareCompat.IntentBuilder intentBuilder = ShareCompat.IntentBuilder.from(this).addStream(sharedFileUri);

And start it:

Intent chooserIntent = intentBuilder.createChooserIntent();
startActivity(chooserIntent);

That’s it!

One last thing: Legacy support

You need to manually grant permission for older Android versions.

Grant permission for intent

Before sharing your file, you’ll have to manually grant the permission (read and/or write), for all applications targeted with your intent. Indeed, you can’t know which one the user will choose to share the file with.

final PackageManager packageManager = context.getPackageManager();
final List<ResolveInfo> activities = packageManager.queryIntentActivities(intent, PackageManager.MATCH_DEFAULT_ONLY);
for (ResolveInfo resolvedIntentInfo : activities) {
  final String packageName = resolvedIntentInfo.activityInfo.packageName;
  context.grantUriPermission(packageName, uri, permissions);
}

Revoke permissions on activity destroy

We can assume that, when returning to your app and leaving the activity, the shared file has already been copied by the targeted app, and is not required anymore. You can revoke all permissions.

context.revokeUriPermission(fileUri, permissions);

Conclusion

FileProvider is a really convenient and elegant way to get rid of WRITE_EXTERNAL_STORAGE. I encourage you to use it: your app will be better without extra permissions.


Taskqueues tips

Adrien Di Pasquale — Mon, 13 Mar 2017 00:00:00 +0000

Taskqueues are used to asynchronously run tasks (interchangeably called “jobs”). They are very useful to enqueue actions for later processing in order to preserve short response times. For example, during signup you may want to send the confirmation mail asynchronously. Or you may have a slow task that generates an export file, which you would rather run in the background than inline.

As a website grows, the need to use a taskqueue often arises. The general architecture usually looks like this:

The broker is a messaging queue (Redis, RabbitMQ …)
The webapp enqueues tasks upon specific requests
The CRON enqueues tasks at specific times (often called a Clock or a Scheduler)
Workers dequeue and process tasks.

Frameworks

All the main web languages have several: Celery, RQ, MRQ (Python), Resque, Sidekiq (Ruby) …

The main qualities of a framework are:

Efficiency: Fast and cheap
Reliability: All tasks get executed exactly the right number of times
Visibility: Live monitoring, debugging tools, exception tracebacks …

Common problems

Workers generally execute the same codebase as the app. This is the source of many problems, since the codebase is not optimized for this context. Workers aim at high throughput but Ruby and Python default implementations are single-threaded. Also, tasks often interact with unreliable third party services, so they have to be very resilient.

We use a taskqueue extensively at Drivy, so we have some tips to share!

Tip 1: Design tasks well

Most importantly, tasks should aim at being re-entrant. This means that they can stop in the middle and be run again in another process. This is important because jobs may raise exceptions in the middle and be retried at a later time. Workers may also crash and restart. This often means that your tasks should be stateless, and not expect the DB to be in a certain state at processing time. They should be responsible for checking the state before running their actions.

Tasks should try to be idempotent. This means that running them several times (consecutively or in parallel) should not change the final output. This is very convenient, so that you’re not too scared if a task gets enqueued or processed multiple times.
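
To make these two properties concrete, here is a minimal sketch of a task that checks state before acting. The User record, the FakeMailer and send_confirmation_task are hypothetical stand-ins for a real model, mail service and task, and the in-memory dict plays the role of the DB:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    confirmation_sent: bool = False


@dataclass
class FakeMailer:
    sent: list = field(default_factory=list)

    def send_confirmation(self, user_id):
        self.sent.append(user_id)


def send_confirmation_task(user_id, users, mailer):
    """Re-reads the current state and only acts if the work is still needed."""
    user = users.get(user_id)
    if user is None or user.confirmation_sent:
        return False  # nothing to do: deleted user, or an earlier run already did it
    mailer.send_confirmation(user.id)
    user.confirmation_sent = True
    return True
```

Running the task twice in a row sends a single mail. Two runs that overlap in time could still both pass the check, so a real implementation would also need a lock or an atomic update in the DB.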

It is also good practice for tasks to require as few arguments as possible. For instance, you can send only a model ID instead of the whole serialized object. This gives you better predictability and easier visibility.

Additionally, if you want workers to run in multiple threads, you should be careful to design thread-safe tasks. Specifically, you should pay attention to the libraries you use: they are often the culprits of unsafe calls.

-> Avoid class level calls

Example of unsafe Ruby code:

class SomeTask < ResqueTask
  def self.perform(value)
    Util.some_opt = value
    Util.do_something
  end
end

Classes are instantiated once per process, so when you run multiple tasks in parallel threads, Util.some_opt is shared by all of them. One task can overwrite the value another task has just set, which can lead to many problems.

Here is the same code refactored not to use class level calls:

class SomeTask < ResqueTask
  def self.perform(value)
    util = Util.new(value)
    util.do_something
  end
end

-> Avoid Mutable instance variables

Here is another example of unsafe code in Python:

class SomeTask(MRQTask):
  # Class attribute: a single list shared by every SomeTask run in the process
  some_list = []
  def run(self, params):
    self.some_list.append(params["value"])

When run in parallel, some_list will be the same process-wise. A simple way to fix it is to instantiate the list inside the task’s run method:

class SomeTask(MRQTask):
  def run(self, params):
    # Local variable: each run gets its own list
    some_list = []
    some_list.append(params["value"])

Tip 2: Know your broker

We often misunderstand the exact behaviour of our backend task storage systems. It is often a good idea to take the time to read the specs and the open issues of the system you use.

Each broker library has its own trade-offs in terms of delivery atomicity. Exactly-once is the guarantee that each message will be delivered once and only once. Few systems can provide this, and some would say that it’s infeasible in a distributed environment. Most systems provide at-least-once or at-most-once delivery guarantees. You therefore often have to handle redundant message delivery.
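
One common way to cope with at-least-once delivery is to remember which messages were already processed. The sketch below is only an illustration: DedupingConsumer and its message IDs are hypothetical, and the in-memory set stands in for a store shared by all workers:

```python
class DedupingConsumer:
    """Wraps a handler so that a redelivered message is processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: a shared, persistent store in real life

    def receive(self, message_id, payload):
        if message_id in self._seen:
            return False  # duplicate delivery: skip it
        self._handler(payload)
        # Mark as seen only after success, so a crash in the handler leaves
        # the message eligible for redelivery.
        self._seen.add(message_id)
        return True
```

This turns duplicates into no-ops on the consumer side. It does not give exactly-once processing, since a crash between the handler and the bookkeeping still leads to a second run.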

Also, a broker should be resilient to crashes. You don’t want to lose tasks on system crashes. If you use in-memory storage systems like Redis, you should back up regularly to the file system if you don’t want this to happen.

Tip 3: Monitor your broker

In some cases, tasks enqueuing calls may get entirely discarded: the webapp (or the CRON) tries to enqueue a task to the broker, but the call fails. This is obviously a very bad situation, as you’re going to have a very hard time trying to re-enqueue the lost tasks afterwards. If the volumes are high, or if the failures are silent, then this quickly turns into a catastrophic situation.

One situation where this can happen is when the broker exceeds its storage capacity. You should foresee this happening. A common issue causing this is to pollute your broker with metadata: very long arguments, results, logs, stacktraces …

Having network issues with the broker can also become a very painful point, especially since it’s often random. You should try and design your infrastructure to have as little latency as possible between the app and the broker.

A very good piece of advice is thus to monitor your broker’s system in depth, looking at the different metrics and setting up alerts. You can also look for SaaS hosting for your brokers as they often provide out-of-the-box monitoring solutions.

Tip 4: Specialize your workers

Workers can have different configs depending on what type of tasks they perform. For instance:

Computing Worker: 2 GB RAM + 4 threads
I/O Worker: 256 MB RAM + 100 threads

Tip 5: Strategic queuing

You usually try and optimize your different workers and queues to be able to dequeue everything in time at the lowest cost possible.

Here is an example of a queuing strategy:

(the numbers on the arrows represent the queue priority for each worker)

There is no one-size-fits-all solution for optimizing this. You’ll have to iterate and find out what works best for your tasks with your specific workers. You’ll have to change it over time as you update tasks and their mean runtime evolves independently. Again, monitoring is absolutely necessary.

At Drivy we have decided not to name our queues after the bit of logic they handle. We don’t want to have a mail queue or a car_photos_checks queue. We think it’s more scalable to group tasks in queues depending on their properties: mean runtime, acceptable dequeuing delay. So we have queues like urgent_fast or average_slow.
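
A minimal sketch of that naming scheme is shown below. Only the queue names urgent_fast and average_slow come from the text above. The queue_for function and both thresholds are hypothetical and would need tuning against your own measurements:

```python
# Hypothetical thresholds, in seconds.
URGENT_MAX_DELAY_S = 60
FAST_MAX_RUNTIME_S = 1.0


def queue_for(mean_runtime_s, acceptable_delay_s):
    """Pick a queue from a task's properties rather than from its business logic."""
    urgency = "urgent" if acceptable_delay_s <= URGENT_MAX_DELAY_S else "average"
    speed = "fast" if mean_runtime_s <= FAST_MAX_RUNTIME_S else "slow"
    return f"{urgency}_{speed}"
```

With this approach a new kind of task does not need a new queue. It only needs an estimate of its runtime and of how long it can wait.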

Tip 6: Anticipate task congestion

Here is a common bad day scenario:

Task A becomes 10 times slower (maybe because of network latency)
Task B keeps failing (maybe because some corrupt data got introduced in the DB)
Workers cannot dequeue everything in time

These sorts of situations will inevitably happen. You should not try to avoid them altogether, but you should at least monitor for them and be ready to take action.

Your monitoring system should be able to alert you when queues don’t respect their SLAs anymore. Being able to scale lots of workers quickly will help you. Try and know your limits in advance so you don’t overload a resource or consume all the network bandwidth. Auto-scaling is not easy to set up at all, so don’t rely on it at the beginning.

Tip 7: Anticipate worker crashes

Here is a common list of things that can go wrong:

Exceptions
Hardware crashes
System reboots (e.g. Heroku apps restart everyday)

You absolutely need to handle soft shutdowns: receive the signals, try and finish tasks in time, and requeue them otherwise. Your workers should also restart automatically, so that the congestion doesn’t get out of hand too quickly.
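
Here is a minimal sketch of a soft shutdown, assuming a single-threaded worker and an in-memory deque standing in for the broker. The Worker class and its method names are hypothetical. The sketch finishes the task in progress and leaves the rest in the queue, but it does not enforce a deadline on the running task:

```python
import signal
from collections import deque


class Worker:
    def __init__(self, queue):
        self.queue = queue
        self.stopping = False
        self.done = []

    def request_stop(self, signum=None, frame=None):
        # Signal handler: only flip a flag, the loop checks it between tasks.
        self.stopping = True

    def install_signal_handlers(self):
        # Must be called from the main thread.
        signal.signal(signal.SIGTERM, self.request_stop)
        signal.signal(signal.SIGINT, self.request_stop)

    def run(self, process):
        while self.queue and not self.stopping:
            task = self.queue.popleft()
            process(task)
            self.done.append(task)
        # Tasks still in self.queue were never started, so nothing is lost.
```

A real worker would also requeue a task that cannot finish before the platform’s kill timeout.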

Memory leaks may also happen. Depending on the volumes, this may grow quickly and workers may crash (be aware that Heroku errors like Memory Exceeded are not monitored by error trackers like New Relic). Debugging memory leaks is very hard. It’s even harder in taskqueue systems, so try and find the right tools for your stack, and arm yourself with patience.

Tip 8: Reliable CRON

Syntax errors can go undetected: the CRON system is rarely run in development and even less often covered by tests.

You may also encounter runtime errors, for example when an argument computed on the fly fails. This can easily go undetected as we often run the clock as a background process of a proper worker.

To avoid this, you should try and keep the CRON as simple as possible: it should only enqueue tasks with hardcoded arguments. It should not fetch anything from external resources, like the DB.

Two good practices that can help:

check the syntax of your CRON in your specs
monitor the CRON effect, possibly by enqueuing a ‘heartbeat’ task every minute and checking it’s been dequeued quickly in a separate monitoring tool.
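
The heartbeat idea can be sketched as follows. HeartbeatMonitor is a hypothetical name, and the two-minute threshold is an assumption chosen to tolerate one missed beat when the CRON enqueues the task every minute:

```python
class HeartbeatMonitor:
    """Tracks when the heartbeat task last ran (timestamps in seconds)."""

    def __init__(self, max_age_s=120.0):
        self.max_age_s = max_age_s
        self.last_seen = None

    def record(self, now):
        # Called by the heartbeat task each time a worker dequeues it.
        self.last_seen = now

    def is_healthy(self, now):
        # Unhealthy if no heartbeat was ever seen or the last one is too old.
        return self.last_seen is not None and now - self.last_seen <= self.max_age_s
```

A stale heartbeat means that either the CRON stopped enqueuing or the workers stopped dequeuing, which is why the check is best run from a separate monitoring tool.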

Tip 9: Track exceptions

Tasks will raise exceptions. You cannot and should not try to cover all cases in advance. User input, unexpected context, and different environments are just a subset of the problems that you cannot forecast.

You should rather focus your efforts on tracking. Using a bug tracker service (Bugsnag, Sentry, etc.) is a very good idea. You should have a middleware that logs all exceptions to your bug tracker, and set up alerting from your bug tracker. You can then triage the bugs and create issues for each depending on their priority and urgency.
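Such a middleware can be as small as a wrapper around each task. The sketch below is in Python and does not use any real bug tracker client: `report` is a hypothetical callable standing in for whatever notification function your tracker's SDK provides.

```python
def with_exception_tracking(task, report):
    """Wrap `task` so any exception is sent to `report` before being re-raised."""
    def wrapped(*args, **kwargs):
        try:
            return task(*args, **kwargs)
        except Exception as exc:
            # Attach enough context to reproduce the failure later.
            report(exc, {"task": task.__name__, "args": args, "kwargs": kwargs})
            raise  # let the queue's retry logic see the failure too
    return wrapped


# Stand-in for a bug tracker: collect reports in a list.
reports = []


def fake_report(exc, context):
    reports.append((type(exc).__name__, context))


def send_invoice(user_id):
    raise ValueError(f"no billing address for user {user_id}")


tracked_send_invoice = with_exception_tracking(send_invoice, fake_report)

try:
    tracked_send_invoice(42)
except ValueError:
    pass  # the exception still propagates after being reported
```

Re-raising after reporting matters: the tracker records the error, and the task queue can still apply its own retry strategy.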

Here is how a bug tracker interface (Bugsnag) looks:

Tip 10: Have a coherent retry strategy

Most tasks should be retried several times before being considered to have truly failed. If your tasks respect the contracts from Tip 1, it should not be a problem to retry tasks by default.

Different tasks may have different retry strategies. All I/O calls (especially HTTP) should be expected to fail as part of normal operation. You can implement increasing retry delays to handle temporarily unavailable resources.
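One common way to get increasing delays is exponential backoff with a cap. The Python sketch below is an illustration only: the function names, default values and the injected `sleep` parameter are assumptions, and a real task queue would usually reschedule the task rather than sleep inside the worker.

```python
import time


def retry_delays(max_retries, base_s=1.0, factor=2.0, cap_s=300.0):
    """Delays before each retry: base, base*factor, base*factor^2, ... up to cap."""
    return [min(cap_s, base_s * factor ** attempt) for attempt in range(max_retries)]


def run_with_retries(task, max_retries=3, sleep=time.sleep):
    """Run `task`, retrying up to `max_retries` times with increasing delays."""
    delays = retry_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: the task has really failed
            sleep(delays[attempt])


# A task that fails twice before succeeding, e.g. a flaky HTTP call.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporarily unavailable")
    return "ok"


slept = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky, max_retries=4, sleep=slept.append)
```

Passing `sleep` as a parameter keeps the example fast to test. Adding random jitter to the delays is a frequent refinement when many tasks fail at the same time.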

Tip 11: Check your DB connections

Depending on the tasks, workers may hit the DB way harder than a regular web process. Try to estimate how hard before scaling your workers. You may also hit connection limits quickly: namely your SaaS provider’s connection limit, or your system’s ulimit.

A good practice is to use connection pools in your workers, and to be a good citizen: release unused connections, reconnect on disconnections, and so on. Size these pools according to your DB limits.
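To make the pooling idea concrete, here is a minimal bounded pool in Python. It is a sketch, not a replacement for the pool shipped with your database driver: the `connect` factory and the `is_alive()` method on connections are hypothetical.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A bounded pool: at most `size` connections are handed out at once."""

    def __init__(self, connect, size):
        self._connect = connect
        # LIFO so that recently used connections are reused first.
        self._slots = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._slots.put(None)  # empty slot, connected lazily on first use

    @contextmanager
    def connection(self, timeout=5.0):
        conn = self._slots.get(timeout=timeout)  # raises queue.Empty if exhausted
        try:
            if conn is None or not conn.is_alive():
                conn = self._connect()  # first use, or reconnect after a drop
            yield conn
        finally:
            self._slots.put(conn)  # always release the slot


class FakeConnection:
    def __init__(self):
        self.alive = True

    def is_alive(self):
        return self.alive


opened = []


def connect():
    conn = FakeConnection()
    opened.append(conn)
    return conn


pool = ConnectionPool(connect, size=2)
with pool.connection() as first:
    pass
with pool.connection() as second:
    pass  # the same connection is reused, no new one is opened
```

With a pool like this, the total number of connections is bounded by the pool size multiplied by the number of worker processes, which is the figure to compare against your DB limits.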

Another solution that relieves a lot of pressure is to use slave databases in your workers: a task can then never slow down your product. However, you have to be careful with possible replication lag. It is sometimes necessary to kill workers while the slaves are catching up with their masters.

Tip 12: Don’t over-optimize

Workers are very cheap nowadays, so you are better off investing time in monitoring than in optimizing your tasks and queries.

This was initially given as a meetup talk, see the slides here: http://adipasquale.github.io/taskqueues-slides-2015.

Managing Bugs at Drivy

Marc G Gauthier — Thu, 09 Mar 2017 00:00:00 +0000

It is very important to have a minimal number of bugs in production. I would love to say that we have absolutely no issues, but problems are bound to happen and it’s really a matter of reducing risk. The question is really about how fast and how efficiently you can react.

Detecting Bugs

Monitoring Errors

First we need to know if there are any problems. The most obvious solution is to react to user reports, which is why we have a dedicated Slack channel where the customer support team can let us know about any issues:

This is a very simple and light process that works pretty well for now. However, it isn’t great to rely on users to let us know about bugs! This is why we use Bugsnag, which lets us know of any 500 or JS error in our live environments:

Detecting Possible Errors Using Metrics

In the examples above the error is pretty straightforward: someone tried to do something and it failed. However, the most worrying bugs are the ones that fail silently. They are harder to detect and can cause a lot of problems. For instance, if you have an issue with Facebook connect that fails silently 10% of the time, you will not see any 500 error… however, you will see a decrease in KPIs related to the Facebook connect feature.

This is why we use business metrics to detect possible issues as it’s a great way to detect possible regressions. We use a wide variety of tools, depending on the situation, from Universal Analytics to Redash. This gives us a simple way to detect changes in patterns and react accordingly.
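As a rough illustration of metric-based detection, the Python sketch below compares a recent success rate against a known baseline. The function name, the thresholds and the numbers are made up for the example and are not taken from the tools mentioned above.

```python
def success_rate_dropped(baseline_rate, successes, attempts,
                         tolerance=0.05, min_attempts=100):
    """Return True when the recent success rate is clearly below the baseline.

    `tolerance` is the absolute drop we accept before alerting, and
    `min_attempts` avoids alerting on samples too small to mean anything.
    """
    if attempts < min_attempts:
        return False
    return successes / attempts < baseline_rate - tolerance


# Hypothetical numbers: the login usually succeeds 95% of the time.
silent_failure = success_rate_dropped(0.95, successes=850, attempts=1000)
normal_day = success_rate_dropped(0.95, successes=940, attempts=1000)
```

Here `silent_failure` is true (85% is well below 95%), while `normal_day` is false. A silent 10% failure like the one described above would show up in this kind of check even though no 500 error is ever raised.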

For performance and other technical monitoring, we use New Relic and Logmatic. We also have a setup with Telegraf, InfluxDB and Grafana to check our time-based metrics:

We have a lot of other solutions to make sure that everything works properly. For instance, we write “checker jobs”, which are basically cron jobs that run frequently and check that data is shaped as expected.
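A checker job can be very small. The following Python sketch shows the general shape under invented assumptions: the booking records, their fields and the two invariants are hypothetical and only illustrate the idea of validating data on a schedule.

```python
def check_bookings(bookings):
    """Return a list of problems found in the given booking records."""
    problems = []
    for booking in bookings:
        if booking["ends_at"] <= booking["starts_at"]:
            problems.append(f"booking {booking['id']}: ends before it starts")
        if booking["state"] == "paid" and booking.get("payment_id") is None:
            problems.append(f"booking {booking['id']}: paid without a payment")
    return problems


# Hypothetical records; timestamps are plain integers to keep the example short.
bookings = [
    {"id": 1, "starts_at": 10, "ends_at": 20, "state": "paid", "payment_id": 7},
    {"id": 2, "starts_at": 30, "ends_at": 25, "state": "pending", "payment_id": None},
    {"id": 3, "starts_at": 40, "ends_at": 50, "state": "paid", "payment_id": None},
]
problems = check_bookings(bookings)
```

The cron would run such a function on recent records and send any non-empty result to the alerting channel.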

Reacting To Bugs

Notifications

We use various tools with specific escalation policies in order to react quickly to any issues. For instance, PagerDuty will phone us if some metrics are getting bad, but we also built a Slack bot that lets us know of less critical issues:

To make sure that every problem is taken care of, we have set up a new role we call the “Bugmaster” who is in charge of checking all issues.

Fixing Bugs

We work hard on reducing the cost of releasing to production. If you can ship quickly and safely, you’re able to remove any bug quickly. To do so, we constantly work on our internal processes and tools. For instance, we have a command line interface connected to Slack that allows us to release a new version of the website:

Once a fix has been made, it’s important to have new automated tests to prevent regressions.

Not Having Bugs In The First Place

Detecting and fixing bugs is not the most fun part of the job… and it’s way better to have none! This is why we invest a lot in a solid test suite that runs on CircleCI and an efficient Git workflow. We also focus on shipping small things quickly using feature flags instead of doing massive releases and, of course, we have a great team of individuals who want to ship working software.

Our workflow and tools have changed a lot over the years, and they are getting more and more robust. I can honestly say that it is very rare for us to be caught off guard by a serious bug, which is great for our users.

Overall it feels great to be working on a project that is evolving quickly, but can keep a good level of quality.

This is a simplified update of an article originally posted on my personal website. You can read the previous version here.
