CodeOwners

February 18, 2021 ~ merlion

Maintaining stability and code quality is an important task: it helps ensure standards are met and reduces the likelihood that bugs are introduced.

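If you host your git repositories on a platform such as GitHub, the CODEOWNERS feature is a great way to achieve this. A CODEOWNERS file maps paths in the repository to the people or teams who are automatically requested for review when a pull request touches those paths.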

Here are some ways to use CODEOWNERS:

1. Code Owners around submodules or specific packages. Add team members who are experts on certain parts of an application or framework.

app
  - module1
  - module2

2. Code Owners around build and pipeline files. This is good if there is an expert on Maven or an SRE who maintains the CD infrastructure files.

3. Code Owners around test packages. Having QA check the test cases during a pull request review helps improve and speed up the continuous delivery pipeline, instead of leaving the test review until a later time.
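As a rough sketch, the three cases above might look like this in a CODEOWNERS file on GitHub. The paths and the @example-org team names are made up for illustration; substitute users or teams that have write access to your repository.

```
# Hypothetical example: every path and team name below is an assumption.

# 1. Submodules or specific packages
/app/module1/         @example-org/module1-experts
/app/module2/         @example-org/module2-experts

# 2. Build and pipeline files
pom.xml               @example-org/build-experts
/.github/workflows/   @example-org/sre

# 3. Test packages
/src/test/            @example-org/qa
```

When several patterns match the same file, the last matching pattern in the file takes precedence, so put the most specific rules at the bottom.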

For more information about CODEOWNERS and how to use it, check out the guide on GitHub: https://docs.github.com/en/github/creating-cloning-and-archiving-repositories/about-code-owners

Builder Design Pattern

February 10, 2021 ~ merlion

When a class's constructor takes many parameters, you run into several issues:
1. Unreadable. The constructor is long, and the call site is a row of unlabeled values.
2. Parameters are not optional. If some parameters are optional, you still need to pass in a 0 or null. Another way is to create multiple constructors, but that quickly becomes unmaintainable.

I will give an example in the context of trading systems. Imagine you need a class to store the details for a stock order. Here’s what a class would look like:

public class Order {

    private double quantity;

    private double price;

    private String stock;

    private String side;

    public Order(double quantity, double price, String stock, String side) {
        this.quantity = quantity;
        this.price = price;
        this.stock = stock;
        this.side = side;
    }

}

To instantiate the class, it would look like this:

        Order order = new Order(100.0, 50.0, "GOOG", "BUY");

As you can see, it is hard to tell whether the 100.0 is the quantity or the price since the types are the same. Furthermore, an order can contain over 20 values, and the constructor will become huge. Now, let’s take a look at how the builder pattern can help.

Here’s an order builder class:

public class OrderBuilder {

    private double quantity;
    private double price;
    private String stock;
    private String side;

    public OrderBuilder quantity(double quantity) {
        this.quantity = quantity;
        return this;
    }

    public OrderBuilder price(double price) {
        this.price = price;
        return this;
    }

    public OrderBuilder stock(String stock) {
        this.stock = stock;
        return this;
    }

    public OrderBuilder side(String side) {
        this.side = side;
        return this;
    }

    public Order build() {
        return new Order(quantity, price, stock, side);
    }

}

Here’s code using the builder class:

OrderBuilder orderBuilder = new OrderBuilder().quantity(100).price(50.0).stock("GOOG");

Order order = orderBuilder.side("BUY").build();

From the code above, you can tell the meaning of each value, and it reads fluently. Additionally, you don’t need to build the Order right away. You can pass the order builder around, keep setting values, and call build() once everything is set.
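To illustrate passing the builder around, here is a minimal sketch. The Order and OrderBuilder classes are trimmed copies of the ones above (with package-private fields and a toString added so the result can be inspected), and the two helper methods, fromSignal and withExecutionDetails, are hypothetical names invented for this example.

```java
public class BuilderHandOff {

    // Trimmed copy of the Order class above; fields are exposed only for this demo.
    static class Order {
        final double quantity;
        final double price;
        final String stock;
        final String side;

        Order(double quantity, double price, String stock, String side) {
            this.quantity = quantity;
            this.price = price;
            this.stock = stock;
            this.side = side;
        }

        @Override
        public String toString() {
            return side + " " + quantity + " " + stock + " @ " + price;
        }
    }

    // Trimmed copy of the OrderBuilder class above.
    static class OrderBuilder {
        private double quantity;
        private double price;
        private String stock;
        private String side;

        OrderBuilder quantity(double quantity) { this.quantity = quantity; return this; }
        OrderBuilder price(double price) { this.price = price; return this; }
        OrderBuilder stock(String stock) { this.stock = stock; return this; }
        OrderBuilder side(String side) { this.side = side; return this; }

        Order build() { return new Order(quantity, price, stock, side); }
    }

    // Hypothetical step 1: this part of the system only knows what to trade and how much.
    static OrderBuilder fromSignal(String stock, double quantity) {
        return new OrderBuilder().stock(stock).quantity(quantity);
    }

    // Hypothetical step 2: a later part fills in the side and price on the same builder.
    static OrderBuilder withExecutionDetails(OrderBuilder builder, String side, double price) {
        return builder.side(side).price(price);
    }

    public static void main(String[] args) {
        OrderBuilder builder = fromSignal("GOOG", 100);
        // The Order is only created here, once every value has been set.
        Order order = withExecutionDetails(builder, "BUY", 50.0).build();
        System.out.println(order);
    }
}
```

Each step only needs to know about the values it sets, and nothing has to hold a half-initialized Order in the meantime.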

Resources

https://dzone.com/articles/design-patterns-the-builder-pattern

Statistical Arbitrage Portfolio Managers

February 9, 2021 ~ merlion

Statistical arbitrage is a type of short-term trading that trades a diverse portfolio of securities, typically holding positions for anywhere from a few seconds to a few days.

Statistical Arbitrage Portfolio Managers fall into three categories:

Latency Critical

Trading Style: High Frequency Trading or Market Making
Technical: High
Trading Workflow: Owns the end-to-end system, including deployment
Programming Languages: C/C++
Latency Sensitivity: High
Challenges: Infrastructure setup

Systematic

Trading Style: Trading, e.g. Volatility, Pairs
Technical: Medium
Trading Workflow: Owns everything except operations/production support
Programming Languages: Java, Python
Latency Sensitivity: Medium
Challenges: Scaling and platform monitoring

Alpha Models

Trading Style: Modelling
Technical: Low
Trading Workflow: Owns generating the list of stocks to trade, but not the trading itself
Programming Languages: Python, R
Latency Sensitivity: Low
Challenges: Good data; replication and backtesting of the model

In a future post, we will go over some strategies and products that are important to stat arb portfolio managers.

Tech Analogies: Buffers

February 9, 2021 ~ merlion

If you live in a home, you receive mail. Your mailman delivers mail within a certain window, let’s say 9 AM to 12 PM. During that time, you are at work and can’t be home, so your mailman drops the mail into the mailbox. When you come home at 4 PM, you pick up your mail from the mailbox. That mailbox is analogous to a buffer in computer science. If you did not have a mailbox, you would need to be home to receive the mail, so the buffer allows you to do other work and pick it up when you are ready.

Let’s say your best friend wants to send a 20-page essay to your home. However, an envelope can only fit 10 pages. So your friend has to send the essay in two envelopes (2 envelopes * 10 pages max = 20 pages). That envelope size of 10 pages is analogous to the maximum transmission unit (MTU): the largest chunk that can be sent in one piece.
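The envelope arithmetic can be sketched in a few lines of Java. This is only an illustration of the analogy, not a real networking API; the class name Envelopes and the method split are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class Envelopes {

    // Splits the pages into envelopes holding at most maxPagesPerEnvelope pages each.
    // The last envelope may be only partly full.
    static List<List<Integer>> split(List<Integer> pages, int maxPagesPerEnvelope) {
        List<List<Integer>> envelopes = new ArrayList<>();
        for (int start = 0; start < pages.size(); start += maxPagesPerEnvelope) {
            int end = Math.min(start + maxPagesPerEnvelope, pages.size());
            envelopes.add(new ArrayList<>(pages.subList(start, end)));
        }
        return envelopes;
    }

    public static void main(String[] args) {
        List<Integer> essay = new ArrayList<>();
        for (int page = 1; page <= 20; page++) {
            essay.add(page);
        }
        // A 20-page essay with a 10-page limit needs two envelopes.
        System.out.println(split(essay, 10).size());
    }
}
```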

Your friend sends the essay in two envelopes. At the same time, you decide to go on a one-week vacation. During that time, all your friends send birthday cards to your home. Your mailman delivers the birthday cards every day until one day your mailbox is full and can no longer take any mail. What should happen next? Should the mailman leave the mail on the floor, come back the next day, or remove the old mail and throw it away? The answer you pick is called a congestion strategy.
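Here is a minimal sketch of the full-mailbox question, using ArrayBlockingQueue from the Java standard library as the fixed-size buffer. The Mailbox class and its method names are invented for this example, and it shows only two of the possible strategies: refusing new mail ("come back the next day") and throwing away the oldest mail.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class Mailbox {

    // The buffer: holds at most `capacity` letters.
    private final ArrayBlockingQueue<String> slots;

    Mailbox(int capacity) {
        this.slots = new ArrayBlockingQueue<>(capacity);
    }

    // Strategy 1: if the mailbox is full, refuse the letter.
    // Returns false so the mailman knows to try again later.
    boolean deliverOrReject(String letter) {
        return slots.offer(letter);
    }

    // Strategy 2: if the mailbox is full, throw away the oldest letter to make room.
    void deliverDroppingOldest(String letter) {
        while (!slots.offer(letter)) {
            slots.poll();
        }
    }

    // The reader picks up the oldest letter, or null if the mailbox is empty.
    String pickUp() {
        return slots.poll();
    }

    public static void main(String[] args) {
        Mailbox mailbox = new Mailbox(2);
        mailbox.deliverOrReject("card from Ann");
        mailbox.deliverOrReject("card from Bob");
        boolean accepted = mailbox.deliverOrReject("card from Cy");
        System.out.println("third card accepted? " + accepted);

        mailbox.deliverDroppingOldest("card from Dee");
        System.out.println("first card picked up: " + mailbox.pickUp());
    }
}
```

Neither strategy is right in general: rejecting pushes the problem back to the sender, while dropping the oldest keeps the freshest data at the cost of losing some.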

T Shirt Sizes

February 7, 2021 ~ merlion

People are not good at estimating tasks. This is evident in the many delayed projects all over the world, from companies like IBM to large government infrastructure projects like the NYC Subway. The same is true in software development. There will be unforeseen edge cases and operational tasks that add time to the project.

Instead of trying to work out a precise estimate, we can use the Fibonacci estimation system to give a general ballpark. Each task is assigned one of the Fibonacci numbers: 1, 2, 3, 5, 8. The team members vote on the estimate and the majority wins. Although some tasks are underestimated and others overestimated, it tends to average out over time.

The fibonacci numbers can be categorized as T-Shirt sizes as follows:

X-Small – 1 Day
Small – 2 Days
Medium – 3 Days
Large – 5 Days
X-Large – 8 Days

We can also drop the 2-day size to simplify the model:

Small – 1 Day
Medium – 3 Days
Large – 5 Days
X-Large – 8 Days

Estimation is an important task: it helps with planning across projects and gives visibility across the chain.

Underestimate and Over Deliver

Java Performance Tip: For Loops

February 7, 2021 ~ merlion

A for loop is used to iterate over a sequence of elements.

There are a few ways to write a for loop in Java:

1. Traditional For Loop

for (int i = 0; i < nums.size(); i++) {
  System.out.println(nums.get(i) + 1);
}

2. Enhanced For Loop

for (Integer num: nums) {
  System.out.println(num + 1);
}

3. Streams

nums.stream().forEach(num -> System.out.println(num + 1));

Doing a quick performance test using System.nanoTime():

        long start = System.nanoTime();
        // for loop code
        long end = System.nanoTime();
        System.out.println(end - start);

The results (elapsed time in nanoseconds) were: Option #1 = 3511729, Option #2 = 3701567, Option #3 = 3790359

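Option 1 performed the best in this run, although Options 2 and 3 were only about 5% and 8% slower, and a single timing like this is noisy. The likely cause is that Options 2 and 3 create an iterator (and, for streams, the pipeline objects), generating garbage that adds to your GC time. Note that the unboxing of the Integer happens in all three options, so it does not explain the difference.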

Sometimes the simplest, old-fashioned way of doing things is the way to go.

It is important to benchmark and profile your code to understand the bottlenecks. When doing performance improvements, attack the largest bottlenecks before doing micro-optimizations. Additionally, before looking at your Java code, make sure your hardware is optimized to the fullest, as that is usually the largest bottleneck. For one of our projects, we worked through the largest bottlenecks first, until the for loop optimization became one of the largest remaining performance boosts.
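For a slightly more repeatable comparison than a single System.nanoTime() measurement, here is a rough sketch that warms each loop up first and then keeps the best of several runs. It is an assumption-laden stand-in, not the test used for the numbers above: the loops sum the values instead of printing them (so console I/O does not dominate the timing), and the class name, run counts and list size are arbitrary. A dedicated harness such as JMH is the more rigorous option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

public class LoopTiming {

    private static final int WARMUP_RUNS = 20;
    private static final int MEASURED_RUNS = 20;

    // Written to after every run so the JIT cannot discard the loop as dead code.
    static volatile long sink;

    static long indexedLoop(List<Integer> nums) {
        long sum = 0;
        for (int i = 0; i < nums.size(); i++) {
            sum += nums.get(i) + 1;
        }
        return sum;
    }

    static long enhancedLoop(List<Integer> nums) {
        long sum = 0;
        for (Integer num : nums) {
            sum += num + 1;
        }
        return sum;
    }

    static long streamLoop(List<Integer> nums) {
        return nums.stream().mapToLong(num -> num + 1).sum();
    }

    // Runs the loop a few times untimed, then returns the fastest timed run in nanoseconds.
    static long bestNanos(ToLongFunction<List<Integer>> loop, List<Integer> nums) {
        for (int i = 0; i < WARMUP_RUNS; i++) {
            sink = loop.applyAsLong(nums);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < MEASURED_RUNS; i++) {
            long start = System.nanoTime();
            sink = loop.applyAsLong(nums);
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Integer> nums = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            nums.add(i);
        }
        System.out.println("indexed:  " + bestNanos(LoopTiming::indexedLoop, nums));
        System.out.println("enhanced: " + bestNanos(LoopTiming::enhancedLoop, nums));
        System.out.println("stream:   " + bestNanos(LoopTiming::streamLoop, nums));
    }
}
```

The numbers will vary by machine and JVM, so treat them as a hint about where to profile rather than a verdict.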

Resources:

https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html

Alternative Data: Credit Card Transactions

February 7, 2021 ~ merlion

Alternative data is data from non-traditional sources that is used to gain insight during the investment process. At hedge funds, this type of data has become critical for making trading decisions.

One example is credit card transactions. Yodlee is a software company that provides anonymous credit card transactions. Those never-ending receipts that you get at your local supermarket or retailer are useful to predict the revenue of a company.

The workflow goes as follows:
1. Yodlee provides the raw data. This can be delivered to a cloud store such as AWS S3.

2. The raw data is loaded into a data warehouse such as AWS Redshift, Panoply or Teradata.

3. The raw data is cleaned by removing duplicates and enhanced by adding new transaction descriptions.

4. The data is then tagged by matching descriptions to companies. This is not a trivial process as products can be named differently at different stores. In this step, Regex Tagging is the simplest form of tagging technique if you know what set of rules to apply. However, the descriptions can change daily and new products can be on the market that your set of rules may not be aware of. This is where ML Tagging comes in to help tag companies by using machine learning and training data to automatically label your data. A human can provide feedback to the machine learning model to help it learn and get better at tagging. This step can be done in a data processing system such as AWS EMR.

5. Once the data is tagged, we can provide useful information by aggregating prices by time and other metrics.

6. Clients can consume the data through an API or subscribe for notifications on specific metrics when available.

Loading -> Cleaning -> Enhancing -> Tagging -> Aggregation -> Consumption
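To make the regex tagging and aggregation steps concrete, here is a minimal sketch in Java. Everything in it is made up for illustration: the company names (ACME, GLOBEX), the patterns, the Txn class and the sample descriptions are assumptions, not Yodlee's data format. Transactions that no rule matches fall into an UNTAGGED bucket, which is where ML tagging or human review would take over.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class RegexTagger {

    // Hypothetical transaction record: a raw description and the amount spent.
    static class Txn {
        final String description;
        final double amount;

        Txn(String description, double amount) {
            this.description = description;
            this.amount = amount;
        }
    }

    // Hypothetical tagging rules: company tag -> pattern found in raw descriptions.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ACME", Pattern.compile("\\bACME\\s+COFFEE\\b", Pattern.CASE_INSENSITIVE));
        RULES.put("GLOBEX", Pattern.compile("\\bGLOBEX\\s*MART\\b", Pattern.CASE_INSENSITIVE));
    }

    // Tagging: returns the first company whose pattern matches, or empty if none does.
    static Optional<String> tag(String description) {
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            if (rule.getValue().matcher(description).find()) {
                return Optional.of(rule.getKey());
            }
        }
        return Optional.empty();
    }

    // Aggregation: total spend per company tag.
    static Map<String, Double> totalByCompany(List<Txn> txns) {
        Map<String, Double> totals = new TreeMap<>();
        for (Txn txn : txns) {
            String company = tag(txn.description).orElse("UNTAGGED");
            totals.merge(company, txn.amount, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Txn> txns = Arrays.asList(
                new Txn("ACME COFFEE #1234 NEW YORK", 4.50),
                new Txn("acme  coffee #0042", 5.50),
                new Txn("GLOBEXMART ONLINE", 20.00),
                new Txn("CORNER DELI", 3.00));
        System.out.println(totalByCompany(txns));
    }
}
```

The weakness described in step 4 shows up directly here: a new description format means adding or editing a pattern by hand.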

Hedge funds can use this data to predict whether a company will miss, meet or beat analysts’ expected earnings. If the data correctly predicts that the company will beat expected earnings, the hedge fund can start buying the stock in the run-up to the quarterly earnings call.

Resources

https://www.yodlee.com/

https://aws.amazon.com/sagemaker/groundtruth/what-is-data-labeling/

How catastrophizing can help build better products

January 20, 2021 ~ merlion

**Catastrophizing** *is when someone assumes that the worst will happen.*

Everyone has moments when they are afraid to do something because they fear the worst that could happen: for example, skydiving, asking a girl out, or traveling to a remote region with little access to resources.

This mindset has a negative impact on your life and the recommended advice is to shift your mindset to think about the best thing that could happen.

A **catastrophizing** mindset can actually help build robust technology products. If you design and develop a product with only the best possible case in mind, then when things go wrong you will be in a deep hole, scrambling for a way to get out. As Murphy’s law states: “Anything that can go wrong will go wrong”. If you think about the worst possible case, you will be better prepared and have a more resilient application.

Recently, I was on support for a couple of weeks while a new application was rolling out to multiple regions and supporting multiple use cases. There were three incidents in which the application failed to perform its duties or, even worse, corrupted the data. When the time came to remediate the problems, there was no backup plan or emergency button. As a result, we had to scramble to think of solutions on the spot and write custom scripts right then and there. Worse still, one incident impacted external parties who were depending on our data to be correct, and we were under time pressure to deliver the corrected data. Had we thought about the worst case, we would have had defensive measures in place to prevent the issue, and where the issue was not preventable, backup plans in place to remediate it.

When a team has a string of incidents, it is an indication to revisit your product and figure out the ways that your application can fail. Consider holding war games to battle test your product.

Make More Mistakes!

January 5, 2021 ~ merlion

The other day, we received an email from a quant saying that some parameters for an algo had been reverted to their previous values. The algo had been running with the old values for over two weeks.

Upon digging into the root cause, I discovered that the analyst had made a mistake during an application upgrade and the data got reverted. I spoke with him, not to yell at him for the mistake but to understand why it occurred and to discuss how we can prevent it in the future. I also explained the impact of the mistake so that everyone understands its severity.

Management was surprised and probably disappointed. However, as long as you can explain the root cause and how to prevent or reduce the chances of it happening again, that is OK. Humans make mistakes, and typically it is a process issue that resulted in the error.

The most important thing you can provide as a manager to your team is psychological safety – a safe place to learn and grow. Take responsibility as a manager and shield your team.

After all the dust settled – I told the analyst to keep making mistakes because the person who makes the most mistakes wins.

Blame Sink

December 28, 2020 ~ merlion

Leadership Tip: Be a blame sink

During the course of your professional or academic career, you will need to work with various teams to achieve your objectives. Issues and problems will arise due to human, operational or technical errors. Often they will occur at a sensitive time.

As a leader, you should rise above your ego and not look to blame others or immediately look for a root cause even when you or your team did not cause it. Instead be a blame sink – someone who absorbs all the blame in order to progress forward. Once you say: “hey everyone, that was my bad – what’s the next step or how can I help?”, you will find that everyone else will drop their ego and even say it was not your fault and come together to find a solution.

Story time
One time, I had to work with teams in Asia and the US. During the call, everyone was blaming each other over an action that one team had taken and another team believed should not have been taken. As the application owner, I stepped in and said that we should have communicated it better, and that what mattered at that moment was not what the correct process should have been but what we needed to do right now to fix the problem. Right after I said that, everyone agreed we should be discussing the solution and started talking about the next step. You will find that some people have strong opinions and pride that they can’t let go of because they are caught up in their emotions. To reel them back in, be a blame sink. By doing so, everyone will come together as a team and achieve your objectives.