Jekyll2022-01-28T17:20:17+11:00https://miere.observer/feed.xmlMiere’s Personal ObservationsMiere's Personal ObservationsAnalysis: Prioritizing Technical Debt - Part I2021-04-20T00:00:00+10:002021-04-20T00:00:00+10:00https://miere.observer/engineering/2021/04/20/Prioritizing-Technical-Debt-as-if-Time-and-Money-Matters<header class="briefing"> <div class="video"> <iframe class="video" id="youtube-video" src="https://www.youtube-nocookie.com/embed/fl4aZ2KXBsQ?enablejsapi=1&amp;modestbranding=1" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div> <p> This post is part I of the video analysis in which I'll elaborate on a few points made by <a href="https://empear.com/blog/">Adam Tornhill</a> in a talk presented at the 2019 GOTO Conference in Copenhagen. As he addressed several topics, I've decided to tackle it on different fronts.</p> <p> In this initial post I'll cover the first half of the video, where he approaches technical debt found within a single code base (or repository, if you will). There are several points in his presentation that, I believe, are spot on and deserve to be transcribed and better explained. <i>Technical Debt</i> as a concept is usually underrated by many (if not most) big companies. The amount of money and man-hour effort spent fixing problems <a href="https://softwareengineering.stackexchange.com/questions/133824/is-it-significantly-costlier-to-fix-a-bug-at-the-end-of-the-project">is often reported</a> to be an order of magnitude higher than the cost of avoiding them before they reach production.</p> </header> <article class="timeline"> <section> <time>0:37</time> <h2>Lehman's Laws</h2> <p>Adam uses <a href="https://en.wikipedia.org/wiki/Lehman%27s_laws_of_software_evolution">Manny Lehman's Laws of Software Evolution</a> as the foundation of his argument throughout the talk
<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup>:</p> <ul> <li><b>Continuing Change</b>: "a system must be continually adapted or it becomes progressively less satisfactory"</li> <li><b>Increasing Complexity</b>: "as a system evolves, its complexity increases unless work is done to maintain or reduce it"</li> </ul> </section> <section> <time>2:20</time> <h2>One shall focus on the real issue</h2> <p> As he points out, there's an inherent conflict between these two laws. <i>"On one hand, we have to continuously evolve and adapt our system but as we do, its complexity will increase which makes it harder and harder to respond to change"</i>. The two laws are so intertwined that they prevent us from perceiving the problem as it is, leading us to tackle <i>"the symptoms more than the root cause"</i>. </p> <p> What he has experienced is similar to what I've witnessed in the past few years. If we, as technical and business leaders, only monitor the tickets we have in the backlog, we might not be able to identify the impact that technical debt imposes on our users. Therefore, to actually solve the problem, we need to understand the underlying issue - the root cause. It might mean revisiting a piece of software, a component design, or even how we approach the interaction with the user. </p> <p> It is worth mentioning that Tornhill is not advocating perfectionism. In general, users don't mind living with one or two glitches in the app as long as they don't impact their work routine. But as we neglect those technical debts, the time to release new features or fix bugs grows bigger (and faster) than the symptoms experienced by the users. </p> <p> The clear challenge for product and engineering teams is finding a balance between mitigating technical debt and ensuring the software still delivers a valuable experience to its users. 
</p> </section> <section> <time>6:55</time> <h2>The perils of quantifying technical debt</h2> <blockquote class="quote"> <p>We really have to consider what kind of behaviour do we reinforce by putting our quantitative goal on technical debt. [...] People like us (developers) will optimise for what we are measured on. That most likely means that we're going to pick not only the simplest task, but we're going to pick tasks that we're comfortable with. [...] That also means that we lack the most important aspect of the technical debt: we lack the business impact.</p> </blockquote> <p>It's safe to say that, at this point, Adam addresses the biggest wound in the current state of the software development industry. We've been putting much more effort into the results (having fewer tickets in the backlog) than into the outcomes (delivering more value, or providing a better experience, to our users). By simply removing those issues from our sight we neglect the root cause of the problem, and nothing guarantees our software won't behave poorly in the foreseeable future.</p> </section> <section> <time>10:50</time> <h2>Assessing the technical debt</h2> <p>Cyclomatic complexity<sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup> is probably the most common tool used to identify whether or not a piece of code needs to be revisited due to its complexity. Adam points out, though, that "code complexity is only a problem when we have to deal with it". There's no point in optimizing a complex piece of code that is unlikely to change in the foreseeable future.</p> <p>We can combine Cyclomatic Complexity with Code Change Frequency<sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> analysis. Together they give us a different perspective on how we've been dealing with the software, as we can use them to list which sections of it have been most frequently changed. 
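As a rough sketch of how the two metrics can be combined into a single hotspot ranking (the file names and numbers below are illustrative, not taken from the talk), one could multiply each file's change frequency by its cyclomatic complexity and sort:

```python
# Illustrative hotspot ranking: files that are both complex and frequently
# changed float to the top; simple or rarely-touched files sink to the bottom.
files = {
    "billing.py": {"changes": 42, "complexity": 19},
    "utils.py":   {"changes": 57, "complexity": 3},
    "report.py":  {"changes": 4,  "complexity": 25},
}

hotspots = sorted(
    files.items(),
    key=lambda item: item[1]["changes"] * item[1]["complexity"],
    reverse=True,
)

for name, metrics in hotspots:
    print(name, metrics["changes"] * metrics["complexity"])
```

Here `billing.py` would be the refactoring candidate: `utils.py` changes more often but is trivial, and `report.py` is complex but rarely touched.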
This metric alone wouldn't be very useful; after all, why should we optimise something that is not complex at all?</p> <p>To better understand the process, Adam proposes that we use our VCS tool to list the most frequently changed files over a given period of time. Once you've spotted a file that has been constantly changed, you can submit each of its revisions (as taken from the VCS repository) to a complexity analysis. The result is a time series of data that we can plot as a Complexity Trend chart - as illustrated by the picture below.</p> <p class="image"> <img src="https://codescene.io/docs/_images/ComplexityTrendSingleSample.png" /> <small>Picture taken from <a href="https://codescene.io/docs/guides/technical/complexity-trends.html">codescene.io</a></small> </p> </section> <section> <time>16:20</time> <h2>Hotspots X-Ray</h2> <p>Refactoring a given unit of code can be approached in a multitude of ways. Adam emphasised that we should never start a refactoring process without taking into account how the team, as a whole, interacts with the identified problematic code.</p> <p>Usually, a big refactor might be too impactful to be applied straight to the main branch. It implies that we might need to branch those modifications out. The risk is clear: if the main branch is modified more frequently than the refactoring branch, the latter might never be merged back again.</p> <p>Reducing the refactoring scope is key to succeeding here. The speaker suggests narrowing down the analysis of our problematic code base to the function/method level. His ingenious suggestion allows us to deliver valuable enhancements to our software within days rather than weeks (in case you're refactoring a humongous class) or even years (if we've decided to completely overhaul the system).</p> </section> </article> <div class="footnotes"> <ol> <li id="fn:1"> <p>Manny Lehman wrote a set of articles in the 1970s. 
He later consolidated these articles in 1980 in a paper titled "On Understanding Laws, Evolution, and Conservation in the Large-Program Life Cycle". <a href="https://doi.org/10.1016%2F0164-1212%2879%2990022-0">doi:10.1016/0164-1212(79)90022-0</a></p> </li> <li id="fn:2"> <p><a href="https://en.wikipedia.org/wiki/Cyclomatic_complexity">Cyclomatic complexity</a> is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.</p> </li> <li id="fn:3"> <p>Code Change Frequency is a measure used to identify the files that have been most frequently changed. If you are curious to learn how you can address this, you can check out this <a href="https://github.com/bcarlso/defect-density-heatmap"> really interesting repository</a>.</p> </li> </ol> </div>Understanding the Rule of Three2020-06-08T00:00:00+10:002020-06-08T00:00:00+10:00https://miere.observer/engineering/2020/06/08/Understanding-the-Rule-of-Three<p>There is this subconscious understanding that unnecessary complexity is the root of all evil in software development. 
We have been <a href="https://twitter.com/search?q=dependency%20hell&amp;src=typed_query">fiercely ranting about the dreadful experience</a> caused by dependency hell. We’ve been deriding legacy code as a <a href="https://s3.amazonaws.com/systemsandpapers/papers/bigballofmud.pdf">big ball of mud</a> because nobody can honestly understand how it works. Fearlessly, we’ve been <a href="https://twitter.com/kumar_abhirup/status/1267777496524574721">shushing our developer mates</a>, arguing that they’ve made the wrong coding choices.</p> <p>It is funny how Machiavellian we can become in the name of simplicity. Surely, we can have better days with fewer cyclic dependencies, and we can probably adopt better design patterns or even avoid syntax that ordinary people can’t easily read in our source code. But, to remove complexity from our daily routine, we need a systematic approach that helps us weigh up the benefits of every piece of code we introduce in our software.</p> <h2 id="the-rule-of-three">The rule of three</h2> <p class="image right-panel no-background"> <img src="https://imgs.xkcd.com/comics/automation.png" /> <small>Source: <a href="https://xkcd.com/1319/">xkcd</a></small> </p> <p><strong>Benefit-first</strong> <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">1</a></sup> is a technique to guide the decisions you make when you develop software. Basically, before introducing any piece of code into your software, you have to list the pros and cons of adopting it and analyse the impact of maintaining it in the short and long term. If the benefit of introducing it is greater than the cost of maintaining it, then it is probably safe to proceed.</p> <p>The first piece of advice I’ve ever read that advocated a similar approach came from Martin Fowler in his iconic book <a href="https://archive.org/details/isbn_9780201485677">Refactoring</a>. 
According to him, we should adopt the <a href="https://en.wikipedia.org/wiki/Rule_of_three_(computer_programming)">rule of three</a> to drive our refactoring decisions, avoiding overcomplicated strategies that might need to <a href="https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction">be rethought in the future</a>. As the book became well known and relevant, the Rule of Three has been constantly repeated on the internet with subjective definitions and, in some cases, with no clear steps to reproduce the technique.</p> <h2 id="distinguishing-chance-coincidence-and-trend">Distinguishing Chance, Coincidence and Trend</h2> <p>My statistics professor once taught me a similar concept: identifying trend behaviour. Leaving all mathematical formulas aside, I would reproduce his reasoning and translate it to the software development world as follows:</p> <ul> <li><strong>Take a chance and design the simplest solution to your problem</strong>. As you have a simple piece of code, testing it takes less time, and it will be easier to adapt in case your needs change.</li> <li><strong>Treat further changes as a coincidence</strong>. We have this instinct to introduce generic solutions to problems that might happen in the future. A premature generalization might require further refactoring, or might even be removed altogether and replaced by a different approach. Sometimes, a simple if/else statement is our best choice.</li> <li><strong>A good hint for identifying trend behaviour is the introduction of duplicated pieces of code</strong>. 
That’s the perfect opportunity to introduce abstractions that not only remove these duplications but also allow developers to extend the current logic without modifying it <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>.</li> </ul> <p>By looking at these rules it is clear that they focus on software design, diverging from the original proposal of the Rule of Three as a technique to help you refactor your code wisely. <em>By adopting it in the early stages of development, we favour simple, evolving code over a reactive trial-error-refactor approach</em>, where complex solutions are introduced only when we need to solve a complex problem.</p> <h2 id="final-thoughts">Final Thoughts</h2> <p>Sometimes the requirements are clear enough that you can jump to the complex solution straight away, but that is not usually the case. The above mindset is an attempt to narrow down the scope of the Rule of Three and make it more reproducible on a daily basis. As a bonus hint, I’d like to leave here another definition that approaches the Rule of Three in a totally different way. I found it when I was reviewing the references for this article, and it has been echoing in my mind for a while now. 
Hopefully it will be just as enlightening for you as it was for me.</p> <blockquote class="quote"> <p>There are two "rules of three" in [software] reuse:</p> <ul> <li>It is three times as difficult to build reusable components as single use components, and</li> <li>a reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.</li> </ul> <footer> <cite>- Fact #18 from <a href="https://www.amazon.com/Facts-Fallacies-Software-Engineering-FORGOT-ebook-dp-B001TKD4RG/dp/B001TKD4RG">Facts and Fallacies of Software Engineering</a></cite> </footer> </blockquote> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:3" role="doc-endnote"> <p>Benefit-first isn’t a term coined by me. Although I’ve never seen any specific mention of this term in a book or a lecture, I heard about it a few years ago from my <a href="https://twitter.com/anielson">first mentor</a> and I’ve been using it since then. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>The <a href="https://stackify.com/solid-design-open-closed-principle/">Open-Closed Principle</a> is the perfect technique for this job as, by definition, it closes methods or classes for modification by introducing customization points that allow them to be extended from outside. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>There is this subconscious understanding that unnecessary complexity is root of all evil in software development. We have been fiercely ranting about the dreadful experience caused by the dependency hell. We’ve been detracting legacy code as big ball of mud because nobody can honestly understand how it works. 
Fearlessly, we’ve been shushing our developer mates arguing that they’ve done the wrong coding choices.Analysis: Thinking Asynchronously2020-06-05T00:00:00+10:002020-06-05T00:00:00+10:00https://miere.observer/engineering/2020/06/05/Analisys-Thinking-Asynchronously<header class="briefing"> <div class="video"> <iframe class="video" id="youtube-video" src="https://www.youtube.com/embed/V_tHVUHKqZQ?enablejsapi=1&amp;modestbranding=1" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div> <p>This is an opinionated transcription of <a href="https://twitter.com/edjgeek">Eric Johnson's</a> talk <b>Thinking Asynchronously</b>. He presented it at the 2020 GOTO Conference, held in an online edition because of the COVID pandemic. His straightforward presentation style guides us through steps that take advantage of asynchronous persistence pipelines to provide a better experience to our users. It is a great opportunity for newcomers to understand what AWS wants to achieve with serverless from now on. I took the opportunity to elaborate on a few services he used in his talk to give more context.</p> </header> <article class="timeline"> <section> <time>2:45</time> <h2>Common Serverless Pattern</h2> <p>A typical serverless application mimics the classic three-tier architecture. The API layer is, as naturally happens, responsible for security and routing, while the compute layer holds everything else you need to persist your data into the storage layer. From Eric's perspective this comes with a concerning trade-off: if something goes wrong, it will probably fail in your code, as it is the most vulnerable building block of your architecture.</p> </section> <section> <time>5:24</time> <h2>Thinking Asynchronously</h2> <p>Eric proposes that we persist the data before we apply any computation to it. This brings a few major benefits over the traditional approach:</p> <ul> <li><b>Greater reliability</b>. 
In case of failure in our codebase, our data has already been persisted.</li> <li><b>Faster response times</b> in our APIs. By moving the extra computation to a second step, the user receives feedback in the UI sooner.</li> <li><b>We can do more in less apparent time to the client</b>. As the complex computation is now the last thing to do, our persistence pipeline has more room to process data with no apparent impact on the user's experience.</li> </ul> <p>One might argue that you could squeeze bits and bytes of your code to provide a similar result. But the pillar of Eric's approach lies in how it increases flexibility while reducing the response time on the API side. After all, it's a well-known fact that <a href="https://www.nngroup.com/articles/response-times-3-important-limits/">better response times imply a better user experience</a>.</p> </section> <section> <time>10:14</time> <blockquote> <p>Well, we talk about serverless we look at "what is serverless?" and basically I meant serverless is: <b>something happens, we react and do something.</b></p> </blockquote> </section> <section> <time>10:45</time> <h2>Event Driven Patterns</h2> <p>Event-Driven Development is the key to making the suggested approach work. AWS-wise, there's a <a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-services.html">multitude of events that you can listen to</a> with an AWS Lambda function. Of course, you can also take advantage of AWS messaging services like SNS, SQS and Kinesis to consume asynchronous events on your Docker container or application instance.</p> </section> <section> <time>12:06</time> <h2>Amazon API Gateway</h2> <p>I'd like to draw your attention to this versatile service available in AWS, and the idea behind its conception. 
API Gateway was first <a href="https://aws.amazon.com/about-aws/whats-new/2015/07/introducing-amazon-api-gateway/">introduced in 2015</a>. It communicates directly with 100+ AWS services, allowing you to transform request and response payloads with the Apache Velocity Template Language (VTL). It's commonly used as a serverless REST API, allowing developers to configure HTTP routes at a higher level of abstraction in which you don't have to provision resources to handle the request - matching Eric's personal definition of serverless. Requests received by the API Gateway are translated into events, allowing you to route them directly to any compatible AWS service, like DynamoDB or Lambda.</p> <p>API Gateway is also <a href="https://microservices.io/patterns/apigateway.html">a well known pattern</a>. A few years ago, the Netflix OSS team <a href="https://netflixtechblog.com/embracing-the-differences-inside-the-netflix-api-redesign-15fd8b3dc49d">introduced their own API Gateway solution</a>. It was designed with a few key philosophies in mind, "each of which is", in their words, "instrumental in the design of our new system":</p> <ul> <li>Embrace the Differences of the Devices</li> <li>Separate Content Gathering from Content Formatting/Delivery</li> <li>Redefine the Border Between “Client” and “Server”</li> <li>Distribute Innovation</li> </ul> <p>In fact, the problem it solves is widely known among teams handling a large fleet of microservices. GraphQL, for instance, approaches these problems from a different perspective, and <a href="https://en.wikipedia.org/wiki/GraphQL">has been in internal use at Facebook since 2012 as well</a>. Since Netflix's blog post, several other approaches have been designed as alternatives to a custom-brewed API Gateway solution. 
<a href="https://www.krakend.io/">KrakenD</a> is a fairly popular and feature-rich stateless API Gateway - it might be a good tool for <a href="https://serverless-training.com/articles/save-money-by-replacing-api-gateway-with-application-load-balancer/">those situations where the cost of AWS API Gateway is an issue</a>.</p> </section> <section> <time>12:06</time> <h2>Amazon DynamoDB</h2> <p>DynamoDB is an underrated, fascinating <a href="https://www.techopedia.com/definition/29431/database-as-a-service-dbaas">database service</a>. The description on its website doesn't do justice to its capabilities: <sup><a href="#fn:1" class="footnote">1</a></sup> </p> <ul> <li><b>Key-value data store</b> - It fits perfectly as a persistence layer for tasks that require intensive write throughput - being especially good for time-series data or document persistence.</li> <li><b>Expirable entries</b> - You can define a Time To Live (TTL) value to arbitrarily expire entries in your tables.</li> <li><b>Global tables</b> - DynamoDB can manage tables accessible (replicated) globally.</li> <li><b>In-memory Acceleration with DAX</b> - It acts as a mix of near cache and table space for frequently accessed data. Pricey, but it might be worth it if you take into account the cost of maintaining such a mechanism yourself.</li> </ul> <p>Surely, DynamoDB's <a href="https://aws.amazon.com/dynamodb/">feature list</a> is more extensive than that. But the ones mentioned above are ingredients for a multitude of scalable recipes for problems you might face in your daily routine. From <i>The Poor Man's <a href="https://microservices.io/patterns/data/event-sourcing.html">Event Sourcing</a> Tool</i> <sup><a href="#fn:2" class="footnote">2</a></sup> to a <i>Globally Distributed Ordered Queue</i>, it is the kind of Swiss Army knife you want to have in your toolbox when you have a complex situation to tackle. 
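To illustrate the expirable-entries feature mentioned above: DynamoDB deletes an item once the epoch timestamp stored in the table's designated TTL attribute is in the past. The table and attribute names below are hypothetical, and the actual write call (sketched in a comment) would go through boto3's `Table.put_item`:

```python
import time

def ttl_epoch(days_from_now: int) -> int:
    """Epoch timestamp DynamoDB compares against when expiring items."""
    return int(time.time()) + days_from_now * 86400

# Shape of an item for a hypothetical "events" table whose TTL attribute
# is named "expires_at". With boto3 this would be written roughly as:
#   boto3.resource("dynamodb").Table("events").put_item(Item=item)
item = {
    "pk": "user#42",
    "payload": "some document",
    "expires_at": ttl_epoch(7),  # removed by DynamoDB roughly a week from now
}
```

The expiry is not instantaneous - DynamoDB sweeps expired items in the background - but combined with stream listeners it is enough for the queue- and scheduler-like recipes described here.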
You can even create <a href="https://medium.com/swlh/scheduling-irregular-aws-lambda-executions-through-dynamodb-ttl-attributes-acd397dfbad9">your own ad-hoc scheduling mechanism</a>, allowing you "to schedule an irregular point of time execution of a lambda execution without abusing CloudWatch crons".</p> <p>Inevitably, its simple key-value design introduces trade-offs: indexing is quite limited, you can't join tables, and you will probably spend a bit of time fine-tuning its read and write provisioning for an optimal cost <sup><a href="#fn:3" class="footnote">3</a></sup>. But its flexibility and simple API, along with thoughtfully designed persistence tables, might be an elegant and affordable solution for your company.</p> </section> <section> <time>14:20</time> <h2>Other "storage first" options</h2> <ul> <li><a href="https://aws.amazon.com/kinesis/">Amazon Kinesis</a></li> <li><a href="https://aws.amazon.com/s3/">Amazon Simple Storage Service (S3)</a></li> <li><a href="https://aws.amazon.com/sqs/">Amazon Simple Queue Service (SQS)</a></li> </ul> </section> <section> <time>24:28</time> <h2>Amazon EventBridge</h2> <p>This is another obscure but intriguing service available in the AWS portfolio. EventBridge is more than <a href="https://books.google.com.au/books?id=qR0hDgAAQBAJ&amp;pg=PA54&amp;lpg=PA54&amp;dq=old-fashion+service+bus&amp;source=bl&amp;ots=ngRwK-Xg27&amp;sig=ACfU3U34CuRKeKawirqsgr5YT-pRQwFCnw&amp;hl=en&amp;sa=X&amp;ved=2ahUKEwjam8Sov-_pAhUX4jgGHYhdCKcQ6AEwCXoECAoQAQ#v=onepage&amp;q=old-fashion%20service%20bus&amp;f=false">an old-fashioned Service Bus</a>: it is a blisteringly fast <a href="https://en.wikipedia.org/wiki/Decision_tree">decision tree</a> capable of translating inputs into actions. It can connect to basically everything, from AWS Lambda and AWS Step Functions to AWS Kinesis and AWS SQS. 
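To give a feel for that decision-tree behaviour, the sketch below mimics - in a deliberately simplified form - how an EventBridge event pattern selects events: each key in the pattern lists the values it accepts. The pattern and events are hypothetical, and real EventBridge patterns support far more operators than this subset:

```python
def matches(pattern: dict, event: dict) -> bool:
    """Very simplified EventBridge-style matching: every pattern key must be
    present in the event, and leaf values list the accepted alternatives."""
    for key, allowed in pattern.items():
        if isinstance(allowed, dict):
            # Nested patterns recurse into nested event fields.
            if not isinstance(event.get(key), dict) or not matches(allowed, event[key]):
                return False
        elif event.get(key) not in allowed:
            return False
    return True

# Hypothetical routing rule: react only to pending orders.
pattern = {"source": ["orders.service"], "detail": {"status": ["PENDING"]}}

assert matches(pattern, {"source": "orders.service", "detail": {"status": "PENDING", "id": 42}})
assert not matches(pattern, {"source": "payments.service", "detail": {"status": "PENDING"}})
```

In EventBridge, each rule carries such a pattern plus one or more targets (a Lambda function, a Step Functions state machine, a queue), so matching an event is what triggers the action.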
You can even use AWS SNS to trigger a subsequent HTTP request to an external service.</p> </section> <section> <time>31:03</time> <h2>Lambda Destinations</h2> <blockquote> <p>With Destinations, you can route asynchronous function results as an execution record to a destination resource without writing additional code. An execution record contains details about the request and response in JSON format including version, timestamp, request context, request payload, response context, and response payload. For each execution status such as Success or Failure you can choose one of four destinations: another Lambda function, SNS, SQS, or EventBridge. Lambda can also be configured to route different execution results to different destinations.</p> <footer> <cite>As we've reached the apex of this talk, Eric was so excited that his explanation was a bit disastrous <sup><a href="#fn:4" class="footnote">4</a></sup>, so I decided to quote an <a href="https://aws.amazon.com/blogs/compute/introducing-aws-lambda-destinations/">AWS Blog post about Destinations</a> instead.</cite> </footer> </blockquote> </section> </article> <div class="footnotes"> <ol> <li id="fn:1"> <p>"A fully managed proprietary NoSQL database service that supports key-value and document data structures. DynamoDB can handle more than 10 trillion requests per day and can support peaks of more than 20 million requests per second" - as seen on the <a href="https://aws.amazon.com/dynamodb/">Amazon DynamoDB</a> description page - 05/Jun/2020.</p> </li> <li id="fn:2"> <p>Depending on how you design your tables, TTL and stream listeners, it might be cheaper than spinning up and maintaining a Kafka cluster.</p> </li> <li id="fn:3"> <p>The pricing model is not the same pay-as-you-go model you find in most AWS services: instead of paying per request, you pay for the provisioned read and write capacity of your tables. 
It's true that, a while ago, Amazon introduced auto-scaling capabilities for table provisioning, but you still have to keep its pricing model in mind, otherwise you might run out of budget.</p> </li> <li id="fn:4"> <p>Eric's explanation about Lambda Destinations: "There is this really cool thing we announced last Re:Invent called Lambda Destinations. And the way this works is I can run a function and if it is successful than I can just trigger some data into EventBridge, Lambda SNS or SQS. Or if it is on fail, I can then trigger data into the same data."</p> </li> </ol> </div>Stuff Internet Says About Software Development #12020-05-22T00:00:00+10:002020-05-22T00:00:00+10:00https://miere.observer/news/2020/05/22/What-Internet-Says-About-Software-Development<p><em>This is my reading list from the past few days. I decided to put it here as it might be helpful to someone else. It was deeply inspired by the <a href="http://highscalability.com">HighScalability</a> blog, a source I’ve been consuming for years.</em></p> <hr /> <article class="internet-timeline"> <section> <h2>Microsoft all over the place</h2> <h3>Microsoft keeps pushing to become a major player in the Open Source community. Let's take a look at their majestic presence in the media recently.</h3> <blockquote class="quote"> <p>At Microsoft, 47,000 developers generate nearly <b>30 thousand bugs a month</b>. 
These items get stored across over 100 AzureDevOps and GitHub repositories. To better label and prioritize bugs at that scale, we couldn’t just apply more people to the problem.</p> <footer> <cite>- <a href="https://www.microsoft.com/security/blog/2020/04/16/secure-software-development-lifecycle-machine-learning/?itm_source=miere.observer"> Secure the software development lifecycle with machine learning </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Microsoft was on the wrong side of history when open source exploded at the beginning of the century, and I can say that about me personally.</p> <footer> <cite>- <a href="https://www.infoq.com/news/2020/05/rust-winrt-microsoft/?itm_source=miere.observer"> Rust/WinRT Brings Microsoft Closer to Adopting Rust Internally </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>A case study will be written on how Microsoft allowed Zoom to eat their lunch. They spent millions on subterfuge trying to paint Slack as an inferior enemy when MSFT Teams actually can't do what Slack does and Teams' real competitor was Zoom. Now Zoom has 300M Daily Users. 
Lol.</p> <footer> <cite>- <a href="https://twitter.com/chamath/status/1254522890373885952?s=12&amp;itm_source=miere.observer"> Chamath Palihapitiya </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Rust/WinRT lets you call any WinRT API past, present, and future using code generated on the fly directly from the metadata describing the API and right into your Rust package where you can call them as if they were just another Rust module.</p> <footer> <cite>- Microsoft president Brad Smith, taken from <a href="https://www.theverge.com/2020/5/18/21262103/microsoft-open-source-linux-history-wrong-statement?itm_source=miere.observer"> Microsoft: we were wrong about open source </a></cite> </footer> </blockquote> </section> <section> <h2>Rust on the Radar</h2> <h3>As we're talking about Rust, it seems that is not only Microsoft who's investing time and effort on it.</h3> <blockquote class="quote"> <p>There are many benefits a standardized ABI would bring to Rust. A stable ABI enables dynamic linking between Rust crates, which would allow for Rust programs to support dynamically loaded plugins (a feature common in C/C++). Dynamic linking would result in shorter compile-times and lower disk-space use for projects, as multiple projects could link to the same dylib. For example, imagine having multiple CLIs all link to the same core library crate.</p> <footer> <cite>- <a href="https://internals.rust-lang.org/t/a-stable-modular-abi-for-rust/12347?itm_source=miere.observer"> A Stable Modular ABI for Rust </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Programming is hard.</p> <p>Not because our hardware is complex, but simply because we’re all humans. Our attention span is limited, our memory is volatile — in other words, we tend to make mistakes.</p> <footer> <cite>- <a href="https://medium.com/coding-rust/why-rust-b539d0ea5b65?itm_source=miere.observer"> Why Rust? 
</a> by Omar Faroque</cite> </footer> </blockquote> <blockquote class="quote"> <p>The deno_core crate is a very bare bones version of Deno. It does not have dependencies on TypeScript nor on Tokio. It simply provides our Op and Resource infrastructure. That is, it provides an organized way of binding Rust futures to JavaScript promises. The CLI is of course built entirely on top of deno_core.</p> <footer> <cite>- <a href="https://deno.land/v1?itm_source=miere.observer"> Deno 1.0 </a> by Ryan Dahl, Bert Belder, and Bartek Iwańczuk</cite> </footer> </blockquote> </section> <section> <h2>Fowler and Friends</h2> <h3>It looks like a busy week for Martin Fowler and his friends. A new ThoughtWorks Radar was released, a few blog entries have been updated, and the man himself has coined another set of terms, adding to his legacy in software engineering.</h3> <blockquote class="quote"> <p>This division of development into lines of work that split and merge is central to the workflow of software development teams, and several patterns have evolved to help us keep a handle on all this activity. Like most software patterns, few of them are gold standards that all teams should follow.</p> <footer> <cite>- <a href="https://martinfowler.com/articles/branching-patterns.html?itm_source=miere.observer"> Patterns for Managing Source Code Branches </a> by Martin Fowler</cite> </footer> </blockquote> <blockquote class="quote"> <p>For this Radar, we decided to call out again infrastructure as code as well as pipelines as code, and we also had a number of conversations about infrastructure configurations, ML pipelines and other related areas.
We find that the teams who commonly own these areas do not embrace enduring engineering practices such as applying software design principles, automation, continuous integration, testing, and so on.</p> <footer> <cite>- <a href="https://www.thoughtworks.com/radar?itm_source=miere.observer"> ThoughtWorks' Technology Radar </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Coming to understand the threat model for your system is not simple. There are an unlimited number of threats you can imagine to any system, and many of them could be likely. [...] Cyber threats chain in unexpected, unpredictable and even chaotic ways. Factors to do with culture, process and technology all contribute. This complexity and uncertainty is at the root of the cyber security problem. This is why security requirements are so hard for software development teams to agree upon.</p> <footer> <cite>- <a href="https://martinfowler.com/articles/agile-threat-modelling.html?itm_source=miere.observer"> A Guide to Threat Modelling for Developers </a> by Jim Gumbley</cite> </footer> </blockquote> </section> <section> <h2>Other relevant quotes</h2> <h3>Hum, let me see... What else should be mentioned?</h3> <blockquote class="quote"> <p>Zoom scaled from 20 million to 300 million users virtually overnight.
What's incredible is from the outside they've shown little in the way of apparent growing pains, though on the inside it's a good bet a lot of craziness is going on.</p> <footer> <cite>- <a href="http://highscalability.com/blog/2020/5/14/a-short-on-how-zoom-works.html?itm_source=miere.observer"> A Short On How Zoom Works </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Besides being an interesting approach to a very common problem, their discussion of Piranha also provides some very interesting insights into an organization that's *heavily* invested in feature flagging....</p> <footer> <cite>- Pete Hodgson about <a href="https://twitter.com/ph1/status/1263186192951939072?itm_source=miere.observer"> Uber open-sourcing Piranha </a>, a feature-flagging tool</cite> </footer> </blockquote> <blockquote class="quote"> <p>Deferring integration can increase the risk of merge conflicts, which causes you to move more slowly as you spend more energy addressing those conflicts. Slow change can sometimes be more risky than you expect because of the costs of extra work needed to reconcile conflicts, as well as the technical debt that results from bypassing the normal process to fix critical errors.</p> <footer> <cite>- <a href="https://www.techwell.com/techwell-insights/2020/05/code-integration-when-moving-slowly-actually-has-more-risk?itm_source=miere.observer"> Code Integration: When Moving Slowly Actually Has More Risk </a> by Steve Berczuk</cite> </footer> </blockquote> <blockquote class="quote"> <p>Simply put, testing in production means testing your features in the environment where your features will live. 
So what if a feature works in staging, that's great, but you should care if the feature works in production, that's what matters.</p> <footer> <cite>- Talia Nassi on<a href="https://kentcdodds.com/chats-with-kent-podcast/seasons/03/episodes/talia-nassi-on-testing-in-production?itm_source=miere.observer"> Testing in Production </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>[...] when I was asked to reduce the resource requirements of a large MongoDB cluster, I reached the conclusion that the most obvious target - attribute names - wouldn’t lead to the kind of impact I wanted.</p> <footer> <cite>- Richard Startin on<a href="https://richardstartin.github.io/posts/shrinking-bson-documents?itm_source=miere.observer"> Shrinking BSON Documents </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>The most considerable impact I see is in regards to velocity. The team can focus on other business-impactful projects, rather than EKS and Kubernetes maintenance -- the undifferentiated heavy lifting is eliminated. The same reason people move from physical data centers to the cloud, or from EC2 to Serverless: offloading that effort to AWS is a very good proposition.</p> <footer> <cite>- <a href="https://www.infoq.com/news/2020/05/container-scaling-fargate/?itm_source=miere.observer"> Q&amp;A on Container Scaling with Fargate </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Did you know that <a href="http://pypi.org">http://pypi.org</a> serves 800 million requests and delivers 200 million packages totalling 400 terabytes ... a day? No. Exactly. You want it to just work. Every day, rain or shine. To keep it that way: sponsor them</p> <footer> <cite>- <a href="https://twitter.com/betatim/status/1246792203894231042?src=miere.observer"> Tim Head </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>We recently migrated a few small systems to CockroachDB (as a stepping stone). Overall, the experience was positive. 
The hassle free HA is a huge peace of mind. I know people say this is easy to do in PG. I have recently setup 2ndQuadrant's pglogical for another system. That was also easy (though the documentation was pretty bad). The end result is quite different though and CockroachDB is just simpler to reason about and manage and, I think, more generally applicable.</p> <footer> <cite>- <a href="https://news.ycombinator.com/item?id=23087514?itm_source=miere.observer"> latch </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Our actual use-case is a little complex to go into in tweets. But suffice to say, the PUT costs alone to S3 if we did 1-to-1 would end up being just under half our total running costs when factoring in DDB, Lambda, SQS, APIG, etc.</p> <footer> <cite>- <a href="https://twitter.com/alexbdebrie/status/1250224930643423234?s=12&amp;src=miere.observer"> Wayne Robinson </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Need operational analytics in <a href="https://twitter.com/hashtag/NoSQL?src=miere.observer">#NoSQL</a>? Maintain time bound rollups in <a href="https://twitter.com/dynamodb?src=miere.observer">@DynamoDB</a> with Streams/Lambda then query relevant items by date range and aggregate client side for fast reporting on scaled out data. Turn complex ad hoc queries into simple select statements and save $$$</p> <footer> <cite>- <a href="https://twitter.com/houlihan_rick/status/1258024512035176450?s=12&amp;src=miere.observer"> Rick Houlihan </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Another part of the solution is GPU acceleration using grCUDA — an open-source language binding that allows developers to share data between NVIDIA GPUs and GraalVM languages (R, Python, JavaScript), and also launch GPU kernels. 
The team implemented the performance critical components in CUDA for the GPU, and used grCUDA from Python to exchange data with the GPU and to invoke the GPU kernels.</p> <footer> <cite>- <a href="https://medium.com/graalvm/optimizing-machine-learning-performance-at-netsuite-with-graalvm-and-nvidia-gpus-d0d40f0b0cf1?itm_source=miere.observer"> Optimizing Machine Learning Performance at Netsuite with GraalVM and NVIDIA GPUs </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>Although event-driven architecture has existed for more than 15 years, only recently has it gained massive popularity, and there is a reason for that. Most companies are going through a “digital transformation” phase, and with that, crazy requirements occur. The complexity of these requirements force engineers to adopt new ways of designing software, ones that incur less coupling between services and lower maintenance overhead. EDA is one solution to these problems but it is not the only one.</p> <footer> <cite>- <a href="https://stackoverflow.blog/2020/03/16/how-event-driven-architecture-solves-modern-web-app-problems/?itm_source=miere.observer"> How event-driven architecture solves modern web app problems </a></cite> </footer> </blockquote> <blockquote class="quote"> <p>So, let's look at the resulting context of moving to microservices with entity services:</p> <ul> <li>Performance analysis and debugging is more difficult. 
Tracing tools such as Zipkin are necessary.</li> <li>Additional overhead of marshalling and parsing requests and replies consumes some of our precious latency budget.</li> <li>Individual units of code are smaller.</li> <li>Each team can deploy on its own cadence.</li> <li>Semantic coupling requires cross-team negotiation.</li> <li>Features mainly accrue in "nexuses" such as API, aggregator, or UI servers.</li> <li>Entity services are invoked on nearly every request, so they will become heavily loaded.</li> <li>Overall availability is coupled to many different services, even though we expect individual services to be deployed frequently. (A deployment looks exactly like an outage to callers!)</li> </ul> <footer> <cite>- <a href="https://stackoverflow.blog/2020/03/16/how-event-driven-architecture-solves-modern-web-app-problems/?itm_source=miere.observer"> The Entity Service Antipattern </a> by Michael T. Nygard</cite> </footer> </blockquote> <blockquote class="quote"> <p>Moving all the “what does the world around me look like?” side effects to the beginning of the program, and all the “change the world around me!” side effects to the end of the program, we achieve maximum testability of program logic. And minimum convolution. And separation of concerns: one module makes the decisions, another one carries them out. Consider this possibility the next time you find yourself in testing pain.</p> <footer> <cite>- <a href="https://jessitron.com/2015/06/06/ultratestable-coding-style/?itm_source=miere.observer"> Ultratestable Coding Style </a> by Jessica Joy Kerr</cite> </footer> </blockquote> </section> </article>This is my reading list from the past few days. I decided to put them here as it might be helpful to someone else.
It was deeply inspired by HighScalability blog, a source I’ve been consuming for years.Documenting Your Software Architecture2020-05-22T00:00:00+10:002020-05-22T00:00:00+10:00https://miere.observer/engineering/2020/05/22/Documenting-Your-Software-Architecture<p>In the 2000s we went from documenting every single class of our software to not at all. It was an attempt to increase the delivery pace, keeping the team away from tasks that, eventually, have to be remade once the software changes. As a side effect, scaling the team became a problem. The most notable one is the lack of autonomy among its members. Not only do newcomers need special attention to get familiarized with the basics of their software (like building and running it), but they would still ask several questions until they understand how it works and how it solves the problem it was designed for.</p> <p>It is a known fact that designing new software requires meticulous planning, strict alignment between team members and keeping them focused on the defined goal. Reducing the scope of your documentation to the software architecture might be a good starting point, as you probably don’t need to document every single class of your software to give them enough direction when they are coding. It relieves teams from frequently asked questions, forces members to stick with previous definitions unless a big change is necessary, and lets them focus on the business side of their projects.</p> <p>Unlike the lack of autonomy, the lack of accountability has a non-negligible indirect impact on team performance. Let us assume that in a Software Development team we are going to face only three types of issues: bugs, business-related issues and architecture issues. While Developers are accountable for solving bugs, and Product Owners for business-related issues, even on teams where there’s a dedicated Software Architect nobody answers for the architectural problems the team faces.
At first glance, it seems unfair to blame the architect for a problem that was introduced collectively, by a multitude of reasons that happened together.</p> <p>In reality, once the team starts documenting their learnings and architectural decisions they will become more accountable for those decisions. By being fully aware of the technical decisions made in the past and understanding the positive outcomes they were expected to introduce, teams can be accountable for architecture issues in the software, erasing the grey zone between Architecture Decisions and Business Decisions.</p> <h2 id="documenting-for-your-different-audiences">Documenting for your different audiences</h2> <p>Documentation is a tool to transfer knowledge, to keep the team on the same page regarding <strong>what problem the software solves, how those problems were solved, what technical decisions should be followed widely during the development, and how one can run the software and see it in action</strong>. Having long essays written in an MS Word document or a Wiki page is not enough to efficiently convey the information to the team members.</p> <p>In <a href="https://youtu.be/x2-rSnhpw0g">his talk</a> Visualising The Software Architecture, <a href="https://simonbrown.je/">Simon Brown</a> stressed that “your software has different audiences with different needs”. He means that you need different tools to show how the different layers of your software work. Thus, along with his iconic C4Model - which will be discussed later - I have adopted a couple of simple-yet-powerful guidelines that helped me to contextualize whoever is maintaining our source code.</p> <h2 id="the-source-code-guideline">The Source Code Guideline</h2> <p>For the sake of productivity, the documentation should be started with the Source Code Guideline, which covers the basic concepts a developer might need to understand how the project source code is structured.
A basic version of this document should answer the following questions a developer might have - although it can be enhanced with more topics whenever needed:</p> <ul> <li>How to run the software locally?</li> <li>How to run all automated tests locally?</li> <li>How to package the software?</li> <li>How to deploy the software?</li> <li>How to edit the software?</li> </ul> <p>Bear in mind that long lists of commands can lead to reproducibility issues. So try to keep it as neat as possible, <a href="https://miere.observer/engineering/2020/04/20/Producing-professional-deliverables.html#going-beyond-kents-simple-design">automating it before documenting it</a>.</p> <blockquote class="note"> <a href="https://gist.github.com/miere/cba07143f3f37c64fc0fa82a8e9179a6">Here</a> you can find a sample document that answers these questions with topics, making them easy to read. </blockquote> <h2 id="the-architecture-decisions-guideline">The Architecture Decisions Guideline</h2> <p>Next comes the Architecture Decisions Guideline, which comprises everything that might affect the daily routine of any contributor to your software. It should be straightforward and concise, not only pointing out the exact direction one has to follow to contribute to the software, but also introducing contributors to lessons learnt in the past and decisions taken to avoid possible issues. It may contain:</p> <ul> <li>An analytical list of problems, techniques and methodologies that should be avoided on the project.</li> <li>A detailed process of how issues should be fixed and how new features should be introduced in the software. Make sure to state the tools involved in this process - e.g.
Git branches.</li> <li>A brief explanation of how new releases are rolled out to production.</li> <li>An explanation of how quality is enforced before a release is closed.</li> <li>What coding principles shall be applied, otherwise the proposed modifications might be denied.</li> </ul> <p>When to compose this document is a complex topic. Synergic teams are constantly aligned and might be on the same page right from the conception of the software, which would allow the team to postpone the document creation to a future moment. It’s desirable, though, to have it finished once the first stable version of the software is released. Its existence will be the backbone of any code review that might happen in the future. It will ensure that the quality the code base achieved will last long enough so developers can move to the next project with peace of mind.</p> <blockquote class="note"> <a href="https://gist.github.com/miere/bcd10534a0c26f30b7e6d5234c4e903e">Here</a> you can find a sample document that exemplifies how it can be composed. This one, though, is a bit denser than the previous one as it was based on a document written for a previous customer. </blockquote> <h2 id="the-c4model">The C4Model</h2> <p>C4Model is probably the most pragmatic documentation model I’ve come across. Simon came up with the idea of representing the different layers of software with diagrams. When he conceived it, he wanted us to experience a model that works like maps: you zoom in to dive into more details, and you zoom out to better understand the context in which the software runs.
According to the way he conceives it, any software can be described in four main layers, each of which is represented by one diagram:</p> <ul> <li><strong>System Context Diagram</strong> - Shows the software in question in the centre, identifying who works with it and what it depends on.</li> <li><strong>Containers Diagram</strong> - Illustrates the overall shape of the architecture and a few technology choices.</li> <li><strong>Components Diagram</strong> - Explains the logical components and their interactions within a container.</li> <li><strong>Code Diagrams</strong> - Explains component implementations in detail.</li> </ul> <p>The first three diagrams are his creations, all of which adopt simple notations to demonstrate how a particular piece of the system works. Code Diagrams, on the other hand, are basically UML, and he discourages us from adopting them as they tend to become outdated quite frequently. They are there in case you have a specific need where documenting the source code itself is a mandatory task.</p> <p>C4Model is <a href="https://c4model.com/">well documented on its website</a>, with several years spent polishing it to reach this level of simplicity and organization. The beauty behind these diagrams lies in the fact that we can pick our target audience, allowing us to choose when and who should be involved in the documentation process.</p>In the 2000s we went from documenting every single class of our software to not at all. It was an attempt to increase the delivery pace, keeping the team away from tasks that, eventually, have to be remade once the software changes. As a side effect, scaling the team became a problem. The most notable one is the lack of autonomy among its members.
Not only do newcomers need special attention to get familiarized with the basics of their software (like building and running it), but they would still ask several questions until they understand how it works and how it solves the problem it was designed for.Re: Ensuring backwards compatibility in distributed systems2020-05-20T00:00:00+10:002020-05-20T00:00:00+10:00https://miere.observer/engineering/2020/05/20/Re-Ensuring-backwards-compatibility-in-distributed-systems<p>A few days ago, I spotted a blog post from StackOverflow that drew my attention: <a href="https://stackoverflow.blog/2020/05/13/ensuring-backwards-compatibility-in-distributed-systems/">Ensuring backwards compatibility in distributed systems</a>. That is the sort of topic I love to consume, as it gives me new insights and lets me learn how people are solving similar problems. The article was engaging and kept me focused on reading it until the end.</p> <p>I would like, though, to make a few remarks about some definitions assumed in the article which, if taken by the book, are not strictly correct. The idea behind the following paragraphs is far from detracting from the author or the post itself. StackOverflow has a massive audience, and knowing the correct definitions might help them adopt the right technique for the right job.</p> <h2 id="conditions-which-the-suggested-deployment-technique-works">Conditions under which the suggested deployment techniques work</h2> <p>In the blog post, there is a topic about software deployment that covers a few important techniques that might be useful for developers to ensure backward compatibility between evolving versions of the same software. The author emphasized, though, that they will “only work under two conditions”, one of them being that they apply only to “brand new software projects”.</p> <p>Reading the article, I couldn’t spot a single technique that could not be applied to old software.
I had the opportunity to adopt those techniques myself in 2013 when I was hired by <a href="https://www.ibratan.com.br/">Ibratan</a> to redesign their primary software. It was mainly written in C and COBOL and, by adopting a combination of <a href="https://en.wikipedia.org/wiki/Feature_toggle">Feature Toggle</a> and <a href="https://martinfowler.com/bliki/CanaryRelease.html">Canary Deployment</a>, I was able to fix some undesirable behaviours the software had and introduce a new API layer written in Java 8.</p> <p>Personally, I believe there is no such thing as an Old Project, or Legacy Software if you will. There is only Well Written Software and Poorly Written Software, and it is possible to adopt any technique you want in both cases. Arguably, you might see a greater benefit in adopting those techniques in poorly written software, as it usually demands more maintenance.</p> <h2 id="canary-release-vs-bluegreen-deployment">Canary Release vs Blue/Green Deployment</h2> <p>Perhaps this is not directly related to the blog post itself, but to a universal feeling that Canary Deployment and Blue/Green Deployment are the same thing. Despite their similarities, it is important to tell them apart, as they introduce different benefits to our deployment pipeline.</p> <p>The term <strong>Blue/Green Deployment</strong> was first introduced <a href="https://gitlab.com/snippets/1846041">ages ago</a>, coined by <a href="http://dannorth.net">Daniel Terhorst-North</a> and <a href="https://www.thoughtworks.com/profiles/jez-humble">Jez Humble</a> in the early 2010s. The fundamental idea is to have two easily switchable environments, allowing the software to be pre-released and tested on a deployment environment similar to production.
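</p> <p>To make the idea concrete, here is a minimal, hypothetical sketch of such a switch in Java. In real systems this role is played by a load balancer, router or DNS entry rather than application code, and all names below are made up for illustration:</p>

```java
import java.util.concurrent.atomic.AtomicReference;

// One of the two interchangeable deployment environments.
class Environment {
    final String name;
    Environment(String name) { this.name = name; }
    String handle(String request) { return name + " served: " + request; }
}

// The switch mechanism: an atomic pointer to whichever environment is live.
class BlueGreenSwitch {
    private final AtomicReference<Environment> live;
    BlueGreenSwitch(Environment initial) { live = new AtomicReference<>(initial); }

    String route(String request) { return live.get().handle(request); }

    // Flip all new traffic to the freshly deployed environment.
    void switchTo(Environment next) { live.set(next); }
}

class BlueGreenDemo {
    public static void main(String[] args) {
        Environment blue = new Environment("blue");
        Environment green = new Environment("green");

        BlueGreenSwitch router = new BlueGreenSwitch(blue);
        System.out.println(router.route("GET /orders")); // blue served: GET /orders

        // green is deployed and smoke-tested off-line, then considered stable:
        router.switchTo(green);
        System.out.println(router.route("GET /orders")); // green served: GET /orders
        // rolling back is just router.switchTo(blue)
    }
}
```

<p>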
Once the new version is considered stable, a switch mechanism takes place, redirecting users’ request traffic to the just-deployed software.</p> <p>The <a href="https://martinfowler.com/bliki/BlueGreenDeployment.html">switch mechanism may vary</a> depending on the business expectations (e.g. high-availability SLAs) or technical needs (e.g. running smoke tests before release) you might have. One of them is the Canary Release. It “is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody” <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p> <p>By introducing the ability to analyse the impact of the just-released software with real data coming from the production request stream, <em>Data Analysts</em> and <em>Software Architects</em> can measure the impact the new release will have, rolling it back if the results are not satisfactory. Blue/Green Deployment, on the other hand, is closely related to the deployment itself, therefore it focuses mostly on the technical side of it - namely high availability and easy rollback.</p> <h2 id="final-thoughts">Final Thoughts</h2> <p>I’d like to stress that the richness of the blog post content should not be blurred away by the two topics that I covered. Gathering all the information needed to compose such a post is no easy task, and it might take precious hours to write it and wrap it in a way that the audience might enjoy.
Perhaps a few links in the original post to external content could be enough to clarify the points I’ve made here, although writing down a few paragraphs helped to keep my understanding of those concepts fresh.</p> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:1" role="doc-endnote"> <p><a href="https://martinfowler.com/bliki/CanaryRelease.html">Canary Release</a> by <a href="http://www.dtsato.com/blog/">Danilo Sato</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>A few days ago, I spotted a blog post from StackOverflow that drew my attention: Ensuring backwards compatibility in distributed systems. That is the sort of topic I love to consume, as it gives me new insights and lets me learn how people are solving similar problems. The article was engaging and kept me focused on reading it until the end.Analysis: Decomposing a Monolith2020-05-19T00:00:00+10:002020-05-19T00:00:00+10:00https://miere.observer/engineering/2020/05/19/Analysis-Decomposing-Monolith<header class="briefing"> <div class="video"> <iframe class="video" id="youtube-video" src="https://www.youtube-nocookie.com/embed/9I9GdSQ1bbM?enablejsapi=1&amp;modestbranding=1" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe></div> <p>This is an opinionated transcription of a talk that <a href="https://samnewman.io/">Sam Newman</a> presented at the 2019 GOTO Conference, Berlin edition. There were several points in his presentation that, I think, were spot on and deserve to be transcribed and better explained. There is a bit of a debate about whether or not to start from a Monolith and then move to a Microservice Architecture.
Understanding how to <i>decompose a monolith</i> might not only shed light on this debate but also help us understand the benefits of doing it gradually.</p> </header> <article class="timeline"> <section> <time>2:40</time> <h2>Monolith</h2> <p>Sam emphasises that a Monolith Service is not the same as a Legacy System. In practice, they differ substantially and we need to observe them differently so we can deliver a better solution to our managers and customers.</p> </section> <section> <time>4:16</time> <h2>Modular Monolith Deployment</h2> <p>After introducing the basic concept of a Monolith, Newman quickly discusses Modular Monolith deployment. Architecture-wise, its components are internally split into modules (depending on the language you use, these can simply be packages, namespaces or libraries). These modules run in the same process, and the persistence layer is usually centralized in a single database instance.</p> <p>He believes <i>this is, in theory, a good scenario to be in, as these modules can be easily split into smaller services later</i>, reducing the effort to transition the service into a microservice. "Most people are better off in a modular monolith deployment" than in a microservice architecture - as he will discuss later.</p> </section> <section> <time>7:14</time> <h2>Third-party Monolith</h2> <p>Newman describes a third-party monolith as everything "that is completely out of your control", something whose internal behaviour you can't change. It might be the CRM software that your SaaS application relies on, or even that old piece of software for which you don't have the source code.</p> </section> <section> <time>8:01</time> <h2>Distributed Monolith</h2> <p>A Distributed Monolith differs from a simple monolith by having its functionality split across different services, which use the network to communicate with each other.
Because of how we split our system apart, or even due to different reasons, we often end up having pieces of code being changed across module boundaries.</p> <p>It is, arguably, the worst scenario you can be in, as the team that maintains it will face all the challenges of a distributed system along with the downsides of a monolith. It introduces a high cost of changing functionality, larger-scoped deployments and higher co-ordination overhead - as you have more things that might go wrong.</p> </section> <section> <time>10:20</time> <blockquote> <p>Fundamentally, you have to accept that monolith isn't necessarily the enemy - it's extremely rare that your goal is to kill the monolith. It sometimes happens but, most of the time, you're in a situation where you're trying to achieve something as a business but your current architecture won't let you achieve that goal.</p> </blockquote> <p>This is an important bit from his presentation and it needs to be transcribed as is. A Monolith Service by itself is not a bad thing and, like any other architectural decision, it has its pros and cons. We should only switch from a monolith to a microservice architecture when its benefits overcome the cost of maintaining a complex distributed system.</p> </section> <section> <time>12:44</time> <blockquote> <p>You won't appreciate the true horror, pain and suffering of microservices until you're running them in production.</p> </blockquote> <p>Perhaps a good start might be extracting one module from your Modular Monolith into an external service. What you learn from observing this new service might give you enough insights to continue with the transition from a monolith to a microservice architecture.</p> </section> <section> <time>15:15</time> <h2>Strangler Fig Pattern</h2> <p>At this point in his presentation, Sam cut to the chase and started to present solutions (architectural patterns) that will help us to decompose a monolith service.
Strangler Fig is a pattern in which you wrap new functionalities around the existing ones in a way that the existing solution is neither changed nor aware of it. In practical terms, he suggests introducing an HTTP proxy to intercept calls to the existing service, diverting calls to the new one as well.</p> <p>The <b>Strangler Fig Pattern fits perfectly when we are at the beginning of the transition to a microservice or when dealing with Third-Party Monoliths</b>, as other components of our existing solution still depend on the data managed by the one which will be replaced. The original component tables will still be fed with new data, giving us time to rethink or redesign other components in the future.</p> </section> <section> <time>22:04</time> <h2>Branch By Abstraction</h2> <p>Although his explanation of this pattern was good enough for a presentation in front of a big audience, I reckon his own definition (taken from <a href="https://samnewman.io/patterns/architectural/branch-by-abstraction/">his blog</a>) wraps it up perfectly.</p> <blockquote> <p>When making a significant change to how a piece of functionality is implemented, the challenge is how to work on this reimplementation over a period of time. With branch-by-abstraction, you create a single abstraction point over the functionality to be reimplemented, allowing both the existing functionality and the new implementation to co-exist inside the same running process at the same time.</p> </blockquote> <p>At this point in his presentation, he started to describe the ideal step-by-step process to implement this pattern. I've modified it slightly to make it easier to reproduce.</p> <ol> <li><b>Isolate the current implementation</b> - This is the first, most important and most delicate step of this pattern, where you isolate the current implementation from the rest of the service.
At this step you have to gather the logic of the functionality you intend to replace into a single place (same package, module or folder).</li> <li><b>Create an abstraction point</b> - Create an interface that acts as the contract for calling that functionality. With the <a href="http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod">Liskov Substitution Principle</a> in mind, you must make sure your software calls a default implementation of this interface, which in turn calls your just-isolated functionality.</li> <li><b>Start working on the new service implementation</b> - Once finished, the new service will be called by the monolith via HTTP (assuming you exposed the new functionality through a Web API), with the HTTP client wrapped in an alternative implementation of the just-created interface.</li> <li><b>Switch over when it is ready to release</b> - Flip the abstraction point - through a feature toggle or a configuration switch - so calls are routed to the new implementation instead of the old one.</li> <li><b>Clean up</b> - Observe production and gather data to assess the results. A rollback is still possible at this point, allowing you to revisit the implementation until everything is working perfectly. Once everything is working as expected, you can remove the old functionality from your codebase.</li> </ol> </section> <section> <time>26:07</time> <h2>Parallel Run</h2> <p>Being a variation of Branch by Abstraction, Parallel Run keeps both implementations running at the same time, invoking them side by side for the same calls. This is particularly useful to check whether the behaviour has changed. Although both branches have different implementations, they are expected to behave the same way, unless otherwise noted.
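The Branch by Abstraction steps above can be sketched roughly as follows. All names are hypothetical and for illustration only - this is not code from the talk, and the HTTP call to the new service is stubbed out:

```python
from abc import ABC, abstractmethod

class InvoiceNotifier(ABC):
    """Step 2: the abstraction point - the contract both implementations honour."""
    @abstractmethod
    def notify(self, invoice_id: str) -> str: ...

class LegacyNotifier(InvoiceNotifier):
    """Default implementation wrapping the just-isolated, in-process logic (step 1)."""
    def notify(self, invoice_id: str) -> str:
        return f"legacy: invoice {invoice_id} notified"

class RemoteNotifier(InvoiceNotifier):
    """Step 3: wraps the HTTP client calling the new service (stubbed here)."""
    def notify(self, invoice_id: str) -> str:
        # in real code this would POST to the new service's Web API
        return f"remote: invoice {invoice_id} notified"

def make_notifier(use_new_service: bool) -> InvoiceNotifier:
    """Step 4: switching over becomes a configuration flip, not a code change."""
    return RemoteNotifier() if use_new_service else LegacyNotifier()
```

Once production shows the new implementation behaving correctly, the legacy implementation (and the old code behind it) can be deleted, completing the clean-up step.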
Therefore, when comparing the outcome of both functionalities, it's important to take the original implementation as the source of truth: any discrepancy between their results should be considered a failure in the new service.</p> </section> <section> <time>28:10</time> <h2>Accessing the Data</h2> <p>Entering the last stage of his talk, Sam finally addresses what he describes as the hardest topic when decomposing a monolith: how to organize and access the data from the newly created service. In his conception, from an evolutionary point of view, the data migration comprises the following steps.</p> <ol> <li><b>Temporarily reuse the existing functionality database</b> - Direct communication between the new service and the existing database is allowed as a temporary measure to stabilize the new functionality. Treating this as a permanent solution, though, might be the beginning of a new distributed monolith due to how tightly coupled both services become.</li> <li><b>Expose the existing data via API</b> - This solution is intended to reduce, but not remove, the coupling between both services. It gives developers more flexibility to react to possible changes that might happen in the database schema. He encourages us, though, to revisit the way information is consumed or ingested to avoid cyclic dependencies between both services - a common antipattern found in distributed monoliths.</li> <li><b>Move the data to the new service</b> - Turning the new service into the source of truth is the definitive move to finish the migration from a monolith to a microservice architecture. It comes with its challenges though.
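The Parallel Run idea above can be sketched as a small wrapper - hypothetical names, not code from the talk - where both implementations are invoked, the trusted legacy result is always the one returned, and divergences are merely recorded:

```python
from typing import Callable, List

def parallel_run(legacy: Callable[[str], str],
                 candidate: Callable[[str], str],
                 discrepancies: List[str]) -> Callable[[str], str]:
    """Wrap both implementations; serve the legacy (source-of-truth) result
    and record any divergence as a failure of the new implementation."""
    def handler(request: str) -> str:
        trusted = legacy(request)
        try:
            attempt = candidate(request)
            if attempt != trusted:
                discrepancies.append(f"{request!r}: {attempt!r} != {trusted!r}")
        except Exception as error:  # the candidate must never break production
            discrepancies.append(f"{request!r}: raised {error!r}")
        return trusted
    return handler
```

In a real system the recorded discrepancies would feed a monitoring dashboard, giving the team data to decide when the switch-over is safe.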
Joins between tables have to be rethought, splitting tables might be necessary and the referential integrity provided by the database layer is completely off the table.</li> </ol> </section> </article> <h2>Takeaways from the talk</h2> <p>Newman did a brilliant job outlining the tradeoffs of prematurely adopting a Microservice Architecture and I reckon there’s richness in his speech when he describes the dreadful consequences of mistakenly ending up with a Distributed Monolith. Tackling the issues it introduces leads us to <a href="https://github.com/korfuri/awesome-monorepo">introduce new hacks in our architecture and deployment pipeline</a>, like monorepos or shared modules between services. As Microservices are independently deployable - and, generally speaking, share no source code between them - our CI/CD is actually simpler. It is a good balance between cost and delivered value to our stakeholders.</p> <p>By reading between the lines we can also notice how much he is concerned about monitoring the runtime environment and tackling unexpected issues at the early stages of a deployment. Due to the complexity of distributed systems, handling network and hardware issues before the business requirements make the software complex can save us several hours of development.</p>This is an opinionated transcription of a talk that Sam Newman presented at the 2019 GOTO Conference, Berlin edition. There were several points in his presentation that, I think, were spot on and deserve to be transcribed and better explained. There is a bit of a debate about whether or not to start from a Monolith and then move to a Microservice Architecture.
Understanding how to decompose a monolith might not only shed light on this debate but also help us understand the benefits of doing it gradually.Producing Professional Deliverables2020-04-20T00:00:00+10:002020-04-20T00:00:00+10:00https://miere.observer/engineering/2020/04/20/Producing-professional-deliverables<p>When someone is described as a professional one might see that person as someone who does something for a living. Others might agree with the Cambridge dictionary, seeing professionals as those who have “[…] the type of job that needs a high level of education and training” <sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Perhaps we can all agree that the meaning of words evolves as time passes by - having its meaning adapted to suit a more recent context. Maybe “professional” in the modern days might’ve acquired a different meaning from what it used to have in the past.</p> <h2 id="the-need-for-consistency">The need for consistency</h2> <p>Let’s take football players <sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> for a moment. They only began being paid as professionals in the last quarter of the 19th century <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> but when we compare the quality of the games back then with nowadays’, and we put nostalgia aside, the gap is huge. Pelé, Maradona, Zidane and Messi are historically regarded as those who took the game to the next level.
But so did managers and coaches, “who dreamed up catenaccio and zonal marking and the sweeper system, all of it designed to stop the virtuosos showcasing their talents” <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>.</p> <p>The bar was raised to a point where we’ve been scoring fewer goals than a hundred years ago <sup id="fnref:4:1" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. While only a few players can keep their careers going for more than 8 years <sup id="fnref:4:2" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>, most of them tend to retire quite early - especially if compared to a software engineer. In fact, the most successful footballers were those who had a more <strong>consistent performance</strong> in their teams.</p> <p><em>Consistency</em> is probably the most desirable skill I look for when I hire someone. Regardless of how senior an engineer you are, delivering consistent results means that I can predict how long it takes for you to finish your tasks, how many issues you might introduce in the software and how much effort I should put in for you to reach the ideal performance within the team.</p> <p>The more senior you are, the fewer issues you might introduce, the faster you can learn and adopt a new technique and the more adaptable you are to tackle problems you’ve never faced before. But, if you don’t master the techniques you’ve learned in a way that produces consistent results, you might easily be replaced by a junior. Think about it: if you can learn consistently, the more I teach you the more improvement I can see in your deliverables.
On the other hand, an inconsistent senior developer might learn something new and put it into practice straight away but may struggle to use it in a different scenario, as he hasn’t trained long enough to master his new skill.</p> <h2 id="reproducibility-is-key">Reproducibility is key</h2> <p>If we go a few centuries back, we would see craft workers (such as <a href="https://en.wikipedia.org/wiki/Artisan">Artisans</a>) in action, perhaps the historical counterpart of the Software Engineer as we know it <sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">5</a></sup>. A skilled craft worker creates material objects partly or entirely by hand. Artisans were the dominant producers of consumer products before the Industrial Revolution. Once he had passed through the career chain from apprentice to journeyman, he could be elected to become a master craftsman, enjoying one of the highest social statuses in his community at the time. <sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></p> <p>The success of the <a href="https://en.wikipedia.org/wiki/British_Agricultural_Revolution">Agricultural Revolution</a> of the 18th century created a favourable climate for industrialization. With increasing production of food, the British population could be fed at lower prices with less effort than ever before. The surplus of food meant that British families <a href="https://study.com/academy/lesson/causes-of-the-first-industrial-revolution.html">could use the money they saved to purchase manufactured goods</a>.
Under those circumstances, it’s easy to understand that craft workers couldn’t cope with the higher demand for goods.</p> <p>Just as the demand for consistently and reliably delivered goods ended up replacing artisans with machines, <span class="highlight">the overwhelming demand for value to be provided by the current software industry will also raise the bar</span> in a way that Software Engineers who lack precision, predictability or measurement won’t make it far in their careers.</p> <p>To better understand this idea, let’s assume a developer was elected by his team to design a microservice from scratch. On his team, there’s no one in charge of the infrastructure or taking macro architectural decisions. A good outcome from this project might be a big deal for his reputation, especially if no one else has to worry about bugs, there’s a brief README file explaining how to run the software locally and releasing a new version of it is just a matter of having a Pull Request approved.</p> <p>One might argue that taking care of all of these details consumes valuable time that would be better spent implementing new features. But I dread the days when I had to stop what I was doing to answer questions from my colleagues when they had to maintain software I’d previously written without proper documentation. Eventually, I realized that when the README file is good enough I have fewer interruptions. When my unit tests cover most of - if not all - the source code, I have fewer bugs to fix and hence more time to create something new.</p> <p>A wise engineer, though, may go even further and streamline his knowledge in a way that is easily reproducible as well. What if a Kotlin developer had <a href="https://github.com/Skullabs/kos-sample-gradle">a small Gradle project</a> on his Github account? Why don’t we create <a href="https://github.com/miere/terraform-aws-fargate-ha-web">a module to deploy our software using Docker</a>?
Wouldn’t it be useful if you saved that <a href="https://github.com/miere/terraw">small script</a> that automates everything you need to run your terraform scripts? If designing software is the main part of your job, perhaps you should figure out a way to reproduce the boring tasks as quickly as possible; when the necessity comes up, you will be able to deliver a masterpiece much faster than anyone else.</p> <h2 id="going-beyond-kents-simple-design">Going beyond Kent’s Simple Design</h2> <p>Speaking of masterpieces, in the 1990s Kent Beck introduced The Four Elements of Simple Design, a principle that would later be immortalised in his <a href="https://www.amazon.com/gp/product/0201616416">book Extreme Programming Explained</a>. He states these rules as follows:</p> <ul> <li>Runs all the tests</li> <li>Has no duplicated logic. Be wary of hidden duplication like parallel class hierarchies</li> <li>States every intention important to the programmer</li> <li>Has the fewest possible classes and methods</li> </ul> <p>His rules were written in priority order, where the ones at the top take priority over the following ones. It comes as no surprise that, if you can’t afford the time to have all of them, he would like you to put more effort into tests.
Being behind the roots of the unit testing frameworks we have nowadays <sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">7</a></sup>, as the author of <a href="https://en.wikipedia.org/wiki/SUnit">SUnit</a> and <a href="https://en.wikipedia.org/wiki/Kent_Beck">co-author</a> of JUnit with Erich Gamma, he’s an avid advocate of testing as a feedback tool <sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>, something you can read from the man himself in his award-winning book <sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">9</a></sup> <a href="https://www.amazon.com/Test-Driven-Development-Kent-Beck/dp/0321146530/">Test-Driven Development by Example</a>.</p> <p>Let’s take the popular <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller">MVC pattern</a> to draw a comparison with Beck’s rules. Between the late 2000s and early 2010s, MVC was considered the <em>silver bullet pattern</em> by its adopters <sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">10</a></sup>. In an era where nobody was actually concerned about <a href="https://en.wikipedia.org/wiki/Separation_of_concerns">Separation of Concerns</a> and the front-end was mostly rendered by the backend, it was indeed quite convenient to grow your software by simply placing classes in one of its three buckets.
Despite its convenience, not all software designed mostly around MVC (from end to end) is easy to maintain; once it grows bigger, the need to include new features urges us to adopt different approaches - a phenomenon well described by Meir Lehman <a href="https://ieeexplore.ieee.org/document/1456074">in another masterpiece</a> <sup id="fnref:12" role="doc-noteref"><a href="#fn:12" class="footnote" rel="footnote">11</a></sup>.</p> <p>Kent’s rules, on the other hand, are distinguished from other methodologies by focusing on the <em>outcome you might have</em> instead of <em>how one has to organize his code</em>. They are benefit-driven, encouraging you to adopt whatever strategy you have in your playbook as long as you respect those outcomes. As a result, maintainability and fast feedback on breaking changes will be the benefits those who adopt them are rewarded with. That’s the mindset a professional has to keep the <strong>quality</strong> of their deliverables higher than the average.</p> <p>Since these rules were crafted more than two decades ago, if you allow me, I’d like to respectfully include a couple of items that I think should be mandatory for any professional delivery in software.</p> <ol> <li>It should be possible to run all your tests with a single (yet simple) command.</li> <li>It should be possible to run your software with a single (yet simple) command.</li> <li>It should be possible to package your software with a single (yet simple) command.</li> </ol> <p>While these 3 rules I’m introducing might be seen as silly at first glance, they make perfect sense when you need to hand over your deliverable to someone else. Just like Beck’s rules, they’ve been ordered by priority; thus, if you can’t afford to have all of them, just make sure one can painlessly run all tests and check for regressions in your software. The last two rules are somewhat related, as you need to package your software to run it locally.
Thoroughly reviewing all the dependencies (libraries, tools, and dependent services) your service relies on is <a href="https://dzone.com/articles/learn-how-to-setup-a-cicd-pipeline-from-scratch">the foundation of a hassle-free CI/CD setup process</a>.</p> <h2 id="takeaways">Takeaways</h2> <p>Professionalism is quite a subjective topic, but it’s worth keeping in mind that we’re living in a different world, where the current standards of our industry expect a higher delivery pace and less time spent on bugs or amending poorly developed features. Even complex structures have been shrewdly discouraged. In fact, managers are going one step further in this direction, trying their best to reduce the learning curve when developers jump in to maintain a different microservice, regardless of whether the developer is a newcomer or a long-time hero within the company <sup id="fnref:13" role="doc-noteref"><a href="#fn:13" class="footnote" rel="footnote">12</a></sup>.</p> <p>If I could sum up the aforementioned topics, I’d say that today’s Professional Software Engineer should be consistently capable of delivering high-quality software in a way that anyone with the source code could maintain it with no hassle.</p> <hr /> <p><em>Special thanks to <a href="https://www.linkedin.com/in/gabrielsjacques/">Gabriel Jacques</a>, Ricardo Baumgartner and <a href="http://vnaik.com">Varun Naik</a> for their contributions.</em></p> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:1" role="doc-endnote"> <p><a href="https://dictionary.cambridge.org/dictionary/english/professional">Cambridge dictionary’s definition of the word “professional”</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p><a href="https://en.wikipedia.org/wiki/Football_player">Football</a>, also known as soccer to the Americans <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:3"
role="doc-endnote"> <p><a href="https://en.wikipedia.org/wiki/Fergus_Suter">Fergus Suter</a> was arguably the first recognised professional footballer. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>Instead of the consistent and insistent downward trend in goals we have seen over a century and a half of play, in the last 60 years or so there appears to be a levelling off. Goals are not dying. They are plateauing. Scoring has remained essentially stable in the last two decades, perhaps even as far back as the 1970s. See: <a href="https://slate.com/culture/2013/08/the-numbers-game-why-soccer-teams-score-fewer-goals-than-they-did-100-years-ago.html">The Great Leveling</a> <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a> <a href="#fnref:4:1" class="reversefootnote" role="doc-backlink">&#8617;<sup>2</sup></a> <a href="#fnref:4:2" class="reversefootnote" role="doc-backlink">&#8617;<sup>3</sup></a></p> </li> <li id="fn:7" role="doc-endnote"> <p>In fact, there’s a whole movement in which developers describe themselves as Software Craftsmen. Perhaps the best description of how Software Engineering became more relevant in the field can be found in <a href="https://en.wikipedia.org/wiki/Software_craftsmanship">this article</a> on Wikipedia. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:6" role="doc-endnote"> <p>See <em>History of Western Civilization</em> at Boise State University’s <a href="https://web.archive.org/web/20090107061228/http://history.boisestate.edu/westciv/medsoc/23.shtml">“Document No.23”</a>. Archived from the <a href="http://history.boisestate.edu/westciv/medsoc/23.shtml">original</a> on 2009-01-07. Retrieved 2009-01-08.
<a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:9" role="doc-endnote"> <p>Kent Beck is often cast as the mind that led to the <a href="https://en.wikipedia.org/wiki/XUnit">xUnit</a> frameworks being widely adopted. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:8" role="doc-endnote"> <p><a href="https://medium.com/@kentbeck_7670/programmer-test-principles-d01c064d7934">According to Beck</a>, “Programmer tests are an oracle providing feedback coding-decision-by-coding-decision”. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:10" role="doc-endnote"> <p>Published by Addison-Wesley, the book won the Jolt Productivity Award. <a href="#fnref:10" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:11" role="doc-endnote"> <p><a href="https://twitter.com/miere/status/14496627913924608">Here</a> is a discussion (in pt-BR) I had on Twitter back in 2010 about this very topic, showing how heated a conversation can get when enquiring MVC developers about MVC replacements. At the time, I was under heavy training by <a href="https://twitter.com/anielson">one of my previous mentors</a>, as he had noticed my knowledge was mostly focused on hardware and low-level software development, neglecting common design patterns. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:12" role="doc-endnote"> <p>In 1974, Lehman stated in his book that “as an E-type system evolves, its complexity increases unless work is done to maintain or reduce it”.
<a href="#fnref:12" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:13" role="doc-endnote"> <p>It’s worth reading about <a href="https://hbr.org/2014/01/how-netflix-reinvented-hr">how Netflix reinvented the way they manage their teams</a>, what their expectations are, and for how long they’ve decided to invest in quality and reproducibility within their teams to avoid rework and spend more time on what will have a higher return on investment. <a href="#fnref:13" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>When someone is described as a professional one might see that person as someone who does something for a living. Others might agree with the Cambridge dictionary, seeing professionals as those who have “[…] the type of job that needs a high level of education and training” 1. Perhaps we can all agree that the meaning of words evolves as time passes by - having its meaning adapted to suit a more recent context. Maybe “professional” in the modern days might’ve acquired a different meaning from what it used to have in the past. Cambridge dictionary’s definition of the word “professional” &#8617;