[{"content":"Bio I am a software engineer who strives for simplicity! I enjoy building lasting and impactful things! I am passionate about building highly available, scalable, and autonomous systems.\nI am experienced in:\nDistributed Systems Cloud Computing Automation Observability Manifesto My greatest qualities as an engineer are:\nHow I treat other people How I work and collobarate as part of a team We don\u0026rsquo;t need a title to be leaders. Everyone should be a leader.\nLeaders eat last Leaders ask questions Values 💡 Empathy 💡 Listening 💡 Integrity 💡 Ownership 💡 Collaboration 💡 Humility 💡 Pragmatism 💡 Helping Others Do\u0026rsquo;s ✅ Ask the right questions ✅ Simplify and clean ✅ Solve a problem every day ✅ Work smart by automating ✅ Make it easy to understand, maintain, and collaborate ✅ Start with UX/DX ✅ Build just enough ✅ Fix the root cause ✅ Make lasting changes ✅ One step at a time ✅ Make work fun! Don\u0026rsquo;ts ❌ Over-engineering ❌ Over-abstraction ❌ Leaky abstractions ❌ Premature optimizations ","permalink":"https://milad.dev/about/","summary":"Bio I am a software engineer who strives for simplicity! I enjoy building lasting and impactful things! I am passionate about building highly available, scalable, and autonomous systems.\nI am experienced in:\nDistributed Systems Cloud Computing Automation Observability Manifesto My greatest qualities as an engineer are:\nHow I treat other people How I work and collobarate as part of a team We don\u0026rsquo;t need a title to be leaders. Everyone should be a leader.","title":"About Me"},{"content":"TL;DR Seek to understand the context of the change (what, why, and how). Seek to understand the author’s perspective. Check out the branch and test the changes locally on your own. Consider using the Conventional Commenting style to better convey your intent. Try to be thorough in your reviews to reduce the number of iterations. Ensure the author is clear on what is required from them to do. 
Communicate which ideas you feel strongly about and which you don\u0026rsquo;t. Offer alternative implementations, but assume the author has already considered them (ask questions). Look for opportunities to simplify the code and improve its readability. If you don’t understand a piece of code, ask for clarification (other people might have the same question). It can be helpful to post a summary comment after a round of review comments. Read More GitLab\u0026rsquo;s Code Review Guidelines Google\u0026rsquo;s Code Review Developer Guide ","permalink":"https://milad.dev/gists/code-review/","summary":"TL;DR Seek to understand the context of the change (what, why, and how). Seek to understand the author’s perspective. Check out the branch and test the changes locally on your own. Consider using the Conventional Commenting style to better convey your intent. Try to be thorough in your reviews to reduce the number of iterations. Ensure the author is clear on what is required of them. Communicate which ideas you feel strongly about and which you don\u0026rsquo;t.","title":"Code Review Guidelines"},{"content":"TL;DR Nix is a purely functional package manager. It treats packages like values in purely functional programming languages. Packages are built by functions that do not have side-effects, and they never change after they have been built.\nConcepts Everything on your computer implicitly depends on a whole bunch of other things on your computer. Your computer is trusted to have acceptable versions of acceptable libraries in acceptable places. Nix removes these assumptions and makes the whole graph explicit.\nAll software exists in a graph of dependencies. Most of the time, this graph is implicit. Nix makes this graph explicit. The Nix Store Nix stores packages in the Nix store, usually the directory /nix/store, where each package has its own unique subdirectory. This directory represents a graph. Each file or directory is a node. 
The relationships between them constitute edges. Only Nix itself can write to the /nix/store directory. Once Nix writes a node into this graph database, it is completely immutable forever after. Nix guarantees that the content of a node does not change after it has been created. An edge directed from a node is a dependency. If a node includes a reference to another node, it depends on that node. The transitive closure includes dependencies of dependencies as well. Command Description nix-store --query --references Shows all immediate dependencies for a store path (edges directed from a node). nix-store --query --requisites Shows all direct and indirect dependencies for a store path (transitive closure of all edges directed from a node). nix-store --query --referrers Shows all store paths that directly depend on a store path (edges directed to a node). nix-store --query --referrers-closure Shows all store paths that directly or indirectly depend on a store path (transitive closure of all edges directed to a node). Derivations A derivation is a recipe to build some other path in the Nix Store. A derivation is a special node (*.drv) in the Nix store, which tells Nix how to build one or more other nodes. It is a special format written and read by Nix, which gives build instructions for anything in the Nix store. Everything required to build this derivation is explicitly listed in the file by path (outputs, inputDrvs, inputSrcs, platform, builder, args, env). Everything in the Nix store, except derivations themselves, is put there by building a derivation. The hash in a derivation path is a hash of the content of the derivation file. If a dependency of the derivation changes, that changes the hash of the derivation. It also changes the hashes of all of that derivation\u0026rsquo;s outputs. Changing a dependency bubbles all the way down, changing the hashes of every derivation and all those derivations\u0026rsquo; outputs that directly or indirectly depend on it. 
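The hash-propagation behavior described above can be sketched in Python. This is a toy model (not Nix itself): each node\u0026rsquo;s hash covers its own content plus the hashes of its dependencies, so changing a leaf dependency changes every hash downstream.

```python
import hashlib

def node_hash(content: str, dep_hashes: list[str]) -> str:
    """Hash a node's content together with its dependencies' hashes,
    so any upstream change propagates to every dependent node."""
    h = hashlib.sha256()
    h.update(content.encode())
    for d in sorted(dep_hashes):  # order-independent over dependencies
        h.update(d.encode())
    return h.hexdigest()[:12]

# A tiny dependency chain: app -> lib -> libc (names are made up).
libc_v1 = node_hash("libc-2.35", [])
lib_v1 = node_hash("libfoo-1.0", [libc_v1])
app_v1 = node_hash("app-1.0", [lib_v1])

# Changing only the bottom dependency changes every hash above it,
# even though lib and app contents are unchanged.
libc_v2 = node_hash("libc-2.36", [])
lib_v2 = node_hash("libfoo-1.0", [libc_v2])
app_v2 = node_hash("app-1.0", [lib_v2])

assert app_v1 != app_v2  # the change bubbled all the way down
```

This is why two store paths with the same package name but different dependency graphs can coexist in /nix/store without conflicting.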
Sandboxing A derivation build simply cannot access anything not declared by the derivation. Nix uses patched versions of compilers and linkers that do not try to look in the default locations. Nix builds derivations in an actual sandbox that denies access to everything that the build is not supposed to access. The Nix Language The Nix language is lazily evaluated and free of side-effects. The Nix Language is just a Domain Specific Language for creating derivations. The Nix Language does not build anything. It creates derivations, and later, other Nix tools read those derivations and build the outputs. Read More NixOS nix.dev What is Nix ","permalink":"https://milad.dev/gists/what-is-nix/","summary":"TL;DR Nix is a purely functional package manager. It treats packages like values in purely functional programming languages. Packages are built by functions that do not have side-effects, and they never change after they have been built.\nConcepts Everything on your computer implicitly depends on a whole bunch of other things on your computer. Your computer is trusted to have acceptable versions of acceptable libraries in acceptable places. Nix removes these assumptions and makes the whole graph explicit.","title":"What is Nix?"},{"content":"Recently during an interview, I was asked a question about how much I know about security. At first, I paused for a few seconds because honestly, I didn\u0026rsquo;t know how to answer the question. Eventually, I answered that, as a developer, I make sure I am doing this, doing that, and following these best practices! After my interview, I kept telling myself that I should know the security best practices for developers and engineers. When I am working on something, I should have a set of principles, guidelines, and considerations in mind and follow them. 
As a result, I decided to do a little bit of research and prepare a cheatsheet!\nSecurity by Design Principles Minimize Attack Surface Area Every new feature added to an application increases the overall risk to the application by providing more opportunities for attackers.\nMake sure you:\nBuild the minimum required set of features. Expose the minimum amount of data needed. Restrict access to the minimum number of users. Establish Secure Defaults An application must be secure and safe by default. To deliver an out-of-the-box experience, the default settings should be secure. Users may be allowed to turn off some of the security requirements, but a high security level should be enabled by default.\nPrinciple of Least Privilege The Principle of Least Privilege (POLP) requires a user to have the minimum required permissions to perform a given task. This includes every aspect from application-specific permissions to processor, memory, network, and other permissions.\nPrinciple of Defense in Depth The principle of defense in depth states that layered security mechanisms improve the security of the system as a whole. If one layer fails to defend against an attack, hopefully, other layers will protect the system or at least mitigate the consequences.\nFail Securely An application must securely handle errors and unexpected issues.\nThere are three possible outcomes from a security measure:\nAllowing the operation Disallowing the operation Error/Exception/Failure In case of an error, exception, or failure, the security measure should follow the same execution path as disallowing the operation. An error, exception, or failure should not leave the system in an unsecured state or enable an operation that is not supposed to be allowed.\nErrors or exceptions related to application and business logic should follow a secure execution path. 
For example, they should not cause a security check to be skipped or to be performed with bad values.\nLast but not least, failures should not provide users with additional privileges or sensitive information.\nDo NOT Trust Services Never trust external services from a security perspective. Validate, verify, and secure all data and information returned from a service provider.\nSeparation of Duties Separation of duties, also known as segregation of duties, is about requiring more than one entity to complete a task. Separation of duties increases protection from fraud and errors. The entity that approves a task should be separate from the entity that performs the task, and both should be different from the entity that verifies the task.\nAvoid Security by Obscurity Security through obscurity is relying on secrecy in design and implementation to achieve security. NEVER rely upon security by obscurity as a means of securing a system. There should be enough security controls in place to keep an application safe without needing to hide the architecture, functionality, or source code.\nKeep Security Simple Avoid using complicated security controls and sophisticated architectures for securing applications. Complex mechanisms can increase the attack surface area and the security risk.\nFix Security Issues Correctly Once a security vulnerability is found, it is important to develop tests for it and understand its root cause properly. It is also very important to make sure that all instances of a given security issue are fixed across the entire application.\nCommon Vulnerabilities and Attacks Denial of Service (DoS) The Denial of Service (DoS) attack is about exhausting a service to make it unavailable for other users. There are many ways to make a service unavailable for legitimate users. 
For example, if a service unexpectedly receives a large number of requests, it may become unavailable.\nA Distributed Denial of Service (DDoS) attack is a type of DoS attack that comes from many distributed sources such as botnets.\nYou can read more about this attack here:\nhttps://owasp.org/www-community/attacks/Denial_of_Service https://www.cloudflare.com/learning/ddos/glossary/denial-of-service https://www.cloudflare.com/learning/ddos/what-is-a-ddos-botnet Query Injection Query injection is a code injection attack, and its most common type is SQL injection. The attacker crafts a special query and sends it to the server through an entry field. The user-provided input changes the behavior of the query being executed on the server-side. The attacker can retrieve, manipulate, or destroy unauthorized data as well as execute admin operations on the database.\nYou can protect your application against this attack by:\nSanitizing and validating any user input. Using prepared statements for parameterized queries. Escaping all user-provided data. Enforcing the principle of least privilege for executing the queries. You can read more about this attack here:\nhttps://owasp.org/www-community/attacks/SQL_Injection https://www.cloudflare.com/learning/security/threats/sql-injection Cross-Site Scripting (XSS) Cross-Site Scripting (XSS) attacks belong to the code injection category of attacks.\nThe attacker uses an exploit in a trusted web application and embeds malicious code into it. The victim runs the malicious code in her browser by visiting the trusted website. The malicious code steals sensitive information such as cookies and sends them to a destination controlled by the attacker. This attack usually comes in a few different forms:\nPersistent XSS The malicious code/script gets stored permanently in the trusted website\u0026rsquo;s database. Reflected XSS The malicious code/script comes from the request that the victim sent. 
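The query-injection mitigations listed earlier (prepared statements in particular) can be illustrated with Python\u0026rsquo;s built-in sqlite3 module; the table and data are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

payload = "alice' OR '1'='1"  # a classic injection input

# UNSAFE: string concatenation lets the input rewrite the query itself.
unsafe = "SELECT * FROM users WHERE name = '" + payload + "'"
print(len(conn.execute(unsafe).fetchall()))  # 2 -- every row leaks

# SAFE: a prepared statement treats the input as a plain value.
safe = conn.execute("SELECT * FROM users WHERE name = ?", (payload,))
print(len(safe.fetchall()))  # 0 -- no user is literally named that
```

The unsafe query becomes `... WHERE name = 'alice' OR '1'='1'`, which matches every row; the parameterized version never lets user input change the query structure.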
To prevent XSS attacks:\nSanitize, validate, and verify every user input on both the server-side and the client-side. You can read more about this attack here:\nhttps://excess-xss.com https://owasp.org/www-community/attacks/xss https://www.cloudflare.com/learning/security/threats/cross-site-scripting Cross-Site Request Forgery (CSRF) Cross-Site Request Forgery (CSRF) works by tricking a user into invoking a request from a web application that the user is already authenticated with. This request is usually a state-changing request as opposed to stealing data. The attacker cannot see the response to the forged request. An example of this attack could be tricking a user, with the help of some social engineering, into requesting that her bank\u0026rsquo;s website transfer some funds to the attacker.\nHere are the steps:\nThe attacker forges a request. The attacker embeds the forged request on a website, email, etc. The victim unwittingly triggers the forged request. The target web application receives the request and fulfills it as a legitimate request made by the authenticated user. CSRF attacks are hard to prevent completely due to their nature. Different HTTP verbs are handled differently by browsers, thus they have different levels of vulnerability to CSRF attacks. As a result, each HTTP verb requires a different kind of protection strategy against the CSRF attack.\nYou can read more about this attack and its mitigation solutions here:\nhttps://owasp.org/www-community/attacks/csrf https://www.cloudflare.com/learning/security/threats/cross-site-request-forgery Other Vulnerabilities and Attacks Buffer Overflow Buffer overflow is a class of exploit in which a program writes more data to a buffer than it can hold. A buffer is a contiguous block of memory, and when it overflows, the excess data is written into other parts of memory. The attacker can cause the program to crash or execute malicious code.\nBuffer overflow attacks come in different forms. 
Stack-based and heap-based buffer overflows are the most well-known. C and C++ are more vulnerable to this exploit as they don\u0026rsquo;t have any built-in protection against accessing or overwriting out-of-bound data in memory.\nYou can read more about this vulnerability here:\nhttps://owasp.org/www-community/vulnerabilities/Buffer_Overflow https://www.cloudflare.com/learning/security/threats/buffer-overflow DNS Cache Poisoning DNS Cache Poisoning or DNS Spoofing is an attack in which false information is placed in a DNS cache, so that DNS queries return incorrect responses and users are directed to wrong and possibly malicious websites. The incorrect DNS information remains in the cache until the time-to-live expires or the cache is purged. This attack is possible because, in the original DNS protocol, there is no mechanism for verifying identities and establishing trust between parties. New mechanisms have been proposed to fix this vulnerability (DNSSEC, DNS over HTTPS, DNS over TLS).\nYou can read more about this attack here:\nhttps://owasp.org/www-community/attacks/Cache_Poisoning https://www.cloudflare.com/learning/dns/dns-cache-poisoning https://www.cloudflare.com/learning/dns/dns-security https://www.cloudflare.com/learning/dns/dns-over-tls HTTP Request Smuggling HTTP Request Smuggling or HTTP Desync attack exploits different interpretations of a stream of non-standard HTTP requests between the attacker (client), HTTP proxies, caches, and the HTTP server itself. The attacker takes advantage of how a stream of HTTP requests can be interpreted differently at various HTTP layers. 
The attacker can smuggle a malicious HTTP request through an HTTP intermediary to the server.\nYou can read more about this attack and its defenses here:\nhttps://portswigger.net/web-security/request-smuggling https://i.blackhat.com/USA-20/Wednesday/us-20-Klein-HTTP-Request-Smuggling-In-2020-New-Variants-New-Defenses-And-New-Challenges.pdf https://i.blackhat.com/USA-20/Wednesday/us-20-Klein-HTTP-Request-Smuggling-In-2020-New-Variants-New-Defenses-And-New-Challenges-wp.pdf HTTP Response Splitting HTTP Response Splitting exploits a vulnerability in which user-defined data from an untrusted source is sent to a web application, and the web application includes the malicious data in an HTTP response header without validating it for malicious characters (newlines).\nYou can read more about this attack here:\nhttps://owasp.org/www-community/attacks/HTTP_Response_Splitting https://blog.detectify.com/2019/06/14/http-response-splitting-exploitations-and-mitigations Read More Security by Design Principles Vulnerabilities Attacks Web Application Security Identity and Access Management (IAM) ","permalink":"https://milad.dev/posts/security-for-devs/","summary":"Recently during an interview, I was asked a question about how much I know about security. At first, I paused for a few seconds because honestly, I didn\u0026rsquo;t know how to answer the question. Eventually, I answered that, as a developer, I make sure I am doing this, doing that, and following these best practices! After my interview, I kept telling myself that I should know the security best practices for developers and engineers.","title":"Security For Developers"},{"content":"TL;DR A Rule is a set of conditions triggering a set of actions (if \u0026lt;conditions\u0026gt; then \u0026lt;action\u0026gt;). A domain expert models the domain knowledge (business rules) by defining the set of all the rules. Rules are usually defined using a domain-specific language, also known as a DSL. 
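The if-conditions-then-actions model above can be sketched as a minimal Python rules engine. This is an illustrative toy (names and the fire-at-most-once policy are my assumptions, not any specific product), but it shows how firing one rule can change the facts and thereby trigger another rule (chaining).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # if <conditions> ...
    action: Callable[[dict], None]     # ... then <actions>

def run_engine(rules: list[Rule], facts: dict, max_cycles: int = 10) -> dict:
    """Naive forward chaining: evaluate rules in cycles until no rule fires.
    Each rule fires at most once; re-checking after every change is what
    allows one rule's action to trigger another rule."""
    for _ in range(max_cycles):
        fired = False
        for rule in rules:
            if rule.name not in facts["_fired"] and rule.condition(facts):
                rule.action(facts)
                facts["_fired"].add(rule.name)
                fired = True
        if not fired:
            break
    return facts

rules = [
    Rule("big-order", lambda f: f["total"] > 100,
         lambda f: f.update(discount=0.1)),
    Rule("discounted", lambda f: f.get("discount", 0) > 0,  # chained rule
         lambda f: f.update(free_shipping=True)),
]
facts = run_engine(rules, {"total": 150, "_fired": set()})
assert facts["free_shipping"]  # fired only because "big-order" fired first
```

Note that "discounted" only becomes true after "big-order" fires; this is exactly the chaining behavior that makes larger rule sets hard to reason about.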
Using these sets of rules, we can build an expert system that can make decisions on behalf of domain experts. A rules engine is at the core of an expert system. Data constantly come through the system in streams or in batches. The rules engine decides when to evaluate which rules (evaluation can happen on-demand or in cycles). The order in which rules are defined does not matter. The rules engine decides which rules should be evaluated in what order. Chaining happens when the action part of one rule changes the state of the system and the conditions of other rules, which can lead to triggering other actions as well. Chaining makes it very hard to reason about and debug the system. An inference engine applies logical rules to the knowledge base to infer new information from known facts. Inference engines usually proceed in two modes: forward chaining and backward chaining. There are some challenges with rule-based systems and expert systems: The sets of rules need to be defined and maintained by domain experts. An expert system is just as capable and precise as the rules are. For performance reasons, there are limitations applied to rule definitions (no. of rules, no. of conditions, cardinality of dimensions, etc.). Read More Rules Engine Domain Specific Languages What is Rule-Engine? ","permalink":"https://milad.dev/gists/rules-engine/","summary":"TL;DR A Rule is a set of conditions triggering a set of actions (if \u0026lt;conditions\u0026gt; then \u0026lt;action\u0026gt;). A domain expert models the domain knowledge (business rules) by defining the set of all the rules. Rules are usually defined using a domain-specific language, also known as a DSL. Using these sets of rules, we can build an expert system that can make decisions on behalf of domain experts. A rules engine is at the core of an expert system.","title":"What is a Rules Engine?"},{"content":" Ad placement is a multi-objective optimization (MOO) problem. 
There are many factors that need to be taken into account and optimized:\nRelevancy Advertiser Value User Value User Experience Retention Fairness Basket Size and more \u0026hellip; The Marketing Funnel ___________________________ \\ Awareness / \\-----------------------/ \\ Consideration / \\-------------------/ \\ Conversion / \\ Loyalty / \\ Advocacy / \\___________/ Glossary Term Description Marketing The set of activities to attract people to products or services. Promotion Promotions are one kind of marketing to encourage conversion. Ads Advertising or Ads are another kind of marketing. Surface A medium or place for showing ads. Inventory The ad slots being offered to advertisers. Impression Impression is defined as the ad being in the viewport for a period of time. Conversion An action that is valuable to the business (i.e. purchase, call). RTB Real-time bidding is the process of offering ad slots to advertisers through an online auction. Goal The goal that the ad tries to achieve. Targeting The group of users who are eligible for the ad. Copy The text of the ad. Creative The visual component of the ad (i.e. image, video). Campaign A unit of advertising that includes a goal, targeting, budget, and the set of copies and creatives. Actors Term Description Advertiser Provides demand, may work via third-party agencies, generates revenue in the ads marketplace. Publisher Provides supply through surfaces for displaying ads. Exchange Matches ads demand with ads supply, acts as a mediator between publishers and advertisers, and provides value-added services (i.e. analytics). Demand-Side Platform (DSP) Bids on one or more exchanges on behalf of advertisers. Can also prevent partner advertisers from competing with each other. Supply-Side Platform (SSP) Offers ads inventory to one or more exchanges on behalf of publishers. 
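The exchange\u0026rsquo;s core matching step, described below under Auction Theory, can be sketched as a second-price (Vickrey) auction. A toy illustration, assuming sealed bids in cents and at least two bidders (the DSP names are made up):

```python
def second_price_auction(bids: dict[str, int]) -> tuple[str, int]:
    """Second-price (Vickrey) auction: the highest bidder wins but pays
    the second-highest bid plus 1 cent. Bids are in cents; assumes at
    least two bidders (tie-breaking is left unspecified here)."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] + 1  # second-highest bid + 1 cent
    return winner, price

winner, price = second_price_auction({"dsp-a": 250, "dsp-b": 180, "dsp-c": 90})
assert (winner, price) == ("dsp-a", 181)
```

Because the winner\u0026rsquo;s own bid never sets the price, bidders have no incentive to shade their bids below their true value.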
Ad Stack Campaign Management Analytics and Insights Ad Exchange Budget and Pacing Ad Ranking/Pricing Billing Forecasting Fraud Detection Ad Rendering \u0026hellip; Ad Exchange Operations Ad Exchange will provide a marketplace for advertisers and publishers. It matches supply with demand in real-time by maximizing ROI for advertisers and minimizing the impact on user experience.\nAuction Every time an ad surface is available, a request is sent to the exchange, possibly through an SSP, to display an ad. Depending on the surface and audience (viewing users), a subset of ads will be eligible for display. The exchange will determine the best ad to be placed on the surface. The definition of best depends on a variety of factors (bid price, relevancy, platform value, quality score, etc.). The exchange may pay the publisher by specifying the conditions for compensation and price. Intermediaries (DSP, SSP, exchange, etc.) will each receive a cut for their role in the marketplace. Auction Theory First-Price Auction All bidders simultaneously submit sealed bids. The highest bidder pays the price that was submitted. Second-Price (Vickrey) Auction Bidders submit sealed bids without knowing about others\u0026rsquo; bids. The highest bidder wins but the price paid is the second-highest bid (plus 1 cent). This model incentivizes bidders to bid their true value. Ad Ranking/Pricing When there are multiple ad slots available, we need to decide what ad appears in each slot. We assign a rank no. (position) to each ad. This decision can be made based on a variety of factors: Bid price Ad relevancy (Relevance Score) Ad quality (Quality Score) \u0026hellip; The rank can be used to decide the billing price for that rank position. Compensation Method Description Fixed Price The advertiser pays a fixed amount for a certain no. of placements or over a period of time. Using this method, there may be no auction. 
Cost Per Thousand Impressions (CPM) The advertiser pays for each impression regardless of whether the user takes any action or not. Cost Per Click (CPC) The advertiser pays each time a user clicks on the ad. This is the most common method. Cost Per View (CPV) The advertiser pays each time a user views the ad video. Viewing can be defined as watching for 10 seconds or more, for example. Cost Per Action (CPA) The advertiser pays each time a user performs an action of interest (purchasing, installing an app, etc.). Cost Per Engagement (CPE) The advertiser pays for each user engagement (like, retweet, etc.). Demand-Side Platform (DSP) Operations Audience Targeting Advertisers can reach out to different segments of users by specifying various targeting criteria. Targeting capabilities usually include the following:\nLocation Targeting Demographic Targeting Behavioral Targeting Contextual Targeting Retargeting Pacing We can think of pacing in various ways: Pacing defines how ads should deliver over the schedule of the campaign. Pacing controls how a campaign spends its budget. It helps the budget to meet the goals of the bidding strategy and vice versa. Pacing defines the goals of a campaign so that the campaign hits a desired metric over the course of its lifetime. Ad Delivery Curve is defined by pacing and sets the goals of the campaign (hourly goals, daily goals, etc.). Pacing is needed for constraints such as dayparting, blackout periods, holidays, etc. Budget Pacing is required to prevent campaign underdelivery and overdelivery. Campaign Underdelivery Loss of money (unused budget) Loss of money (advertisers may reduce ad spend) Campaign Overdelivery Dilutes cost metrics (CPM, CPC, \u0026hellip;) Loss of money (advertisers get free credit and may reduce ad spend) Loss of money (ads that overdeliver prevent other ads from delivering) Pacing (Ad Delivery Curve) can be: Naive: based on an arbitrary distribution (i.e. uniform distribution). 
Traffic-aware: historical traffic distribution defines the ad delivery curve. Custom: the advertiser defines the goals. For pacing to work and to stay on the ad delivery curve, there are different approaches: Greedy: spend the budget as fast as possible while the goal is not met. ε-Greedy: optimizes the greedy approach by dropping an ad with probability ε. Model-based: PID Controller, Model predictive control, etc. There are two approaches for tracking ongoing delivery and correcting it: Proactively (biased for underdelivery): assign quota upfront and serve an ad if it has not run out of quota. Reactively (biased for overdelivery): count delivery via events and throttle once the threshold is met (goal is met). User Tracking Advertisers want to track their customer journeys and re-engage with them.\nMethod Description Pixel Ad exchanges rely on long-lived cookies to be present. Publishers can embed a snippet of code from the exchanges to track users. Session Token A session token stored in the session cookie can be leveraged to track users across a single session on the publisher\u0026rsquo;s surface. Mobile App Mobile apps leverage either signed-in users or device unique identifiers. Marketing pixels or tracking pixels are tiny snippets of code to gather information about visitors on a website. They can be used for targeting users, measuring a marketing campaign\u0026rsquo;s performance, tracking conversions, and building a targeted audience base. Retargeting Pixels are used for tracking user behavior to tailor ads that users are interested in. Conversion Pixels are used for tracking sales to identify the source of conversions and measure the success or failure of ad campaigns.\nAnalytics Performance Metrics Metric Description Ad Spend Amount of money spent on an ad campaign. CPM, CPC, CPV, CPA, CPE Described above. Click Through Rate (CTR) The ratio of impressions that result in a click. 
Return On Ad Spend (ROAS) The attributed value of sales over the cost of the advertising used for selling. Conversion Rate The ratio of impressions that result in a sale. Attribution Marketing attribution is the practice of determining which marketing channels and advertisements are contributing to sales or conversions.\nModel Description First-Touch Attributes the conversion to the first ad that the user has interacted with. Last-Touch Attributes the conversion to the last ad that the user has interacted with. Multi-Touch Attributes the conversion equally to all ads that the user has interacted with. Weighted Multi-Touch Attributes the conversion differently to the ads based on how the user has interacted with each of them. If multiple ads were involved in the customer journey, each may take 100% credit for the sale, which results in an attributed value greater than the value of the sale.\nRead More Google Ads Glossary How does online tracking actually work? Dayparting Vickrey Auction Vickrey–Clarke–Groves Auction Building Advertising Platforms (Overview) How to Build a Production Ad Platform Prebid ","permalink":"https://milad.dev/posts/ad-platform/","summary":"Ad placement is a multi-objective optimization (MOO) problem. There are many factors that need to be taken into account and optimized:\nRelevancy Advertiser Value User Value User Experience Retention Fairness Basket Size and more \u0026hellip; The Marketing Funnel ___________________________ \\ Awareness / \\-----------------------/ \\ Consideration / \\-------------------/ \\ Conversion / \\ Loyalty / \\ Advocacy / \\___________/ Glossary Term Description Marketing The set of activities to attract people to products or services.","title":"What is an Ad Platform Made of?"},{"content":"TL;DR Stream Processing is a big data paradigm. In Batch Processing, we need to have all data stored ahead of time. We process data in batches. We aggregate the results across all batches at the end. 
Batch processing tries to process all the data at once. In Stream Processing, data come as a never-ending continuous stream of events. Stream processing naturally fits with time series data. Data are processed in real-time and we can respond to the events faster. Stream processing distributes the processing over time. You can use stream processing when: When the data is huge and cannot be stored, stream processing is the only solution. Stream processing works when processing can be done with a single pass over the data or has temporal locality. Stream processing fits into use-cases where approximate results are sufficient. You should not use stream processing when: Processing needs multiple passes through full data. Processing needs to have random access to data. Examples: training machine learning models, etc. When doing stream processing using a message broker: We use a message broker system (Kafka, NATS, RabbitMQ, etc.). We create applications and write code to receive messages, do some calculations, and publish back results (actors). When using a stream processing framework (Flink, Kafka Streams, etc.): We only write the logic for actors. We connect the actors and data streams. A stream processor will take care of the hard work (collecting data, running actors in the right order, collecting results, scaling, and so on). Streaming SQL allows users to write SQL-like statements to query streaming data. A window is a working memory on top of a stream. The most common types of streaming windows: Sliding Length Window Keeps the last N events and triggers for each new event. Batch Length Window Keeps the last N events and triggers once for every N events. Sliding Time Window Keeps the events from the last N time units and triggers for each new event. Batch Time Window Keeps the events from the last N time units and triggers once at the end of the time period. Event Sourcing is an architectural pattern built on top of stream processing. 
Changes to an application state are stored as a sequence of events. These events can be replayed and queried to reconstruct the state of the application at any time. Read More A Gentle Introduction to Stream Processing Stream Processing 101: A Deep Look at Operators Event Sourcing Plumbing At Scale Keystone Real-Time Stream Processing Platform Event Sourcing, CQRS, Stream Processing and Apache Kafka The Data Dichotomy: Rethinking the Way We Treat Data and Services Build Services on a Backbone of Events Using Apache Kafka as a Scalable, Event-Driven Backbone for Service Architectures Chain Services with Exactly-Once Guarantees Messaging as the Single Source of Truth Leveraging the Power of a Database Unbundled Building a Microservices Ecosystem with Kafka Streams and KSQL Toward a Functional Programming Analogy for Microservices ","permalink":"https://milad.dev/gists/stream-processing/","summary":"TL;DR Stream Processing is a big data paradigm. In Batch Processing, we need to have all data stored ahead of time. We process data in batches. We aggregate the results across all batches at the end. Batch processing tries to process all the data at once. In Stream Processing, data comes as a never-ending continuous stream of events. Stream processing naturally fits with time series data. Data are processed in real-time and we can respond to the events faster.","title":"Stream Processing"},{"content":"TL;DR OAuth 2.0 OAuth 2.0 is used for authorization. Terminology: Roles: Client: the application that wants to access the data. Confidential Clients: the clients with the ability to maintain the confidentiality of the client_secret. Public Clients: the clients that cannot maintain the confidentiality of the client_secret. Resource Owner: the user who owns the data. Resource Server: the system which has the data that the client wants to access. Authorization Server: the system that authorizes access to the data.
Configurations: Redirect URI Response Type Scope Endpoints: Authorization Endpoint Token Endpoint Resource Endpoint Tokens: Access Token: the token that is used when making authenticated API requests. Refresh Token: the token that is used to get a new access token when the access token expires. Channels: Back Channel (highly secure communication channel) Front Channel (less secure communication channel) Authorization Grant Flows: Authorization Code (front channel + back channel) Use-Case: applications with back-end and front-end Implicit (front channel only) Use-Case: Single-Page App (no backend) Resource Owner Password Credentials (back channel only) Use-Case: Legacy Client Credentials (back channel only) Use-Case: Service to Service Communication (backend-only) OpenID Connect OpenID Connect (OIDC) is used for authentication. OIDC is an extension on top of OAuth for authentication use-cases. Additions: ID Token: a token that has some of the user\u0026rsquo;s information. User Endpoint: the endpoint for getting more information about the user. Standard Scopes etc. JWT JSON Web Token (JWT) is an open standard for encoding and transmitting information. JWT is a common format for OAuth 2.0 and OIDC tokens. Anatomy: base64(header).base64(payload).\u0026lt;signature\u0026gt; Header Type: JWT Signing Algorithm: HS256, RSA Payload Registered Claims Public Claims Private Claims Signature signing_algorithm(base64(header) + \u0026quot;.\u0026quot; + base64(payload), secret) Extensions: JWK (JSON Web Key): a JSON object that represents a cryptographic key. JWKS (JSON Web Key Set): a set of keys that contains the public keys for verifying an issued JWT. PKCE Proof Key for Code Exchange is an extension to the authorization code flow for mobile apps to mitigate the risk of having the authorization code intercepted.
Tools: https://jwt.io https://oauthdebugger.com https://oidcdebugger.com Read More Specs The OAuth 2.0 Authorization Framework OpenID Connect Core 1.0 OpenID Connect Discovery 1.0 OpenID Connect Dynamic Client Registration 1.0 JSON Web Token (JWT) JSON Web Key (JWK) Reads An Illustrated Guide to OAuth and OpenID Connect Talks OAuth 2.0 and OpenID Connect in Plain English Securing APIs and Microservices with OAuth and OpenID Connect Microservice Authentication and Authorization ","permalink":"https://milad.dev/gists/oauth-oidc/","summary":"TL;DR OAuth 2.0 OAuth 2.0 is used for authorization. Terminology: Roles: Client: the application that wants to access the data. Confidential Clients: the clients with the ability to maintain the confidentiality of the client_secret. Public Clients: the clients that cannot maintain the confidentiality of the client_secret. Resource Owner: the user who owns the data. Resource Server: the system which has the data that the client wants to access. Authorization Server: the system that authorizes access to the data.","title":"OAuth 2.0 and OpenID Connect"},{"content":"TL;DR Microservices architecture encompasses anywhere from a few services to thousands of services that communicate with each other through APIs. Microservices should NOT introduce any breaking changes to their APIs. Every change in one microservice should be tested against other microservices that rely on it. There are two approaches for integration testing in a microservices architecture: Replica Environments (Parallel Testing) Creating a copy of the production environment for handling test traffic (integration or staging environment). This environment is completely isolated from the production environment, always up and running, and usually operating at a smaller scale. Every new change will be first deployed and tested in this environment and only then will it be deployed to production.
This approach falls short in the following ways: Extra Cost: there is going to be an extra operational cost. Data Synchronization: Keeping data in sync between the production environment and the staging environment is a challenge. Dependency Synchronization: Keeping the external dependencies identical is also a challenge. Unreliable Testing: As the two environments deviate, the results of tests become less reliable. Inaccurate Performance Testing: Running different types of performance testing such as load testing, capacity testing, etc. is another challenge in a separate environment. Multi-Tenant Environments (Testing in Production) Making the production system multi-tenant. Multi-tenancy requires: Traffic Routing based on the kind of traffic. Isolation and Fairness for both data-in-transit and data-at-rest. Production traffic and test traffic can co-exist by making every service able to handle both kinds of traffic. Test traffic goes to the service-under-test (SUT) and production traffic stays unaffected by test traffic. Multi-tenancy paves the way for the following capabilities as well: A/B Testing Advanced Deployments: Blue-Green deployments, Rolling deployments, Canary deployments, etc. Record/Replay and Shadow Traffic: Replaying previously captured live traffic or replaying a shadow copy of live production traffic Tenancy-Oriented Architecture Tenancy should be treated as a first-class object and the notion of tenancy should be attached to both data-in-flight and data-at-rest. Ideally, we want services to not deal with tenancy explicitly. Context A tenancy context should be attached to every execution sequence. Context Propagation Tenancy context should always be passed from one service to another service. Tenancy context should also be included in the messages in messaging queues. Tenancy context should also be attached to data-at-rest. Context propagation can be achieved using tools like OpenTelemetry and Jaeger.
Tenancy-Based Routing We should route requests based on their tenancy. In general, tenancy-based routing can be achieved either at the egress or at the ingress of services. Service Mesh tools such as Envoy, Istio, etc. can be leveraged for tenancy-based routing. Data Isolation The storage infrastructure needs to take tenancy into account and create isolation between tenants. There are two high-level approaches: Embed the notion of tenancy with the data and co-locate data with different tenancies. Explicitly separate out data based on the tenancy at service-level or using an abstraction layer. Configurations should also be multi-tenant. Database Tenancy Patterns Standalone Single-Tenant App with Single-Tenant Database The whole stack will be spun up separately for each tenant. This model provides the strongest tenant-isolation at infrastructure, application, and database levels. Multi-Tenant App with Database-per-Tenant The application is tenant-aware and there is going to be a single-tenant database per tenant. In this model, we need a catalog to map tenants to corresponding databases. This model offers tenant-isolation at the database level. Multi-Tenant App with a Single Multi-Tenant Database In this model, the database is also tenant-aware. Depending on the underlying database implementation, data for different tenants could be either co-located, isolated, or encrypted-per-tenant. Multi-Tenant App with Sharded Multi-Tenant Databases This model is a combination of the previous two models. Instead of having a single-tenant database per tenant, we will have a multi-tenant database per shard of tenants. This model provides better scalability.
Read More Why We Leverage Multi-Tenancy in Uber’s Microservice Architecture Multi-tenant SaaS Database Tenancy Patterns ","permalink":"https://milad.dev/gists/multi-tenancy-in-microservices/","summary":"TL;DR Microservices architecture encompasses anywhere from a few services to thousands of services that communicate with each other through APIs. Microservices should NOT introduce any breaking changes to their APIs. Every change in one microservice should be tested against other microservices that rely on it. There are two approaches for integration testing in a microservices architecture: Replica Environments (Parallel Testing) Creating a copy of the production environment for handling test traffic (integration or staging environment).","title":"Multi-Tenancy in Microservice Architecture"},{"content":"sed is a stream editor command available on Unix-compatible systems. sed is quite a powerful tool, but the learning curve is steep compared to other similar tools such as grep or awk. Almost every time I want to do something with sed, I need to look it up and search for some examples. So, I decided to compile a concise tutorial for sed that covers the most common use-cases.\nWith sed, you usually specify a few options and a script and feed it with an input file.\nsed \u0026lt;options\u0026gt; \u0026lt;script\u0026gt; \u0026lt;input_file\u0026gt; Options Here are some options for the sed command that you will most likely need.\nOption Description -i Edits the input file in-place. -e Specifies the scripts for editing. -n Suppresses printing each line of input. Commands Here are some common commands that you may use in sed scripts:\nCommand Description 1 Applies a command only to the first occurrence. 2 Applies a command only to the second occurrence. g Globally applies a command to every occurrence. i Matches in a case-insensitive manner. p Prints the matching patterns to standard output. d Deletes the matching patterns from output or input file.
s/regexp/replacement/ Replaces a regexp instance with the replacement. Regular Expressions Pattern Description ^ Matches the beginning of lines. $ Matches the end of lines. . Matches any single character. * Matches zero or more occurrences. [] Matches a class of characters. Examples Example Description sed -n '2p' input.txt Shows a single line by line number. sed -n '2!p' input.txt Shows all lines except one line number. sed -n '2p;4p' input.txt Shows multiple lines by line numbers. sed -n '2,4p' input.txt Shows multiple lines by a range. sed -n '2,4!p' input.txt Shows all lines except a range of lines. sed -n '2,$p' input.txt Shows all lines from a line number to the end. sed -n '2,$!p' input.txt Shows all lines before a line number. sed -i '2d' input.txt Deletes a particular line in-place. sed -i '2,4d' input.txt Deletes a range of lines in-place. sed -i '/regex/d' input.txt Deletes all lines matching regex in-place. sed -i '/regex/,$d' input.txt Deletes all lines from the first match of regex to the end, in-place. sed -i 's/foo/bar/g' input.txt Replaces all occurrences of foo with bar in-place. Read More https://www.tutorialspoint.com/sed https://www.gnu.org/software/sed/manual/sed.html ","permalink":"https://milad.dev/gists/sed-by-examples/","summary":"sed is a stream editor command available on Unix-compatible systems. sed is quite a powerful tool, but the learning curve is steep compared to other similar tools such as grep or awk. Almost every time I want to do something with sed, I need to look it up and search for some examples. So, I decided to compile a concise tutorial for sed that covers the most common use-cases.","title":"sed By Examples"},{"content":"awk is a domain-specific language and command for text processing available on Unix-compatible systems. gawk is the GNU AWK and all Linux distributions come with it.
This is a brief tutorial for awk covering the most common use-cases.\nawk reads input line by line from a file, pipe, or stdin and executes a program on each line. An input line has a number of fields separated by white space or by regular expression FS. The fields are denoted $1, $2, and so on, and $0 denotes the entire line. If FS is set to the null string, the input line is split into one field per character.\nawk \u0026lt;options\u0026gt; \u0026lt;program\u0026gt; \u0026lt;input_file\u0026gt; Options Here are some options for the awk command that you will most likely need.\nOption Description -f Specifies the file containing the awk program. -F Specifies the regular expression for separating fields. Actions An action is a sequence of statements.\nAction Description print Prints data on standard output. printf Prints formatted data on standard output. Examples Example Description awk 'BEGIN {print \u0026quot;Hello, World!\u0026quot;}' Prints Hello, World! on stdout. awk -F '\\t' '{print $2, $4}' Prints the second and fourth columns of tab-separated input. Read More https://www.tutorialspoint.com/awk https://www.gnu.org/software/gawk/manual/gawk.html ","permalink":"https://milad.dev/gists/awk-by-examples/","summary":"awk is a domain-specific language and command for text processing available on Unix-compatible systems. gawk is the GNU AWK and all Linux distributions come with it. This is a brief tutorial for awk covering the most common use-cases.\nawk reads input line by line from a file, pipe, or stdin and executes a program on each line. An input line has a number of fields separated by white space or by regular expression FS.","title":"awk By Examples"},{"content":"TL;DR The well-known types of performance testing are the following:\nLoad Testing Load testing is the simplest form of performance testing. It is conducted to understand the behavior of a system under a specific load. The goal of load testing is to identify performance bottlenecks in the application.
Stress Testing Stress testing is carried out to understand the behavior of a system in an overload situation. The goal of stress testing is to see if the system will perform satisfactorily when the load goes well above the maximum. Sometimes, stress testing is done to identify the breaking point of the application. Endurance Testing Endurance testing (a.k.a. soak testing) involves putting a system under a significant load for an extended period of time. The goal of endurance testing is to ensure that the application maintains its expected performance over a long period of time. Spike Testing Spike testing is done by suddenly increasing the load and observing the behavior of a system. The goal of spike testing is to see if the performance of the application will suffer from sudden and dramatic changes in load. There are also other types of testing sometimes referred to as performance testing.\nCapacity Testing Both load testing and stress testing help with determining and testing the capacity of a system. Capacity testing is concerned with testing the capacity of a system to see how many requests it can handle before performance goals are violated. Scalability Testing Scalability testing is closely related to stress testing and spike testing. Scalability testing deals with testing the capability of a system to scale up or scale down. Configuration Testing Configuration testing verifies the performance of a system under different configuration changes to the system. Read More Software Performance Testing Performance Testing Tutorial: What is, Types, Metrics \u0026amp; Example ","permalink":"https://milad.dev/gists/performance-testing/","summary":"TL;DR The well-known types of performance testing are the following:\nLoad Testing Load testing is the simplest form of performance testing. It is conducted to understand the behavior of a system under a specific load. The goal of load testing is to identify performance bottlenecks in the application.
Stress Testing Stress testing is carried out to understand the behavior of a system in an overload situation. The goal of stress testing is to see if the system will perform satisfactorily when the load goes well above the maximum.","title":"Performance Testing Explained"},{"content":"Comparison Matrix Scrum Kanban KPI Team velocity Cycle-time Goal Building highly reliable and predictable teams. Building flexible and resilient teams. Suitable For Consistent and predictable workloads. Mid-term and long-term deliverables. Multiple teams at scale. Unpredictable and arbitrary workloads. Short-term and high-priority deliverables. Small and independent teams. Cadence 2-Week Sprints; 3- to 5-Sprint Milestones 1-Week Beats Roles Product Owner (PO), Scrum Master (SM), Development Team N/A Ceremonies Planning, Daily Scrums, Review, Demo, Retro, Scrum of Scrums Daily Stand-ups Board Calendar-like board showing all days in sprint. Focusing on deadlines and deliverables. Kanban board showing different stages in work pipeline. Focusing on DONE as quickly as possible. Change of Plan Current sprint plan CANNOT be changed. New stories go to backlog for next sprint. New stories can be added to beat. Number of work-in-progress (WIP) items should be limited. Pros More predictable and reliable. Better scalability. Faster and more flexible. Cons More management overhead. Less flexibility for urgent work. Less predictability. Hard to scale. Both Scrum and Kanban require:\nDefinition of Done (DoD) Groomed Backlog Effective Standups/Scrums The goal of planning is:\nUnderstanding details of what needs to be done. Highlighting high-priority and high-impact work. Identifying blockers, risks, and unknowns. Determining what can be done in parallel and what should be serialized. Laying out a plan for collaboration between team members. Adopting reasonable deadlines. Making feasible commitments and communicating deliverables. Coming up with a Plan B! When sizing stories in Scrum,\nUse Fibonacci numbers.
Think of story size as an internal relative measure of complexity rather than time. Each story should be small enough to be done in a sprint. Usually, a story size bigger than 8 is a sign of not enough planning and not identifying unknowns and risks. Scrumban tries to bring the best of both worlds while making some compromises on both sides. In Scrumban, stories are still sized and the key metric is still team velocity, but Scrumban can take new stories into a sprint as long as another story of similar size is pushed out. So, the total number of stories planned for the sprint stays the same.\n","permalink":"https://milad.dev/gists/scrum-vs-kanban/","summary":"Comparison Matrix Scrum Kanban KPI Team velocity Cycle-time Goal Building highly reliable and predictable teams. Building flexible and resilient teams. Suitable For Consistent and predictable workloads. Mid-term and long-term deliverables. Multiple teams at scale. Unpredictable and arbitrary workloads. Short-term and high-priority deliverables. Small and independent teams. Cadence 2-Week Sprints; 3- to 5-Sprint Milestones 1-Week Beats Roles Product Owner (PO), Scrum Master (SM), Development Team N/A Ceremonies Planning, Daily Scrums, Review, Demo, Retro, Scrum of Scrums Daily Stand-ups Board Calendar-like board showing all days in sprint.","title":"Agile: Scrum vs. Kanban"},{"content":"The Problem As a developer, when you are working on a Kubernetes application on your local machine, if you want to test or debug something, you have the following options:\nA full environment running using docker-compose. A full environment running in a local Kubernetes cluster (Minikube or Docker-for-Desktop) Pushing instrumented code, building, testing, and deploying to a dev Kubernetes cluster through the CI/CD pipeline. The problem with the first two options is that the environment you get is nowhere near your actual final environment (staging and production).
And the last option is very time-consuming: for every small change, developers have to go through the full CI/CD pipeline.\nWhat is Telepresence? Telepresence is a CNCF project created by Datawire. It is such a great tool for developers to debug and test their code locally without going through the full deployment process to a Kubernetes cluster.\nTelepresence creates a two-way network proxy between a Pod and a process running locally on your machine. TCP connections, environment variables, volumes, etc. are proxied from your pod to the local process. The networking for the local process is also transparently changed, so DNS calls and TCP connections from your local process will be proxied to the remote Kubernetes cluster.\nAfter installing Telepresence, you can try out the following:\ntelepresence --swap-deployment \u0026lt;name\u0026gt; --run-shell This will run a shell locally while all TCP connections, environment variables, and volumes from the pod \u0026lt;name\u0026gt; are available.\nWhat is konfig? konfig is a minimal and unopinionated library for reading configuration values in Go applications. You can read more about it and how to use it here. Using konfig, reading and parsing configuration values is as easy as defining a struct! Here is an example:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;net/url\u0026#34; \u0026#34;time\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { Port int LogLevel string Timeout time.Duration DatabaseURL []url.URL } { Port: 3000, // default port LogLevel: \u0026#34;info\u0026#34;, // default logging level } func main() { konfig.Pick(\u0026amp;config) // ... } How does konfig Help? When working with Kubernetes Secrets, you want to access them as mounted volumes and files.
This way you can set file permissions for mounted secrets and they are updated automatically when you make a change to your secrets.\nIn a Telepresence session, the file system of the pod, including all volumes, is mounted at a path specified in the TELEPRESENCE_ROOT environment variable. If you want to run your application in a Telepresence session the same way you run it in your pod, you need to build some logic in your application to take TELEPRESENCE_ROOT into account.\nkonfig is an out-of-the-box solution for transparently reading your configuration values either in a real Pod environment or in a Telepresence session.\nAn Example You can find the source code for this example here.\nLet\u0026rsquo;s build and push the Docker image and then deploy the resources to Kubernetes.\n# current directory: examples/4-telepresence make docker k8s-deploy If we get the logs for our pod (kubectl logs \u0026lt;pod_name\u0026gt;), we should see something similar to the following line:\n2019/07/14 23:45:54 making service-to-service calls using this token: super-strong-secret Now, we are going to see how we can use Telepresence and konfig. First, let\u0026rsquo;s see how Telepresence works. Run the following commands:\ngo build -o app telepresence --swap-deployment example --run ./app If you run the ls command, you will see your local files. If you run the env command, you will see that the AUTH_TOKEN_FILE environment variable is present. Next, try running echo $TELEPRESENCE_ROOT and then cd $TELEPRESENCE_ROOT. This is where the file system of your pod is mounted. In a Telepresence session, you need to prepend $TELEPRESENCE_ROOT to the paths of mounted volumes.
We will see how konfig can automatically detect a Telepresence session and read your secrets the same way as they are in your pod.\nNow, let\u0026rsquo;s make a change to our code as follows:\nkonfig.Pick(\u0026amp;Config, konfig.Telepresence(), konfig.Debug(5)) log.Printf(\u0026#34;auth token: %s\u0026#34;, Config.AuthToken) Without building a new Docker image and pushing it, we just compile our application and run it using telepresence command.\ngo build -o app telepresence --swap-deployment example --run ./app You will see our app is running locally and the AuthToken is successfully read from the Telepresence environment.\n2019/07/14 19:58:24 ---------------------------------------------------------------------------------------------------- 2019/07/14 19:58:24 Options: Debug\u0026lt;5\u0026gt; + Telepresence 2019/07/14 19:58:24 ---------------------------------------------------------------------------------------------------- 2019/07/14 19:58:24 [AuthToken] expecting flag name: auth.token 2019/07/14 19:58:24 [AuthToken] expecting environment variable name: AUTH_TOKEN 2019/07/14 19:58:24 [AuthToken] expecting file environment variable name: AUTH_TOKEN_FILE 2019/07/14 19:58:24 [AuthToken] expecting list separator: , 2019/07/14 19:58:24 [AuthToken] value read from flag auth.token: 2019/07/14 19:58:24 [AuthToken] value read from environment variable AUTH_TOKEN: 2019/07/14 19:58:24 [AuthToken] value read from file environment variable AUTH_TOKEN_FILE: /secrets/myappsecret/auth-token 2019/07/14 19:58:24 [AuthToken] telepresence root path: /tmp/tel-deewv8wa/fs 2019/07/14 19:58:24 [AuthToken] value read from file /tmp/tel-deewv8wa/fs/secrets/myappsecret/auth-token: super-strong-secret 2019/07/14 19:58:24 [AuthToken] setting string value: super-strong-secret 2019/07/14 19:58:24 ---------------------------------------------------------------------------------------------------- 2019/07/14 19:58:24 auth token: super-strong-secret Conclusion Telepresence is a great tool 
for developing applications for Kubernetes. konfig makes reading and parsing configuration values for Go applications extremely easy. It can also read configuration values in a Telepresence session transparently.\n","permalink":"https://milad.dev/posts/telepresence-with-konfig/","summary":"The Problem As a developer, when you are working on a Kubernetes application on your local machine, if you want to test or debug something, you have the following options:\nA full environment running using docker-compose. A full environment running in a local Kubernetes cluster (Minikube or Docker-for-Desktop) Pushing instrumented code, building, testing, and deploying to a dev Kubernetes cluster through the CI/CD pipeline. The problem with the first two options is that the environment you get is nowhere near your actual final environment (staging and production).","title":"Developing Go Services For Kubernetes with Telepresence and konfig"},{"content":"TL;DR Dynamic configuration management and secret injection refer to updating an application with new configurations and secrets in a non-disruptive way. Kubernetes ConfigMaps and Secrets mounted as files into containers will be updated with new values automatically. konfig makes dynamic configuration management and secret injection very easy to implement and use for Go applications. The Problem Dynamic configuration management and secret injection refer to a situation where your application can update its configurations and secrets without needing a restart.\nExample 1: Let\u0026rsquo;s imagine you have a microservice deployed to Kubernetes. This microservice serves an API and consumers constantly make requests to it. The verbosity level of logging for this microservice is a configurable parameter that is passed to the container through a ConfigMap. As a best practice (e.g., cost-saving), you only want warning or error logs in production.
Now, let\u0026rsquo;s consider a case where your microservice is handling a long-lived background job and you want to see logs for this job at debug level. Normally, you go change the verbosity level in your ConfigMap and restart your pods. But, in this case, restarting pods also means killing the background job! How do you change the logging verbosity level while your job continues running and without restarting any pod?\nExample 2: In another scenario, you have a microservices application in which you have an auth microservice responsible for signing JSON web tokens (JWT). Other microservices need to authenticate and authorize API requests with the tokens issued by the auth microservice. For this purpose, they need to verify the JWT using the same key that the auth microservice signed it with. All of these microservices, including auth, read the signing key from a Kubernetes Secret. Now, let\u0026rsquo;s say your signing key is compromised and you want to rotate your secret. How would you do this without restarting all of your microservices?\nSolution Both examples described in the previous section can be addressed by dynamic configuration management and secret injection. For achieving this, two things need to be done:\nInjecting the new configuration/secret into your environment (container). Picking up the new configuration/secret in your application. If you are using a container orchestration platform like Kubernetes, the first problem is taken care of automatically. In Kubernetes, mounted ConfigMaps and Secrets are updated automatically. This means if you mount a ConfigMap or Secret as a volume into your container, whenever you make a change to your ConfigMap or Secret, the mounted files in your container will eventually be updated with new values.\nFor addressing the second problem, you need to watch for changes to mounted files in a parallel thread in your application.
As soon as you detect a change, you need to update your application to reflect the change in a thread-safe way (without data races or deadlocks!).\nDRY! You don\u0026rsquo;t have to repeat yourself across all different applications and microservices. More importantly, you also don\u0026rsquo;t have to be trapped in concurrency issues (data races, deadlocks, etc.).\nkonfig is a very minimal and unopinionated utility for reading configurations in Go applications based on The 12-Factor App. konfig can also watch for changes to configuration files and notify a list of subscribers on a channel. konfig makes dynamic configuration management and secret injection very easy to implement and use. As a consumer of this library, you don\u0026rsquo;t have to deal with parsing different data types and concurrency issues such as data races. You can find examples of using konfig here.\nA Real-World Example In this section, we want to demonstrate a real-world example of dynamic configuration management using konfig. Here is an overview of what we are going to do:\nWe will deploy two microservices to a Kubernetes cluster. The servers provide a simple HTTP endpoint and the clients call this endpoint every second. Initially, the microservices are configured to only show warn level logs, so we won\u0026rsquo;t see any logs. Then, we will update the log-level key in ConfigMaps for server and client to info. Without restarting the pods, we will see info-level logs from server and client pods after a little while. Source Code You can find all the required source code here.\nDemo First, we build and push Docker images for the server and client microservices. Second, we deploy them to the Kubernetes cluster.\n# current directory: examples/3-kubernetes cd server make docker k8s-deploy cd ../client make docker k8s-deploy Now, we should have 2 pods up and running for each microservice (server and client).
Using a handy tool called Stern, we tail all logs from the server pods.\nstern app-server app-server-588c8db995-kgwgh server {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;server\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;starting http server ...\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:42:03.0653369Z\u0026#34;} app-server-588c8db995-t7mw4 server {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;server\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;starting http server ...\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:42:03.8942075Z\u0026#34;} Similarly, we tail all logs from the client pods.\nstern app-client app-client-599ff8bf8f-7mtlr client {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;client\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;start sending requests ...\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:46:37.8604542Z\u0026#34;} app-client-599ff8bf8f-cvsss client {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;client\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;start sending requests ...\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:46:38.6870928Z\u0026#34;} The initial info-level logs are emitted before the warn level is read from the mounted configuration file and the logger is updated accordingly. While the client pods are making requests to the server pods, we see no other logs from any of the pods.\nNow, it is time to see dynamic configuration management in action! Using kubectl edit cm app-server, we change the log-level to info. Similarly, using kubectl edit cm app-client, we change the log-level to info.\nAfter a few seconds, Kubernetes will update the mounted files with the new values, and we will see logs from the server and client pods. 
Logs from server pods are similar to these ones:\napp-server-588c8db995-t7mw4 server {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;server\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;new request received\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:20.793411Z\u0026#34;} app-server-588c8db995-kgwgh server {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;server\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;new request received\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:20.967802Z\u0026#34;} app-server-588c8db995-t7mw4 server {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;server\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;new request received\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:21.7932503Z\u0026#34;} app-server-588c8db995-kgwgh server {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;server\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;new request received\u0026#34;,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:21.9667149Z\u0026#34;} And logs from client pods are similar to these ones:\napp-client-599ff8bf8f-cvsss client {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;client\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;response received from server\u0026#34;,\u0026#34;http.statusCode\u0026#34;:200,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:50.7583619Z\u0026#34;} app-client-599ff8bf8f-7mtlr client {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;client\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;response received from server\u0026#34;,\u0026#34;http.statusCode\u0026#34;:200,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:50.9315373Z\u0026#34;} app-client-599ff8bf8f-cvsss client 
{\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;client\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;response received from server\u0026#34;,\u0026#34;http.statusCode\u0026#34;:200,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:51.7572924Z\u0026#34;} app-client-599ff8bf8f-7mtlr client {\u0026#34;level\u0026#34;:\u0026#34;info\u0026#34;,\u0026#34;logger\u0026#34;:\u0026#34;client\u0026#34;,\u0026#34;message\u0026#34;:\u0026#34;response received from server\u0026#34;,\u0026#34;http.statusCode\u0026#34;:200,\u0026#34;timestamp\u0026#34;:\u0026#34;2019-09-02T20:59:51.9335878Z\u0026#34;} Conclusion Dynamic configuration management and secret injection refer to updating an application with new configurations and secrets without disrupting (restarting) it. This is an important quality for building a non-disruptive and autonomous application. With containerization technologies like Docker and Kubernetes, it is easy to inject new configurations and secrets into your application environment (container). konfig is a very minimal and unopinionated library that makes dynamic configuration management and secret injection very easy to implement and use for Go applications.\n","permalink":"https://milad.dev/posts/dynamic-config-secret/","summary":"TL;DR Dynamic configuration management and secret injection refer to updating an application with new configurations and secrets in a non-disruptive way. Kubernetes ConfigMaps and Secrets mounted as files into containers will be updated with new values automatically. konfig makes dynamic configuration management and secret injection very easy to implement and use for Go applications. 
The Problem Dynamic configuration management and secret injection refer to a situation where your application can update its configurations and secrets without needing a restart.","title":"Dynamic Configuration Management and Secret Injection with konfig"},{"content":"konfig is a minimal and unopinionated configuration management library for Go applications. It is based on The 12-Factor App. I created this library as a response to repeating myself across almost every single service and application.\nIt is a very minimal and lightweight library for reading configuration values either from command-line arguments, environment variables, or files. It uses reflection to automatically convert the input values to the desired types defined in Go. It also supports slice, time.Duration, and url.URL types. It does NOT use the built-in flag package, so you can separately define and parse your command-line flags.\nHere are all the supported types:\nstring, *string, []string bool, *bool, []bool float32, float64 *float32, *float64 []float32, []float64 int, int8, int16, int32, int64 *int, *int8, *int16, *int32, *int64 []int, []int8, []int16, []int32, []int64 uint, uint8, uint16, uint32, uint64 *uint, *uint8, *uint16, *uint32, *uint64 []uint, []uint8, []uint16, []uint32, []uint64 url.URL, *url.URL, []url.URL regexp.Regexp, *regexp.Regexp, []regexp.Regexp time.Duration, *time.Duration, []time.Duration The supported syntax for Regexp is POSIX Regular Expressions.\nGetting Started All you need to do for your configuration management is to define a struct!\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;net/url\u0026#34; \u0026#34;time\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { Port int LogLevel string Timeout time.Duration DatabaseURL []url.URL } { Port: 3000, // default port LogLevel: \u0026#34;info\u0026#34;, // default logging level } func main() { konfig.Pick(\u0026amp;config) fmt.Printf(\u0026#34;Port: %d\\n\u0026#34;, config.Port) 
fmt.Printf(\u0026#34;LogLevel: %s\\n\u0026#34;, config.LogLevel) fmt.Printf(\u0026#34;Timeout: %v\\n\u0026#34;, config.Timeout) for _, u := range config.DatabaseURL { fmt.Printf(\u0026#34;DatabaseURL: %s\\n\u0026#34;, u.String()) } } Using Default Values Now, we compile this little piece of code using the go build -o app command and run it! Here is the output we see:\nPort: 3000 LogLevel: info Timeout: 0s Using Command-Line Arguments We can pass a different set of configuration values through command-line arguments by running any of the following commands:\n./app -port 4000 -log.level debug -timeout 30s -database.url \u0026#34;redis-1:27017,redis-2:27017,redis-3:27017\u0026#34; ./app -port=4000 -log.level=debug -timeout=30s -database.url=\u0026#34;redis-1:27017,redis-2:27017,redis-3:27017\u0026#34; ./app --port 4000 --log.level debug --timeout 30s --database.url \u0026#34;redis-1:27017,redis-2:27017,redis-3:27017\u0026#34; ./app --port=4000 --log.level=debug --timeout=30s --database.url=\u0026#34;redis-1:27017,redis-2:27017,redis-3:27017\u0026#34; And we will see the following output once we run the application:\nPort: 4000 LogLevel: debug Timeout: 30s DatabaseURL: redis-1:27017 DatabaseURL: redis-2:27017 DatabaseURL: redis-3:27017 You may notice how command-line argument names are constructed.\nAll lower-case with . as the separator character between words. You can either use a single dash (-) or double dash (--) for an argument. You can use space ( ) or assignment character (=) for the value of an argument. 
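As an aside, the naming rule can be sketched as a tiny standalone helper. This is only an illustration of the convention described above, not the actual implementation inside konfig:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// splitWords breaks a Go field name like "DatabaseURL" into its words:
// ["Database", "URL"]. Consecutive capitals are treated as one acronym.
func splitWords(name string) []string {
	var words []string
	runes := []rune(name)
	start := 0
	for i := 1; i < len(runes); i++ {
		// A new word starts at an upper-case rune that follows a
		// lower-case rune, or that precedes a lower-case rune.
		if unicode.IsUpper(runes[i]) &&
			(unicode.IsLower(runes[i-1]) || (i+1 < len(runes) && unicode.IsLower(runes[i+1]))) {
			words = append(words, string(runes[start:i]))
			start = i
		}
	}
	words = append(words, string(runes[start:]))
	return words
}

// flagName joins the words with "." in lower case: DatabaseURL -> database.url
func flagName(field string) string {
	return strings.ToLower(strings.Join(splitWords(field), "."))
}

// envName joins the words with "_" in upper case: DatabaseURL -> DATABASE_URL
func envName(field string) string {
	return strings.ToUpper(strings.Join(splitWords(field), "_"))
}

func main() {
	for _, f := range []string{"Port", "LogLevel", "DatabaseURL"} {
		fmt.Printf("%-12s -> -%s  %s\n", f, flagName(f), envName(f))
	}
}
```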
Using Environment Variables Now, let\u0026rsquo;s try passing the same configuration values through environment variables:\nexport PORT=5000 export LOG_LEVEL=warn export TIMEOUT=90s export DATABASE_URL=\u0026#34;mongo-1:27017,mongo-2:27017,mongo-3:27017\u0026#34; ./app And as expected, we will see the following output:\nPort: 5000 LogLevel: warn Timeout: 1m30s DatabaseURL: mongo-1:27017 DatabaseURL: mongo-2:27017 DatabaseURL: mongo-3:27017 Similarly, here is how environment variable names are constructed.\nAll upper-case with _ as a separator character between words. Using Configuration Files Finally, you can write the configuration values in files and pass the paths to these files into your application. This is useful when you want to pass secrets into your application (mounting Kubernetes Secrets as files for example).\necho -n \u0026#34;6000\u0026#34; \u0026gt; port.txt echo -n \u0026#34;error\u0026#34; \u0026gt; log_level.txt echo -n \u0026#34;120s\u0026#34; \u0026gt; timeout.txt echo -n \u0026#34;postgres-1:27017,postgres-2:27017,postgres-3:27017\u0026#34; \u0026gt; database_url.txt export PORT_FILE=\u0026#34;$PWD/port.txt\u0026#34; export LOG_LEVEL_FILE=\u0026#34;$PWD/log_level.txt\u0026#34; export TIMEOUT_FILE=\u0026#34;$PWD/timeout.txt\u0026#34; export DATABASE_URL_FILE=\u0026#34;$PWD/database_url.txt\u0026#34; ./app And we will again see the same output:\nPort: 6000 LogLevel: error Timeout: 2m0s DatabaseURL: postgres-1:27017 DatabaseURL: postgres-2:27017 DatabaseURL: postgres-3:27017 Using flag Package konfig plays nice with the flag package since it does NOT use the flag package for parsing command-line flags. That means you can define, parse, and use your flags using the built-in flag package. 
If you use the flag package, konfig will also add the command-line flags it is expecting.\nHere is an example:\npackage main import ( \u0026#34;flag\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { Port int LogLevel string } { LogLevel: \u0026#34;info\u0026#34;, } func main() { konfig.Pick(\u0026amp;config) flag.Parse() } Now, if you run the app with the -help flag, you will see the following:\nUsage of ./app: -log.level value data type: string default value: info environment variable: LOG_LEVEL environment variable for file path: LOG_LEVEL_FILE -port value data type: int default value: 0 environment variable: PORT environment variable for file path: PORT_FILE Precedence If configuration values are passed via different methods, the precedence is as follows:\nCommand-line arguments Environment variables Configuration files Default values Customization Changing Default Names If you want to override the default names for the command-line arguments or environment variables, here is how you can do it:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { LogLevel string `flag:\u0026#34;loglevel\u0026#34; env:\u0026#34;LOGLEVEL\u0026#34; fileenv:\u0026#34;LOGLEVEL_FILE_PATH\u0026#34;` } { LogLevel: \u0026#34;info\u0026#34;, // default logging level } func main() { konfig.Pick(\u0026amp;config) fmt.Printf(\u0026#34;LogLevel: %s\\n\u0026#34;, config.LogLevel) } Here is how you can use the new names:\n# using flag name ./app --loglevel=debug # using environment variable export LOGLEVEL=debug ./app # using configuration file echo -n \u0026#34;debug\u0026#34; \u0026gt; loglevel.txt export LOGLEVEL_FILE_PATH=\u0026#34;./loglevel.txt\u0026#34; ./app Changing Separator For Lists If you want to pass a list of configuration values in which the values themselves may include the default separator character (,), here is how you can specify a different character as the separator:\npackage main import ( 
\u0026#34;fmt\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { Rows []string `sep:\u0026#34;|\u0026#34;` } {} func main() { konfig.Pick(\u0026amp;config) for _, r := range config.Rows { fmt.Println(r) } } And now you can pass a value for this entry using a command-line argument:\n./app -rows=\u0026#34;a,b,c|1,2,3\u0026#34; Skipping A Source If you do not want a configuration value to be read from one of the sources, you can set that source\u0026rsquo;s name to -. For example, if you want a secret to be read only from a file, and neither from a command-line flag nor from an environment variable, you can do it as follows:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { Token string `env:\u0026#34;-\u0026#34; fileenv:\u0026#34;-\u0026#34;` } {} func main() { konfig.Pick(\u0026amp;config) fmt.Println(config.Token) } And now you can only pass the token value via a configuration file:\n# Will NOT work! ./app --token=123456789 # Will NOT work! export TOKEN=123456789 ./app # ONLY this works! echo -n \u0026#34;123456789\u0026#34; \u0026gt; token.txt export TOKEN_FILE=\u0026#34;$PWD/token.txt\u0026#34; ./app Options You can pass a list of options to the Pick function. These options are helpers for specific setups and situations.\nYou can use the konfig.Debug() option for printing debugging information.\nYou can use the konfig.ListSep() option to specify the list separator for all slice fields.\nThe konfig.SkipFlag() option will skip command-line flags as a source for all fields. Likewise, you can use the konfig.SkipEnv() option to skip environment variables as a source for all fields. And you can also use the konfig.SkipFileEnv() option for skipping file environment variables (and configuration files) as a source for all fields.\nIf you want to prefix all flag names with a specific string, you can use the konfig.PrefixFlag() option. You can use the konfig.PrefixEnv() option to prefix all environment variable names with a string. 
Similarly, using konfig.PrefixFileEnv() option you can prefix all file environment variable names with a string.\nkonfig.Telepresence() option lets you read configuration files when running your application in a Telepresence environment. You can read more about Telepresence proxied volumes here.\nEach option can also be set using an environment variable, so you don\u0026rsquo;t need to make any code changes.\nDebugging If for any reason configuration values are not read as you expected, you can use Debug option to see how exactly your configuration values are read. Debug accepts a verbosity parameter which specifies the verbosity level of logs. You can also enable debugging logs by setting the KONFIG_DEBUG environment variable to a verbosity level.\nHere is an example:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { Port int LogLevel string } { LogLevel: \u0026#34;info\u0026#34;, } func main() { konfig.Pick(\u0026amp;config, konfig.Debug(3)) fmt.Printf(\u0026#34;Port: %d\\n\u0026#34;, config.Port) fmt.Printf(\u0026#34;LogLevel: %s\\n\u0026#34;, config.LogLevel) } Now, try running the app as follows:\nKONFIG_DEBUG=5 ./app And, you see the following output:\n2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- 2020/03/04 15:08:25 Options: Debug\u0026lt;5\u0026gt; + ListSep\u0026lt;,\u0026gt; 2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- 2020/03/04 15:08:25 Registering configuration flags ... 
2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- 2020/03/04 15:08:25 [Port] flag registered: port 2020/03/04 15:08:25 [LogLevel] flag registered: log.level 2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- 2020/03/04 15:08:25 Reading configuration values ... 2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- 2020/03/04 15:08:25 [Port] expecting flag name: port 2020/03/04 15:08:25 [Port] expecting environment variable name: PORT 2020/03/04 15:08:25 [Port] expecting file environment variable name: PORT_FILE 2020/03/04 15:08:25 [Port] expecting list separator: , 2020/03/04 15:08:25 [Port] value read from flag port: 2020/03/04 15:08:25 [Port] value read from environment variable PORT: 2020/03/04 15:08:25 [Port] value read from file environment variable PORT_FILE: 2020/03/04 15:08:25 [Port] falling back to default value: 0 2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- 2020/03/04 15:08:25 [LogLevel] expecting flag name: log.level 2020/03/04 15:08:25 [LogLevel] expecting environment variable name: LOG_LEVEL 2020/03/04 15:08:25 [LogLevel] expecting file environment variable name: LOG_LEVEL_FILE 2020/03/04 15:08:25 [LogLevel] expecting list separator: , 2020/03/04 15:08:25 [LogLevel] value read from flag log.level: 2020/03/04 15:08:25 [LogLevel] value read from environment variable LOG_LEVEL: 2020/03/04 15:08:25 [LogLevel] value read from file environment variable LOG_LEVEL_FILE: 2020/03/04 15:08:25 [LogLevel] falling back to default value: info 2020/03/04 15:08:25 ---------------------------------------------------------------------------------------------------- Watching You can write new values to your configuration files, while your application is running. 
konfig can watch your configuration files, and if a new value is written, it will notify a list of subscribers. This feature allows you to implement dynamic configuration and secret injection easily. Let\u0026rsquo;s show how this feature works using an example:\npackage main import ( \u0026#34;fmt\u0026#34; \u0026#34;sync\u0026#34; \u0026#34;github.com/moorara/konfig\u0026#34; ) var config = struct { sync.Mutex LogLevel string } {} func main() { ch := make(chan konfig.Update) go func() { for update := range ch { if update.Name == \u0026#34;LogLevel\u0026#34; { config.Lock() fmt.Printf(\u0026#34;Now logging in %s level ...\\n\u0026#34;, config.LogLevel) config.Unlock() } } }() stop, _ := konfig.Watch(\u0026amp;config, []chan konfig.Update{ch}) defer stop() wait := make(chan bool) \u0026lt;-wait } Next, let\u0026rsquo;s create a configuration file and run the application:\necho -n \u0026#34;warn\u0026#34; \u0026gt; log_level export LOG_LEVEL_FILE=\u0026#34;$PWD/log_level\u0026#34; ./app You will see the following output:\nNow logging in warn level ... In a new terminal, we write a new value to the log_level file:\necho -n \u0026#34;debug\u0026#34; \u0026gt; log_level Within a few seconds, we should see the following message:\nNow logging in debug level ... ","permalink":"https://milad.dev/projects/konfig/","summary":"konfig is a minimal and unopinionated configuration management library for Go applications. It is based on The 12-Factor App. I created this library as a response to repeating myself across almost every single service and application.\nIt is a very minimal and lightweight library for reading configuration values either from command-line arguments, environment variables, or files. It uses reflection to automatically convert the input values to the desired types defined in Go.","title":"Zero-Config Configuration Management"},{"content":"TL;DR The Rust compiler is built on top of LLVM. Rust is a statically-typed language. 
Rust has optional types for handling null and the compiler requires the None case to be handled. Rust requires top-level items like function arguments and constants to have explicit types while allowing type inference inside of function bodies. Rust\u0026rsquo;s strong type system and memory safety are all enforced at compile time! Rust does not need to have a garbage collector! Rust gives you the choice of storing data on the stack or on the heap. Rust determines at compile time when memory is no longer needed and can be cleaned up. Rust projects can be used as libraries by other programming languages via foreign-function interfaces. This allows existing projects to incrementally replace performance-critical pieces with Rust code without the memory safety risks. Rust is an ideal language for embedded and bare-metal development. The borrow checker is the part of the compiler ensuring that references do not outlive the data they refer to. When safe Rust is not able to express some concept, you can use unsafe Rust. This enables more power, but the programmer is responsible for ensuring that the code is truly safe. The unsafe code can be wrapped in higher-level abstractions which guarantee that all uses of the abstraction are safe. Many aspects of creating and maintaining production-quality software such as testing, dependency management, documentation, etc. are first-class citizens in Rust. Prototyping solutions in Rust can be challenging since Rust requires covering 100% of the conditions! READ MORE What is Rust and why is it so popular? Why Discord is switching from Go to Rust ","permalink":"https://milad.dev/gists/what-is-rust/","summary":"TL;DR The Rust compiler is built on top of LLVM. Rust is a statically-typed language. Rust has optional types for handling null and the compiler requires the None case to be handled. Rust requires top-level items like function arguments and constants to have explicit types while allowing type inference inside of function bodies. 
Rust\u0026rsquo;s strong type system and memory safety are all enforced at compile time! Rust does not need to have a garbage collector!","title":"What is Rust and Why is it So Popular?"},{"content":" Coding conventional: comments Goodbye, Clean Code Source Code Management Setting Up Git Identities The History of Git: The Road to Domination in Software Version Control Why Google Stores Billions of Lines of Code in a Single Repository Bring your monorepo down to size with sparse-checkout Go https://abhinavg.net/posts/understanding-token-pos Kubernetes A Simple Kubernetes Admission Webhook AI/ML Machine learning has a backdoor problem Misc Using the iPad Pro as my development machine Google Recruiters Say Using the \u0026lsquo;X-Y-Z Formula\u0026rsquo; on Your Resume Will Improve Your Odds of Getting Hired at Google ","permalink":"https://milad.dev/reads/","summary":" Coding conventional: comments Goodbye, Clean Code Source Code Management Setting Up Git Identities The History of Git: The Road to Domination in Software Version Control Why Google Stores Billions of Lines of Code in a Single Repository Bring your monorepo down to size with sparse-checkout Go https://abhinavg.net/posts/understanding-token-pos Kubernetes A Simple Kubernetes Admission Webhook AI/ML Machine learning has a backdoor problem Misc Using the iPad Pro as my development machine Google Recruiters Say Using the \u0026lsquo;X-Y-Z Formula\u0026rsquo; on Your Resume Will Improve Your Odds of Getting Hired at Google ","title":"Reads"},{"content":"TL;DR\nDistributed transactions are one of the hardest problems in computer science. NoSQL was a response to scalability limitation and a very high cost of traditional RDBMS. CAP theorem says in case of network partitions, among consistency (correctness) and availability, one has to be comprised in favor of the other. The first generation of NoSQL DBMS chose availability and they were eventually consistent. 
In theory, they will reconcile conflicts in a finite time after a network partitioning by probabilistically voting on what the data is supposed to be. Most real-world eventually consistent systems use a simplistic last-write-wins based on the local system time. Although NoSQL systems are available and eventually consistent, they do NOT guarantee correctness. Therefore, databases need external consistency or ACID. Legacy databases have a single centralized machine and serialize writes to a single disk in a deterministic order. Scalability is only vertical. In primary/follower replicated databases, followers asynchronously replicate the state of the primary. Google\u0026rsquo;s Spanner: Multi-shard transactions are done by a two-phase prepare/commit algorithm. Shard failover is automated via Paxos. Physical atomic clock hardware synchronizes the system time on all shards within very small error bounds. Google has the resources to build and maintain atomic clock hardware and bounded-latency networks. Calvin is a logical clock oracle that does not rely on any single physical machine and can be widely distributed. The order of transactions is determined by preprocessing, which assigns an externally consistent order to all incoming transactions. FaunaDB is a Relational NoSQL system implementing the Calvin model. In practice, the availability of systems like Spanner and FaunaDB in the cloud is the same as the availability of AP systems. Bounded consistency is better than eventual consistency. READ MORE\n","permalink":"https://milad.dev/gists/relational-nosql/","summary":"TL;DR\nDistributed transactions are one of the hardest problems in computer science. NoSQL was a response to the scalability limitations and very high cost of traditional RDBMS. The CAP theorem says that in case of network partitions, among consistency (correctness) and availability, one has to be compromised in favor of the other. The first generation of NoSQL DBMS chose availability and they were eventually consistent. 
In theory, they will reconcile conflicts in a finite time after a network partitioning by probabilistically voting on what the data is supposed to be.","title":"Back to the Future with Relational NoSQL"},{"content":"TL;DR\nKnow Why With microservices, you will inevitably ship your org chart! Think about why you are doing it at an organizational level. Don\u0026rsquo;t focus on computer science! Focus on velocity. Optimize for velocity (not engineering velocity and not systems throughput). By assigning project teams to microservices, you reduce person-to-person communication and increase velocity. Serverless Still Runs on Servers Pursuing single-purpose services blindly is a failure mode. There is a tendency to break things down into smaller and smaller pieces. Don\u0026rsquo;t forget about the significant cost of having two processes communicate with each other over a network. Structure services around functional units in an engineering organization. Think about compartmentalization and not breaking down services into the smallest possible pieces. Don\u0026rsquo;t keep on making things smaller and smaller. You will regret that! Independence If you let teams make their own decisions, you will not have a mechanism to observe everything. Think about which dimensions are independent, and which ones should be delegated to a platform team. Beware Giant Dashboards Each service generates a lot of pro-forma dashboards and then you put in the business metrics as well. Figuring out the root cause becomes hard when cascading failures are visible in all interdependent microservices. The dashboards should be limited to SLIs (what your consumer cares about) and the root cause analysis will be a guided refinement. Observability is not about the three pillars. It\u0026rsquo;s about detection and refinement. 
Observability boils down to two activities: Detection of critical signals Refining the search space You Can\u0026rsquo;t Trace Everything Distributed tracing is a way to do transactional logging with some kind of sampling built-in. READ MORE\n","permalink":"https://milad.dev/gists/microservices-lessons/","summary":"TL;DR\nKnow Why With microservices, you will inevitably ship your org chart! Think about why you are doing it at an organizational level. Don\u0026rsquo;t focus on computer science! Focus on velocity. Optimize for velocity (not engineering velocity and not systems throughput). By assigning project teams to microservices, you reduce person-to-person communication and increase velocity. Serverless Still Runs on Servers Pursuing single-purpose services blindly is a failure mode.","title":"Lessons from the Birth of Microservices"},{"content":"TL;DR Three different approaches to dynamic configuration: Templating Examples: Helm, gomplate, etc. Text templating very quickly becomes fragile, hard-to-understand, and hard-to-maintain. Template writers lack the tools to build abstractions around the data. Layering Examples: kustomize Data layering breaks down when configurations grow in complexity and scale. Template writers lack abstraction and type validation. For large-scale projects, inheritance creates deep layers of abstractions. Semantics are locked into an opaque tool and not exposed as language features. Data Configuration Language (DCL) Examples: jsonnet, ksonnet, kubecfg, etc. Any declarative systems and tools that grow in size and complexity need to be abstracted. CUE is a language for defining, generating, and validating all kinds of data. CUE is designed around graph unification. Sets of types and values can be modeled as directed graphs and then unified to the most specific representation of all graphs. CUE makes tasks like validation, templating, querying, and code generation first-class features. 
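As a tiny illustration (a hypothetical schema, not taken from the talk), in CUE two declarations of the same struct unify into the most specific value, and constraints are validated along the way:

```cue
// A definition with constraints: a port must be an integer in (0, 65536).
#Port: int & >0 & <65536

// Two declarations of the same struct unify into one concrete value.
server: {
	host: string
	port: #Port
}

server: {
	host: "localhost"
	port: 8080
}
```

Evaluating this, for example with cue eval, yields the unified struct; a value such as port: 0 would be rejected by the #Port constraint.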
CUE\u0026rsquo;s type system is expressive. For a given field, you can specify the type as well as optionality and constraints from other fields. CUE also has a declarative scripting layer built on top of the configuration language. Read More The Configuration Complexity Curse CUE The CUE Data Constraint Language Kubernetes Tutorial ","permalink":"https://milad.dev/gists/cue/","summary":"TL;DR Three different approaches to dynamic configuration: Templating Examples: Helm, gomplate, etc. Text templating very quickly becomes fragile, hard-to-understand, and hard-to-maintain. Template writers lack the tools to build abstractions around the data. Layering Examples: kustomize Data layering breaks down when configurations grow in complexity and scale. Template writers lack abstraction and type validation. For large-scale projects, inheritance creates deep layers of abstractions. Semantics are locked into an opaque tool and not exposed as language features.","title":"The Configuration Complexity Curse"},{"content":" Recap ","permalink":"https://milad.dev/books/leaders-eat-last/","summary":" Recap ","title":"Leaders Eat Last"},{"content":"TL;DR\nIn every organization of any size, the steady state is always a superposition of many different wavefronts of changes. Some of those changes are technological and some of them are market-driven. The changes are originating at different places and sweeping through the organization at different speeds. Stop chasing the end state! Let\u0026rsquo;s focus on continuous adaptation instead of the grand vision. Embrace Plurality Avoid single system of record (SSoR). Federate extents from multiple different systems (multiple systems of record). Contextualize Downstream Business rules are contextual. The more we push rules upstream, the larger the surface area of every change becomes. Augment the information upstream, but contextualize it and apply rules and policies downstream. Apply policies in systems closest to the users. 
Minimize the entities that all systems need to know about. Beware Grandiosity Seek compromises. Assume an open world. Begin small and incrementalize. Allow lengthy comment periods. Decentralize Transparency: methods, work, and results must be visible. Isolation: one group\u0026rsquo;s failure cannot cause widespread damage. Economics: distributed economic decision-making. Isolate Failure Domains Six different ways of introducing modularity: splitting, substitution, augmenting, excluding, inversion, porting Data Outlives Applications Data outlives many different technologies and applications. Applications Outlive Integrations Hexagonal architecture or Ports and Adapters Increase Discoverability Improve by building on the work of others. Visible work: open code repositories, internal blogs, etc. Presentation\n","permalink":"https://milad.dev/gists/arch-without-end/","summary":"TL;DR\nIn every organization of any size, the steady state is always a superposition of many different wavefronts of changes. Some of those changes are technological and some of them are market-driven. The changes are originating at different places and sweeping through the organization at different speeds. Stop chasing the end state! Let\u0026rsquo;s focus on continuous adaptation instead of the grand vision. Embrace Plurality Avoid single system of record (SSoR). Federate extents from multiple different systems (multiple systems of record).","title":"Architecture Without an End State"},{"content":" Be Kind! Everyone you meet is fighting a hard battle!\n-?\nEverything should be made as simple as possible, but no simpler.\n-Albert Einstein\nThat\u0026rsquo;s been one of my mantras - focus and simplicity. Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it\u0026rsquo;s worth it in the end because once you get there, you can move mountains.\n-Steve Jobs\n","permalink":"https://milad.dev/quotes/","summary":"Be Kind! 
Everyone you meet is fighting a hard battle!\n-?\nEverything should be made as simple as possible, but no simpler.\n-Albert Einstein\nThat\u0026rsquo;s been one of my mantras - focus and simplicity. Simple can be harder than complex: You have to work hard to get your thinking clean to make it simple. But it\u0026rsquo;s worth it in the end because once you get there, you can move mountains.","title":"Quotes"},{"content":"TL;DR\nObservability is the ability to understand what is going on in the inner workings of a system just by observing it from the outside. Your software should explain itself and what it is doing! Pillars of observability are logs, metrics, traces, and events. Logs are structured or unstructured textual data. Used for auditing and debugging purposes. Very expensive at scale. Cannot be used for real-time computational purposes. Hard to track across different and distributed processes. You need to know what to look for ahead of time (known unknowns vs. unknown unknowns). Metrics are time-series data (regular) with low cardinality. Aggregated by time. Used for real-time monitoring purposes. Can take the distribution of data into account. Enable service-level indicators (SLIs) and service-level objectives (SLOs). CANNOT be broken down by high-cardinality dimensions (unique ids such as user ids). Traces are used for debugging and tracking requests across different processes and services. Can be used for identifying performance bottlenecks. Need to be sampled due to their very data-heavy nature. Not optimized for aggregation. Cannot precisely know about the distribution of data (detecting outliers). Events are time-series (irregular) data. Occur in temporal order, but the intervals between occurrences are inconsistent and sporadic. Used for reporting and alerting on important or critical events such as errors, crashes, etc. Logs, metrics, and traces each prematurely optimize one thing and compromise on another based on a premise made upfront. 
You do NOT want: Writing duplicate data into three different places. Copy-pasting IDs from tool to tool trying to track down a single problem! Paying for three (four) different services doing almost the same thing! You want: One source of truth for your observability data. Looking at high-level dashboards, spotting anomalies, and zooming in to get detailed information as needed. You are either throwing away data at ingestion time by aggregating or you are throwing away data after that by sampling. Presentation\n","permalink":"https://milad.dev/posts/observability-overview/","summary":"TL;DR\nObservability is the ability to understand what is going on in the inner workings of a system just by observing it from the outside. Your software should explain itself and what it is doing! Pillars of observability are logs, metrics, traces, and events. Logs are structured or unstructured textual data. Used for auditing and debugging purposes. Very expensive at scale. Cannot be used for real-time computational purposes. Hard to track across different and distributed processes.","title":"An Overview of Observability"},{"content":"Theory What is a Language? Every language is defined by specifying four sets:\nAlphabet The most primitive building block of a language is its alphabet. An alphabet is a finite set of symbols. The alphabet for English consists of letters A to Z (both capital and small) as well as punctuation marks. The alphabet for a programming language includes characters A to Z, a to z, and other characters such as -, +, *, /, \u0026lsquo;, \u0026ldquo;, etc.\nWords Having defined the alphabet for a language, we can define the words of that language. Each word is a combination of symbols from the language\u0026rsquo;s alphabet. The set of words for a language can be finite or infinite. 
The set of words in English is finite while the set of words in a programming language, with identifiers of arbitrary length, is infinite.\nGrammar The grammar of a language is a set of rules determining how sentences in the language can be constructed. The English grammar allows a sentence like \u0026ldquo;Olivia goes to work\u0026rdquo; while it rejects a sentence like \u0026ldquo;Olivia to home went\u0026rdquo; (although all words are in English). The Go programming language allows a statement like var i int while it does not allow a statement such as int i.\nSemantic The semantics of a language, also known as its type system, determines which sentences are meaningful and which ones are not. In English, \u0026ldquo;Olivia eats an apple\u0026rdquo; makes sense while \u0026ldquo;An apple eats Olivia\u0026rdquo; does not! This is because in English an object with type human can eat an object with type fruit while an object with type fruit cannot eat an object with type human!\nDifferent Types of Grammars and Languages In the theory of formal languages, there are four types of grammars and, respectively, four types of languages. These classes of formal grammars were formulated by Noam Chomsky in 1956.\nType-3 Type-3 grammars, also known as regular grammars, generate regular languages. These languages can be represented and decided by a finite state machine. In a finite state machine, the next state is a function of the current state and the input. Equivalently, these languages can be denoted by regular expressions.\nType-2 Type-2 grammars, also known as context-free grammars, generate context-free languages. Type-2 grammars are a superset of type-3 grammars. These languages can be represented and decided by a non-deterministic pushdown automaton. In a pushdown automaton, the next state is a function of the current state, the input, and the top of the stack. 
Most programming languages are context-free languages (more precisely, they fall into the subset of deterministic context-free languages).\nBNF (Backus normal form) is a notation for defining context-free grammars. We will also use this notation for defining the grammar for our language.\nType-1 Type-1 grammars, also known as context-sensitive grammars, generate context-sensitive languages. Type-1 grammars are a superset of type-2 grammars (and thus of type-3 grammars as well). These languages can be represented and decided by a linear bounded automaton.\nType-0 Type-0 grammars, also known as unrestricted or recursively enumerable grammars, contain all formal grammars. They generate all languages that can be represented and decided by a Turing machine.\nWhat is a Compiler? A compiler is a computer program that translates source code written in one programming language (source language) into another programming language (target language). Compilers are usually used for translation from a higher-level programming language (Go, C/C++, etc.) to a lower-level one (assembly, binary code, etc.) for creating an executable program.\nCompilers Architecture Compilers benefit from a modular design. At a very high level, compilers have three modules: front-end, middle-end, and back-end.\nFront-end The front-end of a compiler takes the source code as input and generates an intermediate representation for it. The front-end has four sub-modules: lexer (scanner), parser, semantic analyzer, and intermediate code generator. As we will see, lexers are usually implemented using finite state machines.\nThe lexer or scanner takes the source code as a stream of characters, tokenizes them, and returns a stream of tokens as output. If you remember, a language is defined by four sets. Characters are defined by the language alphabet and tokens are defined by the language words. Tokens are language keywords, valid identifiers, literals, and so on. 
Tokens are defined using regular expressions (type-3).\nThe parser is the sub-module that understands the language grammar. The parser takes the stream of tokens and generates an abstract syntax tree. An abstract syntax tree is a data structure that holds all sentences (expressions) of your program. The syntax of a language is usually defined using context-free grammars (type-2) and in BNF (Backus normal form).\nThe semantic analyzer is the sub-module that understands the language semantics. It traverses the AST (abstract syntax tree) and ensures sentences (expressions) are semantically valid using the type system. The output of this module is an intermediate representation which is the AST with additional metadata, properties, and resolved references using a symbol table.\nThe front-end is the core of a compiler. It translates the source code into a machine-independent code that can later be translated into machine-dependent code.\nMiddle-end The middle-end of a compiler takes the raw intermediate representation and optimizes it. The optimization is still independent of the target machine.\nBack-end The back-end of a compiler takes the (optimized) intermediate representation and generates a machine-dependent code that can be executed. It has two sub-modules. The first sub-module generates machine code and the second sub-module optimizes the generated machine code. The end result is assembly code targeted and optimized for a specific architecture (386, amd64, arm, arm64, \u0026hellip;).\nTools In this section, we briefly review some of the widely-used tools available for building compilers.\nLex/Flex Lex is a program for generating lexers and scanners. Flex (fast lexical analyzer generator) is an alternative to Lex. Flex takes an input file that defines all valid tokens (words) of a language and generates a lexer in C/C++. 
You can find an example of using Flex here.\nYacc/Bison Yacc (Yet Another Compiler-Compiler) is a program for generating LALR parsers (Look-Ahead LR parsers). Bison is a version of Yacc. Yacc/Bison are used together with Lex/Flex. Bison reads a specification of a context-free language written in BNF and generates a parser in C/C++. You can find an example of using Bison here.\nLLVM LLVM (The LLVM Compiler Infrastructure) is an open-source project providing a set of tools for building a front-end for any programming language and a back-end for any target instruction-set (architecture). LLVM is built around a machine-independent intermediate representation which is a portable high-level assembly language. You can find the examples and tutorials here.\nBuilding A Compiler in Go Now it is time for the fun part! We want to build a micro-compiler in Go.\nOur language is a logical formula of labels with ANDs and ORs. For example, ((sport,soccer);(music,dance)) means we want all contents either related to sport and soccer or related to music and dance. Our compiler is going to create a SQL query for a given label formula. You can find the source code for this example here.\nDefining The Language Let\u0026rsquo;s first define our language. Our language has a very simple syntax for defining a logical formula of labels for querying different contents. For example (food,health) means that we want all products belonging to food and health categories. Or, (store-1;store-2) means that we are interested in all products that are available either in store 1 or store 2.\nFor defining a language, we need to specify four sets: alphabet, words, grammar, and semantic. 
We define these four sets informally as follows:\nalphabet = unicode characters, digits, -, ,, ;, (, and ) words = (, ), ,, ;, and \u0026lt;labels\u0026gt; (a unicode char followed by unicode chars, digits, and/or -) grammar = a sequence of \u0026lt;labels\u0026gt; separated by , or ; and enclosed by ( and ) semantic = empty set (no semantic and no type-system) Examples of letters in the alphabet are a, b, c, m, x, 0, 1, 7, -, (, ). Examples of words are food, health, store-1, store-2, (, ). Examples of sentences (based on words and grammar) are (food,health), (store-1;store-2).\nBuilding The Compiler For building a compiler, we need to build a scanner and a parser. The parser will create the abstract syntax tree (intermediate representation). We then use the AST to generate our target code.\ngoyacc is a version of Yacc for building parsers and syntax analyzers in Go. For building the scanner in Go, there is no equivalent of Lex. You have a few options. Depending on how complex your language is, you can create your own scanner/lexer using available packages such as text/scanner. There are also tools like Ragel for generating lexical analyzers in Go. Many times, using a tool for generating your lexer is overkill, and generated lexers are not usually as fast as hand-written ones.\nParser We start by formally defining our language grammar (syntax) and creating a parser for it.\nHere is our language grammar in BNF (Backus normal form):\nformula ----\u0026gt; \u0026#39;(\u0026#39; expr \u0026#39;)\u0026#39; expr ----\u0026gt; label | \u0026#39;(\u0026#39; expr \u0026#39;)\u0026#39; | expr \u0026#39;,\u0026#39; expr | expr \u0026#39;;\u0026#39; expr Our grammar has four rules.\nThis grammar is ambiguous in the sense that the precedence of the operators (, and ;) is not specified, and we don\u0026rsquo;t know if we should group operands of these operators from the right or the left (associativity). 
We show these ambiguities using an example:\nGrouping l1;l2;l3;l4 Left \u0026ndash;\u0026gt; (((l1;l2);l3);l4) Right \u0026ndash;\u0026gt; (l1;(l2;(l3;l4))) Precedence l1,l2;l3,l4 , over ; \u0026ndash;\u0026gt; (l1,l2);(l3,l4) ; over , \u0026ndash;\u0026gt; l1,(l2;l3),l4 To address these ambiguities, we give , and ; equal precedence and group them from left to right.\nHere is our language grammar in Yacc (parser.y):\n%{ package goyacc func setResult(l yyLexer, root *node) { l.(*lexer).ast = \u0026amp;ast{ root: root, } } %} %token OPEN %token CLOSE %token \u0026lt;op\u0026gt; OP %token \u0026lt;label\u0026gt; LABEL %type \u0026lt;node\u0026gt; formula %type \u0026lt;node\u0026gt; expr %left OP %start formula %union{ op string label string node *node } %% formula: OPEN expr CLOSE { setResult(yylex, $2) } expr: LABEL { $$ = \u0026amp;node{typ: label, val: $1} } | OPEN expr CLOSE { $$ = $2 } | expr OP expr { $$ = \u0026amp;node{typ: op, val: $2, left: $1, right: $3} } Let\u0026rsquo;s break down our grammar.\n%{ package goyacc func setResult(l yyLexer, root *node) { l.(*lexer).ast = \u0026amp;ast{ root: root, } } %} Whatever is defined between %{ and %} will be inserted into the generated source code. In this case, we are specifying our package name and adding a helper function to create an abstract syntax tree.\nNext, we define our tokens and types:\n%token OPEN %token CLOSE %token \u0026lt;op\u0026gt; OP %token \u0026lt;label\u0026gt; LABEL %type \u0026lt;node\u0026gt; formula %type \u0026lt;node\u0026gt; expr In our language, we have four types of tokens: OPEN, CLOSE, OP, and LABEL. Notice , and ; are defined as one token (OP). OP is tagged with \u0026lt;op\u0026gt; and LABEL is tagged with \u0026lt;label\u0026gt;. These tags correspond to the union fields that we will see shortly.\nWe also have two types: formula and expr. Types are non-terminal (non-leaf) nodes in our abstract syntax tree. 
When types are present, type-checking is performed.\nThen, we specify associativities and precedences for our grammar constructs:\n%left OP In this case, our operators (, and ;) have equal precedence and they are left-associative.\nNext, we specify the start symbol:\n%start formula formula is a non-terminal and the root of our abstract syntax tree.\nWhen our parser reads the stream of tokens: TODO:\nFor each token (terminals or leaves in AST), the value of the token (lexeme) needs to be returned. For each grammar rule (non-terminals in AST), a node needs to be returned. We specify these values using a construct called union (in C/C++ the generated code will be a union and in Go it will be a struct).\n%union{ op string label string node *node } Here, op and label are terminals or leaves in our abstract syntax tree. Notice that we don\u0026rsquo;t have any terminal or leaf for the OPEN and CLOSE tokens. They are only used for specifying associativity and precedence to create the abstract syntax tree in a deterministic, non-ambiguous fashion. They are implied in the structure of the abstract syntax tree.\nIf you remember, the OP token is tagged with \u0026lt;op\u0026gt;. When our lexer reads an OP token, it sets the op field of the union to the actual value of the token (the , or ; lexeme). Similarly, when our lexer reads a LABEL token, it sets the label field of the union to the actual value of the token (lexeme). Using these tags, we are telling the parser where to find the actual values of tokens. OPEN and CLOSE are not tagged because we know OPEN always means ( and CLOSE always means ).\nnode is the type for non-terminal nodes in our abstract syntax tree. 
Later, we define the node struct in our lexer as follows:\ntype node struct { typ nodeType val string left, right *node } Each node is either a terminal (leaf) with a value (val) or a non-terminal operator with left and right operands.\nFinally, we define our language grammar in BNF after %%:\nformula: OPEN expr CLOSE { setResult(yylex, $2) } expr: LABEL { $$ = \u0026amp;node{typ: label, val: $1} } | OPEN expr CLOSE { $$ = $2 } | expr OP expr { $$ = \u0026amp;node{typ: op, val: $2, left: $1, right: $3} } Upon evaluation of each rule (non-terminals), we need to return a node type. The node that has to be returned is defined between { and } after each rule.\nWhen the parser sees the expr --\u0026gt; expr OP expr rule, we return the following node:\n$$ = \u0026amp;node{ typ: op, val: $2, left: $1, right: $3, } $$ is the variable on which we are supposed to set the result. op is a constant we define in our code. $1, $2, and $3 are the values for tokens and types on the right side of the rule (the values for expr, OP, and expr respectively).\nSince formula is marked as %start, when we see the formula --\u0026gt; OPEN expr CLOSE rule, we need to complete our AST. We do this by calling the setResult function. yylex is the lexer we pass to the parser.\nNow, we can generate the source code for our parser by running the following command:\ngoyacc -l -o parser.go parser.y Now, let\u0026rsquo;s take a look at the generated file (parser.go). The following function is what we are interested in:\nfunc yyParse(yylex yyLexer) int { ... } This function receives an input parameter of type yyLexer which is an interface defined as:\ntype yyLexer interface { Lex(lval *yySymType) int Error(s string) } Now, we know what methods our lexer should implement (Lex and Error). The parser will pass an input parameter of type *yySymType to the Lex method. 
yySymType is a struct defined as:\ntype yySymType struct { yys int op string label string node *node } You can notice that this struct is generated based on the union construct we defined in our parser.y.\nLexer Here is the source code for our lexer (lexer.go):\n//go:generate goyacc -l -o parser.go parser.y package goyacc import ( \u0026#34;errors\u0026#34; \u0026#34;io\u0026#34; \u0026#34;text/scanner\u0026#34; \u0026#34;unicode\u0026#34; ) type nodeType int const ( op nodeType = iota + 1 label ) const ( orOp = \u0026#34;OR\u0026#34; andOp = \u0026#34;AND\u0026#34; ) // node represents a node in abstract syntax tree type node struct { typ nodeType val string left, right *node } // ast is the abstract syntax tree for a label formula type ast struct { root *node } func parse(name string, src io.Reader) (*ast, error) { l := newLexer(name, src) yyParse(l) // generated by goyacc return l.ast, l.err } // lexer implements yyLexer interface for the parser generated by goyacc type lexer struct { s scanner.Scanner err error ast *ast } func newLexer(name string, src io.Reader) *lexer { var s scanner.Scanner s.Init(src) s.Filename = name // Accept tokens with \u0026#34;-\u0026#34; s.IsIdentRune = func(ch rune, i int) bool { return unicode.IsLetter(ch) || unicode.IsDigit(ch) \u0026amp;\u0026amp; i \u0026gt; 0 || ch == \u0026#39;-\u0026#39; \u0026amp;\u0026amp; i \u0026gt; 0 } return \u0026amp;lexer{ s: s, } } func (l *lexer) Error(msg string) { l.err = errors.New(msg) } // yySymType is generated by goyacc func (l *lexer) Lex(lval *yySymType) int { if token := l.s.Scan(); token == scanner.EOF { return -1 } lexeme := l.s.TokenText() switch lexeme { case \u0026#34;(\u0026#34;: return OPEN // generated by goyacc case \u0026#34;)\u0026#34;: return CLOSE // generated by goyacc case \u0026#34;,\u0026#34;: lval.op = andOp return OP // generated by goyacc case \u0026#34;;\u0026#34;: lval.op = orOp return OP // generated by goyacc default: lval.label = lexeme return LABEL // generated 
by goyacc } } The code for the lexer makes things clearer, but let\u0026rsquo;s go through a few important things.\nThe constants we used to return a node upon evaluation of a rule are defined here as:\nconst ( op nodeType = iota + 1 label ) The values of OP are defined as:\nconst ( orOp = \u0026#34;OR\u0026#34; andOp = \u0026#34;AND\u0026#34; ) As we have already seen, node is defined as:\ntype node struct { typ nodeType val string left, right *node } We are using the built-in text/scanner package for tokenizing and creating our lexer. A Scanner by default skips all white spaces and recognizes all identifiers as defined by the Go language specification. In our language, labels can have - between characters and digits. We need to change the default behavior of Scanner, so it can recognize our labels properly. We do this by defining a custom function called IsIdentRune as follows:\ns.IsIdentRune = func(ch rune, i int) bool { return unicode.IsLetter(ch) || unicode.IsDigit(ch) \u0026amp;\u0026amp; i \u0026gt; 0 || ch == \u0026#39;-\u0026#39; \u0026amp;\u0026amp; i \u0026gt; 0 } Here we say a character (rune) is part of our label if\nIt\u0026rsquo;s a letter It\u0026rsquo;s a digit and it\u0026rsquo;s not the first character in the identifier It\u0026rsquo;s a hyphen and it\u0026rsquo;s not the first character in the identifier Lex is called by the yyParse() function and an input parameter of type *yySymType is passed to it every time (lval). yySymType is generated using our union definition. Using our scanner, we scan an identifier and then we figure out which type of token it is. If the token is an OP, we set its value (either orOp or andOp) on the op field of lval (we tagged the OP token with \u0026lt;op\u0026gt;). Likewise, if the token is a LABEL, we set its value on the label field of lval (we tagged the LABEL token with \u0026lt;label\u0026gt;).\nBackend Our compiler backend traverses an AST (abstract syntax tree) and creates a SQL query. 
Here is the source code for our compiler backend:\npackage goyacc import ( \u0026#34;fmt\u0026#34; \u0026#34;strings\u0026#34; ) var opToANDOR = map[string]string{ orOp: \u0026#34;OR\u0026#34;, andOp: \u0026#34;AND\u0026#34;, } // Formula represents a formula for labels type Formula struct { ast *ast } // FromString creates a new instance of formula from a string func FromString(str string) (*Formula, error) { src := strings.NewReader(str) ast, err := parse(\u0026#34;formula\u0026#34;, src) if err != nil { return nil, err } return \u0026amp;Formula{ ast: ast, }, nil } // postgresQuery traverses an AST in an LRV fashion and creates a SQL query func postgresQuery(n *node, table, field string) string { // Label (leaf node) if n.typ == label { return fmt.Sprintf(`EXISTS (SELECT 1 FROM %s WHERE %s LIKE \u0026#39;%%%s%%\u0026#39;)`, table, field, n.val) } // Op node if n.typ == op { leftQuery := postgresQuery(n.left, table, field) rightQuery := postgresQuery(n.right, table, field) return fmt.Sprintf(`(%s %s %s)`, leftQuery, opToANDOR[n.val], rightQuery) } return \u0026#34;\u0026#34; } // PostgresQuery constructs a PostgreSQL query for the formula func (f *Formula) PostgresQuery(table, field string) string { where := postgresQuery(f.ast.root, table, field) query := fmt.Sprintf(`SELECT * FROM %s WHERE (%s)`, table, where) return query } The code is pretty straightforward. We traverse the abstract syntax tree recursively and construct the SQL query.\nRunning Tests You need to first install goyacc. You can install it using go get command as follows:\ngo get -u golang.org/x/tools/cmd/goyacc Then you need to generate parser.go from parser.y as follows:\ngo generate Now, you run the tests or import your compiler from this package.\nConclusion Compilers are an integral part of our experience as developers. They translate a program written in one language into another. 
A programming language is defined by specifying four sets: alphabet, words, syntax, and semantic.\nA compiler has three parts: front-end, middle-end, and back-end. The front-end translates source code into a machine-independent intermediate representation (IR). The middle-end optimizes the machine-independent intermediate representation. The back-end translates the machine-independent intermediate representation into machine-dependent executable code. This modular design of compilers allows compiler developers to build their front-end independently of the middle-end and back-end.\nReferences Chomsky hierarchy Compiler Wikipedia Compiler Architecture LLVM Wikipedia Tutorials The Bison Parser Algorithm Writing A Compiler In Go How to Write a Parser in Go Lexical Scanning in Go A look at Go lexer/scanner packages Lexing with Ragel and Parsing with Yacc using Go How to write a compiler in Go: a quick guide ","permalink":"https://milad.dev/posts/compilers-in-go/","summary":"Theory What is a Language? Every language is defined by specifying four sets:\nAlphabet The most primitive building block of a language is its alphabet. An alphabet is a finite set of symbols. The alphabet for English consists of letters A to Z (both capital and small) as well as punctuation marks. The alphabet for a programming language includes characters A to Z, a to z, and other characters such as -, +, *, /, \u0026lsquo;, \u0026ldquo;, etc.","title":"Compilers 101 in Go"},{"content":"TL;DR\nObservability refers to three different things: logs, metrics, and traces. The problem with logs is that you have to know what to search for before you know what the problem is! The problem with metrics is they are aggregated by time and you cannot break them down by high-cardinality dimensions (like user id for example). Logs, metrics, traces, and events each prematurely optimize one thing and compromise on another based on a premise made upfront. 
You don\u0026rsquo;t want to write your observability data to many different places and copy-paste IDs from tool to tool trying to track down a single problem! You want one source of truth and you want to be able to go from very high-level dashboards to very low-level data. According to the control theory definition, observability is the ability to understand what is going on in the inner workings of a system just by observing it from the outside. Libraries that you build into your code should give you insights from the inside out (the software should explain itself). The total cost of observability should be 10 to 30 percent of the infrastructure cost. You are either throwing away data at ingestion time by aggregating or you are throwing away data after that by sampling. Observability can be incredibly cost-effective by using intelligent sampling. Software engineers should write operable services and run them themselves! Software engineers need to be on-call for their own systems. This is a way to support software engineers in building observable and scalable systems. Every single alert you get should be actionable. Every time you get paged you should be like this is new, I don\u0026rsquo;t understand this (and not oh that again)! Ops should stop being gatekeepers and blocking people. They have to stop building a castle and they have to start building a playground! Every developer should be looking at prod every day. They should know what is normal, how to debug it, and how to get to a known state! If management is not carving out enough project development time to get things fixed, no on-call situation will ever work! SLOs (service-level objectives) define the quality of service that we agree to provide for users. As long as you hit the SLO line, anything you do in engineering is fine! Everyone gets what they need, nobody feels micromanaged, and nobody feels completely abandoned! SLOs help with defining how much time is enough for improving things! 
WATCH HERE\n","permalink":"https://milad.dev/gists/charity-majors-on-observability/","summary":"TL;DR\nObservability refers to three different things: logs, metrics, and traces. The problem with logs is that you have to know what to search for before you know what the problem is! The problem with metrics is they are aggregated by time and you cannot break them down by high-cardinality dimensions (like user id for example). Logs, metrics, traces, and events each prematurely optimize one thing and compromise on another based on a premise made upfront.","title":"Charity Majors on Observability and Quality of Microservices"},{"content":"TL;DR The Open Container Initiative (OCI) was launched in June 2015 by Docker, CoreOS, and other leaders in the container industry. The OCI currently contains two specifications: runtime-spec and image-spec RunC RunC is the runtime for running containers according to the OCI specification (implements OCI runtime-spec). RunC leverages technologies available in the Linux kernel (cgroups and namespaces) to create and run containers. containerd containerd is a daemon and it manages the complete lifecycle of a container on the host operating system. containerd manages image storage and transfer, container execution and supervision, low-level storage, network attachment, etc. containerd uses RunC for creating and running containers from OCI-compatible images. dockerd dockerd (docker-engine) provides an API for clients via three different types of sockets: unix, tcp, and file. dockerd serves all features of the Docker platform. dockerd leverages the containerd gRPC API for managing containers. containerd-shim containerd-shim allows daemonless containers and acts as a middleman between containers and containerd. Using containerd-shim, runc can exit after creating and starting containers (removing the need for long-running runtime processes for containers). containerd-shim also keeps the STDIO and FDs open for containers in case dockerd or containerd dies. 
This also allows updating dockerd or containerd without killing the running containers. Docker CLI (docker command) and other Docker clients communicate with dockerd (docker-engine). READ MORE OCI containerd Docker components explained rkt vs other projects How containerd compares to runC ","permalink":"https://milad.dev/gists/docker-components/","summary":"TL;DR The Open Container Initiative (OCI) was launched in June 2015 by Docker, CoreOS, and other leaders in the container industry. The OCI currently contains two specifications: runtime-spec and image-spec RunC RunC is the runtime for running containers according to the OCI specification (implements OCI runtime-spec). RunC leverages technologies available in the Linux kernel (cgroups and namespaces) to create and run containers. containerd containerd is a daemon and it manages the complete lifecycle of a container on the host operating system.","title":"Docker Components Explained"},{"content":"TL;DR The majority (70%) of CVEs and vulnerabilities fixed at Microsoft are caused by memory corruption bugs in C/C++ code. There are many tools for preventing, detecting, and fixing memory bugs. Developers tend to miss these tools because they are not first-class citizens of the programming language and their learning curve is high. Developers should worry more about features and less about tooling and security. A memory-safe programming language removes the burden from developers and puts it on language designers. Memory safety is a property of programming languages where all memory accesses are well-defined. Most programming languages are memory-safe by using some form of garbage collection. System programming languages cannot afford the runtime overhead of using a garbage collector. Spatial memory safety is about ensuring all memory accesses are within bounds of the type being accessed. Temporal memory safety is about ensuring pointers still point to valid memory when dereferencing. 
A data race happens when two or more threads in a process, at least one of which is a writer, concurrently access the same memory location without any mechanism for exclusive access. Rust is a memory-safe programming language for system programming and high-performance use-cases. Rust provides strong memory safety and it is completely memory safe (except for the unsafe keyword). Rust is comparable with C/C++ in terms of performance, speed, control, and predictability. The Rust runtime (standard library) depends on libc, but it is optional (Rust can run without an operating system). Rust provides performance, control over memory allocation, and strong memory-safety, and empowers developers to write robust and secure programs. Some of the issues with Rust are the lack of interoperability with C/C++ and limiting the usage of the unsafe superset at scale. Read More Rust A proactive approach to more secure code We need a safer systems programming language Why Rust for safe systems programming ","permalink":"https://milad.dev/gists/safe-system-programming/","summary":"TL;DR The majority (70%) of CVEs and vulnerabilities fixed at Microsoft are caused by memory corruption bugs in C/C++ code. There are many tools for preventing, detecting, and fixing memory bugs. Developers tend to miss these tools because they are not first-class citizens of the programming language and their learning curve is high. Developers should worry more about features and less about tooling and security. A memory-safe programming language removes the burden from developers and puts it on language designers.","title":"A Safer System Programming Language (Rust)"},{"content":"TL;DR A study shows there is a cybersecurity attack every 39 seconds. In a typical SSH protocol: the server trusts the client if the client\u0026rsquo;s public key is listed as authorized, and the client trusts the server\u0026rsquo;s public key on first use (TOFU). 
The trust on first use (TOFU) approach delegates the trust to the clients and leaves them vulnerable to man-in-the-middle attacks. One solution to fix this is using SSH certificates and SSH certificate authorities (CA). Many companies take a Zero-Trust approach. BeyondCorp is Google\u0026rsquo;s Zero-Trust model that does NOT use a VPN. Uber uses the Uber SSH Certificate Authority (USSHCA) along with a PAM module for the continued validity of a user. Facebook has implemented its own SSH servers that trust based on certificate authorities (CA). Certificates issued by the CA include all permissions and privileges for each user. Netflix uses Bastion’s Lambda Ephemeral SSH Service (BLESS) certificate authority. BLESS runs on AWS Lambda and uses AWS Key Management Service (KMS). Netflix’s SSH bastion uses SSO to authenticate users and issue short-lived certificates. Teleport provides role-based access control using the existing SSH protocol. READ MORE How Uber, Facebook, and Netflix Do SSH Introducing the Uber SSH Certificate Authority Hackers Attack Every 39 Seconds ","permalink":"https://milad.dev/gists/how-to-do-ssh/","summary":"TL;DR A study shows there is a cybersecurity attack every 39 seconds. In a typical SSH protocol: the server trusts the client if the client\u0026rsquo;s public key is listed as authorized, and the client trusts the server\u0026rsquo;s public key on first use (TOFU). The trust on first use (TOFU) approach delegates the trust to the clients and leaves them vulnerable to man-in-the-middle attacks. One solution to fix this is using SSH certificates and SSH certificate authorities (CA).","title":"How Uber, Facebook, and Netflix Do SSH"},{"content":"TL;DR GitOps is an operation model for cloud-native applications running on Kubernetes (created by Weaveworks). For the most part, it is infrastructure-as-code with continuous integration and continuous delivery. The idea is having Git as the source of truth for all operations. 
A single Git repository describes the entire desired state of the system. Operational changes are made through pull requests. Changes can be peer-reviewed, versioned, released, rolled back, audited, etc. Diff tools detect any divergence and sync tools enable convergence. GitOps can be used for managing Kubernetes clusters since Kubernetes uses declarative resource definitions. Kubernetes secrets can also be stored in a Git repo using one-way encryption (take a look at sealed-secrets). GitOps in contrast to CIOps improves your workflow in the following ways: All of your configurations and changes to them are centralized in one place (easier to track, audit, and reason about). Divergences will be detected and the cluster will be converged again automatically (failed deployments will be retried too). GitOps can be done using either a Push approach or a Pull approach. With push, your cluster credentials are in your build system, whereas with pull, no external system has access to your cluster. Read More GitOps GitOps - Operations by Pull Request Kubernetes Anti-Patterns: Let\u0026rsquo;s do GitOps, not CIOps! Managing Secrets in Kubernetes GitOps — Comparison Pull and Push Argo CD ","permalink":"https://milad.dev/gists/gitops/","summary":"TL;DR GitOps is an operation model for cloud-native applications running on Kubernetes (created by Weaveworks). For the most part, it is infrastructure-as-code with continuous integration and continuous delivery. The idea is having Git as the source of truth for all operations. A single Git repository describes the entire desired state of the system. Operational changes are made through pull requests. Changes can be peer-reviewed, versioned, released, rolled back, audited, etc. Diff tools detect any divergence and sync tools enable convergence.","title":"GitOps?"},{"content":" Recap Site reliability engineering is Google\u0026rsquo;s approach to service management. 
If you think of DevOps more as a culture, a mindset, or a set of guidelines, SRE is a framework that implements DevOps. This book is more like a collection of essays with a single common vision.\nSRE teams consist of people with software engineering skills and operation knowledge. Google places a 50% cap on all operation (ops) work aggregated for all SREs, and the remaining 50% should be spent on development work for the purpose of automation. Ideally, SRE teams should spend all of their time and capacity on development as opposed to operations. This is possible if services are autonomous and run and repair themselves. Monitoring the amount of operational work done by SREs is necessary. Once the 50% cap is reached, extra operational work should be redirected and delegated to development teams.\nThe following is a summary of some of the key chapters in this book.\nRisk and Error Budget Site reliability engineering tries to balance the risk of service unavailability with development velocity. SRE manages service reliability mostly by managing the risk. One metric to measure the risk tolerance of a service is availability. 100% reliability or availability is never right! It is impossible to achieve and most likely is not what a user needs. As reliability increases, the cost does not increase linearly. From 3 nines availability to 5 nines, the cost may increase by a factor of 100! In the context of SRE, availability is not just uptime. It is measured as request success rate, latency, and so on. Google sets quarterly availability targets for services and tracks them against those targets on a monthly, weekly, and sometimes daily basis. When setting availability targets, the background error rate (availability of the underlying network, infrastructure, etc.) should be taken into account.\nOne very interesting and useful concept defined by the SRE framework is the concept of error budget. 
Once the product team defines an availability target, the error budget is one minus the availability target. The error budget provides a common incentive and a quantitative measure for both development and SRE teams to find the right balance between innovation and reliability.\nSLI, SLO, and SLA A service-level indicator (SLI) is a quantitative measure for some aspect of the level of service provided. It can be defined as request latency, request throughput, error rate, etc.\nA service-level objective (SLO) is a specific target value or range of values for a service level measured by an SLI. SLOs set the expectations for the users of a service. Choosing an appropriate SLO is not a straightforward task. Do NOT pick an SLO based on the current performance of the system. Keep your SLOs few and simple and do NOT strive for perfection!\nA service-level agreement (SLA) is an implicit or explicit contract that tells users what the consequences of not meeting the SLOs are.\nSLIs for services usually can be chosen from one of the following categories:\nAll systems should care about correctness. User-facing systems generally care about availability, latency, and throughput. Storage systems often deal with latency, availability, and durability. Data processing services usually care about throughput and end-to-end latency. Average metrics may seem simple and good enough, but they hide the details and they do not reveal anything about special cases, for example, when a burst of requests comes in or when the system is overloaded. Research has shown that users prefer a slightly slower system to a system with high variance. Thus, the distribution of data is important. Using percentiles for SLIs takes the shape of the distribution into account as well.\nSLOs represent user expectations, so they should be used as a driver for prioritizing work for SREs and product developers. As long as SLOs are met, new risks can be taken and development velocity can increase. 
If any of the SLOs is violated, all effort should be spent on meeting the SLOs again. Define your SLIs, set your SLOs against these SLIs, monitor your SLIs and SLOs, and alert on violation of SLOs. When defining SLOs, keep a safety margin and do NOT overachieve!\nAutomation Automation is superior to manual operation in software, but more importantly, we need to build autonomous systems. Automated systems serve as platforms that can be extended and applied to more systems and use-cases. In many cases, they can be productized for the benefit of the business and profit as well.\nNeedless to say, automation brings many values: consistency, reliability, scaling, quickness, time, cost, and so forth. It is not pragmatic to automate every aspect of a system, for different reasons. Automated systems should not be maintained separately from core systems; otherwise, they start to diverge and fail.\nAutonomous systems do not require human intervention and automation of operations.\nAutomate yourself out of a job! Automate all the things and everything that can be automated. Such improvements have a cascading effect. The more time you spend on automating, the more time you would have for more automation and optimization.\nSimplicity Simplicity is a prerequisite to reliability. Constantly strive for eliminating complexity in the systems.\nEssential complexity and accidental complexity are very different! Essential complexity is an inherent property of the problem and cannot be removed while accidental complexity is associated with the solution and can be removed.\nSoftware bloat refers to the situation in which a piece of software gets bigger, slower, and harder to maintain over time as a result of adding more features, code, and case-specific logic.\nDo not build something that will not be used. Do not comment out code or flag it for possible use in the future! In software, less is more! Creating clear and minimal APIs is an integral aspect of having a simple system. 
Modularity, versioning, well-scoped functionalities, releasing smaller batches, etc. are all examples of requirements for simplicity.\nMonitoring and Alerting Monitoring is the process of collecting, aggregating, processing, and presenting real-time quantitative data about a system. A monitoring system should determine \u0026ldquo;what is broken\u0026rdquo; and \u0026ldquo;why it is broken\u0026rdquo;. The symptoms can be used for answering \u0026ldquo;what is broken\u0026rdquo; and the cause can be used for answering \u0026ldquo;why it is broken\u0026rdquo;.\nChoose an appropriate resolution for measurements. Your SLOs can help with choosing the right resolution. Google prefers simple monitoring systems and avoids magic systems that try to learn thresholds and automatically detect causality.\nThe four golden signals that Google defines for paging humans are the following:\nLatency: how long it takes for requests to be fulfilled. Error: the rate of requests that fail. Traffic: how much demand is currently being placed on your system. Saturation: the utilization of available resources to your system. Every page that happens distracts a human from improving the system and building more automation! Pages should have enough context, should be actionable, and should require human intelligence. If a page can be resolved by a robotic response, the response should be automated and there should not be a page for it. When a system is not able to automatically fix itself, we can notify a human to investigate the issue.\nYour alerting and paging system should keep the noise low and signal high. Rules that generate alerts for people should be simple to understand and represent a clear failure. 
The rules should allow a minimum period in which the alerting rule is true before firing an alert (to prevent flapping).\nIt is also very important to monitor a system from the user\u0026rsquo;s point of view to make sure we monitor the actual user experience and what the user sees (black-box monitoring). Making sure that the cost of maintenance scales sublinearly with the size and number of services is the key to making monitoring and alerting maintainable.\nPostmortem Culture The primary goals of writing a postmortem are to ensure:\nThe incident is documented All root cause(s) are well understood Preventive actions are put in place Every incident is an opportunity to improve and harden the system. Postmortems must be blameless. They should focus on the contributing root cause(s) of the incident without mentioning any individual or team. A blameless postmortem culture gives people the confidence to escalate an incident without fear.\nTesting For Reliability Traditional tests:\nUnit tests Integration tests System tests: Smoke tests Performance tests Regression tests Production tests:\nConfiguration test Stress test Canary test Overloads and Cascading Failures Clients and backend applications should be built to handle resource restrictions gracefully. In case of overload, they redirect when possible, serve degraded results, and handle errors transparently.\nIn case of a global overload, the service should only return errors to misbehaving clients and keep others unaffected. Service owners should provision capacity for their services based on usage quotas per client assuming that not all of the clients are going to hit their limits simultaneously. When a client hits its quota, the backend service should quickly reject the requests. When the client detects that a large number of requests are rejected due to insufficient quota, it should start to self-regulate and cap the number of outgoing requests it makes. 
Adaptive throttling works well with clients that make frequent (as opposed to sporadic) requests.\nWhen the utilization approaches the configured threshold, the requests should be rejected based on their criticality. Requests with higher criticality should have higher thresholds. Usually, a small number of services are overloaded. In this case, failed requests can be retried with a retry budget. If a large number of services are overloaded, failed requests should not be retried and errors should bubble up all the way to the caller.\nBlindly retrying requests can have a cascading effect and lead to even more overload and failures. Requests should be retried at the layer immediately above the layer that rejected the requests (return an error code implying the downstream service is overloaded and do not retry). Limit retries per request and always use randomized exponential backoff when scheduling retries.\nThe most common cause of cascading failures is overload. Processor, memory, threads, and file descriptors are examples of resources that can be exhausted. When a couple of services become unavailable due to overload, the load on other services increases and causes them to become unavailable too.\nTo prevent service overload, the following measures can be taken:\nPerform capacity planning Load test the capacity limits of services, and test the failure mode for overload Serve degraded responses Reject requests when the service is overloaded Upstream systems should reject the requests, rather than overloading the downstream ones: At the reverse proxies At the load balancers Load shedding refers to dropping some amount of load (incoming requests/traffic) when a service approaches overload conditions. Graceful degradation refers to reducing the amount of work required for fulfilling a request when a service is overloaded. 
Make sure you monitor and alert when services enter any of these modes.\nLong deadlines can result in resource consumption in upstream systems while downstream systems are having problems. Short deadlines can also cause expensive requests to fail consistently (including retries). Sometimes services spend resources on handling requests that will miss their deadlines (retries will cause more resource waste in turn). Services should implement deadline propagation. For the requests that get fulfilled in multiple stages, at every stage, every service should check how much time is left before trying to work on the request. Also, consider setting an upper bound for outgoing deadlines. Services should also implement cancellation propagation to prevent unnecessary work being done by downstream services.\nYou should test your services to understand how they behave when approaching overload and when overloaded. Under an overload situation, the service starts serving errors and/or degraded results, but the rate at which requests are served successfully should not drop significantly. You should also test and understand how your services return to a normal load situation. Furthermore, you should test and understand how clients use your services.\nSome of the factors that can trigger a cascading failure:\nProcess death Process updates New rollouts Organic growth Changes in request profile Changes in resource limits Here are some immediate measures you can take in response to a cascading failure:\nIncrease resources (scale up vertically) Temporarily disable health checks until all the services are stable Restart services Drop traffic Enter degraded modes Eliminate batch load (non-critical offline jobs) Eliminate bad traffic (requests creating heavy load or causing crashes) Data Integrity From the user experience point of view, data loss and data corruption are the same as service unavailability. Data integrity means services in the cloud remain accessible and usable to users. 
Data integrity is the means for achieving data availability. The key to having data integrity is proactive prevention, early detection, and rapid recovery of data corruption.\nA data integrity strategy can be chosen with respect to the following requirements for an application:\nUptime Latency Scale Velocity Privacy If a combination of transactional (ACID) and eventual consistency (BASE) systems is used, the recovered data may not necessarily be correct. Moreover, with non-disruptive deployments and zero-downtime migrations, different versions of business logic may change data in parallel. Incompatible versions of independent services may act on data momentarily, causing data corruption or data loss.\nThe factors of data integrity failure modes are root cause, scope, and rate.\nArchives cannot be used for recovery; they are meant for auditing, discovery, and compliance purposes. Backups can be loaded back into an application and are used for disaster recovery (preferably within uptime requirements of a service). Deliver a recovery system, rather than a backup system! Test your disaster recovery process and make sure your backups can be restored in time when needed.\nReplication and redundancy are not recoverability. Point-in-time or time-travel recovery refers to the process of recovering data and artifacts to a unique point in time.\nDefense in depth! The first layer is soft deletion and lazy deletion, which are an effective defense against inadvertent data deletion. The second line of defense is backup and recovery. And the last layer is data validation pipelines.\nProduct Launch Any new code that introduces an externally visible change to an application is a launch. Any launch process should be lightweight, robust, thorough, scalable, and adaptable. 
Launch checklists are used to ensure consistency and completeness and reduce failures.\nSome very useful techniques for reliable launches are:\nGradual and staged rollouts Using feature flags frameworks Dealing with abusive client behavior Load testing and testing overload behavior ","permalink":"https://milad.dev/books/sre/","summary":"Recap Site reliability engineering is Google\u0026rsquo;s approach to service management. If you think of DevOps more as a culture, a mindset, or a set of guidelines, SRE is a framework that implements DevOps. This book is more like a collection of essays with a single common vision.\nSRE teams consist of people with software engineering skills and operation knowledge. Google places a 50% cap on all operation (ops) work aggregated for all SREs, and the remaining 50% should be spent on development work for the purpose of automation.","title":"Site Reliability Engineering"},{"content":" Recap This book starts with a story about an imaginary tech company that is going through a hard time after an initial success. The board decides to bring a new CEO onboard. She starts making changes and building a team until she manages to turn the company around.\nThe author defines five dysfunctions that prevent teams from achieving collective results and success. These dysfunctions are not independent of each other. Instead, they are like a pyramid in which each dysfunction creates a base for the next one.\nAbsence of Trust Team members who are not vulnerable to the group and not genuinely open about their mistakes and weaknesses make it impossible to build a foundation for trust. In the context of team building, trust is the confidence among team members that their peers will not use their weaknesses against them. Fear of Conflict In the absence of trust, it becomes very hard for team members to have unfiltered and passionate debates. Healthy discussions and debates are about ideas and concepts as opposed to personality-focused criticisms. 
Lack of Commitment Without healthy debates and discussions among team members and without everyone having a chance to be heard, team members may not fully commit to decisions made. In the context of a team, commitment is a function of clarity and team buy-in. The desire for consensus and the need for certainty can lead to a lack of commitment. Avoidance of Accountability Without the commitment to a clear plan, team members avoid calling their peers on the actions and behaviors that hurt the team. Sometimes, team members avoid holding one another accountable because they are afraid of interpersonal conflicts or hurting personal relationships. This will hurt the relationships over time as team members start to worry about group standards and meeting expectations. Inattention to Results In the absence of holding one another accountable for team goals, team members put their individual goals above the collective goals of the team. A functional team must make the collective results and performance of the team more important than individual goals and accomplishments. Functional teams:\nThey trust each other and they are open to their mistakes and weaknesses. They engage in passionate and healthy discussions and debates. They disagree and commit to decisions and plans made by the team. They hold each other accountable for delivering against the team plans. They put the collective and team results first. ","permalink":"https://milad.dev/books/five-dysfunctions/","summary":"Recap This book starts with a story about an imaginary tech company that is going through a hard time after an initial success. The board decides to bring a new CEO onboard. She starts making changes and building a team until she manages to turn the company around.\nThe author defines five dysfunctions that prevent teams from achieving collective results and success. 
These dysfunctions are not independent of each other.","title":"The Five Dysfunctions of a Team"},{"content":" Recap This book defines software delivery performance and how to measure it. Software delivery performance can be measured by the four following metrics:\nLead time Deployment frequency Mean time to restore (MTTR) Change fail percentage These four measures of software delivery performance are classifiers for three groups:\nHigh performers Medium performers Low performers Finally, the following 24 capabilities are suggested to drive improvement to software delivery performance.\nContinuous Delivery Use version control for all production artifacts. Automate your deployment process. Implement continuous integration. Use trunk-based development methods. Implement test automation. Support test data management. Shift left on security. Implement continuous delivery. Architecture Use a loosely coupled architecture. Architect for empowered teams. Product and Process Gather and implement customer feedback. Make the flow of work visible through the value stream. Work in small batches. Foster and enable team experimentation. Lean Management and Monitoring Have a lightweight change approval process. Monitor across application and infrastructure to inform business decisions. Check system health proactively. Improve processes and manage work with work-in-progress limits. Visualize work to monitor quality and communicate throughout the team. Cultural Support a generative culture (as outlined by Westrum). Encourage and support learning. Support and facilitate collaboration among teams. Provide resources and tools that make work meaningful. Support or embody transformational leadership. ","permalink":"https://milad.dev/books/accelerate/","summary":"Recap This book defines software delivery performance and how to measure it. 
Software delivery performance can be measured by the four following metrics:\nLead time Deployment frequency Mean time to restore (MTTR) Change fail percentage These four measures of software delivery performance are classifiers for three groups:\nHigh performers Medium performers Low performers Finally, the following 24 capabilities are suggested to drive improvement to software delivery performance.\nContinuous Delivery Use version control for all production artifacts.","title":"Accelerate: The Science of Lean Software and DevOps"},{"content":"TL;DR Microservices are about communicating through APIs! A service mesh defines the communication interface between microservices. In an orchestrated environment (Kubernetes), containers talk to each other on top of overlay networking. A service mesh is a central source of truth for controlling the information flow between microservices. A mesh enables both the scalability benefits of microservices as well as the centralized advantages of monoliths. Service meshes come with built-in observability (logging, metrics, and tracing) for microservices communications. Service meshes have built-in support for resiliency features (retries, timeouts, deadlines, and circuit breaking). They also have capabilities such as east-west routing, access control, mTLS, smart load balancing, etc. The data plane refers to the layer allowing data to move between microservices and is implemented using sidecars. A sidecar is an auxiliary container running side by side with the main container in your pod. Microservices (main containers) communicate with each other through these sidecar containers. The data plane does things like service discovery, routing, load balancing, health checking, authn, and authz. The control plane refers to the layer defining communication rules between microservices. The control plane provides configurations and rules for all running data planes in the mesh. 
The service mesh interface (SMI) defines a standard API for different service meshes, so they can interoperate. Istio, Linkerd, and Consul are widely adopted service meshes. Read More Service Mesh Ultimate Guide What is a Service Mesh? Intro to Service Meshes: Data Planes, Control Planes, and More You Have a Service Mesh, Now What? Service Mesh Data Plane vs. Control Plane Microservices Mesh — Part I Microservices Mesh — Part II Microservices Mesh — Part III Hello Service Mesh Interface (SMI) Comparing Service Meshes: Istio, Linkerd and Consul Connect Comparing Kubernetes Service Mesh Tools API Gateways and Service Meshes: Opening the Door to Application Modernisation ","permalink":"https://milad.dev/gists/service-mesh/","summary":"TL;DR Microservices are about communicating through APIs! A service mesh defines the communication interface between microservices. In an orchestrated environment (Kubernetes), containers talk to each other on top of overlay networking. A service mesh is a central source of truth for controlling the information flow between microservices. A mesh enables both the scalability benefits of microservices as well as the centralized advantages of monoliths. Service meshes come with built-in observability (logging, metrics, and tracing) for microservices communications.","title":"Service Meshes and SMI Demystified"},{"content":"TL;DR Knative is a cloud-native serverless framework for Kubernetes environments. It was created and open-sourced by Google with contributions from other companies (Pivotal, IBM, Lyft, etc.). Unlike current serverless frameworks (AWS Lambda, Azure Functions, \u0026hellip;), Knative eliminates cloud vendor lock-in. Knative uses Kubernetes for container orchestration and the Istio service mesh for routing, load balancing, etc. Knative has three components: Build, Serving, and Eventing. Build: builds containers from source code on Kubernetes (on-cluster container builds). 
Eventing: enables a scalable event-driven system for managing events between producers and consumers. Serving: provides an abstraction for deployment, gradual rollouts, autoscaling, and configuring Istio components. Cloud Run is a managed Knative service offered by Google Cloud. Read More Knative Cloud Run Knative: The Serverless Environment for Kubernetes Fans Knative: A Complete Guide Hands on Knative — Part 1 Hands on Knative — Part 2 Hands on Knative — Part 3 ","permalink":"https://milad.dev/gists/knative/","summary":"TL;DR Knative is a cloud-native serverless framework for Kubernetes environments. It was created and open-sourced by Google with contributions from other companies (Pivotal, IBM, Lyft, etc.). Unlike current serverless frameworks (AWS Lambda, Azure Functions, \u0026hellip;), Knative eliminates cloud vendor lock-in. Knative uses Kubernetes for container orchestration and Istio service mesh for routing, load balancing, etc. Knative has three components: Build, Serving, and Eventing. Build: builds containers from source code on Kubernetes (on-cluster container builds).","title":"What is Knative?"},{"content":"I have been using and evaluating dozens of GitHub Marketplace Apps for a few months now for a real-world microservices application built in Go. So, I decided to share what I liked and what I didn\u0026rsquo;t like about these integrations.\nThe nice thing about using GitHub Marketplace is that your integrations and billing are all consolidated in one place. 
As an organization or a billing manager, it is much easier to manage all these different services from a single hub.\nCI/CD GitHub Actions https://github.com/features/actions Features: Built-in automation solution for GitHub repositories HCL-based configuration Pros: Very simple, highly flexible, and powerful Caching Docker images based on image digests Sharing and reusing actions by referencing them directly from GitHub repositories Cons: Currently, it is in beta mode UI/UX is very basic Needs more work to build full automation Sharing files between actions does not have built-in support CircleCI https://circleci.com Features: Managed/SaaS platform for continuous integration and continuous delivery YAML-based configuration Pros: Very easy-to-use documentation High performance, availability, and reliability Very flexible and powerful (caching, remote Docker engines, etc.) Workflows with parallel jobs, fan-in, and fan-out Powerful and fast debugging through SSH to build jobs Exporting artifacts for each build job Sharing artifacts between build jobs very efficiently and quickly Provides build environments using containers and/or machines (VMs) Provides build environment for OS X applications Reusing and sharing code fragments through Orbs Slack integration for alerting on failed builds Cons: Mono repo builds do not have built-in support External Orbs need to be packaged and published (they cannot be directly referenced from their code bases) Pipelines are triggered on push to branches and tags (cannot work with other GitHub events such as release) Codefresh https://codefresh.io Features: Managed/SaaS platform for continuous integration and continuous delivery YAML-based configuration Pros: Good documentation with examples Native support for Docker, Kubernetes, and Helm workflows Provides managed private Docker registry and Helm repository Extremely flexible pipelines that work with all GitHub events Workflows with parallel steps, fan-in, and fan-out Built-in 
Selenium/Protractor for automated end-to-end testing Reusing and sharing code fragments through Steps Slack integration for alerting on failed builds Cons: No built-in environment for OS X applications Steps/plugins need to be submitted (they cannot be directly referenced from their code bases) Docker Automated Builds https://cloud.docker.com Features: Managed/SaaS build and push automation for Docker images Pros: Simple and easy setup Does not require any code Cons: Very inflexible Extremely slow Cannot do semantic versioning Configuration is manual and through web interface Monitoring Rollbar https://rollbar.com Features: Monitoring application errors Automated alerting on errors Real-time insights and analytics Pros: Supports a wide range of programming languages and technologies Tracks code versions and deployments Improves observability and visibility Helps with improving stability Powerful configurations and features Integrations with popular tools (GitHub, Slack, PagerDuty, Asana, etc.) Cons: Requires configuration per repo Programming API is fairly complex User management and access control need to be done separately on web UI Airbrake https://airbrake.io Features: Monitoring application errors Automated alerting on errors Real-time insights and analytics Pros: Supports a wide range of programming languages and technologies Tracks deployments Improves observability and visibility Helps with improving stability Simple UI, simple configuration, and simple programming API Submits onboarding issues on GitHub Integrations with other tools (GitHub, Slack, PagerDuty, Asana, etc.) Cons: Requires configuration per repo User management and access control? Code Review WIP https://github.com/marketplace/wip Features: Blocks work-in-progress pull requests, so you cannot merge them! 
Pros: Very simple with zero configuration Configurable terms, title, body, labels, or commit message Code Climate https://codeclimate.com Features: Code coverage reports Automated code review for code quality Managed/SaaS Pros: Supports a wide range of programming languages GitHub status for code coverage and diff coverage Configuration-as-code through a yml file in repo GitHub PR checks are very clear and informative Automated pull request comments for better code quality and readability Shows covered and uncovered new lines of code in pull requests through a browser extension Quantifies and measures code quality and promotes reducing technical debt through analytics Provides code coverage and code quality badges Cons: Expensive Has its own user access management Most code quality comments are very basic and static, so developers may start to ignore them Codecov https://codecov.io Features: Code coverage reports Managed/SaaS Pros: Pricing is per user Provides code coverage badge Supports organization-wide configurations for all repos Supports configuration-as-code through a yml file in repo Can post coverage reports as comments on pull requests Reuses GitHub user permissions for user access management Cons: No code quality feature Coveralls https://coveralls.io Features: Code coverage reports Managed/SaaS Pros: Pricing is fixed (not per repo) GitHub status for code coverage Provides code coverage badge Cons: Bad UI/UX No code quality feature The diff coverage GitHub status is combined with the total coverage GitHub status Shows a check on master, causing the master branch to go red in repos without enough coverage GolangCI https://golangci.com Features: Automated code review comments on pull requests Pricing is per user Pros: Accurate and useful comments for Golang Configuration as code Cons: Only works for Golang Cannot detect bot users Very basic control panel Some comments can become very noisy Dependency Management Renovate https://renovatebot.com Features: Automated 
dependency updates Pros: Supports a wide range of programming languages and technologies An out-of-the-box solution (submits an onboarding pull request) Highly configurable through a configuration file in the repo Supports private npm repositories and packages Cons: Currently, cannot work with private Go modules Can become very noisy with mono repos For disabling the bot with the GitHub all-repositories permission approach, the configuration file needs to be kept in the repo Dependabot https://dependabot.com Features: Automated dependency updates Pros: Configurable either via a separate web UI or a configuration file in the repo Supports security updates Supports private Go modules Labels pull requests Cons: Requires configuration for each repo Security updates are not available for Go Mono repos require a fair amount of manual configuration Finding Vulnerabilities Snyk https://snyk.io Features: Finding and fixing vulnerabilities in your dependencies Pros: Free Automates the process of finding common vulnerabilities and exposures (CVE) Cons: Currently, does not support Go modules Supports Go only through its command-line interface (not integrated with GitHub) WhiteSource Bolt https://bolt.whitesourcesoftware.com/github Features: Finding and fixing open source vulnerabilities Pros: Free Automates the process of finding common vulnerabilities and exposures (CVE) Supports configuration through a configuration file in the repo Cons: Currently, does not support Go modules Communication Slack + GitHub https://slack.github.com Features: Updates for pull requests, issues, build checks, etc. 
Shows details of pull requests and issues directly on Slack Pros: FREE Centralizes pull requests and issues and makes them visible Easy configuration through the /github command on Slack Cons: Could become very noisy Pull Reminders https://pullreminders.com Features: Sends periodic reminders for pull requests to reviewers Pros: Helps with prioritizing pull requests and reducing lead time Flexible configurations for GitHub repos, GitHub teams, and Slack channels Cons: Needs configuration on its own website Can become chatty and cause people to ignore the reminders Documentation GitBook https://www.gitbook.com Features: Managed/SaaS Centralized documentation Pros: Simple and responsive UI Supports bi-directional integration with GitHub Non-technical people can use the WYSIWYG interface Documentation can be searched from Slack Cons: Billing is not through GitHub Work Management ZenHub https://www.zenhub.com Features: Kanban-style project management Customizable workflows Pros: Automated workflow with GitHub pull requests and issues The UI can be viewed on GitHub via a browser extension Provides analytics and metrics (velocity, burndown, etc.) Can have a workspace per repo or a workspace with multiple repos Integration with Slack Cons: Cannot delete a workspace once created Modifies the GitHub interface and experience Zube https://zube.io Features: Kanban-style project management Customizable workflows Pros: Automated workflow with GitHub pull requests and issues Provides analytics and metrics (velocity, burndown, etc.) Integration with Slack Cons: The UI is not accessible on GitHub Analytics ","permalink":"https://milad.dev/posts/github-tools/","summary":"I have been using and evaluating dozens of GitHub Marketplace Apps for a few months now for a real-world microservices application built in Go. 
So, I decided to share what I liked and what I didn\u0026rsquo;t like about these integrations.\nThe nice thing about using GitHub Marketplace is that your integrations and billing are all consolidated in one place. As an organization or a billing manager, it is much easier to manage all these different services from a single hub.","title":"A Comparison of GitHub Marketplace Apps"},{"content":"I have been working with a microservices application using gRPC as the main service-to-service communication mechanism for almost a year. So, I decided to write a blog post and share my experience on how to do gRPC right in a microservices world! So, let\u0026rsquo;s get started!\nTL;DR DRY! Have a package for your common messages. Choose unique names for your gRPC packages. Choose singular names for your gRPC packages. Distinguish your gRPC package names with a prefix or suffix. Implement health check probes as HTTP endpoints. Use a service mesh for load balancing gRPC requests. Centralize your gRPC service definitions in a single repo. Automate updating of your gRPC tools and dependencies. Automate generation of source code and other artifacts. Microservices Defining what microservices architecture is and whether you need to adopt it or not are beyond the scope of this post. Unfortunately, the term microservices has become one of those buzzwords these days. Microservices are not about the number of services, the programming languages, or the API paradigms you use! In essence, microservices architecture is Software-as-a-Service done right!\nThe most important thing to get microservices right is the concept of a service contract. A service contract is an API that a given service exposes through an API paradigm (REST, RPC, GraphQL, etc.). In the microservices world, the only way for microservices to communicate and share data is through their APIs. A microservice is solely the source of truth for a bounded context or resource that it owns. 
A service should not break its API (contract) since other services rely on that. Any change to the current major version of the API should be backward-compatible. Breaking changes should be introduced in a new major version of that service (technically a new service).\nOne important implication of microservices architecture is the organizational change that it requires and introduces! Just as microservices are small, independent, and self-sufficient, they can use different technologies and follow different development workflows and release cycles. Each microservice can be owned by a very small team. Different teams (microservices) can adopt slightly different practices (coding style, dependency management, etc.) as long as they do not break their commitment (service contract or API) to other teams (other microservices).\ngRPC gRPC is an RPC API paradigm for service-to-service communications. You can implement and expose your service API (contract) using gRPC. Thanks to the grpc-web project, you can now make gRPC calls from your web application too. Topics like the comparison between gRPC and REST or whether you need to implement your service API as gRPC are again out of the scope of this post.\ngRPC itself is heavily based on Protocol Buffers. Protocol Buffers are a cross-platform, language-agnostic standard for serializing and deserializing structured data. So, instead of sending plaintext JSON or XML data over the network, you will send and receive highly optimized and compacted bytes of data. Version 1 of Protocol Buffers was used internally at Google for many years. Protocol Buffers have been publicly available since version 2. The latest and recommended version of Protocol Buffers is version 3.\ngRPC uses Protocol Buffers to define service contracts. Each service definition specifies a number of methods with expected input and output messages. 
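For illustration, here is what a minimal proto3 service definition might look like (the package, service, and message names below are hypothetical, not from a real system):

```proto
syntax = "proto3";

// Hypothetical package name, for illustration only.
package orderpb;

// A hypothetical service owning the "order" resource.
service OrderService {
  // Each method declares its expected input and output messages.
  rpc GetOrder(GetOrderRequest) returns (GetOrderResponse);
}

message GetOrderRequest {
  string id = 1;
}

message GetOrderResponse {
  string id = 1;
  int64 total_cents = 2;
}
```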
Using gRPC libraries available for major programming languages, these gRPC protocol buffers can be implemented as either server or client. For compiled programming languages like Go, source code needs to be generated using the Protocol Buffers compiler (protoc) ahead of time.\nArchitecture The microservices architecture I have been working on consists of roughly 40 microservices, all written in Go and containerized using Docker. Since Go is a compiled and statically typed language, all gRPC/protobuf definitions should be compiled and source code should be generated in advance. An API gateway receives HTTP RESTful requests, and backend communications are done through gRPC calls between different microservices.\nChallenges Health Check One immediate question for a service that only talks gRPC is how do we implement health checks? Or, if you are using Kubernetes as your container platform, how do we implement liveness and readiness probes?\nTo this end, we have two options:\nDefining and implementing the health check probes as gRPC calls. Starting an HTTP server on a different port and implementing health check probes as HTTP endpoints. Implementing the health check as HTTP is straightforward. All external systems can easily work with HTTP health checks. However, setting up a separate HTTP server requires some coordination with the gRPC server to ensure that the gRPC server can successfully serve the requests.\nImplementing the health check as another gRPC method is not a challenge in itself, but getting the external systems (AWS load balancer, Kubernetes, etc.) to talk to it is the challenging part. This approach has better semantics since every gRPC service comes with a health check and the health check itself is a gRPC request.\nHere are some useful resources on this topic:\nGRPC Health Checking Protocol Health checking gRPC servers on Kubernetes Load Balancing Here is another interesting challenge! How do we do load balancing for services talking gRPC? 
For answering this question, we need to remember how gRPC works under the hood.\ngRPC is built on top of HTTP/2, and HTTP/2 uses long-lived TCP connections. For gRPC, this means that a gRPC client instance opens a TCP connection to a gRPC server instance, sends requests and receives responses over that same connection, and keeps the connection open until it is closed. Requests are multiplexed over the same connection. This is a big performance improvement since we do not need to go through the overhead of establishing a TCP connection for every request. However, this also means that the requests cannot be load-balanced in the transport layer (L3/L4). Instead, we need to load balance gRPC requests in the application layer (L7).\nFor this purpose, a load balancer component needs to open a long-lived connection per instance, retrieve enough information from the Protocol Buffers data being transferred, and then load balance the gRPC requests.\nNeedless to say, you should not implement an ad-hoc load balancer for your gRPC requests. You should rather use a solution that works for supported programming languages and platforms and addresses additional requirements such as observability.\nSome resources worth reading:\ngRPC Load Balancing gRPC Load Balancing on Kubernetes without Tears Dependency Management Dependency management is another important topic for maintaining microservices in general. The gRPC community has done a great job maintaining backward-compatibility between different versions of the Protocol Buffers compiler (protoc). This has been one of the key factors in making gRPC a successful RPC protocol. The protoc plugin for generating Go source code has also done a good job maintaining backward-compatibility among different versions of protoc and Go.\nHowever, from time to time, we may see breaking changes introduced (of course, for a reason). 
One example I can think of was the introduction of XXX_ fields for generated structs in Go (#276, #607).\nAs a result, if you do not update your gRPC toolchain regularly, updating it to get new features and performance improvements in the future will become harder. In the worst case, you may be stuck with using specific old versions of your gRPC compiler and plugins.\nCentralize or Decentralize Protocol Buffers Management? This is another interesting topic since it may not look like an important decision. When it comes to managing your gRPC Protocol Buffers and generated files, you have two options:\nKeeping Protocol Buffers and generated files in the same repo as the owner microservice (decentralized). Centralizing all Protocol Buffers and generated files in one mono repo. If you think of HTTP-based APIs such as REST, you define your HTTP endpoints per repo. Basically, each repo owns all the definitions regarding its HTTP APIs. This is absolutely a best practice (self-sufficient repos) and complies with the microservices philosophy (self-sufficiency).\nSimilarly, it also makes sense for gRPC service definitions to live inside their own repos. The repo that implements the gRPC server for a given gRPC service owns the gRPC service definitions alongside the corresponding generated source code (if required). Other repos that want to consume a given gRPC service import the gRPC package from the owner (server) repo.\nSo far so good, right? But there is one important difference, especially for compiled programming languages. The HTTP protocol is well-established, and it is very hard to imagine that something about HTTP would suddenly change. For gRPC and Protocol Buffers to work, a middleware layer for marshalling and unmarshalling is required. 
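To give a sense of what that marshalling layer does, here is a toy sketch in Go (standard library only, not the real protobuf runtime) of how a single integer field is encoded in the Protocol Buffers wire format:

```go
package main

import "fmt"

// encodeVarint encodes an unsigned integer using Protocol Buffers'
// base-128 varint encoding: 7 bits per byte, with the most
// significant bit set on every byte except the last.
func encodeVarint(v uint64) []byte {
	var out []byte
	for v >= 0x80 {
		out = append(out, byte(v)|0x80)
		v >>= 7
	}
	return append(out, byte(v))
}

// encodeField encodes a varint-typed field as a key (the field
// number shifted left by 3, OR'd with the wire type) followed by
// the varint-encoded value.
func encodeField(fieldNum int, value uint64) []byte {
	key := uint64(fieldNum)<<3 | 0 // wire type 0 = varint
	return append(encodeVarint(key), encodeVarint(value)...)
}

func main() {
	// Field 1 set to 150 encodes to "08 96 01", the classic example
	// from the Protocol Buffers encoding documentation.
	fmt.Printf("% x\n", encodeField(1, 150))
}
```

This is only the wire-format idea; in practice, the code generated by protoc and the protobuf runtime handle all of this for you, which is exactly why their versions need to be managed carefully.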
Furthermore, compiled programming languages require source code to be generated using the Protocol Buffers compiler and language-specific plugins.\nFor this purpose, we need to make sure that gRPC source code is generated in the pipeline as a build artifact, using the same versions of protoc, protoc plugins, and other tools. If you have a central pipeline, you only need to implement this functionality in one place, and updating the functionality requires changes only in one place. If you have a repo per microservice and your pipeline is a YAML file in each repo, then you need to implement a modular pipeline in which you import the functionality for generating gRPC source code from a single source of truth.\nIf building a modular pipeline is not a straightforward task, you can centralize all of your gRPC services and their generated code in a mono repo. In the pipeline or build job for this repo, you can use the same versions of all the tools you need to generate gRPC source code and other files. At a minimum, you can re-generate the gRPC source code and other files in your build job and make sure there is no difference between those files and the files checked in by developers.\nLessons Learnt 1. DRY Don\u0026rsquo;t repeat yourself! If you have a common message that you need in multiple service definitions, you can define it in a separate package and import it in your service definitions.\nFor example, if you are implementing health checks as gRPC requests, you can define the request and response messages for the health check method in a common package.\n2. Use Unique and Consistent Package Names The name of a package is part of your gRPC service definition. This means changing the name of a package will break that gRPC service definition.\nChoosing a package name for your gRPC service definition can be a bit different depending on the target programming language. 
You need to make sure that your package names follow the conventions for your programming languages and are consistent with your other gRPC package names.\nChoose unique names for your gRPC packages Choose singular names for your gRPC packages Distinguish your gRPC package names with a prefix or suffix (in Go, for example, you can use a PB suffix) 3. Implement Health Check Probes as HTTP HTTP health checks (including Kubernetes liveness and readiness probes) can be consumed easily by all external systems. This way, you do not need a gRPC client to check your service health, and your service implementation is more future-proof (you can still have a health check method on your gRPC service and have your HTTP health check handler call it).\n4. Use a Service Mesh for Load Balancing For load balancing gRPC requests, use a service mesh. All major service meshes (Linkerd, Istio, and Consul) support L7 load balancing for gRPC. Service meshes also provide observability capabilities for your gRPC calls, such as metrics and tracing.\n5. Centralize Your gRPC Protocol Buffers From our experience, centralizing all gRPC service definitions in one repo works better than keeping service definitions per repo. You can use the same versions (ideally always the latest) of protoc, protoc plugins, and other tools for generating source code for all of your service definitions. You can also make sure all of your gRPC service definitions are consistent with respect to naming, formatting, documentation, and other conventions.\nIt is worth mentioning that all of these qualities are also achievable with a per-repo setup by employing enough automation and tooling.\nIdeally, you should not have any breaking changes in your gRPC service definitions. But, if for any reason you need to do so, centralizing your service definitions in one repo allows you to semantically version all of your service definitions together. 
So, you do not need to know which version of a given package works with which version of another package.\n6. Automate Code Generation For gRPC Do not trust your developers to generate the source code for your gRPC service definitions! You don\u0026rsquo;t want every developer to generate source code with their own versions of locally installed gRPC tools every time they make a change. Remember! Everything that can be automated, must be automated.\nIn your pipeline, you should lint your service definitions and generate source code and other artifacts for your service definitions as part of your build process.\n7. Automate Your Dependency Management Automating dependency management is a best practice and is not specific to gRPC. Depending on the target programming language, gRPC needs other libraries to work. Make sure you automate updating your tools and dependencies for both code generation and runtime.\n","permalink":"https://milad.dev/posts/grpc-in-microservices/","summary":"I have been working with a microservices application using gRPC as the main service-to-service communication mechanism for almost a year. So, I decided to write a blog post and share my experience on how to do gRPC right in a microservices world! So, let\u0026rsquo;s get started!\nTL;DR DRY! Have a package for your common messages. Choose unique names for your gRPC packages. Choose singular names for your gRPC packages. Distinguish your gRPC package names with a prefix or suffix.","title":"gRPC in Microservices"},{"content":"TL;DR\nA container is a package format and a content-addressable bundle of content-addressable layers! namespaces and cgroups are two key features of the Linux kernel enabling containerization. Containers running on a host share a single Linux kernel! (a single scheduler, a single memory manager, and so on) The Linux kernel has so many known and unknown bugs! Sandboxes are a way of getting an extra layer of isolation for containers. 
gVisor is an OCI container runtime implementing the Linux kernel API in userspace using Go. gVisor is a sandbox for containers and does not let them talk directly to the kernel. As a result, gVisor comes with a bit of a performance cost. READ MORE\n","permalink":"https://milad.dev/gists/gvisor-container-runtime/","summary":"TL;DR\nA container is a package format and a content-addressable bundle of content-addressable layers! namespaces and cgroups are two key features of the Linux kernel enabling containerization. Containers running on a host share a single Linux kernel! (a single scheduler, a single memory manager, and so on) The Linux kernel has so many known and unknown bugs! Sandboxes are a way of getting an extra layer of isolation for containers.","title":"gVisor: Building and Battle Testing a Userspace OS in Go"},{"content":"TL;DR\nGraphQL is an API integration layer in the distributed software (microservices) world. GraphQL is both a query language and a runtime for executing the queries. GraphQL solves underfetching and overfetching problems. GraphQL is strongly typed. Caching, profiling, and rate limiting are challenging with GraphQL! Schema stitching is a technique for decentralizing a GraphQL schema in a microservices world. Schema stitching can be done by convention or configuration. GraphQL schemas can be completely decentralized by choreography. A pragmatic approach to enable a GraphQL API is building a centralized GraphQL gateway. Presentation\n","permalink":"https://milad.dev/posts/graphql-overview/","summary":"TL;DR\nGraphQL is an API integration layer in the distributed software (microservices) world. GraphQL is both a query language and a runtime for executing the queries. GraphQL solves underfetching and overfetching problems. GraphQL is strongly typed. Caching, profiling, and rate limiting are challenging with GraphQL! Schema stitching is a technique for decentralizing a GraphQL schema in a microservices world. 
Schema stitching can be done by convention or configuration. GraphQL schemas can be completely decentralized by choreography.","title":"An Overview of Graphql"},{"content":"Cherry is a single opinionated tool for all of your DevOps processes (build, test, release, deploy, etc.). Instead of keeping hard-to-understand and hard-to-maintain shell scripts in every repository, you can use Cherry!\nGitHub Repo\n","permalink":"https://milad.dev/projects/cherry/","summary":"Cherry is a single opinionated tool for all of your DevOps processes (build, test, release, deploy, etc.). Instead of keeping hard-to-understand and hard-to-maintain shell scripts in every repository, you can use Cherry!\nGitHub Repo","title":"Cherry: Build And Release!"},{"content":"Takeaways:\nReusing software through external dependencies has become widespread so quickly. The risks associated with software dependencies are not yet fully studied and considered. Follow the latest and best practices available for managing software dependencies. READ MORE\n","permalink":"https://milad.dev/gists/software-dep-problem/","summary":"Takeaways:\nReusing software through external dependencies has become widespread so quickly. The risks associated with software dependencies are not yet fully studied and considered. Follow the latest and best practices available for managing software dependencies. READ MORE","title":"Our Software Dependency Problem"},{"content":"This is my first post as I am getting to know Hugo and setting up my personal tech blog.\n","permalink":"https://milad.dev/posts/hello-world/","summary":"This is my first post as I am getting to know Hugo and setting up my personal tech blog.","title":"Hello World"}]