Abinav R Machine Learning operations MLOps, Computer Vision, Medical Imaging https://abinavravi.github.io/ Thu, 19 Mar 2026 19:43:50 +0000 Thu, 19 Mar 2026 19:43:50 +0000 Jekyll v3.10.0 Designing for 429: Why LLM Rate Limits Are a Systems Problem? <h1 id="introduction">Introduction</h1> <p>LLM adoption has reached a critical inflection point in the enterprise. While the majority of organizations rely on managed API providers to accelerate time-to-market, transitioning from experimental prototypes to business-critical production systems reveals a significant hurdle: upstream throughput constraints.</p> <p>Managed LLM APIs abstract away infrastructure complexity but introduce a “black box” variable—rate limits. When multiple distributed services, background workers, and real-time user requests compete for a shared upstream quota, rate limiting ceases to be a simple client-side error and becomes a core systems-design constraint.</p> <p>In this post, we propose some solutions to the rate-limiting problem when interacting with LLM providers.</p> <h1 id="problem-statement">Problem Statement</h1> <p>LLM providers typically enforce multi-dimensional limits to maintain multi-tenant stability. These are usually defined by:</p> <ul> <li>RPM (Requests Per Minute): Limits the frequency of orchestrating calls.</li> <li>TPM (Tokens Per Minute): Limits the actual “compute” or payload volume.</li> <li>Concurrency: Limits the number of active HTTP connections.</li> </ul> <p>These limits generally exist to provide fair and consistent service to all customers and to ensure service stability.</p> <p>A sample LLM application functions something like this:</p> <p><img src="../assets/429_1.png" alt="" /></p> <p>In a naive architecture, scaling traffic simply increases the probability of breaching these thresholds.
Without sophisticated backpressure mechanisms, a burst in traffic doesn’t just slow the system down—it triggers a cascade of failures that can compromise the user experience and system reliability. The challenge is not handling individual failures but ensuring that the whole system remains resilient.</p> <p>Let us look at a few patterns that can help ensure this resiliency.</p> <h2 id="pattern-1-exponential-backoff">Pattern 1: Exponential Backoff</h2> <p>Exponential backoff is the simplest resilience pattern for handling rate limiting. A rate-limit response is the server telling us that it cannot serve us right now because a burst of traffic has exhausted the quota. So instead of constantly hammering the server, the client retries with exponentially growing delays: it waits a progressively longer period after each subsequent failure.</p> <p><img src="../assets/429_2.png" alt="Exponential Backoff" /></p> <p>However, standard backoff is insufficient for distributed systems. If ten workers hit a rate limit simultaneously and retry on the same schedule, they create “thundering herd” spikes that re-congest the API. The fix is to introduce jitter (a randomized delay). This desynchronizes retries, smoothing out the request distribution and increasing the likelihood of hitting an open window in the provider’s bucket.</p> <h2 id="pattern-2-multi-model-fallback">Pattern 2: Multi Model Fallback</h2> <p>This is also quite simple to implement. Instead of requesting a single model with its fixed limits, configure multiple LLM models from the same provider or from different providers. For example, if we are calling a claude-opus model and it responds with 429 Too Many Requests, we should consider requesting a different model, for example a Sonnet or Haiku model, or even a lower-generation Opus model.</p> <p><img src="../assets/429_3.png" alt="" /></p> <p>The advantage of such a change is that traffic is now distributed across models when one model is congested.
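<p>The backoff-with-jitter idea from Pattern 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a production client: <code>RateLimitError</code> stands in for whatever exception your provider SDK raises on a 429, and the delay values are illustrative.</p>

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider's 429 Too Many Requests error."""

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry request_fn on rate limits, doubling the wait each attempt and
    adding full jitter so competing workers do not retry in lockstep."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # quota still exhausted after every retry
            cap = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, cap))  # full jitter
```

<p>With jitter, ten workers that fail at the same moment wake up at ten different moments, instead of hammering the provider again in unison.</p>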
While it maintains availability, it introduces output variance. A prompt optimized for a flagship model may produce hallucinations or formatting errors when executed on a smaller, faster model.</p> <h2 id="pattern-3-multi-model-fallback-with-appropriate-prompts">Pattern 3: Multi Model Fallback with Appropriate Prompts</h2> <p>To mitigate the downside of Pattern 2, the system should store versioned, model-specific prompt templates.</p> <p><strong>Mechanism</strong>: When falling back from a high-reasoning model to a lower-tier one, the application swaps the prompt for one with more explicit few-shot examples or stricter constraints to maintain output parity.</p> <p><img src="../assets/429_4.png" alt="" /></p> <p><strong>Challenge</strong>: This increases the maintenance surface area. You are no longer managing one LLM integration, but an “N-model” matrix that requires robust evaluation (Evals) to ensure consistent behavior across all fallback paths.</p> <h2 id="pattern-4-event-driven-processing">Pattern 4: Event Driven Processing</h2> <p>By moving away from a request-response model, we remove the urgency to serve the response immediately. Moving to a queue-based model gives us control over the following:</p> <p><img src="../assets/429_5.png" alt="" /></p> <p><strong>Backpressure Absorption</strong>: The queue acts as a shock absorber for traffic spikes.</p> <p><strong>Concurrency Control</strong>: You can limit the number of active consumers to exactly match your provider’s Tier limits.</p> <p><strong>Prioritization</strong>: You can route “Premium User” requests to a priority lane while background tasks wait for available quota.</p> <h1 id="conclusion">Conclusion</h1> <p>External AI providers are distributed systems with shared, limited capacity. To build at scale, we must treat rate limits as a first-class architectural constraint rather than an edge case.</p> <p>Successful LLM platforms do not just retry; they orchestrate.
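<p>As a concrete illustration of Pattern 4’s concurrency control and prioritization, here is a minimal asyncio sketch. The function names and the fake <code>llm_call</code> are illustrative assumptions; a real system would use a durable message broker rather than an in-process queue.</p>

```python
import asyncio

async def _consume(queue, results, llm_call):
    # Each consumer pulls the highest-priority prompt and calls the provider.
    while True:
        _priority, prompt = await queue.get()
        results.append(await llm_call(prompt))
        queue.task_done()

async def process_queue(prompts, llm_call, max_concurrency=2):
    """Drain a priority queue with a fixed pool of consumers, so in-flight
    requests never exceed the provider's concurrency tier and premium
    (lower-number) priorities are served first."""
    queue = asyncio.PriorityQueue()
    for priority, prompt in prompts:
        queue.put_nowait((priority, prompt))
    results = []
    consumers = [asyncio.create_task(_consume(queue, results, llm_call))
                 for _ in range(max_concurrency)]
    await queue.join()  # block until every queued prompt has been processed
    for c in consumers:
        c.cancel()
    await asyncio.gather(*consumers, return_exceptions=True)
    return results
```

<p>Capping <code>max_concurrency</code> at your tier’s limit means a traffic burst lengthens the queue instead of producing a wave of 429s.</p>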
They distribute traffic across providers, control global concurrency, and ensure prompts are portable enough to survive a provider’s downtime. Retries may recover from failures, but architecture determines whether those failures happen in the first place.</p> <p>PS: All diagrams were generated with an AI model (Gemini).</p> Thu, 19 Mar 2026 07:15:00 +0000 https://abinavravi.github.io/programming/ai/2026/03/19/designing-429s-llms.html https://abinavravi.github.io/programming/ai/2026/03/19/designing-429s-llms.html Programming AI How to Deliver Business value with AI systems <h2 id="introduction">Introduction</h2> <p>If you also want AI systems to deliver business value, you have come to the right place. To deliver proper business value, there needs to be a deep understanding of how to create reliable systems that can consistently deliver value. Having a couple of machine learning models that do well in notebooks but can’t be put into production does the opposite.</p> <p>In this post, we will talk about how to think about a system that can deliver business value using machine learning effectively.</p> <h2 id="maturity-of-machine-learning">Maturity of Machine Learning</h2> <p>To deliver value there is a need to understand where we stand and what we need to provide. There are multiple models of the Machine Learning Operations cycle that can help us understand this scenario. Out of the different maturity models, the Azure MLOps maturity model is the one I could most relate to, as it lays out milestones in increasing order of complexity.</p> <p>Since we are mostly concerned with designing machine learning systems in this post, the emphasis from the Azure model is on the model creation, model release and application integration phases.
The organization aspect of things isn’t covered in this post, but it is also a main pillar: teams of different strengths must come together to build a reliable system.</p> <p>The issue with most teams across the board is that data scientists generally come from non-engineering backgrounds and hence are required to learn skills beyond their role, while engineers need an understanding of machine learning that likewise exceeds their role’s scope. The end result is data scientists building a model and handing it over to engineers to turn into deployable production software.</p> <p>You can refer to the <a href="https://learn.microsoft.com/en-us/azure/architecture/ai-ml/guide/mlops-maturity-model">machine learning maturity model</a> and look at the images below, taken from <a href="https://techcommunity.microsoft.com/t5/ai-machine-learning-blog/mlops-maturity-model-with-azure-machine-learning/ba-p/3520625">here</a>.
I will now talk about the stages of deployment.</p> <h2 id="stages-of-machine-learning-deployment">Stages of Machine Learning deployment</h2> <h3 id="minimum-deployable-product">Minimum Deployable Product:</h3> <p>Just as a startup iterates on a minimum viable product, we should start thinking about a minimum deployable product that gets the customer some value and is a step towards the grand utopia of no manual intervention anywhere in the ML training cycle.</p> <p>So let us first think about how to serve simple machine learning models in production; the LLM world brings hardware challenges that are probably too much to aspire to if the organisation hasn’t deployed even simple machine learning models.</p> <p>At this point we would have a machine learning model, which can be stored at a cloud storage location or on the server. The first step is to wrap it in an API with endpoints that can take requests and return an inference value of some meaning.</p> <p>For example, if the problem is a classification problem, just returning a vector of 0s and 1s makes no sense to the end user. The API would convert this to the possible categories and return a value that can be consumed either by the frontend client for display or by other services for further processing.</p> <p>In addition to the API, we should add some testing to the code: unit tests to ensure that functions work at the implementation level, and integration tests so that we know everything works with the entire system. How to write these tests is not our main focus now. We should also add a CI pipeline that checks builds and tests automatically whenever we push code to the repository.
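<p>For instance, the vector-to-category conversion the serving API performs for a classification model might look like the sketch below. The label names are made up, and in practice this function would sit behind a web-framework endpoint.</p>

```python
def predict_label(scores, labels):
    """Turn a raw model output vector into something meaningful to the caller:
    the most likely category name together with its score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"label": labels[best], "score": scores[best]}
```

<p>A frontend can display the returned label directly, and other services can branch on it without knowing anything about the model’s output encoding.</p>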
This should offer some basic software-testing reliability for a model that has not yet seen production.</p> <p><img src="/ml/mlops/level0.png" alt="Level 0" /></p> <p><img src="/ml/mlops/level1.png" alt="Level 1" /></p> <h3 id="automatic-retraining">Automatic Retraining</h3> <p>Now that we have a minimum deployable version available as a containerised deployment in a container registry, we should look at how to achieve the next feat: making the retraining process smooth, so that in case of issues with the deployed model we can automatically fetch the data, perform pre-processing and then train the model on the newer data.</p> <p><img src="/ml/mlops/level2.png" alt="Level 2" /></p> <p><img src="/ml/mlops/level3.png" alt="Level 3" /></p> <p>Let us take a look at some of the components and why they would be needed at this stage.</p> <ul> <li>Data pipelines and data contracts: With data pipelines and data contracts we can fetch reliable data that meets the input requirements of the model, and the data contracts ensure that this happens in a type-safe fashion.</li> <li>Data versioning: While we do retraining it is probably a good idea to maintain the version of the data using timestamps and other metadata. Tools like DVC help us maintain versions, and this is essential for creating a reproducible training environment if there are issues with the training.</li> <li>Training pipelines: When building machine learning pipelines it is good practice to use orchestration to run the preprocessing, training and testing as part of a single flow.
By using orchestration tools we can modularize the code and recognize if there are any issues in a particular module.</li> <li>Model registry: A model registry can be thought of as a metadata store for models. It helps version models, provide aliases, and tag and annotate them, and it gives each artifact a unique identifier, which matters when multiple versions of multiple models are produced.</li> <li>Machine learning monitoring: While full-fledged monitoring is not required, an observability framework that can monitor drift in models and data is, along with perhaps an alerting system that notifies the team when performance declines past a threshold, so the team can manually decide whether the models need to be retrained.</li> </ul> <p>While these components add complexity to the existing workflow, they are necessary to remove obvious mistakes as the scale of model creation increases. The best way to approach this is to have a team that maintains these components as an ML platform, with the data scientists as internal customers who use the platform.</p> <h3 id="complete-automatic-machine-learning-pipeline">Complete Automatic Machine Learning pipeline</h3> <p><img src="/ml/mlops/level4.png" alt="Level 4" /></p> <p>This stage is attained by very few companies: everything is automated and there is minimal manual involvement in making things work together. It is the utopia that people aim for when they start building an MLOps department. While minimal manual intervention is needed between components, we would still need a team that can add or remove components and do some spring cleaning to keep the platform tidy and the technical requirements up to date.</p> <h2 id="conclusion">Conclusion</h2> <p>While the blog limits itself to the maturity model, each component deserves a deep dive of its own, and this can be extremely challenging.
MLOps is still not a mature field, and each deployment choice can have different effects on the end product. My aim with this article was to bring some thought to the different stages and how a beginner can approach each stage depending on their specific requirements.</p> <p>I haven’t covered the details of each component; that is something for the future. For starters I would recommend <a href="https://www.ravirajag.dev/">this blog series</a> for building a minimum deployable product version, and from there the journey deepens quite a bit.</p> Sat, 13 Apr 2024 07:15:00 +0000 https://abinavravi.github.io/machine/learning/operations/2024/04/13/Business-value-MLOps.html https://abinavravi.github.io/machine/learning/operations/2024/04/13/Business-value-MLOps.html Machine learning Operations A deep dive into Async patterns <p>Recently I have been trying to understand a bit more about asynchronous communication between services and real-time streaming. For real-time streaming, the pub-sub pattern seems to be the most important architectural pattern.</p> <p>In this post I will go on a journey, assuming we know about synchronous client-server interaction, through various async patterns and how each of them solves the problem of scale.</p> <h2 id="client-server-architecture">Client Server Architecture</h2> <p>In the world of internet application development, there are basically two components: the frontend/client, accessible through a browser or app, where people can click or enter data and perform some actions, which are then sent across to the server (some form of a computer) that performs the logical operations on them.
The data is relayed from the client to the server through a secure network connection.</p> <p><img src="/assets/CSarch.jpeg" alt="A simple example of Client Server Model" /></p> <p>As consumers, we want our actions (technically known as requests) to be executed as fast as possible.</p> <h3 id="synchronous-communication">Synchronous Communication</h3> <p>While simple to implement, synchronous communication in client-server systems suffers from several limitations that impact scalability and responsiveness. Each request blocks the server, occupying resources (like RAM) until the entire logic executes. This creates a single thread of execution, meaning the server can only handle one request at a time. In multi-process systems, this forces you to allocate a dedicated machine for each customer, even if their requests take minimal processing time.</p> <p>This is why synchronous communication can be disadvantageous for dynamic and scalable systems. It also brings:</p> <ul> <li>Inefficient use of compute power</li> <li>Scalability bottlenecks — there are only so many machines that can be provisioned at a time</li> </ul> <h3 id="asynchronous-communication">Asynchronous Communication</h3> <p>In an asynchronous communication pattern, the server does not immediately return a response. Instead, the client initiates the request and continues execution without waiting.
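<p>The payoff is easy to demonstrate with Python’s asyncio: two simulated requests that each take 0.2s of I/O finish in roughly 0.2s total rather than 0.4s, because neither blocks the other. The handler names and delays are illustrative.</p>

```python
import asyncio
import time

async def handle_request(name, delay):
    await asyncio.sleep(delay)  # stands in for blocking I/O: a DB or network call
    return f"{name} done"

async def serve_two():
    # Both requests wait concurrently instead of back to back,
    # so total latency is about max(delays), not their sum.
    return await asyncio.gather(
        handle_request("a", 0.2),
        handle_request("b", 0.2),
    )

start = time.perf_counter()
results = asyncio.run(serve_two())
elapsed = time.perf_counter() - start
```

<p>Replace <code>asyncio.sleep</code> with a real database or network call and the same structure holds: waiting happens concurrently, not serially.</p>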
The processing can then happen concurrently on the server, and the client receives a notification or the processed information at a later point.</p> <p>The devil lies in the implementation: if implemented incorrectly, async/await operations can be slower than synchronous communication patterns.</p> <h2 id="patterns-of-asynchronous-communication">Patterns of Asynchronous Communication</h2> <p>Now that we understand that asynchronous patterns are essential for building a scalable system, let us try to understand the different patterns of communication.</p> <ul> <li>Request Reply Pattern</li> <li>Publish Subscribe</li> <li>Fire and Forget</li> <li>Event Driven</li> <li>Websockets</li> </ul> <h3 id="request-reply-pattern">Request Reply pattern</h3> <p>The request-reply asynchronous pattern allows a client to send a request to a server without waiting for an immediate response. This is the pattern closest to the synchronous request-reply pattern. It frees the client to continue execution on other tasks while the server processes the request concurrently. Once processing is complete, the server sends a response back to the client, often through a callback function or message queue. This pattern is particularly beneficial for tasks involving network calls, database interactions, or any scenario where the client doesn’t need to wait for the server’s response to proceed. It improves responsiveness and throughput by allowing the client and server to work independently.
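<p>A bare-bones version of the request-reply pattern using Python coroutines: the client fires off the request, keeps working, and only awaits the reply once it actually needs it. The names and delays are illustrative.</p>

```python
import asyncio

async def server(request):
    await asyncio.sleep(0.05)  # simulated processing time on the server
    return f"processed:{request}"

async def client():
    # Send the request without blocking on the reply...
    reply = asyncio.create_task(server("payload"))
    # ...carry on with other work in the meantime...
    other_work = "other work finished"
    # ...and only await the reply once it is actually needed.
    return other_work, await reply
```

<p>The awaited task plays the role of the callback or reply queue: the client holds a handle to the eventual response rather than blocking for it.</p>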
However, it introduces additional complexity compared to synchronous communication, as the client needs a mechanism to handle the eventual response.</p> <p>This pattern is generally implemented using the async/await framework and coroutines in languages such as Python.</p> <p><img src="/assets/reqrep_pattern.jpeg" alt="Source: https://learn.microsoft.com/en-us/azure/architecture/patterns/async-request-reply" /></p> <h3 id="publish-subscribe-pattern">Publish Subscribe Pattern</h3> <p>The Publish-Subscribe pattern, also known as pub-sub, is an alternative pattern of asynchronous communication. Its advantage over request-reply is that it allows loose coupling between publishers and subscribers.</p> <p>The loose coupling is established by using a message broker component that uses topics to establish connections. The message broker efficiently routes each message to all interested subscribers. This pattern forms the backbone of microservices architectures.</p> <p>Some common implementations are message queues and event buses; popular open-source options include RabbitMQ and Apache Kafka.</p> <p><img src="/assets/pubsub_pattern.jpeg" alt="Source: https://learn.microsoft.com/en-us/azure/architecture/patterns/publisher-subscriber" /></p> <h3 id="fire-and-forget-pattern">Fire and Forget Pattern</h3> <p>The fire-and-forget pattern is a simple approach to asynchronous communication where a sender transmits a message without waiting for a response or confirmation. This pattern prioritizes sending the message quickly and efficiently, and the sender does not track its delivery status.</p> <p>It’s often used for one-way tasks like sending logs, notifications, or initiating background processes. While fire-and-forget offers simplicity and speed, it lacks the guarantee of message delivery.</p> <p>Common implementations include message queues, email clients, or any system that allows sending messages without requiring feedback.
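<p>In code, fire-and-forget can be as small as scheduling a task and never awaiting it, as in this asyncio sketch. The log message is illustrative, and a real system would hand the message to a broker rather than an in-process task, precisely because an unawaited task can be lost.</p>

```python
import asyncio

delivered = []

async def send_log(message):
    await asyncio.sleep(0.01)  # simulated network send
    delivered.append(message)  # nothing is reported back to the sender

async def handler():
    # Fire: schedule the send and move on without keeping a delivery handle.
    asyncio.create_task(send_log("user signed in"))
    # The handler returns its own response without waiting on the send; the
    # short sleep only keeps the demo's event loop alive long enough to finish.
    await asyncio.sleep(0.05)
    return "response sent"
```

<p>Note the asymmetry: the caller’s latency is unaffected by the send, but if the event loop dies before the task runs, the message is silently gone.</p>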
However, it’s crucial to consider the potential for lost messages and design your system accordingly, especially if reliable delivery is critical.</p> <h3 id="event-driven-pattern">Event Driven Pattern</h3> <p>The event-driven pattern structures applications around the concept of events — significant occurrences within the system. Producers (components that trigger events) publish these events, carrying relevant data about the happenings. Consumers (components interested in these events) subscribe to specific events or categories.</p> <p>When an event is published, the event broker efficiently routes it to all interested consumers. Consumers then process the event information to perform their designated tasks.</p> <p>This loose coupling and asynchronous nature make event-driven systems highly scalable, responsive, and adaptable to changes. Common implementations include message brokers, event buses, and pub-sub systems configured to handle event routing and delivery.</p> <p>The event-driven pattern is well-suited for microservices architectures, real-time applications, and scenarios where components need to react to changes in the system without tight dependencies on each other.</p> <p><img src="/assets/event_driven_pattern.jpeg" alt="An example of event driven architecture Source: https://learn.microsoft.com/en-us/archive/msdn-magazine/2018/february/azure-event-driven-architecture-in-the-cloud-with-azure-event-grid" /></p> <h3 id="websockets">Websockets</h3> <p>WebSockets, the persistent two-way communication channels for web applications, thrive alongside asynchronous programming patterns. Here’s how they work together:</p> <ul> <li>Persistent Connection: Unlike traditional HTTP requests, WebSockets establish a long-lived connection between the client and server. 
This eliminates the need for repeated connection setup, improving efficiency.</li> <li>Asynchronous Communication: Both client and server can send and receive messages asynchronously, meaning they don’t have to wait for a response before sending the next message. This allows for real-time data exchange without blocking the main application flow.</li> <li>Event-Driven Model: WebSockets often leverage an event-driven approach. The server can push messages to the client, triggering event handlers on the client-side to react to the received data. This enables real-time updates and interactive experiences.</li> </ul> <h2 id="conclusion">Conclusion</h2> <p>While sync patterns are easy to implement, async patterns bring real scalability to the code we write to bring software products to users.</p> <p>If readers notice any mistakes, please mention them in the comments and they will be corrected.</p> Sat, 30 Mar 2024 20:15:00 +0000 https://abinavravi.github.io/engineering/2024/03/30/Deep-dive-async.html https://abinavravi.github.io/engineering/2024/03/30/Deep-dive-async.html Engineering DSPy Programming Not prompting Sat, 20 Jan 2024 07:15:00 +0000 https://abinavravi.github.io/machine/learning/2024/01/20/DSPy.html https://abinavravi.github.io/machine/learning/2024/01/20/DSPy.html Machine Learning Python Package management with uv <h1 id="background">Background</h1> <p>Working as a machine learning engineer who deploys most things to the web, downloading dependencies becomes a bottleneck at some point. When I started working we were using requirements.txt with pinned versions, and this was a brittle setup: if a version number was changed unintentionally, everything could break. pyenv came along to address this, but usability was a slight concern; then came poetry, which I had been using for about 3.5 years.
So what was the problem that poetry solved?</p> <p>It came with a lock file that could be regenerated if something went wrong, and a pyproject.toml file where multiple configurations could be set and modified. But with time, generating the lock file and simply running poetry install seemed a bit slow, as poetry is written entirely in Python, which meant the GIL was a limiting factor.</p> <p>With PyO3 and Rust bindings for Python came a set of Python packages that outsourced the heavy processing to Rust; pydantic is one of the libraries that embraced this, rewriting its entire core in Rust and bringing some breaking changes with pydantic v2.0. So it was only a matter of time before package managers were also rewritten in Rust. Into this environment came uv from Astral, who already had a success story in ruff, a key Python linting tool.</p> <h1 id="what-is-uv">What is uv?</h1> <p>uv is basically a drop-in replacement for poetry and is incredibly fast. The uv documentation says it can be 10-100x faster; from anecdotal evidence it reduced our build times, especially requirements installation, by at least 2-3x.</p> <h1 id="how-to-use-uv">How to use uv?</h1> <h2 id="installation">Installation</h2> <p>Follow the <a href="https://docs.astral.sh/uv/getting-started/installation/">documentation</a> to install uv for your operating system.</p> <h2 id="project-usage">Project usage</h2> <p>A new project can be created with <code class="language-plaintext highlighter-rouge">uv init</code> if the repo already exists; if the repo doesn’t exist, you can create a new project with <code class="language-plaintext highlighter-rouge">uv init &lt;reponame&gt;</code>.
There are two types of projects that can be created:</p> <ol> <li>Applications</li> <li>Libraries</li> </ol> <p><code class="language-plaintext highlighter-rouge">uv init</code> creates an application project by default, and we have to use <code class="language-plaintext highlighter-rouge">uv init --lib</code> for a library project.</p> <p>With project creation the following files are created:</p> <ul> <li>pyproject.toml</li> <li>.python-version</li> <li>README.md</li> <li>hello.py</li> </ul> <p>If the code needs to be a Python package we can use <code class="language-plaintext highlighter-rouge">uv init --package</code> to create the corresponding build system and utilities in pyproject.toml.</p> <p>For further information on usage please refer to the <a href="https://docs.astral.sh/uv/concepts/projects/init/">documentation</a>.</p> <h2 id="adding-dependencies">Adding dependencies</h2> Sat, 13 Jan 2024 07:15:00 +0000 https://abinavravi.github.io/programming/2024/01/13/package-management-uv.html https://abinavravi.github.io/programming/2024/01/13/package-management-uv.html Programming How Phone camera has destroyed the feeling of nostalgia? (Rant) <p>I have been on social media, actively lurking around and liking posts; my own posts are the occasional photo dump. I have never posted an Instagram story, nor do I intend to any day soon. Recently I noticed that many of my real-life friends go to a concert and then take a video of the concert while the performance is on. This somehow makes me want to grab the audience’s phones and throw them away.</p> <p>Now I am not an unreasonable man saying you shouldn’t take a 30s clip of a song to show your friends that you are at a concert. But of late I see a rush of stories or statuses filled with 30s of each song performed. I may sound old, like a boomer, but what happened to just enjoying the performance and the beauty of it?
From my limited interactions with people who do this, their justification is that “we are saving this for memories.”</p> <p>My retort is: if you save it for the future by capturing it on a phone now, are you not living in the present? Which brings me to my major objection to such behavior. Mankind has always lived on stories and nostalgia about events from the past. When we meet for parties, dinners, etc., the conversation generally revolves around such memories, and I feel that not capturing all of them on a phone allows for some creative liberty in life. You can always add superlative adjectives and make your friends, family and acquaintances jealous. But if you do show proof, they can make a judgement call about your memory and easily spoil it for you.</p> <p>Exaggeration in storytelling brings character, and that is lost when every moment is captured on a phone, the actual events are replayed, and arguments are settled with evidence of the event. While this is legally quite helpful, having a camera in the fun moments spoils the feeling of nostalgia and imposes limits on creativity when retelling the event.</p> Mon, 11 Sep 2023 23:45:00 +0000 https://abinavravi.github.io/life/2023/09/11/Phone-camera-Killed-nostalgia.html https://abinavravi.github.io/life/2023/09/11/Phone-camera-Killed-nostalgia.html Life My first Prefect ETL Pipeline <h2 id="problem-statement">Problem Statement</h2> <p>There is a business case where the organization depends on an API to fetch data, and luckily the data can be stored and used for downstream analytics later. The API costs a lot, and the data also needs to be stored securely.
But the idea is not to write a script cobbled together from API calls, pandas conversions, etc., that will fail at the drop of a hat because it was meant for one-time use.</p> <h2 id="requirements">Requirements</h2> <p>The script should be</p> <ul> <li>re-runnable</li> <li>configurable</li> <li>quick to develop on</li> <li>easy to maintain.</li> </ul> <h2 id="tasks">Tasks</h2> <ol> <li>Create a function that fetches the data from the API</li> <li>Convert the API response into a dataframe</li> <li>Store the dataframe in a database</li> <li>Create a pipeline that does this in sequence.</li> </ol> <h2 id="solution">Solution</h2> <p>We are going to build the entire thing using Prefect and Python. I tried my hand at Airflow, but it seemed so complex, with binaries and so on, that the Prefect documentation felt like a breeze.</p> <h3 id="0-install-prefect">0. Install prefect</h3> <p>Install this inside a virtual environment</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install -U prefect </code></pre></div></div> <h2 id="code-snippets">Code snippets</h2> <h3 id="1-create-a-function-that-fetches-data-from-api">1. Create a function that fetches data from API</h3> <p>Since the API call I have to make is a POST call, the input data to be sent in the request is stored as modifiable content in a JSON file and then loaded during the task phase.
This would be a task in the Prefect universe: a smaller brick in the larger pipeline, which is represented as a flow.</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">requests</span> <span class="kn">import</span> <span class="nn">json</span> <span class="k">def</span> <span class="nf">fetch_data</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span> <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">data</span><span class="p">))</span> <span class="k">return</span> <span class="n">response</span><span class="p">.</span><span class="n">json</span><span class="p">()</span> </code></pre></div></div> <h3 id="2-convert-the-api-response-into-a-dataframe">2. Convert the API response into a dataframe</h3> <p>Once the response is obtained as JSON, convert it into a dataframe: if the accumulating dataframe is not empty, use pandas concat to append; if it is empty, just create a dataframe from the JSON.</p> <h3 id="3-dumpt-the-data-to-a-postgresql-database-using-sqlalchemy-and-pandas">3.
Dump the data to a PostgreSQL database using SQLAlchemy and pandas</h3> <p>Create an engine for dumping the data using SQLAlchemy and then use the df.to_sql() method from pandas to write the data into the Postgres table</p> <h2 id="putting-the-above-parts-together">Putting the above parts together</h2> <p>Steps 1, 2 and 3 are individual tasks in the pipeline. Now write a simple Python function that executes them in sequence and add a @flow decorator at the top of the function, something like this</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kn">from</span> <span class="nn">prefect</span> <span class="kn">import</span> <span class="n">flow</span><span class="p">,</span> <span class="n">task</span> <span class="o">@</span><span class="n">task</span> <span class="k">def</span> <span class="nf">fetch_data</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="p">):</span> <span class="p">...</span> <span class="o">@</span><span class="n">task</span> <span class="k">def</span> <span class="nf">convert_to_df</span><span class="p">(</span><span class="n">response</span><span class="p">):</span> <span class="p">...</span> <span class="o">@</span><span class="n">task</span> <span class="k">def</span> <span class="nf">dump_to_db</span><span class="p">(</span><span class="n">df</span><span class="p">):</span> <span class="p">...</span> <span class="o">@</span><span class="n">flow</span> <span class="k">def</span> <span class="nf">my_etl_pipeline</span><span class="p">():</span> <span class="n">response</span> <span class="o">=</span> <span class="n">fetch_data</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">data</span><span class="p">)</span> <span class="n">df</span> <span class="o">=</span> <span class="n">convert_to_df</span><span class="p">(</span><span
class="n">response</span><span class="p">)</span> <span class="n">dump_to_db</span><span class="p">(</span><span class="n">df</span><span class="p">)</span> </code></pre></div></div> <p>Now all that needs to be done is to run this file like any other Python script, and the ETL is accomplished; localhost at port 4200 serves a dashboard of the task execution details. Such a simple way to run simple workflows.</p> Sat, 09 Sep 2023 23:45:00 +0000 https://abinavravi.github.io/engineering/2023/09/09/My-First-ETL.html https://abinavravi.github.io/engineering/2023/09/09/My-First-ETL.html Engineering Private Python packages on Github and Gitlab <p>If we work in an industry setting, we need private packages to use as shared libraries across teams working on the same project. For that we can use the package registries in GitHub and GitLab.</p> <h2 id="gitlab">GitLab</h2> <h3 id="requirements">Requirements</h3> <ul> <li>A deploy token pair with username and deploy key from the GitLab repository</li> <li>A repository for this purpose</li> <li>poetry package management</li> </ul> <h3 id="steps">Steps</h3> <ol> <li>Create a repository with the name of the package and put all the code inside the repository</li> <li>Create a <code class="language-plaintext highlighter-rouge">__init__.py</code> file inside the repository and import all the classes and functions that need to be exposed to users of the package</li> <li><code class="language-plaintext highlighter-rouge">pip install poetry</code> to install poetry in the virtual environment</li> <li>Add the dependencies for the package by using <code class="language-plaintext highlighter-rouge">poetry add &lt;package name&gt;</code></li> <li>To ensure that the package is maintained properly, write tests to keep tabs on functionality</li> <li>Once everything is done, run the following three commands in sequence to build and push the package to the GitLab package registry <ul> <li><code class="language-plaintext highlighter-rouge">poetry config
repositories.&lt;variable_name&gt; https://gitlab.com/api/v4/projects/&lt;project_id&gt;/packages/pypi</code></li> <li><code class="language-plaintext highlighter-rouge">poetry build</code></li> <li><code class="language-plaintext highlighter-rouge">poetry publish --repository &lt;variable_name&gt; -u &lt;gitlab_deploy_token_name&gt; -p &lt;gitlab_deploy_token&gt;</code></li> </ul> </li> </ol> <p>Now the PyPI package is available for internal use and can be installed with</p> <ul> <li><code class="language-plaintext highlighter-rouge">pip install &lt;package_name&gt; --no-deps --index-url https://&lt;gitlab_deploy_token_name&gt;:&lt;gitlab_deploy_token&gt;@gitlab.example.com/api/v4/projects/&lt;project_id&gt;/packages/pypi/simple</code></li> </ul> <h2 id="github">GitHub</h2> <p>I would like to thank my friend Abhijeet Parida for providing a template with a very beautiful readme to accomplish this task on <a href="https://github.com/a-parida12/poetry_pypi_template">GitHub</a></p> <h2 id="extensions">Extensions</h2> <p>Ideally, a CI pipeline should build and publish the packages so that versioning can be done automatically</p> <h2 id="references">References</h2> <ol> <li>https://docs.gitlab.com/ee/user/packages/pypi_repository/</li> <li>https://docs.mpcdf.mpg.de/doc/data/gitlab/devop-tutorial.html</li> <li>https://github.com/a-parida12/poetry_pypi_template</li> </ol> Sun, 19 Feb 2023 23:45:00 +0000 https://abinavravi.github.io/engineering/2023/02/19/Private-Python-Packages.html https://abinavravi.github.io/engineering/2023/02/19/Private-Python-Packages.html Engineering Common Optimization Algorithms <h2 id="the-learning-process">The Learning Process</h2> <p>An important question to ask is: what does it mean to learn in Machine Learning/Deep Learning? The answer is simple: the goal of a Machine Learning model is to represent the knowledge of most data points with the help of a function, which can be either linear or non-linear.
So how do we decide which function to choose to represent the data points? We use weights and biases to represent the function and randomly assign them a value; we then iteratively modify the weights with the help of a cost function/loss function. When we minimize the cost function we bring the prediction function closer to the data points.</p> <p>How do we go about the minimization process? For this we use optimization algorithms such as gradient descent, stochastic gradient descent, etc.</p> <h2 id="the-gradient-descent-algorithm">The Gradient Descent Algorithm</h2> <p>The gradient descent algorithm is a first-order optimization method, which means that the highest order of derivative used is the first derivative. For gradient descent to be applied to a function there are two basic requirements. The function must be</p> <ol> <li>Differentiable - the derivative exists at all points of the domain of concern</li> <li>Convex - the second derivative is non-negative, i.e. the function has a single minimum</li> </ol> <h3 id="what-is-a-gradient">What is a gradient?</h3> <p>In univariate calculus it is the derivative of the function, and in multivariate calculus it is the vector of partial derivatives, giving the rate of change in each direction.</p> <h3 id="algorithm">Algorithm</h3> <p>The gradient descent algorithm is an iterative process (it happens in a loop) until convergence is achieved. In layman's terms, consider moving down a mountain: gradient descent can be seen as the smallest number of steps required to reach the valley. The pace at which we can get down is limited by the length of each step, and this is analogous to the learning rate.
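Concretely, each iteration moves the current point against the gradient, with the step scaled by the learning rate $\eta$ (the step length in the analogy):

\[x_{t+1} = x_t - \eta \, \nabla f(x_t)\]
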
If I were 30 ft tall I would reach it in 5 steps, as opposed to 25-odd steps if I were 6 ft tall.</p> <p>Please find a simple implementation of gradient descent in python below</p> <pre><code class="language-Python">import numpy as np

def gradient_descent(start, gradient, learn_rate, max_iter, tol=0.01):
    steps = [start]  # history tracking
    x = start
    for _ in range(max_iter):
        diff = learn_rate * gradient(x)
        if np.abs(diff) &lt; tol:
            break
        x = x - diff
        steps.append(x)  # history tracking
    return steps, x
</code></pre> <p>The code is from <a href="https://towardsdatascience.com/gradient-descent-algorithm-a-deep-dive-cf04e8115f21#:~:text=Gradient%20descent%20(GD)%20is%20an,e.g.%20in%20a%20linear%20regression">this blog</a></p> <h3 id="disadvantage">Disadvantage</h3> <p>Though the approach is quite clean-cut, since we are dealing with a large number of samples, calculating the next step after going through all the samples is very computationally intensive. This is where stochasticity is needed, working in smaller batches and iterations. This method of performing gradient descent on random batches is known as <strong>Stochastic Gradient Descent</strong>.</p> <h3 id="momentum">Momentum</h3> <h2 id="adam-optimization">ADAM optimization</h2> <h2 id="other-optimization-algorithms">Other optimization algorithms</h2> Thu, 16 Feb 2023 21:45:00 +0000 https://abinavravi.github.io/machine/learning/2023/02/16/Optimization-Algos.html https://abinavravi.github.io/machine/learning/2023/02/16/Optimization-Algos.html Machine learning Cost Functions and Loss Functions in Machine Learning <h2 id="goal-of-machine-learning">Goal of Machine Learning</h2> <p>Machine Learning or Deep Learning can be viewed as function approximation based on data.
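This view of learning as function approximation can be illustrated with a toy polynomial fit (a minimal NumPy sketch, not from the post; the sine target and the polynomial degree are illustrative choices):

```python
import numpy as np

# Toy function approximation: fit a cubic polynomial to noisy sine samples.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)  # noisy data points

coeffs = np.polyfit(x, y, deg=3)  # least-squares fit of the polynomial
y_hat = np.polyval(coeffs, x)     # predictions at the sample points
mse = np.mean((y - y_hat) ** 2)   # how well the function represents the data
```

A low residual error on the samples is only half the story; the fitted function should also generalize to new samples from the same distribution.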
What this means in simpler terms is that, given a set of data points, the aim is to get a function that represents each of the data points well, and this function generalizes across many similar data points.</p> <p>A simple polynomial curve fitting example is shown below. The idea of machine learning is to achieve something similar to the bottom-left figure, so that a little generalization or extrapolation can be done if a sample comes from a similar distribution.</p> <p><img src="https://abinavravi.github.io/assets/back2basics/curve_fitting.jpeg" alt="Function Approximation" /> [1]</p> <h3 id="loss-function">Loss Function</h3> <p>The loss function depicts the error between the predicted function value at a data point and the actual value. Loss functions are measured over single data points.</p> <h3 id="cost-function">Cost Function</h3> <p>The cost function also denotes the error between predicted and actual values, but over a batch of data points.</p> <h3 id="mathematical-properties-of-loss-function">Mathematical properties of Loss Function</h3> <p>For a loss function these mathematical properties are important</p> <ul> <li>Differentiable over the entire real domain</li> <li>Possibly convex, to ensure that there is a minimum; but in deep learning the loss landscape will generally not be convex and we have to make do with a local minimum.</li> </ul> <h3 id="common-loss-functions">Common Loss Functions</h3> <ul> <li><strong>Cross Entropy Loss</strong></li> </ul> <p>Cross entropy loss is generally used for classification problems and can be expressed mathematically as</p> <ol> <li>Multi label classification</li> </ol> \[L = -\frac{1}{m} \sum_{i=1}^{m} y_i \log(\hat{y}_i)\] <ol> <li>Binary classification</li> </ol> \[L = -\frac{1}{m} \sum_{i=1}^{m} \left( y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right)\] <p>where $y_i$ represents the label and $\hat{y}_i$ is the prediction for $m
$ samples</p> <ul> <li><strong>Hinge Loss</strong></li> </ul> <p>This loss function was developed for SVM classification methods</p> \[L = \max(0, 1 - y \cdot f(x))\] <ul> <li><strong>Mean Squared Error</strong></li> </ul> <p>Mean squared error is generally used for regression problems and can be represented mathematically as</p> \[L = \frac {\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}\] <p>It computes the squared distance between the sample and the prediction and then averages it over n samples</p> <ul> <li><strong>Adversarial Loss function</strong></li> </ul> <p>Adversarial loss is used in Generative Adversarial Networks, where the discriminator and generator play a zero-sum game against each other.</p> \[\min_{G}\max_{D}\mathbb{E}_{x\sim p_{\text{data}}(x)}[\log{D(x)}] + \mathbb{E}_{z\sim p_{\text{z}}(z)}[\log{(1 - D(G(z)))}]\] <p>where G is the Generator and D is the discriminator</p> <h2 id="references">References</h2> <ol> <li>Figure 1.4, <a href="https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf">Pattern Recognition and Machine Learning, Christopher Bishop</a></li> </ol> Tue, 14 Feb 2023 17:45:00 +0000 https://abinavravi.github.io/machine/learning/2023/02/14/cost-functions.html https://abinavravi.github.io/machine/learning/2023/02/14/cost-functions.html Machine learning
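The loss definitions above can be sketched in NumPy (an illustrative sketch, not from the post; note the standard leading negative sign in cross entropy and the eps clipping to avoid log(0)):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: squared distance averaged over the samples."""
    return np.mean((y - y_hat) ** 2)

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Binary cross entropy with predictions clipped away from 0 and 1."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1.0, 0.0, 1.0])   # labels
p = np.array([0.9, 0.1, 0.8])   # predicted probabilities
```

Both functions vectorize over a batch, matching the averaged forms of the equations above.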